
Envoy Outlier Detection and Ejection Mechanism Overview

The article explains Envoy's outlier detection and ejection process, detailing how unhealthy upstream hosts are identified and temporarily removed based on consecutive 5xx errors, gateway failures, or success‑rate thresholds, and describes the logging format and configuration options for these health‑check mechanisms.

Architects Research Society

Outlier detection and ejection is the process of dynamically determining whether some hosts in an upstream cluster are performing differently from the others, and removing them from the healthy load‑balancing set. Performance can be measured along different axes, such as consecutive failures, success rate over time, or latency over time. Outlier detection is a form of passive health checking. Envoy also supports active health checking, and the two can be combined into a comprehensive upstream health‑checking solution.

Ejection Algorithm

Depending on the detection type, ejection runs either inline (for example, as soon as consecutive 5xx responses are seen) or at a fixed interval (for example, for the periodic success‑rate check). The algorithm works as follows:

The host is identified as an outlier.

Envoy checks that the number of currently ejected hosts is below the allowed threshold (configured via outlier_detection.max_ejection_percent). If it is not, the host is not ejected.

The host is ejected for a computed number of milliseconds. Ejection marks the host as unhealthy, so the load balancer does not use it unless the load balancer is in panic mode (Envoy's panic threshold). The ejection duration equals outlier_detection.base_ejection_time_ms multiplied by the number of times the host has been ejected. Hosts that fail repeatedly are therefore ejected for progressively longer periods.

The ejected host is automatically reintegrated after the ejection time expires. Generally, outlier detection is used together with active health checks for a full health‑check solution.
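The four steps above can be modeled in a few lines of Python. This is a minimal sketch, not Envoy's implementation, and it makes the following assumptions:

- The `Host` and `OutlierEjector` names are hypothetical.
- Time is passed in explicitly as milliseconds.
- Panic mode is not modeled.
- The defaults of 30,000 ms and 10% are assumed to mirror Envoy's documented defaults for base_ejection_time_ms and max_ejection_percent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Host:
    url: str
    num_ejections: int = 0                  # times this host has been ejected so far
    ejected_until_ms: Optional[int] = None  # None: host is in the load-balancing pool


class OutlierEjector:
    """Hypothetical model of the ejection steps; not Envoy's actual code."""

    def __init__(self, hosts, base_ejection_time_ms=30_000, max_ejection_percent=10):
        self.hosts = {h.url: h for h in hosts}
        self.base_ejection_time_ms = base_ejection_time_ms
        self.max_ejection_percent = max_ejection_percent

    def ejected_count(self):
        return sum(1 for h in self.hosts.values() if h.ejected_until_ms is not None)

    def on_outlier(self, url, now_ms):
        """Step 1 has flagged `url` as an outlier; return True if it gets ejected."""
        host = self.hosts[url]
        if host.ejected_until_ms is not None:
            return False  # already ejected
        # Step 2: only eject while the ejected share is still below the cap
        # (integer arithmetic avoids float rounding in the percentage).
        if self.ejected_count() * 100 >= self.max_ejection_percent * len(self.hosts):
            return False
        # Step 3: the ejection time grows with each repeated ejection.
        host.num_ejections += 1
        host.ejected_until_ms = now_ms + self.base_ejection_time_ms * host.num_ejections
        return True

    def tick(self, now_ms):
        """Step 4: return hosts whose ejection time has expired to the pool."""
        for host in self.hosts.values():
            if host.ejected_until_ms is not None and now_ms >= host.ejected_until_ms:
                host.ejected_until_ms = None

    def healthy(self):
        """URLs the load balancer may currently pick (panic mode is not modeled)."""
        return sorted(url for url, h in self.hosts.items() if h.ejected_until_ms is None)
```

Consider ten hosts with the 10% cap:

- The first outlier is ejected for 30 s.
- A second outlier flagged at the same moment is refused.
- If the first host is ejected again after it returns, its second ejection lasts 60 s.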

Detection Types

Envoy supports the following outlier detection types:

Consecutive 5xx

If an upstream host returns a run of consecutive 5xx responses, it is ejected. These can be real 5xx status codes or events that cause the HTTP router to return a 5xx on the host's behalf, such as resets or connection failures. The number of consecutive 5xx responses required is set by outlier_detection.consecutive_5xx.
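The consecutive‑5xx rule amounts to a streak counter. This minimal sketch makes three assumptions:

- Responses arrive as integer status codes.
- Router‑generated failures are represented by the hypothetical labels "reset" and "connect_failure". Envoy tracks these internally, not as strings.
- The default threshold of 5 mirrors Envoy's default for consecutive_5xx.

```python
# Hypothetical labels for router-side failures that surface as a 5xx.
ROUTER_FAILURES = {"reset", "connect_failure"}


def counts_as_5xx(event):
    """True for a real 5xx status code or a router-side failure that yields a 5xx."""
    if isinstance(event, int):
        return 500 <= event <= 599
    return event in ROUTER_FAILURES


class Consecutive5xx:
    """Illustrative streak counter; `threshold` plays the role of consecutive_5xx."""

    def __init__(self, threshold=5):
        self.threshold = threshold
        self.streak = 0

    def record(self, event):
        """Record one response; return True exactly when the streak reaches the threshold."""
        if counts_as_5xx(event):
            self.streak += 1
            return self.streak == self.threshold
        self.streak = 0  # any non-5xx response breaks the streak
        return False
```

With a threshold of 3, the sequence 503, reset, 200, 500, 502, connect_failure fires only on the last event. The 200 resets the streak, so the final three failures are the ones that count.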

Consecutive Gateway Failures

If an upstream host returns a run of consecutive gateway errors (502, 503, or 504 status codes), it is ejected. The number required is set by outlier_detection.consecutive_gateway_failure.
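The gateway‑failure rule differs from the consecutive‑5xx rule only in which status codes count. The sketch below tracks both streaks side by side. It assumes that any response outside a category breaks that category's streak. `ConsecutiveFailureCounters` is a hypothetical name, only status codes are modeled, and the defaults of 5 are assumed to match Envoy's.

```python
GATEWAY_ERRORS = {502, 503, 504}


class ConsecutiveFailureCounters:
    """Tracks the 5xx and gateway-failure streaks side by side (illustrative model)."""

    def __init__(self, consecutive_5xx=5, consecutive_gateway_failure=5):
        self.limits = {"5xx": consecutive_5xx, "gateway": consecutive_gateway_failure}
        self.streaks = {"5xx": 0, "gateway": 0}

    def record(self, status):
        """Record one status code; return the names of streaks that just hit their limit."""
        is_5xx = 500 <= status <= 599
        is_gateway = status in GATEWAY_ERRORS
        # Assumption: a plain 500 extends the 5xx streak but resets the gateway streak.
        self.streaks["5xx"] = self.streaks["5xx"] + 1 if is_5xx else 0
        self.streaks["gateway"] = self.streaks["gateway"] + 1 if is_gateway else 0
        return [name for name, n in self.streaks.items() if n == self.limits[name]]
```

Take the sequence 502, 500, 503, 504, 502 with both limits set to 3. The 5xx streak hits its limit on the third response. The 500 resets the gateway streak, so that streak reaches 3 only on the fifth response.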

Success Rate

Success‑rate based ejection aggregates success‑rate data from every host in the cluster. At each interval, it ejects hosts whose success rate is a statistical outlier, that is, below a threshold derived from the cluster's mean success rate and its spread. Hosts with a request volume below outlier_detection.success_rate_request_volume are excluded from the calculation. Detection is skipped entirely if fewer hosts than outlier_detection.success_rate_minimum_hosts have sufficient volume.
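The statistical test can be sketched as follows. It assumes Envoy's formulation: the ejection threshold is the mean success rate minus success_rate_stdev_factor times the standard deviation. Envoy configures the factor as an integer divided by 1000, so the default of 1900 means 1.9.

Other assumptions of this sketch:

- The function name and the (successes, total) input shape are hypothetical.
- Spread is measured with the population standard deviation.
- `request_volume` and `minimum_hosts` stand in for success_rate_request_volume and success_rate_minimum_hosts, with assumed defaults of 100 and 5.

```python
import math


def success_rate_outliers(stats, request_volume=100, minimum_hosts=5, stdev_factor=1.9):
    """Return (mean, threshold, outlier_urls), or None if detection is skipped.

    `stats` maps host URL -> (successful_requests, total_requests) for one interval.
    Success rates are on a 0-100 scale.
    """
    rates = {
        url: 100.0 * ok / total
        for url, (ok, total) in stats.items()
        if total >= request_volume  # low-volume hosts are left out of the calculation
    }
    if len(rates) < minimum_hosts:
        return None  # too few hosts with enough volume: skip this interval
    mean = sum(rates.values()) / len(rates)
    # Population standard deviation of the per-host success rates.
    stdev = math.sqrt(sum((r - mean) ** 2 for r in rates.values()) / len(rates))
    threshold = mean - stdev_factor * stdev
    outliers = sorted(url for url, r in rates.items() if r < threshold)
    return mean, threshold, outliers
```

For example, suppose four hosts are at 100% and one is at 50%, each with 100 requests. That gives a mean of 90%, a standard deviation of 20 points, and a threshold of 52%, so only the 50% host is flagged. A sixth host with just 10 requests would be ignored however badly it performed.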

Ejection Event Logging

Envoy can emit JSON‑formatted ejection event logs, which are useful for operations because global statistics do not reveal which hosts were ejected and why. Each log line follows this structure:

{
  "time": "...",
  "secs_since_last_action": "...",
  "cluster": "...",
  "upstream_url": "...",
  "action": "...",
  "type": "...",
  "num_ejections": "...",
  "enforced": "...",
  "host_success_rate": "...",
  "cluster_success_rate_average": "...",
  "cluster_success_rate_ejection_threshold": "..."
}

Each event records the following:

- the event timestamp
- the seconds since the previous action
- the cluster containing the host
- the host URL
- the action taken (eject or uneject)
- the ejection type (5xx, GatewayFailure, or SuccessRate)
- the number of times the host has been ejected
- whether the ejection was enforced

For SuccessRate events, it also carries the host's success rate, the cluster's average success rate, and the ejection threshold in effect at the time.
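Because each event is a self‑contained JSON object, the log is easy to post‑process, for example to see which hosts are ejected most often and why. This is a minimal sketch with three assumptions:

- Each event is written as one JSON object per line.
- The events use the field names and eject/uneject action values listed above.
- `summarize_ejections` is a hypothetical helper, not an Envoy tool.

```python
import json
from collections import Counter


def summarize_ejections(lines):
    """Count 'eject' actions per (upstream_url, type) from JSON event-log lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        event = json.loads(line)
        if event.get("action") == "eject":
            counts[(event.get("upstream_url"), event.get("type"))] += 1
    return counts
```

Iterating `summarize_ejections(open(path))` over the log file and then calling `.most_common()` on the result lists the most frequently ejected hosts first.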

