High Availability Traffic Governance: Circuit Breakers, Isolation, Retries, Timeouts, and Rate Limiting
This article explains how to achieve high availability in microservice systems through traffic governance techniques: circuit breakers, isolation strategies, retry mechanisms, timeout controls, and rate limiting. Each concept is illustrated with examples, formulas, and pseudocode.
Overview
The article discusses why the "three highs" (high performance, high availability, and easy scalability) matter for system health, and introduces traffic governance as a key practice for maintaining them.
Availability Metrics
The article defines MTBF (mean time between failures) and MTTR (mean time to repair) and gives the formula Availability = MTBF / (MTBF + MTTR) × 100%.
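As a worked illustration of the availability formula, the sketch below computes the percentage from MTBF and MTTR. The function name and the sample figures (999 hours between failures, 1 hour to repair) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Return availability as a percentage: MTBF / (MTBF + MTTR) * 100."""
    if mtbf_hours < 0 or mttr_hours < 0 or mtbf_hours + mttr_hours == 0:
        raise ValueError("MTBF and MTTR must be non-negative and not both zero")
    return mtbf_hours / (mtbf_hours + mttr_hours) * 100.0


# Hypothetical figures: a failure every 999 hours, 1 hour to repair.
print(f"{availability(999, 1):.2f}%")  # prints "99.90%"
```

The formula shows that availability improves either by failing less often (raising MTBF) or by recovering faster (lowering MTTR).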
Traffic Governance Objectives
The objectives listed are network performance optimization, service quality assurance, fault tolerance, security, and cost efficiency.
Circuit Breaker
The article describes the states of a traditional circuit breaker (Closed, Open, Half-Open) and the Google SRE adaptive throttling algorithm, including how the client-side rejection probability p is calculated.
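The rejection probability used in Google SRE adaptive throttling is p = max(0, (requests - K × accepts) / (requests + 1)). The following is a minimal sketch of that calculation, not the article's own implementation. It assumes cumulative counters for simplicity, whereas the SRE description counts over a trailing time window; the class and method names are hypothetical.

```python
import random


class AdaptiveThrottle:
    """Client-side throttle based on p = max(0, (requests - K*accepts) / (requests + 1))."""

    def __init__(self, k: float = 2.0):
        self.k = k          # multiplier K: lower values throttle more aggressively
        self.requests = 0   # requests attempted by the application layer
        self.accepts = 0    # requests the backend accepted

    def reject_probability(self) -> float:
        return max(0.0, (self.requests - self.k * self.accepts) / (self.requests + 1))

    def allow(self, rng=random.random) -> bool:
        """Decide locally whether to send the request to the backend."""
        return rng() >= self.reject_probability()

    def record(self, accepted: bool) -> None:
        """Record the outcome of one attempt (locally rejected attempts count too)."""
        self.requests += 1
        if accepted:
            self.accepts += 1


throttle = AdaptiveThrottle(k=2.0)
for i in range(100):
    throttle.record(accepted=(i < 10))  # hypothetical: only 10 of 100 accepted
print(round(throttle.reject_probability(), 3))
```

While the backend accepts at least 1/K of the traffic, p stays at zero and nothing is dropped locally. As acceptances fall behind, the client sheds an increasing share of requests before they reach the overloaded backend.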
Isolation Strategies
The article covers dynamic/static isolation, read/write isolation (CQRS), core isolation, hotspot isolation, user isolation, and isolation at the process, thread, cluster, and machine-room level.
Retry Mechanisms
The article explains synchronous and asynchronous retries, maximum attempt limits, and back-off strategies (linear, jitter, exponential, and exponential with jitter). It also covers the risk of retry storms, where retries multiply load on an already failing service, and mitigation techniques such as retry windows and chain-level limits.
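The back-off strategies named above can be sketched as small delay functions plus a bounded retry loop. This is an illustrative sketch: the function names, the default base delay and cap, and the choice of "full jitter" (a uniform draw between zero and the exponential delay) are assumptions, not details taken from the article.

```python
import random
import time


def linear_backoff(attempt: int, base: float = 0.1) -> float:
    """Delay grows by a fixed step per attempt (attempt starts at 1)."""
    return base * attempt


def exponential_backoff(attempt: int, base: float = 0.1, cap: float = 10.0) -> float:
    """Delay doubles per attempt, bounded by cap."""
    return min(cap, base * 2 ** (attempt - 1))


def exponential_jitter_backoff(attempt: int, base: float = 0.1, cap: float = 10.0) -> float:
    """Random delay in [0, exponential delay] so clients do not retry in lockstep."""
    return random.uniform(0.0, exponential_backoff(attempt, base, cap))


def retry(operation, max_attempts: int = 3,
          delay=exponential_jitter_backoff, sleep=time.sleep):
    """Call operation until it succeeds or max_attempts is reached."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise  # attempts exhausted: surface the last error
            sleep(delay(attempt))


print([exponential_backoff(n) for n in (1, 2, 3)])
```

The hard cap on attempts matters as much as the delay: without it, each layer of a call chain can multiply the retries issued by the layer above it.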
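Timeout Management
The article discusses fixed timeouts versus dynamic timeouts based on an exponential moving average (EMA), propagating the remaining timeout across services, and an implementation that carries the deadline in the request context.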
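One way to read the two ideas in this section is sketched below: a timeout derived from an exponential moving average of observed latency, and a helper that computes the time budget left to pass to a downstream call. The smoothing factor, the multiplier, the floor and ceiling, and all names are assumptions made for illustration; the article's exact EMA rule may differ.

```python
class EmaTimeout:
    """Dynamic timeout: a multiple of the EMA of observed latencies, clamped."""

    def __init__(self, initial: float, alpha: float = 0.2, multiplier: float = 2.0,
                 floor: float = 0.05, ceiling: float = 5.0):
        self.ema = initial            # current latency estimate, in seconds
        self.alpha = alpha            # weight given to the newest sample
        self.multiplier = multiplier  # headroom above the average latency
        self.floor = floor
        self.ceiling = ceiling

    def observe(self, latency: float) -> None:
        """Fold one measured latency into the moving average."""
        self.ema = self.alpha * latency + (1 - self.alpha) * self.ema

    def timeout(self) -> float:
        return min(self.ceiling, max(self.floor, self.multiplier * self.ema))


def remaining_budget(deadline: float, now: float) -> float:
    """Time left before the caller's deadline; zero means do not call downstream."""
    return max(0.0, deadline - now)


ema = EmaTimeout(initial=0.1, alpha=0.5)
ema.observe(0.3)
print(ema.timeout())
```

Passing `remaining_budget` downstream instead of a fresh fixed timeout keeps a service from working on a request whose caller has already given up.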
Rate Limiting
The article summarizes client-side and server-side limiting, common algorithms (sliding window, token bucket, leaky bucket), and overload detection criteria.
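Of the algorithms listed, the token bucket is short enough to sketch in full. The version below is a minimal single-threaded illustration: the class name, parameters, and injectable clock are assumptions, and a production limiter would also need locking.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity` while refilling at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity   # start full so an initial burst is allowed
        self.clock = clock
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


bucket = TokenBucket(rate=5.0, capacity=2.0)
print([bucket.allow() for _ in range(2)])  # the initial burst of 2 is admitted
```

Unlike a leaky bucket, which releases requests at a constant rate, the token bucket admits short bursts as long as the average rate stays within `rate`.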
Conclusion
Traffic governance is only one of several strategies, alongside redundancy, caching, and load balancing, needed to keep a system highly available over the long term.
/* pseudo code */
ConnectWithBackoff()
  current_backoff = INITIAL_BACKOFF
  current_deadline = now() + INITIAL_BACKOFF
  while (TryConnect(Max(current_deadline, now() + MIN_CONNECT_TIMEOUT)) != SUCCESS)
    SleepUntil(current_deadline)
    current_backoff = Min(current_backoff * MULTIPLIER, MAX_BACKOFF)
    current_deadline = now() + current_backoff + UniformRandom(-JITTER * current_backoff, JITTER * current_backoff)
Top Architect
Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.