
From the Battle of Red Cliffs to Service Avalanche: Understanding Circuit Breaker and Resilience in Microservices

This article uses the historic Battle of Red Cliffs as an analogy to explain service avalanche in micro‑service architectures, analyzes its causes, presents real‑world scenarios, and details circuit‑breaker concepts, algorithms, recovery strategies, and practical mitigation techniques.

Wukong Talks Architecture

The post starts with a literary reference to the Battle of Red Cliffs and draws a parallel between the chained ships used by Cao Cao and the "service avalanche" problem that can occur in distributed systems.

It explains that in a micro‑service environment, each service calls others via APIs; as the number of services grows, dependencies increase, and a failure in one service can cascade, much like a fire spreading through linked ships, leading to a complete system outage.

Real‑world causes of service avalanche are listed, including provider unavailability, hardware failures, bugs, cache breakdown, flash‑sale traffic spikes, user retries, and aggressive retry mechanisms.

The article then outlines prevention strategies: before a failure occurs, rate limiting, proactive degradation, and isolation; after a failure occurs, circuit breaking and passive degradation, with the bulk of the discussion devoted to circuit-breaker mechanisms.
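As a sketch of the first of those "before" strategies, rate limiting, here is a minimal token-bucket limiter in plain Java. The class and method names are illustrative choices, not taken from the article or any library:

```java
// Minimal token-bucket rate limiter sketch. A bucket holds up to `capacity`
// tokens and refills at a fixed rate; each admitted request consumes one token.
public class TokenBucket {
    private final long capacity;        // maximum tokens the bucket can hold
    private final double refillPerNano; // tokens added per elapsed nanosecond
    private double tokens;              // current token count
    private long lastRefill;            // timestamp of the last refill

    public TokenBucket(long capacity, double tokensPerSecond) {
        this.capacity = capacity;
        this.refillPerNano = tokensPerSecond / 1_000_000_000.0;
        this.tokens = capacity;         // start full
        this.lastRefill = System.nanoTime();
    }

    // Returns true if the request is admitted, false if it should be
    // rejected (or routed to a degraded fallback).
    public synchronized boolean tryAcquire() {
        long now = System.nanoTime();
        tokens = Math.min(capacity, tokens + (now - lastRefill) * refillPerNano);
        lastRefill = now;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```

In production one would typically reach for an existing limiter rather than hand-rolling this, but the core idea (a capacity cap plus a steady refill rate) is the same.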

It introduces the circuit‑breaker concept, likening it to an electrical fuse that opens when current (request load) exceeds a threshold, preventing damage. The principle is that if a service becomes slow or times out, further calls are blocked until recovery.

The circuit‑breaker algorithm is described: a switch tracks open/closed/half‑open states, a time window counts requests, successes, and failures, and when failure ratios exceed a threshold the breaker opens. Recovery involves moving to a half‑open state after a cooldown, allowing limited test requests to verify service health before fully closing the breaker.

Statistical failure‑rate windows and recovery windows are illustrated, showing how request counts are reset after the window expires and how the system dynamically adjusts cooldown periods.

Implementation options are discussed, noting that while a custom breaker can be built, mature open‑source solutions such as Alibaba's Sentinel or Netflix's Hystrix (now deprecated) are recommended.

Finally, the article offers analogical "counter-measures" for the Red Cliffs scenario (using ropes instead of chains, segmenting the ships, or setting checkpoints) that mirror technical solutions such as circuit breaking, resource isolation, and pre-flight checks.

Tags: microservices, Operations, system design, Resilience, circuit breaker, service avalanche
Written by

Wukong Talks Architecture

Explaining distributed systems and architecture through stories. Author of the "JVM Performance Tuning in Practice" column, open-source author of "Spring Cloud in Practice PassJava", and independently developed a PMP practice quiz mini-program.
