Red Cliffs Battle: Lessons on Service Avalanche and Circuit Breakers
Using the historic Red Cliffs battle as a metaphor, this article explains how linked services can cause a cascading failure—service avalanche—in microservice architectures, and details prevention techniques such as rate limiting, isolation, and especially circuit breaker mechanisms with their principles and recovery algorithms.
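"The rolling Yangtze flows ever eastward, its waves washing away the heroes. Right and wrong, triumph and defeat, all turn to nothing in an instant. The green hills remain; how many times has the setting sun glowed red?" -- from Romance of the Three Kingdoms (《三国演义》)
This article uses the Battle of Red Cliffs (赤壁之战) to illustrate how a service avalanche (服务雪崩) can occur and be mitigated.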
1. Revisiting the Battle of Red Cliffs
After unifying the north, Cao Cao moved south, defeated Liu Bei, and occupied Jingxiang, then aimed to eliminate Sun Quan in the east. Liu Bei and Sun Quan formed an alliance to resist Cao Cao's 800,000 troops.
Cao Cao's army, mostly from the north, lacked experience in naval warfare, and many soldiers suffered seasickness. He ordered the ships to be linked together with iron chains at the stern to reduce wave impact.
We then examine the dialogue among Zhou Yu, Huang Gai, and Zhuge Liang:
Huang Gai: Cao Cao is foolish; with the ships linked together, if one catches fire, all will burn. We should attack with fire.
Zhou Yu: How can we get close to their ships?
Huang Gai: I will feign surrender, bring a few ships loaded with oil‑soaked straw, and ignite them when we are near Cao Cao's fleet.
Zhou Yu: Brilliant! But where will the east wind come from?
Zhuge Liang: I will borrow the east wind.
On the battle day, fire ships rode the wind into Cao Cao's fleet, creating a massive blaze. The allied forces attacked, inflicting heavy casualties and achieving a decisive victory.
2. Battle Analysis
Zhou Yu and Huang Gai identified the weakness of the linked ships: “If one ship catches fire, the chain will cause the others to burn as well.”
This mirrors the service avalanche (服务雪崩) problem in distributed systems.
3. Service Avalanche in Systems
When microservices are introduced, each service is isolated and called via interfaces. As business grows, the number of services and their inter‑dependencies increase, making the call graph more complex.
If a downstream service becomes unavailable, the upstream services that depend on it may experience timeouts or retries, leading to a cascade where many services become inaccessible—exactly a service avalanche.
4. Real‑World Avalanche Scenarios
4.1 Service Provider Unavailable
Hardware failures such as network or disk issues.
Software bugs causing high CPU usage.
Cache breakdown forcing massive database hits.
Flash sales overwhelming service capacity.
4.2 Retry Amplifies Traffic
Users repeatedly retrying due to perceived unresponsiveness.
Application retry mechanisms that trigger multiple attempts.
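To get a feel for how retries amplify traffic, here is a rough worst-case calculation. It is a sketch under stated assumptions, not a model of any particular system: the function name `worst_case_calls` is hypothetical, and it assumes a straight call chain in which every call fails and every layer uses all of its attempts.

```python
def worst_case_calls(layers: int, attempts_per_layer: int) -> int:
    """Upper bound on calls that reach the deepest service for ONE user request.

    Assumptions (illustrative only): a linear call chain of `layers` hops,
    each hop making up to `attempts_per_layer` attempts (first try plus
    retries), and every attempt failing.
    """
    if layers < 0 or attempts_per_layer < 1:
        raise ValueError("layers must be >= 0 and attempts_per_layer >= 1")
    # Each hop multiplies the traffic seen by the hop below it.
    return attempts_per_layer ** layers


if __name__ == "__main__":
    # Three hops, each trying up to 3 times: 3 * 3 * 3 calls at the bottom.
    print(worst_case_calls(layers=3, attempts_per_layer=3))
```

Under these assumptions the load on a failing service grows multiplicatively with chain depth, which is why a struggling provider often gets hit hardest exactly when it can least afford it.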
5. Preventing Avalanche
Pre‑emptive measures: rate limiting, proactive degradation, isolation.
Post‑incident remedies: circuit breaking, passive degradation.
This article focuses on circuit breaker mechanisms.
6. Circuit Breaker Principles and Algorithms
6.1 Concept
The concept is borrowed from electrical fuses: when the current exceeds a threshold, the fuse blows to protect the other components. Similarly, a circuit breaker stops calls to an unhealthy service.
6.2 How to Trip
If, within a time window, the number of failures or the failure rate exceeds a configured threshold, the circuit opens (trips).
6.3 Request Counting Algorithm
Check if the circuit is open; if so, reject the request.
If closed, verify whether the time window is full.
If the window is not full, increment the request count.
Record success or failure based on the response.
When the window is full, evaluate whether to open the circuit.
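The steps above can be sketched roughly as follows. This is a minimal illustration, not how Sentinel or Hystrix implement it: the class `CountingBreaker`, its parameters, and the `min_requests` guard (which avoids tripping on a handful of calls) are all assumptions introduced for the example. The window is treated as a fixed time window, evaluated when the next request arrives after it has filled.

```python
import time
from typing import Any, Callable


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CountingBreaker:
    """Hypothetical fixed-window failure counter that trips on failure rate."""

    def __init__(self, window_seconds: float = 10.0,
                 failure_rate_threshold: float = 0.5,
                 min_requests: int = 5,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.window_seconds = window_seconds
        self.failure_rate_threshold = failure_rate_threshold
        self.min_requests = min_requests  # assumption: ignore tiny samples
        self.clock = clock
        self.is_open = False
        self._reset_window()

    def _reset_window(self) -> None:
        self.window_start = self.clock()
        self.requests = 0
        self.failures = 0

    def _window_full(self) -> bool:
        return self.clock() - self.window_start >= self.window_seconds

    def _evaluate(self) -> None:
        # Trip only if enough traffic was seen and the failure rate
        # exceeds the threshold; then start a fresh window.
        if (self.requests >= self.min_requests
                and self.failures / self.requests > self.failure_rate_threshold):
            self.is_open = True
        self._reset_window()

    def call(self, fn: Callable[[], Any]) -> Any:
        # Step 1: an open circuit rejects the request immediately.
        if self.is_open:
            raise CircuitOpenError("circuit is open")
        # Steps 2 and 5: when the window is full, decide whether to trip.
        if self._window_full():
            self._evaluate()
            if self.is_open:
                raise CircuitOpenError("circuit is open")
        # Step 3: count the request in the current window.
        self.requests += 1
        # Step 4: record failure (an exception) or success (a normal return).
        try:
            return fn()
        except Exception:
            self.failures += 1
            raise
```

Injecting `clock` keeps the sketch testable without sleeping; a production breaker would also need thread safety, which is omitted here.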
6.4 Recovery Algorithm
After a period, switch the circuit to half‑open, allowing a limited number of test requests.
If these succeed, close the circuit; otherwise, revert to open.
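One way to picture the recovery steps is as a small state machine. The sketch below is illustrative only: `RecoveringBreaker`, `open_seconds`, and `probe_limit` are hypothetical names, and the rule "close after `probe_limit` consecutive successful probes, reopen on the first failed probe" is one reasonable reading of the description rather than the behavior of a specific library.

```python
import time
from enum import Enum
from typing import Callable


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class RecoveringBreaker:
    """Hypothetical open -> half-open -> closed/open recovery logic."""

    def __init__(self, open_seconds: float = 60.0, probe_limit: int = 3,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.open_seconds = open_seconds
        self.probe_limit = probe_limit  # max test requests while half-open
        self.clock = clock
        self.state = State.CLOSED
        self.opened_at = 0.0
        self.probes_sent = 0
        self.probe_successes = 0

    def trip(self) -> None:
        """Open the circuit and remember when it opened."""
        self.state = State.OPEN
        self.opened_at = self.clock()

    def allow_request(self) -> bool:
        if self.state is State.CLOSED:
            return True
        if self.state is State.OPEN:
            if self.clock() - self.opened_at < self.open_seconds:
                return False
            # The open period has elapsed: start probing.
            self.state = State.HALF_OPEN
            self.probes_sent = 0
            self.probe_successes = 0
        # HALF_OPEN: let through only a limited number of test requests.
        if self.probes_sent >= self.probe_limit:
            return False
        self.probes_sent += 1
        return True

    def record(self, success: bool) -> None:
        """Report the outcome of a request that was allowed through."""
        if self.state is not State.HALF_OPEN:
            return
        if not success:
            self.trip()  # a failed probe sends the circuit back to open
            return
        self.probe_successes += 1
        if self.probe_successes >= self.probe_limit:
            self.state = State.CLOSED
```

A caller would check `allow_request()` before each call and pass the outcome to `record()`; tripping from the closed state is left to whatever failure-counting logic is in use.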
6.5 Failure Rate Time Window
Picture the window as watching vehicles pass a fixed viewing point over a set interval; once the window is full, it resets and its counters are cleared.
6.6 Service Recovery Attempt Window
When the circuit is open, after a configured delay (e.g., 1 minute), it moves to half‑open to test recovery, adjusting the interval dynamically based on results.
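The text does not say how the interval is adjusted, so the following is only one plausible policy: back off exponentially after a failed probe and return to the base delay after a successful one. The function name `next_open_delay` and the defaults (60-second base, doubling, 10-minute cap) are assumptions chosen for illustration.

```python
def next_open_delay(current_delay: float, probe_succeeded: bool,
                    base_delay: float = 60.0, factor: float = 2.0,
                    max_delay: float = 600.0) -> float:
    """Return how long the circuit should stay open before the next probe.

    Assumed policy (not taken from the article): a failed probe multiplies
    the delay by `factor`, capped at `max_delay`; a successful probe resets
    it to `base_delay`.
    """
    if probe_succeeded:
        return base_delay
    return min(current_delay * factor, max_delay)


if __name__ == "__main__":
    delay = 60.0
    for outcome in (False, False, True):
        delay = next_open_delay(delay, outcome)
        print(delay)
```

Backing off like this keeps probe traffic away from a service that is still struggling, while the cap bounds how long a recovered service can stay cut off.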
7. Circuit Breaker Middleware
While you could implement your own, mature open‑source solutions exist, such as Alibaba's Sentinel (recommended) and Netflix's Hystrix (no longer actively developed).
8. Turning the Tide
Potential strategies for Cao Cao include replacing chains with ropes (analogous to circuit breaking), segmenting ships to isolate fire (resource isolation), or setting checkpoints before entry (pre‑flight checks).
9. Rate Limiting and Degradation (Future Topics)
Further articles will cover rate limiting and degradation techniques, continuing the Three Kingdoms analogy.
macrozheng
Dedicated to Java tech sharing and dissecting top open-source projects. Topics include Spring Boot, Spring Cloud, Docker, Kubernetes and more. Author’s GitHub project “mall” has 50K+ stars.