
Red Cliffs Battle: Lessons on Service Avalanche and Circuit Breakers

Using the historic Red Cliffs battle as a metaphor, this article explains how linked services can cause a cascading failure—service avalanche—in microservice architectures, and details prevention techniques such as rate limiting, isolation, and especially circuit breaker mechanisms with their principles and recovery algorithms.

macrozheng
The surging Yangtze rolls ever eastward, its waves washing away the heroes. Right and wrong, triumph and defeat, all turn to nothing in a moment. The green hills remain as ever, through countless crimson sunsets. -- from Romance of the Three Kingdoms

This article uses the Battle of Red Cliffs (赤壁之战) to illustrate how a service avalanche (服务雪崩) can occur and how it can be mitigated.

1. Revisiting the Battle of Red Cliffs

After unifying the north, Cao Cao moved south, defeated Liu Bei, and occupied the Jingxiang region, then set his sights on eliminating Sun Quan in the east. Liu Bei and Sun Quan formed an alliance to resist Cao Cao's claimed 800,000 troops.

Cao Cao's army, mostly from the north, lacked experience in naval warfare, and many soldiers suffered seasickness. He ordered the ships to be linked together with iron chains at the stern to reduce wave impact.

Consider the dialogue among Zhou Yu, Huang Gai, and Zhuge Liang:

Huang Gai: Cao Cao is foolish; with the ships chained together, if one catches fire, all will burn. We should use a fire attack to bring down the enemy.
Zhou Yu: But how can we get close to their ships?
Huang Gai: I will feign surrender, bring a few ships loaded with oil-soaked straw, and ignite them once near Cao Cao's fleet.
Zhou Yu: Brilliant! But where will the east wind come from?
Zhuge Liang: I will borrow the east wind.

On the battle day, fire ships rode the wind into Cao Cao's fleet, creating a massive blaze. The allied forces attacked, inflicting heavy casualties and achieving a decisive victory.

2. Battle Analysis

Zhou Yu and Huang Gai identified the weakness of the linked ships: “If one ship catches fire, the chain will cause the others to burn as well.”

This mirrors the service avalanche (服务雪崩) problem in distributed systems.

3. Service Avalanche in Systems

When microservices are introduced, each service is isolated and called via interfaces. As business grows, the number of services and their inter‑dependencies increase, making the call graph more complex.

If a downstream service becomes unavailable, the upstream services that depend on it may experience timeouts or retries, leading to a cascade where many services become inaccessible—exactly a service avalanche.

4. Real‑World Avalanche Scenarios

4.1 Service Provider Unavailable

Hardware failures such as network or disk issues.

Software bugs causing high CPU usage.

Cache breakdown forcing massive database hits.

Flash sales overwhelming service capacity.

4.2 Retry Amplifies Traffic

Users repeatedly retrying due to perceived unresponsiveness.

Application retry mechanisms that trigger multiple attempts.

5. Preventing Avalanche

Pre‑emptive measures: rate limiting, proactive degradation, isolation.

Post‑incident remedies: circuit breaking, passive degradation.

This article focuses on the circuit breaker mechanism.

6. Circuit Breaker Principles and Algorithms

6.1 Concept

Derived from electrical fuses: when current exceeds a threshold, the fuse blows to protect components. Similarly, a circuit breaker stops calls to an unhealthy service.

6.2 How to Trip

If, within a time window, the number of failures or the failure rate exceeds a configured threshold, the circuit opens (trips).
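This trip condition can be sketched in a few lines. A minimal sketch, assuming illustrative names and thresholds (`maxFailures`, `maxFailureRate` are not from any particular library):

```java
// Sketch of the trip decision: the breaker opens when either the absolute
// failure count or the failure rate within the current window crosses its
// configured threshold. All names here are illustrative.
public class TripPolicy {
    private final int maxFailures;        // e.g. 20 failures per window
    private final double maxFailureRate;  // e.g. 0.5 = 50%

    public TripPolicy(int maxFailures, double maxFailureRate) {
        this.maxFailures = maxFailures;
        this.maxFailureRate = maxFailureRate;
    }

    /** Returns true if the circuit should open (trip). */
    public boolean shouldTrip(int failures, int totalRequests) {
        if (totalRequests == 0) return false;
        double rate = (double) failures / totalRequests;
        return failures >= maxFailures || rate >= maxFailureRate;
    }
}
```

Real breakers usually also require a minimum number of requests before the rate is considered meaningful, to avoid tripping on a single failed call.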

6.3 Request Counting Algorithm

1. Check whether the circuit is open; if so, reject the request.

2. If the circuit is closed, check whether the time window is full.

3. If the window is not full, increment the request count.

4. Record success or failure based on the response.

5. When the window is full, evaluate whether to open the circuit.
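The steps above can be sketched as a fixed-size counting window. A minimal sketch with illustrative names; production breakers (Sentinel, resilience4j) are considerably more elaborate, e.g. thread-safe and time-based:

```java
// A minimal fixed-window counter following the steps above: count requests
// and failures, and when the window fills, decide whether to trip and reset.
public class CountingWindow {
    private final int windowSize;             // requests per window
    private final double failureRateToTrip;   // e.g. 0.5 = 50%
    private int total = 0;
    private int failures = 0;
    private boolean open = false;

    public CountingWindow(int windowSize, double failureRateToTrip) {
        this.windowSize = windowSize;
        this.failureRateToTrip = failureRateToTrip;
    }

    /** Step 1: reject immediately while the circuit is open. */
    public boolean allowRequest() {
        return !open;
    }

    /** Steps 3-5: record the outcome; when the window fills, evaluate and reset. */
    public void record(boolean success) {
        total++;
        if (!success) failures++;
        if (total >= windowSize) {            // window is full
            open = (double) failures / total >= failureRateToTrip;
            total = 0;                        // reset counters for the next window
            failures = 0;
        }
    }

    public boolean isOpen() { return open; }
}
```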

6.4 Recovery Algorithm

After a period, switch the circuit to half‑open, allowing a limited number of test requests.

If these succeed, close the circuit; otherwise, revert to open.
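The open, half-open, and closed transitions described above can be sketched as a small state machine. A sketch under the assumption that time is passed in explicitly (to keep it deterministic); names and the one-probe policy are illustrative:

```java
// Open -> half-open -> closed/open transitions. After the retry delay elapses,
// an open circuit lets a probe request through; the probe's result decides
// whether the circuit closes or re-opens.
public class RecoveringBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private State state = State.OPEN;          // start tripped for this example
    private final long retryAfterMillis;
    private long openedAt;

    public RecoveringBreaker(long retryAfterMillis, long now) {
        this.retryAfterMillis = retryAfterMillis;
        this.openedAt = now;
    }

    /** Open circuits move to half-open once the retry delay has elapsed. */
    public boolean allowRequest(long now) {
        if (state == State.OPEN && now - openedAt >= retryAfterMillis) {
            state = State.HALF_OPEN;           // let a probe request through
        }
        return state != State.OPEN;
    }

    /** A successful probe closes the circuit; a failed probe re-opens it. */
    public void onProbeResult(boolean success, long now) {
        if (state != State.HALF_OPEN) return;
        if (success) {
            state = State.CLOSED;
        } else {
            state = State.OPEN;
            openedAt = now;                    // restart the retry timer
        }
    }
}
```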

6.5 Failure Rate Time Window

Think of the time window as counting vehicles that pass your window during a fixed interval: when the interval ends, the window resets and its counters are cleared, and counting starts afresh.

6.6 Service Recovery Attempt Window

When the circuit is open, after a configured delay (e.g., 1 minute), it moves to half‑open to test recovery, adjusting the interval dynamically based on results.
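One way to adjust the retry interval dynamically, as described above, is to back off after each failed probe and reset after a success. The doubling policy and the specific numbers below are assumptions for illustration, not a prescribed algorithm:

```java
// Dynamic half-open retry interval: double the delay after each failed probe
// (capped at a maximum), and return to the base interval after a success.
public class RetryInterval {
    private final long baseMillis;   // initial retry delay, e.g. 1 minute
    private final long maxMillis;    // upper bound on the delay
    private long currentMillis;

    public RetryInterval(long baseMillis, long maxMillis) {
        this.baseMillis = baseMillis;
        this.maxMillis = maxMillis;
        this.currentMillis = baseMillis;
    }

    public long current() { return currentMillis; }

    /** Failed probe: back off, giving the struggling service more breathing room. */
    public void onProbeFailure() {
        currentMillis = Math.min(currentMillis * 2, maxMillis);
    }

    /** Successful probe: the service recovered, return to the base interval. */
    public void onProbeSuccess() {
        currentMillis = baseMillis;
    }
}
```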

7. Circuit Breaker Middleware

While you could implement your own, mature open-source solutions exist, such as Alibaba's Sentinel (recommended) and Netflix's Hystrix (no longer maintained).

8. Turning the Tide

Potential strategies for Cao Cao include replacing the iron chains with ropes that can be cut quickly when fire breaks out (analogous to circuit breaking), segmenting the fleet so a fire cannot spread between groups (resource isolation), or setting checkpoints to inspect ships before they approach (pre-flight checks).

9. Rate Limiting and Degradation (Future Topics)

Further articles will cover rate limiting and degradation techniques, continuing the Three Kingdoms analogy.

Tags: distributed systems, microservices, system reliability, circuit breaker, service avalanche
Written by

macrozheng

Dedicated to Java tech sharing and dissecting top open-source projects. Topics include Spring Boot, Spring Cloud, Docker, Kubernetes, and more. The author's GitHub project "mall" has 50K+ stars.
