Understanding Service Avalanche and Circuit Breaker Mechanisms through the Red Cliffs Battle Analogy

This article uses the historic Battle of Red Cliffs as an analogy to explain service avalanche, its causes in micro‑service architectures, and how circuit‑breaker, rate‑limiting, and isolation techniques can prevent cascading failures in modern distributed systems.

Wukong Talks Architecture

Red Cliffs Battle

The Battle of Red Cliffs, as told in the novel Romance of the Three Kingdoms, serves here as a metaphor for the "service avalanche" problem in micro‑service systems, in which one failure brings down everything linked to it.

1. Recounting the Battle of Red Cliffs

After Cao Cao unified the north, he moved south, defeated Liu Bei, and occupied Jingxiang, intending to eliminate Sun Quan. Liu Bei and Sun Quan formed an alliance against Cao Cao's 800,000 troops. Cao Cao's northern troops lacked experience in naval warfare and suffered from seasickness, so he ordered the ships to be linked together with iron chains to reduce the impact of waves.

Dialogue among Zhou Yu, Huang Gai and Zhuge Liang:

Huang Gai: Cao Cao’s chained ships are a disaster; if one catches fire, the whole fleet will burn. We should use fire attacks.

Zhou Yu: How can we get close to their ships?

Huang Gai: I will feign surrender, bring a few ships loaded with oil‑soaked straw, and set them ablaze when near the enemy.

Zhou Yu: Brilliant! But where do we get the east wind?

Zhuge Liang: I will borrow the east wind.

The fire ships broke through the enemy formation, creating a sea of fire and leading to a decisive victory for the allied forces.

2. Analysis of the Battle Situation

Zhou Yu and Huang Gai identified the weakness of the chained ships: if one ship catches fire, the whole chain burns. This mirrors the "service avalanche" problem in distributed systems.

In a micro‑service architecture, services call one another through interfaces. As the business grows, the number of services and the dependencies between them increase, and the call graph becomes more complex. If one dependency becomes unavailable, the failure can cascade to its callers and end in a complete outage, much as a small slide of snow grows into an avalanche.

3. Service Avalanche in Systems

Micro‑services typically call each other over RPC or HTTP, with timeout limits and retry mechanisms. Without circuit breaking or rate limiting, a single failing service can trigger an avalanche. The following example illustrates this:

Three services form a call chain: Order Service calls Product Service, which calls Inventory Service.

During a high‑traffic event (e.g., Double‑11), the Inventory Service becomes unavailable, causing timeouts.

Product Service repeatedly retries, exhausting its resources, and eventually crashes.

Order Service, depending on Product Service, also fails, leading to a total outage.
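A rough capacity estimate shows why a slow dependency is enough to exhaust its caller. By Little's law, the average number of busy workers equals the arrival rate multiplied by the time each request holds a worker. Every figure below (request rate, latency, timeout, attempt count, pool size) is an assumption chosen for illustration, not a measurement from the scenario above.

```python
def busy_workers(arrival_rate_per_s: float, hold_time_s: float) -> float:
    """Little's law: average concurrency = arrival rate x time in system."""
    return arrival_rate_per_s * hold_time_s


# All figures are assumed for illustration only.
ARRIVAL_RATE = 100.0    # requests per second reaching Product Service
NORMAL_LATENCY = 0.05   # seconds per Inventory call while healthy
TIMEOUT = 3.0           # seconds before a call to Inventory times out
ATTEMPTS = 3            # one initial call plus two retries
POOL_SIZE = 200         # worker threads available in Product Service

healthy = busy_workers(ARRIVAL_RATE, NORMAL_LATENCY)
failing = busy_workers(ARRIVAL_RATE, TIMEOUT * ATTEMPTS)

print(f"healthy: {healthy:.0f} busy workers out of {POOL_SIZE}")
print(f"failing: {failing:.0f} workers needed, pool exhausted: {failing > POOL_SIZE}")
```

Under these assumptions the same traffic that normally keeps a handful of workers busy needs several times the whole pool once every call waits for the full timeout on each attempt, so Product Service stops answering Order Service as well.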

4. Real‑World Scenarios Causing Avalanches

4.1 Service Provider Unavailability

Hardware failures (network, disk).

Software bugs that consume excessive CPU.

Cache breakdowns causing a sudden surge of database traffic.

Flash‑sale spikes overwhelming service capacity.

4.2 Retry Amplification

Users manually retrying after no response.

Application‑level retry logic that repeats failed calls multiple times.
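Retries multiply across layers. As a worst-case sketch, assume every layer of a call chain makes the same number of attempts and every attempt fails; the function name and the figures are illustrative assumptions.

```python
def worst_case_calls(attempts_per_layer: int, layers: int) -> int:
    """Calls that reach the deepest service for one original request,
    assuming every attempt at every layer fails and is retried."""
    return attempts_per_layer ** layers


# Assumed example: the user clicks 3 times, Order Service tries
# Product Service 3 times, and Product Service tries Inventory Service 3 times.
print(worst_case_calls(attempts_per_layer=3, layers=3))  # 27
```

One impatient user can therefore generate 27 calls against a service that is already failing, which is why retries without limits make an outage worse.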

5. Preventing Service Avalanche

Pre‑emptive measures: rate limiting, active degradation, isolation.

Post‑failure recovery: circuit breaking, passive degradation.

The rest of this article focuses on circuit‑breaker mechanisms.

6. Circuit‑Breaker Principles and Algorithms

6.1 Concept

A circuit breaker works like an electrical fuse: when the current (request latency or error rate) exceeds a threshold, the fuse blows to protect downstream components.

If a service becomes consistently slow or times out, the circuit opens and subsequent calls are rejected with a fast failure response, allowing the service time to recover.

6.2 How to Trip a Circuit

When the number of failures or the failure ratio within a time window exceeds a configured threshold, the circuit opens.

6.3 Request‑Counting Algorithm

Check if the circuit is open; if so, reject the request.

If closed, verify whether the time window is full.

If the window is not full, increment the request bucket.

On response, increment either the success or failure bucket.

When the window is full, evaluate whether to open the circuit.
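The five steps above can be sketched as a small counter. This is a minimal illustration, not the algorithm of any particular library: for brevity it treats the window as "full" after a fixed number of requests rather than after a time interval, and the class name, field names, and thresholds are all assumptions. It only trips the circuit; recovery is a separate concern.

```python
from dataclasses import dataclass


@dataclass
class CountingWindow:
    window_size: int = 10        # assumed: window is full after this many requests
    failure_ratio: float = 0.5   # assumed trip threshold
    requests: int = 0
    successes: int = 0
    failures: int = 0
    is_open: bool = False

    def allow(self) -> bool:
        """Steps 1-3: reject when open, otherwise count the request."""
        if self.is_open:
            return False
        if self.requests < self.window_size:
            self.requests += 1
        return True

    def record(self, ok: bool) -> None:
        """Steps 4-5: count the outcome, then evaluate once the window is full."""
        if ok:
            self.successes += 1
        else:
            self.failures += 1
        completed = self.successes + self.failures
        if completed >= self.window_size:
            self.is_open = self.failures / completed >= self.failure_ratio
            self.requests = self.successes = self.failures = 0


window = CountingWindow(window_size=4, failure_ratio=0.5)
for outcome in (True, False, False, True):
    if window.allow():
        window.record(outcome)
print(window.is_open)  # True: 2 failures out of 4 meets the 0.5 threshold
```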

6.4 Recovery Algorithm

After a cooldown period, the circuit moves to a half‑open state, allowing a limited number of test requests.

If test requests succeed, the circuit closes; otherwise, it reopens.
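The recovery cycle can be sketched as a three-state machine. The class below is an illustrative assumption rather than the behaviour of a specific product: the names, the 60-second cooldown, and the rule that all probes must succeed are choices made for the example. The clock is injectable so the logic can be exercised without waiting.

```python
import time
from enum import Enum
from typing import Callable


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class RecoveringBreaker:
    def __init__(self, cooldown_s: float = 60.0, max_probes: int = 2,
                 clock: Callable[[], float] = time.monotonic):
        self.cooldown_s = cooldown_s
        self.max_probes = max_probes  # test requests allowed while half-open
        self.clock = clock
        self.state = State.CLOSED
        self.opened_at = 0.0
        self.probes_sent = 0
        self.probe_successes = 0

    def trip(self) -> None:
        """Open the circuit and start the cooldown."""
        self.state = State.OPEN
        self.opened_at = self.clock()

    def allow(self) -> bool:
        if self.state is State.OPEN and self.clock() - self.opened_at >= self.cooldown_s:
            # Cooldown elapsed: start a fresh round of probes.
            self.state = State.HALF_OPEN
            self.probes_sent = 0
            self.probe_successes = 0
        if self.state is State.HALF_OPEN:
            if self.probes_sent >= self.max_probes:
                return False
            self.probes_sent += 1
            return True
        return self.state is State.CLOSED

    def record(self, ok: bool) -> None:
        """Report a probe result; results in other states are ignored here."""
        if self.state is not State.HALF_OPEN:
            return
        if not ok:
            self.trip()  # a failed probe reopens the circuit
            return
        self.probe_successes += 1
        if self.probe_successes >= self.max_probes:
            self.state = State.CLOSED
```

In use, a caller would check `allow()` before each request and pass the outcome to `record()`; deciding when to call `trip()` in the closed state is left to whatever counting logic is in place.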

6.5 Failure‑Rate Time Window

Two types of windows are used:

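Fixed window: counts traffic in consecutive, non‑overlapping intervals; a short burst that straddles the boundary between two intervals is split across two counters and can go unnoticed.

Sliding window: the interval moves continuously with the current time, so every evaluation covers the most recent period and control is smoother.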
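A sliding window can be sketched by keeping timestamped outcomes and discarding those that have aged out. The class name, the 10-second default, and the choice to store every event are assumptions for illustration; production libraries typically approximate this with a ring of small buckets to save memory.

```python
from collections import deque


class SlidingWindowFailureRate:
    def __init__(self, window_s: float = 10.0):
        self.window_s = window_s
        self.events = deque()  # (timestamp, ok) pairs, oldest first

    def record(self, now: float, ok: bool) -> None:
        self.events.append((now, ok))
        self._evict(now)

    def failure_rate(self, now: float) -> float:
        """Share of failed calls among those seen in the last window_s seconds."""
        self._evict(now)
        if not self.events:
            return 0.0
        failures = sum(1 for _, ok in self.events if not ok)
        return failures / len(self.events)

    def _evict(self, now: float) -> None:
        # Drop events that fell out of the window ending at `now`.
        while self.events and self.events[0][0] <= now - self.window_s:
            self.events.popleft()
```

Because old events drop out one at a time, the rate never resets abruptly at an interval boundary the way a fixed window does.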

6.6 Service‑Recovery Attempt Window

The circuit stays open for a configured period (e.g., 1 minute), then switches to half‑open to probe the service. If the probe succeeds, the circuit closes; otherwise, it reopens and the cycle repeats, possibly with increasing back‑off intervals.
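One way to grow the open period after each failed probe is to double it up to a ceiling. Doubling, the cap, and the function name are assumptions for this sketch; the text above only says the intervals may increase.

```python
def open_interval(base_s: float, consecutive_trips: int, cap_s: float) -> float:
    """Seconds to stay open before the next half-open probe.

    consecutive_trips is 1 for the first trip and grows by one each time
    a probe fails and the circuit reopens.
    """
    if consecutive_trips < 1:
        raise ValueError("consecutive_trips must be at least 1")
    return min(cap_s, base_s * 2 ** (consecutive_trips - 1))


# Assumed settings: 1 minute base, 10 minute ceiling.
for trips in range(1, 6):
    print(trips, open_interval(60.0, trips, 600.0))
```

The ceiling keeps a long outage from pushing the next probe arbitrarily far into the future.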

7. Circuit‑Breaker Middleware

While you can implement your own circuit‑breaker, it is recommended to use proven open‑source solutions such as Alibaba's Sentinel or Netflix's Hystrix (now in maintenance mode).

8. Turning the Tide

To help Cao Cao avoid the chained‑ship disaster, possible strategies include:

Replace iron chains with ropes that are easier to cut (circuit‑breaker).

Segment the fleet into isolated zones so a fire in one zone does not spread (resource isolation).

Set up checkpoints to verify ships before they proceed (pre‑flight checks).
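Segmenting the fleet corresponds to what is often called the bulkhead pattern: each dependency gets its own bounded pool of resources, so one slow dependency cannot consume them all. A minimal sketch using a semaphore follows; the class name, the `fallback` parameter, and the limit are assumptions for illustration.

```python
import threading


class Bulkhead:
    """Caps the number of concurrent calls to one dependency."""

    def __init__(self, max_concurrent: int):
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def call(self, fn, *args, fallback=None):
        # Fail fast instead of queueing when every slot is taken.
        if not self._slots.acquire(blocking=False):
            return fallback
        try:
            return fn(*args)
        finally:
            self._slots.release()


# Hypothetical usage: at most 10 concurrent calls to the inventory dependency.
inventory_bulkhead = Bulkhead(max_concurrent=10)
stock = inventory_bulkhead.call(lambda sku: 7, "sku-1", fallback=0)
```

Giving each downstream service its own `Bulkhead` means a stalled Inventory Service can occupy at most its own slots, leaving capacity for calls that do not depend on it.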

9. Rate Limiting and Degradation

Rate limiting controls traffic by allowing only a portion of requests to pass, ensuring the service can handle the load. Common algorithms are fixed‑window, leaky‑bucket, and token‑bucket.

Leaky‑Bucket Algorithm

Releases requests at a constant rate no matter how fast they arrive. During bursts, requests queue in the bucket, which increases latency, and are rejected once the bucket is full.
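A leaky bucket can be sketched by computing when each arriving request is allowed to leave. The class and method names are assumptions, time is passed in explicitly so the behaviour is easy to follow, and the sketch is single-process only.

```python
from typing import Optional


class LeakyBucket:
    def __init__(self, capacity: int, rate_per_s: float):
        self.capacity = capacity            # longest allowed wait, in leak intervals
        self.interval = 1.0 / rate_per_s    # seconds between departures
        self.next_free = 0.0                # earliest time the next request may leave

    def schedule(self, now: float) -> Optional[float]:
        """Return the departure time for a request arriving at `now`,
        or None if the bucket is full and the request is rejected."""
        depart = max(now, self.next_free)
        wait_in_intervals = (depart - now) / self.interval
        if wait_in_intervals >= self.capacity:
            return None
        self.next_free = depart + self.interval
        return depart


bucket = LeakyBucket(capacity=2, rate_per_s=2.0)
# Three requests arrive together at t=0: the first leaves at once,
# the second is delayed by one interval, the third is rejected.
print([bucket.schedule(0.0) for _ in range(3)])  # [0.0, 0.5, None]
```

The delayed second request is the added latency mentioned above: output stays smooth because the burst is absorbed as waiting time.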

Token‑Bucket Algorithm

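Tokens are added to the bucket at a steady rate of N per second, and each request must take a token to proceed. Unused tokens accumulate up to the bucket's capacity, so short bursts are allowed while the average rate stays at N requests per second. In distributed environments the bucket state is often kept in Redis.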
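A token bucket can be sketched in a few lines. This version is in-process and takes the current time as an argument; the names and figures are assumptions, and a distributed variant would keep `tokens` and `last` in a shared store instead of instance fields.

```python
class TokenBucket:
    def __init__(self, rate_per_s: float, capacity: float, now: float = 0.0):
        self.rate_per_s = rate_per_s   # tokens added per second
        self.capacity = capacity       # maximum burst size
        self.tokens = capacity         # start with a full bucket
        self.last = now                # time of the last refill

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Refill lazily, based on the time elapsed since the last call.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate_per_s)
        self.last = now
        if self.tokens < cost:
            return False
        self.tokens -= cost
        return True


limiter = TokenBucket(rate_per_s=2.0, capacity=3.0)
# A burst of four requests at t=0: three pass on stored tokens, the fourth is refused.
print([limiter.allow(0.0) for _ in range(4)])  # [True, True, True, False]
```

Unlike the leaky bucket, nothing is delayed here: a request either finds a token and proceeds immediately or is refused.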

Conclusion

The classic novel Romance of the Three Kingdoms provides a vivid analogy for understanding service avalanche and circuit‑breaker concepts, helping engineers design more resilient micro‑service systems.

Written by

Wukong Talks Architecture

Explaining distributed systems and architecture through stories. Author of the "JVM Performance Tuning in Practice" column, open-source author of "Spring Cloud in Practice PassJava", and independently developed a PMP practice quiz mini-program.
