How to Prevent Cache Avalanche and Thundering Herd in High‑Traffic Apps
This article examines various strategies—including cache warm‑up, staggered expirations, aggregated caching, queuing, locking, rate‑limiting, backup caches, client‑side caches, and default empty values—to mitigate cache breakdown and service avalanche during peak traffic periods.
1 Introduction
In a previous article we described cache avalanche, penetration, and thundering herd and their solutions; now we discuss handling cache breakdown and avalanche in specific business scenarios.
2 Problem Background
A core, high‑traffic application (on the scale of WeChat, DingTalk, or the Baidu app) experiences a peak QPS in the millions.
Analysis: traffic peaks between roughly 9 and 10 am, with the daily curve approximating a Gaussian (bell‑shaped) distribution.
The cache stores basic user information (name, gender, occupation, address) keyed by user ID.
For unknown reasons the cache is lost (expiration, failure, bug, restart).
During the peak, requests miss the cache and hit the database directly.
Disk‑based databases cannot handle the load, leading to a service avalanche.
4 Candidate Answers (Compiled)
4.1 Cache Warm‑up
Since peak periods are predictable, pre‑warm the cache before the peak (e.g., fill cache between 7–9 am for the 9–10 am peak).
Drawback: only works for predictable cache failures, not sudden loss during the peak.
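The warm‑up idea can be sketched as follows. This is a minimal illustration: the in‑memory dict stands in for Redis, and `fetch_user` and the hot‑ID list are hypothetical stand‑ins for the real database and traffic analysis.

```python
import time

cache = {}  # stands in for Redis: key -> (value, expire_at)

def fetch_user(user_id):
    # hypothetical database read for one user's basic info
    return {"id": user_id, "name": f"user-{user_id}"}

def warm_up(hot_user_ids, ttl_seconds=4 * 3600):
    """Pre-fill the cache before the predicted 9-10 am peak window."""
    for uid in hot_user_ids:
        cache[f"user:{uid}"] = (fetch_user(uid), time.time() + ttl_seconds)

# e.g. run from a scheduled job between 7 and 9 am
warm_up([1, 2, 3])
```

In practice the hot‑ID list would come from access logs or a popularity ranking, and the job would be scheduled (cron or similar) ahead of the known peak.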
4.2 Staggered Expiration Times
If all keys share a uniform expiration time, many expire simultaneously and trigger a thundering herd. Instead, apply a "3‑4‑3" TTL: Expire = 3 h + random() × 4 h + 3 h, which spreads expirations across a 6–10 hour window.
Drawback: same limitation as 4.1; cannot handle unexpected failures.
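The 3‑4‑3 formula above translates directly into code; a minimal sketch:

```python
import random

def staggered_ttl():
    """3-4-3 split: 3 h fixed + up to 4 h random jitter + 3 h fixed.

    Yields a TTL between 6 and 10 hours, so keys written at the
    same moment do not all expire at the same moment.
    """
    return 3 * 3600 + random.random() * 4 * 3600 + 3 * 3600

ttl_seconds = staggered_ttl()
```

The returned TTL would be passed to the cache write (e.g. Redis `SET key value EX ttl`).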
4.3 Aggregated Cache by User Type
Instead of caching each user individually, group users by type and cache aggregated data, reducing database hits during peaks.
Only suitable for data with very low update frequency; large aggregated values that change frequently are inefficient.
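Aggregation by user type might look like the sketch below; the `occupation` grouping key is an assumed example of a "user type", and the plain dict stands in for the cache.

```python
from collections import defaultdict

def build_aggregated_cache(users):
    """Group users by occupation so one cache key serves many lookups.

    A request for any user of a given type hits a single aggregated
    entry instead of one key per user.
    """
    grouped = defaultdict(dict)
    for user in users:
        grouped[f"type:{user['occupation']}"][user["id"]] = user
    return dict(grouped)
```

Because the whole group is one value, a single update rewrites the entire entry, which is why this only pays off for rarely changing data.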
4.4 Spike‑Reduction, Locking, Rate‑Limiting
4.4.1 Spike‑Reduction
Introduce a message queue to enqueue requests and process them sequentially, avoiding request bursts.
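The queuing idea can be sketched with Python's standard `queue` module standing in for a real message broker; `handle` is a hypothetical stand‑in for the database read plus cache refill.

```python
import queue
import threading

results = []

def handle(user_id):
    # stand-in for the real DB read + cache refill
    results.append(user_id)

# bounded buffer: absorbs the burst; puts beyond maxsize would block/reject
requests_q = queue.Queue(maxsize=10000)

def worker():
    """Drain the queue sequentially so the database sees a smooth load."""
    while True:
        user_id = requests_q.get()
        if user_id is None:  # sentinel: shut the worker down
            break
        handle(user_id)
        requests_q.task_done()

t = threading.Thread(target=worker)
t.start()
for uid in range(5):
    requests_q.put(uid)
requests_q.put(None)
t.join()
```

In production the queue would be an external broker (Kafka, RocketMQ, etc.) and there may be several workers, sized to what the database can sustain.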
4.4.2 Locking
Allow only the first request for a given user to query the database and refill the cache; concurrent requests wait on the lock and then read the freshly populated cache entry instead of hitting the database again.
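A minimal single‑flight sketch with a local `threading.Lock` (a distributed deployment would need a distributed lock, e.g. one built on Redis); `db_fetch` is a hypothetical database accessor.

```python
import threading

cache = {}
lock = threading.Lock()

def get_user(user_id, db_fetch):
    """Only one thread per process fetches from the DB on a cache miss."""
    key = f"user:{user_id}"
    if key in cache:
        return cache[key]
    with lock:
        # double-check: another thread may have refilled the cache
        # while we were waiting for the lock
        if key not in cache:
            cache[key] = db_fetch(user_id)
        return cache[key]
```

The double‑check after acquiring the lock is what guarantees only one database round trip per key, no matter how many requests arrive at once.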
4.4.3 Rate‑Limiting
Conduct load testing without cache to determine the maximum sustainable load, then set a rate‑limit threshold to prevent overload.
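A token‑bucket limiter is one common way to enforce such a threshold; a minimal sketch, where `rate` and `capacity` would come from the load‑test results:

```python
import time

class TokenBucket:
    """Admit at most `rate` requests/second on average, with bursts
    up to `capacity`; both numbers come from load testing."""

    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # refill tokens in proportion to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should reject or queue the request
```

Rejected requests can be answered with a "please retry" response rather than being allowed through to the database.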
Drawbacks:
Locks and queuing significantly reduce throughput, leading to long wait times and poor user experience.
Rate‑limiting is coarse‑grained; fine‑grained per‑endpoint limits are possible but still degrade service for some users.
Note: databases themselves also expose throttling controls (for example, a maximum‑connections limit) that act as a last line of defense.
4.5 Temporary Degradation with Backup Cache
If the primary cache fails, fall back to a backup cache that syncs asynchronously, accepting slight data staleness to protect the database from overload.
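The fallback read path can be sketched as below; the two dicts are stand‑ins for the primary and backup cache instances, and the asynchronous sync between them is assumed to happen elsewhere.

```python
primary = {}  # primary cache: currently empty (failed or flushed)
backup = {"user:42": {"id": 42, "name": "stale-but-usable"}}  # synced asynchronously

def get_with_fallback(key):
    """Read the primary cache first; on a miss, serve slightly stale
    backup data rather than letting the request reach the database."""
    if key in primary:
        return primary[key]
    return backup.get(key)  # may lag the primary; that is the trade-off
```

For rarely changing data such as basic user profiles, the staleness window is usually acceptable.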
4.6 Temporary Degradation with Client‑Side Cache (Redis 6.0)
Leverage Redis 6.0's server‑assisted client‑side caching (client tracking with invalidation pushes): hot keys are cached in the application process itself, trading a little freshness for reduced load on the shared cache service.
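A simplified sketch of the idea: a local in‑process cache in front of the remote one. Real Redis 6.0 tracking pushes invalidation messages to the client; here, as a simplifying assumption, a short TTL bounds the staleness instead, and `remote_get` is a hypothetical call to the cache service.

```python
import time

class ClientSideCache:
    """In-process cache in front of a remote cache service.

    Redis 6.0 client tracking would invalidate entries by push message;
    this sketch bounds staleness with a short TTL instead.
    """

    def __init__(self, ttl=5.0):
        self.ttl = ttl
        self.store = {}  # key -> (value, expire_at)

    def get(self, key, remote_get):
        hit = self.store.get(key)
        if hit and hit[1] > time.monotonic():
            return hit[0]  # served locally: no network round trip
        value = remote_get(key)
        self.store[key] = (value, time.monotonic() + self.ttl)
        return value
```

Even a TTL of a few seconds can absorb most of a hot key's read traffic during a peak.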
4.7 Temporary Degradation with Empty Default Value
Return an empty or default value during failure, sacrificing some requests to keep the database from being overwhelmed.
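A minimal sketch of this degradation path; the `db_overloaded` flag stands in for whatever health signal (circuit breaker, error rate) the real system would use.

```python
EMPTY_USER = {}  # agreed-upon default the caller can render gracefully

def get_user_degraded(user_id, cache, db_overloaded):
    """On a cache miss during overload, answer with a default value
    instead of forwarding the request to the database."""
    key = f"user:{user_id}"
    if key in cache:
        return cache[key]
    if db_overloaded:
        return EMPTY_USER  # sacrifice this request to protect the DB
    return {"id": user_id}  # stand-in for the real database read
```

The frontend must be written to tolerate the empty value, e.g. by showing a placeholder profile.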
5 Summary
Each method has its own advantages and disadvantages; the appropriate solution should be chosen based on the actual application scenario.