Understanding Redis Cache Avalanche, Penetration, and Breakdown: Causes and Mitigation Strategies

The article explains what Redis cache avalanche, penetration, and breakdown are, illustrates real‑world incidents, and provides pre‑, during‑, and post‑incident solutions such as high‑availability setups, local caches with rate limiting, fallback mechanisms, and placeholder writes to prevent database overload.

Selected Java Interview Questions

Dec 7, 2019

Interview Question

Explain what Redis cache avalanche, penetration, and breakdown are, what happens when Redis crashes, and how to handle these situations.

Interviewer Psychology

Interviewers ask about these topics often because cache avalanche and cache penetration are among the most serious cache failures: when either occurs, it can take down the whole system. Candidates are expected to know both the causes and the mitigations.

Question Analysis

Cache Avalanche

Assume system A receives 5,000 requests per second at peak, and the cache normally absorbs 4,000 of them. If the cache goes down completely, all 5,000 requests per second fall through to the database, which cannot sustain that load. The database fails, and the failure cascades to the services that depend on it.

This situation is called a cache avalanche.

About three years ago, a well‑known Chinese internet company suffered a cache‑related avalanche that caused the entire backend to crash, lasting from the afternoon until early morning and resulting in losses of tens of millions of yuan.

Solutions before, during, and after a cache avalanche:

Pre‑incident: Run Redis in a high‑availability setup, such as master‑slave replication with Sentinel or Redis Cluster, so that a single node failure does not take the whole cache offline.

During incident: Deploy a local Ehcache layer combined with Hystrix for rate limiting and fallback, preventing the database from being overwhelmed.

Post‑incident: Enable Redis persistence so that after a restart the data is quickly reloaded from disk.

Typical request flow: the system first checks the local Ehcache; if missed, it checks Redis; if still missed, it queries the database and writes the result back to both Ehcache and Redis.
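The request flow above can be sketched as follows. This is a minimal illustration, not a production setup: the three maps are in-memory stand-ins for Ehcache, Redis, and the database, and the class name TieredLookup and the dbQueries counter are hypothetical. Refilling the local tier on a Redis hit is an added assumption; the text only describes the write-back after a database query.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of the local cache -> Redis -> database read path.
// The maps are in-memory stand-ins (assumptions), not real Ehcache/Redis/JDBC clients.
class TieredLookup {
    final Map<String, String> localCache = new ConcurrentHashMap<>(); // stands in for Ehcache
    final Map<String, String> redis = new ConcurrentHashMap<>();      // stands in for Redis
    final Map<String, String> database = new ConcurrentHashMap<>();   // stands in for the database
    final AtomicInteger dbQueries = new AtomicInteger();              // counts database lookups

    Optional<String> get(String key) {
        String value = localCache.get(key);
        if (value != null) {
            return Optional.of(value);      // 1. local cache hit
        }
        value = redis.get(key);
        if (value != null) {
            localCache.put(key, value);     // 2. Redis hit: refill the local tier (assumption)
            return Optional.of(value);
        }
        dbQueries.incrementAndGet();
        value = database.get(key);          // 3. both tiers missed: query the database
        if (value == null) {
            return Optional.empty();        // nothing to write back
        }
        redis.put(key, value);              // write the result back to both tiers
        localCache.put(key, value);
        return Optional.of(value);
    }
}
```

After the first lookup for a key that exists in the database, later lookups for the same key are served from the local tier and dbQueries stays unchanged.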

Rate‑limiting components can restrict the number of requests per second that reach the database; excess requests are handled by fallback logic, returning default values or friendly messages.

Benefits:

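The database is protected from overload, because the rate limiter caps the traffic that reaches it, provided the limit is set below what the database can sustain.

As long as the database stays up, some requests are still served. For example, if the limiter admits 2,000 of the 5,000 requests per second, 2/5 of requests are processed normally and the rest receive the fallback response. That is a far better user experience than a total outage.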
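To make the arithmetic concrete, here is a minimal sketch of a limiter with a fallback. It is a hand-rolled fixed-window counter standing in for a library such as Hystrix, not Hystrix's API; the class name DbGuard, the limit of 2,000 per second, and the explicit nowMillis parameter (used to keep the example deterministic) are all assumptions for illustration.

```java
import java.util.function.Supplier;

// Fixed-window limiter: at most `limit` calls per one-second window reach the database.
// Hypothetical stand-in for a rate-limiting/fallback library; not the Hystrix API.
class DbGuard {
    private final int limit;
    private long currentWindow = -1;   // index of the one-second window being counted
    private int usedInWindow = 0;      // calls already admitted in that window

    DbGuard(int limit) {
        this.limit = limit;
    }

    // Admits the call if the current window still has budget.
    // nowMillis is passed in for determinism; real code would read a clock.
    synchronized boolean tryAcquire(long nowMillis) {
        long window = nowMillis / 1000;
        if (window != currentWindow) {  // a new second started: reset the budget
            currentWindow = window;
            usedInWindow = 0;
        }
        if (usedInWindow < limit) {
            usedInWindow++;
            return true;
        }
        return false;
    }

    // Runs the database query when admitted, otherwise the fallback (default value, friendly message).
    <T> T call(long nowMillis, Supplier<T> dbQuery, Supplier<T> fallback) {
        return tryAcquire(nowMillis) ? dbQuery.get() : fallback.get();
    }
}
```

With a limit of 2,000, sending 5,000 calls within the same second lets 2,000 through to the query and routes the other 3,000 to the fallback; the budget resets when the next second begins.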

Cache Penetration

Assume system A receives 5,000 requests per second, of which 4,000 are malicious attacks from hackers. These attacks query keys that do not exist in the cache, forcing a database lookup each time, which also returns no result.

For example, if database IDs start from 1 but attackers send negative IDs, the cache will never have those keys, causing every request to bypass the cache and hit the database, potentially crashing it.

Simple mitigation: when a database query returns no result, write a placeholder value into the cache with a short TTL (for example, set -999 UNKNOWN for the requested ID -999), so that subsequent identical requests hit the cache instead of the database. The short TTL keeps the placeholder from hiding a record that is created later.
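The placeholder approach might look like the sketch below. The maps are in-memory stand-ins for Redis and the database, and the class name NullCachingLookup, the 30-second placeholder TTL, and the 10-minute value TTL are assumptions chosen for illustration. The current time is passed in explicitly so the expiry behaviour is easy to follow.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of caching "not found" results so repeated lookups for missing IDs skip the database.
class NullCachingLookup {
    static final String PLACEHOLDER = "UNKNOWN";     // marks "no such row"; assumes no real value equals it
    static final long PLACEHOLDER_TTL_MS = 30_000;   // assumed short TTL for placeholders
    static final long VALUE_TTL_MS = 600_000;        // assumed TTL for real values

    // Cached value plus its absolute expiry time.
    static final class Entry {
        final String value;
        final long expiresAt;
        Entry(String value, long expiresAt) {
            this.value = value;
            this.expiresAt = expiresAt;
        }
    }

    final Map<Long, Entry> cache = new ConcurrentHashMap<>();     // stands in for Redis
    final Map<Long, String> database = new ConcurrentHashMap<>(); // stands in for the database
    final AtomicInteger dbQueries = new AtomicInteger();          // counts database lookups

    Optional<String> findById(long id, long nowMillis) {
        Entry cached = cache.get(id);
        if (cached != null && cached.expiresAt > nowMillis) {
            // Cache hit: a placeholder means the row is known to be absent.
            return PLACEHOLDER.equals(cached.value) ? Optional.empty() : Optional.of(cached.value);
        }
        dbQueries.incrementAndGet();
        String row = database.get(id);
        if (row == null) {
            // Miss in the database too: remember that for a short time.
            cache.put(id, new Entry(PLACEHOLDER, nowMillis + PLACEHOLDER_TTL_MS));
            return Optional.empty();
        }
        cache.put(id, new Entry(row, nowMillis + VALUE_TTL_MS));
        return Optional.of(row);
    }
}
```

Against a real Redis server, the placeholder write with a TTL would typically be a single command such as SET -999 UNKNOWN EX 30. Note that this only helps when attackers repeat the same keys; it does not stop a flood of distinct nonexistent IDs.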

Cache Breakdown

Cache breakdown occurs when a hot key expires at a moment of high concurrency, causing a massive surge of requests to bypass the cache and directly hit the database, effectively "drilling a hole" through the cache barrier.

Possible solutions vary by scenario:

If the cached data rarely changes, set the hot key to never expire.

If updates are infrequent and the cache rebuild is fast, use a distributed lock (e.g., via Redis or ZooKeeper) or a local mutex so that only one request, or a small number of requests, rebuilds the cache while the others wait and then read the refreshed value.

If updates are frequent or refresh is slow, employ a background thread to proactively rebuild the cache before expiration or extend the TTL dynamically.
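As an illustration of the mutex option, the sketch below uses one local lock per key with a re-check after the lock is acquired. It only deduplicates rebuilds inside a single JVM; a distributed lock would be needed to get the same effect across instances. The class name MutexRebuildCache, the rebuilds counter, and the loader function are hypothetical, and the map stands in for Redis.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Sketch of rebuilding an expired hot key under a per-key local mutex.
class MutexRebuildCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>(); // stands in for Redis
    private final Map<String, Object> locks = new ConcurrentHashMap<>(); // one mutex per key
    private final Function<String, String> loader;  // loads from the database; must not return null
    final AtomicInteger rebuilds = new AtomicInteger(); // counts how often the loader ran

    MutexRebuildCache(Function<String, String> loader) {
        this.loader = loader;
    }

    String get(String key) {
        String value = cache.get(key);
        if (value != null) {
            return value;                   // fast path: no locking on a hit
        }
        Object lock = locks.computeIfAbsent(key, k -> new Object());
        synchronized (lock) {
            value = cache.get(key);         // re-check: another thread may have rebuilt it already
            if (value == null) {
                rebuilds.incrementAndGet();
                value = loader.apply(key);  // only the first thread reaches the database
                cache.put(key, value);
            }
            return value;
        }
    }

    // Simulates the hot key expiring.
    void expire(String key) {
        cache.remove(key);
    }
}
```

When many threads request the same missing key at once, one of them runs the loader while the rest block on the lock and then read the value it stored.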

Source: https://github.com/doocs/advanced-java
