
Cache Design and Optimization Strategies in High‑Concurrency Distributed Systems

The article explains the benefits and costs of caching in high‑concurrency distributed systems, then surveys optimization techniques for cache updates, penetration, the no‑hole problem, avalanches, and hot keys, providing practical guidance on choosing strategies based on consistency requirements and system load.

IT Architects Alliance

Cache Benefits and Costs

Using a cache brings two main benefits: accelerated read/write performance because caches like Redis or Memcached operate entirely in memory, and reduced backend load by storing complex calculations or expensive results, which improves overall system health.

However, caches also introduce costs such as data inconsistency due to the time window between cache and storage updates, increased code maintenance because developers must handle both cache and storage logic, and higher operational overhead for ensuring high availability (e.g., master‑slave setups, clustering).

In summary, if the benefits outweigh the costs, adopting a cache is justified.

Cache Updates

Cache entries usually have a TTL and expire after a certain period to maintain consistency and efficient space usage. The article discusses three update strategies:

1. LRU/LFU/FIFO

These eviction algorithms apply when the cache is full. LRU removes the least recently used items, LFU removes the least frequently accessed items, and FIFO removes the oldest entries. They have low development cost and suit limited‑memory scenarios with relatively static data.
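
As an illustration, an LRU eviction policy can be sketched in a few lines of Python (an in‑memory toy, not a production cache):

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache: evicts the least recently used entry when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry
```

With capacity 2, putting keys a and b, reading a, then putting c evicts b, since a was touched more recently.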

2. Timeout Eviction

Setting an explicit expiration time (e.g., Redis EXPIRE) forces the cache to discard stale data after a defined interval. Consistency depends on the TTL; it is acceptable for use cases that can tolerate temporary inconsistency, such as promotional content.
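
A minimal in‑memory sketch of timeout eviction (Redis implements the same idea natively via EXPIRE or SETEX; here the staleness check happens lazily on read):

```python
import time

class TTLCache:
    """Timeout eviction: each entry expires ttl seconds after it is set."""

    def __init__(self):
        self._data = {}  # key -> (value, expires_at)

    def set(self, key, value, ttl):
        self._data[key] = (value, time.monotonic() + ttl)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._data[key]  # lazily discard the stale entry on read
            return None
        return value
```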

3. Proactive Update

When the underlying data changes, the cache is updated immediately. This provides the highest consistency but adds development complexity because business updates and cache updates must be coordinated, often via message queues, and may be combined with timeout eviction for fault tolerance.
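
A minimal sketch of that coordination; the db and cache dicts below are hypothetical stand‑ins for the real storage and Redis, and larger systems would often route the cache refresh through a message queue:

```python
db = {}     # stands in for the backing store
cache = {}  # stands in for Redis; a safety-net TTL would still be set in production

def update_article(article_id, body):
    """Coordinate the business update and the cache update in one place."""
    db[article_id] = body                  # 1. the business write hits storage first
    cache[f"article:{article_id}"] = body  # 2. the cache is refreshed immediately
```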

Best practices: low‑consistency workloads can combine strategies 1 and 2; high‑consistency workloads should combine strategies 2 and 3.

Penetration Optimization

Cache penetration occurs when requests query non‑existent data, causing both cache and storage misses and potentially overwhelming the backend. Two common mitigations are:

1. Cache Empty Objects

Store a placeholder for missing keys to block repeated useless queries. To avoid memory waste, filter out clearly invalid IDs and set a short TTL for empty objects.
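
A sketch of the empty‑object technique, using plain dicts as hypothetical stand‑ins for Redis and the database; the counter shows that repeated misses never reach storage:

```python
cache = {}                        # stands in for Redis
storage = {1: {"name": "alice"}}  # stands in for the database
storage_reads = [0]               # counts how often the backend is actually hit
MISSING = "<null>"                # placeholder cached for non-existent keys

def get_user(user_id):
    key = f"user:{user_id}"
    if key in cache:
        hit = cache[key]
        return None if hit == MISSING else hit
    storage_reads[0] += 1
    value = storage.get(user_id)
    # Cache the miss too; in Redis this would carry a short TTL,
    # e.g. SETEX user:999 60 "<null>", to limit memory waste.
    cache[key] = MISSING if value is None else value
    return value
```

The second lookup of a missing user is answered by the placeholder, so the backend sees each invalid key at most once per placeholder TTL.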

2. Bloom Filter

A Bloom filter efficiently tests set membership with low memory overhead. The filter can be held in a local cache, falling back to the remote cache (Redis/Memcached) on misses. Bloom filters are ideal for historical data; for real‑time data, the filter itself must be updated proactively, much like proactive cache updates.
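
A minimal Bloom filter sketch; the parameters m and k below are illustrative, and a production deployment would size them from the expected item count and target false‑positive rate:

```python
import hashlib

class BloomFilter:
    """Simple Bloom filter: k hash probes over an m-bit array.
    False positives are possible; false negatives are not."""

    def __init__(self, m=8192, k=3):
        self.m, self.k = m, k
        self.bits = bytearray(m // 8)

    def _positions(self, item):
        # Derive k independent probe positions by salting one hash function.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))
```

A request whose key fails might_contain is guaranteed not to exist and can be rejected before touching cache or storage.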

No‑Hole Optimization

The “no‑hole” problem arises when distributed batch operations (e.g., MGET) involve many nodes, leading to increased network latency and connection overhead. The article outlines four approaches:

Serial MGET – split into N individual GET calls (high latency).

Serial I/O – map keys to nodes, then issue one request per node (better but still node‑dependent).

Parallel I/O – execute the per‑node requests concurrently, reducing overall latency at the cost of added complexity.

Hash‑tagging – force related keys into the same node using curly‑brace syntax, allowing a single request to retrieve all of them.

Choosing the appropriate method depends on the batch size, node count, and acceptable complexity.
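
The serial‑I/O, parallel‑I/O, and hash‑tag ideas can be sketched together. The node stores below are hypothetical stand‑ins for cache nodes, and the slot mapping is simplified (Redis Cluster actually uses CRC16 mod 16384):

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

NODE_COUNT = 2
nodes = [dict() for _ in range(NODE_COUNT)]  # fake per-node key/value stores

def node_for(key):
    # Hash-tag rule: if the key contains {...}, only the tag is hashed,
    # so {user:7}:name and {user:7}:age land on the same node.
    if "{" in key and "}" in key:
        key = key[key.index("{") + 1 : key.index("}")]
    return zlib.crc32(key.encode()) % NODE_COUNT

def mset(pairs):
    for k, v in pairs.items():
        nodes[node_for(k)][k] = v

def parallel_mget(keys):
    # Serial I/O groups keys by node; parallel I/O issues the
    # per-node batched requests concurrently instead of one by one.
    by_node = defaultdict(list)
    for k in keys:
        by_node[node_for(k)].append(k)

    def fetch(args):
        nid, ks = args
        return {k: nodes[nid].get(k) for k in ks}  # one batched call per node

    result = {}
    with ThreadPoolExecutor(max_workers=len(by_node)) as pool:
        for part in pool.map(fetch, by_node.items()):
            result.update(part)
    return result
```

Hash‑tagging keeps related keys (here, all fields of user 7) on one node, so a single per‑node request retrieves them all.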

Avalanche Optimization

A cache avalanche happens when the cache becomes unavailable, causing a sudden surge of traffic to the storage layer. Mitigation strategies include ensuring high cache availability (e.g., master‑slave, Redis Sentinel), employing rate‑limiting and fallback mechanisms (e.g., Netflix Hystrix), and isolating projects to prevent a single failure from cascading.

Hot Key Rebuild Optimization

When a hot key expires, many threads may simultaneously attempt to rebuild it, overwhelming the backend. Solutions include:

Mutex lock – allow only one thread to rebuild while others wait or serve stale data.

Never‑expire – update the cache via scheduled jobs or proactive writes.

Backend rate‑limiting – limit the number of rebuild attempts, letting successful rebuilds serve subsequent requests.

Identifying hot keys in advance simplifies mitigation; otherwise, combining rate‑limiting with fallback strategies is essential.

In conclusion, the article shares practical insights on cache design, update policies, and various optimization techniques for high‑concurrency environments.

distributed systems · performance optimization · caching · Cache Penetration · Cache Eviction
Written by IT Architects Alliance

Discussion and exchange on system, internet, large‑scale distributed, high‑availability, and high‑performance architectures, as well as big data, machine learning, AI, and architecture adjustments with internet technologies. Includes real‑world large‑scale architecture case studies. Open to architects who have ideas and enjoy sharing.
