
Cache Consistency Issues and Solutions in a High‑Concurrency Push System

The article examines a cache‑consistency failure in Tmall International’s high‑concurrency push system, explains classic cache problems and mitigation techniques, analyzes the delete‑then‑update bug that caused null‑plan errors, and evaluates four corrective strategies ranging from double‑write to delayed double‑delete.

DaTaobao Tech

The article records a cache‑consistency problem encountered in the Tmall International Push Center, which handles high‑concurrency messaging such as SMS and in‑app notifications.

Three classic cache problems are introduced:

- Cache penetration: queries for non‑existent data bypass the cache and hit the DB.
- Cache breakdown: a hot key expires and many concurrent requests flood the DB.
- Cache avalanche: many keys expire at the same time and overwhelm the DB.

Each problem is defined, and mitigation strategies are described. These include Bloom filters and caching empty results (against penetration), locking (against breakdown), randomized expiration times (against avalanche), and double‑write.
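A minimal single‑process sketch of three of these mitigations follows. It assumes a dict stands in for the DB and another dict for the cache.

- Empty results are cached briefly, to counter penetration.
- A per‑key lock with a re‑check lets only one caller rebuild an expired hot key, to counter breakdown.
- A random jitter is added to each TTL, to counter avalanche.

The Bloom filter is not shown. `CacheAsideStore` and all of its parameters (TTLs, jitter) are hypothetical illustration values, not taken from the article. In a distributed deployment the per‑key lock would have to be a distributed lock rather than a `threading.Lock`.

```python
import random
import threading
import time

_MISSING = object()  # marks "key absent in the DB" so the miss itself can be cached


class CacheAsideStore:
    """Hypothetical single-process cache in front of a dict standing in for the DB."""

    def __init__(self, db, base_ttl=300.0, jitter=60.0, null_ttl=30.0):
        self.db = db
        self.cache = {}                  # key -> (value, expires_at)
        self.base_ttl = base_ttl
        self.jitter = jitter
        self.null_ttl = null_ttl
        self.db_reads = 0                # counts how often the DB is actually hit
        self._key_locks = {}
        self._guard = threading.Lock()

    def _lock_for(self, key):
        with self._guard:
            return self._key_locks.setdefault(key, threading.Lock())

    def _fresh(self, key):
        entry = self.cache.get(key)
        if entry is not None and entry[1] > time.monotonic():
            return entry
        return None

    def get(self, key):
        entry = self._fresh(key)
        if entry is None:
            # Breakdown: one caller per key rebuilds; the rest wait, then re-check.
            with self._lock_for(key):
                entry = self._fresh(key)
                if entry is None:
                    self.db_reads += 1
                    value = self.db.get(key, _MISSING)
                    if value is _MISSING:
                        ttl = self.null_ttl  # penetration: cache the empty result briefly
                    else:
                        # avalanche: spread expirations with a random jitter
                        ttl = self.base_ttl + random.uniform(0, self.jitter)
                    entry = (value, time.monotonic() + ttl)
                    self.cache[key] = entry
        return None if entry[0] is _MISSING else entry[0]


if __name__ == "__main__":
    store = CacheAsideStore({"plan:1": "A"})
    workers = [threading.Thread(target=store.get, args=("plan:1",)) for _ in range(20)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(store.db_reads)  # 1: only one thread rebuilt the hot key
```

The re‑check after acquiring the lock is what keeps the waiting callers from each reading the DB once the first caller has repopulated the entry.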

Data‑consistency issues between cache and database are then discussed, covering strong consistency vs. eventual consistency, distributed‑transaction protocols (3PC), distributed locks, and consensus algorithms (Paxos, Raft).

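The specific incident arose because the configuration service deletes the cache entry before updating the DB. The push service keeps a local copy of the configuration that is refreshed only periodically. If that refresh reads the shared cache after the delete but before the entry is repopulated, it stores a null value locally. Pushes that use this value then fail with a null‑plan exception.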

Root‑cause analysis shows that the delete‑then‑update pattern opens a window in which reads see missing or stale data. The configuration service's 5‑minute refresh cycle makes this window especially wide.
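The window can be made concrete with a deterministic replay of the steps involved. This is a sketch under stated assumptions:

- Three dicts stand in for the DB, the shared cache and the push service's local copy.
- The step names are hypothetical.
- `config_scheduled_refresh` models the periodic repopulation of the shared cache.

The real services, their APIs and their timings are not described in the article.

```python
def run(schedule):
    """Replay the given step order and return what the push service holds locally."""
    db = {"plan": "v1"}
    cache = {"plan": "v1"}   # shared cache owned by the configuration service
    local = {"plan": "v1"}   # push service's periodically refreshed local copy
    steps = {
        "config_delete_cache": lambda: cache.pop("plan", None),
        "config_update_db": lambda: db.update(plan="v2"),
        "config_scheduled_refresh": lambda: cache.update(plan=db["plan"]),
        "push_local_refresh": lambda: local.update(plan=cache.get("plan")),
    }
    for name in schedule:
        steps[name]()
    return local["plan"]


def send_push(plan):
    if plan is None:
        raise ValueError("null plan")  # stands in for the null-plan exception
    return f"pushed with plan {plan}"


# Harmless order: the local refresh runs after the cache has been repopulated.
ok = run(["config_delete_cache", "config_update_db",
          "config_scheduled_refresh", "push_local_refresh"])
print(send_push(ok))  # pushed with plan v2

# Failing order: the local refresh lands inside the delete-then-update window.
bad = run(["config_delete_cache", "push_local_refresh",
           "config_update_db", "config_scheduled_refresh"])
print(bad)  # None, held until the next local refresh
```

Only the relative order of the steps matters here. The longer the gap before the scheduled refresh, the more chances a local refresh has to land inside the window.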

Solution options are presented:

Modify the config service to refresh the cache immediately after the DB write (double‑write).

Change the order to update the cache first, then the DB (this requires rolling back the cache if the DB write fails).

Keep the original delete‑then‑update but add a delayed second delete (delayed double‑delete) to cover the window.

Adjust the push service to fall back to the DB on a cache miss and repopulate the cache immediately.
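The four options can be compared in a minimal in‑memory sketch. The following are all hypothetical:

- `ConfigStore` and its method names.
- The dict stand‑ins for the DB and cache.
- The 0.5 s default delay.

`threading.Timer` merely stands in for whatever delayed‑execution mechanism a real config service would use. The replay at the bottom illustrates one point that follows from this model. Read‑through on a miss (option 4) avoids caching a null value. However, a read inside the delete‑then‑update window can re‑cache the old DB value. The delayed second delete (option 3) is meant to evict that value.

```python
import threading


class ConfigStore:
    """Hypothetical in-memory stand-ins for the config DB and the shared cache."""

    def __init__(self):
        self.db = {"plan": "v1"}
        self.cache = {"plan": "v1"}

    def delete_cache(self, key):
        self.cache.pop(key, None)

    def update_db(self, key, value):
        self.db[key] = value

    # Option 1 (double-write): write the DB, then immediately refresh the cache.
    def double_write(self, key, value):
        self.update_db(key, value)
        self.cache[key] = value

    # Option 2: write the cache first, then the DB; undo the cache write on failure.
    def cache_first(self, key, value, db_write=None):
        db_write = db_write or self.update_db
        missing = object()
        old = self.cache.get(key, missing)
        self.cache[key] = value
        try:
            db_write(key, value)
        except Exception:
            if old is missing:
                self.cache.pop(key, None)
            else:
                self.cache[key] = old
            raise

    # Option 3 (delayed double-delete): delete, update the DB, delete again later.
    def delayed_double_delete(self, key, value, delay_s=0.5):
        self.delete_cache(key)
        self.update_db(key, value)
        timer = threading.Timer(delay_s, self.delete_cache, args=(key,))
        timer.start()
        return timer

    # Option 4 (read-through on miss): fall back to the DB and repopulate the cache.
    def read_through(self, key):
        value = self.cache.get(key)
        if value is None:
            value = self.db.get(key)
            if value is not None:
                self.cache[key] = value
        return value


# Replay of the window with option 4 alone, then the second delete of option 3.
store = ConfigStore()
store.delete_cache("plan")                  # config service: first delete
assert store.read_through("plan") == "v1"   # push service: no null, but the old value
store.update_db("plan", "v2")               # config service: DB update
assert store.cache["plan"] == "v1"          # stale value is now cached
store.delete_cache("plan")                  # the delayed second delete
assert store.read_through("plan") == "v2"
```

The second delete only helps if it fires after any read that started inside the window has finished repopulating the cache. Choosing that delay is a tuning decision the sketch does not settle.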

The article concludes that there is no universally best approach; the choice depends on system characteristics, performance requirements, and tolerance for inconsistency.

Written by DaTaobao Tech, the official account of DaTaobao Technology.