Transparent Multilevel Cache (TMC): Architecture, Hotspot Detection, and Local Cache in Youzan PaaS
Youzan’s Transparent Multilevel Cache (TMC) adds automatic hotspot detection and a 64 MB local cache to existing distributed caches via a Hermes‑SDK‑augmented Jedis client, delivering transparent Java integration, strong consistency, up to 80 % local‑hit rates, and improved QPS during high‑traffic events.
TMC (Transparent Multilevel Cache) is a comprehensive caching solution developed by Youzan PaaS to provide transparent, multi‑level caching for internal applications.
It extends a generic distributed cache (e.g., CodisProxy + Redis or Youzan’s own zanKV) with three key capabilities: application‑level hotspot detection, application‑level local cache, and hotspot‑hit statistics, aiming to alleviate hotspot access problems at the application layer.
Why TMC Is Needed
E‑commerce merchants frequently launch flash‑sale or promotion activities that cause unpredictable hotspot traffic on keys such as product details or order data. Hotspot keys generate massive cache requests, consuming bandwidth and threatening service stability. TMC automatically discovers hotspot keys and serves them from a local cache before they reach the distributed cache.
Traditional multi‑level cache solutions struggle with four core problems:
Hotspot detection – quickly and accurately identifying hotspot keys.
Data consistency – ensuring local cache stays consistent with the distributed cache.
Effect verification – exposing local‑cache hit rates and hotspot keys to applications.
Transparent integration – minimizing intrusion into existing services.
The overall TMC architecture consists of three layers:
Storage layer – provides KV storage (Codis, zanKV, Aerospike, etc.).
Proxy layer – unified cache entry and routing for distributed data.
Application layer – a unified client with built‑in hotspot detection and local cache, transparent to business logic.
The article focuses on the application‑layer client.
Transparent Integration in Java
Java services can use either the spring.data.redis package with RedisTemplate or the youzan.framework.redis package with RedisClient. Regardless of the choice, the client ultimately creates a JedisPool and obtains a Jedis object to communicate with the cache proxy.
TMC modifies the native JedisPool and Jedis classes via the Hermes‑SDK. During initialization, Hermes‑SDK injects hotspot‑detection and local‑cache logic, allowing the Jedis client to interact with the proxy through Hermes‑SDK and achieve transparent access.
For Java services, simply depending on the TMC‑patched version of the Jedis jar enables hotspot detection and local caching without any code changes, achieving minimal intrusion.
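The interception idea above can be sketched as a client whose get() consults a local hotspot cache before falling back to the remote cluster. This is a minimal illustration, not TMC's actual API: the class name HotspotAwareClient, the markHotspot() method, and the Map standing in for the distributed cache are all assumptions.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a Jedis-style client whose get() serves hotspot
// keys from a local cache and forwards everything else to the cache cluster.
class HotspotAwareClient {
    private final Set<String> hotspotKeys = ConcurrentHashMap.newKeySet(); // pushed by the detection server
    private final Map<String, String> localCache = new ConcurrentHashMap<>();
    private final Map<String, String> remote; // stand-in for the distributed cache

    HotspotAwareClient(Map<String, String> remote) { this.remote = remote; }

    void markHotspot(String key) { hotspotKeys.add(key); }

    String get(String key) {
        if (hotspotKeys.contains(key)) {
            // hotspot key: serve from the local cache, populating it on first miss
            return localCache.computeIfAbsent(key, remote::get);
        }
        return remote.get(key); // non-hotspot: go straight to the cache cluster
    }
}
```

In the real SDK this logic lives inside the patched Jedis class, which is why business code needs no changes.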
Key Modules
Jedis‑Client : Direct entry for Java applications to the cache service.
Hermes‑SDK : Self‑developed SDK that encapsulates hotspot detection and local cache.
Hermes Server Cluster : Receives reports from Hermes‑SDK, performs hotspot detection, and pushes hotspot keys to the SDK.
Cache Cluster : Consists of proxy and storage layers, providing a unified distributed cache endpoint.
Infrastructure : etcd cluster and Apollo configuration center for cluster push and unified configuration.
Hotspot Detection Workflow
The process consists of four steps:
Data collection : Hermes‑SDK uses rsyslog to send key‑access events to Kafka; Hermes servers consume them.
Hotness sliding window : Each key maintains a 10‑slot time wheel, each slot representing a 3‑second interval, thus covering a 30‑second window.
Hotness aggregation : Every 3 seconds a mapping task aggregates per‑key hotness from the time wheel and stores <key, totalHotness> in Redis as a sorted set.
Hotspot detection : The server periodically selects the top‑N keys whose hotness exceeds a threshold and pushes the list to Hermes‑SDK.
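The 10‑slot time wheel from step 2 can be sketched as follows: each slot covers a 3‑second bucket, stale slots are lazily reused, and total hotness sums only slots still inside the 30‑second window. Class and method names (and the explicit millisecond timestamps, used instead of a system clock for testability) are illustrative assumptions.

```java
// Sketch of the per-key hotness sliding window: 10 slots x 3 seconds = 30 s.
class HotnessTimeWheel {
    private static final int SLOTS = 10;
    private static final long SLOT_MILLIS = 3_000;
    private final long[] counts = new long[SLOTS];
    private final long[] slotBucket = new long[SLOTS]; // which 3-second bucket each slot last held

    // record one key access at time nowMillis
    void record(long nowMillis) {
        long bucket = nowMillis / SLOT_MILLIS;
        int i = (int) (bucket % SLOTS);
        if (slotBucket[i] != bucket) { // slot holds an old bucket: reset and reuse it
            slotBucket[i] = bucket;
            counts[i] = 0;
        }
        counts[i]++;
    }

    // total hotness over the trailing 30-second window
    long totalHotness(long nowMillis) {
        long bucket = nowMillis / SLOT_MILLIS;
        long total = 0;
        for (int i = 0; i < SLOTS; i++) {
            if (bucket - slotBucket[i] < SLOTS) total += counts[i]; // slot still within the window
        }
        return total;
    }
}
```

The aggregation step then periodically reads totalHotness for each key and writes <key, totalHotness> into a Redis sorted set.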
During normal key access, the Jedis‑Client queries Hermes‑SDK to check whether the key is a hotspot. If it is, the value is retrieved from the local cache; otherwise, the request is forwarded to the cache cluster. Each access event is asynchronously reported back to the server for future detection.
When a key is modified via set(), del(), or expire(), the client notifies Hermes‑SDK through invalid(). For hotspot keys, the local cache entry is invalidated immediately so the writing instance stays strongly consistent, and the invalidation event is broadcast through etcd to other SDK instances for eventual consistency.
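The two-step invalidation (evict locally first, then broadcast to peers) can be sketched with an in-process message bus standing in for etcd. Everything here is illustrative: the Broadcast interface, InProcessBroadcast, and SdkInstance are assumptions, not TMC's actual classes.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Sketch: a write evicts the writer's own local cache (strong consistency
// locally), then publishes the key so peer instances evict it too
// (eventual consistency). The bus stands in for etcd watch/put.
class InvalidationDemo {
    interface Broadcast {
        void publish(String key);
        void subscribe(Consumer<String> onInvalid);
    }

    static class InProcessBroadcast implements Broadcast {
        private final List<Consumer<String>> subs = new CopyOnWriteArrayList<>();
        public void publish(String key) { subs.forEach(s -> s.accept(key)); }
        public void subscribe(Consumer<String> onInvalid) { subs.add(onInvalid); }
    }

    static class SdkInstance {
        final Map<String, String> localCache = new ConcurrentHashMap<>();
        private final Broadcast bus;

        SdkInstance(Broadcast bus) {
            this.bus = bus;
            bus.subscribe(localCache::remove); // evict when a peer broadcasts an invalidation
        }

        void invalid(String key) {
            localCache.remove(key); // evict locally first
            bus.publish(key);       // then notify the other SDK instances
        }
    }
}
```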
Stability and Consistency Features
Asynchronous reporting using rsyslog ensures non‑blocking behavior.
Communication module runs in an isolated thread pool with bounded queues to protect business threads.
Local cache size is limited to 64 MB (LRU) to avoid JVM heap overflow.
Only hotspot keys are cached locally; the rest reside in the distributed cache.
Hotspot key invalidation is synchronized via etcd, guaranteeing strong consistency locally and eventual consistency across the cluster.
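A byte-bounded LRU cache like the 64 MB limit above can be built on LinkedHashMap's access-order mode. This is a sketch, not TMC's implementation: the 2-bytes-per-char size accounting is a rough approximation, and the class name is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of a size-bounded LRU local cache: inserts are charged an
// approximate byte cost, and least-recently-used entries are evicted
// until the total fits the budget.
class BoundedLruCache {
    private final long maxBytes;
    private long currentBytes = 0;
    private final LinkedHashMap<String, String> map =
        new LinkedHashMap<>(16, 0.75f, true); // access-order: iteration starts at the LRU entry

    BoundedLruCache(long maxBytes) { this.maxBytes = maxBytes; }

    // rough size estimate: 2 bytes per char for key and value
    private static long sizeOf(String k, String v) { return 2L * (k.length() + v.length()); }

    synchronized void put(String key, String value) {
        String old = map.put(key, value);
        if (old != null) currentBytes -= sizeOf(key, old);
        currentBytes += sizeOf(key, value);
        // evict least-recently-used entries until under the byte budget
        java.util.Iterator<Map.Entry<String, String>> it = map.entrySet().iterator();
        while (currentBytes > maxBytes && it.hasNext()) {
            Map.Entry<String, String> eldest = it.next();
            currentBytes -= sizeOf(eldest.getKey(), eldest.getValue());
            it.remove();
        }
    }

    synchronized String get(String key) { return map.get(key); }
}
```

Bounding by bytes rather than entry count matters here because hotspot values (e.g., serialized product details) vary widely in size.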
Performance Results
Real‑world tests (e.g., a flash‑sale on Kuaishou) show that during peak activity, cache request volume and local‑cache hit volume both increase significantly, with local‑cache hit rates approaching 80 %. Application QPS rises while response time (RT) drops, demonstrating the effectiveness of TMC.
Additional data from Double‑11 events across multiple core services further confirms the benefits of TMC in both product and activity domains.
Future Outlook
TMC already serves product, logistics, inventory, marketing, user, gateway, and messaging modules at Youzan. Future work includes richer configuration options (hotspot thresholds, blacklist/whitelist) and continued iteration of the platform.
Youzan Coder
The official Youzan tech channel, sharing technical insights and updates from the Youzan engineering team.