
JD Hotkey Framework: Architecture, Performance Optimizations, and Scaling to 350k QPS

The article details JD's high‑performance Hotkey middleware, explaining its real‑time hot‑data detection architecture, the challenges faced during massive traffic spikes, and a series of systematic optimizations—including JDK upgrades, thread‑pool tuning, Disruptor replacement, and serialization changes—that enabled the system to scale from 20k to 350k queries per second while maintaining millisecond‑level latency.

JD Retail Technology

JD's Hotkey framework is a high‑performance middleware developed for the JD app backend to detect hot keys (e.g., burst requests for a product, malicious crawler users, hot APIs) in real time and push them to the JVM memory of all business servers within milliseconds, thereby relieving pressure on the storage layer.
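Conceptually, the business-server side reduces to a small pattern: a set of keys the workers have broadcast as hot, plus a JVM-local cache consulted before the storage layer. The sketch below illustrates that pattern only; class and method names are assumptions, not the framework's real API.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hedged sketch of the business-server side: keys the workers have
// broadcast as hot are served from JVM-local memory; everything else
// falls through to the storage layer. Names are illustrative.
class LocalHotCache {
    private final Set<String> hotKeys = ConcurrentHashMap.newKeySet();
    private final Map<String, Object> cache = new ConcurrentHashMap<>();

    /** Called when a worker pushes a newly detected hot key. */
    void markHot(String key) {
        hotKeys.add(key);
    }

    /** Hot keys hit local memory after the first load; cold keys always
     *  go to the storage loader (e.g., Redis or a database). */
    Object get(String key, Function<String, Object> storageLoader) {
        if (!hotKeys.contains(key)) {
            return storageLoader.apply(key);
        }
        return cache.computeIfAbsent(key, storageLoader);
    }
}
```

The same hot-key set could equally drive access denial (reject the request) or circuit breaking instead of caching, as the article notes.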

Each worker runs a Netty server that maintains long‑lived connections with hundreds to thousands of business servers, which upload candidate keys at configurable intervals (1–500 ms). Workers aggregate the keys with a sliding‑window algorithm, and once a configurable threshold is reached, the hot key is broadcast to the entire server cluster for in‑memory handling such as caching, access denial, or circuit breaking.
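The sliding-window aggregation can be sketched as a ring of time buckets per key, where stale buckets are lazily reset as time advances. This is a minimal illustration; the bucket count, bucket width, and method names are assumptions, not the actual JD Hotkey implementation.

```java
// Minimal sketch of a per-key sliding-window counter: a ring of time
// buckets, lazily reset when a slot is reused for a newer time range.
// All sizes and names are illustrative assumptions.
class SlidingWindowCounter {
    private final long[] counts;
    private final long[] bucketStarts; // start time of the data in each slot
    private final long bucketMillis;

    SlidingWindowCounter(int buckets, long bucketMillis) {
        this.counts = new long[buckets];
        this.bucketStarts = new long[buckets];
        this.bucketMillis = bucketMillis;
    }

    /** Record one access of the key at the given timestamp. */
    synchronized void record(long nowMillis) {
        int idx = (int) ((nowMillis / bucketMillis) % counts.length);
        long start = (nowMillis / bucketMillis) * bucketMillis;
        if (bucketStarts[idx] != start) { // slot holds an expired bucket
            bucketStarts[idx] = start;
            counts[idx] = 0;
        }
        counts[idx]++;
    }

    /** Total hits inside the window ending at nowMillis. */
    synchronized long windowTotal(long nowMillis) {
        long windowStart = nowMillis - bucketMillis * counts.length;
        long sum = 0;
        for (int i = 0; i < counts.length; i++) {
            if (bucketStarts[i] > windowStart) sum += counts[i];
        }
        return sum;
    }

    boolean isHot(long nowMillis, long threshold) {
        return windowTotal(nowMillis) >= threshold;
    }
}
```

When `isHot` fires for a key, a real worker would trigger the broadcast to the cluster described above.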

Performance tests on a 16‑core Docker container showed the initial version could receive over 400k keys per second, compute ~300k, and push ~120k hot keys per second. During large‑scale load tests, however, the CPU saturated: an outdated JDK mis‑reported the number of CPU cores, causing Netty and Disruptor to spawn far too many threads.
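The core-count bug matters because frameworks size their default thread pools from `Runtime.getRuntime().availableProcessors()`, and before the JDK's container-awareness fixes that call reported the host's cores inside Docker rather than the container's quota. A minimal illustration of how one wrong reading multiplies threads (the 2× factor is Netty's default event-loop size; the 16/64 core figures are illustrative assumptions):

```java
// Frameworks size default thread pools from availableProcessors().
// On an old, non-container-aware JDK inside a 16-core container on a
// 64-core host, the call returned 64, so a "2 x cores" default produced
// 128 threads instead of the intended 32.
class DefaultThreadSizing {
    static int twoTimesCores(int reportedCores) {
        return reportedCores * 2; // Netty's default event-loop thread count
    }
}
```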

After upgrading the JDK (fixing core‑count detection) and reducing the Netty I/O and Disruptor consumer threads, CPU usage dropped to 7–10% and the system sustained 80–140k QPS. Further tuning lowered thread counts to half the available cores, achieving a stable 160–200k QPS at ~40% CPU.
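The "half the available cores" rule from this step can be captured in a few lines. This is a sketch of the sizing policy only; the class and method names are assumptions, not the framework's code.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch of the sizing rule from the article: cap worker threads at half
// the available cores rather than accepting framework defaults (often
// 2 x cores). Names are illustrative assumptions.
class ThreadPoolSizing {
    /** Half the cores, never below one thread. */
    static int halfOf(int cores) {
        return Math.max(1, cores / 2);
    }

    /** A fixed compute pool sized from the runtime's reported cores. */
    static ExecutorService newComputePool() {
        return Executors.newFixedThreadPool(
                halfOf(Runtime.getRuntime().availableProcessors()));
    }
}
```

With a container-aware JDK, `availableProcessors()` reflects the container's CPU quota, so this sizing stays correct inside Docker.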

Replacing Disruptor with Java's LinkedBlockingQueue eliminated duplicate consumption and reduced latency, allowing the system to handle 250–300k QPS with CPU usage under 2% in normal operation, while still supporting 100–120k hot‑key pushes per second.
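The queue swap reduces to a standard producer/consumer shape: producers offer keys to one bounded `LinkedBlockingQueue`, and a consumer drains it, so each element is removed exactly once. The sketch below assumes illustrative names and a drop-on-overflow policy; it is not the framework's actual pipeline.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch of swapping Disruptor for a LinkedBlockingQueue: producers
// offer candidate keys, a consumer drains them in batches. Each key is
// removed exactly once, which sidesteps the duplicate-consumption
// problem the article describes. Names and the overflow policy are
// illustrative assumptions.
class KeyPipeline {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>(100_000);

    /** Producers (e.g., Netty I/O threads) call this; drops on overflow
     *  rather than blocking the caller. */
    boolean publish(String key) {
        return queue.offer(key);
    }

    /** The consumer thread drains up to max keys per pass; drainTo
     *  removes each element from the queue exactly once. */
    List<String> drainBatch(int max) {
        List<String> batch = new ArrayList<>(max);
        queue.drainTo(batch, max);
        return batch;
    }
}
```

Batch draining with `drainTo` also keeps consumer wake-ups cheap under burst load, at the cost of Disruptor's lock-free ring buffer, which the measured numbers suggest was not the bottleneck here.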

Subsequent experiments increased Netty I/O threads to 16, enabling stable push rates of 600–800k per second before GC pressure and memory overflow became the limiting factors.

The final conclusions emphasize that practical performance tuning (adjusting thread pools, choosing appropriate data structures, and selecting efficient serialization such as protobuf over JSON) is essential for high‑throughput, low‑latency services, and that real‑world traffic combined with aggressive stress testing is the only reliable way to validate scalability.
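Protobuf requires generated message classes, so the stdlib sketch below only illustrates the size argument behind the serialization switch: the same (key, count) record encoded as JSON text versus a compact length-prefixed binary form. The field names and record shape are assumptions for illustration, not the framework's wire format.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Why a binary encoding shrinks the payload: JSON repeats field names
// and digits as text, while a binary form writes a length prefix and
// fixed-width numbers. Protobuf goes further with field tags + varints.
class EncodingSizes {
    static int jsonSize(String key, long count) {
        String json = "{\"key\":\"" + key + "\",\"count\":" + count + "}";
        return json.getBytes(StandardCharsets.UTF_8).length;
    }

    static int binarySize(String key, long count) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeUTF(key);    // 2-byte length prefix + UTF-8 bytes
            out.writeLong(count); // fixed 8 bytes; protobuf varints are smaller still
            return buf.size();
        } catch (IOException e) {
            throw new AssertionError(e); // in-memory streams do not throw
        }
    }
}
```

At 100k+ pushes per second, shaving even a dozen bytes per record noticeably reduces both network traffic and the garbage produced per message.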

Java · Scalability · HotKey · Middleware · Netty · Disruptor
Written by

JD Retail Technology

Official platform of JD Retail Technology, delivering insightful R&D news and a deep look into the lives and work of technologists.
