
Performance, Storage, Index, and Cache Design of JD's JMQ4 Message Middleware

The article presents JD's self‑developed JMQ middleware, detailing its 2018 performance improvements, comparative benchmarks with Kafka, storage and index architecture evolution from JMQ2 to JMQ4, and the high‑performance I/O and caching strategies that enable millions of TPS on modern hardware.

JD Tech

JMQ is JD.com’s in‑house message middleware launched in 2014, serving nearly ten thousand applications and handling over 500 billion messages during the 2018 11.11 peak. The fourth major version released in 2018 achieved more than 1 million TPS per broker.

Performance tests on identical hardware compared JMQ4, JMQ2, and Kafka in two scenarios. With single messages and synchronous flushing, JMQ4 wrote ~230 k messages/s, roughly twice JMQ2's rate and nearly three times Kafka's ~92 k messages/s. With batched messages and asynchronous flushing, JMQ4 reached 1.037 M messages/s (ten times its predecessor), while Kafka peaked slightly higher at ~1.14 M messages/s.

JMQ2’s storage used a shared journal file for all topics and separate queue files for indexes, maximizing sequential disk writes but limiting flexibility for topic‑level data operations. Kafka stores data per partition with paired log and index files, offering better batch write performance and easier replication, yet suffers when many partitions are needed for massive micro‑service clusters.

JMQ4 adopts a hybrid design: each topic has its own log files, and each partition maintains dense index files. This balances performance with flexibility, allowing higher concurrency per topic.
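This hybrid layout can be sketched as a simple path-resolution scheme; the directory structure and file names below are illustrative assumptions, not JMQ4's actual on-disk format:

```java
import java.nio.file.Path;

// Sketch of a JMQ4-style hybrid layout: all partitions of a topic share
// the topic's log files, while each partition keeps its own dense index
// files. Directory and file naming here is hypothetical.
public class StoreLayout {
    // Log files are per topic, named by the offset of their first byte.
    static Path logFile(Path root, String topic, long fileStartOffset) {
        return root.resolve(topic).resolve("log")
                   .resolve(fileStartOffset + ".log");
    }

    // Index files are per partition, named by their first message index.
    static Path indexFile(Path root, String topic, int partition, long startIndex) {
        return root.resolve(topic).resolve("index")
                   .resolve(Integer.toString(partition))
                   .resolve(startIndex + ".index");
    }
}
```

Because the log is shared across a topic's partitions, appends from many producers to one topic coalesce into one sequential write stream, while per-partition indexes keep consumption independent.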

For indexing, Kafka uses sparse indexes requiring binary search, while JMQ employs fixed‑length dense indexes, enabling direct calculation of index positions and faster lookups.
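The advantage of fixed-length dense indexes is that a lookup reduces to one multiplication; a minimal sketch, assuming a hypothetical 32-byte entry size:

```java
// Sketch of dense-index addressing. The 32-byte entry size is an
// illustrative assumption, not JMQ4's actual index format.
public class DenseIndex {
    static final int ENTRY_SIZE = 32; // hypothetical fixed entry length

    // With fixed-length entries, the byte offset of any message's index
    // entry is computed directly in O(1) -- no binary search over a
    // sparse index, as Kafka requires.
    static long entryOffset(long messageSequence, long baseSequence) {
        return (messageSequence - baseSequence) * ENTRY_SIZE;
    }
}
```

For example, the entry for message 105 in an index file whose first entry is message 100 sits at byte offset `5 * 32 = 160`.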

JMQ4’s I/O layer switched from memory‑mapped files (MappedByteBuffer) to DirectBuffer with asynchronous FileChannel writes. Although DirectBuffer introduces one extra memory copy, it avoids mmap overhead and page‑fault latency and gives explicit control over batch sizes, resulting in higher overall throughput.
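A minimal sketch of this write path, assuming a simple accumulate-then-flush batching scheme (the buffer size and flush policy are illustrative, not JMQ4's):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Sketch of the DirectBuffer + FileChannel write path: messages are
// copied into an off-heap direct buffer (the "extra copy"), then
// written in one well-sized batch, avoiding mmap page faults.
public class DirectBufferWriter {
    private final ByteBuffer batch = ByteBuffer.allocateDirect(4 * 1024 * 1024);
    private final FileChannel channel;

    DirectBufferWriter(Path logFile) throws IOException {
        channel = FileChannel.open(logFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
    }

    // Accumulate messages into the direct buffer, flushing when full,
    // so the kernel sees large sequential writes of a controlled size.
    void append(byte[] message) throws IOException {
        if (batch.remaining() < message.length) flush();
        batch.put(message);
    }

    void flush() throws IOException {
        batch.flip();
        channel.write(batch);
        batch.clear();
    }
}
```

In a real broker the flush would run on a dedicated I/O thread so producers are never blocked on the disk; this sketch flushes inline for simplicity.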

The cache subsystem stores entire message files as DirectBuffer pages, reducing disk‑to‑memory copies and allowing immediate consumption of newly written messages. A two‑dimensional eviction policy prioritizes removal of cold pages far from the log tail, preserving hot‑data hit rates and preventing cache pollution during mid‑log scans.
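The two-dimensional eviction idea can be sketched as a scoring function over pages; the multiplicative weighting below is an illustrative assumption, not JMQ4's actual policy:

```java
// Sketch of a two-dimensional eviction score: pages are ranked both by
// distance from the log tail and by time since last access. Higher
// score = evict first. The combination rule here is hypothetical.
public class EvictionPolicy {
    // Cold pages far from the tail score highest and are evicted first;
    // hot pages near the tail are protected, so a one-off scan through
    // the middle of the log cannot pollute the cache.
    static double evictionScore(long pagePosition, long tailPosition,
                                long lastAccessMillis, long nowMillis) {
        double distanceFromTail = tailPosition - pagePosition; // dimension 1
        double idleTime = nowMillis - lastAccessMillis;        // dimension 2
        return distanceFromTail * idleTime;
    }
}
```

A plain LRU would fail here: a consumer replaying old messages would touch many mid-log pages and evict the tail pages every live consumer needs; weighting by tail distance keeps those hot pages resident.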

Predictive cache pre‑loading leverages the sequential read/write nature of message streams: when a hot cache page near the tail is accessed, the next page is asynchronously loaded, further improving cache hit ratios.
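A sketch of this read-ahead behavior, assuming fixed-size cache pages keyed by start offset (the page size, the "near tail" threshold, and the loader stub are all illustrative):

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of predictive pre-loading: when a page near the log tail is
// read, the next page is loaded asynchronously, since sequential
// consumers will ask for it soon. Constants here are hypothetical.
public class ReadAheadCache {
    static final long PAGE_SIZE = 1 << 20; // hypothetical 1 MiB pages
    final Map<Long, byte[]> pages = new ConcurrentHashMap<>();

    byte[] read(long pageOffset, long tailOffset) {
        byte[] page = pages.computeIfAbsent(pageOffset, this::loadFromDisk);
        // "Hot" heuristic: only pages close to the tail trigger
        // read-ahead, so mid-log scans do not flood the cache.
        if (tailOffset - pageOffset < 4 * PAGE_SIZE) {
            long next = pageOffset + PAGE_SIZE;
            CompletableFuture.runAsync(() ->
                    pages.computeIfAbsent(next, this::loadFromDisk));
        }
        return page;
    }

    // Stub standing in for an actual disk read of one page.
    byte[] loadFromDisk(long offset) { return new byte[(int) PAGE_SIZE]; }
}
```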

Tags: Java, performance, cache, Indexing, Message Queue, JMQ, storage design