
Design and Implementation of QMQ: Qunar.com’s Internal Message Queue

This article examines Qunar.com’s internally developed message queue (QMQ), discussing the motivations behind building it, the challenges of consistency and storage, client‑side transaction integration, the novel log‑based storage model, and its support for arbitrary delayed messages.

Qunar Tech Salon

Qunar.com recently open-sourced QMQ, its internally developed message queue, on GitHub. The project's background dates to 2012, when rapid business growth forced the company to move from a monolithic architecture to a service-oriented one, which required reliable communication middleware. While synchronous RPC (Dubbo) was chosen for direct calls, an asynchronous solution was needed to decouple services.

At that time, existing open‑source MQs such as RabbitMQ, Kafka, and ActiveMQ each had drawbacks for Qunar’s needs—RabbitMQ required Erlang, Kafka and MetaQ were immature, and ActiveMQ suffered from frequent crashes, message loss, and complex code. Consequently, Qunar decided to build its own queue.

Problems

The primary concern was ensuring data consistency for e-commerce transactions. The queue had to guarantee that a business operation and its message publication succeed or fail together, and that consumers could achieve eventual consistency through ACKs and retries.

On the server side, the initial design stored messages in a database, but rapid growth in message volume and the need for flexible delayed delivery forced a redesign of the storage model.

Client‑Side Consistency

Qunar leveraged MySQL’s transactional capabilities to bind business operations and message publishing within a single DB transaction. By creating a dedicated message database on every MySQL instance, both the business write and the message insert could be committed atomically.

Example SQL transaction spanning two databases (A and B) on the same MySQL instance:

begin;
insert into A.tbl1(name, age) values('admin', 18);
insert into B.tbl2(num) values(20);
commit;

In a payment scenario, the code ensures that inserting a payment record and sending a payment‑completed message occur in the same transaction. The producer does not send the message over the network immediately; instead, it inserts a record into the message table. After the transaction commits, a callback sends the message, and on successful delivery the message is deleted. If sending fails, a compensation task retries until success.
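The pattern described above can be sketched with SQLite (standing in for MySQL). The table names, columns, and subject string here are illustrative, not QMQ's real schema; the point is that the payment row and the message row share one local transaction, and the network send happens only after commit.

```python
import sqlite3

# In-memory database standing in for the same-instance MySQL setup.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payment (order_id TEXT, amount INTEGER)")
conn.execute("CREATE TABLE message (id INTEGER PRIMARY KEY, subject TEXT, body TEXT)")

def pay(order_id, amount):
    # One local transaction: the payment row and the message row commit
    # together, or neither does.
    with conn:
        conn.execute("INSERT INTO payment VALUES (?, ?)", (order_id, amount))
        conn.execute("INSERT INTO message (subject, body) VALUES (?, ?)",
                     ("order.payok", order_id))

sent = []

def flush_messages():
    # Runs after the transaction commits: send each pending message over the
    # network, then delete the record only on confirmed success.
    for mid, subject, body in conn.execute(
            "SELECT id, subject, body FROM message").fetchall():
        sent.append((subject, body))          # stand-in for the network send
        conn.execute("DELETE FROM message WHERE id = ?", (mid,))
    conn.commit()

pay("order-1", 100)
flush_messages()
```

If the process crashes between the commit and the send, the message row survives in the table, which is exactly what the compensation task below relies on.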

Compensation workflow:

1. begin tx – start the local transaction
2. do work – execute the business logic
3. insert message – write to the message table on the same MySQL instance
4. end tx – commit the transaction
5. send message – transmit the message to the MQ server over the network
6. response – the server acknowledges receipt
7. delete message – remove the message record on success

If sending fails, the compensation task takes over:

8. scan messages – scan the message table for unsent messages
9. send message – retry sending
10. delete messages – remove messages that were resent successfully
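The scan-and-retry loop can be sketched as follows. All names here are hypothetical; the simulated send fails once to show that a message stays in the table until a delivery is confirmed.

```python
import itertools

# Rows left behind in the message table by a failed first send.
unsent = [{"id": 1, "body": "pay-ok"}]
delivered = []
attempts = itertools.count()

def try_send(msg):
    # Stand-in for the network call: fail on the first attempt, succeed after.
    return next(attempts) > 0

def compensate():
    # "scan messages" -> "send message" -> "delete messages"
    for msg in list(unsent):
        if try_send(msg):
            delivered.append(msg)
            unsent.remove(msg)     # delete only after a confirmed send

compensate()    # first scan: the send fails again, the row stays
compensate()    # next scan: the retry succeeds and the row is deleted
```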

Server Storage Model

QMQ’s storage model was designed after analyzing the limitations of partition‑based systems like Kafka and RocketMQ. Partition‑based queues suffer from load‑balancing issues when the number of consumers does not match the number of partitions, making scaling cumbersome and leading to uneven consumption.

Qunar’s requirements—tens of thousands of topics, many consumers per topic, and the need for dynamic scaling—made a partition‑based approach unsuitable.

Issues with Partition‑Based Model

When consumers exceed partitions, some consumers stay idle; when partitions exceed consumers, some consumers handle more load, causing imbalance. Adding partitions to increase throughput is heavyweight, and partitions cannot be removed easily, complicating scaling down.

Furthermore, message backlogs in a few partitions cannot be consumed faster by merely adding more consumers, because the backlog is tied to specific partitions.
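The imbalance is easy to see with a toy round-robin assignment, the scheme partition-based queues commonly use. This sketch is illustrative, not Kafka's actual assignor: parallelism is capped by the partition count, so extra consumers sit idle.

```python
# Assign partitions to consumers round-robin.
def assign(partitions, consumers):
    mapping = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        mapping[consumers[i % len(consumers)]].append(p)
    return mapping

# 3 partitions, 5 consumers: two consumers are permanently idle.
mapping = assign(["p0", "p1", "p2"], ["c0", "c1", "c2", "c3", "c4"])
idle = [c for c, ps in mapping.items() if not ps]   # -> ["c3", "c4"]
```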

QMQ Storage Model

QMQ adopts a three‑log architecture:

Message log: sequentially appends all messages from every subject; serves as the primary storage.

Consume log: stores, per subject, indexes (offsets) into the message log, marking which entries belong to that subject.

Pull log: generated per consumer pull operation, recording the consume-log sequences that consumer has fetched; a consumer's progress is simply an offset into its own pull log.

This decouples consumers from partitions; consumers can scale independently by using the pull log’s sequence as their offset.
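A minimal in-memory model of the three logs makes the decoupling concrete. Real QMQ persists these as append-only files on disk; the structures and function names below are illustrative only.

```python
message_log = []     # all subjects, appended sequentially (primary storage)
consume_log = {}     # subject -> offsets into the message log
pull_logs = {}       # (consumer, subject) -> consume-log sequences pulled

def publish(subject, body):
    consume_log.setdefault(subject, []).append(len(message_log))
    message_log.append((subject, body))

def pull(consumer, subject, max_n):
    # A consumer's progress is just the length of its own pull log, so
    # adding or removing consumers never requires repartitioning.
    log = pull_logs.setdefault((consumer, subject), [])
    avail = consume_log.get(subject, [])
    seqs = range(len(log), min(len(log) + max_n, len(avail)))
    log.extend(seqs)
    return [message_log[avail[s]][1] for s in seqs]

publish("order.payok", "o-1")
publish("hotel.booked", "h-1")
publish("order.payok", "o-2")
orders = pull("c1", "order.payok", 10)    # -> ["o-1", "o-2"]
```

A new consumer simply starts its own pull log and can begin fetching immediately, without rebalancing anything.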

Delayed Message Model

QMQ also supports arbitrary‑time delayed messages using a two‑layer hash‑wheel timer. The first layer persists hourly buckets on disk (up to two years, ~17,500 files). The second layer loads the imminent hour’s bucket into an in‑memory hash wheel for timely dispatch.
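The two layers can be sketched like this. The first layer's hourly buckets are files on disk in QMQ (a dict here), and only the imminent hour is expanded into fine-grained in-memory wheel slots; names and slot granularity are assumptions for illustration.

```python
HOUR = 3600

hour_buckets = {}    # first layer: delivery hour -> [(deliver_at, message)]

def schedule(message, deliver_at):
    hour_buckets.setdefault(deliver_at // HOUR, []).append((deliver_at, message))

def load_hour(hour):
    # Second layer: per-second slots covering just this one hour.
    wheel = {}
    for deliver_at, message in hour_buckets.pop(hour, []):
        wheel.setdefault(deliver_at % HOUR, []).append(message)
    return wheel

schedule("m1", 2 * HOUR + 3)      # due in hour 2, second 3
schedule("m2", 2 * HOUR + 3)      # same slot
schedule("m3", 9 * HOUR)          # stays on the first layer for now
wheel = load_hour(2)              # -> {3: ["m1", "m2"]}
```

Because only one hour's bucket is ever resident in memory, arbitrary delays (up to the two-year horizon) cost little beyond disk space.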

Three logs are used for delayed messages:

Message log: same as the real-time path; appends incoming messages.

Schedule log: organized on disk by delivery hour; generated by replaying the message log according to each message's delay time.

Dispatch log: records messages that have already been delivered, enabling recovery after restarts.
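The interplay of the three delayed-message logs can be sketched as a replay, with illustrative structures standing in for the on-disk files: the schedule log is built by replaying the message log by delivery hour, and the dispatch log lets a restarted broker skip what it already delivered.

```python
HOUR = 3600

message_log = [("m1", 2 * HOUR + 3), ("m2", 2 * HOUR + 5)]   # (id, deliver_at)
schedule_log = {}        # delivery hour -> message ids
dispatch_log = set()     # ids of messages already delivered

# Replay the message log into hourly schedule-log buckets.
for msg_id, deliver_at in message_log:
    schedule_log.setdefault(deliver_at // HOUR, []).append(msg_id)

def dispatch(hour):
    out = []
    for msg_id in schedule_log.get(hour, []):
        if msg_id not in dispatch_log:    # recovery: skip already-delivered
            dispatch_log.add(msg_id)
            out.append(msg_id)
    return out

first = dispatch(2)           # -> ["m1", "m2"]
after_restart = dispatch(2)   # -> [] (dispatch log prevents re-delivery)
```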

Summary

Message queues are a critical infrastructure for micro‑service architectures. This article presents Qunar’s practical experience, compares existing open‑source solutions, and details the design and implementation of QMQ, including its log‑based storage, client‑side transactional publishing, and flexible delayed‑message support. The QMQ source code is available on GitHub for the community to try and contribute.

Tags: backend, microservices, message queue, consistency, delayed messages, storage model
Written by

Qunar Tech Salon

Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.
