
Technical Overview of Tencent Cloud CMQ: Architecture, Reliability, Consistency, and Scalability

Tencent Cloud CMQ is a Raft‑based distributed message queue built for finance‑grade workloads. It delivers high reliability, strong consistency, and horizontal scalability through multi‑node broker sets with majority‑acknowledged writes, automatic leader election, effectively unlimited buffering, and full‑path message tracing, while requiring application‑level idempotency and guaranteeing strict ordering only under limited conditions.

Tencent Cloud Developer

In the era of widespread distributed systems, message middleware is extensively used within systems and between platforms for data exchange and decoupling. CMQ is Tencent Cloud's internally developed, high‑reliability, strong‑consistency, and scalable distributed message queue. It is used internally by WeChat, QQ, mobile recharge, advertising orders, and other services, and has now been opened to the public. This article shares the core technical principles of Tencent Cloud CMQ.

CMQ is primarily suitable for business scenarios such as finance, trading, and order processing that demand high reliability and availability.

Taking the Tencent recharge system as an example, the system uses CMQ to asynchronously decouple the transaction module, delivery module, and settlement system, smoothing traffic spikes and reducing inter‑module coupling. During a typical day at the beginning of a month, the system forwards more than one billion messages through CMQ, with peak rates exceeding 100,000 messages per second. At its highest, hundreds of millions of messages are buffered by CMQ, relieving pressure on downstream consumer modules.

Figure 1 – Architecture of a typical recharge system.

The overall CMQ architecture is shown in Figure 2, with a focus on the backend broker‑set implementation. A set usually consists of three nodes: replicating each message across the set ensures reliability, while having multiple nodes improves availability. The number of nodes in a set can be increased according to business needs to further strengthen both.

Figure 2 – Overall CMQ architecture.

The internal structure of a CMQ set is illustrated in Figure 3.

Figure 3 – Internal structure of a broker set.

The following sections discuss high reliability, strong consistency, system availability, scalability, and full‑path message tracing.

Production Reliability

As shown in Figure 3, a client’s message is considered successfully produced when more than half of the brokers in the set have flushed the data to disk and returned an acknowledgment. If the client does not receive an acknowledgment within a certain time, it must retry to ensure successful delivery.
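The client‑side retry loop under these rules can be sketched as follows. Here `send` is a hypothetical transport callable, not CMQ's actual SDK: it is assumed to return True once a majority of brokers have flushed and acked, and to return False or raise TimeoutError when no ack arrives.

```python
def produce_with_retry(send, message, timeout_s=2.0, max_retries=3):
    """Retry until the broker set acknowledges the message.

    `send(message, timeout_s)` is a stand-in for the real transport: it
    returns True when a majority of brokers have flushed the message and
    acked, and returns False or raises TimeoutError when no ack arrives.
    """
    for attempt in range(1, max_retries + 1):
        try:
            if send(message, timeout_s):
                return attempt  # acked on this attempt
        except TimeoutError:
            pass  # ack lost in the network: retry, accepting duplicates
    raise RuntimeError("message not acknowledged after %d attempts" % max_retries)
```

Note that this retry loop is exactly what can produce duplicate messages, which is why the next point matters.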

Because acknowledgments may be lost in the network, duplicate messages can occur. CMQ does not perform automatic deduplication; idempotency must be handled by the business logic.
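Because redelivery is possible, the consuming side typically deduplicates on a business‑chosen message ID. A minimal sketch of that pattern (in practice the `seen` store would be durable, e.g. a database table keyed by ID; a set stands in here):

```python
def make_idempotent(handler, seen=None):
    """Wrap a message handler so duplicate deliveries are ignored.

    CMQ may deliver the same message more than once when a produce ack is
    lost; the business layer deduplicates on an ID it controls.
    """
    seen = set() if seen is None else seen

    def wrapped(msg_id, body):
        if msg_id in seen:
            return False          # duplicate delivery: skip side effects
        handler(body)
        seen.add(msg_id)          # record only after successful processing
        return True
    return wrapped
```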

Storage Reliability

In a CMQ set, one node acts as the leader and the others as followers. The leader writes incoming messages to a Raft log, flushes it to disk, and replicates the log to followers. Once a majority of nodes have successfully persisted the log, the leader commits the request to the message‑queue state machine, which updates the appropriate queue. This ensures that a message acknowledged to the client is stored on at least two disks, dramatically reducing the risk of data loss due to disk failure. Additionally, a checksum is stored with each message so that consumers can verify integrity before processing.
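Two of the mechanics above can be sketched directly: the majority‑commit rule, and the per‑message checksum a consumer verifies before processing. This mirrors the described behavior; it is not CMQ source code, and CRC32 is an assumption about the checksum choice.

```python
import zlib

def majority_persisted(ack_nodes, cluster_size):
    """Raft commit rule: an entry commits once a strict majority of
    nodes (e.g. 2 of 3) has flushed it to disk."""
    return len(set(ack_nodes)) * 2 > cluster_size

def frame(body: bytes) -> bytes:
    """Prepend a CRC32 checksum so later corruption is detectable."""
    return zlib.crc32(body).to_bytes(4, "big") + body

def unframe(framed: bytes) -> bytes:
    """Verify integrity before handing the body to the consumer."""
    stored, body = int.from_bytes(framed[:4], "big"), framed[4:]
    if zlib.crc32(body) != stored:
        raise ValueError("checksum mismatch: message corrupted on disk")
    return body
```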

Consumption Reliability

When a consumer pulls a message, it specifies a visibility timeout. During this period the consumer must explicitly acknowledge and delete the message. If the timeout expires without acknowledgment, the message becomes visible again for other consumers, preventing loss caused by processing failures.
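The visibility‑timeout contract can be illustrated with a toy in‑memory queue. The class and method names below are assumptions for illustration, not CMQ's SDK.

```python
import time

class InMemoryQueue:
    """Toy queue illustrating the visibility-timeout contract."""

    def __init__(self):
        self._msgs = {}   # message id -> (body, visible_at)
        self._next = 0

    def send(self, body):
        self._msgs[self._next] = (body, 0.0)  # visible immediately
        self._next += 1

    def receive(self, visibility_timeout_s, now=None):
        now = time.monotonic() if now is None else now
        for mid, (body, visible_at) in self._msgs.items():
            if visible_at <= now:
                # hide the message until the timeout expires; if the
                # consumer never acks, it becomes visible again
                self._msgs[mid] = (body, now + visibility_timeout_s)
                return mid, body
        return None

    def delete(self, mid):
        # explicit acknowledgment: only now is the message gone for good
        self._msgs.pop(mid, None)
```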

The acknowledgment process mirrors the production path: the broker writes the message ID and status to its log.

Strong Consistency Implementation

CMQ uses the Raft consensus algorithm. In a three‑node set (A as leader, B and C as followers), a message acknowledged to the client is stored on at least two nodes (e.g., A and B). If the leader A fails, the remaining nodes elect a new leader (B) that already contains the complete log, ensuring that the new leader serves the same consistent view of data.

In network partition scenarios (Figure 5), the original leader cannot obtain a majority and steps down, while the remaining nodes elect a new leader. Raft guarantees that the new leader possesses the most up‑to‑date log, preserving strong consistency.
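The guarantee that a new leader already holds every committed entry follows from Raft's voting rule: a node refuses to vote for a candidate whose log is less up‑to‑date than its own. Sketched:

```python
def grant_vote(cand_last_term, cand_last_index, my_last_term, my_last_index):
    """Raft vote check: the candidate's log must be at least as
    up-to-date as the voter's (compare last log term, then last index).

    Because every committed entry is stored on a majority, any candidate
    able to win a majority of votes necessarily has all committed entries.
    """
    if cand_last_term != my_last_term:
        return cand_last_term > my_last_term
    return cand_last_index >= my_last_index
```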

Availability Guarantee

When the leader fails, the remaining followers automatically elect a new one, and client requests are transparently redirected to the new leader; failover currently completes in roughly 5 seconds (RTO). CMQ prioritizes consistency and partition tolerance (CP in the CAP theorem): as long as a majority of nodes are operational, message production and consumption continue. If multiple nodes fail, CMQ's monitoring and scheduling quickly migrate queues to healthy sets, minimizing downtime.
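The transparent redirection can be sketched as a client that follows "not leader" hints until it reaches the current leader. The call and response shapes here are assumptions for illustration.

```python
def request_with_failover(nodes, call, max_hops=3):
    """Send a request, following leader redirects after an election.

    `call(node)` is a hypothetical RPC: it returns either a result, or a
    ("REDIRECT", leader_node) hint when `node` is no longer the leader.
    """
    node = nodes[0]
    for _ in range(max_hops):
        resp = call(node)
        if isinstance(resp, tuple) and resp[0] == "REDIRECT":
            node = resp[1]  # a new leader was elected: follow the hint
            continue
        return resp
    raise RuntimeError("leader not found within %d hops" % max_hops)
```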

Horizontal Scaling and Unlimited Accumulation

The concept of a set is transparent to users. The CMQ controller server monitors load and dynamically migrates queues across sets. If a queue’s request volume exceeds a set’s threshold, the controller routes the queue to additional sets, increasing concurrency. For services requiring massive accumulation, routing can theoretically achieve unlimited buffering capacity.
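The controller's routing decision can be approximated as "spread a queue over enough sets to keep each set under its threshold." This is a simplification; the real scheduler would also weigh live load metrics, and all names below are assumptions.

```python
import math

def route_queue(queue_name, sets, requests_per_s, per_set_threshold):
    """Choose which sets a queue should span.

    More sets means more brokers serving the queue concurrently; the
    spread is capped at the number of available sets.
    """
    needed = max(1, math.ceil(requests_per_s / per_set_threshold))
    start = hash(queue_name) % len(sets)  # stable starting point in-process
    return [sets[(start + i) % len(sets)] for i in range(min(needed, len(sets)))]
```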

CMQ can guarantee strict ordering only under specific conditions, such as a single producer, a single consumer, or when the queue’s consumption window is set to 1.

Full‑Path Message Trace

Each message’s complete path includes the producer, broker, and consumer. During processing, each component appends trace information. Aggregating these traces provides the full lifecycle of any message, greatly simplifying production‑environment troubleshooting.
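Aggregation can be as simple as collecting per‑component records keyed by message ID and sorting them by timestamp. The record field names here are assumptions for illustration.

```python
def append_trace(traces, msg_id, component, event, ts):
    """Each hop (producer, broker, consumer) appends one trace record."""
    traces.setdefault(msg_id, []).append(
        {"component": component, "event": event, "ts": ts}
    )

def lifecycle(traces, msg_id):
    """Reassemble a message's full path in time order."""
    return sorted(traces.get(msg_id, []), key=lambda r: r["ts"])
```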

Conclusion

CMQ is a Raft‑based distributed message queue that ensures high reliability and strong consistency, serving order‑ and transaction‑heavy business scenarios. Idempotency must be handled by the application layer, and strict ordering is guaranteed only under certain conditions.

For workloads that prioritize ultra‑high performance and throughput, Tencent Cloud offers another messaging engine compatible with the Kafka protocol, catering to big‑data scenarios. Further details will be covered in upcoming articles.

Written by

Tencent Cloud Developer

Official Tencent Cloud community account that brings together developers, shares practical tech insights, and fosters an influential tech exchange community.
