Multi-AZ High‑Availability Architecture of Tencent Cloud TDMQ for Apache Pulsar
Tencent Cloud TDMQ for Apache Pulsar achieves multi‑AZ high availability by containerizing ZooKeeper, BookKeeper and Brokers, using managed ZK, persistent cloud disks and elastic NICs, and enforcing quorum and rack‑aware replica placement. Cross‑region bidirectional replication is planned so that disaster recovery and continuous messaging extend beyond a single region.
This article analyzes the disaster‑recovery strategy of Pulsar in multi‑availability‑zone (AZ) high‑availability scenarios from four dimensions: overall architecture, cloud‑native considerations, high‑availability design, and cross‑region synchronization.
Overall Architecture : Tencent Cloud TDMQ for Apache Pulsar (TDMQ Pulsar) is a financial‑grade commercial messaging middleware built on Apache Pulsar. It consists of three core components: ZooKeeper (ZK), BookKeeper (BK), and Broker. In the cloud version, ZK is provided as a managed service, while BK and Broker are self‑managed. An additional Lookup Service, deployed per region, handles routing and address resolution for all brokers in that region, similar to RocketMQ’s NameServer.
The entire system is fully containerized: every module runs in containers, which keeps deployment flexible and efficient.
Cloud‑Native and Stateful Services : Since BK is a stateful service, the design must ensure that storage and network bindings remain consistent across container restarts. Key points include:
Node composition: compute + storage + network.
Stateful service requirements: persistent storage and stable network identity.
Pluggable resources: cloud disks and elastic network cards.
To achieve this, cloud disks are used for storage and elastic NICs for network, enabling pods or machines to move across the resource pool while preserving their IP and disk mappings.
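The idea of a node whose compute can move while its storage and network identity stay fixed can be sketched as follows. This is only an illustration: the `BookieNode` type, the `reschedule` helper, and all identifiers and addresses are hypothetical and do not correspond to any Tencent Cloud API.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class BookieNode:
    name: str
    host: str      # compute: may change when the pod is rescheduled
    disk_id: str   # storage: detachable cloud disk (hypothetical ID)
    eni_ip: str    # network: IP carried by the elastic NIC (hypothetical)


def reschedule(node: BookieNode, new_host: str) -> BookieNode:
    """Move the node to another host; the disk and NIC follow it unchanged."""
    return replace(node, host=new_host)


bk0 = BookieNode("bk-0", "host-a", "disk-001", "10.0.1.10")
moved = reschedule(bk0, "host-b")
```

Only `host` differs between `bk0` and `moved`; the disk and IP mapping that BK depends on is preserved.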
The design follows the cloud‑native principle of immutability: once a pod starts, its processes, programs, and configurations remain unchanged, which improves stability and predictability.
High Availability : The high‑availability topology focuses on ZK, BK, and Broker:
ZK : Deploy across at least three AZs with a 1‑1‑1 or 2‑2‑1 node layout, so that a majority (quorum) of ZK nodes survives the loss of any single AZ.
BK : As a stateful service, BK must keep the relationship between disk and IP stable; multiple replicas and rack awareness are essential.
Broker : Being stateless, the service remains available as long as at least one broker can handle the traffic.
For multi‑AZ deployment, a minimum of three nodes across three AZs is required so that ZooKeeper maintains a majority after any single AZ failure.
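The majority rule behind these layouts can be checked with a few lines of arithmetic. The sketch below assumes a plain majority quorum (more than half of all ZK nodes); the function name and the layouts passed to it are illustrative only.

```python
def survives_any_single_az_failure(layout):
    """layout: number of ZK nodes in each AZ, e.g. (1, 1, 1) or (2, 2, 1)."""
    total = sum(layout)
    majority = total // 2 + 1
    # After losing any one AZ, the remaining nodes must still form a majority.
    return all(total - nodes_in_az >= majority for nodes_in_az in layout)


for layout in [(1, 1, 1), (2, 2, 1), (2, 1), (3, 1, 1)]:
    print(layout, survives_any_single_az_failure(layout))
```

Under this rule, two-AZ layouts and layouts that concentrate a majority in one AZ fail the check, which is why three AZs with evenly spread nodes are required.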
BK High‑Availability Details :
Multi‑AZ deployment (e.g., three AZs).
Multiple replicas using the quorum mechanisms built into BookKeeper, Pulsar’s storage layer.
Rack awareness to distribute replicas across zones.
Auto‑recovery: BookKeeper’s AutoRecovery module re‑replicates under‑replicated data to restore missing replicas when a BK node fails.
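The effect of rack (zone) awareness can be illustrated with a simplified selection routine. This is a sketch of the idea only, not BookKeeper's actual placement policy; the `pick_replicas` helper, the zone names, and the bookie names are hypothetical.

```python
from itertools import zip_longest


def pick_replicas(bookies_by_zone, count):
    """Pick `count` bookies, spreading them across zones before reusing a zone."""
    zones = [bookies_by_zone[z] for z in sorted(bookies_by_zone)]
    # Interleave zones: first bookie of each zone, then second of each, and so on.
    interleaved = [b for group in zip_longest(*zones) for b in group if b is not None]
    if count > len(interleaved):
        raise ValueError("not enough bookies for the requested replica count")
    return interleaved[:count]


bookies = {
    "az-1": ["bk-1a", "bk-1b"],
    "az-2": ["bk-2a", "bk-2b"],
    "az-3": ["bk-3a"],
}
replicas = pick_replicas(bookies, 3)
```

With three zones and three replicas, each replica lands in a different zone, so a single AZ outage removes at most one copy.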
Quorum parameters:
Ensemble Quorum (ensemble size) : Number of BK nodes selected to store the entries of a ledger.
Write Quorum : Number of nodes in the ensemble to which each entry is written, that is, the number of copies of each entry.
Ack Quorum : Number of acknowledgments from those nodes required before a write is confirmed as successful.
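The three parameters are constrained by ensemble ≥ write quorum ≥ ack quorum. A minimal check of that relationship, with a hypothetical helper name and illustrative values, might look like this:

```python
def check_quorum(ensemble, write_quorum, ack_quorum):
    """Validate the ordering ensemble >= write quorum >= ack quorum >= 1."""
    if not (ensemble >= write_quorum >= ack_quorum >= 1):
        raise ValueError(
            f"invalid quorum settings: E={ensemble}, Qw={write_quorum}, Qa={ack_quorum}"
        )
    return {
        # Each entry is stored on this many bookies.
        "copies_per_entry": write_quorum,
        # This many of an entry's bookies may be slow or down
        # without blocking the acknowledgment of the write.
        "unresponsive_bookies_tolerated_per_write": write_quorum - ack_quorum,
    }


settings = check_quorum(ensemble=3, write_quorum=3, ack_quorum=2)
```

The gap between write quorum and ack quorum is what lets a write complete while one of its target nodes is slow or unavailable.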
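BK writes use striping: with an ensemble of 3 and a write quorum of 2, each entry is written in full to 2 of the 3 BK nodes, and successive entries rotate through the ensemble. This balances load across nodes while keeping two copies of every message for durability.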
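Striping can be simulated in a few lines. The sketch assumes round-robin placement, in which each entry is stored whole on `write_quorum` consecutive nodes of the ensemble, starting at the position given by the entry ID; the bookie names are hypothetical.

```python
from collections import Counter


def write_set(entry_id, ensemble, write_quorum):
    """Return the bookies that store a given entry under round-robin striping."""
    start = entry_id % len(ensemble)
    return [ensemble[(start + i) % len(ensemble)] for i in range(write_quorum)]


ensemble = ["bk-1", "bk-2", "bk-3"]
placements = {e: write_set(e, ensemble, write_quorum=2) for e in range(6)}

# Count how many entries each bookie ends up holding.
load = Counter(b for bookies in placements.values() for b in bookies)
```

Over six entries, every node in the ensemble stores the same number of entries, while each entry still has two complete copies.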
Cross‑Region Synchronization (Extension) : Future plans include synchronizing instances across regions (e.g., Guangzhou and Shanghai). Metadata and messages are bidirectionally replicated, allowing failover via DNS redirection. While latency increases (tens of ms intra‑China, hundreds of ms inter‑continental), the design ensures that even if one region fails, the other retains up‑to‑date data. Special considerations include handling unsynchronized topic creation during disaster recovery and supporting global e‑commerce scenarios with distributed reporting.
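One way to reason about unsynchronized topic creation is to compare the topic sets of the two regions before a failover. The sketch below is purely illustrative: the `unsynced_topics` helper and the topic names are hypothetical, and how the topic lists are obtained is left out.

```python
def unsynced_topics(primary_topics, standby_topics):
    """Topics that exist in the primary region but not yet in the standby region."""
    return sorted(set(primary_topics) - set(standby_topics))


guangzhou = [
    "persistent://shop/orders/created",
    "persistent://shop/orders/paid",
    "persistent://shop/reports/daily",
]
shanghai = [
    "persistent://shop/orders/created",
    "persistent://shop/orders/paid",
]
missing = unsynced_topics(guangzhou, shanghai)
```

Topics reported as missing would need to be created, or their replication completed, in the standby region before traffic is redirected to it.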
Overall, the article provides a comprehensive guide to building a resilient, cloud‑native Pulsar deployment on Tencent Cloud, covering architecture, stateful service handling, high‑availability configurations, quorum settings, rack awareness, and cross‑region replication strategies.
Tencent Cloud Developer
Official Tencent Cloud community account that brings together developers, shares practical tech insights, and fosters an influential tech exchange community.