Apache Pulsar: Architecture, Topics, Producers, Consumers, and Storage
Apache Pulsar is a cloud-native distributed messaging system built from stateless Brokers, a BookKeeper storage cluster, and ZooKeeper for metadata. It routes messages from producers to consumers over partitioned or non-partitioned topics and supports multiple subscription types, producer access modes, routing strategies, batching, acknowledgments, delayed delivery, and deduplication.
This article introduces Apache Pulsar, a top-level Apache Software Foundation project that combines messaging, storage, and lightweight compute (Pulsar Functions) in a single cloud-native platform.
Architecture : A Pulsar deployment consists of Producers, Consumers, multiple Brokers, a BookKeeper cluster, and a ZooKeeper ensemble. Producers publish messages to Topics and Consumers subscribe to them; Brokers are stateless, accepting messages from Producers, dispatching them to Consumers, and sharing load among themselves; BookKeeper persists the messages; and ZooKeeper stores metadata and coordinates the cluster.
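Broker Expansion : Because Brokers hold no persistent state, adding Brokers increases the capacity available to Producers and Consumers. Automatic load balancing works at the level of bundles (groups of Topics, including individual partitions): when a Broker crosses a resource threshold, some of its bundles are unloaded and reassigned to less-loaded Brokers.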
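The threshold rule can be pictured with a small model. This is a minimal sketch, not Pulsar's load manager: the function name `plan_unloads`, the load figures, and the "move the heaviest bundle to the least-loaded Broker" policy are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple

def plan_unloads(
    loads: Dict[str, Dict[str, float]], threshold: float
) -> List[Tuple[str, str, str]]:
    """Hypothetical shedding plan: returns (bundle, from_broker, to_broker) moves.

    `loads` maps each Broker to the load (e.g. % of capacity) of each bundle it owns.
    """
    # Work on a copy so the caller's snapshot of the cluster is left untouched.
    state = {broker: dict(bundles) for broker, bundles in loads.items()}

    def total(broker: str) -> float:
        return sum(state[broker].values())

    moves: List[Tuple[str, str, str]] = []
    for src in sorted(state):
        # Shed the heaviest bundles first until the Broker is under the threshold.
        # A Broker left with a single oversized bundle cannot shed it in this model.
        while total(src) > threshold and len(state[src]) > 1:
            others = [b for b in state if b != src]
            if not others:
                break
            bundle = max(state[src], key=lambda b: state[src][b])
            dst = min(others, key=total)
            if total(dst) + state[src][bundle] > threshold:
                break  # moving it would only overload another Broker
            state[dst][bundle] = state[src].pop(bundle)
            moves.append((bundle, src, dst))
    return moves


cluster = {
    "broker-1": {"bundle-a": 50.0, "bundle-b": 40.0},
    "broker-2": {"bundle-c": 10.0},
    "broker-3": {},
}
print(plan_unloads(cluster, threshold=80.0))  # [('bundle-a', 'broker-1', 'broker-3')]
```

Whole bundles move, never individual messages, which is why the stateless Broker design makes rebalancing cheap.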
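BookKeeper Expansion : The storage layer scales by adding Bookie nodes. Because each Topic's data is stored as a sequence of segments (ledgers) rather than one contiguous log, new ledgers can be placed on the new Bookies right away, and existing data does not have to be rebalanced.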
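Why expansion needs no data movement can be shown with a toy placement model. It is a sketch under assumptions: `LedgerPlacement` and its "least-used Bookies first" rule are hypothetical, whereas real BookKeeper chooses ensembles through a configurable placement policy (for example, rack-aware placement).

```python
from typing import Dict, List

class LedgerPlacement:
    """Hypothetical model: each ledger keeps the ensemble chosen when it was created."""

    def __init__(self, bookies: List[str], ensemble_size: int) -> None:
        self.bookies = list(bookies)
        self.ensemble_size = ensemble_size
        self.ledgers: Dict[int, List[str]] = {}

    def add_bookie(self, bookie: str) -> None:
        # Expansion only registers the node; no existing ledger is copied or moved.
        self.bookies.append(bookie)

    def create_ledger(self) -> int:
        usage = {b: 0 for b in self.bookies}
        for ensemble in self.ledgers.values():
            for b in ensemble:
                usage[b] += 1
        # Prefer the least-used Bookies (ties broken by name), so new nodes fill first.
        ensemble = sorted(self.bookies, key=lambda b: (usage[b], b))[: self.ensemble_size]
        ledger_id = len(self.ledgers)
        self.ledgers[ledger_id] = ensemble
        return ledger_id


placement = LedgerPlacement(["bk1", "bk2", "bk3"], ensemble_size=3)
first = placement.create_ledger()
placement.add_bookie("bk4")
second = placement.create_ledger()
print(placement.ledgers[first])   # ['bk1', 'bk2', 'bk3'] (unchanged)
print(placement.ledgers[second])  # ['bk4', 'bk1', 'bk2']
```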
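Topic Types : Pulsar supports non-partitioned and partitioned Topics. A partitioned Topic is a set of internal Topics (one per partition) that can be served by different Brokers, which spreads load and raises throughput. Topics are either persistent (messages stored in BookKeeper) or non-persistent (messages held only in Broker memory and never written to disk).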
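Pulsar Topic names carry both the persistence type and, for partitioned Topics, the partition index. The helpers below are hypothetical (not part of any Pulsar client); they only reproduce the naming convention `{persistent|non-persistent}://tenant/namespace/topic`, with each partition suffixed `-partition-N`.

```python
from typing import List

def topic_name(tenant: str, namespace: str, topic: str, persistent: bool = True) -> str:
    # The URL scheme encodes whether the Topic is persistent or non-persistent.
    scheme = "persistent" if persistent else "non-persistent"
    return f"{scheme}://{tenant}/{namespace}/{topic}"

def partition_names(topic: str, num_partitions: int) -> List[str]:
    # Each partition is an internal Topic named <topic>-partition-<i>.
    return [f"{topic}-partition-{i}" for i in range(num_partitions)]


orders = topic_name("public", "default", "orders")
print(orders)                         # persistent://public/default/orders
print(partition_names(orders, 3)[2])  # persistent://public/default/orders-partition-2
```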
Subscription Models : Four subscription types are available: Exclusive, Failover, Shared, and Key_Shared. Exclusive allows only one consumer to attach to the subscription; Failover lets several consumers attach but delivers to one active consumer, switching to the next if it disconnects; Shared distributes messages round-robin across all consumers, without ordering guarantees; Key_Shared also distributes messages across consumers but ensures that messages with the same key go to the same consumer.
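The four types differ only in which consumer receives each message, which a small dispatch model makes concrete. This is a minimal sketch: `dispatch` is hypothetical, the Failover "active consumer" is simply the first in the list, and `zlib.crc32` stands in for the hash Pulsar actually uses to assign keys.

```python
import zlib
from itertools import cycle
from typing import Dict, List, Optional, Tuple

Message = Tuple[Optional[str], str]  # (key, payload)

def dispatch(sub_type: str, consumers: List[str], messages: List[Message]) -> Dict[str, List[str]]:
    """Hypothetical model of which consumer receives each message."""
    if sub_type == "Exclusive" and len(consumers) > 1:
        raise ValueError("an Exclusive subscription accepts only one consumer")
    out: Dict[str, List[str]] = {c: [] for c in consumers}
    if sub_type in ("Exclusive", "Failover"):
        # Failover: everything goes to the active consumer (here, the first one);
        # the next consumer would take over only if the active one disconnected.
        for _, payload in messages:
            out[consumers[0]].append(payload)
    elif sub_type == "Shared":
        # Round-robin across consumers; messages with one key may be split up.
        turn = cycle(consumers)
        for _, payload in messages:
            out[next(turn)].append(payload)
    elif sub_type == "Key_Shared":
        # Same key -> same hash -> same consumer.
        for key, payload in messages:
            slot = zlib.crc32((key or "").encode()) % len(consumers)
            out[consumers[slot]].append(payload)
    else:
        raise ValueError(f"unknown subscription type: {sub_type}")
    return out


msgs = [("k1", "m1"), ("k1", "m2"), ("k2", "m3"), ("k2", "m4")]
print(dispatch("Shared", ["c1", "c2"], msgs))  # {'c1': ['m1', 'm3'], 'c2': ['m2', 'm4']}
print(dispatch("Key_Shared", ["c1", "c2"], msgs))
```

Under Shared, the two `k1` messages land on different consumers; under Key_Shared they always stay together.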
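Subscription Modes : A subscription is either durable, whose cursor (read position) is persisted and survives Broker restarts, or non-durable, whose cursor lives only in memory and is lost when the Broker restarts. A durable subscription retains messages until they are acknowledged.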
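Producer Access Modes : The access mode controls how many Producers may publish to a Topic at once. Shared (the default) allows multiple Producers; Exclusive allows only one, and any other Producer fails immediately; WaitForExclusive also allows only one, but a second Producer waits until it can gain exclusive access instead of failing.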
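The admission rules for the three modes fit in a small state machine. This is a minimal sketch, assuming a single Topic and synchronous calls; `TopicProducers` and its return strings are hypothetical and do not mirror the Broker's real error codes.

```python
from typing import List, Optional

class TopicProducers:
    """Hypothetical model of producer admission on one Topic."""

    def __init__(self) -> None:
        self.active: List[str] = []
        self.exclusive = False        # True while an exclusive Producer holds the Topic
        self.waiting: List[str] = []  # WaitForExclusive Producers, in arrival order

    def connect(self, name: str, mode: str = "Shared") -> str:
        if mode == "Shared":
            if self.exclusive:
                return "rejected"
            self.active.append(name)
            return "connected"
        if mode in ("Exclusive", "WaitForExclusive"):
            if not self.active:
                self.active.append(name)
                self.exclusive = True
                return "connected"
            if mode == "Exclusive":
                return "rejected"     # fails immediately
            self.waiting.append(name)
            return "waiting"          # pending until the Topic is free
        raise ValueError(f"unknown access mode: {mode}")

    def disconnect(self, name: str) -> Optional[str]:
        """Remove a Producer; return the waiting Producer that was promoted, if any."""
        self.active.remove(name)
        if self.active:
            return None
        self.exclusive = False
        if self.waiting:
            promoted = self.waiting.pop(0)
            self.active.append(promoted)
            self.exclusive = True
            return promoted
        return None


topic = TopicProducers()
print(topic.connect("p1", "Exclusive"))         # connected
print(topic.connect("p2", "Exclusive"))         # rejected
print(topic.connect("p3", "WaitForExclusive"))  # waiting
print(topic.disconnect("p1"))                   # p3
```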
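Routing Modes : When publishing to a partitioned Topic, the Producer picks a partition with one of three strategies. RoundRobinPartition (the default) rotates messages that have no key across all partitions; SinglePartition sends all messages that have no key to one randomly chosen partition; CustomPartition delegates the choice to a user-supplied router. In the first two modes, a message with a key is hashed to a fixed partition.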
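A sketch of the three strategies follows. It is an illustration under stated assumptions: the `Router` class is hypothetical, `zlib.crc32` stands in for the client's key hash, and the real round-robin router rotates at batch boundaries rather than per message, which this model ignores.

```python
import random
import zlib
from typing import Callable, Optional

class Router:
    """Hypothetical sketch of partition selection for a partitioned Topic."""

    def __init__(self, num_partitions: int, mode: str = "RoundRobinPartition",
                 custom: Optional[Callable[[Optional[str], int], int]] = None,
                 seed: int = 0) -> None:
        if mode == "CustomPartition" and custom is None:
            raise ValueError("CustomPartition needs a router function")
        self.n = num_partitions
        self.mode = mode
        self.custom = custom
        self._next = 0
        # SinglePartition fixes one random partition for all messages without a key.
        self._single = random.Random(seed).randrange(num_partitions)

    def choose(self, key: Optional[str]) -> int:
        if self.mode == "CustomPartition":
            return self.custom(key, self.n)
        if key is not None:
            # Keyed messages are hashed, so one key always maps to one partition.
            return zlib.crc32(key.encode()) % self.n
        if self.mode == "SinglePartition":
            return self._single
        # RoundRobinPartition, no key: rotate through the partitions.
        partition = self._next
        self._next = (self._next + 1) % self.n
        return partition


rr = Router(3)
print([rr.choose(None) for _ in range(4)])  # [0, 1, 2, 0]
```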
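Batching and Acknowledgment : Producers can batch several messages into a single publish request. Consumers acknowledge messages either individually or cumulatively; a cumulative ack covers the given message and every message before it, and is not supported with Shared subscriptions. On the client, the AcknowledgmentsGroupingTracker groups ack requests to reduce network round trips, and the NegativeAcksTracker schedules redelivery of negatively acknowledged messages after a configurable delay.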
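The difference between cumulative and individual acks, and the benefit of grouping them, can be modelled with a cursor over integer message IDs. This is a minimal sketch: `Cursor` and `GroupingAcker` are hypothetical, the "mark-delete position" here is simply the highest ID below which everything is acknowledged, and the real grouping tracker also flushes periodically rather than only on size.

```python
from typing import List, Set

class Cursor:
    """Hypothetical subscription cursor over message IDs 0, 1, 2, ..."""

    def __init__(self) -> None:
        self.mark_delete = -1               # every ID <= this is acknowledged
        self.individually_acked: Set[int] = set()

    def ack_cumulative(self, msg_id: int) -> None:
        # Acknowledges msg_id and everything before it in one step.
        self.mark_delete = max(self.mark_delete, msg_id)
        self.individually_acked = {i for i in self.individually_acked if i > self.mark_delete}

    def ack_individual(self, msg_id: int) -> None:
        self.individually_acked.add(msg_id)
        # Advance the mark-delete position over any contiguous run of acks.
        while self.mark_delete + 1 in self.individually_acked:
            self.mark_delete += 1
            self.individually_acked.remove(self.mark_delete)

    def unacked(self, up_to: int) -> List[int]:
        return [i for i in range(self.mark_delete + 1, up_to + 1)
                if i not in self.individually_acked]


class GroupingAcker:
    """Hypothetical ack grouping: buffer IDs and send them as one request."""

    def __init__(self, cursor: Cursor, max_batch: int) -> None:
        self.cursor = cursor
        self.max_batch = max_batch
        self.pending: List[int] = []
        self.requests_sent = 0

    def ack(self, msg_id: int) -> None:
        self.pending.append(msg_id)
        if len(self.pending) >= self.max_batch:
            self.flush()

    def flush(self) -> None:
        if not self.pending:
            return
        for msg_id in self.pending:
            self.cursor.ack_individual(msg_id)
        self.pending.clear()
        self.requests_sent += 1  # one round trip for the whole group


cursor = Cursor()
cursor.ack_individual(0)
cursor.ack_individual(2)
print(cursor.unacked(3))  # [1, 3]: message 1 leaves a gap
cursor.ack_cumulative(2)
print(cursor.unacked(3))  # [3]
```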
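Redelivery and Ack Timeout : Three mechanisms trigger reprocessing. If a message is not acknowledged within the ack timeout, the client requests its redelivery automatically; a negative ack requests redelivery explicitly; and a redelivery backoff policy lengthens the delay as a message's redelivery count grows.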
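Both the backoff and the ack timeout are simple timing rules. The sketch below assumes an exponential formula (minimum delay times a multiplier raised to the redelivery count, capped at a maximum); the function and class names, parameter names, and default values are all hypothetical rather than taken from the Pulsar client.

```python
from typing import Dict, List

def backoff_delay_ms(redelivery_count: int, min_delay_ms: int = 1_000,
                     max_delay_ms: int = 60_000, multiplier: float = 2.0) -> int:
    """Assumed multiplier backoff: min * multiplier**count, capped at max."""
    if redelivery_count <= 0:
        return min_delay_ms
    return int(min(min_delay_ms * multiplier ** redelivery_count, max_delay_ms))


class AckTimeoutTracker:
    """Hypothetical client-side tracker of delivered but unacknowledged messages."""

    def __init__(self, timeout_ms: int) -> None:
        self.timeout_ms = timeout_ms
        self.unacked: Dict[str, int] = {}          # message ID -> delivery time (ms)
        self.redelivery_count: Dict[str, int] = {}

    def delivered(self, msg_id: str, now_ms: int) -> None:
        self.unacked[msg_id] = now_ms

    def ack(self, msg_id: str) -> None:
        self.unacked.pop(msg_id, None)

    def expired(self, now_ms: int) -> List[str]:
        # Anything not acknowledged within the timeout is handed back for redelivery.
        due = [m for m, t in self.unacked.items() if now_ms - t >= self.timeout_ms]
        for m in due:
            del self.unacked[m]
            self.redelivery_count[m] = self.redelivery_count.get(m, 0) + 1
        return due


print([backoff_delay_ms(n) for n in range(4)])  # [1000, 2000, 4000, 8000]
```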
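Message Retention and Deduplication : By default, a Broker keeps every message that at least one subscription has not yet acknowledged, and treats messages acknowledged by all subscriptions as eligible for deletion. A retention policy can keep acknowledged messages for a configured time or size, and an expiry (TTL) policy automatically acknowledges messages that stay unacknowledged too long. When deduplication is enabled, the Broker discards any message whose producer-assigned sequence ID it has already persisted.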
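Deduplication rests on one invariant: for each producer, the Broker remembers the highest sequence ID it has persisted and drops anything at or below it. A minimal sketch, where `Deduplicator` is hypothetical and the in-memory dict stands in for state a real Broker must keep durably.

```python
from typing import Dict

class Deduplicator:
    """Hypothetical Broker-side check keyed by producer name."""

    def __init__(self) -> None:
        # In-memory only in this sketch; a real Broker must survive restarts.
        self.highest_seq: Dict[str, int] = {}

    def accept(self, producer_name: str, sequence_id: int) -> bool:
        last = self.highest_seq.get(producer_name, -1)
        if sequence_id <= last:
            return False  # already persisted, e.g. a send retried after a timeout
        self.highest_seq[producer_name] = sequence_id
        return True


dedup = Deduplicator()
print([dedup.accept("producer-a", s) for s in (0, 1, 1, 2)])  # [True, True, False, True]
```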
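Delayed Delivery : A Producer can delay a message with deliverAfter (a relative delay) or deliverAt (an absolute timestamp). On the Broker, a DelayedDeliveryTracker holds delayed messages and releases them in delivery-time order once they are due.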
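A delivery-time-ordered tracker is naturally a min-heap keyed by timestamp. The sketch below is hypothetical (class and method names are not Pulsar's) and keeps everything in memory; it also shows that deliverAfter is just deliverAt measured from the current time.

```python
import heapq
import itertools
from typing import List, Tuple

class DelayedDeliveryTracker:
    """Hypothetical in-memory tracker: hold messages until their delivery time."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        self._seq = itertools.count()  # keeps arrival order for equal timestamps

    def add(self, msg_id: str, deliver_at_ms: int) -> None:
        heapq.heappush(self._heap, (deliver_at_ms, next(self._seq), msg_id))

    def add_after(self, msg_id: str, now_ms: int, delay_ms: int) -> None:
        # A relative delay becomes an absolute delivery time.
        self.add(msg_id, now_ms + delay_ms)

    def ready(self, now_ms: int) -> List[str]:
        # Release every message whose delivery time has arrived, earliest first.
        out = []
        while self._heap and self._heap[0][0] <= now_ms:
            out.append(heapq.heappop(self._heap)[2])
        return out


tracker = DelayedDeliveryTracker()
tracker.add_after("m1", now_ms=0, delay_ms=5_000)
tracker.add("m2", deliver_at_ms=1_000)
print(tracker.ready(1_000))  # ['m2']
```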
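Bundle Mechanism : Each namespace's hash space is divided into Bundles, contiguous hash ranges that are each owned by one Broker. A Topic belongs to the Bundle whose range contains the hash of its name, so the Broker that owns that Bundle serves the Topic. A heavily loaded Bundle can be split in two, and Bundles can be reassigned to other Brokers to balance load.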
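The mapping and the split can be sketched over a 32-bit hash space. Assumptions: `zlib.crc32` stands in for the hash Pulsar actually uses, the helper names are hypothetical, and the Broker ownership map is invented for the example.

```python
import zlib
from typing import List, Tuple

Bundle = Tuple[int, int]  # [lower, upper) range of 32-bit hash values
FULL_RANGE = 0x1_0000_0000

def initial_bundles(count: int) -> List[Bundle]:
    # Divide the namespace's hash space into `count` contiguous ranges.
    step = FULL_RANGE // count
    bounds = [i * step for i in range(count)] + [FULL_RANGE]
    return [(bounds[i], bounds[i + 1]) for i in range(count)]

def bundle_for(topic: str, bundles: List[Bundle]) -> Bundle:
    # A Topic belongs to the bundle whose range contains the hash of its name.
    h = zlib.crc32(topic.encode())
    return next(b for b in bundles if b[0] <= h < b[1])

def split(bundles: List[Bundle], target: Bundle) -> List[Bundle]:
    # Split one bundle at its midpoint; the halves can then go to different Brokers.
    lower, upper = target
    mid = (lower + upper) // 2
    i = bundles.index(target)
    return bundles[:i] + [(lower, mid), (mid, upper)] + bundles[i + 1:]


bundles = initial_bundles(4)
owners = {b: f"broker-{i % 2 + 1}" for i, b in enumerate(bundles)}  # hypothetical owners
topic = "persistent://public/default/orders"
home = bundle_for(topic, bundles)
print(hex(home[0]), hex(home[1]), owners[home])
```

After a split, the Topic's new bundle always lies inside its old one, so only the Topics in the split bundle are affected.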
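Storage Layer (BookKeeper) : Pulsar stores each Topic's messages in a sequence of BookKeeper ledgers, and the entries of each ledger are striped across a set of Bookies. A Bookie writes each incoming entry to its journal (a write-ahead log) for durability and later stores it in an entry log, with index files recording where each entry sits; write and read caches keep both paths fast.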
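Striping and the Bookie write path can both be modelled briefly, using BookKeeper's terms ensemble size (E), write quorum (Qw), and ack quorum (Qa). This is a simplified sketch: the round-robin write-set rule is the standard BookKeeper idea, but the function names and the `Bookie` class are hypothetical, and caches, fsync, and flushing are omitted.

```python
from typing import Dict, List, Tuple

def write_set(entry_id: int, ensemble: List[str], write_quorum: int) -> List[str]:
    # Round-robin striping: entry e goes to Qw consecutive Bookies starting at e mod E.
    size = len(ensemble)
    return [ensemble[(entry_id + i) % size] for i in range(write_quorum)]

def is_acknowledged(acks: int, ack_quorum: int) -> bool:
    # The writer treats an entry as durable once Qa Bookies have confirmed it.
    return acks >= ack_quorum


class Bookie:
    """Hypothetical, heavily simplified Bookie write path."""

    def __init__(self) -> None:
        self.journal: List[Tuple[int, int, bytes]] = []  # sequential write-ahead log
        self.entry_log: List[bytes] = []                  # entries from many ledgers, interleaved
        self.index: Dict[Tuple[int, int], int] = {}       # (ledger, entry) -> entry-log offset

    def add_entry(self, ledger_id: int, entry_id: int, data: bytes) -> None:
        self.journal.append((ledger_id, entry_id, data))  # durability first
        self.index[(ledger_id, entry_id)] = len(self.entry_log)
        self.entry_log.append(data)

    def read_entry(self, ledger_id: int, entry_id: int) -> bytes:
        # Reads go through the index rather than scanning the journal.
        return self.entry_log[self.index[(ledger_id, entry_id)]]


ensemble = ["bk1", "bk2", "bk3"]
print([write_set(e, ensemble, write_quorum=2) for e in range(3)])
# [['bk1', 'bk2'], ['bk2', 'bk3'], ['bk3', 'bk1']]
```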
References include Pulsar official documentation, BookKeeper documentation, and several technical series articles on Pulsar features.
Tencent Cloud Developer
Official Tencent Cloud community account that brings together developers, shares practical tech insights, and fosters an influential tech exchange community.