Understanding Broker‑Side Message Acknowledgment and Cursor Management in Apache Pulsar
This article explains how Apache Pulsar brokers track consumer acknowledgment using cursors, describes persistent and non‑persistent subscription differences, details cursor metadata stored in ZooKeeper and BookKeeper, and outlines optimizations for handling message gaps, including RangeSet improvements and a new LRU‑based storage design.
In the previous article we introduced the various client‑side acknowledgment modes of Apache Pulsar; this piece focuses on how the broker side manages those acknowledgments using cursors.
Each subscription has a cursor that records the current consumption position; persistent cursors store metadata in ZooKeeper while non‑persistent cursors keep it in broker memory. The cursor contains several key attributes such as Bookkeeper, MarkDeletePosition, PersistentMarkDeletePosition, ReadPosition, LastMarkDeleteEntry, CursorLedger, IndividualDeletedMessages, and BatchDeletedIndexes, as shown in the table.
ZooKeeper only holds index information for the cursor (ledger name, last deleted entry, and last activity timestamp), while the bulk of the cursor data is persisted in BookKeeper ledgers. When a consumer acknowledges a message, the cursor’s pointer may move forward depending on the acknowledgment type (single, cumulative, or batch), but negative acknowledgments do not affect the cursor.
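The effect of each acknowledgment type on the cursor can be illustrated with a small model. This is a minimal sketch, not Pulsar's implementation: it assumes a single ledger whose entries are numbered 0, 1, 2, ..., and the class `CursorModel` and its method names are hypothetical. The mark-delete position only advances when every entry up to it has been acknowledged, and a negative acknowledgment leaves the cursor state untouched.

```python
class CursorModel:
    """Toy model of one subscription cursor over entry ids 0, 1, 2, ...

    Hypothetical illustration only; real Pulsar positions are
    (ledgerId, entryId) pairs managed by the broker.
    """

    def __init__(self):
        self.mark_delete = -1            # every entry <= this is acknowledged
        self.individually_acked = set()  # acked entries beyond mark_delete

    def _advance(self):
        # Move the mark-delete position across any contiguous acked entries.
        while self.mark_delete + 1 in self.individually_acked:
            self.mark_delete += 1
            self.individually_acked.remove(self.mark_delete)

    def ack_individual(self, entry_id):
        if entry_id <= self.mark_delete:
            return  # already covered by the mark-delete position
        self.individually_acked.add(entry_id)
        self._advance()

    def ack_cumulative(self, entry_id):
        if entry_id <= self.mark_delete:
            return
        self.mark_delete = entry_id
        # Individual acks at or below the new position are now redundant.
        self.individually_acked = {
            e for e in self.individually_acked if e > entry_id
        }
        self._advance()

    def nack(self, entry_id):
        # A negative ack only requests redelivery; cursor state is unchanged.
        pass


cursor = CursorModel()
cursor.ack_individual(2)   # leaves a hole at 0 and 1, pointer stays at -1
cursor.nack(1)             # no effect on the cursor
cursor.ack_individual(0)   # pointer moves to 0
cursor.ack_individual(1)   # hole closed, pointer jumps to 2
```

In this model the hole left by acknowledging entry 2 first is exactly the kind of gap that the cursor has to remember until the earlier entries are acknowledged.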
Shared subscriptions can create acknowledgment gaps (holes), because several consumers acknowledge messages out of order. Pulsar stores these gaps in an IndividualDeletedMessages container backed by Guava Range objects, which efficiently represent contiguous acknowledged ranges. When gaps are frequent and ranges are short, an optimized ConcurrentOpenLongPairRangeSet uses a BitSet to reduce memory usage.
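The trade-off between the two representations can be sketched as follows. This is an illustrative model under stated assumptions, not the Guava or Pulsar code: the helper names `add_acked`, `to_bitset`, and `bitset_ranges` are hypothetical, ranges are written as half-open `(start, end)` tuples for simplicity, and a Python integer stands in for a BitSet.

```python
def add_acked(ranges, entry_id):
    """Add one acked entry to a list of disjoint half-open (start, end)
    ranges, merging ranges that touch or overlap."""
    merged = []
    for r in sorted(ranges + [(entry_id, entry_id + 1)]):
        if merged and r[0] <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], r[1]))
        else:
            merged.append(r)
    return merged


def to_bitset(entry_ids):
    """Pack acked entry ids into one integer, one bit per entry."""
    bits = 0
    for e in entry_ids:
        bits |= 1 << e
    return bits


def bitset_ranges(bits):
    """Recover half-open (start, end) ranges from the bitset."""
    ranges, pos = [], 0
    while bits:
        if bits & 1:
            start = pos
            while bits & 1:
                bits >>= 1
                pos += 1
            ranges.append((start, pos))
        else:
            bits >>= 1
            pos += 1
    return ranges


ranges = []
for entry in [1, 2, 3, 7, 5, 6]:
    ranges = add_acked(ranges, entry)
# Contiguous acks collapse into two ranges: [(1, 4), (5, 8)]

# With alternating acks nothing merges: one range object per acked entry,
# whereas the bitset still costs a single bit per entry.
sparse = []
for entry in range(0, 20, 2):
    sparse = add_acked(sparse, entry)
```

Range objects are compact when acknowledgments are mostly contiguous. When every other entry is acknowledged, the range list grows with the number of acked entries, which is the case the bit-per-entry layout is meant to address.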
Configuration options such as managedLedgerUnackedRangesOpenCacheSetEnabled=true enable the optimized RangeSet, and managedLedgerMaxUnackedRangesToPersistInZooKeeper limits how many gap entries are persisted to ZooKeeper.
A new PIP introduces an LRU‑based, segmented storage approach to handle large volumes of gap data. Hot gap intervals are kept in memory, while cold data is evicted. Gaps are split across multiple BookKeeper entries, and a special Marker entry records the index of all split entries, ensuring atomic recovery.
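The role of the marker entry in recovery can be sketched as follows. This is a simplified model under stated assumptions, not the PIP's actual format: a Python list stands in for an append-only BookKeeper ledger, and `persist`, `recover`, and the dictionary layout of each entry are hypothetical. Chunks are written first and the marker last, so a reader that trusts only the most recent marker never sees a partially written snapshot.

```python
def persist(log, ranges, max_per_entry):
    """Append the gap ranges as several chunk entries, then one marker
    entry listing the chunk indices. Returns the marker's index."""
    indices = []
    for i in range(0, len(ranges), max_per_entry):
        log.append({"type": "chunk", "ranges": ranges[i:i + max_per_entry]})
        indices.append(len(log) - 1)
    log.append({"type": "marker", "chunks": indices})
    return len(log) - 1


def recover(log):
    """Rebuild the gap ranges from the last marker in the log.
    Chunks written after that marker belong to an unfinished snapshot
    and are ignored."""
    for i in range(len(log) - 1, -1, -1):
        if log[i]["type"] == "marker":
            restored = []
            for idx in log[i]["chunks"]:
                restored.extend(log[idx]["ranges"])
            return restored
    return []


log = []
persist(log, [(1, 4), (5, 8), (10, 12)], max_per_entry=2)

# Simulate a crash in the middle of the next snapshot: one chunk was
# appended, but its marker never made it to the log.
log.append({"type": "chunk", "ranges": [(1, 20)]})

restored = recover(log)   # still the first, complete snapshot
```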
The marker‑based design also supports dirty‑flag tracking per ledger, so only modified gap data is rewritten, reducing I/O. When a ledger fills, a new marker triggers copying of indices to a new ledger.
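How LRU eviction and per-ledger dirty flags interact can be sketched with a small cache. This is a hedged illustration rather than the proposed implementation: the class `GapCache`, its fields, and the plain dictionary standing in for BookKeeper storage are all assumptions made for the example. Only ledgers whose gap data changed since the last write are rewritten, and the least recently used ledger is dropped from memory when capacity is exceeded.

```python
from collections import OrderedDict


class GapCache:
    """Hypothetical LRU cache of per-ledger acked entry ids."""

    def __init__(self, capacity, store):
        self.capacity = capacity   # max ledgers kept hot in memory
        self.store = store         # dict standing in for durable storage
        self.hot = OrderedDict()   # ledger_id -> set of acked entry ids
        self.dirty = set()         # ledgers modified since the last write
        self.writes = 0            # count of writes to the store

    def _write(self, ledger_id, acked):
        # Rewrite a ledger's gap data only if its dirty flag is set.
        if ledger_id in self.dirty:
            self.store[ledger_id] = sorted(acked)
            self.dirty.discard(ledger_id)
            self.writes += 1

    def _load(self, ledger_id):
        if ledger_id in self.hot:
            self.hot.move_to_end(ledger_id)   # mark as most recently used
        else:
            self.hot[ledger_id] = set(self.store.get(ledger_id, ()))
            if len(self.hot) > self.capacity:
                # Evict the coldest ledger, persisting it first if dirty.
                cold_id, cold = self.hot.popitem(last=False)
                self._write(cold_id, cold)
        return self.hot[ledger_id]

    def ack(self, ledger_id, entry_id):
        self._load(ledger_id).add(entry_id)
        self.dirty.add(ledger_id)

    def flush(self):
        for ledger_id, acked in self.hot.items():
            self._write(ledger_id, acked)


store = {}
cache = GapCache(capacity=2, store=store)
cache.ack(1, 10)
cache.ack(2, 20)
cache.flush()        # both ledgers are dirty: two writes
cache.ack(1, 11)
cache.flush()        # only ledger 1 changed: one more write
```

Tracking the flag per ledger keeps the write cost proportional to what changed, not to the total amount of gap data held by the cursor.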
Overall, Pulsar’s cursor and gap management combine ZooKeeper indexing, BookKeeper persistence, RangeSet optimizations, and upcoming LRU‑segmented storage to provide reliable, scalable message acknowledgment handling.
High Availability Architecture