Follower Reads, Closed Timestamp, and Minimum Proposal Tracker in CB‑SQL
This article explains how CB‑SQL implements follower reads by using safe (closed) timestamps, describes the CT update mechanism with a Minimum Proposal Tracker, and discusses routing, replica read validation, timestamp forwarding, range split/merge handling, and recovery strategies for consistent distributed reads.
Preface – Follower reads let clients (typically gateway nodes acting as proxies) perform consistent historical reads at a chosen timestamp from any replica, not only the leaseholder. This is especially useful for long‑running analytical queries. It takes load off the leaseholder, improves overall cluster read performance, and also supports CDC.
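Safe Timestamp – A safe timestamp (closed timestamp) guarantees that no new write will be committed at or below that timestamp. A replica that has applied all writes up to it can therefore serve consistent reads at or below it without contacting the leaseholder. CB‑SQL maintains a closed timestamp per store, periodically exchanges it with the rest of the cluster, and uses it as the snapshot point up to which replica reads are allowed.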
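Here is a minimal Go sketch that makes the guarantee concrete. It assumes a simplified HLC timestamp made of a wall time and a logical counter. The `Timestamp` type and the functions `forwardWrite` and `readableOnFollower` are hypothetical illustrations, not CB‑SQL code. A write requested at or below the closed timestamp is pushed just above it, and a read is eligible for a follower only at or below it.

```go
package main

import "fmt"

// Timestamp is a simplified, hypothetical HLC timestamp: physical wall time
// plus a logical counter, compared lexicographically.
type Timestamp struct {
	WallTime int64
	Logical  int32
}

// Less reports whether t is strictly earlier than u.
func (t Timestamp) Less(u Timestamp) bool {
	return t.WallTime < u.WallTime || (t.WallTime == u.WallTime && t.Logical < u.Logical)
}

// Next returns the smallest timestamp strictly greater than t.
func (t Timestamp) Next() Timestamp {
	return Timestamp{WallTime: t.WallTime, Logical: t.Logical + 1}
}

// forwardWrite pushes a write that would land at or below the closed
// timestamp to just above it. This is what keeps the closed-timestamp
// promise: nothing new is ever written at or below it.
func forwardWrite(requested, closed Timestamp) Timestamp {
	if closed.Less(requested) {
		return requested
	}
	return closed.Next()
}

// readableOnFollower reports whether a read at ts is at or below the closed
// timestamp, the precondition for serving it from a non-leaseholder replica.
func readableOnFollower(ts, closed Timestamp) bool {
	return !closed.Less(ts)
}

func main() {
	closed := Timestamp{WallTime: 100}
	fmt.Println(forwardWrite(Timestamp{WallTime: 90}, closed))       // {100 1}
	fmt.Println(forwardWrite(Timestamp{WallTime: 120}, closed))      // {120 0}
	fmt.Println(readableOnFollower(Timestamp{WallTime: 95}, closed)) // true
}
```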
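CT Update – Each store maintains a Minimum Proposal Tracker (MPT) that generates CT updates. Each update contains a liveness epoch, a closed timestamp, a sequence number, and an MLAI map, which gives the minimum lease applied index for each range. These updates rest on three invariants. First, a leaseholder cannot acquire a lease that starts at or below the closed timestamp. Second, a replica whose lease applied index has reached the MLAI has applied every proposal that could write at or below the closed timestamp. Third, lease transfers respect the closed timestamp, so a new leaseholder never writes below a timestamp the previous one has closed.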
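A CT update could be modeled as the Go struct below. This is a sketch under stated assumptions. Timestamps are simplified to integers, and `CTUpdate` and `leaseStartAllowed` are hypothetical names rather than CB‑SQL's wire format. The function encodes only the lease invariant: a lease must start strictly after the closed timestamp.

```go
package main

import "fmt"

// Hypothetical simplified types: timestamps are plain integers, and a lease
// applied index (LAI) is a per-range counter.
type (
	RangeID   int64
	LAI       uint64
	Epoch     int64
	Timestamp int64
)

// CTUpdate holds the four fields listed in the text. The field names are
// illustrative only.
type CTUpdate struct {
	Epoch           Epoch           // liveness epoch the update was issued under
	ClosedTimestamp Timestamp       // no new writes at or below this
	SeqNum          uint64          // per-store sequence number of the update
	MLAI            map[RangeID]LAI // minimum lease applied index per range
}

// leaseStartAllowed encodes the lease invariant: a new lease, including one
// received by transfer, must start strictly after the closed timestamp.
func leaseStartAllowed(start, closed Timestamp) bool {
	return start > closed
}

func main() {
	u := CTUpdate{Epoch: 3, ClosedTimestamp: 100, SeqNum: 42, MLAI: map[RangeID]LAI{1: 17, 2: 5}}
	fmt.Println(leaseStartAllowed(100, u.ClosedTimestamp)) // false
	fmt.Println(leaseStartAllowed(101, u.ClosedTimestamp)) // true
	fmt.Println(u.MLAI[1])                                 // 17
}
```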
Routing – A read‑only transaction opts in by adding AS OF SYSTEM TIME to its SQL. The gateway computes an offset from the current CT to decide whether the requested timestamp is old enough to be served by a follower. If it is, the gateway selects an appropriate replica based on latency and geography. If that replica cannot serve the request, it returns NotLeaseHolderError, and the gateway retries on the leaseholder.
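The routing decision can be sketched as follows. The sketch makes three assumptions. The gateway already holds an estimate of the closed timestamp (`estimatedCT`, standing in for the offset computation described above). Replica latencies are known. The names `route`, `chooseReplica`, and the `rpc` callback are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// Replica describes a candidate replica as the gateway sees it (hypothetical).
type Replica struct {
	NodeID      int
	LatencyMs   float64
	Leaseholder bool
}

// errNotLeaseHolder stands in for the NotLeaseHolderError a follower returns.
var errNotLeaseHolder = errors.New("NotLeaseHolderError")

func leaseholder(replicas []Replica) (Replica, bool) {
	for _, r := range replicas {
		if r.Leaseholder {
			return r, true
		}
	}
	return Replica{}, false
}

// chooseReplica sends reads at or below the estimated closed timestamp to the
// nearest replica and everything else to the leaseholder. replicas must be non-empty.
func chooseReplica(replicas []Replica, readTS, estimatedCT int64) Replica {
	if readTS > estimatedCT {
		if lh, ok := leaseholder(replicas); ok {
			return lh
		}
	}
	nearest := append([]Replica(nil), replicas...)
	sort.Slice(nearest, func(i, j int) bool { return nearest[i].LatencyMs < nearest[j].LatencyMs })
	return nearest[0]
}

// route sends the read and retries on the leaseholder after NotLeaseHolderError.
func route(replicas []Replica, readTS, estimatedCT int64, rpc func(Replica) error) (Replica, error) {
	r := chooseReplica(replicas, readTS, estimatedCT)
	err := rpc(r)
	if errors.Is(err, errNotLeaseHolder) {
		if lh, ok := leaseholder(replicas); ok {
			return lh, rpc(lh)
		}
	}
	return r, err
}

func main() {
	replicas := []Replica{{1, 80, true}, {2, 5, false}, {3, 40, false}}
	// Node 2 is nearest, but pretend its closed timestamp is lagging.
	rpc := func(r Replica) error {
		if r.NodeID == 2 {
			return errNotLeaseHolder
		}
		return nil
	}
	served, err := route(replicas, 50, 70, rpc)
	fmt.Println(served.NodeID, err) // 1 <nil>
}
```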
Reading from Replicas – When a replica receives a read request, it performs four checks. (1) It checks the lease and its epoch. (2) It checks that its CT state was issued under the same liveness epoch as the lease. (3) It verifies that the transaction timestamp is ≤ the CT. (4) It ensures that its lease applied index meets or exceeds the range's MLAI. If all checks pass, the replica serves the read. Otherwise it returns an error so that the gateway can retry on the leaseholder.
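Timestamp Forwarding and Intents – A transaction has an OrigTimestamp (its read timestamp) and a commit Timestamp, which can be forwarded, for example above a closed timestamp. Uncommitted intents are written with the OrigTimestamp, and readers ignore them until the transaction commits. On commit, the intent is resolved to the commit timestamp and becomes visible to reads at or above it. MVCC visibility also accounts for the HLC maximum clock offset. If a read finds a version whose timestamp is above the read timestamp but within that offset (the uncertainty interval), the transaction must restart at a higher timestamp.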
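The visibility rule, including the uncertainty interval, might look like this in a simplified MVCC scan. The sketch rests on several assumptions. Versions are sorted newest first. A committed intent appears as an ordinary version at its commit timestamp. Uncommitted intents are skipped, as in the description above. `read` and `UncertaintyError` are hypothetical names. For simplicity, the restart re-reads with the same offset.

```go
package main

import (
	"errors"
	"fmt"
)

// Version is one MVCC version of a key (hypothetical). A committed intent is
// represented with Intent=false at its resolved commit timestamp.
type Version struct {
	TS     int64
	Value  string
	Intent bool
}

// UncertaintyError asks the transaction to restart at ExistingTS.
type UncertaintyError struct{ ExistingTS int64 }

func (e *UncertaintyError) Error() string {
	return fmt.Sprintf("version at %d is within the uncertainty interval", e.ExistingTS)
}

// read scans versions newest-first and returns the value visible at readTS.
func read(versions []Version, readTS, maxOffset int64) (string, error) {
	for _, v := range versions {
		switch {
		case v.Intent:
			continue // not visible until its transaction commits
		case v.TS > readTS+maxOffset:
			continue // definitely written after the read
		case v.TS > readTS:
			// May have happened before the read in real time: restart higher.
			return "", &UncertaintyError{ExistingTS: v.TS}
		default:
			return v.Value, nil
		}
	}
	return "", nil // no visible version
}

func main() {
	versions := []Version{{TS: 108, Value: "c", Intent: true}, {TS: 105, Value: "b"}, {TS: 90, Value: "a"}}
	_, err := read(versions, 100, 10)
	var ue *UncertaintyError
	if errors.As(err, &ue) {
		v, _ := read(versions, ue.ExistingTS, 10) // restart at the higher timestamp
		fmt.Println(v)                            // b
	}
	v, _ := read(versions, 100, 2) // 105 falls outside the smaller window
	fmt.Println(v)                 // a
}
```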
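CT Update Process – Consider first the simplified case with no in‑flight writes. Here a store closes a timestamp by forbidding new writes at or below it and recording each replica's current lease applied index as that range's MLAI. When writes are in flight, the store cannot close a timestamp until every proposal that might write at or below it has been applied. The MPT tracks these in‑flight proposals and determines when a new closed timestamp is safe.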
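A hypothetical Go sketch of the no‑pending‑writes case follows. Every write is applied the moment it is made, so closing a timestamp only requires raising it and snapshotting each replica's current lease applied index as its MLAI. `Store`, `Close`, and `Write` are illustrative names.

```go
package main

import "fmt"

type RangeID int64

// Store is a hypothetical store with no in-flight proposals. Each write is
// applied immediately, so every replica's current LAI already covers every
// write at or below the closed timestamp.
type Store struct {
	closed     int64
	appliedLAI map[RangeID]uint64
}

// Close raises the closed timestamp and snapshots each replica's LAI as its MLAI.
// Closing at or below the current closed timestamp is a no-op.
func (s *Store) Close(ts int64) (int64, map[RangeID]uint64) {
	if ts <= s.closed {
		return s.closed, map[RangeID]uint64{}
	}
	s.closed = ts
	mlai := make(map[RangeID]uint64, len(s.appliedLAI))
	for id, lai := range s.appliedLAI {
		mlai[id] = lai
	}
	return s.closed, mlai
}

// Write applies a write immediately, forwarding it above the closed timestamp.
func (s *Store) Write(id RangeID, ts int64) int64 {
	if ts <= s.closed {
		ts = s.closed + 1
	}
	s.appliedLAI[id]++
	return ts
}

func main() {
	s := &Store{appliedLAI: map[RangeID]uint64{1: 10, 2: 4}}
	closed, mlai := s.Close(100)
	fmt.Println(closed, mlai)    // 100 map[1:10 2:4]
	fmt.Println(s.Write(1, 90))  // 101
	fmt.Println(s.appliedLAI[1]) // 11
}
```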
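Minimum Proposal Tracker (MPT) – The MPT tracks proposals with two reference counters, one on each side of a moving “next” timestamp. Proposals that started before the current “next” was chosen are counted on the left. Newer proposals, which are forced to write above “next”, are counted on the right. As proposals are applied, the MPT records their lease applied indexes in the corresponding MLAI map. If asked to close while the left counter is non‑zero, it cannot advance and sends an empty MLAI map. Once the left counter reaches zero, it closes “next” and publishes the left‑side MLAI, and the right side becomes the new left side.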
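The two‑counter scheme can be sketched as follows, loosely following the description above. `Tracker`, `Track`, and `Close` are hypothetical names, and timestamps are plain integers. In this sketch, `Close` succeeds once the left counter is zero. Proposals counted on the right were forced to write above `next`, so they cannot invalidate it. Their LAIs are published one round later.

```go
package main

import (
	"fmt"
	"sync"
)

type RangeID int64
type LAI uint64

// Tracker is a minimal sketch of a two-counter minimum proposal tracker.
type Tracker struct {
	mu        sync.Mutex
	closed    int64           // last closed timestamp
	next      int64           // candidate to close on the next successful Close
	leftRef   int             // in-flight proposals tracked before `next` was chosen
	rightRef  int             // in-flight proposals tracked since; they write above `next`
	leftMLAI  map[RangeID]LAI // LAIs to publish when `next` is closed
	rightMLAI map[RangeID]LAI // LAIs to publish one Close later
}

// NewTracker starts with closed = 0; next must be > 0.
func NewTracker(next int64) *Tracker {
	return &Tracker{next: next, leftMLAI: map[RangeID]LAI{}, rightMLAI: map[RangeID]LAI{}}
}

// Track registers a proposal before evaluation. It returns the minimum
// timestamp the proposal may write at and a release function to call with
// the range and LAI the proposal was assigned.
func (t *Tracker) Track() (int64, func(RangeID, LAI)) {
	t.mu.Lock()
	minProp := t.next + 1
	t.rightRef++
	t.mu.Unlock()
	release := func(id RangeID, lai LAI) {
		t.mu.Lock()
		defer t.mu.Unlock()
		m := t.rightMLAI
		if minProp == t.closed+1 { // a Close ran since Track: now on the left
			t.leftRef--
			m = t.leftMLAI
		} else { // minProp == t.next+1: still on the right
			t.rightRef--
		}
		if lai > m[id] {
			m[id] = lai
		}
	}
	return minProp, release
}

// Close tries to close `next` and adopt newNext as the next candidate.
func (t *Tracker) Close(newNext int64) (int64, map[RangeID]LAI, bool) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.leftRef > 0 || newNext <= t.next {
		return t.closed, map[RangeID]LAI{}, false // cannot advance: empty MLAI
	}
	mlai := t.leftMLAI
	t.closed = t.next
	t.leftRef, t.leftMLAI = t.rightRef, t.rightMLAI
	t.rightRef, t.rightMLAI = 0, map[RangeID]LAI{}
	t.next = newNext
	return t.closed, mlai, true
}

func main() {
	tr := NewTracker(10)
	minProp, release := tr.Track()
	fmt.Println(minProp)      // 11
	fmt.Println(tr.Close(20)) // 10 map[] true
	fmt.Println(tr.Close(30)) // 10 map[] false
	release(1, 7)
	fmt.Println(tr.Close(30)) // 20 map[1:7] true
}
```

Splitting proposals into left and right lets the tracker keep closing timestamps under a continuous write load. Only proposals older than the current candidate can block it.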
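Update Recovery – Suppose a receiving store sees a sender's liveness epoch change or a gap in its sequence numbers. The information it holds for that sender can then no longer be trusted. Recovery takes one of two forms. Either the sender sends a full update with a zero sequence number that covers all of its ranges, or, when a follower read fails because the MLAI for a range was lost, the receiver requests an update for that specific range.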
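Recovery on the receiving side can be sketched as follows. The sketch makes three assumptions. A sequence number of 0 marks a full update, as described above. Incremental updates must arrive with consecutive sequence numbers under the same epoch. `PeerState`, `Apply`, and `Lookup` are hypothetical names. The per‑range request itself is not modeled. `Lookup` returning `ok == false` marks the point where it would be sent.

```go
package main

import "fmt"

type RangeID int64
type LAI uint64

// Update is a hypothetical CT update; SeqNum 0 marks a full update.
type Update struct {
	Epoch  int64
	SeqNum uint64
	Closed int64
	MLAI   map[RangeID]LAI
}

// PeerState is what a receiving store keeps about one sending store.
type PeerState struct {
	valid  bool
	epoch  int64
	seq    uint64
	closed int64
	mlai   map[RangeID]LAI
}

// Apply ingests an update and reports whether it was accepted. An epoch
// change or a sequence gap invalidates the state until the next full update.
func (p *PeerState) Apply(u Update) bool {
	if u.SeqNum == 0 { // full update: replace everything
		p.valid, p.epoch, p.seq, p.closed = true, u.Epoch, 0, u.Closed
		p.mlai = make(map[RangeID]LAI, len(u.MLAI))
		for id, lai := range u.MLAI {
			p.mlai[id] = lai
		}
		return true
	}
	if !p.valid || u.Epoch != p.epoch || u.SeqNum != p.seq+1 {
		p.valid = false
		return false
	}
	p.seq = u.SeqNum
	if u.Closed > p.closed {
		p.closed = u.Closed
	}
	for id, lai := range u.MLAI {
		if lai > p.mlai[id] {
			p.mlai[id] = lai
		}
	}
	return true
}

// Lookup returns the closed timestamp and MLAI for a range. ok == false is
// where the receiver would request a per-range update instead.
func (p *PeerState) Lookup(id RangeID) (closed int64, mlai LAI, ok bool) {
	if !p.valid {
		return 0, 0, false
	}
	mlai, ok = p.mlai[id]
	return p.closed, mlai, ok
}

func main() {
	var p PeerState
	fmt.Println(p.Apply(Update{Epoch: 2, SeqNum: 0, Closed: 100, MLAI: map[RangeID]LAI{1: 5}})) // true
	fmt.Println(p.Apply(Update{Epoch: 2, SeqNum: 1, Closed: 110, MLAI: map[RangeID]LAI{1: 6}})) // true
	fmt.Println(p.Apply(Update{Epoch: 2, SeqNum: 3, Closed: 130}))                              // false (gap)
	_, _, ok := p.Lookup(1)
	fmt.Println(ok) // false until a full update arrives
}
```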
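Range Split and Merge – When a range splits, both children inherit the parent's MLAI. Merging two ranges whose closed timestamps differ is harder, because the merged range must not accept writes at or below a timestamp the right‑hand range may already have closed. The Subsume operation therefore freezes the right‑hand range and returns its current timestamp. The left‑hand range then raises its timestamp‑cache low‑water mark to that timestamp, which aligns the merged ranges.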
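The split and merge rules reduce to a few lines of code. This hypothetical sketch models only the MLAI and the timestamp‑cache low‑water mark. It assumes that the right‑hand range's closed timestamp never exceeds the freeze timestamp returned by Subsume, since closed timestamps trail the current clock. `split`, `subsume`, `merge`, and `writeTS` are illustrative names.

```go
package main

import "fmt"

// RangeCTState is a hypothetical per-range slice of closed-timestamp state.
type RangeCTState struct {
	MLAI            uint64
	TSCacheLowWater int64 // writes at or below this are forwarded above it
}

// split gives both children the parent's MLAI (and, here, its low-water mark).
func split(parent RangeCTState) (lhs, rhs RangeCTState) {
	return parent, parent
}

// subsume models the Subsume step: the right-hand range stops accepting
// writes and reports the current clock reading as its freeze timestamp.
func subsume(now int64) (freezeTS int64) {
	return now
}

// merge raises the left-hand range's timestamp-cache low water to the freeze
// timestamp, so the merged range cannot write below anything the right-hand
// range may have closed before the merge.
func merge(lhs *RangeCTState, freezeTS int64) {
	if freezeTS > lhs.TSCacheLowWater {
		lhs.TSCacheLowWater = freezeTS
	}
}

// writeTS forwards a proposed write above the range's low-water mark.
func writeTS(r RangeCTState, ts int64) int64 {
	if ts <= r.TSCacheLowWater {
		return r.TSCacheLowWater + 1
	}
	return ts
}

func main() {
	l, r := split(RangeCTState{MLAI: 42, TSCacheLowWater: 50})
	fmt.Println(l.MLAI, r.MLAI) // 42 42
	// Suppose the LHS has closed 80 and the RHS has closed 100 when the merge starts.
	merge(&l, subsume(105))
	fmt.Println(writeTS(l, 90)) // 106: cannot land below the RHS's closed 100
}
```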
Conclusion – Using follower reads is as simple as appending AS OF SYSTEM TIME to a query; the core challenge lies in maintaining the closed timestamp. CB‑SQL’s MPT‑driven CT updates provide a robust solution that also benefits CDC by generating resolved timestamps for downstream processing.
JD Retail Technology
Official platform of JD Retail Technology, delivering insightful R&D news and a deep look into the lives and work of technologists.