Online Schema Change in Distributed Databases: Insights from Google F1 and CB‑SQL
The article explains how distributed databases can perform online, asynchronous schema changes without blocking reads or writes by using lease‑based coordination, intermediate delete‑only and write‑only states, and a multi‑step algorithm inspired by Google F1 and implemented in CB‑SQL.
Traditional relational databases take a table lock during schema changes, which blocks all DML operations on the table; that approach is unworkable for distributed systems storing petabytes of data.
Online schema change is challenging because network latency prevents all nodes from receiving the new schema simultaneously, leading to periods where old and new schemas coexist, risking data inconsistency and requiring costly backfill operations.
Google’s paper "Online, Asynchronous Schema Change in F1" presented a solution that inspired the CB‑SQL online schema change mechanism.
The article uses an analogy of a multinational company switching communication tools (email → QQ → WeChat) to illustrate the need for a deterministic transition period during which both old and new systems may operate.
The core algorithm relies on a lease mechanism: each node caches the table schema under a lease (typically a few minutes) and must refresh it before expiration; a node that cannot renew its lease stops serving. This guarantees that within one lease period after a new schema version is published, every active node is using it.
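As a rough sketch of that rule (the `SchemaCache` class, the fake schema store, and the 300-second lease are illustrative assumptions, not the actual F1 or CB-SQL API), a node serves its cached version only while the lease is live and re-reads the authoritative store once it expires:

```python
import time

LEASE_SECONDS = 300  # hypothetical lease duration of "a few minutes"

class SchemaCache:
    """Per-node schema cache: serve the cached version while the lease is
    live; on expiry, re-read the authoritative store before serving again."""

    def __init__(self, store):
        self.store = store        # authoritative schema store (illustrative)
        self.version = None
        self.expires_at = 0.0

    def current_schema(self, now=None):
        now = time.monotonic() if now is None else now
        if now >= self.expires_at:            # lease expired: must refresh
            self.version = self.store.latest_version()
            self.expires_at = now + LEASE_SECONDS
        return self.version

# A coordinator that publishes version v+1 then waits one full lease period
# knows that afterwards every still-serving node holds v+1, so at most two
# consecutive schema versions ever coexist in the cluster.
```

The bounded-staleness guarantee comes entirely from the refresh-or-stop rule: nodes never serve a schema older than one lease period.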
Two intermediate states are defined: delete-only, in which the new schema element is visible only to delete operations, and write-only, in which all writes (insert, update, delete) maintain the element but reads still cannot see it. These states make the transition safe without blocking.
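The visibility rules can be written down as a small decision table (a sketch; the state and operation names are illustrative, and real systems also apply the delete half of an update in the delete-only state):

```python
from enum import Enum

class State(Enum):
    ABSENT = 0        # element does not exist yet
    DELETE_ONLY = 1   # only delete operations touch the element
    WRITE_ONLY = 2    # all writes maintain it, but reads cannot see it
    PUBLIC = 3        # fully visible

def applies_to(op, state):
    """Whether an operation ('read', 'insert', 'update', 'delete') must
    observe or maintain a schema element that is in `state`."""
    if state is State.ABSENT:
        return False
    if state is State.DELETE_ONLY:
        return op == 'delete'
    if state is State.WRITE_ONLY:
        return op in ('insert', 'update', 'delete')
    return True  # PUBLIC
```

The point of the ordering is that any two adjacent states agree closely enough that nodes holding either one cannot corrupt each other's data.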
The full schema change workflow is: Init → delete-only → write-only → backfill → public. During each phase, nodes may hold different schema versions, but because at most two adjacent states coexist at any moment and every adjacent pair is defined to be mutually compatible, no data inconsistency can arise.
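The workflow above amounts to a simple step driver: publish each intermediate state in order, waiting out one schema lease between steps so no node lags more than one state behind (a sketch; `apply_step` and `wait_out_lease` are placeholder hooks, not real APIs):

```python
# Ordered steps for adding an index; each hop waits out the schema lease so
# that at any instant only two adjacent states coexist in the cluster.
STEPS = ["delete_only", "write_only", "backfill", "public"]

def run_schema_change(apply_step, wait_out_lease):
    """Drive the change one step at a time (sketch). `apply_step` publishes
    the next intermediate schema; `wait_out_lease` blocks for one lease
    period so every node has adopted the previous step before the next."""
    for step in STEPS:
        apply_step(step)
        wait_out_lease()
```

Skipping a step, or not waiting between steps, is exactly what reintroduces the unsafe old-schema/new-schema overlap the intermediate states exist to prevent.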
CB‑SQL adopts this approach with three states: DELETE_ONLY, WRITE_AND_DELETE_ONLY, and PUBLIC. It also introduces a Read Lease and a Write Lease to coordinate schema changes, ensuring only one change task runs per table at a time.
When a schema change is triggered, the system acquires a Write Lease, broadcasts the new schema via a gossip protocol, and performs backfill as a distributed, parallel task that splits the work into many small transactions to minimize impact on normal operations.
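The chunked-backfill idea can be sketched in a few lines (the function name, chunk size, and `run_txn` callback are illustrative assumptions): each chunk commits in its own short-lived transaction, so locks are held briefly and foreground traffic is barely affected.

```python
def backfill_in_chunks(rows, run_txn, chunk_size=256):
    """Write index entries for existing rows in many small transactions
    (sketch). `rows` yields the rows to index; `run_txn` atomically commits
    the index entries for one chunk."""
    chunk = []
    for row in rows:
        chunk.append(row)
        if len(chunk) == chunk_size:
            run_txn(list(chunk))   # commit this chunk in its own transaction
            chunk.clear()
    if chunk:
        run_txn(list(chunk))       # commit the final partial chunk
```

In a distributed setting, many such workers would each take a disjoint key range, which is what makes the backfill parallel as well as incremental.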
The design provides high availability: if the lease-holder node fails, a new holder resumes the change task; and if the schema change itself fails, a reverse backfill rolls it back.
References include the original Google F1 paper and several Chinese technical blogs and GitHub documentation that detail the implementation.
JD Retail Technology
Official platform of JD Retail Technology, delivering insightful R&D news and a deep look into the lives and work of technologists.