An Overview of Google Spanner: Architecture, Design, and Concurrency Control
This article provides a comprehensive overview of Google Spanner, a globally distributed, multi‑version database, covering its background, core components, TrueTime‑based global synchronization, Paxos replication, data model, and the various transaction and concurrency mechanisms it employs.
Google Spanner is a globally distributed database that combines the scalability of NoSQL systems with the strong consistency and relational features of traditional RDBMSs. It can scale to millions of machines across hundreds of data centers while providing external consistency through synchronous replication and multi‑version storage.
In Google's stack, Spanner sits beneath the F1 relational service and above the Colossus file system. F1 replaced a sharded MySQL deployment with a fault‑tolerant, highly available RDBMS, while Colossus (the second‑generation GFS) offers distributed metadata handling, smaller chunk sizes, and Reed‑Solomon encoding instead of plain replication.
Compared with BigTable and Megastore, Spanner adds strong cross‑data‑center consistency, schema support, and SQL‑like queries, positioning itself as the next‑generation successor to both.
Spanner’s architecture relies on Paxos state machines to replicate partitioned data globally. A deployment is called a Universe, and it is divided into Zones, the units of administrative deployment (roughly, individual data centers). Key components include the UniverseMaster, PlacementDriver, ZoneMaster, LocationProxy, and SpanServer. Each SpanServer stores tablets—B‑tree‑like files backed by write‑ahead logs—in which each key maps to timestamped versions ((key, timestamp) → value), giving the store built‑in multi‑versioning.
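The (key, timestamp) → value mapping can be illustrated with a minimal sketch. This is not Spanner's implementation—the `Tablet` class and its methods are hypothetical—but it shows how multi-versioning makes reads at a past timestamp trivial:

```python
from collections import defaultdict

class Tablet:
    """Illustrative multi-version key space: each key maps to a list of
    (timestamp, value) versions, i.e. (key, timestamp) -> value.
    Names here are hypothetical, not Spanner's actual implementation."""

    def __init__(self):
        self._versions = defaultdict(list)  # key -> [(ts, value), ...] in ts order

    def write(self, key, ts, value):
        # Writes arrive in timestamp order (via the write-ahead log / Paxos).
        self._versions[key].append((ts, value))

    def read(self, key, ts):
        """Snapshot read: return the newest value with timestamp <= ts."""
        best = None
        for vts, val in self._versions[key]:
            if vts <= ts:
                best = val
        return best

t = Tablet()
t.write("user/1", 10, "alice")
t.write("user/1", 20, "alice-v2")
print(t.read("user/1", 15))  # -> alice  (a snapshot at ts=15 sees the older version)
```

Because old versions are retained, a reader at timestamp 15 sees a consistent past state even while newer writes land.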
Directories act as logical buckets of keys with common prefixes, allowing fine‑grained placement control and background migration without client impact. This abstraction, together with the ability to interleave tables, improves data locality and performance.
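The directory idea can be sketched as bucketing keys by a shared prefix and assigning placement per bucket, so an entire directory migrates as one unit. The function and placement map below are purely illustrative assumptions; in Spanner itself, directories arise from the interleaved table schema rather than ad hoc prefix parsing:

```python
# Hypothetical sketch: group rows into directories by key prefix, and
# control replica placement at the directory level.

def directory_of(key: str) -> str:
    # Take the top-level prefix, e.g. "users/42/albums/7" -> "users/42".
    parts = key.split("/")
    return "/".join(parts[:2])

placement = {}  # directory -> list of zones holding its replicas

def place(directory: str, zones: list) -> None:
    placement[directory] = zones

place("users/42", ["us-east1", "eu-west1"])
# Every key under "users/42" shares that placement, so moving the
# directory moves all of its rows together, invisibly to clients.
print(placement[directory_of("users/42/albums/7")])
```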
The TrueTime API, backed by GPS receivers and atomic clocks, exposes time as an interval with a small, bounded uncertainty (typically a few milliseconds, under 10 ms). TrueTime timestamps all writes, enabling external consistency and snapshot reads that are globally ordered.
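The essence of TrueTime is that `TT.now()` returns an interval guaranteed to contain true absolute time, and a leader delays acknowledging a commit until the chosen timestamp is safely in the past ("commit wait"). The sketch below assumes a fixed uncertainty `EPSILON`; real TrueTime computes the bound dynamically from clock synchronization state:

```python
import time
from dataclasses import dataclass

@dataclass
class TTInterval:
    earliest: float
    latest: float

EPSILON = 0.007  # assumed uncertainty bound (~7 ms), for illustration only

def tt_now() -> TTInterval:
    """Sketch of TrueTime's TT.now(): an interval [earliest, latest]
    guaranteed to contain true absolute time."""
    t = time.time()
    return TTInterval(t - EPSILON, t + EPSILON)

def commit_wait(commit_ts: float) -> None:
    """Delay until TT.now().earliest exceeds commit_ts, so that by the time
    the commit is acknowledged, its timestamp is definitely in the past."""
    while tt_now().earliest <= commit_ts:
        time.sleep(0.001)

ts = tt_now().latest  # pick a commit timestamp no earlier than true time
commit_wait(ts)       # external consistency: visible only after ts has passed
```

The wait costs about 2·ε per commit, which is why shrinking clock uncertainty directly improves write latency.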
Spanner supports several transaction types: read‑write, read‑only, and snapshot reads (at a given timestamp or within a time range). Read‑only transactions use TrueTime timestamps to avoid locking entirely, while read‑write transactions use two‑phase locking with wound‑wait deadlock avoidance, coordinated by a Paxos leader replica, plus two‑phase commit when a transaction spans multiple Paxos groups.
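The wound‑wait rule itself is simple: when a lock request conflicts, the transaction with the smaller (older) timestamp wins—an older requester "wounds" (aborts) a younger holder, while a younger requester waits. A minimal sketch of the decision, with illustrative names:

```python
# Wound-wait deadlock avoidance: compare transaction timestamps on conflict.
# Names are illustrative, not Spanner's internal API.

def resolve_conflict(requester_ts: int, holder_ts: int) -> str:
    if requester_ts < holder_ts:
        return "wound"  # requester is older: abort the younger lock holder
    return "wait"       # requester is younger: wait for the older holder

print(resolve_conflict(requester_ts=5, holder_ts=9))  # -> wound
print(resolve_conflict(requester_ts=9, holder_ts=5))  # -> wait
```

Because a transaction only ever waits for strictly older transactions, wait-for cycles—and hence deadlocks—cannot form.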
Overall, Spanner demonstrates how precise time synchronization, Paxos replication, and a layered data placement model can achieve both high scalability and strong consistency in a worldwide database system.
Qunar Tech Salon
Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.