
Data Consistency, Replication, and Distributed Transaction Protocols: From Partitioning to Paxos and Dynamo

The article examines the challenges of scaling a single‑server data service, compares data partitioning and replication, explains consistency models, and surveys distributed transaction protocols such as 2PC, 3PC, Paxos, and Dynamo's NWR model, highlighting their trade‑offs in availability, consistency, and performance.


When a single server cannot handle all network requests and its failure would cause data loss, we must scale out by adding more machines, typically using either data partitioning or data replication.

Data partitioning (e.g., sharding by uid % 16 or consistent hashing) spreads data across servers but cannot prevent data loss on a node failure. Data replication stores identical copies on multiple servers, providing high availability at the cost of increased complexity, especially for cross‑server transactions.
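The two partitioning schemes mentioned above can be sketched in a few lines. This is an illustrative toy, not a production router: the server names, the 100 virtual nodes per server, and the MD5 hash are all assumptions chosen for the example.

```python
import bisect
import hashlib

def shard_by_mod(uid: int, num_shards: int = 16) -> int:
    """Simple modulo sharding: the uid % 16 scheme from the text."""
    return uid % num_shards

class ConsistentHashRing:
    """Minimal consistent-hash ring with virtual nodes, so that adding
    or removing a server only remaps a fraction of the keys."""

    def __init__(self, nodes, vnodes: int = 100):
        self._ring = []  # sorted list of (hash, node) pairs
        for node in nodes:
            for i in range(vnodes):
                self._ring.append((self._hash(f"{node}#{i}"), node))
        self._ring.sort()

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def lookup(self, key: str) -> str:
        """Walk clockwise from the key's hash to the first virtual node."""
        idx = bisect.bisect(self._ring, (self._hash(key), ""))
        return self._ring[idx % len(self._ring)][1]
```

Note the contrast: with modulo sharding, changing `num_shards` remaps almost every key, while the ring only moves keys adjacent to the added or removed node.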

The classic use case of transferring money from account A to account B illustrates a six‑step transaction that must be atomic; adding more machines makes this coordination difficult.
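On a single machine the transfer is easy to make atomic; the sketch below (an in-memory toy, with illustrative names) shows the read-validate-debit-credit steps with rollback. The article's point is precisely that once A and B live on different machines, this snapshot-and-restore trick no longer works.

```python
class InsufficientFunds(Exception):
    pass

def transfer(accounts: dict, src: str, dst: str, amount: int) -> None:
    """Atomically move `amount` from src to dst; roll back on any failure."""
    snapshot = dict(accounts)        # remember state for rollback
    try:
        if accounts[src] < amount:   # read A and validate its balance
            raise InsufficientFunds(src)
        accounts[src] -= amount      # debit A
        accounts[dst] += amount      # read B, credit, write back
    except Exception:
        accounts.clear()
        accounts.update(snapshot)    # undo any partial writes
        raise
```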

Two main scenarios arise:

With partitioning, if A and B reside on different servers, a cross‑machine transaction and rollback logic are required.

With replication, concurrent writes to different replicas of the same account must be synchronized to avoid conflicts.

Beyond availability, we must consider three key concerns: disaster recovery (failover), data consistency (transaction processing), and performance (throughput and latency).

High availability is achieved by storing multiple copies of data, which inevitably introduces consistency challenges that in turn affect performance.

Consistency Models

Three basic consistency types are discussed:

Weak consistency: a read may or may not see a recent write (e.g., caches, some games).

Eventual consistency: a read may miss a recent write but will see it after some time (e.g., DNS, email, S3).

Strong consistency: any read sees the latest write immediately (e.g., file systems, RDBMS).

Weak and eventual consistency are usually implemented with asynchronous replication (better performance, but more failure states to reason about), while strong consistency requires synchronous replication (simpler semantics, but slower).

Replication Architectures

Master-Slave: the master handles all reads and writes and propagates changes to the slaves, either asynchronously (eventual consistency) or synchronously (strong consistency). If the master fails during an asynchronous sync window, un-replicated writes can be lost; slaves can serve read-only traffic or be promoted to master on failover.

Master-Master (multi-master): multiple masters accept both reads and writes; synchronization between them is typically asynchronous, yielding eventual consistency and potential write conflicts that must be resolved (e.g., with Dynamo's vector clocks).
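The vector-clock mechanism that Dynamo uses to detect such conflicts is small enough to sketch directly. A clock is a map from replica id to a counter; two clocks are comparable only if one dominates the other element-wise, and a "concurrent" result is exactly a write conflict that needs reconciliation. (The replica ids `m1`/`m2` below are illustrative.)

```python
def vc_compare(a: dict, b: dict) -> str:
    """Compare two vector clocks: 'before', 'after', 'equal', or 'concurrent'."""
    keys = set(a) | set(b)
    a_le_b = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    b_le_a = all(b.get(k, 0) <= a.get(k, 0) for k in keys)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"  # a true write conflict: needs reconciliation

def vc_merge(a: dict, b: dict) -> dict:
    """Element-wise max: the clock attached to a reconciled value."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in set(a) | set(b)}
```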

Two‑Phase Commit (2PC)

2PC ensures atomicity across multiple nodes by using a coordinator that first asks all participants to prepare (phase 1) and then, if all agree, issues a commit (phase 2). It provides strong consistency but suffers from blocking, timeout handling, and coordinator availability issues.
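The two phases can be sketched as follows. This is a deliberately simplified, single-process model (the `Participant` class and its method names are invented for the example); the timeout and coordinator-crash handling it omits are exactly where real 2PC gets hard.

```python
class Participant:
    """Toy participant that votes in phase 1 and completes in phase 2."""
    def __init__(self, can_commit: bool = True):
        self.can_commit, self.state = can_commit, "init"
    def prepare(self) -> bool:
        self.state = "prepared" if self.can_commit else "abort-voted"
        return self.can_commit
    def commit(self):
        self.state = "committed"
    def abort(self):
        self.state = "aborted"

def two_phase_commit(participants) -> str:
    """Phase 1: every participant votes. Phase 2: commit only if all
    voted yes, otherwise abort everywhere. One 'no' vetoes the lot."""
    votes = [p.prepare() for p in participants]           # voting phase
    decision = "commit" if all(votes) else "abort"
    for p in participants:                                # completion phase
        p.commit() if decision == "commit" else p.abort()
    return decision
```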

Three‑Phase Commit (3PC)

3PC splits commit into CanCommit, PreCommit, and DoCommit phases to remove 2PC's blocking "uncertain" state: once every participant has acknowledged PreCommit, a participant that loses contact with the coordinator can time out and commit rather than block indefinitely.
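The three rounds can be sketched in the same toy style (class and method names are invented; the timeout-driven default-commit behavior is only described in the comment, not modeled):

```python
class P3:
    """Toy 3PC participant."""
    def __init__(self, ok: bool = True):
        self.ok, self.state = ok, "init"
    def can_commit(self) -> bool:
        return self.ok
    def pre_commit(self):
        self.state = "pre-committed"
    def do_commit(self):
        self.state = "committed"
    def abort(self):
        self.state = "aborted"

def three_phase_commit(participants) -> str:
    """CanCommit -> PreCommit -> DoCommit. The extra PreCommit round means
    that once everyone is pre-committed, a participant that times out
    waiting for DoCommit may default to commit instead of blocking."""
    if not all(p.can_commit() for p in participants):  # phase 1: CanCommit
        for p in participants:
            p.abort()
        return "abort"
    for p in participants:                             # phase 2: PreCommit
        p.pre_commit()
    for p in participants:                             # phase 3: DoCommit
        p.do_commit()
    return "commit"
```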

Two Generals Problem

This classic, provably unsolvable coordination problem illustrates that guaranteed agreement over an unreliable channel is impossible; its generalization to faulty or malicious participants is the Byzantine Generals problem.

Paxos Algorithm

Paxos achieves consensus on a value in a distributed system despite failures. It operates in two phases (Prepare and Accept) and requires a majority of nodes to agree, forming the basis for many modern distributed databases and services.
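A single-decree round of those two phases can be sketched as below. This toy model runs in one process with reliable "message delivery" (direct method calls), so it shows only the core rules: an acceptor promises not to honor lower-numbered proposals, and a proposer that sees a previously accepted value must adopt it.

```python
class Acceptor:
    def __init__(self):
        self.promised = -1    # highest proposal number promised so far
        self.accepted = None  # (number, value) last accepted, if any

    def prepare(self, n):
        """Phase 1b: promise to ignore proposals numbered below n."""
        if n > self.promised:
            self.promised = n
            return True, self.accepted
        return False, None

    def accept(self, n, value) -> bool:
        """Phase 2b: accept unless a higher-numbered promise was made."""
        if n >= self.promised:
            self.promised = n
            self.accepted = (n, value)
            return True
        return False

def propose(acceptors, n, value):
    """Single-decree proposer: needs a majority in both phases."""
    majority = len(acceptors) // 2 + 1
    # Phase 1a: send prepare; collect promises and prior accepted values.
    granted = [acc for ok, acc in (a.prepare(n) for a in acceptors) if ok]
    if len(granted) < majority:
        return None
    # Rule: adopt the value of the highest-numbered accepted proposal seen.
    prior = [acc for acc in granted if acc is not None]
    if prior:
        value = max(prior)[1]
    # Phase 2a: ask acceptors to accept (n, value).
    if sum(a.accept(n, value) for a in acceptors) >= majority:
        return value
    return None
```

The second assertion in the usage below is the heart of Paxos: once a value is chosen by a majority, later proposals can only re-propose that same value.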

NWR Model (Amazon Dynamo)

Dynamo lets users choose consistency vs. availability by configuring N (total replicas), W (writes required), and R (reads required) such that W+R > N. Different settings trade off latency, durability, and the possibility of stale reads or write conflicts, which are resolved using version vectors or vector clocks.
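A toy quorum store makes the W+R > N intersection argument concrete. Here versioning is a bare integer rather than a vector clock, and the "W replicas" are simply the first W in a list, both simplifying assumptions for illustration.

```python
def quorums_intersect(n: int, w: int, r: int) -> bool:
    """W + R > N guarantees every read quorum overlaps every write
    quorum, so a read contacts at least one up-to-date replica."""
    return w + r > n

class QuorumStore:
    """Toy N-replica store: writes land on W replicas, reads query R
    replicas and return the highest-versioned value seen."""
    def __init__(self, n: int):
        self.replicas = [dict() for _ in range(n)]  # key -> (version, value)

    def write(self, key, value, version: int, w: int):
        for rep in self.replicas[:w]:   # in reality: any W acks suffice
            rep[key] = (version, value)

    def read(self, key, r: int):
        hits = [rep[key] for rep in self.replicas[-r:] if key in rep]
        return max(hits)[1] if hits else None
```

With N=3, W=2, R=2 the write set and read set always share a replica, so reads see the latest write; dropping to R=1 (so W+R = N) lets a read land entirely on replicas the write never touched.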

Summary

High availability demands redundant copies of data, which introduces consistency challenges that in turn affect performance; the CAP theorem formalizes the trade-off: when a network partition occurs, a system must sacrifice either consistency or availability.

Tags: distributed systems, CAP theorem, data consistency, replication, Paxos, transaction protocols, Dynamo
Written by

Qunar Tech Salon

Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.
