Understanding Distributed Transactions, Consistency Models, Sharding, and Commit Protocols
This article explains the fundamentals of distributed transactions, including ACID properties, consistency models, sharding strategies, and the two‑phase, three‑phase, and TCC protocols, while discussing CAP and BASE theories and the challenges of implementing reliable distributed databases.
What Is a Distributed Transaction
In everyday life a transaction is an all‑or‑nothing operation, such as transferring money between accounts. When data is stored across multiple databases, guaranteeing that all steps either complete or roll back becomes difficult due to network latency, failures, and power loss.
Atomicity
Atomicity means a transaction is indivisible: either every operation in it takes effect or none does, and a failure partway through undoes the steps already applied. The name borrows from the classical notion of an atom as a unit that cannot be split further.
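As a minimal single-database sketch of atomicity, the following uses Python's built-in sqlite3 module. The accounts table, the account names, and the helper functions open_demo_db, transfer, and balances are illustrative assumptions, not part of the original article. The point is only that a failed transfer leaves no partial update behind.

```python
import sqlite3

def open_demo_db():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
    conn.commit()
    return conn

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
            )
            row = conn.execute(
                "SELECT balance FROM accounts WHERE id = ?", (src,)
            ).fetchone()
            if row is None or row[0] < 0:
                # The debit above is undone by the rollback.
                raise ValueError("insufficient funds or unknown account")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
        return True
    except ValueError:
        return False

def balances(conn):
    """Return {account id: balance} for inspection."""
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))
```

Calling transfer(conn, "alice", "bob", 500) on the demo database returns False and leaves both balances unchanged, because the debit is rolled back together with the rest of the transaction. Across several databases no single rollback like this exists, which is the gap the commit protocols later in the article try to close.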
Consistency
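Consistency means a transaction moves the data from one valid state to another, so that every integrity rule holds both before and after it. In a transfer, for example, the combined balance of the two accounts is the same before and after the transaction, and a half-finished transfer is never left behind as the final state. In a distributed system the same requirement extends across nodes: all copies of the data must end up reflecting the same result.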
Isolation
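Isolation guarantees that concurrent transactions do not interfere with each other: the intermediate state of a transaction in progress is hidden from other transactions, which prevents anomalies such as spending the same balance twice. How strictly this holds depends on the isolation level the database is configured to use.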
Durability
Durability means that once a transaction commits, its effects are permanently stored on disk and survive crashes.
Consistency Discussion
Three consistency levels are commonly distinguished:
Strong consistency: once a write completes, every subsequent read on any node sees that write.
Weak consistency: reads may return stale data, and the system makes no promise about when a write becomes visible.
Eventual consistency: a form of weak consistency in which replicas may diverge temporarily but converge to the same state once updates stop arriving.
Practical systems choose the level based on business needs. For example, payment processing typically requires strong consistency, while data such as a displayed product stock count can often tolerate weaker guarantees.
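The difference between strong and eventual consistency can be illustrated with a toy primary and replica. This is a simplified sketch under stated assumptions: the Primary and Replica classes and the sync flag are hypothetical, replication is modeled as replaying an in-memory log, and there is no real network or failure handling.

```python
class Replica:
    """A read-only copy that applies the primary's log in order."""
    def __init__(self):
        self.data = {}
        self.applied = 0  # number of log entries applied so far

class Primary:
    def __init__(self, replicas):
        self.data = {}
        self.log = []          # ordered list of (key, value) writes
        self.replicas = replicas

    def write(self, key, value, sync=False):
        """sync=True replicates before returning (strong-style behavior);
        sync=False returns immediately and lets replicas lag."""
        self.data[key] = value
        self.log.append((key, value))
        if sync:
            self.replicate()

    def replicate(self):
        """Ship every log entry a replica has not applied yet."""
        for replica in self.replicas:
            for key, value in self.log[replica.applied:]:
                replica.data[key] = value
            replica.applied = len(self.log)
```

After an asynchronous write, a read from the replica returns the old value until replicate() runs, at which point the replica converges. With sync=True the write does not return until the replica has caught up, which trades latency for freshness.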
Sharding (Database Partitioning)
Sharding splits data across multiple databases to alleviate single‑node bottlenecks.
Vertical Sharding
Vertical sharding separates tables based on business domains, storing loosely coupled tables in different databases. This resembles micro‑service data isolation.
Horizontal Sharding
Horizontal sharding distributes rows of a single table across databases, often by ID range or hash.
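The two routing strategies mentioned above can be sketched as small functions. The function names and the boundary values in the usage note are assumptions for illustration; real systems often layer consistent hashing or a lookup table on top of this.

```python
import hashlib
from bisect import bisect_right

def shard_by_hash(key, num_shards):
    """Map a key to a shard index using a stable hash.

    Python's built-in hash() is randomized per process for strings,
    so a cryptographic digest is used to keep routing stable across restarts.
    """
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

def shard_by_range(record_id, upper_bounds):
    """Map an integer ID to a shard index by range.

    upper_bounds is a sorted list of exclusive upper bounds. With
    [1000, 2000], IDs below 1000 go to shard 0, IDs from 1000 to 1999
    go to shard 1, and everything else goes to shard 2.
    """
    return bisect_right(upper_bounds, record_id)
```

Range routing keeps neighboring IDs together, which helps range scans but can concentrate new rows on the last shard. Hash routing spreads rows evenly but turns range queries into fan-out queries, and changing num_shards remaps most keys.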
Problems Introduced by Sharding
When a transaction spans multiple shards, it becomes a distributed transaction, requiring coordination across nodes and exposing the system to higher latency, increased conflict probability, and potential deadlocks.
CAP Principle
CAP states that a distributed system cannot guarantee all three of Consistency, Availability, and Partition tolerance at the same time. Because network partitions cannot be ruled out in practice, partition tolerance is effectively mandatory, so the real trade-off is between consistency and availability while a partition lasts.
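The trade-off can be made concrete with a toy pair of nodes. This is only an illustrative model: the Node class, the "CP" and "AP" mode labels, and the helpers make_pair and set_partition are assumptions introduced here, and a partition is simulated with a boolean flag instead of a real network.

```python
class Unavailable(Exception):
    """Raised when a node refuses a request to protect consistency."""

class Node:
    def __init__(self, mode):
        self.mode = mode          # "CP" or "AP" (labels assumed for this sketch)
        self.value = None
        self.peer = None
        self.partitioned = False

    def write(self, value):
        if self.partitioned:
            if self.mode == "CP":
                # Consistency first: reject the write when the peer is unreachable.
                raise Unavailable("peer unreachable; write rejected")
            # Availability first: accept locally and let the peer go stale.
            self.value = value
            return
        # No partition: update both copies before returning.
        self.value = value
        self.peer.value = value

def make_pair(mode):
    a, b = Node(mode), Node(mode)
    a.peer, b.peer = b, a
    return a, b

def set_partition(a, b, flag):
    a.partitioned = b.partitioned = flag
```

During a simulated partition the "CP" pair raises Unavailable and both copies stay equal, while the "AP" pair keeps accepting writes and the two copies diverge until some later reconciliation, which this sketch does not model.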
BASE Theory
BASE (Basically Available, Soft state, Eventually consistent) relaxes ACID constraints for large‑scale internet systems, accepting temporary inconsistency in exchange for higher availability.
Two‑Phase Commit (2PC)
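2PC splits a distributed transaction into a voting (prepare) phase and a commit phase, driven by a single coordinator and multiple participants. It provides atomic commit when nothing fails, but it has three well-known weaknesses: the coordinator is a single point of failure, participants that have voted yes block while holding their locks until they learn the decision, and a crash partway through the commit phase can leave some participants committed while others are still waiting.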
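The two phases can be sketched in a few lines. This is an in-memory illustration under stated assumptions: the Participant class and two_phase_commit function are hypothetical, and the sketch omits the durable logging, locking, timeouts, and crash recovery that a real implementation needs.

```python
class Participant:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # simulated vote for this sketch
        self.state = "INIT"

    def prepare(self):
        # Phase 1: vote. A real participant would write a durable
        # prepare record and hold its locks before voting yes.
        self.state = "PREPARED" if self.can_commit else "ABORTED"
        return self.can_commit

    def commit(self):
        self.state = "COMMITTED"

    def rollback(self):
        self.state = "ABORTED"

def two_phase_commit(participants):
    """Coordinator logic: commit only if every participant votes yes."""
    votes = []
    for p in participants:
        try:
            votes.append(p.prepare())
        except Exception:
            votes.append(False)  # an unreachable participant counts as a no vote
    if all(votes):
        for p in participants:   # Phase 2: commit everywhere
            p.commit()
        return "COMMITTED"
    for p in participants:       # Phase 2: roll back everywhere
        p.rollback()
    return "ABORTED"
```

The blocking problem is visible in the structure: between prepare() returning True and the later commit() or rollback() call, a participant sits in PREPARED and cannot decide on its own if the coordinator disappears.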
Three‑Phase Commit (3PC)
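3PC inserts a pre‑commit phase between the voting phase (often called CanCommit) and the final commit phase (DoCommit), and adds timeouts so that participants do not wait indefinitely for a failed coordinator. This reduces blocking compared with 2PC, but the protocol is more complex, costs an extra round of messages, and can still produce inconsistent outcomes under network partitions, which is why it is less widely adopted.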
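The timeout handling is easiest to see as a participant state machine. The sketch below follows the commonly described three phases (CanCommit, PreCommit, DoCommit) and the usual textbook timeout rule. The class name, method names, and state labels are assumptions for illustration, and networking is left out entirely.

```python
class ThreePhaseParticipant:
    """Participant side of 3PC as a small state machine."""

    def __init__(self):
        self.state = "INIT"

    def on_can_commit(self, ok=True):
        # Phase 1: answer whether this participant is able to commit.
        self.state = "READY" if ok else "ABORTED"
        return ok

    def on_pre_commit(self):
        # Phase 2: the coordinator reports that everyone voted yes.
        if self.state == "READY":
            self.state = "PRECOMMITTED"

    def on_do_commit(self):
        # Phase 3: final commit.
        if self.state == "PRECOMMITTED":
            self.state = "COMMITTED"

    def on_abort(self):
        if self.state != "COMMITTED":
            self.state = "ABORTED"

    def on_timeout(self):
        # Before pre-commit, nobody can have committed yet, so abort is safe.
        # After pre-commit, every participant voted yes, so commit by default.
        if self.state == "READY":
            self.state = "ABORTED"
        elif self.state == "PRECOMMITTED":
            self.state = "COMMITTED"
```

The default in on_timeout is what removes most of the blocking, and it is also where the remaining risk lies: if a partition hides an abort decision from a participant that is already in PRECOMMITTED, that participant commits on timeout while others abort.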
TCC (Try‑Confirm‑Cancel) Pattern
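TCC moves the transaction logic to the service layer. The Try step checks and reserves resources, Confirm finalizes the reservation, and Cancel releases it. Each step is a short local transaction, so no database locks are held for the duration of the whole business transaction, and a failed Confirm or Cancel can be retried as long as the operations are idempotent.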
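A minimal sketch of the three steps, using a balance reservation as the resource. The Account class, the frozen map keyed by transaction ID, and the run_tcc helper are hypothetical names chosen for this example. The sketch also leaves out cases a production system has to handle, such as a Cancel that arrives before its Try.

```python
class Account:
    def __init__(self, balance):
        self.balance = balance   # funds available for new reservations
        self.frozen = {}         # tx_id -> amount reserved by Try

    def try_reserve(self, tx_id, amount):
        """Try: check and reserve funds without finalizing anything."""
        if tx_id in self.frozen:
            return True          # repeated Try for the same tx is a no-op
        if self.balance < amount:
            return False
        self.balance -= amount
        self.frozen[tx_id] = amount
        return True

    def confirm(self, tx_id):
        """Confirm: the reserved funds leave for good. Safe to repeat."""
        self.frozen.pop(tx_id, None)

    def cancel(self, tx_id):
        """Cancel: return the reserved funds. Safe to repeat."""
        self.balance += self.frozen.pop(tx_id, 0)

def run_tcc(tx_id, reservations):
    """reservations is a list of (account, amount) pairs."""
    tried = []
    for account, amount in reservations:
        if account.try_reserve(tx_id, amount):
            tried.append(account)
        else:
            for reserved in tried:   # undo every successful Try
                reserved.cancel(tx_id)
            return False
    for reserved in tried:           # all Try steps succeeded
        reserved.confirm(tx_id)
    return True
```

Keying the reservation by transaction ID is what makes Confirm and Cancel safe to retry: a second call finds nothing under that ID and changes nothing.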
Conclusion
The article provides a comprehensive overview of distributed transaction concepts, consistency models, sharding techniques, and the main coordination protocols (2PC, 3PC, TCC), helping architects choose the right strategy based on system requirements and trade‑offs.
Architect's Guide