
Mastering Distributed Consistency: Paxos, Raft, and ZAB Explained

This article examines high‑concurrency distributed consistency algorithms—explaining the CAP challenges, detailing Paxos, Raft, and ZAB’s core concepts, roles, and workflow, and discussing their practical applications and selection criteria for ensuring strong data consistency in critical systems.

Architecture & Thinking

This article focuses on the analysis and discussion of distributed consistency algorithms in high‑concurrency scenarios.

In distributed environments, the CAP theorem identifies three core properties: consistency, availability, and partition tolerance. Under a network partition, a system can guarantee at most two of the three.

Consistency: all nodes see the same data at the same time.

Availability: every request receives a response, successful or not.

Partition tolerance: the system continues to operate despite network partitions.

Ensuring data consistency under high load is crucial for core financial services such as payment, order placement, and inter‑bank transfers, where strong consistency is required to avoid monetary errors.

Distributed consistency algorithms are the key mechanisms that guarantee strong data consistency across multiple nodes.

Commonly used algorithms include:

Paxos

Raft

ZAB (ZooKeeper Atomic Broadcast)

3.1 Paxos Algorithm

Basic Concepts

Proposal: Consists of a proposal ID and a value (the command or log entry to be applied).

Roles:

Proposer – initiates proposals.

Acceptor – votes on proposals.

Learner – learns the chosen value.

Proposer creates a proposal containing an ID and the value to be written.

A proposal is chosen only after a majority of Acceptors (at least N/2 + 1 out of N) have accepted it.

Learner does not participate in voting; it learns the accepted value after consensus.

Algorithm Flow

Prepare Phase: The Proposer sends a Prepare request carrying a unique, monotonically increasing proposal number N to all Acceptors.

Promise Phase: An Acceptor responds only if N is higher than any proposal number it has previously seen, promising to ignore proposals numbered below N and reporting the highest‑numbered proposal it has already accepted, if any.

Accept Phase: If the Proposer receives promises from a majority of Acceptors, it sends an Accept request with number N and a value: either the value of the highest‑numbered proposal reported in the promises, or its own value if none was reported.

Accepted Phase: An Acceptor accepts the proposal unless it has meanwhile promised a higher proposal number. Once a majority of Acceptors accept, the value is chosen.

Learn Phase: Learners retrieve the chosen value from the Acceptors.

If fewer than a majority of Acceptors respond in either phase, the Proposer retries with a higher proposal number.
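The Paxos flow above can be sketched in a few dozen lines. The following is a minimal, single‑decree illustration (class and function names are our own invention, not from any library); it omits networking, persistence, and concurrent proposers, which a real implementation must handle:

```python
# Minimal single-decree Paxos sketch: an Acceptor tracks the highest
# promised proposal number and the last accepted (number, value) pair.
class Acceptor:
    def __init__(self):
        self.promised = -1         # highest proposal number promised so far
        self.accepted_n = -1       # number of the last accepted proposal
        self.accepted_value = None

    def on_prepare(self, n):
        """Promise phase: promise only if n beats every number seen so far."""
        if n > self.promised:
            self.promised = n
            # Report any previously accepted proposal back to the Proposer.
            return ("promise", self.accepted_n, self.accepted_value)
        return ("reject", None, None)

    def on_accept(self, n, value):
        """Accepted phase: accept unless a higher number was promised."""
        if n >= self.promised:
            self.promised = n
            self.accepted_n = n
            self.accepted_value = value
            return ("accepted", n)
        return ("reject", None)


def run_round(acceptors, n, value):
    """A Proposer drives one Prepare/Accept round against all Acceptors."""
    majority = len(acceptors) // 2 + 1
    promises = [a.on_prepare(n) for a in acceptors]
    granted = [p for p in promises if p[0] == "promise"]
    if len(granted) < majority:
        return None  # lost the round: retry with a higher n
    # Adopt the value of the highest-numbered previously accepted proposal.
    prior = max(granted, key=lambda p: p[1])
    chosen = prior[2] if prior[1] >= 0 else value
    acks = [a.on_accept(n, chosen) for a in acceptors]
    if sum(1 for r in acks if r[0] == "accepted") >= majority:
        return chosen
    return None
```

Note how a later round with a different value still returns the already‑chosen value: the promise replies force the Proposer to adopt it, which is the heart of Paxos safety.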

Applications

Paxos is highly fault‑tolerant and, in its Multi‑Paxos form, underpins production systems such as Google’s Chubby distributed lock service; ZooKeeper’s ZAB protocol (covered below) also adapts ideas from Paxos.

3.2 Raft Algorithm

Basic Concepts

Raft solves distributed consistency by providing a clear approach to leader election, log replication, and safety.

Leader Election and Timeouts

Servers can be in three states: Leader, Follower, or Candidate. Each Follower runs a randomized election timer; if it expires without a heartbeat from a Leader, the Follower increments its term, becomes a Candidate, and requests votes from its peers. A Candidate that receives votes from a majority becomes the Leader, and the randomized timeouts make split votes unlikely.

Roles:

Leader – handles client requests, replicates logs, sends heartbeats.

Follower – passive, receives heartbeats and logs from the Leader.

Candidate – seeks election when no Leader is known.
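The election logic described above can be sketched as follows (a simplified, hypothetical model of our own: real Raft also compares log indexes before granting a vote, and runs over RPC with real timers):

```python
# Illustrative Raft leader-election sketch: a Follower whose election
# timer fires becomes a Candidate, increments its term, votes for
# itself, and requests votes from its peers.
class Node:
    def __init__(self, node_id):
        self.id = node_id
        self.state = "follower"
        self.term = 0
        self.voted_for = None

    def request_vote(self, term, candidate_id):
        """Grant at most one vote per term; a higher term resets us."""
        if term > self.term:
            self.term = term
            self.voted_for = None
            self.state = "follower"
        if term == self.term and self.voted_for in (None, candidate_id):
            self.voted_for = candidate_id
            return True
        return False

    def on_election_timeout(self, peers):
        """Timer expired with no heartbeat from a Leader: run for office."""
        self.state = "candidate"
        self.term += 1
        self.voted_for = self.id  # vote for ourselves first
        votes = 1 + sum(p.request_vote(self.term, self.id) for p in peers)
        if votes >= (len(peers) + 1) // 2 + 1:  # majority of the cluster
            self.state = "leader"
        return self.state
```

The one‑vote‑per‑term rule is what guarantees at most one Leader per term: two Candidates in the same term cannot both collect a majority.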

3.3 ZAB (ZooKeeper Atomic Broadcast) Algorithm

Basic Concepts

ZAB is the atomic broadcast protocol used by ZooKeeper to guarantee data consistency. It adapts ideas from Paxos but is tailored for ZooKeeper’s leader‑follower architecture, supporting crash recovery.

Broadcast Process

Clients send write requests to the Leader, which packages them into a proposal and broadcasts to Followers. If a majority of Followers acknowledge, the Leader commits the transaction and notifies all Followers.
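The broadcast process can be sketched as below. This is a hypothetical illustration of our own, not ZooKeeper code: real ZAB assigns each transaction a zxid (epoch plus counter), persists proposals to disk before acknowledging, and delivers commits in strict zxid order.

```python
# Sketch of ZAB's broadcast phase: the Leader assigns each write a
# monotonically increasing zxid, sends the proposal to Followers, and
# commits once a majority of the ensemble (Leader included) has ACKed.
class Follower:
    def __init__(self):
        self.log = []         # proposals acknowledged so far
        self.committed = []   # zxids delivered as committed

    def on_proposal(self, zxid, txn):
        self.log.append((zxid, txn))
        return True           # ACK back to the Leader

    def on_commit(self, zxid):
        self.committed.append(zxid)


class Leader:
    def __init__(self, followers):
        self.followers = followers
        self.zxid = 0
        self.committed = []

    def broadcast(self, txn):
        """Propose a client write; commit and notify on a majority ACK."""
        self.zxid += 1
        acks = 1 + sum(f.on_proposal(self.zxid, txn) for f in self.followers)
        quorum = (len(self.followers) + 1) // 2 + 1
        if acks >= quorum:
            self.committed.append(self.zxid)
            for f in self.followers:
                f.on_commit(self.zxid)
            return True
        return False
```

Because every write funnels through a single Leader and zxids increase monotonically, all Followers apply committed transactions in the same total order.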

Summary

Distributed consistency algorithms ensure that multiple nodes produce the same result when reading or modifying shared data, which is essential for the reliability of distributed systems. The most common algorithms are Paxos, Raft, and ZAB, each with distinct characteristics and suitable scenarios. Selecting the appropriate algorithm depends on factors such as system scale, node count, communication overhead, consistency requirements, and fault tolerance.

Paxos: Message‑based consensus algorithm suitable for a wide range of distributed systems.

Raft: Easier‑to‑understand consensus algorithm that separates concerns into leader election, log replication, and safety.

ZAB: ZooKeeper‑specific atomic broadcast protocol designed for crash recovery and strong consistency.

Other algorithms like the Gossip protocol also exist for specific use cases.

Tags: distributed systems, CAP theorem, Raft, Paxos, ZAB, consistency algorithms
Written by

Architecture & Thinking

🍭 Frontline tech director and chief architect at top-tier companies 🥝 Years of deep experience in internet, e‑commerce, social, and finance sectors 🌾 Committed to publishing high‑quality articles covering core technologies of leading internet firms, application architecture, and AI breakthroughs.
