Fundamentals 21 min read

Advanced Distributed Systems Theory: Paxos, Raft, and Zab

This article provides an in‑depth exploration of distributed consensus protocols, detailing the basics of Paxos, extending to Multi‑Paxos, and comparing it with Raft and Zab while discussing leader election, quorum, lease mechanisms, and practical considerations for implementing these algorithms in real‑world systems.

Architects' Tech Alliance
Architects' Tech Alliance
Architects' Tech Alliance
Advanced Distributed Systems Theory: Paxos, Raft, and Zab

In this article we introduce the consistency problem in distributed systems and present the Paxos protocol, which ensures agreement among nodes even under failures, message reordering, or network partitions.

Basic Paxos defines two roles: proposers and acceptors. When a single proposer issues a proposal and there are no failures, an acceptor can decide a value by accepting the first proposal it receives:

P1. 一个acceptor接受它收到的第一项提议

To relax the assumptions, Paxos requires each proposal to have a unique ID (ID, value) and that a value is chosen only when a majority (quorum) of acceptors have accepted it. The following conditions guarantee that only one value is ever chosen:

P2. 如果一项值为v的提议被确定,那么后续只确定值为v的提议
P2a. 如果一项值为v的提议被确定,那么acceptor后续只接受值为v的提议

When multiple proposers may act concurrently, proposers must also obey:

P2b. 如果一项值为v的提议被确定,那么proposer后续只发起值为v的提议
P2c. 对于提议(n,v),如果多数派acceptor中最近一次接受的提议的值为v',则要求v = v';否则v可为任意值

These three constraints (B1‑B3) form the core of Basic Paxos and ensure consistency even when some nodes crash and later recover.

To record the highest accepted proposal and to promise not to accept lower‑ID proposals, acceptors must (1) store the largest ID they have accepted and (2) refuse proposals with smaller IDs.

With these rules, a single round of Paxos consists of a prepare phase (proposer queries acceptors) and an accept phase (proposer sends the chosen value). The process is illustrated in the diagram below:

When several proposers issue proposals concurrently, livelock can occur. A common remedy is to elect a single leader that serialises proposals.

Multi‑Paxos extends Basic Paxos by running many instances (each deciding a single value) and by keeping a stable leader, which removes the need for the prepare phase in most rounds, thus improving performance.

A learner role is added so that nodes can pull decided values from acceptors after recovery, ensuring that all replicas eventually converge.

After covering Paxos, the article discusses related concepts: leader election, quorum, and lease.

Leader election algorithms such as Bully assign a numeric identifier to each node; the node with the highest identifier becomes the leader. The algorithm proceeds through a series of messages illustrated below:

Quorum (majority) voting guarantees that only one decision can be made even when the network is partitioned: with 2f+1 nodes, a decision requires more than f votes.

Lease mechanisms grant exclusive rights to a node for a bounded time, preventing split‑brain scenarios during leader failures. The lease workflow is shown below:

Finally, the article compares Paxos with two widely‑used practical consensus protocols: Raft and Zab.

Raft simplifies understanding by fixing a single leader that handles all client requests, replicating a log of commands to followers, and using terms as logical clocks. The protocol proceeds through AppendEntries, commitment, and state‑machine execution, as depicted:

Zab, the protocol used inside ZooKeeper, also relies on a unique leader but focuses on strong (linearizable) consistency. Zab consists of three phases—discovery, sync, and broadcast—and orders transactions using a zxid composed of an epoch and a counter.

A comparative diagram highlights the similarities and differences among Paxos, Raft, Zab, and Viewstamped Replication, emphasizing leader roles, quorum usage, and state‑machine replication.

In summary, Paxos, Raft, and Zab each address the same fundamental consistency problem with different trade‑offs; the choice of protocol should be guided by application requirements rather than a notion of one being universally superior.

distributed systemsRaftConsensusLeader ElectionPaxosZAB
Architects' Tech Alliance
Written by

Architects' Tech Alliance

Sharing project experiences, insights into cutting-edge architectures, focusing on cloud computing, microservices, big data, hyper-convergence, storage, data protection, artificial intelligence, industry practices and solutions.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.