
Mastering Paxos: A Step‑by‑Step Guide to Distributed Consensus

This article provides a comprehensive, step‑by‑step explanation of the Paxos consensus algorithm—including its basic principles, roles, prepare and accept phases, Multi‑Paxos extensions, leader election, and state‑machine integration—while offering practical insights for building reliable distributed systems.

Xiaokun's Architecture Exploration Notes

Before discussing distributed consistency, we first build a basic understanding of distributed protocols and algorithms, then analyze common problems in distributed environments, and finally return to the consistency question. This article focuses on the Paxos consensus algorithm, covering an informal description, the process and its principles, and implementation details. It does not prove Paxos correct; interested readers can refer to Lamport's paper.

Consensus Problem Description

Assume three service nodes can each propose values. Paxos ensures that exactly one of the proposed values is chosen, subject to three safety conditions:

Only proposed values are eligible for selection.

Exactly one proposal value is chosen.

A process never learns that a value has been chosen unless it actually has been.

Considering three independently deployed nodes communicating asynchronously (non‑Byzantine), we model the system accordingly.

Nodes run at arbitrary speeds; if a proposal is persisted before a crash, it can be recovered after restart.

Messages may be delayed arbitrarily, can be duplicated or lost but not corrupted.

Paxos Roles

Proposers: initiate proposals, each carrying a monotonically increasing proposal number n and a value v.

Acceptors: receive proposals and run the consensus rules that determine the chosen value v; once a value is chosen, it can never change. Accepted values are forwarded to the learners.

Learners: store the chosen value and serve it to clients and to other learners; they do not participate in voting.
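The role state above can be sketched in code. This is a minimal illustration, not a standard API: `Acceptor`, `Proposer`, and `next_number` are names chosen here for clarity. The numbering scheme (interleaving rounds with a node id) is one common way to keep proposal numbers unique across proposers.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Acceptor:
    """Durable acceptor state; in a real system this must survive crashes."""
    promised_n: int = -1              # highest prepare number promised
    accepted_n: int = -1              # number of the highest accepted proposal
    accepted_v: Optional[str] = None  # value of that proposal, if any

@dataclass
class Proposer:
    """Proposer that generates monotonically increasing, globally unique numbers."""
    node_id: int
    round: int = 0

    def next_number(self, total_nodes: int) -> int:
        # Interleaving rounds with node_id guarantees no two proposers
        # ever generate the same proposal number.
        self.round += 1
        return self.round * total_nodes + self.node_id
```

For example, with three nodes, node 1 generates the sequence 4, 7, 10, … while node 2 generates 5, 8, 11, …, so numbers never collide.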

How Values Are Determined

Prepare Phase

In the asynchronous non-Byzantine model, for a value to be chosen the protocol must satisfy:

An acceptor must accept the first proposal it receives.

Once a value v is chosen, every higher-numbered proposal that is chosen must also have value v.

<code>1) If a proposal with value v is chosen, then every higher-numbered proposal accepted by any acceptor has value v.
2) If a proposal with value v is chosen, then every higher-numbered proposal issued by any proposer has value v.
3) For any v and n, ...</code>

On receiving a prepare request numbered n, an acceptor responds as follows:

If n is higher than any prepare request it has already answered, it promises not to accept any proposal numbered lower than n and returns a response.

If it has already accepted a proposal, it also includes the highest-numbered proposal it has accepted (its number n' and value v') in the response.

This constitutes the prepare stage.
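The prepare-phase rules above can be sketched as an acceptor-side handler. This is an illustrative sketch, not a production implementation; `new_acceptor` and `on_prepare` are names invented here.

```python
def new_acceptor():
    """Minimal acceptor state (would be persisted to disk in practice)."""
    return {"promised_n": -1, "accepted_n": -1, "accepted_v": None}

def on_prepare(acc, n):
    """Handle prepare(n): promise only if n beats every prior prepare.

    A promise reports the highest-numbered proposal accepted so far
    (or (-1, None) if nothing has been accepted yet).
    """
    if n > acc["promised_n"]:
        acc["promised_n"] = n
        return ("promise", acc["accepted_n"], acc["accepted_v"])
    return ("reject", acc["promised_n"], None)
```

Note that a rejection carries the number already promised, which lets the proposer retry with a higher number.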

Accept Phase

Based on the prepare responses, the proposer sends accept requests:

If the proposer receives promises from a majority of acceptors and none reports a previously accepted value, it sends accept requests [n, v] with its own value v.

If any promise reports a previously accepted proposal, the proposer adopts the value v' of the highest-numbered one and sends [n, v'] (its own number n, the inherited value v').

An acceptor accepts [n, v] unless it has already promised a higher-numbered prepare; accepted values are then forwarded to the learners.
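The accept-phase rules above can be sketched as follows; `choose_value`, `on_accept`, and `new_acceptor` are illustrative names, not part of any standard library.

```python
def new_acceptor():
    """Minimal acceptor state, as in the prepare phase."""
    return {"promised_n": -1, "accepted_n": -1, "accepted_v": None}

def choose_value(promises, my_value):
    """Given majority promises as (accepted_n, accepted_v) pairs, return the
    value of the highest-numbered prior acceptance, else the proposer's own value."""
    best_n, best_v = -1, None
    for accepted_n, accepted_v in promises:
        if accepted_v is not None and accepted_n > best_n:
            best_n, best_v = accepted_n, accepted_v
    return best_v if best_v is not None else my_value

def on_accept(acc, n, v):
    """Accept [n, v] unless a higher-numbered prepare has been promised."""
    if n >= acc["promised_n"]:
        acc["promised_n"] = n
        acc["accepted_n"], acc["accepted_v"] = n, v
        return True
    return False
```

The key safety step is in `choose_value`: once any acceptor has accepted a value, later proposers are forced to carry that value forward rather than introduce their own.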

Basic Paxos Algorithm Overview

The algorithm ensures that multiple nodes reach consensus on a single value.

Paxos diagram
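The full single-decree round can be simulated end to end. This sketch omits networking, failures, and retries, and assumes an in-process list of acceptors; all names are illustrative.

```python
def new_acceptor():
    return {"promised_n": -1, "accepted_n": -1, "accepted_v": None}

def prepare(acc, n):
    if n > acc["promised_n"]:
        acc["promised_n"] = n
        return True, acc["accepted_n"], acc["accepted_v"]
    return False, acc["promised_n"], None

def accept(acc, n, v):
    if n >= acc["promised_n"]:
        acc["promised_n"], acc["accepted_n"], acc["accepted_v"] = n, n, v
        return True
    return False

def propose(acceptors, n, value):
    """Run one full Basic Paxos round; return the chosen value, or None
    if no majority quorum was reached in either phase."""
    quorum = len(acceptors) // 2 + 1
    granted = [(an, av) for ok, an, av in (prepare(a, n) for a in acceptors) if ok]
    if len(granted) < quorum:
        return None
    # Adopt the value of the highest-numbered prior acceptance, if any.
    prior = [(an, av) for an, av in granted if av is not None]
    if prior:
        value = max(prior, key=lambda t: t[0])[1]
    accepted = sum(accept(a, n, value) for a in acceptors)
    return value if accepted >= quorum else None
```

Running two rounds against the same acceptors shows the safety property in action: a second proposer with a higher number and a different value still ends up choosing the first value.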

Multi‑Paxos Design

Multi‑Paxos runs a sequence of Basic Paxos instances to reach consensus on a series of values. A leader node is elected to handle client writes, driving the accept phase across acceptors, while reads are served directly from the leader.

Using a leader reduces proposal conflicts and improves performance, but introduces a single‑point‑of‑failure and a write‑throughput bottleneck.

To achieve high availability, proposers can be clustered, persisting proposals so that redundant nodes can learn and recover data.
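The leader-based design above can be illustrated with a toy leader that assigns each command a log slot. This is a sketch under strong simplifying assumptions: a stable, already-elected leader whose accepts always reach a majority. `MultiPaxosLeader` and `epoch` are names invented here.

```python
class MultiPaxosLeader:
    """Toy Multi-Paxos leader: one Basic Paxos instance per log slot.

    A stable leader can skip the prepare phase for each slot and send
    accept requests directly, which is the main performance win.
    """
    def __init__(self, epoch):
        self.epoch = epoch  # leader's proposal number, fixed for its term
        self.log = []       # chosen commands, ordered by slot

    def propose(self, command):
        slot = len(self.log)
        # In a real system this would broadcast accept(epoch, slot, command)
        # and wait for a majority of acceptors; here we assume success.
        self.log.append(command)
        return slot
```

Slot numbers give every replica the same total order of commands, which is what the state-machine layer below relies on.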

Multi-Paxos leader diagram

State Machine Strategy

To guarantee that commands execute in the same order on every replica, a replicated state machine records each proposal number and chosen value. Each server acts as proposer, acceptor, and learner; before proposing, a server learns the latest instance and proposes with a higher number.

Leader election reduces the need for each server to learn before proposing, improving performance.
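The state-machine step can be sketched as replaying the chosen log in slot order. The command format `("set", key, value)` is purely illustrative; any deterministic operation works, since determinism is what makes every replica converge to the same state.

```python
def apply_log(log):
    """Replay chosen commands in slot order against a key-value state machine.

    Because every replica applies the same commands in the same order,
    all replicas deterministically reach the same state.
    """
    state = {}
    for op, key, val in log:
        if op == "set":
            state[key] = val
    return state
```

For example, replaying `[("set","x",1), ("set","y",2), ("set","x",3)]` yields `{"x": 3, "y": 2}` on every replica.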

State machine replication diagram

For readers interested in the formal proof of Paxos, refer to Lamport’s paper.

Written by

Xiaokun's Architecture Exploration Notes

10 years of backend architecture design | AI engineering infrastructure, storage architecture design, and performance optimization | Former senior developer at NetEase, Douyu, Inke, etc.
