
Understanding the Byzantine Generals Problem and the Raft Consensus Algorithm

This article explains the Byzantine Generals problem, its fault‑tolerance limits, and how the Raft consensus algorithm solves a simplified version of the problem through leader election, log replication, and safety mechanisms, while also comparing Raft with Paxos, ZAB, and PBFT and providing Go code examples.

JD Tech

The Byzantine Generals problem describes how unreliable communication and malicious participants can prevent a distributed system from reaching agreement. The classic result sets a hard limit: with n participants, consensus is possible only if at most f are faulty where n ≥ 3f + 1, i.e. strictly fewer than one‑third of the nodes may be Byzantine.

In Lamport's early oral‑messages solution, Byzantine fault tolerance (BFT) relies on loyal nodes recursively exchanging and majority‑voting on reported values: if fewer than one‑third of the generals are traitors, the loyal majority can still reach the same decision.
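The n ≥ 3f + 1 bound can be made concrete with a small helper. This is an illustrative sketch (the function names are mine, not from any library), showing how many traitors a given number of generals can tolerate and the quorum size needed for agreement:

```go
package main

import "fmt"

// maxByzantineFaults returns the largest number of traitors f that
// n generals can tolerate under the oral-messages bound n >= 3f + 1.
func maxByzantineFaults(n int) int {
	return (n - 1) / 3
}

// byzantineQuorum returns the number of matching replies needed to
// be sure a majority of the honest nodes agree: 2f + 1.
func byzantineQuorum(n int) int {
	return 2*maxByzantineFaults(n) + 1
}

func main() {
	for _, n := range []int{3, 4, 7, 10} {
		fmt.Printf("n=%d: tolerates f=%d traitors, quorum=%d\n",
			n, maxByzantineFaults(n), byzantineQuorum(n))
	}
}
```

Note that three generals tolerate zero traitors, which is exactly Lamport's impossibility result for n = 3, f = 1.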

Raft is a leader‑based consensus algorithm that addresses a simplified, non‑Byzantine setting: nodes may crash and messages may be lost, delayed, or duplicated, but they are never corrupted or forged. It splits consensus into three sub‑problems: leader election, log replication, and safety.

1. Leader Election – Nodes start as followers; if a follower times out it becomes a candidate, requests votes via RequestVote RPCs, and becomes leader when it obtains a majority. The election timeout is randomized (150‑300 ms) to avoid split votes.

2. Log Replication – The elected leader receives client commands, creates log entries <term, index, cmd>, appends them to its own log, and replicates them to followers via AppendEntries RPCs. Once a majority acknowledge, the entry is committed and applied to the state machine.

3. Safety – Raft ensures that a leader never overwrites committed entries and that a follower’s log matches the leader’s up to the committed index. Terms act as logical clocks; nodes update their term numbers on receiving higher‑term messages.

Key Go implementation skeletons (handler logic elided):

func (rf *Raft) RequestVote(request *RequestVoteRequest, response *RequestVoteResponse) {
    rf.mu.Lock()
    defer rf.mu.Unlock()
    defer rf.persist()
    // ... (logic omitted for brevity)
}

// StartElection is called with rf.mu held; the vote count is
// updated under the same lock inside each response goroutine.
// becomeLeader and becomeFollower are illustrative helper names.
func (rf *Raft) StartElection() {
    request := rf.genRequestVoteRequest()
    grantedVotes := 1 // vote for self
    rf.votedFor = rf.me
    rf.persist()
    for peer := range rf.peers {
        if peer == rf.me { continue }
        go func(peer int) {
            response := new(RequestVoteResponse)
            if rf.sendRequestVote(peer, request, response) {
                rf.mu.Lock()
                defer rf.mu.Unlock()
                // Only count the vote if we are still a candidate in the same term.
                if rf.currentTerm == request.Term && rf.state == Candidate && response.VoteGranted {
                    grantedVotes++
                    if grantedVotes > len(rf.peers)/2 {
                        rf.becomeLeader()
                    }
                } else if response.Term > rf.currentTerm {
                    // A higher term means this candidacy is stale; step down.
                    rf.becomeFollower(response.Term)
                }
            }
        }(peer)
    }
}

func (rf *Raft) AppendEntries(request *AppendEntriesRequest, response *AppendEntriesResponse) {
    rf.mu.Lock()
    defer rf.mu.Unlock()
    defer rf.persist()
    // ... (logic omitted for brevity)
}
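The leader's commit rule from the log‑replication step can also be sketched. This is a simplified illustration (the function name is mine): given each server's matchIndex, including the leader's own last index, the highest index replicated on a majority is the median of the sorted values. In full Raft the leader additionally checks that the entry at that index carries its current term before committing.

```go
package main

import (
	"fmt"
	"sort"
)

// majorityMatchIndex returns the highest log index known to be
// replicated on a majority of servers. Sorting the matchIndex
// values ascending, the element at position (n-1)/2 is matched or
// exceeded by a strict majority of servers.
func majorityMatchIndex(matchIndex []int) int {
	sorted := append([]int(nil), matchIndex...)
	sort.Ints(sorted)
	return sorted[(len(sorted)-1)/2]
}

func main() {
	// Five servers: the leader is at index 7, followers lag behind.
	// Indices up to 5 are on three of five servers, so 5 is committable.
	fmt.Println(majorityMatchIndex([]int{7, 5, 5, 3, 2})) // prints 5
}
```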

Other consensus algorithms are briefly introduced:

Paxos – the original algorithm by Leslie Lamport, used in systems like Google’s Chubby.

ZAB – ZooKeeper’s atomic broadcast protocol, similar in flow to Raft.

PBFT – Practical Byzantine Fault Tolerance, suitable for permissioned blockchains.

In summary, Raft provides strong consistency, decentralization, and high availability by separating consensus into leader election and log replication, making it a widely adopted solution for building reliable distributed services.

Tags: distributed systems, Raft, consensus algorithm, leader election, Paxos, log replication, Byzantine Generals
Written by JD Tech

Official JD technology sharing platform. All the cutting‑edge JD tech, innovative insights, and open‑source solutions you’re looking for, all in one place.
