Implementing Log Snapshotting in Raft: A Step‑by‑Step Guide
This article is a step-by-step tutorial on adding log snapshotting to a Raft-based distributed key-value store, written as part of a project based on MIT 6.824. It explains the motivation, walks through the snapshot mechanism, and presents Go code for generating, transferring, applying, and persisting snapshots to bound log growth and improve performance.
In Raft, each node keeps an ever-growing log. Snapshotting replaces old log entries with a compact snapshot of the state machine at a specific index, saving storage and reducing the amount of data that must be transmitted to lagging followers.
The snapshot mechanism has five steps:
1. Generate a snapshot on the leader.
2. Transfer the snapshot to followers.
3. Apply the snapshot to the follower's state machine.
4. Update the log to discard entries covered by the snapshot.
5. Persist the snapshot for crash recovery.
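The code later in the article calls helper methods such as `rl.idx()` and `rl.size()` that the excerpts don't show. Here is a minimal sketch of what those helpers might look like, assuming the layout the article describes: `tailLog[0]` is a dummy entry sitting at `snapLastIdx`, so logical index `i` maps to `tailLog[i - snapLastIdx]`. The `at` helper is an illustrative addition, not from the article.

```go
package main

import "fmt"

// LogEntry mirrors the entry type used by the article's RaftLog.
type LogEntry struct {
	Term    int
	Command interface{}
}

// RaftLog as defined in the article: tailLog[0] is a dummy entry at
// snapLastIdx that only carries snapLastTerm.
type RaftLog struct {
	snapLastIdx  int
	snapLastTerm int
	snapshot     []byte
	tailLog      []LogEntry
}

// size returns one past the last logical index stored in the log.
func (rl *RaftLog) size() int {
	return rl.snapLastIdx + len(rl.tailLog)
}

// idx maps a logical (global) log index to a position in tailLog.
// Valid logical indexes are [snapLastIdx, snapLastIdx+len(tailLog)-1].
func (rl *RaftLog) idx(logicIdx int) int {
	if logicIdx < rl.snapLastIdx || logicIdx >= rl.size() {
		panic(fmt.Sprintf("index %d out of [%d, %d]",
			logicIdx, rl.snapLastIdx, rl.size()-1))
	}
	return logicIdx - rl.snapLastIdx
}

// at fetches the entry at a logical index (hypothetical helper).
func (rl *RaftLog) at(logicIdx int) LogEntry {
	return rl.tailLog[rl.idx(logicIdx)]
}

func main() {
	rl := &RaftLog{
		snapLastIdx:  5,
		snapLastTerm: 2,
		tailLog: []LogEntry{
			{Term: 2}, // dummy entry at logical index 5
			{Term: 3}, // logical index 6
			{Term: 3}, // logical index 7
		},
	}
	fmt.Println(rl.size())     // 8
	fmt.Println(rl.idx(6))     // 1
	fmt.Println(rl.at(7).Term) // 3
}
```

Keeping all index translation inside these helpers means the rest of the Raft code can continue to think in global log indexes.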
RaftLog structure:

```go
type RaftLog struct {
	snapLastIdx  int
	snapLastTerm int

	// the snapshot covers indexes [1, snapLastIdx]
	snapshot []byte

	// tailLog[0] is a dummy entry at `snapLastIdx` that only carries snapLastTerm;
	// the entries in (snapLastIdx, snapLastIdx+len(tailLog)-1] hold real data
	tailLog []LogEntry
}
```

Snapshot creation method (`Raft.Snapshot`) with safety checks:
```go
func (rf *Raft) Snapshot(index int, snapshot []byte) {
	rf.mu.Lock()
	defer rf.mu.Unlock()
	LOG(rf.me, rf.currentTerm, DSnap, "Snap on %d", index)
	if index > rf.commitIndex {
		LOG(rf.me, rf.currentTerm, DSnap, "Couldn't snapshot before CommitIdx: %d>%d", index, rf.commitIndex)
		return
	}
	if index <= rf.log.snapLastIdx {
		LOG(rf.me, rf.currentTerm, DSnap, "Already snapshot in %d<=%d", index, rf.log.snapLastIdx)
		return
	}
	rf.log.doSnapshot(index, snapshot)
	// todo persist
}
```

The `doSnapshot` implementation updates the snapshot metadata and rebuilds the tail log after discarding the entries the snapshot now covers:
```go
func (rl *RaftLog) doSnapshot(index int, snapshot []byte) {
	// translate the logical index into a tailLog position first,
	// before snapLastIdx is overwritten below
	idx := rl.idx(index)

	rl.snapLastTerm = rl.tailLog[idx].Term
	rl.snapLastIdx = index
	rl.snapshot = snapshot

	// rebuild the tail log: a dummy entry carrying snapLastTerm,
	// followed by the entries after the snapshot point
	newLog := make([]LogEntry, 0, rl.size()-rl.snapLastIdx)
	newLog = append(newLog, LogEntry{Term: rl.snapLastTerm})
	newLog = append(newLog, rl.tailLog[idx+1:]...)
	rl.tailLog = newLog
}
```

The leader sends a snapshot when a follower's nextIndex falls behind the leader's snapshot point:
```go
func (rf *Raft) startReplication(term int) bool {
	// ...
	prevIdx := rf.nextIndex[peer] - 1
	if prevIdx < rf.log.snapLastIdx {
		args := &InstallSnapshotArgs{
			Term:              rf.currentTerm,
			LeaderId:          rf.me,
			LastIncludedIndex: rf.log.snapLastIdx,
			LastIncludedTerm:  rf.log.snapLastTerm,
			Snapshot:          rf.log.snapshot,
		}
		LOG(rf.me, rf.currentTerm, DDebug, "-> S%d, InstallSnap, Args=%v", peer, args.String())
		go installOnPeer(peer, term, args)
		continue
	}
	// ...
}
```

The follower handles the InstallSnapshot RPC with term checks, duplicate-snapshot detection, installation, and a signal to the apply goroutine:
```go
func (rf *Raft) InstallSnapshot(args *InstallSnapshotArgs, reply *InstallSnapshotReply) {
	rf.mu.Lock()
	defer rf.mu.Unlock()
	LOG(rf.me, rf.currentTerm, DDebug, "<- S%d, RecvSnap, Args=%v", args.LeaderId, args.String())

	// align the terms
	reply.Term = rf.currentTerm
	if args.Term < rf.currentTerm {
		LOG(rf.me, rf.currentTerm, DSnap, "<- S%d, Reject Snap, Higher Term, T%d>T%d", args.LeaderId, rf.currentTerm, args.Term)
		return
	}
	if args.Term >= rf.currentTerm {
		rf.becomeFollowerLocked(args.Term)
	}

	// ignore a snapshot that is no newer than the local one
	if rf.log.snapLastIdx >= args.LastIncludedIndex {
		LOG(rf.me, rf.currentTerm, DSnap, "<- S%d, Reject Snap, Already installed, Last: %d>=%d", args.LeaderId, rf.log.snapLastIdx, args.LastIncludedIndex)
		return
	}

	// install the snapshot and wake the apply goroutine
	rf.log.installSnapshot(args.LastIncludedIndex, args.LastIncludedTerm, args.Snapshot)
	rf.applyCond.Signal()
}
```

The apply loop distinguishes between normal log entries and a pending snapshot, sending the appropriate ApplyMsg and updating lastApplied and commitIndex:
```go
// inside applicationTicker()
if snapPendingApply {
	rf.applyCh <- ApplyMsg{
		SnapshotValid: true,
		Snapshot:      rf.log.snapshot,
		SnapshotIndex: rf.log.snapLastIdx,
		SnapshotTerm:  rf.log.snapLastTerm,
	}
} else {
	// send normal log entries
}
```

Persistence: after creating or installing a snapshot, the Raft state and the snapshot byte slice are saved together via `persister.Save`, so both survive crashes.
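The article doesn't show the persistence code itself, so here is only a sketch: a toy in-memory `Persister` with the `Save(raftstate, snapshot)` shape described above, with the standard library's `encoding/gob` standing in for the lab's `labgob`, and an illustrative `PersistedState` whose field names are assumptions rather than the author's.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// Toy in-memory stand-in for the lab's Persister; the real one
// durably stores both blobs and returns them after a restart.
type Persister struct {
	raftstate []byte
	snapshot  []byte
}

// Save stores the Raft state and the snapshot together, atomically
// from the caller's point of view.
func (p *Persister) Save(raftstate, snapshot []byte) {
	p.raftstate = raftstate
	p.snapshot = snapshot
}

type LogEntry struct {
	Term    int
	Command string
}

// PersistedState holds the fields Raft must recover after a crash:
// the usual term/vote plus the snapshot metadata and the tail log.
// (Field names are illustrative assumptions.)
type PersistedState struct {
	CurrentTerm  int
	VotedFor     int
	SnapLastIdx  int
	SnapLastTerm int
	TailLog      []LogEntry
}

func encodeState(s PersistedState) []byte {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(s); err != nil {
		panic(err)
	}
	return buf.Bytes()
}

func decodeState(data []byte) PersistedState {
	var s PersistedState
	if err := gob.NewDecoder(bytes.NewReader(data)).Decode(&s); err != nil {
		panic(err)
	}
	return s
}

func main() {
	p := &Persister{}
	state := PersistedState{
		CurrentTerm: 4, VotedFor: 1,
		SnapLastIdx: 5, SnapLastTerm: 2,
		TailLog: []LogEntry{{Term: 2}, {Term: 3, Command: "put x=1"}},
	}
	// persist state and snapshot together, as the article describes
	p.Save(encodeState(state), []byte("kv-state-at-index-5"))

	// crash recovery: read both blobs back
	got := decodeState(p.raftstate)
	fmt.Println(got.SnapLastIdx, got.SnapLastTerm, string(p.snapshot))
	// prints "5 2 kv-state-at-index-5"
}
```

Saving the two blobs in one call matters: persisting them separately could leave a state on disk whose `snapLastIdx` points at a snapshot that was never written.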
Rare Earth Juejin Tech Community