
Key Distributed System Concepts: Bloom Filter, Consistent Hashing, Quorum, Leader/Follower, and More

This article explains essential distributed‑system concepts such as Bloom filters, consistent hashing, quorum, leader/follower roles, heartbeats, fencing, write‑ahead logs, segmented logs, high‑water marks, leases, gossip protocols, Phi accrual failure detection, split‑brain handling, checksums, the CAP and PACELC theorems, hinted handoff, read repair, and Merkle trees, illustrating each with practical examples and diagrams.


Bloom Filter : A space‑efficient probabilistic data structure used to test whether an element is a member of a set, reducing disk reads in systems like BigTable and Cassandra by quickly filtering out non‑existent keys.
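A minimal Python sketch of the idea (sizes and names are illustrative): k bit positions are derived from each key, set on insert, and checked on lookup. A zero bit proves absence; all-ones only suggests presence.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k hash positions over an m-bit array."""
    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.bits = bytearray(m)  # one byte per bit, for simplicity

    def _positions(self, item):
        # Derive k positions via double hashing of the SHA-256 digest.
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big")
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = 1

    def might_contain(self, item):
        # False means definitely absent; True means possibly present.
        return all(self.bits[p] for p in self._positions(item))
```

A store like Cassandra consults such a filter before touching disk: if `might_contain` returns False, the SSTable read is skipped entirely.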

Consistent Hashing : Allows easy scaling by hashing data item keys onto a ring and assigning each item to the first node clockwise, providing incremental stability where only neighboring nodes are affected by node joins or leaves.
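A small Python sketch of a hash ring, including virtual nodes (a common refinement that smooths load); node and key names are illustrative. Removing a node only remaps the keys it owned:

```python
import bisect
import hashlib

class HashRing:
    """Consistent-hash ring with virtual nodes."""
    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (hash, node)
        for n in nodes:
            self.add_node(n)

    @staticmethod
    def _hash(key):
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def add_node(self, node):
        # Place `vnodes` points for this node around the ring.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove_node(self, node):
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def get_node(self, key):
        # The first virtual node clockwise from the key's hash owns it.
        h = self._hash(key)
        idx = bisect.bisect(self._ring, (h, chr(0x10FFFF)))
        return self._ring[idx % len(self._ring)][1]
```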

Quorum : The minimum number of servers that must successfully complete an operation before it is considered successful in a distributed environment; used by Cassandra (write quorum), Chubby (Paxos), and Dynamo (loose quorum).
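The arithmetic is simple and worth making explicit. With n replicas, a majority quorum is n // 2 + 1, and a read quorum R plus a write quorum W give strong consistency when R + W > N, because every read set then overlaps every write set:

```python
def majority_quorum(n):
    """Smallest number of replicas forming a majority of n."""
    return n // 2 + 1

def quorums_overlap(n, r, w):
    # R + W > N guarantees every read quorum intersects
    # every write quorum, so a read sees the latest write.
    return r + w > n
```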

Leader and Follower : To achieve fault tolerance, a cluster elects a leader that makes decisions for the whole system and propagates them to followers; the system does not accept client requests until a leader is elected.

Heartbeat : A periodic signal a node (typically the leader) sends to show it is alive; if heartbeats stop arriving within a timeout, the node is presumed failed and, for a leader, a new election is triggered.
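The detection side can be sketched in a few lines of Python (the injectable clock is just a convenience for testing; names are illustrative):

```python
import time

class HeartbeatMonitor:
    """Flags a peer as failed if no heartbeat arrives within `timeout` seconds."""
    def __init__(self, timeout=1.0, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock
        self.last_seen = {}

    def record(self, node):
        # Called whenever a heartbeat from `node` is received.
        self.last_seen[node] = self.clock()

    def is_alive(self, node):
        last = self.last_seen.get(node)
        return last is not None and self.clock() - last < self.timeout
```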

Fencing : Prevents a failed leader that might still be running from accessing cluster resources; implemented via resource fencing (blocking access to critical resources) or node fencing (power‑off or reset).
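A common resource-fencing pattern is the fencing token: each elected leader gets a monotonically increasing number, and the protected resource rejects any request carrying a token older than the highest it has seen. A minimal sketch (class names are illustrative):

```python
class FencedStore:
    """Storage that fences off deposed leaders via monotonically
    increasing fencing tokens (e.g., the leader's generation number)."""
    def __init__(self):
        self.highest_token = -1
        self.data = {}

    def write(self, token, key, value):
        if token < self.highest_token:
            # A newer leader has written; this request is from a
            # zombie leader and must be rejected.
            raise PermissionError("stale fencing token")
        self.highest_token = max(self.highest_token, token)
        self.data[key] = value
```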

Write‑Ahead Log (WAL) : A technique borrowed from databases: every operation is appended to a durable log before it is applied to the node’s main data files, so a node can recover from a crash by replaying the log.
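A toy Python sketch of the append-then-apply discipline (the JSON record format is an assumption for illustration): each operation is flushed and fsynced to the log first; recovery just replays the file.

```python
import json
import os

class SimpleWAL:
    """Toy write-ahead log: append and fsync each operation,
    then rebuild state by replaying the log after a crash."""
    def __init__(self, path):
        self.path = path

    def append(self, op):
        with open(self.path, "a") as f:
            f.write(json.dumps(op) + "\n")
            f.flush()
            os.fsync(f.fileno())  # durable before the op is acknowledged

    def replay(self):
        state = {}
        if os.path.exists(self.path):
            with open(self.path) as f:
                for line in f:
                    op = json.loads(line)
                    if op["type"] == "set":
                        state[op["key"]] = op["value"]
        return state
```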

Segmented Log : Splits a large log file into smaller segments to avoid performance bottlenecks and simplify cleanup; logs roll over after reaching a size limit.
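The roll-over logic is easy to show with in-memory segments (segment size and names are illustrative): once the active segment is full, a new one is started, and cleanup becomes dropping whole closed segments.

```python
class SegmentedLog:
    """Log split into fixed-size segments; rolls over when full."""
    def __init__(self, max_entries=3):
        self.max_entries = max_entries
        self.segments = [[]]

    def append(self, entry):
        if len(self.segments[-1]) >= self.max_entries:
            self.segments.append([])  # roll over to a fresh segment
        self.segments[-1].append(entry)

    def truncate_before(self, n_segments):
        # Cleanup is cheap: delete whole old segments.
        self.segments = self.segments[n_segments:] or [[]]
```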

High‑Water Mark : Tracks the last log entry on the leader that has been replicated to a quorum of followers; Kafka uses it to expose only committed messages to consumers.
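One way to compute it, sketched in Python: sort all members' log end offsets and take the quorum-th highest, which is the largest offset replicated to at least a quorum of members.

```python
def high_water_mark(leader_end_offset, follower_offsets, quorum_size):
    """Highest log offset present on at least `quorum_size` members
    (leader included). Only entries up to the HWM are exposed to readers."""
    offsets = sorted(follower_offsets + [leader_end_offset], reverse=True)
    return offsets[quorum_size - 1]
```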

Lease : A lock‑like mechanism with a limited lifetime; clients can renew the lease before it expires. Chubby uses leases to maintain timed sessions with the leader.
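A minimal Python sketch of a renewable lease (the injectable clock is for testability; names are illustrative):

```python
import time

class Lease:
    """Time-limited lock: valid until expiry unless renewed by its holder."""
    def __init__(self, holder, ttl, clock=time.monotonic):
        self.holder, self.ttl, self.clock = holder, ttl, clock
        self.expires_at = clock() + ttl

    def renew(self):
        # Holder extends the lease before it expires.
        self.expires_at = self.clock() + self.ttl

    def is_valid(self):
        return self.clock() < self.expires_at
```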

Gossip Protocol : A peer‑to‑peer communication method where each node periodically exchanges state information with a random peer, disseminating updates throughout the cluster.
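A toy simulation of one gossip variant (pairwise exchange of versioned state, keeping the per-key maximum; the synchronous rounds and names are simplifying assumptions), showing how an update spreads to every node:

```python
import random

def gossip_round(states, rng):
    """One synchronous gossip round: every node exchanges its version
    map with one random peer and both keep the per-key maximum."""
    nodes = list(states)
    for node in nodes:
        peer = rng.choice([n for n in nodes if n != node])
        merged = {k: max(states[node].get(k, 0), states[peer].get(k, 0))
                  for k in states[node].keys() | states[peer].keys()}
        states[node] = dict(merged)
        states[peer] = dict(merged)
```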

Phi Accrual Failure Detection : An adaptive algorithm that outputs a suspicion level rather than a binary up/down status; Cassandra employs it to assess node health.
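A simplified sketch of the phi calculation, assuming exponentially distributed heartbeat inter-arrival times (the real algorithm fits a distribution to a sliding window of samples): phi is minus the log-probability that a heartbeat would arrive this late, so suspicion grows smoothly with silence.

```python
import math

def phi(time_since_last, intervals):
    """Simplified phi accrual: phi = -log10 P(next heartbeat arrives
    later than `time_since_last`), under an exponential model fitted
    to the observed inter-arrival times."""
    mean = sum(intervals) / len(intervals)
    p_later = math.exp(-time_since_last / mean)
    return -math.log10(p_later)
```

A node is suspected once phi crosses a threshold (e.g., phi = 1 means roughly a 10% chance the node is still fine, phi = 3 roughly 0.1%).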

Split‑Brain : Occurs when multiple leaders are active simultaneously; resolved using a monotonically increasing generation clock (epoch) to identify the true leader, as seen in Kafka and HDFS.

Checksum : Detects data corruption during transfer or storage by keeping a hash or checksum (e.g., CRC32, MD5, SHA‑256) alongside the data and verifying it on retrieval; used by HDFS and Chubby.
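The store-and-verify pattern in a few lines of Python (SHA-256 chosen for illustration; function names are illustrative):

```python
import hashlib

def store_record(data: bytes) -> dict:
    """Keep a SHA-256 digest alongside the payload."""
    return {"data": data, "checksum": hashlib.sha256(data).hexdigest()}

def verify_record(record: dict) -> bool:
    # Recompute and compare; a mismatch means silent corruption.
    return hashlib.sha256(record["data"]).hexdigest() == record["checksum"]
```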

CAP Theorem : States that, when a network partition occurs, a distributed system must choose between Consistency and Availability; the popular “pick two of three” phrasing is a simplification, since partition tolerance is not optional in practice. Dynamo is an AP system, while BigTable is CP.

PACELC Theorem : Extends CAP: if a Partition occurs, the system trades Availability against Consistency (PAC); Else, in normal operation, it trades Latency against Consistency (LC). This explains why systems must balance latency and consistency even when no partition is happening.

Hinted Handoff : When a replica node is down, the coordinator stores the writes it missed as hints; once the node recovers, the stored writes are replayed to it.
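A minimal Python sketch of the mechanism (class and method names are illustrative): writes destined for a down replica are queued as hints and replayed on recovery.

```python
class Coordinator:
    """Hinted handoff sketch: buffer writes for down replicas,
    replay them when the replica comes back."""
    def __init__(self, replicas):
        self.replicas = replicas  # name -> dict, or None while down
        self.hints = {}           # name -> list of pending (key, value)

    def write(self, key, value):
        for name, store in self.replicas.items():
            if store is None:     # replica is down: keep a hint
                self.hints.setdefault(name, []).append((key, value))
            else:
                store[key] = value

    def on_recover(self, name, store):
        self.replicas[name] = store
        for key, value in self.hints.pop(name, []):
            store[key] = value    # replay the missed writes
```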

Read Repair : During a read, the system compares replicas and updates stale copies, ensuring eventual consistency; used by Cassandra and Dynamo.
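A sketch of the repair step, assuming each replica stores (value, version) pairs (version numbers stand in for the vector clocks or timestamps real systems use): the read returns the newest version and writes it back to any stale replica.

```python
def read_with_repair(replicas, key):
    """Return the newest value among replicas and overwrite
    any replica holding a stale (lower-versioned) copy."""
    missing = (None, -1)
    newest = max((r.get(key, missing) for r in replicas),
                 key=lambda vv: vv[1])
    for r in replicas:
        if r.get(key, missing)[1] < newest[1]:
            r[key] = newest  # repair the stale replica in-place
    return newest[0]
```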

Merkle Trees : Binary hash trees where each internal node is the hash of its children; they enable efficient anti‑entropy checks by comparing root hashes and recursing only on differing sub‑trees.
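Computing the root hash is enough to illustrate the anti-entropy check: two replicas exchange roots, and only if they differ do they recurse into subtrees. A minimal sketch (odd levels duplicate the last node, one common convention):

```python
import hashlib

def _h(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def merkle_root(leaves):
    """Root hash of a binary Merkle tree over the given leaf values."""
    level = [_h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate last node on odd levels
        level = [_h((level[i] + level[i + 1]).encode())
                 for i in range(0, len(level), 2)]
    return level[0]
```

Two replicas whose key ranges hash to the same root need no further comparison; a differing root pinpoints divergence by walking down only the mismatched branches.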

Tags: distributed systems, CAP theorem, Bloom filter, consistent hashing, Merkle tree, leader election, quorum
Written by

Architecture Digest

Focusing on Java backend development, covering application architecture at top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.
