Key Distributed System Design Patterns and Concepts
This article introduces essential distributed system design patterns such as Bloom filters, consistent hashing, quorum, leader‑follower architecture, heartbeat, fencing, write‑ahead logs, segmented logs, high‑water marks, leases, gossip protocol, Phi accrual failure detection, split‑brain handling, checksums, the CAP and PACELC theorems, hinted handoff, read repair, and Merkle trees, explaining the purpose and operation of each.
1. Bloom Filter
A Bloom filter is a space‑efficient probabilistic data structure used to test whether an element is a member of a set. It can return false positives but never false negatives, so it answers "possibly in the set" or "definitely not in the set" — useful when only membership checks are needed.
In systems like BigTable (and Cassandra), a read may have to check many on‑disk SSTables for a key, causing many disk accesses. A per‑SSTable Bloom filter lets a read skip any SSTable that definitely does not contain the key, reducing these disk reads.
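As a minimal sketch (not BigTable's actual implementation), a Bloom filter can be built from k hash functions over an m‑bit array; the parameters and class name below are illustrative:

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter sketch: k hash positions over an m-bit array.
    May return false positives, never false negatives."""

    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.bits = 0                      # m-bit array packed into an int

    def _positions(self, item):
        # Derive k positions by salting one hash function with an index
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        # If any position is unset, the item was definitely never added
        return all(self.bits >> pos & 1 for pos in self._positions(item))
```

In the SSTable scenario, a read would consult `might_contain` before touching disk and skip the file entirely on a negative answer.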
2. Consistent Hashing
Consistent hashing enables easy scaling by hashing data item keys to positions on a ring and assigning each item to the first node encountered clockwise, providing incremental stability where only neighboring nodes are affected by node joins or departures.
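The ring can be sketched as a sorted list of node hashes searched with binary search; the node names below are placeholders, and real systems typically add virtual nodes for better balance:

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    """Map a key or node name to a position on the ring."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

class HashRing:
    """Sketch of a consistent-hash ring without virtual nodes."""

    def __init__(self, nodes=()):
        self._ring = sorted((_hash(n), n) for n in nodes)

    def add(self, node):
        bisect.insort(self._ring, (_hash(node), node))

    def remove(self, node):
        self._ring.remove((_hash(node), node))

    def node_for(self, key):
        # First node clockwise from the key's position (wrapping around)
        hashes = [h for h, _ in self._ring]
        i = bisect.bisect_right(hashes, _hash(key)) % len(self._ring)
        return self._ring[i][1]
```

The incremental-stability property shows up directly: removing one node only reassigns the keys that node owned, while every other key keeps its owner.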
3. Quorum
In a distributed environment, a quorum is the minimum number of servers that must successfully complete an operation before it is considered successful.
Cassandra can be configured to report a write as successful only after it reaches a quorum of replica nodes; leader‑election systems like Chubby use Paxos with quorum for strong consistency; Dynamo uses a "sloppy quorum" for writes.
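Two standard quorum calculations can be sketched as follows: the majority quorum for n replicas, and the tunable‑consistency check that read and write quorums overlap (function names here are illustrative):

```python
def majority_quorum(n: int) -> int:
    """Smallest number of replicas that forms a majority of n."""
    return n // 2 + 1

def is_strongly_consistent(n: int, w: int, r: int) -> bool:
    """R + W > N guarantees every read quorum intersects every write
    quorum, so a read always sees at least one up-to-date replica."""
    return r + w > n
```

For example, with N=3 replicas, W=2 and R=2 overlap, while W=1 and R=1 may miss the latest write.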
4. Leader and Follower
To achieve fault tolerance, data is replicated across multiple servers, and one server is elected as the leader to make decisions and propagate them to followers. In a 3‑5 node cluster, leader election occurs at startup, and the system rejects client requests until a leader is chosen.
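As a toy sketch only (real systems use a consensus protocol such as Raft, Paxos, or ZAB), the cluster below elects the live node with the highest id as leader and rejects client requests until election completes; all names are illustrative:

```python
class Cluster:
    """Toy leader election: the live node with the highest id wins."""

    def __init__(self, node_ids):
        self.live = set(node_ids)
        self.leader = None                 # no leader until elect() runs

    def elect(self):
        self.leader = max(self.live) if self.live else None
        return self.leader

    def handle_request(self, request):
        if self.leader is None:
            # Mirrors the behavior described above: reject until elected
            raise RuntimeError("no leader elected; rejecting client request")
        return f"node {self.leader} handled {request}"
```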
5. Heartbeat
The heartbeat mechanism detects leader failure so a new leader election can be triggered.
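A simple timeout‑based heartbeat monitor can be sketched as follows; the timeout value is illustrative, and the clock is injectable so the behavior is testable:

```python
import time

class HeartbeatMonitor:
    """Mark a peer as failed if no heartbeat arrives within `timeout`."""

    def __init__(self, timeout=0.2, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock
        self.last_seen = {}                # node -> time of last heartbeat

    def beat(self, node):
        self.last_seen[node] = self.clock()

    def is_alive(self, node):
        last = self.last_seen.get(node)
        return last is not None and self.clock() - last < self.timeout
```

When `is_alive` turns false for the leader, followers would trigger a new election.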
6. Fencing
When a leader fails, fencing prevents the old leader from accessing cluster resources, using either resource fencing (blocking access to essential resources) or node fencing (power‑off or reset).
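A common form of resource fencing uses monotonically increasing fencing tokens: the resource rejects any request carrying a token older than one it has already seen, so a deposed leader's stale writes are blocked. A minimal sketch (class name illustrative):

```python
class FencedStore:
    """Resource that rejects writes carrying a stale fencing token."""

    def __init__(self):
        self.highest_token = 0
        self.data = {}

    def write(self, token, key, value):
        if token < self.highest_token:
            # An old leader presenting an outdated token is fenced off
            raise PermissionError(f"stale fencing token {token}")
        self.highest_token = token
        self.data[key] = value
```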
7. Write‑Ahead Log (WAL)
WAL records operations in a log before applying them to disk, inspired by database systems, allowing recovery after crashes by replaying the log.
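The append‑then‑apply discipline can be sketched with a toy key‑value store: every mutation is flushed to the log before being applied in memory, and `replay` rebuilds state after a crash. File layout and names are illustrative:

```python
import json
import os
import tempfile

class WALStore:
    """Key-value store that logs every mutation before applying it."""

    def __init__(self, path):
        self.path = path
        self.state = {}
        self.replay()                      # recover state from the log

    def set(self, key, value):
        with open(self.path, "a") as f:
            f.write(json.dumps({"key": key, "value": value}) + "\n")
            f.flush()
            os.fsync(f.fileno())           # durable before applying
        self.state[key] = value

    def replay(self):
        if os.path.exists(self.path):
            with open(self.path) as f:
                for line in f:
                    rec = json.loads(line)
                    self.state[rec["key"]] = rec["value"]
```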
8. Segmented Log
Logs are split into multiple smaller files (segments) to avoid performance bottlenecks of a single large log file; old segments are periodically cleaned.
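The rollover and cleanup policy can be sketched in memory; segment size and method names are illustrative:

```python
class SegmentedLog:
    """Log split into fixed-size segments; old segments can be dropped."""

    def __init__(self, max_entries_per_segment=3):
        self.max = max_entries_per_segment
        self.segments = [[]]

    def append(self, entry):
        if len(self.segments[-1]) >= self.max:
            self.segments.append([])       # roll over to a new segment
        self.segments[-1].append(entry)

    def keep_last(self, n_segments):
        # Periodic cleanup: discard everything but the newest segments
        self.segments = self.segments[-n_segments:]
```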
9. High‑Water Mark
The high‑water mark tracks the last log entry replicated to a quorum of followers; only entries up to this index are exposed to clients. Kafka uses it to ensure consumers see only committed messages.
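Given the last index each replica has acknowledged, the high‑water mark is the highest index present on at least a quorum of replicas, which reduces to a sort‑and‑pick (function name illustrative):

```python
def high_water_mark(acked_index_per_replica, quorum_size):
    """Highest log index replicated to at least `quorum_size` replicas.

    acked_index_per_replica: last log index each replica has acknowledged.
    """
    acked = sorted(acked_index_per_replica, reverse=True)
    # The quorum_size-th highest ack is on at least quorum_size replicas
    return acked[quorum_size - 1]
```

With acks [5, 3, 7] and a quorum of 2, entries up to index 5 sit on two replicas and may be exposed to consumers; index 6 and 7 may not.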
10. Lease
A lease acts like a lock with a limited lifetime; clients must renew it before expiration. Chubby uses leases to guarantee that a leader cannot unilaterally terminate a session.
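The renew‑before‑expiry lifecycle can be sketched as follows; the clock is injectable for testing, and the names are illustrative rather than Chubby's API:

```python
class Lease:
    """Time-limited lock; the holder must renew before expiry."""

    def __init__(self, holder, duration, clock):
        self.holder = holder
        self.duration = duration
        self.clock = clock
        self.expires_at = clock() + duration

    def is_valid(self):
        return self.clock() < self.expires_at

    def renew(self):
        if not self.is_valid():
            raise RuntimeError("cannot renew an expired lease")
        self.expires_at = self.clock() + self.duration
```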
11. Gossip Protocol
Gossip is a peer‑to‑peer communication mechanism where each node periodically exchanges state information with a random peer.
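One push‑style gossip round can be sketched with versioned state maps: each node pushes its entries to one random peer, and the peer keeps whichever version is newer. This is a simplified model, not any particular system's protocol:

```python
import random

def gossip_round(states, rng=random):
    """One gossip round over `states`: node name -> {key: (version, value)}.

    Every node pushes its versioned entries to one random peer; the peer
    keeps the entry with the higher version number.
    """
    nodes = list(states)
    for node in nodes:
        peer = rng.choice([n for n in nodes if n != node])
        for key, (version, value) in states[node].items():
            if key not in states[peer] or states[peer][key][0] < version:
                states[peer][key] = (version, value)
```

Repeating rounds spreads an update epidemically until every node converges on the newest version.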
12. Phi Accrual Failure Detection
This algorithm adapts its failure detection threshold based on historical heartbeat intervals, outputting a suspicion level rather than a binary up/down status; Cassandra uses it to assess node health.
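A simplified phi calculation (assuming, for illustration, normally distributed heartbeat intervals rather than the paper's exact model) turns the time since the last heartbeat into a suspicion level: the longer the silence relative to history, the higher phi climbs.

```python
import math
import statistics

def phi(time_since_last_beat, intervals):
    """Suspicion level from a simplified phi accrual detector.

    `intervals` is the history of observed heartbeat inter-arrival times;
    a normal distribution over them is assumed for illustration.
    """
    mean = statistics.mean(intervals)
    std = statistics.pstdev(intervals) or 1e-9
    # Probability the next heartbeat would arrive even later than now
    z = (time_since_last_beat - mean) / (std * math.sqrt(2))
    p_later = 1 - 0.5 * (1 + math.erf(z))
    # phi = -log10 of that probability: small when on time, large when late
    return -math.log10(max(p_later, 1e-300))
```

A threshold on phi (rather than a hard timeout) then decides when a node is suspected, which is what lets the detector adapt to network conditions.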
13. Split‑Brain
Split‑brain occurs when multiple active leaders exist; a monotonically increasing generation clock helps nodes agree on the leader with the highest number.
Kafka uses an epoch number, and ZooKeeper ensures a single active NameNode in HDFS.
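The generation‑clock check can be sketched from the follower's side: any message stamped with an epoch lower than the highest one seen is rejected, so a deposed leader cannot make progress (class and method names illustrative):

```python
class Follower:
    """Follower that ignores messages from leaders with a stale epoch."""

    def __init__(self):
        self.current_epoch = 0
        self.log = []

    def on_append(self, epoch, entry):
        if epoch < self.current_epoch:
            return False                   # stale leader: reject the entry
        self.current_epoch = epoch         # adopt the newer generation
        self.log.append(entry)
        return True
```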
14. Checksum
Checksums (e.g., MD5, SHA‑1, SHA‑256) verify data integrity during transfer; systems like HDFS and Chubby store checksums alongside data.
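The store‑and‑verify pattern can be sketched with SHA‑256 from Python's standard library; the record format is illustrative:

```python
import hashlib

def store(data: bytes) -> dict:
    """Persist a data block alongside its SHA-256 checksum."""
    return {"data": data, "checksum": hashlib.sha256(data).hexdigest()}

def verify(record: dict) -> bool:
    """Recompute the checksum on read to detect silent corruption."""
    return hashlib.sha256(record["data"]).hexdigest() == record["checksum"]
```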
15. CAP Theorem
CAP states that a distributed system can provide at most two of the three guarantees: Consistency, Availability, Partition tolerance. Dynamo is AP, BigTable is CP.
16. PACELC Theorem
Extends CAP: when a partition occurs (P) trade off between Availability and Consistency (A vs C); otherwise (E) trade off between Latency and Consistency (L vs C).
17. Hinted Handoff
If a replica is down, the coordinating node stores the missed writes locally as "hints"; when the replica recovers, the stored hints are replayed (handed off) to it.
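The hint‑then‑replay flow can be sketched with in‑memory replicas (a replica set to `None` models a node that is down); all names are illustrative:

```python
class Coordinator:
    """Stores a hint when a replica is down and replays missed writes
    once the replica comes back."""

    def __init__(self, replicas):
        self.replicas = replicas           # name -> dict, or None when down
        self.hints = {}                    # name -> list of missed writes

    def write(self, key, value):
        for name, store in self.replicas.items():
            if store is None:
                self.hints.setdefault(name, []).append((key, value))
            else:
                store[key] = value

    def on_recover(self, name, store):
        self.replicas[name] = store
        for key, value in self.hints.pop(name, []):
            store[key] = value             # hand off the stored hints
```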
18. Read Repair
During reads, the system compares replicas and pushes newer data to nodes with stale copies; Cassandra and Dynamo use this mechanism.
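With per‑key versions (a stand‑in for the vector clocks or timestamps real systems use), read repair reduces to: pick the newest replica value, push it to any replica that is behind, and return it. A minimal sketch:

```python
def read_with_repair(replicas, key):
    """Return the newest (version, value) for `key` across `replicas`
    and push it back to any replica holding a stale or missing copy."""
    missing = (0, None)
    newest = max((r.get(key, missing) for r in replicas), key=lambda vv: vv[0])
    for r in replicas:
        if r.get(key, missing)[0] < newest[0]:
            r[key] = newest                # repair the stale replica
    return newest
```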
19. Merkle Trees
Merkle trees are hash‑based binary trees that enable efficient comparison of large data sets by comparing root hashes and recursively checking mismatched sub‑trees, used by Dynamo for anti‑entropy.
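The build‑and‑compare procedure can be sketched as follows, assuming (for simplicity) two equal‑sized sets of data blocks: build the hash levels bottom‑up, then descend only into subtrees whose hashes disagree.

```python
import hashlib

def _h(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

def merkle_levels(leaves):
    """Build all levels of a Merkle tree from leaf data blocks.
    Returns [leaf_hashes, ..., [root_hash]]."""
    level = [_h(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        pairs = [level[i:i + 2] for i in range(0, len(level), 2)]
        level = [_h(b"".join(p)) for p in pairs]
        levels.append(level)
    return levels

def diverging_leaves(a_levels, b_levels, level, i):
    """Indices of differing leaves, visiting only mismatched subtrees."""
    if a_levels[level][i] == b_levels[level][i]:
        return []                          # whole subtree matches: skip it
    if level == 0:
        return [i]                         # a mismatched leaf to sync
    out = []
    for child in (2 * i, 2 * i + 1):
        if child < len(a_levels[level - 1]):
            out += diverging_leaves(a_levels, b_levels, level - 1, child)
    return out
```

When the roots match, no further comparison is needed, which is why anti‑entropy with Merkle trees transfers so little data between mostly identical replicas.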
Architect's Guide
Dedicated to sharing programmer-architect skills—Java backend, system, microservice, and distributed architectures—to help you become a senior architect.