
Fundamentals of Distributed Systems: Concepts, Replication, Consistency, and Core Protocols

This article provides a comprehensive overview of distributed system fundamentals, covering concepts such as nodes, replicas, consistency models, data distribution strategies, lease and quorum mechanisms, two‑phase commit, MVCC, Paxos, and the CAP theorem, while discussing their practical engineering trade‑offs and failure handling.

IT Architects Alliance

1 Concepts

1.1 Model

A node is typically an OS process; in the model it is treated as an indivisible whole, though a process may be split into multiple logical nodes if needed.

1.2 Replica

A replica (copy) provides redundancy for data or services. Data replicas store identical data on different nodes to survive node failures; service replicas provide the same service without relying on local storage.

1.3 Consistency Models

A consistency model constrains which values reads from different replicas are allowed to return. Common models include:

Strong consistency – every read sees the most recent successful write.

Monotonic consistency – a client never reads older data after seeing newer data.

Session consistency – monotonic guarantees within a single client session.

Eventual consistency – replicas converge to the same state eventually, without a bound.

Weak consistency – no guarantee on when a read reflects a recent write.
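Monotonic and session consistency are often enforced on the client side by tracking the highest version already observed. The sketch below illustrates the idea; the class and exception names are illustrative, not from any particular library.

```python
class StaleReadError(Exception):
    """Raised when a replica answers with data older than the client has seen."""
    def __init__(self, got, expected):
        super().__init__(f"replica at version {got}, client already saw {expected}")


class MonotonicClient:
    """Client-side guard for monotonic reads: once a version has been
    observed, answers from lagging replicas are rejected instead of
    letting the client silently go back in time."""

    def __init__(self):
        self.last_seen_version = 0

    def read(self, versioned_value):
        version, value = versioned_value
        if version < self.last_seen_version:
            # A lagging replica answered; a real client would retry elsewhere.
            raise StaleReadError(version, self.last_seen_version)
        self.last_seen_version = version
        return value
```

Session consistency applies the same guard, but only to reads within one client session.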

2 Distributed System Principles

2.1 Data Distribution Methods

Three common strategies:

Hash‑based distribution – simple but suffers from poor scalability and data skew.

Range‑based distribution – partitions data by key ranges, often using dynamic splitting to balance load.

Chunk‑based (size‑based) distribution – splits a logical file into equal‑sized chunks, avoiding data skew.
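The contrast between the first two strategies can be made concrete with a short sketch (function names and split points are illustrative): hash-based placement is cheap but remaps most keys when the node count changes, while range-based placement keeps keys ordered so split points can be moved to rebalance hot ranges.

```python
import bisect
import hashlib


def hash_partition(key: str, num_nodes: int) -> int:
    """Hash-based: uniform and stateless, but changing num_nodes
    remaps almost every key (the scalability problem noted above)."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % num_nodes


def range_partition(key: str, split_points: list[str]) -> int:
    """Range-based: keys stay ordered; moving a split point migrates
    only one boundary's worth of data, enabling dynamic splitting."""
    return bisect.bisect_right(split_points, key)
```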

Consistent hashing improves hash‑based scaling by placing virtual nodes on a ring, allowing smooth node addition/removal.
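A minimal consistent-hash ring can be sketched as follows (class and method names are illustrative). Each physical node owns many virtual positions on the ring, so adding or removing a node remaps only the keys adjacent to its virtual nodes rather than nearly all keys.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Consistent-hash ring with virtual nodes: a key is owned by the
    first virtual node clockwise from the key's hash position."""

    def __init__(self, vnodes: int = 100):
        self.vnodes = vnodes
        self.ring = []  # sorted list of (hash, physical_node) pairs

    def _hash(self, key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node: str):
        for i in range(self.vnodes):
            bisect.insort(self.ring, (self._hash(f"{node}#{i}"), node))

    def remove_node(self, node: str):
        self.ring = [(h, n) for h, n in self.ring if n != node]

    def locate(self, key: str) -> str:
        # First virtual node at or clockwise past the key's hash.
        idx = bisect.bisect_right(self.ring, (self._hash(key),))
        if idx == len(self.ring):
            idx = 0  # wrap around the ring
        return self.ring[idx][1]
```

Removing a node only moves the keys it owned to the next virtual node clockwise; all other placements stay put.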

Replica Placement

Replica placement can be machine‑centric (simple but low scalability) or segment‑centric (replicating data segments across many machines for faster recovery and load balancing).

2.2 Basic Replica Protocols

Two families:

Centralized protocols – a single coordinator (e.g., primary‑secondary) handles ordering, concurrency, and consistency, but the system’s availability depends on the coordinator.

Decentralized protocols – all nodes are peers (e.g., Paxos), providing higher fault tolerance at the cost of complexity.

Primary‑Secondary Protocol

Updates flow: client → primary → (concurrency control) → secondary replicas → client response. Reads can be from any replica (eventual consistency) or only the primary (strong consistency). Primary election and failover require metadata servers and typically incur a detection window (~10 s).
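The update flow above can be sketched as a toy simulation (class names and the in-process "RPC" are illustrative): the primary assigns a global order to writes and propagates them to secondaries, which only apply updates in sequence.

```python
class Secondary:
    def __init__(self):
        self.applied_seq = 0
        self.data = {}

    def apply(self, seq, key, value):
        if seq != self.applied_seq + 1:
            return False  # gap detected: a real replica would start catch-up
        self.applied_seq = seq
        self.data[key] = value
        return True


class Primary:
    """The primary orders all updates (concurrency control), applies them
    locally, and forwards them to every secondary before acknowledging."""

    def __init__(self, secondaries):
        self.log = []
        self.data = {}
        self.secondaries = secondaries

    def write(self, key, value):
        seq = len(self.log) + 1          # single point of ordering
        self.log.append((seq, key, value))
        self.data[key] = value
        acks = sum(s.apply(seq, key, value) for s in self.secondaries)
        return acks == len(self.secondaries)  # ack client on full propagation
```

With this flow, reading only from the primary yields strong consistency, while reading any secondary yields eventual consistency.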

2.3 Lease Mechanism

Leases grant a time‑bounded permission to cache data. While a lease is valid, the server promises not to modify the data, allowing safe local caching. To modify the data, the server stops issuing new leases and waits for outstanding ones to expire; caches then self‑invalidate, so the update proceeds safely without explicit revocation messages.
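A minimal sketch of the server side (names are illustrative; `now` is injected to stand in for a clock, and real systems must account for clock skew): each read extends the key's lease, and a write is allowed only once every outstanding lease has expired.

```python
import time


class LeaseServer:
    """Grants time-bounded leases on keys. The server may only modify a
    key after all outstanding leases on it have expired, so expiry alone
    invalidates caches without any revocation round-trip."""

    def __init__(self, lease_seconds=10.0):
        self.lease_seconds = lease_seconds
        self.expiry = {}   # key -> latest outstanding lease expiry
        self.data = {}

    def read_with_lease(self, key, now=None):
        now = time.monotonic() if now is None else now
        # Track the furthest-out lease the server has promised.
        self.expiry[key] = max(self.expiry.get(key, 0.0),
                               now + self.lease_seconds)
        return self.data.get(key), self.expiry[key]

    def write(self, key, value, now=None):
        now = time.monotonic() if now is None else now
        if now < self.expiry.get(key, 0.0):
            return False  # a cache may still hold a valid lease; wait it out
        self.data[key] = value
        return True
```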

2.4 Quorum Mechanism

Writes succeed after being persisted on W out of N replicas; reads query R replicas where R > N‑W, guaranteeing that at least one read replica has the latest committed version. Quorum balances consistency, availability, and performance.
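The intersection argument can be shown in a few lines (function names are illustrative): because W + R > N, any R-replica read set overlaps every successful W-replica write set, so picking the highest version among the replicas read always surfaces the latest committed write.

```python
def check_quorum(n: int, w: int, r: int) -> bool:
    """The quorum condition: read and write sets must intersect."""
    return w + r > n


def quorum_read(replicas, r):
    """Query any r replicas and return the value carrying the highest
    version; the quorum condition guarantees at least one of them holds
    the latest committed write."""
    sampled = replicas[:r]  # stands in for 'any r reachable replicas'
    return max(sampled, key=lambda rep: rep["version"])["value"]
```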

2.5 Log Techniques

Redo logs record the result of each update before applying it to memory, enabling crash recovery by replaying logs. Checkpointing periodically dumps the full state to disk and records begin/end markers, reducing replay work.
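The write-ahead ordering is the crux: the log entry must be durable before memory is mutated. A minimal sketch (class names illustrative; a Python list stands in for an fsync'd on-disk log):

```python
class RedoLogStore:
    """In-memory KV store with a redo log. Every update is appended to
    the log before mutating memory, so replaying the log after a crash
    reconstructs exactly the pre-crash state."""

    def __init__(self):
        self.log = []   # stands in for a durable, fsync'd log file
        self.mem = {}

    def put(self, key, value):
        self.log.append((key, value))  # 1. persist intent first
        self.mem[key] = value          # 2. then apply to memory

    @classmethod
    def recover(cls, log):
        """Crash recovery: replay the log (from the last checkpoint,
        if one exists) to rebuild the in-memory state."""
        store = cls()
        for key, value in log:
            store.mem[key] = value
        store.log = list(log)
        return store
```

A checkpoint would let `recover` load the dumped state and replay only the log suffix written after the checkpoint's end marker.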

2.6 Two‑Phase Commit (2PC)

A classic centralized protocol with a coordinator and participants. Phase 1: coordinator asks participants to prepare; Phase 2: if all vote commit, the coordinator sends a global‑commit, otherwise a global‑abort. 2PC suffers from poor fault tolerance and high latency.
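The two phases can be sketched as follows (class and function names are illustrative; real 2PC also persists votes and decisions to logs so both sides can recover). Note how a single "no" vote forces a global abort, and how the coordinator is a single point of blocking, which is the fault-tolerance weakness noted above.

```python
class Participant:
    def __init__(self, can_commit=True):
        self.can_commit = can_commit
        self.state = "INIT"

    def prepare(self):
        # Phase 1 vote: a real participant would first log its vote.
        self.state = "PREPARED" if self.can_commit else "ABORTED"
        return self.can_commit

    def finish(self, commit):
        # Phase 2: apply the coordinator's global decision.
        self.state = "COMMITTED" if commit else "ABORTED"


def two_phase_commit(participants):
    votes = [p.prepare() for p in participants]   # Phase 1: collect votes
    decision = all(votes)                         # unanimity required
    for p in participants:                        # Phase 2: broadcast decision
        p.finish(decision)
    return decision
```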

2.7 MVCC (Multi‑Version Concurrency Control)

Each transaction creates a new data version; reads select the appropriate version, allowing concurrent reads and writes while preserving snapshot isolation.
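A version chain per key is enough to illustrate the mechanism (names are illustrative; real MVCC systems add transaction status checks and garbage-collect old versions). A reader pinned to a snapshot timestamp sees the newest version at or below that timestamp, so readers never block writers.

```python
class MVCCStore:
    """Multi-version store: writes append (commit_ts, value) pairs; a
    snapshot read returns the newest version visible at its timestamp."""

    def __init__(self):
        self.versions = {}  # key -> list of (commit_ts, value), ts ascending
        self.clock = 0      # logical commit timestamp generator

    def write(self, key, value):
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))
        return self.clock

    def snapshot_read(self, key, snapshot_ts):
        # Scan newest-first for the latest version within the snapshot.
        for ts, value in reversed(self.versions.get(key, [])):
            if ts <= snapshot_ts:
                return value
        return None  # key did not exist at this snapshot
```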

2.8 Paxos Protocol

A strongly consistent, decentralized consensus algorithm. A proposer issues a proposal with a monotonically increasing ballot number; acceptors promise to ignore lower-numbered ballots, and a value is chosen once a majority of acceptors (⌊N/2⌋ + 1) accept it. Learners observe accepted values.
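A simplified single-decree sketch (names illustrative; real Paxos persists acceptor state and handles message loss). The key safety rule is in Phase 1: if any acceptor already accepted a value, the new proposer must adopt the highest-ballot one instead of its own.

```python
class Acceptor:
    def __init__(self):
        self.promised = -1     # highest ballot promised
        self.accepted = None   # (ballot, value) or None

    def prepare(self, ballot):
        if ballot > self.promised:
            self.promised = ballot
            return True, self.accepted   # promise + any prior acceptance
        return False, None

    def accept(self, ballot, value):
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)
            return True
        return False


def propose(acceptors, ballot, value):
    majority = len(acceptors) // 2 + 1
    # Phase 1: gather promises from a majority.
    granted = [acc for ok, acc in (a.prepare(ballot) for a in acceptors) if ok]
    if len(granted) < majority:
        return None
    prior = [acc for acc in granted if acc is not None]
    if prior:
        value = max(prior)[1]  # must adopt the highest-ballot accepted value
    # Phase 2: ask acceptors to accept the (possibly adopted) value.
    acks = sum(a.accept(ballot, value) for a in acceptors)
    return value if acks >= majority else None
```

Notice that a later proposer with a higher ballot cannot overturn an already-chosen value; it learns and re-proposes it, which is what makes the decision stable.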

2.9 CAP Theorem

In any distributed system, at most two of the three properties, Consistency, Availability, and Partition tolerance, can be fully guaranteed at once; since partitions cannot be ruled out in practice, the real trade‑off during a partition is between C and A. Different protocols make different trade‑offs (e.g., Lease favors C + P, Quorum balances all three, 2PC sacrifices A and P, Paxos offers C with good A and P).

Additional Resources

Links to further reading on multi‑level caching, architecture separation, cloud case studies, DDD practice, and more.

Tags: distributed systems, CAP theorem, replication, consistency, consensus protocols
Written by

IT Architects Alliance

A community for discussion and exchange on system and internet architecture: large‑scale distributed, high‑availability, and high‑performance systems, as well as big data, machine learning, AI, and architecture evolution with internet technologies. Includes real‑world large‑scale architecture case studies. Open to architects who have ideas and enjoy sharing.
