Consistent Hashing: Principles, Monotonicity, and Virtual Nodes
This article explains the consistent hashing algorithm, its motivation in cache systems, the concept of monotonicity, the step‑by‑step mapping process, and how virtual nodes improve balance and resilience when cache servers are added or removed.
Consistent hashing was first introduced in the 1997 paper "Consistent hashing and random trees" and is now widely used in cache systems.
1 Basic Scenario
When you have N cache servers, a naive approach maps an object to a server using hash(object) % N . This works until a server fails (reducing N) or a new server is added (increasing N), causing massive remapping of objects.
2 Hash Algorithm and Monotonicity
Monotonicity requires that when new caches are added, existing objects remain mapped to their original caches unless necessary. The simple modulo hash fails this requirement.
3 Consistent Hashing Algorithm Principles
Consistent hashing maps both objects and caches onto the same circular hash space (0 to 2^32‑1). Objects are placed on the ring by their hash values, and each cache is also placed on the ring.
3.1 Ring Hash Space
The 32‑bit hash space is visualized as a circle where 0 and 2^32‑1 are adjacent.
3.2 Mapping Objects to the Ring
Example objects object1‑object4 receive hash keys key1‑key4:
hash(object1) = key1; … … hash(object4) = key4;3.3 Mapping Caches to the Ring
Each cache (A, B, C) is hashed to a position on the same ring:
hash(cache A) = keyA; … … hash(cache C) = keyC;3.4 Mapping Objects to Caches
Starting from an object's key and moving clockwise, the first cache encountered stores the object. This yields a deterministic mapping (e.g., object1 → cache A, object2/3 → cache C, object4 → cache B).
3.5 Cache Changes
When a cache fails, only objects that map between the failed cache and the next clockwise cache need to be remapped. When a new cache is added, only objects that fall between the new cache and its predecessor are moved.
3.5.1 Removing a Cache
If cache B fails, only object4 (which was between B and C) is remapped to cache C.
3.5.2 Adding a Cache
Adding cache D between object2 and object3 causes only object2 to move to D.
4 Virtual Nodes
To improve balance, each physical cache is represented by multiple virtual nodes (replicas) placed at different points on the ring. For example, with two replicas per cache, cache A becomes A1 and A2, and cache C becomes C1 and C2, leading to a more even distribution of objects.
5 Summary
The core idea of consistent hashing is to map both objects and caches onto a circular hash space, using virtual nodes to achieve better balance and minimal remapping when caches are added or removed.
Reference implementations are available in Java, PHP, and C (links provided in the original article).
Qunar Tech Salon
Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.