Backend Development 9 min read

Consistent Hashing: Principles, Monotonicity, and Virtual Nodes

This article explains the consistent hashing algorithm, its motivation in cache systems, the concept of monotonicity, the step‑by‑step mapping process, and how virtual nodes improve balance and resilience when cache servers are added or removed.

Qunar Tech Salon

Dec 29, 2014

Consistent Hashing: Principles, Monotonicity, and Virtual Nodes

Consistent hashing was first introduced in the 1997 paper "Consistent hashing and random trees" and is now widely used in cache systems.

1 Basic Scenario

When you have N cache servers, a naive approach maps an object to a server using hash(object) % N. This works until a server fails (reducing N) or a new server is added (increasing N), causing massive remapping of objects.

2 Hash Algorithm and Monotonicity

Monotonicity requires that when new caches are added, existing objects remain mapped to their original caches unless necessary. The simple modulo hash fails this requirement.

3 Consistent Hashing Algorithm Principles

Consistent hashing maps both objects and caches onto the same circular hash space (0 to 2^32‑1). Objects are placed on the ring by their hash values, and each cache is also placed on the ring.

3.1 Ring Hash Space

The 32‑bit hash space is visualized as a circle where 0 and 2^32‑1 are adjacent.

3.2 Mapping Objects to the Ring

Example objects object1‑object4 receive hash keys key1‑key4:

hash(object1) = key1;

… …

hash(object4) = key4;

3.3 Mapping Caches to the Ring

Each cache (A, B, C) is hashed to a position on the same ring:

hash(cache A) = keyA;

… …

hash(cache C) = keyC;

3.4 Mapping Objects to Caches

Starting from an object's key and moving clockwise, the first cache encountered stores the object. This yields a deterministic mapping (e.g., object1 → cache A, object2/3 → cache C, object4 → cache B).

3.5 Cache Changes

When a cache fails, only objects that map between the failed cache and the next clockwise cache need to be remapped. When a new cache is added, only objects that fall between the new cache and its predecessor are moved.

3.5.1 Removing a Cache

If cache B fails, only object4 (which was between B and C) is remapped to cache C.

3.5.2 Adding a Cache

Adding cache D between object2 and object3 causes only object2 to move to D.

4 Virtual Nodes

To improve balance, each physical cache is represented by multiple virtual nodes (replicas) placed at different points on the ring. For example, with two replicas per cache, cache A becomes A1 and A2, and cache C becomes C1 and C2, leading to a more even distribution of objects.

5 Summary

The core idea of consistent hashing is to map both objects and caches onto a circular hash space, using virtual nodes to achieve better balance and minimal remapping when caches are added or removed.

Reference implementations are available in Java, PHP, and C (links provided in the original article).

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Distributed Systems load balancing consistent hashing virtual nodes

Written by

Qunar Tech Salon

Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.