Consistent Hashing Algorithm: Principles, Advantages, and Applications
Consistent hashing maps both servers and keys onto a virtual hash ring, allowing keys to be assigned to the nearest clockwise server, which minimizes data movement during node addition or removal, improves load distribution with virtual nodes, and is widely used in distributed caching and load balancing systems.
Scenario Construction
Assume three cache servers named node0 , node1 , and node2 and 30 million key objects that need to be evenly distributed across the three machines.
The simplest solution is the modulo algorithm hash(key) % N , where N is the number of servers. This directly maps each key to one of the three nodes, but it suffers when the number of servers changes.
Problems with Simple Modulo
When a server is added or removed, the expression hash(key) % N yields different results, causing many keys to be remapped. For example, if a server fails and N changes from 3 to 2, most cached keys would need to be relocated, leading to cache avalanche and service disruption.
Consistent Hashing
Consistent hashing also uses a modulo operation, but it takes the modulo of a fixed space of 2^32 instead of the number of servers.
“IPv4 addresses consist of 4 groups of 8‑bit binary numbers, so using 2^32 guarantees a unique mapping for each IP address.”
The 2^32 values are imagined as points on a circular hash ring. Each server’s IP address is hashed and placed on the ring, and each key is also hashed onto the same ring.
Mapping Servers to the Hash Ring
The servers node0 , node1 , and node2 are each hashed (e.g., hash(server_ip) % 2^32 ) and placed on the ring.
Mapping Keys to the Hash Ring
Each key is hashed with hash(key) % 2^32 . To find the server for a key, move clockwise on the ring from the key’s position until the first server is encountered; that server stores the key.
“Starting from the key’s position on the ring, the first server encountered clockwise is the server that will cache the object.”
key-1 -> node-1
key-3 -> node-2
key-4 -> node-2
key-5 -> node-2
key-2 -> node-0
Advantages of Consistent Hashing
When adding a new server (e.g., node-4 ), only the keys that fall between the new server and its predecessor on the ring need to be remapped, affecting a small portion of data. Similarly, if a server fails, only the keys that were mapped to the failed server are reassigned to the next clockwise server, limiting the impact.
Data Skew Problem
With few servers, uneven distribution on the ring can cause data skew, where most keys concentrate on a single node, leading to resource imbalance.
Virtual Nodes
To mitigate skew, each physical server is represented by multiple virtual nodes on the ring. For example, node-1#1 , node-1#2 , node-1#3 are hashed separately, spreading the load more evenly.
The mapping then becomes key -> virtual node -> real node .
Application Scenarios
Consistent hashing is the preferred algorithm for load balancing in distributed systems and is used in cache middleware such as memcached and redis . It also appears in RPC frameworks (e.g., Dubbo), distributed databases, LVS load balancers, and other systems.
Summary
Consistent hashing provides a scalable way to distribute keys across nodes with minimal data movement during topology changes, though it can incur higher lookup cost with very large rings and introduces a single point of failure if the routing service itself is not highly available.
IT Services Circle
Delivering cutting-edge internet insights and practical learning resources. We're a passionate and principled IT media platform.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.