Consistent Hashing Explained Through the Tale of Han Xin’s Soldier Allocation
The article uses a historical story to illustrate basic hash‑based partitioning, its shortcomings when nodes change, and then introduces consistent hashing, hash rings, and virtual nodes as solutions for balanced load distribution and minimal data migration in distributed systems.
Using a fictional dialogue between Liu Bang, Han Xin, and Xiao He, the article introduces the problem of evenly distributing a large number of soldiers (data items) into groups (nodes) and explains how a simple hash algorithm works by taking the modulo of a soldier's numeric ID with the number of groups.
Han Xin's first skill: Hash algorithm Each soldier's six‑digit number is treated as a hash value; the remainder after dividing by the group count determines the group. This method is simple but causes massive data reshuffling when the number of groups changes.
One Skill: Hash Algorithm
Grouping
The hash algorithm assigns a soldier to a group by num % N , where N is the number of groups. For example, soldier 123456 belongs to group 0 because 123456 % 3 = 0.
Finding a Soldier
To locate soldier 666666, compute 666666 % 3 = 0, indicating the first group, then search within that group.
Drawbacks of Simple Hashing
When the group count changes (e.g., from 3 to 4), many soldiers are reassigned to different groups, breaking existing relationships and causing large data migration.
Second Skill: Consistent Hashing
Hash Ring
Consistent hashing also uses modulo operations but maps the entire 2^32 hash space onto a virtual ring. Nodes are placed on the ring, and each data item is assigned to the first node encountered clockwise.
Hash algorithm : modulo the number of nodes.
Consistent hash algorithm : modulo 2^32, creating a ring.
Soldier Allocation on the Ring
With three nodes (A, B, C) the ring is divided into three intervals (C‑A, A‑B, B‑C). A soldier whose hash falls into C‑A is stored on node A, and so on.
Adding a New Group
When a fourth node D is added, only the interval that D now covers (e.g., B‑D) needs to migrate its data from the previous node, while the rest of the data remains untouched, dramatically reducing migration.
Hash Ring Drawbacks
If nodes are few, the intervals on the ring can be uneven, leading to hot‑spot and cold‑spot problems in real‑world services.
Third Skill: Virtual Nodes
Virtual nodes are multiple logical points for each physical node, evenly spread around the ring. For example, 12 virtual nodes N1‑N12 are mapped to physical nodes A, B, C, D, ensuring a more uniform distribution and balanced load.
With virtual nodes, the previously small B‑D interval is split among several virtual nodes, making the data distribution much more even.
Summary
Simple hash algorithms cause large data migration when nodes are added or removed.
Consistent hashing reduces migration by assigning data based on a ring.
Few nodes lead to uneven interval sizes and load imbalance.
Virtual node mapping solves the uneven distribution problem.
More nodes mean less data to move; consistent hashing is ideal for load‑balancing scenarios.
Wukong Talks Architecture
Explaining distributed systems and architecture through stories. Author of the "JVM Performance Tuning in Practice" column, open-source author of "Spring Cloud in Practice PassJava", and independently developed a PMP practice quiz mini-program.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.