
Consistent Hashing Algorithm: Principles, Java Implementation, and Optimizations for Distributed Cache Load Balancing

This article explains the fundamentals of consistent hashing and its application in load‑balancing distributed caches, analyzes common issues such as data skew and cache avalanche, introduces virtual nodes for a more uniform distribution, walks through Java implementations, and compares the approach with Redis's HashSlot design.

IT Architects Alliance

Consistent hashing is a special hashing algorithm widely used in load‑balancing scenarios such as Nginx and Memcached because its key‑to‑node mapping stays balanced and stable as nodes join and leave.

The article first reviews ordinary hash functions, showing how they scatter similar inputs into random outputs, and explains why a simple modulo‑based distribution fails when the number of nodes changes.
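The failure mode of modulo‑based distribution is easy to quantify. In the following sketch (class name and key counts are illustrative, not from the article), scaling a cluster from 4 to 5 nodes remaps 80% of the keys, which is exactly the churn consistent hashing is designed to avoid:

```java
public class ModuloRemapDemo {
    // Counts how many keys in [0, total) map to a different node index
    // after the node count changes from 'from' to 'to'.
    static int countRemapped(int total, int from, int to) {
        int moved = 0;
        for (int key = 0; key < total; key++) {
            if (key % from != key % to) {
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        // Only keys with key mod 20 in {0,1,2,3} keep the same index,
        // so 8000 of 10000 keys (80%) move to a different node.
        System.out.println("keys remapped: " + countRemapped(10000, 4, 5) + " of 10000");
    }
}
```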

It then introduces the ring‑based consistent‑hash mechanism, which requires two mappings: placing each node on the ring by hashing its identifier, and locating the node for a given key by hashing the key and moving clockwise to the first node.

Common problems of the basic algorithm are discussed, including data skew when few nodes occupy a large hash space and cache avalanche when a node leaves, and the solution of adding many virtual nodes to make the distribution more uniform.

Java code examples are provided. The hash utility uses the FNV1_32_HASH algorithm:

public class HashUtil {
    /**
     * Compute a hash value using the FNV1_32_HASH algorithm,
     * with extra bit-mixing passes for better scattering.
     */
    public static int getHash(String str) {
        final int p = 16777619;
        int hash = (int) 2166136261L;
        for (int i = 0; i < str.length(); i++) {
            hash = (hash ^ str.charAt(i)) * p;
        }
        hash += hash << 13;
        hash ^= hash >> 7;
        hash += hash << 3;
        hash ^= hash >> 17;
        hash += hash << 5;
        // Keep the result non-negative. Note that Math.abs(Integer.MIN_VALUE)
        // is still negative, so clear the sign bit instead of taking abs.
        if (hash < 0) {
            hash &= Integer.MAX_VALUE;
        }
        return hash;
    }
}

A simple implementation without virtual nodes stores node hashes in a TreeMap and finds the responsible node via tailMap:

import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

public class ConsistentHashingWithoutVirtualNode {
    private static String[] groups = {
            "192.168.0.0:111", "192.168.0.1:111", "192.168.0.2:111",
            "192.168.0.3:111", "192.168.0.4:111"};
    // Hash ring: node hash -> node name
    private static SortedMap<Integer, String> sortedMap = new TreeMap<>();

    static {
        for (String group : groups) {
            int hash = HashUtil.getHash(group);
            System.out.println("[" + group + "] launched @ " + hash);
            sortedMap.put(hash, group);
        }
    }

    private static String getServer(String widgetKey) {
        int hash = HashUtil.getHash(widgetKey);
        // First node clockwise from the key's position on the ring
        SortedMap<Integer, String> subMap = sortedMap.tailMap(hash);
        if (subMap.isEmpty()) {
            // Past the last node: wrap around to the first node on the ring
            return sortedMap.get(sortedMap.firstKey());
        }
        return subMap.get(subMap.firstKey());
    }

    public static void main(String[] args) {
        Map<String, Integer> resMap = new HashMap<>();
        for (int i = 0; i < 100000; i++) {
            Integer widgetId = (int) (Math.random() * 10000);
            String server = getServer(widgetId.toString());
            resMap.put(server, resMap.getOrDefault(server, 0) + 1);
        }
        resMap.forEach((k, v) ->
                System.out.println("group " + k + ": " + v + " (" + v / 1000.0 + "%)"));
    }
}

The version with virtual nodes creates 1000 virtual nodes per real node, maps them to the ring, and resolves a key to its real node via the virtual‑node map:

import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

public class ConsistentHashingWithVirtualNode {
    private static String[] groups = {
            "192.168.0.0:111", "192.168.0.1:111", "192.168.0.2:111",
            "192.168.0.3:111", "192.168.0.4:111"};
    // Real nodes, kept in a mutable list so groups can be added or removed
    private static List<String> realGroups = new LinkedList<>(Arrays.asList(groups));
    // Hash ring: virtual-node hash -> virtual-node name
    private static SortedMap<Integer, String> virtualNodes = new TreeMap<>();
    private static final int VIRTUAL_NODE_NUM = 1000;

    static {
        for (String realGroup : realGroups) {
            for (int i = 0; i < VIRTUAL_NODE_NUM; i++) {
                String virtualNodeName = getVirtualNodeName(realGroup, i);
                int hash = HashUtil.getHash(virtualNodeName);
                System.out.println("[" + virtualNodeName + "] launched @ " + hash);
                virtualNodes.put(hash, virtualNodeName);
            }
        }
    }

    private static String getVirtualNodeName(String realName, int num) {
        return realName + "&&VN" + num;
    }

    private static String getRealNodeName(String virtualName) {
        return virtualName.split("&&")[0];
    }

    private static String getServer(String widgetKey) {
        int hash = HashUtil.getHash(widgetKey);
        // First virtual node clockwise from the key, wrapping to the ring's start
        SortedMap<Integer, String> subMap = virtualNodes.tailMap(hash);
        String virtualNodeName = subMap.isEmpty()
                ? virtualNodes.get(virtualNodes.firstKey())
                : subMap.get(subMap.firstKey());
        return getRealNodeName(virtualNodeName);
    }

    public static void main(String[] args) {
        Map<String, Integer> resMap = new HashMap<>();
        for (int i = 0; i < 100000; i++) {
            String group = getServer(Integer.toString(i));
            resMap.put(group, resMap.getOrDefault(group, 0) + 1);
        }
        resMap.forEach((k, v) ->
                System.out.println("group " + k + ": " + v + " (" + v / 1000.0 + "%)"));
    }
}

Test results show that without virtual nodes the load distribution is uneven, while adding 1000 virtual nodes brings the distribution close to uniform. The article also presents code for adding/removing groups and refreshing the hash circle, demonstrating graceful scaling and the impact on key redistribution.
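The add/remove code is not reproduced here, but its shape follows directly from the virtual‑node design: adding a real node inserts its virtual nodes onto the ring, and removing one deletes them, so only the keys on the affected arcs move. A hedged sketch, assuming the article's "&&VN" naming convention and substituting a sign‑bit‑masked String.hashCode() for HashUtil.getHash (method names here are illustrative, not the article's exact API):

```java
import java.util.SortedMap;
import java.util.TreeMap;

public class RingMaintenance {
    private static final int VIRTUAL_NODE_NUM = 1000;
    // Hash ring: virtual-node hash -> virtual-node name
    private final SortedMap<Integer, String> virtualNodes = new TreeMap<>();

    // Stand-in for the article's HashUtil.getHash; any well-mixed
    // non-negative hash works for this illustration.
    private static int getHash(String s) {
        return s.hashCode() & Integer.MAX_VALUE;
    }

    // Add a real node: only keys that now land on its virtual nodes move to it.
    public void addGroup(String realGroup) {
        for (int i = 0; i < VIRTUAL_NODE_NUM; i++) {
            String vn = realGroup + "&&VN" + i;
            virtualNodes.put(getHash(vn), vn);
        }
    }

    // Remove a real node: its keys fall through to the next node clockwise.
    public void removeGroup(String realGroup) {
        virtualNodes.values().removeIf(vn -> vn.split("&&")[0].equals(realGroup));
    }

    public int size() {
        return virtualNodes.size();
    }
}
```

Because a TreeMap is not thread‑safe, a production version would also need synchronization (or a copy‑on‑write ring swap) around these mutations.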

Further optimization topics include high‑frequency key pre‑warming, historical hash fallback during scaling, circuit‑breaker mechanisms for node overload, and handling load‑balancer synchronization delays.

Finally, the article compares consistent hashing with Redis’s HashSlot + P2P design, explaining how HashSlot maps 16384 slots to nodes, how slot migration works during scaling, and how gossip‑based P2P routing eliminates a single point of failure, while noting the added complexity.
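As a concrete reference point, Redis Cluster assigns a key to one of the 16384 slots via CRC16(key) mod 16384, using the CRC16‑CCITT (XModem) variant documented in the cluster specification. A minimal sketch of that slot computation (an illustration, not Redis's actual C code; the real implementation also honors hash tags in braces, omitted here):

```java
public class HashSlotDemo {
    private static final int SLOT_COUNT = 16384;

    // CRC16-CCITT (XModem): polynomial 0x1021, initial value 0x0000,
    // as specified for Redis Cluster key hashing.
    static int crc16(byte[] bytes) {
        int crc = 0;
        for (byte b : bytes) {
            crc ^= (b & 0xFF) << 8;
            for (int i = 0; i < 8; i++) {
                crc = ((crc & 0x8000) != 0)
                        ? (((crc << 1) ^ 0x1021) & 0xFFFF)
                        : ((crc << 1) & 0xFFFF);
            }
        }
        return crc;
    }

    // HASH_SLOT = CRC16(key) mod 16384
    static int slotFor(String key) {
        return crc16(key.getBytes()) % SLOT_COUNT;
    }

    public static void main(String[] args) {
        // Reference value from the Redis Cluster spec: CRC16("123456789") = 0x31C3
        System.out.printf("crc16(\"123456789\") = 0x%X%n", crc16("123456789".getBytes()));
        System.out.println("slot for 'user:1000' = " + slotFor("user:1000"));
    }
}
```

Unlike a consistent‑hash ring, the slot table is a fixed‑size indirection layer: rebalancing moves whole slots between nodes rather than recomputing key positions.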

Tags: Java, algorithm, load balancing, distributed cache, consistent hashing, virtual nodes
Written by IT Architects Alliance

Discussion and exchange on system, internet, large‑scale distributed, high‑availability, and high‑performance architectures, as well as big data, machine learning, AI, and architecture adjustments with internet technologies. Includes real‑world large‑scale architecture case studies. Open to architects who have ideas and enjoy sharing.
