Fundamentals 20 min read

Understanding the Underlying Principles of Java HashMap

This article explains how Java's HashMap stores key‑value pairs using an array‑plus‑linked‑list structure, details the hash function, index calculation, collision handling, resizing, and the conditions under which linked lists are replaced by red‑black trees, illustrated with concrete code examples.

Sohu Tech Products

Oct 7, 2022

Understanding the Underlying Principles of Java HashMap

Preface

HashMap is a high‑frequency key‑value data structure in Java; this article explains its design principles.

Problem Description

When a custom hashCode() implementation is poor, HashMap performance can degrade dramatically. Two code examples illustrate the effect.

public static void main(String[] args) {
    long start = System.currentTimeMillis();
    int n = 10000;
    Map<Key, Integer> map = new HashMap<>();
    for (int i = 0; i < n; i++) {
        map.put(new Key(i), i);
    }
    for (int i = 0; i < n; i++) {
        Integer v = map.get(i);
    }
    System.out.println(System.currentTimeMillis() - start); // average ~1400ms
}

@AllArgsConstructor
public static class Key {
    private Integer id; // key id
    @Override
    public int hashCode() {
        return 1; // terrible hash, forces collisions
    }
    // equals omitted
}

public static void main(String[] args) {
    long start = System.currentTimeMillis();
    int n = 10000;
    Map<Key, Integer> map = new HashMap<>();
    for (int i = 0; i < n; i++) {
        map.put(new Key(i), i);
    }
    for (int i = 0; i < n; i++) {
        Integer v = map.get(i);
    }
    System.out.println(System.currentTimeMillis() - start); // average ~10ms
}

@AllArgsConstructor
public static class Key {
    private Integer id;
    @Override
    public int hashCode() {
        return Objects.hash(id); // proper hash
    }
    // equals omitted
}

The first program takes about 1400 ms for 10 000 entries, while the second finishes in roughly 10 ms, showing the impact of a good hash function.

HashMap Internal Mechanics

HashMap combines an array with linked lists (or trees) to store entries. The core fields are:

public class HashMap<K,V> {
    // ... other code omitted
    Node<K,V>[] table; // the bucket array
    // ... other code omitted
}

class Node<K,V> {
    final int hash; // hash of the key
    final K key;    // key
    V value;       // value
    Node<K,V> next; // next node in the same bucket (linked list)
}

The array provides O(1) random access; the hash function maps a key to an index in this array.

Hash Function Design

Java objects have a hashCode() method that returns an int. HashMap refines this value to improve distribution:

int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

The high 16 bits are XOR‑ed with the low 16 bits so that both parts influence the final hash, reducing collisions.

Mapping a Key to an Array Index

After obtaining the refined hash, the index is computed with a bitwise AND, not a modulo operation:

int hash = hash(key);
int n = table.length;
int i = (n - 1) & hash; // index in the bucket array
Node<K,V> node = table[i];

Because the array length is always a power of two, n‑1 has a binary form of all low bits set to 1, ensuring the result is always within the array bounds and distributing keys uniformly.

Hash Collisions

When multiple keys map to the same bucket, a linked list (or later a red‑black tree) stores the colliding entries. This is analogous to the pigeonhole principle.

Put Operation

The put method follows these steps:

Compute hash(key). Calculate the bucket index with (n‑1) & hash . If the bucket is empty, store a new Node . If not, traverse the linked list (or tree) to either update an existing key or append a new node.

// Equality check inside put
if (p.hash == hash && ((k = p.key) == key || (key != null && key.equals(k)))) {
    // key already exists, replace value
}

Resizing (Expansion)

HashMap expands when the load factor (size / capacity) reaches a threshold (default 0.75). The capacity is always a power of two, so resizing doubles the array size.

loadFactor = numberOfEntries / table.length;
// When loadFactor >= 0.75, resize to 2 * oldCapacity

During resize, each existing node’s hash is already stored, so re‑hashing is unnecessary; only the index is recomputed using the new capacity. Nodes whose high‑order bits are 0 stay in the same bucket; those with a 1 move to oldIndex + oldCapacity.

Treeification

If a bucket’s linked list exceeds 8 nodes and the total map size is at least 64, the list is converted into a red‑black tree, giving O(log n) lookup time. When the tree shrinks below the threshold, it reverts to a linked list.

Answer to the Opening Question

A key’s hashCode() determines its bucket index; a constant or poorly distributed hash forces many entries into the same bucket, causing severe performance degradation. Therefore, a good, well‑distributed hashCode() is essential for HashMap efficiency.

Conclusion

This article summarized the internal implementation and design rationale of Java's HashMap, covering hash calculation, index mapping, collision handling, resizing, and the conditions for converting linked lists to red‑black trees.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

HashMap hash function Red-Black Tree Collision

Written by

Sohu Tech Products

A knowledge-sharing platform for Sohu's technology products. As a leading Chinese internet brand with media, video, search, and gaming services and over 700 million users, Sohu continuously drives tech innovation and practice. We’ll share practical insights and tech news here.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.