Understanding the Underlying Principles of Java HashMap
This article explains how Java's HashMap stores key‑value pairs using an array‑plus‑linked‑list structure, details the hash function, index calculation, collision handling, resizing, and the conditions under which linked lists are replaced by red‑black trees, illustrated with concrete code examples.
Preface
HashMap is a high‑frequency key‑value data structure in Java; this article explains its design principles.
Problem Description
When a custom hashCode() implementation is poor, HashMap performance can degrade dramatically. Two code examples illustrate the effect.
public static void main(String[] args) {
long start = System.currentTimeMillis();
int n = 10000;
Map
map = new HashMap<>();
for (int i = 0; i < n; i++) {
map.put(new Key(i), i);
}
for (int i = 0; i < n; i++) {
Integer v = map.get(i);
}
System.out.println(System.currentTimeMillis() - start); // average ~1400ms
}
@AllArgsConstructor
public static class Key {
private Integer id; // key id
@Override
public int hashCode() {
return 1; // terrible hash, forces collisions
}
// equals omitted
} public static void main(String[] args) {
long start = System.currentTimeMillis();
int n = 10000;
Map
map = new HashMap<>();
for (int i = 0; i < n; i++) {
map.put(new Key(i), i);
}
for (int i = 0; i < n; i++) {
Integer v = map.get(i);
}
System.out.println(System.currentTimeMillis() - start); // average ~10ms
}
@AllArgsConstructor
public static class Key {
private Integer id;
@Override
public int hashCode() {
return Objects.hash(id); // proper hash
}
// equals omitted
}The first program takes about 1400 ms for 10 000 entries, while the second finishes in roughly 10 ms, showing the impact of a good hash function.
HashMap Internal Mechanics
HashMap combines an array with linked lists (or trees) to store entries. The core fields are:
public class HashMap
{
// ... other code omitted
Node
[] table; // the bucket array
// ... other code omitted
}
class Node
{
final int hash; // hash of the key
final K key; // key
V value; // value
Node
next; // next node in the same bucket (linked list)
}The array provides O(1) random access; the hash function maps a key to an index in this array.
Hash Function Design
Java objects have a hashCode() method that returns an int . HashMap refines this value to improve distribution:
int hash(Object key) {
int h;
return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}The high 16 bits are XOR‑ed with the low 16 bits so that both parts influence the final hash, reducing collisions.
Mapping a Key to an Array Index
After obtaining the refined hash, the index is computed with a bitwise AND, not a modulo operation:
int hash = hash(key);
int n = table.length;
int i = (n - 1) & hash; // index in the bucket array
Node
node = table[i];Because the array length is always a power of two, n‑1 has a binary form of all low bits set to 1, ensuring the result is always within the array bounds and distributing keys uniformly.
Hash Collisions
When multiple keys map to the same bucket, a linked list (or later a red‑black tree) stores the colliding entries. This is analogous to the pigeonhole principle.
Put Operation
The put method follows these steps:
Compute hash(key) . Calculate the bucket index with (n‑1) & hash . If the bucket is empty, store a new Node . If not, traverse the linked list (or tree) to either update an existing key or append a new node.
// Equality check inside put
if (p.hash == hash && ((k = p.key) == key || (key != null && key.equals(k)))) {
// key already exists, replace value
}Resizing (Expansion)
HashMap expands when the load factor (size / capacity) reaches a threshold (default 0.75). The capacity is always a power of two, so resizing doubles the array size.
loadFactor = numberOfEntries / table.length;
// When loadFactor >= 0.75, resize to 2 * oldCapacityDuring resize, each existing node’s hash is already stored, so re‑hashing is unnecessary; only the index is recomputed using the new capacity. Nodes whose high‑order bits are 0 stay in the same bucket; those with a 1 move to oldIndex + oldCapacity .
Treeification
If a bucket’s linked list exceeds 8 nodes and the total map size is at least 64, the list is converted into a red‑black tree, giving O(log n) lookup time. When the tree shrinks below the threshold, it reverts to a linked list.
Answer to the Opening Question
A key’s hashCode() determines its bucket index; a constant or poorly distributed hash forces many entries into the same bucket, causing severe performance degradation. Therefore, a good, well‑distributed hashCode() is essential for HashMap efficiency.
Conclusion
This article summarized the internal implementation and design rationale of Java's HashMap, covering hash calculation, index mapping, collision handling, resizing, and the conditions for converting linked lists to red‑black trees.
Sohu Tech Products
A knowledge-sharing platform for Sohu's technology products. As a leading Chinese internet brand with media, video, search, and gaming services and over 700 million users, Sohu continuously drives tech innovation and practice. We’ll share practical insights and tech news here.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.