Fundamentals 15 min read

Why Java's String.hashCode Uses 31 as Multiplier: Theory and Experiments

This article explores the rationale behind Java's String.hashCode method using the multiplier 31, detailing its implementation, mathematical justification, performance optimizations, and experimental analysis of hash collision rates with various multipliers, concluding why 31 is an optimal choice.

Architect's Tech Stack
Architect's Tech Stack
Architect's Tech Stack
Why Java's String.hashCode Uses 31 as Multiplier: Theory and Experiments

When reading the source of String.hashCode() a curious constant 31 appears as the multiplication factor. The article investigates why this particular number was chosen and what impact it has on hash distribution and performance.

Implementation

The method is implemented as follows:

public int hashCode() {
    int h = hash;
    if (h == 0 && value.length > 0) {
        char[] val = value;
        for (int i = 0; i < value.length; i++) {
            h = 31 * h + val[i];
        }
        hash = h;
    }
    return h;
}

The core of the algorithm is the loop that repeatedly multiplies the current hash by 31 and adds the next character code.

Reason 1 – Prime Multiplier

31 is a small odd prime. Using a prime as a multiplier helps to spread hash values more uniformly, reducing the likelihood of collisions compared with even numbers or very small primes such as 2. Larger primes (e.g., 101) can cause integer overflow, which may discard useful information.

Reason 2 – JVM Optimization

Multiplication by 31 can be replaced by a shift‑and‑subtract operation: 31 * i == (i << 5) - i . Modern JVMs perform this optimization automatically, making the operation cheap while preserving the benefits of a prime multiplier.

Experimental Verification

The author wrote a benchmark that hashes over 230,000 English words from the Unix /usr/share/dict/words file using different multipliers (2, 3, 17, 31, 37, 41, 101, etc.). For each multiplier the program computes:

Minimum and maximum hash values

Total number of collisions

Collision rate (percentage)

Distribution of hash values across 64 equal partitions of the 32‑bit integer space

Key code fragments used in the experiment:

public static Integer hashCode(String str, Integer multiplier) {
    int hash = 0;
    for (int i = 0; i < str.length(); i++) {
        hash = multiplier * hash + str.charAt(i);
    }
    return hash;
}

public static void calculateConflictRate(Integer multiplier, List
hashs) {
    // compute min, max, distinct count, conflict number, conflict rate
    // output formatted result
    System.out.println(String.format("multiplier=%4d, minHash=%11d, maxHash=%10d, conflictNum=%6d, conflictRate=%.4f%%",
        multiplier, minHash, maxHash, conflictNum, conflictRate * 100));
}

Results

Multipliers 2 and 3 produce extremely high collision rates (>55%) and concentrate hash values in a single partition. Multipliers 17 and 31 achieve low collision rates (well below 1%) and a more even distribution across partitions. Larger primes such as 101 also give good distribution but risk overflow, which the author considers undesirable for a general‑purpose hash function.

Conclusion

The combination of being a small odd prime and allowing a cheap shift‑and‑subtract implementation makes 31 an excellent default multiplier for String.hashCode() . It balances good distribution, low collision probability, and efficient execution, which explains its long‑standing use in the Java standard library.

JavaperformanceHashCodecollision analysisprime multiplier
Architect's Tech Stack
Written by

Architect's Tech Stack

Java backend, microservices, distributed systems, containerized programming, and more.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.