Fundamentals 10 min read

Mastering Hash Tables: Concepts, Collision Resolution, and Popular Hash Functions

This article explains the fundamentals of hash tables, covering their basic concepts, construction methods, various collision‑resolution techniques such as open addressing and chaining, and compares several widely used hash functions with performance insights.

Tencent IMWeb Frontend Team
Tencent IMWeb Frontend Team
Tencent IMWeb Frontend Team
Mastering Hash Tables: Concepts, Collision Resolution, and Popular Hash Functions

In JavaScript, an object is essentially a hash table. To truly understand data structures, we can implement a hash table from scratch.

Basic Concepts

A hash table is a data structure that maps keys directly to memory locations. The mapping function is called a hash function.

Construction Method

Assume we need to store n elements. Allocate an array of length m (m > n). For each key Ki, compute address hash(Ki) and store the element at that address.

From a mathematical perspective, a good hash function should be simple and efficient, have good distribution to reduce collisions, and be compressive to save memory.

Common Construction Techniques

Direct Addressing : hash(K)=aK+C; no collisions but high space cost.

Division Method : address = K mod C; simple and widely used; C should be a prime near the table size.

Mid‑square Method : square the key and take middle digits as the address.

Segment Sum Method : split the key into segments, sum them, and use the result.

Collision Resolution

When two different keys map to the same address, a hash collision occurs.

Two main factors affect collisions: the load factor α = n/m (typically kept between 0.6 and 0.9) and the quality of the hash function.

Open Addressing

Probe sequence:

Hi = (H(key) + di) mod m

where di can be:

Linear probing: di = 1,2,3,…

Quadratic probing: di = ±i²

Pseudo‑random probing: di = pseudo‑random numbers

Drawback: primary clustering and secondary clustering can degrade search performance.

Rehashing

Use a second hash function when a collision occurs:

Hi = RHi(key)

. Reduces clustering but adds computation.

Separate Chaining (Linked Lists)

All keys that hash to the same slot are stored in a linked list.

Advantages: simple, no clustering, dynamic memory, easy deletions.

Disadvantages: extra pointer overhead, may use more memory for small keys.

Overflow Area

Maintain an auxiliary overflow array where colliding entries are placed.

Common Hash Functions

DJBHash – popular bit‑operation hash.

PJWHash – based on Peter J. Weinberger’s algorithm.

ELFHash – similar to PJW, used in Unix.

BKDRHash – simple, uses a seed (e.g., 131).

SDBMHash – used in the SDBM database.

DEKHash – from Knuth’s “The Art of Computer Programming”.

APHash – contributed by Arash Partow.

Experimental comparison on 100 000 random strings and 100 000 English sentences shows BKDRHash performs best overall, APHash is also strong, while PJWHash and ELFHash rank lowest, with similar scores.

javascriptdata structureshash tablehash functioncollision resolution
Tencent IMWeb Frontend Team
Written by

Tencent IMWeb Frontend Team

IMWeb Frontend Community gathering frontend development enthusiasts. Follow us for refined live courses by top experts, cutting‑edge technical posts, and to sharpen your frontend skills.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.