Fundamentals 10 min read

Mastering Hash Tables: Concepts, Collision Resolution, and Popular Hash Functions

This article explains the fundamentals of hash tables, covering their basic concepts, construction methods, various collision‑resolution techniques such as open addressing and chaining, and compares several widely used hash functions with performance insights.

Tencent IMWeb Frontend Team

Feb 6, 2018

Mastering Hash Tables: Concepts, Collision Resolution, and Popular Hash Functions

In JavaScript, an object is essentially a hash table. To truly understand data structures, we can implement a hash table from scratch.

Basic Concepts

A hash table is a data structure that maps keys directly to memory locations. The mapping function is called a hash function.

Construction Method

Assume we need to store n elements. Allocate an array of length m (m > n). For each key Ki, compute address hash(Ki) and store the element at that address.

From a mathematical perspective, a good hash function should be simple and efficient, have good distribution to reduce collisions, and be compressive to save memory.

Common Construction Techniques

Direct Addressing : hash(K)=aK+C; no collisions but high space cost.

Division Method : address = K mod C; simple and widely used; C should be a prime near the table size.

Mid‑square Method : square the key and take middle digits as the address.

Segment Sum Method : split the key into segments, sum them, and use the result.

Collision Resolution

When two different keys map to the same address, a hash collision occurs.

Two main factors affect collisions: the load factor α = n/m (typically kept between 0.6 and 0.9) and the quality of the hash function.

Open Addressing

Probe sequence: Hi = (H(key) + di) mod m where di can be:

Linear probing: di = 1,2,3,…

Quadratic probing: di = ±i²

Pseudo‑random probing: di = pseudo‑random numbers

Drawback: primary clustering and secondary clustering can degrade search performance.

Rehashing

Use a second hash function when a collision occurs: Hi = RHi(key). Reduces clustering but adds computation.

Separate Chaining (Linked Lists)

All keys that hash to the same slot are stored in a linked list.

Advantages: simple, no clustering, dynamic memory, easy deletions.

Disadvantages: extra pointer overhead, may use more memory for small keys.

Overflow Area

Maintain an auxiliary overflow array where colliding entries are placed.

Common Hash Functions

DJBHash – popular bit‑operation hash.

PJWHash – based on Peter J. Weinberger’s algorithm.

ELFHash – similar to PJW, used in Unix.

BKDRHash – simple, uses a seed (e.g., 131).

SDBMHash – used in the SDBM database.

DEKHash – from Knuth’s “The Art of Computer Programming”.

APHash – contributed by Arash Partow.

Experimental comparison on 100 000 random strings and 100 000 English sentences shows BKDRHash performs best overall, APHash is also strong, while PJWHash and ELFHash rank lowest, with similar scores.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

JavaScript Data Structures hash table hash function collision-resolution

Written by

Tencent IMWeb Frontend Team

IMWeb Frontend Community gathering frontend development enthusiasts. Follow us for refined live courses by top experts, cutting‑edge technical posts, and to sharpen your frontend skills.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.