Mastering Hash Tables: Concepts, Collision Resolution, and Popular Hash Functions
This article explains the fundamentals of hash tables, covering their basic concepts, construction methods, various collision‑resolution techniques such as open addressing and chaining, and compares several widely used hash functions with performance insights.
In JavaScript, an object is essentially a hash table. To truly understand data structures, we can implement a hash table from scratch.
Basic Concepts
A hash table is a data structure that maps keys directly to memory locations. The mapping function is called a hash function.
Construction Method
Assume we need to store n elements. Allocate an array of length m (m > n). For each key Ki, compute address hash(Ki) and store the element at that address.
From a mathematical perspective, a good hash function should be simple and efficient, have good distribution to reduce collisions, and be compressive to save memory.
Common Construction Techniques
Direct Addressing : hash(K)=aK+C; no collisions but high space cost.
Division Method : address = K mod C; simple and widely used; C should be a prime near the table size.
Mid‑square Method : square the key and take middle digits as the address.
Segment Sum Method : split the key into segments, sum them, and use the result.
Collision Resolution
When two different keys map to the same address, a hash collision occurs.
Two main factors affect collisions: the load factor α = n/m (typically kept between 0.6 and 0.9) and the quality of the hash function.
Open Addressing
Probe sequence:
Hi = (H(key) + di) mod mwhere di can be:
Linear probing: di = 1,2,3,…
Quadratic probing: di = ±i²
Pseudo‑random probing: di = pseudo‑random numbers
Drawback: primary clustering and secondary clustering can degrade search performance.
Rehashing
Use a second hash function when a collision occurs:
Hi = RHi(key). Reduces clustering but adds computation.
Separate Chaining (Linked Lists)
All keys that hash to the same slot are stored in a linked list.
Advantages: simple, no clustering, dynamic memory, easy deletions.
Disadvantages: extra pointer overhead, may use more memory for small keys.
Overflow Area
Maintain an auxiliary overflow array where colliding entries are placed.
Common Hash Functions
DJBHash – popular bit‑operation hash.
PJWHash – based on Peter J. Weinberger’s algorithm.
ELFHash – similar to PJW, used in Unix.
BKDRHash – simple, uses a seed (e.g., 131).
SDBMHash – used in the SDBM database.
DEKHash – from Knuth’s “The Art of Computer Programming”.
APHash – contributed by Arash Partow.
Experimental comparison on 100 000 random strings and 100 000 English sentences shows BKDRHash performs best overall, APHash is also strong, while PJWHash and ELFHash rank lowest, with similar scores.
Tencent IMWeb Frontend Team
IMWeb Frontend Community gathering frontend development enthusiasts. Follow us for refined live courses by top experts, cutting‑edge technical posts, and to sharpen your frontend skills.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.