
B16 Hash Table: Exploiting Controlled Collisions and SIMD for High‑Performance, Compact Hashing

This article introduces the B16 hash table, a novel design that deliberately tolerates a higher hash‑collision probability and leverages SIMD instructions to achieve faster lookups, lower memory overhead, and a compact read‑only variant comparable to minimal perfect hash functions.

Baidu Intelligent Testing

Background – Hash tables are widely used data structures with O(1) average lookup time, yet real‑world implementations differ greatly in performance. Traditional designs focus on avoiding collisions, using techniques such as open addressing, chaining, or double hashing.

Avoiding Hash Collisions – Perfect hash functions can map a fixed key set without collisions, but they are limited to static key sets, have non‑trivial construction cost, and introduce extra space and branch overhead.

Utilizing Hash Collisions – Modern CPUs provide SIMD (Single Instruction Multiple Data) capabilities that can process many small hash values in parallel. Facebook’s F14 hash table splits keys into blocks, stores an 8‑bit secondary hash (H2) per key, and uses SIMD to filter candidates quickly, effectively embracing collisions for speed.
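The filtering step described above can be sketched with SSE2 intrinsics (a simplified illustration of the technique, not F14's actual code): a single instruction compares one 8-bit secondary hash against 16 stored tags and yields a bitmask of candidate slots, so full key comparisons are needed only where the mask has a bit set.

```cpp
#include <immintrin.h>  // SSE2 intrinsics
#include <cstdint>

// Compare a probe tag against 16 stored 8-bit tags in parallel.
// Returns a 16-bit mask with bit i set when tags[i] == h2.
inline uint32_t match_tags(const uint8_t tags[16], uint8_t h2) {
    __m128i stored = _mm_loadu_si128(reinterpret_cast<const __m128i*>(tags));
    __m128i probe  = _mm_set1_epi8(static_cast<char>(h2)); // broadcast h2 to all 16 lanes
    __m128i eq     = _mm_cmpeq_epi8(stored, probe);        // 0xFF where bytes are equal
    return static_cast<uint32_t>(_mm_movemask_epi8(eq));   // collapse to a bitmask
}
```

Because most non-matching slots are rejected by this one-instruction filter, a deliberately collision-prone 8-bit tag is still cheap to disambiguate.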

B16 Hash Table – Inspired by F14, B16 replaces open addressing with a block‑wise chaining scheme. Each block (chunk) carries a 16‑byte header: the first 14 bytes store one H2 value per element (so a chunk holds up to 14 elements), while the remaining two bytes control chaining to the next chunk. Each key is hashed into two codes, H1 (block selector) and H2 (SIMD filter). Insertion, lookup, and deletion follow this block‑wise algorithm, allowing high chunk occupancy (≈11–13 elements per chunk) and better cache utilization.
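The block-wise lookup above can be sketched in scalar form (an illustrative layout with hypothetical field names, not the actual B16 implementation): H1 selects the head chunk of a chain, and within each chunk the H2 tag acts as a cheap filter before the full key comparison.

```cpp
#include <cstdint>
#include <cstddef>
#include <vector>

// Illustrative chunk: a 16-byte header (14 H2 tags + 2 bytes of chain
// control, modeled here as `count` and `next`) followed by 14 key slots.
struct Chunk {
    uint8_t  tags[14];   // one 8-bit H2 tag per occupied slot
    uint8_t  count;      // number of occupied slots
    uint8_t  next;       // index of the next chunk in the chain (0xFF = end)
    uint64_t keys[14];   // full keys, compared only after a tag match
};

// Walk the chain whose head was selected by H1, filtering by H2 first.
inline bool lookup(const std::vector<Chunk>& chunks, size_t head,
                   uint8_t h2, uint64_t key) {
    for (size_t c = head; c != 0xFF; c = chunks[c].next) {
        const Chunk& ck = chunks[c];
        for (size_t i = 0; i < ck.count; ++i)
            if (ck.tags[i] == h2 && ck.keys[i] == key) // cheap filter, then full compare
                return true;
    }
    return false;
}
```

In the real design the inner tag loop is the SIMD comparison, so a whole chunk is filtered in one instruction rather than 14 iterations.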

B16Compact Hash Table – For read‑only scenarios, B16Compact removes per‑chunk next pointers, merges all chunks into a single array, and stores only bucket‑to‑chunk indices. This reduces overhead to about 9.23 bits per key (≈1 byte) for a million‑key table, approaching minimal perfect hash space while retaining fast SIMD‑based lookup.
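A read-only lookup over the merged layout might look like the following (a minimal sketch under the assumption that each bucket's chunks occupy a contiguous range of the merged array; names and types are hypothetical):

```cpp
#include <cstdint>
#include <cstddef>
#include <vector>

// Hypothetical read-only layout: all chunks merged into flat arrays, with
// per-chunk next pointers removed. Each bucket b stores only the index of
// its first chunk, so its chunks are the range
// [bucket_start[b], bucket_start[b + 1]).
struct CompactTable {
    std::vector<uint8_t>  tags;          // 14 H2 tags per chunk, flattened
    std::vector<uint64_t> keys;          // 14 key slots per chunk, flattened
    std::vector<uint32_t> bucket_start;  // bucket-to-chunk indices, size = buckets + 1

    bool contains(size_t bucket, uint8_t h2, uint64_t key) const {
        for (uint32_t c = bucket_start[bucket]; c < bucket_start[bucket + 1]; ++c)
            for (size_t i = 0; i < 14; ++i)  // tag filter, then full key compare
                if (tags[c * 14 + i] == h2 && keys[c * 14 + i] == key)
                    return true;
        return false;
    }
};
```

Storing one chunk index per bucket instead of a pointer per chunk is what pushes the per-key overhead down toward the ~1-byte figure cited above.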

Experimental Results – Benchmarks on a 4‑core Xeon server compare unordered_map, F14, B16, and B16Compact using uint64_t keys/values. B16 achieves lower memory usage than unordered_map and comparable or better insertion and lookup throughput up to 1 M keys; B16Compact offers the smallest footprint (≈17.3 bytes/key) with competitive query speed.

Conclusion – By shifting from collision avoidance to controlled collision exploitation and SIMD parallelism, the B16 family demonstrates that hash tables can be both fast and memory‑efficient, making them suitable for memory‑constrained environments such as embedded devices.

For reference, the benchmark estimates the memory footprint of std::unordered_map as:

// total size of the list nodes
umap.size() * (sizeof(uint64_t) + sizeof(std::pair<uint64_t, uint64_t>) + sizeof(void*))
// total size of the bucket array
+ umap.bucket_count() * sizeof(void*)
// size of the control structure
+ 3 * sizeof(void*) + sizeof(size_t);