Fundamentals 13 min read

Bloom Filter, Counting Bloom Filter, and Cuckoo Filter: Principles, Problems, and Optimizations

This article explains the concept and operation of Bloom filters, discusses their limitations such as false positives and deletion difficulties, introduces the Counting Bloom filter and Cuckoo filter as improvements, and explores their underlying hash mechanisms, performance trade‑offs, and optimization strategies.

Java Architect Essentials
Java Architect Essentials
Java Architect Essentials
Bloom Filter, Counting Bloom Filter, and Cuckoo Filter: Principles, Problems, and Optimizations

Everyone knows that I/O is a bottleneck in computer systems, and many frameworks and hardware aim to reduce I/O. This article introduces filters as a way to block unnecessary database queries.

When a backend service checks a cache before querying a database, many requests may ask for data that does not exist, causing the database to perform useless I/O. Filters are introduced to prevent such requests from reaching the database.

Bloom Filter

The Bloom Filter (Bloom Filter) works by using a bitmap. When an element arrives, three hash functions compute three positions in the bitmap and set those bits to 1. If a later query hashes to the same three positions, the filter assumes the element is present; otherwise it returns a miss.

The bitmap initially contains all zeros. When data is inserted, the three hash functions set the corresponding bits to 1. If a new query maps to the same three bits, the filter reports the element as present; if any bit is 0, it reports a miss.

Problems of Bloom Filter

Bloom filters suffer from false positives: different elements may set overlapping bits, causing a non‑existent element to be reported as present.

Deletion is also problematic. Removing an element’s bits may clear bits that belong to other elements, leading to false negatives.

Counting Bloom Filter (Enhanced Bloom Filter)

The Counting Bloom Filter replaces the bitmap with an array of counters. Each insertion increments the counters at the hashed positions, and deletion decrements them, avoiding the need to recompute other elements’ bits. However, false positives remain.

Cuckoo Filter

To solve the deletion issue, the Cuckoo Filter was proposed. Compared with Bloom filters, it suffers from weaker query performance, lower space efficiency, and lack of reverse operations.

Weak query performance stems from multiple hash probes causing low CPU cache hit rates.

Low space efficiency because, for the same false‑positive rate, Bloom filters can use a non‑power‑of‑two bitmap, while Cuckoo filters require the array length to be a power of two.

No delete support – once an element’s fingerprint is stored, removing it may affect other elements sharing the same slots.

Cuckoo Hash

The simplest Cuckoo hash uses a one‑dimensional array with two hash functions mapping an element to two possible slots. If either slot is empty, the element is placed there; otherwise one of the occupants is evicted and re‑hashed.

p1 = hash1(x) % l
p2 = hash2(x) % l

If the array becomes too crowded, repeated evictions degrade insertion efficiency, requiring a resize.

Another issue is eviction cycles when multiple elements map to the same pair of slots.

Optimizations

Typical improvements include increasing the number of hash functions (e.g., 3‑4) to reduce collisions and adding multiple “buckets” per slot, which raises space utilization and cache friendliness.

Combining both ideas—using four hash functions and two buckets per slot—can achieve near‑99% space utilization with high query speed.

Cuckoo Filter Details

The Cuckoo Filter stores only fingerprints (a few bits) of elements, not the full element, which saves space but introduces false positives.

It uses two hash functions to compute two candidate slots, but the second slot is derived from the first slot and the fingerprint:

fp = fingerprint(x)
p1 = hash(x) % l
p2 = p1 ^ hash(fp) // XOR

Because the array length is a power of two, the modulo operation can be performed by masking low bits, and the XOR ensures p1 ≠ p2 when hash(fp) ≠ 0, preventing self‑eviction loops.

Thus, knowing fp and one position allows direct computation of the opposite position without the original element.

For more details, see the original article: www.cnblogs.com/Courage129/p/14337466.html

到此文章就结束了。

Database Optimizationdata structureshashingbloom filtercuckoo filterCounting Bloom
Java Architect Essentials
Written by

Java Architect Essentials

Committed to sharing quality articles and tutorials to help Java programmers progress from junior to mid-level to senior architect. We curate high-quality learning resources, interview questions, videos, and projects from across the internet to help you systematically improve your Java architecture skills. Follow and reply '1024' to get Java programming resources. Learn together, grow together.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.