Fundamentals 14 min read

Understanding Bloom Filters: Theory, Implementation, and Applications

Bloom filters are space‑efficient probabilistic structures that test set membership using multiple hash functions, offering fast, low‑memory checks with a controllable false‑positive rate, and can be implemented manually in Java, via Guava’s library, or deployed at scale with RedisBloom for distributed applications.

Java Tech Enthusiast

Feb 1, 2024

Understanding Bloom Filters: Theory, Implementation, and Applications

Bloom filters are probabilistic data structures designed to test set membership efficiently, especially for massive data sets where a small false‑positive rate is acceptable.

What is a Bloom Filter?

A Bloom filter consists of a large bit array and multiple independent hash functions. When an element is added, each hash function maps the element to a bit position, which is set to 1. To query an element, the same hash functions are applied; if all corresponding bits are 1, the element is probably present, otherwise it is definitely absent.

Principle

Adding an element:

Compute hash values using several hash functions.

Set the bits at the resulting indices to 1.

Querying an element:

Compute the same hash values.

Check whether all bits at those indices are 1.

The structure is space‑efficient (e.g., 1 000 000 bits ≈ 122 KB) but does not support deletion and incurs a false‑positive probability that grows with the number of inserted items.

Typical Use Cases

Cache‑penetration protection – quickly reject requests for non‑existent keys.

Large‑scale deduplication – e.g., URL crawling, email spam lists, IP blacklists.

Membership checks in distributed systems where exact storage is prohibitive.

Java Manual Implementation

The following Java class demonstrates a simple Bloom filter implementation using a BitSet and custom hash functions.

import java.util.BitSet;

public class MyBloomFilter {
    private static final int DEFAULT_SIZE = 2 << 24; // bit array size
    private static final int[] SEEDS = new int[]{3, 13, 46, 71, 91, 134};
    private BitSet bits = new BitSet(DEFAULT_SIZE);
    private SimpleHash[] func = new SimpleHash[SEEDS.length];

    public MyBloomFilter() {
        for (int i = 0; i < SEEDS.length; i++) {
            func[i] = new SimpleHash(DEFAULT_SIZE, SEEDS[i]);
        }
    }

    public void add(Object value) {
        for (SimpleHash f : func) {
            bits.set(f.hash(value), true);
        }
    }

    public boolean contains(Object value) {
        boolean ret = true;
        for (SimpleHash f : func) {
            ret = ret && bits.get(f.hash(value));
        }
        return ret;
    }

    public static class SimpleHash {
        private int cap;
        private int seed;
        public SimpleHash(int cap, int seed) { this.cap = cap; this.seed = seed; }
        public int hash(Object value) {
            int h;
            return (value == null) ? 0 : Math.abs(seed * (cap - 1) & ((h = value.hashCode()) ^ (h >>> 16)));
        }
    }
}

Test code:

String value1 = "https://javaguide.cn/";
String value2 = "https://github.com/Snailclimb";
MyBloomFilter filter = new MyBloomFilter();
System.out.println(filter.contains(value1)); // false
System.out.println(filter.contains(value2)); // false
filter.add(value1);
filter.add(value2);
System.out.println(filter.contains(value1)); // true
System.out.println(filter.contains(value2)); // true

Using Guava’s Bloom Filter

Guava provides a well‑tested Bloom filter implementation. Example:

BloomFilter<Integer> filter = BloomFilter.create(
    Funnels.integerFunnel(),
    1500,
    0.01);
System.out.println(filter.mightContain(1)); // false
System.out.println(filter.mightContain(2)); // false
filter.put(1);
filter.put(2);
System.out.println(filter.mightContain(1)); // true
System.out.println(filter.mightContain(2)); // true

Guava’s version is suitable for single‑node applications; for distributed scenarios, a Redis‑based filter is preferable.

Redis Bloom Filter (RedisBloom Module)

Redis 4.0+ supports modules; RedisBloom adds Bloom filter commands. Example Docker launch:

docker run -p 6379:6379 --name redis-redisbloom redislabs/rebloom:latest
docker exec -it redis-redisbloom bash
redis-cli

Common commands: BF.ADD {key} {item} – create filter if needed and add an item. BF.MADD {key} {item} [item ...] – add multiple items. BF.EXISTS {key} {item} – check membership. BF.MEXISTS {key} {item} [item ...] – batch check.

BF.RESERVE {key} {error_rate} {capacity} [EXPANSION expansion]

– create a filter with specific parameters.

Sample usage:

BF.ADD myFilter java
BF.ADD myFilter javaguide
BF.EXISTS myFilter java   // returns 1 (probably present)
BF.EXISTS myFilter github // returns 0 (definitely absent)

References

Original article and additional resources are linked throughout the text, including the official Redis modules page and the RedisBloom GitHub repository.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Java Redis Guava Data Structure bloom-filter probability

Written by

Java Tech Enthusiast

Sharing computer programming language knowledge, focusing on Java fundamentals, data structures, related tools, Spring Cloud, IntelliJ IDEA... Book giveaways, red‑packet rewards and other perks await!

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.