Fundamentals 8 min read

How to Test Membership in 4 Billion Numbers Using Only 1 GB Memory

This article explains how to determine whether a given unsigned integer belongs to a set of 4 billion distinct numbers within a 1 GB memory limit, comparing a bitmap approach with a Bloom filter, providing detailed implementation steps and C++ code examples for both methods.

ITPUB
ITPUB
ITPUB
How to Test Membership in 4 Billion Numbers Using Only 1 GB Memory

Problem

We have 4 billion distinct unsigned 32‑bit integers stored unsorted. The task is to answer membership queries (does a given number belong to the set?) while using no more than 1 GB of RAM.

Approach Comparison

Bitmap (bit‑set) method – one bit for each possible value (0 … 2³²‑1). Exact answers, O(1) query time, memory usage = 2³² bits ≈ 512 MiB.

Bloom filter – multiple hash functions map each element to bits in a smaller array. Saves memory (≈10 % of bitmap) but introduces a configurable false‑positive probability.

Bitmap Solution

Because 512 MiB fits comfortably within the 1 GB limit, a plain bitmap is the simplest exact solution.

Memory Requirement

2³² bits = 4,294,967,296 bits ≈ 512 MiB.

Implementation Steps

Create an array of uint32_t elements large enough to hold 2³² bits (i.e., 2³² / 32 = 134,217,728 words).

For each input number n, set the corresponding bit:

index = n / 32;
offset = n % 32;
bitmap[index] |= (1U << offset);

To answer a query, test the same bit:

bool exists = (bitmap[index] & (1U << offset)) != 0;

Complete example (C++17):

#include <iostream>
#include <vector>
#include <cstdint>

const std::size_t WORD_COUNT = 1ULL << 30; // 2^32 / 32
std::vector<uint32_t> bitmap(WORD_COUNT, 0);

inline void setBit(uint32_t n) {
    std::size_t idx = n / 32;
    uint32_t off = n % 32;
    bitmap[idx] |= (1U << off);
}

inline bool checkBit(uint32_t n) {
    std::size_t idx = n / 32;
    uint32_t off = n % 32;
    return (bitmap[idx] & (1U << off)) != 0;
}

int main() {
    // Assume "numbers" holds the 4 billion values
    std::vector<uint32_t> numbers = {/* ... */};
    for (uint32_t v : numbers) setBit(v);

    uint32_t query = 123456789U;
    if (checkBit(query))
        std::cout << "Number exists." << std::endl;
    else
        std::cout << "Number does not exist." << std::endl;
    return 0;
}

Bloom Filter Alternative

If memory must be reduced further, a Bloom filter can be used at the cost of false positives.

Typical Parameters

Bit array size ≈ 100 million bits (≈12.5 MiB).

Number of hash functions = 5 (gives a false‑positive rate around 1 % for 4 billion items).

Implementation Steps

Allocate HASH_COUNT vectors of bool of length BIT_ARRAY_SIZE.

Initialize an array of std::hash<uint32_t> (or any fast 32‑bit hash) for the HASH_COUNT functions.

For each input value, compute each hash, take modulo BIT_ARRAY_SIZE, and set the corresponding bit.

To query, compute the same hashes; if any of the bits is 0 the element is definitely absent, otherwise it may be present.

Example (C++17):

#include <array>
#include <vector>
#include <functional>
#include <cstdint>
#include <iostream>

constexpr std::size_t BIT_ARRAY_SIZE = 100'000'000; // 100 M bits
constexpr std::size_t HASH_COUNT = 5;

std::array<std::vector<bool>, HASH_COUNT> filter;
std::array<std::hash<uint32_t>, HASH_COUNT> hasher;

inline void add(uint32_t x) {
    for (std::size_t i = 0; i < HASH_COUNT; ++i) {
        std::size_t idx = hasher[i](x) % BIT_ARRAY_SIZE;
        filter[i][idx] = true;
    }
}

inline bool possiblyContains(uint32_t x) {
    for (std::size_t i = 0; i < HASH_COUNT; ++i) {
        std::size_t idx = hasher[i](x) % BIT_ARRAY_SIZE;
        if (!filter[i][idx]) return false; // definitely not present
    }
    return true; // may be present (false positive possible)
}

int main() {
    for (std::size_t i = 0; i < HASH_COUNT; ++i) {
        filter[i] = std::vector<bool>(BIT_ARRAY_SIZE, false);
        hasher[i] = std::hash<uint32_t>();
    }
    // Insert the 4 billion numbers here …

    uint32_t query = 123456789U;
    if (possiblyContains(query))
        std::cout << "Number might exist (false positive possible)." << std::endl;
    else
        std::cout << "Number definitely does not exist." << std::endl;
    return 0;
}

Performance Considerations

Bitmap: O(1) query and update, exact answer, 512 MiB memory.

Bloom filter: O(k) query where k = number of hash functions (e.g., 5), memory ≈ 12.5 MiB, false‑positive probability configurable (≈1 % with the parameters above).

Conclusion

With a 1 GB memory budget, the bitmap method is the most straightforward and efficient way to achieve exact membership testing for 4 billion unsigned integers. A Bloom filter is an alternative when memory must be reduced further and a small false‑positive rate is acceptable.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

bitmaplarge datasetbloom-filterC++membership test
ITPUB
Written by

ITPUB

Official ITPUB account sharing technical insights, community news, and exciting events.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.