How to Test Membership in 4 Billion Numbers Using Only 1 GB Memory
This article explains how to determine whether a given unsigned integer belongs to a set of 4 billion distinct numbers within a 1 GB memory limit, comparing a bitmap approach with a Bloom filter, providing detailed implementation steps and C++ code examples for both methods.
Problem
We have 4 billion distinct unsigned 32‑bit integers stored unsorted. The task is to answer membership queries (does a given number belong to the set?) while using no more than 1 GB of RAM.
Approach Comparison
Bitmap (bit‑set) method – one bit for each possible value (0 … 2³²‑1). Exact answers, O(1) query time, memory usage = 2³² bits ≈ 512 MiB.
Bloom filter – multiple hash functions map each element to bits in a smaller array. Saves memory (≈10 % of bitmap) but introduces a configurable false‑positive probability.
Bitmap Solution
Because 512 MiB fits comfortably within the 1 GB limit, a plain bitmap is the simplest exact solution.
Memory Requirement
2³² bits = 4,294,967,296 bits ≈ 512 MiB.
Implementation Steps
Create an array of uint32_t elements large enough to hold 2³² bits (i.e., 2³² / 32 = 134,217,728 words).
For each input number n, set the corresponding bit:
index = n / 32;
offset = n % 32;
bitmap[index] |= (1U << offset);To answer a query, test the same bit:
bool exists = (bitmap[index] & (1U << offset)) != 0;Complete example (C++17):
#include <iostream>
#include <vector>
#include <cstdint>
const std::size_t WORD_COUNT = 1ULL << 30; // 2^32 / 32
std::vector<uint32_t> bitmap(WORD_COUNT, 0);
inline void setBit(uint32_t n) {
std::size_t idx = n / 32;
uint32_t off = n % 32;
bitmap[idx] |= (1U << off);
}
inline bool checkBit(uint32_t n) {
std::size_t idx = n / 32;
uint32_t off = n % 32;
return (bitmap[idx] & (1U << off)) != 0;
}
int main() {
// Assume "numbers" holds the 4 billion values
std::vector<uint32_t> numbers = {/* ... */};
for (uint32_t v : numbers) setBit(v);
uint32_t query = 123456789U;
if (checkBit(query))
std::cout << "Number exists." << std::endl;
else
std::cout << "Number does not exist." << std::endl;
return 0;
}Bloom Filter Alternative
If memory must be reduced further, a Bloom filter can be used at the cost of false positives.
Typical Parameters
Bit array size ≈ 100 million bits (≈12.5 MiB).
Number of hash functions = 5 (gives a false‑positive rate around 1 % for 4 billion items).
Implementation Steps
Allocate HASH_COUNT vectors of bool of length BIT_ARRAY_SIZE.
Initialize an array of std::hash<uint32_t> (or any fast 32‑bit hash) for the HASH_COUNT functions.
For each input value, compute each hash, take modulo BIT_ARRAY_SIZE, and set the corresponding bit.
To query, compute the same hashes; if any of the bits is 0 the element is definitely absent, otherwise it may be present.
Example (C++17):
#include <array>
#include <vector>
#include <functional>
#include <cstdint>
#include <iostream>
constexpr std::size_t BIT_ARRAY_SIZE = 100'000'000; // 100 M bits
constexpr std::size_t HASH_COUNT = 5;
std::array<std::vector<bool>, HASH_COUNT> filter;
std::array<std::hash<uint32_t>, HASH_COUNT> hasher;
inline void add(uint32_t x) {
for (std::size_t i = 0; i < HASH_COUNT; ++i) {
std::size_t idx = hasher[i](x) % BIT_ARRAY_SIZE;
filter[i][idx] = true;
}
}
inline bool possiblyContains(uint32_t x) {
for (std::size_t i = 0; i < HASH_COUNT; ++i) {
std::size_t idx = hasher[i](x) % BIT_ARRAY_SIZE;
if (!filter[i][idx]) return false; // definitely not present
}
return true; // may be present (false positive possible)
}
int main() {
for (std::size_t i = 0; i < HASH_COUNT; ++i) {
filter[i] = std::vector<bool>(BIT_ARRAY_SIZE, false);
hasher[i] = std::hash<uint32_t>();
}
// Insert the 4 billion numbers here …
uint32_t query = 123456789U;
if (possiblyContains(query))
std::cout << "Number might exist (false positive possible)." << std::endl;
else
std::cout << "Number definitely does not exist." << std::endl;
return 0;
}Performance Considerations
Bitmap: O(1) query and update, exact answer, 512 MiB memory.
Bloom filter: O(k) query where k = number of hash functions (e.g., 5), memory ≈ 12.5 MiB, false‑positive probability configurable (≈1 % with the parameters above).
Conclusion
With a 1 GB memory budget, the bitmap method is the most straightforward and efficient way to achieve exact membership testing for 4 billion unsigned integers. A Bloom filter is an alternative when memory must be reduced further and a small false‑positive rate is acceptable.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
