Optimizing Large-Scale Product Set Refresh with RoaringBitmap
By representing the pre‑ and post‑refresh SPU sets as RoaringBitmaps and diffing them, the system avoids full‑insert writes, cuts memory usage by orders of magnitude, speeds refreshes by over 50%, and reduces write volume by nearly 87%, solving large‑scale tag‑based product refresh challenges.
In the product‑selection scenario, each tag ID is bound to a large set of SPUs (often hundreds of thousands or millions). When tag rules change or scheduled tasks trigger, the entire SPU set must be refreshed, inserting new SPUs and deleting obsolete ones.
Storing the full before‑and‑after SPU sets in memory for diff computation leads to memory overflow and service crashes, especially when many tags are refreshed simultaneously.
The current backend uses HBase and adopts a "full insert‑full delete" strategy: the refreshed set is fully written (leveraging HBase's idempotent row‑key overwrite) and then old rows with an outdated timestamp are scanned and removed. Although only about 10% of SPUs actually change, the full‑insert approach causes massive duplicate writes, increased storage pressure, and unnecessary notifications to downstream platforms.
To reduce memory usage while still supporting set operations (intersection, difference), a bitmap‑based data structure is proposed. A bitmap stores membership as a bit array, offering O(1) set/test operations per element and very low memory consumption (e.g., 5 M entries require only ~0.5 MiB).
The solution stores the pre‑ and post‑refresh SPU sets as Bitmaps, computes the difference, and then performs only the necessary add/delete actions.
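The diff step can be sketched as follows. This is a minimal illustration using `java.util.BitSet` from the standard library so it is self-contained; the production approach described in the article uses RoaringBitmap, which exposes the equivalent operation as `RoaringBitmap.andNot(a, b)`. The SPU IDs are invented for the example.

```java
import java.util.BitSet;

public class SpuDiffSketch {
    public static void main(String[] args) {
        // Pre-refresh SPU set bound to a tag (IDs are illustrative)
        BitSet before = new BitSet();
        before.set(1001); before.set(1002); before.set(1003); before.set(1004);

        // Post-refresh SPU set after the tag rules re-ran
        BitSet after = new BitSet();
        after.set(1002); after.set(1003); after.set(1005);

        // SPUs present only in the new set -> rows to insert
        BitSet toInsert = (BitSet) after.clone();
        toInsert.andNot(before);

        // SPUs present only in the old set -> rows to delete
        BitSet toDelete = (BitSet) before.clone();
        toDelete.andNot(after);

        System.out.println("insert: " + toInsert); // {1005}
        System.out.println("delete: " + toDelete); // {1001, 1004}
    }
}
```

Only the two difference sets touch storage and notify downstream platforms, instead of the full post-refresh set, which is where the ~87% write-volume reduction comes from.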
Several bitmap implementations were evaluated: Java's native BitSet, the JavaEWAH library's EWAHCompressedBitmap, and RoaringBitmap. Redis's Bitmap was excluded because it stores data remotely rather than in process memory.
Memory‑usage tests (adding 5 M entries at varying sparsity) showed that EWAHCompressedBitmap uses the least memory at sparsity = 1, while at all other sparsity levels the ordering is RoaringBitmap < EWAHCompressedBitmap < BitSet.
CPU‑time tests (2000 diff operations against a full 5 M bitmap) indicated RoaringBitmap ≈ EWAHCompressedBitmap < BitSet.
Considering both memory and CPU results across realistic sparsity levels, RoaringBitmap was selected as the optimal implementation.
RoaringBitmap splits each 32‑bit unsigned integer into a 16‑bit high part and a 16‑bit low part: the high 16 bits index one of up to 2^16 containers, and the low 16 bits are stored inside that container. Four container types (ArrayContainer, BitmapContainer, RunContainer, SharedContainer) adapt to the density of the data. This design keeps each container small enough to fit in L1 cache, dramatically improving performance.
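The high/low split described above is plain bit arithmetic; a tiny sketch (with an arbitrary example value, not code from the library) makes it concrete:

```java
public class RoaringKeySplit {
    public static void main(String[] args) {
        int value = 0x0001_F3A5; // example 32-bit value: 127,909

        // High 16 bits select the container...
        int containerKey = value >>> 16;   // 0x0001 = 1

        // ...low 16 bits are what the container actually stores.
        int lowBits = value & 0xFFFF;      // 0xF3A5 = 62,373

        System.out.println("container=" + containerKey + " low=" + lowBits);
    }
}
```

Because a container only ever holds 16‑bit values, even its densest form (a BitmapContainer) is a fixed 8 KiB, which is what lets it sit comfortably in L1 cache.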
Advantages of RoaringBitmap include:
Significant memory savings compared to raw bitmaps (e.g., 5 M uint32 values need only ~0.5 MiB).
Faster union/intersection operations due to container‑level indexing and selective computation.
Optimized logical flow: because the top‑level container index is ordered, irrelevant containers are filtered out early, and when intersecting containers of very different sizes a binary‑search strategy over the larger side reduces overhead.
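The binary‑search intersection mentioned in the last point can be sketched like this. This is an illustrative simplification, not RoaringBitmap's actual code: each element of the small sorted array is looked up in the large one, giving O(m log n) work instead of an O(m + n) merge, which wins when one side is much smaller.

```java
import java.util.Arrays;

public class BinarySearchIntersect {
    // Intersect a small sorted array with a much larger sorted array by
    // binary-searching each small-side element in the large side.
    static int[] intersect(int[] small, int[] large) {
        int[] out = new int[small.length];
        int n = 0;
        for (int v : small) {
            if (Arrays.binarySearch(large, v) >= 0) {
                out[n++] = v;
            }
        }
        return Arrays.copyOf(out, n); // trim to actual match count
    }

    public static void main(String[] args) {
        int[] small = {3, 40, 500};
        int[] large = {1, 2, 3, 5, 8, 13, 40, 100, 256, 500, 1000};
        System.out.println(Arrays.toString(intersect(small, large))); // [3, 40, 500]
    }
}
```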
Real‑world deployment results show average refresh speed improvement of 52.42% and write‑volume reduction of 86.98%, effectively alleviating memory pressure and downstream processing load.
The approach is also applicable to other refresh scenarios such as platform‑side tag updates.
DeWu Technology