
Efficient High‑Concurrency Data Retrieval Using Inverted Index and Bitmap Techniques

This article explores how to achieve fast, scalable data retrieval in million‑level high‑concurrency scenarios by replacing naïve full‑combination rule matching with column‑wise inverted indexes and bitmap operations, dramatically reducing time complexity and improving stability while leveraging RoaringBitmap compression for space efficiency.

JD Tech

Background: The Promise order-control system serves multiple touchpoints before and after order placement and handles the highest concurrency in the system (over one million TPS at peak). The core technical challenge is matching each user request quickly and correctly against a rule library containing hundreds of thousands of rules.

Naïve Solution: Cache the rule library row by row in Redis (key = rule condition, value = rule result). For each request, generate every possible parameter combination (each of the n parameters is either its concrete value or a wildcard) and attempt to match them, giving O(2ⁿ) time complexity; with n ≈ 12 that is up to 4096 lookups per request.
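The naïve lookup can be sketched as follows. This is a minimal Python sketch: the `rule_cache` dict stands in for Redis, and the rule keys, `|` separator, and wildcard convention are all illustrative assumptions, not the production format.

```python
from itertools import combinations

# Hypothetical rule cache: key = "|"-joined rule condition, value = rule result.
# In production this would live in Redis; a dict stands in here.
rule_cache = {
    "cn|beijing|*": "same-day",
    "cn|*|vip": "priority",
    "*|*|*": "default",
}

def match_naive(params):
    """Try every combination of concrete values vs. wildcard '*'.

    With n parameters there are 2**n candidate keys, so n = 12 means
    up to 4096 lookups per request.
    """
    n = len(params)
    # Enumerate which positions keep their concrete value (the rest become '*'),
    # preferring more specific matches first.
    for keep in range(n, -1, -1):
        for idx in combinations(range(n), keep):
            key = "|".join(params[i] if i in idx else "*" for i in range(n))
            if key in rule_cache:
                return rule_cache[key]
    return None

print(match_naive(["cn", "beijing", "regular"]))  # prints "same-day"
```

Even with every key cached, the request still pays for the combinatorial enumeration itself, which is what the inverted-index approach below removes.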

Figure 1.

New Solution: Adopt column-wise inverted indexes combined with bitwise operations. Each column's values are mapped to posting lists (row IDs), and a bitmap is generated for each posting list. Query parameters are used as keys to retrieve the matching bitmaps. A column-level OR folds in rules whose value for that column is empty (i.e., matches anything), and inter-column AND operations yield the candidate rule set.

Figure 2.

Algorithm Details:

1. Pre-compute each column's inverted index and the bitmap for each posting list.

2. On each request, use the request parameters as keys to fetch the matching bitmaps.

3. For each column, OR the fetched bitmap with the column's empty-value bitmap; then AND across columns to obtain the candidate rule set.

4. Rank candidate rules by business priority and iteratively apply AND operations to find the optimal rule.
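The steps above can be sketched end to end in Python, using plain ints as bitmaps (bit i set means rule row i matches). The column names and rule rows are illustrative assumptions:

```python
# Minimal sketch of the column-wise inverted-index scheme.
rules = [
    # (region, city, user_level) -- None means "any value" for that column
    ("cn", "beijing", None),   # row 0
    ("cn", None, "vip"),       # row 1
    (None, None, None),        # row 2 (fallback)
]
columns = ["region", "city", "user_level"]

# Pre-compute: for each column, value -> bitmap of rows holding that value,
# plus a bitmap of rows where the column is empty (matches anything).
index = {col: {} for col in columns}
empty = {col: 0 for col in columns}
for row, rule in enumerate(rules):
    for col, value in zip(columns, rule):
        if value is None:
            empty[col] |= 1 << row
        else:
            index[col][value] = index[col].get(value, 0) | (1 << row)

def match(request):
    """OR within each column (value bitmap | empty bitmap), AND across columns."""
    candidates = (1 << len(rules)) - 1            # start with all rows
    for col in columns:
        candidates &= index[col].get(request[col], 0) | empty[col]
    # Candidate rows; here the lowest row id stands in for highest priority.
    return [row for row in range(len(rules)) if candidates >> row & 1]

print(match({"region": "cn", "city": "beijing", "user_level": "regular"}))  # [0, 2]
```

Each request now does one bitmap fetch, one OR, and one AND per column, independent of how many parameter combinations exist.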

Figure 3.

Complexity Analysis: The new approach reduces time complexity from exponential O(2ⁿ) to linear, roughly 3n ≈ O(n) (one bitmap fetch, one OR, and one AND per column). Space complexity increases because posting lists are stored for every column, but this is mitigated by compressing the bitmaps with RoaringBitmap, achieving up to 100× space savings.
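As a quick arithmetic check of the two operation counts at n = 12:

```python
n = 12
naive_ops = 2 ** n    # full combination enumeration: one lookup per key
indexed_ops = 3 * n   # per column: one fetch, one OR, one AND
print(naive_ops, indexed_ops)  # 4096 36
```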

Engineering Considerations – Bitmap Compression: Sparse bitmaps waste memory, since the bitmap must span the full row-id range regardless of how few bits are set. RoaringBitmap stores only populated blocks, drastically reducing usage (e.g., a sparse 10 million-bit bitmap drops from ~1.2 MB uncompressed to 144 bytes).
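The container idea behind RoaringBitmap can be illustrated with a toy sketch. This is not the real library's API: `TinyRoaring` is a hypothetical name, and while the 2¹⁶ bucketing and the 4096 array-container threshold mirror the published design, run containers and all real optimizations are omitted.

```python
# Sketch of RoaringBitmap's core idea: split the 32-bit value space into
# 2**16 buckets and store each populated bucket as either a sorted array of
# 16-bit offsets (sparse) or a packed 65536-bit bitmap (dense).
ARRAY_LIMIT = 4096  # above this, an array container costs more than a bitmap

class TinyRoaring:
    def __init__(self):
        self.containers = {}  # high 16 bits -> sorted list OR int bitmap

    def add(self, x):
        hi, lo = x >> 16, x & 0xFFFF
        c = self.containers.get(hi)
        if c is None:
            self.containers[hi] = [lo]
        elif isinstance(c, list):
            if lo not in c:
                c.append(lo)
                c.sort()
                if len(c) > ARRAY_LIMIT:      # promote to a bitmap container
                    bits = 0
                    for v in c:
                        bits |= 1 << v
                    self.containers[hi] = bits
        else:
            self.containers[hi] = c | (1 << lo)

    def __contains__(self, x):
        hi, lo = x >> 16, x & 0xFFFF
        c = self.containers.get(hi)
        if c is None:
            return False
        if isinstance(c, list):
            return lo in c
        return bool(c >> lo & 1)

b = TinyRoaring()
b.add(7)
b.add(10_000_000)  # only two tiny containers allocated, not a 10M-bit array
print(7 in b, 10_000_000 in b, 8 in b)  # True True False
```

The point of the sketch: membership in a bitmap spanning ten million ids costs two small containers here, which is why sparse rule bitmaps compress so well.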

Figure 4.

Applicability: The method works well for equality queries with simple posting lists, but it is unsuitable for range or LIKE queries, since those would require exhaustive enumeration of every matching key.

Other Optimizations: Structures such as skip lists or hash tables over posting lists (e.g., Elasticsearch's skip-list-based postings) can further accelerate intersection, advancing through lists in roughly logarithmic rather than linear steps.
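The skip-ahead idea can be sketched with binary search over sorted posting lists. This is illustrative only; Elasticsearch's actual implementation stores skip data inside its on-disk postings format rather than using `bisect`:

```python
from bisect import bisect_left

def intersect_sorted(a, b):
    """Intersect two sorted posting lists, jumping ahead with binary search
    instead of scanning element by element (the idea behind skip pointers)."""
    if len(a) > len(b):
        a, b = b, a           # iterate the shorter list, search in the longer
    out, lo = [], 0
    for x in a:
        lo = bisect_left(b, x, lo)   # skip ahead in O(log n) per step
        if lo == len(b):
            break
        if b[lo] == x:
            out.append(x)
    return out

print(intersect_sorted([2, 5, 9, 40], [1, 2, 3, 9, 10, 40, 41]))  # [2, 9, 40]
```

Iterating the shorter list and searching in the longer one keeps the cost near `len(short) * log(len(long))`, which matters when one column's posting list is tiny and another's is huge.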

Figure 5.

Tags: rule engine, performance optimization, high concurrency, Bitmap, RoaringBitmap, Inverted Index