
Design and Implementation of Bilibili's Large-Scale Recall System

Bilibili's large-scale recall system splits online processing into a two-tier merge service and an index service. It supports multi-channel text, item-to-item, and vector indexes with real-time updates, scales through horizontal sharding, and is hardened by CI/CD, monitoring, and degradation mechanisms; ongoing work extends it toward model-based recall and greater operational automation.

Bilibili Tech

The article introduces the large‑scale recall system used by Bilibili (B站) in its search and recommendation pipelines. Recall is the first stage of a multi‑level funnel that selects a small, highly relevant subset of items from massive content collections to feed downstream ranking stages. To satisfy diverse user needs, a multi‑channel recall strategy is employed, and the quality of recall directly determines the upper bound of the overall system performance.

Initially, recall was a simple sub‑module within a monolithic engine service, but rapid business growth exposed issues such as high code complexity, poor maintainability, and memory bottlenecks. The recall module was therefore extracted into an independent service. Subsequent growth in recall strategies, candidate set size, and channel count made the standalone service insufficient, prompting a redesign toward a cloud‑native, scalable, and configurable recall framework.

The redesigned architecture separates the online path into a two-tier system: a merge service that orchestrates multi-channel recall, de-duplication, filtering, scoring, and final merging; and an index service that hosts channel-specific indexes and executes the actual retrieval logic. The merge service follows a "compute-heavy, storage-light" principle, favoring recomputation over persistent state, and integrates external components such as KV stores for user data, inference services for user embeddings, scoring modules, and the primary ranking service for filtering.
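The merge-service flow described above can be sketched as a small pipeline. This is my own illustration, not Bilibili's code; the function and parameter names (`merge_recall`, `is_allowed`, `score`) are assumptions:

```python
from typing import Callable

# Hypothetical sketch of the merge-service flow: fan out to channels,
# de-duplicate, filter, score, and truncate to a final top-K.
def merge_recall(user_ctx: dict,
                 channels: dict[str, Callable[[dict], list[str]]],
                 is_allowed: Callable[[str], bool],
                 score: Callable[[dict, str], float],
                 top_k: int = 100) -> list[str]:
    seen: set[str] = set()
    candidates: list[str] = []
    for _name, fetch in channels.items():
        for item_id in fetch(user_ctx):          # multi-channel recall
            if item_id in seen:                  # cross-channel de-duplication
                continue
            seen.add(item_id)
            if is_allowed(item_id):              # filtering (e.g. ranking-side rules)
                candidates.append(item_id)
    candidates.sort(key=lambda i: score(user_ctx, i), reverse=True)  # scoring
    return candidates[:top_k]                    # final merge / truncation
```

In production the channel fan-out would run in parallel with per-channel timeouts; the sequential loop here only shows the data flow.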

The index service is organized into four layers—interaction, execution, index, and build—supporting billions of items with horizontal sharding and real‑time updates. Three main index types are provided: text‑based inverted indexes, x2i (item‑to‑item) indexes using the NeighborHash structure, and vector indexes built on Facebook’s Faiss library (IVF and HNSW). Each channel (text, x2i, vector) has tailored indexing and retrieval optimizations, including term‑pair indexes, cache‑friendly hash layouts, and quantized scoring.

For text recall, documents are tokenized into inverted and forward indexes; queries are parsed into term expressions and matched against the posting lists. Real‑time incremental updates are handled via Kafka, WAL buffers, delta indexes, and periodic index merging to balance latency and query performance.
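The inverted/forward index pair and posting-list matching can be illustrated with a toy in-memory index. This is a minimal sketch of the general technique, not Bilibili's implementation; the class and method names are my own, and the whitespace tokenizer stands in for a real analyzer:

```python
from collections import defaultdict

# Toy text index: an inverted index (term -> doc ids) plus a forward
# index (doc id -> raw text), with AND-style matching over posting lists.
class TextIndex:
    def __init__(self) -> None:
        self.inverted: dict[str, set[int]] = defaultdict(set)
        self.forward: dict[int, str] = {}

    def add(self, doc_id: int, text: str) -> None:
        self.forward[doc_id] = text
        for term in text.lower().split():        # toy whitespace tokenizer
            self.inverted[term].add(doc_id)

    def search_and(self, query: str) -> set[int]:
        postings = [self.inverted.get(t, set()) for t in query.lower().split()]
        if not postings:
            return set()
        return set.intersection(*postings)       # AND over posting lists
```

A real engine would intersect sorted posting lists (shortest first) and support richer term expressions (OR, NOT, phrase), but the index shape is the same.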

The x2i channel supports collaborative‑filtering‑style recall (e.g., item‑cf, swing) and tag/category‑based recall. NeighborHash provides a flat‑array, cache‑friendly hash table with minimal probe sequence length and bidirectional probing to achieve high throughput.
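The x2i lookup pattern can be sketched with a plain dict standing in for NeighborHash: each trigger item maps to a precomputed neighbor list (e.g. from item-cf or swing), and recall aggregates similarity across the user's recent triggers. The names here are illustrative, not the real API:

```python
# Illustrative x2i recall: accumulate neighbor similarities across all of
# a user's trigger items, then return the highest-scoring candidates.
def x2i_recall(triggers: list[str],
               table: dict[str, list[tuple[str, float]]],
               top_k: int = 3) -> list[str]:
    scores: dict[str, float] = {}
    for trig in triggers:
        for neighbor, sim in table.get(trig, []):
            scores[neighbor] = scores.get(neighbor, 0.0) + sim  # sum over triggers
    ranked = sorted(scores, key=lambda i: scores[i], reverse=True)
    return ranked[:top_k]
```

NeighborHash's contribution is precisely in replacing the dict here: a flat-array, cache-friendly layout with short probe sequences keeps each `table[trig]` lookup to very few cache misses under high QPS.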

Vector recall maps users and items into a shared embedding space. Offline builders generate base indexes, while real‑time pipelines insert new item embeddings into a temporary “rt” index that is periodically consolidated into delta indexes. Queries retrieve candidates from base, delta, and rt indexes, merge results, and apply quantized scoring (fp32/fp16/int8) with SIMD acceleration.
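The base/delta/rt merge can be shown with a brute-force stand-in for the per-tier ANN search. A real deployment would query Faiss (IVF/HNSW) indexes per tier with quantized, SIMD-accelerated scoring; this sketch only demonstrates the cross-tier merge, and all names are assumptions:

```python
# Brute-force stand-in for base/delta/rt vector search: score every item
# in each tier by inner product, then merge to a global top-K.
def multi_tier_search(query: list[float],
                      tiers: dict[str, list[tuple[str, list[float]]]],
                      top_k: int = 3) -> list[str]:
    def dot(a: list[float], b: list[float]) -> float:
        return sum(x * y for x, y in zip(a, b))  # inner-product score
    merged = [(dot(vec, query), item_id)
              for vectors in tiers.values()      # base, delta, rt tiers
              for item_id, vec in vectors]
    merged.sort(key=lambda p: p[0], reverse=True)  # merge across tiers
    return [item_id for _, item_id in merged[:top_k]]
```

The key property is that newly published items are searchable via the small rt tier immediately, long before they are consolidated into the large base index.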

Index construction is performed offline in a distributed fashion. Raw data are sharded by hash(key) % shard_num, pre‑processed, and built into binary index files that are then deployed to the online services. Incremental data are streamed via Kafka, written to WAL logs, and periodically materialized into major‑dump indexes to speed up service restarts.
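The sharding step can be sketched as follows. One practical detail worth noting: the hash must be stable across processes and build runs, so a digest-based hash is used here rather than Python's per-process randomized `hash()`; the function names are my own:

```python
import hashlib

# Route each record to a shard by hash(key) % shard_num so builders can
# construct per-shard index files in parallel. md5 is used only as a
# stable, well-distributed routing hash, not for security.
def shard_of(key: str, shard_num: int) -> int:
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_num

def partition(records: list[tuple[str, str]],
              shard_num: int) -> list[list[tuple[str, str]]]:
    shards: list[list[tuple[str, str]]] = [[] for _ in range(shard_num)]
    for key, value in records:
        shards[shard_of(key, shard_num)].append((key, value))
    return shards
```

Because routing depends only on the key and shard count, incremental Kafka updates for a given item always reach the same shard that holds its base index.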

Stability engineering includes a robust CI/CD pipeline with pre‑release checks, a debugging platform, core‑dump and performance regression detection, comprehensive multi‑level monitoring (service metrics, funnel metrics, index DQC, channel‑specific KPIs), and a tiered degradation strategy that can throttle channels, indexes, or the entire recall layer while preserving user experience.
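The tiered degradation idea can be sketched as an ordered set of load-shedding levels. The levels and config fields below are assumptions for illustration, not Bilibili's actual policy:

```python
from enum import IntEnum

# Hypothetical tiered degradation: each level sheds more recall work,
# trading candidate breadth for survival of the user-facing path.
class DegradeLevel(IntEnum):
    NONE = 0
    CHANNEL = 1   # drop low-priority channels
    INDEX = 2     # also skip expensive indexes (e.g. vector)
    LAYER = 3     # recall layer off; serve a cached fallback list

def active_channels(level: DegradeLevel,
                    channels: dict[str, dict]) -> list[str]:
    active = []
    for name, cfg in channels.items():
        if level >= DegradeLevel.LAYER:
            continue                                       # whole layer shed
        if level >= DegradeLevel.INDEX and cfg.get("expensive"):
            continue                                       # skip costly indexes
        if level >= DegradeLevel.CHANNEL and cfg.get("priority", 0) < 1:
            continue                                       # shed low priority
        active.append(name)
    return active
```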

Looking ahead, the system will incorporate next‑generation model‑based recall, enhance component composability and orchestration, and further automate operations to reduce operational costs.

References: 1) NeighborHash implementation (GitHub), 2) Related arXiv paper, 3) Faiss library (GitHub).

Tags: cloud native, vector search, search architecture, recall system, Bilibili, large-scale indexing
Written by Bilibili Tech

Provides introductions and tutorials on Bilibili-related technologies.