
Design and Implementation of a Multi-Level Storage Architecture for Bilibili Comment Service

This article describes a multi-level storage architecture for Bilibili's comment service that backs TiDB with a custom KV store (Taishan) alongside Redis caching. It introduces unstructured indexes, CAS-based consistency, real-time and offline reconciliation, and a hedged degradation strategy to improve reliability, read throughput, and scalability during traffic spikes.

Bilibili Tech

The comment system is a core component of Bilibili's ecosystem: it enables interaction between creators and users and influences content recommendation, community culture, and user retention. During hot events, comment traffic spikes dramatically, making service stability and cache hit rates critical. Every cache miss falls through to TiDB, so a TiDB failure risks an outage of the comment service.

To improve reliability, a multi-level storage architecture is proposed. It removes TiDB as a single point of failure by adding a custom KV store (Taishan) as a second storage tier, and it continues to use Redis for caching and sorted-set indexes.

Architecture Design

The existing architecture relies on Redis for sorted-set indexes (likes, time, heat) and on TiDB for persistent storage. When Redis misses, the fallback TiDB queries are slow and consume excessive CPU and memory. Example SQL for the like-ordered index:

SELECT id FROM reply WHERE ... ORDER BY like_count DESC LIMIT m,n

The new design introduces a multi‑level storage system based on Taishan KV, with three core ideas:

Convert structured index storage to unstructured.

Replace SQL queries with higher‑performance NoSQL operations.

Trade write‑side complexity for read‑side performance.

Storage Design

Two abstract data models are defined:

Index: a secondary index, stored as a Redis Sorted Set.

KV: the primary-key row, stored as a key-value pair containing the comment's metadata and content.

This model enables efficient pagination, sorting, and scanning: Redis Sorted Sets serve the indexes, and a KV cache (Redis or Memcached) serves the comment content.
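The two models can be illustrated with a small in-memory sketch. It is not the article's implementation: `SortedIndex` is a hypothetical stand-in for a Redis Sorted Set, a plain dict stands in for the KV model, and `list_comments` is an assumed helper name. It only shows how a page of comment ids from the index is resolved into comment rows by primary key.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class SortedIndex:
    """Hypothetical in-memory stand-in for a Redis Sorted Set."""
    _entries: list = field(default_factory=list)  # sorted list of (score, member)
    _scores: dict = field(default_factory=dict)   # member -> current score

    def add(self, member: int, score: float) -> None:
        # Re-adding a member replaces its old score, as ZADD does.
        if member in self._scores:
            self._entries.remove((self._scores[member], member))
        bisect.insort(self._entries, (score, member))
        self._scores[member] = score

    def page_desc(self, offset: int, limit: int) -> list:
        # Highest score first, comparable to a ZREVRANGE over the same window.
        ordered = self._entries[::-1]
        return [member for _, member in ordered[offset:offset + limit]]


def list_comments(index: SortedIndex, kv: dict, offset: int, limit: int) -> list:
    # Step 1: page through the index to get comment ids.
    # Step 2: fetch each comment row from the KV model by primary key,
    #         skipping ids whose row is missing.
    ids = index.page_desc(offset, limit)
    return [kv[i] for i in ids if i in kv]
```

In this sketch the read path never sorts or filters rows at query time; ordering is maintained when the index is written, which is the write-side cost the design accepts in exchange for cheaper reads.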

Data Consistency

Keeping Taishan (unstructured) in sync with TiDB (structured) requires custom synchronization, which is exposed to data loss, write failures, conflicts, out-of-order writes, and latency. To mitigate these issues, failed writes are placed on a retry queue, and CAS (Compare-and-Swap) is used to make read-modify-write operations atomic.

Every change to a comment increments a version number, so that stale updates can be detected and discarded. Example update statement:

UPDATE reply SET like_count=like_count+1, version=version+1 WHERE id = xxx

During a CAS write, the version carried by the binlog event is compared with the version currently stored in Taishan; the write is applied only if the event's version is newer than or equal to the stored one.
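A minimal sketch of this version check follows. Taishan's actual API is not shown in the article, so `VersionedKV`, its `cas` method, and `apply_binlog_event` are hypothetical names, and the store is an in-memory dict. The sketch only demonstrates the rule described above: discard events older than the stored version, and write through CAS so a concurrent writer cannot be silently overwritten.

```python
import threading


class VersionedKV:
    """Hypothetical in-memory store exposing a compare-and-swap primitive."""

    def __init__(self):
        self._data = {}  # key -> (version, value)
        self._lock = threading.Lock()

    def get(self, key):
        return self._data.get(key)

    def cas(self, key, expected, new_version, value) -> bool:
        # Write only if the stored entry is still the one the caller read.
        with self._lock:
            if self._data.get(key) != expected:
                return False
            self._data[key] = (new_version, value)
            return True


def apply_binlog_event(store, key, event_version, value, max_attempts=3) -> bool:
    """Apply an event unless the store already holds a newer version."""
    for _ in range(max_attempts):
        current = store.get(key)
        if current is not None and event_version < current[0]:
            return False  # stale event: discard
        if store.cas(key, current, event_version, value):
            return True
        # Another writer won the race; re-read and re-check the version.
    return False  # attempts exhausted; a caller could hand this to a retry queue
```

Accepting an equal version makes replaying the same event harmless, which matters when failed writes are retried.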

Reconciliation System

Because the CAP theorem rules out guaranteeing both availability and strong consistency when the network partitions, the two stores can diverge, and a reconciliation mechanism is needed. Real-time reconciliation consumes TiDB binlog events through a delayed queue, compares them against the data in Taishan, and triggers alerts on mismatches. Offline reconciliation uses T+1 data from the TiDB and Taishan data warehouses to verify eventual consistency.
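The real-time path can be sketched as a delay queue that holds each binlog event for a fixed interval before checking it against the secondary store, giving synchronization time to complete. Everything named here is an assumption for illustration: `DelayedReconciler`, the `read_secondary` callable, and the mismatch labels are hypothetical, and time is passed in explicitly to keep the example deterministic.

```python
import heapq
import itertools


class DelayedReconciler:
    def __init__(self, read_secondary, delay_seconds: float):
        self._read = read_secondary    # hypothetical: key -> (version, value) or None
        self._delay = delay_seconds
        self._heap = []                # (due_time, seq, key, version, value)
        self._seq = itertools.count()  # tie-breaker so payloads are never compared

    def on_binlog_event(self, key, version, value, now: float) -> None:
        # Schedule the check for `delay_seconds` after the event was seen.
        due = now + self._delay
        heapq.heappush(self._heap, (due, next(self._seq), key, version, value))

    def run_due(self, now: float) -> list:
        """Check every event whose delay has elapsed; return the mismatches."""
        mismatches = []
        while self._heap and self._heap[0][0] <= now:
            _, _, key, version, value = heapq.heappop(self._heap)
            stored = self._read(key)
            if stored is None or stored[0] < version:
                mismatches.append((key, "missing_or_stale"))
            elif stored[0] == version and stored[1] != value:
                mismatches.append((key, "value_mismatch"))
            # A stored version above the event's means a later write
            # superseded it, which is not treated as a mismatch here.
        return mismatches
```

A caller would invoke `run_due` periodically and raise an alert for each returned entry.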

Degradation Strategy

High availability is achieved by automatically degrading from TiDB to Taishan when the primary store fails or times out. Instead of simple serial or parallel fallback, a hedging policy is employed: after a configurable delay, a backup request is sent to the secondary store; the faster successful response is returned, balancing latency and resource usage.
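The hedging policy can be sketched with `asyncio`. This is an illustration under assumptions, not the service's code: `hedged_read` is a hypothetical helper, and `primary` and `secondary` are any zero-argument coroutine functions standing in for the TiDB and Taishan reads. The backup request starts after `hedge_delay` seconds, or immediately if the primary has already failed, and the first successful response wins.

```python
import asyncio


async def hedged_read(primary, secondary, hedge_delay: float):
    """Return the first successful result from primary or secondary."""
    tasks = [asyncio.create_task(primary())]
    # Give the primary a head start of `hedge_delay` seconds.
    done, _ = await asyncio.wait(tasks, timeout=hedge_delay)
    if done and tasks[0].exception() is None:
        return tasks[0].result()  # primary answered in time; no backup sent

    # Primary is slow or has failed: send the backup request.
    tasks.append(asyncio.create_task(secondary()))
    pending = {t for t in tasks if not t.done()}
    last_error = tasks[0].exception() if tasks[0].done() else None
    try:
        while pending:
            done, pending = await asyncio.wait(
                pending, return_when=asyncio.FIRST_COMPLETED
            )
            for task in done:
                if task.exception() is None:
                    return task.result()
                last_error = task.exception()
        raise last_error  # both stores failed
    finally:
        for task in tasks:
            task.cancel()  # stop whichever request is still in flight
```

Compared with serial fallback, the caller does not wait for a full primary timeout before trying the secondary; compared with always querying both stores, the secondary only sees traffic when the primary exceeds the delay or fails.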

In production, TiDB may serve as the primary for latency-sensitive queries, while Taishan acts as the primary for heavyweight analytical workloads. In one incident, a TiKV node failure led to seamless degradation to Taishan with no user impact.

Summary and Outlook

The multi‑level storage architecture significantly improves the stability and performance of Bilibili's comment service, providing reliable fallback, higher read throughput, and better scalability. Ongoing work includes refining synchronization, expanding NoSQL capabilities, and further automating degradation mechanisms.
