
Bilibili Data Service Middle Platform: Architecture, Practices, and Future Roadmap

This article presents Bilibili's data service middle platform, detailing its background, one‑stop data service architecture, core processes, model and API construction, query mechanisms, full‑link control, cost‑reduction, high‑availability strategies, achieved results, and future roadmap.

DataFunTalk

The article begins by describing the pain points of traditional data acquisition at Bilibili, illustrated with two real cases that highlight high cost, long communication chains, duplicated modeling, and unclear data lineage.

To address these issues, a one‑stop data service middle platform is proposed, offering unified definition, unified production, and unified consumption of data.

The platform’s framework is layered on top of the data warehouse and includes a data construction layer (model definition, acceleration, API creation), a data query layer (atomic and secondary calculations), a service interface layer (synchronous and asynchronous APIs), and a service gateway layer (degradation, rate‑limiting, authentication, caching).
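The service gateway layer's responsibilities (degradation, rate-limiting, authentication, caching) can be sketched as a simple request pipeline. This is an illustrative toy, not Bilibili's actual implementation; the `TokenBucket` limiter, the cache-backed degradation path, and all names here are assumptions for demonstration.

```python
import time

class TokenBucket:
    """Simple token-bucket rate limiter for the service gateway."""
    def __init__(self, rate, capacity):
        self.rate, self.capacity = rate, capacity
        self.tokens, self.last = float(capacity), time.monotonic()

    def allow(self):
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

class Gateway:
    """Toy gateway: authentication -> rate limit -> cache -> backend,
    with degradation to stale cached data when the backend fails."""
    def __init__(self, backend, rate=100):
        self.backend = backend
        self.limiter = TokenBucket(rate, rate)
        self.cache = {}

    def handle(self, api_key, query):
        if api_key != "valid-key":              # authentication
            return {"status": 401}
        if not self.limiter.allow():            # rate limiting
            return {"status": 429}
        if query in self.cache:                 # caching
            return {"status": 200, "data": self.cache[query], "cached": True}
        try:
            data = self.backend(query)
        except Exception:                       # degradation: serve stale data if any
            return {"status": 200, "data": self.cache.get(query), "degraded": True}
        self.cache[query] = data
        return {"status": 200, "data": data, "cached": False}
```

A production gateway would add per-API quotas, TTL-based cache eviction, and distributed rate limiting, but the ordering of checks (authenticate before spending a rate-limit token, check cache before hitting the backend) is the essential design choice.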

Core processes are explained from the perspective of data developers and business developers: defining metrics, building models via drag‑and‑drop, accelerating data from cold to hot storage, publishing APIs, and consuming them through a unified gateway.
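The unified-definition-to-consumption flow above can be sketched as a small registry: a metric is defined once, then APIs are published only against registered metrics, which is what prevents duplicated or inconsistent definitions. All class and field names here are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Metric:
    name: str          # e.g. "dau"
    aggregation: str   # e.g. "count_distinct"
    source_table: str  # warehouse table the metric is defined on

@dataclass
class DataAPI:
    path: str
    metrics: list
    published: bool = False

class ServicePlatform:
    """Toy registry tying unified metric definition to API publishing."""
    def __init__(self):
        self.metrics, self.apis = {}, {}

    def define_metric(self, metric):
        # Unified definition: one authoritative record per metric name.
        self.metrics[metric.name] = metric

    def publish_api(self, path, metric_names):
        # Publishing fails fast on metrics that were never defined,
        # so every API is traceable back to a registered definition.
        missing = [m for m in metric_names if m not in self.metrics]
        if missing:
            raise ValueError(f"undefined metrics: {missing}")
        api = DataAPI(path, [self.metrics[m] for m in metric_names], published=True)
        self.apis[path] = api
        return api
```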

Model building supports various schemas (star, snowflake, constellation) and offers two acceleration modes: detail acceleration (cold‑to‑hot mirroring) and pre‑computation acceleration (aggregated data in hot storage). Different scenarios (online, near‑online, OLAP, offline) are mapped to appropriate engine combinations such as KV, TiDB, MySQL, ClickHouse, and Iceberg.
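A minimal sketch of the scenario-to-engine mapping, assuming the pairings described above; the routing policy (pre-computed aggregates go to the scenario's analytical engine, detail queries to its low-latency engine) is an illustrative simplification, not the platform's actual logic.

```python
# Illustrative routing table following the scenario/engine pairings in the text.
ENGINE_BY_SCENARIO = {
    "online":      ["KV", "TiDB", "MySQL"],   # low-latency point lookups
    "near-online": ["TiDB", "ClickHouse"],
    "olap":        ["ClickHouse"],
    "offline":     ["Iceberg"],
}

def pick_engine(scenario, needs_aggregation):
    """Choose a storage engine for a query.

    Pre-computed aggregates route to the scenario's last (most analytical)
    engine; detail lookups route to its first (lowest-latency) engine.
    """
    engines = ENGINE_BY_SCENARIO.get(scenario)
    if engines is None:
        raise ValueError(f"unknown scenario: {scenario}")
    return engines[-1] if needs_aggregation else engines[0]
```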

API construction can be visual (model‑based) or metric‑driven, allowing users to configure request/response parameters without writing SQL.
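Configuring request/response parameters without SQL amounts to a declarative schema that the platform validates requests against. A minimal sketch, assuming a hypothetical config shape and endpoint name:

```python
# Hypothetical declarative API definition: the user configures parameter
# names and types instead of writing SQL.
api_config = {
    "path": "/v1/video-stats",
    "request": {"date": "string", "category": "string"},
    "response": ["play_count", "like_count"],
}

def validate_request(config, params):
    """Check an incoming request against the configured parameter schema,
    rejecting both unknown and missing parameters."""
    unknown = set(params) - set(config["request"])
    missing = set(config["request"]) - set(params)
    if unknown or missing:
        raise ValueError(f"unknown={sorted(unknown)}, missing={sorted(missing)}")
    return True
```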

Data query flow is broken into DSL parsing, task splitting, result processing, translation to engine‑specific SQL via a two‑layer AST, and execution on multiple engines with optimizations like connection pooling and timeout cancellation.
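The two-layer translation can be sketched as a logical AST (what the query means) rendered into engine-specific SQL (how one engine runs it). The AST shape, table name, and dialect tweak below are illustrative assumptions, not the platform's actual DSL.

```python
# Layer 1: a logical AST describing the query independent of any engine.
LOGICAL_AST = {
    "metrics": ["play_count"],
    "dimensions": ["category"],
    "table": "dws_video_stats",
    "filters": [("date", "=", "2023-01-01")],
}

# Layer 2: render the logical AST into one engine's SQL dialect.
def to_engine_sql(ast, engine):
    cols = ", ".join(ast["dimensions"] + [f"SUM({m}) AS {m}" for m in ast["metrics"]])
    where = " AND ".join(f"{c} {op} '{v}'" for c, op, v in ast["filters"])
    group = ", ".join(ast["dimensions"])
    sql = f"SELECT {cols} FROM {ast['table']} WHERE {where} GROUP BY {group}"
    if engine == "clickhouse":
        # Engine-specific physical detail kept out of the logical layer,
        # e.g. a query-level timeout (mirrors the timeout-cancellation idea).
        sql += " SETTINGS max_execution_time = 30"
    return sql
```

Keeping the logical and physical layers separate is what lets one DSL query fan out to KV, MySQL, ClickHouse, or Iceberg after task splitting.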

Full‑link control is achieved through unified definition, automated production, and comprehensive monitoring that covers metric consistency, outbound data consistency, and service quality.
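A consistency monitor of the kind described can be as simple as comparing a metric's warehouse value against the value the service layer returns and flagging divergence beyond a tolerance. A minimal sketch with an assumed relative-tolerance policy:

```python
def check_metric_consistency(warehouse_value, served_value, tolerance=0.001):
    """Return True if the served metric matches the warehouse metric
    within a relative tolerance; divergence signals a full-link issue."""
    if warehouse_value == 0:
        return served_value == 0
    return abs(served_value - warehouse_value) / abs(warehouse_value) <= tolerance
```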

Cost‑reduction and efficiency gains stem from reusing standardized models and APIs, shortening development cycles from a week to about a day, and reducing duplicated effort.

High‑availability measures include service isolation into independent resource groups and active‑active multi‑region deployment for disaster recovery.

Results after one year show over 600 APIs, improved performance and stability, and a development cycle reduced to one day. The future roadmap focuses on service governance, broader scenario support, service orchestration, disaster‑recovery automation, and continued cost‑efficiency improvements.

Tags: architecture, big data, data platform, data governance, data service
Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
