
Bilibili Big Data Governance: From Reactive Storage Management to Proactive Multi‑Dimensional Governance

Rapid growth left Bilibili’s exabyte‑scale big data platform with fragmented data ownership and costly storage. In response, the team launched the Wanglou project, a metadata‑driven, indicator‑based governance framework that cut storage use by half and introduced compliance scoring and automation. The team now plans to extend proactive, multi‑dimensional governance to compute, traffic, and lake‑house resources.

Bilibili Tech

Bilibili, founded in 2009, began building its big data team in 2017. The platform scaled rapidly and reached exabyte‑scale data by 2023.

The explosive growth created challenges: fragmented data ownership, mixed business and technical data, lack of asset attribution, and inconsistent reporting standards.

In late 2021, the "望楼" (Wanglou) project was launched to establish a comprehensive data‑asset governance framework that can monitor anomalies, issue alerts, and enforce policies.

The governance approach starts with building an asset metadata model and a governance indicator system. Initially a bottom‑up strategy identifies key assets (Hive tables, scheduling tasks) and maps their lifecycle; later a top‑down strategy uses the indicator system to drive improvements.
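As a rough illustration of what an asset metadata model and one simple indicator might look like, here is a minimal sketch. The class name, field names, and the per-department storage indicator are all assumptions made for this example; the article does not describe Wanglou's actual schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class HiveTableAsset:
    """Hypothetical metadata record for one Hive table (not Wanglou's real schema)."""
    name: str
    owner_department: str                 # asset attribution
    size_bytes: int
    created_on: date
    last_accessed_on: Optional[date] = None
    ttl_days: Optional[int] = None
    producing_task: Optional[str] = None  # scheduling task that writes the table


def days_idle(asset: HiveTableAsset, today: date) -> int:
    """Days since the last recorded access, falling back to the creation date."""
    reference = asset.last_accessed_on or asset.created_on
    return (today - reference).days


def storage_by_department(assets: list[HiveTableAsset]) -> dict[str, int]:
    """A simple indicator: total bytes attributed to each department."""
    totals: dict[str, int] = {}
    for asset in assets:
        totals[asset.owner_department] = (
            totals.get(asset.owner_department, 0) + asset.size_bytes
        )
    return totals
```

Keeping ownership on the asset record itself is what makes a top-down indicator such as per-department storage possible in this sketch.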

Storage governance was chosen as the first focus because storage costs were high and utilization alerts above 90% were frequent. Targets included reducing overall storage usage by 50% in 2022 and lowering the risk of storage utilization (the "water level") running too high.
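The two targets can be expressed as simple arithmetic. The sketch below assumes the 90% figure is an alert threshold on used versus total capacity and that the 50% reduction is measured against a fixed baseline; both readings, and all function names, are assumptions for illustration.

```python
ALERT_THRESHOLD = 0.90   # the article mentions alerts above 90% utilization
REDUCTION_TARGET = 0.50  # the article's 2022 goal; the baseline is an assumption


def utilization(used_bytes: int, capacity_bytes: int) -> float:
    """Fraction of capacity currently in use (the storage 'water level')."""
    if capacity_bytes <= 0:
        raise ValueError("capacity_bytes must be positive")
    return used_bytes / capacity_bytes


def needs_alert(used_bytes: int, capacity_bytes: int,
                threshold: float = ALERT_THRESHOLD) -> bool:
    """True when utilization is strictly above the alert threshold."""
    return utilization(used_bytes, capacity_bytes) > threshold


def target_progress(baseline_bytes: int, current_bytes: int,
                    target: float = REDUCTION_TARGET) -> float:
    """Share of the reduction goal achieved so far; 1.0 means the goal is met."""
    goal_bytes = baseline_bytes * target
    if goal_bytes <= 0:
        raise ValueError("baseline_bytes and target must be positive")
    return (baseline_bytes - current_bytes) / goal_bytes
```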

Problems were categorized, standards defined, and concrete strategies formulated, e.g.:

| Problem | Standard | Strategy |
| --- | --- | --- |
| Downstream unused | Offline low‑heat data should be decommissioned | Push decommission of low‑heat data |
| TTL too long | Set TTL according to data freshness or hierarchy | Shorten TTL for over‑retained data |
| Data not compressed | Data must be compressed | Enforce compression on uncompressed data |
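The three problem, standard, and strategy rows above could be checked mechanically against table statistics. The following is a minimal sketch: the `TableStats` fields, the 30-day read window, the zero-read definition of "low-heat", and the issue labels are hypothetical choices for this example, not the rules Bilibili actually uses.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TableStats:
    """Hypothetical per-table statistics fed into the rule checks."""
    name: str
    downstream_reads_30d: int    # assumed "heat" signal: reads in the last 30 days
    ttl_days: Optional[int]      # None means no TTL is configured
    recommended_ttl_days: int    # assumed to come from the table's layer or freshness
    compressed: bool


def find_issues(table: TableStats) -> list[str]:
    """Return the governance strategies that apply to this table, in rule order."""
    issues: list[str] = []
    if table.downstream_reads_30d == 0:
        issues.append("decommission_low_heat")
    if table.ttl_days is None or table.ttl_days > table.recommended_ttl_days:
        issues.append("shorten_ttl")
    if not table.compressed:
        issues.append("enforce_compression")
    return issues
```

A table with no TTL at all is treated here as over-retained, which is one possible interpretation of "TTL too long".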

Governance scoring was introduced to quantify department compliance, linking scores to incentives and awards.
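The article does not give the scoring formula. One plausible sketch, purely as an assumption, scores each department by the share of its storage that has no open governance issue and then ranks departments by that score.

```python
def compliance_score(total_bytes: int, noncompliant_bytes: int) -> float:
    """Hypothetical score from 0 to 100: share of storage with no open issue."""
    if total_bytes <= 0:
        return 100.0  # a department with no data has nothing to fix
    ratio = min(max(noncompliant_bytes / total_bytes, 0.0), 1.0)
    return round(100.0 * (1.0 - ratio), 1)


def rank_departments(usage: dict[str, tuple[int, int]]) -> list[tuple[str, float]]:
    """Rank departments by score (highest first), breaking ties by name.

    `usage` maps department -> (total_bytes, noncompliant_bytes).
    """
    scored = [
        (department, compliance_score(total, bad))
        for department, (total, bad) in usage.items()
    ]
    return sorted(scored, key=lambda item: (-item[1], item[0]))
```

Weighting by bytes rather than by table count means a single large non-compliant table affects the score more than many small ones; a real system might weight differently.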

To improve user participation, a three‑step process was designed: automatic issue interception, clear prioritization, and guided execution via the Governance Center product.
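For the prioritization step, a simple ordering rule may be enough to show users what to fix first. This sketch assumes each issue carries an estimate of reclaimable storage and a risk flag; both fields and the ordering itself are hypothetical, and the Governance Center's actual logic is not described in the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GovernanceIssue:
    """Hypothetical work item shown to a table owner."""
    table: str
    kind: str
    reclaimable_bytes: int
    at_risk: bool  # assumed flag, e.g. the table sits on a nearly full cluster


def prioritize(issues: list[GovernanceIssue]) -> list[GovernanceIssue]:
    """Order issues: at-risk first, then largest savings, then table name."""
    return sorted(
        issues,
        key=lambda issue: (not issue.at_risk, -issue.reclaimable_bytes, issue.table),
    )
```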

Automation, standardized tagging, quota controls, and SOP‑driven destruction workflows were implemented to increase operational efficiency.
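A quota control can be sketched as a pre-write check that rejects requests which would push a department past its limit. The `Quota` structure, the per-department granularity, and the message format are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Quota:
    """Hypothetical storage quota for one department."""
    department: str
    limit_bytes: int
    used_bytes: int


def check_write(quota: Quota, requested_bytes: int) -> tuple[bool, str]:
    """Return (allowed, reason) for a write of `requested_bytes`."""
    if requested_bytes < 0:
        raise ValueError("requested_bytes must be non-negative")
    projected = quota.used_bytes + requested_bytes
    if projected > quota.limit_bytes:
        over_by = projected - quota.limit_bytes
        return False, f"{quota.department} would exceed its quota by {over_by} bytes"
    return True, "ok"
```

Returning a reason alongside the decision lets the caller point the user toward cleanup work instead of failing silently.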

Future plans include expanding governance from storage to compute and traffic resources, introducing cost‑value assessment, integrating lake‑house and one‑service architectures, and continuing proactive, multi‑dimensional governance.
