Bilibili Big Data Governance: From Reactive Storage Management to Proactive Multi‑Dimensional Governance
After rapid growth left Bilibili's exabyte‑scale big‑data platform with fragmented ownership and costly storage, the company launched the Wanglou project to build a metadata‑driven, indicator‑based governance framework. The project cut storage use by half and introduced compliance scoring and automation; the team now plans to extend proactive, multi‑dimensional governance to compute, traffic and lake‑house resources.
Bilibili was founded in 2009 and began building its data team in 2017. The big data platform then scaled rapidly, reaching exabyte‑scale data by 2023.
The explosive growth created challenges: fragmented data ownership, mixed business and technical data, lack of asset attribution, and inconsistent reporting standards.
In late 2021, the "望楼" (Wanglou) project was launched to establish a comprehensive data‑asset governance framework that can monitor anomalies, issue alerts, and enforce policies.
The governance approach starts with building an asset metadata model and a governance indicator system. Initially, a bottom‑up strategy identifies key assets (Hive tables, scheduling tasks) and maps their lifecycle; later, a top‑down strategy uses the indicator system to drive improvements.
Storage governance was chosen as the first focus because storage costs were high and alerts for utilization above 90% were frequent. Targets included reducing overall storage usage by 50% in 2022 and lowering "water‑level" risk, that is, the risk of cluster capacity utilization running too high.
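The alert threshold and reduction target can be expressed as simple arithmetic. The following is a minimal sketch, not Bilibili's implementation: the function names and the byte-based inputs are assumptions for illustration, and only the 90% alert level and the 50% reduction goal come from the text.

```python
ALERT_THRESHOLD = 0.90   # utilization level that triggered alerts, per the text
REDUCTION_TARGET = 0.50  # 2022 goal: cut overall storage usage by 50%


def utilization(used_bytes: int, capacity_bytes: int) -> float:
    """Return the fraction of capacity in use (the "water level")."""
    if capacity_bytes <= 0:
        raise ValueError("capacity_bytes must be positive")
    return used_bytes / capacity_bytes


def is_over_water_level(used_bytes: int, capacity_bytes: int) -> bool:
    """True when utilization exceeds the alert threshold."""
    return utilization(used_bytes, capacity_bytes) > ALERT_THRESHOLD


def target_usage(used_bytes: int) -> int:
    """Usage that would satisfy the reduction target (hypothetical helper)."""
    return int(used_bytes * (1 - REDUCTION_TARGET))
```

A cluster at 95% utilization would be flagged, while one at exactly 90% would not, because the comparison is strict.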
Problems were categorized, standards defined, and concrete strategies formulated, for example:

| Problem | Standard | Strategy |
| --- | --- | --- |
| Downstream unused | Offline low‑heat data should be decommissioned | Push decommission of low‑heat data |
| TTL too long | Set TTL according to data freshness or hierarchy | Shorten TTL for over‑retained data |
| Data not compressed | Data must be compressed | Enforce compression on uncompressed data |
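The three problem/standard/strategy rules above could be checked against table metadata roughly as follows. This is a sketch under stated assumptions: the `TableMeta` fields, the 30-day read window used as a stand-in for "low heat", and the `recommended_ttl_days` value are all hypothetical and do not come from the source.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TableMeta:
    """Hypothetical metadata record for one Hive table."""
    name: str
    downstream_reads_30d: int   # assumed heat signal: downstream reads in 30 days
    ttl_days: int               # configured retention
    recommended_ttl_days: int   # assumed retention implied by freshness or layer
    compressed: bool


def find_issues(table: TableMeta) -> List[str]:
    """Return the governance strategies that apply to a table."""
    issues = []
    if table.downstream_reads_30d == 0:
        issues.append("decommission low-heat data")
    if table.ttl_days > table.recommended_ttl_days:
        issues.append("shorten TTL")
    if not table.compressed:
        issues.append("enforce compression")
    return issues


# Example: an unread, over-retained, uncompressed table triggers all three rules.
example = TableMeta("dwd.example_table", 0, 365, 90, False)
print(find_issues(example))
```

Keeping each rule as an independent check means a table can carry several issues at once, which matches the way the rules are listed as separate rows.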
Governance scoring was introduced to quantify department compliance, linking scores to incentives and awards.
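The source does not describe how the score is calculated. Purely as an illustration of how a department-level compliance score might be derived from open issues, here is a sketch in which the issue names, the weights and the 0 to 100 scale are all assumptions.

```python
from typing import Dict

# Hypothetical penalty weights per issue type; the source gives no real weights.
WEIGHTS: Dict[str, float] = {
    "unused_downstream": 3.0,
    "ttl_too_long": 2.0,
    "uncompressed": 1.0,
}


def governance_score(open_issues: Dict[str, int], total_tables: int) -> float:
    """Score a department from 0 to 100; fewer weighted open issues scores higher."""
    if total_tables <= 0:
        raise ValueError("total_tables must be positive")
    # Worst case: every table has every issue type open.
    max_penalty = sum(WEIGHTS.values()) * total_tables
    penalty = sum(WEIGHTS.get(kind, 0.0) * count for kind, count in open_issues.items())
    return round(100.0 * (1.0 - min(penalty, max_penalty) / max_penalty), 1)
```

Normalizing by the number of tables keeps scores comparable between large and small departments, which matters if scores are tied to incentives.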
To improve user participation, a three‑step process was designed: automatic issue interception, clear prioritization, and guided execution via the Governance Center product.
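The prioritization step could, for instance, rank open issues by how much storage fixing them would free. The sketch below assumes that ranking criterion; the `Issue` fields and the `prioritize` function are hypothetical and are not part of the Governance Center product as described.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Issue:
    """Hypothetical governance issue raised against one asset."""
    asset: str
    kind: str
    reclaimable_gb: float  # estimated storage freed if the issue is fixed


def prioritize(issues: List[Issue], top_n: int = 10) -> List[Issue]:
    """Return up to top_n issues, largest estimated saving first."""
    return sorted(issues, key=lambda issue: issue.reclaimable_gb, reverse=True)[:top_n]
```

Showing owners a short, ordered list rather than every open issue is one way to make the "clear prioritization" step concrete.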
Automation, standardized tagging, quota controls, and SOP‑driven destruction workflows were implemented to increase operational efficiency.
Future plans include expanding governance from storage to compute and traffic resources, introducing cost‑value assessment, integrating lake‑house and one‑service architectures, and continuing proactive, multi‑dimensional governance.
Bilibili Tech
Provides introductions and tutorials on Bilibili-related technologies.