Bilibili's One-Stop Big Data Cluster Management Platform (BMR) - Architecture and Implementation
Bilibili's one-stop Big Data Cluster Management Platform (BMR) consolidates HDFS, Spark, Flink, ClickHouse, Kafka, and other services into a unified system. It evolved through four stages (standardization, metadata-driven construction, containerization, and observability) and addresses node consistency, scaling, fault self-healing, and resource optimization. It delivers elastic scaling and automated start/stop today, with further cost savings and stability improvements planned.
This article introduces Bilibili's one-stop big data cluster management platform (BMR), which was developed to address the rapid growth and increasing complexity of the company's big data services. The platform has evolved through four main stages: survival (standardization and rapid iteration), subsistence (metadata management and scenario-based construction), prosperity (containerization and capacity management), and common prosperity (observability and service quality).
The article details the challenges faced during platform development, including node consistency issues, standardization implementation, large-scale management, iteration efficiency, and peak shaving. It then presents the platform's technical architecture, which consists of cluster management, component management, change control, and resource management modules.
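As a rough illustration of how metadata-driven management can address node consistency, the sketch below compares a desired state held in metadata with the state each node reports, and derives a list of changes to hand to change control. This is a minimal sketch under stated assumptions, not BMR's actual implementation: `NodeState`, `plan_changes`, the field names, and the action names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeState:
    """Hypothetical record of what a node runs (not BMR's real schema)."""
    component: str   # e.g. "hdfs-datanode"
    version: str     # package version deployed on the node
    config_rev: int  # revision number of the rendered configuration


def plan_changes(desired: dict[str, NodeState],
                 actual: dict[str, NodeState]) -> list[tuple[str, str]]:
    """Diff desired metadata against reported state; return (node, action) pairs."""
    plan = []
    for node, want in sorted(desired.items()):
        have = actual.get(node)
        if have is None:
            # The metadata lists the node, but nothing is deployed on it yet.
            plan.append((node, "install"))
        elif (have.component, have.version) != (want.component, want.version):
            # Wrong component or package version: redeploy the node.
            plan.append((node, "redeploy"))
        elif have.config_rev != want.config_rev:
            # Right package, stale configuration: push the new config only.
            plan.append((node, "reconfigure"))
    # Nodes that report state but are absent from the metadata are retired.
    for node in sorted(set(actual) - set(desired)):
        plan.append((node, "decommission"))
    return plan


if __name__ == "__main__":
    want = NodeState("hdfs-datanode", "3.3.1", 2)
    desired = {"n1": want, "n2": want, "n3": want}
    actual = {
        "n1": want,
        "n2": NodeState("hdfs-datanode", "3.2.0", 2),
        "n4": want,
    }
    for node, action in plan_changes(desired, actual):
        print(node, action)
```

Keeping the plan as plain data, separate from its execution, is one way a change-control module could review, batch, or throttle changes before they reach a large cluster.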
The platform supports various big data components including HDFS, Spark, Flink, ClickHouse, Kafka, and others. It provides capabilities such as application iteration, configuration updates, smooth start/stop, elastic scaling, fault self-healing, containerization, service mixing, and tidal retreat. The article includes detailed tables showing the current status of support for different components and their capabilities.
Future plans include further cost reduction through resource optimization, efficiency improvement through expanded fault self-healing and prediction, enhanced stability through increased coverage and standardization, and improved service quality management.
Bilibili Tech
Provides introductions and tutorials on Bilibili-related technologies.