Tag

BMR


Bilibili Tech
Dec 10, 2024 · Big Data

Fault Self‑Healing System for Bilibili's Large‑Scale Big Data Cluster (BMR)

Bilibili's fault self-healing platform for its BMR big-data cluster (over 10,000 machines and 1 EB of storage) adds near-real-time fault discovery, intelligent diagnosis, and automated workflow handling. It cuts fault resolution time, improves stability across services, and scales to dozens of automated repairs per day.

Automation · BMR · big-data
16 min read
Bilibili Tech
Oct 29, 2024 · Big Data

Bilibili One‑Stop Big Data Cluster Management Platform (BMR): Architecture, Modules, and Future Outlook

Bilibili's One-Stop Big Data Cluster Management Platform (BMR) unifies cluster, metadata, intelligent operations, and custom managers to oversee 50+ services, 10,000 machines, exabyte-scale storage, and millions of cores. It uses cloud-native containers, fault prediction, and resource-sharing techniques to improve efficiency and stability and to reduce cost.

BMR · DevOps · Intelligent Operations
17 min read