Bilibili Tech
Dec 10, 2024 · Big Data
Fault Self‑Healing System for Bilibili's Large‑Scale Big Data Cluster (BMR)
Bilibili's fault self-healing platform for BMR, its big-data cluster of more than 10,000 machines and 1 EB of storage, adds near-real-time fault discovery, intelligent diagnosis, and automated workflow handling. The platform dramatically cuts fault resolution time, improves stability across services, and handles dozens of automated repairs per day.
Automation · BMR · big-data
16 min read