Tag

AI compute network


Bilibili Tech
Dec 31, 2024 · Cloud Computing

Design and Implementation of Bilibili AI Compute Network: Topology, Hardware Selection, Load Balancing, and Monitoring

Bilibili designed and deployed an AI compute network for large language model training. The team chose a Fat-Tree topology, selected high-speed switches, optical modules, and fibers, implemented fixed-path load balancing, and built a telemetry monitoring platform with sub-second granularity. Bilibili plans to scale the network to ten thousand GPUs.

AI compute network · Fat-Tree topology · hardware selection
17 min read