Baidu Intelligent Cloud Tech Hub
Author

We share the cloud tech topics you care about. Feel free to leave a message and tell us what you'd like to learn.

133 articles · 0 likes · 189 views · 0 comments
Recent Articles

Latest from Baidu Intelligent Cloud Tech Hub

100 recent articles max
Baidu Intelligent Cloud Tech Hub
Sep 22, 2025 · Cloud Computing

How Mantle Breaks the Hierarchical Namespace Bottleneck in Cloud Object Storage

Mantle, presented in a SOSP'25 paper by Baidu's storage team and collaborators, is a distributed hierarchical namespace for cloud object storage. It overcomes the scalability and performance limits of traditional designs, supporting massive data lake workloads with lower latency and higher throughput.

Distributed Systems · Metadata Management · SOSP
0 likes · 8 min read
Baidu Intelligent Cloud Tech Hub
Sep 9, 2025 · Artificial Intelligence

How Baidu Built a 32,000‑Card AI Super‑Compute Cluster and Boosted Efficiency by 50%

This article details Baidu Intelligent Cloud's journey in designing, constructing, and operating a 32,000‑card hybrid AI compute cluster, covering challenges in power, cooling, networking, multi‑cluster scheduling, and security, and explains how innovative hardware, software, and operational strategies achieved over 50% MFU improvement and industry‑first performance records.

AI infrastructure · GPU clusters · hybrid cloud
0 likes · 15 min read
Baidu Intelligent Cloud Tech Hub
Sep 4, 2025 · Artificial Intelligence

Unlocking MoE Model Power: Baidu’s Baige 5.0 AI Platform’s FP8 and Distributed Innovations

Baidu’s Baige 5.0 AI Computing Platform introduces FP8 mixed‑precision training, MoE‑aware distributed strategies, adaptive parallelism, and a three‑tier KV‑Cache, delivering over 30% training speedup and 50% inference throughput gains while keeping token latency under half a second for large‑scale models.

AI · FP8 · MoE
0 likes · 16 min read
Baidu Intelligent Cloud Tech Hub
Jul 25, 2025 · Operations

How Baidu’s Lingxi Agent Uses LLMs to Automate Network Fault Diagnosis

This article details Baidu's evolution from manual network fault analysis to a multi‑agent AI platform, describing how the Lingxi intelligent agent leverages large language models, MCP tools, and design patterns to automate latency queries, generate analysis reports, and integrate with existing monitoring services.

AI Agents · MCP protocol · network operations
0 likes · 19 min read
Baidu Intelligent Cloud Tech Hub
May 23, 2025 · Artificial Intelligence

How Baidu’s Kunlun Supernode Redefines AI Compute Density and Performance

This article explains how Baidu’s Kunlun supernode, built on high‑density liquid‑cooled cabinets and a modular 1U 4‑card design, breaks traditional 8‑card limits, boosts compute density four‑fold, improves power and cooling efficiency, and provides a scalable foundation for large‑model AI training and inference.

AI infrastructure · GPU cluster · High Performance Computing
0 likes · 13 min read
Baidu Intelligent Cloud Tech Hub
May 16, 2025 · Artificial Intelligence

How Baidu Cloud Achieved 4µs End-to-End Latency for Large-Scale PD Inference

Baidu Intelligent Cloud built a low‑latency HPN cluster with 4µs end‑to‑end latency, optimized traffic management and communication operators, and introduced dynamic expert balancing. Together these changes improve the performance of large‑scale PD‑separated (prefill/decode disaggregated) inference services and show how closely network infrastructure can be integrated with AI workloads.

AI inference · All-to-All · HPN
0 likes · 14 min read
Baidu Intelligent Cloud Tech Hub
Apr 25, 2025 · Operations

How RapidFS Accelerates AI Model Training with 10 TiB/s Storage Performance

The article explains how RapidFS, a near‑compute storage acceleration solution built on BOS object storage, delivers up to 10 TiB/s throughput for massive AI model training, detailing its architecture, deployment on a 30,000‑card Kunlun cluster, and performance test results that show linear scaling from 20 to 70 nodes.

AI training · High Performance Computing · RapidFS
0 likes · 6 min read
Baidu Intelligent Cloud Tech Hub
Apr 18, 2025 · Operations

How Baidu’s AI‑Powered Digital Immune System Reinvents SRE Risk Management

This article explains why modern SRE teams need a digital immune system, describes Baidu’s data‑driven approach to improve system resilience, outlines the three‑phase evolution from digital transformation to AI‑enhanced risk mining, and shares concrete results and future directions for sustainable operations.

AI · Cloud Native · Digital Immune System
0 likes · 15 min read