Author

Baidu Intelligent Cloud Tech Hub

We share the cloud tech topics you care about. Feel free to leave a message and tell us what you'd like to learn.

Latest from Baidu Intelligent Cloud Tech Hub

Baidu Intelligent Cloud Tech Hub

May 29, 2026 · Industry Insights

How Baidu’s Hanhai U Series Cuts Costs by 3 Million Yuan for 10 MW High‑Density AI Data Centers

The article analyzes the power‑supply challenges of high‑density AI data centers, compares traditional UPS and 800 V DC architectures, and shows how Baidu’s Hanhai U series redesign delivers precise capacity matching, up to 2.5× higher power density, 55% space reduction and up to 15% cost savings.

AI compute · Baidu · Hanhai U series

0 likes · 11 min read

Baidu Intelligent Cloud Tech Hub

May 27, 2026 · Artificial Intelligence

Optimizing Large Model Inference Architecture for the Agent Era: Engineering Practices and Challenges

The article analyzes the architectural challenges of large‑model inference in the Agent era—such as memory‑intensive MLA structures, MoE communication overhead, exploding KV‑Cache size, and tool‑call accuracy—and presents a series of engineering solutions including hierarchical KV‑Cache pooling, sequence parallelism, offloading strategies, and chip‑level adaptations to achieve higher throughput and lower token costs.

AI Infra · Agent · DeepSeek

0 likes · 15 min read

Baidu Intelligent Cloud Tech Hub

May 26, 2026 · Operations

When CPUs Hide GPU Bottlenecks: How Btune 2.0 Automates Latency Analysis to Uncover Performance Issues

The article presents a real‑world migration case in which a CPU‑XPU bottleneck limited inference QPS, explains how Btune 2.0’s new latency‑focused diagnostics pinpointed kernel lock contention in the halolet component, and shows how the AI Agent’s automated cross‑process analysis restored performance and reduced cost.

AI infrastructure · CPU-GPU bottleneck · Cross-process analysis

0 likes · 11 min read

Baidu Intelligent Cloud Tech Hub

Apr 24, 2026 · Artificial Intelligence

LoongForge: Open‑Source Multimodal Training Framework Runs on GPU and Kunlun XPU with 45% Speedup

LoongForge is an open‑source, Megatron‑based multimodal training framework that unifies LLM, VLM, VLA and diffusion models, runs seamlessly on NVIDIA GPUs and Baidu Kunlun XPU, and delivers 15%‑45% end‑to‑end training acceleration while scaling linearly on thousands of cards.

GPU · Kunlun XPU · LoongForge

0 likes · 23 min read

Baidu Intelligent Cloud Tech Hub

Apr 8, 2026 · Artificial Intelligence

Unlocking 8‑Hour Autonomous Coding: GLM‑5.1’s Leap with Kunlun XPU

The open‑source GLM‑5.1 model, adapted to Baidu Baige's Kunlun XPU via the vLLM‑Kunlun Plugin, delivers record‑breaking SWE‑bench scores, eight‑hour autonomous coding, long‑context handling up to 64K tokens, and scalable deployment across tens of thousands of chips, showcasing end‑to‑end AI acceleration.

GLM-5.1 · Kunlun XPU · Model Deployment

0 likes · 8 min read

Baidu Intelligent Cloud Tech Hub

Apr 7, 2026 · Artificial Intelligence

How Baidu’s 7th‑Gen AI Confidential VM Achieves Full‑Stack Secure Compute

Baidu Intelligent Cloud’s seventh‑generation AI confidential virtual machine combines Intel TDX, NVIDIA GPUs, and BlueField DPUs to deliver end‑to‑end encrypted data paths, elastic multi‑GPU scaling, and near‑native performance, proving that high‑sensitivity AI workloads can run securely in the cloud without sacrificing speed.

AI · Confidential Computing · Virtualization

0 likes · 17 min read

Baidu Intelligent Cloud Tech Hub

Mar 23, 2026 · Artificial Intelligence

How vLLM‑Kunlun Unlocks Peak LLM Performance on Kunlun XPU

This article details the technical challenges of adapting the open‑source vLLM inference framework to Baidu's Kunlun XPU, outlines four major performance bottlenecks, and presents a multi‑dimensional optimization roadmap—including custom plugins, operator fusion, INT8 quantization, and CUDA‑Graph techniques—that together boost throughput by up to 8% and narrow the gap with leading GPU hardware.

CUDA Graph · INT8 Quantization · Kunlun XPU

0 likes · 13 min read

Baidu Intelligent Cloud Tech Hub

Mar 18, 2026 · Artificial Intelligence

How vLLM‑Kunlun Brings CUDA‑Like Inference to Kunlun XPU: Architecture, Adaptation, and Performance Wins

This article details the vLLM‑Kunlun open‑source project that adapts the high‑performance vLLM inference engine to Baidu's Kunlun XPU, covering platform overview, model‑porting workflow, plugin architecture, concrete case studies with MIMO‑Flash‑V2 and Qwen 3.5, and the performance‑tuning techniques that enable seamless, GPU‑level inference on domestic hardware.

AI · Hardware · Kunlun

0 likes · 12 min read

Baidu Intelligent Cloud Tech Hub

Mar 6, 2026 · Artificial Intelligence

How Baidu’s End‑to‑End Quantization Stack Supercharges Large‑Model Inference on Kunlun XPU

Baidu Baige built a full‑stack quantization pipeline that integrates model‑level, framework‑level, and hardware‑level optimizations on the Kunlun XPU platform, enabling FP16/BF16 large models to be compressed to 25‑50% of their original size while boosting inference speed by 30‑50% and dramatically reducing memory consumption for enterprise deployments.

AI inference · INT4 · INT8

0 likes · 16 min read

Baidu Intelligent Cloud Tech Hub

Feb 12, 2026 · Artificial Intelligence

Deploying GLM-5 on Baidu Kunlun P800 XPU with vLLM‑Kunlun Plugin

This article explains how the new GLM-5 large model is adapted to Baidu's Kunlun P800 XPU, details the asynchronous reinforcement learning framework Slime and optimization techniques such as INT8 quantization and tensor parallelism, and provides step‑by‑step deployment commands using the open‑source vLLM‑Kunlun plugin.

AI acceleration · GLM-5 · INT8 Quantization

0 likes · 6 min read
