Baidu Intelligent Cloud Tech Hub

We share the cloud tech topics you care about. Feel free to leave a message and tell us what you'd like to learn.

133 articles · 0 likes · 189 views · 0 comments
Recent Articles

Feb 6, 2026 · Artificial Intelligence

Accelerating GLM‑4.x Inference on Kunlun XPU with SGLang & vLLM

Baidu’s Baige team adapted the GLM‑4.x series of language models to the Kunlun XPU platform using SGLang and the vLLM‑Kunlun plugin. The work combined agile adaptation, precision alignment with torch_xray, and extensive performance tuning, with accuracy matching GPU results and improved inference speed.

AI, XPU, hardware acceleration
0 likes · 6 min read
Jan 27, 2026 · Artificial Intelligence

Deploying Qwen3 on Kunlun P800: Full‑Parameter DPO Training and Inference Guide

This guide walks through setting up a Kunlun P800 XPU host, preparing Docker containers, deploying Qwen3‑8B/‑32B/‑VL models with vLLM‑Kunlun, benchmarking performance, and running full‑parameter DPO training using LLaMA‑Factory, providing scripts, configuration files, and troubleshooting tips for AI engineers.

DPO, Kunlun P800, LLaMA-Factory
0 likes · 32 min read
Jan 20, 2026 · Artificial Intelligence

How LoongFlow Enables Expert‑Level AI Agents to Outperform Human Mathematicians

LoongFlow is an open‑source AI agent framework that combines a Plan‑Execute‑Summarize (PES) paradigm with a Hybrid Evolutionary Memory system, allowing agents to perform directed, iterative problem solving and achieve state‑of‑the‑art results on mathematical challenges, Kaggle‑style benchmarks, and real‑world tasks with dramatically higher efficiency.

Evolutionary Algorithms, LoongFlow, benchmarking
0 likes · 15 min read
Jan 16, 2026 · Artificial Intelligence

How LoongFlow Empowers Expert‑Level AI Agents to Surpass Human Mathematicians

LoongFlow is an open‑source AI agent framework that combines a Plan‑Execute‑Summarize (PES) paradigm with a Hybrid Evolutionary Memory system to enable agents to perform long‑range, complex reasoning, achieving record‑breaking results on mathematical challenges and real‑world ML benchmarks while dramatically improving efficiency.

LoongFlow, PES paradigm, evolutionary optimization
0 likes · 15 min read
Jan 12, 2026 · Artificial Intelligence

How to Reduce Large‑Model Inference Cold‑Start to Seconds with vLLM Optimizations

This article details how Baidu Cloud's hybrid‑cloud team used vLLM to cut the cold‑start time of very large models such as Qwen3‑235B‑A22B from minutes to a few seconds. The techniques covered are accelerated weight loading, deferred CUDA Graph capture, cross‑instance state reuse, fork‑based process startup, and pre‑warming with guard instances.

CUDA Graph, cold-start optimization, large-model inference
0 likes · 16 min read
Jan 5, 2026 · Artificial Intelligence

How Baidu Tianchi Supernodes Supercharge Large‑Model Inference: Architecture, Deployment, and Optimization

This article details Baidu's Tianchi supernode design and software tuning, covering hardware scale‑up, deployment planning, Prefill and Decode stage optimizations, quantization strategies, and communication schemes, which together raise large‑model inference throughput, reduce latency, and lower per‑token cost.

AI infrastructure, Parallelism, Performance Optimization
0 likes · 20 min read
Dec 24, 2025 · Artificial Intelligence

How Context Parallelism Slashes LLM First‑Token Latency by 80% for 128K Tokens

The article explains how the newly merged Context Parallelism (CP) technique in SGLang, combined with DeepSeek V3.2's Sparse Attention architecture, reduces first‑token latency by up to 80% and alleviates memory pressure for ultra‑long 128K‑token sequences, detailing both algorithmic innovations and engineering solutions.

AI infrastructure, Context Parallelism, LLM
0 likes · 10 min read
Dec 17, 2025 · Artificial Intelligence

How AFD Splits Attention and FFN to Boost DeepSeek‑V3 Inference by Up to 19%

The article details the Attention‑FFN Disaggregation (AFD) technique used by Baidu Baige to separate self‑attention and feed‑forward network stages in DeepSeek‑V3 models, describing multi‑stage scheduling, three‑batch overlap, communication optimizations, and performance results that achieve up to 19% throughput improvement under a 100 ms SLO.

3BO, AFD, Attention-FFN Disaggregation
0 likes · 17 min read
Dec 15, 2025 · Artificial Intelligence

Baidu Baige’s Breakthrough: Orchestrating Giant LLM Inference with Silent Instances

The article details Baidu Baige’s next‑generation distributed inference platform for trillion‑parameter LLMs. It explains how automated orchestration, the FedDeployment abstraction, the SplitService unified view, Adaptive HPA predictive scaling, Silent Instances that activate within seconds, and the Staggered Batched Scheduler remove scaling limits, reduce TTFT by 30‑40%, raise throughput by up to 20%, and make elastic AI compute more cost‑effective.

Autoscaling, Kubernetes, LLM
0 likes · 23 min read