Tagged articles
6 articles
360 Zhihui Cloud Developer
May 15, 2026 · Artificial Intelligence

How PD (Prefill‑Decode) Disaggregation Makes LLM Inference Faster and More Stable

The article explains PD (Prefill‑Decode) disaggregation, an architecture that runs the compute‑bound Prefill stage and the memory‑bound Decode stage on separate GPU pools. This eliminates interference between the two stages, enables independent scaling, allows hardware specialization, and delivers up to 85% lower tail latency for large language model inference.

GPU scaling · KV cache transport · LLM inference
10 min read
Alibaba Cloud Big Data AI Platform
Mar 25, 2026 · Artificial Intelligence

Scaling Multimodal Reinforcement Learning with NVIDIA Isaac Lab and TiledCamera

This article explains how to use NVIDIA Isaac Lab and the TiledCamera component to run large‑scale, multimodal reinforcement learning on GPU clusters, covering environment setup, noVNC visualization, command‑line execution, distributed training with torchrun, and performance analysis across multiple GPU configurations.

GPU scaling · NVIDIA Isaac Lab · TiledCamera
12 min read
Baobao Algorithm Notes
Sep 18, 2024 · Artificial Intelligence

Why Training on 1,000 GPUs Is Harder Than You Think—and How to Tame It

Training deep learning models on a thousand GPUs brings steep communication overhead, a higher probability of failure, and scaling inefficiencies. By profiling each step, overlapping compute and communication, using gradient bucketing and accumulation, and employing elastic training techniques, practitioners can approach linear scaling while avoiding common pitfalls.

GPU scaling · Large Models · Performance Optimization
13 min read
Kuaishou Tech
Jul 16, 2021 · Artificial Intelligence

Bagua: An Open‑Source Distributed Training Framework for Deep Learning

Bagua is a distributed training framework co‑developed by Kuaishou and ETH Zürich. It combines algorithmic and system‑level optimizations (such as decentralized, asynchronous, and compressed communication) to achieve up to 60% higher performance than existing frameworks like PyTorch‑DDP, Horovod, and BytePS across various AI workloads.

Bagua · GPU scaling · PyTorch
15 min read