Tag

Pipeline Parallel


Alibaba Cloud Infrastructure
Apr 16, 2025 · Artificial Intelligence

Optimizing Multi‑Node Distributed LLM Inference with ACK Gateway and vLLM

This article is a step-by-step guide to deploying and optimizing large-language-model inference across multiple GPU-enabled nodes. It covers ACK Gateway with the Inference Extension, vLLM's tensor- and pipeline-parallel techniques, and Kubernetes resources such as LeaderWorkerSet, PVCs, and custom routing policies, and closes with performance benchmarking and analysis.

ACK Gateway · Distributed Inference · LLM
19 min read
Rare Earth Juejin Tech Community
May 10, 2024 · Artificial Intelligence

GPU Memory Analysis and Distributed Training Strategies

This article explains how GPU memory is allocated during model fine-tuning, describes the collective communication primitives used in distributed training, and compares data parallelism, model parallelism, ZeRO, pipeline parallelism, mixed precision, and activation checkpointing as techniques for reducing memory consumption in large-scale AI training.

GPU Memory · Mixed Precision · Pipeline Parallel
9 min read