Kuaishou Large Model
Jul 11, 2024 · Artificial Intelligence
Pipeline-Aware Offloading & Balanced Checkpointing Accelerate LLM Training
Researchers from Kwai’s large-model team present a training system that combines pipeline-parallelism-aware activation offloading with a compute-memory balanced checkpointing strategy. The system delivers lossless acceleration of large language model training, reaching up to 42.7% MFU on 256 NVIDIA H800 GPUs while reducing activation memory usage.
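To give a flavor of what a compute-memory balanced checkpointing decision looks like, here is a minimal, hypothetical sketch (not the paper's actual algorithm): given each layer's activation memory and recompute cost, discard-and-recompute the layers that free the most memory per unit of recompute cost, until live activation memory fits the budget. The layer names, costs, and greedy heuristic are illustrative assumptions.

```python
def plan_checkpoints(layers, mem_budget):
    """Hypothetical greedy planner, NOT Kwai's published algorithm.

    layers: list of (name, act_mem_gb, recompute_cost) tuples.
    Returns the set of layer names to checkpoint (discard and
    recompute in the backward pass) and the resulting live memory.
    """
    live = sum(mem for _, mem, _ in layers)
    # Checkpoint the cheapest layers first: lowest recompute cost
    # per gigabyte of activation memory freed.
    order = sorted(layers, key=lambda l: l[2] / l[1])
    chosen = set()
    for name, mem, cost in order:
        if live <= mem_budget:
            break
        chosen.add(name)
        live -= mem
    return chosen, live

# Toy example: three layers with illustrative memory/cost numbers.
layers = [("attn", 4.0, 3.0), ("mlp", 6.0, 2.0), ("norm", 1.0, 0.1)]
plan, live = plan_checkpoints(layers, mem_budget=5.0)
# → checkpoints "norm" and "mlp", leaving 4.0 GB of live activations
```

A real system would also overlap the recomputation (or CPU offload traffic) with pipeline bubbles, which is where the pipeline-parallelism awareness comes in.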
GPU training · Kwai · activation offloading