How Tencent Engineers Shattered the 128‑GPU ImageNet Training Record in 2m31s
Tencent engineers set a new world record for ImageNet training on 128 V100 GPUs: 2 minutes 31 seconds. The optimizations behind it include the new Light distributed training framework, single-machine speed-ups, multi-machine communication enhancements, and large-batch convergence techniques, which together cut training time sharply while maintaining high accuracy.
Recently, Tencent engineers broke the world record for training ImageNet with 128 GPUs in 2 minutes 31 seconds, 7 seconds faster than the previous record.
Using a 25 Gbps VPC network, 128 V100 GPUs, and Light, their new large-scale distributed multi-machine, multi-GPU training framework, they completed 28 epochs and reached 93% top-5 accuracy.
Motivation: AI models are becoming increasingly complex, with massive data volumes, deeper networks, billions of parameters, and long training times, leading to high costs.
Tencent aimed to push the limits of AI model training frameworks.
They developed the Light framework, optimizing single‑machine training speed, multi‑machine communication, and batch convergence.
Single‑machine speed improvements
The team addressed three single-machine bottlenecks: slow remote storage access, by caching data on SSD and in memory; CPU contention from too many threads, by auto-tuning thread counts; and JPEG decoding overhead, by pre-decoding images.
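The caching and pre-decoding idea can be illustrated with a minimal sketch. It is not Light's implementation: the class name `DecodedCache`, its `capacity` parameter, and the least-recently-used eviction policy are all assumptions made for illustration. The point is only that a sample decoded once can be served from memory on later epochs instead of being fetched and decoded again.

```python
from collections import OrderedDict
from typing import Any, Callable, Hashable


class DecodedCache:
    """Hypothetical in-memory cache of already-decoded samples (LRU eviction)."""

    def __init__(self, decode: Callable[[bytes], Any], capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._decode = decode          # expensive step, e.g. JPEG decoding
        self._capacity = capacity      # maximum number of decoded samples kept
        self._items: "OrderedDict[Hashable, Any]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: Hashable, raw: bytes) -> Any:
        """Return the decoded sample for `key`, decoding `raw` only on a miss."""
        if key in self._items:
            self._items.move_to_end(key)   # mark as most recently used
            self.hits += 1
            return self._items[key]
        self.misses += 1
        value = self._decode(raw)
        self._items[key] = value
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)  # evict the least recently used
        return value


if __name__ == "__main__":
    # Stand-in decoder: a real pipeline would decode JPEG bytes here.
    cache = DecodedCache(decode=lambda raw: list(raw), capacity=2)
    cache.get("img0", b"\x01\x02")   # first epoch: miss, decode
    cache.get("img0", b"\x01\x02")   # later epoch: hit, no decode
    print(cache.hits, cache.misses)
```

In a real pipeline the cache would hold far more than fits in this toy example, and spilling to SSD would be a separate tier; both are left out here.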
Multi‑machine communication optimization
The team implemented adaptive gradient fusion, 2D communication with multiple streams, and gradient compression to reduce communication volume and improve bandwidth utilization, raising throughput to 3100 samples/second.
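Gradient fusion, in general, means packing many small gradient tensors into a few larger buffers so that each collective call moves enough data to use the network efficiently. The sketch below shows only the bucketing step, under stated assumptions: the function name `fuse_gradients`, the layer names, and the fixed byte threshold are hypothetical, and the article does not describe how Light adapts the threshold. An adaptive variant would presumably tune `threshold_bytes` from measured bandwidth and latency.

```python
from typing import List, Sequence, Tuple


def fuse_gradients(
    grads: Sequence[Tuple[str, int]], threshold_bytes: int
) -> List[List[str]]:
    """Greedily group (name, size_in_bytes) gradients into fusion buckets.

    A bucket is closed as soon as adding the next gradient would push it
    past `threshold_bytes`. A single gradient larger than the threshold
    still gets a bucket of its own.
    """
    if threshold_bytes <= 0:
        raise ValueError("threshold_bytes must be positive")
    buckets: List[List[str]] = []
    current: List[str] = []
    current_bytes = 0
    for name, size in grads:
        if current and current_bytes + size > threshold_bytes:
            buckets.append(current)
            current, current_bytes = [], 0
        current.append(name)
        current_bytes += size
    if current:
        buckets.append(current)
    return buckets


if __name__ == "__main__":
    # Hypothetical gradient sizes, listed in the order backprop produces them.
    grads = [
        ("fc.bias", 4_000),
        ("fc.weight", 8_000_000),
        ("conv5.weight", 9_000_000),
        ("bn5.gamma", 2_000),
        ("bn5.beta", 2_000),
    ]
    for bucket in fuse_gradients(grads, threshold_bytes=16_000_000):
        print(bucket)  # each bucket would be sent with one all-reduce call
```

A small threshold lets communication start earlier and overlap with backpropagation, while a large one amortizes per-call latency; that trade-off is why a tunable threshold is useful.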
Batch convergence techniques
The team used large-batch training with multi-stage resolution, precision compensation for gradient compression, and AutoML (TianFeng) for hyper-parameter search, achieving 93% top-5 accuracy after only 28 epochs.
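Multi-stage resolution can be pictured as a schedule that trains early epochs on small images and later epochs on larger ones. The sketch below is illustrative only: the 28-epoch total comes from the article, but the stage boundaries, resolutions, and per-GPU batch sizes in `HYPOTHETICAL_STAGES` are invented placeholders, not the values Tencent used.

```python
from typing import List, NamedTuple


class Stage(NamedTuple):
    first_epoch: int     # first epoch (0-based) at which this stage applies
    resolution: int      # square input size in pixels
    batch_per_gpu: int   # per-GPU batch size used at this resolution


# Placeholder values for illustration; the article does not publish these.
HYPOTHETICAL_STAGES: List[Stage] = [
    Stage(first_epoch=0, resolution=128, batch_per_gpu=256),
    Stage(first_epoch=14, resolution=224, batch_per_gpu=128),
    Stage(first_epoch=25, resolution=288, batch_per_gpu=64),
]


def stage_for_epoch(epoch: int, stages: List[Stage]) -> Stage:
    """Return the latest stage whose first_epoch is <= epoch."""
    if epoch < 0:
        raise ValueError("epoch must be non-negative")
    chosen = None
    for stage in sorted(stages, key=lambda s: s.first_epoch):
        if stage.first_epoch <= epoch:
            chosen = stage
    if chosen is None:
        raise ValueError("no stage covers this epoch")
    return chosen


if __name__ == "__main__":
    total_epochs = 28  # from the article
    for epoch in range(total_epochs):
        stage = stage_for_epoch(epoch, HYPOTHETICAL_STAGES)
        print(epoch, stage.resolution, stage.batch_per_gpu)
```

Lower resolutions process more images per second, so most epochs run cheaply; the final high-resolution epochs recover accuracy. The batch size typically shrinks as resolution grows because activations take more GPU memory.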
Together, these optimizations enabled the new world record. They are now integrated into Tencent Cloud’s Intelligent-Ti AI platform, which offers a one-stop service for data preprocessing, model building, training, evaluation, and deployment.
Future work will continue to improve usability as well as training and inference performance for the broader AI community.
Tencent Tech
Tencent's official tech account. Delivering quality technical content to serve developers.