
How Tencent Engineers Shattered the 128‑GPU ImageNet Training Record in 2m31s

Tencent engineers set a new world record by training ImageNet on 128 V100 GPUs in just 2 minutes 31 seconds. They detail the suite of optimizations behind the result: a new distributed training framework called Light, single-machine speed improvements, multi-machine communication enhancements, and large-batch convergence techniques that together dramatically cut training time while maintaining high accuracy.

Tencent Tech

Recently, Tencent engineers broke the world record for training ImageNet with 128 GPUs in 2 minutes 31 seconds, 7 seconds faster than the previous record.

Using a 25 Gbps VPC network, 128 V100 GPUs, and the new Light framework for large-scale distributed multi-machine, multi-GPU training, they completed 28 epochs with 93% top-5 accuracy.

Motivation: AI models are becoming increasingly complex, with massive data volumes, deeper networks, billions of parameters, and long training times, leading to high costs.

Tencent aimed to push the limits of AI model training frameworks.

They developed the Light framework, optimizing single‑machine training speed, multi‑machine communication, and batch convergence.

Single‑machine speed improvements

They addressed slow remote storage access, CPU contention from oversubscribed threads, and JPEG decoding bottlenecks by caching data on local SSD or in memory, auto-tuning thread counts, and pre-decoding images so each sample is decoded only once.
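The fetch-once, decode-once idea can be sketched as a small local cache. This is a minimal illustration, not Light's actual implementation (which is not public); the cache directory, the `fetch_remote` callback, and the placeholder decoder are all hypothetical:

```python
import os
import pickle
import tempfile

# Stands in for a local SSD cache path (hypothetical).
CACHE_DIR = tempfile.mkdtemp()

def decode_jpeg(raw_bytes):
    """Placeholder for a real JPEG decoder: pretend this is the
    expensive pixel-decoding step we want to pay only once."""
    return list(raw_bytes)

def load_sample(key, fetch_remote):
    """Fetch once, decode once: cache the decoded sample on local disk
    so later epochs skip both remote I/O and JPEG decoding."""
    path = os.path.join(CACHE_DIR, key + ".pkl")
    if os.path.exists(path):                    # cache hit: cheap local read
        with open(path, "rb") as f:
            return pickle.load(f), "cache"
    decoded = decode_jpeg(fetch_remote(key))    # cache miss: remote fetch + decode
    with open(path, "wb") as f:
        pickle.dump(decoded, f)
    return decoded, "remote"
```

The first epoch pays the remote-fetch and decode cost; every subsequent epoch reads pre-decoded data locally.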

Multi‑machine communication optimization

Implemented adaptive gradient fusion, 2D communication with multiple streams, and gradient compression to reduce communication volume and improve bandwidth utilization, raising throughput to 3100 samples/second.
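Gradient fusion packs many small per-layer gradients into a few large buffers so each all-reduce moves one big message instead of many small ones, which makes far better use of network bandwidth. A greedy packing pass might look like the sketch below; the 4 MB bucket threshold is illustrative (an adaptive scheme would tune it at runtime), and the `(name, num_elements)` gradient representation is an assumption for this example:

```python
def fuse_gradients(grads, bucket_bytes=4 * 1024 * 1024, elem_size=4):
    """Greedily pack per-layer gradients into fusion buckets.

    grads: list of (name, num_elements) pairs, in backward order.
    Returns a list of buckets, each a list of gradients whose combined
    size stays at or under bucket_bytes (one all-reduce per bucket).
    """
    buckets, current, current_bytes = [], [], 0
    for name, num_elements in grads:
        size = num_elements * elem_size
        if current and current_bytes + size > bucket_bytes:
            buckets.append(current)          # close the full bucket
            current, current_bytes = [], 0
        current.append((name, num_elements))
        current_bytes += size
    if current:
        buckets.append(current)
    return buckets
```

With fp32 gradients (4 bytes per element), a 500k-element layer and a 600k-element layer land in separate 4 MB buckets, while tiny layers ride along with their neighbors instead of triggering their own all-reduce.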

Batch convergence techniques

Used large‑batch training with multi‑stage resolution, gradient‑compression precision compensation, and AutoML (TianFeng) for hyper‑parameter search, achieving 93% top‑5 accuracy after only 28 epochs.

All these optimizations enabled the new world record, now integrated into Tencent Cloud’s Intelligent‑Ti AI platform, offering a one‑stop service for data preprocessing, model building, training, evaluation, and deployment.

Future work will continue to improve usability, training and inference performance for the broader AI community.

Tags: machine learning, GPU, distributed training, Tencent Cloud, large-scale AI, ImageNet
Written by

Tencent Tech

Tencent's official tech account. Delivering quality technical content to serve developers.
