Tag

parallel training


Alibaba Cloud Infrastructure
Apr 24, 2024 · Artificial Intelligence

Evolution and Challenges of AI Infrastructure: Scaling Large Models on Cloud GPUs

In this talk from the 2024 China Generative AI Conference, Li Peng outlines the escalating computational demands of large‑model training and inference, identifies power, memory and communication walls, and presents Alibaba Cloud’s DeepGPU solutions and best‑practice strategies for scaling AI workloads on cloud GPUs.

AI Infrastructure · DeepGPU · GPU performance
13 min read