Alibaba Cloud Infrastructure
Mar 22, 2023 · Artificial Intelligence
CUTLASS Extreme Performance Optimization and Its Application in Alibaba's Recommendation System
This GTC talk presents Alibaba Cloud's heterogeneous computing platform, introduces the Open Deep Learning API (ODLA), and details how CUTLASS-based operator fusion accelerates the attention and MLP layers of large-scale recommendation models, yielding multi-fold performance gains in production.
CUTLASS · GPU computing · Performance Optimization