AI Compiler Forum at DataFun Summit 2023: Tile-Based Deep Learning Compilation, Graph Scheduling for Domain‑Specific Accelerators, and Triton on Hopper
The DataFun Summit 2023 AI Compiler Forum gathered leading researchers to present cutting‑edge techniques on tile‑based deep learning compilation, efficient graph scheduling for domain‑specific accelerators, large‑model deployment, and the latest advancements of OpenAI Triton on NVIDIA Hopper, offering practical insights for AI system developers.
As deep learning and AI applications rapidly expand, AI compilers have become essential for boosting performance and resource utilization across diverse hardware platforms.
On September 15, 2023, the DataFun Summit 2023 Large Model and AI Foundations Software Forum hosted an AI Compiler session featuring experts from academia and industry.
Speaker: Xue Jilong, Senior Researcher, Microsoft Research Asia Talk: "Efficient Deep Learning Compilation System Based on Tile Abstraction" Outline: Introduces a unified tile abstraction and presents a series of research projects (Rammer, Roller, Welder, Cocktailer) that improve hardware parallelism, compilation speed, global memory efficiency, and control‑flow execution for deep learning workloads.
Audience gains: a tile‑based compilation system, methods to enhance hardware parallelism, improve compilation efficiency, optimize global memory access, and unify data‑flow and control‑flow scheduling.
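To make the tile idea concrete, here is a minimal illustrative sketch (our own example, not code from the talk): a tiled matrix multiply in which computation is expressed over fixed-size tiles. A tile-based compiler in the spirit of Roller or Welder can treat each tile-level task as a schedulable unit, mapping it onto hardware parallel units and memory-hierarchy levels. The tile size and function names here are hypothetical.

```python
# Illustrative sketch of a tile abstraction: a matrix multiply decomposed into
# fixed-size tile tasks that a compiler could schedule onto hardware units.

TILE = 2  # hypothetical tile edge length

def tiled_matmul(a, b, n, tile=TILE):
    """Multiply two n x n matrices (lists of lists) tile by tile."""
    c = [[0.0] * n for _ in range(n)]
    for i0 in range(0, n, tile):          # tile row of C
        for j0 in range(0, n, tile):      # tile column of C
            for k0 in range(0, n, tile):  # reduction tile
                # Each (i0, j0, k0) triple is one tile-level task; scheduling
                # these tasks is where hardware parallelism is exploited.
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, n)):
                        for k in range(k0, min(k0 + tile, n)):
                            c[i][j] += a[i][k] * b[k][j]
    return c
```

The point of the abstraction is that the inner per-tile loop body and the outer tile-scheduling loops can be optimized independently, which is what lets a single representation serve parallelism, compilation speed, and memory-placement decisions.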
Speaker: Ma Lingxiao, Senior Researcher, Microsoft Research Asia Talk: (title not explicitly listed; the talk focuses on deep learning compilation frameworks) – discusses recent advances in deep learning compiler research published at OSDI, SOSP, and USENIX ATC.
Audience gains: insights into state‑of‑the‑art deep learning compilation techniques and research directions.
Speaker: Dan Xiaoqiang, Independent Scholar Talk: "Effectively Scheduling Computational Graphs of Deep Neural Networks toward Their Domain‑Specific Accelerators" Outline: Describes graph‑scheduling techniques that keep intermediate data on‑chip, reduce off‑chip bandwidth, and achieve up to 11× performance improvement over traditional kernel‑by‑kernel approaches on AI DSAs.
Audience gains: understanding of AI compiler challenges, graph‑scheduling methods, and how to rethink DSA architecture design.
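The bandwidth argument behind graph scheduling can be illustrated with a toy cost model (our own sketch, with hypothetical names and numbers, not the speaker's system): for a linear chain of operators, kernel-by-kernel execution round-trips every intermediate tensor through off-chip memory, while a fused schedule keeps intermediates in on-chip buffers so only the chain's input and final output cross the off-chip boundary.

```python
# Toy cost model: off-chip bytes moved for a linear chain of ops under
# kernel-by-kernel execution vs. a fused, on-chip schedule.

def off_chip_traffic(num_ops, tensor_bytes, fused):
    """Bytes moved to/from off-chip memory, assuming every tensor in the
    chain has the same size.

    Kernel-by-kernel: each op reads its input from and writes its output
    back to off-chip memory. Fused: only the chain's input and final
    output leave the chip; intermediates stay in on-chip buffers.
    """
    if fused:
        return 2 * tensor_bytes              # one read in, one write out
    return 2 * tensor_bytes * num_ops        # every op round-trips off-chip

# In this toy model an 8-op chain moves 8x less off-chip data when fused.
unfused = off_chip_traffic(num_ops=8, tensor_bytes=1 << 20, fused=False)
fused = off_chip_traffic(num_ops=8, tensor_bytes=1 << 20, fused=True)
```

Real DSA schedulers must additionally respect finite on-chip buffer capacity and non-linear graph shapes, which is where the hard scheduling problems (and the reported up-to-11× gains) come from.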
Speaker: Feng Siyuan, Ph.D. Candidate, Shanghai Jiao Tong University Talk: "Deploying Large Models with Machine‑Learning‑Based Compilation Techniques" Outline: Covers ML‑based compilation (MLC), an overview of Apache TVM Unity, deployment of large models using MLC‑LLM, and a summary of practical deployment considerations.
Audience gains: knowledge of challenges in large‑model deployment, role of MLC, technical details of MLC‑LLM, and practical deployment strategies.
Speaker: Wang Biao, NVIDIA Architect Talk: "Triton on Hopper" Outline: Explains how OpenAI Triton balances kernel development efficiency, flexibility, and performance; details the API changes and implementation work behind NVIDIA Hopper support; reports that Triton reaches over 70% of cuBLAS half‑precision matrix‑multiplication performance on H100, a 1.4× speedup over the same workload on A100.
Audience gains: understanding of Triton's Hopper API changes, technical path for Hopper support, and current performance plus future plans.
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.