
2024 Alibaba Cloud Infrastructure Network Team: AI‑Scale Network Innovations, Academic Achievements, Open‑Source Contributions and Industry Outreach

The 2024 report of Alibaba Cloud's Infrastructure Network team details AI‑driven network breakthroughs, high‑performance protocol stacks, large‑scale monitoring systems, numerous top‑conference paper acceptances, open‑source ecosystem initiatives, and extensive industry outreach, highlighting the evolving AI infra landscape.

Alibaba Cloud Infrastructure

In the AI era, large models are reshaping the digital world, driving unprecedented growth in model size, data mining, and compute power, while prompting a shift from pure scaling toward optimized inference and mixture‑of‑experts (MoE) models.

Technical Breakthroughs

• HPN7.0 EPOD architecture launched, delivering a 100% inference communication performance boost; the work behind it was the first AI‑centric network paper accepted by SIGCOMM.
• Self‑developed high‑performance network protocol stack Solar achieved an 18% end‑to‑end performance gain in congested scenarios, and the team introduced the first UEC‑ready 400 G AI‑compute NIC.
• Integrated training‑inference monitoring and fault‑diagnosis system reduced fault localisation time from hours/days to minutes; the work was accepted as an NSDI 2025 paper.
• ACCL + C4D communication library productized, improving full‑stack communication efficiency by over 30% and winning the HPCA 2025 best‑paper award.
• eCore next‑generation DCI architecture began large‑scale deployment, providing a unified IPv6/SRv6 single‑stack solution for AI‑scale interconnect.

Academic Achievements

• Approximately 20 papers accepted at top venues (SIGCOMM, NSDI, OSDI, HPCA, OFC) covering AI‑compute cluster networking, high‑performance protocol stacks, communication libraries, serverless, fault diagnosis, storage, and optical networking.
• Five SIGCOMM 2024 papers, including the first AI‑compute cluster architecture paper and the Crux scheduling paper (honorable mention).
• Five NSDI 2025 papers on AI training fault diagnosis, large‑scale LLM simulation (SimAI), RDMA container scalability, GPU‑disaggregated serving, and AI‑optimized CDN congestion control.
• One OSDI 2024 paper on burstable cloud block storage with DPUs.
• One HPCA 2025 paper on the C4 solution for real‑time anomaly detection and communication optimization.
• Four OFC 2024 papers on open/disaggregated optical networks, fiber loss statistics, DCI unavailability analysis, and QoT estimation using neural networks.

Open‑Source Ecosystem

• Founded the High‑Throughput Ethernet (ETH+) Consortium and released the ETH+ SPEC 1.0 network standard for AI‑scale workloads.
• Joined the Ultra‑Ethernet Consortium (UEC) technical committee, influencing next‑generation AI network standards.
• Initiated the SONiC "Phoenix Wing" plan, contributing SRv6 optimizations to the community.
• Open‑sourced the full‑stack high‑precision AI cluster simulator SimAI (GitHub ★ 430) and the Artificial Intelligence Communication Benchmark (AICB) suite.
• Released the Universal Network Platform (UNP) hardware specification, accelerating white‑box switch development and ecosystem diversification.

Industry Outreach

• Presented at major conferences (SIGCOMM/APNET, OFC, OCP Global Summit, CCF HPC, CCF ChinaNet, CCF Distributed Computing, World Internet Conference) and received awards such as the SIGCOMM 2024 Honorable Mention and the CCF HPC China Innovation Award.
• Recognized with the World Internet Conference Leading Technology Award for AI Infra, marking the first AI‑infra award in the event’s history.
• Engaged in round‑table discussions on AI and next‑generation Internet at the Wuzhen Summit.
