How Ant Group’s Baoling Models Push Toward AGI with MoE and Multimodal Innovations
In a detailed AICon talk, Ant Group’s Baoling team leader Zhou Jun outlines the team’s latest large‑model training techniques, MoE architecture optimizations, multimodal breakthroughs, open‑source releases, and the strategic roadmap for making AI an everyday assistant as ubiquitous as scanning a QR code.
Baoling Large‑Model Vision
At AICon in Shanghai, Zhou Jun, head of Ant Group’s Baoling large‑model project, discussed the need to make AI as essential as QR‑code payments by deepening fundamental large models, creating easy‑to‑use services, and continuously expanding the intelligence frontier.
Training and AGI Insights
He emphasized that large‑model training is a competition of hyper‑parameters and architecture. Ant’s Baoling language and multimodal models use a MoE (Mixture‑of‑Experts) design, focusing on scaling insights, speed improvements, and guarding against precision loss. True AGI, he argued, requires seamless audio‑visual‑text integration and better evaluation standards to avoid mere imitation.
Infrastructure and Open‑Source Models
Ant has built a distributed intelligent engine that boosts hardware efficiency and inference performance. They released several open‑source models:
Ling‑lite‑1.5: a MoE language model with 2.75B activated parameters, achieving SOTA results on many benchmarks.
Ring‑lite‑1.5: a reasoning model matching Qwen3‑8B, featuring a mixed‑linear attention mechanism that cuts compute cost.
Ming‑lite‑omni: a full‑modal MoE model supporting audio, video, image, and text input and output, aiming for GPT‑4o‑level capabilities.
These models are built on three core capabilities: lightweight distributed training, heterogeneous‑hardware adaptive strategies, and fine‑grained MoE optimization, including expert‑routing and NormHead weight normalization.
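The talk names NormHead weight normalization but does not spell out the formulation. A minimal sketch, assuming the common variant in which each class embedding row of the output head is L2‑normalized before the logit projection (the exact recipe Baoling uses may differ):

```python
import numpy as np

def normhead_logits(hidden, head_weight, eps=1e-8):
    """NormHead-style output projection: L2-normalize each class
    embedding row of the head before computing logits, so rare-token
    embedding norms cannot drift and destabilize training."""
    norms = np.linalg.norm(head_weight, axis=1, keepdims=True)
    normalized = head_weight / (norms + eps)
    return hidden @ normalized.T

rng = np.random.default_rng(0)
h = rng.normal(size=(2, 16))      # batch of final hidden states
W = rng.normal(size=(1000, 16))   # (vocab, hidden) head weights
logits = normhead_logits(h, W)
print(logits.shape)               # (2, 1000)
```

A useful side effect of the normalization is scale invariance: multiplying the head weights by any positive constant leaves the logits unchanged.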
MoE Architecture and Scaling Laws
The team highlighted that MoE can dramatically increase model capacity while keeping compute modest. By analyzing scaling laws, they showed that an MoE model activating roughly 3B parameters per token can match a 10–12B dense model, and that careful scaling can predict benchmark performance.
Mixed Linear Attention
To handle long sequences efficiently, Ant introduced a hybrid attention mechanism that combines roughly 10% traditional softmax attention with linear attention, achieving up to a 2.2× speed‑up on 32K contexts without sacrificing accuracy.
Multimodal Advances
Ming‑lite‑omni employs dedicated routers for text, video, and audio, enabling precise cross‑modal resource allocation and supporting features like speech synthesis, dialect handling, and real‑time video understanding.
Future Directions
Beyond technical gains, the roadmap stresses creating AI that surpasses mere imitation, establishing robust evaluation standards, and fostering community collaboration through open‑source releases of models, reinforcement‑learning frameworks, and multi‑agent tools.
Conclusion
Ant’s Baoling effort illustrates a holistic approach—combining algorithmic innovation, system‑level engineering, and open collaboration—to accelerate the path toward AGI and make AI a ubiquitous, low‑cost utility for all users.