
How Ant Group’s Baoling Models Push Toward AGI with MoE and Multimodal Innovations

In a detailed AICon talk, Zhou Jun, who leads Ant Group's Baoling large-model work, outlines the team's latest training techniques, MoE architecture optimizations, multimodal breakthroughs, open-source releases, and the strategic roadmap for turning AI into an everyday assistant as ubiquitous as scanning a QR code.


Baoling Large‑Model Vision

At AICon in Shanghai, Zhou Jun, head of Ant Group’s Baoling large‑model project, discussed the need to make AI as essential as QR‑code payments by deepening fundamental large models, creating easy‑to‑use services, and continuously expanding the intelligence frontier.

Training and AGI Insights

He emphasized that large-model training is a competition over hyper-parameters and architecture. Ant's Baoling language and multimodal models use an MoE (Mixture-of-Experts) design, with work focused on scaling insights, training-speed improvements, and guarding against precision loss. True AGI, he argued, requires seamless audio-visual-text integration and better evaluation standards to move beyond mere imitation.

Infrastructure and Open‑Source Models

Ant has built a distributed intelligent engine that boosts hardware efficiency and inference performance. They released several open‑source models:

Ling-lite-1.5: a lightweight MoE language model activating 2.75B parameters, achieving SOTA performance on many benchmarks.

Ring-lite-1.5: a reasoning model that matches Qwen3-8B while using a hybrid linear-attention mechanism to cut compute cost.

Ming-lite-omni: a full-modal MoE model supporting audio, video, image, and text input and output, aiming for GPT-4o-level capabilities.

These models are built on three core capabilities: lightweight distributed training, heterogeneous‑hardware adaptive strategies, and fine‑grained MoE optimization, including expert‑routing and NormHead weight normalization.
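The talk does not spell out how the expert routing or NormHead normalization works internally, so the following is only a minimal PyTorch-style sketch of the two ideas as they are commonly implemented: a top-k softmax gate that routes each token to a few experts, and an output head whose weight rows are L2-normalized before the logit projection (one common reading of "NormHead"). The class names and dimensions are illustrative, not Ant's actual code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKRouter(nn.Module):
    """Hypothetical top-k softmax gate for MoE expert routing."""
    def __init__(self, d_model: int, num_experts: int, k: int = 2):
        super().__init__()
        self.gate = nn.Linear(d_model, num_experts, bias=False)
        self.k = k

    def forward(self, x):                      # x: [tokens, d_model]
        logits = self.gate(x)                  # [tokens, num_experts]
        weights, experts = torch.topk(logits, self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # renormalize over the chosen experts
        return weights, experts                # per-token mix weights and expert ids

class NormHead(nn.Module):
    """Output head with L2-normalized weight rows (one reading of 'NormHead')."""
    def __init__(self, d_model: int, vocab_size: int):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(vocab_size, d_model))
        nn.init.normal_(self.weight, std=0.02)

    def forward(self, hidden):                 # hidden: [tokens, d_model]
        w = F.normalize(self.weight, dim=-1)   # unit-norm rows stabilize logit scale
        return hidden @ w.t()                  # [tokens, vocab_size]
```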

MoE Architecture and Scaling Laws

The team highlighted that MoE can dramatically increase model capacity while keeping compute modest. By analyzing scaling laws, they showed that an MoE model activating roughly 3B parameters can match a 10-12B dense model, and that careful scaling-law analysis can predict benchmark performance in advance.
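The article does not give the fitted formula, but this kind of prediction is typically done by fitting a power law to a handful of small pilot runs and extrapolating to the target budget. The sketch below shows that workflow; the compute and loss numbers are made-up placeholders, and the form L(C) = a·C^(−b) + c is an assumed functional form, not Ant's published fit.

```python
import numpy as np

# Assumed scaling-law form: validation loss L(C) = a * C**(-b) + c, with C = training compute.
# The pilot-run measurements below are fabricated placeholders for illustration only.
compute = np.array([1e19, 3e19, 1e20, 3e20])   # FLOPs of small pilot runs
loss    = np.array([2.95, 2.78, 2.62, 2.49])   # measured validation losses (illustrative)

# Fit log(L - c) = intercept + slope * log(C), grid-searching the irreducible-loss term c.
best = None
for c in np.linspace(1.5, 2.4, 200):
    y = np.log(loss - c)
    slope, intercept = np.polyfit(np.log(compute), y, 1)
    resid = np.sum((slope * np.log(compute) + intercept - y) ** 2)
    if best is None or resid < best[0]:
        best = (resid, c, -slope, np.exp(intercept))

_, c, b, a = best
target = 1e22                                   # compute budget of the full run
print(f"predicted loss at {target:.0e} FLOPs: {a * target ** (-b) + c:.3f}")
```

The same extrapolation logic can be applied along an activated-parameter axis to compare an MoE configuration against a dense model of equivalent quality.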

Mixed Linear Attention

To handle long sequences efficiently, Ant introduced a hybrid attention mechanism that combines roughly 10% traditional softmax attention with linear attention, achieving up to a 2.2× speed-up on 32K contexts without sacrificing accuracy.
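The talk does not specify the kernel or where the softmax layers sit, so the sketch below shows one plausible construction: non-causal linear attention with an elu(x)+1 feature map for most layers, and roughly every tenth layer kept on standard softmax attention. The function names and the layer-assignment rule are assumptions, not the published design.

```python
import torch
import torch.nn.functional as F

def softmax_attention(q, k, v):
    """Standard scaled dot-product attention: O(n^2) in sequence length n."""
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    return F.softmax(scores, dim=-1) @ v

def linear_attention(q, k, v, eps=1e-6):
    """Non-causal kernelized linear attention (elu(x)+1 feature map): O(n) in n.
    A causal variant would maintain a running (key, value) state instead."""
    q, k = F.elu(q) + 1, F.elu(k) + 1
    kv = torch.einsum("bnd,bne->bde", k, v)              # sum over tokens of phi(k_n) v_n^T
    z = 1.0 / (torch.einsum("bnd,bd->bn", q, k.sum(1)) + eps)
    return torch.einsum("bnd,bde,bn->bne", q, kv, z)

def hybrid_layer_types(num_layers: int, softmax_ratio: float = 0.1):
    """Assign roughly `softmax_ratio` of layers to full softmax attention and the
    rest to linear attention -- one plausible reading of the hybrid design."""
    stride = max(1, round(1 / softmax_ratio))
    return ["softmax" if i % stride == 0 else "linear" for i in range(num_layers)]

# Example: a 32K-token context stays tractable because most layers are linear.
q = k = v = torch.randn(1, 32768, 64)
out = linear_attention(q, k, v)                          # memory grows linearly with length
```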

Multimodal Advances

Ming‑lite‑omni employs dedicated routers for text, video, and audio, enabling precise cross‑modal resource allocation and supporting features like speech synthesis, dialect handling, and real‑time video understanding.
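The wiring of those dedicated routers is not described in the talk; the sketch below illustrates the general idea with a hypothetical `ModalityRoutedMoE` block: one gating network per modality over a shared expert pool, so text, video, and audio tokens can learn different expert-selection patterns. All names and sizes are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityRoutedMoE(nn.Module):
    """Hypothetical MoE block with a dedicated router per modality over one shared expert pool."""
    def __init__(self, d_model=512, num_experts=8, k=2,
                 modalities=("text", "video", "audio")):
        super().__init__()
        self.routers = nn.ModuleDict(
            {m: nn.Linear(d_model, num_experts, bias=False) for m in modalities})
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                           nn.Linear(4 * d_model, d_model))
             for _ in range(num_experts)])
        self.k = k

    def forward(self, x, modality: str):       # x: [tokens, d_model], one modality per call
        logits = self.routers[modality](x)     # modality-specific gate
        weights, idx = torch.topk(logits, self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):             # plain loops for clarity, not efficiency
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out
```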

Future Directions

Beyond technical gains, the roadmap stresses creating AI that surpasses mere imitation, establishing robust evaluation standards, and fostering community collaboration through open‑source releases of models, reinforcement‑learning frameworks, and multi‑agent tools.

Conclusion

Ant’s Baoling effort illustrates a holistic approach—combining algorithmic innovation, system‑level engineering, and open collaboration—to accelerate the path toward AGI and make AI a ubiquitous, low‑cost utility for all users.

Tags: multimodal AI, large language models, Mixture of Experts, Open Source, scaling laws, AI Infrastructure
Written by AntTech

Technology is the core driver of Ant's future creation.
