Insights from Ant Group’s 10th Technical Open Day: Multimodal, Embodied, and Future Model Architectures for AGI

Ant Group’s 10th Technical Open Day brought together leading AI experts to examine the current state and future directions of multimodal large models, embodied AI, world models, Transformer architectures, and vertical applications, offering a comprehensive view of the challenges and opportunities on the path toward AGI.

AntTech

May 30, 2025

On May 27, Ant Group hosted its 10th Technical Open Day, bringing together senior AI practitioners and top researchers to discuss pressing AGI topics, share observations, and explore future directions.

Multimodal Future – Alibaba Vice President Xu Zhuhong defined multimodal large models, surveyed representative models such as CLIP, BLIP, BLIP‑2, Flamingo, LLaVA, and Omni‑LLM, and outlined the main technical challenges. He emphasized the shift toward unified multimodal models that handle both understanding and generation, the rise of multimodal reasoning agents, and the need for stronger perception, control, and long‑range reasoning.

Embodied Models – Guo Yandong, CEO of Zhihua, described the VLA (vision‑language‑action) embodied model, which combines textual commands with environmental perception to output robot actions. He introduced the GO‑VLA model and the AlphaBot2 platform, and outlined three development stages: technological breakthrough, system‑driven scaling, and closing the ecosystem loop.
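
To make the VLA data flow concrete (language in, perception in, action out), here is a toy sketch. It is not GO‑VLA or any production model: `embed_text` is a hypothetical stand‑in for a language encoder, `Observation.image_features` is assumed to come from some vision encoder, and the linear action head with fixed weights is purely illustrative.

```python
import math
from dataclasses import dataclass
from typing import List

@dataclass
class Observation:
    # Hypothetical pre-extracted visual features (e.g., from a camera encoder)
    image_features: List[float]

def embed_text(command: str, dim: int = 8) -> List[float]:
    """Toy stand-in for a language encoder: hashed character counts, L2-normalized."""
    vec = [0.0] * dim
    for ch in command.lower():
        vec[ord(ch) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def vla_policy(command: str, obs: Observation,
               weights: List[List[float]], text_dim: int = 8) -> List[float]:
    """Fuse the language and vision inputs, then map them to a bounded action vector."""
    fused = embed_text(command, text_dim) + obs.image_features  # late fusion by concatenation
    # Linear action head; tanh keeps each action component in (-1, 1)
    return [math.tanh(sum(w * x for w, x in zip(row, fused))) for row in weights]

# Usage: 4 hypothetical action dimensions (e.g., joint deltas)
obs = Observation(image_features=[0.2, -0.1, 0.5])
n_in = 8 + len(obs.image_features)
weights = [[0.1] * n_in for _ in range(4)]
action = vla_policy("pick up the red cup", obs, weights)
print(action)
```

Real VLA models replace each piece with large learned networks, but the interface stays the same: one instruction plus one observation produces one action.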

World Models – Lai Jie, CEO of StarDust, traced the evolution of world models from mental models in psychology to modern computational approaches. He stressed that true closed‑loop learning requires robot‑world interaction loops, third‑person imagination, an understanding of physical constraints, and predictive reasoning.
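
The predict, act, observe, update cycle behind closed‑loop world‑model learning can be sketched with a deliberately tiny example. This is a generic illustration, not StarDust's method: the hidden dynamics `true_env_step` (next state = 0.9 × state + 0.5 × action) and the linear model trained by online gradient descent are assumptions chosen so the loop is easy to follow.

```python
import random

def true_env_step(state: float, action: float) -> float:
    # Hidden "physics" the agent does not know (assumed for illustration)
    return 0.9 * state + 0.5 * action

class LinearWorldModel:
    """Learns next_state ~ a * state + b * action from interaction."""
    def __init__(self, lr: float = 0.05):
        self.a = 0.0
        self.b = 0.0
        self.lr = lr

    def predict(self, state: float, action: float) -> float:
        return self.a * state + self.b * action

    def update(self, state: float, action: float, next_state: float) -> float:
        err = self.predict(state, action) - next_state
        # Gradient step on 0.5 * err^2 with respect to a and b
        self.a -= self.lr * err * state
        self.b -= self.lr * err * action
        return abs(err)

def run_loop(steps: int = 500, seed: int = 0):
    rng = random.Random(seed)
    model = LinearWorldModel()
    state = 1.0
    errors = []
    for _ in range(steps):
        action = rng.uniform(-1.0, 1.0)               # choose an action
        _imagined = model.predict(state, action)      # imagine the outcome
        next_state = true_env_step(state, action)     # act and observe
        errors.append(model.update(state, action, next_state))  # learn from the gap
        state = next_state
    return model, errors

model, errors = run_loop()
print(round(model.a, 3), round(model.b, 3))
```

As the loop runs, prediction error shrinks and the learned coefficients approach the hidden ones; the point is that the model improves only because it keeps interacting with the environment.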

Transformer Architecture and Future Models – HKU assistant professor Kong Lingpeng explained why Transformers are more general than LSTMs, RNNs, and CNNs, and suggested combining them with graph neural networks (GNNs) to address interpretability and hallucination. Sand.ai founder Cao Yue and Alibaba’s Lin Junyang discussed scaling laws, mixture‑of‑experts (MoE), diffusion, linear and sparse attention, and video generation models such as Magi‑1, highlighting both opportunities and open research questions. The session also covered Ant’s MoE‑based large models and the importance of co‑designing algorithms, data, and compute.
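
For readers new to MoE, the sketch below shows the general idea of top‑k routing: a gate scores every expert, only the k best‑scoring experts run, and their outputs are mixed with softmax weights renormalized over the selected experts. It does not reflect Ant's or Qwen's actual implementation; `moe_forward`, `make_expert`, and the toy gate weights are hypothetical, and real experts are feed‑forward networks rather than scalings.

```python
import math
from typing import Callable, List, Sequence, Tuple

Expert = Callable[[Sequence[float]], List[float]]

def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x: Sequence[float], gate_weights: Sequence[Sequence[float]],
                experts: Sequence[Expert], k: int = 2) -> Tuple[List[float], List[int]]:
    # Router: one logit per expert (dot product of a gate row with the input)
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    probs = softmax([logits[i] for i in top])  # renormalize over selected experts only
    out = [0.0] * len(x)
    for p, i in zip(probs, top):
        y = experts[i](x)  # only the selected experts are evaluated
        out = [o + p * yi for o, yi in zip(out, y)]
    return out, top

def make_expert(scale: float) -> Expert:
    # Hypothetical toy expert: scales its input
    return lambda v: [scale * vi for vi in v]

experts = [make_expert(s) for s in (1.0, 2.0, 3.0, 4.0)]
gate_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
out, top = moe_forward([1.0, 2.0], gate_weights, experts, k=2)
print(top, out)
```

Because only k of the experts run per input, total parameter count can grow with the number of experts while per‑token compute stays roughly fixed, which is why MoE features so prominently in scaling discussions.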

Vertical Applications – Wei Peng, head of medical AI at Ant, presented the AI Health Assistant, covering data acquisition, multi‑stage training (supervised fine‑tuning plus reinforcement learning via GRAO), role‑play‑based evaluation, and safety measures spanning medical, ethical, and compliance dimensions.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.

large language models Embodied AI AGI AI safety multimodal models world models transformer architecture

Written by

AntTech

Technology is the core driver of Ant's future creation.
