Insights from Ant Group’s 10th Technical Open Day: Multimodal, Embodied, and Future Model Architectures for AGI
Ant Group’s 10th Technical Open Day gathered leading AI experts to examine the current state and future directions of multimodal large models, embodied AI, world models, Transformer architectures, and vertical applications, offering a broad view of the challenges and opportunities on the path toward AGI.
On May 27, Ant Group hosted its 10th Technical Open Day, bringing together senior AI practitioners and top researchers to discuss pressing AGI topics, share observations, and explore future directions.
Multimodal Future – Alibaba Vice President Xu Zhuhong defined multimodal large models, highlighted existing models such as CLIP, BLIP, BLIP‑2, Flamingo, LLaVA, and Omni‑LLM, and identified technical challenges. He emphasized the shift toward unified multimodal models that handle both understanding and generation, multimodal reasoning agents, and the need for improved perception, control, and long‑range reasoning.
Embodied Models – Guo Yandong, CEO of Zhihua, described the VLA (vision-language-action) embodied model, which combines textual commands with environmental perception to output robot actions. He introduced the GO‑VLA model and the AlphaBot2 platform, and outlined three development stages: technological breakthrough, system‑driven scaling, and a closed‑loop ecosystem.
World Models – Lai Jie, CEO of StarDust, traced the evolution of world models from psychological mental models to modern computational approaches, stressing the necessity of robot‑world interaction loops, third‑person imagination, physical constraint understanding, and predictive reasoning for true closed‑loop learning.
Transformer Architecture and Future Models – HKU assistant professor Kong Lingpeng explained the generality of Transformers over LSTM/RNN/CNN and suggested combining them with Graph Neural Networks to address interpretability and hallucination. Sand.ai founder Cao Yue and Alibaba’s Lin Junyang discussed scaling laws, MoE, diffusion, linear and sparse attention, and video generation models such as Magi‑1, highlighting both opportunities and open research questions. Ant’s MoE‑based large models and the importance of algorithm‑data‑compute co‑design were also covered.
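For readers unfamiliar with the MoE (mixture-of-experts) idea mentioned above, the following is a minimal sketch of top-k expert routing in plain Python. It is an illustration only and not the implementation used by Ant or any speaker; the function names (`top_k_route`, `moe_layer`), the toy experts, and the router weights are all hypothetical. The sketch assumes a linear router that scores each expert, keeps the k best, and mixes their outputs with renormalized weights.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(router_logits, k=2):
    """Pick the k highest-scoring experts (hypothetical helper).

    Returns a list of (expert_index, weight) pairs whose weights are
    renormalized to sum to 1 over the selected experts.
    """
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

def moe_layer(x, router_weights, experts, k=2):
    """Route one input vector through a sparse mixture of experts.

    router_weights holds one weight vector per expert; the router logit
    for an expert is the dot product of its weight vector with x.
    Only the k selected experts are evaluated, which is the source of
    the compute savings usually attributed to MoE models.
    """
    logits = [sum(w_j * x_j for w_j, x_j in zip(w, x)) for w in router_weights]
    routes = top_k_route(logits, k)
    out = [0.0] * len(x)
    for idx, weight in routes:
        y = experts[idx](x)
        out = [o + weight * y_j for o, y_j in zip(out, y)]
    return out, routes

# Toy setup (assumed values): four "experts" that just scale the input.
experts = [lambda v, s=s: [s * t for t in v] for s in (1.0, 2.0, 3.0, 4.0)]
router_weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, -1.0]]

out, routes = moe_layer([1.0, 2.0], router_weights, experts, k=2)
print(routes)  # selected experts and their mixing weights
print(out)     # weighted combination of the selected experts' outputs
```

In this toy run only two of the four experts are evaluated for the input. Production MoE layers typically add load-balancing terms and batch the routing across tokens, which this sketch omits.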
Vertical Applications – Ant’s medical AI lead Wei Peng presented the AI Health Assistant, detailing data acquisition, multi‑stage training (SFT + RL via GRAO), role‑play evaluation, and safety measures across medical, ethical, and compliance dimensions.
AntTech
Technology is the core driver of Ant's future creation.