Insights from the 2024 Inclusion·Bund Conference: From Data for AI to AI for Data
The 2024 Inclusion·Bund conference brought together academic and industry leaders to discuss how data technologies are evolving to align with AI, covering trends in large-model storage, synthetic data generation, AI-enhanced databases, and Ant Group's emerging AI-centric data ecosystem.
The 2024 Inclusion·Bund conference, co‑hosted by Ant Group, Shanghai Jiao Tong University and Fudan University, featured a forum titled “From Data for AI to AI for Data,” where scholars and industry experts examined the shifting value of data in the AI era.
Zheng Weimin, an academician of the Chinese Academy of Engineering and professor at Tsinghua University, and Yan Shuicheng, chief scientist at Kunlun Wanwei & Tian Gong Intelligence and a fellow of the Singapore Academy of Engineering, shared insights on how data technology is changing and how it is being integrated with AI.
They highlighted that large models demand tighter alignment with data technologies: the models' voracious appetite for data reshapes how data is stored, produced, processed, circulated, and consumed, prompting new technical solutions.
In the storage domain, Professor Zheng explained that every stage of the large-model lifecycle (data acquisition, preprocessing, training with checkpoints, and inference) poses distinct challenges for storage systems, such as handling massive multimodal files, serving frequent random reads of small samples, and loading model parameters efficiently.
As the era shifts from big data to AI, the rapid consumption of data makes synthetic-data techniques increasingly important. A June 2024 report by Epoch AI warned that from 2026 onward human-generated data will be insufficient to feed models, and that by 2028 large language models may exhaust the available human-generated data.
Yan Shuicheng suggested that future improvements may come from generating high‑quality synthetic data through dialogues, discussions, and evaluations among large models, rather than merely augmenting existing datasets.
Database expert Yang Chuanhui, CTO of OceanBase, a distributed database developed in China, described how modern databases are integrating AI capabilities: supporting both SQL and vector search, and using AI to improve development and management tools.
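To make the idea of combining SQL and vector search concrete, here is a minimal sketch of a hybrid query: a relational filter narrows the candidate rows, and vector similarity ranks what remains. It is an illustration only and does not reflect OceanBase's actual interface. It uses Python's built-in sqlite3 module with embeddings stored as JSON text, and the table name docs, the helper names, and the sample vectors are all hypothetical.

```python
import json
import math
import sqlite3


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_search(conn, category, query_vec, k=2):
    """Filter rows with SQL, then rank the survivors by vector similarity."""
    rows = conn.execute(
        "SELECT id, title, embedding FROM docs WHERE category = ?",
        (category,),
    ).fetchall()
    scored = [
        (cosine(query_vec, json.loads(emb)), doc_id, title)
        for doc_id, title, emb in rows
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [(doc_id, title) for _, doc_id, title in scored[:k]]


def build_demo_db():
    """Create an in-memory table with a few hypothetical documents."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE docs ("
        "id INTEGER PRIMARY KEY, title TEXT, category TEXT, embedding TEXT)"
    )
    sample = [
        (1, "checkpoint storage", "storage", [1.0, 0.0, 0.0]),
        (2, "multimodal cache", "storage", [0.8, 0.6, 0.0]),
        (3, "synthetic dialogue data", "data", [0.0, 1.0, 0.0]),
        (4, "cold archive", "storage", [0.0, 0.0, 1.0]),
    ]
    conn.executemany(
        "INSERT INTO docs VALUES (?, ?, ?, ?)",
        [(i, t, c, json.dumps(v)) for i, t, c, v in sample],
    )
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    print(hybrid_search(conn, "storage", [1.0, 0.1, 0.0], k=2))
```

A database with native vector support would typically run the similarity ranking inside the engine with a vector index rather than in application code; the sketch only shows how the two kinds of predicate combine in a single query.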
Chen Wenguang, director of Ant Technology Research Institute, emphasized that aligning data systems with AI requires changes at the hardware, programming-language, and compiler levels, and introduced a fused AI-big-data-science computing model (FABS).
Ant Group’s Vice President Luo Ji outlined the company’s AI‑era intelligent data system, built on a multimodal storage and compute engine, featuring row‑column hybrid storage, vector database capabilities for new search and interaction, and a high‑performance multimodal cache for large‑model training.
The core “fusion data lake” integrates structured, semi‑structured, and unstructured data, focusing on unified metadata, three‑line consistency, a single source of truth, and robust security and quality guarantees for unstructured data.
Higher‑level data applications include high‑value data production (ingestion, perception, labeling, synthesis), full‑modal data R&D, new feature services for machines and agents, and comprehensive data analysis and scientific experimentation frameworks.
Overall, the forum underscored that the data technology field is entering a new historical phase driven by AI alignment and value‑centered data strategies.
AntTech
Technology is the core driver of Ant's future creation.