
How Data Platforms Are Shifting from Cost Efficiency to Value in the AI Era

This talk reviews the evolution of data technology from early database storage to today's generative-AI era. It highlights how massive data, multimodal processing, and advanced analytics are transforming data systems from cost-centered infrastructure into value-focused ecosystems that power intelligent agents, open data ecosystems, and new application paradigms.

Data Thinking Notes

This article compiles a speech by Luo Ji, head of Ant Group’s Data Platform and Services, delivered at the 2024 Bund Conference “From DATA for AI to AI for DATA” insight forum.

In the past two years, generative AI has achieved major breakthroughs, merging massive data with huge compute power to drive countless product innovations. Simultaneously, data technology is entering a new historical stage filled with unprecedented challenges and opportunities.

Historical Evolution of Data Technology

In the 1990s, the rise of the Internet spurred efficient database storage and management, laying the foundation for digital information in small and medium enterprises and supporting e‑commerce. Databases evolved toward higher performance and availability.

From 2003 to 2006, seminal papers on large‑scale distributed storage and computation—MapReduce, Bigtable, and Google File System—ushered in the industrial era of big data. The proliferation of mobile internet, smartphones, apps, and mini‑programs enriched data portraits, fueling personalized services at scale.

In 2017, the “Attention Is All You Need” paper provided a key foundation for generative AI. Large‑model‑centric intelligent technologies now enable every person to have a comprehensive AI assistant, marking the start of a data‑intelligence fusion era.

From Cost‑Center to Value‑Center

During the big‑data era, data technology focused on infrastructure: latency, throughput, resource utilization, and cost defined competitive advantage. In the emerging data‑intelligence fusion era, the scale, diversity, quality, and accuracy of data assets become critical to intelligent application outcomes, shifting the focus to data value.

Data production now expands beyond traditional web crawling to include fine‑grained capture from wearables, smart appliances, and IoT devices, turning every observation—human, machine, or embodied intelligence—into digital assets with immense value.

High‑quality, professionally annotated data and synthetic data are essential for training large models and maintaining competitive advantage.

1. Expansion of Data Production Methods

Traditional: Search-recommendation and personalized services rely heavily on large-scale web crawling of public data combined with proprietary private-domain data.

New era: Data production extends to detailed daily-life recordings from wearables, smart home devices, and IoT terminals, digitizing everything humans and machines encounter.

Professional data annotation and synthesis become increasingly vital to ensure data quality for large‑model training.

2. Evolution of Data Asset Processing and Services

Data form shift: From structured to increasingly unstructured, multimodal data (text, image, audio, video). IDC predicts that by 2027 unstructured data will account for 86.8% of total data (~250 ZB).

Challenges include multimodal conversion (e.g., transcribing the audio track of a video into text), cleaning, quality assessment, mining, and expert review of unstructured content.
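The conversion-then-quality-gating flow described above can be sketched as a small pipeline. This is an illustrative example, not Ant Group's actual system: the transcription step is stubbed (a real pipeline would call an ASR model), and the cleaning rules and quality threshold are made-up placeholders.

```python
# Minimal sketch of a multimodal preprocessing pipeline: normalize a raw
# transcript, score its quality, and drop clips that fail the gate.
# Function names and the 0.5 threshold are illustrative assumptions.
import re

def clean_transcript(text: str) -> str:
    """Normalize whitespace and strip common filler tokens from raw ASR output."""
    text = re.sub(r"\s+", " ", text).strip()
    return re.sub(r"\b(um+|uh+)\b[,.]?\s*", "", text, flags=re.IGNORECASE)

def quality_score(text: str) -> float:
    """Crude quality heuristic: ratio of alphanumeric characters to length."""
    if not text:
        return 0.0
    return sum(ch.isalnum() for ch in text) / len(text)

def process_clip(raw_transcript: str, min_quality: float = 0.5):
    """Clean a transcript; keep it only if it passes the quality gate."""
    cleaned = clean_transcript(raw_transcript)
    return cleaned if quality_score(cleaned) >= min_quality else None

print(process_clip("um, the   quarterly numbers, uh, look strong"))
```

In production the heuristic score would be replaced by model-based quality assessment and, as the article notes, expert review for ambiguous cases.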

Service shift: From user-centric to machine- and agent-centric. Agents require efficient semantic representations, multimodal encoding/decoding, and low-latency, high-throughput data transmission, especially under weak network conditions.

3. New Paradigms of Data Application Challenges

Mixed scalar-vector retrieval: Emerging search and interaction scenarios demand hybrid retrieval over scalar and vector data, enabling natural-language and multimodal queries (e.g., scanning a wine bottle with a phone).

Balancing recall, storage cost, latency, and indexing overhead for massive vector spaces is a key infrastructure challenge.
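The hybrid pattern described above, a cheap scalar predicate followed by expensive vector ranking, can be sketched in a few lines. This is a toy illustration: the catalog data, field names, and brute-force cosine scan are assumptions for clarity, whereas a real system would use an approximate-nearest-neighbor index to manage the recall/latency/cost trade-offs just mentioned.

```python
# Illustrative hybrid scalar-vector retrieval: pre-filter rows on scalar
# attributes, then rank the survivors by cosine similarity to the query.
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical catalog; "vec" stands in for a learned embedding.
CATALOG = [
    {"name": "red_2019",   "category": "wine", "price": 35.0, "vec": [0.9, 0.1, 0.0]},
    {"name": "white_2021", "category": "wine", "price": 18.0, "vec": [0.2, 0.9, 0.1]},
    {"name": "opener",     "category": "tool", "price": 9.0,  "vec": [0.1, 0.1, 0.9]},
]

def hybrid_search(query_vec, category, max_price, top_k=2):
    # Scalar predicate first (cheap), vector ranking second (expensive).
    candidates = [r for r in CATALOG
                  if r["category"] == category and r["price"] <= max_price]
    candidates.sort(key=lambda r: cosine(query_vec, r["vec"]), reverse=True)
    return [r["name"] for r in candidates[:top_k]]

print(hybrid_search([1.0, 0.0, 0.0], category="wine", max_price=40.0))
```

Whether to filter before or after the vector scan is itself a cost/recall decision: pre-filtering shrinks the candidate set but can starve the ANN index, which is one face of the balancing problem the article raises.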

Uncertainty in intelligent applications: Generative AI introduces outcome uncertainty; reducing this uncertainty relies on rigorous data experimentation, evaluation, and feedback loops.
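One common shape for the evaluation loop mentioned above is an offline release gate: score each model candidate against a labeled evaluation set and promote it only if it clears a threshold. The sketch below is a generic illustration under that assumption; the function names, eval set, and 0.9 threshold are all hypothetical.

```python
# Hedged sketch of an evaluation gate for non-deterministic model output:
# measure accuracy on a labeled eval set, promote only above a threshold.
def evaluate(model_fn, eval_set):
    """Fraction of eval items where the model output matches the label."""
    hits = sum(model_fn(prompt) == expected for prompt, expected in eval_set)
    return hits / len(eval_set)

def should_promote(model_fn, eval_set, threshold=0.9):
    """Release gate: only ship candidates that clear the eval threshold."""
    return evaluate(model_fn, eval_set) >= threshold

EVAL_SET = [("2+2", "4"), ("3+3", "6"), ("5+5", "10")]
answers = {"2+2": "4", "3+3": "6", "5+5": "10"}
candidate = lambda prompt: answers[prompt]   # stand-in for a real model call

print(should_promote(candidate, EVAL_SET))   # True: 3/3 correct
```

In practice the loop continues online: production feedback flows back into the eval set, so the gate tightens as the application surfaces new failure modes.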

Open data ecosystem: Future intelligent applications will operate in an open ecosystem requiring secure, compliant data fusion, value discovery, circulation, and measurement.

Outlook: Ant’s Intelligent Data System in the Data‑Intelligence Fusion Era

Ant Group has been building a multimodal storage and compute engine, vector database capabilities, and mixed‑retrieval analytics to support large‑model training and intelligent services.

The unified data lake aims to seamlessly integrate structured, semi‑structured, and unstructured data, with strong governance, metadata management, and security for multimodal assets.

At the application layer, a “value‑driven data” philosophy guides high‑value data production, multimodal feature services, and AI‑assisted data insight agents, lowering the barrier for data analysis.

In this rapidly evolving intelligent era, the data technology field is transitioning from a cost-and-efficiency focus to a value-centered paradigm, unlocking unprecedented opportunities.

Tags: artificial intelligence, Big Data, multimodal, data value, data platforms
Written by

Data Thinking Notes

Sharing insights on data architecture, governance, and middle platforms, exploring AI in data, and linking data with business scenarios.
