
How MaxCompute Evolves into an AI‑Ready Data Platform: Architecture, Core Capabilities, and Real‑World Cases

This article details MaxCompute's transformation into a cloud‑native, AI‑centric data warehouse, covering multi‑modal storage, model management, heterogeneous CPU/GPU scheduling, SQL AI functions, the MaxFrame Python framework, and several production case studies that report performance gains of more than 50% and elastic resource scaling to 160 000 cores.

DataFunTalk

MaxCompute, Alibaba Cloud's core big‑data compute platform, is undergoing a two‑track evolution toward AI‑centric workloads. Its architecture is divided into four layers—data, model, compute, and engine—each adding capabilities that enable unified multi‑modal data management and high‑performance AI processing.

Data Layer

The platform now supports structured and unstructured data via a BLOB field type, allowing audio, video, and image files to be stored alongside traditional tables. Object Table and external storage connectors (OSS, Hologres, etc.) provide seamless cross‑engine access without moving data. A unified metadata service (Max Meta) and Storage API enable metadata management and data access across storage back‑ends.

Model Layer

MaxCompute hosts traditional machine‑learning models such as XGBoost and LightGBM, as well as open‑source large models (e.g., Qwen3, DeepSeek). It also integrates commercial flagship models from the Bailian platform, offering a single point for model registration, versioning, and serving.
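To make "registration and versioning" concrete, here is a minimal in-memory sketch. It is not MaxCompute's model API: the `ModelRegistry` class, its method names, and the artifact URIs are all hypothetical and only illustrate the idea of resolving a model name to a specific or latest version.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ModelRegistry:
    """Hypothetical in-memory registry; NOT MaxCompute's model API."""

    _versions: Dict[str, List[str]] = field(default_factory=dict)

    def register(self, name: str, artifact_uri: str) -> int:
        """Store a new version of `name` and return its 1-based version number."""
        versions = self._versions.setdefault(name, [])
        versions.append(artifact_uri)
        return len(versions)

    def resolve(self, name: str, version: Optional[int] = None) -> str:
        """Return the artifact for `version`, or the latest one if omitted."""
        versions = self._versions[name]
        if version is None:
            return versions[-1]
        if not 1 <= version <= len(versions):
            raise KeyError(f"{name} has no version {version}")
        return versions[version - 1]


# Usage with made-up names and URIs.
registry = ModelRegistry()
registry.register("churn-xgb", "oss://example-bucket/churn/v1")
registry.register("churn-xgb", "oss://example-bucket/churn/v2")
latest = registry.resolve("churn-xgb")
pinned = registry.resolve("churn-xgb", version=1)
```

Pinning a version keeps a downstream job reproducible, while resolving the latest version suits exploratory work.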

Compute Layer

Both CPU and GPU resources are available through a mixed‑resource scheduler. Users specify the resources a job needs declaratively, which meets the heavy compute demands of multi‑modal AI tasks.
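The idea of declaring resources and letting a scheduler place the job can be sketched as follows. This is a toy illustration under stated assumptions: `ResourceSpec`, `pick_pool`, and the pool names are hypothetical and do not reflect MaxCompute's actual scheduler or configuration syntax.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ResourceSpec:
    """Hypothetical declaration of what a job needs (or what a pool has free)."""

    cpu_units: int = 1
    gpu_units: int = 0


def pick_pool(spec: ResourceSpec, pools: Dict[str, ResourceSpec]) -> Optional[str]:
    """Return the first pool with enough free CPU and GPU, or None if none fits."""
    for name, free in pools.items():
        if free.cpu_units >= spec.cpu_units and free.gpu_units >= spec.gpu_units:
            return name
    return None


# Made-up pools: one CPU-only, one mixed CPU/GPU.
pools = {
    "cpu-only": ResourceSpec(cpu_units=64, gpu_units=0),
    "mixed": ResourceSpec(cpu_units=32, gpu_units=4),
}
etl_pool = pick_pool(ResourceSpec(cpu_units=8), pools)
inference_pool = pick_pool(ResourceSpec(cpu_units=8, gpu_units=1), pools)
```

The job states only what it needs; which pool satisfies the request is the scheduler's decision, which is the point of a declarative interface.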

Engine Layer

Two core compute interfaces are exposed:

SQL Engine: SQL AI functions let analysts invoke large models directly from SQL for offline inference, lowering the barrier for AI‑driven analytics.

MaxFrame: A native Python distributed‑compute framework compatible with Pandas, XGBoost, LightGBM, and other open‑source libraries. It offers heterogeneous scheduling, distributed operators, and tight integration with DataWorks for interactive development, custom image support, and SDK‑based model invocation.
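The SQL Engine pattern, calling a model as if it were an ordinary SQL function, can be imitated locally with Python's built-in `sqlite3` module. This is only an analogy: the function name `ai_sentiment`, the `reviews` table, and the keyword rule standing in for a model are assumptions for illustration, not MaxCompute's AI function syntax.

```python
import sqlite3


def toy_sentiment(text):
    """Stand-in for a model call: a keyword rule, not a real model."""
    return "positive" if "good" in text.lower() else "negative"


def run_inference(rows):
    """Load (id, body) rows into an in-memory table and score them from SQL."""
    conn = sqlite3.connect(":memory:")
    # Register the Python callable so SQL can invoke it per row.
    conn.create_function("ai_sentiment", 1, toy_sentiment)
    conn.execute("CREATE TABLE reviews (id INTEGER, body TEXT)")
    conn.executemany("INSERT INTO reviews VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT id, ai_sentiment(body) FROM reviews ORDER BY id"
    ).fetchall()
    conn.close()
    return result


scored = run_inference([(1, "Good battery life"), (2, "Screen cracked")])
```

The analyst writes a plain `SELECT`; the model invocation is hidden behind the function, which is what lowers the barrier to offline inference.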

Key Features of MaxFrame

Heterogeneous scheduling of CPU (CU) and GPU (GU) resources.

Distributed operators that automatically parallelize workloads across the cluster.

Stable, integrated development experience via DataWorks, custom Docker images, and OSS‑mounted AI assistants.
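A distributed operator follows a partition, map, and combine pattern. The sketch below imitates that pattern on one machine with a thread pool from the standard library; it does not use MaxFrame, and `split` and `parallel_map` are hypothetical helper names. A real cluster would ship each partition to a different worker node instead of a thread.

```python
from concurrent.futures import ThreadPoolExecutor


def split(items, n_parts):
    """Cut `items` into at most `n_parts` contiguous chunks of near-equal size."""
    size = -(-len(items) // n_parts)  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def parallel_map(func, items, n_parts=4):
    """Apply `func` to every item, one chunk per worker, preserving order."""
    if not items:
        return []
    chunks = split(items, n_parts)
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        # Executor.map yields chunk results in submission order.
        mapped = list(pool.map(lambda chunk: [func(x) for x in chunk], chunks))
    # Combine step: flatten the per-chunk results back into one list.
    return [y for chunk in mapped for y in chunk]


squares = parallel_map(lambda x: x * x, list(range(10)), n_parts=3)
```

The caller writes the per-item function only; partitioning and recombination are handled by the operator, which is the property that lets a framework parallelize the workload automatically.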

Real‑World Cases

Large‑model data preprocessing: A leading LLM provider processed petabyte‑scale data with a MinHash operator that improved performance by more than 50%. A single pipeline run consumed 300 000 core‑seconds and scaled elastically to 160 000 cores, well beyond the required 100 000 cores, which substantially shortened the PB‑level data processing cycle.

Automotive embodied‑intelligence: Using MaxFrame, a customer processed multi‑modal sensor data (images, video, radar, GPS) from ROS‑bag files. Distributed processing delivered >40% speedup over single‑node Python, and elastic resources handled workload spikes efficiently.

Multi‑modal data processing & image labeling: Object Table unified storage of images, videos, and metadata, while the built‑in MinHash operator and custom Docker images (e.g., yolo11n) enabled fast deduplication and model‑driven labeling. SQL AI functions were used to generate embeddings for image retrieval.
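MinHash, mentioned in two of the cases above, estimates how similar two documents are from short signatures instead of comparing full texts. The following is a generic single-machine sketch of the technique using only the standard library; it is not the platform's operator, and the shingle size, the number of hash functions, and all function names are illustrative assumptions.

```python
import hashlib


def shingles(text, k=3):
    """Return the set of k-word shingles of `text` (the whole text if shorter)."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def _hash(seed, token):
    """64-bit hash of `token`; each seed acts as a different hash function."""
    digest = hashlib.blake2b(
        token.encode("utf-8"), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(tokens, num_perm=64):
    """Keep the minimum hash value under each of `num_perm` hash functions."""
    return [min(_hash(seed, t) for t in tokens) for seed in range(num_perm)]


def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching signature slots, an estimate of Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


doc_a = "the quick brown fox jumps over the lazy dog near the river bank"
doc_b = "the quick brown fox jumps over the lazy dog near the river shore"
similarity = estimated_jaccard(
    minhash_signature(shingles(doc_a)), minhash_signature(shingles(doc_b))
)
```

Deduplication pipelines typically treat pairs whose estimate exceeds a chosen threshold as near-duplicates. Because signatures are fixed-length and computed independently per document, the work partitions cleanly across many cores.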

Conclusion

By linking storage, SQL, and Python through a unified Data + AI stack, MaxCompute provides a cloud‑native, elastic, high‑performance foundation for building AI data assets and deploying intelligent applications across industries such as large‑model training, autonomous driving, and fintech.
