
MaxCompute’s AI‑Ready Evolution: Architecture, Features, and Real‑World Use Cases

This article examines how Alibaba Cloud’s MaxCompute platform has been transformed for AI workloads, detailing its multi‑layer architecture, multimodal data storage, SQL AI functions, the Python‑based MaxFrame framework, and real‑world deployments in large‑model preprocessing, autonomous driving, and multimodal image labeling.

DataFunTalk

May 25, 2026


Introduction

With the rapid growth of artificial‑intelligence techniques, data volumes and processing complexity have surged, putting traditional data warehouses under unprecedented pressure. MaxCompute, Alibaba Cloud’s core big‑data compute service, is evolving to meet AI‑centric demands by integrating storage, model, compute, and engine layers into a unified Data+AI platform.

Four‑Layer Architecture

Data layer: Stores structured data alongside unstructured data, using BLOB fields for audio, video, and other multimodal content. Through Object Table and the Storage API, it connects to external storage engines such as OSS and Hologres, enabling cross‑engine metadata management without moving data.

Model layer: Hosts traditional machine‑learning models (XGBoost, LightGBM) and open‑source large models (Qwen, DeepSeek). Integration with the Bailian platform allows commercial flagship models to be managed in the same unified way.

Compute layer: Provides hybrid CPU/GPU scheduling. Users declare the resources a job needs, and the platform allocates them flexibly to multimodal workloads that require heterogeneous compute.

Engine layer: Offers two primary interfaces. The SQL engine includes SQL AI functions that invoke large models directly on structured data. The Python‑native MaxFrame framework provides a distributed computing environment compatible with Pandas, XGBoost, LightGBM, and other open‑source libraries.
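The compute layer's declarative model, where a job states what it needs and the platform finds capacity for it, can be pictured with a toy first-fit scheduler. This is a minimal sketch under stated assumptions: ResourceRequest, Pool, place, and the cpu_cu/gpu_gu field names are hypothetical and do not correspond to MaxCompute's actual scheduler or APIs.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ResourceRequest:
    """What a job declares it needs; the field names are hypothetical."""
    cpu_cu: int      # CPU compute units
    gpu_gu: int = 0  # GPU units; 0 means a CPU-only job


@dataclass
class Pool:
    name: str
    free_cpu_cu: int
    free_gpu_gu: int


def place(request: ResourceRequest, pools: List[Pool]) -> Optional[str]:
    """First-fit placement: reserve capacity in the first pool that can hold the request."""
    for pool in pools:
        if pool.free_cpu_cu >= request.cpu_cu and pool.free_gpu_gu >= request.gpu_gu:
            pool.free_cpu_cu -= request.cpu_cu
            pool.free_gpu_gu -= request.gpu_gu
            return pool.name
    return None  # nothing fits right now; a real scheduler would queue or scale out


pools = [Pool("cpu-pool", free_cpu_cu=512, free_gpu_gu=0),
         Pool("gpu-pool", free_cpu_cu=64, free_gpu_gu=8)]
print(place(ResourceRequest(cpu_cu=100), pools))          # cpu-pool
print(place(ResourceRequest(cpu_cu=8, gpu_gu=2), pools))  # gpu-pool
```

A CPU-only request lands in the CPU pool, while a request that also asks for GPU units skips pools without free GPUs. Real heterogeneous schedulers also handle queuing, quotas, and preemption, which this sketch leaves out.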

Key Capabilities

Multimodal Data Management: BLOB and JSON columns let a single row hold structured fields together with media and semi‑structured metadata, simplifying data organization for AI inference and AIGC.

SQL AI Functions: Let analysts call large models from plain SQL statements for offline inference, lowering the barrier to AI adoption.

MaxFrame:

Heterogeneous compute scheduling across CPU compute units (CU) and GPU units (GU) through programmable APIs.

Distributed operators compatible with Pandas, XGBoost, LightGBM, and other libraries, so workloads scale automatically beyond the resources of a single machine.

Deep integration with DataWorks for interactive development, with support for custom images, OSS mounting, and an AI assistant.
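The article does not describe how MaxFrame's distributed operators work internally, but distributed DataFrame engines commonly scale an operation such as a group-by sum by partitioning the data, aggregating each partition independently, and merging the partial results. A minimal single-process sketch of that general pattern, with hypothetical helper names and list chunks standing in for workers:

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, int]  # (group key, value)


def partition(rows: List[Row], n: int) -> List[List[Row]]:
    """Round-robin split; each chunk stands in for data held by one worker."""
    return [rows[i::n] for i in range(n)]


def partial_sum(chunk: Iterable[Row]) -> Counter:
    """Map step: aggregate one chunk without looking at any other chunk."""
    acc: Counter = Counter()
    for key, value in chunk:
        acc[key] += value
    return acc


def merge(partials: Iterable[Counter]) -> Dict[str, int]:
    """Reduce step: add the per-chunk sums into the final group-by result."""
    total: Counter = Counter()
    for p in partials:
        total.update(p)
    return dict(total)


rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
print(merge(partial_sum(chunk) for chunk in partition(rows, 2)))  # {'a': 4, 'b': 7, 'c': 4}
```

Because each partial aggregate depends only on its own chunk, the map step can run on as many workers as there are partitions, and only the small per-chunk results need to be gathered for the merge.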

Development Experience

The MaxFrame SDK is publicly available and can be installed with pip install maxframe. It runs in local environments such as VS Code and Jupyter Notebook, as well as in DataWorks Notebook, where magic commands start a MaxFrame session. DataWorks also provides PyODPS3 nodes for submitting jobs.

Real‑World Scenarios and Case Studies

1. Large‑Model Data Pre‑Processing

A leading large‑model company needed to process petabyte‑scale data, elastically allocate more than 100k cores, and manage fine‑grained permissions. With MaxFrame pipelines, the MinHash operator ran more than 50% faster, handling 300k core‑seconds per run and scaling to 160k cores, well beyond the original 100k‑core requirement.
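MinHash is a standard technique for near-duplicate detection when deduplicating large-model training corpora. The article does not describe how MaxFrame's MinHash operator is implemented, so the following is a generic single-machine sketch, not that operator: documents are split into character 5-grams, each of 128 salted BLAKE2b hashes simulates a random permutation, and the fraction of matching signature slots estimates Jaccard similarity. The shingle size, permutation count, and hash choice are illustrative assumptions.

```python
import hashlib
from typing import List, Set


def shingles(text: str, k: int = 5) -> Set[str]:
    """Character k-grams after lowercasing and collapsing whitespace."""
    text = " ".join(text.lower().split())
    if len(text) <= k:
        return {text}  # short documents become a single shingle
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def minhash_signature(items: Set[str], num_perm: int = 128) -> List[int]:
    """For each seeded hash function, keep the minimum hash over all items."""
    signature = []
    for seed in range(num_perm):
        salt = seed.to_bytes(8, "little")  # a distinct salt simulates a distinct permutation
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(s.encode("utf-8"), digest_size=8, salt=salt).digest(), "big")
            for s in items
        ))
    return signature


def estimate_jaccard(sig_a: List[int], sig_b: List[int]) -> float:
    """Fraction of positions where the two signatures agree."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def exact_jaccard(a: Set[str], b: Set[str]) -> float:
    return len(a & b) / len(a | b)


doc_a = "the quick brown fox jumps over the lazy dog"
doc_b = "the quick brown fox jumped over the lazy dog"
sa, sb = shingles(doc_a), shingles(doc_b)
print(round(exact_jaccard(sa, sb), 2),
      round(estimate_jaccard(minhash_signature(sa), minhash_signature(sb)), 2))
```

Signatures have a fixed size, so comparing two documents costs O(num_perm) regardless of their length, which is what makes the computation worth spreading across many cores. Production pipelines usually add locality-sensitive hashing (banding) on top to avoid comparing every pair of documents.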

2. Automotive Embodied Intelligence

In autonomous‑driving pipelines, massive multimodal streams (images, video, radar, GPS) stored as ROS bags made environment configuration and resource scheduling difficult. MaxFrame’s elastic compute and distributed processing raised data‑processing throughput by more than 40% compared with single‑node Python solutions.
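The gain over single-node scripts comes from processing independent slices of a recording in parallel. The sketch below assumes messages have already been decoded from the ROS bags (decoding requires ROS tooling not shown), groups them into fixed time windows, and summarizes each window in a thread pool that stands in for distributed workers. The Message tuple and all function names are hypothetical; in CPython, threads illustrate the structure but do not speed up CPU-bound decoding, which is why such work is moved to multi-process or multi-node execution.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

# Hypothetical decoded message: (timestamp in seconds, sensor topic name).
Message = Tuple[float, str]


def split_by_window(msgs: List[Message], window_s: float) -> List[List[Message]]:
    """Group messages into fixed time windows that can be processed independently."""
    buckets: Dict[int, List[Message]] = {}
    for ts, topic in msgs:
        buckets.setdefault(int(ts // window_s), []).append((ts, topic))
    return [buckets[k] for k in sorted(buckets)]


def summarize(segment: List[Message]) -> Counter:
    """Per-window work; a real pipeline would decode images or point clouds here."""
    return Counter(topic for _, topic in segment)


def process(msgs: List[Message], window_s: float = 10.0, workers: int = 4) -> Counter:
    """Fan windows out to a worker pool, then merge the per-window results."""
    total: Counter = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for part in pool.map(summarize, split_by_window(msgs, window_s)):
            total.update(part)
    return total


msgs = [(0.5, "camera"), (3.0, "radar"), (12.0, "camera"), (25.0, "gps"), (26.0, "camera")]
print(process(msgs))
```

Windowing by time keeps each slice self-contained, so adding workers increases throughput without changing the per-window logic.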

3. Multimodal Image Tagging

Using SQL AI functions, images are automatically labeled and converted into embedding vectors, which support downstream retrieval and analysis.
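Once each image has an embedding, retrieval reduces to a nearest-neighbor search over those vectors. A minimal brute-force sketch using cosine similarity, with hypothetical image IDs and toy 3-dimensional vectors in place of real model output:

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: Sequence[float], index: Dict[str, Sequence[float]],
          k: int = 2) -> List[Tuple[str, float]]:
    """Brute-force search: score every stored embedding and keep the k best."""
    scored = [(image_id, cosine(query, vec)) for image_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors; real image embeddings come from a model and are much longer.
index = {
    "img_001": [0.9, 0.1, 0.0],
    "img_002": [0.1, 0.9, 0.1],
    "img_003": [0.8, 0.2, 0.1],
}
print([image_id for image_id, _ in top_k([1.0, 0.0, 0.0], index)])  # ['img_001', 'img_003']
```

Brute-force scoring is linear in the number of images; at warehouse scale a vector index would normally replace the full scan.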

4. End‑to‑End AI Asset Construction

Across these scenarios (large‑model preprocessing, autonomous driving, and multimodal tagging), MaxCompute provides a cloud‑native, elastic, high‑performance foundation that spans storage through SQL and Python interfaces, supporting AI transformation across industries.

Conclusion

MaxCompute’s full‑stack Data+AI capabilities, spanning storage, model management, hybrid compute, and both SQL and Python engines, form a robust foundation for building AI data assets and deploying intelligent applications at scale.


