
From Big Data to Large Models: Alibaba Cloud AI Platform Architecture and Practices for Search Recommendation

This presentation details Alibaba Cloud's AI platform, covering the end‑to‑end pipeline from big‑data processing and feature engineering to large‑model training, inference optimization, recommendation system architecture, and RAG applications, highlighting practical engineering solutions and performance gains.

DataFunTalk

Shi Xing, product architecture lead of Alibaba Cloud's Machine Learning Platform PAI, introduces the team's work spanning mature search‑recommendation systems, multimodal image/video labeling, large‑model platform development (Tongyi series), and RAG engineering and evaluation.

The talk is organized into three parts: the technical architecture of search‑recommendation advertising, engineering and algorithm practices in that scenario, and explorations of large‑model capabilities.

In the recommendation flow, a user opens an app (Taobao, Tmall, etc.) and the backend receives a request; the business logic, together with an A/B experimentation system, decides the recall, coarse‑ranking, and fine‑ranking strategy, while the serving engine executes the actual retrieval and ranking.
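The recall → coarse‑rank → fine‑rank funnel described above can be sketched as follows. This is a toy illustration: the function names are invented and the scorers stand in for the lightweight and heavy models a real engine would call.

```python
# Minimal sketch of a recall -> coarse-rank -> fine-rank funnel.
# All scorers are placeholders; a real system calls trained models.

def recall(user_id, item_pool, k):
    """Cheap candidate generation, e.g. inverted index or ANN lookup."""
    return item_pool[:k]

def coarse_rank(user_id, items, k):
    """Lightweight model (e.g. dual-tower dot product) over many candidates."""
    scored = sorted(items, key=lambda it: it["coarse_score"], reverse=True)
    return scored[:k]

def fine_rank(user_id, items, k):
    """Heavy model (e.g. DeepFM) over a small candidate set."""
    scored = sorted(items, key=lambda it: it["fine_score"], reverse=True)
    return scored[:k]

def recommend(user_id, item_pool, recall_k=1000, coarse_k=100, fine_k=10):
    cands = recall(user_id, item_pool, recall_k)
    cands = coarse_rank(user_id, cands, coarse_k)
    return fine_rank(user_id, cands, fine_k)
```

Each stage trades accuracy for throughput: recall scans millions of items cheaply, while the fine ranker spends its full compute budget on only the survivors.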

Model services include classic models such as DeepFM and FM, which rely on feature platforms storing both user behavior and item attributes; Flink generates real‑time features, day‑level samples accumulate in the offline big‑data platform, and the AI platform trains on this data to produce the embeddings and models used for online inference.

At the resource layer, Alibaba Cloud provides heterogeneous compute (CPU, GPU) via ODPS Feitian clusters, container services, and the Lingjun intelligent compute cluster for large‑model workloads, supported by high‑bandwidth RDMA networking and fast storage to meet the demands of thousands of GPUs.

The integrated "big‑data + AI" PaaS comprises MaxCompute (a Hadoop‑like data warehouse), Hologres for real‑time OLAP, Flink for streaming, and EMR as the open‑source counterpart. On the AI side it adds PAI‑iTAG labeling, data cleaning, FeatureStore, interactive (PAI‑DSW) and visual (PAI‑Designer) development, distributed training (PAI‑DLC), dataset acceleration, network and operator optimizations, and model serving via PAI‑EAS.

The Bailei model service platform aggregates Tongyi multimodal models (image, speech, text) and open‑source ModelScope, enabling intelligent recommendation, open search, advertising growth, e‑commerce, and healthcare scenarios.

FeatureStore offers a unified, structured big‑data platform for managing and sharing machine‑learning features, synchronizing offline data (Hadoop/HDFS, MaxCompute) to online stores (Hologres, TableStore, FeatureDB), providing data consistency, lineage tracking, multi‑level caching, and feature generation to support low‑latency inference.
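The offline‑to‑online synchronization pattern can be illustrated with a toy in‑memory store. The class, method names, and data layout below are all invented for illustration; the real PAI FeatureStore materializes MaxCompute/HDFS tables into Hologres, TableStore, or FeatureDB.

```python
class MiniFeatureStore:
    """Toy sketch of a feature store: an offline batch table is periodically
    synchronized into an online key-value view for low-latency point lookups.
    This API is hypothetical and exists only to show the data flow."""

    def __init__(self):
        self.offline = {}   # feature_view -> list of row dicts (day-level batch)
        self.online = {}    # feature_view -> {entity_key: row} (serving store)

    def register_offline(self, view, rows):
        self.offline[view] = rows

    def sync(self, view, key):
        """Materialize the latest offline snapshot into the online store."""
        self.online[view] = {row[key]: row for row in self.offline[view]}

    def get_online_features(self, view, entity_id, fields):
        """Point lookup at inference time; missing entities yield None values."""
        row = self.online[view].get(entity_id, {})
        return {f: row.get(f) for f in fields}
```

Because training reads the offline table and serving reads the synchronized online view of the same rows, the two paths see consistent feature values, which is the consistency guarantee the platform advertises.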

EasyRec, an open‑source recommendation algorithm library, abstracts compute resources (MaxCompute, Hadoop, Spark, local), supports diverse data sources (OSS, MaxCompute, HDFS, Hive), and provides auto‑ML, automatic feature generation/selection, model distillation, training acceleration, offline evaluation, and early stopping to reduce development effort.
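Of these capabilities, early stopping is the simplest to illustrate. The sketch below shows the generic idea only; EasyRec itself is configured declaratively, and this class and its thresholds are invented.

```python
class EarlyStopping:
    """Stop training when the validation metric stops improving.
    A generic sketch of the early-stopping idea, not EasyRec's actual API."""

    def __init__(self, patience=3, min_delta=1e-4):
        self.patience = patience    # epochs to tolerate without improvement
        self.min_delta = min_delta  # minimum gain that counts as improvement
        self.best = float("-inf")
        self.stale_epochs = 0

    def should_stop(self, metric):
        """Feed the validation metric after each epoch; True means halt."""
        if metric > self.best + self.min_delta:
            self.best = metric
            self.stale_epochs = 0
        else:
            self.stale_epochs += 1
        return self.stale_epochs >= self.patience
```

Halting once the offline metric plateaus saves the long tail of wasted epochs, which matters when each epoch scans day‑level sample tables.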

Large‑model adoption introduces challenges: embeddings grow from tens of GB to hundreds of GB or even TB, sequence lengths expand dramatically, and both training and inference demand massive compute, leading to latency bottlenecks.

Training optimizations include multi‑level cache with automatic feature eviction, a WorkQueue producer‑consumer model for heterogeneous servers, feature selection, knowledge distillation, communication reduction (fusion, pipeline parallelism), and hardware acceleration (AVX/AMX, AllReduce, SOK, incremental embedding updates).
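The multi‑level cache with automatic eviction can be sketched as an LRU layer in front of a slow backing store. This is a simplified stand‑in: the real system layers GPU memory, host memory, and remote storage, whereas this sketch has a single level.

```python
from collections import OrderedDict

class EmbeddingCache:
    """LRU cache in front of a slower embedding store; cold feature IDs
    are evicted automatically once capacity is exceeded. A one-level
    sketch of the multi-level cache described in the text."""

    def __init__(self, capacity, backing_store):
        self.capacity = capacity
        self.backing = backing_store      # slow store: dict-like id -> vector
        self.cache = OrderedDict()        # insertion order tracks recency
        self.hits = self.misses = 0

    def lookup(self, feature_id):
        if feature_id in self.cache:
            self.cache.move_to_end(feature_id)   # mark as recently used
            self.hits += 1
            return self.cache[feature_id]
        self.misses += 1
        vec = self.backing[feature_id]           # expensive fetch
        self.cache[feature_id] = vec
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)       # evict least recently used
        return vec
```

Because recommendation feature IDs follow a heavy‑tailed popularity distribution, a cache far smaller than the hundreds‑of‑GB embedding table still absorbs most lookups.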

Inference optimizations comprise AVX/AMX‑accelerated embedding lookup and string splitting on CPUs, bf16/int8 quantization on GPUs, AutoPlacement for operator scheduling, SessionGroup exploiting GPU multi‑stream concurrency, and feature caching, together achieving roughly four times the QPS of native TF‑Serving in production.
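The int8 quantization mentioned above works by mapping floats onto an 8‑bit integer grid. A minimal symmetric per‑tensor sketch (the function names are illustrative, and production kernels operate on tensors, not Python lists):

```python
def quantize_int8(values):
    """Symmetric per-tensor int8 quantization: x -> round(x / scale),
    where scale maps the largest magnitude onto 127."""
    scale = max(abs(v) for v in values) / 127 or 1.0  # avoid zero scale
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate floats; error is bounded by about scale/2."""
    return [x * scale for x in q]
```

Storing weights and activations as int8 quarters memory traffic versus fp32 and lets integer tensor cores do the matrix math, which is where the inference speedup comes from.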

LLM‑driven scenarios explored include e‑commerce item recommendation, content recommendation, enterprise knowledge‑base Q&A, and educational problem solving, illustrating how LLMs can augment traditional recommendation pipelines.

Prompt engineering examples cover category recommendation (e.g., generating related accessories for a phone), query rewriting for advertising (transforming vague queries into precise keywords), and synonym generation to improve search relevance.
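The query‑rewriting case reduces to filling a prompt template. The wording below is invented to show the shape of such a prompt, not the exact prompt used in the talk.

```python
def build_rewrite_prompt(query, n=3):
    """Hypothetical prompt template for rewriting a vague e-commerce
    query into precise advertising keywords."""
    return (
        "You are an e-commerce search assistant.\n"
        f"Rewrite the vague query '{query}' into {n} precise, "
        "high-intent advertising keywords, one per line."
    )
```

The same pattern covers the other two examples: swap the instruction for "suggest related accessory categories" or "list synonyms of this query term" and the pipeline around the model stays unchanged.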

The RAG workflow is modularized in PAI‑RAG, covering document extraction, indexing, pre‑retrieval query rewrite, retrieval, post‑retrieval processing, generation, and evaluation, supporting multimodal data, OCR, vector databases (ElasticSearch, Hologres, Milvus), and automated evaluation set creation via large‑model RefGPT.
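The modular stages can be wired together as pluggable callables, mirroring the pipeline described above. This is a structural sketch only: the retriever, generator, and optional rewrite/post‑processing stages are placeholders, not PAI‑RAG's real interfaces.

```python
def rag_answer(question, docs, retriever, generator,
               rewrite=None, postprocess=None):
    """Modular RAG pipeline: (optional) query rewrite -> retrieval ->
    (optional) post-retrieval processing -> generation.
    Every stage is a swappable callable, echoing the modular design."""
    q = rewrite(question) if rewrite else question
    hits = retriever(q, docs)          # e.g. vector search over an index
    if postprocess:
        hits = postprocess(hits)       # e.g. rerank or deduplicate
    context = "\n".join(hits)
    return generator(q, context)       # LLM call grounded in the context
```

Keeping each stage behind a plain function boundary is what lets the same pipeline swap ElasticSearch for Hologres or Milvus, or slot in OCR‑extracted documents, without touching the rest of the flow.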

A Gradio‑based front‑end demo showcases easy configuration of the RAG pipeline and interactive testing.

Big Data · RAG · Recommendation Systems · Large Models · AI Platform · Feature Store
Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
