
Advances in Alibaba Search Advertising Estimation: Model Deepening, Interaction, and System Efficiency (2021 Review)

The 2021 review of Alibaba's Alimama search advertising estimation platform details advances in model deepening—such as hash‑based embedding compression, adaptive dynamic parameters, and graph neural networks—model interaction via a multi‑stage cascade with ranking distillation and oracle bias, and system efficiency gains from HPC training, mixed‑precision communication, multi‑hash embeddings, and fp16 quantization that together deliver roughly a thirty‑fold speed‑up.

Alimama Tech

The document presents a comprehensive technical review of the Alimama search advertising estimation platform for 2021, focusing on three major aspects: model deepening, model interaction, and system efficiency.

1. Model Deepening – The core precision‑ranking (CTR) model is further optimized at both the embedding layer and the hidden layers. In the embedding layer, binary‑code‑based hash embedding (BC) and an adaptively‑masked twins‑based layer (AMTL) are introduced to compress massive sparse features while preserving accuracy. The hidden layers explore new growth points beyond user‑behavior modeling, including an adaptive dynamic‑parameter model (AdaptPGM) that generates parameters per traffic condition, and a pre‑trained graph neural network (PCF‑GNN) for explicit cross‑feature learning.
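To make the binary‑code idea concrete, here is a minimal sketch (not the paper's implementation) of a binary‑code‑style hash embedding: instead of storing one row per feature ID, each ID is decomposed into a B‑bit code and its embedding is composed from per‑bit vectors, so the table size depends only on B, not on the vocabulary size. All class and parameter names here are illustrative.

```python
import numpy as np

class BinaryCodeEmbedding:
    """Illustrative binary-code-based embedding: the table holds one vector
    per (bit position, bit value) pair, i.e. num_bits * 2 rows in total,
    regardless of how many distinct feature IDs exist."""

    def __init__(self, num_bits=32, dim=8, seed=0):
        rng = np.random.default_rng(seed)
        self.num_bits = num_bits
        # shape: (bit position, bit value in {0, 1}, embedding dim)
        self.table = rng.normal(0.0, 0.1, size=(num_bits, 2, dim))

    def lookup(self, feature_id: int) -> np.ndarray:
        # decompose the ID into its binary code, then sum the
        # corresponding per-bit vectors into one dense embedding
        bits = [(feature_id >> b) & 1 for b in range(self.num_bits)]
        return sum(self.table[b, bit] for b, bit in enumerate(bits))

emb = BinaryCodeEmbedding()
vec = emb.lookup(123456789)
print(vec.shape)  # (8,)
```

With 32 bits and dimension 8 the table is 32 × 2 × 8 floats, whereas a conventional one‑row‑per‑ID table for a billion‑scale ID space would be orders of magnitude larger; this is the compression intuition, though the production model's exact parameterization may differ.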

2. Model Interaction – The multi‑stage cascade architecture (pre‑ranking → ranking → re‑ranking) is examined. For pre‑ranking, a ranking‑distillation‑based pre‑ranking model (RDPR) aligns pre‑ranking scores with downstream rank‑score signals. In ranking, the “oracle” capability is added to predict position bias and external context, enabling tighter coupling between ranking and re‑ranking. Creative selection is also integrated via a cascade architecture that places a dedicated creative‑ranking tower before the ranking model, using an Adaptive DropNet to balance ID and content features.
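The ranking‑distillation idea behind RDPR can be sketched as a pairwise loss that trains the lightweight pre‑ranking model to preserve the order induced by the frozen downstream ranking model's scores. This is a generic distillation sketch under that assumption, not Alimama's actual objective; the function name and margin parameter are illustrative.

```python
def ranking_distillation_loss(pre_scores, teacher_scores, margin=0.0):
    """Pairwise hinge loss: for every pair the teacher (ranking model)
    orders one way, penalize the pre-ranking (student) model unless it
    separates that pair by at least `margin`."""
    n = len(pre_scores)
    loss, pairs = 0.0, 0
    for i in range(n):
        for j in range(n):
            if teacher_scores[i] > teacher_scores[j]:
                loss += max(0.0, margin - (pre_scores[i] - pre_scores[j]))
                pairs += 1
    return loss / max(pairs, 1)

# student already agrees with the teacher's order by a wide margin -> zero loss
print(ranking_distillation_loss([2.0, 1.0], [5.0, 1.0], margin=0.5))  # 0.0
```

A pairwise form is used here because pre‑ranking only needs to reproduce the teacher's ordering, not its calibrated scores; a pointwise regression on the teacher's logits would be an equally plausible variant.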

3. System Efficiency – To break the general‑purpose compute bottleneck, the team adopts high‑performance‑computing (HPC) training with large batch sizes, mixed‑precision communication, and multi‑hash embedding that shrinks the model to roughly 30 GB, small enough to train the full model on a single GPU. Communication‑efficient All‑Reduce variants and fp16 gradient quantization further accelerate training, for a combined 30× speed‑up.
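The fp16‑quantized communication step can be illustrated with a toy all‑reduce: each worker casts its fp32 gradient shard to fp16 before the reduce, halving the bytes on the wire, and the reduction itself accumulates in fp32 to limit rounding error. This is a conceptual sketch simulated over a Python list, not the actual NCCL/HPC pipeline; the function name is illustrative.

```python
import numpy as np

def allreduce_fp16(grad_shards):
    """Simulated communication-efficient all-reduce: compress each worker's
    fp32 gradients to fp16 for transmission, then accumulate the reduced
    sum back in fp32 before the optimizer update."""
    # cast down: this is what crosses the network (2 bytes/value, not 4)
    compressed = [g.astype(np.float16) for g in grad_shards]
    # cast back up and accumulate in fp32 to avoid fp16 overflow/rounding
    return np.sum([c.astype(np.float32) for c in compressed], axis=0)

workers = [np.full(4, 0.5, dtype=np.float32) for _ in range(4)]
print(allreduce_fp16(workers))  # [2. 2. 2. 2.]
```

In a real system the fp16 cast is typically paired with loss scaling so that small gradients do not underflow to zero in half precision before they are summed.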

The report also lists several peer‑reviewed papers (CIKM 2021, SIGIR 2021, WWW 2022) that detail the proposed methods, and provides a brief outlook on future research directions.

Tags: machine learning, CTR, CVR, graph neural networks, ad tech, embedding compression
Written by

Alimama Tech

Official Alimama tech channel, showcasing all of Alimama's technical innovations.
