Overview of Recent Alimama Research Papers on AI and Large-Scale Advertising Systems
This article surveys six Alimama papers accepted at CIKM 2021, covering novel AI methods: a heterogeneous graph neural network for bid-keyword matching, a star-topology multi-domain CTR model, a compact binary-code hash embedding technique, an adaptively masked embedding layer, automated hierarchical representation integration for conversion prediction, and a scalable multi-view ad retrieval system. Each demonstrates substantial online performance improvements and large-scale deployment.
CIKM (the ACM Conference on Information and Knowledge Management) is a top-tier international conference on information retrieval and data mining. The 2021 edition will be held online from November 1 to 5.
CIKM 2021 received a record 1,251 full papers and 626 short papers. Of these, 271 full papers and 177 short papers were accepted, resulting in acceptance rates of 21.7% and 28.3% respectively.
The Alimama technical team had two long papers and four short papers accepted. The team will invite the paper authors to share detailed explanations of their research ideas and technical results.
Heterogeneous Graph Neural Networks for Large‑scale Bid Keyword Matching
Abstract: Online advertising increasingly focuses on personalizing ads by mining users' historical behavior, search intent, and keyword bidding. Keyword recommendation for search ads is a core service for advertisers, but existing methods only model a single type of relationship (e.g., clicks or text similarity) and ignore auxiliary relations such as connections between ads/keywords and ordinary products. HetMatch, a heterogeneous graph neural network (HGNN)‑based model, introduces multi‑level GNN structures to fuse and enhance diverse auxiliary relations at both micro and macro levels, producing richer and more robust representations for ads and queries. To address cold‑start for new ads, a multi‑view framework incorporates additional samples. Offline experiments on Alibaba’s industrial dataset and online A/B tests on multiple keyword‑recommendation tools show significant improvements in consumption and adoption rates, and the model is now deployed across the entire system.
One Model to Serve All: Star Topology Adaptive Recommender for Multi‑Domain CTR Prediction (STAR)
Abstract: Traditional CTR models are trained and served for a single scenario, which is inefficient for large platforms that need predictions for many scenarios with overlapping user and ad pools. Training separate models ignores inter-scenario similarity, while a fully shared model cannot capture scenario-specific differences. STAR adopts a star-shaped topology consisting of a shared central network and private per-scenario networks; the final prediction for a scenario is obtained by element-wise multiplication of shared and private parameters. This design simultaneously models similarity and divergence across scenarios. Deployed in Alimama's display ad system in 2020, STAR achieved an 8.0% lift in CTR and a 6.0% increase in RPM.
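The element-wise combination of shared and private parameters described above can be sketched in a few lines. This is a minimal numpy sketch of the core idea only; the dimensions, the bias handling, and the single-layer setup are illustrative assumptions, not the paper's exact architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for illustration.
d_in, d_out, n_scenarios = 8, 4, 3

# One shared weight matrix plus one private matrix per scenario.
W_shared = rng.normal(size=(d_in, d_out))
W_private = rng.normal(size=(n_scenarios, d_in, d_out))
b_shared = np.zeros(d_out)
b_private = np.zeros((n_scenarios, d_out))

def star_layer(x, scenario):
    """Star-topology FC layer: the effective weight for a scenario is the
    element-wise product of the shared and the scenario-private weights."""
    W = W_shared * W_private[scenario]   # element-wise multiplication
    b = b_shared + b_private[scenario]   # bias combination is an assumption here
    return x @ W + b

x = rng.normal(size=(2, d_in))           # a mini-batch of 2 examples
y = star_layer(x, scenario=1)
print(y.shape)                           # (2, 4)
```

Because every scenario multiplies into the same shared matrix, gradients from all scenarios update the shared parameters while each private matrix only sees its own traffic.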
Binary Code based Hash Embedding for Web‑scale Applications
Abstract: Deep learning models in web‑scale services (e.g., recommendation and advertising) rely heavily on embedding representations of ID‑type features. Storing a distinct vector for each feature value yields high accuracy but consumes prohibitive memory. This work proposes a binary‑code‑based Hash Embedding technique that compresses embeddings by orders of magnitude while preserving accuracy. Experiments show that even with a 1,000× reduction in storage size, the model retains 99% of its original performance.
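The abstract does not spell out the mechanism, but a binary-code-based hash embedding can be sketched roughly as follows: hash each feature ID to a binary code, slice the code into blocks, and let each block index a small shared codebook, so memory scales with the codebook sizes rather than the vocabulary size. All names and sizes below are illustrative assumptions:

```python
import hashlib
import numpy as np

rng = np.random.default_rng(0)

NUM_BLOCKS, BITS_PER_BLOCK, DIM = 4, 8, 16   # illustrative sizes

# One small codebook per block: 2**BITS_PER_BLOCK rows each,
# instead of one embedding row per distinct feature value.
codebooks = rng.normal(size=(NUM_BLOCKS, 2 ** BITS_PER_BLOCK, DIM))

def hash_embed(feature_id: str) -> np.ndarray:
    """Map an ID to a binary code, slice it into 8-bit blocks, and sum
    the codebook rows selected by each block."""
    digest = hashlib.md5(feature_id.encode()).digest()
    vec = np.zeros(DIM)
    for b in range(NUM_BLOCKS):
        idx = digest[b]          # one byte of the hash = one block index
        vec += codebooks[b, idx]
    return vec

v = hash_embed("item_42")
print(v.shape)   # (16,)
```

With these toy sizes the tables hold 4 × 256 vectors regardless of how many IDs exist, which is where the orders-of-magnitude compression comes from.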
Learning Effective and Efficient Embedding via an Adaptively‑Masked Twins‑based Layer
Abstract: In deep recommendation models, ID features are mapped to dense vectors of fixed dimension, which is sub-optimal for both representation quality and storage. Existing dimension-selection methods require extra knowledge or are hard to train. This paper introduces an Adaptively-Masked Twins-based Layer (AMTL), placed after each embedding layer, to prune unnecessary dimensions dynamically. The mask is learned end-to-end, supports hot-starting of embeddings, and can be applied to various models. Experiments demonstrate superior accuracy over baselines while reducing storage by 60%.
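A learned per-dimension mask, the basic idea behind such a layer, can be sketched as follows. The paper's actual twins-based mechanism is more involved; everything below (names, sizes, the 0.5 threshold) is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8   # illustrative embedding size

# Learnable per-dimension scores (in a real model these are trained
# end-to-end together with the embedding table).
mask_logits = rng.normal(size=DIM)

def masked_embedding(emb: np.ndarray, hard: bool = False) -> np.ndarray:
    """Apply a per-dimension mask to an embedding vector.
    Training uses a soft (sigmoid) mask so gradients can flow; serving
    can threshold it to actually drop dimensions and save storage."""
    soft_mask = 1.0 / (1.0 + np.exp(-mask_logits))
    mask = (soft_mask > 0.5).astype(float) if hard else soft_mask
    return emb * mask

emb = rng.normal(size=DIM)
soft = masked_embedding(emb)             # soft-masked, for training
hard = masked_embedding(emb, hard=True)  # hard-masked, for serving
print(soft, hard)
```

Dimensions whose hard mask is zero never need to be stored at serving time, which is how a learned mask translates into a storage reduction.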
AutoHERI: Automated Hierarchical Representation Integration for Post‑Click Conversion Rate Estimation
Abstract: Conversion Rate (CVR) prediction is crucial for ranking and bidding in advertising and recommendation systems. Existing methods jointly learn multiple prediction tasks using the user behavior sequence (impression → click → conversion). AutoHERI automatically aggregates hierarchical representations from upstream tasks to downstream CVR prediction, searching for optimal connection structures via one‑shot neural architecture search. Offline and online experiments on large‑scale real data confirm its superior performance and adaptability across scenarios.
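Aggregating upstream representations with learnable architecture weights is the building block of this kind of one-shot connection search. A minimal sketch follows; the dimensions, single-example setup, and softmax relaxation are assumptions for illustration, not the paper's exact search space:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical hidden representations from an upstream (e.g. CTR) tower:
# three layers, each a 4-dim vector for one example.
upstream_layers = [rng.normal(size=4) for _ in range(3)]

# One-shot NAS keeps a learnable architecture weight per candidate
# connection; a softmax turns them into a mixture over upstream layers.
arch_logits = rng.normal(size=3)

def aggregate(layers, logits):
    """Weighted sum of upstream representations; after training, the
    largest weights indicate which connections to keep."""
    w = np.exp(logits) / np.exp(logits).sum()   # softmax
    return sum(wi * li for wi, li in zip(w, layers))

h = aggregate(upstream_layers, arch_logits)
print(h.shape)   # (4,)
```

The aggregated vector `h` would then feed the downstream CVR tower, and the architecture weights are trained jointly with the model rather than by enumerating connection patterns.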
SMAD: Scalable Multi‑view Ad Retrieval System for E‑Commerce Sponsored Search
Abstract: Building on Alimama's open-source distributed deep graph learning platform Euler, SMAD addresses the challenges of massive user behavior data and multiple views (e.g., co-click, co-bid, textual similarity) in e-commerce search advertising. It proposes a category-aware graph sampling and partitioning algorithm under category and relevance constraints for distributed training, and a parallel multi-view training model that fuses information from different views. In Alimama's search ad scenario, SMAD yields notable gains in relevance, coverage, and platform revenue.
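Category-constrained neighbor sampling, one ingredient of such a pipeline, might look roughly like this; the tiny graph, the view names, and the same-category constraint below are invented purely for illustration:

```python
import random

random.seed(0)

# A tiny hypothetical multi-view graph: edges are tagged with a view,
# and every node is tagged with a category.
category = {"q1": "shoes", "ad1": "shoes", "ad2": "bags", "ad3": "shoes"}
edges = {
    "q1": [("ad1", "co-click"), ("ad2", "co-click"), ("ad3", "co-bid")],
}

def sample_neighbors(node, view, k=2):
    """Category-aware sampling sketch: keep only neighbors in the given
    view that share the node's category, then sample up to k of them."""
    cands = [n for n, v in edges.get(node, [])
             if v == view and category[n] == category[node]]
    return random.sample(cands, min(k, len(cands)))

print(sample_neighbors("q1", "co-click"))   # ['ad1']
```

Restricting candidates by view and category before sampling keeps each distributed training partition relevant, which is the intuition behind the constraint-based sampling described above.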
Alimama Tech
Official Alimama tech channel, showcasing all of Alimama's technical innovations.