Advances in Click‑Through Rate (CTR) Modeling: Optimizations Across Embedding, Hidden, and Output Layers
This article reviews recent academic and industrial advances in click‑through rate prediction, classifying optimization techniques for the three‑layer CTR architecture—Embedding, Hidden, and Output—while summarizing three SIGIR papers on graph‑based user behavior modeling, explicit semantic cross‑feature learning, and learnable feature selection for pre‑ranking.
Click‑through rate (CTR) prediction is critical for search, recommendation, and advertising systems. With rapid progress in deep learning, many new CTR model variants have emerged. This article categorizes optimization directions according to the three‑layer CTR architecture—Embedding, Hidden, and Output—and highlights interesting work from Alibaba Mama’s search advertising team.
CTR Model Architecture
The overall CTR model consists of three layers:
Embedding Layers: map high‑dimensional categorical features to low‑dimensional dense vectors.
Hidden Layers: provide strong non‑linear fitting capability.
Output Layers: express the specific task objective.
Different layers require distinct optimization paths, as illustrated in the classification diagram (image omitted).
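To make the three‑layer decomposition concrete, here is a minimal forward‑pass sketch. All dimensions, weight initializations, and the `predict_ctr` helper are illustrative assumptions for this article, not taken from any of the papers discussed:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: three categorical feature fields.
VOCAB_SIZES = [1000, 500, 200]
EMB_DIM = 8
HIDDEN_DIM = 16

# Embedding layer: one lookup table per categorical field.
embeddings = [rng.normal(0, 0.1, (v, EMB_DIM)) for v in VOCAB_SIZES]

# Hidden layer: a single fully connected layer with ReLU.
W1 = rng.normal(0, 0.1, (EMB_DIM * len(VOCAB_SIZES), HIDDEN_DIM))
b1 = np.zeros(HIDDEN_DIM)

# Output layer: sigmoid expressing the binary click / no-click objective.
w_out = rng.normal(0, 0.1, HIDDEN_DIM)
b_out = 0.0

def predict_ctr(feature_ids):
    """feature_ids: one integer id per categorical field."""
    # Embedding lookup, then concatenate the dense field vectors.
    x = np.concatenate([emb[i] for emb, i in zip(embeddings, feature_ids)])
    h = np.maximum(0.0, x @ W1 + b1)      # hidden layer (non-linear fit)
    logit = h @ w_out + b_out             # output layer
    return 1.0 / (1.0 + np.exp(-logit))   # predicted click probability

p = predict_ctr([42, 7, 3])
```

Real CTR models stack many hidden layers and add interaction structures on top, but the optimization directions below all attach to one of these three stages.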
SIGIR Papers Overview
GIN (SIGIR‑2019) : Targets the Hidden‑Layers → User Behavior Modeling path. It proposes an end‑to‑end graph‑learning solution that enriches user interests by combining sequence modeling with graph convolution, improving generalization for low‑activity users.
PCF (SIGIR‑2021) : Targets the Hidden‑Layers → Feature Interaction Modeling path. It introduces a pre‑trained graph neural network to learn explicit semantic cross‑feature representations, enabling model compression and better generalization.
FSCD (SIGIR‑2021) : Targets the Embedding‑Layers path. It presents a learnable feature‑selection method that uses a compute‑factor prior to balance efficiency and effectiveness, deriving a new pre‑ranking model that outperforms traditional vector‑dot approaches.
GIN – Graph‑Based User Behavior Modeling
In search scenarios, users express intent through queries, but short queries often fail to capture true intent. Implicit behavior signals (clicks, dwell time, etc.) are crucial. Alibaba Mama combines sequence learning (modeling private user behavior) with graph learning (leveraging public, community behavior) to capture both personalized and collective patterns.
The GIN framework expands each historical behavior item with graph‑based topological connections and applies multi‑layer graph convolutions, yielding richer user interest representations that have been deployed in the full traffic of the “Zhi‑tong‑che” advertising product.
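A rough sketch of that expansion step, assuming a toy co‑click graph (`neighbors`), random item embeddings, and mean aggregation standing in for the learned multi‑layer graph convolution and sequence model:

```python
import numpy as np

rng = np.random.default_rng(1)

N_ITEMS, EMB_DIM = 50, 4
item_emb = rng.normal(0, 0.1, (N_ITEMS, EMB_DIM))

# Hypothetical co-click graph built from public (community) behavior:
# item id -> topologically connected neighbor items.
neighbors = {0: [3, 7], 3: [0, 9], 7: [0], 9: [3], 5: [12], 12: [5]}

def graph_convolve(item):
    """One graph-convolution step: aggregate an item with its neighbors."""
    vecs = [item_emb[item]] + [item_emb[n] for n in neighbors.get(item, [])]
    return np.mean(vecs, axis=0)

def user_interest(behavior_seq):
    """Enrich each clicked item with graph context, then pool the
    private behavior sequence into a single interest vector."""
    enriched = np.stack([graph_convolve(i) for i in behavior_seq])
    return enriched.mean(axis=0)  # attention pooling in practice

vec = user_interest([0, 5])
```

Because the enrichment draws on neighbors from the shared graph, a user with only a handful of behaviors still receives a denser interest signal, which is the intuition behind the reported gains for low‑activity users.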
PCF – Explicit Semantic Cross‑Feature Learning
Cross features are vital for CTR models. Existing methods mainly focus on implicit semantic modeling (embedding the co‑occurrence of IDs). PCF‑GNN treats each feature as a graph node and each cross‑feature’s historical interaction as an edge, using a pre‑trained GNN to predict edge weights and thus explicitly model semantic cross‑features. Experiments on internal and public datasets show significant model size reduction and improved generalization.
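The edge‑weight idea can be sketched as follows, with a simple bilinear scorer over node embeddings standing in for the pre‑trained GNN (all names, dimensions, and the scoring function are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

# Each feature value is a node; node_emb would come from GNN pre-training.
N_FEATURES, DIM = 20, 4
node_emb = rng.normal(0, 0.1, (N_FEATURES, DIM))
W = rng.normal(0, 0.1, (DIM, DIM))

def cross_feature_weight(i, j):
    """Predict the edge weight for the feature pair (i, j), i.e. an
    explicit semantic cross-feature value such as a historical
    interaction rate, directly from the node representations."""
    score = node_emb[i] @ W @ node_emb[j]
    return 1.0 / (1.0 + np.exp(-score))

w = cross_feature_weight(2, 11)
```

Since the cross‑feature value is generated from node embeddings rather than looked up, no quadratic‑size cross‑feature ID table needs to be stored, which is where the model‑size reduction comes from; unseen feature pairs also get a score, which is where the generalization comes from.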
FSCD – Learnable Feature Selection for Pre‑Ranking
Large‑scale search and advertising systems use a multi‑stage cascade (recall → pre‑ranking → ranking → re‑ranking). Traditional pre‑ranking (coarse‑ranking) models rely on representation‑focused vector‑dot architectures, which trade accuracy for speed. FSCD introduces a learnable feature‑selection mechanism at the Embedding layer, guided by a compute‑factor regularizer, to derive an interaction‑focused pre‑ranking model that better balances efficiency and effectiveness.
Extensive online A/B tests in Alibaba Mama’s search advertising pipeline demonstrate notable gains in both latency and prediction quality.
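The gating mechanism behind FSCD‑style selection can be sketched roughly as follows; the per‑feature costs and gate logits below are hypothetical stand‑ins for the quantities that would be learned jointly with the CTR loss:

```python
import numpy as np

# Hypothetical compute-factor prior: per-feature serving cost.
# Expensive features incur a larger penalty for being kept.
feature_cost = np.array([1.0, 5.0, 0.5, 10.0, 2.0])

# Gate logits, trained jointly with the CTR objective in practice.
gate_logits = np.array([2.0, -1.0, 3.0, -2.5, 0.5])

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

gates = sigmoid(gate_logits)  # soft keep-probability per feature

# Cost-weighted gate mass is added to the training loss, pushing
# expensive, low-value features toward gate = 0.
complexity_penalty = float(np.sum(feature_cost * gates))

# After training, threshold the gates to fix the feature subset
# that the lightweight pre-ranking model actually consumes.
selected = [i for i, g in enumerate(gates) if g > 0.5]
```

The key design choice is that the regularizer prices each feature by its serving cost rather than penalizing all features equally, so the selected subset reflects the efficiency/effectiveness trade‑off rather than predictive value alone.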
Summary and Outlook
Over the past year, the Alibaba Mama search advertising algorithm team has continuously iterated on Hidden‑Layer feature interaction modeling and Embedding‑Layer feature selection, supporting rapid business growth while publishing the underlying techniques academically. Future work will explore innovations in the Output‑Layer direction.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.