Optimizing Sparse Feature Embedding for Large‑Scale Recommendation and CTR Prediction
The article reviews recent research on representing massive sparse features in click‑through‑rate (CTR) models, introducing Alibaba's Res‑embedding method and Google's Neural Input Search (NIS) approach, and discusses how these techniques improve embedding efficiency and model generalization in large‑scale recommendation systems.
CTR prediction and recommendation tasks involve massive sparse features, where many item IDs appear infrequently despite a huge overall feature set. Effective embedding of these sparse features is crucial for model performance.
1. Item Embedding in User Behavior Sequences – The Res‑embedding work (DLP‑KDD 2019) proves that the generalization error of a DNN CTR model correlates with the distribution of item embeddings: tighter clusters of interest‑related items lead to lower error. It proposes representing each item embedding as the sum of a shared Central Embedding for a user‑interest cluster and a Residual Embedding specific to the item:
Item Embedding = Central Embedding + Residual Embedding

Constraining the residual's magnitude keeps interest clusters compact, which improves generalization. The paper also outlines three graph-based methods (including a GNN variant) for assigning items to interest clusters.
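The central-plus-residual composition can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the sizes, the random cluster assignment, and the L2 penalty weight are all illustrative assumptions.

```python
import numpy as np

# Illustrative sizes (assumptions, not from the paper).
NUM_ITEMS, NUM_CLUSTERS, DIM = 1000, 20, 8

rng = np.random.default_rng(0)
central = rng.normal(0.0, 1.0, size=(NUM_CLUSTERS, DIM))   # one shared vector per interest cluster
residual = rng.normal(0.0, 0.05, size=(NUM_ITEMS, DIM))    # small per-item offsets
item_to_cluster = rng.integers(0, NUM_CLUSTERS, size=NUM_ITEMS)  # hypothetical assignment

def item_embedding(item_id: int) -> np.ndarray:
    """Res-embedding composition: shared central vector plus item-specific residual."""
    return central[item_to_cluster[item_id]] + residual[item_id]

def residual_penalty(lam: float = 0.1) -> float:
    """L2 regularizer on residual magnitudes; keeping residuals small
    keeps items of the same interest cluster close together."""
    return lam * float(np.sum(residual ** 2))
```

Adding `residual_penalty` to the training loss is what enforces the compactness of interest clusters; the cluster assignment itself would come from one of the graph-based methods described in the paper.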
2. Feature Embedding for Non‑Behavioral Recommendation Tasks – Google’s Neural Input Search (NIS) tackles the allocation of embedding dimensions to features of varying frequencies. It partitions the two‑dimensional space of feature count vs. embedding size into blocks, forming a search space of possible allocation schemes. Using ENAS (Efficient Neural Architecture Search), NIS searches for policies that assign longer embeddings to high‑frequency, informative features while giving shorter or shared embeddings to low‑frequency ones. The reinforcement‑learning reward balances validation AUC improvement against total embedding memory usage.
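The memory-aware reward at the heart of the NIS search can be sketched as follows. The function name, the penalty shape, and the weight `beta` are assumptions for illustration; the idea from the paper is simply that the controller's reward rises with validation-AUC gains and falls when embedding memory exceeds a budget.

```python
def nis_reward(val_auc: float, baseline_auc: float,
               memory_used: float, memory_budget: float,
               beta: float = 0.5) -> float:
    """Hypothetical NIS-style RL reward: quality gain over a baseline,
    minus a penalty proportional to how far memory exceeds the budget."""
    quality_gain = val_auc - baseline_auc
    # No penalty while within budget; linear penalty beyond it.
    over_budget = max(0.0, memory_used / memory_budget - 1.0)
    return quality_gain - beta * over_budget
```

An allocation that improves AUC while staying within budget scores positively, while one that blows the memory budget is penalized even if its AUC is higher, which is what steers the search toward short or shared embeddings for low-frequency features.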
Both Res‑embedding and NIS aim to reduce over‑parameterization and enhance model generalization; they can be combined for even better sparse feature representation in large‑scale DNN recommendation systems.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.