Multi-Embedding Paradigm for Scaling Recommendation Models: Mitigating Embedding Dimensional Collapse
This paper investigates the embedding dimensional collapse problem that hinders scaling of recommendation models and proposes a Multi-Embedding paradigm that learns multiple embeddings per feature with independent expert networks, demonstrating consistent performance gains across major CTR benchmarks and real‑world ad systems.
In modern recommendation systems, model parameters are dominated by feature embeddings, yet simply increasing embedding dimensions does not improve performance because of the Embedding Dimensional Collapse phenomenon, where many feature embeddings occupy only a low‑dimensional subspace.
The authors introduce the Information Abundance (IA) metric, defined as the sum of an embedding matrix's singular values divided by its largest singular value, to quantify collapse: a higher IA means the embeddings spread over more dimensions, while an IA close to 1 means they lie almost along a single direction. They also formulate the Cross‑Collapse Law, which states that crossing a low‑IA feature with a high‑IA feature causes the latter’s embedding to collapse.
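The IA definition above can be computed directly from a singular value decomposition. The following is a minimal sketch, assuming the embedding table is a NumPy array with one row per feature value; the function name `information_abundance` and the synthetic matrices are illustrative and do not come from the paper.

```python
import numpy as np

def information_abundance(embedding: np.ndarray) -> float:
    """Sum of singular values divided by the largest singular value."""
    # Singular values are returned in descending order, so s[0] is the largest.
    s = np.linalg.svd(embedding, compute_uv=False)
    if s[0] == 0.0:
        return 0.0  # all-zero matrix: no information at all
    return float(s.sum() / s[0])

# Synthetic data (assumption, for illustration only).
rng = np.random.default_rng(0)

# A table whose 16 dimensions are all used.
full = rng.standard_normal((1000, 16))

# A "collapsed" table: 16 nominal dimensions, but every row lies in a
# 2-dimensional subspace.
collapsed = rng.standard_normal((1000, 2)) @ rng.standard_normal((2, 16))

print("IA (full):     ", information_abundance(full))
print("IA (collapsed):", information_abundance(collapsed))
```

Because each singular value is at most the largest one, IA is bounded above by the matrix rank, so the collapsed table cannot score above 2 (up to floating-point error) even though it nominally has 16 dimensions.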
To overcome this, they propose the Multi‑Embedding paradigm: each feature learns multiple independent embeddings, each feeding a separate explicit cross‑feature expert network, whose outputs are combined via a Mixture‑of‑Experts (MoE) layer. This design increases the number of embeddings rather than their dimensionality, enabling true scaling of recommendation models.
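A forward pass through such a model might look like the sketch below. It is not the authors' implementation: the field count, vocabulary sizes, the FM-style pairwise dot-product expert, and the softmax gate are all assumptions chosen to keep the example small, and the weights are random rather than trained. The point it illustrates is structural: each embedding set has its own tables and its own interaction expert, and capacity grows by adding sets rather than widening `DIM`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, for illustration only.
VOCAB_SIZES = [10, 20, 5]      # one entry per categorical field
NUM_FEATURES = len(VOCAB_SIZES)
NUM_EMBEDDINGS = 4             # number of independent embedding sets
DIM = 8                        # dimension of every single embedding
HIDDEN = 16                    # expert output width
NUM_PAIRS = NUM_FEATURES * (NUM_FEATURES - 1) // 2

# tables[m][f]: embedding table of field f in embedding set m.
tables = [[rng.standard_normal((v, DIM)) * 0.1 for v in VOCAB_SIZES]
          for _ in range(NUM_EMBEDDINGS)]
# One independent expert (here a single linear layer) per embedding set.
expert_w = [rng.standard_normal((NUM_FEATURES * DIM + NUM_PAIRS, HIDDEN)) * 0.1
            for _ in range(NUM_EMBEDDINGS)]
# Gate and output head (assumed design, not taken from the source).
gate_w = rng.standard_normal((NUM_EMBEDDINGS * NUM_FEATURES * DIM, NUM_EMBEDDINGS)) * 0.1
out_w = rng.standard_normal(HIDDEN) * 0.1

def softmax(x: np.ndarray) -> np.ndarray:
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def expert_forward(emb: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Explicit feature crossing on one embedding set; emb is (batch, F, DIM)."""
    batch = emb.shape[0]
    gram = emb @ emb.transpose(0, 2, 1)           # (batch, F, F) dot products
    rows, cols = np.triu_indices(NUM_FEATURES, k=1)
    pairs = gram[:, rows, cols]                   # (batch, NUM_PAIRS)
    x = np.concatenate([emb.reshape(batch, -1), pairs], axis=1)
    return np.maximum(x @ w, 0.0)                 # ReLU

def predict_ctr(ids: np.ndarray) -> np.ndarray:
    """ids is an integer array of shape (batch, NUM_FEATURES)."""
    batch = ids.shape[0]
    # Look up every field in every embedding set.
    embs = [np.stack([tables[m][f][ids[:, f]] for f in range(NUM_FEATURES)], axis=1)
            for m in range(NUM_EMBEDDINGS)]       # each (batch, F, DIM)
    # Each embedding set feeds only its own expert.
    expert_out = np.stack([expert_forward(embs[m], expert_w[m])
                           for m in range(NUM_EMBEDDINGS)], axis=1)  # (batch, M, HIDDEN)
    # MoE-style combination: softmax gate over the experts.
    gate_in = np.concatenate([e.reshape(batch, -1) for e in embs], axis=1)
    gates = softmax(gate_in @ gate_w)             # (batch, M)
    mixed = (gates[:, :, None] * expert_out).sum(axis=1)
    return 1.0 / (1.0 + np.exp(-(mixed @ out_w)))

sample_ids = np.array([[0, 1, 2], [9, 19, 4]])
print(predict_ctr(sample_ids))
```

In this sketch, doubling `NUM_EMBEDDINGS` roughly doubles the embedding parameters while leaving `DIM` unchanged, which is the scaling direction the paradigm advocates.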
Extensive experiments on the public Criteo and Avazu CTR datasets across various backbone models (DNN, IPNN, NFwFM, xDeepFM, DCN V2, FinalMLP) show that, for equal parameter budgets, Multi‑Embedding models consistently outperform single‑embedding counterparts, and that their performance keeps improving as model size grows, which the authors present as evidence of a scaling law for recommendation systems.
Online deployment in Tencent Advertising’s ad‑delivery pipelines, including a heterogeneous multi‑embedding MoE architecture for pCTR prediction, yielded a 3.9% GMV lift, demonstrating the practical impact of the approach.
The study also analyzes embedding diversity using principal angles, visualizes transformation matrices of different experts, and confirms that Multi‑Embedding learns richer, less collapsed embedding spaces compared to single‑embedding setups.
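Principal angles measure how far apart two subspaces are, so they give a way to check whether two embedding sets have learned different things. The sketch below is a standard construction (orthonormalize each matrix with QR, then take the singular values of the product of the bases), not the authors' analysis code. It assumes both matrices have full column rank, and the function name and toy inputs are illustrative.

```python
import numpy as np

def principal_angles(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Principal angles in radians, ascending, between the column spaces of a and b.

    Assumes both inputs have full column rank.
    """
    qa, _ = np.linalg.qr(a)   # orthonormal basis of span(a)
    qb, _ = np.linalg.qr(b)   # orthonormal basis of span(b)
    # The singular values of qa^T qb are the cosines of the principal angles.
    cosines = np.linalg.svd(qa.T @ qb, compute_uv=False)
    return np.arccos(np.clip(cosines, -1.0, 1.0))

# Toy inputs (assumption): two subspaces of R^4.
identity = np.eye(4)
first_two = identity[:, :2]   # spanned by e1, e2
last_two = identity[:, 2:]    # spanned by e3, e4

print(principal_angles(first_two, first_two))  # identical subspaces
print(principal_angles(first_two, last_two))   # orthogonal subspaces
```

Angles near zero indicate that two embedding sets span nearly the same subspace and are therefore redundant, while larger angles indicate that they capture different directions.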
In conclusion, the Multi‑Embedding paradigm effectively mitigates embedding collapse, enhances model scalability, and provides a valuable design principle for future deep learning‑based recommendation systems.
Tencent Advertising Technology
Official hub of Tencent Advertising Technology, sharing the team's latest cutting-edge achievements and advertising technology applications.