
Multi-Embedding Paradigm for Scaling Recommendation Models: Mitigating Embedding Dimensional Collapse

This paper investigates the embedding dimensional collapse problem that hinders scaling of recommendation models and proposes a Multi-Embedding paradigm that learns multiple embeddings per feature with independent expert networks, demonstrating consistent performance gains across major CTR benchmarks and real‑world ad systems.

Tencent Advertising Technology

In modern recommendation systems, model parameters are dominated by feature embeddings, yet simply increasing embedding dimensions does not improve performance because of the Embedding Dimensional Collapse phenomenon, where many feature embeddings occupy only a low‑dimensional subspace.

The authors introduce the Information Abundance (IA) metric—defined as the sum of singular values divided by the largest singular value of an embedding matrix—to quantify collapse, and they formulate the Cross‑Collapse Law, stating that crossing a low‑IA feature with a high‑IA feature causes the latter’s embedding to collapse.
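The IA metric is straightforward to compute from an embedding table's singular values. Below is a minimal sketch in NumPy, using synthetic embedding matrices (the sizes and random data are illustrative, not from the paper): a full-rank table yields IA close to the embedding dimension, while a low-rank (collapsed) table yields IA close to its effective rank.

```python
import numpy as np

def information_abundance(E):
    """IA = (sum of singular values) / (largest singular value) of embedding matrix E."""
    s = np.linalg.svd(E, compute_uv=False)
    return s.sum() / s.max()

rng = np.random.default_rng(0)

# Hypothetical embedding tables: 1000 vocabulary entries, dimension 16.
full = rng.standard_normal((1000, 16))                                      # well-spread
collapsed = rng.standard_normal((1000, 2)) @ rng.standard_normal((2, 16))   # rank-2

print(information_abundance(full))       # near the full dimension 16
print(information_abundance(collapsed))  # near the effective rank 2
```

IA ranges from 1 (complete collapse onto one direction) up to min(vocabulary size, embedding dimension), so it directly quantifies how much of the allotted dimensionality an embedding actually uses.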

To overcome this, they propose the Multi‑Embedding paradigm: each feature learns multiple independent embeddings, each feeding a separate explicit cross‑feature expert network, whose outputs are combined via a Mixture‑of‑Experts (MoE) layer. This design increases the number of embeddings rather than their dimensionality, enabling true scaling of recommendation models.
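The forward pass of this design can be sketched as follows. This is a toy NumPy illustration under assumed sizes, with each "expert" reduced to a single linear layer for brevity; the paper's experts are full interaction networks, and all names here (`tables`, `experts`, `gate_w`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, vocab, dim, n_embeds = 3, 100, 8, 4  # illustrative toy sizes

# Multi-Embedding: n_embeds independent embedding tables per feature,
# rather than one table with dimension dim * n_embeds.
tables = [[rng.standard_normal((vocab, dim)) * 0.01
           for _ in range(n_features)] for _ in range(n_embeds)]

# One expert per embedding set (here a single linear layer for brevity),
# plus a softmax gate that mixes the expert outputs (MoE combination).
experts = [rng.standard_normal((n_features * dim, 1)) for _ in range(n_embeds)]
gate_w = rng.standard_normal((n_features * dim * n_embeds, n_embeds))

def forward(feature_ids):
    # Look up each feature in each embedding set and concatenate per set.
    per_set = [np.concatenate([tables[e][f][feature_ids[f]]
                               for f in range(n_features)])
               for e in range(n_embeds)]
    # Each embedding set feeds only its own expert.
    expert_out = np.array([per_set[e] @ experts[e]
                           for e in range(n_embeds)]).ravel()
    # Gate over all embedding sets, then mix expert outputs.
    logits = np.concatenate(per_set) @ gate_w
    gate = np.exp(logits - logits.max())
    gate /= gate.sum()
    return float(gate @ expert_out)  # prediction logit (pre-sigmoid)

print(forward([5, 17, 42]))
```

The key structural point the sketch captures is that scaling happens by adding embedding sets (`n_embeds`), each with its own expert, instead of widening a single embedding.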

Extensive experiments on the public Criteo and Avazu CTR datasets across various backbone models (DNN, IPNN, NFwFM, xDeepFM, DCN V2, FinalMLP) show that, for equal parameter budgets, Multi‑Embedding models consistently outperform single‑embedding counterparts, and their performance improves as model size grows, confirming a scaling law for recommendation systems.

Online deployment in Tencent Advertising’s ad‑delivery pipelines, including a heterogeneous multi‑embedding MoE architecture for pCTR prediction, yielded a 3.9% GMV lift, demonstrating the practical impact of the approach.

The study also analyzes embedding diversity using principal angles, visualizes transformation matrices of different experts, and confirms that Multi‑Embedding learns richer, less collapsed embedding spaces compared to single‑embedding setups.
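Principal angles between the column spaces of two embedding matrices can be computed with plain linear algebra: orthonormalize each matrix and take the arccosine of the singular values of the product of the orthonormal bases. A short sketch on synthetic data (sizes and matrices are illustrative):

```python
import numpy as np

def principal_angles(A, B):
    """Principal angles (radians) between the column spaces of A and B."""
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    # Singular values of Qa^T Qb are the cosines of the principal angles.
    cosines = np.clip(np.linalg.svd(Qa.T @ Qb, compute_uv=False), -1.0, 1.0)
    return np.arccos(cosines)

rng = np.random.default_rng(0)
E1 = rng.standard_normal((64, 4))  # hypothetical embedding set 1
E2 = rng.standard_normal((64, 4))  # hypothetical embedding set 2

same = principal_angles(E1, E1)  # identical subspaces -> all angles ~0
diff = principal_angles(E1, E2)  # independent subspaces -> large angles
print(same.max(), diff.mean())
```

Large principal angles between the embedding sets indicate that they span genuinely different subspaces, which is the diversity the Multi-Embedding analysis looks for.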

In conclusion, the Multi‑Embedding paradigm effectively mitigates embedding collapse, enhances model scalability, and provides a valuable design principle for future deep learning‑based recommendation systems.

Tags: Artificial Intelligence, deep learning, CTR prediction, scaling law, embedding collapse, multi-embedding
Written by Tencent Advertising Technology

Official hub of Tencent Advertising Technology, sharing the team's latest cutting-edge achievements and advertising technology applications.