Multi-Embedding Paradigm for Scaling Recommendation Models: Mitigating Embedding Dimensional Collapse

This paper investigates the embedding dimensional collapse problem that hinders scaling of recommendation models and proposes a Multi-Embedding paradigm that learns multiple embeddings per feature with independent expert networks, demonstrating consistent performance gains across major CTR benchmarks and real‑world ad systems.

Tencent Advertising Technology

Jul 24, 2024

In modern recommendation systems, feature embeddings account for most of the model parameters. Yet simply increasing the embedding dimension does not improve performance, because of a phenomenon the authors call Embedding Dimensional Collapse: many feature embeddings end up occupying only a low‑dimensional subspace of the available space, so the extra dimensions go largely unused.

The authors introduce the Information Abundance (IA) metric, defined as the sum of an embedding matrix's singular values divided by its largest singular value, to quantify collapse. An IA close to 1 means the matrix is effectively rank one, while larger values mean its energy is spread across more dimensions. They also formulate the Cross‑Collapse Law, which states that crossing a low‑IA feature with a high‑IA feature causes the latter's embedding to collapse.
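
As a rough illustration of the IA definition, the sketch below computes the ratio with NumPy and compares a well-spread matrix with an artificially collapsed one. The function name `information_abundance`, the matrix sizes, and the synthetic data are assumptions made for this example, not the authors' code.

```python
import numpy as np

def information_abundance(embedding: np.ndarray) -> float:
    """Sum of singular values divided by the largest singular value."""
    # np.linalg.svd returns singular values in descending order.
    singular_values = np.linalg.svd(embedding, compute_uv=False)
    largest = singular_values[0]
    if largest == 0.0:
        return 0.0  # all-zero matrix: no information at all
    return float(singular_values.sum() / largest)

rng = np.random.default_rng(0)
num_ids, dim = 1000, 16  # hypothetical vocabulary size and embedding dimension

# A well-spread embedding table: independent Gaussian entries.
healthy = rng.normal(size=(num_ids, dim))

# A collapsed table: essentially rank 2, plus a little noise.
collapsed = (
    rng.normal(size=(num_ids, 2)) @ rng.normal(size=(2, dim))
    + 1e-3 * rng.normal(size=(num_ids, dim))
)

ia_healthy = information_abundance(healthy)
ia_collapsed = information_abundance(collapsed)
print(f"IA healthy:   {ia_healthy:.2f} (upper bound {dim})")
print(f"IA collapsed: {ia_collapsed:.2f}")
```

For a matrix with `dim` columns the value lies between 1 and `dim`, so the collapsed table scores far lower than the well-spread one even though both have the same shape.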

To overcome this, they propose the Multi‑Embedding paradigm: each feature learns multiple independent embeddings, each feeding a separate explicit cross‑feature expert network, whose outputs are combined via a Mixture‑of‑Experts (MoE) layer. This design increases the number of embeddings rather than their dimensionality, enabling true scaling of recommendation models.
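
The structure described above can be sketched as a forward pass in NumPy. This is only a minimal illustration under stated assumptions: the pairwise inner-product expert, the softmax gate over concatenated embeddings, and all sizes and names (`tables`, `expert_weights`, `gate_weights`, `forward`) are hypothetical choices for the example, not the architecture used in the paper or in production.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab_sizes = [100, 50, 20]   # hypothetical categorical fields
num_features = len(vocab_sizes)
num_embeddings = 4            # embedding sets per feature (one per expert)
dim, hidden = 8, 16           # per-embedding dimension, expert output width
num_pairs = num_features * (num_features - 1) // 2

# tables[m][f]: embedding table of feature f in embedding set m.
tables = [[rng.normal(scale=0.1, size=(v, dim)) for v in vocab_sizes]
          for _ in range(num_embeddings)]
# Each expert owns its own parameters (assumed: one linear layer + ReLU).
expert_weights = [rng.normal(scale=0.1, size=(num_pairs, hidden))
                  for _ in range(num_embeddings)]
gate_weights = rng.normal(
    scale=0.1, size=(num_embeddings * num_features * dim, num_embeddings))
output_weights = rng.normal(scale=0.1, size=hidden)

def softmax(x: np.ndarray) -> np.ndarray:
    x = x - x.max(axis=-1, keepdims=True)  # subtract max for stability
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def pairwise_inner_products(emb: np.ndarray) -> np.ndarray:
    """Explicit feature crossing: <e_i, e_j> for every pair i < j."""
    gram = emb @ emb.transpose(0, 2, 1)            # (batch, F, F)
    i, j = np.triu_indices(emb.shape[1], k=1)
    return gram[:, i, j]                           # (batch, num_pairs)

def forward(ids: np.ndarray):
    """ids: (batch, num_features) integer IDs -> (pCTR, gate weights)."""
    expert_outputs, gate_inputs = [], []
    for m in range(num_embeddings):
        # Look up embedding set m for every feature: (batch, F, dim).
        emb = np.stack([tables[m][f][ids[:, f]]
                        for f in range(num_features)], axis=1)
        crossed = pairwise_inner_products(emb)
        expert_outputs.append(np.maximum(crossed @ expert_weights[m], 0.0))
        gate_inputs.append(emb.reshape(len(ids), -1))
    experts = np.stack(expert_outputs, axis=1)                     # (batch, M, hidden)
    gates = softmax(np.concatenate(gate_inputs, axis=1) @ gate_weights)  # (batch, M)
    mixed = (gates[:, :, None] * experts).sum(axis=1)              # (batch, hidden)
    logits = mixed @ output_weights
    return 1.0 / (1.0 + np.exp(-logits)), gates

batch_ids = np.stack([rng.integers(0, v, size=5) for v in vocab_sizes], axis=1)
probs, gates = forward(batch_ids)
print(probs.shape, gates.sum(axis=1))
```

Note that capacity grows with `num_embeddings` while `dim` stays fixed, which mirrors the idea of adding embeddings rather than widening a single one.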

Extensive experiments on the public Criteo and Avazu CTR datasets across various backbone models (DNN, IPNN, NFwFM, xDeepFM, DCN V2, FinalMLP) show that, for equal parameter budgets, Multi‑Embedding models consistently outperform single‑embedding counterparts, and their performance improves as model size grows, confirming a scaling law for recommendation systems.

Online deployment in Tencent Advertising’s ad‑delivery pipelines, including a heterogeneous multi‑embedding MoE architecture for pCTR prediction, yielded a 3.9% GMV lift, demonstrating the practical impact of the approach.

The study also analyzes embedding diversity using principal angles, visualizes transformation matrices of different experts, and confirms that Multi‑Embedding learns richer, less collapsed embedding spaces compared to single‑embedding setups.
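
For readers unfamiliar with principal angles, the following sketch shows one standard way to compute them between the column spaces of two matrices: orthonormalize each with QR, then take the singular values of the product of the bases, which are the cosines of the angles. The helper name `principal_angles` and the random test matrices are assumptions for illustration; how the study applies the measure to its embedding sets is not reproduced here.

```python
import numpy as np

def principal_angles(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Principal angles (radians, ascending) between span(a) and span(b).

    Assumes both matrices have full column rank.
    """
    qa, _ = np.linalg.qr(a)   # orthonormal basis of span(a)
    qb, _ = np.linalg.qr(b)   # orthonormal basis of span(b)
    cosines = np.linalg.svd(qa.T @ qb, compute_uv=False)
    # Clip to guard against values slightly outside [-1, 1] from rounding.
    return np.arccos(np.clip(cosines, -1.0, 1.0))

rng = np.random.default_rng(0)
num_ids, dim = 200, 4  # hypothetical sizes

first = rng.normal(size=(num_ids, dim))
# Same subspace, different basis: all angles should be close to zero.
remixed = first @ rng.normal(size=(dim, dim))
# An independently drawn matrix spans a different subspace.
second = rng.normal(size=(num_ids, dim))

print("same subspace:     ", np.degrees(principal_angles(first, remixed)).round(3))
print("different subspace:", np.degrees(principal_angles(first, second)).round(1))
```

Angles near zero indicate that two embedding sets span nearly the same directions, while larger angles indicate more diverse subspaces.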

In conclusion, the Multi‑Embedding paradigm effectively mitigates embedding collapse, enhances model scalability, and provides a valuable design principle for future deep learning‑based recommendation systems.
