Tencent Music Live Streaming Recommendation System: Architecture, Challenges, and Model Design
This article presents an in‑depth overview of Tencent Music's live‑streaming recommendation system, covering the business background, system architecture, recall and ranking model designs, multi‑modal extensions, and the models and multi‑task training techniques (DSSM, ESMM, GradNorm, and CGC) used to improve user engagement and conversion.
Business Background – Live streaming is a comprehensive monetization tool for social entertainment apps. The key challenge is to build user mindshare and to efficiently establish several kinds of connection between users and anchors: click, watch, follow, frequent viewing, and frequent gifting. Platforms address this with personalized recommendation systems, and this article shares the technology behind Tencent Music's K‑Song live recommendation system and how it is applied.
Recommendation System Architecture – The system consists of recall, coarse ranking, fine ranking, and re‑ranking layers, supported by platforms for profiling, feature engineering, training, A/B testing, debugging, and real‑time computation. Recall includes index‑based retrieval (e.g., by a user's preferred accompaniment), social recall via friends, and a strict evaluation of anchor value based on video and audio understanding.
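The four‑layer funnel described above can be sketched as follows. This is a minimal illustration, not the production pipeline: the function name run_funnel, the scoring callables, and the cut‑off sizes are all hypothetical.

```python
from typing import Callable, List

# Hypothetical type aliases: a recall channel returns anchor IDs for a user,
# and a scorer returns a relevance score for a (user, anchor) pair.
RecallChannel = Callable[[str], List[str]]
Scorer = Callable[[str, str], float]


def run_funnel(
    user_id: str,
    recall_channels: List[RecallChannel],
    coarse_score: Scorer,
    fine_score: Scorer,
    rerank: Callable[[List[str]], List[str]],
    coarse_keep: int,
    fine_keep: int,
) -> List[str]:
    # Recall: merge candidates from every channel, dropping duplicates
    # while preserving first-seen order.
    seen = set()
    candidates: List[str] = []
    for channel in recall_channels:
        for anchor_id in channel(user_id):
            if anchor_id not in seen:
                seen.add(anchor_id)
                candidates.append(anchor_id)

    # Coarse ranking: a cheap scorer trims the merged candidate set.
    coarse = sorted(
        candidates, key=lambda a: coarse_score(user_id, a), reverse=True
    )[:coarse_keep]

    # Fine ranking: a heavier scorer orders the survivors.
    fine = sorted(
        coarse, key=lambda a: fine_score(user_id, a), reverse=True
    )[:fine_keep]

    # Re-ranking: final list-level adjustments (diversity, business rules).
    return rerank(fine)
```

Each stage sees fewer candidates than the one before it, which is what allows the later stages to use more expensive models.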
Recall Model Design
1. Recall Model Iteration – The recall model has gone through four stages since 2018, culminating in the industry‑standard dual‑tower (DSSM) model, which learns a shared embedding space for users and items, with extensions such as auxiliary views for accompaniment and real‑time content understanding.
2. DSSM Dual‑Tower Model – DSSM is more expressive than traditional heuristic recall, and because anchor embeddings can be indexed efficiently, recall completes quickly within a single request.
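A minimal sketch of dual‑tower scoring, under stated assumptions: each tower is reduced to a single linear layer with tanh (a stand‑in for a deep network), and retrieval is a brute‑force scan rather than the approximate nearest‑neighbor index a real system would use. The names tower and retrieve are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def l2_normalize(v: Vector) -> Vector:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def tower(features: Vector, weights: Matrix) -> Vector:
    # One linear layer + tanh, then L2 normalization, so that the dot
    # product of two tower outputs equals their cosine similarity.
    hidden = [
        math.tanh(sum(w * x for w, x in zip(row, features))) for row in weights
    ]
    return l2_normalize(hidden)


def retrieve(
    user_vec: Vector, anchor_vecs: Dict[str, Vector], k: int
) -> List[Tuple[str, float]]:
    # Brute-force top-k by dot product (assumption: a production system
    # would query a prebuilt vector index instead of scanning).
    scored = [
        (anchor_id, sum(u * a for u, a in zip(user_vec, vec)))
        for anchor_id, vec in anchor_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

Because the user tower and the anchor tower only meet at the final dot product, anchor embeddings can be computed offline and only the user embedding has to be computed at request time.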
3. Multi‑Modal Recall – The team introduced a SongView and a multimedia View. Song IDs and embeddings from QQ Music and from internal multimedia libraries improve recommendation efficiency for new users and raise the overall click‑through rate.
Fine‑Ranking Model Design
1. Multi‑Dimensional Iteration – Feature interaction was optimized with low‑order FM and with high‑order Cross and AutoInt layers to obtain richer, more interpretable feature crosses.
2. Feature Processing, Sampling, and Weighting – Handled numeric, categorical, ID, sequence, and embedding features; applied pooling/attention on sequence embeddings; incorporated multimedia embeddings as numeric inputs or direct anchor representations.
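The pooling and attention step on sequence embeddings might look like the sketch below. It assumes plain dot‑product attention with a single query vector (for example, the candidate anchor's embedding); the article does not specify which attention form is used, and the function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def mean_pool(sequence: List[Vector]) -> Vector:
    # Average the embeddings position by position.
    n = len(sequence)
    dim = len(sequence[0])
    return [sum(vec[d] for vec in sequence) / n for d in range(dim)]


def attention_pool(sequence: List[Vector], query: Vector) -> Vector:
    # Score each sequence item against the query with a dot product.
    scores = [sum(q * x for q, x in zip(query, vec)) for vec in sequence]
    # Numerically stable softmax over the scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the sequence embeddings.
    dim = len(sequence[0])
    return [
        sum(w * vec[d] for w, vec in zip(weights, sequence)) for d in range(dim)
    ]
```

With an all‑zero query every item receives the same weight, so attention pooling reduces to mean pooling; a query aligned with one item concentrates the weight on that item.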
3. Feature Crossing – KFM, DeepKFM, Cross|AutoInt – Low‑order crossing via FM/NFM with Hadamard product; high‑order crossing via DCN and Bit‑level AutoInt, yielding notable AUC improvements.
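To make the two kinds of crossing concrete, the sketch below shows the standard FM second‑order term and one standard DCN cross layer. It illustrates the textbook formulations only; KFM and DeepKFM are the team's own variants and are not reproduced here. All function names are hypothetical.

```python
from typing import List

Vector = List[float]


def fm_second_order(x: Vector, v: List[Vector]) -> float:
    # Standard FM pairwise term: sum over i<j of <v_i, v_j> * x_i * x_j,
    # computed in O(n*k) with the identity
    #   0.5 * sum_f [ (sum_i v_if x_i)^2 - sum_i (v_if x_i)^2 ].
    k = len(v[0])
    total = 0.0
    for f in range(k):
        linear = sum(v_i[f] * x_i for v_i, x_i in zip(v, x))
        squares = sum((v_i[f] * x_i) ** 2 for v_i, x_i in zip(v, x))
        total += linear * linear - squares
    return 0.5 * total


def fm_second_order_naive(x: Vector, v: List[Vector]) -> float:
    # Direct O(n^2 * k) evaluation, kept only to check the fast version.
    total = 0.0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dot = sum(a * b for a, b in zip(v[i], v[j]))
            total += dot * x[i] * x[j]
    return total


def dcn_cross_layer(x0: Vector, xl: Vector, w: Vector, b: Vector) -> Vector:
    # One DCN cross layer: x_{l+1} = x_0 * (x_l . w) + b + x_l.
    # Stacking L such layers yields crosses up to degree L + 1.
    scale = sum(a * c for a, c in zip(xl, w))
    return [x0_d * scale + b_d + xl_d for x0_d, b_d, xl_d in zip(x0, b, xl)]
```

The FM term captures only pairwise interactions, while each additional cross layer raises the polynomial degree of the interactions by one.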
4. CVR Estimation – ESMM, GradNorm – Shifted focus from CTR to CVR to ensure users truly engage with live content; employed ESMM to address selection bias and sparsity, and GradNorm to dynamically balance task losses during multi‑task learning.
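The ESMM objective can be sketched as below. The key idea is that the CVR head is never trained directly on clicked samples alone; instead pCTCVR = pCTR × pCVR is supervised over all impressions, which is how ESMM sidesteps selection bias and sparsity. The function names are hypothetical, and "conversion" here stands for whatever post‑click signal the system treats as real engagement.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def binary_cross_entropy(label: float, prob: float, eps: float = 1e-7) -> float:
    # Clamp the probability to avoid log(0).
    p = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))


def esmm_loss(
    ctr_logits: List[float],
    cvr_logits: List[float],
    clicks: List[int],
    conversions: List[int],
) -> float:
    # Every sample is an impression; both loss terms cover the full
    # impression space rather than the clicked subset.
    total = 0.0
    for z_ctr, z_cvr, click, conv in zip(
        ctr_logits, cvr_logits, clicks, conversions
    ):
        p_ctr = sigmoid(z_ctr)
        p_cvr = sigmoid(z_cvr)
        p_ctcvr = p_ctr * p_cvr  # probability of click AND conversion
        total += binary_cross_entropy(click, p_ctr)
        total += binary_cross_entropy(click * conv, p_ctcvr)
    return total / len(clicks)
```

In this sketch the two task losses are summed with equal weight; per the text above, GradNorm would replace those fixed weights with ones that adapt during training.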
Further enhancements included CGC (Customized Gate Control, which combines shared experts with task‑specific experts) to improve GradNorm’s effectiveness, yielding gains in click‑through, effective clicks, and watch time.
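A minimal sketch of the CGC gating idea for a single task, assuming each expert is one linear layer with ReLU and the gate is a linear layer followed by softmax. Layer sizes, names, and the single‑layer experts are illustrative assumptions, not the article's configuration.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def softmax(scores: Vector) -> Vector:
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def expert(weights: Matrix, x: Vector) -> Vector:
    # One linear layer + ReLU, standing in for a deeper expert network.
    return [max(0.0, h) for h in matvec(weights, x)]


def cgc_task_output(
    x: Vector,
    task_experts: List[Matrix],
    shared_experts: List[Matrix],
    gate_weights: Matrix,
) -> Vector:
    # The task's gate sees only its own experts plus the shared experts;
    # experts that belong to other tasks are never mixed in.
    experts = task_experts + shared_experts
    outputs = [expert(w, x) for w in experts]
    # gate_weights has one row per expert, producing one logit per expert.
    gate = softmax(matvec(gate_weights, x))
    dim = len(outputs[0])
    return [sum(g * out[d] for g, out in zip(gate, outputs)) for d in range(dim)]
```

Each task would call cgc_task_output with its own task_experts and gate_weights while reusing the same shared_experts, and the result feeds that task's prediction head.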
In summary, the live‑streaming recommendation system integrates multi‑modal features, advanced dual‑tower retrieval, sophisticated feature crossing, and multi‑task optimization to achieve higher user conversion and retention.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.