
Exploring QQ Music Recall Algorithms: Knowledge‑Graph Fusion, Sequence & Multi‑Interest Modeling, Audio Recall, and Federated Learning

This article presents a comprehensive overview of QQ Music's recall system, detailing business scenarios, challenges such as noisy user behavior and cold‑start, and four key solutions—including knowledge‑graph‑enhanced recall, sequence and multi‑interest modeling, audio‑based recall, and federated learning—along with experimental results, deployment details, and a Q&A session.

DataFunTalk

1. Business Introduction

QQ Music’s homepage offers diverse recommendation products (personal radio, daily 30 songs, single‑song recommendation, UGC playlists, AI playlists, etc.), each with distinct characteristics that create unique challenges for recall algorithms, such as differing optimization goals and sample construction.

QQ Music Recommendation Scenario Characteristics

The platform serves a broad user base across all ages, with sparse user attributes beyond basic demographics. User actions are dominated by full‑play and skip, with additional interactions like collect, block, follow, and playlist addition. Music consumption is highly repetitive, and product forms (audio, lyrics, artist, playlist titles, images) vary widely, posing specific recall challenges.

High noise in listening behavior makes raw samples inaccurate.

Heavy head‑item popularity reduces recommendation surprise.

Scarce user attributes increase cold‑start difficulty.

2. QQ Music Recall Solutions

To address the above problems, four solutions are proposed:

Fuse music knowledge‑graph recall.

Introduce sequence and multi‑interest recall.

Explore audio‑based recall for “listening‑sense” similarity.

Apply federated learning to mitigate sparse user attributes.

2.1 Knowledge‑Graph Fusion Recall

Music items contain rich side information (album, artist, genre, language). Traditional side‑info models (EGES, GraphSAGE) suffer from limited generalization and high training cost. QQ Music adopts a hybrid approach that integrates knowledge‑graph triples (e.g., song‑genre, song‑artist) into a Song2Vec framework, adding a gamma factor to control relation influence. This improves recall accuracy and reduces BadCase rate, as demonstrated on examples like Jay Chou’s “Dong Feng Po”.
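The fusion step can be sketched as follows. This is a minimal illustration, not QQ Music's actual pipeline: the triple format, sampling scheme, and function names are assumptions, and only the idea of mixing knowledge‑graph neighbors into the Song2Vec training pairs under a gamma factor comes from the article.

```python
import random
from collections import defaultdict

def build_song2vec_pairs(sessions, kg_triples, gamma=0.3, seed=0):
    """Build skip-gram training pairs that fuse knowledge-graph relations.

    sessions:   lists of song IDs from user listening sequences.
    kg_triples: (song, relation, song) triples, e.g. two songs sharing
                an artist or genre.
    gamma:      controls how strongly KG neighbours are mixed in,
                standing in for the "gamma factor" from the article.
    """
    rng = random.Random(seed)
    pairs = []
    # Behavioural pairs: co-occurrence within a sliding window of 2.
    for seq in sessions:
        for i, song in enumerate(seq):
            for ctx in seq[max(0, i - 2):i] + seq[i + 1:i + 3]:
                pairs.append((song, ctx))
    # KG pairs: sample relation neighbours with probability gamma.
    neighbours = defaultdict(list)
    for head, _rel, tail in kg_triples:
        neighbours[head].append(tail)
    for song, tails in neighbours.items():
        for tail in tails:
            if rng.random() < gamma:
                pairs.append((song, tail))
    return pairs
```

With gamma at 0 this degenerates to plain behavior‑only Song2Vec; raising it pulls songs that share an artist or genre closer in the embedding space, which is what drives the bad‑case reduction the article reports.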

2.2 Sequence & Multi‑Interest Recall

QQ Music uses SASRec with shared item/user embeddings to capture temporal and positional signals in user listening sequences, achieving a lift of roughly 2.5 percentage points over a YouTube‑style baseline (23.72% vs. 21.25%). Multi‑interest extraction builds on the MIND model, employing a capsule network to generate several interest vectors per user, which are then served by separate nearest‑neighbor indexes.

Optimizations include concatenating language/genre side‑info to song IDs (reducing model learning cost) and re‑initializing routing logits in the capsule layer, which markedly improves song embedding clustering and raises Hitrate@200 to 25.2%.
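The capsule step above can be sketched with plain dynamic routing. All dimensions and hyper‑parameters here are illustrative; the only details taken from the article are that a MIND‑style capsule layer produces several interest vectors per user and that routing logits are re‑initialized rather than reused.

```python
import numpy as np

def squash(v, axis=-1, eps=1e-9):
    # Capsule squash non-linearity: keeps direction, bounds the norm below 1.
    n2 = np.sum(v * v, axis=axis, keepdims=True)
    return (n2 / (1.0 + n2)) * v / np.sqrt(n2 + eps)

def multi_interest(item_seq, num_interests=4, iters=3, seed=0):
    """MIND-style dynamic routing: map a user's item-embedding sequence
    (L x d) onto `num_interests` interest vectors (K x d)."""
    rng = np.random.default_rng(seed)
    L, d = item_seq.shape
    # Routing logits are freshly re-initialised on every call, mirroring
    # the re-initialisation trick mentioned in the article.
    b = rng.normal(size=(num_interests, L))
    for _ in range(iters):
        w = np.exp(b) / np.exp(b).sum(axis=0, keepdims=True)  # softmax over capsules
        caps = squash(w @ item_seq)       # (K, d) candidate interest vectors
        b = b + caps @ item_seq.T         # agreement update
    return caps
```

Each of the K output vectors would then query its own nearest‑neighbor index at serving time, as the article describes.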

2.3 Audio Recall

Audio recall extracts four‑second segments and computes statistical descriptors (max, min, mean, variance, kurtosis, skewness) for 14 audio attributes (pure vocal, pure instrument, vocal+accompaniment, genre, etc.). These audio embeddings are used for single‑point recall, cold‑start new‑song distribution, and multi‑modal user‑audio metric learning. Experiments show a positive Pearson correlation between audio similarity and user full‑play rate, confirming audio relevance.
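The descriptor computation is straightforward to sketch. The attribute extraction itself (vocal probability, genre scores, etc.) is out of scope here; this only shows the six statistics from the article applied per attribute and concatenated into a segment embedding, with function names of my own choosing.

```python
import numpy as np

def audio_descriptor(attr_frames):
    """Summarise one attribute's per-frame trajectory over a four-second
    segment with the six statistics named in the article."""
    x = np.asarray(attr_frames, dtype=float)
    mean = x.mean()
    var = x.var()
    std = np.sqrt(var) + 1e-12
    z = (x - mean) / std
    return np.array([
        x.max(), x.min(), mean, var,
        (z ** 4).mean() - 3.0,   # excess kurtosis
        (z ** 3).mean(),         # skewness
    ])

def segment_embedding(attrs):
    """Concatenate the descriptors of all attributes (14 in the article)
    into one fixed-length segment-level audio embedding."""
    return np.concatenate([audio_descriptor(a) for a in attrs])
```

With 14 attributes and 6 statistics each, every segment yields an 84‑dimensional vector, which is what similarity search and the user‑audio metric learning would operate on.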

Audio‑based recall improves surprise and collection metrics, e.g., recommending songs similar to “Leave The Door Open” increased user collections.

2.4 Federated Learning Recall

Federated learning enables collaborative model training without sharing raw data, preserving privacy. QQ Music adopts vertical federated learning to combine its item‑side features (language, artist, version) with user‑side features from other Tencent services via a dual‑tower DSSM model. The item tower provides item embeddings for indexing; the user tower supplies user embeddings for online nearest‑neighbor retrieval, boosting cold‑start performance across multiple entry points.
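A minimal two‑tower sketch of the DSSM retrieval model follows. In the vertical‑federated setting the user tower would run on the partner side and the item tower on QQ Music's side, exchanging only embeddings and gradients rather than raw features; here both run locally for illustration, and layer sizes are arbitrary.

```python
import numpy as np

def tower(x, weights):
    # A minimal MLP tower: linear layers with ReLU, L2-normalised output.
    h = x
    for i, W in enumerate(weights):
        h = h @ W
        if i < len(weights) - 1:
            h = np.maximum(h, 0.0)
    return h / (np.linalg.norm(h, axis=-1, keepdims=True) + 1e-9)

def dssm_score(user_feats, item_feats, user_weights, item_weights):
    """Dual-tower scoring: item embeddings go into the ANN index offline,
    user embeddings are computed online for nearest-neighbour retrieval."""
    u = tower(user_feats, user_weights)   # one row per user
    v = tower(item_feats, item_weights)   # one row per item
    return u @ v.T                        # cosine similarity (both normalised)
```

Because both towers L2‑normalize their outputs, the dot product is a cosine score, which is the standard choice for serving the item tower's embeddings from a nearest‑neighbor index.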

The upgraded multi‑objective MMoE model further integrates cross‑business signals, delivering ~10% lift in per‑user listening duration for cold‑start scenarios while fully complying with privacy regulations.
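The multi‑objective structure can be sketched as a plain MMoE forward pass: shared experts with one softmax gate per task. The article only states that the upgraded model is a multi‑objective MMoE, so the expert count, dimensions, and task heads below are assumptions.

```python
import numpy as np

def mmoe(x, expert_ws, gate_ws):
    """MMoE forward pass for one sample x (d,).

    expert_ws: one (d, d_out) weight matrix per shared expert.
    gate_ws:   one (d, num_experts) gate matrix per task/objective.
    Returns one mixed representation per task.
    """
    experts = np.stack([np.maximum(x @ W, 0.0) for W in expert_ws])  # (E, d_out)
    outs = []
    for G in gate_ws:                          # one gate per objective
        logits = x @ G                         # (E,)
        g = np.exp(logits - logits.max())
        g = g / g.sum()                        # softmax expert weights
        outs.append(np.tensordot(g, experts, axes=1))  # task-specific mix
    return outs
```

Each task (e.g. full‑play vs. collection) gets its own gate over the shared experts, which is how cross‑business signals can be traded off per objective.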

3. Q&A Highlights

Recall samples are shared across all entry points and trained on global QQ Music data, while ranking models use point‑wise samples specific to each entry.

Long‑term interests are captured by deep sequence models; short‑term interests are addressed by recent‑behavior single‑point recall.

Multi‑interest recall allocates quotas per interest cluster to ensure fairness and diversity.
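The quota idea above can be sketched as a simple merge. The even split and first‑pass dedup policy are assumptions; the article only says quotas are set per interest cluster.

```python
def merge_with_quotas(per_interest_candidates, total=200):
    """Merge candidate lists from K interest clusters under a per-cluster
    quota, deduplicating across clusters so no song is retrieved twice."""
    quota = total // len(per_interest_candidates)
    seen, merged = set(), []
    for cands in per_interest_candidates:
        taken = 0
        for item in cands:
            if item not in seen and taken < quota:
                seen.add(item)
                merged.append(item)
                taken += 1
    return merged
```

Capping each cluster at its quota is what prevents one dominant interest from crowding the others out of the recall set.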

Audio features are heavily used in ranking models, improving collection rates and user surprise.

Technical stack includes ClickHouse + Superset for OLAP, Hive for batch processing, TensorFlow for model training, and C++/Go for serving.

4. Closing

Thanks to the audience for attending; the speaker encourages sharing, liking, and following the DataFunTalk community for more AI and big‑data resources.

Tags: multi‑interest, Knowledge Graph, Federated Learning, Audio Embedding, music recommendation, sequence modeling
Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
