Relevance Modeling and Ranking for Cloud Music Video Search
The article details Cloud Music's video-search pipeline (query understanding, recall, relevance, ranking and re-ranking), highlights challenges such as ambiguous content, timeliness and multi-objective goals, and describes two deployed models (a twin-tower aspect relevance network and a click-graph propagator) that together boost click-through rate by 1.5% and effective CTR by 2.3%.
The article presents a comprehensive analysis of video search in the Cloud Music platform, contrasting it with e‑commerce search. While e‑commerce search is largely non‑precise and relies heavily on personalization, Cloud Music video search must handle heterogeneous resources (artists, tracks, playlists, videos, etc.) and balance both precise and non‑precise scenarios.
Key challenges for video search include difficult content understanding (titles and descriptions do not fully reflect video content), high relevance requirements (especially for ambiguous queries), strong timeliness (hot videos need rapid exposure), and multiple optimization objectives (CTR, effective rate, play duration, likes, shares, etc.).
The overall algorithm system is divided into five modules: query understanding, recall & expansion, relevance, ranking, and re‑ranking. Data mining supplies foundational information such as new words, synonyms, and tags, which feed the query understanding module that performs text normalization, error correction, weight analysis, entity extraction, and intent detection.
Recall consists of a basic text search engine and multi‑path expansion (query rewriting and vector recall). The relevance module evaluates three sub‑dimensions—text relevance, semantic relevance, and intent matching—and defines four relevance levels (good, fair‑good, fair‑fair, bad) with concrete examples.
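As an illustration of the text-relevance sub-dimension, the sketch below scores tokenized video titles against a query with Okapi BM25. It is a minimal example, not the article's implementation: the function name `bm25_scores`, the parameter defaults `k1=1.5` and `b=0.75`, and the sample titles are all assumptions chosen for illustration.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized document in `docs` against `query_terms` (Okapi BM25)."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Smoothed IDF; the "1 +" inside the log keeps it non-negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Hypothetical tokenized video titles.
titles = [
    ["summer", "festival", "live", "stage"],
    ["piano", "tutorial"],
    ["summer", "festival", "interview"],
]
scores = bm25_scores(["summer", "festival", "live"], titles)
```

A score like this only captures term overlap, which is why the module also evaluates semantic relevance and intent matching.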
Model selection covers four major approaches: text relevance (e.g., BM25), attribute relevance, semantic relevance (deep neural models), and behavior relevance (click‑graph models). The article details two practical implementations:
Aspect Relevance Model: a twin-tower architecture with shared word embeddings, CNN-based self-attention for each semantic dimension, and multimodal fusion (image/audio) via tensor fusion. Training uses a hierarchical CTR-based labeling scheme and both easy and hard negative samples.
Click Graph Model: constructs a bipartite query-item graph from recent search click logs, propagates relevance along the click edges, and generates bag-of-words vectors for queries and items. Similarity between a query vector and an item vector is measured with cosine similarity or KL divergence; the model achieves an AUC of 0.768.
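The click-graph idea can be sketched as follows. This is a simplified illustration under stated assumptions, not the production algorithm: the article does not specify the propagation rule, so the sketch assumes query vectors are initialized from the query's own words, that vectors are pushed across click edges weighted by click count, and that each vector is truncated to its `top_k` terms and L2-normalized. The names `propagate`, `cosine`, `n_iters` and `top_k`, and the sample click log, are hypothetical.

```python
import math
from collections import Counter, defaultdict

def l2_normalize(vec):
    """Scale a sparse term->weight dict to unit L2 norm."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}

def propagate(clicks, n_iters=2, top_k=20):
    """clicks maps (query, item_id) -> click count; returns (query_vecs, item_vecs)."""
    # Assumption: each query starts as the bag of words of its own text.
    q_vecs = {q: l2_normalize(Counter(q.split())) for q, _ in clicks}
    item_vecs = {}
    for _ in range(n_iters):
        # Query -> item: an item accumulates the vectors of queries that clicked it.
        acc = defaultdict(Counter)
        for (q, item), count in clicks.items():
            for term, w in q_vecs[q].items():
                acc[item][term] += count * w
        item_vecs = {i: l2_normalize(dict(v.most_common(top_k))) for i, v in acc.items()}
        # Item -> query: a query accumulates the vectors of the items it clicked.
        acc = defaultdict(Counter)
        for (q, item), count in clicks.items():
            for term, w in item_vecs[item].items():
                acc[q][term] += count * w
        q_vecs = {q: l2_normalize(dict(v.most_common(top_k))) for q, v in acc.items()}
    return q_vecs, item_vecs

def cosine(u, v):
    """Cosine similarity of two unit-normalized sparse vectors (a dot product)."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())

# Hypothetical click log.
clicks = {
    ("piano cover", "v1"): 5,
    ("piano tutorial", "v1"): 2,
    ("piano tutorial", "v2"): 4,
    ("guitar solo", "v3"): 3,
}
q_vecs, item_vecs = propagate(clicks)
```

In this toy log, "piano cover" never clicked `v2`, yet the two end up with a non-zero similarity because they are connected through `v1` and "piano tutorial". This is the kind of behavior-derived relevance that a pure text match would miss.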
Recall strategies include basic text recall, query-rewrite recall, vector recall, and personalized recall. Ranking incorporates a rich feature set (query, video, user features, real-time statistics, position bias, etc.) and supports multi-objective optimization (CTR, CVR, play duration). Various multi-task models were evaluated: single-task, Shared-Bottom, ESMM, MMoE, and PLE. Experiments show that MMoE with an uncertainty-weighted loss achieves the best offline AUC (CTR-AUC 0.823, CVR-AUC 0.721), and it is selected for production.
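One common way to write an uncertainty-weighted multi-task loss is to give each task a learnable log-variance s_i and minimize the sum of exp(-s_i) * L_i + s_i. The sketch below assumes that simplified form; the article does not state which exact formulation was used, and the function names and sample labels are hypothetical. In a real model the `log_vars` would be trainable parameters updated alongside the network weights.

```python
import math

def bce(y_true, p_pred, eps=1e-7):
    """Mean binary cross-entropy between labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)

def uncertainty_weighted_loss(task_losses, log_vars):
    """Combine per-task losses as sum_i exp(-s_i) * L_i + s_i, with s_i = log(sigma_i^2)."""
    return sum(math.exp(-s) * loss + s for loss, s in zip(task_losses, log_vars))

# Hypothetical click and conversion labels with model predictions.
ctr_loss = bce([1, 0, 1, 0], [0.8, 0.3, 0.6, 0.1])
cvr_loss = bce([1, 0, 0, 0], [0.4, 0.2, 0.5, 0.1])

# With all log-variances at 0 the combination reduces to a plain sum.
equal_weight_total = uncertainty_weighted_loss([ctr_loss, cvr_loss], [0.0, 0.0])
```

The additive s_i term keeps the model from driving every weight to zero: for a fixed loss L_i, the expression is minimized at s_i = log(L_i), so tasks with larger loss are automatically down-weighted.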
Online results indicate a 1.5% increase in click‑through rate and a 2.3% rise in effective click‑through rate after applying relevance improvements. The article concludes with a summary of current pain points (content understanding, relevance, timeliness, multi‑objective optimization) and outlines future work focusing on additional objectives and personalized modeling.
NetEase Cloud Music Tech Team
Official account of NetEase Cloud Music Tech Team