Cold-Start Content Recommendation Practices at Kuaishou
This article describes Kuaishou's approach to cold-start content recommendation, outlining the problems addressed, challenges in modeling sparse new videos, and solutions including graph neural networks, I2U retrieval, TDM hierarchical retrieval, bias correction, and future research directions.
Kuaishou produces massive amounts of short videos daily, and the platform faces the challenge of efficiently distributing newly created content (cold-start videos) to appropriate audiences. The goal is to increase early exposure, improve interaction rates, and ultimately boost user engagement and daily active users (DAU).
The discussion is organized into four parts: (1) defining the cold-start problem, (2) challenges and solutions for cold-start modeling, (3) future outlook, and (4) a Q&A session.
Key metrics include short‑term traffic and interaction efficiency, long‑term discovery of high‑potential videos, and overall ecosystem health. Intermediate indicators track new‑video consumption (flow‑rate) and downstream metrics such as exploration, utilization, and ecosystem health, while long‑term effects are measured via Combo experiments on APP time, author DAU, and platform DAU.
Modeling challenges stem from three main issues: the sample space for cold‑start videos is far smaller than the true solution space, the data are extremely sparse leading to biased learning, and modeling the growth value of videos is difficult.
To address these challenges, Kuaishou employs several techniques:
Graph‑entropy self‑enhanced heterogeneous graph networks (GNN) that incorporate user, author, and item nodes, with semantic self‑enhanced edges to reduce information entropy.
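The core of such a heterogeneous graph network is propagating signal from well-observed neighbors (the author, connected users) into the sparse cold-start item node. A minimal sketch of one hop of mean-aggregation message passing is below; the node values and the simple averaging rule are illustrative assumptions, not Kuaishou's actual architecture.

```python
import numpy as np

def aggregate_neighbors(node_emb, neighbor_embs):
    """One hop of mean-aggregation message passing: blend a node's
    embedding with the mean of its neighbors' embeddings."""
    if not neighbor_embs:
        return node_emb
    msg = np.mean(neighbor_embs, axis=0)
    return (node_emb + msg) / 2.0

# Toy heterogeneous neighborhood for a cold-start item: its author node
# plus user nodes connected via semantic ("self-enhanced") edges.
item = np.array([0.1, 0.9])
author = np.array([0.2, 0.8])
users = [np.array([0.0, 1.0]), np.array([0.4, 0.6])]

updated_item = aggregate_neighbors(item, [author] + users)
```

Even with zero interaction history, the item embedding moves toward the consensus of its author and semantically linked users, which is the intuition behind enriching cold-start nodes via the graph.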
An I2U retrieval service that dynamically finds interest groups for each video, builds a reverse index for U2I, and serves item lists via Redis.
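The key data transformation here is inverting the offline I2U output (each new video mapped to its predicted interest group) into the U2I index the online service reads per user. A minimal sketch, with a plain dict standing in for the Redis store and all ids hypothetical:

```python
from collections import defaultdict

def build_u2i_index(i2u):
    """Invert item -> users (I2U) lists into the user -> items (U2I)
    reverse index served at request time. In production this would be
    written to Redis (e.g., one list per user id); a dict stands in here."""
    u2i = defaultdict(list)
    for item_id, user_ids in i2u.items():
        for uid in user_ids:
            u2i[uid].append(item_id)
    return u2i

# Toy I2U output: each cold-start video mapped to its interest group.
i2u = {"video_a": ["u1", "u2"], "video_b": ["u2", "u3"]}
u2i = build_u2i_index(i2u)
```

Serving from the inverted index keeps the online path a single key lookup per user, while all the expensive interest-group matching stays offline.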
A TDM (Tree‑based Deep Model) hierarchical retrieval that enriches user‑item interactions, mitigates the limitations of dual‑tower models, and reduces interest concentration.
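TDM replaces a single nearest-neighbor lookup with layer-wise beam search down an item tree, scoring internal nodes with the model at each level. The sketch below shows the retrieval loop only, with a mock score table instead of a trained model and a toy two-level tree; it assumes all leaf items sit at the same depth.

```python
import heapq

def tdm_retrieve(tree, score, root, beam, leaves):
    """Layer-wise beam search over an item tree: at each level expand the
    surviving nodes' children and keep the top-`beam` by model score.
    Assumes all leaves (items) sit at the same depth."""
    frontier = [root]
    result = []
    while frontier:
        children = [c for n in frontier for c in tree.get(n, [])]
        if not children:
            break
        top = heapq.nlargest(beam, children, key=score)
        result = [n for n in top if n in leaves]
        frontier = [n for n in top if n not in leaves]
    return result

# Toy tree: root -> coarse clusters -> leaf videos, with mock model scores.
tree = {"root": ["a", "b"], "a": ["i1", "i2"], "b": ["i3", "i4"]}
scores = {"a": 0.9, "b": 0.5, "i1": 0.8, "i2": 0.7, "i3": 0.6, "i4": 0.1}
leaves = {"i1", "i2", "i3", "i4"}

top_items = tdm_retrieve(tree, scores.get, "root", beam=2, leaves=leaves)
```

Because each level can use an arbitrarily expressive scorer on (user, node) pairs, this sidesteps the inner-product bottleneck of dual-tower retrieval that the talk highlights.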
Bias‑correction modules that decouple popularity bias from genuine interest by learning separate embeddings for heat and interest and applying orthogonal constraints.
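One common way to enforce such decoupling is an orthogonality penalty between the two embeddings, added to the training loss so popularity signal cannot leak into the interest representation. A minimal sketch of that penalty (the exact loss form Kuaishou uses is not specified in the talk):

```python
import numpy as np

def orthogonality_penalty(heat_emb, interest_emb):
    """Squared cosine similarity between the popularity ('heat') embedding
    and the interest embedding: 0 when the two are orthogonal, growing as
    they overlap. Added to the training loss as a soft constraint."""
    cos = heat_emb @ interest_emb / (
        np.linalg.norm(heat_emb) * np.linalg.norm(interest_emb) + 1e-8)
    return cos ** 2

heat = np.array([1.0, 0.0])
interest = np.array([0.0, 1.0])
penalty = orthogonality_penalty(heat, interest)  # 0.0 for orthogonal vectors
```

At inference, ranking on the interest embedding alone then gives cold-start videos a score untainted by the popularity they have not yet had a chance to accumulate.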
Additional enhancements such as U2U interest expansion, attention‑based parent‑child node aggregation, and data‑augmentation strategies.
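Of these, the attention-based parent-child aggregation is easy to make concrete: a parent node's representation is a softmax-weighted sum of its children's embeddings, so informative children dominate instead of being averaged away. A minimal sketch under that assumption, with illustrative vectors:

```python
import numpy as np

def attention_aggregate(parent_query, child_embs):
    """Build a parent node's embedding as a softmax-attention-weighted
    sum of its children's embeddings."""
    child_embs = np.stack(child_embs)
    logits = child_embs @ parent_query   # relevance of each child
    w = np.exp(logits - logits.max())
    w /= w.sum()                         # attention weights, sum to 1
    return w @ child_embs

parent_q = np.array([1.0, 0.0])
children = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
parent_emb = attention_aggregate(parent_q, children)
```

Compared with plain mean pooling, the child most aligned with the query contributes the largest share of the parent embedding.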
Future work focuses on finer‑grained crowd diffusion models (e.g., lookalike), causal bias correction for exposure and popularity, selective weighting of high‑popularity samples, long‑term value modeling of video lifecycles, and further data‑augmentation techniques.
The Q&A covered implementation details of the I2U service, methods to prevent head‑content over‑concentration, and how the platform evaluates the "quality‑rate" (优普率) of popular videos.
Overall, the presented solutions significantly improve cold‑start coverage, reduce sample‑space mismatch, and enhance both short‑term and long‑term recommendation performance.
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.