
Evolution of Re‑ranking Techniques in Kuaishou Short‑Video Recommendation System

This article details Kuaishou's short‑video recommendation pipeline, explaining the challenges of large‑scale sequencing, the development of sequence re‑ranking, multi‑content mixing, on‑device re‑ranking, and reinforcement‑learning‑based strategies, and demonstrates how these innovations improve user engagement and business metrics.


Kuaishou is a leading short‑video and live‑streaming platform with massive daily active users and diverse business lines. The enormous volume of interaction data it generates creates complex recommendation problems, including large‑scale prediction, reinforcement learning, and causal analysis.

The talk is organized into four parts: an overview of Kuaishou's recommendation scene, sequence re‑ranking, multi‑content mixing, and on‑device re‑ranking.

Sequence Re‑ranking recognizes that the value of a video sequence is not the sum of its individual items; context and ordering heavily influence user behavior. Traditional point‑wise scoring, greedy diversification, and MMR/DPP methods all have limitations. The proposed solution integrates a transformer or LSTM encoder to capture upstream context, optimizes for sequence‑level objectives, and continuously discovers better ordering patterns.
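For reference, the MMR baseline mentioned above can be sketched in a few lines. This is a generic illustration of Maximal Marginal Relevance, not Kuaishou's implementation; the function name and the linear relevance/diversity trade-off are illustrative assumptions.

```python
def mmr_rerank(scores, sims, k, lam=0.7):
    """Maximal Marginal Relevance: one of the baseline re-ranking
    methods the talk contrasts with sequence-level models.

    scores: point-wise relevance score per candidate item
    sims:   pairwise item-similarity matrix (nested lists)
    lam:    trade-off between relevance (high lam) and diversity
    """
    selected, candidates = [], list(range(len(scores)))
    while candidates and len(selected) < k:
        def mmr(i):
            # Penalize items similar to anything already selected.
            redundancy = max(sims[i][j] for j in selected) if selected else 0.0
            return lam * scores[i] - (1 - lam) * redundancy
        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    return selected
```

The limitation the talk points at is visible here: MMR scores each next item greedily against what is already chosen, so it cannot trade off the value of the *whole* ordering the way a sequence-level model can.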

The system adopts a generator‑evaluator paradigm: the generator creates diverse candidate sequences from the top‑50 items, and the evaluator, a unidirectional transformer followed by an auxiliary embedding model, scores the whole sequence. This approach has yielded significant online gains.
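The generator-evaluator loop can be sketched as follows. This is a toy illustration under stated assumptions: the generator is replaced by random sampling and the evaluator's unidirectional transformer by a hand-written context rule that damps adjacent same-category items; all function names are hypothetical.

```python
import random

def generate_candidates(items, n_seq=8, seq_len=6, seed=0):
    """Generator stand-in: sample diverse orderings of the top candidates.
    (The real generator is learned; random sampling is an assumption.)"""
    rng = random.Random(seed)
    return [rng.sample(items, seq_len) for _ in range(n_seq)]

def evaluate_sequence(seq, score):
    """Evaluator stand-in: a sequence-level score. The production system
    uses a unidirectional transformer; here we merely damp an item's
    value when it follows a same-category item, to mimic context
    awareness. Category = item_id // 10 (illustrative convention)."""
    total, prev = 0.0, None
    for item in seq:
        s = score[item]
        if prev is not None and item // 10 == prev // 10:
            s *= 0.5  # adjacent same-category items are worth less
        total += s
        prev = item
    return total

def rerank(items, score, **kw):
    """Pick the candidate sequence the evaluator scores highest."""
    cands = generate_candidates(items, **kw)
    return max(cands, key=lambda seq: evaluate_sequence(seq, score))
```

The key design point the paradigm captures: generation and evaluation are decoupled, so the evaluator can judge whole sequences (including ordering effects) rather than items in isolation.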

Multi‑Content Mixing addresses the problem of combining results from multiple business streams. Fixed‑position slots cause user‑experience and resource‑allocation issues. A listwise mixing method evaluates each placement using context extraction, cross‑domain conversion, and multi‑task prediction (CTR, CVR, etc.) to select the optimal mixed sequence. A reinforcement‑learning (RL) variant defines a user‑session state and an action space (ad placement positions), and uses a Dueling DQN with dense action representations derived from supervised pre‑estimation to balance long‑term value against short‑term revenue.
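The defining feature of a Dueling DQN is splitting the Q-value into a state-value term and a mean-centered advantage term. The sketch below shows that head under illustrative assumptions: linear heads stand in for the real network's MLPs, and all names are hypothetical.

```python
def dueling_q(state, action_embs, w_value, w_adv):
    """Dueling DQN head over a dense action space.
    Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a')   (identifiability constraint)

    state:       session-state features (list of floats)
    action_embs: dense action representations, one per placement option
                 (the talk derives these from supervised pre-estimation)
    w_value, w_adv: toy linear-head weights; a real system learns MLPs
    """
    dot = lambda x, y: sum(xi * yi for xi, yi in zip(x, y))
    v = dot(state, w_value)                        # state value V(s)
    ctx = [dot(state, col) for col in w_adv]       # project state for advantage
    adv = [dot(emb, ctx) for emb in action_embs]   # A(s, a) per action
    mean_adv = sum(adv) / len(adv)
    return [v + a - mean_adv for a in adv]         # Q(s, a) per action
```

Subtracting the mean advantage makes the V/A decomposition unique; the dense action embeddings let the same head generalize across placement positions instead of learning each slot independently.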

On‑Device Re‑ranking tackles real‑time perception, feedback latency, personalized models per user, and compute allocation. By deploying lightweight models on the client that incorporate implicit signals (volume, orientation) and recent feedback, the system can adjust recommendations within a session. A hybrid cloud‑edge architecture splits embedding lookup to the server and interaction modeling to the device, enabling hourly model updates and improving CTR (+2.53 pp), LTR (+4.81 pp), and WTR (+1.36 pp).
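The cloud-edge split described above can be sketched as two functions: a heavy embedding lookup that stays on the server, and a lightweight on-device adjustment using the freshest in-session feedback. This is a minimal sketch; the dot-product "interaction model" and all names are illustrative assumptions, not Kuaishou's architecture.

```python
def server_embed(item_ids, embedding_table):
    """Cloud side: the large embedding table never leaves the server;
    only the looked-up vectors are shipped to the device."""
    return {i: embedding_table[i] for i in item_ids}

def device_rerank(candidate_embs, recent_feedback_embs, base_scores):
    """Device side: a lightweight interaction model nudges the cloud
    scores using in-session feedback (skips, likes, implicit signals).
    Here a toy average dot-product similarity stands in for the small
    learned network that runs on the client."""
    dot = lambda x, y: sum(a * b for a, b in zip(x, y))
    def boost(emb):
        if not recent_feedback_embs:
            return 0.0
        return sum(dot(emb, f) for f in recent_feedback_embs) / len(recent_feedback_embs)
    scored = {i: base_scores[i] + boost(e) for i, e in candidate_embs.items()}
    return sorted(scored, key=scored.get, reverse=True)
```

The split mirrors the trade-off in the text: the server keeps the memory-heavy lookup and hourly model refreshes, while the device reacts to feedback within the session without a network round trip.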

The presentation concludes with a Q&A covering evaluation of generated sequences, the "one‑model‑per‑user" approach, and diversity constraints, emphasizing that while algorithmic optimization seeks overall utility, practical systems must also enforce diversity and business rules.

Tags: transformer, recommendation systems, reinforcement learning, Kuaishou, multi-content mixing, on-device ranking, sequence re-ranking
Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
