Real-time Short Video Recommendation on Mobile Devices: System Design, Model Architecture, and Experimental Evaluation
The paper presents a lightweight on‑device re‑ranking system for short‑video recommendation that leverages real‑time user feedback and context‑aware generative ranking, detailing its architecture, feature engineering, beam‑search optimization, and both offline and online experimental results showing significant performance gains.
Background
Short‑video platforms receive a massive volume of user interactions, such as likes, shares, and other explicit and implicit feedback, so recommendation systems must react quickly to shifting user interests. Traditional server‑side pipelines add request latency and cannot exploit real‑time client features such as network speed or preload time.
System Framework
The proposed architecture consists of three parts: (1) a conventional server‑side recommender that provides candidate videos and a high‑capacity ranking score, (2) a model‑training pipeline that generates lightweight TFLite checkpoints for the client, and (3) a client‑side recommender that collects real‑time feedback, extracts device‑specific features, and performs on‑device re‑ranking.
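Here is a minimal sketch of the client-side component. It assumes the server ships each candidate together with its pXTR, and that the client re-ranks its buffer of not-yet-shown candidates after every feedback event. `ServerCandidate`, `ClientReranker`, the `(category, signal)` feedback tuple, and `toy_score` are all hypothetical. `toy_score` stands in for the TFLite model, and the article does not say what actually triggers a re-rank.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

@dataclass
class ServerCandidate:
    video_id: str
    category: str
    pxtr: float                      # ranking score computed by the server-side recommender

Feedback = Tuple[str, int]           # hypothetical event: (category, +1 for like / -1 for skip)

@dataclass
class ClientReranker:
    # score_fn stands in for the on-device TFLite model (hypothetical interface)
    score_fn: Callable[[ServerCandidate, List[Feedback]], float]
    pending: List[ServerCandidate] = field(default_factory=list)  # delivered but not yet shown
    feedback: List[Feedback] = field(default_factory=list)        # real-time feedback so far

    def receive(self, batch: List[ServerCandidate]) -> None:
        self.pending.extend(batch)   # candidates arrive from the server with their pXTR

    def on_feedback(self, event: Feedback) -> None:
        self.feedback.append(event)
        self.rerank()                # react immediately, without a server round trip

    def rerank(self) -> None:
        self.pending.sort(key=lambda c: self.score_fn(c, self.feedback), reverse=True)

    def next_video(self) -> Optional[ServerCandidate]:
        return self.pending.pop(0) if self.pending else None

def toy_score(c: ServerCandidate, feedback: List[Feedback]) -> float:
    # Boost or demote a candidate according to feedback on its category.
    return c.pxtr + 0.2 * sum(s for cat, s in feedback if cat == c.category)

r = ClientReranker(score_fn=toy_score)
r.receive([ServerCandidate("v1", "food", 0.6),
           ServerCandidate("v2", "game", 0.5),
           ServerCandidate("v3", "game", 0.3)])
r.on_feedback(("food", -1))          # user skipped a food video
print(r.next_video().video_id)       # v2
```

A single negative signal on one category is enough to reorder the local buffer. A server-side pipeline would only reflect that signal on its next request.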
Model Structure
A compact neural network (<6 MB) is deployed entirely on the mobile device. Its inputs combine candidate video features, server‑side ranking scores, and client‑side real‑time signals. The core component is a target‑attention mechanism that models the interaction between the sequence of watched videos and each candidate. Its output feeds a Multi‑gate Mixture‑of‑Experts (MMoE) multi‑task module that predicts click‑through, effective‑view, and like probabilities.
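The scoring path can be sketched as a NumPy forward pass. This is an illustrative sketch with random weights, not the deployed model, and it rests on several assumptions. Each candidate row is taken to already concatenate video features, the server pXTR, and client signals. Target attention is implemented as scaled dot-product attention, with the candidate as the query and the watched sequence as keys and values. The three outputs stand for click-through, effective-view, and like probabilities. All function names and dimensions are hypothetical.

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())                      # subtract max for numerical stability
    return z / z.sum()

def target_attention(candidate, watched, w_q, w_k, w_v):
    """Candidate is the query; the watched-video sequence supplies keys and values."""
    q = candidate @ w_q                          # (d_att,)
    k = watched @ w_k                            # (T, d_att)
    v = watched @ w_v                            # (T, d_att)
    weights = softmax(k @ q / np.sqrt(q.size))   # (T,) relevance of each watched video
    return weights @ v                           # (d_att,) candidate-specific interest vector

def mmoe(x, experts, gates, towers):
    """Shared ReLU experts, one softmax gate and one sigmoid tower per task."""
    expert_out = np.stack([np.maximum(x @ w, 0.0) for w in experts])  # (n_experts, d_e)
    probs = []
    for gate, tower in zip(gates, towers):
        mixed = softmax(x @ gate) @ expert_out   # task-specific mixture of experts, (d_e,)
        probs.append(1.0 / (1.0 + np.exp(-(mixed @ tower))))
    return np.array(probs)

def init_params(d, d_att=8, n_experts=4, d_e=8, n_tasks=3, seed=0):
    rng = np.random.default_rng(seed)
    d_in = d + d_att                             # candidate features + attention output
    return {
        "w_q": rng.normal(0, 0.1, (d, d_att)),
        "w_k": rng.normal(0, 0.1, (d, d_att)),
        "w_v": rng.normal(0, 0.1, (d, d_att)),
        "experts": [rng.normal(0, 0.1, (d_in, d_e)) for _ in range(n_experts)],
        "gates": [rng.normal(0, 0.1, (d_in, n_experts)) for _ in range(n_tasks)],
        "towers": [rng.normal(0, 0.1, d_e) for _ in range(n_tasks)],
    }

def score_candidates(candidates, watched, p):
    rows = []
    for c in candidates:
        interest = target_attention(c, watched, p["w_q"], p["w_k"], p["w_v"])
        rows.append(mmoe(np.concatenate([c, interest]), p["experts"], p["gates"], p["towers"]))
    return np.stack(rows)                        # (n_candidates, n_tasks)

rng = np.random.default_rng(1)
watched = rng.normal(size=(5, 6))                # 5 watched videos, 6 features each
candidates = rng.normal(size=(4, 6))             # 4 candidates with the same feature layout
probs = score_candidates(candidates, watched, init_params(d=6))
print(probs.shape)                               # (4, 3)
```

In this design each task gets its own gate, so the tasks can weight the shared experts differently. The attention output differs per candidate because the candidate itself is the query.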
Context‑aware Generative Re‑ranking
Instead of sorting candidates greedily by their independent point‑wise scores, the system uses a beam‑search‑based generative re‑ranking algorithm. It selects videos one position at a time and scores each choice in the context of the videos already placed before it. ListReward, a discounted‑cumulative‑gain‑style metric computed over the generated list, guides the search. An adaptive beam size reduces computational cost without sacrificing quality.
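Here is a minimal sketch of beam-search generative re-ranking. It relies on deliberately toy assumptions:

- `contextual_score` replaces the on-device model with a rule that halves a video's score when it repeats the previous video's category.
- ListReward is taken as plain DCG with a log2(i + 2) discount.
- The "adaptive" beam simply halves its width after each position.

The article does not specify its adaptive rule, so that part in particular is hypothetical.

```python
import math

def list_reward(scores):
    """DCG-style ListReward: the score at 0-based position i is discounted by log2(i + 2)."""
    return sum(s / math.log2(i + 2) for i, s in enumerate(scores))

def contextual_score(item, prefix, base_score, category):
    """Toy stand-in for the on-device model: halve the score when the
    item repeats the category of the video placed just before it."""
    if prefix and category[prefix[-1]] == category[item]:
        return base_score[item] * 0.5
    return base_score[item]

def beam_rerank(candidates, k, base_score, category, base_beam=4):
    beams = [([], [])]                       # (selected items, per-position scores)
    for step in range(min(k, len(candidates))):
        width = max(1, base_beam >> step)    # hypothetical adaptive rule: halve the beam each step
        expanded = []
        for items, scores in beams:
            for c in candidates:
                if c in items:
                    continue
                s = contextual_score(c, items, base_score, category)
                expanded.append((items + [c], scores + [s]))
        expanded.sort(key=lambda b: list_reward(b[1]), reverse=True)
        beams = expanded[:width]             # keep only the best partial lists
    best_items, best_scores = beams[0]
    return best_items, list_reward(best_scores)

base_score = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.3}
category = {"a": "food", "b": "food", "c": "game", "d": "game"}
order, reward = beam_rerank(list(base_score), 3, base_score, category)
print(order)  # ['a', 'c', 'b']
```

Greedy point-wise sorting would produce a, b, c. Under the toy context rule, those videos score 0.9, 0.4, and 0.7, giving a ListReward of about 1.50. The beam search instead interleaves categories and reaches about 1.74.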
Feature Engineering
Features are divided into three groups: (1) server‑side ranking scores (pXTR), (2) static video attributes (e.g., category, duration), and (3) client‑side signals (user feedback, network speed, preload time). Cross‑features further enhance real‑time relevance. These include the difference between a candidate's pXTR and that of each watched video, the exposure time gap, and the exposure position gap.
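The cross-features reduce to simple arithmetic once the client logs when and where each video was shown. This sketch makes three assumptions: there is one feature row per watched video, the time gap is measured against the current timestamp, and the position gap is measured against the position the candidate would occupy next. The record types and field names are hypothetical, not taken from the paper.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class WatchedVideo:
    pxtr: float          # server-side score the video was served with
    exposure_ts: float   # when it was shown, in seconds
    position: int        # its position in the feed

@dataclass
class CandidateVideo:
    pxtr: float
    position: int        # position the candidate would take if shown next

def cross_features(cand: CandidateVideo, watched: List[WatchedVideo],
                   now_ts: float) -> List[List[float]]:
    """One row per watched video: [pXTR difference, exposure time gap, exposure position gap]."""
    return [
        [cand.pxtr - w.pxtr, now_ts - w.exposure_ts, float(cand.position - w.position)]
        for w in watched
    ]

watched = [WatchedVideo(0.30, 100.0, 0), WatchedVideo(0.55, 130.0, 1)]
rows = cross_features(CandidateVideo(0.40, 2), watched, now_ts=150.0)
print([[round(x, 2) for x in r] for r in rows])  # [[0.1, 50.0, 2.0], [-0.15, 20.0, 1.0]]
```

Rows like these could be fed to the target-attention layer alongside the watched-sequence embeddings. The article does not say exactly how they are consumed.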
Experiments
Offline comparisons on a production dataset show that the lightweight on‑device model outperforms simple DNN baselines and EdgeRec, especially when combined with server‑side scores and engineered features. Ablation studies confirm the importance of client‑side features (CSF), feature engineering (FE), and real‑time feedback sequences (RTS). Online A/B tests in the Kuaishou app demonstrate consistent lifts in key metrics for both point‑wise and context‑aware generative re‑ranking, with only modest increases in CPU and memory usage.
Conclusion
The on‑device re‑ranking system makes recommendations markedly more responsive and improves user experience by capturing real‑time feedback directly on the device. The lightweight model and adaptive beam search make it practical for large‑scale short‑video platforms.
Kuaishou Tech
Official Kuaishou tech account, providing real-time updates on the latest Kuaishou technology practices.