
Real-time Short Video Recommendation on Mobile Devices: System Design, Model Architecture, and Experimental Evaluation

The paper presents a lightweight on‑device re‑ranking system for short‑video recommendation that leverages real‑time user feedback and context‑aware generative ranking, detailing its architecture, feature engineering, beam‑search optimization, and both offline and online experimental results showing significant performance gains.

Kuaishou Tech

Background

Users of short‑video platforms produce a dense stream of explicit and implicit feedback (likes, shares, and similar signals), so a recommender has to react quickly when a user's interests shift. Traditional server‑side pipelines respond to this feedback with a delay, and they cannot use real‑time client‑side features such as network speed or preload time.

System Framework

The proposed architecture consists of three parts: (1) a conventional server‑side recommender that provides candidate videos and a high‑capacity ranking score, (2) a model‑training pipeline that generates lightweight TFLite checkpoints for the client, and (3) a client‑side recommender that collects real‑time feedback, extracts device‑specific features, and performs on‑device re‑ranking.
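The client‑side part of this flow can be sketched roughly as follows. This is a minimal illustration, not the system's actual code: `Candidate`, `ClientReranker`, and `demo_score` are hypothetical names, and `score_fn` stands in for the on‑device model that would normally be loaded from the TFLite checkpoint.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Candidate:
    video_id: str
    category: str
    server_score: float  # ranking score produced by the server-side recommender


@dataclass
class ClientReranker:
    # Stand-in for the on-device model (assumption: any callable works here).
    score_fn: Callable[[Candidate, List[Dict[str, str]]], float]
    feedback: List[Dict[str, str]] = field(default_factory=list)
    queue: List[Candidate] = field(default_factory=list)

    def load_page(self, candidates: List[Candidate]) -> None:
        """Store the candidate page returned by the server."""
        self.queue = list(candidates)

    def record_feedback(self, event: Dict[str, str]) -> None:
        """Append a real-time feedback event observed on the device."""
        self.feedback.append(event)

    def next_video(self) -> Optional[Candidate]:
        """Re-score the remaining candidates with the latest feedback and pop the best."""
        if not self.queue:
            return None
        best = max(self.queue, key=lambda c: self.score_fn(c, self.feedback))
        self.queue.remove(best)
        return best


def demo_score(candidate: Candidate, feedback: List[Dict[str, str]]) -> float:
    """Toy scorer: penalize categories the user has just skipped."""
    skipped = {e["category"] for e in feedback if e["action"] == "skip"}
    penalty = 0.5 if candidate.category in skipped else 0.0
    return candidate.server_score - penalty


reranker = ClientReranker(score_fn=demo_score)
reranker.load_page([
    Candidate("a", "pets", 0.9),
    Candidate("b", "pets", 0.8),
    Candidate("c", "food", 0.7),
])
first = reranker.next_video()
reranker.record_feedback({"category": first.category, "action": "skip"})
second = reranker.next_video()
```

In this toy run the skip on the first video changes which candidate is shown next without another request to the server, which is the behaviour the client‑side recommender is meant to provide.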

Model Structure

A compact neural network (<6 MB) is deployed entirely on the mobile device. It incorporates candidate video features, server‑side ranking scores, and client‑side real‑time signals. The core component is a target‑attention mechanism that models interactions between the watched video sequence and each candidate, followed by a multi‑task Mixture‑of‑Experts (MMoE) predicting click‑through, effective view, and like probabilities.
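To make the two building blocks concrete, here is a small pure‑Python sketch of target attention followed by a multi‑gate mixture of experts. It is illustrative only: the layer sizes, the random initialization, the use of scaled dot‑product scores, and the names `target_attention` and `TinyMMoE` are assumptions, not details from the deployed model.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def softmax(xs: Sequence[float]) -> Vector:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def target_attention(candidate: Vector, watched: List[Vector]) -> Vector:
    """Pool watched-video embeddings, weighted by similarity to the candidate."""
    scale = math.sqrt(len(candidate))
    weights = softmax([dot(candidate, w) / scale for w in watched])
    return [sum(wt * w[i] for wt, w in zip(weights, watched))
            for i in range(len(candidate))]


def linear(x: Vector, weight: List[Vector], bias: Vector) -> Vector:
    return [dot(row, x) + b for row, b in zip(weight, bias)]


def init_linear(rng: random.Random, out_dim: int, in_dim: int) -> Tuple[List[Vector], Vector]:
    weight = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(out_dim)]
    return weight, [0.0] * out_dim


class TinyMMoE:
    """Shared experts, one softmax gate and one sigmoid tower per task."""

    def __init__(self, in_dim: int, expert_dim: int, num_experts: int,
                 tasks: List[str], seed: int = 0) -> None:
        rng = random.Random(seed)
        self.experts = [init_linear(rng, expert_dim, in_dim) for _ in range(num_experts)]
        self.gates = {t: init_linear(rng, num_experts, in_dim) for t in tasks}
        self.towers = {t: init_linear(rng, 1, expert_dim) for t in tasks}

    def predict(self, x: Vector) -> Dict[str, float]:
        # Every expert sees the same input; ReLU keeps the sketch non-linear.
        expert_out = [[max(0.0, v) for v in linear(x, w, b)] for w, b in self.experts]
        preds = {}
        for task in self.gates:
            gate = softmax(linear(x, *self.gates[task]))
            mixed = [sum(g * e[i] for g, e in zip(gate, expert_out))
                     for i in range(len(expert_out[0]))]
            logit = linear(mixed, *self.towers[task])[0]
            preds[task] = 1.0 / (1.0 + math.exp(-logit))
        return preds


candidate = [10.0, 0.0]
watched = [[1.0, 0.0], [0.0, 1.0]]
pooled = target_attention(candidate, watched)
model = TinyMMoE(in_dim=4, expert_dim=3, num_experts=2,
                 tasks=["click", "effective_view", "like"])
preds = model.predict(candidate + pooled)  # assumption: concatenate the two vectors
```

The attention step gives each candidate its own summary of the watched sequence, and the per‑task gates let the three predictions share experts while weighting them differently.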

Context‑aware Generative Re‑ranking

Instead of greedy point‑wise ranking, the system uses a beam‑search‑based generative re‑ranking algorithm that sequentially selects videos while considering the entire context. ListReward, a discounted cumulative gain metric, guides the search; adaptive beam size reduces computational cost without sacrificing quality.
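A minimal sketch of beam‑search re‑ranking with a position‑discounted list reward is shown below. The reward function, the discount of 0.9, the category penalty, and the fixed `beam_size` are assumptions made for illustration; the adaptive beam‑size policy is not reproduced here.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical signature: reward of showing `video` right after `prefix`.
StepReward = Callable[[Sequence[str], str], float]


def beam_search_rerank(candidates: List[str], step_reward: StepReward,
                       list_len: int, beam_size: int = 2,
                       discount: float = 0.9) -> Tuple[List[str], float]:
    """Return the best sequence found and its discounted cumulative reward."""
    beams: List[Tuple[List[str], float]] = [([], 0.0)]
    for position in range(min(list_len, len(candidates))):
        expanded = []
        for prefix, total in beams:
            for video in candidates:
                if video in prefix:
                    continue  # each video appears at most once in a list
                gain = (discount ** position) * step_reward(prefix, video)
                expanded.append((prefix + [video], total + gain))
        # Keep only the highest-reward partial lists for the next step.
        expanded.sort(key=lambda beam: beam[1], reverse=True)
        beams = expanded[:beam_size]
    return beams[0]


# Toy data: two videos share a category, so showing them back to back is penalized.
base_score: Dict[str, float] = {"a": 0.9, "b": 0.8, "c": 0.7}
category: Dict[str, str] = {"a": "pets", "b": "pets", "c": "food"}


def toy_reward(prefix: Sequence[str], video: str) -> float:
    repeat = bool(prefix) and category[prefix[-1]] == category[video]
    return base_score[video] - (0.5 if repeat else 0.0)


order, reward = beam_search_rerank(list(base_score), toy_reward, list_len=3)
greedy_pointwise = sorted(base_score, key=base_score.get, reverse=True)
```

With these toy numbers, point‑wise sorting yields a, b, c, while the beam search yields a, c, b because the reward of each position depends on what was shown before it.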

Feature Engineering

Features are divided into three groups: (1) server‑side ranking score (pXTR), (2) static video attributes (category, duration), and (3) client‑side signals (user feedback, network speed, preload time). Cross‑features such as the difference between candidate and watched video pXTR, exposure time gap, and exposure position gap further enhance real‑time relevance.
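The cross‑features named above could be computed along these lines. The field names, the units, and the choice to emit one feature row per watched video are assumptions for illustration rather than the production feature schema.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class VideoEvent:
    pxtr: float               # server-side ranking score for this video
    exposure_time: float      # seconds since session start (assumed unit)
    exposure_position: int    # index of the video in the feed


def cross_features(candidate: VideoEvent,
                   watched: List[VideoEvent]) -> List[Dict[str, float]]:
    """Build one row of candidate-minus-watched differences per watched video."""
    return [
        {
            "pxtr_diff": candidate.pxtr - w.pxtr,
            "exposure_time_gap": candidate.exposure_time - w.exposure_time,
            "exposure_position_gap": float(candidate.exposure_position - w.exposure_position),
        }
        for w in watched
    ]


candidate = VideoEvent(pxtr=0.5, exposure_time=120.0, exposure_position=7)
watched = [
    VideoEvent(pxtr=0.25, exposure_time=100.0, exposure_position=5),
    VideoEvent(pxtr=0.75, exposure_time=110.0, exposure_position=6),
]
rows = cross_features(candidate, watched)
```

Each row relates the candidate to one item in the real‑time feedback sequence, so a model can tell how recent and how comparable that watched video is.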

Experiments

Offline comparisons on a production dataset show that the lightweight on‑device model outperforms simple DNN baselines and EdgeRec, especially when combined with server‑side scores and engineered features. Ablation studies confirm the importance of client‑side features (CSF), feature engineering (FE), and real‑time feedback sequences (RTS). Online A/B tests in the Kuaishou app demonstrate consistent lifts in key metrics for both point‑wise and context‑aware generative re‑ranking, with only modest increases in CPU and memory usage.

Conclusion

The on‑device re‑ranking system shortens the delay between a user's feedback and the recommendations that reflect it, which improves user experience. The lightweight model and adaptive beam search make this practical for large‑scale short‑video platforms.
