Short Video Recommendation System Design: Baidu Haokan Video Practice
The article outlines Baidu Haokan’s short‑video recommendation architecture, describing how a unified ranking pipeline uses user‑interest signals, multi‑objective MMOE and deep‑fusion models, and long‑term value estimation to balance personalized user experience, creator exposure, and advertiser goals across billions of daily video plays.
As of June 2020, the number of short-video users in China had reached 820 million, and they spend nearly 2 hours a day watching short videos. Baidu's short-video platform, Haokan Video, has an average usage time of 70 minutes per user and over 3 billion video plays a day, which makes it a significant content source for Baidu Search.
1. What Problems Does a Recommendation System Solve?
A recommendation platform serves three roles: users, creators, and advertisers. The system addresses the experience of users and creators. For users, it aims to optimize the experience and meet highly personalized consumption needs through rich user profiles and awareness of the user's environment. For creators, it ensures that high-quality content receives more distribution, which helps retain creators and puts content meritocracy into practice.
2. Full Picture of Video Recommendation System
After creators upload content, it enters a unified ranking pipeline. When a user opens the app, the system recalls (retrieves) all relevant content, both explicit and implicit, following the principle of "Make Everything Happen." The candidates then pass through three funnels (rough ranking, fine ranking, and fusion) that select the content the user is most likely to prefer, and a final mechanism-control stage adjusts the results before they are presented.
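The funnel described above can be sketched as a sequence of progressively narrower sorts. This is a minimal illustration, not Haokan's implementation: the `Candidate` fields, the stage sizes, and the "at most one video per creator" mechanism-control rule are all hypothetical assumptions chosen only to make each stage visible.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    video_id: str
    coarse_score: float   # cheap score used by the rough ranking stage (hypothetical)
    fine_score: float     # stronger model score used by the fine ranking stage (hypothetical)
    fusion_score: float   # multi-objective fused score (hypothetical)
    creator_id: str


def recommend(candidates, coarse_k=100, fine_k=20, final_k=5, max_per_creator=1):
    # Funnel 1: rough ranking keeps the top coarse_k recalled items.
    stage = sorted(candidates, key=lambda c: c.coarse_score, reverse=True)[:coarse_k]
    # Funnel 2: fine ranking narrows the set to fine_k items.
    stage = sorted(stage, key=lambda c: c.fine_score, reverse=True)[:fine_k]
    # Funnel 3: fusion orders the survivors by the fused multi-objective score.
    stage = sorted(stage, key=lambda c: c.fusion_score, reverse=True)
    # Mechanism control (hypothetical rule): cap how many videos one creator gets.
    shown, per_creator = [], {}
    for c in stage:
        if per_creator.get(c.creator_id, 0) < max_per_creator:
            shown.append(c)
            per_creator[c.creator_id] = per_creator.get(c.creator_id, 0) + 1
        if len(shown) == final_k:
            break
    return [c.video_id for c in shown]
```

Note that an item with a high fusion score can still be filtered out early if the rough ranker scores it low, which is the usual trade-off of a funnel: cheap stages save compute at the cost of some recall.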
3. User Interest Modeling in Video Product Interaction
With auto-play interaction (similar to watching TV), users swipe to the next video when they dislike the current one. The system therefore defines "harm" as swiping away quickly and "satisfaction" as watching longer or finishing a video, and it models user interest with three levels of signals: harm, watch duration, and completion rate. In addition, explicit interactions such as follows, likes, and favorites are categorized with a four-quadrant model and integrated into the recommendation system.
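The three signal levels can be derived from a single watch event. The sketch below is a hypothetical illustration: the 3-second `HARM_SECONDS` threshold and the `WatchEvent` structure are assumptions, not values from the article. Completion rate is capped at 1 because auto-play videos can loop.

```python
from dataclasses import dataclass

HARM_SECONDS = 3.0  # hypothetical threshold for a "quick swipe"


@dataclass
class WatchEvent:
    watch_seconds: float   # how long the user stayed on the video
    video_seconds: float   # length of the video


def interest_signals(e: WatchEvent, harm_seconds: float = HARM_SECONDS) -> dict:
    completion = min(e.watch_seconds / e.video_seconds, 1.0) if e.video_seconds > 0 else 0.0
    return {
        "harm": e.watch_seconds < harm_seconds,  # swiped away quickly
        "duration": e.watch_seconds,             # watch-time signal
        "completion_rate": completion,           # fraction watched, capped at 1
        "completed": completion >= 1.0,          # finished (or looped) the video
    }
```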
4. Multi-Objective Ranking Application
Multi-Objective Modeling: The model evolved from a basic shared-bottom DNN to MMOE (Multi-gate Mixture-of-Experts) and finally to a population-based MMOE. Experts for low-, medium-, and high-activity users are trained separately and make decisions jointly, which prevents samples from the high-activity population from dominating the whole model.
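A minimal forward pass of MMOE looks like the following: shared experts feed every task, and each task has its own softmax gate over the experts. The population routing here is a hypothetical assumption, since the article does not describe the exact mechanism: each expert is tagged "shared" or with an activity group, and the gate masks out experts that belong to other groups. Training is omitted, and all weights are toy values.

```python
import math


def relu_layer(weights, x):
    # One expert: a single fully connected ReLU layer; `weights` is a list of rows.
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]


def masked_softmax(logits, allowed):
    # Softmax over allowed experts only; masked experts receive weight 0.
    m = max(logit for logit, ok in zip(logits, allowed) if ok)
    exps = [math.exp(logit - m) if ok else 0.0 for logit, ok in zip(logits, allowed)]
    total = sum(exps)
    return [e / total for e in exps]


def mmoe_forward(x, experts, expert_groups, gates, towers, user_group):
    """Return one predicted probability per task.

    experts:       list of expert weight matrices (hidden_dim x input_dim each)
    expert_groups: tag per expert, "shared" or an activity group such as "low"
    gates:         task -> gate matrix (one row of input_dim weights per expert)
    towers:        task -> tower weights (hidden_dim)
    user_group:    this user's activity group (hypothetical routing rule)
    """
    expert_outs = [relu_layer(w, x) for w in experts]
    # Hypothetical population routing: shared experts plus the user's own group.
    allowed = [g in ("shared", user_group) for g in expert_groups]
    hidden_dim = len(expert_outs[0])
    preds = {}
    for task, gate_w in gates.items():
        logits = [sum(w * xi for w, xi in zip(row, x)) for row in gate_w]
        weights = masked_softmax(logits, allowed)  # task-specific gate
        mixed = [sum(weights[e] * expert_outs[e][k] for e in range(len(experts)))
                 for k in range(hidden_dim)]
        z = sum(w * h for w, h in zip(towers[task], mixed))
        preds[task] = 1.0 / (1.0 + math.exp(-z))  # sigmoid tower per task
    return preds
```

With this masking, samples from high-activity users only update the shared experts and the high-activity experts, which is one way to keep a large population from overwhelming the parameters used for smaller ones.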
Multi-Objective Fusion Ranking: Fusion evolved from simple polynomial fusion, which is basic but requires frequent manual adjustment, to deepES, a scenario-personalized fusion approach. deepES obtains multiple parameter combinations by perturbing the model's internal parameters and selects the best combination according to a designed reward. It also introduces features such as device type, user state, and refresh rhythm.
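The idea of "perturb the parameters, keep the combination with the best reward" can be sketched with a simple elitist evolution-strategy loop over the weights of a fusion formula. Everything here is a hypothetical assumption: the multiplicative form `prod(score_i ** weight_i)` stands in for polynomial fusion, and the reward (share of item pairs whose fused order matches an observed value) is a toy. The sketch also omits the context features (device type, state, refresh rhythm) that make deepES scenario-personalized.

```python
import random
from itertools import combinations


def fuse(scores, weights):
    # Hypothetical multiplicative fusion: prod(score_i ** weight_i), scores > 0.
    total = 1.0
    for name, s in scores.items():
        total *= s ** weights[name]
    return total


def pairwise_reward(items, weights):
    # Toy reward: fraction of item pairs whose fused order agrees with "observed".
    pairs = list(combinations(items, 2))
    agree = 0
    for a, b in pairs:
        fa, fb = fuse(a["scores"], weights), fuse(b["scores"], weights)
        if (fa - fb) * (a["observed"] - b["observed"]) > 0:
            agree += 1
    return agree / len(pairs)


def perturbation_search(weights, reward_fn, sigma=0.2, population=16, rounds=20, seed=0):
    rng = random.Random(seed)
    best, best_r = dict(weights), reward_fn(weights)
    for _ in range(rounds):
        # Sample a population of Gaussian perturbations around the current best.
        cands = [{k: v + rng.gauss(0.0, sigma) for k, v in best.items()}
                 for _ in range(population)]
        r, c = max(((reward_fn(c), c) for c in cands), key=lambda t: t[0])
        if r > best_r:  # elitist: only accept an improvement
            best, best_r = c, r
    return best, best_r
```

Because the search only needs reward evaluations, the reward can be any online or offline business metric, including non-differentiable ones, which is a common reason to prefer evolution strategies over gradient-based tuning for fusion weights.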
5. Long-Term Value (LTV) Recommendation System
Current recommendation targets are based on immediate video consumption. Over a longer time span, however, a user's consumption covers the past, the present, and the future. Past content serves as training samples and as user-interest features. Future consumption can be seen as a "continuation" of current interest, because current interest "stimulates" future interest. Attributing the value of that future consumption back to the current video gives the video's Long-Term Value (LTV).
LTV design involves two steps: 1) find the later content related to the current video and design decay factors; 2) fit LTV with a model.
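Step 1 can be illustrated as building a training label per video: its own immediate value plus the decayed value of later, related videos in the same session. The relatedness predicate, the geometric decay factor, and the horizon below are hypothetical assumptions; the article does not specify them. Step 2 would then use these labels as regression targets for a model.

```python
def ltv_labels(values, related, decay=0.9, horizon=5):
    """Compute LTV_t = values[t] + sum_{k=1..horizon} decay**k * values[t+k] * [related(t, t+k)].

    values[t]:     immediate value of the t-th video in a session (e.g., watch time)
    related(i, j): hypothetical predicate, True if video j's consumption is
                   attributed to video i
    """
    n = len(values)
    labels = []
    for t in range(n):
        total = values[t]
        for k in range(1, horizon + 1):
            if t + k >= n:
                break
            if related(t, t + k):
                total += (decay ** k) * values[t + k]  # decayed future credit
        labels.append(total)
    return labels
```

For example, with session values `[10, 20, 30]`, categories `["a", "a", "b"]`, "related" meaning "same category", and `decay=0.5`, the first video earns `10 + 0.5 * 20 = 20` because the second video continues its interest, while the other two keep only their immediate value.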
6. Discussion on Multi-Objective Video Recommendation
Key questions worth exploring: Are all values of a given objective equal (for example, is a user's first like worth as much as their tenth)? Is the current target globally optimal? Can we design a system or model that characterizes user retention directly?
Baidu Geek Talk