
Optimizing Coarse Ranking Models for Short Video Recommendation: From GBDT to Dual‑Tower DNN and Cascading

This article details the practical upgrades of iQIYI's short‑video recommendation coarse‑ranking pipeline, moving from a GBDT model to a dual‑tower DNN, applying knowledge distillation, embedding compression, inference optimizations, and finally a cascade architecture to align with the fine‑ranking model while reducing resource consumption.

DataFunTalk

Industrial recommendation systems typically consist of recall, coarse-ranking, fine-ranking, and re-ranking stages, each acting as a funnel that filters a massive item pool down to a small set of candidates. This article introduces the iQIYI SuiKe basic recommendation team's practical improvements to the coarse-ranking model for short-video feeds.

Background: Traditional coarse-ranking models prioritize serving efficiency and fall into three categories: simple score truncation, LR/decision-tree based machine-learning models, and the now widely used dual-tower DNN that computes user and item embeddings via deep networks. iQIYI originally used a GBDT model built on statistical features.

Dual-Tower DNN Coarse-Ranking Model: Chosen for its balance of computational cost and effectiveness, the model contains three fully-connected layers on both the user and item sides, each outputting a 512-dimensional embedding. The feature set is heavily trimmed: the user side uses a few basic profile, context, and historical behavior features; the item side retains only video ID, uploader ID, and video tags.
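A minimal numpy sketch of the dual-tower idea follows. The article specifies three fully-connected layers per tower and a 512-dimensional output embedding; the input feature widths, hidden widths, and cosine-similarity scoring here are illustrative assumptions, not iQIYI's actual configuration.

```python
import numpy as np

def mlp(x, weights):
    """Forward pass through fully-connected layers: ReLU hidden, linear output."""
    for i, w in enumerate(weights):
        x = x @ w
        if i < len(weights) - 1:
            x = np.maximum(x, 0.0)  # ReLU on hidden layers only
    return x

rng = np.random.default_rng(0)
# Hypothetical layer widths; only the three-layer depth and the
# 512-d output come from the article.
user_weights = [rng.normal(size=s) for s in [(64, 256), (256, 256), (256, 512)]]
item_weights = [rng.normal(size=s) for s in [(32, 256), (256, 256), (256, 512)]]

user_feats = rng.normal(size=(1, 64))    # one user's concatenated features
item_feats = rng.normal(size=(100, 32))  # 100 candidate videos

u = mlp(user_feats, user_weights)        # (1, 512) user embedding
v = mlp(item_feats, item_weights)        # (100, 512) item embeddings

# Cosine similarity between the user and every candidate, then truncate
scores = (u @ v.T) / (np.linalg.norm(u) * np.linalg.norm(v, axis=1))
top10 = np.argsort(-scores[0])[:10]      # coarse-ranking keeps the head
```

The key property is that the two towers never interact until the final similarity, so item embeddings can be precomputed and cached, which the inference optimizations below exploit.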

Knowledge Distillation: To compensate for feature pruning, a teacher-student framework is employed where the fine-ranking Wide&Deep model serves as the teacher. The student loss, teacher loss, and distillation loss (MSE with a scaling hyper-parameter λ that grows with training steps) are combined to train a lightweight yet expressive coarse-ranking model.
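The student-side portion of this objective can be sketched as below. The linear ramp for λ and the binary cross-entropy student loss are assumptions (the article only says λ grows with training steps); the teacher's own loss term, which trains the teacher jointly, is omitted for brevity.

```python
import numpy as np

def distillation_loss(y_true, student_logits, teacher_logits, step, total_steps):
    """Student BCE against real labels + λ-weighted MSE against teacher logits."""
    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    p_s = sigmoid(student_logits)
    # Student loss: binary cross-entropy on the true click labels
    student_loss = -np.mean(y_true * np.log(p_s + 1e-12)
                            + (1 - y_true) * np.log(1 - p_s + 1e-12))
    # Distillation loss: MSE between student and teacher logits
    distill_loss = np.mean((student_logits - teacher_logits) ** 2)
    # λ ramps up as training progresses (linear schedule assumed here)
    lam = 0.5 * step / total_steps
    return student_loss + lam * distill_loss
```

Starting λ near zero lets the student first fit the true labels, then gradually leans on the richer teacher signal as training stabilizes.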

Embedding Parameter Optimization: Embedding parameters are optimized with the sparse solver FTRL while the other layers use AdaGrad. This reduces model size by 46.8%, substantially cuts inference memory, and nearly doubles model transmission speed.
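FTRL's benefit for embedding tables is that its L1 term drives many weights to exactly zero, so unused embedding entries need not be stored. A self-contained sketch of the per-coordinate FTRL-Proximal update (after McMahan's formulation cited in the references); the hyper-parameters are illustrative, not iQIYI's:

```python
import numpy as np

class FTRL:
    """Per-coordinate FTRL-Proximal; L1 drives many embedding weights
    to exactly zero, shrinking the stored model."""
    def __init__(self, dim, alpha=0.1, beta=1.0, l1=1.0, l2=1.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = np.zeros(dim)  # accumulated adjusted gradients
        self.n = np.zeros(dim)  # accumulated squared gradients

    def weights(self):
        w = np.zeros_like(self.z)
        active = np.abs(self.z) > self.l1   # coords under the L1 threshold stay 0
        denom = (self.beta + np.sqrt(self.n[active])) / self.alpha + self.l2
        w[active] = -(self.z[active] - np.sign(self.z[active]) * self.l1) / denom
        return w

    def update(self, grad):
        w = self.weights()
        sigma = (np.sqrt(self.n + grad ** 2) - np.sqrt(self.n)) / self.alpha
        self.z += grad - sigma * w
        self.n += grad ** 2

rng = np.random.default_rng(1)
opt = FTRL(dim=1000)
for _ in range(50):
    opt.update(rng.normal(scale=0.5, size=1000))
w = opt.weights()
sparsity = np.mean(w == 0.0)  # fraction of weights stored as exact zeros
```

Dense hidden layers see every example and gain nothing from this sparsity, which is why AdaGrad is kept for them.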

Online Inference Optimizations: User-side embedding computation is de-duplicated so the user tower runs once per request rather than once per user-candidate pair, reducing p99 latency by ~19 ms. Video-side embeddings are cached for high-frequency items, further accelerating scoring.
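Both optimizations fall out of the dual-tower separation and can be sketched in a few lines. The helper names (`compute_item_emb`, `score_request`) and the toy 8-dimensional embedding are hypothetical stand-ins for the real tower forward passes:

```python
from functools import lru_cache

def compute_item_emb(video_id):
    """Stand-in for the item tower; returns a deterministic toy 8-d embedding."""
    return [hash((video_id, i)) % 97 / 97.0 for i in range(8)]

@lru_cache(maxsize=100_000)            # cache embeddings of high-frequency videos
def cached_item_emb(video_id):
    return tuple(compute_item_emb(video_id))

def score_request(user_feats, candidate_ids, user_net):
    u = user_net(user_feats)           # user tower runs ONCE per request
    return {vid: sum(a * b for a, b in zip(u, cached_item_emb(vid)))
            for vid in candidate_ids}
```

With hundreds of candidates per request, collapsing the per-pair user forward passes into one dominates the latency saving; the item cache then removes most of the remaining tower computation for popular videos.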

These optimizations allowed the dual‑tower DNN coarse‑ranking model to match the previous GBDT performance while delivering significant gains in user engagement metrics for both the iQIYI hotspot channel and the SuiKe homepage feed.

Cascade Model: To keep coarse-ranking objectives aligned with evolving fine-ranking goals, the team adopted a cascade architecture that directly learns the fine-ranking model's predictions as training targets, eliminating the distillation step and reducing training resources. This change yielded ~3% improvements in exposure-click rates and video watch metrics.
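The training objective this implies is a plain regression onto logged fine-ranking scores, rather than a label-plus-teacher combination. A minimal sketch, assuming MSE as the regression loss (the article does not specify the exact loss function):

```python
import numpy as np

def cascade_loss(coarse_scores, fine_scores):
    """Regress coarse-model outputs onto logged fine-ranking predictions."""
    return np.mean((coarse_scores - fine_scores) ** 2)

fine = np.array([0.9, 0.2, 0.05])   # fine-ranking predictions, used as targets
coarse = np.array([0.8, 0.3, 0.1])  # coarse-model outputs for the same items
loss = cascade_loss(coarse, fine)
```

Because the target is the downstream model's own output, the coarse stage tracks any change in fine-ranking objectives automatically, with no separate teacher network to maintain.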

Future Plans: Continue exploring next-generation coarse-ranking systems (COLD), expand recall volume with more features, refine user/item embedding similarity calculations (potentially replacing cosine similarity with a shallow network), and further improve online performance.

References:

1. Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts, KDD 2018. https://www.kdd.org/kdd2018/accepted-papers/view/modeling-task-relationships-in-multi-task-learning-with-multi-gate-mixture-
2. H. B. McMahan, Follow-the-regularized-leader and mirror descent: Equivalence theorems and L1 regularization, AISTATS 2011.
3. G. Hinton, O. Vinyals, J. Dean, Distilling the Knowledge in a Neural Network. https://arxiv.org/abs/1503.02531
4. COLD: Towards the Next Generation of Pre-Ranking System. https://arxiv.org/abs/2007.16122

Tags: recommendation system, coarse ranking, knowledge distillation, dual-tower DNN, embedding optimization, cascading model
Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
