
Pre‑Ranking in Recommendation Systems: Model and Sample Optimization Practices at Zhuanzhuan Home Page

This article reviews the role of pre‑ranking in multi‑stage recommendation pipelines, compares dual‑tower and fully‑connected DNN models, discusses negative and positive sample selection strategies, and presents Zhuanzhuan's practical improvements in model architecture and traffic‑pool allocation to boost precision and diversity.

Zhuanzhuan Tech

1 Introduction to Pre‑Ranking

Modern recommendation systems adopt a multi‑stage cascade consisting of recall (match), pre‑ranking, ranking, and re‑ranking. Each stage acts as a funnel that progressively reduces the candidate set. Pre‑ranking, the second stage, quickly filters thousands of recalled items down to a few hundred, balancing quality and diversity while operating under stricter efficiency constraints than the final ranking stage.

2 Industry Work

2.1 Model Optimization

Current industrial pre‑ranking models are deep‑learning based and fall into two categories: vector‑product dual‑tower models and fully‑connected DNN models.

2.1.1 Dual‑Tower Model

The dual‑tower architecture separates user and item towers, each processing respective features through DNNs to produce embeddings that are then compared for scoring. Item embeddings are pre‑computed offline, allowing complex item‑side networks without online latency penalties. However, the independence of the towers prevents cross‑features and delays fine‑grained interaction, limiting accuracy.
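The separation described above can be sketched in a few lines. This is a minimal, untrained illustration (random weights, hypothetical dimensions), not a production model: each tower is a small MLP producing L2-normalized embeddings, item embeddings are computed in a batch as they would be offline, and the pre-ranking score is a simple inner product.

```python
import numpy as np

rng = np.random.default_rng(0)

def tower(x, weights):
    """A minimal MLP tower: one hidden ReLU layer, then L2-normalize the output."""
    h = np.maximum(x @ weights[0], 0.0)
    emb = h @ weights[1]
    return emb / np.linalg.norm(emb, axis=-1, keepdims=True)

# Hypothetical dimensions: 16-dim raw features -> 8-dim embeddings.
user_w = [rng.normal(size=(16, 32)), rng.normal(size=(32, 8))]
item_w = [rng.normal(size=(16, 32)), rng.normal(size=(32, 8))]

user_emb = tower(rng.normal(size=(1, 16)), user_w)     # computed online per request
item_embs = tower(rng.normal(size=(500, 16)), item_w)  # precomputed offline

# Pre-ranking score: inner product between the user and every recalled item,
# then keep only the top few hundred candidates for the ranking stage.
scores = (item_embs @ user_emb.T).ravel()
top200 = np.argsort(-scores)[:200]
```

Because the towers only meet at the final inner product, no cross-feature (e.g., "user's city x item's city") can be expressed inside either network, which is exactly the interaction limitation the article describes.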

To compensate for this delayed feature interaction, SENet can be added on top of the embedding layer. SENet squeezes each feature-field embedding into a scalar statistic, passes these statistics through a small excitation network to obtain a per-field weight, and re-weights the original embeddings, amplifying important features and suppressing noisy ones.
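The squeeze-and-excitation step can be illustrated with a short sketch. The shapes and weights below are hypothetical stand-ins for trained parameters; squeeze is mean pooling per field, and excitation is a two-layer bottleneck ending in a sigmoid.

```python
import numpy as np

rng = np.random.default_rng(1)

def senet_reweight(field_embs, w1, w2):
    """SENet over feature fields: squeeze each field embedding to a scalar
    (mean pooling), excite through a two-layer bottleneck with a sigmoid,
    then re-weight each field's embedding by its learned importance."""
    z = field_embs.mean(axis=-1)          # squeeze: (num_fields,)
    a = np.maximum(z @ w1, 0.0)           # excitation: reduction layer + ReLU
    w = 1.0 / (1.0 + np.exp(-(a @ w2)))   # per-field weights in (0, 1)
    return field_embs * w[:, None]        # re-weight the original embeddings

num_fields, dim, reduced = 10, 8, 4
embs = rng.normal(size=(num_fields, dim))
w1 = rng.normal(size=(num_fields, reduced))
w2 = rng.normal(size=(reduced, num_fields))
reweighted = senet_reweight(embs, w1, w2)
```

Since each weight lies in (0, 1), noisy fields are shrunk toward zero before the tower MLP sees them, which is how SENet injects a coarse form of feature selection into each tower.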

2.1.2 Fully Connected DNN Model

Unlike the efficiency‑focused dual‑tower, fully connected DNN models aim for higher accuracy by freely using cross‑features, similar to the final ranking stage. To keep inference fast, algorithmic and engineering optimizations are required. Representative works include Alibaba's COLD and FSCD.

COLD employs feature selection and computation‑graph optimizations, using SE Block to rank feature importance, then retraining on the selected features. It also leverages heterogeneous and columnar computing and reduced‑precision GPU arithmetic, achieving higher effectiveness at the cost of lower QPS compared to dual‑tower models.

FSCD moves the feature‑selection process into the loss function by assigning a dropout probability to each feature; higher‑complexity features receive lower retention probabilities. Training proceeds in two stages: first learning embeddings, network parameters, and dropout probabilities; then fixing the top‑k valuable feature domains and fine‑tuning the model. FSCD slightly outperforms COLD online but still lags behind dual‑tower efficiency.
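The core FSCD idea, linking a feature's retention probability to its complexity, can be sketched as follows. The complexity values, the prior formula, and the cutoff below are assumptions for illustration; in FSCD the retention probabilities are learned jointly with the model rather than fixed.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical per-feature complexity (e.g., embedding size or online
# computation cost) for five feature domains.
complexity = np.array([1.0, 4.0, 0.5, 8.0, 2.0])

# Complexity-aware prior: costlier features get lower retention probability.
alpha = 0.3
retain_p = 1.0 / (1.0 + alpha * complexity)

# Stage 1 (joint training): sample a per-feature dropout mask each step,
# so expensive features are randomly dropped more often.
mask = rng.random(complexity.shape) < retain_p

# Stage 2 (fine-tuning): fix the top-k feature domains by retention
# probability and retrain only on those.
k = 3
selected = np.argsort(-retain_p)[:k]
```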

2.2 Sample Optimization

Using CTR models as an example, ranking models are typically trained on exposure data (clicked = positive, not clicked = negative). For pre‑ranking, the online input comes from recall results, so training only on exposure data creates a sample selection bias (SSB) between training and prediction distributions.

2.2.1 Negative Sample Sampling

Negative sampling aims to align training data with online distribution. Common strategies include:

Global Random Selection

Randomly pick items from the whole candidate pool (e.g., YouTube DNN). This can make negatives too easy and ignores the “rich‑get‑richer” effect where popular items dominate clicks; therefore the sampling probability is often adjusted using word2vec‑style popularity weighting.
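The word2vec-style popularity adjustment mentioned above is typically a sub-linear transform of the click (or impression) counts. The click counts and the 3/4 exponent below are illustrative; the exponent is a common choice inherited from word2vec's unigram sampling.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical click counts for a small candidate pool (head-heavy).
clicks = np.array([1000, 400, 50, 5, 1], dtype=float)

# Raising counts to the 3/4 power flattens the distribution, so the most
# popular items are sampled as negatives less often than raw counts imply,
# counteracting the "rich-get-richer" effect.
p = clicks ** 0.75
p /= p.sum()

negatives = rng.choice(len(clicks), size=10_000, p=p)
```

With raw counts, item 0 would receive about 69% of all negative samples; after smoothing its share drops to roughly 61%, giving tail items more chances to appear as negatives.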

Batch Random Selection

Within each training batch, for a given user, randomly sample other items (excluding the current positive) as negatives, effectively using other users' positives as negatives (e.g., Google dual‑tower recall).
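In-batch negative sampling is usually implemented as a softmax over the batch's score matrix. The sketch below uses random embeddings for illustration; in the Google dual-tower work the logits would additionally be corrected for item sampling bias, which is omitted here.

```python
import numpy as np

rng = np.random.default_rng(4)

batch = 6
user_embs = rng.normal(size=(batch, 8))
pos_item_embs = rng.normal(size=(batch, 8))  # row i: user i's clicked item

# Score every user against every in-batch item. The diagonal holds each
# user's own positive; off-diagonal entries are other users' positives,
# which serve as this user's negatives.
logits = user_embs @ pos_item_embs.T
labels = np.arange(batch)                    # softmax target = own positive

# Softmax cross-entropy over the batch (numerically stabilized).
logits -= logits.max(axis=1, keepdims=True)
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
loss = -np.log(probs[labels, labels]).mean()
```

A known caveat, consistent with the popularity discussion above, is that popular items appear as positives in many batches and are therefore over-sampled as negatives, which the bias-correction term in the original work addresses.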

Difficult Negative Selection

Hard negatives are items similar to positives; adding them forces the model to learn finer distinctions. Various works mine hard negatives from business logic (Airbnb), from mid‑position recall items (Facebook), or by distilling ranking results (Meituan).

2.2.2 Positive Sample Sampling

Positive samples are usually exposed items with positive feedback. When multiple scenarios exist (e.g., search vs. recommendation), cross‑scenario clicks or conversions can be transferred as additional positives. Taobao’s main search introduces “corrected” samples (turning a negative conversion into positive if it occurs in another scenario) and “supplementary” samples (adding out‑of‑scenario conversions as new positives), improving hit‑rate.
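The "corrected" versus "supplementary" distinction can be made concrete with a toy log. The user/item identifiers below are hypothetical; the point is only the two set operations on labels.

```python
# Hypothetical training labels in the target scenario, keyed by (user, item),
# plus conversions observed in another scenario (e.g., search).
target = {("u1", "i1"): 1, ("u1", "i2"): 0, ("u2", "i3"): 0}
other_scenario_clicks = {("u1", "i2"), ("u2", "i9")}

# "Corrected" samples: flip a target-scenario negative to positive when the
# same (user, item) pair converted in another scenario.
corrected = {k: (1 if k in other_scenario_clicks else v)
             for k, v in target.items()}

# "Supplementary" samples: add out-of-scenario conversions that never
# appeared in the target scenario's logs as brand-new positives.
supplemented = dict(corrected)
for pair in other_scenario_clicks:
    supplemented.setdefault(pair, 1)
```

Here ("u1", "i2") is corrected from 0 to 1, while ("u2", "i9") is supplemented as a new positive that the target scenario alone would never have produced.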

3 Zhuanzhuan Home‑Page Recommendation Pre‑Ranking Practice

The pre‑ranking pipeline consists of a model that re‑orders recalled items within each category to maximize estimated precision, and a traffic‑pool module that allocates quota across categories to balance efficiency and diversity.

3.1 Model Optimization Practice

The pre‑ranking model is a CTR model built on a dual‑tower base, refined through sample and network improvements.

Sample Optimization

Initially, positives and negatives matched the ranking stage (exposed‑click vs. exposed‑no‑click). Later, positive samples were expanded by borrowing clicks from other scenarios (similar to Taobao’s approach). Two schemes were tested: correcting exposed‑no‑click samples and supplementing unexposed samples; the latter showed larger gains. To alleviate SSB, additional unexposed random negatives were added, split into “exposed‑no‑click in other scenarios” and “never exposed” groups. Offline hit‑rate evaluation and online A/B tests confirmed that combining both negative types yields the best performance.

Network Structure Optimization

Because the model serves many categories with varying traffic and feature distributions, a multi‑scenario single‑task architecture is needed. Inspired by PEPNet, the item side incorporates EPNet for feature selection and fusion. Shared features (price, category) and category‑specific features (CPU, GPU) are weighted per‑category, strengthening important signals and weakening irrelevant ones.
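The EPNet-style per-category re-weighting can be sketched as a small gate network driven by the category embedding. All shapes and weights below are illustrative assumptions; following the PEPNet paper, the gate output is a sigmoid scaled to (0, 2) so it can both suppress and amplify features.

```python
import numpy as np

rng = np.random.default_rng(5)

def epnet_gate(shared_embs, scene_emb, w1, w2):
    """EPNet-style gate (sketch): the category/scenario embedding drives a
    small MLP whose scaled-sigmoid output rescales the shared feature
    embeddings differently for each category."""
    h = np.maximum(scene_emb @ w1, 0.0)
    gate = 2.0 / (1.0 + np.exp(-(h @ w2)))  # 2 * sigmoid, range (0, 2)
    return shared_embs * gate

num_feats, dim = 6, 8
shared = rng.normal(size=(num_feats * dim,))  # flattened feature embeddings
scene = rng.normal(size=(4,))                 # embedding of the category id
w1 = rng.normal(size=(4, 16))
w2 = rng.normal(size=(16, num_feats * dim))
personalized = epnet_gate(shared, scene, w1, w2)
```

Because the gate depends only on the category embedding, a single shared network can emphasize, say, CPU/GPU features for the laptop category while down-weighting them for books, without training separate models per category.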

3.2 Traffic‑Pool Introduction

Since the ranking stage is not truncated, the diversity of items output by pre‑ranking directly determines whether downstream re‑ranking can apply diversity strategies. The traffic‑pool guarantees category‑level diversity.

The pool intervenes in recall and pre‑ranking. In recall, it ensures sufficient recall for categories the user has previously clicked, promoting diverse recall results. In pre‑ranking, the pool first splits the scored list by category, then computes a quota for each category based on user behavior and item supply, finally merging the top items from each category to ensure all categories have exposure opportunities.
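The quota-allocation step in pre-ranking can be sketched as follows. This is assumed logic for illustration, not Zhuanzhuan's actual rules: each category's quota is proportional to a user-interest weight times its item supply, with a floor of one slot so every category retains exposure opportunity.

```python
from collections import defaultdict

def allocate(scored_items, interest, total_quota=6):
    """scored_items: list of (item_id, category, score);
    interest: hypothetical per-category user-interest weights."""
    by_cat = defaultdict(list)
    for item, cat, score in scored_items:
        by_cat[cat].append((score, item))
    # Quota weight = user interest (default small for unseen categories)
    # times item supply in that category.
    weights = {c: interest.get(c, 0.1) * len(v) for c, v in by_cat.items()}
    norm = sum(weights.values())
    merged = []
    for cat, items in by_cat.items():
        quota = max(1, round(total_quota * weights[cat] / norm))  # exposure floor
        items.sort(reverse=True)                # best-scored items first
        merged += [item for _, item in items[:quota]]
    return merged

items = [("a", "phone", 0.9), ("b", "phone", 0.8), ("c", "laptop", 0.7),
         ("d", "laptop", 0.6), ("e", "book", 0.5)]
result = allocate(items, {"phone": 1.0, "laptop": 0.5})
```

Note that the low-interest "book" category still contributes one item, which is the diversity guarantee the traffic pool exists to provide; a pure score-sorted top-k could have dropped it entirely.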

3.3 Future Plans

Future work will continue to improve both the model (exploring more advanced architectures and richer features) and the traffic‑pool (refining quota allocation to better balance efficiency and diversity).

References

[1] SENet Dual‑Tower Model in Recommendation: https://zhuanlan.zhihu.com/p/358779957

[2] COLD: Towards the Next Generation of Pre‑Ranking System: https://arxiv.org/pdf/2007.16122

[3] Towards a Better Trade‑off between Effectiveness and Efficiency in Pre‑Ranking: https://arxiv.org/pdf/2105.07706

[4] Deep Neural Networks for YouTube Recommendations: https://dl.acm.org/doi/pdf/10.1145/2959100.2959190

[5] Sampling‑bias‑corrected Neural Modeling for Large Corpus Item Recommendations: https://dl.acm.org/doi/10.1145/3298689.3346996

[6] Real‑time Personalization using Embeddings for Search Ranking at Airbnb: https://dl.acm.org/doi/abs/10.1145/3219819.3219885

[7] Embedding‑based Retrieval in Facebook Search: https://arxiv.org/pdf/2006.11632

[8] Meituan Search Pre‑Ranking Optimization: https://mp.weixin.qq.com/s/u3sw_PatpwkFC0AtkssmPA

[9] Unified Pre‑Ranking for Main Search based on Global Funnel Analysis: https://zhuanlan.zhihu.com/p/587353144

[10] PEPNet: Parameter and Embedding Personalized Network for Infusing with Personalized Prior Information: https://arxiv.org/pdf/2302.01115

Tags: model optimization, recommendation system, dual-tower, pre-ranking, sample selection bias, traffic pool
Written by

Zhuanzhuan Tech

A platform for Zhuanzhuan R&D and industry peers to learn and exchange technology, regularly sharing frontline experience and cutting‑edge topics. We welcome practical discussions and sharing; contact waterystone with any questions.
