
Evolution of Zhihu Search Ranking Models: From GBDT to DNN, Multi‑Goal and Context‑Aware LTR

This article reviews the development of Zhihu's search system, describing the transition from early GBDT ranking to deep neural networks, the introduction of multi‑objective and position‑bias‑aware learning‑to‑rank methods, context‑aware techniques, end‑to‑end training, personalization, and future research directions.

DataFunTalk

01 Zhihu Search Development History

Zhihu is a large Chinese Q&A community with over 40 million questions and more than 200 million answers, a rich store of knowledge for its users. Search is the primary way users retrieve this content, and the result format has expanded from text to include videos.

02 Ranking Algorithm Iterative Upgrades

1. Upgrade from GBDT to DNN

In August 2019 the ranking model was switched from GBDT to a deep neural network (DNN) to handle larger data volumes and to enable more complex architectures, which also allowed the adoption of newer research results such as multi‑goal ranking. The online performance improved after the upgrade.

2. DNN Ranking Model Structure

The DNN ranking model is built on Google’s open‑source TF‑Ranking framework, which provides common loss functions (e.g., pairwise loss, NDCG) and metrics. The code is modularized into feature input, feature transformation, model scoring, and loss calculation components.
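As an illustration of the pairwise losses such a framework supplies, here is a minimal numpy sketch of a pairwise logistic loss (the function name and implementation are ours, not TF-Ranking's actual API):

```python
import numpy as np

def pairwise_logistic_loss(labels, scores):
    """Pairwise logistic loss as found in LTR frameworks such as
    TF-Ranking: for every pair (i, j) with labels[i] > labels[j],
    penalize log(1 + exp(-(scores[i] - scores[j])))."""
    labels = np.asarray(labels, dtype=float)
    scores = np.asarray(scores, dtype=float)
    loss, pairs = 0.0, 0
    for i in range(len(labels)):
        for j in range(len(labels)):
            if labels[i] > labels[j]:
                loss += np.log1p(np.exp(-(scores[i] - scores[j])))
                pairs += 1
    return loss / max(pairs, 1)
```

Scoring the relevant document above the irrelevant one yields a small loss; inverting the order yields a large one, which is exactly the gradient signal the ranker trains on.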

3. Multi‑Goal Ranking

Initially the ranking model optimized only click‑through rate, but clicks alone do not reflect user satisfaction. Additional post‑click behaviors (reading time, likes, shares, etc.) were incorporated as extra objectives, combined with CTR to form a multi‑goal loss that improved relevance.

① Hard‑Sharing First Version – shared bottom layers across tasks with task‑specific top layers; each task used its own loss weight determined by A/B testing.

② MMOE Second Version – replaced the shared layer with multiple experts weighted per task, yielding further gains after deployment.
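The MMOE forward pass can be sketched in a few lines of numpy (expert and gate weights here are illustrative, not the production parameterization):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def mmoe_forward(x, expert_ws, gate_ws):
    """MMOE sketch: each expert is a small non-linear transform of the
    shared input x; each task has its own gate (a softmax over experts)
    and consumes a task-specific mixture of the expert outputs."""
    experts = np.stack([np.tanh(W @ x) for W in expert_ws])  # (n_experts, hidden)
    return [softmax(G @ x) @ experts for G in gate_ws]       # one vector per task
```

Because the gates are softmaxes, each task's representation is a convex combination of the expert outputs, letting tasks share experts to different degrees instead of sharing one rigid bottom layer.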

4. Unbiased LTR

User click logs contain position bias because users tend to click higher‑ranked results. Two mitigation strategies were explored:

Reduce the weight of top‑position samples (either by a pre‑estimated weight or by learning it during training). Both approaches failed to produce positive effects.

Model position bias explicitly: predict true click probability and position bias separately, using a shallow tower for the bias and the main model for relevance. This method significantly improved click‑through rate when applied to the click objective.
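The additive decomposition can be sketched as follows (the per-position weights below are illustrative; in practice the shallow tower learns them from position features):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative per-position bias logits: higher positions attract more
# clicks regardless of relevance, so they carry a larger bias term.
POSITION_BIAS = np.array([1.0, 0.6, 0.3, 0.0])

def train_click_logit(relevance_logit, position):
    """Training-time logit: main-model relevance plus the shallow-tower
    position-bias term, jointly fit to observed clicks."""
    return relevance_logit + POSITION_BIAS[position]

def serve_score(relevance_logit):
    """Serving-time score: the bias tower is dropped, so ranking depends
    only on the debiased relevance estimate."""
    return sigmoid(relevance_logit)
```

Because the bias term explains away position-driven clicks during training, the relevance logit that remains at serving time is cleaner than one fit directly to raw click logs.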

5. Context‑Aware LTR

Three learning paradigms are discussed: point‑wise (independent scoring), pair‑wise (relative ordering), and list‑wise (optimizing the whole list). List‑wise loss consistently outperforms the others. Simple context features such as the average or median of list‑level statistics can be added to improve performance.
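A minimal sketch of a list-wise objective, in the ListNet top-one style (one of several list-wise losses; the production loss may differ):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def listwise_softmax_loss(labels, scores):
    """List-wise softmax cross-entropy (ListNet-style top-one loss):
    compare the model's score distribution over the whole list with the
    distribution implied by the graded relevance labels."""
    p_label = softmax(np.asarray(labels, dtype=float))
    p_score = softmax(np.asarray(scores, dtype=float))
    return float(-np.sum(p_label * np.log(p_score)))
```

Unlike a point-wise loss, every document's score enters the same softmax, so the gradient for one document depends on the rest of the list.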

An SE‑Block‑inspired architecture integrates list‑level information into per‑document scoring, but care must be taken that the candidate lists used in offline training match those seen at online inference; otherwise the list‑level statistics shift between the two settings.
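The SE-style list gating can be sketched as below (a SERank-flavored illustration; the weights and layer sizes are ours):

```python
import numpy as np

def se_block_list(doc_feats, w1, w2):
    """Squeeze-and-excitation over a candidate list: squeeze = mean-pool
    each feature across the documents in the list; excitation = two small
    layers emitting per-feature gates in (0, 1); every document is then
    rescaled by the same list-level gates, injecting list context into
    per-document scoring."""
    squeeze = doc_feats.mean(axis=0)              # (n_features,)
    hidden = np.maximum(w1 @ squeeze, 0.0)        # ReLU
    gates = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))  # sigmoid, (n_features,)
    return doc_feats * gates                      # broadcast over documents
```

Because the gates are computed from list-level pooling, the same document gets different rescaled features in different candidate lists, which is precisely why the training and serving lists must be drawn consistently.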

6. End‑to‑End Learning

Earlier models used a separate relevance model to generate features. Attempts to incorporate BERT directly into the LTR pipeline (first loading only the first three Transformer layers, then later adding BERT embeddings with a dense fine‑tuning layer) eventually yielded modest online gains.
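The second attempt can be pictured as follows (a sketch under our own naming; all weights are illustrative): a precomputed BERT [CLS] embedding is concatenated with the existing ranking features and passed through a small dense layer fine-tuned with the LTR objective.

```python
import numpy as np

def score_with_bert_feature(bert_cls, ltr_feats, w_dense, w_out):
    """End-to-end sketch: concatenate a precomputed BERT [CLS] embedding
    with the hand-crafted ranking features, then score through a small
    dense layer that is trained jointly with the ranking loss."""
    x = np.concatenate([bert_cls, ltr_feats])
    hidden = np.tanh(w_dense @ x)
    return float(w_out @ hidden)
```

Keeping BERT frozen and fine-tuning only the dense layer keeps serving cost manageable while still letting the ranking objective reshape how the embedding is used.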

7. Personalized LTR

Short‑term user behavior (search queries and clicked titles within the last 30 minutes) is encoded with BERT and combined with the current query embedding via inner product to produce a matching score. This simple feature delivered strong online improvements, while more complex interaction structures performed worse.
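The matching feature reduces to a mean-pool plus inner product, sketched here (function name is ours):

```python
import numpy as np

def behavior_match_score(query_emb, behavior_embs):
    """Short-term personalization sketch: average the BERT embeddings of
    the user's recent queries and clicked titles, then take the inner
    product with the current query embedding as one matching feature."""
    if len(behavior_embs) == 0:
        return 0.0  # no recent behavior -> neutral signal
    user_vec = np.mean(behavior_embs, axis=0)
    return float(np.dot(query_emb, user_vec))
```

A query aligned with recent behavior scores high, an unrelated one scores near zero; that single scalar is what the ranker consumes.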

03 Some Unreleased Experiments

1. GBDT Feature Encoding Model

Inspired by Facebook, a GBDT model was trained on click data to generate leaf‑node IDs as additional features for the ranking model, but no online benefit was observed.
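The encoding itself works like this (a sketch of the Facebook-style scheme; the helper name is ours): each tree routes a sample to one leaf, and the concatenated one-hot leaf indicators become sparse input features.

```python
import numpy as np

def leaf_one_hot(leaf_ids, leaves_per_tree):
    """GBDT feature encoding sketch: one-hot encode the leaf index each
    tree assigns to a sample, then concatenate the per-tree vectors into
    one sparse feature vector for the downstream ranking model."""
    vec = np.zeros(sum(leaves_per_tree))
    offset = 0
    for leaf, n_leaves in zip(leaf_ids, leaves_per_tree):
        vec[offset + leaf] = 1.0
        offset += n_leaves
    return vec
```

Exactly one slot per tree is hot, so a forest of N trees contributes N active features per sample.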

2. IRGAN – Generative Adversarial Ranking

A GAN‑based approach generated hard negative samples to improve the discriminator, yet training diverged after a few steps and did not achieve satisfactory performance.

04 Future Directions

Online learning – increasing model update frequency (daily updates already help; real‑time updates could further improve freshness).

Graph embedding – constructing a query‑document interaction graph to extract richer relational signals.

Personalization – leveraging long‑term user interest profiles to tailor results when query intent is broad.

Edge computing – performing lightweight, on‑device adjustments using fine‑grained user interactions to boost ranking quality.

References

https://github.com/tensorflow/ranking

Ma, Jiaqi, et al. "Modeling task relationships in multi‑task learning with multi‑gate mixture‑of‑experts." KDD 2018.

Zhao, Zhe, et al. "Recommending what video to watch next: a multitask ranking system." RecSys 2019.

Hu, Jie, Li Shen, and Gang Sun. "Squeeze‑and‑excitation networks." CVPR 2018.

Wang, RuiXing, et al. "SERank: Optimize Sequencewise Learning to Rank Using Squeeze‑and‑Excitation Network." arXiv 2020.

He, Xinran, et al. "Practical lessons from predicting clicks on ads at Facebook." ADKDD 2014.

Wang, Jun, et al. "IRGAN: A minimax game for unifying generative and discriminative information retrieval models." SIGIR 2017.

Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
