Artificial Intelligence · 13 min read

Applying Deep Learning to Sogou Mobile Search Advertising: Multi‑Model Fusion for CTR Prediction

This article presents how deep learning techniques are applied to Sogou's mobile search advertising, detailing the system architecture, feature design, multi‑model fusion strategies, engineering implementation, evaluation metrics, and future directions for improving CTR prediction performance.

Ctrip Technology

Search engine advertising is a major revenue source, and traditional shallow models can no longer meet market demands; this article explains how deep learning is leveraged in Sogou's wireless (mobile) search ads.

The basic ad serving pipeline consists of a Bidding Server, Retriever Server, and Quality Server, where each stage can benefit from deep learning, especially in click‑through‑rate (CTR) estimation.

The CTR prediction workflow extracts raw click and query logs, builds features (query, ad, and match features), trains both linear (e.g., Logistic Regression) and non‑linear models (e.g., DNN, GBDT), and serves predictions online.
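As a rough illustration of the linear side of this workflow (not Sogou's actual code; all function names are hypothetical), the sketch below turns hashed sparse features into a logistic-regression CTR model trained by stochastic gradient descent:

```python
import math

def train_lr(samples, dim=2**18, lr=0.1, epochs=5):
    """Train a logistic-regression CTR model on hashed sparse features
    via SGD. `samples` is a list of (feature_strings, label) pairs."""
    w = [0.0] * dim
    for _ in range(epochs):
        for feats, label in samples:
            idxs = [hash(f) % dim for f in feats]
            p = 1.0 / (1.0 + math.exp(-sum(w[i] for i in idxs)))
            g = p - label  # gradient of the log loss w.r.t. the logit
            for i in idxs:
                w[i] -= lr * g
    return w

def predict(w, feats, dim=2**18):
    """Score one query-ad sample with the trained weights."""
    idxs = [hash(f) % dim for f in feats]
    return 1.0 / (1.0 + math.exp(-sum(w[i] for i in idxs)))
```

In production the non-linear models (DNN, GBDT) would be trained on the same labeled samples and their scores combined with the linear model, as discussed below.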

Feature design includes sparse discrete features (One‑Hot encoding) and dense continuous features; each has trade‑offs in sparsity, dimensionality, and model complexity.
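A minimal sketch of this feature split, assuming a hashed one-hot scheme (the feature names and the `2**20` dimensionality are illustrative, not from the source):

```python
def one_hot_index(field, value, dim=2**20):
    """Hash a (field, value) pair into an index of a fixed-size
    sparse one-hot space."""
    return hash((field, value)) % dim

def build_features(query, ad_id, historical_ctr):
    """Split one query-ad sample into sparse one-hot indices and
    dense continuous features."""
    sparse = [
        one_hot_index("query", query),
        one_hot_index("ad_id", ad_id),
        one_hot_index("match", (query, ad_id)),  # query-ad match feature
    ]
    dense = [historical_ctr]  # continuous features are used as-is
    return sparse, dense
```

The trade-off mentioned above shows up directly: the sparse side is huge but trivially cheap per sample (only a few active indices), while the dense side stays low-dimensional but needs a model that can exploit it non-linearly.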

Model categories are:

Linear models – simple, stable, handle large feature spaces but cannot capture feature interactions.

Non‑linear models – learn complex relationships but are computationally heavier.

To combine strengths, two fusion schemes are discussed: CTR bagging (averaging outputs of multiple models) and cross‑model fusion (using one model’s output as features for another). The latter was chosen for its greater improvement potential.
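The two schemes can be sketched in a few lines (a simplified illustration, not Sogou's implementation; the downstream model here is a plain logistic regression):

```python
import math

def ctr_bagging(scores):
    """CTR bagging: average the predictions of several models."""
    return sum(scores) / len(scores)

def cross_model_fusion(dense_features, upstream_score, weights, bias):
    """Cross-model fusion: the upstream model's prediction is appended
    to the feature vector of a downstream logistic-regression model,
    which can then learn how much to trust it."""
    x = dense_features + [upstream_score]
    z = sum(w * v for w, v in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))
```

Bagging can only interpolate between its inputs, whereas cross-model fusion lets the downstream model re-weight the upstream score against other features, which is why the latter offers more headroom.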

Engineering implementation introduces the concept of ModelFeature, treating each model as an abstract feature that can depend on other features, enabling configurable fusion, bagging, and cross‑model interactions while avoiding redundant computation.
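One way to picture the ModelFeature idea (a speculative sketch of the abstraction, not the actual system; the `Feature` class and `evaluate` function are invented here) is a dependency graph of features where model outputs are just more features, resolved once and cached:

```python
class Feature:
    """A feature node: a plain feature or a ModelFeature whose value
    is a model's prediction computed from its dependencies."""
    def __init__(self, name, compute, deps=()):
        self.name = name
        self.compute = compute  # fn(sample, dep_values) -> value
        self.deps = deps

def evaluate(feature, sample, cache=None):
    """Resolve dependencies recursively, caching each feature so a
    sub-model shared by several downstream models runs only once."""
    if cache is None:
        cache = {}
    if feature.name in cache:
        return cache[feature.name]
    dep_values = [evaluate(d, sample, cache) for d in feature.deps]
    value = feature.compute(sample, dep_values)
    cache[feature.name] = value
    return value
```

Under this scheme, swapping a bagging ensemble for a cross-model stack is a configuration change in the dependency graph rather than new serving code.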

Offline evaluation relies heavily on AUC, but the authors note survivorship bias and feature coverage issues that can cause discrepancies between offline metrics and online revenue.
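For reference, the AUC used offline is the probability that a randomly chosen clicked impression is scored above a randomly chosen unclicked one (ties count half); a direct, if O(n²), computation:

```python
def auc(labels, scores):
    """AUC as the probability that a random positive is ranked above
    a random negative, with ties counted as half a win."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Note that AUC is rank-based and invariant to monotone rescaling of scores, which is one reason a higher offline AUC does not automatically translate into higher online revenue when feature coverage differs between the evaluation set and live traffic.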

Parallel training is required for large‑scale DNNs; the team evaluated frameworks (Caffe, TensorFlow, MXNet) and found MXNet suitable for multi‑node, multi‑GPU training.

Future work includes exploring more business‑specific deep‑learning applications, further model fusion experiments, and addressing coverage and stability challenges in production.

machine learning · feature engineering · Deep Learning · CTR prediction · Model Fusion · Search Advertising
Written by Ctrip Technology

Official Ctrip Technology account, sharing and discussing growth.