Artificial Intelligence · 13 min read

Applying Deep Learning to Sogou Mobile Search Advertising: Multi‑Model Fusion for CTR Prediction

This article presents how deep learning techniques are applied to Sogou's mobile search advertising, detailing the system architecture, feature design, multi‑model fusion strategies, engineering implementation, evaluation metrics, and future directions for improving CTR prediction performance.

Ctrip Technology

Search engine advertising is a major revenue source, and traditional shallow models can no longer meet market demands; this article explains how deep learning is leveraged in Sogou's wireless (mobile) search ads.

The basic ad serving pipeline consists of a Bidding Server, Retriever Server, and Quality Server, where each stage can benefit from deep learning, especially in click‑through‑rate (CTR) estimation.

The CTR prediction workflow extracts raw click and query logs, builds features (query, ad, and match features), trains both linear (e.g., Logistic Regression) and non‑linear models (e.g., DNN, GBDT), and serves predictions online.
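As a rough illustration of the linear side of this workflow (not Sogou's actual code; all function names are hypothetical), the sketch below turns hashed sparse features into a logistic-regression CTR model trained by stochastic gradient descent:

```python
import math

def train_lr(samples, dim=2**18, lr=0.1, epochs=5):
    """Train a logistic-regression CTR model on hashed sparse features
    via SGD. `samples` is a list of (feature_strings, label) pairs."""
    w = [0.0] * dim
    for _ in range(epochs):
        for feats, label in samples:
            idxs = [hash(f) % dim for f in feats]
            p = 1.0 / (1.0 + math.exp(-sum(w[i] for i in idxs)))
            g = p - label  # gradient of the log loss w.r.t. the logit
            for i in idxs:
                w[i] -= lr * g
    return w

def predict(w, feats, dim=2**18):
    """Score one query-ad sample with the trained weights."""
    idxs = [hash(f) % dim for f in feats]
    return 1.0 / (1.0 + math.exp(-sum(w[i] for i in idxs)))
```

In production the non-linear models (DNN, GBDT) would be trained on the same labeled samples and their scores combined with the linear model, as discussed below.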

Feature design includes sparse discrete features (One‑Hot encoding) and dense continuous features; each has trade‑offs in sparsity, dimensionality, and model complexity.
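A minimal sketch of this feature split, assuming a hashed one-hot scheme (the feature names and the `2**20` dimensionality are illustrative, not from the source):

```python
def one_hot_index(field, value, dim=2**20):
    """Hash a (field, value) pair into an index of a fixed-size
    sparse one-hot space."""
    return hash((field, value)) % dim

def build_features(query, ad_id, historical_ctr):
    """Split one query-ad sample into sparse one-hot indices and
    dense continuous features."""
    sparse = [
        one_hot_index("query", query),
        one_hot_index("ad_id", ad_id),
        one_hot_index("match", (query, ad_id)),  # query-ad match feature
    ]
    dense = [historical_ctr]  # continuous features are used as-is
    return sparse, dense
```

The trade-off mentioned above shows up directly: the sparse side is huge but trivially cheap per sample (only a few active indices), while the dense side stays low-dimensional but needs a model that can exploit it non-linearly.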

Model categories are:

Linear models – simple, stable, handle large feature spaces but cannot capture feature interactions.

Non‑linear models – learn complex relationships but are computationally heavier.

To combine strengths, two fusion schemes are discussed: CTR bagging (averaging outputs of multiple models) and cross‑model fusion (using one model’s output as features for another). The latter was chosen for its greater improvement potential.
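The two schemes can be sketched in a few lines (a simplified illustration, not Sogou's implementation; the downstream model here is a plain logistic regression):

```python
import math

def ctr_bagging(scores):
    """CTR bagging: average the predictions of several models."""
    return sum(scores) / len(scores)

def cross_model_fusion(dense_features, upstream_score, weights, bias):
    """Cross-model fusion: the upstream model's prediction is appended
    to the feature vector of a downstream logistic-regression model,
    which can then learn how much to trust it."""
    x = dense_features + [upstream_score]
    z = sum(w * v for w, v in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))
```

Bagging can only interpolate between its inputs, whereas cross-model fusion lets the downstream model re-weight the upstream score against other features, which is why the latter offers more headroom.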

Engineering implementation introduces the concept of ModelFeature, treating each model as an abstract feature that can depend on other features, enabling configurable fusion, bagging, and cross‑model interactions while avoiding redundant computation.
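One way to picture the ModelFeature idea (a speculative sketch of the abstraction, not the actual system; the `Feature` class and `evaluate` function are invented here) is a dependency graph of features where model outputs are just more features, resolved once and cached:

```python
class Feature:
    """A feature node: a plain feature or a ModelFeature whose value
    is a model's prediction computed from its dependencies."""
    def __init__(self, name, compute, deps=()):
        self.name = name
        self.compute = compute  # fn(sample, dep_values) -> value
        self.deps = deps

def evaluate(feature, sample, cache=None):
    """Resolve dependencies recursively, caching each feature so a
    sub-model shared by several downstream models runs only once."""
    if cache is None:
        cache = {}
    if feature.name in cache:
        return cache[feature.name]
    dep_values = [evaluate(d, sample, cache) for d in feature.deps]
    value = feature.compute(sample, dep_values)
    cache[feature.name] = value
    return value
```

Under this scheme, swapping a bagging ensemble for a cross-model stack is a configuration change in the dependency graph rather than new serving code.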

Offline evaluation relies heavily on AUC, but the authors note survivorship bias and feature coverage issues that can cause discrepancies between offline metrics and online revenue.
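For reference, the AUC used offline is the probability that a randomly chosen clicked impression is scored above a randomly chosen unclicked one (ties count half); a direct, if O(n²), computation:

```python
def auc(labels, scores):
    """AUC as the probability that a random positive is ranked above
    a random negative, with ties counted as half a win."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Note that AUC is rank-based and invariant to monotone rescaling of scores, which is one reason a higher offline AUC does not automatically translate into higher online revenue when feature coverage differs between the evaluation set and live traffic.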

Parallel training is required for large‑scale DNNs; the team evaluated frameworks (Caffe, TensorFlow, MXNet) and found MXNet suitable for multi‑node, multi‑GPU training.

Future work includes exploring more business‑specific deep‑learning applications, further model fusion experiments, and addressing coverage and stability challenges in production.

machine learning · feature engineering · Deep Learning · CTR prediction · Model Fusion · Search Advertising
Written by Ctrip Technology

Official Ctrip Technology account, sharing and discussing growth.