Artificial Intelligence

Adversarial Gradient Driven Exploration for Deep Click-Through Rate Prediction

The authors introduce AGE, an adversarial‑gradient‑driven exploration framework that injects uncertainty‑scaled perturbations into ad embeddings to approximate the downstream learning effect of exploration. Combining Monte‑Carlo dropout uncertainty estimation with a dynamic gating unit, AGE achieves up to a 15.3% relative offline gain and a 6.4% online CTR improvement over strong baselines.

Alimama Tech

This work tackles click‑through rate (CTR) prediction in large‑scale advertising and recommendation systems, where cold‑start and data‑loop issues hinder model performance. The authors propose a novel exploration‑exploitation strategy called AGE (Adversarial Gradient driven Exploration) that simulates the downstream impact of explored samples on the model by adding adversarial perturbations to input representations.

Background: In streaming CTR models, the data used for training is generated by the deployed model itself, creating a feedback loop. New or long‑tail ads suffer from insufficient exposure, leading to high uncertainty and poor estimates. Traditional E&E methods rely on uncertainty as a proxy for potential reward (e.g., UCB, Thompson Sampling) but do not model how explored data influences subsequent learning.
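To make the contrast concrete, here is a minimal sketch of the uncertainty-as-reward idea behind UCB-style exploration that the paper argues against: the exploration bonus depends only on how little data an ad has seen, not on what the model would learn from showing it. The function names and the specific bonus formula are illustrative, not from the paper.

```python
import numpy as np

def ucb_score(pred_ctr, impressions, total_impressions, alpha=1.0):
    """Classic UCB-style exploration score: predicted CTR plus an
    uncertainty bonus that shrinks as the ad accumulates impressions.
    The bonus is a pure data-count proxy; it says nothing about how
    training on the explored sample would change the model."""
    bonus = alpha * np.sqrt(np.log(total_impressions + 1.0) / (impressions + 1.0))
    return pred_ctr + bonus

# A long-tail ad with few impressions receives a much larger bonus
# than a mature ad, even if exploring it would teach the model little.
new_ad = ucb_score(pred_ctr=0.01, impressions=5, total_impressions=100_000)
old_ad = ucb_score(pred_ctr=0.02, impressions=50_000, total_impressions=100_000)
```

Thompson Sampling replaces the deterministic bonus with a posterior sample, but shares the same limitation the authors highlight: neither method models the closed-loop effect of the explored data on subsequent learning.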

Method: AGE introduces a Pseudo‑Exploration module that injects a fixed‑norm adversarial gradient—scaled by the model’s predicted uncertainty—into the input embedding, thereby approximating the change in the model’s score after training on the explored sample. A Dynamic Gating Unit (DGU) filters low‑value ads by comparing the predicted CTR with an estimated average CTR, preventing wasteful exploration. The overall architecture combines MC‑Dropout for uncertainty estimation, the fast gradient method (FGM) or projected gradient descent (PGD) for adversarial gradients, and the gating mechanism.
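The pseudo-exploration step can be sketched on a toy logistic scorer: perturb the ad embedding along the gradient direction that increases the click probability, with a fixed-norm FGM-style step scaled by the model's uncertainty, then re-score. This is a simplified illustration of the idea under stated assumptions, not the paper's exact implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pseudo_explore_score(w, e, uncertainty, eps=0.1):
    """FGM-style pseudo-exploration sketch (illustrative; the real model
    is a deep CTR network, and model parameters are left untouched).
    w: weights of a toy logistic scorer p = sigmoid(w . e)
    e: ad embedding; uncertainty: scalar from MC-Dropout."""
    p = sigmoid(w @ e)
    # Gradient of log p(click) w.r.t. the embedding for a positive label:
    # d/de [log sigmoid(w . e)] = (1 - p) * w
    g = (1.0 - p) * w
    # Fixed-norm adversarial step, scaled by the predicted uncertainty,
    # approximating the score shift after training on the explored sample.
    delta = eps * uncertainty * g / (np.linalg.norm(g) + 1e-12)
    return sigmoid(w @ (e + delta))
```

High-uncertainty ads receive a larger perturbation and thus a larger simulated score lift, which is what makes them more likely to be explored; when uncertainty is zero the perturbed score equals the original prediction.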

Implementation details: Uncertainty is obtained via Monte‑Carlo Dropout; adversarial gradients are computed without altering the original model parameters; the DGU uses only ad‑side features to predict average CTR and decides whether to explore.
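The two implementation pieces above can be sketched as follows: MC-Dropout repeats the forward pass with random dropout masks and uses the spread of the predictions as uncertainty, while the gating unit compares the predicted CTR against an average-CTR estimate. Both functions are toy illustrations with assumed names and an assumed threshold rule consistent with the description, not the production code.

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_dropout_uncertainty(w, e, p_drop=0.5, T=50):
    """MC-Dropout sketch on a toy logistic scorer: keep dropout active at
    inference, run T stochastic forward passes over the embedding, and
    report (mean prediction, std as the uncertainty estimate)."""
    preds = []
    for _ in range(T):
        # Inverted dropout mask on the embedding, rescaled to keep the
        # expected activation unchanged.
        mask = (rng.random(e.shape) > p_drop) / (1.0 - p_drop)
        preds.append(1.0 / (1.0 + np.exp(-(w @ (e * mask)))))
    return float(np.mean(preds)), float(np.std(preds))

def dgu_gate(pred_ctr, avg_ctr_estimate):
    """Dynamic Gating Unit sketch: explore only when the predicted CTR is
    at least the estimated average CTR from ad-side features (a
    hypothesized decision rule matching the description above)."""
    return pred_ctr >= avg_ctr_estimate
```

An ad whose predicted CTR falls below the estimated average is filtered out before any exploration budget is spent on it.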

Experiments: Offline benchmarks on public datasets show AGE outperforms random, gradient‑based, UCB, and Thompson Sampling baselines, achieving up to a 15.3% relative gain over the best baseline and a 28% lift over a non‑exploration model. Ablation studies confirm the necessity of the pseudo‑exploration, adversarial‑gradient, and gating components. Online A/B tests on Alibaba’s display‑ad platform demonstrate significant improvements: +6.4% CTR, +3.0% page views (PV), a calibration ratio (PCOC) closer to 1, and a 5.5% increase in advertiser satisfaction (AFR).

Conclusion: By modeling the closed‑loop effect of exploration, AGE provides a more effective E&E solution for CTR prediction, validated both offline and in production, and is slated for broader deployment.

Deep Learning · CTR prediction · recommendation systems · online learning · adversarial gradient · exploration-exploitation
Written by

Alimama Tech

Official Alimama tech channel, showcasing all of Alimama's technical innovations.
