
Deep GSP: Multi‑Objective Deep Learning Based Advertising Auction Mechanism

Deep GSP is a multi‑objective, deep‑learning ad auction mechanism that learns a rank‑score function end to end while enforcing game‑theoretic constraints (bid monotonicity, incentive compatibility, and a Nash‑equilibrium condition) together with a smooth‑transition penalty. Trained with DDPG reinforcement learning, it outperforms traditional GSP on revenue, clicks, conversions, and add‑to‑cart metrics.

Alimama Tech

Background

Online e‑commerce advertising uses auction mechanisms to allocate scarce traffic efficiently. Modern feed‑based ads must balance multiple objectives such as advertiser demand, user experience, and platform revenue, requiring mechanisms that are both revenue‑aware and stable over time.

The problem differs from classic multi‑objective optimization or auction design because it involves a dynamic game among multiple stakeholders (advertisers, users, platform) with rational agents interacting in a changing environment.

Problem Definition

We define the Multiple Stakeholders' Ad Performance Objectives Optimization problem, where a mechanism (allocation and pricing rule) must optimize a weighted aggregation of stakeholder metrics (e.g., platform revenue, clicks, conversions, add‑to‑cart, transaction volume) while satisfying two key constraints:

Game Equilibrium Constraint: Under the mechanism, all advertisers should reach a Nash equilibrium where no advertiser can improve its utility by changing its bid. This includes classic Myerson incentive‑compatibility for single‑slot auctions and symmetric Nash equilibrium (SNE) for multi‑slot auctions.

Smooth Transition Constraint: When the mechanism switches from optimizing one objective to another, the change in stakeholder metrics should be gradual, not abrupt.
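As a minimal sketch of the weighted aggregation at the heart of the objective (the metric names and weights below are illustrative, not the paper's):

```python
# Hedged sketch: combine per-auction stakeholder metrics into a single
# scalar objective. Metric names and weights are illustrative choices.

def aggregate_objective(metrics: dict, weights: dict) -> float:
    """Weighted sum of stakeholder metrics for one auction."""
    return sum(weights[k] * metrics[k] for k in weights)

# Example: platform revenue, click-through rate, and conversion rate.
metrics = {"revenue": 3.2, "ctr": 0.05, "cvr": 0.01}
weights = {"revenue": 1.0, "ctr": 10.0, "cvr": 50.0}
score = aggregate_objective(metrics, weights)
```

Shifting the weight vector is exactly the "switching objectives" event that the smooth transition constraint governs.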

THEOREM 1 (Single‑Slot Incentive Compatibility). A single‑slot auction is incentive‑compatible iff the allocation rule is monotone in the bid and the payment equals the critical bid, i.e., the minimal bid that preserves the allocation.
THEOREM 2 (Multi‑Slot Symmetric Nash Equilibrium). An auction satisfies SNE iff each bidder prefers its allocated slot to any other slot, i.e., given the slots' inherent click‑through rates and prices, no bidder can increase its utility by moving to a different slot.
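One common formalization of these two conditions, following Myerson and Varian (the notation below is an assumption of this write-up, not the paper's):

```latex
% Single-slot incentive compatibility (Myerson):
% monotone allocation plus critical-bid pricing.
x(b) \text{ non-decreasing in } b,
\qquad
p(b) = \inf\{\, b' : x(b') = x(b) \,\}.

% Symmetric Nash equilibrium for multi-slot position auctions
% (Varian): with \alpha_t the inherent CTR of slot t and p_t its
% price, the bidder assigned to slot s prefers its own slot:
\alpha_s \,(v_s - p_s) \;\ge\; \alpha_t \,(v_s - p_t)
\qquad \forall t .
```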

Deep GSP Mechanism

Deep GSP augments the traditional GSP auction with a deep neural network rank‑score function. Features such as advertiser bids, ad attributes, user demographics, and contextual signals are mapped to a continuous rank score. The mechanism must embed desirable properties into the end‑to‑end training:

Game Equilibrium Constraint

We enforce monotonicity of the rank‑score with respect to the bid (Point‑wise Monotonicity Loss, PML) and ensure the payment equals the minimal bid required to retain the allocated slot (Approximate Inverse Operation, AIO).
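A minimal sketch of both pieces, using a toy stand-in for the learned rank-score network (`rank_score`, `pml_loss`, and `critical_bid` are illustrative names, not the paper's code):

```python
# Hedged sketch of the Point-wise Monotonicity Loss (PML) idea and the
# Approximate Inverse Operation (AIO). `rank_score` is a toy scorer
# standing in for the learned deep network.

def rank_score(bid: float, quality: float) -> float:
    # Toy monotone-by-construction scorer used only for illustration.
    return bid * quality

def pml_loss(rank_fn, bid: float, quality: float, eps: float = 0.01) -> float:
    """Hinge penalty: positive exactly when raising the bid by `eps`
    lowers the rank score (a monotonicity violation)."""
    return max(0.0, rank_fn(bid, quality) - rank_fn(bid + eps, quality))

def critical_bid(rank_fn, quality: float, threshold: float,
                 lo: float = 0.0, hi: float = 100.0, iters: int = 60) -> float:
    """AIO-style inversion: binary-search the minimal bid whose rank
    score still reaches `threshold`, i.e., still retains the slot.
    Requires rank_fn to be monotone in the bid (which PML enforces)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if rank_fn(mid, quality) >= threshold:
            hi = mid
        else:
            lo = mid
    return hi
```

Monotonicity is what makes the inversion well defined: with a monotone rank score, the set of bids that keep the slot is an interval, and its lower endpoint is the payment.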

Smooth Transition Constraint

The reward combines a weighted sum of multi‑objective metrics and a penalty measuring utility volatility, enabling smooth adaptation when objectives change.
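A sketch of this reward shape, assuming an absolute-change volatility penalty with weight `lam` (both the penalty form and the weight are assumptions of this illustration):

```python
# Hedged sketch: reward = weighted multi-objective score minus a
# penalty on how sharply per-advertiser utilities move between
# consecutive rounds. The L1 penalty form and `lam` are assumptions.

def smooth_reward(objective: float, utils_now, utils_prev,
                  lam: float = 0.5) -> float:
    """Penalize volatility in advertiser utilities across rounds."""
    volatility = sum(abs(a - b) for a, b in zip(utils_now, utils_prev))
    return objective - lam * volatility
```

When the platform switches objectives, a policy that jerks advertiser utilities around pays the penalty, so the learned mechanism prefers gradual adjustments.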

Implementation

The problem is cast as a reinforcement learning (RL) task. State includes bids, ad features, user features, and context; action is the rank‑score; reward is the multi‑objective weighted aggregation; and the transition is episodic (episode length = 1). We employ Deep Deterministic Policy Gradient (DDPG) with a policy network (the rank‑score model) and a value network to evaluate state‑action pairs.
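A toy numpy sketch of this setup, with a linear actor and critic and a made-up reward standing in for the real networks and metrics. Because the episode length is 1, every transition is terminal, so the critic target is simply the observed reward and no target networks or bootstrapping are needed:

```python
import numpy as np

# Hedged DDPG-style sketch for episodes of length 1. The linear
# actor/critic and the toy reward are illustrative assumptions.
rng = np.random.default_rng(0)
dim = 4                                    # toy state dimensionality
w_actor = rng.normal(size=dim)             # policy: a = w_actor . s
w_critic = rng.normal(size=dim + 1)        # Q(s, a) = w_critic . [s, a]
lr = 0.005

def actor(s):
    return float(w_actor @ s)

def critic(s, a):
    return float(w_critic @ np.append(s, a))

for _ in range(400):
    s = rng.normal(size=dim)               # bids, ad/user/context features
    a = actor(s) + rng.normal(scale=0.1)   # rank score + exploration noise
    r = -(a - s.sum()) ** 2                # toy reward, peaked at a = sum(s)
    # Critic step: regress Q(s, a) toward the terminal target r.
    td = critic(s, a) - r
    w_critic -= lr * td * np.append(s, a)
    # Actor step: ascend dQ/da; for a linear critic, dQ/da = w_critic[-1].
    w_actor += lr * w_critic[-1] * s
```

The actor update is the deterministic policy gradient: the chain rule through the critic's action input, which in the full system backpropagates through the value network into the rank-score network.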

Experiments

Offline simulations on the XRL platform, with calibrated click‑through, conversion, and add‑to‑cart rates, show that Deep GSP outperforms GSP and uGSP across all metrics (RPM/CTR, RPM/ACR, RPM/CVR, RPM/GPM). Table 2 shows that the monotonicity metric (Spearman correlation between bid and rank score) and the data‑driven incentive‑compatibility metric are both close to the ideal value of 1, and the inverse‑payment error is small.

When switching the optimization target from CTR to RPM, advertiser utility changes gradually, confirming the smooth transition property.

Comparison with Existing Work

Prior academic works (e.g., RegretNet, reinforcement‑learning based auctions) focus on simulated bidder values and single‑objective settings. Industrial solutions often use handcrafted ranking formulas with deep models for prediction but lack integrated incentive‑compatible constraints. Deep GSP bridges this gap by jointly learning the allocation function and embedding game‑theoretic properties.

Outlook

Future work will enhance model expressiveness, explore long‑term RL horizons, study learning‑based incentive compatibility, and improve interpretability of learned mechanisms.

References

[1] Myerson, R. B. (1981). Optimal Auction Design.
[2] Varian, H. R. (2007). Position Auctions.
[3] Deng, Y., et al. (2020). A Data‑Driven Metric of Incentive Compatibility.
[4] Dütting, P., et al. (2019). Optimal Auctions Through Deep Learning.
[5] Tacchetti, A., et al. (2019). A Neural Architecture for Designing Truthful and Efficient Auctions.
[6] Shen, W., et al. (2019). Automated Mechanism Design via Neural Networks.

deep learning, reinforcement learning, mechanism design, multi-objective optimization, advertising auction
Written by Alimama Tech

Official Alimama tech channel, showcasing all of Alimama's technical innovations.
