Multi‑Objective Deep Reinforcement Learning Framework for E‑commerce Traffic Allocation (MODRL‑TA)
The article presents a CIKM‑2024 paper that introduces MODRL‑TA, a multi‑objective deep reinforcement learning system combining multi‑objective Q‑learning, a cross‑entropy‑based decision‑fusion algorithm, and a progressive data‑augmentation pipeline to dynamically allocate search traffic on JD.com, with both offline and online experiments showing substantial gains in CTR, CVR, and overall platform performance.
The JD.com search team’s paper, accepted at CIKM 2024, addresses the problem of traffic control in e‑commerce search, where adjusting the post‑ranking position of items reallocates natural traffic to maximize merchant growth, satisfy customer demand, and balance platform interests.
Existing ranking‑learning methods ignore the long‑term value of traffic allocation, while standard reinforcement‑learning approaches struggle to balance multiple objectives and suffer from cold‑start issues in real‑world data. To overcome these challenges, the authors propose a Multi‑Objective Deep Reinforcement Learning framework (MODRL‑TA) consisting of three key components:
Multi‑Objective Q‑Learning (MOQ): Independent deep Q‑network models are trained for each objective (e.g., click‑through rate, conversion rate). Each model estimates the long‑term value of its target and decides the insertion position of a product in the ranked list.
Decision‑Fusion Module (DFM): A cross‑entropy method (CEM) dynamically adjusts the weights of the objectives, allowing the system to respond to time‑varying merchant preferences and to mitigate cold‑start problems.
Progressive Data‑Augmentation (PDA): Initially trains MOQ on simulated offline logs; as real‑world interactions are collected, PDA progressively replaces simulated data with authentic user feedback, smoothing distribution shift and eliminating the cold‑start bottleneck.
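The decision-fusion step can be pictured as a small cross-entropy-method loop over the objective-weight simplex. The sketch below is illustrative only: the score function, population size, and normalization scheme are assumptions standing in for the online business metric the paper optimizes, not the authors' implementation.

```python
import numpy as np

def cem_fuse_weights(score_fn, n_objectives=2, iters=20, pop=64,
                     elite_frac=0.2, seed=0):
    """CEM sketch: search for objective weights that maximize score_fn.

    score_fn is a hypothetical stand-in for merchant/platform feedback;
    it maps a weight vector (summing to 1) to a scalar quality score.
    """
    rng = np.random.default_rng(seed)
    mu = np.full(n_objectives, 1.0 / n_objectives)
    sigma = np.full(n_objectives, 0.5)
    n_elite = int(pop * elite_frac)
    for _ in range(iters):
        samples = np.abs(rng.normal(mu, sigma, size=(pop, n_objectives)))
        samples /= samples.sum(axis=1, keepdims=True)  # project onto the simplex
        scores = np.array([score_fn(w) for w in samples])
        elite = samples[np.argsort(scores)[-n_elite:]]  # keep top performers
        mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mu

# Hypothetical metric: reward weight vectors near a 70/30 CTR/CVR split.
target = np.array([0.7, 0.3])
best = cem_fuse_weights(lambda w: -np.sum((w - target) ** 2))
```

Because the sampled weights are re-normalized each iteration, the fused weights always form a valid convex combination of objectives, which is what lets the system shift emphasis between CTR and CVR over time.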
The state representation includes user-profile features, query attributes, historical user–item interactions, contextual item features, and aggregated feedback signals. An action is the insertion position of the selected item within the ranked list of length L (a_t ∈ R_L). Rewards are defined per objective, e.g., a higher reward for a higher click probability (CTR objective) or order probability (CVR objective).
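The state described above can be sketched as a simple feature container; the field names and dimensions below are illustrative assumptions, not the paper's actual schema.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class TrafficState:
    """Illustrative state for the traffic-allocation MDP."""
    user_profile: List[float]         # user profile features
    query_features: List[float]       # query attributes
    interaction_history: List[float]  # historical user-item interactions
    item_context: List[float]         # contextual item features
    feedback_signals: List[float]     # aggregated feedback (e.g., recent CTR)

    def to_vector(self) -> List[float]:
        # Concatenate all feature groups into one flat input for the Q-networks.
        return (self.user_profile + self.query_features +
                self.interaction_history + self.item_context +
                self.feedback_signals)

L = 10  # ranked-list length; the action a_t is an insertion position in [0, L)
state = TrafficState([0.1] * 4, [0.2] * 3, [0.3] * 5, [0.4] * 2, [0.5] * 2)
vec = state.to_vector()
```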
Training employs standard DQN loss minimization with separate evaluation and target networks for stability. The overall loss aggregates the individual objective losses, enabling shared representation learning while preserving objective‑specific parameters.
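The training objective can be sketched as one squared TD loss per objective, each bootstrapped from a frozen target network, summed into an overall loss. This is a minimal NumPy sketch under the assumption of uniform aggregation weights; the paper's exact aggregation and network architecture may differ.

```python
import numpy as np

def per_objective_td_loss(q_eval, q_target_next, rewards, gamma=0.99):
    """Squared TD error for one objective (standard DQN loss, sketch).

    q_eval:        Q(s_t, a_t) from the evaluation network for this objective
    q_target_next: max_a Q'(s_{t+1}, a) from the frozen target network
    rewards:       per-objective reward r_t (e.g., click or order signal)
    """
    td_target = rewards + gamma * q_target_next  # bootstrap from target net
    return float(np.mean((td_target - q_eval) ** 2))

def aggregate_loss(per_objective_losses, weights=None):
    # Overall loss as a weighted sum of the objective losses; uniform
    # weights are an assumption made here for illustration.
    if weights is None:
        weights = np.ones(len(per_objective_losses)) / len(per_objective_losses)
    return float(np.dot(weights, per_objective_losses))

# Toy batch: two objectives (CTR, CVR), three transitions each.
ctr_loss = per_objective_td_loss(np.array([0.5, 0.4, 0.6]),
                                 np.array([0.7, 0.6, 0.8]),
                                 np.array([1.0, 0.0, 1.0]))
cvr_loss = per_objective_td_loss(np.array([0.2, 0.1, 0.3]),
                                 np.array([0.3, 0.2, 0.4]),
                                 np.array([0.0, 0.0, 1.0]))
total = aggregate_loss([ctr_loss, cvr_loss])
```

Keeping the per-objective losses separate while summing them for the backward pass is what allows shared lower layers to learn a common representation while each objective head retains its own parameters.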
Extensive offline experiments on JD's main search platform show that MODRL-TA outperforms the MORL-FR baseline, achieving up to a 12.20× CTR reward and a 2.25× CVR reward when using 100% real data. A two-week online A/B test demonstrates up to an 18.0% increase in impressions, a 4.2% rise in CTR, and a 5.1% boost in CVR compared with the PID algorithm, confirming the framework's practical impact on a platform serving over 600 million daily active users.
The authors conclude with future directions, emphasizing the need for finer‑grained algorithm design, stronger computational resources, and robust multi‑objective learning in dynamic, uncertain environments.
Team information and author bios are provided, along with a call for talent to join JD’s search algorithm team.
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.