Artificial Intelligence 16 min read

Dynamic Knapsack Optimization for Multi‑Channel Sequential Advertising Using Long‑Term Value

The article presents a novel multi‑channel sequential advertising framework that models budget‑constrained GMV optimization as a dynamic knapsack problem, introduces a long‑term value‑based RL solution (MSBCB), and validates its superiority through extensive offline and online experiments showing up to 10% ROI improvement.

DataFunTalk

Aug 15, 2020

Dynamic Knapsack Optimization for Multi‑Channel Sequential Advertising Using Long‑Term Value

In e‑commerce advertising, optimizing GMV under budget constraints is a core goal for advertisers and platforms. Traditional per‑request greedy strategies ignore the cumulative effect of multiple user‑ad interactions over time, leading to sub‑optimal outcomes.

The authors define an advertising sequence as repeated contacts between the same user and ad, and introduce the concept of long‑term value (LTV) —the total expected value of the remaining sequence. By modeling the budget‑constrained optimization as a dynamic knapsack problem , where each item’s value is the LTV of a pair and its weight is the expected cost, they propose a two‑stage iterative solution: (1) fix the policy π and optimize the selection X via a greedy value‑to‑cost ratio; (2) fix X and improve π using reinforcement learning (policy iteration) to estimate LTV.

The resulting algorithm, named MSBCB (Multi‑channel Sequential Budget Constrained Bidding) , incorporates action reduction to shrink the discrete bidding space and a conversion of continuous bids to discrete actions without loss of precision.

Offline experiments compare MSBCB against several baselines (Greedy + DDPG, Greedy + DQN, Greedy + PPO, Myopic Greedy, enumeration‑based optimal methods). MSBCB consistently outperforms baselines, matches the enumeration‑based optimal solution, and demonstrates faster convergence.

Online A/B tests on Taobao across nine scenarios (e.g., first‑guess, post‑purchase) show that MSBCB achieves roughly a 10% increase in GMV with less than 1% cost increase, translating to a near‑10% ROI uplift. Detailed analyses reveal higher ROI per day, broader distribution of positive ROI across stores, longer user‑ad interaction sequences, and more efficient budget allocation toward high‑ROI scenes such as checkout and cart.

The work, accepted at ICML‑2020 (paper: “Dynamic Knapsack Optimization Towards Efficient Multi‑Channel Sequential Advertising”), concludes that long‑term value‑driven dynamic knapsack modeling provides a theoretically sound and practically effective solution for budget‑constrained advertising, and suggests extending the framework to other long‑term metrics like favorites, add‑to‑cart, and multi‑objective optimization.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Advertising reinforcement learning online experiments budget optimization long-term value dynamic knapsack

Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.