Dynamic Knapsack Optimization for Multi‑Channel Sequential Advertising Using Long‑Term Value
The article presents a novel multi‑channel sequential advertising framework that models budget‑constrained GMV optimization as a dynamic knapsack problem, introduces a long‑term value‑based RL solution (MSBCB), and validates its superiority through extensive offline and online experiments showing up to 10% ROI improvement.
In e‑commerce advertising, optimizing GMV under budget constraints is a core goal for advertisers and platforms. Traditional per‑request greedy strategies ignore the cumulative effect of multiple user‑ad interactions over time, leading to sub‑optimal outcomes.
The authors define an advertising sequence as repeated contacts between the same user and ad, and introduce the concept of long‑term value (LTV) —the total expected value of the remaining sequence. By modeling the budget‑constrained optimization as a dynamic knapsack problem , where each item’s value is the LTV of a pair and its weight is the expected cost, they propose a two‑stage iterative solution: (1) fix the policy π and optimize the selection X via a greedy value‑to‑cost ratio; (2) fix X and improve π using reinforcement learning (policy iteration) to estimate LTV.
The resulting algorithm, named MSBCB (Multi‑channel Sequential Budget Constrained Bidding) , incorporates action reduction to shrink the discrete bidding space and a conversion of continuous bids to discrete actions without loss of precision.
Offline experiments compare MSBCB against several baselines (Greedy + DDPG, Greedy + DQN, Greedy + PPO, Myopic Greedy, enumeration‑based optimal methods). MSBCB consistently outperforms baselines, matches the enumeration‑based optimal solution, and demonstrates faster convergence.
Online A/B tests on Taobao across nine scenarios (e.g., first‑guess, post‑purchase) show that MSBCB achieves roughly a 10% increase in GMV with less than 1% cost increase, translating to a near‑10% ROI uplift. Detailed analyses reveal higher ROI per day, broader distribution of positive ROI across stores, longer user‑ad interaction sequences, and more efficient budget allocation toward high‑ROI scenes such as checkout and cart.
The work, accepted at ICML‑2020 (paper: “Dynamic Knapsack Optimization Towards Efficient Multi‑Channel Sequential Advertising”), concludes that long‑term value‑driven dynamic knapsack modeling provides a theoretically sound and practically effective solution for budget‑constrained advertising, and suggests extending the framework to other long‑term metrics like favorites, add‑to‑cart, and multi‑objective optimization.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.