Tag

online A/B testing

1 view collected around this technical thread.

DataFunSummit
Jan 5, 2025 · Artificial Intelligence

Multi‑Objective Deep Reinforcement Learning Framework for E‑commerce Traffic Allocation (MODRL‑TA)

The article presents a CIKM‑2024 paper introducing MODRL‑TA, a multi‑objective deep reinforcement learning system that dynamically allocates search traffic on JD.com. The system combines multi‑objective Q‑learning, a cross‑entropy‑based decision‑fusion algorithm, and a progressive data‑augmentation pipeline; both offline and online experiments show substantial gains in CTR, CVR, and overall platform performance.

E-commerce · cross-entropy method · deep learning
0 likes · 14 min read
DataFunSummit
Dec 27, 2023 · Artificial Intelligence

Two-Stage Constrained Actor-Critic for Short‑Video Recommendation and a Reinforcement‑Learning Multi‑Task Framework

This article presents the two‑stage constrained actor‑critic (TSCAC) algorithm, which models short‑video recommendation as a constrained reinforcement‑learning problem. It details the theoretical formulation and optimization loss and validates the algorithm through extensive offline and online experiments, then introduces a multi‑task reinforcement‑learning framework (RMTL) that further improves multi‑objective recommendation performance.

Recommendation systems · constrained optimization · multi-task learning
0 likes · 16 min read
Kuaishou Tech
Apr 27, 2023 · Artificial Intelligence

Two-Stage Constrained Actor‑Critic (TSCAC) for Short‑Video Recommendation

The paper models short‑video recommendation as a constrained Markov decision process and introduces a two‑stage constrained actor‑critic algorithm that maximizes watch time while satisfying multiple interaction constraints, demonstrating superior offline and online performance on the KuaiRand dataset and in the Kuaishou app.

Recommendation systems · actor-critic · constrained optimization
0 likes · 7 min read
Xiaohongshu Tech REDtech
May 18, 2022 · Artificial Intelligence

Sliding Spectrum Decomposition for Diversified Recommendation in Feed Systems

The paper introduces Sliding Spectrum Decomposition (SSD), a tensor‑based method that quantifies feed diversity through singular‑value volume within sliding windows. SSD integrates quality‑exploration trade‑offs and employs a hybrid CB2CF model for item embeddings, achieving better offline and online performance than the determinantal point process (DPP) baseline in Xiaohongshu’s feed.

Machine Learning · Recommendation systems · diversity
0 likes · 10 min read