Designing Safe, Sample-Efficient, and Robust Reinforcement Learning for Ranking and Diffusion Models

This paper proposes a reinforcement‑learning framework that targets safety, sample efficiency, and robustness together. It applies a contextual‑bandit perspective to ranking/recommendation systems and text‑to‑image diffusion models, and introduces algorithms for safe deployment and variance‑reduced off‑policy estimation, along with the LOOP method for generative RL.

Robustness · Safety · contextual bandits

0 likes · 5 min read

Alimama Tech

Sep 8, 2021 · Artificial Intelligence

Deep Uncertainty-Aware Learning (DUAL) for Click‑Through Rate Prediction and Exploration Strategies

The paper presents Deep Uncertainty‑Aware Learning (DUAL), a scalable Bayesian deep‑learning framework that combines a neural feature extractor with a Gaussian‑process prior to model CTR prediction uncertainty, mitigates feedback‑loop bias, and enables confidence‑driven exploration (UCB and Thompson sampling) that improves long‑term utility while preserving accuracy.

Gaussian Process · Uncertainty Modeling · contextual bandits

0 likes · 15 min read

Tencent Advertising Technology

Apr 12, 2021 · Artificial Intelligence

GuideBoot: A Guided Bootstrap Method for Solving Exploration‑Exploitation in Online Advertising

The article explains the exploration‑exploitation dilemma in recommendation systems, introduces the GuideBoot algorithm (a guided bootstrap approach for contextual bandits), describes its Bayesian and non‑Bayesian foundations, presents experimental results on synthetic and real advertising data, and discusses an online learning extension.

Exploration-Exploitation · GuideBoot · contextual bandits

0 likes · 11 min read

DataFunTalk

May 9, 2020 · Artificial Intelligence

Hierarchical Adaptive Contextual Bandits for Resource-Constrained Recommendation (HATCH): A Detailed Review

This article reviews the HATCH method (Hierarchical Adaptive Contextual Bandits for Resource‑Constraint based Recommendation), covering its hierarchical architecture, objective function, resource allocation and personalization layers, cumulative regret analysis, experimental results on simulated and real data, and future directions.

Cumulative Regret · HATCH · contextual bandits

0 likes · 9 min read

DataFunTalk

Apr 19, 2020 · Artificial Intelligence

Bandit Algorithms for Recommendation Systems: Context‑Free, Thompson Sampling, and Contextual Approaches

This article explains how multi‑armed bandit methods such as Upper Confidence Bound, Thompson Sampling, and their contextual extensions can address cold‑start, diversity, and bias problems in large‑scale recommendation systems, describing practical update mechanisms, offline evaluation techniques, and deployment experiences at Ctrip.

AI · Bandit Algorithms · Exploration‑exploitation

0 likes · 15 min read
