Seven Kuaishou Papers Accepted at WWW 2023 on Reinforcement Learning and Recommendation Systems
On January 25, the Kuaishou community science team announced that seven of its papers were accepted at the ACM Web Conference 2023 (WWW’23), an A‑class international conference recommended by the China Computer Federation (CCF). The papers cover reinforcement‑learning‑based user retention, constrained actor‑critic recommendation, divide‑and‑conquer embedding retrieval, causal embedding with contrastive learning, latent action space exploration, dual‑interest factorization attention, and multi‑task reinforcement learning for recommendation.
Paper 01: Reinforcing User Retention in a Billion‑Scale Short Video Recommender System (Industry Track)
Download: https://arxiv.org/pdf/2302.01724.pdf
Authors: Qingpeng Cai (Kuaishou), Shuchang Liu (Kuaishou), Xueliang Wang (Kuaishou), Tiantian Zuo (Kuaishou), Wentao Xie (Kuaishou), Bin Yang (Kuaishou), Dong Zheng (Kuaishou), Peng Jiang (Kuaishou)
Abstract: The core goal of short‑video recommendation is to improve user retention, a long‑term feedback signal accumulated over many interactions that cannot be decomposed into responses to any single item or list. The paper models retention optimization as a reinforcement‑learning problem, defining an infinite‑horizon request‑level MDP and proposing the RLUR algorithm to address the challenges of uncertainty, bias, and long delay. Offline experiments on KuaiRand and online deployment on Kuaishou show significant gains in next‑day retention and DAU.
Paper 02: Two‑Stage Constrained Actor‑Critic for Short Video Recommendation (Research Track)
Download: https://arxiv.org/pdf/2302.01680.pdf
Authors: Qingpeng Cai (Kuaishou), Zhenghai Xue (Kuaishou), Chi Zhang (Kuaishou), Wanqi Xue (Kuaishou), Shuchang Liu (Kuaishou), Ruohan Zhan (Hong Kong University of Science and Technology), Xueliang Wang (Kuaishou), Tiantian Zuo (Kuaishou), Wentao Xie (Kuaishou), Dong Zheng (Kuaishou), Peng Jiang (Kuaishou)
Abstract: In short‑video feed recommendation, the system must maximize total watch time while satisfying constraints on interaction rates. The authors formulate this as a constrained MDP and introduce the Two‑Stage Constrained Actor‑Critic (TSCAC) algorithm: the first stage learns a separate policy for each auxiliary signal, and the second stage learns a policy that optimizes watch time subject to distance constraints to the first‑stage policies. Offline and online results demonstrate substantial improvements over Pareto‑based and other state‑of‑the‑art baselines.
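The second‑stage idea, optimizing the main reward while staying close to an auxiliary policy, admits a simple closed form when the distance is a KL divergence. Below is a minimal numerical sketch of that idea; the single‑auxiliary setup, function name, and closed form are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def constrained_policy(watch_time, aux_policy, lam):
    """Maximize E_p[watch_time] - lam * KL(p || aux_policy) over item
    distributions p. The maximizer has the closed form
    p ∝ aux_policy * exp(watch_time / lam)."""
    logits = np.log(aux_policy) + np.asarray(watch_time) / lam
    p = np.exp(logits - logits.max())  # shift for numerical stability
    return p / p.sum()
```

With a large `lam` the constraint dominates and the policy collapses onto the auxiliary one; with a small `lam` it greedily chases watch time.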
Paper 03: Divide and Conquer: Towards Better Embedding‑based Retrieval for Recommender Systems from a Multi‑task Perspective (Industry Track)
Download: https://arxiv.org/pdf/2302.02657.pdf
Authors: Yuan Zhang (Kuaishou), Xue Dong (Shandong University), Weijie Ding (Kuaishou), Biao Li (Kuaishou), Peng Jiang (Kuaishou)
Abstract: Embedding‑based retrieval (EBR) is widely used in the recall stage of recommender systems but struggles to distinguish “hard negatives” from positives at large scale. The paper proposes a divide‑and‑conquer framework that clusters the candidate set, performs EBR within each cluster, and aggregates the results, recasting the problem as multi‑task learning. A prompt‑tuning‑style task adaptation then yields a single unified model, which outperforms baselines in both offline evaluation and online A/B tests.
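The divide‑and‑conquer retrieval flow, assign candidates to clusters, retrieve top‑k inside each cluster, then merge, can be sketched in a few lines. This is an illustrative sketch with made‑up names and pre‑computed centroids, not the paper's code.

```python
import numpy as np

def divide_and_conquer_retrieve(user_vec, item_vecs, centroids,
                                k_per_cluster, n_final):
    # 1. Divide: assign every candidate item to its nearest cluster centroid.
    dists = ((item_vecs[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    assign = dists.argmin(axis=1)
    # 2. Conquer: run EBR (dot-product top-k) independently inside each cluster.
    scores = item_vecs @ user_vec
    picked = []
    for c in range(len(centroids)):
        idx = np.where(assign == c)[0]
        if idx.size:
            picked.extend(idx[np.argsort(-scores[idx])[:k_per_cluster]].tolist())
    # 3. Merge: aggregate per-cluster winners into one final candidate list.
    return sorted(picked, key=lambda i: -scores[i])[:n_final]
```

Retrieving per cluster keeps locally hard negatives competing against locally similar positives, which is the intuition the paper builds on.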
Paper 04: Disentangled Causal Embedding With Contrastive Learning For Recommender System (Industry Track)
Download: https://arxiv.org/pdf/2302.03248.pdf
Authors: Weiqi Zhao (Kuaishou), Dian Tang (Kuaishou), Xin Chen (Kuaishou), Dawei Lv (Kuaishou), Daoli Ou (Kuaishou), Biao Li (Kuaishou), Peng Jiang (Kuaishou)
Abstract: User interactions are driven by both genuine interest and conformity effects, which can bias recommendation models. The authors introduce a causal framework, DCCL, that disentangles the two factors via contrastive learning with popularity‑aware contrastive losses, achieving better out‑of‑distribution robustness and mitigating head‑item dominance on both the Yelp and Kuaishou datasets.
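One plausible reading of a popularity‑aware contrastive objective is an InfoNCE loss whose negative logits are shifted by log‑popularity, so that merely‑popular items are pushed harder away from the interest embedding. The weighting scheme below is an assumption for illustration, not DCCL's exact loss.

```python
import numpy as np

def pop_aware_infonce(anchor, positive, negatives, neg_pop, tau=0.2, alpha=1.0):
    """InfoNCE where each negative's logit is shifted up by alpha * log(popularity),
    making popular-but-unclicked items harder negatives for the interest factor."""
    pos = float(anchor @ positive) / tau
    negs = negatives @ anchor / tau + alpha * np.log(neg_pop)
    logits = np.concatenate(([pos], negs))
    m = logits.max()                                # stable log-sum-exp
    lse = m + np.log(np.exp(logits - m).sum())
    return lse - pos  # equals -log softmax probability of the positive
```

Raising the popularity of the negatives raises the loss, so the interest embedding is penalized more for sitting near head items.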
Paper 05: Exploration and Regularization of the Latent Action Space in Recommendation (Research Track)
Download: https://arxiv.org/pdf/2302.03431.pdf
Authors: Shuchang Liu (Kuaishou), Qingpeng Cai (Kuaishou), Bowen Sun (Peking University), Yuhang Wang (City University of Hong Kong), Ji Jiang (Peking University), Dong Zheng (Kuaishou), Peng Jiang (Kuaishou), Xiangyu Zhao (City University of Hong Kong), Yongfeng Zhang (Rutgers University)
Abstract: Reinforcement learning for recommendation faces a huge, dynamic action space. The paper proposes a hyper‑actor‑critic framework that first generates a continuous latent “hyper‑action” and then uses it to parameterize an item scoring function, producing the effective recommendation list. An inverse mapping aligns the two action spaces, and a supervised module stabilizes early‑stage exploration. Experiments on KuaiRand and other public datasets show clear gains over SOTA RL and supervised baselines.
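The two‑level action structure, a continuous hyper‑action generated from the state that parameterizes an item‑scoring kernel whose top‑k list is the effective action, can be sketched as follows. The tanh actor and dot‑product kernel are illustrative assumptions.

```python
import numpy as np

def effective_action(state, item_embs, actor_w, k):
    # Hyper-action: a continuous latent vector produced from the user state.
    hyper = np.tanh(actor_w @ state)
    # Effective action: the top-k recommendation list under the scoring
    # kernel <item embedding, hyper-action>.
    scores = item_embs @ hyper
    return np.argsort(-scores)[:k], hyper
```

Because the actor outputs a fixed‑dimensional vector instead of a slate, the policy's action space stays small and continuous even as the item catalog grows or changes.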
Paper 06: Dual‑interest Factorization‑heads Attention for Sequential Recommendation (Research Track)
Download: https://arxiv.org/pdf/2302.03965.pdf
Authors: Guanyu Lin (Tsinghua University), Chen Gao (Tsinghua University), Yu Zheng (Tsinghua University), Jianxin Chang (Kuaishou), Yanan Niu (Kuaishou), Yang Song (Kuaishou), Zhiheng Li (Tsinghua University), Depeng Jin (Tsinghua University), Yong Liu (Tsinghua University)
Abstract: Sequential recommendation for video feeds must handle both positive (non‑skip) and negative (skip) feedback, which are not simply opposites of each other. The proposed FeedRec model combines a feedback‑aware encoding layer, a dual‑interest factorization‑head attention mechanism, and separate towers trained with BPR losses to disentangle positive and negative interests, achieving superior performance on multiple benchmarks.
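The separate towers are trained with BPR‑style pairwise losses. For reference, a minimal BPR loss (a standard formulation, shown here as a generic sketch rather than the paper's exact objective) is:

```python
import numpy as np

def bpr_loss(pos_scores, neg_scores):
    """Bayesian Personalized Ranking: maximize the probability that a
    positive item outranks a negative one, i.e. minimize
    -log(sigmoid(pos - neg)) averaged over sampled pairs."""
    diff = np.asarray(pos_scores, float) - np.asarray(neg_scores, float)
    return float(-np.log(1.0 / (1.0 + np.exp(-diff))).mean())
```

In a dual‑interest setup, one tower's BPR pairs rank liked items above skipped ones, while the other ranks skipped items by the negative‑interest score, so the two signals are modeled rather than treated as mirror images.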
Paper 07: Multi‑Task Recommendations with Reinforcement Learning (Research Track)
Download: https://arxiv.org/pdf/2302.03328.pdf
Authors: Zirui Liu (City University of Hong Kong), Jiejie Tian (City University of Hong Kong), Qingpeng Cai (Kuaishou), Xiangyu Zhao (City University of Hong Kong), Tongteng Gao (City University of Hong Kong), Shuchang Liu (Kuaishou), Dayou Chen (City University of Hong Kong), Tonghao He (City University of Hong Kong), Dong Zheng (Kuaishou), Peng Jiang (Kuaishou)
Abstract: Existing multi‑task learning (MTL) recommendation models ignore session‑level dynamics. The authors present RMTL, a reinforcement‑learning‑based MTL framework that constructs a session‑level MTL environment, trains a multi‑task actor‑critic network, and uses critic‑generated dynamic weights to balance task losses. Experiments on KuaiRand and other public datasets show significant AUC improvements over state‑of‑the‑art MTL baselines.
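The critic‑generated dynamic weighting can be illustrated with a toy rule in which tasks whose critic predicts a higher expected return receive a lower loss weight, steering the gradient budget toward currently weaker tasks. The min‑max normalization below is an assumption for illustration, not the paper's formula.

```python
import numpy as np

def rmtl_weighted_loss(task_losses, q_values):
    """Combine per-task losses with weights derived from critic values:
    higher predicted return -> smaller weight on that task's loss."""
    q = np.asarray(q_values, dtype=float)
    w = 1.0 - (q - q.min()) / (q.max() - q.min() + 1e-8)
    total = float((w * np.asarray(task_losses, dtype=float)).sum())
    return total, w
```

Recomputing the weights every batch makes the balance session‑adaptive, unlike static loss weights in conventional MTL models.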
Kuaishou Tech
Official Kuaishou tech account, providing real-time updates on the latest Kuaishou technology practices.