Kuaishou Tech
May 14, 2025 · Artificial Intelligence
StableReinforce and R1-Reward: Enhancing Multimodal Reward Models with Reinforcement Learning
This article presents StableReinforce, a reinforcement learning algorithm designed to stabilize training, and R1-Reward, a multimodal reward model trained with it. It shows how these techniques significantly improve the performance of multimodal reward models for multimodal large language models across several benchmarks.
AI · LLM · R1-Reward