Tag

label-sensitive reward

Tencent Advertising Technology
Aug 15, 2024 · Artificial Intelligence

Enhancing Reinforcement Learning with Label-Sensitive Reward for Natural Language Understanding

This paper introduces RLLR, a reinforcement learning method with a label-sensitive reward that improves performance on natural language understanding (NLU) tasks by aligning the training objective with label accuracy. The paper demonstrates its effectiveness on eight public NLU datasets and in real-world advertising feature evaluation, outperforming standard RLHF and SFT baselines.

RLHF · advertising · label-sensitive reward
14 min read