Data Party THU
May 30, 2026 · Artificial Intelligence
How USTC’s Tiny LCPO Training Cuts Large Model Overthinking in Half
The paper introduces LCPO, a lightweight preference-optimization technique that uses only 800 training examples and 50 training steps to teach large language models to produce concise, accurate answers. It halves the length of model outputs at inference, often improves accuracy, and reduces training cost by up to two orders of magnitude.
Efficient Inference · LCPO · Low-Resource Training
8 min read
