Tagged articles
2 articles
Data Party THU
May 30, 2026 · Artificial Intelligence

How USTC’s Tiny LCPO Training Cuts Large Model Overthinking in Half

The paper introduces LCPO, a lightweight preference‑optimization technique that needs only 800 training examples and 50 training steps to teach large language models to give concise, accurate answers. It halves the length of generated output while often improving accuracy, and it reduces training cost by up to two orders of magnitude.

Efficient Inference · LCPO · Low-Resource Training
Machine Learning Algorithms & Natural Language Processing
May 20, 2026 · Artificial Intelligence

How 800 Data Points Halve LLM Chain‑of‑Thought Length and Boost Accuracy

The ICLR‑2026 paper introduces LCPO, a lightweight preference‑optimization technique that uses only 800 curated examples and 50 training steps to cut large‑model chain‑of‑thought generation length by about 50% while maintaining or even improving answer accuracy, dramatically reducing training and inference costs.

Efficient Inference · LCPO · Low-Resource Training