How On-Policy Distillation Cuts LLM Training Cost by Up to 90%
Thinking Machines Lab introduces On-Policy Distillation, a post-training technique that matches reinforcement-learning performance at up to a tenfold lower compute cost. Experiments on reasoning, personalization, and mitigating catastrophic forgetting demonstrate its effectiveness.
