Tagged articles
2 articles
Page 1 of 1
Data Party THU
Data Party THU
May 12, 2026 · Artificial Intelligence

MathForge: Leveraging Hard Problems in RL to Boost Large‑Model Mathematical Reasoning (ICLR 2026)

MathForge tackles the long‑standing question of which math problems deserve focus in reinforcement‑learning‑based training, introducing a difficulty‑aware optimizer (DGPO) and multi‑aspect question reformulation (MQR) that together prioritize harder‑but‑learnable questions, yielding consistent performance gains across model sizes and modalities.

DGPODifficulty‑Aware OptimizationLarge Language Models
0 likes · 11 min read
MathForge: Leveraging Hard Problems in RL to Boost Large‑Model Mathematical Reasoning (ICLR 2026)
Machine Heart
Machine Heart
Apr 26, 2026 · Artificial Intelligence

How MathForge Uses Hard Problems to Boost Large‑Model Mathematical Reasoning via Reinforcement Learning

MathForge tackles the overlooked issue of training large language models on mathematically challenging yet learnable problems by introducing a difficulty‑aware group policy optimization (DGPO) and multi‑aspect question reformulation (MQR), achieving consistent gains across model sizes and modalities.

DGPODifficulty‑Aware OptimizationLarge Language Models
0 likes · 13 min read
How MathForge Uses Hard Problems to Boost Large‑Model Mathematical Reasoning via Reinforcement Learning