100k‑Token Natural‑Language Reasoning Enables a 30B‑A3B Model to Reach Olympiad Gold Level
A 30B‑A3B model, trained with reverse‑perplexity supervised fine‑tuning and two‑stage reinforcement learning, and run at inference time with a multi‑round generate‑verify‑revise loop, achieves gold‑medal performance on IMO, USAMO and IPhO contests using natural‑language reasoning traces of more than 100k tokens, without external tools.
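The summary names a multi‑round generate‑verify‑revise inference loop but does not describe its internals. The following is a minimal sketch of the generic pattern only, not the authors' implementation. The callables `generate`, `verify` and `revise`, the `Verdict` type and the round cap `max_rounds` are hypothetical. In a tool‑free setting like the one described, all three would presumably be calls to the same model producing natural‑language output.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    """Hypothetical verifier output: accept/reject plus natural-language feedback."""
    accepted: bool
    feedback: str


def generate_verify_revise(
    problem: str,
    generate: Callable[[str], str],               # hypothetical: drafts a full solution
    verify: Callable[[str, str], Verdict],        # hypothetical: critiques a solution
    revise: Callable[[str, str, str], str],       # hypothetical: rewrites using feedback
    max_rounds: int = 4,                          # assumed cap on verification rounds
) -> tuple[str, bool, int]:
    """Return (final_solution, accepted, rounds_used)."""
    solution = generate(problem)
    for round_idx in range(1, max_rounds + 1):
        verdict = verify(problem, solution)
        if verdict.accepted:
            # The verifier found no issue: stop early with this solution.
            return solution, True, round_idx
        if round_idx == max_rounds:
            # Out of budget: return the last (unaccepted) attempt.
            break
        # Feed the verifier's critique back into a revision step.
        solution = revise(problem, solution, verdict.feedback)
    return solution, False, max_rounds


if __name__ == "__main__":
    # Toy stand-ins for model calls, purely for illustration.
    result = generate_verify_revise(
        "toy problem",
        generate=lambda p: "draft",
        verify=lambda p, s: Verdict("fixed" in s, "missing a case"),
        revise=lambda p, s, fb: s + " fixed",
    )
    print(result)
```

With the toy stand‑ins, the first draft is rejected, one revision is made, and the second verification accepts it. Capping the number of rounds bounds inference cost, which matters when each attempt may already span on the order of 100k tokens.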
