How a 10M‑Parameter Model Beats Large Models on Sudoku and ARC with Multi‑Trajectory Reasoning
The GRAM model introduced by Yoshua Bengio’s team replaces deterministic recursive updates with probabilistic multi‑trajectory sampling. This enables a 10 M‑parameter network to achieve 97 % accuracy on Sudoku‑Extreme, 52.0 % on ARC‑AGI‑1 and 11.1 % on ARC‑AGI‑2, and near‑perfect results on N‑Queens and graph‑coloring, while also supporting unconditional generation tasks.
The paper "Generative Recursive Reasoning" (arXiv:2605.19376) proposes GRAM, a generative recursive reasoning model that transforms the traditional deterministic recursive update into a probabilistic multi‑trajectory process.
Model Architecture
GRAM splits the hidden state into a high‑level component h and a low‑level component l. The low‑level state performs K deterministic updates while the high‑level state is held fixed; h is then updated with a Gaussian perturbation parameterized by a mean and a variance. The mean guides the reasoning direction, while the variance controls exploration. Randomness is applied only to the high‑level state; attempts to inject noise into the low‑level state did not improve performance.
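The update pattern described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the paper's implementation: the functions `low_level_update` and `high_level_stats`, their constants, and the zero initial states are all hypothetical stand-ins for learned networks. Only the control flow (K deterministic low‑level updates, then one Gaussian perturbation of h) follows the description.

```python
import math
import random


def low_level_update(l, h, x):
    # Hypothetical deterministic update standing in for a learned network.
    return [math.tanh(0.5 * li + 0.3 * hi + 0.2 * xi)
            for li, hi, xi in zip(l, h, x)]


def high_level_stats(h, l):
    # Hypothetical heads: a mean shift (direction) and a std-dev (exploration).
    mean = [0.1 * (li - hi) for hi, li in zip(h, l)]
    std = [0.05 for _ in h]
    return mean, std


def gram_like_step(h, l, x, K, rng):
    # 1) K deterministic low-level updates with h held fixed.
    for _ in range(K):
        l = low_level_update(l, h, x)
    # 2) One Gaussian perturbation of the high-level state only.
    mean, std = high_level_stats(h, l)
    h = [hi + mi + si * rng.gauss(0.0, 1.0)
         for hi, mi, si in zip(h, mean, std)]
    return h, l


def sample_trajectories(x, num_trajectories, num_steps, K, seed=0):
    # Each trajectory gets its own random stream, so trajectories diverge.
    finals = []
    for t in range(num_trajectories):
        rng = random.Random(seed + t)
        h = [0.0] * len(x)  # assumed initial states
        l = [0.0] * len(x)
        for _ in range(num_steps):
            h, l = gram_like_step(h, l, x, K, rng)
        finals.append((h, l))
    return finals
```

Because the noise enters only through h, two trajectories that start from the same state share the same low‑level state after the first step and begin to diverge only once the perturbed h feeds back into the next round of low‑level updates.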
Training Procedure
During training, GRAM uses deep supervision with truncated gradients, so it optimizes a surrogate objective rather than backpropagating through the full recursion. This deep supervision aligns the evidence lower bound (ELBO) with the truncated proxy, but the authors note that its sequential nature limits training efficiency and is costly in memory.
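For orientation, the textbook form of a conditional ELBO is shown below. The notation is an assumption made for illustration (input x, target y, a latent high‑level trajectory z_{1:T}, a prior p_θ, and an approximate posterior q_φ); the paper's exact objective and its truncated surrogate may differ from this generic form.

```latex
\log p_\theta(y \mid x)
\;\ge\;
\mathbb{E}_{q_\phi(z_{1:T} \mid x, y)}\!\left[ \log p_\theta(y \mid z_{1:T}, x) \right]
\;-\;
\mathrm{KL}\!\left( q_\phi(z_{1:T} \mid x, y) \,\middle\|\, p_\theta(z_{1:T} \mid x) \right)
```

The first term rewards trajectories that lead to the correct output; the KL term keeps the sampling distribution used during training close to the one available at inference time, when y is unknown.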
Experimental Results
On the Sudoku‑Extreme benchmark, GRAM reaches 97.0 % accuracy, surpassing large‑scale models such as DeepSeek‑R1, Claude 3.7 16k, and o3‑mini‑high, which all score 0.0 % under comparable settings. On ARC‑AGI, GRAM obtains 52.0 % (ARC‑AGI‑1) and 11.1 % (ARC‑AGI‑2), while the same large models achieve 0.0 %.
In N‑Queens, deterministic recursive models HRM and TRM achieve 80.70 % and 72.90 % respectively. Adding deep supervision and stochastic guidance (+DS+SG) pushes accuracy to 100 %, and the full GRAM model attains 99.69 %. Ablation shows that removing the guidance signal drops accuracy to 50.27 %, and removing randomness altogether drops it to 0.0 %.
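To make the accuracy figures concrete, here is a minimal sketch of how an N‑Queens prediction could be scored. It assumes a prediction is a list giving the queen's column for each row and that a prediction counts as correct when no two queens attack each other; the paper's exact scoring protocol may differ, and the function names are illustrative.

```python
from itertools import combinations


def attacking_pairs(cols):
    # cols[r] is the column of the queen in row r (one queen per row).
    return sum(
        1
        for (r1, c1), (r2, c2) in combinations(enumerate(cols), 2)
        if c1 == c2 or abs(c1 - c2) == abs(r1 - r2)
    )


def is_valid_placement(cols):
    # Valid when the board is non-empty and no pair shares a column or diagonal.
    return len(cols) > 0 and attacking_pairs(cols) == 0


def accuracy(predictions):
    # Fraction of predicted boards that are valid placements.
    return sum(is_valid_placement(p) for p in predictions) / len(predictions)
```

For example, `[1, 3, 0, 2]` is a valid 4‑Queens placement, while `[0, 1, 2, 3]` puts every queen on the same diagonal.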
For graph‑coloring (8‑node graphs), GRAM reduces the number of conflicting edges to 2.7 (compared with 3.3 for the deterministic baseline and 19.0/61.3 for autoregressive generators).
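The conflicting‑edge metric is simple to compute: an edge conflicts when both endpoints receive the same color. The sketch below assumes an edge‑list representation and a per‑node color list, and it assumes the reported figures are averages over several colorings; neither detail is stated in the summary above.

```python
def count_conflicts(edges, colors):
    # An edge (u, v) conflicts when both endpoints have the same color.
    return sum(1 for u, v in edges if colors[u] == colors[v])


def mean_conflicts(edges, colorings):
    # Average number of conflicting edges over several sampled colorings.
    return sum(count_conflicts(edges, c) for c in colorings) / len(colorings)
```

On a 4‑cycle with edges `[(0, 1), (1, 2), (2, 3), (3, 0)]`, the alternating coloring `[0, 1, 0, 1]` has no conflicts, while `[0, 0, 1, 1]` has two.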
Unconditional generation experiments demonstrate that GRAM can generate valid Sudoku boards with 99.05 % validity using only 10.9 M parameters and 16 supervised steps, outperforming the discrete diffusion model D3PM (55.1 M parameters, 1000 denoising steps, 91.33 % validity). In binary‑MNIST generation, increasing recursion steps from 8 to 256 lowers the FID from 84.08 to 73.34 and improves the Inception Score.
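A validity percentage like the ones above can be computed with a standard Sudoku check: every row, column, and 3×3 box must contain the digits 1 to 9 exactly once. The sketch below is a generic checker, not the paper's evaluation code; `example_board` is a hypothetical helper that builds one known‑valid board from a shifting pattern.

```python
def is_valid_sudoku(board):
    # board is a 9x9 list of lists holding integers.
    digits = set(range(1, 10))
    rows = [list(row) for row in board]
    cols = [[board[r][c] for r in range(9)] for c in range(9)]
    boxes = [
        [board[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)]
        for br in (0, 3, 6)
        for bc in (0, 3, 6)
    ]
    # Each of the 27 units must contain exactly the digits 1..9.
    return all(set(unit) == digits for unit in rows + cols + boxes)


def validity_rate(boards):
    # Fraction of generated boards that are complete, valid Sudoku grids.
    return sum(is_valid_sudoku(b) for b in boards) / len(boards)


def example_board():
    # A known-valid board built from a shifted base pattern.
    return [[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)]
            for r in range(9)]
```

Swapping two cells in a single row of a valid board leaves that row intact but breaks the two affected columns, which makes a convenient negative test case.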
Ablation Studies
Random guidance and mean guidance must act together; removing either harms performance. The study also shows that data augmentation and inference‑time sampling are complementary: each helps, but their gains do not simply add up.
Inference Extension and Multi‑Solution Tasks
GRAM supports width‑dimension expansion during inference: with 16 recursion iterations and 20 parallel trajectories, it reaches 97.0 % Sudoku accuracy, using far fewer iterations than the 320 run by deterministic baselines (which achieve 90.5 %). On multi‑solution tasks such as N‑Queens, GRAM attains 99.7 % accuracy and covers 90.3 % of distinct solutions.
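Solution coverage on a multi‑solution task can be illustrated with N‑Queens on a small board. In the sketch below, `sample_solution` is a randomized backtracking search used purely as a stand‑in for a stochastic model trajectory; it is not GRAM. Coverage is assumed to mean the fraction of all distinct solutions that appear at least once among the samples.

```python
import random


def _safe(cols, c):
    # True when a queen in the next row at column c attacks no earlier queen.
    row = len(cols)
    return all(c != pc and abs(c - pc) != row - pr
               for pr, pc in enumerate(cols))


def all_solutions(n):
    # Exhaustive backtracking: every valid placement as a tuple of columns.
    found = []

    def place(cols):
        if len(cols) == n:
            found.append(tuple(cols))
            return
        for c in range(n):
            if _safe(cols, c):
                place(cols + [c])

    place([])
    return found


def sample_solution(n, rng):
    # Stand-in sampler: backtracking with a random column order per row.
    cols = []

    def place():
        if len(cols) == n:
            return True
        candidates = list(range(n))
        rng.shuffle(candidates)
        for c in candidates:
            if _safe(cols, c):
                cols.append(c)
                if place():
                    return True
                cols.pop()
        return False

    place()
    return tuple(cols)


def coverage(samples, n):
    # Fraction of the distinct solutions hit by at least one sample.
    truth = set(all_solutions(n))
    return len(set(samples) & truth) / len(truth)
```

Drawing more parallel samples can only keep coverage the same or raise it, which is why width matters for multi‑solution tasks: a single deterministic trajectory can return at most one solution per input.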
Limitations and Future Work
The current evaluation is limited to controlled tasks (Sudoku, ARC‑AGI, N‑Queens, graph coloring, binary‑MNIST). The authors acknowledge that deep‑supervision‑based sequential training restricts scalability to larger base models.
Conclusion
GRAM’s key contribution is replacing a single deterministic recursion trajectory with a learnable, parallel‑sampled probabilistic process, which markedly improves exploration and constraint satisfaction in structured reasoning and multi‑solution problems. The width‑based parallel sampling also decouples inference cost from recursion depth.
