How a 10M‑Parameter Model Beats Large Models on Sudoku and ARC with Multi‑Trajectory Reasoning

The GRAM model introduced by Yoshua Bengio’s team replaces deterministic recursive updates with probabilistic multi‑trajectory sampling. This enables a 10 M‑parameter network to achieve 97 % accuracy on Sudoku‑Extreme, 52 % on ARC‑AGI‑1 and 11 % on ARC‑AGI‑2, and near‑perfect results on N‑Queens and graph‑coloring, while also supporting unconditional generation tasks.

Machine Learning Algorithms & Natural Language Processing

The paper "Generative Recursive Reasoning" (arXiv:2605.19376) proposes GRAM, a generative recursive reasoning model that transforms the traditional deterministic recursive update into a probabilistic multi‑trajectory process.

Model Architecture

GRAM splits the hidden state into a high‑level component h and a low‑level component l. In each recursion step, the low‑level state performs K deterministic updates while the high‑level state is held fixed; the high‑level state is then updated with a Gaussian perturbation parameterized by a mean and a variance. The mean steers the reasoning direction, and the variance controls exploration. Randomness is applied only to the high‑level state; injecting noise into the low‑level state did not improve performance.
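
The update order described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: `low_level_update` and `high_level_stats` are hypothetical stand-ins for the learned networks (the article does not give their form), the state is a short list of floats, and the fixed log-variance is an arbitrary choice.

```python
import math
import random


def low_level_update(l, h, x):
    # Hypothetical stand-in for the learned low-level network (deterministic).
    return [math.tanh(0.5 * li + 0.3 * hi + 0.2 * xi) for li, hi, xi in zip(l, h, x)]


def high_level_stats(h, l):
    # Hypothetical stand-in for the head that outputs the mean and
    # log-variance of the Gaussian perturbation applied to h.
    mean = [math.tanh(0.5 * hi + 0.5 * li) - hi for hi, li in zip(h, l)]
    log_var = [-2.0 for _ in h]  # assumed constant; a real model would predict it
    return mean, log_var


def recursion_step(h, l, x, K, rng):
    # 1) K deterministic low-level updates while h is held fixed.
    for _ in range(K):
        l = low_level_update(l, h, x)
    # 2) Gaussian perturbation of h: the mean sets the direction,
    #    the variance sets how much the trajectory explores.
    mean, log_var = high_level_stats(h, l)
    h = [hi + m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
         for hi, m, lv in zip(h, mean, log_var)]
    return h, l


def run_trajectory(x, steps, K, seed):
    # One sampled reasoning trajectory; different seeds give different paths.
    rng = random.Random(seed)
    h = [0.0] * len(x)
    l = [0.0] * len(x)
    for _ in range(steps):
        h, l = recursion_step(h, l, x, K, rng)
    return h, l
```

Calling `run_trajectory` with the same input but different seeds yields different high‑level states, which is the source of the multiple trajectories; noise never enters the low‑level loop.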

Training Procedure

GRAM is trained with truncated‑gradient deep supervision, optimizing the surrogate objective obtained after gradient truncation. Deep supervision aligns the evidence lower bound (ELBO) with this truncated proxy, but the authors note that its sequential nature constrains training efficiency and memory usage.
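
For readers unfamiliar with the term, the generic conditional ELBO has the textbook form below. The symbols are assumptions made for illustration: x is the puzzle input, y the target solution, z the latent perturbations along a trajectory, q the approximate posterior and p the model. The paper's exact factorization over recursion steps and its truncated surrogate are not reproduced here.

```latex
\log p_\theta(y \mid x) \;\ge\;
\mathbb{E}_{q_\phi(z \mid x, y)}\big[\log p_\theta(y \mid x, z)\big]
\;-\; \mathrm{KL}\big(q_\phi(z \mid x, y)\,\|\,p_\theta(z \mid x)\big)
```

The first term rewards latent trajectories that lead to the correct answer; the KL term keeps the posterior used during training close to the prior that is sampled at inference time.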

Experimental Results

On the Sudoku‑Extreme benchmark, GRAM reaches 97.0 % accuracy, surpassing large‑scale models such as DeepSeek‑R1, Claude 3.7 16k, and o3‑mini‑high, which all score 0.0 % under comparable settings. On ARC‑AGI, GRAM obtains 52.0 % (ARC‑AGI‑1) and 11.1 % (ARC‑AGI‑2), while the same large models achieve 0.0 %.

On N‑Queens, the deterministic recursive models HRM and TRM achieve 80.70 % and 72.90 % accuracy, respectively. Adding deep supervision and stochastic guidance (+DS+SG) pushes accuracy to 100 %, and the full GRAM model attains 99.69 %. The ablation shows that removing the guidance signal drops accuracy to 50.27 %, and removing randomness altogether drops it to 0.0 %.
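
As a point of reference for what counts as a correct N‑Queens output, a minimal validity check is sketched below. The board encoding (one column index per row) and the definition of accuracy as the fraction of valid outputs are assumptions for illustration; the paper's exact task format and metric may differ.

```python
def is_valid_queens(cols):
    """Return True if cols[r] (column of the queen in row r) is a valid placement."""
    n = len(cols)
    # Every column must be used exactly once (also rejects out-of-range values).
    if sorted(cols) != list(range(n)):
        return False
    # No two queens may share a diagonal.
    for r1 in range(n):
        for r2 in range(r1 + 1, n):
            if abs(cols[r1] - cols[r2]) == r2 - r1:
                return False
    return True


def accuracy(predictions):
    # Assumed metric: fraction of predicted boards that are valid placements.
    return sum(is_valid_queens(p) for p in predictions) / len(predictions)
```

For example, `[1, 3, 0, 2]` is one of the two valid 4‑Queens placements, while `[0, 1, 2, 3]` fails the diagonal test.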

For graph‑coloring (8‑node graphs), GRAM reduces the number of conflicting edges to 2.7 (compared with 3.3 for the deterministic baseline and 19.0/61.3 for autoregressive generators).
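
The conflicting‑edge metric can be computed as in the sketch below. The edge‑list and color‑list representation is an assumption for illustration, and the helper names are hypothetical; the article does not say how the paper aggregates the count across graphs, so the mean here is just one plausible choice.

```python
def count_conflicts(edges, colors):
    """Number of edges whose two endpoints received the same color."""
    return sum(1 for u, v in edges if colors[u] == colors[v])


def mean_conflicts(edges, colorings):
    # Assumed aggregation: average conflict count over several sampled colorings.
    return sum(count_conflicts(edges, c) for c in colorings) / len(colorings)


# A 4-cycle: 0-1-2-3-0.
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
proper = [0, 1, 0, 1]     # alternating colors, no conflicts
clashing = [0, 0, 1, 1]   # edges (0,1) and (2,3) conflict
```

A lower count is better, and zero means the coloring is proper.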

Unconditional generation experiments demonstrate that GRAM can generate valid Sudoku boards with 99.05 % validity using only 10.9 M parameters and 16 supervised steps, outperforming the discrete diffusion model D3PM (55.1 M parameters, 1000 denoising steps, 91.33 % validity). In binary‑MNIST generation, increasing recursion steps from 8 to 256 lowers the FID from 84.08 to 73.34 and improves the Inception Score.
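
A validity figure like the one above presupposes a checker for generated boards. The sketch below assumes a board is a 9x9 list of integers and that "valid" means every row, column, and 3x3 box contains the digits 1 to 9 exactly once; the function names are hypothetical and the paper's evaluation code is not shown in the article.

```python
def is_valid_sudoku(board):
    """Return True if every row, column and 3x3 box holds the digits 1..9 once."""
    if len(board) != 9 or any(len(row) != 9 for row in board):
        return False
    target = set(range(1, 10))
    rows = [set(row) for row in board]
    cols = [{board[r][c] for r in range(9)} for c in range(9)]
    boxes = [
        {board[br + dr][bc + dc] for dr in range(3) for dc in range(3)}
        for br in (0, 3, 6)
        for bc in (0, 3, 6)
    ]
    return all(group == target for group in rows + cols + boxes)


def validity_rate(boards):
    # Fraction of generated boards that pass the check.
    return sum(is_valid_sudoku(b) for b in boards) / len(boards)


# A known valid board built from a shifting pattern, and a corrupted copy.
good = [[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]
bad = [row[:] for row in good]
bad[0][0], bad[0][1] = bad[0][1], bad[0][0]  # row stays valid, two columns break
```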

Ablation Studies

Random guidance and mean guidance must act together; removing either harms performance. The study also shows that data augmentation and inference‑time sampling provide complementary benefits rather than additive gains.

Inference Extension and Multi‑Solution Tasks

GRAM supports scaling along the width dimension at inference time: with 16 recursion iterations and 20 parallel trajectories, it reaches 97.0 % Sudoku accuracy, whereas deterministic baselines need 320 iterations to reach 90.5 %. On multi‑solution tasks such as N‑Queens, GRAM attains 99.7 % accuracy and covers 90.3 % of distinct solutions.
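
Note that 16 iterations across 20 trajectories is 16 × 20 = 320 iterations in total, the same count as the deterministic baseline, but spent in parallel rather than in sequence; reading this as a deliberately matched budget is an interpretation, not something the article states. The sketch below shows how accuracy and solution coverage could be measured over a set of sampled trajectories. The sampler is a hypothetical stand‑in (a random permutation) rather than GRAM, and the coverage definition (distinct valid solutions found divided by all solutions) is an assumption.

```python
import itertools
import random


def is_valid_queens(cols):
    # cols[r] is the queen's column in row r; check columns and diagonals.
    n = len(cols)
    if sorted(cols) != list(range(n)):
        return False
    return all(abs(cols[a] - cols[b]) != b - a
               for a in range(n) for b in range(a + 1, n))


def all_solutions(n):
    # Brute-force enumeration; only practical for small n.
    return {p for p in itertools.permutations(range(n)) if is_valid_queens(p)}


def toy_sampler(n, rng):
    # Hypothetical stand-in for one stochastic trajectory's final answer.
    cols = list(range(n))
    rng.shuffle(cols)
    return tuple(cols)


def sample_width(n, width, seed):
    # Draw `width` independent trajectories (run in parallel in a real system).
    rng = random.Random(seed)
    return [toy_sampler(n, rng) for _ in range(width)]


def solved_and_coverage(samples, n):
    solutions = all_solutions(n)
    found = {s for s in samples if s in solutions}
    solved = bool(found)                     # at least one trajectory is valid
    coverage = len(found) / len(solutions)   # share of distinct solutions found
    return solved, coverage
```

With a stronger sampler, increasing `width` raises both the chance that some trajectory is valid and the share of distinct solutions recovered, without increasing the sequential depth.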

Limitations and Future Work

The current evaluation is limited to controlled tasks (Sudoku, ARC‑AGI, N‑Queens, graph coloring, binary‑MNIST). The authors acknowledge that deep‑supervision‑based sequential training restricts scalability to larger base models.

Conclusion

GRAM’s key contribution is replacing a single deterministic recursion trajectory with a learnable, parallel‑sampled probabilistic process, which markedly improves exploration and constraint satisfaction in structured reasoning and multi‑solution problems. The width‑based parallel sampling also decouples inference cost from recursion depth.
