Bengio’s New Parallel Multi‑Trajectory Reasoning Paradigm
The article introduces GRAM (Generative Recursive reAsoning Models), a parallel multi‑trajectory inference framework that replaces deterministic single‑track recursion with stochastic latent transitions and adds width scaling. It reports state‑of‑the‑art results on the Sudoku‑Extreme, ARC‑AGI, N‑Queens and Graph Coloring benchmarks.
1. Single‑track limitation of recursive reasoning
Recursive Reasoning Models (RRMs) such as HRM, TRM and Looped Transformer refine a latent state repeatedly, decoupling reasoning depth from model size. All existing RRMs are deterministic: given identical input and initialization, they follow the same latent trajectory and converge to a single prediction. If that trajectory settles in a local optimum, the model has no way to explore alternative solutions.
2. Core idea of GRAM: probabilistic multi‑trajectory exploration
GRAM (Generative Recursive reAsoning Models) replaces each deterministic latent‑state update with a stochastic transition. At every recursion step the model first computes a deterministic proposal, then samples a “guidance” noise from a state‑dependent Gaussian distribution. The sampled noise perturbs the abstract state, yielding a distribution over reasoning trajectories. Parallel sampling of multiple trajectories enables:
Multi‑hypothesis maintenance: explore different solution strategies simultaneously.
Width scaling: improve performance by increasing the number of parallel samples rather than only deepening recursion.
Unconditional generation: when the input is empty, the model can generate samples from the prior.
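The stochastic transition described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the update network `deterministic_proposal`, the softplus noise scale `noise_scale`, and the weight matrices `W` and `w_sigma` are all hypothetical stand-ins for learned components.

```python
import numpy as np

def deterministic_proposal(state, x, W):
    # Hypothetical stand-in for the learned deterministic update.
    return np.tanh(state @ W + x)

def noise_scale(proposal, w_sigma):
    # Hypothetical state-dependent standard deviation (softplus keeps it positive).
    return np.log1p(np.exp(proposal @ w_sigma))

def stochastic_step(state, x, W, w_sigma, rng):
    # 1) deterministic proposal, 2) Gaussian noise whose scale depends on the state.
    proposal = deterministic_proposal(state, x, W)
    sigma = noise_scale(proposal, w_sigma)
    return proposal + sigma * rng.normal(size=proposal.shape)

def sample_trajectories(x, W, w_sigma, n_samples, n_steps, seed=0):
    # All trajectories share the same input and the same zero initialization;
    # they diverge only because each one draws its own noise.
    rng = np.random.default_rng(seed)
    states = np.zeros((n_samples, W.shape[0]))
    for _ in range(n_steps):
        states = stochastic_step(states, x, W, w_sigma, rng)
    return states  # one final latent state per trajectory
```

Each row of the returned array is the end point of a different reasoning trajectory for the same input, which is the property that the three capabilities listed above rely on.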
3. Architecture – hierarchical fast‑slow double loop
GRAM separates reasoning into two hierarchical loops:
High‑level (slow) loop: maintains an abstract reasoning state, updated once per outer‑loop step. The update is stochastic (Gaussian noise) and steers the overall reasoning direction.
Low‑level (fast) loop: performs deterministic refinements of the abstract state at every inner‑loop iteration, providing fine‑grained computation without additional randomness.
This design confines stochasticity to the slow abstract layer, preserving stability of the fast computations while allowing the model to explore diverse reasoning paths.
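The control flow of the two loops can be sketched as below. This is an illustrative sketch only: the separate low‑level state `z_low`, the update functions, the weight matrices, and the fixed `noise_std` (a simplification of the state‑dependent Gaussian) are assumptions, not details taken from the paper.

```python
import numpy as np

def fast_update(z_low, z_high, x, W_low):
    # Deterministic inner-loop refinement (hypothetical form).
    return np.tanh((z_low + z_high + x) @ W_low)

def slow_update(z_high, z_low, W_high, noise_std, rng):
    # Stochastic outer-loop update: deterministic proposal plus Gaussian noise.
    proposal = np.tanh((z_high + z_low) @ W_high)
    return proposal + noise_std * rng.normal(size=proposal.shape)

def double_loop(x, W_low, W_high, n_outer, n_inner, noise_std, rng):
    z_high = np.zeros(x.shape[0])  # abstract (slow) state
    z_low = np.zeros(x.shape[0])   # working (fast) state
    for _ in range(n_outer):
        for _ in range(n_inner):
            z_low = fast_update(z_low, z_high, x, W_low)  # no randomness here
        # Noise enters exactly once per outer step.
        z_high = slow_update(z_high, z_low, W_high, noise_std, rng)
    return z_high, z_low
```

In this sketch, setting `noise_std` to 0 recovers a fully deterministic recursion, which makes the contrast with single‑track RRMs easy to test.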
4. Training and inference
4.1 Training objective
GRAM formulates reasoning as a latent‑variable generative model and is trained by maximising the Evidence Lower Bound (ELBO). Because back‑propagating gradients through the entire latent trajectory would be memory‑intensive, the authors adopt a truncated‑gradient strategy: gradients are propagated only from the final supervision step of each trajectory, which reduces memory consumption while keeping training stable.
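For reference, the generic ELBO of a conditional latent‑variable model can be written as follows. The notation is an assumption made for illustration: x is the input, y the target, z_{1:T} the latent trajectory, p_θ the generative model and q_φ an approximate posterior. The summary does not state how GRAM factorises these terms over recursion steps, so this shows only the general form.

```latex
\log p_\theta(y \mid x)
  \;\ge\;
  \mathbb{E}_{q_\phi(z_{1:T} \mid x, y)}
    \bigl[ \log p_\theta(y \mid z_{1:T}, x) \bigr]
  \;-\;
  \mathrm{KL}\bigl( q_\phi(z_{1:T} \mid x, y) \,\|\, p_\theta(z_{1:T} \mid x) \bigr)
```

The first term rewards trajectories that lead to the correct answer, and the KL term keeps the posterior over trajectories close to the prior that is sampled from at test time.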
4.2 Inference scaling: depth × width
Existing RRMs increase test‑time compute solely by adding more recursion steps (depth). GRAM adds a width dimension by sampling N parallel latent trajectories and selecting the best answer with a Latent Process Reward Model (LPRM) or majority voting. This width scaling is complementary to depth scaling.
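The two selection rules mentioned above can be sketched as follows. This is a minimal sketch: the `scores` argument is a hypothetical stand‑in for whatever the LPRM outputs per trajectory, and answers are assumed to be hashable (for example strings or tuples).

```python
from collections import Counter

def majority_vote(answers):
    # Return the answer produced by the largest number of trajectories.
    return Counter(answers).most_common(1)[0][0]

def select_by_score(answers, scores):
    # Return the answer whose trajectory received the highest score
    # (scores are a hypothetical stand-in for a reward model's output).
    best = max(range(len(answers)), key=lambda i: scores[i])
    return answers[best]

print(majority_vote(["a", "b", "a"]))                      # a
print(select_by_score(["a", "b", "c"], [0.1, 0.9, 0.3]))   # b
```

Majority voting needs no extra model but only helps when the correct answer is also the most frequent one; a scoring model can pick a correct answer even when it appears in a single trajectory.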
5. Experiment 1 – Structured puzzles (Sudoku‑Extreme, ARC‑AGI)
On benchmarks that require hard constraint propagation and abstract visual reasoning, GRAM outperforms deterministic baselines:
Sudoku‑Extreme: GRAM 97.0 % vs TRM 87.4 % vs HRM 55.0 %.
ARC‑AGI‑1: GRAM 52.0 % vs TRM 44.6 %.
ARC‑AGI‑2: GRAM 11.1 % vs TRM 7.8 %.
Large language models (DeepSeek‑R1 671B, Claude 3.7, o3‑mini‑high) score 0 % on Sudoku‑Extreme, indicating the benchmark measures constraint‑propagation ability rather than pre‑trained knowledge.
Width scaling efficiency: with N=20 samples and 16 recursion steps, GRAM surpasses TRM’s 320‑step performance (97.0 % vs 90.5 %). Parallel sampling reduces sequential latency while achieving higher accuracy.
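The arithmetic behind this comparison is worth spelling out. The sketch below assumes, purely for illustration, that one recursion step costs the same in both models and that GRAM's samples run fully in parallel; neither assumption is stated in the summary.

```python
def total_steps(n_samples, steps_per_sample):
    # Total recursion steps executed across all trajectories.
    return n_samples * steps_per_sample

gram_total = total_steps(20, 16)   # 20 parallel samples, 16 steps each
trm_total = total_steps(1, 320)    # one trajectory, 320 steps

# Sequential depth is what determines latency when samples run in parallel.
sequential_ratio = 320 / 16

print(gram_total, trm_total, sequential_ratio)  # 320 320 20.0
```

Under these assumptions the two settings execute the same total number of steps, but GRAM's longest sequential chain is 20 times shorter.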
6. Experiment 2 – Multi‑solution tasks (N‑Queens, Graph Coloring)
These tasks have multiple valid solutions per input, testing a model’s ability to cover the solution space.
Deterministic RRMs (HRM, TRM, Looped Transformer) achieve at most 36.1 % coverage; accuracy drops sharply as the number of solutions grows (mode collapse).
GRAM with 20 parallel samples attains 99.7 % accuracy and 90.3 % coverage on 8×8 N‑Queens.
On Graph Coloring, GRAM reduces the number of conflicts to 2.7 (8‑node) and 3.3 (10‑node), far fewer than autoregressive models (19.0 and 61.3 respectively).
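The metrics in this section can be made concrete with a short sketch. The definitions below are assumptions, since the summary does not give the paper's exact formulas: coverage is taken as the fraction of all valid solutions that appear at least once among the samples, and a conflict is an edge whose two endpoints share a colour.

```python
def is_valid_queens(cols):
    # cols[r] is the column of the queen in row r.
    n = len(cols)
    if sorted(cols) != list(range(n)):
        return False  # two queens share a column, or a value is out of range
    for i in range(n):
        for j in range(i + 1, n):
            if abs(cols[i] - cols[j]) == j - i:
                return False  # two queens share a diagonal
    return True

def coverage(samples, all_solutions):
    # Fraction of distinct valid solutions hit by at least one sample.
    solutions = {tuple(s) for s in all_solutions}
    found = {tuple(s) for s in samples} & solutions
    return len(found) / len(solutions)

def coloring_conflicts(edges, colors):
    # Number of edges whose endpoints were given the same colour.
    return sum(1 for u, v in edges if colors[u] == colors[v])
```

For example, 4‑Queens has exactly two solutions, (1, 3, 0, 2) and (2, 0, 3, 1), so a model that returns the first one on every sample is fully accurate but reaches only 0.5 coverage under this definition.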
https://ahn-ml.github.io/gram-website
https://arxiv.org/pdf/2605.19376
Generative Recursive Reasoning
