Bengio’s New Parallel Multi‑Trajectory Reasoning Paradigm

The article introduces GRAM (Generative Recursive reAsoning Models), a parallel multi‑trajectory inference framework that replaces deterministic single‑track recursion with stochastic latent transitions and adds width scaling. It reports state‑of‑the‑art results on the Sudoku‑Extreme, ARC‑AGI, N‑Queens and Graph Coloring benchmarks.

PaperAgent

1. Single‑track limitation of recursive reasoning

Recursive Reasoning Models (RRMs) such as HRM, TRM and Looped Transformer refine a latent state repeatedly, decoupling reasoning depth from model size. Existing RRMs are deterministic: given identical input and initialization, they follow the same latent trajectory and converge to a single prediction. If that trajectory settles in a local optimum, the model has no way to produce an alternative solution.

2. Core idea of GRAM: probabilistic multi‑trajectory exploration

GRAM (Generative Recursive reAsoning Models) replaces each deterministic latent‑state update with a stochastic transition. At every recursion step the model first computes a deterministic proposal, then samples a “guidance” noise from a state‑dependent Gaussian distribution. The sampled noise perturbs the abstract state, yielding a distribution over reasoning trajectories. Parallel sampling of multiple trajectories enables:

Multi‑hypothesis maintenance: explore different solution strategies simultaneously.

Width scaling: improve performance by increasing the number of parallel samples rather than only deepening recursion.

Unconditional generation: when the input is empty, the model can generate samples from the prior.
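
The stochastic transition described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: `proposal`, `noise_scale` and the toy vector state are hypothetical stand‑ins for the learned networks, and the "parallel" trajectories are drawn in a plain loop.

```python
import math
import random

def proposal(state, x):
    # Hypothetical deterministic proposal: move the state towards tanh(x).
    return [0.5 * s + 0.5 * math.tanh(xi) for s, xi in zip(state, x)]

def noise_scale(state):
    # Hypothetical state-dependent standard deviation, kept strictly positive.
    return [0.1 + 0.1 * abs(s) for s in state]

def stochastic_step(state, x, rng):
    # Deterministic proposal first, then a Gaussian perturbation whose
    # scale depends on the current state.
    mean = proposal(state, x)
    sigma = noise_scale(state)
    return [m + sd * rng.gauss(0.0, 1.0) for m, sd in zip(mean, sigma)]

def sample_trajectory(x, steps, rng):
    # Roll out one latent trajectory from a zero initial state.
    state = [0.0] * len(x)
    trajectory = [state]
    for _ in range(steps):
        state = stochastic_step(state, x, rng)
        trajectory.append(state)
    return trajectory

def sample_parallel(x, steps, n, seed=0):
    # Draw n independent trajectories for the same input (sequentially here;
    # a real system would batch them).
    rng = random.Random(seed)
    return [sample_trajectory(x, steps, rng) for _ in range(n)]
```

Calling `sample_parallel([1.0, -2.0], steps=4, n=3)` returns three trajectories that start from the same state and input but end in different latent states, which is the property the width dimension relies on.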

3. Architecture – hierarchical fast‑slow double loop

GRAM separates reasoning into two hierarchical loops:

High‑level (slow) loop: maintains an abstract reasoning state, updated once per outer‑loop step. The update is stochastic (Gaussian noise) and steers the overall reasoning direction.

Low‑level (fast) loop: performs deterministic refinements of the abstract state at every inner‑loop iteration, providing fine‑grained computation without additional randomness.

This design confines stochasticity to the slow abstract layer, preserving stability of the fast computations while allowing the model to explore diverse reasoning paths.
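
The control flow of the double loop might look like the sketch below. It assumes, as in HRM‑style models, a separate low‑level working state next to the high‑level abstract state; the article does not spell this out, and `fast_update`, `slow_update` and the fixed `sigma` are hypothetical placeholders for learned components.

```python
import math
import random

def fast_update(low, high, x):
    # Hypothetical deterministic inner-loop refinement (no randomness).
    return [math.tanh(l + h + xi) for l, h, xi in zip(low, high, x)]

def slow_update(high, low, rng, sigma):
    # Hypothetical stochastic outer-loop update: a deterministic mix of the
    # two states plus Gaussian noise on the abstract state only.
    return [0.5 * h + 0.5 * l + sigma * rng.gauss(0.0, 1.0)
            for h, l in zip(high, low)]

def double_loop(x, outer_steps, inner_steps, rng, sigma=0.1):
    high = [0.0] * len(x)  # abstract (slow) state
    low = [0.0] * len(x)   # working (fast) state, an assumption of this sketch
    for _ in range(outer_steps):
        for _ in range(inner_steps):
            low = fast_update(low, high, x)    # fast, deterministic
        high = slow_update(high, low, rng, sigma)  # slow, stochastic
    return high, low
```

With `sigma=0.0` the procedure collapses to a deterministic recursion, so two runs with different random seeds give identical states; any positive `sigma` makes the runs diverge at the first outer step.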

4. Training and inference

4.1 Training objective

GRAM is formulated as a latent‑variable generative model and trained by maximising the Evidence Lower Bound (ELBO). Back‑propagating through the entire latent trajectory would be memory‑intensive, so the authors adopt a truncated‑gradient strategy: gradients are propagated only from the final supervision step of each trajectory, which reduces memory consumption while keeping training stable.
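
For orientation, the generic ELBO of a conditional latent‑variable model has the form below. The notation is an assumption made for illustration (input x, target y, latent trajectory z_{1:T}, variational posterior q_φ); the article does not give GRAM's exact factorisation over recursion steps.

```latex
\log p_\theta(y \mid x) \;\ge\;
\mathbb{E}_{q_\phi(z_{1:T} \mid x, y)}\!\left[ \log p_\theta(y \mid z_{1:T}, x) \right]
- \mathrm{KL}\!\left( q_\phi(z_{1:T} \mid x, y) \,\|\, p_\theta(z_{1:T} \mid x) \right)
```

The first term rewards trajectories that lead to the correct answer; the second keeps the posterior over trajectories close to the prior that is sampled from at test time.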

4.2 Inference scaling: depth × width

Existing RRMs increase test‑time compute solely by adding more recursion steps (depth). GRAM adds a width dimension by sampling N parallel latent trajectories and selecting the best answer with a Latent Process Reward Model (LPRM) or majority voting. This width scaling is complementary to depth scaling.
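
The two selection rules mentioned above can be sketched as follows. The per‑trajectory `scores` are assumed to come from some reward model such as the LPRM; how the LPRM actually computes them is not described in the article, so they are passed in as plain numbers here.

```python
from collections import Counter

def majority_vote(answers):
    # Return the most frequent answer among the N sampled trajectories.
    # Answers must be hashable (e.g. strings or tuples); ties go to the
    # answer that was seen first.
    return Counter(answers).most_common(1)[0][0]

def best_by_score(answers, scores):
    # Return the answer whose trajectory received the highest score
    # (scores are a stand-in for a reward model's output).
    best_index = max(range(len(answers)), key=lambda i: scores[i])
    return answers[best_index]
```

For example, `majority_vote(["A", "B", "A", "C"])` picks `"A"`, while `best_by_score(["A", "B", "C"], [0.1, 0.9, 0.3])` picks `"B"`. Majority voting needs no extra model, whereas score‑based selection can still recover a correct answer that only a minority of trajectories reach.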

5. Experiment 1 – Structured puzzles (Sudoku‑Extreme, ARC‑AGI)

On benchmarks that require hard constraint propagation and abstract visual reasoning, GRAM outperforms deterministic baselines:

Sudoku‑Extreme: GRAM 97.0 % vs TRM 87.4 % vs HRM 55.0 %.

ARC‑AGI‑1: GRAM 52.0 % vs TRM 44.6 %.

ARC‑AGI‑2: GRAM 11.1 % vs TRM 7.8 %.

Large language models (DeepSeek‑R1 671B, Claude 3.7, o3‑mini‑high) score 0 % on Sudoku‑Extreme, which suggests the benchmark measures constraint‑propagation ability rather than pre‑trained knowledge.

Width scaling efficiency: with N=20 samples and 16 recursion steps each (20 × 16 = 320 step evaluations in total, of which only 16 are sequential), GRAM surpasses TRM’s 320‑step performance (97.0 % vs 90.5 %). Parallel sampling thus reduces sequential latency while achieving higher accuracy.

Depth vs width scaling curve on Sudoku‑Extreme

6. Experiment 2 – Multi‑solution tasks (N‑Queens, Graph Coloring)

These tasks have multiple valid solutions per input, testing a model’s ability to cover the solution space.

Deterministic RRMs (HRM, TRM, Looped Transformer) achieve at most 36.1 % coverage; accuracy drops sharply as the number of solutions grows (mode collapse).

GRAM with 20 parallel samples attains 99.7 % accuracy and 90.3 % coverage on 8×8 N‑Queens.

On Graph Coloring, GRAM reduces conflicts to 2.7 (8‑node) and 3.3 (10‑node), far fewer than autoregressive models (19.0 and 61.3 respectively).
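
To make the metrics concrete, the sketch below shows one plausible way to compute them. It assumes that coverage means the fraction of all valid solutions reached by at least one sample and that a conflict is an edge whose endpoints share a colour; the paper's exact definitions may differ, and all function names are illustrative.

```python
def is_valid_queens(cols):
    # cols[r] is the column of the queen placed in row r.
    n = len(cols)
    if sorted(cols) != list(range(n)):
        return False  # a column is reused or out of range
    # No two queens may share a diagonal.
    return all(abs(cols[i] - cols[j]) != j - i
               for i in range(n) for j in range(i + 1, n))

def coverage(samples, total_solutions):
    # Fraction of the solution space hit by at least one valid sample.
    distinct_valid = {tuple(s) for s in samples if is_valid_queens(s)}
    return len(distinct_valid) / total_solutions

def count_conflicts(edges, colors):
    # Number of edges whose two endpoints received the same colour.
    return sum(1 for u, v in edges if colors[u] == colors[v])
```

The 4×4 board has exactly two solutions, so three samples consisting of the same valid placement twice plus one invalid placement give a coverage of 0.5. A deterministic model that always returns the same valid placement can score perfect accuracy yet never exceed that coverage, which is the mode‑collapse effect described above.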

N‑Queens task example
Graph Coloring task example
Project page: https://ahn-ml.github.io/gram-website
Paper: Generative Recursive Reasoning, https://arxiv.org/pdf/2605.19376