Bengio’s New Paper Pushes Recursive Reasoning Limits with Parallel Trajectories

The paper introduces GRAM (Generative Recursive Reasoning Models), a probabilistic multi‑trajectory recursive reasoning framework that injects learnable randomness into each recursion step, enabling parallel sampling and achieving higher accuracy than deterministic baselines across tasks such as Sudoku‑Extreme, N‑Queens, ARC‑AGI and unconditional generation.

Machine Heart

Reasoning efficiency and quality are enduring challenges in large‑model research. Most mainstream reasoning models generate many intermediate tokens through chain‑of‑thought, so latency and cost grow with reasoning depth.

In a recent podcast, Yann LeCun reiterated that autoregressive generation is not the path to AGI; true intelligence should arise from planning and reasoning in latent space. In a similar spirit, a new paper from Turing Award laureate Yoshua Bengio proposes a parallel scheme for latent‑space reasoning.

GRAM: Generative Recursive Reasoning Models

GRAM converts deterministic recursive latent reasoning into probabilistic multi‑trajectory computation. At each recursion step the model samples a direction in latent space, allowing exploration of multiple solution paths.

Experimental results show that with only 16 recursion steps and 20 parallel samples, GRAM surpasses every deterministic baseline, even when those baselines run 320 serial steps.

From Single‑Track Determinism to Multi‑Track Probability

Existing Recursive Reasoning Models (RRMs) repeatedly apply a single shared transition function to refine a persistent latent state, which decouples reasoning depth from parameter count. However, because they are deterministic, they follow a single trajectory and struggle on tasks that have multiple valid solutions or where a single path can get stuck in a local optimum.

GRAM introduces learnable randomness at each recursion step. Concretely, a deterministic module proposes an update u_t, and a state‑dependent Gaussian distribution then supplies a random guide ε_t ~ N(μ_θ, σ²_θ). The new latent state is z_{t+1} = u_t + ε_t, where the mean μ_θ encodes a guided direction and the variance σ²_θ controls exploration breadth.
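The update rule above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: `propose_update` and `guide_params` are hypothetical placeholders for the learned networks, and their formulas are invented for the example rather than taken from the paper.

```python
import math
import random

def propose_update(z):
    """Hypothetical stand-in for the deterministic module that proposes u_t."""
    return [math.tanh(v) for v in z]

def guide_params(z):
    """Hypothetical state-dependent Gaussian parameters (mu, sigma) of the guide."""
    mu = [-0.1 * v for v in z]                  # placeholder guided direction
    sigma = [0.05 + 0.01 * abs(v) for v in z]   # placeholder exploration breadth
    return mu, sigma

def stochastic_step(z, rng):
    """One recursion step: z_{t+1} = u_t + eps_t with eps_t ~ N(mu, sigma^2)."""
    u = propose_update(z)
    mu, sigma = guide_params(z)
    eps = [rng.gauss(m, s) for m, s in zip(mu, sigma)]
    return [ui + ei for ui, ei in zip(u, eps)]

def rollout(z0, steps, seed):
    """Run one trajectory; different seeds give different trajectories."""
    rng = random.Random(seed)
    z = list(z0)
    for _ in range(steps):
        z = stochastic_step(z, rng)
    return z
```

Calling `rollout` with several seeds from the same starting state yields several distinct trajectories, which is the property the width axis relies on.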

Hierarchical Architecture

GRAM employs a two‑level latent state: a high‑level component h and a low‑level component l. The low‑level component is updated K times per transition for fine‑grained computation, while the high‑level component updates once, carrying abstract reasoning state. Randomness is injected only into the high‑level component, guiding the overall trajectory without disturbing low‑level precision.
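A rough sketch of one such hierarchical transition follows. The update functions, the value of `K`, and the zero‑mean fixed‑variance noise are all simplifying assumptions made for the example; in GRAM the noise distribution is learned and state‑dependent.

```python
import math
import random

K = 4  # hypothetical number of low-level updates per transition

def update_low(l, h, x):
    """Placeholder low-level update, conditioned on the high-level state h and input x."""
    return [math.tanh(li + hi + xi) for li, hi, xi in zip(l, h, x)]

def update_high(h, l):
    """Placeholder high-level update that reads the refined low-level state."""
    return [math.tanh(hi + li) for hi, li in zip(h, l)]

def transition(h, l, x, rng, sigma=0.1):
    """One transition: K deterministic low-level updates, then one noisy high-level update."""
    for _ in range(K):
        l = update_low(l, h, x)
    h = update_high(h, l)
    # Noise is added to the high-level state only (simplified here to zero-mean Gaussian).
    h = [hi + rng.gauss(0.0, sigma) for hi in h]
    return h, l
```

Within a single transition the low‑level result does not depend on the random draw; the noise reaches the low‑level state only indirectly, through the perturbed high‑level state used in the next transition.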

Training via Variational Inference

GRAM is trained as a probabilistic model using variational inference. It defines a prior p_θ(τ|x) used at inference time and a variational posterior q_φ(τ|x,y) that sees the correct answer y during training. The objective maximizes the evidence lower bound (ELBO) with a reconstruction term encouraging correct predictions from sampled trajectories and a KL term regularizing the posterior‑prior distance.

During training the posterior learns which sampled directions lead to correct solutions; at inference only the learned prior is used to sample trajectories.
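For reference, the objective described above has the shape of a standard conditional ELBO. The form below is a sketch reconstructed from that description: the decoder notation p_θ(y | τ, x) is an assumption, and any per‑step factorization or term weighting used in the paper is not shown.

```latex
\log p_\theta(y \mid x)
  \;\ge\;
  \underbrace{\mathbb{E}_{q_\phi(\tau \mid x, y)}
    \big[ \log p_\theta(y \mid \tau, x) \big]}_{\text{reconstruction}}
  \;-\;
  \underbrace{\mathrm{KL}\!\big( q_\phi(\tau \mid x, y) \,\|\, p_\theta(\tau \mid x) \big)}_{\text{posterior-prior regularizer}}
```

The KL term is what allows the answer‑aware posterior to be dropped at inference: it pulls the prior toward the directions the posterior found useful.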

Dual‑Axis Inference Scaling: Depth × Width

GRAM proposes two complementary ways to scale inference:

Depth scaling (serial): increase the number of recursion steps, with adaptive computation time allowing each trajectory to stop early.

Width scaling (parallel): sample multiple independent trajectories from the prior, decode a candidate answer from each, and select the best via majority voting or a learned latent‑process reward model (LPRM) that predicts trajectory quality.
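The two selection rules can be illustrated with a small sketch. The `scores` argument is a hypothetical stand‑in for LPRM outputs; how the actual reward model is built and trained is not covered here.

```python
from collections import Counter

def majority_vote(candidates):
    """Return the most frequent candidate answer (candidates must be hashable)."""
    counts = Counter(candidates)
    return counts.most_common(1)[0][0]

def select_by_score(candidates, scores):
    """Return the candidate with the highest score, e.g. a predicted trajectory quality."""
    best = max(range(len(candidates)), key=lambda i: scores[i])
    return candidates[best]

# Five decoded answers from five sampled trajectories (toy values).
answers = ["A", "B", "A", "C", "A"]
voted = majority_vote(answers)
picked = select_by_score(answers, [0.1, 0.9, 0.2, 0.3, 0.1])
```

Majority voting needs no extra model but assumes the correct answer is also the most common one; score‑based selection can recover a rare correct answer, at the cost of training a scorer.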

Parallel sampling mitigates the latency bottleneck of deeper recursion by covering a larger solution space within the same wall‑clock time.
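A back‑of‑the‑envelope comparison makes the trade‑off concrete, using the 16‑step, 20‑sample and 320‑step figures quoted earlier. It assumes every recursion step costs the same and that all samples run fully in parallel, which ignores batching and hardware overhead.

```python
def budget(steps, samples):
    """Serial depth (a latency proxy) and total step evaluations (a compute proxy)."""
    return {"serial_steps": steps, "total_step_evals": steps * samples}

gram = budget(steps=16, samples=20)      # wide and shallow
baseline = budget(steps=320, samples=1)  # narrow and deep

same_compute = gram["total_step_evals"] == baseline["total_step_evals"]
depth_ratio = baseline["serial_steps"] // gram["serial_steps"]
```

Under these assumptions both settings spend 320 step evaluations, but the parallel configuration has 20 times fewer steps on its critical path.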

Experimental Evaluation

GRAM was evaluated on structured reasoning (Sudoku‑Extreme, ARC‑AGI), multi‑solution constraint satisfaction (N‑Queens, Graph Coloring), and unconditional generation (Sudoku, binarized MNIST).

Structured Reasoning

On Sudoku‑Extreme (9×9 with minimal clues), GRAM achieved 97.0% accuracy with 16 recursion steps and 20 parallel samples, far exceeding the deterministic baselines TRM (90.5% at 320 steps) and HRM (55.0%). Similar gains were observed on ARC‑AGI.

Multi‑Solution Tasks

For 8×8 N‑Queens, GRAM maintained high accuracy and coverage across many sampled solutions, while the coverage of deterministic models dropped sharply. On Graph Coloring, GRAM reduced the average number of conflicting edges to 2.7 (8‑node) and 3.3 (10‑node), compared with 19.0 and 61.3 for autoregressive baselines.
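To make these metrics concrete, here is a minimal sketch of how validity, solution diversity, and conflicting edges could be computed. The encodings and helper names are assumptions for illustration; the paper's exact metric definitions are not given in this summary.

```python
def is_valid_queens(cols):
    """cols[r] is the column of the queen in row r; True if no two queens attack."""
    n = len(cols)
    if sorted(cols) != list(range(n)):
        return False  # a column is reused or out of range
    for r1 in range(n):
        for r2 in range(r1 + 1, n):
            if abs(cols[r1] - cols[r2]) == r2 - r1:
                return False  # two queens share a diagonal
    return True

def distinct_valid(samples):
    """Set of distinct valid boards among the sampled answers (a simple coverage proxy)."""
    return {tuple(s) for s in samples if is_valid_queens(s)}

def count_conflict_edges(edges, colors):
    """Number of edges whose two endpoints received the same color."""
    return sum(1 for u, v in edges if colors[u] == colors[v])

solution = [0, 4, 7, 5, 2, 6, 1, 3]
diagonal = [0, 1, 2, 3, 4, 5, 6, 7]
found = distinct_valid([solution, solution, diagonal])
triangle_conflicts = count_conflict_edges([(0, 1), (1, 2), (0, 2)], [0, 0, 1])
```

A deterministic model returns the same board every time, so its set of distinct valid solutions can never exceed one per input; sampling is what lets this count grow.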

Unconditional Generation

GRAM generated valid Sudoku puzzles with 99.05% validity using only 10.9 M parameters and 16 supervised steps, surpassing the 55.1 M‑parameter D3PM diffusion model that required 1,000 denoising steps. On binarized MNIST, GRAM matched D3PM’s FID (73.34) and improved generation quality as recursion steps increased.

Ablation Studies

Ablations on Sudoku‑Extreme and N‑Queens showed that removing either the learned guidance direction or the stochastic component drastically harms performance (e.g., accuracy drops to 0% without guidance). Adding random decoding or random initialization to deterministic baselines yielded no improvement, indicating that GRAM’s gains stem from the combination of the variational framework, stochasticity, and learned guidance.

Conclusion

GRAM establishes probabilistic multi‑trajectory recursion as a design principle for future reasoning architectures. Its three core contributions are: (1) formalizing recursive reasoning as a latent‑variable generative process, (2) introducing width‑based parallel inference scaling, and (3) empirically validating the framework on diverse reasoning and generation benchmarks.
