10M‑Parameter Model Solves ARC and Sudoku – Bengio Team Bets on Multi‑Trajectory Reasoning

A 10‑million‑parameter GRAM model from Bengio, KAIST, Mila and NYU achieves 97% accuracy on Sudoku‑Extreme and competitive scores on ARC‑AGI tasks by replacing deterministic recursive updates with a probabilistic multi‑trajectory process. Extensive ablations show that both random guidance and depth‑supervised training are essential for its performance.

ARC‑AGI · GRAM · Generative Recursive Reasoning

9 min read

Machine Learning Algorithms & Natural Language Processing

May 22, 2026 · Artificial Intelligence

How a 10M‑Parameter Model Beats Large Models on Sudoku and ARC with Multi‑Trajectory Reasoning

The GRAM model introduced by Yoshua Bengio’s team replaces deterministic recursive updates with probabilistic multi‑trajectory sampling. This lets a 10M‑parameter network reach 97% accuracy on Sudoku‑Extreme, 52%/11% on ARC‑AGI, and near‑perfect results on N‑Queens and graph coloring, while also supporting unconditional generation tasks.

ARC‑AGI · GRAM · Sudoku

9 min read

Architect

Mar 17, 2025 · Artificial Intelligence

Can a 7B Language Model Solve Sudoku with Reinforcement Learning? Findings and Lessons

This article details a reinforcement‑learning experiment that teaches 7B‑ and 3B‑parameter language models to solve Sudoku, covering data preparation, GRPO‑based reward design, training configurations, performance comparisons, key insights, and future research directions.

GRPO · Language Models · Model Scaling

15 min read

Baobao Algorithm Notes

Mar 16, 2025 · Artificial Intelligence

Can a 7B LLM Master Sudoku From Scratch Using Reinforcement Learning?

This article details how a 7B‑parameter language model, fine‑tuned with DeepSeek's GRPO reinforcement‑learning algorithm and a carefully crafted multi‑component reward system, learned to solve Sudoku puzzles without any cold‑start data. It outperformed a comparable 3B model, and the experiment yields key insights for structured reasoning tasks.

AI training · GRPO · Qwen

15 min read
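
As a rough illustration of the GRPO setup this summary refers to, the sketch below computes group‑relative advantages: each sampled answer's reward is normalized against the mean and standard deviation of its own sampling group. The reward function is purely hypothetical. The listing does not describe the article's reward components, so the names `sudoku_reward`, `w_format`, `w_cells` and `w_solved`, the weights, and the 81‑character answer format are all assumptions made for this example. The use of the population standard deviation is likewise an assumption.

```python
from statistics import mean, pstdev


def sudoku_reward(answer, solution, w_format=0.2, w_cells=0.5, w_solved=0.3):
    """Hypothetical multi-component reward (not the article's actual design).

    `answer` and `solution` are 81-character strings of digits 1-9.
    """
    well_formed = len(answer) == 81 and all(ch in "123456789" for ch in answer)
    if not well_formed:
        return 0.0  # malformed output earns nothing
    cell_acc = sum(a == s for a, s in zip(answer, solution)) / 81
    solved = 1.0 if answer == solution else 0.0
    # Format credit + partial credit per correct cell + bonus for a full solve.
    return w_format + w_cells * cell_acc + w_solved * solved


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group (GRPO-style)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population standard deviation
    return [(r - mu) / (sigma + eps) for r in rewards]
```

Under this scheme, answers that score above their group's average get a positive advantage and those below get a negative one, so no separate value network is needed to provide a baseline.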

Python Crawling & Data Mining

Jul 11, 2021 · Artificial Intelligence

Can Python Solve Sudoku in Seconds? Automate with Selenium & Backtracking

Learn how to automate a web‑based Sudoku game using Python's Selenium for data extraction, implement an efficient backtracking algorithm with bitwise operations to solve the puzzle, and programmatically fill the solution back into the browser, cutting the run time from 17 seconds to just over 3 seconds.

Automation · Backtracking · Python

11 min read
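
The backtracking‑plus‑bitwise idea mentioned in this summary can be sketched roughly as follows. This is a minimal illustration, not the linked article's code: the function name `solve_sudoku`, the grid representation (a 9x9 list of lists with 0 for empty cells), and the most‑constrained‑cell heuristic are assumptions made for this example. The Selenium scraping and fill‑in steps are omitted.

```python
def solve_sudoku(grid):
    """Solve a 9x9 Sudoku in place (0 = empty). Returns True if a solution is found."""
    FULL = 0x1FF  # nine low bits set: digits 1..9
    rows, cols, boxes = [0] * 9, [0] * 9, [0] * 9
    empties = []

    # Record the digits already used in each row, column and 3x3 box as bitmasks.
    for r in range(9):
        for c in range(9):
            d = grid[r][c]
            if d == 0:
                empties.append((r, c))
                continue
            bit = 1 << (d - 1)
            b = (r // 3) * 3 + c // 3
            if (rows[r] | cols[c] | boxes[b]) & bit:
                return False  # the given clues already conflict
            rows[r] |= bit
            cols[c] |= bit
            boxes[b] |= bit

    def candidates(r, c):
        # Bits still free in the cell's row, column and box.
        return FULL & ~(rows[r] | cols[c] | boxes[(r // 3) * 3 + c // 3])

    def backtrack(k):
        if k == len(empties):
            return True
        # Move the remaining cell with the fewest candidates to position k.
        best = min(range(k, len(empties)),
                   key=lambda i: bin(candidates(*empties[i])).count("1"))
        empties[k], empties[best] = empties[best], empties[k]
        r, c = empties[k]
        b = (r // 3) * 3 + c // 3
        mask = candidates(r, c)
        while mask:
            bit = mask & -mask  # isolate the lowest set bit
            mask ^= bit
            rows[r] |= bit
            cols[c] |= bit
            boxes[b] |= bit
            grid[r][c] = bit.bit_length()  # bit index -> digit
            if backtrack(k + 1):
                return True
            # Undo the placement before trying the next candidate.
            rows[r] ^= bit
            cols[c] ^= bit
            boxes[b] ^= bit
        grid[r][c] = 0
        return False

    return backtrack(0)
```

Tracking used digits as integers means a cell's candidate set comes from a few bitwise operations rather than a scan of its row, column and box, which is the kind of saving a bitwise approach typically aims for.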