Humans Achieve 100% While Top AI Models Score Below 0.4% on ARC‑AGI‑3 Benchmark
In the ARC‑AGI‑3 test, 486 randomly selected human participants solved all 150+ game‑based puzzles, a 100% success rate, in a median of 7.4 minutes. Leading models such as GPT‑5, Claude Opus 4.6, Gemini 3.1 Pro and Grok 4.20 scored at most 0.37%, exposing a stark gap in meta‑cognitive reasoning.
