Tagged articles
13 articles
Machine Heart
May 11, 2026 · Artificial Intelligence

How PRISM Enables Efficient Test‑Time Scaling for Discrete Diffusion Language Models

The article analyzes how the PRISM framework redesigns test‑time scaling for discrete diffusion language models, replacing costly Best‑of‑N sampling with a three‑stage hierarchical search, local branching via partial remasking, and self‑verified feedback. The result is large accuracy gains on math and code benchmarks with up to four times less inference compute.

Discrete Diffusion · Hierarchical Search · Inference Optimization
0 likes · 11 min read
Machine Heart
Apr 25, 2026 · Artificial Intelligence

Open‑Source Models Dominate 21 Scientific Discovery Tasks with SimpleTES

The SimpleTES framework decomposes trial‑and‑error into three scalable dimensions—Concurrency, Length, and Candidates—enabling test‑time scaling that lets open‑source models outperform closed‑source rivals across 21 diverse scientific benchmarks, from LASSO regression to quantum circuit compilation.

AI for Science · Open-source models · Scientific Discovery
0 likes · 13 min read
Machine Heart
Apr 25, 2026 · Artificial Intelligence

Can Multi-Model Co-Evolution Shatter the Single-Model Ceiling? Squeeze Evolve Achieves Validator-Free SOTA Inference

The paper introduces Squeeze Evolve, a validator‑free multi‑model evolutionary framework that orchestrates diverse large language models to break the performance ceiling of any single model, delivering up to 23‑point accuracy improvements and 1.4‑3.3× cost reductions across math, vision, and scientific benchmarks.

AI research · Inference Optimization · Squeeze Evolve
0 likes · 8 min read
Old Zhang's AI Learning
Jan 27, 2026 · Artificial Intelligence

Qwen3‑Max‑Thinking Boosts Performance with Test‑Time Scaling—Why It Still Isn’t Open‑Source

Alibaba’s new Qwen3‑Max‑Thinking model adds inference‑time scaling and adaptive tool use, delivering large gains on math, coding, and agent benchmarks while remaining closed‑source, and it offers drop‑in OpenAI‑compatible API access at the cost of higher latency and token usage.

AI Benchmark · Adaptive Tool Use · Large Language Model
0 likes · 7 min read
Data Party THU
Oct 29, 2025 · Artificial Intelligence

Can Test-Time Scaling Unlock More Reliable Vision‑Language‑Action Robots?

The paper introduces RoboMonkey, a framework that applies a generate‑and‑verify paradigm and test‑time scaling to Vision‑Language‑Action (VLA) models. It shows that increasing sampling and verification at inference dramatically reduces action error across multiple VLA architectures, and it presents scalable verifier training, synthetic data augmentation, and efficient deployment strategies.

AI research · Action Verification · RoboMonkey
0 likes · 8 min read
vivo Internet Technology
Aug 25, 2025 · Artificial Intelligence

How DiMo-GUI Boosts Multimodal LLMs for GUI Grounding Without Training

DiMo-GUI is a plug‑and‑play framework that dramatically improves multimodal large language models' ability to locate GUI elements by using a hierarchical dynamic visual reasoning loop and modality‑aware optimization, achieving up to double the performance on high‑resolution GUI benchmarks without any additional training data.

GUI grounding · Test-Time Scaling · dynamic visual reasoning
0 likes · 7 min read
Kuaishou Large Model
Jul 3, 2025 · Artificial Intelligence

How EvoSearch Boosts Image & Video Generation with Test‑Time Evolutionary Search

The EvoSearch method, introduced by HKUST and Kuaishou’s Kling team, applies test‑time scaling to diffusion‑based image and video generation without additional training. It runs an evolutionary search along the denoising trajectory and achieves state‑of‑the‑art results on SD2.1, Flux‑1‑dev, and other models.

Evolutionary Search · Test-Time Scaling · Video Generation
0 likes · 8 min read
Kuaishou Tech
Jul 2, 2025 · Artificial Intelligence

How EvoSearch Supercharges Image and Video Generation with Test‑Time Evolutionary Search

EvoSearch, a test‑time evolutionary search method, dramatically improves image and video generation by increasing inference compute without extra training, outperforming existing scaling techniques on diffusion and flow models while maintaining robustness and diversity across multiple benchmarks.

AI research · Evolutionary Search · Test-Time Scaling
0 likes · 8 min read
Fighter's World
Jun 28, 2025 · Artificial Intelligence

What Is the Generator‑Verifier Gap and Why It Matters for LLM Reasoning

The article explains the Generator‑Verifier Gap (GVG), the asymmetry whereby verifying a solution is far cheaper than generating it. It covers the concept's origin, its impact on test‑time scaling for large language models, related reinforcement‑learning approaches, and how the concept can shape agent architectures and AI product strategy.

Agent Architecture · Generator-Verifier Gap · LLM
0 likes · 21 min read
AI Frontier Lectures
Apr 4, 2025 · Artificial Intelligence

Why Test‑Time Scaling Is Revolutionizing LLM Reasoning in 2025

This article surveys the latest research on large language model reasoning, highlighting test‑time scaling methods, chain‑of‑thought variants, and novel inference‑time techniques that boost performance while exposing trade‑offs, costs, and future directions for AI developers.

AI · LLM · Test-Time Scaling
0 likes · 26 min read
Architect
Feb 19, 2025 · Artificial Intelligence

Does Scaling Law Still Hold for Grok 3? A Deep Dive into LLM Training Economics

The article critically examines whether the pre‑training Scaling Law still applies to Grok 3, compares its compute usage and model size with DeepSeek and OpenAI models, evaluates the cost‑effectiveness of pre‑training, RL and test‑time scaling, and explores how these insights shape future large‑language‑model development strategies.

Grok-3 · Pre‑training · RL scaling
0 likes · 11 min read
Baobao Algorithm Notes
Sep 29, 2024 · Artificial Intelligence

Decoding OpenAI o1: Test‑Time Scaling, PRM Search & Inference Strategies

This article analyses the training tricks behind OpenAI's o1 model, explaining test/inference‑time scaling laws, post‑training techniques, process‑supervised reward models (PRM), various inference‑time search methods, data‑collection pipelines, and the trade‑offs between allocating compute to pre‑training versus inference.

LLM inference · OpenAI o1 · Reward Model
0 likes · 34 min read