Gemini 3.5 Flash Launches with 4× Speed, Beats Gemini 3.1 Pro in Coding Benchmarks

Google unveiled Gemini 3.5 Flash at I/O 2026, claiming token output roughly four times faster than comparable frontier models at less than half the price, with benchmark results that surpass its own Gemini 3.1 Pro in coding, agent, and multimodal tasks, along with trade-offs in deep reasoning and long-context performance.

AI Insight Log

May 19, 2026

Google introduced Gemini 3.5 Flash during the early‑morning session of I/O 2026, positioning it as a model that combines frontier‑level intelligence with exceptional speed and a lower price point.

Speed is the main theme

Output token rate about 4× faster than other frontier‑level models.

On Google’s Antigravity platform, custom inference optimizations raise this to a headline 12×, which works out to roughly three times faster than direct API calls.

Official price is less than half of comparable frontier models.

For AI coding scenarios, token-output latency is a critical bottleneck, so a faster model speeds up every step of the workflow.

Antigravity’s demo shows a pipeline that generates pixel-art assets from a photo, orchestrates multiple agents to write sprite-registration logic, and runs automated browser-based rendering tests, all in just over a minute.
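
As a back-of-the-envelope illustration of why output speed matters for coding agents, the sketch below converts the quoted speed multipliers into wall-clock generation time. The baseline rate of 50 tokens per second and the 6,000-token response are assumptions chosen only to make the arithmetic concrete, not figures from Google. The sketch also assumes the 12× Antigravity figure is measured against the same baseline as the 4× figure, which would make Antigravity about 3× faster than the direct API.

```python
# Illustrative arithmetic only: the baseline rate and response size are
# assumptions, not published figures.
BASELINE_TOKENS_PER_SEC = 50.0  # assumed rate for a comparable frontier model
RESPONSE_TOKENS = 6_000         # assumed size of one agent response

# Speed multipliers as quoted in the article, assumed to share one baseline.
SPEEDUPS = {
    "baseline frontier model": 1.0,
    "Gemini 3.5 Flash (direct API)": 4.0,
    "Gemini 3.5 Flash (Antigravity)": 12.0,
}


def generation_seconds(tokens: int, baseline_rate: float, speedup: float) -> float:
    """Time to stream `tokens` output tokens at `speedup` times the baseline rate."""
    return tokens / (baseline_rate * speedup)


def relative_speedup(a: float, b: float) -> float:
    """How many times faster a configuration with speedup `a` is than one with `b`."""
    return a / b


if __name__ == "__main__":
    for name, speedup in SPEEDUPS.items():
        secs = generation_seconds(RESPONSE_TOKENS, BASELINE_TOKENS_PER_SEC, speedup)
        print(f"{name}: {secs:.0f} s")
```

Under these assumptions a two-minute response drops to 30 seconds on the direct API and 10 seconds on Antigravity, and the saving repeats on every turn of an agent loop.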

Benchmarks: Flash overtakes Pro

Google released a side‑by‑side comparison of Gemini 3.5 Flash with Gemini 3, Gemini 3.1 Pro, Claude Sonnet 4.6, Claude Opus 4.7, and GPT‑5.5.

Coding

Terminal‑bench 2.1: 76.2% (3.1 Pro 70.3%, Claude Opus 4.7 66.1%, GPT‑5.5 78.2%).

SWE‑Bench Pro: 55.1% (3.1 Pro 54.2%, Claude Opus 4.7 64.3%, GPT‑5.5 58.6%).

On Terminal-bench 2.1, Flash approaches GPT-5.5 and clearly outperforms 3.1 Pro and Claude Opus 4.7. On SWE-Bench Pro it edges out 3.1 Pro by less than a point but trails both GPT-5.5 and Claude Opus 4.7.

Agent / Tool Use

MCP Atlas: 83.6% (highest, surpassing GPT‑5.5’s 75.3%).

Toolathlon: 56.5% (slightly above GPT‑5.5’s 55.6%).

OSWorld‑Verified: 78.4% (close to GPT‑5.5’s 78.7%).

Flash leads on MCP-based multi-step workflows, making it a strong low-cost option for developer-focused agent tasks.

Multimodal

CharXiv Reasoning: 84.2%.

MMMU‑Pro: 83.6%.

Blueprint‑Bench 2: 33.6% (near GPT‑5.5’s 36.2%, far above 3.1 Pro’s 26.5%).

Flash is positioned as the multimodal reasoning leader among the compared models, although GPT-5.5 stays ahead on Blueprint-Bench 2.

Expert Tasks & Reasoning

Finance Agent v2: 57.9% (top of this comparison).

Humanity’s Last Exam: 40.2% (Claude Opus 4.7 leads with 46.9%).

ARC‑AGI‑2: 72.1% (GPT‑5.5 leads with 84.6%).

Flash does not surpass Claude Opus 4.7 or GPT‑5.5 in deep academic or abstract reasoning tasks.

Long Context

MRCR v2 (128k average): 77.3% (below 3.1 Pro’s 84.9%).

MRCR v2 (1M pointwise): 26.6% (slightly better than 3.1 Pro).

GPT‑5.5 dominates the 128k context benchmark with 94.8%.

Overall, Flash lowers the price and latency barriers. It matches or beats Gemini 3.1 Pro in coding and agent tasks and leads in multimodal reasoning, but it still trails in deep reasoning and in 128k long-context retrieval.
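
To make the comparison easier to check, the following sketch tabulates the scores quoted in this section and computes Flash's margin over a rival in percentage points. Only figures that appear in this article are included; the helper names `margin` and `leader` are illustrative, and `leader` can only rank the models whose scores are quoted here.

```python
# Scores (percent) exactly as quoted in this article; benchmarks or models
# with no quoted figure are omitted rather than guessed.
SCORES = {
    "Terminal-bench 2.1": {"3.5 Flash": 76.2, "3.1 Pro": 70.3,
                           "Claude Opus 4.7": 66.1, "GPT-5.5": 78.2},
    "SWE-Bench Pro": {"3.5 Flash": 55.1, "3.1 Pro": 54.2,
                      "Claude Opus 4.7": 64.3, "GPT-5.5": 58.6},
    "MCP Atlas": {"3.5 Flash": 83.6, "GPT-5.5": 75.3},
    "Toolathlon": {"3.5 Flash": 56.5, "GPT-5.5": 55.6},
    "OSWorld-Verified": {"3.5 Flash": 78.4, "GPT-5.5": 78.7},
    "Blueprint-Bench 2": {"3.5 Flash": 33.6, "3.1 Pro": 26.5, "GPT-5.5": 36.2},
    "Humanity's Last Exam": {"3.5 Flash": 40.2, "Claude Opus 4.7": 46.9},
    "ARC-AGI-2": {"3.5 Flash": 72.1, "GPT-5.5": 84.6},
    "MRCR v2 (128k)": {"3.5 Flash": 77.3, "3.1 Pro": 84.9, "GPT-5.5": 94.8},
}


def margin(benchmark: str, rival: str, model: str = "3.5 Flash") -> float:
    """Percentage-point lead of `model` over `rival` (negative means it trails)."""
    row = SCORES[benchmark]
    return round(row[model] - row[rival], 1)


def leader(benchmark: str) -> str:
    """Model with the highest score among those quoted for `benchmark`."""
    row = SCORES[benchmark]
    return max(row, key=row.get)


if __name__ == "__main__":
    for name in SCORES:
        print(f"{name}: best quoted score is {leader(name)}")
    print("Flash vs 3.1 Pro on SWE-Bench Pro:", margin("SWE-Bench Pro", "3.1 Pro"))
```

The margins show how uneven the picture is: Flash is 5.9 points ahead of 3.1 Pro on Terminal-bench 2.1 but only 0.9 points ahead on SWE-Bench Pro, where it sits 9.2 points behind Claude Opus 4.7.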

Antigravity is the other main thread

Google packaged the model with its Agent‑first development environment Antigravity.

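Limited-time 12× speed: custom inference optimizations make Flash on Antigravity about three times faster than direct API calls, which, combined with the model’s roughly 4× advantage over other frontier models, appears to be the source of the 12× figure.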

Parallel sub-agents: the primary agent can dispatch multiple browser sub-agents to run tasks concurrently, such as testing different UI renderings.

Multi-step workflow orchestration: combines the model’s MCP tool-calling ability with Antigravity’s task scheduler.

Developers can shrink a task that previously took ten minutes to one or two minutes by parallelizing sub‑agents.
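
The fan-out pattern behind parallel sub-agents can be sketched with Python's standard `asyncio` library. This is not the Antigravity API, which the article does not describe; `run_subagent`, `run_sequential`, `run_parallel`, and `timed` are hypothetical names, and each sub-agent is simulated with a sleep rather than a real browser session.

```python
import asyncio
import time


async def run_subagent(task: str, seconds: float) -> str:
    """Stand-in for one browser sub-agent; sleeping simulates its work."""
    await asyncio.sleep(seconds)
    return f"{task}: done"


async def run_sequential(tasks: dict[str, float]) -> list[str]:
    """Run sub-agent tasks one after another."""
    return [await run_subagent(name, secs) for name, secs in tasks.items()]


async def run_parallel(tasks: dict[str, float]) -> list[str]:
    """Dispatch all sub-agent tasks at once and wait for every result."""
    coros = [run_subagent(name, secs) for name, secs in tasks.items()]
    return list(await asyncio.gather(*coros))


def timed(coro) -> tuple[list[str], float]:
    """Run a coroutine to completion and report its wall-clock duration."""
    start = time.perf_counter()
    results = asyncio.run(coro)
    return results, time.perf_counter() - start


if __name__ == "__main__":
    # Simulated durations in seconds; real rendering tests would take far longer.
    ui_tests = {"light theme": 0.2, "dark theme": 0.2, "mobile layout": 0.2}
    _, sequential_time = timed(run_sequential(ui_tests))
    _, parallel_time = timed(run_parallel(ui_tests))
    print(f"sequential: {sequential_time:.2f} s, parallel: {parallel_time:.2f} s")
```

When the tasks are independent, the parallel run takes about as long as the slowest sub-agent rather than the sum of all of them, which is the effect behind the ten-minutes-to-two-minutes claim.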

Gemini Spark: a 24/7 Private AI Agent on Gemini 3.5

Runs on Gemini 3.5.

Built on the Antigravity architecture.

Executes on dedicated Google Cloud VMs; user laptops can be shut down while tasks continue.

Deeply integrated with Google’s own tools, with future MCP‑based third‑party integrations.

Design includes explicit user confirmation before major actions.

Spark is currently available to trusted testers and will open to Google AI Ultra subscribers next week, starting in a single region.
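
The confirmation-before-major-actions design can be pictured as a gate in front of the agent's action executor. The sketch below is a minimal illustration under that assumption, not Spark's actual implementation; the `Action` type, the `major` flag, and the `execute` function are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Action:
    description: str
    major: bool  # True for actions that should not run without approval


def execute(action: Action, confirm: Callable[[Action], bool]) -> str:
    """Run `action`, asking `confirm` first when it is flagged as major."""
    if action.major and not confirm(action):
        return f"skipped: {action.description}"
    return f"executed: {action.description}"


if __name__ == "__main__":
    def ask_user(action: Action) -> bool:
        """Prompt on the console; any answer other than 'y' declines."""
        return input(f"Allow '{action.description}'? [y/N] ").strip().lower() == "y"

    print(execute(Action("summarize unread email", major=False), ask_user))
    print(execute(Action("send payment", major=True), ask_user))
```

Routine actions pass straight through, while anything flagged as major waits for an explicit yes, so a long-running agent can keep working unattended without taking consequential steps on its own.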

Additional announcements

Gemini Omni: a paid-subscription feature supporting arbitrary combinations of text, images, and video.

Neural Expressive: a visual-language overhaul for the Gemini app, rolled out on web and mobile.

Daily Brief: a daily-report agent experience inside the Gemini app.

MCP third-party integration: Spark will later connect to external tool ecosystems via MCP.

Assessment

Coding-adjacent tasks (terminal use, terminal agents, MCP tool use) do improve over 3.1 Pro.

Multimodal reasoning is where Flash is strongest in this comparison, though GPT-5.5 still leads on Blueprint-Bench 2.

The combination of speed and price makes Flash highly competitive in agent scenarios.

On deep-reasoning tasks (SWE-Bench Pro, Humanity’s Last Exam, ARC-AGI-2), Flash still trails the leader, which is Claude Opus 4.7 or GPT-5.5 depending on the benchmark.

Performance on 128k long‑context falls behind 3.1 Pro.

12× speed is a limited‑time Antigravity feature; direct API calls cannot achieve it.

Gemini 3.5 Pro has not been released, so only the lower bound of the 3.5 series is visible.

For developers, the most relevant aspect is MCP tool usage and Agent orchestration. Flash’s lead on MCP Atlas suggests it is a cost‑effective choice for writing MCP servers and chaining Agent workflows. If you already run lightweight models on Antigravity or other Agent frameworks, swapping in 3.5 Flash for a speed experiment is worthwhile.

Conclusion

Gemini 3.5 Flash does not dominate every frontier leaderboard, but Google has carefully balanced speed, price, and sufficient intelligence for practical development. Coupled with Antigravity and Spark, the release appears aimed at establishing a solid foundation for the upcoming Agent era rather than merely showcasing raw power.

Gemini 3.5 Pro, expected next month, will test the series’ upper limits; Flash sets a high baseline for the line.
