Webwright Lets Browser Agents Move Beyond Guessing the Next Click

Microsoft's newly open‑sourced Webwright framework replaces the traditional step‑by‑step LLM decision loop with code‑generated Playwright scripts, stores all state locally, achieves SOTA benchmark results on Online‑Mind2Web and Odysseys, integrates with major agent ecosystems, and offers auditability and reusable automation.

AI Engineering

Core design: abandoning single‑step loops

Most browser agents follow a fixed pipeline (observe page state → predict next click or input → execute), invoking an LLM at every step. That design made sense when LLMs were weak at writing code, but the per‑step model call becomes a bottleneck as code‑generation ability improves.

Webwright adopts a workflow that mirrors how engineers automate browsers:

Let the LLM directly generate runnable Playwright scripts, turning web actions into reusable Python programs.

Persist all artefacts—scripts, screenshots, logs—in a local workspace; the browser session can be started, inspected, or discarded at any time, rather than being the sole state carrier.

Maintain an ultra‑minimal architecture of three modules (~1,500 lines total): Runner (≈150 lines), Model Endpoint (≈550 lines), Environment (≈300 lines), depending only on httpx, pydantic, playwright, and typer.

The agent leaves behind a modifiable, shareable automation script instead of a one‑off execution trace.
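To make the idea concrete, here is a minimal sketch of the kind of script an agent could leave behind. It is not Webwright output: the function names, file names, and workspace layout are illustrative assumptions, and only standard Playwright sync‑API calls are used.

```python
from pathlib import Path


def artifact_paths(workspace: Path) -> dict[str, Path]:
    """Where this sketch keeps its artefacts (file names are illustrative)."""
    return {
        "screenshot": workspace / "final.png",
        "log": workspace / "run.log",
    }


def run(url: str, workspace: Path) -> str:
    """Open `url`, save a screenshot and a log line, and return the page title."""
    # Lazy import: the helper above stays usable without Playwright installed.
    from playwright.sync_api import sync_playwright

    paths = artifact_paths(workspace)
    workspace.mkdir(parents=True, exist_ok=True)
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            page.goto(url, wait_until="domcontentloaded")
            title = page.title()
            page.screenshot(path=str(paths["screenshot"]), full_page=True)
        finally:
            # The browser is disposable; the files in the workspace are the state.
            browser.close()
    paths["log"].write_text(f"{url}\t{title}\n", encoding="utf-8")
    return title
```

Calling `run("https://example.com", Path("outputs/demo"))` would leave a screenshot and a log file in the workspace, and the script itself can be edited and re‑run later without involving a model.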

Performance reaches SOTA level

Webwright achieved the best open‑source scores on two mainstream browser‑agent benchmarks under a 100‑step budget:

Online‑Mind2Web (300 real‑world tasks): GPT‑5.4 attained 86.7% accuracy, the highest among open‑source harnesses; Claude Opus 4.7 reached 84.7% and outperformed GPT‑5.4 on difficult cases (80.5% vs 76.6%).

Odysseys (200 long‑running tasks, average 76.1 steps): GPT‑5.4 achieved a 60.1% completion rate, improving the previous SOTA by 15.6 percentage points and surpassing the coordinate‑prediction baseline by 26.6 points.

A small model such as Qwen‑3.5‑9B, when paired with the provided tool scripts, reaches 66.2% completion on the hard cases of Online‑Mind2Web, enabling low‑cost deployment.
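The margins quoted above imply the scores of the systems being compared. The short calculation below works them out, assuming each "points" figure is a plain percentage‑point difference from the reported 60.1% and that the Online‑Mind2Web percentages are taken over all 300 tasks; the derived numbers are approximations, not figures from the source.

```python
# Odysseys: reported completion rate and reported margins (percentage points).
webwright_odysseys = 60.1
gain_over_previous_sota = 15.6
gain_over_coordinate_baseline = 26.6

# Implied scores of the systems Webwright is compared against.
previous_sota = round(webwright_odysseys - gain_over_previous_sota, 1)
coordinate_baseline = round(webwright_odysseys - gain_over_coordinate_baseline, 1)


def tasks_solved(accuracy_pct: float, total_tasks: int = 300) -> int:
    """Approximate number of tasks behind an accuracy percentage."""
    return round(accuracy_pct / 100 * total_tasks)


print(previous_sota)        # implied previous best on Odysseys
print(coordinate_baseline)  # implied coordinate-prediction baseline
print(tasks_solved(86.7))   # approx. tasks solved by GPT-5.4 on Online-Mind2Web
print(tasks_solved(84.7))   # approx. tasks solved by Claude Opus 4.7
```

Under those assumptions, the previous best on Odysseys sits at about 44.5% and the coordinate‑prediction baseline at about 33.5%, while the two Online‑Mind2Web scores correspond to roughly 260 and 254 of the 300 tasks.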

Figure: Odysseys long‑task evaluation comparison
Figure: Online‑Mind2Web evaluation comparison

Ecosystem integration and extra features

Claude Code: install via the plugin marketplace and use /webwright:run for one‑off tasks or /webwright:craft to generate reusable parameterised scripts.

OpenAI Codex: after installing the plugin, invoke the agent with @webwright.

OpenClaw, Hermes Agent: share the same skill directory and load directly.
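What "reusable parameterised script" might mean in practice can be sketched with the standard library alone. The example below is hypothetical and is not what /webwright:craft emits: the argument names and the `describe_task` helper are assumptions chosen to mirror the flight‑search example used elsewhere in this article.

```python
import argparse
from datetime import date


def build_parser() -> argparse.ArgumentParser:
    """Command-line parameters for a hypothetical flight-search script."""
    parser = argparse.ArgumentParser(description="Parameterised flight search (sketch)")
    parser.add_argument("--origin", required=True, help="Origin airport code")
    parser.add_argument("--destination", required=True, help="Destination airport code")
    # date.fromisoformat rejects malformed dates, so argparse reports them early.
    parser.add_argument("--depart", required=True, type=date.fromisoformat)
    parser.add_argument("--return-date", required=True, type=date.fromisoformat)
    return parser


def describe_task(args: argparse.Namespace) -> str:
    """Turn the parsed parameters into a task description."""
    if args.return_date < args.depart:
        raise ValueError("return date must not be before the departure date")
    return (
        f"Search for flights from {args.origin} to {args.destination} "
        f"on {args.depart.isoformat()} to {args.return_date.isoformat()}"
    )
```

Parsing `--origin SEA --destination JFK --depart 2026-08-15 --return-date 2026-08-20` reproduces the task string from the quick‑start example, and changing any one argument reuses the same script for a different trip.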

Two additional utilities are provided:

Task2UI mode – automatically renders task results as an interactive HTML app, eliminating manual visualisation work.

Full auditability – every run’s trace, screenshots, and logs are stored locally for debugging and replay.
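One simple way to get this kind of audit trail is an append‑only log with one JSON object per line. The sketch below illustrates that idea only; the file name, field names, and helper functions are assumptions, and Webwright's actual trace format may differ.

```python
import json
from pathlib import Path


def append_event(trace_file: Path, step: int, action: str, detail: str) -> None:
    """Append one event as a JSON line; the file on disk is the audit trail."""
    trace_file.parent.mkdir(parents=True, exist_ok=True)
    event = {"step": step, "action": action, "detail": detail}
    with trace_file.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(event) + "\n")


def load_trace(trace_file: Path) -> list[dict]:
    """Read the events back in the order they were written."""
    with trace_file.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Because each event is written as soon as it happens, a run that crashes halfway still leaves a readable record of every step up to the failure.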

Key differences from similar projects

Paradigm: Webwright treats the browser as a disposable runtime and the agent as a code‑generation engine, unlike Stagehand’s mix of code and natural‑language primitives or agent‑browser’s CLI‑only approach.

Action space: The agent writes free‑form Python Playwright scripts, whereas other tools rely on predefined command sets or index‑based clicks.

State carrier: Webwright persists code, screenshots, and logs in a local workspace; other tools keep state inside the browser session.

Loop shape: Webwright follows a write‑code → execute → screenshot → fix‑code cycle, in contrast to the observe → predict → act loop of traditional agents.
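The loop shape is easiest to see in code. The following is a schematic sketch under stated assumptions, not Webwright's implementation: `write_code` stands in for the model call, `execute` stands in for running the script and capturing a screenshot, and both names, along with `RunResult`, are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RunResult:
    ok: bool                          # did the script achieve the task?
    log: str                          # captured output or error message
    screenshot: Optional[str] = None  # path to a screenshot taken after the run


def craft_loop(
    task: str,
    write_code: Callable[[str, Optional[str]], str],
    execute: Callable[[str], RunResult],
    max_rounds: int = 3,
) -> tuple[str, list[str]]:
    """Write a script, run it, and feed failures back until it works."""
    feedback: Optional[str] = None
    attempts: list[str] = []
    for _ in range(max_rounds):
        script = write_code(task, feedback)   # write-code (or fix-code) step
        attempts.append(script)
        result = execute(script)              # execute + screenshot step
        if result.ok:
            return script, attempts
        # The log and screenshot become the input to the next fix-code round.
        feedback = f"{result.log} (screenshot: {result.screenshot})"
    raise RuntimeError(f"no working script after {max_rounds} rounds")
```

The model is consulted once per script revision rather than once per click, which is the contrast with the observe → predict → act loop drawn above.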

Industry consensus: agents must leave the single‑step trap

Developers note that the bottleneck in most browser automation is the decision loop, not click speed; removing the per‑step model call from that loop creates a fundamentally new category of tool.

Lakshman Turlapati, author of Full Self Browsing (FSB), argues that agents should expose the full browser session, DOM, screenshots, and recovery mechanisms in a single control layer, which is what Webwright provides.

Other engineers describe Webwright as the first streamlined, official solution for “coding agents” that they previously cobbled together with Copilot CLI + Playwright MCP.

Quick start

Basic run

Requirements: Python 3.10+, Playwright‑installed Chromium, and an API key for OpenAI/Anthropic/OpenRouter.

# Install
pip install -e .
playwright install chromium

# Run example task
python -m webwright.run.cli \
    -c base.yaml -c model_openai.yaml \
    -t "Search for flights from SEA to JFK on 2026-08-15 to 2026-08-20" \
    --start-url https://www.google.com/flights \
    --task-id demo_openai \
    -o outputs/default
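To launch several tasks from a script rather than by hand, the same command can be assembled in Python. This is a small convenience sketch, not part of Webwright: `build_command` and `preview` are hypothetical helpers, and they use only the module path and flags shown in the command above.

```python
import shlex


def build_command(
    task: str,
    start_url: str,
    task_id: str,
    output_dir: str = "outputs/default",
    configs: tuple[str, ...] = ("base.yaml", "model_openai.yaml"),
) -> list[str]:
    """Assemble the argument list for the CLI invocation shown above."""
    cmd = ["python", "-m", "webwright.run.cli"]
    for cfg in configs:
        cmd += ["-c", cfg]  # one -c flag per config file, in order
    cmd += [
        "-t", task,
        "--start-url", start_url,
        "--task-id", task_id,
        "-o", output_dir,
    ]
    return cmd


def preview(cmd: list[str]) -> str:
    """Render the command as a single shell-quoted line for logging."""
    return shlex.join(cmd)
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`; keeping it as a list rather than a string avoids quoting problems when the task description contains spaces.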

Claude Code plugin installation

# Add the plugin marketplace
/plugin marketplace add microsoft/Webwright
# Install plugin
/plugin install webwright@webwright

Related links

Webwright GitHub repository: https://github.com/microsoft/webwright

Webwright official blog: https://www.microsoft.com/en-us/research/articles/webwright-a-terminal-is-all-you-need-for-web-agents/

FSB website: https://full-selfbrowsing.com/agents

FSB GitHub repository: https://github.com/lakshmanturlapati/FSB

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

AI Engineering

Focused on cutting‑edge product and technology information and practical experience sharing in the AI field (large models, MLOps/LLMOps, AI application development, AI infrastructure).
