Webwright Lets Browser Agents Move Beyond Guessing the Next Click

Microsoft's newly open‑sourced Webwright framework replaces the traditional step‑by‑step LLM decision loop with code‑generated Playwright scripts, stores all state locally, achieves SOTA benchmark results on Online‑Mind2Web and Odysseys, integrates with major agent ecosystems, and offers auditability and reusable automation.

AI Engineering

May 28, 2026

Core design: abandoning single‑step loops

Most browser agents follow a fixed pipeline (observe page state → predict next click or input → execute) and invoke an LLM at every step. That design made sense when LLMs were weak at writing code, but it becomes a bottleneck as code‑generation ability improves.

Webwright adopts a workflow that mirrors how engineers automate browsers:

Let the LLM directly generate runnable Playwright scripts, turning web actions into reusable Python programs.

Persist all artefacts—scripts, screenshots, logs—in a local workspace; the browser session can be started, inspected, or discarded at any time, rather than being the sole state carrier.

Maintain an ultra‑minimal architecture of about 1,500 lines in total, built around three core modules: Runner (≈150 lines), Model Endpoint (≈550 lines), and Environment (≈300 lines). The only dependencies are httpx, pydantic, playwright, and typer.

The agent leaves behind a modifiable, shareable automation script instead of a one‑off execution trace.
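
To make the idea concrete, here is a minimal sketch of what such a reusable script could look like. It is an illustration only: the function names, the example.com URL, and the output path are assumptions, not output produced by Webwright. Only the Playwright sync API calls are real.

```python
from pathlib import Path
from urllib.parse import urlencode


def build_search_url(query: str) -> str:
    """Build the start URL for the task (example.com is a placeholder site)."""
    return "https://example.com/search?" + urlencode({"q": query})


def run_task(query: str, workspace: Path) -> Path:
    """Open the page, save a screenshot into the workspace, return its path."""
    # Imported lazily so the module can be read and tested without a browser.
    from playwright.sync_api import sync_playwright

    workspace.mkdir(parents=True, exist_ok=True)
    shot = workspace / "final.png"
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(build_search_url(query))
        # The screenshot lands in the local workspace, not in the session.
        page.screenshot(path=str(shot), full_page=True)
        browser.close()
    return shot
```

Calling run_task("flights SEA to JFK", Path("outputs/demo")) would launch headless Chromium and write final.png into that folder. Because the task is an ordinary function with parameters, it can be edited, rerun, or shared like any other Python code.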

Performance reaches SOTA level

Webwright achieved the best open‑source scores on two mainstream browser‑agent benchmarks under a 100‑step budget:

Online‑Mind2Web (300 real‑world tasks): GPT‑5.4 attained 86.7% accuracy, the highest among open‑source harnesses; Claude Opus 4.7 reached 84.7% and outperformed GPT‑5.4 on difficult cases (80.5% vs 76.6%).

Odysseys (200 long‑running tasks, average 76.1 steps): GPT‑5.4 achieved a 60.1% completion rate, improving the previous SOTA by 15.6 percentage points and surpassing the coordinate‑prediction baseline by 26.6 points.

A small model such as Qwen‑3.5‑9B, when paired with the provided tool scripts, reaches 66.2% completion on the hard cases of Online‑Mind2Web, enabling low‑cost deployment.

Figure: Odysseys long‑task evaluation comparison

Ecosystem integration and extra features

Claude Code: install via the plugin marketplace, then use /webwright:run for one‑off tasks or /webwright:craft to generate reusable parameterised scripts.

OpenAI Codex: after installing the plugin, invoke the agent with @webwright.

OpenClaw, Hermes Agent: both share the same skill directory and can load it directly.

Two additional utilities are provided:

Task2UI mode – automatically renders task results as an interactive HTML app, eliminating manual visualisation work.

Full auditability – every run’s trace, screenshots, and logs are stored locally for debugging and replay.
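
One way to picture a locally stored, replayable run is an append‑only trace file next to the screenshots. The sketch below assumes a hypothetical layout (trace.jsonl plus step_NNN.png files); Webwright's actual on‑disk format is not described here and may differ.

```python
import json
import time
from pathlib import Path


def record_step(run_dir: Path, step: int, action: str, screenshot: bytes = b"") -> Path:
    """Append one step to trace.jsonl and store its screenshot, if any."""
    run_dir.mkdir(parents=True, exist_ok=True)
    entry = {"step": step, "action": action, "time": time.time(), "screenshot": None}
    if screenshot:
        shot = run_dir / f"step_{step:03d}.png"
        shot.write_bytes(screenshot)
        entry["screenshot"] = shot.name
    trace = run_dir / "trace.jsonl"
    # Append-only: earlier steps are never rewritten, which keeps the audit trail intact.
    with trace.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")
    return trace


def replay(run_dir: Path) -> list[str]:
    """Read the recorded actions back in order, e.g. to debug a failed run."""
    lines = (run_dir / "trace.jsonl").read_text(encoding="utf-8").splitlines()
    return [json.loads(line)["action"] for line in lines]
```

Since everything is plain files, a run can be inspected with ordinary tools or diffed against an earlier run without reopening the browser.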

Key differences from similar projects

Paradigm: Webwright treats the browser as a disposable runtime and the agent as a code‑generation engine, unlike Stagehand’s mix of code and natural‑language primitives or agent‑browser’s CLI‑only approach.

Action space: The agent writes free‑form Python Playwright scripts, whereas other tools rely on predefined command sets or index‑based clicks.

State carrier: Webwright persists code, screenshots, and logs in a local workspace; other tools keep state inside the browser session.

Loop shape: Webwright follows a write‑code → execute → screenshot → fix‑code cycle, in contrast to the observe‑predict‑act loop of traditional agents.
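
The loop shape can be sketched in a few lines. This is a simplified illustration, not Webwright's Runner: generate is a hypothetical stand‑in for the model call, file names are invented, and the screenshot step is reduced to passing stderr back as feedback.

```python
import subprocess
import sys
from pathlib import Path
from typing import Callable, Optional


def craft_loop(
    task: str,
    generate: Callable[[str, Optional[str]], str],
    workspace: Path,
    max_rounds: int = 3,
) -> Optional[Path]:
    """Write a script, run it, and feed any failure back to the generator.

    `generate(task, feedback)` is a placeholder for the model call and
    must return Python source code as a string.
    """
    workspace.mkdir(parents=True, exist_ok=True)
    feedback: Optional[str] = None
    for round_no in range(1, max_rounds + 1):
        # write-code: persist each attempt so it can be audited later
        script = workspace / f"attempt_{round_no}.py"
        script.write_text(generate(task, feedback), encoding="utf-8")
        # execute: run the attempt in a separate Python process
        result = subprocess.run(
            [sys.executable, str(script)],
            capture_output=True, text=True, timeout=120,
        )
        (workspace / f"attempt_{round_no}.log").write_text(
            result.stdout + result.stderr, encoding="utf-8"
        )
        if result.returncode == 0:
            return script  # the working script is the reusable artefact
        # fix-code: a real agent would also attach a screenshot here
        feedback = result.stderr
    return None
```

The model is consulted once per attempt rather than once per click, which is the main contrast with an observe‑predict‑act loop.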

Industry consensus: agents must leave the single‑step trap

Developers note that the bottleneck in most browser automation lies in the per‑step decision loop, not in click speed; removing that per‑step overhead, they argue, creates a fundamentally new category of tool.

Lakshman Turlapati, author of Full Self Browsing (FSB), affirms that agents should expose the full browser session, DOM, screenshots, and recovery mechanisms in a single control layer, exactly what Webwright provides.

Other engineers describe Webwright as the first streamlined, official solution for “coding agents” that drive a browser, replacing setups they previously cobbled together from Copilot CLI and Playwright MCP.

Quick start

Basic run

Requirements: Python 3.10+, Chromium installed via Playwright, and an API key for OpenAI, Anthropic, or OpenRouter.

# Install (run from the root of a local checkout of the repository)
pip install -e .
playwright install chromium

# Run example task
python -m webwright.run.cli \
    -c base.yaml -c model_openai.yaml \
    -t "Search for flights from SEA to JFK on 2026-08-15 to 2026-08-20" \
    --start-url https://www.google.com/flights \
    --task-id demo_openai \
    -o outputs/default

Claude Code plugin installation

# Add the plugin marketplace
/plugin marketplace add microsoft/Webwright
# Install plugin
/plugin install webwright@webwright