How to Empower AI with Agent‑Browser: Full Command Guide & Real‑World Use Cases
This article introduces agent‑browser, a CLI tool that lets large‑language‑model agents control a browser. It walks through the tool's 15+ command categories, including navigation, data extraction, smart waiting, screenshot annotation, authentication, multi‑tab sessions, network interception, and batch execution, and then shows three practical scenarios: testing, scraping, and front‑end debugging.
Overview
agent-browser (GitHub: https://github.com/vercel-labs/agent-browser) provides a CLI that lets AI agents control a Chromium browser. It parses pages into an Accessibility Tree, assigns stable element identifiers (@eN), and exposes commands for navigation, interaction, data extraction, waiting, screenshots, session management, multi‑tab control, network interception, and batch execution.
Key Concepts
Element identifiers – After a snapshot, each interactive node receives an identifier such as @e1 or @e2. Subsequent commands can refer to these IDs directly, so the agent does not need to write CSS selectors.
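Because the identifiers follow a fixed @eN shape, a wrapper script can check a value before handing it to a command. The helper below is a minimal sketch, not part of agent-browser: the name is_ref is hypothetical, and it only tests the @e-plus-digits form described above. It does not check that the element exists in the current snapshot.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not an agent-browser command): succeed only when
# the argument looks like an element identifier of the form @e<digits>.
is_ref() {
  case "$1" in
    @e*[!0-9]*) return 1 ;;  # a non-digit character appears after "@e"
    @e[0-9]*)   return 0 ;;  # "@e" followed by digits only
    *)          return 1 ;;  # anything else, e.g. "#login" or a bare "@e"
  esac
}
```

A script could then guard an interaction with something like `is_ref "$ref" && agent-browser click "$ref"`.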
Command Reference
Basic navigation & interaction
agent-browser open https://example.com
agent-browser snapshot
agent-browser click @e2
agent-browser fill @e3 "[email protected]"
agent-browser press Enter
agent-browser screenshot page.png
agent-browser close
Information extraction
agent-browser get text @e1
agent-browser get html @e1
agent-browser get value @e3
agent-browser get title
agent-browser get url
agent-browser get attr @e1 href
Smart waiting
agent-browser wait "#loading"
agent-browser wait 2000
agent-browser wait --text "Loading complete"
agent-browser wait --url "**/dashboard"
agent-browser wait --load networkidle
Screenshot & annotation
agent-browser screenshot
agent-browser screenshot --full
agent-browser screenshot --annotate
agent-browser pdf report.pdf
Cookie & authentication management
# List Chrome profiles
agent-browser profiles
# Reuse a profile (preserves login state)
agent-browser --profile Default open https://gmail.com
# Persistent session across restarts
agent-browser --session-name myapp open https://myapp.com
# Secure credential vault (encrypted, invisible to the agent)
agent-browser auth save mysite
agent-browser auth login mysite
Multi‑tab & multi‑session
# Open new tabs
agent-browser tab new https://docs.example.com
agent-browser tab new --label api https://api.example.com
agent-browser tab api # switch by label
# Isolated sessions
agent-browser --session agent1 open https://site-a.com
agent-browser --session agent2 open https://site-b.com
agent-browser session list
Network interception & mocking
# Mock API response
agent-browser network route "*/api/user" --body '{"name":"test"}'
# Abort ad requests
agent-browser network route "*/ads/*" --abort
# List failing requests
agent-browser network requests --status 4xx
# Record HAR
agent-browser network har start
Batch execution
agent-browser batch \
"open https://example.com" \
"wait --load networkidle" \
"snapshot -i" \
"screenshot result.png"
Browser modes
Headless Chromium (default) – runs without UI, suitable for CI/CD and background automation.
Headed mode (--headed) – opens a visible window for debugging or demos.
Remote cloud browsers – connect to services such as Browserless, Browserbase, or AWS AgentCore via the -p flag, so many sessions can run concurrently without a local browser.
agent-browser -p browserless open https://example.com
agent-browser -p browserbase open https://example.com
agent-browser -p agentcore open https://example.com
Installation
npm i -g agent-browser
agent-browser install
npx skills add vercel-labs/agent-browser@agent-browser -g -y
Practical scenarios
Scenario 1 – Automated web‑app testing
Goal: verify the login flow of https://ruoyi.eleadmin.com/ with a single prompt.
Open the login page.
Take a snapshot and locate the username and password fields.
Fill in test credentials.
Click the login button.
Wait for navigation to the dashboard.
Capture a screenshot to confirm successful login.
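The steps above can be sketched as a small script built only from commands shown in the command reference. Several details are assumptions rather than facts from the article: the helper names (ab, login_open, login_submit), the TEST_USER and TEST_PASS variables, the output file name, and the "**/dashboard" URL pattern, which may not match the real redirect. The element identifiers are passed in as arguments because they are only known after reading the snapshot output. Setting DRY_RUN=1 prints each command instead of running it.

```shell
#!/usr/bin/env bash
set -eu

# Run one agent-browser command, or only print it when DRY_RUN=1.
ab() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf 'agent-browser'
    printf ' %s' "$@"
    printf '\n'
  else
    agent-browser "$@"
  fi
}

# Steps 1-2: open the login page and take a snapshot to read the identifiers from.
login_open() {
  ab open "https://ruoyi.eleadmin.com/"
  ab snapshot
}

# Steps 3-6: $1 = username field, $2 = password field, $3 = login button.
# The identifiers come from the snapshot; the credentials are placeholders.
login_submit() {
  ab fill "$1" "${TEST_USER:-demo-user}"
  ab fill "$2" "${TEST_PASS:-demo-pass}"
  ab click "$3"
  ab wait --url "**/dashboard"       # assumed post-login URL pattern
  ab screenshot login-result.png     # hypothetical output file name
}
```

In practice the agent would call login_open, read the snapshot, and then call login_submit with whatever identifiers the snapshot reported, for example `login_submit @e1 @e2 @e3`.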
Scenario 2 – Bulk data extraction
Task: scrape pricing tables from three Chinese LLM providers.
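Please use agent-browser to complete the following tasks:
1. Open https://platform.minimaxi.com/docs/pricing/overview
2. Open https://open.bigmodel.cn/pricing
3. Open https://dashscope.console.aliyun.com/billing
For each page:
- Wait for loading to finish
- Take a snapshot and locate the pricing table
- Extract fields such as model name, input price, and output price
- If there are multiple model versions, extract all of them
- Save a screenshot as pricing_<vendor>_<date>.png
Finally, combine the results into a Markdown table and save it as 大模型定价对比.md
Scenario 3 – Front‑end development debugging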
After building a feature, let the agent launch the local dev server, verify functionality, and collect performance metrics.
# Measure core web vitals
agent-browser vitals http://localhost:3000
# Enable React DevTools integration
agent-browser open --enable react-devtools http://localhost:3000
agent-browser react tree
agent-browser react inspect <fiberId>
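When the agent starts the dev server itself, the measurement can run before the server is ready. The sketch below polls the URL with curl and only then calls the vitals command shown above. The function names wait_for_server and dev_check, the retry defaults, and the two-second request timeout are assumptions for illustration, not agent-browser features.

```shell
#!/usr/bin/env bash
set -eu

# Poll a URL until it answers or the attempts run out.
# $1 = URL, $2 = max attempts (default 30), $3 = seconds between attempts (default 1)
wait_for_server() {
  wfs_url="$1"
  wfs_attempts="${2:-30}"
  wfs_delay="${3:-1}"
  wfs_i=0
  while [ "$wfs_i" -lt "$wfs_attempts" ]; do
    # -s silent, -f fail on HTTP errors, --max-time caps each request
    if curl -sf --max-time 2 -o /dev/null "$wfs_url"; then
      return 0
    fi
    wfs_i=$((wfs_i + 1))
    sleep "$wfs_delay"
  done
  return 1
}

# Hypothetical wrapper: measure web vitals only once the server is reachable.
dev_check() {
  dc_url="${1:-http://localhost:3000}"
  if ! wait_for_server "$dc_url"; then
    echo "dev server not reachable: $dc_url" >&2
    return 1
  fi
  agent-browser vitals "$dc_url"
}
```

Calling `dev_check http://localhost:3000` after starting the server avoids measuring an error page while the build is still warming up.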
