Turning a Web‑Only StarAgent WebTerminal into a Full‑Featured CLI
This article details how the author refactored the StarAgent/Drogo WebTerminal from a purely web-based UI into a stable CLI-driven execution layer (wt). The refactor adds black-screen (terminal-only) commands (wsh/wcp), session reuse, file-transfer APIs, interactive debugging, and Skill-based operation guides, so that AI agents can dynamically run, observe, and iterate on remote troubleshooting tasks such as GPU-hang analysis and coredump debugging.
Background: AI agents like Codex, Cursor, or Claude need a reliable execution interface; the existing WebTerminal forces manual clicks, cookie copying, and scrolling, which is unsuitable for automated reasoning.
The author’s goal was to turn the WebTerminal into an Agent‑friendly CLI (named wt) that abstracts the terminal as a stable execution surface while keeping authentication, audit, and heartbeat logic in the official web page.
Core redesign: The author introduced the black-screen commands wsh (shell) and wcp (file copy), which bypass the UI and operate directly on the remote shell. Authentication is performed once via the browser (e.g., ./bin/wt auth login --target-ip x.y.z.w) and cached in ~/.drogo-webterminal-helper/direct/default.auth.json for subsequent commands.
Session design: wt session start launches or reuses a persistent Chromium instance; the CLI then looks up the existing terminal instance in window.terminalMap[wsSessionId] and sends input via writeMsg2Session. This keeps the official login flow intact while allowing repeated commands without repeated logins.
Command execution: wt run sends a shell command, appends a unique marker (printf '\n__WT_DONE___:%s\n' "$?") so that completion and the exit status can be detected reliably, and captures the output in three forms: raw ANSI log (*.raw.log), plain text (*.plain), and xterm snapshot (*.snapshot.json). The example GPU-hang diagnostics are expressed as a series of wt run calls that the Agent can decide to extend.
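The marker technique can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the wt implementation: the helper names wrap_command and parse_output are hypothetical, and adding a per-run token to the marker is an assumption (the source shows only the __WT_DONE___ form). The parser accepts the marker only at the start of a line and followed by digits, so the terminal's echo of the typed command, which still contains the literal %s, is not mistaken for completion.

```python
import re


def wrap_command(command: str, token: str) -> str:
    """Append a printf that prints a marker plus the command's exit status.

    The per-run token is an assumption made for this sketch.
    """
    marker = f"__WT_DONE_{token}__"
    # "$?" is expanded by the remote shell to the exit status of `command`.
    return f"{command}; printf '\\n{marker}:%s\\n' \"$?\""


def parse_output(text: str, token: str):
    """Return (output_before_marker, exit_code), or None if still running."""
    marker = f"__WT_DONE_{token}__"
    # Anchor at line start and require digits, so the echoed command line
    # (which contains "...:%s") never matches.
    pattern = rf"^{re.escape(marker)}:(\d+)\r?$"
    match = re.search(pattern, text, flags=re.MULTILINE)
    if match is None:
        return None
    return text[: match.start()], int(match.group(1))
```

A caller would keep reading terminal output and call parse_output after each chunk until it stops returning None, ideally with a timeout so a hung command does not block forever.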
File transfer: DOM-based upload/download was replaced with WebTerminal’s file API (openFileSystem, listFiles, downloadFile, uploadFile, heartbeat). Commands like ./bin/wt upload ./local.txt /home/admin/local.txt and ./bin/wt download /home/admin/remote.log captures/remote.log run in the logged-in browser context, so they inherit its authentication, and each transfer can be verified by checksum (size/md5/sha256).
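The checksum step can be illustrated with a short Python sketch. It assumes the remote side reports some or all of size, md5, and sha256 for the transferred file; how wt obtains those values is not described in the source, and the function names file_fingerprint and verify_transfer are hypothetical.

```python
import hashlib
import os


def file_fingerprint(path: str, chunk_size: int = 1 << 20) -> dict:
    """Compute size, md5, and sha256 of a local file in one streaming pass."""
    md5 = hashlib.md5()
    sha256 = hashlib.sha256()
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            md5.update(chunk)
            sha256.update(chunk)
    return {
        "size": os.path.getsize(path),
        "md5": md5.hexdigest(),
        "sha256": sha256.hexdigest(),
    }


def verify_transfer(local_path: str, remote: dict) -> list:
    """Return the names of the fields that differ (empty list means a match).

    `remote` holds whichever of size/md5/sha256 the remote side reported
    (an assumed shape for this sketch).
    """
    local = file_fingerprint(local_path)
    return [
        key
        for key in ("size", "md5", "sha256")
        if key in remote and str(remote[key]) != str(local[key])
    ]
```

Comparing size first is a cheap way to catch truncated transfers; the hashes catch corruption that leaves the length unchanged.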
Interactive debugging: wt interact starts a local HTTP server that forwards requests such as POST /command, POST /send, and POST /drain to the remote process, preserving state across requests. This enables step-by-step debugging with gdb, emacs, or other TUI programs without restarting the session. The server runs single-threaded (using HTTPServer) to avoid Playwright thread-safety issues.
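A stateful, single-threaded control plane of this kind might look like the Python sketch below. It is an illustration only: FakeSession is a hypothetical stand-in for the remote process, the JSON fields "text" and "output" are assumptions, and only /send and /drain are modeled. Because HTTPServer (unlike ThreadingHTTPServer) handles one request at a time, the session object is never touched concurrently.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class FakeSession:
    """Hypothetical stand-in for the remote interactive process."""

    def __init__(self):
        self.pending = []

    def send(self, text):
        self.pending.append(f"echo: {text}")

    def drain(self):
        out, self.pending = self.pending, []
        return out


def make_handler(session):
    class Handler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            body = json.loads(self.rfile.read(length) or b"{}")
            if self.path == "/send":
                session.send(body.get("text", ""))
                reply = {"ok": True}
            elif self.path == "/drain":
                reply = {"ok": True, "output": session.drain()}
            else:
                self.send_error(404)
                return
            data = json.dumps(reply).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(data)))
            self.end_headers()
            self.wfile.write(data)

        def log_message(self, *args):
            pass  # keep the sketch quiet

    return Handler


def post(port, path, payload):
    """POST JSON to the local server, bypassing any configured proxy."""
    request = urllib.request.Request(
        f"http://127.0.0.1:{port}{path}",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request, timeout=5) as response:
        return json.loads(response.read())


def demo():
    """Send one command, then drain; state survives between the two requests."""
    server = HTTPServer(("127.0.0.1", 0), make_handler(FakeSession()))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        post(port, "/send", {"text": "bt"})
        return post(port, "/drain", {})["output"]
    finally:
        server.shutdown()
        server.server_close()
```

The background thread in demo exists only so the sketch can call its own server; requests are still served strictly one after another.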
Acceptance cases:
Emacs + eshell + gdb demo: the Agent creates a C source file via raw key input, compiles it, triggers a segmentation fault, and then drives gdb through a series of /send commands (bt, frame 0, info locals, print item.id) to locate the null‑pointer dereference.
GPU‑hang analysis: the Agent runs a set of wt run commands to collect environment, process, and kernel logs, then decides—based on observed data—whether to attach to a GPU process, run gdb -p, or abort.
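The gdb sequence in the Emacs demo can be pictured as a small client loop over the /send and /drain endpoints. The Python sketch below is hypothetical: the transport is injected as a post function so the loop stays independent of any HTTP library, and the payload fields "text" and "output" are assumptions rather than the documented wt protocol. In the live workflow the Agent would inspect each reply before choosing the next command; the fixed list here only mirrors the demo.

```python
from typing import Callable, Dict, List

# The gdb commands named in the demo, in order.
GDB_STEPS = ["bt", "frame 0", "info locals", "print item.id"]


def drive_debugger(
    post: Callable[[str, Dict], Dict], steps: List[str]
) -> List[Dict]:
    """Send each step, drain the output, and keep a transcript for review."""
    transcript = []
    for step in steps:
        # A trailing newline submits the line to the debugger prompt.
        post("/send", {"text": step + "\n"})
        reply = post("/drain", {})
        transcript.append({"command": step, "output": reply.get("output", "")})
    return transcript
```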
Skill concept: Skills are concise operation manuals (not magical scripts) that tell the Agent when to log in, which command to run, how to handle errors, and when to request human approval. They keep the CLI stable while allowing the knowledge base to evolve without code changes.
Design trade-offs:
Do not replace the WebTerminal’s SSO flow with direct SSH; preserve governance.
Avoid hard‑coding scenario‑specific suites; let the Agent branch dynamically.
Use simple HTTP for the interactive control plane to keep debugging transparent; WebSocket can be added later if needed.
Retain interact‑script for repeatable smoke tests, but default to live HTTP for real‑time debugging.
Reusable patterns:
Separate execution abstraction from scenario logic.
Decouple authentication from command execution.
Persist raw, plain, and snapshot outputs for audit and replay.
Express troubleshooting expertise as Skills rather than hard‑coded code.
Design interactive programs as stateful services with a minimal HTTP API.
Drive file transfer through the protocol API and verify it with checksums.
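The pattern of persisting both raw and plain output can be sketched in Python. This is an assumption-laden illustration, not wt's code: the function names to_plain and persist are hypothetical, the regular expression covers common CSI, OSC, and two-character escape sequences rather than every terminal control code, and the xterm snapshot is omitted because it requires the live terminal buffer. Only the *.raw.log and *.plain suffixes come from the article.

```python
import re
from pathlib import Path

# CSI sequences, OSC sequences (ended by BEL or ST), and other 2-char escapes.
ANSI_RE = re.compile(
    r"\x1b\[[0-?]*[ -/]*[@-~]"
    r"|\x1b\][^\x07\x1b]*(?:\x07|\x1b\\)"
    r"|\x1b[@-Z\\-_]"
)


def to_plain(raw: str) -> str:
    """Strip escape sequences and normalize CRLF; no cursor emulation."""
    return ANSI_RE.sub("", raw).replace("\r\n", "\n")


def persist(raw: str, stem: str, directory: str = "captures") -> dict:
    """Write <stem>.raw.log (byte-exact) and <stem>.plain side by side."""
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)
    raw_path = out_dir / f"{stem}.raw.log"
    plain_path = out_dir / f"{stem}.plain"
    raw_path.write_bytes(raw.encode("utf-8"))
    plain_path.write_bytes(to_plain(raw).encode("utf-8"))
    return {"raw": str(raw_path), "plain": str(plain_path)}
```

Keeping the raw log untouched means the plain view can be regenerated later if the stripping rules change.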
CLI capability snapshot (excerpt from Appendix A):
wt session # manage persistent WebTerminal browser session
wt status # view terminal status
wt run # execute a shell command and capture output
wt attach # raw TTY attach to WebTerminal
wt interact # start live HTTP interactive control plane
wt interact-script # run fixed expect‑style script
wt snapshot # get xterm buffer snapshot
wt ls-files # list files via WebTerminal API
wt download # download file via API
wt upload # upload file via API
wt direct-info # expose sanitized direct protocol info
Conclusion: By turning the WebTerminal into a stable CLI and pairing it with Skills, the system enables AI agents to perform end-to-end remote debugging (observing, deciding, executing, and re-observing) without manual web interaction, effectively evolving agents from chat-only tools to capable engineers.
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.
