Zero‑Code Multi‑Platform Regression Testing Powered by Xmind and AI Visual Recognition
The article explains how a B2B team turned its three-platform regression workflow from repetitive manual clicking into a reusable, AI-driven testing asset. By converting Xmind mind-maps into Midscene YAML scripts, the team eliminated selector maintenance and made zero-code test-case creation possible across PC, app, and mini-program environments.
Background: Why This Was Needed
The team's B2B product has a PC admin backend, an app, and a mini-program, and all three ship in parallel with every release. This causes three major pain points. First, manual regression means clicking through dozens of test cases again and again. Second, previously fixed bugs keep reappearing because the fixes are not re-checked in later cycles. Third, developer self-testing follows a "test once, then forget" pattern and is never reused.
The core goal is to turn regression testing from a labor‑intensive activity into a long‑term asset that can be written once and reused indefinitely.
Why Now
Two conditions made a self‑built automation platform viable: AI visual recognition removed the selector‑maintenance ceiling, and the high regression pressure from three‑platform parallel development amplified the ROI of automation.
Two Core Requirements for a Test Framework
High fault tolerance: the framework must tolerate UI changes, breaking the "write script → script breaks → fix script" loop.
Low entry barrier: both developers and testers should be able to use it without learning a new API.
Framework Selection: Why Midscene?
Horizontal Comparison of Mainstream Solutions
The team evaluated three categories of solutions against long-term maintenance cost and developer onboarding difficulty: selector-based frameworks (Playwright/Appium), record-and-playback tools (e.g., Selenium IDE), and AI visual-recognition solutions such as Midscene.
Playwright/Appium: mature ecosystem and stable API, but heavy reliance on selectors makes scripts fragile after UI changes.
Record-and-playback (Selenium IDE): quick to start and requires no code, but scripts are tightly coupled to the DOM and hard to reuse.
AI visual recognition (Midscene): tolerant of UI changes, driven by natural language, and cheap to maintain, at the price of somewhat higher execution cost and prompt tuning for complex scenarios.
The team chose the option with the lowest overall cost. Playwright scripts are fast to write, but over a year their maintenance dominates the total cost. Midscene's AI approach eliminates selector maintenance, which makes "write once, reuse long-term" feasible for only a modest increase in execution cost.
Core Advantage: Code‑Level Differences
Traditional Playwright code (selector-based):
```javascript
// Changing a style or text can break the script
await page.locator('.ant-modal-content >> .ant-btn-primary').click();
const price = await page.locator('div.order-list > div:nth-child(3) span.price').innerText();
await page.waitForSelector('.ant-message-success');
```
Even with semantic APIs such as getByRole('button', { name: 'Confirm' }), a change to the UI text still requires a script update, so the maintenance cost remains.
Midscene YAML (natural-language driven):
```yaml
# The same steps keep working as long as the UI semantics stay the same
- aiAct: Click the "Confirm" button
- aiAssert: The price of the third order is 99 yuan
- aiAssert: An "Operation successful" message is shown
```
When the UI changes but its semantics stay the same, the script needs no modification. This sharply reduces maintenance effort. It also lowers the learning curve, because developers no longer write page.locator or getByRole calls.
How Midscene Works
The execution loop is: Screenshot → Multimodal model recognition → Action command output → Execution. Midscene uses a pure-visual approach. It feeds a screenshot and a natural-language instruction to a multimodal large model, and the model returns the target element's coordinates. Those coordinates are then translated into real actions (click, input, swipe) on the browser or device.
Because this strategy relies only on what is visible on screen, it is decoupled from the DOM and the accessibility tree. It can therefore drive any target that can be screenshotted, including Web pages, apps, mini-programs, Electron desktop apps, and even screen-mirrored devices. That makes it a single engine for all three platforms.
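The following minimal JavaScript sketch makes the loop concrete. It is not Midscene's implementation. The driver and model interfaces (screenshot(), tap(), locate()) are hypothetical stand-ins. The sketch assumes the model returns a bounding box in screenshot pixels, which must be scaled back to logical coordinates before clicking.

```javascript
// Sketch of the screenshot → model → action loop. Hypothetical interfaces:
//   driver.screenshot() -> { image, scale }  (scale = screenshot px per logical px)
//   driver.tap(x, y)                         (logical coordinates)
//   model.locate(image, instruction) -> [left, top, right, bottom] | null
// These are NOT Midscene's real internals, only stand-ins for the idea.

function bboxCenter([left, top, right, bottom]) {
  return { x: Math.round((left + right) / 2), y: Math.round((top + bottom) / 2) };
}

function toLogical(point, scale) {
  // Screenshots are often captured at device-pixel resolution (e.g. scale = 2),
  // while clicks are issued in logical (CSS px / dp) coordinates.
  return { x: Math.round(point.x / scale), y: Math.round(point.y / scale) };
}

async function act(driver, model, instruction) {
  const shot = await driver.screenshot();                   // 1. screenshot
  const bbox = await model.locate(shot.image, instruction); // 2. multimodal recognition
  if (!bbox) throw new Error(`Element not found: ${instruction}`);
  const target = toLogical(bboxCenter(bbox), shot.scale);   // 3. action command
  await driver.tap(target.x, target.y);                     // 4. execution
  return target;
}

// Usage with fakes standing in for a real device and model:
const fakeDriver = {
  taps: [],
  async screenshot() { return { image: "<png bytes>", scale: 2 }; },
  async tap(x, y) { this.taps.push([x, y]); },
};
const fakeModel = { async locate() { return [200, 400, 300, 440]; } };

act(fakeDriver, fakeModel, 'Click the "Confirm" button')
  .then(target => console.log(target)); // { x: 125, y: 210 }
```

Clicking the center of the returned box and dividing by the capture scale is one common way to turn a visual answer into an input event. Real engines also handle scrolling, retries, and model-specific coordinate formats.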
Overall Solution Design
Design Goals
Environment-agnostic: no local tool installation is required; the platform is ready to use out of the box.
Zero-code test case creation: Midscene still expects YAML, but the team uses AI to generate that YAML from Xmind mind-maps, so almost no manual scripting is needed.
Full-environment support: the platform handles PC, app, and mini-program test cases uniformly.
Layered Architecture
1. DSL Input Layer: Converts test intent into a structured format. The team's testers already maintain their cases in Xmind. A built-in xmind2yaml parser (using adm-zip and fast-xml-parser) therefore unpacks the .xmind archive, builds an AST from the mind-map, and produces Midscene DSL through two paths:
[Template] nodes instantly generate YAML for common actions such as list display, pagination, and form submission.
[Flow] nodes call an LLM to translate complex business steps into aiAct / aiAssert / aiQuery commands.
This semantic conversion layer is decoupled from execution details.
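The two conversion paths can be sketched as follows. This is an illustrative sketch, not the team's xmind2yaml code. It assumes the .xmind archive has already been unzipped and parsed into a { title, children } tree. The TEMPLATES table and the llm callback are hypothetical. The output follows Midscene's web / tasks / flow YAML layout.

```javascript
// Hypothetical canned steps for common interactions ([Template] path).
const TEMPLATES = {
  list: [{ aiAssert: "The list shows at least one row" }],
  pagination: [
    { aiAct: 'Click the "Next page" button' },
    { aiAssert: "The page number has increased by one" },
  ],
  "form-submit": [
    { aiAct: "Fill in all required fields with valid values and submit the form" },
    { aiAssert: "A success message is shown" },
  ],
};

// Turn one mind-map node into Midscene flow steps.
async function nodeToSteps(node, llm) {
  const m = /^\[(Template|Flow)\]\s*(.+)$/.exec(node.title);
  if (!m) return []; // untagged nodes are skipped in this sketch
  const [, kind, body] = m;
  if (kind === "Template") {
    const steps = TEMPLATES[body.trim()];
    if (!steps) throw new Error(`Unknown template: ${body}`);
    return steps;
  }
  // [Flow] path: child nodes are free-form business steps handed to the LLM,
  // which is expected to return [{ aiAct | aiAssert | aiQuery: string }, ...].
  const lines = (node.children || []).map(child => child.title);
  return llm(body, lines);
}

// Serialize steps into Midscene's web / tasks / flow layout.
// JSON.stringify yields double-quoted strings, which are valid YAML scalars.
function toYaml(url, taskName, steps) {
  const flow = steps
    .map(step => {
      const [key, value] = Object.entries(step)[0];
      return `      - ${key}: ${JSON.stringify(value)}`;
    })
    .join("\n");
  return `web:\n  url: ${url}\ntasks:\n  - name: ${JSON.stringify(taskName)}\n    flow:\n${flow}\n`;
}

async function convert(root, url, llm) {
  const steps = [];
  for (const child of root.children || []) steps.push(...(await nodeToSteps(child, llm)));
  return toYaml(url, root.title, steps);
}

// Usage with a hand-built tree and a fake LLM:
const map = {
  title: "Order list",
  children: [
    { title: "[Template] pagination" },
    {
      title: "[Flow] Cancel an order",
      children: [{ title: "Cancel the first order" }, { title: "Status becomes Cancelled" }],
    },
  ],
};
// Stand-in for the LLM: first line becomes an action, the rest become assertions.
const fakeLlm = async (_goal, lines) =>
  lines.map((line, i) => (i === 0 ? { aiAct: line } : { aiAssert: line }));

convert(map, "https://example.com/orders", fakeLlm).then(yaml => console.log(yaml));
```

Quoting values with JSON.stringify works here because JSON strings are valid YAML double-quoted scalars. A production converter would more likely use a YAML library.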
2. Platform Management Layer (Central Control): Built with Egg.js, Sequelize, and socket.io, it provides:
Case management (CRUD, versioning, Xmind file hosting).
Task scheduling built on two data models (AitestTask and AitestSession) plus a state machine (pending / running / success / failure / cancelled), with real-time progress pushed via socket.io.
Report generation (HTML/JSON) with screenshots and AI decision logs for failed steps.
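A minimal sketch of such a state machine follows. The article lists the states but not the allowed transitions, so this sketch assumes pending → running | cancelled and running → success | failure | cancelled. The onChange callback is where a socket.io emit would go. The class and method names are hypothetical.

```javascript
// Allowed transitions (assumed); terminal states have no outgoing edges.
const TRANSITIONS = {
  pending: ["running", "cancelled"],
  running: ["success", "failure", "cancelled"],
  success: [],
  failure: [],
  cancelled: [],
};

class TaskStateMachine {
  constructor(taskId, onChange = () => {}) {
    this.taskId = taskId;
    this.state = "pending";
    this.onChange = onChange; // e.g. a wrapper that emits progress over socket.io
  }

  canMove(next) {
    return (TRANSITIONS[this.state] || []).includes(next);
  }

  // Move to `next`, rejecting illegal transitions, and notify listeners.
  move(next) {
    if (!this.canMove(next)) {
      throw new Error(`Illegal transition ${this.state} -> ${next} for task ${this.taskId}`);
    }
    const prev = this.state;
    this.state = next;
    this.onChange({ taskId: this.taskId, from: prev, to: next });
  }

  get isTerminal() {
    return TRANSITIONS[this.state].length === 0;
  }
}

// Usage:
const events = [];
const task = new TaskStateMachine("task-42", e => events.push(e));
task.move("running");
task.move("success");
console.log(task.state, task.isTerminal, events.length); // success true 2
```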
3. Execution Layer: Supports a web executor (backed by a Playwright pool) and an Electron-based desktop executor. Tasks and sessions are pooled to keep long-running runs stable.
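The pooling idea can be sketched as a bounded-concurrency queue. This is a generic pattern, not the platform's executor. Here, jobs are plain async functions; the real pool would hand each job a Playwright browser or Electron window. The class name is hypothetical.

```javascript
// Generic bounded-concurrency pool (illustrative; not the platform's executor).
// At most `size` jobs run at once; the rest wait in FIFO order.
class ExecutorPool {
  constructor(size) {
    this.size = size; // e.g. number of Playwright browsers kept alive
    this.active = 0;  // jobs currently running
    this.queue = [];  // waiting jobs with their promise callbacks
  }

  // Enqueue an async job; the returned promise settles with the job's result.
  run(job) {
    return new Promise((resolve, reject) => {
      this.queue.push({ job, resolve, reject });
      this._drain();
    });
  }

  // Start queued jobs while free slots remain.
  _drain() {
    while (this.active < this.size && this.queue.length > 0) {
      const { job, resolve, reject } = this.queue.shift();
      this.active++;
      Promise.resolve()
        .then(job)
        .then(resolve, reject)
        .finally(() => {
          this.active--; // free the slot...
          this._drain(); // ...and pull the next waiting job
        });
    }
  }
}

// Usage: three fake "test sessions", at most two running at a time.
const pool = new ExecutorPool(2);
const sleep = ms => new Promise(r => setTimeout(r, ms));
Promise.all([1, 2, 3].map(i => pool.run(async () => { await sleep(10); return i * 10; })))
  .then(results => console.log(results)); // [ 10, 20, 30 ]
```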
4. Whistle Multi-Instance Management: To isolate mock rules when multiple users share a machine, the platform dynamically allocates a port in the 8800–8899 range for each task and launches an independent Whistle instance on it. After execution, it recycles the port. It also provides rule templates for one-click mock setup.
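The port bookkeeping can be sketched like this, assuming the 8800–8899 range above. The sketch omits starting and stopping the Whistle process on the returned port. The PortPool name is hypothetical.

```javascript
// Per-task port allocation for Whistle instances (illustrative sketch).
class PortPool {
  constructor(min = 8800, max = 8899) {
    this.free = [];
    for (let p = min; p <= max; p++) this.free.push(p);
    this.inUse = new Map(); // port -> taskId
  }

  // Hand out the lowest-numbered free port still at the front of the queue.
  allocate(taskId) {
    const port = this.free.shift();
    if (port === undefined) throw new Error("No free Whistle port; retry when a task finishes");
    this.inUse.set(port, taskId);
    return port;
  }

  // Return a port after the task's Whistle instance has been stopped.
  release(port) {
    if (!this.inUse.delete(port)) return false; // unknown or already released
    this.free.push(port);                       // recycle at the back of the queue
    return true;
  }
}

// Usage:
const whistlePorts = new PortPool();
const port = whistlePorts.allocate("task-1"); // 8800
// ... start Whistle on `port`, run the task, stop Whistle ...
whistlePorts.release(port);
```

A real allocator would also check that a port is not already bound by some other process before handing it out. It would also release ports in a finally block so that crashed tasks do not leak them.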
5. Driver Layer: Wraps platform-specific drivers (@midscene/cli, @midscene/web) for Web (Playwright), Android (ADB), and iOS (WDA), exposing visual understanding and intelligent driving capabilities.
Platform Usage Flow
Four steps constitute a complete regression run:
Case Management: Upload and organize all Xmind-derived cases on the platform.
YAML Script: Xmind is automatically converted to Midscene YAML, shown as aiAct, aiAssert, and aiQuery entries.
Execution Panel: Select a case and the target environment (PC/app/mini-program), then trigger execution; progress, AI decisions, and screenshots stream live via socket.io.
Result Review: After completion, a structured report shows the pass rate, failure screenshots, AI logs, and assertion details, enabling immediate root-cause analysis without manual reproduction.
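The pass-rate part of the report can be illustrated with a small aggregation function. The per-case result shape is hypothetical. This sketch excludes cancelled cases from the pass-rate denominator, a choice the article does not specify.

```javascript
// Hypothetical per-case result shape (not the platform's real schema):
//   { name, status: "success" | "failure" | "cancelled", failedStep?, screenshot?, aiLog? }
function summarize(results) {
  // Only cases that actually finished count toward the pass rate.
  const finished = results.filter(r => r.status === "success" || r.status === "failure");
  const passed = finished.filter(r => r.status === "success").length;
  // Percentage rounded to one decimal place.
  const passRate = finished.length === 0 ? 0 : Math.round((passed / finished.length) * 1000) / 10;
  const failures = finished
    .filter(r => r.status === "failure")
    .map(r => ({ name: r.name, step: r.failedStep, screenshot: r.screenshot, aiLog: r.aiLog }));
  return { total: results.length, finished: finished.length, passed, passRate, failures };
}

// Usage with illustrative input:
const summary = summarize([
  { name: "Order list", status: "success" },
  { name: "Cancel order", status: "failure", failedStep: "aiAssert: Status becomes Cancelled", screenshot: "fail-1.png" },
  { name: "Refund", status: "success" },
  { name: "Export", status: "cancelled" },
]);
console.log(summary.passRate); // 66.7
```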
R&D Self‑Testing Exploration
Problem: Does R&D Need Xmind?
Xmind originally let testers create cases without writing code. For developers, however, the extra round trip of drawing a mind-map and then converting it to YAML added overhead, duplicated effort, and created adoption friction.
The team is experimenting with a layered approach: testers keep using Xmind for release regression, while developers generate self‑test cases directly from code changes using an AI‑coding pipeline.
Trial: Claude Generates YAML Directly
The YAML-generation step is attached to the AI-coding workflow as a Claude skill (a sub-agent). Its inputs are the requirement document and the current branch's code diff. Claude identifies the affected pages, interactions, and assertions, then outputs a ready-to-run Midscene YAML file.
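One way to prepare the skill's input is to pre-compute which pages a diff touches before handing everything to the model. Everything in the sketch below is hypothetical: the file-prefix-to-page map and the prompt wording are invented for illustration. Only the unified-diff "+++ b/<path>" header convention comes from git's standard output.

```javascript
// Collect file paths from the "+++ b/<path>" headers of a unified git diff.
// Deleted files ("+++ /dev/null") are skipped.
function changedFiles(diffText) {
  const files = new Set();
  for (const line of diffText.split("\n")) {
    const m = /^\+\+\+ b\/(.+)$/.exec(line);
    if (m) files.add(m[1]);
  }
  return [...files];
}

// Map changed files to page names via a hypothetical path-prefix table.
function affectedPages(files, pageMap) {
  const pages = new Set();
  for (const file of files) {
    for (const [prefix, page] of Object.entries(pageMap)) {
      if (file.startsWith(prefix)) pages.add(page);
    }
  }
  return [...pages];
}

// Assemble the model input from the requirement doc and the branch diff.
function buildPrompt(requirement, diffText, pageMap) {
  const pages = affectedPages(changedFiles(diffText), pageMap);
  return [
    "Write a Midscene YAML file (web / tasks / flow) using aiAct, aiAssert and aiQuery steps.",
    `Requirement:\n${requirement}`,
    `Affected pages: ${pages.join(", ") || "unknown"}`,
    `Code diff:\n${diffText}`,
  ].join("\n\n");
}

// Usage with a tiny hand-written diff:
const diff = [
  "diff --git a/src/pages/order/List.vue b/src/pages/order/List.vue",
  "--- a/src/pages/order/List.vue",
  "+++ b/src/pages/order/List.vue",
  "@@ -1 +1 @@",
  "-<button>OK</button>",
  "+<button>Confirm</button>",
].join("\n");
const pageMap = { "src/pages/order/": "Order list" };
console.log(affectedPages(changedFiles(diff), pageMap)); // [ 'Order list' ]
```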
Trial Value
Eliminate duplicated effort: developers no longer need to draw an Xmind map before testing.
Make self-testing reusable: ad-hoc manual checks become executable YAML cases with persistent reports.
Lower the adoption barrier: there is no new tooling to learn; the AI-coding flow simply gains a YAML-generation step.
If this workflow matures, the platform will evolve from a pre‑release regression tool to a continuous quality‑assist feature throughout development.
Landing Results
Current Adoption
The platform is fully deployed for app release regression and covers all core modules. Developers and testers are hands-off: the platform triggers runs and produces traceable reports. This frees people from manual clicking and effectively automates the pre-release quality gate.
Conclusion
The platform’s value lies not only in automating a few regression paths but in converting manual, experience‑driven testing into an executable, traceable, and reusable quality asset. While regression testing is the immediate win, the ongoing R&D self‑testing trial aims to embed quality assurance into every development cycle.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
大转转FE
Regularly sharing the team's thoughts and insights on frontend development
