
Real‑time Debugging Boosts the Effectiveness of AI‑Generated UI Automation Scripts

This article examines how integrating real‑time debugging with large‑model AI can dramatically improve the accuracy and success rate of automatically generated UI test scripts. It presents a LangChain‑based architecture, the toolchain design, experimental results, and future challenges in AI‑driven UI automation.

Ctrip Technology

In fast‑moving software development cycles, UI automation testing is essential for efficiency and quality, yet traditional methods struggle with complex, frequently changing interfaces, leading to high maintenance costs and low reliability.

The article introduces behavior‑driven development (BDD) with tools like Cucumber and Gherkin, highlighting their benefits for cross‑team communication while noting limitations such as the need for programmers to implement step definitions.
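To make the "step definition" limitation concrete, here is a minimal sketch of how a Cucumber‑style framework maps Gherkin steps to code. The registry, patterns, and `tap_button` step are hypothetical illustrations, not the article's actual framework: the point is that every natural‑language step still requires a hand‑written implementation.

```python
import re

# Hypothetical Cucumber-style step registry: Gherkin steps are matched
# against registered patterns, and each pattern needs a hand-coded body.
STEP_REGISTRY = []

def step(pattern):
    """Register a step implementation for a Gherkin step pattern."""
    def decorator(func):
        STEP_REGISTRY.append((re.compile(pattern), func))
        return func
    return decorator

@step(r'I tap the "(.+)" button')
def tap_button(element_id):
    # A real suite would drive the app here (e.g. via Appium or adb);
    # this stub just reports the action it would perform.
    return f"tap {element_id}"

def run_step(text):
    """Find and execute the implementation matching a Gherkin step."""
    for pattern, func in STEP_REGISTRY:
        match = pattern.fullmatch(text)
        if match:
            return func(*match.groups())
    raise LookupError(f"No step definition for: {text}")
```

Any step without a registered pattern raises an error, which is exactly the maintenance burden the BDD approach leaves with programmers.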

Recent advances in large‑model AI (e.g., GPT) enable automatic generation of test code from natural‑language specifications, but current outputs suffer from syntax errors and incomplete coverage, achieving less than 5% first‑try success.

To address these issues, the authors propose a system that incorporates real‑time debugging: AI not only generates code but also executes it on a device via adb, receives feedback, and iteratively refines the script. Using a LangChain framework, the system accesses two tool families—page‑information retrieval and debugging—to determine UI element IDs and validate generated actions.
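The generate‑execute‑refine loop described above can be sketched as follows. This is a simplified outline, not the authors' implementation: the model call and the on‑device executor are passed in as functions so the loop itself stays generic, and `adb_run` assumes `adb` is on the PATH.

```python
import subprocess

def adb_run(command):
    """Run a shell command on a connected device via adb.

    Assumption: the adb binary is on PATH and a device is attached.
    Returns (success, combined output) for use as model feedback.
    """
    result = subprocess.run(["adb", "shell", command],
                            capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr

def generate_and_debug(generate_script, execute, max_rounds=3):
    """Iteratively refine an AI-generated script using execution feedback.

    generate_script(feedback) -> script text (an LLM call in production);
    execute(script) -> (ok, log)  (e.g. adb_run against a real device).
    """
    feedback = None
    for _ in range(max_rounds):
        script = generate_script(feedback)  # regenerate, given past errors
        ok, log = execute(script)           # try it on the device
        if ok:
            return script                   # script validated end to end
        feedback = log                      # feed the failure back in
    return None                             # give up after max_rounds
```

The key design choice is that execution logs flow back into the next generation round, so syntax errors and wrong element IDs are corrected by the model itself rather than by a human reviewer.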

Prompt engineering is crucial; detailed prompts guide the AI to consider page modules, component IDs, required actions, and output format, while the system’s function‑calling capability allows the AI to invoke specialized tools as needed.
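The function‑calling pattern can be illustrated with a small dispatcher in the spirit of LangChain tools. The tool names, stubbed return values, and JSON call format below are illustrative assumptions; in the real system the two tool families query live page information and execute debugging actions on a device.

```python
import json

# Hypothetical tool registry: the model emits a tool name plus JSON
# arguments, and the runtime dispatches the call to the matching function.
TOOLS = {}

def tool(func):
    """Register a function as an AI-invocable tool."""
    TOOLS[func.__name__] = func
    return func

@tool
def get_page_elements(page):
    """Page-information tool: list element IDs on a page (stubbed)."""
    return ["btn_pay", "txt_order_id"] if page == "order_detail" else []

@tool
def run_ui_action(action, element_id):
    """Debugging tool: execute one UI action on-device (stubbed)."""
    return f"ok: {action} {element_id}"

def dispatch(call_json):
    """Route a model-emitted function call to the matching tool."""
    call = json.loads(call_json)
    return TOOLS[call["name"]](**call["arguments"])
```

With this shape, the model never guesses element IDs: it asks the page‑information tool for them, then verifies actions through the debugging tool.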

By decomposing the overall task into smaller AI agents—each handling a specific sub‑task such as element identification or code debugging—the success rate of generated scripts rose from 5% to over 80% in experiments on Ctrip’s hotel order detail page.
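The decomposition idea can be sketched as a pipeline of narrow agents, each transforming a shared task record. The agent names and stub logic here are assumptions for illustration; in the real system each agent would wrap its own model calls and tools.

```python
def element_agent(task):
    """Sub-agent: resolve the UI element IDs a test step needs (stubbed)."""
    task["elements"] = ["btn_pay"]
    return task

def codegen_agent(task):
    """Sub-agent: turn resolved elements into script code (stubbed)."""
    task["script"] = "\n".join(f'tap("{e}")' for e in task["elements"])
    return task

def debug_agent(task):
    """Sub-agent: validate the generated script; a no-op check here."""
    task["ok"] = bool(task.get("script"))
    return task

def run_pipeline(step_text, agents):
    """Pass one test step through each specialized agent in order."""
    task = {"step": step_text}
    for agent in agents:
        task = agent(task)
    return task
```

Keeping each agent's scope small means each model invocation gets a focused prompt and focused tools, which is what lifts the end‑to‑end success rate.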

Future work includes reducing large‑model invocation costs, handling more complex interactions, and further improving automation reliability to minimize human review.

Tags: AI, UI Automation Testing, LangChain, Mobile Testing, Real-time Debugging