Applying Large Language Models for Automated Test Case Generation at KooJiaLe
This article describes how KooJiaLe, a leading 3D design company, built an AI‑powered platform that uses large language models to automate test case generation, detailing its workflow, generation modes, editing features, export options, optimization efforts, results, and remaining challenges.
Many software companies are now exploring the use of large language models (LLMs) to assist engineers throughout the software development lifecycle. This article reports on how KooJiaLe's internal AI testing group has attempted to generate test cases with LLMs.
The company believes a private Retrieval‑Augmented Generation (RAG) system is essential because generic LLMs lack knowledge of its specific business domain.
Improving test‑case authoring efficiency: Traditional manual test‑case writing consumes significant time. By automatically generating initial test cases with AI and then having testers review and refine them, the platform shortens preparation time and boosts testing efficiency.
2.2 Test‑case generation methods
Direct generation: Paste requirement text into the input box on the “Direct Generation” tab and click “Generate Test Cases”.
Image upload: Click “Upload Image” on the “Direct Generation” tab, adjust the recognized result, then generate test cases.
Free‑prompt generation: Paste requirements into the “Free Generation” tab, optionally edit the platform‑provided prompt, and generate test cases.
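All three modes ultimately reduce to filling the requirement text into a prompt template that the LLM answers. A minimal sketch of that assembly step follows; the template wording, output fields, and `build_prompt` helper are illustrative assumptions, not the platform's actual prompt:

```python
# Illustrative prompt template for LLM-based test-case generation.
# The wording and field names are assumptions, not KooJiaLe's real prompt.
DEFAULT_TEMPLATE = (
    "You are a senior QA engineer. Based on the requirement below, "
    "write test cases as a JSON list, where each case has the fields "
    "'title', 'preconditions', 'steps', and 'expected_result'.\n\n"
    "Requirement:\n{requirement}"
)

def build_prompt(requirement: str, template: str = DEFAULT_TEMPLATE) -> str:
    """Fill the requirement into the (possibly user-edited) template,
    mirroring the free-prompt mode where testers tweak the template."""
    return template.format(requirement=requirement.strip())

prompt = build_prompt("Users can reset their password via email.")
```

In the free‑prompt mode, the `template` argument is simply the user's edited text rather than the default.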
2.3 Editing, adding, and deleting test cases
The platform allows online editing of generated test cases, supporting addition and deletion operations (see screenshots).
2.4 Test‑case export
Direct import into the internal test‑case management platform for review and test‑plan creation.
Export to XMind.
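For the XMind path, one lightweight approach is to emit a tab‑indented outline, a shape that mind‑mapping tools can typically import. The sketch below is an illustration, not the platform's actual exporter, and the case schema (`title`, `steps`, `expected_result`) is an assumption:

```python
def cases_to_outline(feature: str, cases: list[dict]) -> str:
    """Render test cases as a tab-indented outline (feature -> case ->
    steps/expected result). The case schema here is an assumption."""
    lines = [feature]
    for case in cases:
        lines.append("\t" + case["title"])           # one branch per case
        for step in case.get("steps", []):
            lines.append("\t\t" + step)              # steps as sub-nodes
        lines.append("\t\tExpected: " + case["expected_result"])
    return "\n".join(lines)
```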
3. Test‑case generation workflow
The offline generation process includes: requirement storage, scheduled retrieval, preprocessing, prompt assembly, GPT service invocation, test‑case parsing, failure retry, task status update, and finally storing the test cases.
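The steps above can be sketched as a single task‑processing function. All names, the JSON output format, and the injected `llm_call`/`store` callables are assumptions for illustration, not the platform's actual code:

```python
import json

def run_generation_task(task: dict, llm_call, store, max_retries: int = 2) -> dict:
    """Skeleton of the offline pipeline: preprocess the requirement,
    assemble a prompt, invoke the LLM service, parse the result,
    retry on failure, store the cases, and update task status."""
    requirement = task["requirement"].strip()                  # preprocessing
    prompt = "Generate test cases as JSON for:\n" + requirement  # prompt assembly
    for _ in range(1 + max_retries):
        try:
            raw = llm_call(prompt)            # GPT service invocation
            cases = json.loads(raw)           # test-case parsing
            store(task["id"], cases)          # store the test cases
            task["status"] = "done"           # task status update
            return task
        except (ValueError, RuntimeError):
            continue                          # failure retry
    task["status"] = "failed"
    return task
```

Requirement storage and scheduled retrieval sit outside this function: a scheduler would fetch pending tasks from the database and feed each one through it.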
4. Tool optimization process
Initially, over 50% of generation tasks failed. The team performed extensive analysis and optimizations.
4.1 Root‑cause analysis
Service stability: Early reliance on a single GPT service caused failures when the service was unstable.
Input length limits: GPT’s token limit prevented handling large requirement texts.
Technical implementation: Front‑end issues caused browser‑level request blocking.
4.2 Handling service instability
Retry mechanism: On failure, the task is retried up to two more times; the impact was limited because the short retry interval meant an unstable service was often still unavailable on retry.
Introducing alternative models: Added Wenxin Yiyan and MiniMax as backup engines; when GPT fails, the system switches to these models, improving reliability.
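A retry‑plus‑fallback chain of this kind might be sketched as follows; the engine interface (a callable that raises `RuntimeError` on failure) is an illustrative assumption:

```python
def generate_with_fallback(prompt: str, engines, retries_per_engine: int = 2):
    """Try each engine in order (e.g. GPT, then Wenxin Yiyan, then MiniMax),
    with up to two extra attempts per engine before falling back."""
    last_error = None
    for engine in engines:
        for _ in range(1 + retries_per_engine):
            try:
                return engine(prompt)
            except RuntimeError as exc:  # engine unavailable or errored
                last_error = exc
    raise RuntimeError("all engines failed") from last_error
```

In practice each `engine` would wrap one vendor's API client behind a common call signature, so the pipeline code stays vendor‑agnostic.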
4.3 Handling length restrictions
Combined Wenxin Yiyan (for understanding long Chinese requirement texts) with GPT (for test‑case generation), splitting the work between models to stay within GPT’s token limit while leveraging each model’s strengths.
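One way to structure such a two‑stage combination is sketched below, with both model interfaces reduced to illustrative callables and a crude character count standing in for the token limit; none of this is the platform's actual implementation:

```python
def two_stage_generate(requirement: str, summarize, generate, char_budget: int = 3000):
    """Stage 1: if the requirement exceeds the budget, ask the first model
    (e.g. Wenxin Yiyan) to condense it into key test points. Stage 2: ask
    the second model (e.g. GPT) to turn the (possibly condensed) text into
    test cases. The character budget is a stand-in for real token counting."""
    text = requirement
    if len(requirement) > char_budget:  # too long for the generator's context
        text = summarize("Condense this requirement into key test points:\n" + requirement)
    return generate("Write test cases for these requirements:\n" + text)
```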
4.4 Other optimizations
Escaped/encoded user input on the front end to prevent XSS attacks.
Opened a free‑prompt feature, allowing users to fine‑tune prompts for better results.
5. Summary & Outlook
5.1 Achievements: Over 300 generation tasks have been created, producing more than 2,000 test cases with an 80%+ success rate, noticeably improving tester efficiency.
5.2 Limitations and issues:
Insufficient domain knowledge leads to incomplete or inaccurate test cases.
Limited handling of non‑functional requirements such as performance and security.
Complex scenarios may require deeper understanding beyond the AI’s capability.
Lack of effective evaluation metrics makes it hard to assess the usefulness of generated cases.
Overall, while the AI‑driven platform shows promising results, further research and enhancements are needed to address domain expertise, non‑functional testing, and evaluation challenges.