Qunar Ticket Test‑Environment Governance and Automated Monitoring Framework
This article describes Qunar Ticket’s comprehensive test‑environment governance framework, including the “Mirror‑Inspect” monitoring service, configuration and data synchronization strategies, and automated allocation management, highlighting how these practices reduced environment‑related project delays from up to 20% to below 8%.
Background
During projects, problems with test environments are a common source of delays, missed tests, and incorrect tests. Ensuring test environments are available, usable, and standardized is a universal expectation.
Container‑based frameworks can speed up environment construction, but limited hardware resources, build performance, complex call chains, and large data volumes make it difficult to provide on‑demand, rapid, disposable environments for complex transaction systems.
Therefore, beyond fast environment provisioning, it is necessary to improve the stability and availability of existing environments.
Environment Governance Approach
The governance can be divided into six elements:
Environment availability
Proactive repair
Data standardization
Configuration standardization
Test code control
Reasonable resource allocation
Environment Availability
Availability includes deliverable readiness, component and machine health, smooth business links, continuous availability during use, and visual platform/report feedback.
During environment creation, monitoring whether servers and applications start successfully provides basic availability feedback. The remaining concerns are handled by the “Mirror‑Inspect” service.
Mirror‑Inspect Service
The service provides two core functions: (1) “illuminating” the environment, that is, surfacing its anomalies and basic status, and (2) proactively repairing problems.
Implementation: a Mirror‑Inspect server is deployed on each test machine and managed via the Salt API. It starts a monitor that collects information such as VM status, service health, disk usage, load, memory, and deployment and version details, and persists it. Issues that can be fixed automatically are repaired.
Examples: automatically clean Tomcat logs when disk usage exceeds a threshold; restart Tomcat if the service is down and alert if restart fails; collect non‑master branch information and display it on the platform; perform business‑link validation and aggregate results for dashboards.
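The first two examples above can be sketched as a check‑then‑repair routine. This is only an illustrative sketch, not the Mirror‑Inspect implementation: the names `CheckResult`, `check_disk`, and `check_service` are hypothetical, and the actual cleanup, restart, and health‑probe actions are passed in as callables because the article does not describe them.

```python
import shutil
from dataclasses import dataclass
from typing import Callable


@dataclass
class CheckResult:
    name: str
    healthy: bool           # True if no problem was found
    repaired: bool = False  # True if an automatic repair succeeded
    alert: bool = False     # True if a human needs to be notified


def disk_usage_ratio(path: str = "/") -> float:
    """Return the used fraction of the filesystem that contains `path`."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def check_disk(usage_ratio: float, threshold: float,
               clean_logs: Callable[[], None]) -> CheckResult:
    """Clean logs when disk usage exceeds the threshold."""
    if usage_ratio <= threshold:
        return CheckResult("disk", healthy=True)
    clean_logs()  # hypothetical action, e.g. removing old Tomcat logs
    return CheckResult("disk", healthy=False, repaired=True)


def check_service(is_up: Callable[[], bool],
                  restart: Callable[[], None]) -> CheckResult:
    """Restart the service if it is down; raise an alert if that fails."""
    if is_up():
        return CheckResult("tomcat", healthy=True)
    restart()
    if is_up():
        return CheckResult("tomcat", healthy=False, repaired=True)
    return CheckResult("tomcat", healthy=False, alert=True)
```

A monitor loop could call `check_disk(disk_usage_ratio("/"), 0.9, clean_logs)` on a schedule and persist each `CheckResult` for the dashboard; the 0.9 threshold is an arbitrary example value.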
[Figure: Mirror‑Inspect architecture diagram]
[Figure: server deployment example]
Note: the HTTP endpoint provides detailed server and application information for the test environment.
[Figure: basic checks and repairs]
[Figure: results displayed on the platform]
Configuration Synchronization Strategy
Hot‑config systems are increasingly used for lightweight business logic and feature toggles. Discrepancies between online and test configurations cause missed or erroneous tests.
Automatic synchronization leverages the fallback node of the hot‑config system. The steps are:
Move all files from the online node to the fallback node, making the online node empty so the system reads from the fallback node.
Delete all files under the test node, causing it to read the fallback node as well, achieving unified configuration.
For configurations that must differ, a second‑stage replacement is performed before Tomcat starts.
To modify a test configuration, either copy files to the test node (not recommended) or use the Mirror‑Inspect “business configuration” page, which records project and environment information, applies changes automatically, and manages their lifecycle.
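The fallback mechanism behind these steps can be modeled in a few lines. The sketch below is a simplified in‑memory model under stated assumptions: the node names `online`, `test`, and `fallback`, the per‑file lookup rule, and the functions `read_config` and `unify` are hypothetical and do not reflect the hot‑config system's real API.

```python
from typing import Dict, Optional

# node name -> {file name -> file content}
ConfigStore = Dict[str, Dict[str, str]]


def read_config(store: ConfigStore, node: str, filename: str,
                fallback_node: str = "fallback") -> Optional[str]:
    """Read a file from `node`, falling back to the fallback node if absent."""
    files = store.get(node, {})
    if filename in files:
        return files[filename]
    return store.get(fallback_node, {}).get(filename)


def unify(store: ConfigStore, online: str = "online", test: str = "test",
          fallback: str = "fallback") -> None:
    """Apply the first two steps: move online files to fallback, empty test."""
    store.setdefault(fallback, {}).update(store.get(online, {}))
    store[online] = {}
    store[test] = {}


store: ConfigStore = {
    "online": {"switch.properties": "feature=on"},
    "test": {"switch.properties": "feature=off"},
    "fallback": {},
}
unify(store)
# Both nodes now resolve to the same content through the fallback node.
online_view = read_config(store, "online", "switch.properties")
test_view = read_config(store, "test", "switch.properties")
```

In this model, copying a file back onto the test node overrides the fallback for that file only, which corresponds to the manual override the last step advises against.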
Second‑stage replacement implementation: Salt scripts schedule the replacements so that they run before Tomcat starts.
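As an illustration of what a second‑stage replacement might do to a single file, the sketch below rewrites selected keys in a properties‑style text. The `key=value` file format, the function name `apply_overrides`, and the sample keys are assumptions; the article does not show the actual Salt scripts or configuration format.

```python
from typing import Dict


def apply_overrides(properties_text: str, overrides: Dict[str, str]) -> str:
    """Replace values of known keys and append any override keys not present."""
    lines = []
    seen = set()
    for line in properties_text.splitlines():
        stripped = line.strip()
        # Only touch real key=value lines; keep comments and blanks as they are.
        if stripped and not stripped.startswith("#") and "=" in stripped:
            key = stripped.split("=", 1)[0].strip()
            if key in overrides:
                lines.append(f"{key}={overrides[key]}")
                seen.add(key)
                continue
        lines.append(line)
    for key, value in overrides.items():
        if key not in seen:
            lines.append(f"{key}={value}")
    return "\n".join(lines)


unified = "db.url=prod-db\n# shared switch\ntimeout=3"
test_specific = apply_overrides(unified, {"db.url": "test-db", "mock.pay": "true"})
```

Keeping the overrides as a small, explicit mapping makes the differences between test and online configuration easy to review.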
Data Synchronization Scheme
Database synchronization follows four principles:
When creating an environment, copy schema and data from production.
When production SQL changes, proactively sync to test.
Mask data with predefined replacement rules during sync.
Account for data volume: for large data sets, sync asynchronously by copying MySQL data files.
DBA assistance is required to provide safe interfaces for data extraction and change notifications.
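The masking principle can be illustrated with a small row‑level transform applied while data is copied. This is a sketch under assumptions: the column names `phone` and `name`, the masking formats, and the functions `mask_phone`, `mask_name`, and `mask_row` are hypothetical examples, not Qunar's rules.

```python
from typing import Any, Callable, Dict


def mask_phone(value: str) -> str:
    """Keep the first 3 and last 4 characters, mask the rest."""
    if len(value) <= 7:
        return "*" * len(value)
    return value[:3] + "*" * (len(value) - 7) + value[-4:]


def mask_name(value: str) -> str:
    """Keep the first character only."""
    return value[:1] + "*" * max(len(value) - 1, 0)


# Predefined masks: column name -> masking function (hypothetical columns).
MASK_RULES: Dict[str, Callable[[str], str]] = {
    "phone": mask_phone,
    "name": mask_name,
}


def mask_row(row: Dict[str, Any],
             rules: Dict[str, Callable[[str], str]]) -> Dict[str, Any]:
    """Return a copy of the row with masked values for configured columns."""
    masked = dict(row)
    for column, rule in rules.items():
        if isinstance(masked.get(column), str):
            masked[column] = rule(masked[column])
    return masked


masked = mask_row({"id": 1, "phone": "13812345678", "name": "Alice"}, MASK_RULES)
```

Columns without a rule pass through unchanged, so adding a rule for a newly introduced sensitive column is a one‑line change.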
Code and Environment Allocation Management
Qunar’s Odin Desktop integrates JIRA, code projects, users, and deployment operations, consolidating environment allocation and Mirror‑Inspect information.
When a branch is released and its JIRA issue is closed, Odin Desktop detects the closure, calls the Mirror‑Inspect configuration management and deployment interfaces, restores master code to the test environment, and synchronizes the other environments that are not occupied.
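The reclaim flow triggered by an issue closure can be sketched as an event handler. The `Environment` type, the `on_issue_closed` function, and the `deploy_master` callback are hypothetical stand‑ins; the real Odin Desktop and Mirror‑Inspect interfaces are not described in the article.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Environment:
    name: str
    branch: str
    occupied_by: Optional[str] = None  # key of the issue holding the environment


def on_issue_closed(issue_key: str, environments: List[Environment],
                    deploy_master: Callable[[Environment], None]) -> List[str]:
    """Release environments held by the closed issue and refresh idle ones.

    Returns the names of the environments that were redeployed with master.
    """
    refreshed = []
    for env in environments:
        if env.occupied_by == issue_key:
            env.occupied_by = None  # release the environment
        if env.occupied_by is None:
            deploy_master(env)      # hypothetical deployment call
            env.branch = "master"
            refreshed.append(env.name)
    return refreshed


envs = [
    Environment("env-a", "feature/refund", occupied_by="TICKET-1"),
    Environment("env-b", "master"),
    Environment("env-c", "feature/seat", occupied_by="TICKET-2"),
]
deployed: List[str] = []
refreshed = on_issue_closed("TICKET-1", envs, lambda e: deployed.append(e.name))
```

Environments still held by another issue are left untouched, so ongoing testing on other branches is not disturbed.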
Conclusion
Two key metrics demonstrate the impact of test‑environment governance at Qunar Ticket:
Support for dozens of development and test environments with fully unattended operation, automatic repair, allocation, isolation, integration, and proactive alerts.
Project delay caused by environment issues dropped from 15‑20% before 2018 to below 8% in 2019.
The core ideas are: one‑click Mirror‑Inspect integration, end‑to‑end unattended environment, reliable metrics, closed‑loop strategies, and platform‑based project cycle management.
Colleagues with similar needs are encouraged to share experiences.