Building a High‑Availability Wireless Test Cluster for Mobile Apps at Ant Financial

The article details Ant Financial's development of a highly available wireless test cluster that supports automated testing for its massive mobile app ecosystem, describing its architecture, data‑driven monitoring, full integration, and the All‑in‑One solution that enables rapid, cost‑effective iteration across dozens of services and IoT scenarios.

AntTech

On November 16, at the inaugural Android Green Alliance Developer Conference, Ant Financial engineer Pei Yang presented a summary of the development and practice of the company's wireless test cluster.

The industry has accumulated many automated‑testing practices. As DevOps, automated testing, and manual testing become more deeply integrated, diverse frameworks and design concepts keep emerging. Supporting rapid business iteration, dynamic client components, IoT, and human‑machine interaction scenarios has become a focal point.

Ant Financial leveraged a wireless experiment cluster to build an automation‑testing architecture for an application with 870 million active users, distilling concrete optimization solutions along the way.

1 High Availability

High availability is a prerequisite for supporting business services. The mobile cloud‑test platform provides the essential infrastructure for automated testing, requiring both high availability and flexible execution control to improve development efficiency and reduce costs.

1.1 Device Cluster

Mobile devices are less reliable than traditional servers. The cluster compensates by combining redundancy with intelligent scheduling: a bidirectional device‑task selection mechanism, real‑time anomaly detection, and dynamic task switching. Together these achieve a service‑level agreement of four nines (99.99%) and suit the finer granularity of automated tasks.
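The selection and switching behavior described above can be sketched roughly as follows. This is a minimal illustration, not the platform's actual scheduler: the `Device` and `Task` classes, the `compatible` check, and `run_with_failover` are all hypothetical names, and a raised `RuntimeError` stands in for whatever anomaly signal the real cluster detects.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    name: str
    os_version: int
    accepts: set = field(default_factory=set)  # task kinds this device agrees to run
    healthy: bool = True


@dataclass
class Task:
    name: str
    kind: str
    min_os: int  # the task's own requirement on the device


def compatible(task, device):
    # "Bidirectional" here means both sides must agree:
    # the task's requirement is met AND the device accepts this kind of task.
    return (
        device.healthy
        and device.os_version >= task.min_os
        and task.kind in device.accepts
    )


def run_with_failover(task, devices, execute):
    """Try compatible devices in order, switching to the next one on failure."""
    for device in devices:
        if not compatible(task, device):
            continue
        try:
            return device.name, execute(task, device)
        except RuntimeError:
            # Assumed anomaly signal: take the device out of rotation and move on.
            device.healthy = False
    raise RuntimeError(f"no healthy device could run {task.name}")


devices = [
    Device("phone-a", 10, accepts={"ui"}),
    Device("phone-b", 11, accepts={"ui", "perf"}),
]


def flaky_execute(task, device):
    # Simulated executor: phone-a always fails, every other device passes.
    if device.name == "phone-a":
        raise RuntimeError("device dropped off the bus")
    return "passed"


result = run_with_failover(Task("login-smoke", "ui", min_os=9), devices, flaky_execute)
print(result)
```

In this sketch the task lands on `phone-b` after `phone-a` fails, and `phone-a` is marked unhealthy so later tasks skip it. A production scheduler would also need re-admission of recovered devices, which is omitted here.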

1.2 Data Analysis

Ensuring high availability involves unified monitoring of global resources and continuous analysis of telemetry data. The cloud‑test service reports hundreds of data points covering device health, task efficiency, and key‑scenario latency, and allows dynamic addition of business‑specific metrics via standard extension interfaces.
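One way to picture built-in data points alongside business-specific extensions is a small registry like the one below. It is a sketch under assumptions: `MetricRegistry`, `report`, `register_extension`, and the metric names are invented for illustration and do not reflect the cloud-test service's real interfaces.

```python
import statistics
from collections import defaultdict


class MetricRegistry:
    """Collects raw samples and lets callers plug in derived, business-specific metrics."""

    def __init__(self):
        self._samples = defaultdict(list)
        self._extensions = {}

    def report(self, name, value):
        # Built-in path: record one raw data point under a metric name.
        self._samples[name].append(float(value))

    def register_extension(self, name, fn):
        # Extension path: fn receives a dict of all samples and returns one number.
        self._extensions[name] = fn

    def snapshot(self):
        raw = dict(self._samples)  # plain copy so extensions cannot add keys by accident
        out = {name: statistics.mean(values) for name, values in raw.items()}
        for name, fn in self._extensions.items():
            out[name] = fn(raw)
        return out


registry = MetricRegistry()
registry.report("device.battery_temp_c", 31.0)
registry.report("device.battery_temp_c", 33.0)
registry.report("task.duration_s", 40)
registry.report("task.duration_s", 60)

# Hypothetical business metric: share of tasks slower than 50 seconds.
registry.register_extension(
    "task.slow_ratio",
    lambda s: sum(v > 50 for v in s["task.duration_s"]) / len(s["task.duration_s"]),
)

snapshot = registry.snapshot()
print(snapshot)
```

Keeping the extension as a plain function over the collected samples means a business team can add a metric without touching the reporting code.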

1.3 Continuous Improvement

Ongoing data analysis has uncovered numerous issues, leading to rapid fixes and weekly version iterations, embodying an agile, small‑step development model that sustains fast business iteration.

2 Full Integration

How well a service integrates with the business is the best measure of its value.

2.1 Service Standardization

The experiment cluster offers standardized RPC interfaces for internal domains and secure, reliable interfaces for isolated business domains. A gateway unifies both, simplifying business onboarding while supporting isolation and rate‑limiting, and it handles over 200 000 daily requests.
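Per-domain isolation and rate-limiting at a gateway is often implemented with a token bucket per caller; the article does not say which algorithm the platform uses, so the following is only an illustrative sketch. `TokenBucket`, `Gateway`, the domain name, and the limits are all assumptions, and time is passed in explicitly to keep the example deterministic.

```python
class TokenBucket:
    """Classic token bucket: refills at a fixed rate up to a fixed capacity."""

    def __init__(self, rate_per_s, capacity, now=0.0):
        self.rate = rate_per_s
        self.capacity = capacity
        self.tokens = float(capacity)
        self.updated = now

    def allow(self, now):
        # Refill for the time elapsed since the last call, then try to spend one token.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class Gateway:
    """One bucket per business domain, so a noisy domain cannot starve the others."""

    def __init__(self, limits):
        # limits maps domain -> (rate_per_s, capacity); the values are illustrative.
        self._buckets = {d: TokenBucket(r, c) for d, (r, c) in limits.items()}

    def handle(self, domain, now):
        bucket = self._buckets.get(domain)
        if bucket is None:
            return "rejected"  # domain was never onboarded
        return "accepted" if bucket.allow(now) else "throttled"


gateway = Gateway({"payments": (1.0, 2)})
burst = [gateway.handle("payments", now=0.0) for _ in range(3)]
later = gateway.handle("payments", now=1.0)
unknown = gateway.handle("not-onboarded", now=0.0)
print(burst, later, unknown)
```

For scale, 200 000 requests a day averages out to roughly 2.3 requests per second, so the limits protect mainly against bursts from a single domain rather than against sustained total load.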

2.2 Service Personalization

Beyond standard solutions, the platform enables rapid assembly of personalized task flows, supporting over 200 business lines and 40+ applications. It also accommodates emerging scenarios such as facial recognition, IoT scanning, and screen casting by integrating custom MCU devices and industrial‑grade robots to simulate complex physical environments.
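Assembling a personalized task flow can be thought of as composing reusable steps into a pipeline. The sketch below assumes a flow is just an ordered list of functions over a context dictionary; `build_flow` and the step names (`install_app`, `aim_robot_arm`, `scan_code`) are hypothetical and only mimic an IoT scanning scenario.

```python
def build_flow(*steps):
    """Compose steps into one callable; each step takes and returns a context dict."""

    def run(context):
        for step in steps:
            context = step(dict(context))  # copy so the caller's dict is never mutated
        return context

    return run


def install_app(ctx):
    ctx["installed"] = True
    return ctx


def aim_robot_arm(ctx):
    # Stand-in for commanding a robot to point the phone at a physical QR code.
    ctx["arm_target"] = "qr-code"
    return ctx


def scan_code(ctx):
    # The scan only succeeds if the earlier steps prepared the environment.
    ctx["scanned"] = ctx.get("installed", False) and ctx.get("arm_target") == "qr-code"
    return ctx


# A standard flow and a personalized one built from the same step library.
standard_flow = build_flow(install_app)
iot_scan_flow = build_flow(install_app, aim_robot_arm, scan_code)

request = {"app": "demo.apk"}
standard_result = standard_flow(request)
iot_result = iot_scan_flow(request)
print(iot_result)
```

Because both flows draw on the same steps, a new business line only needs to supply the steps that are unique to its scenario.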

3 AIO (All‑in‑One) One‑Stop Solution

High‑availability services reduce operational costs and boost development efficiency. To manage growing resource demands, the team designed an AIO smart‑cabinet that integrates all capabilities in a modular fashion.

3.1 Elastic Scaling & On‑Demand Composition

Each AIO cabinet is a compact device cluster offering:

Custom MCU with 9 categories and 67 control commands covering data, voltage, current, temperature, and power management.

Controllable execution environment (light, electromagnetic shielding).

High‑precision power measurement (±0.01 C).

Link simulations (WLAN, 4G, weak‑network).

Device special‑state simulation and protection (low‑power sustain, charge‑discharge safeguards).

These capabilities can be combined or subsetted to meet specific business needs while controlling costs.

3.2 Service Pluginization & Hardware Modularity

The AIO cabinet hosts services as plugins; users can define new plugins for special devices or requirements. The core controller board is fully modular with rich GPIO extensions, enabling future IoT‑heavy scenarios.
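A plugin model of this kind is commonly built around a registry that maps a name to a plugin class. The following is a sketch of that general pattern, not the cabinet's real plugin API: `PluginRegistry`, `WeakNetworkPlugin`, and the `loss_percent` parameter are made up for the example.

```python
class PluginRegistry:
    """Maps plugin names to classes so new services can be added without core changes."""

    def __init__(self):
        self._plugins = {}

    def register(self, name):
        # Used as a class decorator: @registry.register("some-name")
        def decorator(cls):
            if name in self._plugins:
                raise ValueError(f"plugin {name!r} is already registered")
            self._plugins[name] = cls
            return cls

        return decorator

    def create(self, name, **config):
        # Instantiate a plugin by name; unknown names raise KeyError.
        return self._plugins[name](**config)

    def names(self):
        return sorted(self._plugins)


registry = PluginRegistry()


@registry.register("weak-network")
class WeakNetworkPlugin:
    """Hypothetical user-defined plugin that would configure a lossy link."""

    def __init__(self, loss_percent=0):
        self.loss_percent = loss_percent

    def describe(self):
        return f"simulate {self.loss_percent}% packet loss"


plugin = registry.create("weak-network", loss_percent=20)
print(registry.names(), plugin.describe())
```

A user with a special device would register one more class under a new name; the core controller only ever calls `create` and never needs to know the concrete type.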

To date, the AIO wireless experiment cluster has executed over 500 000 automated tasks, more than 4 million test runs, captured over 50 000 crashes, supported 40+ Alibaba Group apps, identified 2.5 million anomalies, completed 150 000 mini‑program reviews, and provided 4 000 hours of remote device sharing.

Now part of Ant Financial’s mPaaS, the AIO cluster powers the “MTP Mobile Testing Platform,” delivering end‑to‑end testing solutions that reduce resource investment while improving testing efficiency and quality.
