
How ICBC Boosted System Stability with Advanced Performance Capacity Testing

This article details ICBC Software Development Center's comprehensive approach to performance capacity testing, covering background challenges, a structured quality practice plan, enhanced test scope evaluation, result analysis, tool support, implementation outcomes, and future directions for ensuring system stability and scalability.

Efficient Ops

Background and Challenges

In recent years, with the rapid development of cloud computing, microservices, and container technologies, transaction channels have become more diverse and convenient, leading to a rapidly growing user base and heightened expectations for user experience and service quality. The Industrial and Commercial Bank of China (ICBC) is undergoing digital transformation, which has increased the complexity of performance, security, and high-availability testing, creating new challenges: a broader test scope to assess, reduced test accuracy, and a heavier testing workload.

Performance Capacity Testing Quality Practice Plan

ICBC Software Development Center continuously explores and practices improvements in test scope evaluation, test process management, and tool support. By comprehensively assessing system performance and scalability under different loads, the center helps developers and operations personnel optimize system architecture, enhance stability, plan resource allocation, prevent bottlenecks, and improve user experience.

(1) Strengthen Test Scope Evaluation, Enhance Coverage

The center controls performance capacity test scope from three aspects: core-scenario sorting, "entry" decision-tree evaluation, and production operation analysis.

1. Sort core scenarios based on keywords and conduct routine guarding. By aligning with financial industry characteristics, scenarios such as accounting, regulatory, and transaction switching are identified, focusing on sensitive customer transactions, large‑amount fund transfers, and accounting processing. Scenario keywords are continuously refined as products evolve.

2. Build a performance capacity "entry" decision tree for comprehensive assessment. Experience‑based decision points (e.g., architecture changes, online batch processing) are summarized into a tree that guides developers to evaluate whether performance testing is needed for a given change.
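An "entry" decision tree of this kind could be sketched as a chain of risk questions, where any positive answer routes the change into performance capacity testing. The decision points below (architecture change, online batch processing, projected load growth, core-scenario impact) follow the examples in the text, but the field names and the 20% growth cutoff are purely illustrative, not ICBC's actual rule set:

```python
def needs_performance_test(change: dict) -> bool:
    """Walk experience-based decision points for one change request.

    All keys (architecture_changed, adds_online_batch, ...) are
    hypothetical stand-ins for the real decision-tree nodes.
    """
    if change.get("architecture_changed"):           # e.g. a new service split
        return True
    if change.get("adds_online_batch"):              # online batch processing added
        return True
    if change.get("expected_tps_growth", 0) > 0.2:   # >20% projected load growth
        return True
    if change.get("touches_core_scenario"):          # accounting / regulatory / switching
        return True
    return False  # no risk branch hit: skip dedicated capacity testing
```

In practice the value of encoding the tree this way is that developers answer the same fixed questions for every change, so the decision to test (or skip testing) is auditable rather than ad hoc.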

3. Conduct performance capacity evaluation based on production business changes. During peak periods like Spring Festival, National Day, or major sales events, the expected business volume drives targeted capacity testing, and a risk‑rule library is maintained to automatically flag potential capacity issues.
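A risk-rule library like the one described might be represented as named predicates over a scenario record, so new rules can be added without changing the flagging logic. The two rules below (expected peak approaching tested capacity, stale test results) and their thresholds are invented for illustration:

```python
RISK_RULES = [
    # (rule name, predicate over a scenario record) -- illustrative only
    ("peak_exceeds_tested_capacity",
     lambda s: s["expected_peak_tps"] > s["last_tested_tps"] * 0.8),
    ("no_recent_test",
     lambda s: s["days_since_last_test"] > 180),
]

def flag_capacity_risks(scenario: dict) -> list:
    """Return the names of every rule the scenario trips."""
    return [name for name, pred in RISK_RULES if pred(scenario)]
```

Before a peak period such as Spring Festival, each core scenario's expected volume would be fed through the library, and any flagged scenario is queued for targeted capacity testing.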

Performance capacity "entry" decision‑tree illustration

(2) Strengthen Test Result Analysis, Solidify Execution

Refine performance test monitoring indicators. Application‑level, database‑level, and system‑level metrics are defined with evaluation thresholds to ensure comprehensive monitoring.

Strengthen multi‑type performance capacity test coverage. A combination of load, capacity, endurance, and stress tests is used to verify stability under various loads and durations, detecting issues such as memory leaks or thread exhaustion.

(3) Strengthen Tool Support, Improve Execution Efficiency

To better support evaluation and result analysis, the center built a next‑generation performance testing platform offering three core capabilities: performance risk assessment, performance monitoring analysis, and automated performance testing.

The automated testing workflow consists of seven steps:

1. Scenario standardization: define automated scripts, monitoring indicators, environment information, data-generation scripts, and mock configurations.

2. Continuous environment integration: pipelines deliver container images directly to the performance testing environment.

3. Automated mock deployment: mock packages are deployed to target containers automatically.

4. Performance data generation: pre-test data-generation and post-test cleanup scripts run automatically.

5. Automated load-test execution: scheduled automation triggers tests at configured times.

6. Performance metric monitoring: predefined metrics are collected automatically.

7. Performance result assertion: business, system, and database metrics are combined into an assertion model that automatically analyzes results and flags capacity risks.
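The final result-assertion step could look something like the sketch below: each metric family contributes its own checks, and the verdict is the conjunction of all of them, with every failed check reported as a named risk. The specific metrics and thresholds are assumptions for illustration:

```python
def assert_result(business: dict, system: dict, database: dict):
    """Combine business, system, and database metrics into one verdict.

    Returns (passed, risks): passed is True only when no check fails,
    and risks names every flagged issue for the analysis report.
    """
    risks = []
    if business.get("success_rate", 1.0) < 0.999:
        risks.append("business: success rate below 99.9%")
    if business.get("p99_latency_ms", 0) > 500:
        risks.append("business: p99 latency above 500 ms")
    if system.get("cpu_utilization", 0) > 0.75:
        risks.append("system: CPU above 75%, capacity headroom risk")
    if database.get("slow_queries", 0) > 0:
        risks.append("database: slow queries detected")
    return (not risks, risks)
```

Because the assertion runs automatically after every scheduled test, a degrading trend shows up as a growing risk list rather than waiting for a human to read raw monitoring dashboards.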

Implementation Results and Future Outlook

Through continuous improvements in test scope evaluation, result analysis, and tool support, the center has ensured stable operation of mobile banking (536 million customers) and corporate online banking (14.43 million customers).

Looking ahead, the center will further optimize the performance capacity testing governance system and platform, explore the application of large models in testing, automatically identify inefficient programs from development and production data, and continuously train risk‑assessment models to safeguard production safety.

Written by Efficient Ops

This public account is maintained by Xiaotianguo and friends, regularly publishing widely read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
