How a Bank Transformed IT Ops with Automated DevOps and SRE Practices
This article outlines how China Merchants Bank’s data‑center application management team identified traditional financial IT operational pain points, introduced DevOps and SRE concepts, built non‑functional management frameworks, and implemented automated tooling, monitoring, and capacity‑scaling to achieve fully automated operations.
Preface
This article is based on a talk by Zhang Jianlin, head of Application Management at China Merchants Bank Data Center, at GOPS 2017 Shenzhen. It is divided into five topics.
Zhang Jianlin, Head of Application Management, China Merchants Bank Data Center. He has 20 years of experience in operations development, large-scale O&M, and application planning, and currently focuses on automated lifecycle management of applications.
1. Traditional Financial IT Application Operations Pain Points
Traditional finance has hit a growth bottleneck, while internet finance is having a huge impact on the industry. System scale, component counts, and user traffic are growing non-linearly, and traditional O&M is too costly to keep up.
The response is to migrate workloads off the mainframe and to standardize O&M control across the full process.
Analysis shows that non-functional issues caused about 30% of production incidents in 2016, which makes automating application O&M essential.
2. Ideological Collision
Traditional financial IT separates development from operations, and developers often "hand off" responsibility for running their systems to the operations team.
Automation and process are not contradictory; the bank adopts DevOps‑style automation, standardizing non‑functional requirements and building tools to achieve automated O&M.
Both DevOps and SRE aim to strengthen collaboration between development and operations: DevOps drives automation from the development side, while SRE approaches it from the operations side by building self-service tools.
The bank has integrated both philosophies into a continuous delivery pipeline.
3. Seeking Change in a Dilemma
3.1 Continuous Improvement via Non‑Functional Management
Automation of product release is the primary task, supported by a non‑functional management tool.
Improvement involves product specifications, architecture upgrades, process control, automated handover, and data analysis, forming a continuous improvement loop.
3.2 Non‑Functional Matrix
The matrix classifies applications by scenario, deployment platform, and non‑functional indicators.
Application scenarios: front‑end, back‑end, OLTP, OLAP.
Deployment platforms: IBM 390 mainframe, AS/400, and open platforms.
Non‑functional indicators: availability, maintainability, capacity planning, technical specifications.
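The matrix is essentially a classification data structure. Below is a minimal Python sketch, assuming each application is recorded once with a single scenario and platform; the class names, the `missing_indicators` and `group_by_cell` helpers, and the sample values are hypothetical, not the bank's actual schema.

```python
from dataclasses import dataclass, field
from enum import Enum


class Scenario(Enum):
    FRONT_END = "front-end"
    BACK_END = "back-end"
    OLTP = "OLTP"
    OLAP = "OLAP"


class Platform(Enum):
    S390 = "390"
    AS400 = "AS/400"
    OPEN = "open platform"


# The four indicator groups named in the matrix.
INDICATORS = ("availability", "maintainability", "capacity_planning", "technical_specifications")


@dataclass
class MatrixEntry:
    app_id: str
    scenario: Scenario
    platform: Platform
    # indicator name -> requirement text (hypothetical free-form targets)
    indicators: dict = field(default_factory=dict)

    def missing_indicators(self):
        """Indicator groups that have no requirement recorded yet, in matrix order."""
        return [name for name in INDICATORS if name not in self.indicators]


def group_by_cell(entries):
    """Bucket application IDs into (scenario, platform) cells of the matrix."""
    cells = {}
    for e in entries:
        cells.setdefault((e.scenario, e.platform), []).append(e.app_id)
    return cells
```

Grouping by cell makes it easy to apply one set of non-functional rules to every application that shares a scenario and platform.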
3.3 Non‑Functional Common Component Library
The library provides unified management of security, logging, messaging, data access, caching, and file management across different technologies and platforms.
Resource layer: resources are partitioned by department rather than by individual system.
Integrate public components into the automation platform for one‑click provisioning.
After API standardization, the platform supports full‑lifecycle automation.
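The idea of one standardized contract behind every common component can be sketched as a registry. Everything here (`ComponentRegistry`, `ProvisionRequest`, `fake_provisioner`, the endpoint naming) is a hypothetical illustration, assuming each component backend can be wrapped in a function that takes a request and returns an endpoint; endpoints are keyed by department to mirror the resource-layer split.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# The six component kinds named in the text.
COMPONENT_KINDS = ("security", "logging", "messaging", "data_access", "caching", "file_management")


@dataclass(frozen=True)
class ProvisionRequest:
    department: str  # resources are pooled per department, not per system
    system: str
    kind: str


@dataclass(frozen=True)
class ProvisionResult:
    kind: str
    endpoint: str


class ComponentRegistry:
    """Hypothetical registry: every component exposes the same provision() contract."""

    def __init__(self) -> None:
        self._provisioners: Dict[str, Callable[[ProvisionRequest], ProvisionResult]] = {}

    def register(self, kind: str, fn: Callable[[ProvisionRequest], ProvisionResult]) -> None:
        if kind not in COMPONENT_KINDS:
            raise ValueError(f"unknown component kind: {kind}")
        self._provisioners[kind] = fn

    def provision_all(self, department: str, system: str, kinds: List[str]) -> List[ProvisionResult]:
        """'One-click' provisioning: call each standardized provisioner in turn."""
        return [self._provisioners[k](ProvisionRequest(department, system, k)) for k in kinds]


def fake_provisioner(req: ProvisionRequest) -> ProvisionResult:
    # Stand-in for a real backend; the endpoint naming scheme is invented for illustration.
    return ProvisionResult(req.kind, f"{req.kind}.{req.department}.internal")
```

Because every provisioner shares one signature, the automation platform can loop over a system's component list without knowing how each backend is implemented.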
3.4 Application Framework Design Optimization
Design principles include N+1 redundancy, rollback capability, feature toggle, built‑in monitoring, multi‑active centers, mature technology usage, asynchronous calls, statelessness, horizontal scaling, and cost‑effective commodity hardware.
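N+1 redundancy combined with stateless horizontal scaling reduces to a simple sizing rule: provision enough identical instances to carry peak load, then add one spare so that losing any single instance leaves capacity intact. A sketch, assuming load spreads evenly across instances; the function names and the TPS figures in the usage note are hypothetical.

```python
import math


def instances_for_n_plus_1(peak_tps: float, tps_per_instance: float) -> int:
    """Return N+1: N instances to carry peak load, plus one spare."""
    if peak_tps < 0 or tps_per_instance <= 0:
        raise ValueError("peak_tps must be >= 0 and tps_per_instance > 0")
    n = max(1, math.ceil(peak_tps / tps_per_instance))
    return n + 1


def survives_single_failure(instances: int, peak_tps: float, tps_per_instance: float) -> bool:
    """True if losing any one (stateless, interchangeable) instance still leaves enough capacity."""
    return (instances - 1) * tps_per_instance >= peak_tps
```

For example, a peak of 4,500 TPS at 1,000 TPS per instance needs five instances for load and six with the spare, and `survives_single_failure(6, 4500, 1000)` is `True`.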
3.5 Performance Capacity Dynamic Scaling
Scaling is described on three axes: X‑axis (identical replication), Y‑axis (functional segmentation), Z‑axis (customer‑based segmentation). Similar axes apply to data layer scaling.
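The three axes can be combined in a single request router; this three-axis model matches what is commonly called the AKF Scale Cube. The sketch below is an illustration under stated assumptions, not the bank's router: `ScaleCubeRouter` and the topology layout are hypothetical, shards are chosen by hashing the customer ID, and replicas within a shard are used round-robin.

```python
import hashlib
import itertools
from typing import Dict, Iterator, List, Tuple


class ScaleCubeRouter:
    """Hypothetical router combining the three axes:
    Y: pick a service pool by function,
    Z: pick a shard within that pool by customer ID,
    X: round-robin across identical replicas within the shard.
    """

    def __init__(self, topology: Dict[str, List[List[str]]]) -> None:
        # topology[function] -> list of shards; each shard is a list of identical replicas
        self.topology = topology
        self._cursors: Dict[Tuple[str, int], Iterator[str]] = {}

    def shard_index(self, customer_id: str, shard_count: int) -> int:
        # Stable hash; built-in hash() on str is salted per process, so it is avoided here.
        digest = hashlib.sha256(customer_id.encode()).digest()
        return int.from_bytes(digest[:8], "big") % shard_count

    def route(self, function: str, customer_id: str) -> str:
        shards = self.topology[function]                        # Y-axis: functional split
        idx = self.shard_index(customer_id, len(shards))        # Z-axis: customer split
        key = (function, idx)
        if key not in self._cursors:
            self._cursors[key] = itertools.cycle(shards[idx])
        return next(self._cursors[key])                         # X-axis: identical replicas
```

The same customer-ID hash could also pick a database shard, which is one way the Z-axis carries over to the data layer.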
3.6 Non‑Functional Process Control
Non-functional control starts at project initiation, so that development and operations work in step from the outset and every non-functional requirement is ultimately delivered through automation.
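Process control from project initiation can be pictured as a stage gate that blocks a project from leaving a stage while any non-functional requirement due by then is unmet or not yet automated. The stage names, the `Requirement` fields, and `gate_failures` are hypothetical, chosen only to illustrate the idea.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical lifecycle stages, in order, from initiation to production.
STAGES = ["initiation", "design", "development", "testing", "release"]


@dataclass
class Requirement:
    name: str
    due_stage: str   # stage by which the requirement must be satisfied
    satisfied: bool
    automated: bool  # enforced by tooling rather than by manual sign-off


def gate_failures(stage: str, reqs: List[Requirement]) -> List[str]:
    """Return the reasons a project may not leave `stage` (empty list means the gate passes)."""
    cutoff = STAGES.index(stage)
    problems = []
    for r in reqs:
        if STAGES.index(r.due_stage) <= cutoff:
            if not r.satisfied:
                problems.append(f"{r.name}: not satisfied")
            elif not r.automated:
                problems.append(f"{r.name}: satisfied but not automated")
    return problems
```

Requirements due at later stages are ignored until the project reaches them, so the gate tightens as the project advances.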
3.7 Automation Tool Platform
The platform covers the entire application lifecycle in the data center, from development through testing to O&M, including release, scaling, routine operations, service requests, and fault handling.
Identify automation scenarios across business lines.
Build API‑based management services.
Integrate an API gateway and a workflow bus to assemble automation quickly.
Standardize APIs to support automated applications.
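Composing standardized APIs through a workflow bus typically means chaining calls and undoing earlier steps if a later one fails (a saga-style compensation pattern, named here as an assumption rather than as the bank's documented design). A minimal in-process sketch: `Step`, `run_workflow`, and the example step names are hypothetical, and each `action` stands in for a call routed through the API gateway.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Context = Dict[str, str]


@dataclass
class Step:
    name: str
    action: Callable[[Context], None]                 # a call to a standardized management API
    undo: Optional[Callable[[Context], None]] = None  # compensating call used on failure


def run_workflow(steps: List[Step], ctx: Context) -> Tuple[bool, List[str]]:
    """Run steps in order; on failure, undo completed steps in reverse order."""
    done: List[Step] = []
    log: List[str] = []
    for step in steps:
        try:
            step.action(ctx)
        except Exception as exc:
            log.append(f"failed: {step.name} ({exc})")
            for prev in reversed(done):
                if prev.undo is not None:
                    prev.undo(ctx)
                    log.append(f"undone: {prev.name}")
            return False, log
        done.append(step)
        log.append(f"ok: {step.name}")
    return True, log
```

Keeping each step behind the same `action`/`undo` shape is what lets new automation scenarios be assembled from existing APIs rather than written from scratch.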
3.8 Full‑Flow Application Monitoring
Real‑time monitoring covers web, middleware, application servers, and databases, providing end‑to‑end visibility of business flows and performance.
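End-to-end visibility usually depends on a transaction ID that every tier attaches to its measurements. A sketch, assuming each tier (web, middleware, application server, database) reports its own processing time excluding downstream calls; `TierSample`, `summarize`, and the SLO threshold are hypothetical, and network time between tiers is ignored.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TierSample:
    trace_id: str   # one business transaction, propagated across tiers
    tier: str       # e.g. "web", "middleware", "app", "db"
    self_ms: float  # time spent in this tier, excluding downstream calls (assumed reported)


def summarize(samples: List[TierSample], slo_ms: float) -> Dict[str, dict]:
    """Per transaction: approximate end-to-end time, slowest tier, and SLO breach flag."""
    by_trace: Dict[str, List[TierSample]] = defaultdict(list)
    for s in samples:
        by_trace[s.trace_id].append(s)
    report = {}
    for trace_id, group in by_trace.items():
        total = sum(s.self_ms for s in group)
        slowest = max(group, key=lambda s: s.self_ms)
        report[trace_id] = {"total_ms": total, "slowest_tier": slowest.tier, "breach": total > slo_ms}
    return report
```

Attributing time per tier is what turns "the transaction is slow" into "the database tier is slow for this business flow".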
3.9 Performance Capacity Forecast
Collected performance and pricing data are fed into a big‑data platform where algorithms predict future capacity needs, enabling proactive scaling without extensive load testing.
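The article does not name the forecasting algorithms. As a deliberately simple stand-in, an ordinary least-squares trend line can estimate when a utilization metric will cross a capacity threshold; the function names, the metric, and the threshold in the usage note are assumptions.

```python
from typing import List, Optional, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares: returns (slope, intercept)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) points of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def periods_until(threshold: float, xs: List[float], ys: List[float]) -> Optional[float]:
    """Forecast how many periods after the last observation the trend crosses `threshold`."""
    slope, intercept = fit_line(xs, ys)
    if slope <= 0:
        return None  # flat or shrinking trend: no crossing predicted
    x_cross = (threshold - intercept) / slope
    return max(0.0, x_cross - xs[-1])
```

With utilization of 40, 45, 50, and 55 percent over four periods, the fitted line rises by 5 points per period, so `periods_until(80, [0, 1, 2, 3], [40, 45, 50, 55])` returns `5.0`. A production forecaster would likely account for seasonality and business peaks, which a straight line cannot.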
4. Initial Results
The non‑functional management platform standardizes requirements, drives development compliance, and integrates with the automation tool platform for seamless production rollout.
Profile‑based application standardization captures product IDs, owners, components, and non‑functional attributes, facilitating automated governance.
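A profile record makes governance checkable by machine. A sketch, assuming a whitelist of approved common components and a fixed set of required non-functional attributes; `AppProfile`, `governance_issues`, and the attribute names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Hypothetical non-functional attributes every profile must declare.
REQUIRED_ATTRIBUTES = ("availability_target", "rollback_supported", "monitoring_enabled")


@dataclass
class AppProfile:
    product_id: str
    owner: str
    components: List[str]
    nonfunctional: Dict[str, object] = field(default_factory=dict)


def governance_issues(p: AppProfile, approved_components: Set[str]) -> List[str]:
    """List everything that keeps this profile from passing automated governance."""
    issues = []
    if not p.owner:
        issues.append("missing owner")
    for c in p.components:
        if c not in approved_components:
            issues.append(f"unapproved component: {c}")
    for attr in REQUIRED_ATTRIBUTES:
        if attr not in p.nonfunctional:
            issues.append(f"missing non-functional attribute: {attr}")
    return issues
```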
Automated validation across staging and production ensures consistent quality before release.
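One way to keep staging and production consistent is to compare their configuration automatically and gate the release on both the staging check results and the absence of unexpected drift. `config_drift`, `release_allowed`, and the sample keys are hypothetical; keys that legitimately differ between environments (such as database URLs) are passed in explicitly.

```python
from typing import Dict, Iterable, List


def config_drift(staging: Dict[str, str], production: Dict[str, str],
                 env_specific: Iterable[str] = ()) -> List[str]:
    """Keys whose values differ (or exist on only one side), skipping env-specific keys."""
    skip = set(env_specific)
    keys = (staging.keys() | production.keys()) - skip
    return sorted(k for k in keys if staging.get(k) != production.get(k))


def release_allowed(check_results: Dict[str, bool], drift: List[str]) -> bool:
    """Allow release only if every staging check passed and no unexpected drift exists."""
    return all(check_results.values()) and not drift
```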
5. The Road to Automated Operations
All non‑functional concepts—framework, architecture, capacity, monitoring, release—converge toward fully automated operations.
By consolidating requirements, resources, environment setup, component upgrades, monitoring, and logging into a single automated platform, the bank achieves true DevOps/SRE‑driven automation.
6. Summary
The bank has shifted from traditional financial IT to a non‑functional‑centric automated operations model, collaborating with development from the start, standardizing APIs, and continuously improving automation tools to realize end‑to‑end automated O&M.
Efficient Ops
This public account is maintained by Xiaotianguo and friends and regularly publishes widely read original technical articles. It focuses on operations transformation and aims to accompany readers throughout their operations careers.