How a Bank Transformed IT Ops with Automated DevOps and SRE Practices

This article outlines how China Merchants Bank’s data‑center application management team identified traditional financial IT operational pain points, introduced DevOps and SRE concepts, built non‑functional management frameworks, and implemented automated tooling, monitoring, and capacity‑scaling to achieve fully automated operations.

Efficient Ops

Preface

This article is based on a talk by Zhang Jianlin, head of Application Management at China Merchants Bank Data Center, at GOPS 2017 Shenzhen. It is divided into five topics.

Zhang Jianlin, Head of Application Management, China Merchants Bank Data Center. He has 20 years of experience in operations development, large‑scale O&M, and application planning, and currently focuses on automated lifecycle management of applications.

1. Traditional Financial IT Application Operations Pain Points

Traditional finance has hit a bottleneck just as internet finance is putting heavy pressure on it. System scale, component count, and user traffic are all growing non‑linearly, and traditional O&M is too costly to keep pace with that growth.

The bank's response is mainframe migration combined with standardized control over the full O&M process.

Analysis showed that non‑functional issues caused about 30% of production incidents in 2016, which makes automating application O&M essential.

2. Ideological Collision

Traditional finance IT separates development from operations, and developers often simply “hand off” responsibility to the operations side.

Automation and process are not contradictory; the bank adopts DevOps‑style automation, standardizing non‑functional requirements and building tools to achieve automated O&M.

Both DevOps and SRE aim to strengthen collaboration; DevOps pushes automation from development, while SRE focuses on self‑service tools for operations.

The bank has integrated both philosophies into a continuous delivery pipeline.

3. Seeking Change in a Dilemma

3.1 Continuous Improvement via Non‑Functional Management

Automation of product release is the primary task, supported by a non‑functional management tool.

Improvement involves product specifications, architecture upgrades, process control, automated handover, and data analysis, forming a continuous improvement loop.

3.2 Non‑Functional Matrix

The matrix classifies applications by scenario, deployment platform, and non‑functional indicators.

Application scenarios: front‑end, back‑end, OLTP, OLAP.

Deployment platforms: IBM S/390 mainframe, AS/400, open platform.

Non‑functional indicators: availability, maintainability, capacity planning, technical specifications.
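One way to picture the matrix is as the cross product of the three dimensions listed above. The following is only a minimal sketch, not the bank's tool: the class names, the `requirements_for` helper, and the idea of storing requirement text per cell are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Scenario(Enum):
    FRONT_END = "front-end"
    BACK_END = "back-end"
    OLTP = "OLTP"
    OLAP = "OLAP"


class Platform(Enum):
    S390 = "390"
    AS400 = "AS400"
    OPEN = "open platform"


# Indicator families named in the article.
INDICATORS = (
    "availability",
    "maintainability",
    "capacity planning",
    "technical specifications",
)


@dataclass(frozen=True)
class MatrixCell:
    """One cell of the matrix: a scenario, a platform, and an indicator."""
    scenario: Scenario
    platform: Platform
    indicator: str


def build_matrix():
    """Enumerate every (scenario, platform, indicator) combination."""
    return [
        MatrixCell(s, p, i)
        for s in Scenario
        for p in Platform
        for i in INDICATORS
    ]


def requirements_for(matrix, requirements, scenario, platform):
    """Collect the requirement for each indicator of one application class.

    `requirements` maps MatrixCell -> requirement text (hypothetical content).
    Cells with no entry come back as None, which exposes gaps in the matrix.
    """
    return {
        cell.indicator: requirements.get(cell)
        for cell in matrix
        if cell.scenario is scenario and cell.platform is platform
    }
```

Calling `requirements_for` for an OLTP application on the open platform returns one entry per indicator, so any indicator that still lacks a defined requirement shows up as `None`.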

3.3 Non‑Functional Common Component Library

The library provides unified management of security, logging, messaging, data access, caching, and file management across different technologies and platforms.

Resource layer: split by department rather than individual systems.

Integrate public components into the automation platform for one‑click provisioning.

After API standardization, the platform supports full‑lifecycle automation.

3.4 Application Framework Design Optimization

Design principles include N+1 redundancy, rollback capability, feature toggle, built‑in monitoring, multi‑active centers, mature technology usage, asynchronous calls, statelessness, horizontal scaling, and cost‑effective commodity hardware.
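Of these principles, the feature toggle is the easiest to show in a few lines. The sketch below is an assumption-laden illustration: the `FeatureToggles` class, the `new_fee_rule` flag, and both fee rules are invented for the example and do not come from the talk.

```python
class FeatureToggles:
    """In-memory flag store; a real system would load flags from configuration."""

    def __init__(self, defaults):
        self._flags = dict(defaults)

    def set(self, name, enabled):
        self._flags[name] = bool(enabled)

    def is_enabled(self, name):
        # Unknown flags count as off, so a missing entry never enables new code.
        return self._flags.get(name, False)


def transfer_fee(amount_cents, toggles):
    """Hypothetical business function with a toggled code path."""
    if toggles.is_enabled("new_fee_rule"):
        # New rule (illustrative): 1% of the amount, rounded down.
        return amount_cents // 100
    # Old rule (illustrative): flat fee of 500 cents.
    return 500
```

Because the old path stays in the code, switching the flag off acts as a rollback of the feature without a redeployment, which ties the toggle principle to the rollback principle in the same list.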

3.5 Performance Capacity Dynamic Scaling

Scaling is described on three axes: X‑axis (identical replication), Y‑axis (functional segmentation), Z‑axis (customer‑based segmentation). Similar axes apply to data layer scaling.
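The three axes can be combined in a single routing decision. The following sketch assumes a hypothetical layout in which each function has its own pool of instances (Y), customers are hashed into shards (Z), and requests rotate over the identical replicas inside a shard (X). Pool names, shard counts, and the hashing scheme are all assumptions for illustration.

```python
import hashlib

# Y-axis: each business function has its own instance pool (names are hypothetical).
SERVICE_POOLS = {
    "payments": ["pay-0", "pay-1", "pay-2", "pay-3"],
    "accounts": ["acct-0", "acct-1"],
}

# Z-axis: customers are partitioned into this many shards per function.
SHARDS_PER_SERVICE = 2


def z_shard(customer_id, shard_count=SHARDS_PER_SERVICE):
    """Map a customer to a shard with a stable hash (same customer, same shard)."""
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def route(function, customer_id, request_seq):
    """Pick one instance for a request by applying the Y, Z, and X axes in turn."""
    pool = SERVICE_POOLS[function]                      # Y: functional segmentation
    shard = z_shard(customer_id)                        # Z: customer-based segmentation
    replicas_per_shard = len(pool) // SHARDS_PER_SERVICE
    start = shard * replicas_per_shard
    replicas = pool[start:start + replicas_per_shard]
    return replicas[request_seq % len(replicas)]        # X: round-robin over identical replicas
```

In this layout, adding capacity on the X-axis means adding replicas to a shard, while the Y and Z axes change which pool or shard a request belongs to in the first place.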

3.6 Non‑Functional Process Control

Non‑functional control starts at project initiation, ensuring development and operations are synchronized and that all non‑functional requirements lead to automation.

3.7 Automation Tool Platform

The platform manages the entire data‑center lifecycle—from development to testing to O&M—covering release, scaling, routine operations, service requests, and fault handling.

Identify automation scenarios across business lines.

Build API‑based management services.

Integrate API gateway and workflow bus for rapid automation.

Standardize APIs to support automated applications.
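The combination of standardized APIs and a workflow bus can be sketched as a registry of named operations plus a runner that chains them. This is a simplified model under stated assumptions: the `ApiGateway` class, the `run_workflow` function, and the convention that every operation takes and returns a dictionary are hypothetical, not the bank's actual interfaces.

```python
class ApiGateway:
    """Registry of standardized operations: name -> callable(context) -> dict."""

    def __init__(self):
        self._apis = {}

    def register(self, name, handler):
        self._apis[name] = handler

    def call(self, name, context):
        if name not in self._apis:
            raise KeyError(f"no API registered under {name!r}")
        return self._apis[name](context)


def run_workflow(gateway, steps, context):
    """Run the named steps in order, merging each step's output into the context.

    Stops at the first failing step and reports which steps had completed,
    so an operator (or a rollback workflow) knows where to resume.
    """
    ctx = dict(context)
    completed = []
    for step in steps:
        try:
            ctx.update(gateway.call(step, ctx))
        except Exception as exc:
            return {
                "ok": False,
                "failed_step": step,
                "completed": completed,
                "error": str(exc),
            }
        completed.append(step)
    return {"ok": True, "completed": completed, "context": ctx}
```

Because every operation shares one calling convention, a new automation scenario is just a new list of step names, which is the practical payoff of standardizing the APIs first.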

3.8 Full‑Flow Application Monitoring

Real‑time monitoring covers web, middleware, application servers, and databases, providing end‑to‑end visibility of business flows and performance.
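End-to-end visibility usually depends on tagging each request with an identifier and recording time spent per tier. The sketch below assumes a hypothetical span record with `trace_id`, `tier`, and `duration_ms` fields, and assumes the tiers run one after another so their durations can be summed; neither assumption comes from the talk.

```python
from collections import defaultdict


def summarize_flow(spans):
    """Aggregate per-tier timings for each traced business request.

    `spans` is a list of dicts with keys trace_id, tier, duration_ms
    (a hypothetical record format). Durations are summed per tier, which
    assumes the tiers execute sequentially rather than overlapping.
    """
    by_trace = defaultdict(lambda: defaultdict(float))
    for span in spans:
        by_trace[span["trace_id"]][span["tier"]] += span["duration_ms"]

    summary = {}
    for trace_id, tiers in by_trace.items():
        summary[trace_id] = {
            "total_ms": sum(tiers.values()),
            "by_tier": dict(tiers),
            "slowest_tier": max(tiers, key=tiers.get),
        }
    return summary
```

The `slowest_tier` field is what turns raw measurements into a starting point for diagnosis: for a slow transaction it points at web, middleware, application server, or database without anyone correlating logs by hand.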

3.9 Performance Capacity Forecast

Collected performance and pricing data are fed into a big‑data platform where algorithms predict future capacity needs, enabling proactive scaling without extensive load testing.
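The article does not say which algorithms the platform uses. As a stand-in, the sketch below fits a straight line to evenly spaced utilization samples with ordinary least squares and estimates how many periods remain before a threshold is reached. The function names and the choice of a linear trend are assumptions; a production forecast would likely account for seasonality and business events.

```python
import math


def fit_line(ys):
    """Ordinary least squares for y = intercept + slope * x, with x = 0..n-1."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def periods_until(ys, threshold):
    """Periods after the last sample until the fitted trend reaches `threshold`.

    Returns 0 if the trend is already at or above the threshold, and None if
    the trend is flat or falling (the threshold is never reached).
    """
    intercept, slope = fit_line(ys)
    latest = intercept + slope * (len(ys) - 1)
    if latest >= threshold:
        return 0
    if slope <= 0:
        return None
    return math.ceil((threshold - latest) / slope)
```

For example, with monthly CPU utilization samples of 40, 45, 50, 55, and 60 percent, the fitted slope is 5 points per month, so an 80 percent threshold is reached 4 months after the last sample. That lead time is what allows scaling to be scheduled in advance.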

4. Initial Results

The non‑functional management platform standardizes requirements, drives development compliance, and integrates with the automation tool platform for seamless production rollout.

Profile‑based application standardization captures product IDs, owners, components, and non‑functional attributes, facilitating automated governance.
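A profile of this kind can be modeled as a small record plus a completeness check that tooling runs before accepting an application. The field names follow the attributes listed above, but the `AppProfile` class, the `REQUIRED_ATTRIBUTES` tuple, and the specific rules are hypothetical.

```python
from dataclasses import dataclass, field

# Non-functional attributes every profile must declare (an assumed rule).
REQUIRED_ATTRIBUTES = ("availability", "maintainability", "capacity")


@dataclass
class AppProfile:
    product_id: str
    owner: str
    components: list = field(default_factory=list)
    non_functional: dict = field(default_factory=dict)


def validate(profile):
    """Return a list of human-readable problems; an empty list means compliant."""
    problems = []
    if not profile.product_id.strip():
        problems.append("product_id is empty")
    if not profile.owner.strip():
        problems.append("owner is empty")
    if not profile.components:
        problems.append("no components declared")
    for name in REQUIRED_ATTRIBUTES:
        if name not in profile.non_functional:
            problems.append(f"missing non-functional attribute: {name}")
    return problems
```

Returning every problem at once, rather than failing on the first, lets a development team fix a non-compliant profile in one pass.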

Automated validation across staging and production ensures consistent quality before release.
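One common form of cross-environment validation is a drift check that compares the settings of staging and production and blocks the release when they disagree. The sketch below is an assumption about what such a check could look like: the flat key/value settings format, the `ignore` parameter, and the function names are hypothetical.

```python
_MISSING = "<missing>"


def environment_drift(staging, production, ignore=()):
    """Return {key: (staging_value, production_value)} for every mismatch.

    Keys listed in `ignore` (for example, hostnames that legitimately differ)
    are skipped. A key present in only one environment counts as drift.
    """
    drift = {}
    for key in sorted(set(staging) | set(production)):
        if key in ignore:
            continue
        s_val = staging.get(key, _MISSING)
        p_val = production.get(key, _MISSING)
        if s_val != p_val:
            drift[key] = (s_val, p_val)
    return drift


def release_gate(staging, production, ignore=()):
    """Return (allowed, drift): the release is allowed only when there is no drift."""
    drift = environment_drift(staging, production, ignore)
    return (not drift, drift)
```

The explicit `ignore` list keeps intentional differences visible and reviewable instead of burying them in the comparison logic.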

5. The Road to Automated Operations

All non‑functional concepts—framework, architecture, capacity, monitoring, release—converge toward fully automated operations.

By consolidating requirements, resources, environment setup, component upgrades, monitoring, and logging into a single automated platform, the bank achieves true DevOps/SRE‑driven automation.

6. Summary

The bank has shifted from traditional financial IT to a non‑functional‑centric automated operations model, collaborating with development from the start, standardizing APIs, and continuously improving automation tools to realize end‑to‑end automated O&M.
