How China Life Built a Self‑Developed Automated Ops Platform from Scratch

China Life’s Shanghai Data Center team turned chaotic, multi-system operations into a unified, automated platform by standardizing hardware, processes, and tools. Building on OpenStack, Docker, Zabbix, and custom scripts, the team achieved efficient monitoring, controlled change management, and a mobile-enabled DevOps workflow.

Efficient Ops

Apr 16, 2017

Preface

This article describes the key factors behind the successful development of China Life’s self‑built automated operations platform, outlining the journey from a fragmented environment to a standardized, intelligent ops system.

1. Pain Points of Operations

Pain point 1: Numerous systems

In the early stage, many branch data centers lacked proper cabling and used ad‑hoc fiber connections, leading to a chaotic infrastructure. Consolidating operations in the central data center introduced a large number of product lines, each with its own application and database instances, resulting in heavy batch workloads.

Pain point 2: Complex equipment

Branch data centers were equipped with a variety of hardware, while the headquarters data center was more standardized.

Pain point 3: Inconsistent standards

Operating systems, databases, and middleware were heterogeneous in both version and architecture (e.g., 32-bit vs 64-bit, Red Hat 5.4 vs 5.8), making unified management difficult.

Pain point 4: Disordered operations

Manual procedures for database migration or large data transfers required ad‑hoc staffing, and changes were often made without a formal plan, leading to cascading incidents.

2. Transformation

Transformation 1: Standardization

The first goal was to virtualize legacy x86 workloads and complete Unix-to-Linux (U2L) migrations, moving both onto virtualized Linux on x86 and reducing reliance on midrange Unix servers. Most applications now run on virtualized Linux, with some Windows VMs, and an OpenStack cluster has been established.

Platform standardization

Hardware procurement, OS, database, middleware, and open‑source platform versions were unified to enable consistent management.
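Once a baseline exists, checking a host against it is a simple comparison. The following is a minimal sketch of that idea, not the team's actual tooling; the baseline values, component names, and the `find_drift` helper are all hypothetical.

```python
from typing import Dict, List

# Hypothetical baseline: component names and version strings are illustrative only.
BASELINE: Dict[str, str] = {
    "os": "RedHat 5.8 64-bit",
    "database": "db-11.2",
    "middleware": "mw-10.3",
}


def find_drift(host: Dict[str, str], baseline: Dict[str, str] = BASELINE) -> List[str]:
    """Return one message per component that differs from the baseline."""
    drift = []
    for component, expected in baseline.items():
        # A component absent from the inventory record is reported as "missing".
        actual = host.get(component, "missing")
        if actual != expected:
            drift.append(f"{component}: expected {expected}, found {actual}")
    return drift


if __name__ == "__main__":
    legacy_host = {"os": "RedHat 5.4 32-bit", "database": "db-11.2", "middleware": "mw-10.3"}
    for message in find_drift(legacy_host):
        print(message)
```

An empty result means the host matches the baseline; anything else is a candidate for remediation or an explicit exception.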

Application architecture standardization

Non‑clustered or single‑point applications are prohibited from going live; a “gentleman’s agreement” with development enforces this rule.
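A rule like this can also be expressed as an automated go-live gate. The sketch below assumes a deployment is described as a mapping from tier name to instance count; that representation and the function names are assumptions for illustration, since the article describes the rule as an agreement rather than a tool.

```python
from typing import Dict, List


def single_points(tiers: Dict[str, int], minimum: int = 2) -> List[str]:
    """Return the tiers with fewer than `minimum` instances, sorted by name."""
    return sorted(name for name, count in tiers.items() if count < minimum)


def may_go_live(tiers: Dict[str, int]) -> bool:
    """Allow go-live only if at least one tier is declared and none is a single point."""
    return bool(tiers) and not single_points(tiers)


if __name__ == "__main__":
    deployment = {"web": 2, "app": 1, "db": 2}  # hypothetical application
    print(single_points(deployment))
    print(may_go_live(deployment))
```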

Operations process standardization

A structured workflow now covers resource onboarding/offboarding, change planning, and emergency change approval. Regular small changes occur on Wednesdays, major changes on Fridays, with a one‑week review period.
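The weekly cadence can be sketched as a small scheduling helper. This assumes only the two window days stated above; the category names `minor` and `major` are hypothetical, and the one-week review period is not modeled because the article does not say how it interacts with the windows.

```python
from datetime import date, timedelta

# Python's date.weekday() counts from Monday = 0, so Wednesday = 2 and Friday = 4.
WINDOW_WEEKDAY = {"minor": 2, "major": 4}


def next_window(kind: str, today: date) -> date:
    """Return the first window day for `kind` on or after `today`."""
    days_ahead = (WINDOW_WEEKDAY[kind] - today.weekday()) % 7
    return today + timedelta(days=days_ahead)


if __name__ == "__main__":
    print(next_window("minor", date(2017, 4, 16)))
    print(next_window("major", date(2017, 4, 16)))
```

A request raised on the window day itself maps to that same day; a real workflow would likely add a submission cutoff before the window opens.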

Operation flow standardization

Documentation is centralized and versioned, and change logs are recorded for auditability.

Transformation 2: Tooling

After standardization, a suite of internal tools was built:

Process management tool – unified workflow and permission requests.

Configuration management tool – manual CMDB updates integrated into the change process.

Monitoring tool – commercial software for infrastructure monitoring; custom development fills gaps in application monitoring.
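To illustrate what tying CMDB updates to the change process can look like, here is a minimal sketch in which closing a change applies its declared configuration updates. The `Change` record, the dictionary-backed CMDB, and `close_change` are hypothetical stand-ins, not the platform's real data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Change:
    change_id: str
    # Configuration item name -> attributes to set when the change is closed.
    cmdb_updates: Dict[str, Dict[str, str]] = field(default_factory=dict)
    closed: bool = False


def close_change(change: Change, cmdb: Dict[str, Dict[str, str]]) -> List[str]:
    """Apply the change's CMDB updates, mark it closed, and return the items touched."""
    touched = []
    for item, attributes in change.cmdb_updates.items():
        # Create the configuration item if it does not exist yet, then merge attributes.
        cmdb.setdefault(item, {}).update(attributes)
        touched.append(item)
    change.closed = True
    return sorted(touched)


if __name__ == "__main__":
    cmdb = {"vm-01": {"os": "RedHat 5.4"}}
    change = Change("CHG-1", {"vm-01": {"os": "RedHat 5.8"}})
    print(close_change(change, cmdb), cmdb)
```

Making the update part of closing the change means the CMDB cannot silently lag behind what was actually deployed.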

Transformation 3: Autonomy

Commercial automation tools proved too heavyweight and expensive. With regulators also encouraging self-developed systems, the team decided to build its own solution in-house.

OpenStack, VMware vSphere APIs, and custom scripts were combined with open‑source components such as Zabbix (monitoring), Activiti (workflow), and Rundeck (batch jobs). The platform now comprises roughly 200,000 lines of code across dozens of functional modules.
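Integration with a component such as Zabbix typically goes through its JSON-RPC API. The sketch below only builds the request body for the `host.get` method and does not send it; the helper name and host names are hypothetical, and authentication is left out because the way the token is passed depends on the Zabbix version in use.

```python
import json
from typing import Any, Dict, List


def host_get_request(hostnames: List[str], request_id: int = 1) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 body asking Zabbix for hosts with the given technical names."""
    return {
        "jsonrpc": "2.0",
        "method": "host.get",
        "params": {
            "output": ["hostid", "host"],    # return only these host properties
            "filter": {"host": hostnames},   # exact match on the technical host name
        },
        "id": request_id,
    }


if __name__ == "__main__":
    # The body would be POSTed to the Zabbix API endpoint by the platform's integration layer.
    print(json.dumps(host_get_request(["vm-01", "vm-02"]), indent=2))
```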

Key technologies include PrimeFaces for the web UI, jQuery Mobile for responsive design, Nginx for load balancing, Redis for caching, Tomcat for task scheduling, and multiple Zabbix proxies for massive metric collection.

The mobile app enables administrators to query CMDB data, restart VMs, and trigger emergency changes directly from a phone.

3. Future Direction: Integrated DCOS

The next step is to evolve the platform into a self‑developed, integrated Data Center Operating System (DCOS) that supports container‑based PaaS, intelligent self‑repair via Zabbix auto‑discovery, and seamless API‑driven management of images and resources.


