How Ping An Built a Scalable Mesos + Marathon PaaS for Financial Services
This article details how Ping An Technology designed and operated Padis, an internal Docker-based PaaS that evolved to run on Mesos + Marathon. The platform provides an independent IP per container, automated networking, load balancing, CMDB integration, monitoring, and log aggregation, and it supports thousands of containers running production financial applications.
1. The Growth Story of the Padis Platform
Padis (Ping An Distribution) is an internally developed Docker-based PaaS platform built on Mesos + Marathon. Its key features include container creation and scaling, a dynamically assigned independent IP for each container, software load balancing, and automated operations tasks such as domain resolution, CMDB registration, and monitoring configuration.
2. From Docker to Mesos + Marathon
Early on, provisioning an environment meant installing the OS and middleware by hand, which was slow and carried high operational risk. Docker introduced portable images, but single-host Docker lacked clustering, health checks, and independent container IPs, and those gaps caused compatibility problems for traditional three-tier financial applications.
To address these gaps, Padis adopted Mesos + Marathon, which provides cluster-wide resource management, rapid scaling, health checks, placement constraints, event subscriptions, and extensions such as Mesos-DNS and Marathon-LB.
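To make the health-check and placement-constraint features concrete, the following is a minimal sketch of a Marathon-style application definition built as a Python dictionary. The field names follow Marathon's app definition format, which is typically submitted to the `/v2/apps` endpoint, but the app ID, image name, port, health-check path, and resource sizes are illustrative assumptions, not values from Padis.

```python
import json


def build_app_definition(app_id, image, instances=2):
    """Return a Marathon-style app definition as a plain dict.

    All concrete values (resources, port, health path) are placeholders.
    """
    return {
        "id": app_id,
        "cpus": 0.5,
        "mem": 512,
        "instances": instances,
        "container": {
            "type": "DOCKER",
            "docker": {
                "image": image,
                "network": "BRIDGE",
                # hostPort 0 asks for a dynamically assigned host port.
                "portMappings": [
                    {"containerPort": 8080, "hostPort": 0, "protocol": "tcp"}
                ],
            },
        },
        # Placement constraint: at most one instance per host.
        "constraints": [["hostname", "UNIQUE"]],
        # HTTP health check against the first mapped port.
        "healthChecks": [
            {
                "protocol": "HTTP",
                "path": "/health",
                "portIndex": 0,
                "gracePeriodSeconds": 60,
                "intervalSeconds": 10,
                "timeoutSeconds": 5,
                "maxConsecutiveFailures": 3,
            }
        ],
    }


# Hypothetical app ID and registry path, used only for illustration.
app = build_app_definition("/demo/web", "registry.example.com/demo/web:1.0")
payload = json.dumps(app, indent=2)
print(payload)
```

Scaling in this model amounts to changing `instances` and resubmitting the definition; the scheduler then starts or stops tasks to match.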
3. Adapting the Mesos + Marathon (MM) Framework for Traditional Applications
Financial applications depend heavily on fixed IP addresses, which Docker’s default bridge network could not provide. Padis implemented an independent IP solution by extending the network module: it uses a Linux bridge or Open vSwitch to give each container an IP on a specific VLAN, and it automates IP allocation through an Ansible-driven message center.
The platform manages an IP pool with three states (allocated, free, reserved) and automatically reassigns IPs on container restart, ensuring firewall rules remain stable.
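The three-state pool described above can be sketched as follows. This is an illustration only, assuming that "reassigns IPs on container restart" means a restarted instance gets its previous address back; the class name `IPPool`, the instance keys, and the addresses are all hypothetical and do not come from the Padis codebase.

```python
from enum import Enum


class IPState(Enum):
    FREE = "free"
    ALLOCATED = "allocated"
    RESERVED = "reserved"


class IPPool:
    """Toy in-memory IP pool keyed by a stable instance identifier."""

    def __init__(self, addresses, reserved=()):
        self.state = {ip: IPState.FREE for ip in addresses}
        for ip in reserved:
            # Reserved addresses are never handed out automatically.
            self.state[ip] = IPState.RESERVED
        self.owner = {}  # instance key -> allocated IP

    def acquire(self, instance_key):
        # A restarted instance keeps its previous address, so firewall
        # rules written against that address stay valid.
        if instance_key in self.owner:
            return self.owner[instance_key]
        for ip, state in self.state.items():
            if state is IPState.FREE:
                self.state[ip] = IPState.ALLOCATED
                self.owner[instance_key] = ip
                return ip
        raise RuntimeError("IP pool exhausted")

    def release(self, instance_key):
        # Called on scale-in: the address returns to the free state.
        ip = self.owner.pop(instance_key)
        self.state[ip] = IPState.FREE
        return ip


pool = IPPool(["10.0.0.10", "10.0.0.11", "10.0.0.12"], reserved=["10.0.0.10"])
first = pool.acquire("web-0")
after_restart = pool.acquire("web-0")  # same key, e.g. after a restart
second = pool.acquire("web-1")
print(first, after_restart, second)
```

A production version would need persistent storage and locking, since allocation requests can arrive concurrently from several hosts.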
Load balancing combines hardware (F5) and software (LVS, HAProxy) solutions. The software load balancers are deployed as containers, and their backend configuration is updated automatically whenever an application scales out or in.
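One way to picture the automatic backend update is a function that regenerates an HAProxy backend section from the current list of container IPs. This is a sketch under assumptions: the backend name, port, and round-robin policy are placeholders, and the source does not describe how Padis actually renders or reloads its configuration.

```python
def render_backend(name, members, port=8080):
    """Render an HAProxy backend section for the given container IPs."""
    lines = [f"backend {name}", "    balance roundrobin"]
    # Sort so the output is stable and diffs between renders stay small.
    for index, ip in enumerate(sorted(members)):
        # "check" turns on HAProxy's own health checking for the server.
        lines.append(f"    server {name}-{index} {ip}:{port} check")
    return "\n".join(lines) + "\n"


before_scale = render_backend("web", ["10.0.0.11"])
after_scale = render_backend("web", ["10.0.0.12", "10.0.0.11"])
print(after_scale)
```

After a scaling event, the regenerated section would replace the old one and the load balancer would be reloaded; comparing the old and new text first avoids needless reloads.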
4. Expanding Platform Automation
Padis adds CMDB integration, automatically syncing IP information to Ping An’s central CMDB on every scale‑out or scale‑in.
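The synchronization step can be thought of as a set difference between the IPs the CMDB already records and the IPs currently running. The sketch below assumes that model; the function name `plan_cmdb_sync` and the "register"/"retire" actions are hypothetical and stand in for whatever interface the central CMDB actually exposes.

```python
def plan_cmdb_sync(recorded_ips, running_ips):
    """Return which IPs to add to and remove from the CMDB."""
    recorded, running = set(recorded_ips), set(running_ips)
    return {
        # Running but not yet recorded: new containers from a scale-out.
        "register": sorted(running - recorded),
        # Recorded but no longer running: containers removed by a scale-in.
        "retire": sorted(recorded - running),
    }


plan = plan_cmdb_sync(
    recorded_ips=["10.0.0.11", "10.0.0.12"],
    running_ips=["10.0.0.12", "10.0.0.13"],
)
print(plan)
```

Computing the full difference on every event, rather than applying each event incrementally, means a missed event is corrected by the next sync.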
Monitoring covers application performance via Zabbix, Mesos/Marathon component metrics, and host CPU/memory usage.
Log aggregation is achieved by mounting shared storage into containers; logs are then forwarded to the internal ELK platform for analysis and alerting.
Additional features include WebSSH, one‑click environment cloning, and NAS storage mounting.
5. Outcomes and Production Impact
Today the platform runs two production clusters, one in the internal ServerFarm zone and one in the external DMZ, each with about 1,000 CPU cores, more than 6 TB of memory, and capacity for 3,000 to 4,000 containers. Critical financial services such as Ping An WiFi and Gold Manager run on this infrastructure, which has handled high-traffic events such as a fund-sale promotion that reached 1 billion yuan in sales within a minute.