
How Hengfeng Bank Built a High‑Availability OpenStack Cloud for Financial Services

This article details Hengfeng Bank's practical experience with OpenStack, covering why the bank chose the open‑source cloud platform, its multi‑site deployment architecture, high‑availability design, management practices, and lessons learned from operating a large‑scale financial cloud environment.


1. OpenStack Status at Hengfeng Bank

Hengfeng Bank, one of China's 12 national joint‑stock commercial banks, operates multiple OpenStack clusters across two regions and three data centers, supporting both production and testing environments, multi‑tenant isolation, and more than 200 applications running on over 6,000 virtual machines.

Key features include independent OpenStack instances per network zone, a hyper‑converged architecture with pure‑SSD Ceph storage, and integration with Cisco SDN for dynamic VXLAN binding and port migration.

2. Why Choose OpenStack

The bank prefers OpenStack because it is open source, vendor‑agnostic, and avoids lock‑in; the community offers a large ecosystem, mature codebase, and a robust governance model with thousands of developers worldwide.

OpenStack's licensing model reduces service‑fee costs, and its modular architecture allows the bank to customize and extend components as needed.

3. Deploying OpenStack

3.1 How Hengfeng Deploys OpenStack

The deployment uses separate control and compute nodes, with additional DHCP agents and Ceph monitors placed on compute nodes to avoid stretching a large layer‑2 network across sites. The architecture spans two data centers with symmetric hardware for high availability, using three control nodes for quorum plus a dedicated arbitration node.
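The quorum arithmetic behind the three‑control‑node layout can be sketched as follows. This is a minimal illustration of strict‑majority voting, not Hengfeng's actual tooling; the node counts are examples:

```python
# Minimal sketch: why an odd number of control nodes avoids split-brain,
# and why a dedicated arbitration node matters when nodes are split
# across two data centers. Purely illustrative.

def has_quorum(reachable_voters: int, total_voters: int) -> bool:
    """A cluster keeps quorum only while a strict majority of voters is reachable."""
    return reachable_voters > total_voters // 2

# Three control nodes: losing one still leaves a 2-of-3 majority.
assert has_quorum(2, 3) is True

# Losing two (e.g. the data center hosting two of them) loses quorum.
assert has_quorum(1, 3) is False

# An arbitration node in a separate failure domain adds a vote, so the
# surviving side can still form a 3-of-4 strict majority.
assert has_quorum(3, 4) is True
```

Note that with an even number of voters a 2‑2 split still fails the strict‑majority test, which is why the arbitrator is placed outside both data centers.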

3.2 High‑Availability Applications

Control nodes run HAProxy with a primary‑plus‑two‑standby configuration and a virtual IP for API traffic. Database services use Galera clustering across three nodes, providing automatic failover without manual intervention. Memcached also operates in a primary‑standby mode.
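A VIP‑fronted API layer like this is typically validated with simple reachability probes. The sketch below checks TCP connectivity to OpenStack API ports behind a virtual IP; the address is hypothetical, while the ports are the usual OpenStack defaults (Keystone 5000, Nova 8774, Neutron 9696):

```python
import socket

def check_endpoint(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Illustrative services behind the HAProxy virtual IP (address is hypothetical).
API_ENDPOINTS = {
    "keystone": ("10.0.0.100", 5000),
    "nova":     ("10.0.0.100", 8774),
    "neutron":  ("10.0.0.100", 9696),
}

def unhealthy_services(endpoints=API_ENDPOINTS):
    """List service names whose endpoint does not accept connections."""
    return [name for name, (host, port) in endpoints.items()
            if not check_endpoint(host, port)]
```

Because the VIP fails over with HAProxy, a probe against the VIP validates the whole HA chain rather than any single control node.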

4. Managing OpenStack

4.1 Management Approach

The bank isolates fault domains so that a failure in one network zone cannot affect others, and limits cluster size to around 1,000 physical machines for performance reasons. A single Keystone service manages multiple OpenStack clusters, and two identical Ceph clusters provide storage redundancy across the data centers.
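The "one Keystone, many clusters" pattern amounts to a single identity catalog that maps region names to per‑cluster API endpoints. The sketch below shows the idea with a plain dictionary; all region names and URLs are hypothetical:

```python
# Sketch of a shared-identity catalog: one Keystone, one region entry per
# OpenStack cluster. Region names and URLs below are invented examples.

CATALOG = {
    "regionA-prod": {"nova": "https://nova.a.example/v2.1",
                     "cinder": "https://cinder.a.example/v3"},
    "regionB-prod": {"nova": "https://nova.b.example/v2.1",
                     "cinder": "https://cinder.b.example/v3"},
    "regionA-test": {"nova": "https://nova.a-test.example/v2.1"},
}

def endpoint_for(region: str, service: str) -> str:
    """Resolve one service endpoint within one region; KeyError if absent."""
    return CATALOG[region][service]

assert endpoint_for("regionB-prod", "nova") == "https://nova.b.example/v2.1"
```

Clients authenticate once and then select a region, which is what lets a small team manage several clusters behind a single identity service.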

4.2 Operational Practices

Comprehensive monitoring covers the network, compute, and storage layers; SmokePing is used to detect latency issues. The team runs simulated banking workloads across the clouds to validate end‑to‑end functionality. Configuration management is handled with Puppet, and all OpenStack code is sourced from the upstream community via GitLab/GitHub, ensuring a consistent baseline.
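A simulated‑workload probe of this kind boils down to timing a synthetic transaction repeatedly and flagging latency SLO breaches. The sketch below is an illustrative skeleton (the transaction callable and SLO threshold are assumptions, not the bank's actual workloads):

```python
import statistics
import time

def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed_seconds) using a monotonic clock."""
    start = time.monotonic()
    result = fn(*args, **kwargs)
    return result, time.monotonic() - start

def run_probe(transaction, runs: int = 5, slo_seconds: float = 1.0) -> dict:
    """Execute a synthetic transaction several times and summarize latency."""
    latencies = []
    for _ in range(runs):
        _, elapsed = timed(transaction)
        latencies.append(elapsed)
    return {
        "max": max(latencies),
        "mean": statistics.mean(latencies),
        "breaches": sum(l > slo_seconds for l in latencies),
    }
```

In practice the `transaction` callable would exercise a real end‑to‑end path (e.g. creating and deleting a VM, or a mock banking flow), and the summary would feed the monitoring stack.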

5. Conclusion

The bank does not rely on any vendor‑specific OpenStack distribution; instead, it customizes the upstream community version, applying patches as needed while contributing back to the ecosystem. This approach avoids lock‑in and maintains control over the cloud stack.

Q & A

Q: How many people are involved in the OpenStack team? A: Only three to five engineers. The bank maintains a stable, well‑understood feature set with its own patches rather than tracking the full upstream codebase, which keeps the maintenance burden small.

Q: What typical problems have you encountered? A: Frequent live‑migration bugs, patch‑management challenges, and occasional host‑level failures that require rapid remediation.

Tags: cloud computing, high availability, infrastructure automation, OpenStack, financial services
Written by Efficient Ops

This public account is maintained by Xiaotianguo and friends and regularly publishes original technical articles. It focuses on operations transformation, aiming to accompany readers throughout their operations careers.
