Multi-Cloud Active‑Active Architecture: Design, Benefits, and Challenges

The article explains why multi-cloud active-active (multi-active) deployments matter for high availability, compares common disaster-recovery patterns such as primary-backup and active-active, walks through how the traffic, business, and storage layers work in a dual-cloud setup, and weighs the practical advantages and drawbacks of the approach.

High Availability Architecture

Nov 5, 2021

When an internet company reaches a certain scale, system high availability becomes critical, and many adopt a "multi‑active" strategy to mitigate unexpected failures. The author, an Apache Dubbogo committer, shares experiences from implementing a dual‑cloud solution.

Why multi-active matters

Real-world incidents, such as the 2021 Bilibili server outage and the IDC network failure at Futu Securities, show how a failure in underlying infrastructure can severely degrade availability. Multi-active deployment is a powerful remedy.

Disaster‑recovery patterns

Primary‑Backup

Primary-backup is common in small companies, but the standby cluster rarely handles real traffic. Its code, configuration, and data therefore go unverified, and problems tend to surface only during an actual failover, when they are most costly.

Active‑Active (Multi‑Active)

All clusters serve traffic under normal conditions: traffic is split across them, and if one cluster fails, its traffic is shifted to the remaining healthy clusters. Variants include same-city dual-active, cross-region dual-active, and multi-center designs; each step up in scope requires more resources.
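A minimal sketch of the split-and-shift behaviour, assuming traffic is divided in proportion to each cluster's capacity. The cluster names, capacity figures, and the `traffic_shares` helper are hypothetical, not part of the author's setup.

```python
# Minimal sketch: split traffic across active clusters in proportion to their
# capacity, and redistribute a failed cluster's share among the healthy ones.
# Cluster names and capacity figures are hypothetical.

def traffic_shares(capacity: dict[str, float], healthy: set[str]) -> dict[str, float]:
    """Return the fraction of total traffic each cluster should receive."""
    live = {name: cap for name, cap in capacity.items() if name in healthy and cap > 0}
    total = sum(live.values())
    if total == 0:
        raise RuntimeError("no healthy cluster can take traffic")
    # Unhealthy clusters get 0; healthy ones get a capacity-proportional share.
    return {name: live.get(name, 0.0) / total for name in capacity}


capacity = {"city-a": 400.0, "city-b": 400.0, "region-c": 200.0}
print(traffic_shares(capacity, {"city-a", "city-b", "region-c"}))
# {'city-a': 0.4, 'city-b': 0.4, 'region-c': 0.2}
print(traffic_shares(capacity, {"city-a", "region-c"}))
# city-b's share moves to city-a (about 0.67) and region-c (about 0.33)
```

The surviving clusters can only absorb the extra share if they have spare capacity, which is why every additional level of active-active costs more resources.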

Technical details of multi‑cloud active‑active

Two cloud providers host duplicate services. Under normal operation both clouds serve users; if one cloud experiences an issue, all traffic is switched to the other.

The workflow includes:

Clients access services through an entry layer.

The entry layer distributes traffic to business layers according to routing rules.

The business layer processes logic and writes data to storage.
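As one illustration of a routing rule the entry layer might apply, the sketch below hashes each user ID into 100 buckets and maps bucket ranges to clouds, so a given user consistently lands on the same cloud's business layer. The bucket ranges, the cloud names, and the choice of hashing by user ID are assumptions made for this example.

```python
import hashlib

# Hypothetical routing rule for the entry layer: users are hashed into 100
# buckets; buckets 0-49 go to cloud-1 and buckets 50-99 go to cloud-2.
ROUTES = [(range(0, 50), "cloud-1"), (range(50, 100), "cloud-2")]


def bucket_of(user_id: str, buckets: int = 100) -> int:
    """Map a user ID to a stable bucket (SHA-256, so it is the same on every node)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def route(user_id: str) -> str:
    """Return the cloud whose business layer should handle this user's request."""
    b = bucket_of(user_id)
    for bucket_range, cloud in ROUTES:
        if b in bucket_range:
            return cloud
    raise LookupError(f"no route for bucket {b}")


print(route("user-42"))  # the same user always lands on the same cloud
```

`hashlib` is used instead of Python's built-in `hash()` because the latter is randomized per process for strings, so different entry nodes would disagree about where a user belongs. Changing the split only means editing `ROUTES`.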

Traffic distribution / switching

The capacity of the clusters in both clouds is evaluated first, and traffic is typically split evenly between them. When a cloud fails, the entry layer redirects all traffic to the healthy cloud, so the entry component itself must be highly reliable.
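The switching step can be sketched as a small health tracker inside the entry layer, assuming it probes each cloud and removes a cloud from rotation after a fixed number of consecutive failed probes. The threshold, the `EntrySwitch` class, and the rule that one successful probe restores a cloud are all hypothetical simplifications; a production switch would usually add hysteresis before sending traffic back.

```python
from dataclasses import dataclass, field

# Hypothetical threshold: a cloud leaves rotation after this many consecutive
# failed health probes and returns after a single successful probe.
FAIL_THRESHOLD = 3


@dataclass
class EntrySwitch:
    weights: dict[str, int]  # normal split, e.g. 50/50
    failures: dict[str, int] = field(default_factory=dict)

    def record_probe(self, cloud: str, ok: bool) -> None:
        """Reset the failure count on success, increment it on failure."""
        self.failures[cloud] = 0 if ok else self.failures.get(cloud, 0) + 1

    def healthy(self, cloud: str) -> bool:
        return self.failures.get(cloud, 0) < FAIL_THRESHOLD

    def effective_weights(self) -> dict[str, int]:
        """Weights the entry layer should actually use right now."""
        if not any(self.healthy(c) for c in self.weights):
            # Every cloud looks down: keep the normal split instead of dropping all traffic.
            return dict(self.weights)
        return {c: (w if self.healthy(c) else 0) for c, w in self.weights.items()}


switch = EntrySwitch({"cloud-1": 50, "cloud-2": 50})
for _ in range(FAIL_THRESHOLD):
    switch.record_probe("cloud-2", ok=False)
print(switch.effective_weights())  # {'cloud-1': 50, 'cloud-2': 0}
```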

Business layer dual‑active

Deploy identical code to both clouds and keep them isolated: services in Cloud 1 never call services in Cloud 2, so a fault in one cloud cannot spread to the other. CI/CD pipelines make rapid rollbacks possible, but core services should still be isolated and validated in a non-core cluster before being promoted.
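One way to enforce that isolation is at service discovery: each instance registers with a cloud tag, and consumers only see providers carrying their own tag. The sketch below assumes such a tag exists; the `Instance` type, the tag values, and the addresses are hypothetical and do not reflect any particular registry's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    service: str
    address: str
    cloud: str  # hypothetical tag set when the instance registers itself


def local_providers(registry: list[Instance], service: str, my_cloud: str) -> list[Instance]:
    """Return only the providers of `service` deployed in the caller's own cloud.

    There is deliberately no fallback to the other cloud: if no local provider
    exists, the call fails inside this cloud instead of crossing the boundary.
    """
    return [i for i in registry if i.service == service and i.cloud == my_cloud]


registry = [
    Instance("order-service", "10.0.0.11:20000", "cloud-1"),
    Instance("order-service", "10.1.0.11:20000", "cloud-2"),
    Instance("user-service", "10.0.0.12:20000", "cloud-1"),
]
print(local_providers(registry, "order-service", "cloud-1"))
# [Instance(service='order-service', address='10.0.0.11:20000', cloud='cloud-1')]
```

Omitting the fallback is the design choice that keeps a failing cloud's problems contained; recovery is left to the entry layer, which moves the traffic itself.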

Storage layer

The storage layer typically uses the classic primary-replica setup for MySQL and Redis: one cloud hosts the primary and some replicas, the other cloud hosts the remaining replicas, and all replicas are kept in sync with the primary through the databases' built-in replication over a dedicated inter-cloud line.
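A minimal sketch of how the business layer might pick a storage endpoint under this layout, assuming writes must reach the single primary while reads prefer a replica in the caller's own cloud. The hostnames and the `pick_endpoint` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    host: str
    cloud: str
    role: str  # "primary" or "replica"


# Hypothetical topology: the primary and one replica in cloud-1, one replica in cloud-2.
TOPOLOGY = [
    Endpoint("mysql-primary.cloud1.internal", "cloud-1", "primary"),
    Endpoint("mysql-replica-a.cloud1.internal", "cloud-1", "replica"),
    Endpoint("mysql-replica-b.cloud2.internal", "cloud-2", "replica"),
]


def pick_endpoint(is_write: bool, my_cloud: str, topology: list[Endpoint] = TOPOLOGY) -> Endpoint:
    """Choose where a storage call from `my_cloud` should go."""
    primary = next(e for e in topology if e.role == "primary")
    if is_write:
        # Every write goes to the single primary, possibly across the dedicated line.
        return primary
    # Reads prefer a replica in the caller's own cloud and fall back to the primary.
    local = [e for e in topology if e.role == "replica" and e.cloud == my_cloud]
    return local[0] if local else primary


print(pick_endpoint(is_write=False, my_cloud="cloud-2").host)  # mysql-replica-b.cloud2.internal
print(pick_endpoint(is_write=True, my_cloud="cloud-2").host)   # mysql-primary.cloud1.internal
```

Because replication is asynchronous by default in both MySQL and Redis, a read served from cloud-2's replica immediately after a write may not yet reflect that write.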

Pros

A simple architecture that relies on the built-in replication of Redis and MySQL; each cloud serves reads from its local replicas, while writes go to the primary.

Cons

The approach depends heavily on the stability of the primary cloud and of the dedicated inter-cloud line. If the line is saturated or fails, the whole system can be crippled, and if the primary cloud goes down, writes fail and must be compensated manually.
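Manual compensation usually starts from a record of which writes failed. The sketch below keeps such a record in memory and replays it once the primary is reachable again; the `CompensationLog` class and the write functions are hypothetical. A real version would need a durable log (on disk or in a queue in the surviving cloud) and idempotent or conflict-checked replays.

```python
import json
import time
from typing import Callable

WriteFn = Callable[[str, dict], None]


class CompensationLog:
    """Hypothetical record of writes that failed while the primary was unreachable."""

    def __init__(self) -> None:
        self.entries: list[str] = []

    def write_or_record(self, write_fn: WriteFn, key: str, value: dict) -> bool:
        """Try the write; on a connection error, keep enough detail to replay it later."""
        try:
            write_fn(key, value)
            return True
        except ConnectionError:
            self.entries.append(json.dumps({"key": key, "value": value, "ts": time.time()}))
            return False

    def replay(self, write_fn: WriteFn) -> int:
        """Re-apply recorded writes once the primary is back; return how many succeeded."""
        remaining, done = [], 0
        for raw in self.entries:
            entry = json.loads(raw)
            try:
                write_fn(entry["key"], entry["value"])
                done += 1
            except ConnectionError:
                remaining.append(raw)  # still failing: keep it for the next attempt
        self.entries = remaining
        return done


store: dict[str, dict] = {}


def primary_down(key: str, value: dict) -> None:
    raise ConnectionError("primary unreachable")


def primary_up(key: str, value: dict) -> None:
    store[key] = value


log = CompensationLog()
log.write_or_record(primary_down, "order:1001", {"status": "paid"})
print(log.replay(primary_up), store)  # 1 {'order:1001': {'status': 'paid'}}
```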

To truly achieve active‑active, multi‑master replication for both Redis and MySQL is needed, but implementing this reliably is extremely challenging.
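To see why multi-master is hard, consider last-writer-wins (LWW), one common conflict-resolution rule. It is not something the article proposes; it is used here only to show how a deterministic merge can still lose data. The balance figures and timestamps below are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    value: int
    ts_ms: int   # wall-clock timestamp assigned by the cloud that accepted the write
    origin: str  # which cloud accepted the write


def lww_merge(a: Versioned, b: Versioned) -> Versioned:
    """Last-writer-wins: keep the later timestamp, break ties by origin name."""
    return max(a, b, key=lambda v: (v.ts_ms, v.origin))


# Both clouds accept a write to the same balance while the inter-cloud line is down.
base = 100
cloud1 = Versioned(base - 30, ts_ms=1_000, origin="cloud-1")  # debit 30
cloud2 = Versioned(base - 50, ts_ms=1_002, origin="cloud-2")  # debit 50
print(lww_merge(cloud1, cloud2).value)  # 50: the debit of 30 is lost (correct result is 20)
```

The merge converges on the same value in both clouds, yet the result is wrong. It also trusts wall clocks in two different clouds, so clock skew alone can decide which write survives.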

Conclusion

Many companies end up with "pseudo-active-active" systems in which the storage layer remains a single point of failure. For companies below the scale of BAT (Baidu, Alibaba, Tencent), it is advisable to first ensure multi-center backup of core data, such as transactions and user records, so that service can be restored quickly when a cloud runs into trouble.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
