
Understanding Geo-Distributed Active-Active Architecture: Principles, Risks, and Implementation Strategies

This article explains the concept of geo-distributed active‑active (multi‑active) systems, covering architectural principles, availability metrics, redundancy techniques such as master‑slave replication, cold and hot disaster recovery, same‑city and cross‑city active‑active setups, data synchronization challenges, and practical routing and sharding methods to achieve high availability and scalability.

Full-Stack Internet Architecture

Oct 20, 2021


01 System Availability

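To understand geo‑distributed active‑active, we start with three architectural principles: high performance, high availability, and easy scalability. Availability is measured with MTBF (mean time between failures) and MTTR (mean time to repair), using the formula Availability = MTBF / (MTBF + MTTR) * 100%. The longer a system runs between failures and the faster it recovers, the closer its availability gets to 100%.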
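As a quick illustration of the formula, the following Python sketch computes availability from MTBF and MTTR and converts an availability target into expected downtime per year. The MTBF/MTTR figures are hypothetical, and the year length ignores leap years.

```python
# Availability = MTBF / (MTBF + MTTR) * 100%, as defined above.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; ignores leap years


def availability_pct(mtbf: float, mttr: float) -> float:
    """Availability in percent; MTBF and MTTR must use the same time unit."""
    if mtbf <= 0 or mttr < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf / (mtbf + mttr) * 100


def downtime_minutes_per_year(avail_pct: float) -> float:
    """Expected yearly downtime implied by an availability percentage."""
    return MINUTES_PER_YEAR * (1 - avail_pct / 100)


if __name__ == "__main__":
    # Hypothetical system: fails every 999 hours on average, 1 hour to recover.
    print(f"{availability_pct(999, 1):.1f}%")  # 99.9%
    # Halving MTTR improves availability without changing MTBF.
    print(f"{availability_pct(999, 0.5):.2f}%")
    for target in (99.9, 99.99, 99.999):
        print(target, round(downtime_minutes_per_year(target), 1), "min/year")
```

The second line of output shows why the article stresses rapid recovery: shortening MTTR is often easier than lengthening MTBF, and it raises availability just as directly.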

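Failures can come from hardware, software, or force majeure such as natural disasters. Since failures can never be ruled out entirely, recovering quickly, that is, keeping MTTR low, is essential.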

02 Single‑Machine Architecture

A single‑instance deployment keeps all data on one machine, so a disk failure means data loss. Regular backups mitigate this, but restoring from a backup takes time, and anything written after the most recent backup is still lost.

03 Master‑Slave Replication

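Adding a replica (slave) that continuously copies data from the master brings several benefits: the copy stays close to real time, data integrity improves, the replica can take over if the master fails, and read traffic can be offloaded to it to improve read performance.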
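A minimal in‑memory sketch of master‑slave replication with read‑write splitting, in Python for illustration. It assumes asynchronous replication: a single master records every write in an ordered change log, and the replica applies that log later. Real databases ship the log over the network, which is why replica reads can lag. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicatedStore:
    master: dict = field(default_factory=dict)
    replica: dict = field(default_factory=dict)
    log: list = field(default_factory=list)  # ordered change log kept by the master
    replica_pos: int = 0  # how many log entries the replica has applied

    def write(self, key, value):
        """All writes go to the master and are recorded in its change log."""
        self.master[key] = value
        self.log.append((key, value))

    def sync(self):
        """The replica applies, in order, every log entry it has not applied yet."""
        for key, value in self.log[self.replica_pos:]:
            self.replica[key] = value
        self.replica_pos = len(self.log)

    def lag(self) -> int:
        """Number of master changes not yet visible on the replica."""
        return len(self.log) - self.replica_pos

    def read(self, key):
        """Reads are offloaded to the replica and may be stale while lag() > 0."""
        return self.replica.get(key)

    def promote_replica(self) -> int:
        """Fail over: the replica's data becomes the master's.
        Returns how many unreplicated writes were lost."""
        lost = self.lag()
        self.master = dict(self.replica)
        self.log = self.log[: self.replica_pos]
        return lost


if __name__ == "__main__":
    store = ReplicatedStore()
    store.write("user:1", "alice")
    store.sync()
    store.write("user:1", "alicia")       # not yet replicated
    print(store.read("user:1"), store.lag())
    print(store.promote_replica())         # the unreplicated write is lost
```

Promoting the replica while a write is still unreplicated loses that write, which is the integrity trade‑off of asynchronous replication.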

04 Uncontrollable Risks

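Even with a replica, risks outside our control remain at the rack, switch, and data‑center levels: if the master and replica sit in the same data center, a rack power failure, a faulty switch, or an outage of the entire data center can take both down at once.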

05 Same‑City Disaster Recovery

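Deploy a second data center in the same city and connect the two with a dedicated line. Data can then be protected by a cold backup, where data is copied to the second site periodically and the standby does not serve traffic, or by a hot backup, where a replica in the second site is synchronized continuously and is ready to take over quickly.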
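To make the cold/hot difference concrete, this hypothetical model counts how many writes are lost when the primary site fails. It assumes one write per tick, a cold backup that snapshots every `snapshot_interval` ticks, and a hot backup that receives each write `replication_lag` ticks after it happens. The numbers are illustrative, not measurements.

```python
def lost_writes(fail_time: int, snapshot_interval: int, replication_lag: int) -> tuple[int, int]:
    """Return (cold_lost, hot_lost): writes lost when the primary fails at `fail_time`.

    Model (hypothetical): one write at each tick 1..fail_time; the cold backup holds
    everything written up to the most recent multiple of `snapshot_interval`;
    the hot backup holds every write at least `replication_lag` ticks old.
    """
    if fail_time < 0 or snapshot_interval <= 0 or replication_lag < 0:
        raise ValueError("invalid parameters")
    writes = range(1, fail_time + 1)
    last_snapshot = (fail_time // snapshot_interval) * snapshot_interval
    cold_saved = sum(1 for t in writes if t <= last_snapshot)
    hot_saved = sum(1 for t in writes if t + replication_lag <= fail_time)
    return fail_time - cold_saved, fail_time - hot_saved


if __name__ == "__main__":
    # Failure at tick 95, snapshot every 24 ticks, replication lag of 1 tick.
    print(lost_writes(95, 24, 1))  # (23, 1)
```

Cold backup also needs time to restore data and start services at the standby site, which this model does not capture; hot backup shortens both the data‑loss window and the recovery time.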

06 Same‑City Active‑Active

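Both data centers serve live traffic at the same time. This requires read‑write separation and careful routing: reads can be served locally in either site, while writes are sent to a single primary database so the two sites never produce conflicting writes.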
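One way to express this read‑write separation is a small routing function. The sketch assumes two hypothetical sites, `dc-a` holding the writable primary database and `dc-b` holding a replica.

```python
from dataclasses import dataclass

# Hypothetical same-city pair: dc-a hosts the writable primary, dc-b a replica.
PRIMARY_DC = "dc-a"
SITES = {"dc-a", "dc-b"}


@dataclass(frozen=True)
class Route:
    dc: str    # data center whose database serves the call
    role: str  # "primary" or "replica"


def route(op: str, local_dc: str) -> Route:
    """Pick the database for a read or write issued by a service in `local_dc`."""
    if local_dc not in SITES:
        raise ValueError(f"unknown data center: {local_dc!r}")
    if op == "read":
        # Reads stay local: the primary in dc-a, the replica in dc-b.
        return Route(local_dc, "primary" if local_dc == PRIMARY_DC else "replica")
    if op == "write":
        # All writes go to the single primary, so the two sites never
        # modify the same row independently.
        return Route(PRIMARY_DC, "primary")
    raise ValueError(f"unknown operation: {op!r}")


if __name__ == "__main__":
    print(route("read", "dc-b"))
    print(route("write", "dc-b"))
```

Writes issued in `dc-b` cross the dedicated line to reach the primary. That is acceptable when the line adds little latency, which is exactly the assumption that breaks down across cities.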

07 Two‑City Three‑Center

Add a third data center in a geographically distant city as a disaster backup, typically a cold backup, to protect against a city‑level catastrophe that takes out both same‑city sites.

08 Pseudo Cross‑City Active‑Active

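Simply stretching the same‑city active‑active design across cities performs poorly. The remote site still has to reach the primary database in the other city, so every such call pays cross‑city network latency, and a request that makes several sequential database calls pays it several times over.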
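A back‑of‑the‑envelope model shows why the latency compounds: if a request's database calls run one after another and each must reach a primary in another city, every call pays a full network round trip. The round‑trip times (1 ms same‑city, 30 ms cross‑city) and the 10 ms of local work are assumptions chosen for illustration, not measured values.

```python
def request_latency_ms(local_work_ms: float, db_calls: int, rtt_ms: float) -> float:
    """Latency of a request whose database calls run sequentially,
    each paying one network round trip to the database."""
    return local_work_ms + db_calls * rtt_ms


# Hypothetical round-trip times, for illustration only.
SAME_CITY_RTT_MS = 1.0
CROSS_CITY_RTT_MS = 30.0

if __name__ == "__main__":
    for calls in (1, 5, 20):
        same = request_latency_ms(10, calls, SAME_CITY_RTT_MS)
        cross = request_latency_ms(10, calls, CROSS_CITY_RTT_MS)
        print(f"{calls:>2} calls: same-city {same:.0f} ms, cross-city {cross:.0f} ms")
```

Under these assumptions, a request with 20 sequential calls takes 30 ms within one city but 610 ms across cities.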

09 True Cross‑City Active‑Active

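Each data center must host its own writable primary databases so that requests are served locally, and data is synchronized bidirectionally between sites using middleware (e.g., Canal for MySQL, RedisShake for Redis, MongoShake for MongoDB). Because this synchronization is asynchronous, the sites are eventually consistent rather than identical at every moment, and two sites writing the same record concurrently can conflict.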
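One concern in bidirectional synchronization is replication loops: a change copied from site A to site B appears in B's own change log and must not be shipped back to A. The following is a hypothetical model, not how Canal, RedisShake, or MongoShake are implemented. It tags every change with the site where it originated and skips changes headed back to their origin.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Change:
    key: str
    value: str
    origin: str  # site where the write was first made


@dataclass
class Site:
    name: str
    data: dict = field(default_factory=dict)
    changelog: list = field(default_factory=list)  # every applied change, like a binlog
    cursors: dict = field(default_factory=dict)    # destination name -> next log index to ship

    def write(self, key: str, value: str, origin: Optional[str] = None) -> None:
        """Apply a change locally and record it; local writes originate here."""
        self.data[key] = value
        self.changelog.append(Change(key, value, origin or self.name))


def sync(src: Site, dst: Site) -> int:
    """Ship src's unshipped changes to dst, skipping changes that originated
    at dst so they are not echoed back. Returns the number applied."""
    start = src.cursors.get(dst.name, 0)
    applied = 0
    for change in src.changelog[start:]:
        if change.origin != dst.name:
            dst.write(change.key, change.value, origin=change.origin)
            applied += 1
    src.cursors[dst.name] = len(src.changelog)
    return applied


if __name__ == "__main__":
    a, b = Site("dc-a"), Site("dc-b")
    a.write("user:1", "alice")
    print(sync(a, b), sync(b, a))  # 1 0
```

Loop prevention does not resolve conflicts. If both sites write the same key before syncing, then after syncing in both directions each site holds the other's value and the sites diverge. This is why the routing in section 10 keeps each user's writes in one data center.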

10 Implementing Active‑Active

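Route requests at the edge so that each user's requests stay within a single data center. Common strategies are to route by business type (each business is homed in one data center), by hash partitioning (for example, hashing the user ID), or by the user's geographic location. Because a given user's data is then written in only one site, cross‑region write conflicts are avoided.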
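The three routing strategies can be sketched as simple lookup functions. The site names and the business and region mappings are hypothetical. The hash route uses SHA‑256 from Python's `hashlib` rather than the built‑in `hash()`, because `hash()` on strings is randomized per process and would send the same user to different sites from different gateway instances.

```python
import hashlib

# Hypothetical sites and mappings, for illustration only.
DATA_CENTERS = ["dc-beijing", "dc-shanghai"]
BUSINESS_HOME = {"orders": "dc-beijing", "payments": "dc-shanghai"}
REGION_HOME = {"north": "dc-beijing", "south": "dc-shanghai"}


def route_by_business(business: str) -> str:
    """Each business line is homed in one data center."""
    return BUSINESS_HOME[business]


def route_by_hash(user_id: str) -> str:
    """Map the user ID to a data center with a hash that is stable across
    processes, so every gateway sends the same user to the same site."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return DATA_CENTERS[int.from_bytes(digest[:8], "big") % len(DATA_CENTERS)]


def route_by_region(region: str) -> str:
    """Users are served by the data center assigned to their region."""
    return REGION_HOME[region]


if __name__ == "__main__":
    print(route_by_business("orders"), route_by_hash("user-42"), route_by_region("south"))
```

Plain modulo hashing remaps most users when a site is added or removed; consistent hashing or a fixed shard‑to‑site table is a common way to limit that movement.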

11 Geo‑Distributed Multi‑Active

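Extend the active‑active model to more than two regions. Instead of having every region synchronize with every other one, a star topology sends synchronization through a central hub: each region syncs with the hub, which forwards changes to the remaining regions. This keeps the number of sync links manageable while achieving high availability, scalability, and rapid failover.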
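A small sketch of why a star topology scales better: with n regions, a full mesh of bidirectional sync links needs n(n-1)/2 links, while a star needs only n-1, at the cost of an extra hop through the hub. The sketch assumes the hub is one of the regions; the region names are hypothetical.

```python
def mesh_links(n: int) -> int:
    """Sync links needed if every region syncs directly with every other."""
    return n * (n - 1) // 2


def star_links(n: int) -> int:
    """Sync links needed if every region syncs only with the hub region."""
    return max(n - 1, 0)


def star_hops(origin: str, regions: list[str], hub: str) -> list[tuple[str, str]]:
    """Hops a change written in `origin` takes to reach every other region:
    first to the hub (unless it was written at the hub), then hub to the rest."""
    if hub not in regions or origin not in regions:
        raise ValueError("origin and hub must both be known regions")
    hops = [] if origin == hub else [(origin, hub)]
    hops += [(hub, r) for r in regions if r not in (origin, hub)]
    return hops


if __name__ == "__main__":
    regions = ["beijing", "shanghai", "shenzhen", "chengdu", "wuhan"]  # hypothetical
    print(mesh_links(len(regions)), star_links(len(regions)))  # 10 4
    print(star_hops("shenzhen", regions, hub="beijing"))
```

The hub becomes a critical path for synchronization, so in practice it needs its own redundancy; the sketch only counts links and hops.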

Summary

The article emphasizes high performance, high availability, and easy scalability as core architectural goals, explains redundancy techniques from backup to multi‑region active‑active, and provides practical guidance for building resilient, globally distributed systems.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.


