Understanding Geo-Distributed Active-Active Architecture: Principles, Risks, and Implementation Strategies
This article explains the concept of geo-distributed active‑active (multi‑active) systems, covering architectural principles, availability metrics, redundancy techniques such as master‑slave replication, cold and hot disaster recovery, same‑city and cross‑city active‑active setups, data synchronization challenges, and practical routing and sharding methods to achieve high availability and scalability.
01 System Availability
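To understand geo‑distributed active‑active, we start with three architectural principles: high performance, high availability, and easy scalability. Availability is measured with two metrics: MTBF (mean time between failures), which reflects how long the system runs before it fails, and MTTR (mean time to repair), which reflects how long recovery takes. The formula is Availability = MTBF / (MTBF + MTTR) * 100%, so availability improves when failures become rarer or recovery becomes faster.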
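As a worked example of the formula above, here is a minimal Python sketch. The function names and the sample figures (999 hours between failures, 1 hour to repair) are illustrative assumptions, not values from the article.

```python
HOURS_PER_YEAR = 365 * 24


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Return availability as a percentage: MTBF / (MTBF + MTTR) * 100."""
    if mtbf_hours < 0 or mttr_hours < 0 or mtbf_hours + mttr_hours == 0:
        raise ValueError("MTBF and MTTR must be non-negative and not both zero")
    return mtbf_hours / (mtbf_hours + mttr_hours) * 100.0


def yearly_downtime_hours(availability_pct: float) -> float:
    """Convert an availability percentage into expected downtime per year."""
    return (1.0 - availability_pct / 100.0) * HOURS_PER_YEAR


if __name__ == "__main__":
    # Hypothetical system: fails once every 999 hours, takes 1 hour to recover.
    pct = availability(mtbf_hours=999, mttr_hours=1)
    print(f"availability: {pct:.3f}%")
    print(f"downtime per year: {yearly_downtime_hours(pct):.2f} hours")
```

Halving MTTR in this sketch has almost the same effect on availability as doubling MTBF, which is why the later sections focus on fast failover.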
Failures can be hardware, software, or force‑majeure, and rapid recovery is essential.
02 Single‑Machine Architecture
A simple single‑instance deployment is vulnerable to data loss; backup can mitigate loss but introduces recovery time and data staleness.
03 Master‑Slave Replication
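Adding a replica that synchronizes from the master in near real time brings three benefits: higher data integrity (a second, current copy of the data exists), fault tolerance (the replica can be promoted if the master fails), and better read performance (reads can be served from the replica).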
04 Uncontrollable Risks
Even with redundancy, risks remain at the rack, switch, and data‑center levels; failures in a single data‑center can still cause outages.
05 Same‑City Disaster Recovery
Deploy a second data‑center in the same city, connect via a dedicated line, and use either cold backup (periodic copy) or hot backup (real‑time replica) to ensure data safety.
06 Same‑City Active‑Active
Both data‑centers serve traffic simultaneously, requiring read‑write separation and careful routing to avoid write conflicts.
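The read‑write separation described above can be sketched as a small router. This is a simplified illustration under stated assumptions: the `Database` and `ReadWriteRouter` names, the single‑primary layout, and the `dc-a`/`dc-b` labels are hypothetical and do not come from the article.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Database:
    name: str
    datacenter: str
    role: str  # "primary" or "replica"


class ReadWriteRouter:
    """Send every write to the single primary; serve reads locally when possible."""

    def __init__(self, databases: List[Database], local_datacenter: str):
        primaries = [db for db in databases if db.role == "primary"]
        if len(primaries) != 1:
            raise ValueError("this sketch assumes exactly one primary")
        self.primary = primaries[0]
        self.local_replicas = [
            db for db in databases
            if db.role == "replica" and db.datacenter == local_datacenter
        ]

    def route(self, is_write: bool) -> Database:
        if is_write:
            # All writes go to one place, so two data-centers never
            # accept conflicting writes for the same row.
            return self.primary
        if self.local_replicas:
            return self.local_replicas[0]
        # No local replica: fall back to the primary for reads as well.
        return self.primary


if __name__ == "__main__":
    dbs = [
        Database("db-a", "dc-a", "primary"),
        Database("db-b", "dc-b", "replica"),
    ]
    router_b = ReadWriteRouter(dbs, local_datacenter="dc-b")
    print(router_b.route(is_write=True).name)   # write crosses to dc-a
    print(router_b.route(is_write=False).name)  # read stays in dc-b
```

Because the two data‑centers are in the same city, the cross‑data‑center hop on writes is cheap enough to be acceptable in this design.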
07 Two‑City Three‑Center
Introduce a third, geographically distant data‑center for disaster backup, typically using cold backup to protect against city‑level catastrophes.
08 Pseudo Cross‑City Active‑Active
Simply mirroring active‑active across cities leads to high latency and performance degradation due to cross‑region data access.
09 True Cross‑City Active‑Active
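Each data‑center must host its own primary databases so that reads and writes are served locally, without cross‑region latency on the request path. The data‑centers then synchronize data bidirectionally through middleware (e.g., Canal for MySQL, RedisShake for Redis, MongoShake for MongoDB) so that each side eventually holds the full data set.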
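One practical concern with bidirectional synchronization is that a change replicated from A to B must not be sent back to A. The sketch below illustrates one common way to handle this, tagging each change with its origin. It is an in‑memory toy under stated assumptions: the `Change`, `DataCenter`, and `sync` names are hypothetical, and it does not represent how Canal, RedisShake, or MongoShake are implemented.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Change:
    key: str
    value: str
    origin: str  # data-center that first accepted the write


@dataclass
class DataCenter:
    name: str
    store: Dict[str, str] = field(default_factory=dict)
    outbox: List[Change] = field(default_factory=list)

    def local_write(self, key: str, value: str) -> None:
        """Accept a user write and queue it for replication."""
        self.store[key] = value
        self.outbox.append(Change(key, value, origin=self.name))

    def apply_remote(self, change: Change) -> None:
        """Apply a replicated change without queueing it again."""
        if change.origin == self.name:
            return  # guard: never re-apply a change that started here
        self.store[change.key] = change.value
        # Not appended to the outbox, so it is not echoed back.


def sync(source: DataCenter, target: DataCenter) -> int:
    """Ship all pending changes from source to target; return how many."""
    pending, source.outbox = source.outbox, []
    for change in pending:
        target.apply_remote(change)
    return len(pending)


if __name__ == "__main__":
    a, b = DataCenter("dc-a"), DataCenter("dc-b")
    a.local_write("user:1", "alice")
    b.local_write("user:2", "bob")
    sync(a, b)
    sync(b, a)
    print(a.store == b.store)
```

The sketch does not resolve concurrent writes to the same key in both data‑centers; avoiding that situation is the job of the routing layer.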
10 Implementing Active‑Active
Route users at the edge based on business type, hash partitioning, or geographic location so that all of a given user’s requests stay within a single data‑center. Because the same data is then never written in two data‑centers at once, cross‑region write conflicts are avoided.
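The hash and geographic routing options can be sketched as follows. This is a minimal illustration, not the article's implementation: the data‑center labels, the region table, and the function names are assumptions made for the example.

```python
import hashlib
from typing import Dict, Optional, Sequence

DATACENTERS = ("dc-a", "dc-b")  # hypothetical data-center labels
REGION_TO_DC: Dict[str, str] = {"north": "dc-a", "south": "dc-b"}  # hypothetical


def route_by_hash(user_id: str, datacenters: Sequence[str] = DATACENTERS) -> str:
    """Map a user to one data-center with a stable hash of the user ID."""
    # hashlib is used instead of built-in hash(), which is randomized
    # per process for strings and would break routing stability.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(datacenters)
    return datacenters[index]


def route_request(user_id: str, region: Optional[str] = None) -> str:
    """Prefer geographic routing; fall back to hash partitioning."""
    if region is not None and region in REGION_TO_DC:
        return REGION_TO_DC[region]
    return route_by_hash(user_id)


if __name__ == "__main__":
    for uid in ("user-1001", "user-1002", "user-1003"):
        print(uid, "->", route_request(uid))
    print("user-1001 (south) ->", route_request("user-1001", region="south"))
```

Simple modulo hashing remaps many users when the number of data‑centers changes, so a production router would need a migration plan or a more stable partitioning scheme.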
11 Geo‑Distributed Multi‑Active
Scale the active‑active model to multiple regions using a star topology with a central hub for data synchronization, achieving high availability, scalability, and rapid failover.
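The star topology can be illustrated with a toy hub that fans each change out to every data‑center except the one where it originated. The `SyncHub` class and the `dc-*` labels are hypothetical, and the in‑memory dictionaries stand in for real databases.

```python
from typing import Dict, List


class SyncHub:
    """Central hub: each data-center talks only to the hub, never to peers."""

    def __init__(self) -> None:
        self.stores: Dict[str, Dict[str, str]] = {}

    def register(self, datacenter: str) -> None:
        self.stores[datacenter] = {}

    def write(self, origin: str, key: str, value: str) -> List[str]:
        """Apply a write locally at origin, then fan it out through the hub."""
        if origin not in self.stores:
            raise KeyError(f"unknown data-center: {origin}")
        self.stores[origin][key] = value
        delivered = []
        for name, store in self.stores.items():
            if name == origin:
                continue  # the origin already has the change
            store[key] = value
            delivered.append(name)
        return delivered


if __name__ == "__main__":
    hub = SyncHub()
    for dc in ("dc-a", "dc-b", "dc-c"):
        hub.register(dc)
    print(hub.write("dc-a", "order:42", "paid"))
```

With N data‑centers, a star needs N links to the hub, whereas a full mesh needs N(N-1)/2 pairwise links, which is what makes adding a new region cheaper in this layout.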
Summary
The article emphasizes high performance, high availability, and easy scalability as core architectural goals, explains redundancy techniques from backup to multi‑region active‑active, and provides practical guidance for building resilient, globally distributed systems.
Full-Stack Internet Architecture
Introducing full-stack Internet architecture technologies centered on Java