High‑Availability Deployment: From Cold Backup to Multi‑Active Architecture
This article explains the evolution of high‑availability deployment architectures—from simple cold backups and hot standby to same‑city active‑active, cross‑city active‑active, and finally multi‑active solutions—detailing their advantages, drawbacks, and practical design considerations for large‑scale internet services.
High availability (HA) becomes a necessity for large‑scale internet companies as their business grows, and many of them (Alibaba, Tencent, Baidu, NetEase, Sina, etc.) have already rebuilt their systems to support multi‑active deployments.
Stateful vs. Stateless Services
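Stateless services achieve HA relatively easily: several identical instances sit behind a load balancer (e.g., F5), and any instance can serve any request. Stateful services keep data in stores such as MySQL or Redis, or in process memory such as the JVM heap, so they need more involved HA solutions.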
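The stateless case can be illustrated with a toy round‑robin balancer. This is only a sketch of the idea, not how F5 or any real load balancer is implemented; the class name and backend labels are hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Toy balancer: any healthy backend may serve any request,
    which only works because the backends hold no state."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._ring = cycle(self.backends)

    def mark_down(self, backend):
        # Called by a (hypothetical) health check when a backend fails.
        self.healthy.discard(backend)

    def mark_up(self, backend):
        if backend in self.backends:
            self.healthy.add(backend)

    def pick(self):
        # Walk the ring at most once, skipping unhealthy backends.
        for _ in range(len(self.backends)):
            candidate = next(self._ring)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy backend available")
```

Because no request depends on data held by a particular instance, removing a failed backend from rotation is enough to keep the service available. A stateful service cannot be handled this way, which is why the remaining sections focus on data.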
HA Solutions Overview
Cold Backup
Hot Standby (Dual‑machine hot backup)
Same‑city Active‑Active
Cross‑city Active‑Active
Cross‑city Multi‑Active
Cold Backup
Cold backup copies the database's data files while the database service is stopped. It is simple, backup and restore are fast because they are plain file copies, and a restore returns the database to the exact point in time at which the backup was taken. The drawbacks are that the service must be down during the backup, any data written after the backup is lost on restore, and every backup is a full copy, so it consumes a lot of storage.
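The procedure can be sketched in a few lines. The `stop_service` and `start_service` callbacks are hypothetical placeholders for whatever stops and starts the database in a given environment; the directory layout is also an assumption.

```python
import shutil
import time
from pathlib import Path


def cold_backup(data_dir, backup_root, stop_service, start_service):
    """Stop the service, copy the whole data directory, restart the service.

    Returns the path of the new backup directory.
    """
    target = Path(backup_root) / time.strftime("backup-%Y%m%d-%H%M%S")
    stop_service()  # downtime starts: files must not change during the copy
    try:
        shutil.copytree(data_dir, target)  # full copy, hence the storage cost
    finally:
        start_service()  # restart even if the copy failed
    return target
```

The sketch makes the trade‑offs visible: the service is unavailable for the duration of `copytree`, and anything written after `start_service()` exists only in the live data directory until the next backup.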
Hot Standby (Dual‑machine Hot Backup)
Hot standby replicates data from a primary node to a secondary node while the service keeps running. The backup itself needs no downtime, but a failover still interrupts service while traffic is switched to the secondary. Two main implementation styles exist:
Active/Standby (software‑level replication such as MySQL master‑slave, SQL Server transactional replication)
Active/Active (mutual standby, useful for read‑write separation and resource utilization)
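The Active/Standby style can be modelled with a small in‑memory simulation. This is a sketch under simplifying assumptions: replication is applied synchronously inside `write`, whereas real replication (for example MySQL's) is typically asynchronous and can lag. All class and method names are hypothetical.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.alive = True


class ActiveStandbyPair:
    def __init__(self, primary, standby):
        self.primary = primary
        self.standby = standby

    def write(self, key, value):
        if not self.primary.alive:
            # This window, until failover() runs, is the failover downtime.
            raise ConnectionError("primary is down; fail over first")
        self.primary.data[key] = value
        if self.standby.alive:
            # Simplification: replicate immediately instead of shipping a log.
            self.standby.data[key] = value

    def failover(self):
        """Promote the standby to primary and demote the old primary."""
        if not self.standby.alive:
            raise RuntimeError("standby is down; cannot fail over")
        self.primary, self.standby = self.standby, self.primary
```

Writes are rejected between the primary's failure and the call to `failover()`, which is the interruption described above. With asynchronous replication, writes not yet shipped to the standby at that moment would also be lost.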
Same‑city Active‑Active
Same‑city active‑active extends HA across two data centers (IDCs) in the same city, so traffic can be shifted to the other site when one fails. It can also run as true dual‑active, with both sites serving reads and writes, provided that write conflicts are resolved.
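The text does not say which conflict‑resolution strategy to use. One common option is last‑write‑wins (LWW), sketched below as an assumption rather than as the method any of the companies mentioned here use. The `Versioned` record and the site names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    value: str
    timestamp: float  # write time reported by the originating site
    site: str         # site identifier, used only to break timestamp ties


def resolve(a, b):
    """Return the winning version under last-write-wins."""
    return max(a, b, key=lambda v: (v.timestamp, v.site))


def merge(site_a, site_b):
    """Merge two sites' key -> Versioned maps into one converged view."""
    merged = {}
    for key in site_a.keys() | site_b.keys():
        if key in site_a and key in site_b:
            merged[key] = resolve(site_a[key], site_b[key])
        else:
            merged[key] = site_a[key] if key in site_a else site_b[key]
    return merged
```

LWW is simple and makes both sites converge, but it silently discards the losing write and depends on the two sites' clocks being reasonably synchronized. Data that cannot tolerate lost writes needs a different approach, such as routing every write for a given key to a single site.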
Cross‑city Active‑Active (Dual‑Active)
Cross‑city active‑active deploys front‑end entry points and applications in two distant cities, so traffic can fail over to the other city when one goes down. User experience may degrade after a failover because requests then travel the longer inter‑city path.
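A minimal sketch of the routing decision follows. It assumes that each user is normally served by the healthy city with the lowest measured latency; the city names and latency figures in the usage note are made up for illustration.

```python
def choose_city(latency_ms, healthy):
    """Pick the healthy city with the lowest latency for this user.

    latency_ms: mapping of city name -> measured round-trip time in ms
    healthy:    set of city names currently able to serve traffic
    """
    candidates = {city: ms for city, ms in latency_ms.items() if city in healthy}
    if not candidates:
        raise RuntimeError("no healthy city available")
    return min(candidates, key=candidates.get)
```

With `latency_ms = {"city_a": 8, "city_b": 35}`, the user is served from `city_a` while both cities are healthy and from `city_b` once `city_a` is removed from `healthy`. The service stays up, but the user pays the extra round‑trip time.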
Cross‑city Multi‑Active
Multi‑active extends the dual‑active concept to multiple regions connected as a mesh in which every node synchronizes directly with every other node (for example, four inbound/outbound connections per node when there are five nodes), so the failure of any single node does not take the service down. The cost is longer write latency and a higher probability of write conflicts, both of which add complexity.
To reduce the complexity of the mesh, the topology can be converted into a star: all synchronization goes through a core node, and every other node connects only to the core. The failure of a non‑core node then affects no other node, and load balancing becomes simpler. The core node itself, however, must be highly available, because all synchronization depends on it.
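The difference between the two topologies is easy to quantify. The sketch below assumes a full mesh, in which every node links to every other node; with five nodes that gives the four connections per node mentioned above.

```python
def full_mesh_links(n):
    """Total synchronization links when every node connects to every other."""
    return n * (n - 1) // 2


def full_mesh_links_per_node(n):
    """Links each node must maintain in a full mesh."""
    return n - 1


def star_links(n):
    """Total links when n - 1 nodes connect only to one core node."""
    return n - 1


for n in (3, 5, 10):
    print(n, full_mesh_links(n), full_mesh_links_per_node(n), star_links(n))
# 3 3 2 2
# 5 10 4 4
# 10 45 9 9
```

Mesh links grow quadratically with the number of nodes, while star links grow linearly. In a star, each non‑core node maintains a single link regardless of how many regions exist.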
Real‑world Implementations
Examples include Eleme’s Global Zone (single‑master write, multi‑slave read), Alibaba’s ideal multi‑active architecture, and Taobao’s unit‑based sharding with bidirectional sync for transaction units and unidirectional sync for non‑transaction units.
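Unit‑based sharding can be sketched as a routing function that pins each user to one unit. The text does not say how Taobao assigns users to units; hashing the user ID, as done below, is an assumption for illustration, and the unit names are hypothetical. The sync modes follow the description above.

```python
import zlib

UNITS = ["unit-a", "unit-b", "unit-c"]  # hypothetical unit names


def unit_for(user_id, units=UNITS):
    """Map a user to one unit with a stable hash, so the same user
    always lands in the same unit."""
    return units[zlib.crc32(user_id.encode("utf-8")) % len(units)]


def sync_mode(is_transaction_unit):
    """Sync direction as described above: bidirectional for transaction
    units, unidirectional for non-transaction units."""
    return "bidirectional" if is_transaction_unit else "unidirectional"
```

`zlib.crc32` is used instead of Python's built‑in `hash` because the latter is randomized per process for strings, which would send the same user to different units after a restart. Pinning each user to one unit keeps that user's writes in one place, which avoids most cross‑unit conflicts.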
Challenges
Multi‑active architectures demand strong foundations: reliable data transmission, data validation, and a data‑operation layer that abstracts write and synchronization logic. They also increase operational complexity and testing difficulty, and they require sophisticated disaster‑recovery pipelines.
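Data validation, one of the foundations listed above, can be approximated by comparing per‑row digests between a source and a replica. This is a minimal sketch that assumes rows are small dictionaries keyed by primary key and held in memory; a production validator would work in batches against the real databases.

```python
import hashlib


def row_digest(row):
    """Stable digest of a row, independent of the dict's key order."""
    canonical = repr(sorted(row.items())).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def diff_replicas(source, replica):
    """Compare two {primary_key: row} maps.

    Returns (missing, mismatched): keys absent from the replica, and keys
    present in both whose row contents differ.
    """
    missing = sorted(k for k in source if k not in replica)
    mismatched = sorted(
        k for k in source
        if k in replica and row_digest(source[k]) != row_digest(replica[k])
    )
    return missing, mismatched
```

Running such a check continuously between sites gives early warning when synchronization drops or corrupts data, before a failover makes the replica the only copy.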
Thought Questions
How would you handle a buyer located at the intersection of four cities when sharding by province/city?
Which of your current business modules can be made multi‑active and which cannot?
Should all services be multi‑active or only core services?
References
《Eleme Multi‑Active Technical Implementation (Part 1) – Overview》 https://zhuanlan.zhihu.com/p/32009822
《Eleme Framework Tools Blog》 https://zhuanlan.zhihu.com/eleme-arch
《Alibaba Multi‑Active and Same‑city Active‑Active Architecture Evolution》 https://www.sohu.com/a/158859741_444159
《Alibaba Cloud Database Multi‑Active Solution》 https://help.aliyun.com/document_detail/72721.html
《Multi‑Active Is Not That Hard》 https://wely.iteye.com/blog/2313293
Architects' Tech Alliance
Sharing project experiences, insights into cutting-edge architectures, focusing on cloud computing, microservices, big data, hyper-convergence, storage, data protection, artificial intelligence, industry practices and solutions.