
Analysis of Arbitration and Two‑Site‑Three‑Center (3DC) Solutions in Dual‑Active Data Center Disaster Recovery

This article examines key arbitration mechanisms and the two‑site‑three‑center (3DC) extension model for dual‑active data‑center disaster recovery, compares the implementations from Huawei, EMC, IBM, HDS and NetApp, and discusses design considerations, split‑brain risks, and deployment best practices.

Architects' Tech Alliance

In storage dual‑active solutions, arbitration is a critical design point because the overall architecture is symmetric: two sites each hold a copy of the data and are connected by an FC or IP link. That link is also the most fragile component, as the RTT and distance limits published by every vendor show. If the link fails, each site may conclude that the peer has gone down, attempt to claim the shared resources, and form an independent cluster. Unless a proper arbitration mechanism is in place, the result is a split‑brain and data inconsistency.

Download links:

《Data Center Dual‑Active Disaster Recovery Solution Introduction (1)》

1. Network‑virtualized disaster‑recovery and dual‑active design

2. F5 dual‑active data‑center solution

3. Alibaba Cloud ApsaraDB multi‑active disaster‑recovery product

4. Data‑center disaster‑recovery: dual‑active solution

5. Technical challenges, solutions and applications of unstructured data "dual‑active/multi‑active" storage

《Data Center Dual‑Active Disaster Recovery Solution Introduction (2)》

1. Hengfeng Bank two‑site‑three‑center multi‑active design share

2. Lenovo VPLEX dual‑active data‑center solution

3. Lenovo disaster‑recovery dual‑active solution

4. Ele.me multi‑active architecture share

5. Building hybrid‑cloud dual‑active architecture to improve business continuity and elasticity

In storage dual‑active schemes, the common way to prevent split‑brain is an arbitration mechanism placed at a third site, implemented either as an arbitration server or as an arbitration storage array. The three typical methods are:

Priority‑site method – the simplest option: one site is designated the priority site and automatically wins arbitration when a split occurs; if the priority site itself fails, however, service is interrupted.

Software arbitration – a dedicated arbitration software runs on a physical or virtual machine (or in public cloud) at a third site.

Array arbitration disk – an additional storage array at the third site provides a dedicated arbitration LUN, offering higher stability and reliability.
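Regardless of which of the three methods is chosen, the core logic is the same: on losing the inter‑site link, a site must not assume the peer is dead, but must win an exclusive claim at the third site before continuing I/O. A minimal sketch of that idea (illustrative only; class and method names are my own, not any vendor's API):

```python
# Sketch of third-site arbitration: when the inter-site link drops, each
# site races to acquire an exclusive claim from the arbiter; only the
# winner keeps serving I/O, so a split-brain cannot form.

class Arbiter:
    """Stands in for a third-site arbitration server or quorum disk."""
    def __init__(self):
        self._winner = None

    def claim(self, site: str) -> bool:
        # First claimant wins; the decision is sticky for this incident.
        if self._winner is None:
            self._winner = site
        return self._winner == site


class Site:
    def __init__(self, name: str, arbiter: Arbiter):
        self.name = name
        self.arbiter = arbiter
        self.serving = True

    def on_peer_link_down(self):
        # Do not assume the peer is dead -- defer to the arbiter.
        if not self.arbiter.claim(self.name):
            self.serving = False  # lost arbitration: suspend I/O


arbiter = Arbiter()
a, b = Site("DC-A", arbiter), Site("DC-B", arbiter)
a.on_peer_link_down()   # both sites detect peer loss and race
b.on_peer_link_down()
assert [s.serving for s in (a, b)].count(True) == 1  # exactly one survivor
```

The same skeleton covers all three methods: the priority‑site method replaces the arbiter with a hard‑coded winner, while the server and disk variants differ only in where the claim is recorded.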

The two‑site‑three‑center (3DC) extension expands the dual‑active design to three geographic locations (production, same‑city disaster‑recovery, and remote disaster‑recovery). Recent large‑scale natural disasters have increased interest in 3DC solutions. Enterprises must consider regulatory RTO/RPO requirements, the impact of remote disaster‑recovery on production performance, the integrity of the overall 3DC architecture, and resource accessibility at the remote site.

1. Huawei HyperMetro

1. Arbitration

(1) Arbitration requirements: a physical server, VM, or public‑cloud VM can act as the arbitration device; both dual‑active storage arrays must reach the device over IP with bandwidth >2 MB/s. The device should be placed at a third site to avoid simultaneous failure with the data centers.

(2) Unified management: a single arbitration system can manage both SAN and NAS dual‑active, providing services from the same site in any failure scenario.

(3) Dual arbitration modes: static‑priority and arbitration‑server modes. Static‑priority forces the preferred node to continue serving when the link fails (not recommended because if the preferred node fails, service is lost). HyperMetro supports arbitration at the pair or consistency‑group level, allowing fine‑grained, workload‑based arbitration.
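The two modes and the per‑group granularity can be summarized in a short sketch (my reading of the behavior described above, not Huawei's implementation; site and mode names are assumptions):

```python
# HyperMetro-style arbitration sketch at consistency-group granularity.
# In arbitration-server mode, whichever site can still reach the arbiter
# wins for that group; static-priority mode always favors the preferred
# site, which is why it loses service if that site itself dies.

def arbitrate(group, mode, preferred, arbiter_reachable_from):
    """Return the site that keeps serving `group` after a link failure."""
    if mode == "static-priority":
        return preferred                  # chosen even if `preferred` is down
    # arbitration-server mode: a site that reaches the arbiter first wins
    for site in ("A", "B"):
        if site in arbiter_reachable_from:
            return site
    return None                           # arbiter unreachable: suspend both

# Different groups can prefer different sites, so each workload
# fails over independently of the others.
assert arbitrate("oltp", "arbitration-server", "A", {"B"}) == "B"
assert arbitrate("batch", "static-priority", "A", set()) == "A"
```

The second assertion illustrates the documented weakness of static priority: the preferred site is returned even when nothing can actually reach it.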

2. Two‑Site‑Three‑Center Extension

Huawei OceanStor HyperMetro supports remote disaster‑recovery via HyperReplication combined with BCManager software, forming a 3DC solution. When the production site fails, the same‑city DR site takes over while maintaining replication with the remote site. If both production and same‑city sites fail, the remote site can switch to master/slave mode and resume services. This architecture offers better resource utilization and faster failover compared with traditional sync‑plus‑async 3DC designs.
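The failover order described above forms a simple three‑state cascade, sketched below (site names are illustrative, not Huawei terminology):

```python
# Illustrative 3DC failover order: the same-city DR site takes over
# first; the remote async copy is promoted to master only if both
# metro sites are lost.

def active_site(production_up: bool, metro_dr_up: bool, remote_up: bool):
    if production_up:
        return "production"
    if metro_dr_up:
        return "metro-dr"   # keeps async replication to the remote site
    if remote_up:
        return "remote"     # switched from async target to master/slave
    return None

assert active_site(True, True, True) == "production"
assert active_site(False, True, True) == "metro-dr"
assert active_site(False, False, True) == "remote"
```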

2. EMC VPLEX

1. Arbitration

(1) Arbitration requirement: VPLEX Metro and VPLEX Geo provide a dedicated arbitration node called Witness, which runs as a VM (VMware only) in a third‑site fault domain.

(2) Split‑brain rules: VPLEX combines detach rules (a static preferred cluster, or No Automatic Winner) with the Witness node to decide which cluster continues I/O when a link or cluster failure occurs.

(3) The Witness automatically distinguishes site failures from link failures and can automatically continue I/O on the surviving site, making it essential for Oracle RAC deployments.
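What lets the Witness distinguish the two failure types is its independent reachability to both clusters. A hedged sketch of that decision logic (my interpretation of the behavior above, not EMC's code):

```python
# Witness decision sketch: because the Witness sits in a third fault
# domain, it can tell a dead site from a cut inter-cluster link.

def witness_verdict(w_sees_a, w_sees_b, a_sees_b, preferred="A"):
    """Return the set of clusters that should continue I/O."""
    if a_sees_b:
        return {"A", "B"}        # inter-cluster link is fine: no action
    if w_sees_a and w_sees_b:
        return {preferred}       # link failure only: apply detach rule
    if w_sees_a:
        return {"A"}             # cluster B's site has failed
    if w_sees_b:
        return {"B"}             # cluster A's site has failed
    return set()                 # witness isolated too: suspend I/O

assert witness_verdict(True, False, False) == {"A"}   # site B down
assert witness_verdict(True, True, False) == {"A"}    # link cut, prefer A
```

The third‑site placement is what makes the middle cases distinguishable; if the Witness shared a fault domain with either cluster, a site failure and a link failure would look identical to it.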

2. Two‑Site‑Three‑Center Extension

VPLEX offers two implementations:

Metro + EMC RecoverPoint CDP: VPLEX writes I/O to both sites and forwards it to RecoverPoint for continuous data protection (CDP). Adding a third site with asynchronous RecoverPoint replication creates a full 3DC topology.

Metro + EMC SRDF/A: The primary site’s VPLEX storage is EMC with SRDF license; asynchronous SRDF replicates data to the remote site, achieving multi‑site DR.

3. IBM SVC

1. Arbitration

(1) Arbitration requirement: SVC ESC and SVC HyperSwap need a quorum node or quorum disk at a third site. IP‑based quorum nodes can be any server or cloud VM running a simple Java program; physical quorum disks provide higher reliability for T3 recovery.

(2) Arbitration mechanism: Configuration Node (auto‑generated) has the highest arbitration priority, followed by nodes with lower latency to the quorum site.
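The priority ordering can be sketched as follows (a hypothetical model of the ranking described above, not IBM's implementation; node names are made up):

```python
# SVC-style arbitration priority sketch: the configuration node ranks
# first, then nodes ordered by latency to the quorum site. The
# highest-ranked node that can still reach the quorum wins, and its
# half of the cluster survives.

def pick_winner(nodes, quorum_reachable):
    """nodes: list of (name, is_config_node, latency_ms_to_quorum)."""
    ranked = sorted(nodes, key=lambda n: (not n[1], n[2]))
    for name, _, _ in ranked:
        if name in quorum_reachable:
            return name
    return None

nodes = [("n1", False, 3.0), ("n2", True, 5.0), ("n3", False, 1.0)]
assert pick_winner(nodes, {"n1", "n2", "n3"}) == "n2"  # config node first
assert pick_winner(nodes, {"n1", "n3"}) == "n3"        # then lowest latency
```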

2. Two‑Site‑Three‑Center Extension

Only the SVC ESC solution supports full 3DC expansion because its Vdisk Mirror technology can be cascaded, whereas SVC HyperSwap’s Metro Mirror cannot be extended beyond two clusters.

4. HDS GAD

1. Arbitration

(1) Arbitration requirement: GAD uses a dedicated arbitration disk placed at a third site; IP‑based arbitration is not supported.

(2) Arbitration mechanism: When the primary storage loses communication, it writes the failure state to the arbitration disk; the secondary storage detects this and stops I/O, ensuring only one side serves data.
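The handshake over the shared arbitration disk can be sketched like this (heavily simplified: real GAD uses reserved blocks on the quorum LUN, not an in‑memory dict, and the key names here are invented):

```python
# Quorum-disk handshake sketch: the array that loses communication
# writes its claim to the shared disk; the peer reads that state and
# stops serving, so only one side keeps handling I/O.

class QuorumDisk:
    def __init__(self):
        self.state = {}      # stands in for reserved sectors on the LUN

    def write(self, key, value):
        self.state[key] = value

    def read(self, key):
        return self.state.get(key)


def primary_on_link_loss(disk):
    disk.write("comm-failure", "primary-claims-io")  # publish claim first

def secondary_keeps_serving(disk):
    # The secondary polls the shared disk; seeing the claim, it yields.
    return disk.read("comm-failure") != "primary-claims-io"


disk = QuorumDisk()
primary_on_link_loss(disk)
assert secondary_keeps_serving(disk) is False   # secondary stops I/O
```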

2. Two‑Site‑Three‑Center Extension

HDS offers three mature 3DC topologies: cascaded, multi‑target, and asynchronous‑replication storage‑cluster. The cascaded model chains production ↔ same‑city (synchronous) ↔ remote (asynchronous). The multi‑target model replicates from production synchronously to the same‑city site and asynchronously to the remote site, keeping the synchronous leg at zero data loss. The storage‑cluster model combines an active‑active cluster with Universal Replicator for asynchronous replication to the remote site, delivering the highest availability of the three.

5. NetApp MetroCluster

1. Arbitration

(1) Arbitration requirement: MetroCluster uses TieBreak software deployed on a Linux host at a third site. It monitors SSH sessions to HA pairs and clusters, detecting failures within 3‑5 seconds.

(2) Arbitration mechanism: Two modes exist – static mode (no automatic failover) and TieBreak‑enabled mode (software monitors and can trigger alerts or automatic failover).
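The monitoring side of TieBreak reduces to heartbeat tracking with a detection window. A sketch under assumed parameters (NetApp's tool watches SSH sessions to each HA pair and cluster; the class and the 5‑second window here are illustrative):

```python
# TieBreak-style monitor sketch: declare a site failed only after its
# heartbeats have been missing for longer than the detection window
# (~3-5 s in the description above).

class SiteMonitor:
    def __init__(self, detect_window: float = 5.0):
        self.window = detect_window
        self.last_seen = {}

    def heartbeat(self, site: str, now: float):
        self.last_seen[site] = now

    def failed_sites(self, now: float):
        return {s for s, t in self.last_seen.items()
                if now - t > self.window}


mon = SiteMonitor(detect_window=5.0)
mon.heartbeat("site-A", now=0.0)
mon.heartbeat("site-B", now=0.0)
mon.heartbeat("site-A", now=4.0)          # only site-A keeps responding
assert mon.failed_sites(now=6.0) == {"site-B"}
```

In static mode the output of such a monitor would only raise an alert; in TieBreak‑enabled mode it can additionally trigger the failover.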

2. Two‑Site‑Three‑Center Extension

MetroCluster uses SnapMirror for data replication. Starting with ONTAP 9.5, MetroCluster supports SVM DR, allowing an active SVM inside the MetroCluster to act as a SnapMirror source to a third‑site SVM, providing multi‑site protection. Certain restrictions apply: only the active SVM can be the source, and the target SVM must be outside the MetroCluster configuration.



Disclaimer: Please credit the author and source when reposting. Contact us for copyright issues.


Tags: dual active, disaster recovery, storage, data center, 3DC, arbitration
Written by Architects' Tech Alliance

Sharing project experiences, insights into cutting-edge architectures, focusing on cloud computing, microservices, big data, hyper-convergence, storage, data protection, artificial intelligence, industry practices and solutions.
