Comprehensive Overview of Data Center Active‑Active (Dual‑Active) Solutions
This article provides an in‑depth technical overview of data‑center active‑active architectures, covering network interconnects, storage SAN/Fibre Channel links, application clustering, arbitration mechanisms, gateway‑based designs, technical requirements, and practical limitations for achieving end‑to‑end high availability.
Because many readers have been discussing data‑center dual‑active solutions, this article gathers the most common questions, together with the author's own observations, into a single guide.
Dual‑active (active‑active) in a data‑center context refers to end‑to‑end active‑active status for applications, networks, storage, and data. While some components may be deployed in HA mode and others in single‑point mode, the overall solution aims for full dual‑active capability. A typical dual‑active network diagram based on an array is shown below.
Data Center Interconnect Network
Data centers A and B are linked by an inter‑data‑center network, and each site internally uses a traditional two‑ or three‑tier architecture. The access layer connects to the business servers, while the core/aggregation layer connects to the remote data center using large Layer 2 technologies (for example CSS+iStack). Stretching the Layer 2 domain across sites lets virtual machines migrate live with their MAC addresses unchanged and works around VLAN count limits; most vendors support TRILL for this purpose.
Data Center Internal Interconnect
Within each data center, storage, switches, and servers are connected through a dedicated SAN with redundant paths. The SAN switches of the two data centers are interconnected over Fibre Channel (FC is a protocol and does not necessarily imply optical fiber as the medium). These links carry real‑time data synchronization and heartbeat traffic between the sites.
Active‑Active Application Deployment
Application clusters synchronize data over the large Layer 2 network. Common enterprise clusters include VMware, Hyper‑V, Oracle RAC, SQL Server on MSCS/WSFC, and IBM DB2 pureScale. Of these, Oracle RAC and pureScale are true active‑active clusters.
Active‑Active Arbitration Deployment
Third‑party arbitration is usually provided by an arbitration server for the storage cluster. It is a low‑cost component, and the arbitration server can itself run in a virtual machine protected by HA. EMC VMAX3 arbitration is one example; the author notes that this strategy may evolve following Dell's acquisition of EMC.
Third‑party arbitration does not strictly require a separate third site, but many customers prefer one for reliability. A priority‑site (preferred‑site) strategy can be used when the arbitration node has failed, but it carries a risk: if the priority site itself then fails, the non‑priority site cannot take over automatically. Third‑party arbitration is therefore the safer choice.
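The difference between the two arbitration modes can be sketched as a small decision function. This is only an illustrative model, not any vendor's algorithm: the names `SiteView`, `may_serve_io`, and `arbiter_mode` are hypothetical, and the "race" for the arbiter is reduced to a boolean.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SiteView:
    """What one site observes when it suspects a split (hypothetical model)."""
    name: str
    sees_peer: bool                  # inter-site heartbeat/replication link is up
    sees_arbiter: bool = False       # link to the third-party arbiter is up
    wins_arbiter_race: bool = False  # the arbiter granted this site the lock


def may_serve_io(site: SiteView, arbiter_mode: bool,
                 priority_site: Optional[str] = None) -> bool:
    """Return True if this site may keep accepting writes."""
    if site.sees_peer:
        return True  # no split: both sites stay active
    if arbiter_mode:
        # The arbiter lets exactly one side survive; a site that cannot
        # reach the arbiter must stop to avoid split-brain.
        return site.sees_arbiter and site.wins_arbiter_race
    # Static priority mode: only the preconfigured site continues.
    return site.name == priority_site


# Arbiter mode, inter-site link cut: exactly one site continues.
a = SiteView("A", sees_peer=False, sees_arbiter=True, wins_arbiter_race=True)
b = SiteView("B", sees_peer=False, sees_arbiter=True, wins_arbiter_race=False)
print(may_serve_io(a, arbiter_mode=True), may_serve_io(b, arbiter_mode=True))

# Static priority mode with priority site A down: B sees no peer and stops,
# which is the risk described above.
b_alone = SiteView("B", sees_peer=False)
print(may_serve_io(b_alone, arbiter_mode=False, priority_site="A"))
```

In the second scenario site B cannot distinguish "A has failed" from "the link to A has failed", which is why a static priority leaves it no safe option other than stopping.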
Server application clusters also need arbitration. For this, IP‑level (Layer 3) reachability is typically sufficient; the large Layer 2 network is not required.
External Access to Active‑Active Applications
End users access resources over the Internet; their requests pass through local caches, global DNS, and DNS resolution. Load balancing is achieved with GSLB (global server load balancing) and SLB (server load balancing), which synchronize IP resources between the data centers and can return the IP address with the lowest RTT to the client.
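The lowest‑RTT selection can be illustrated with a minimal sketch. It assumes the GSLB already holds a health flag and a measured RTT per site; the `SiteVip` type, the `resolve` function, and the addresses (taken from the documentation ranges) are hypothetical and do not correspond to any product API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SiteVip:
    site: str
    vip: str        # virtual IP published by that site's SLB
    healthy: bool   # result of the GSLB health check
    rtt_ms: float   # measured RTT toward the client's resolver


def resolve(vips: list[SiteVip]) -> Optional[str]:
    """Return the VIP of the healthy site with the lowest RTT, if any."""
    candidates = [v for v in vips if v.healthy]
    if not candidates:
        return None
    return min(candidates, key=lambda v: v.rtt_ms).vip


sites = [
    SiteVip("A", "192.0.2.10", healthy=True, rtt_ms=12.0),
    SiteVip("B", "198.51.100.10", healthy=True, rtt_ms=35.0),
]
print(resolve(sites))  # site A is closer

# If site A fails its health check, clients are steered to site B.
sites_a_down = [
    SiteVip("A", "192.0.2.10", healthy=False, rtt_ms=12.0),
    SiteVip("B", "198.51.100.10", healthy=True, rtt_ms=35.0),
]
print(resolve(sites_a_down))
```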
Gateway‑Based Active‑Active
Due to space constraints, detailed gateway designs are omitted, but the author references Huawei’s VIS storage‑gateway dual‑active solution (diagram shown below).
Gateway dual‑active adds hardware, increasing cost and potential failure points, and may become a performance bottleneck. Adding more gateway nodes (supported by VPLEX, SVC, VIS) mitigates this issue. Gateways also handle storage failover and data synchronization, reducing storage performance pressure by using volume mirroring.
Basic Technical Conditions for Active‑Active
Two essential conditions are required: (1) real‑time data replicas so that a failure on one side can be served by an identical copy, and (2) automatic failover and recovery of server, storage, and network clusters.
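As a rough illustration of these two conditions, the toy model below mirrors every write to both replicas before acknowledging it, keeps serving I/O from the survivor when one replica fails, and resynchronizes the changed blocks on recovery. All class and method names are hypothetical; real arrays and gateways implement this in firmware with far more state.

```python
class Replica:
    """One site's copy of the volume (toy in-memory model)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.blocks: dict[int, bytes] = {}
        self.online = True


class MirroredVolume:
    """Acknowledge a write only after every online replica holds it."""

    def __init__(self, a: Replica, b: Replica) -> None:
        self.replicas = [a, b]
        self.dirty: set[int] = set()  # blocks written while degraded

    def write(self, lba: int, data: bytes) -> None:
        online = [r for r in self.replicas if r.online]
        if not online:
            raise IOError("no replica available")
        for r in online:
            r.blocks[lba] = data
        if len(online) < len(self.replicas):
            self.dirty.add(lba)  # remember what the failed side missed

    def read(self, lba: int) -> bytes:
        for r in self.replicas:
            if r.online:
                return r.blocks[lba]
        raise IOError("no replica available")

    def recover(self, replica: Replica) -> None:
        """Bring a failed replica back and copy over the missed blocks."""
        source = next(r for r in self.replicas if r.online)
        for lba in self.dirty:
            replica.blocks[lba] = source.blocks[lba]
        replica.online = True
        self.dirty.clear()


site_a, site_b = Replica("A"), Replica("B")
vol = MirroredVolume(site_a, site_b)
vol.write(0, b"v1")          # lands on both sites
site_a.online = False        # site A fails
vol.write(0, b"v2")          # served by site B alone
print(vol.read(0))           # still readable from the surviving copy
vol.recover(site_a)          # site A returns and is resynchronized
print(site_a.blocks[0] == site_b.blocks[0])
```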
Application‑Layer Active‑Active
Examples include Oracle RAC, IBM GPFS, Symantec SVC, PowerHA HyperSwap, and Huawei VIS. The author links to a separate article on PowerHA HyperSwap. Below is an illustration of the IBM GPFS dual‑active solution.
IBM GPFS uses IO Failure Group technology for data replica protection and an active‑active cluster for failover. Application‑layer dual‑active is less common because creating volume mirrors and synchronizing data at the server level can heavily impact applications.
NAS‑Based Active‑Active
Most dual‑active solutions are SAN‑based due to the high performance and reliability requirements of workloads such as databases, ERP, and SAP. However, NAS dual‑active is possible (e.g., NetApp FAS, IBM GPFS) because some databases (Oracle RAC, IBM PureScale) can run directly on NAS.
Limitations and Requirements of Active‑Active
Dual‑active is the highest‑level disaster‑recovery solution and imposes strict requirements:
• Distance: Typically 100‑300 km between sites to maintain strong consistency; distances >30 km require DWDM optical repeaters, with a maximum of about 3000 km.
• Network: Low latency, sufficient bandwidth, and low bit‑error rate are essential to support real‑time replication.
• Performance: Both data centers must have comparable hardware capabilities; gateways must not become bottlenecks.
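The distance requirement follows from propagation delay, which a short calculation makes concrete. The sketch assumes roughly 5 µs of one‑way delay per kilometre of fiber (light travels at about 200,000 km/s in glass) and ignores switch, DWDM, and array processing time, so real figures will be higher. The function name and the `round_trips` parameter are illustrative; a write that needs more than one protocol round trip scales accordingly.

```python
FIBER_US_PER_KM = 5.0  # approximate one-way propagation delay in optical fiber


def replication_rtt_ms(distance_km: float, round_trips: int = 1) -> float:
    """Latency added to each synchronous write by inter-site propagation."""
    one_way_us = distance_km * FIBER_US_PER_KM
    return 2 * one_way_us * round_trips / 1000.0


for km in (30, 100, 300):
    print(f"{km:>4} km: +{replication_rtt_ms(km):.1f} ms per write, "
          f"+{replication_rtt_ms(km, round_trips=2):.1f} ms with two round trips")
```

At 100 km this adds about 1 ms to every synchronously replicated write, and at 300 km about 3 ms, before any equipment latency is counted. That overhead is why latency‑sensitive workloads push the sites closer together.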
Whether a deployment is true active‑active (both sites can read and write simultaneously) or only pseudo active‑active (in effect active‑passive) depends on both storage and application support. If the storage is active‑active but the application is not (e.g., VMware), the overall solution behaves as active‑passive.
Multi‑path: Storage‑based dual‑active usually relies on host multipathing software. VMware provides the PSA (Pluggable Storage Architecture) interface so that vendors can plug in their own multipath modules; other platforms, such as XenServer, lack such an interface and rely on native multipath support (e.g., ALUA).
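A simplified view of ALUA‑style path selection is sketched below. It assumes the stretched volume reports paths to the local array as active/optimized and paths to the remote array as active/non‑optimized, which is a common arrangement but not a universal one. The types and the `usable_paths` function are hypothetical and are not the interface of any multipath driver.

```python
from dataclasses import dataclass
from enum import Enum


class AluaState(Enum):
    ACTIVE_OPTIMIZED = "active/optimized"
    ACTIVE_NON_OPTIMIZED = "active/non-optimized"
    STANDBY = "standby"
    UNAVAILABLE = "unavailable"


@dataclass(frozen=True)
class Path:
    name: str
    state: AluaState
    alive: bool = True  # link-level health as seen by the host


def usable_paths(paths: list[Path]) -> list[Path]:
    """Prefer live optimized paths; fall back to live non-optimized ones."""
    for wanted in (AluaState.ACTIVE_OPTIMIZED, AluaState.ACTIVE_NON_OPTIMIZED):
        group = [p for p in paths if p.alive and p.state is wanted]
        if group:
            return group
    return []


paths = [
    Path("local-1", AluaState.ACTIVE_OPTIMIZED),
    Path("local-2", AluaState.ACTIVE_OPTIMIZED),
    Path("remote-1", AluaState.ACTIVE_NON_OPTIMIZED),
]
print([p.name for p in usable_paths(paths)])  # local paths only

# Local array lost: I/O continues over the cross-site path.
after_failure = [
    Path("local-1", AluaState.ACTIVE_OPTIMIZED, alive=False),
    Path("local-2", AluaState.ACTIVE_OPTIMIZED, alive=False),
    Path("remote-1", AluaState.ACTIVE_NON_OPTIMIZED),
]
print([p.name for p in usable_paths(after_failure)])
```

Keeping I/O on the local paths while they are healthy avoids paying the inter‑site round trip on every request; the remote paths are used only after a local failure.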
In conclusion, dual‑active encompasses end‑to‑end active‑active for applications, networks, storage, and data. Application clusters can be built on top of virtualized environments (e.g., Oracle RAC on VMware VMs). The author invites further discussion and support for deeper technical exploration.
Architects' Tech Alliance
Sharing project experiences, insights into cutting-edge architectures, focusing on cloud computing, microservices, big data, hyper-convergence, storage, data protection, artificial intelligence, industry practices and solutions.