Design and Implementation of Same‑City Dual‑Active Architecture for a Transaction Platform

This article details a same‑city dual‑active architecture for a high‑traffic transaction platform. It combines blue‑green and dual‑cluster deployment with zone‑aware routing, middleware transformations, and a gradual traffic‑coloring release process. The rollout achieved a near‑50/50 traffic split with stable performance and minimal added cost; the article also outlines the challenges that remain.

DeWu Technology

This article documents the design, implementation, and operational experience of a same‑city dual‑active (双活) solution for a high‑traffic transaction platform.

Background : Frequent large‑scale outages in the industry prompted the team to build a rapid‑recovery capability by deploying two active clusters in different availability zones of the same city.

Design Overview : The solution combines blue‑green deployment with dual‑cluster deployment. Application services are split into logical blue and green clusters. Data stores (DB/Redis/HBase) keep a single authoritative copy, with a master in one zone replicating to a slave in the other. Traffic is switched dynamically at the DNS/SLB/DAG layer according to user ID and flow‑percentage policies.
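As a rough illustration of how a user-ID and flow-percentage policy could assign a color, the sketch below hashes the user ID into a fixed number of buckets and compares the bucket with the configured green percentage. The function names, the bucket count of 100, and the choice of SHA-256 are assumptions made for this example; the article does not describe the actual DAG rule.

```python
import hashlib

BUCKETS = 100  # assumed resolution of the flow-percentage policy


def user_bucket(user_id: str) -> int:
    """Map a user ID to a stable bucket in [0, BUCKETS)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % BUCKETS


def assign_color(user_id: str, green_percent: int) -> str:
    """Return "green" for roughly green_percent % of users, else "blue".

    The same user always lands in the same bucket, so raising the
    percentage only moves users from blue to green, never back.
    """
    if not 0 <= green_percent <= 100:
        raise ValueError("green_percent must be between 0 and 100")
    return "green" if user_bucket(user_id) < green_percent else "blue"


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(10_000)]
    greens = sum(assign_color(u, 50) == "green" for u in users)
    print(f"green share at 50%: {greens / len(users):.2%}")
```

Hashing rather than random assignment keeps each user pinned to one color between requests, which is what makes a gradual percentage increase predictable.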

Four‑Layer Architecture :

Access layer – DNS, primary‑backup SLB, DLB, DAG for zone‑aware routing.

Application layer – blue/green logical clusters with traffic‑coloring.

Middleware layer – zone‑aware deployment and data‑sync strategies for each component.

Data layer – single data copy with cross‑zone master‑slave replication.

Key Middleware Transformations :

DLB – a stateless custom traffic gateway deployed symmetrically in both zones; failed nodes are removed from SLB endpoints within seconds.

Rainbow Bridge – a self‑developed distributed relational‑database proxy; traffic is switched manually, with a minute‑level recovery time objective (RTO).

DMQ – broker‑level sharding across zones; blue queues occupy the first half of broker partitions, green queues the second half; producers and consumers are bound to the corresponding colored queues.

Kafka – the ZooKeeper ensemble that Kafka depends on is deployed across three zones so that its ZAB quorum survives the loss of one zone; partition leaders fail over when a zone goes down.

Elasticsearch – multi‑zone deployment with separate data and master nodes; masters span at least three zones to keep quorum.

Service Registry – a custom Raft‑based registration center with three‑zone node distribution.
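The three-zone placement used for the quorum-based components above (ZooKeeper, Elasticsearch masters, the Raft registry) follows from simple majority arithmetic: a cluster of n voters needs n // 2 + 1 of them alive. The sketch below checks whether a given node-to-zone layout keeps a majority after any single zone is lost. The function names and zone labels are hypothetical and serve only to illustrate the reasoning.

```python
from collections import Counter
from typing import Sequence


def majority(total_nodes: int) -> int:
    """Smallest number of voters that forms a majority."""
    return total_nodes // 2 + 1


def survives_zone_loss(node_zones: Sequence[str]) -> bool:
    """True if a majority of voters remains after losing any one zone.

    node_zones holds one zone label per voting node.
    """
    if not node_zones:
        return False
    needed = majority(len(node_zones))
    per_zone = Counter(node_zones)
    return all(len(node_zones) - lost >= needed for lost in per_zone.values())


if __name__ == "__main__":
    print(survives_zone_loss(["a", "b", "c"]))            # one voter per zone
    print(survives_zone_loss(["a", "a", "b", "b"]))       # two zones only
    print(survives_zone_loss(["a", "a", "b", "b", "c"]))  # 2 + 2 + 1
```

With only two zones, whichever zone holds at least half of the voters takes the quorum down with it, which is why these components need a third zone even though the application layer has only two colors.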

Traffic Allocation Strategy :

RPC traffic is colored at the DAG level; each request carries a blue/green identifier, and downstream services route to the same‑zone instances.

MQ traffic proportion follows the RPC traffic proportion because producers are colored; adjustments propagate to consumer load with a 5‑10 s lag.
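The two routing rules can be sketched together: an RPC caller prefers instances that share the request's color, and an MQ producer or consumer is limited to its color's half of the queues (blue the first half, green the second, as described for DMQ). The `Instance` type and function names are hypothetical. Falling back to all instances when no same-color instance exists, and giving green the extra queue when the count is odd, are assumptions of this sketch and are not stated in the article.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Instance:
    address: str
    color: str  # "blue" or "green", derived from the machine's zone tag


def pick_candidates(color: str, instances: Sequence[Instance]) -> List[Instance]:
    """Prefer instances whose color matches the request's color."""
    same_color = [inst for inst in instances if inst.color == color]
    # Assumption: if the colored cluster has no instance, use any instance
    # instead of failing the call.
    return same_color if same_color else list(instances)


def queues_for_color(color: str, queue_count: int) -> range:
    """Queue indexes a colored producer or consumer may use."""
    half = queue_count // 2
    if color == "blue":
        return range(0, half)
    if color == "green":
        return range(half, queue_count)
    raise ValueError(f"unknown color: {color!r}")


if __name__ == "__main__":
    pool = [Instance("10.0.0.1", "blue"), Instance("10.1.0.1", "green")]
    print(pick_candidates("green", pool))
    print(list(queues_for_color("blue", 8)), list(queues_for_color("green", 8)))
```

Because a producer only writes to its own color's queues, shifting the RPC split shifts the message split with it, and consumers see the change once the already-queued messages drain.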

Release Process :

Pre‑deployment preparation – environment variables tag machines with zone identifiers.

Development & verification – upgrade service jars, integrate blue‑green release components, build test environments for dual‑active flow.

Online preparation & launch – gradual traffic shift, monitor RT and error rates, expand green cluster to 100 % after validation.
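The gradual shift in the launch step can be pictured as a loop that raises the green percentage one step at a time and stops when RT or the error rate crosses a limit. The sketch below is only illustrative: the step values, thresholds, the `Health` tuple, and the two callbacks are assumptions, and stepping back to the last healthy percentage is one possible rollback policy that the article does not confirm.

```python
from typing import Callable, NamedTuple, Sequence


class Health(NamedTuple):
    rt_ms: float       # observed response time in milliseconds
    error_rate: float  # fraction of failed requests, 0.0 to 1.0


def ramp_green(
    steps: Sequence[int],
    apply_percent: Callable[[int], None],
    read_health: Callable[[], Health],
    max_rt_ms: float,
    max_error_rate: float,
    start_percent: int = 0,
) -> int:
    """Raise the green share step by step; return the final percentage.

    After each step the health reading is compared with the limits. On a
    breach the last healthy percentage is re-applied and the ramp stops.
    """
    last_good = start_percent
    for percent in steps:
        apply_percent(percent)
        health = read_health()  # a real caller would wait for metrics to settle
        if health.rt_ms > max_rt_ms or health.error_rate > max_error_rate:
            apply_percent(last_good)
            return last_good
        last_good = percent
    return last_good


if __name__ == "__main__":
    applied = []
    final = ramp_green(
        steps=[1, 10, 30, 50],
        apply_percent=applied.append,
        read_health=lambda: Health(rt_ms=25.0, error_rate=0.001),
        max_rt_ms=50.0,
        max_error_rate=0.01,
    )
    print(final, applied)
```

In practice the observation window for each step would need to be longer than the 5‑10 s it takes MQ consumer load to follow a change in the RPC split.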

Results :

Traffic split achieved roughly 50:50 between zones; core metrics (QPS, latency, error rate) remained stable.

Response time (RT) rose by 7‑8 ms on cross‑zone data calls, because the data layer keeps a single copy and requests from the other zone must cross zones to reach it.

Cost impact was minimal; after the parallel phase, resources in the original zone were scaled down.

New Challenges & Future Work :

Blue‑green traffic imbalance when downstream services are not in the release channel.

Further RT optimization for cross‑zone data access.

Ensuring that the container‑orchestration platform can scale across zones during a zone outage.

Resource availability for rapid scaling during peak periods.

Coordinated dual‑active flow between multiple large domains (e.g., transaction and search).

The article concludes with reflections on continuous improvement and invites readers to follow the technical series for deeper insights.
