
Design and Application of Alibaba's Data Replication Center (DRC) for Active‑Active Scenarios

The article presents an overview of Alibaba's Data Replication Center (DRC), detailing its architecture, real‑time cross‑region synchronization capabilities, consistency and latency guarantees, deployment strategies, and its use cases on Alibaba Cloud such as RDS migration and multi‑active e‑commerce workloads.

Alibaba Cloud Infrastructure

This article briefly introduces the design and application of DRC (Data Replication Center) in active‑active scenarios, as presented at QCon, and also discusses how DRC is used on Alibaba Cloud.

DRC is a real‑time data flow service for data synchronization and distribution, independently developed by Alibaba's database team. It has been in production for several years and successfully supported 57.1 billion transactions across regions during the 2014 Double‑11 shopping festival.

DRC reads incremental data in real time from sources such as MySQL, RDS, OceanBase, HBase, and Oracle, and can write to databases, MetaQ, ODPS, and other storage media. It also provides migration‑to‑cloud services, currently supporting cross‑region RDS synchronization, real‑time binlog subscription, and seven‑network isolation data subscription.

First, a brief introduction to what DRC is.

DRC architecture diagram:

Alibaba's next‑generation architecture for active‑active deployment achieves hot‑plug data center capability, enabling real‑time traffic switching and data recovery.

Illustration of Alibaba's active‑active implementation.

The data challenges of active‑active deployment are significant. During Double‑11, transaction volume spikes, so the transaction chain is modularized into three dimensions: buyer, seller, and product. Buyers are isolated per module due to low cross‑interaction and high latency sensitivity, while sellers and products are written centrally.
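The split described above can be illustrated with a small routing sketch. This is not DRC's actual routing logic: the module names, the modulo rule, and the `route_write` function are assumptions made only to show the idea that buyer data stays inside one module while seller and product data is written at the central site.

```python
# Hypothetical site names; the article does not name the real modules.
MODULES = ["module-a", "module-b", "module-c"]
CENTRAL = "central"


def route_write(dimension: str, entity_id: int) -> str:
    """Pick the site that accepts a write for the given data dimension."""
    if dimension == "buyer":
        # Assumption: a buyer is pinned to one module by a simple modulo rule,
        # so every request of that buyer is served by the same module.
        return MODULES[entity_id % len(MODULES)]
    if dimension in ("seller", "product"):
        # Seller and product data is written centrally.
        return CENTRAL
    raise ValueError(f"unknown dimension: {dimension}")
```

With this rule, buyer 4 always lands on the same module, while any seller or product write goes to the central site.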

Two core data requirements are consistency (no data loss, no errors, transactional guarantees) and real‑time performance (sub‑second latency).

Two synchronization architectures are described: (1) read‑write separation, where the central site accepts writes and the modules serve reads, requiring sub‑second delay and strong consistency; (2) a module‑closed architecture, where both the central site and the modules can write, using redundancy so that either side can take over.

Key points include avoiding circular replication through DB‑level transaction tagging, and flow control for peak traffic, with 100% of data synchronized in both directions.
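To make the circular‑replication point concrete, here is a minimal sketch of origin tagging between two sites. It is an illustration under assumptions, not DRC's implementation: the `Transaction`, `Site`, and `replicate` names are hypothetical, and the tag is modeled as a plain `origin` field rather than whatever marker DRC writes at the database level.

```python
from dataclasses import dataclass, field


@dataclass
class Transaction:
    origin: str        # site where the transaction was first committed (the tag)
    statements: list


@dataclass
class Site:
    name: str
    applied: list = field(default_factory=list)  # transactions visible at this site
    binlog: list = field(default_factory=list)   # change log read by the replicator

    def commit_local(self, statements):
        """Commit a user transaction and tag it with this site as origin."""
        txn = Transaction(origin=self.name, statements=statements)
        self.applied.append(txn)
        self.binlog.append(txn)
        return txn

    def apply_replicated(self, txn):
        """Apply a remote transaction; it is logged but keeps its original tag."""
        self.applied.append(txn)
        self.binlog.append(txn)


def replicate(source: Site, target: Site, position: int = 0) -> int:
    """Ship log entries from `position` onward and return the new position."""
    for txn in source.binlog[position:]:
        # Entries that originated at the target are dropped here;
        # without this check they would bounce between the sites forever.
        if txn.origin != target.name:
            target.apply_replicated(txn)
    return len(source.binlog)
```

A transaction committed at site A reaches site B once, appears in B's log with origin A, and is filtered out when B's log is shipped back to A.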

DRC's core capabilities in active‑active scenarios are low latency, strong consistency, and high availability.

Consistency is ensured through ordered pipelines, binary log integrity checks, multi‑storage verification, adaptive character set handling, and a transaction‑conflict checking mechanism (DRCcongo) that uses directed graphs for concurrency control.
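The article does not describe how DRCcongo builds or uses its directed graphs, so the following is only a generic sketch of the technique: transactions that touch the same row get an edge from the earlier to the later one, and transactions with no path between them can be applied concurrently. The function names and the input shape (transaction id plus the set of row keys it writes) are assumptions.

```python
def conflict_edges(txns):
    """Map each transaction id to the earlier transaction ids it must wait for.

    `txns` is a list of (txn_id, row_keys) pairs in commit order.
    """
    last_writer = {}   # row key -> id of the latest transaction that wrote it
    deps = {}
    for txn_id, keys in txns:
        deps[txn_id] = {last_writer[k] for k in keys if k in last_writer}
        for k in keys:
            last_writer[k] = txn_id
    return deps


def parallel_waves(txns):
    """Group transactions into waves; each wave can be applied concurrently."""
    deps = conflict_edges(txns)
    level = {}
    for txn_id, _ in txns:
        # A transaction runs one wave after the latest transaction it depends on.
        level[txn_id] = 1 + max((level[d] for d in deps[txn_id]), default=0)
    waves = {}
    for txn_id, _ in txns:
        waves.setdefault(level[txn_id], []).append(txn_id)
    return [waves[lvl] for lvl in sorted(waves)]
```

For example, if `t1` writes row `a`, `t2` writes row `b`, `t3` writes both, and `t4` writes row `c`, then `t1`, `t2`, and `t4` form the first wave and `t3` waits for the second.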

Module latency must stay within one second; otherwise user experience degrades. Strategies to achieve this include optimal deployment with DBA‑tested environments, network‑level transmission optimizations (multi‑connection multiplexing, compression, batch sending), and ongoing protocol improvements.
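Two of the transmission optimizations mentioned above, batch sending and compression, can be sketched with the Python standard library. DRC's real wire protocol is not described in the article; the JSON encoding, the `zlib` codec, and the function names below are stand‑ins chosen for illustration.

```python
import json
import zlib


def batches(records, max_records):
    """Split a list of change records into batches of at most `max_records`."""
    for start in range(0, len(records), max_records):
        yield records[start:start + max_records]


def pack_batch(records):
    """Serialize one batch and compress it into a single payload."""
    payload = json.dumps(records, separators=(",", ":")).encode("utf-8")
    return zlib.compress(payload)


def unpack_batch(blob):
    """Reverse of pack_batch, as the receiving side would run it."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))
```

Change records tend to repeat table and column names, so compressing a whole batch usually saves far more bytes than compressing records one at a time, and one send per batch cuts the number of round trips.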

DRC store employs a three‑tier storage mechanism (cache, local, distributed). The local DRCqueue supports up to 50 k QPS. Real‑time data resides mainly in cache for high read efficiency, while historical or delayed data uses pre‑read IO optimization and index tuning. Backup stores enhance disaster recovery and relieve downstream pressure.
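The read path through the three tiers can be sketched as follows. This is a toy in‑memory model under assumptions: the class name, the size limits, and the age‑based demotion rule are hypothetical, and plain dictionaries stand in for the real cache, the local DRCqueue, and the distributed store.

```python
from collections import OrderedDict


class TieredStore:
    """Toy three-tier log store: newest records in cache, older ones further down."""

    def __init__(self, cache_size, local_size):
        self.cache_size = cache_size
        self.local_size = local_size
        self.cache = OrderedDict()     # hot tier, newest records only
        self.local = OrderedDict()     # stand-in for the local queue
        self.distributed = {}          # stand-in for the distributed tier

    def append(self, offset, record):
        self.cache[offset] = record
        self.local[offset] = record
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)          # drop oldest cached entry
        if len(self.local) > self.local_size:
            old_offset, old_record = self.local.popitem(last=False)
            self.distributed[old_offset] = old_record   # demote oldest record

    def read(self, offset):
        """Return (tier_name, record), checking the fastest tier first."""
        tiers = (("cache", self.cache), ("local", self.local),
                 ("distributed", self.distributed))
        for name, tier in tiers:
            if offset in tier:
                return name, tier[offset]
        raise KeyError(offset)
```

A consumer that keeps up reads almost entirely from the cache tier; a lagging consumer falls through to the local and then the distributed tier.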

Concurrency is handled by allowing parallel writes to databases while serializing hotspot writes; data stability is maintained through hotspot caching and concurrent parsing.
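One common way to get parallel writes while keeping hotspot rows serialized is to hash each row key to a fixed lane and let one worker drain each lane in order. The article does not say that DRC uses this exact scheme; the sketch below, including the `assign_lanes` and `apply_parallel` names and the CRC32 hash, is an assumption used to show the principle.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def assign_lanes(writes, lane_count):
    """Distribute (row_key, value) writes into lanes, preserving commit order.

    All writes to the same row land in the same lane, so a hot row is
    always updated serially even though lanes run in parallel.
    """
    lanes = [[] for _ in range(lane_count)]
    for row_key, value in writes:
        lane = zlib.crc32(row_key.encode("utf-8")) % lane_count
        lanes[lane].append((row_key, value))
    return lanes


def apply_parallel(writes, lane_count):
    """Apply writes to an in-memory table with one worker per lane."""
    table = {}

    def drain(lane):
        for row_key, value in lane:
            table[row_key] = value   # each key is only ever touched by one lane

    with ThreadPoolExecutor(max_workers=lane_count) as pool:
        list(pool.map(drain, assign_lanes(writes, lane_count)))
    return table
```

Because a hot row never spans two lanes, its final value is always the last committed write, regardless of how the lanes interleave.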

Additional reliability features include immediate detection of primary‑standby switches, automatic task restart on failure, machine failover, dual‑part meta and data disaster recovery, isolation of core versus non‑core workloads, and downstream isolation for slow consumers.

Monitoring is critical; product teams provide heartbeats and metrics, while operations ensure latency alerts.
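A heartbeat‑based delay check might look like the sketch below. The article does not specify how heartbeats are written or how alerts fire, so the function names and the alert rule are assumptions; only the one‑second threshold comes from the latency target stated earlier.

```python
DELAY_THRESHOLD_SECONDS = 1.0  # one-second target from the article; alert rule is assumed


def replication_delay(written_at, observed_at):
    """Delay between a heartbeat being written upstream and seen downstream."""
    return max(0.0, observed_at - written_at)


def check_heartbeat(written_at, observed_at, threshold=DELAY_THRESHOLD_SECONDS):
    """Return ("ok" | "alert", delay_in_seconds) for one heartbeat."""
    delay = replication_delay(written_at, observed_at)
    status = "alert" if delay > threshold else "ok"
    return status, delay
```

Timestamps are plain seconds here; in practice both sides would need synchronized clocks for the measured delay to be meaningful.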

During the 2014 Double‑11 event, DRC processed incremental data from 2,000 instances, handling a peak of 30 GB per second and supporting over 17,000 downstream real‑time subscriptions, with no latency for core database synchronization.

Within the Xiaowei ecosystem, DRC supports cross‑domain data synchronization for OB 1.0.

At the group level, DRC enables various downstream consumers to subscribe to real‑time incremental data.

On the cloud, DRC has provided migration‑to‑cloud services for a year and a half, with over 300 RDS instances using the service weekly for full and incremental data migration, enabling seamless application cut‑over. The database team also offers an upgrade service (DTS) that has been running on the cloud for a month.

DRC continues to add convenient services for more RDS users, such as non‑intrusive cross‑domain synchronization (RDS replication) and RDS binlog services that support both RDS and DRDS, allowing internal customers like OpenSearch, CDP, Jingwei, and Wangjuba to subscribe to new data as easily as turning on a faucet.
