XPipe: Multi‑Data‑Center Redis Replication, High Availability and Disaster‑Recovery Switching
This article introduces XPipe, a framework designed by Ctrip to enable Redis multi‑data‑center replication, ensure high availability through keeper and MetaServer components, and provide reliable disaster‑recovery (DR) switching while maintaining low latency and data consistency.
Author: Meng Wenchao, Senior Manager of Framework R&D at Ctrip Technology Center. He joined Ctrip in 2016 and leads the Redis multi‑data‑center project XPipe; he previously led the communication team at Dianping.
Redis is extensively used within Ctrip, handling about 2 million QPS read/write operations, with roughly 100,000 QPS writes, and many services treat Redis as an in‑memory database.
To improve availability and performance, Ctrip requires multi‑data‑center Redis deployment, prompting the creation of XPipe.
XPipe addresses three core challenges: data replication with consistency, high availability of both XPipe and Redis, and disaster‑recovery (DR) switching when a data center fails.
For data replication, client‑side double‑write was examined first, but it leads to inconsistency whenever a write succeeds in one data center and fails in another. A proxy server acting as a single client avoids that split, but adds complexity and a potential single point of failure. XPipe therefore adopts a pseudo‑slave, the "keeper", which presents itself to the master as a Redis slave: the master pushes its replication log to the keeper, which buffers it on disk and can compress or encrypt the traffic sent between data centers.
The keeper design enables reliable cross‑DC log transmission, supports custom protocols for compression and encryption, and mitigates data loss during network outages.
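The core of the keeper idea is a disk-backed buffer that persists the master's replication stream and can serve it to a downstream replica from any offset, which is what makes partial resynchronization after an outage possible. The sketch below is illustrative only; `KeeperBuffer` and its methods are hypothetical names, not XPipe's actual API.

```python
import os
import tempfile


class KeeperBuffer:
    """Minimal sketch of a keeper's disk-backed replication log:
    persist the master's stream, track the replication offset, and
    serve a downstream consumer (e.g. the remote DC) from any offset."""

    def __init__(self, path):
        self.path = path
        self.offset = 0                 # bytes of the stream persisted so far
        self._f = open(path, "ab")

    def append(self, chunk: bytes) -> int:
        """Persist one chunk of the replication stream; return the new offset."""
        self._f.write(chunk)
        self._f.flush()
        os.fsync(self._f.fileno())      # survive a keeper crash
        self.offset += len(chunk)
        return self.offset

    def read_from(self, offset: int) -> bytes:
        """Replay the buffered log from an arbitrary offset, enabling a
        partial resync instead of a full one after a network outage."""
        with open(self.path, "rb") as f:
            f.seek(offset)
            return f.read()
```

Because the log survives on disk, a cross‑DC link failure only delays transmission rather than forcing the remote side back to a full sync.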
High availability is achieved by deploying each keeper as a master‑backup pair: a MetaServer monitors keeper status, promotes the backup to master when the active keeper fails, and load is balanced across the MetaServer nodes themselves. Redis Sentinel is still used for Redis‑level failover, but XPipe implements its own psync2.0‑style protocol on top of Redis 3.0.7 so that promoting a slave to master does not trigger a full sync and the pause it would cause.
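The keeper failover logic can be sketched as a heartbeat timeout check: if the active keeper of a shard goes silent past a threshold, the MetaServer swaps in the backup. This is a simplified model under assumed semantics; the class and method names are illustrative, not XPipe's real interfaces.

```python
class MetaServer:
    """Sketch of MetaServer-driven keeper failover: each shard has an
    active keeper and a backup; a missed-heartbeat timeout promotes
    the backup. Times are passed in explicitly to keep it testable."""

    def __init__(self, timeout=3.0):
        self.timeout = timeout
        self.shards = {}  # shard -> {"active", "backup", "last_beat"}

    def register(self, shard, active, backup, now):
        self.shards[shard] = {"active": active, "backup": backup,
                              "last_beat": now}

    def heartbeat(self, shard, keeper, now):
        """Record a heartbeat, but only from the current active keeper."""
        s = self.shards[shard]
        if keeper == s["active"]:
            s["last_beat"] = now

    def check(self, shard, now):
        """Promote the backup if the active keeper has gone silent
        longer than the timeout; return the current active keeper."""
        s = self.shards[shard]
        if now - s["last_beat"] > self.timeout:
            s["active"], s["backup"] = s["backup"], s["active"]
            s["last_beat"] = now
        return s["active"]
```

A real MetaServer would also persist this state and coordinate with its peers so that a MetaServer failure does not lose shard assignments.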
DR switching follows a four‑step process similar to a two‑phase commit: (1) verify that the switch is feasible, (2) forbid writes on the old master, (3) promote the new master, and (4) repoint the other data centers to the new master. Rollback and retry mechanisms are provided so that a DBA can intervene manually if a step fails.
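The two-phase-commit flavour of this flow comes from pairing every step with an undo action: if any step fails, the completed steps are rolled back in reverse order, leaving the cluster in its pre-switch state for the DBA. A minimal sketch, with hypothetical step names:

```python
def dr_switch(steps):
    """Run (do, undo) pairs in order, mirroring the four-step DR switch:
    1. check feasibility   2. forbid writes on the old master
    3. promote new master  4. repoint the other data centers.
    If any 'do' raises, undo the completed steps in reverse order."""
    undo_stack = []
    for do, undo in steps:
        try:
            do()
        except Exception:
            for undo_done in reversed(undo_stack):
                undo_done()
            return False
        undo_stack.append(undo)
    return True


def failing_promotion():
    """Stand-in for a promotion step that fails mid-switch."""
    raise RuntimeError("promotion failed")
```

In production the "do" and "undo" callables would issue real commands (e.g. fencing writes, `SLAVEOF NO ONE`, updating routing metadata); retrying a single failed step instead of rolling everything back is the other escape hatch the article mentions.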
The overall architecture consists of a Console for meta‑information management, keepers for log buffering and cross‑DC transfer, and MetaServers for keeper coordination.
Testing shows that adding a keeper adds only ~0.1 ms latency (master‑to‑slave 0.2 ms → with keeper 0.3 ms). In production, two data centers with two keeper layers exhibit an average latency of 0.8 ms (99.9th percentile 2 ms), well within acceptable limits.
In summary, XPipe solves Redis multi‑data‑center data synchronization and DR switching, while the enhanced Redis version with psync2.0 greatly improves cluster stability.
All components are open‑source: XPipe ( https://github.com/ctripcorp/x-pipe ) and XRedis (enhanced Redis 3.0.7, https://github.com/ctripcorp/redis ).