
Design of Ant Financial's Logical Data Center (LDC) and Unitization for High‑TPS Payments

The article explains how Ant Financial’s Logical Data Center (LDC) and unit‑based architecture, combined with sharding, CAP analysis, and OceanBase’s Paxos‑based consensus, enable the payment platform to sustain hundreds of thousands of transactions per second during Double‑11 events while ensuring high availability and disaster recovery.


Since the first Double‑11 in 2009, Ant Financial has continuously pushed the limits of its technology to handle the festival’s massive traffic spikes. The payment peak grew from roughly 20,000 transactions per minute in 2010 to 540,000 transactions per second in 2019, about 1,360 times the 2009 peak.

The key to achieving such throughput is not only application‑level optimizations like traffic shaping, but fundamentally the logical data center (LDC) architecture based on sharding and unitization.

This article does not dive into code‑level details; instead it provides a simple description of the most important principles.

Typical questions that distributed‑system designers face include:

What is the core design behind Alipay’s massive payment system?

What is LDC and how does it achieve multi‑active and disaster‑recovery across regions?

What does the CAP theorem really mean? How should we interpret "P"?

What is a split‑brain situation and how does it relate to CAP?

What is Paxos and what problem does it solve?

How do Paxos and CAP interact? Can Paxos escape the CAP constraints?

Can OceanBase escape the CAP constraints?

If you are interested, the following sections present a straightforward discussion without obscure terminology.

LDC and Unitization

LDC (Logical Data Center) is a logical counterpart to the traditional Internet Data Center (IDC). Regardless of the physical distribution, the whole data center behaves as a coordinated and unified logical entity.

This concept implicitly addresses two core challenges of distributed systems: overall coordination (availability, partition tolerance) and logical unity (consistency).

Unitization is an inevitable trend for large‑scale internet systems. A simple example illustrates the idea.

Most internet companies (e.g., Taobao, Ctrip, Sina) top out at a peak throughput in the hundreds of thousands of transactions per second (TPS) at best. Scaling beyond that is difficult because the database storage layer becomes the bottleneck; adding more application servers cannot bypass it.

In contrast, the combined TPS of all e‑commerce platforms worldwide can easily reach billions because each company operates as an independent unit serving its own users.

Therefore, to multiply a single company’s capacity, the natural path is to split the system into many independent units – a classic divide‑and‑conquer approach. Each unit handles a subset of users, and the whole system’s TPS is the sum of the TPS of all units.
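The divide‑and‑conquer arithmetic above can be sketched in a few lines. This is a minimal illustration; the unit count, per‑unit TPS figures, and the modulo assignment rule are assumptions for the example, not Ant’s actual numbers or scheme.

```python
# Each unit independently serves a slice of users, so system-wide capacity
# is additive across units. Figures below are illustrative only.

def unit_for_user(user_id: int, num_units: int) -> int:
    """Assign a user to a unit by user-ID modulo (one common scheme)."""
    return user_id % num_units

def total_tps(per_unit_tps: int, num_units: int) -> int:
    """Overall TPS is the sum of the independent units' TPS."""
    return per_unit_tps * num_units

assert unit_for_user(1007, 4) == 3
print(total_tps(100_000, 10))  # ten 100k-TPS units -> 1,000,000 TPS overall
```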

In Ant’s internal terminology, LDC and unitization are inseparable; the unitized system design is what makes LDC possible.

Key takeaway: Sharding (splitting databases and tables) removes the single‑point bottleneck caused by I/O limits. Unitization is a deployment pattern built on top of sharding, providing strong isolation and disaster‑recovery benefits.
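To make the sharding idea concrete, here is a minimal routing sketch. The 10‑databases × 10‑tables layout and the digit‑based rule are assumptions chosen for illustration, not Ant’s real shard map.

```python
# Illustrative sharding rule: the last two digits of the user ID pick the
# shard -- the tens digit selects the database, the units digit the table,
# giving 10 x 10 = 100 shards. (Assumed layout, for illustration only.)

def locate_shard(user_id: int) -> tuple[str, str]:
    db_index = (user_id // 10) % 10   # tens digit picks the database
    table_index = user_id % 10        # units digit picks the table
    return f"db_{db_index}", f"orders_{table_index}"

print(locate_shard(1234567895))  # ('db_9', 'orders_5')
```

Because the rule depends only on the user ID, any gateway can compute the shard location without consulting the database layer.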

Evolution of System Architecture

Most internet companies evolve through similar architectural stages.

1. Early stage – a single monolithic application runs on one server. This creates a clear single‑point failure.


2. Horizontal scaling – multiple machines run the same application, sharing the load. This introduces the first distributed layer.


While CPU usage improves, the database remains a bottleneck, prompting the adoption of master‑slave clusters.


Read traffic can be offloaded to slaves, but write traffic still hits the master, creating a new bottleneck as traffic grows.

3. Sharding (database‑level partitioning) appears as the solution.


Horizontal sharding splits tables across multiple databases; vertical sharding separates tables by business function, often coinciding with micro‑service boundaries.

Even with sharding, the number of database connections grows dramatically because each application instance may need to connect to many shards, leading to a Cartesian‑product of connections.

To mitigate this, the routing logic is moved from the database layer to the gateway layer, ensuring that a request for user A is always directed to the shard that holds A’s data, eliminating unnecessary cross‑shard connections.

When services need to interact (e.g., A transfers money to B), the communication shifts from database links to RPC calls (e.g., Dubbo over TCP), allowing connection reuse and reducing overhead compared to traditional database connections.


Packaging the entire system as a unitized deployment means each unit’s data flow starts and ends within that unit, enabling easy deployment across any data center while preserving logical unity.

Example of a three‑region, five‑data‑center deployment:


Ant’s Unitized Architecture Practice

Alipay, China’s largest payment platform, reaches hundreds of thousands of TPS during Double‑11, and demand keeps growing, forcing the unitized architecture to span multiple data centers.

To guarantee high availability, Alipay adopts a three‑zone (CRG) model:

RZone (Region Zone): The smallest deployable unit that can be sharded. Each RZone connects to its own database and handles a specific user ID range.

GZone (Global Zone): A single global unit that stores non‑shardable data such as system configuration. It is deployed in multiple regions for disaster recovery, but only one instance serves traffic at any time.

CZone (City Zone): Deployed per city, it hosts data that exhibit a "write‑read time lag" (e.g., user accounts, membership services). RZones read from the local CZone instead of the remote GZone, reducing cross‑region latency.

The "write‑read time lag" observation means many data items are written once and accessed much later, allowing them to be placed in a locally cached CZone.

Data can be classified into two categories based on user scope:

User‑flow data: Orders, comments, behavior logs – naturally isolated per user, suitable for sharding and independent deployment.

User‑shared data: Accounts, personal blogs – accessed by many users; also system‑wide data: products, configuration, financial statistics – not tied to a specific user.

Mapping these categories to zones: RZone handles user‑flow data, GZone stores user‑shared and system‑wide data. CZone is an Ant‑specific innovation that provides local read access for data with a write‑read lag.
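The category‑to‑zone mapping above can be expressed as a small lookup. The data‑type names below are hypothetical examples of each category, not Ant’s internal identifiers.

```python
# Hypothetical classifier mapping the data categories above to CRG zones.
DATA_ZONE = {
    "order": "RZone",          # user-flow data: naturally shardable per user
    "behavior_log": "RZone",
    "system_config": "GZone",  # system-wide, non-shardable
    "product_catalog": "GZone",
    "user_account": "CZone",   # write-read time lag: served from the local city
    "membership": "CZone",
}

def zone_for(data_type: str) -> str:
    """Fall back to the global zone for anything unclassified."""
    return DATA_ZONE.get(data_type, "GZone")

print(zone_for("user_account"))  # CZone
```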

Alipay’s Multi‑Active and Disaster‑Recovery Mechanisms

Traffic diversion basics

After unitization, multi‑active simply means deploying the same unit in multiple locations. For example, Shanghai hosts two units serving user IDs [00‑19] and [40‑59]; Hangzhou hosts units for [20‑39] and [60‑79]. The assignment of users to units is dynamically configurable, allowing each unit to act as a backup for the other.
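The Shanghai/Hangzhou assignment above is just a configurable routing table keyed on the last two digits of the user ID. A minimal sketch (unit names are invented; the source does not say who serves IDs [80‑99], so that range deliberately raises):

```python
# Configurable user-ID-range -> unit routing table, per the example above.
# Unit names are illustrative; ranges are inclusive on both ends.
ROUTING = [
    ((0, 19), "Shanghai-U1"),
    ((20, 39), "Hangzhou-U1"),
    ((40, 59), "Shanghai-U2"),
    ((60, 79), "Hangzhou-U2"),
]

def unit_for(user_id: int) -> str:
    key = user_id % 100  # last two digits of the user ID select the range
    for (lo, hi), unit in ROUTING:
        if lo <= key <= hi:
            return unit
    raise KeyError(f"no unit configured for key {key:02d}")

print(unit_for(4521))  # key 21 -> Hangzhou-U1
```

Because the table is plain configuration, reassigning a range to another unit (e.g., during failover) is a config push, not a code change.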

To clarify terminology: a cold backup (offline) is a standby machine that is not running the service; a hot backup (online) runs continuously but receives no traffic until failover.

Traffic routing is controlled by a custom reverse‑proxy gateway called Spanner (built on Nginx). Some requests are routed directly to another IDC’s Spanner without entering backend services.

When a request only reads user‑flow data, no further routing is needed. For cross‑user operations (e.g., A transfers to B), additional routing may occur, either to another IDC or to a different RZone within the same IDC.

Example of RZone‑to‑DB mapping (original configuration):

RZ0* --> a
RZ1* --> b
RZ2* --> c
RZ3* --> d

During a disaster scenario, the mapping is changed so that the failed IDC’s RZones are reassigned to healthy IDC’s RZones, and the user‑ID ranges are updated accordingly.

Example of user‑ID to RZone mapping before failover:

[00-24] --> RZ0A(50%),RZ0B(50%)
[25-49] --> RZ1A(50%),RZ1B(50%)
[50-74] --> RZ2A(50%),RZ2B(50%)
[75-99] --> RZ3A(50%),RZ3B(50%)

After re‑mapping to a different IDC, the configuration becomes:

[00-24] --> RZ2A(50%),RZ2B(50%)
[25-49] --> RZ3A(50%),RZ3B(50%)
[50-74] --> RZ2A(50%),RZ2B(50%)
[75-99] --> RZ3A(50%),RZ3B(50%)

These changes are prepared in advance as disaster‑recovery plans and pushed to all traffic‑diversion clients.
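The before/after mappings above amount to swapping one routing table for another on every traffic‑diversion client. A minimal sketch of applying such a plan (the lookup helper is hypothetical; the tables mirror the example configurations exactly):

```python
# Normal operation: each user-ID range is split across an RZone pair.
NORMAL = {
    (0, 24): ["RZ0A", "RZ0B"],
    (25, 49): ["RZ1A", "RZ1B"],
    (50, 74): ["RZ2A", "RZ2B"],
    (75, 99): ["RZ3A", "RZ3B"],
}

# Disaster-recovery plan: the IDC hosting RZ0*/RZ1* is lost, so its
# user-ID ranges fold onto the healthy RZ2*/RZ3* zones (doubling their load).
FAILOVER = {
    (0, 24): ["RZ2A", "RZ2B"],
    (25, 49): ["RZ3A", "RZ3B"],
    (50, 74): ["RZ2A", "RZ2B"],
    (75, 99): ["RZ3A", "RZ3B"],
}

def rzones_for(user_id: int, table: dict) -> list:
    key = user_id % 100
    for (lo, hi), rzones in table.items():
        if lo <= key <= hi:
            return rzones
    raise KeyError(key)

assert rzones_for(10, NORMAL) == ["RZ0A", "RZ0B"]
assert rzones_for(10, FAILOVER) == ["RZ2A", "RZ2B"]  # healthy zones absorb the load
```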

CAP Theorem Review

① Definition of CAP

CAP states that a distributed system can simultaneously satisfy at most two of the three properties: Consistency, Availability, and Partition Tolerance.

Consistency: All nodes see the same data at the same time (atomic updates).

Availability: Every request receives a response (read or write) regardless of failures.

Partition Tolerance: The system continues to operate despite network partitions.

In practice, most systems aim for AP (availability + partition tolerance) or CP (consistency + partition tolerance), depending on workload characteristics.

② CAP Analysis Methodology

First determine whether the system handles partitions. If it does, decide whether it favors availability (AP) or consistency (CP) under partition conditions.

Typical analysis steps (pseudocode):

if (a partition can occur && it impacts availability or consistency) {
    if (the system stays available under partition) return "AP";
    else if (the system stays consistent under partition) return "CP";
} else {
    return "AC"; // partitions are impossible or harmless: both A and C hold
}

Examples:

Single‑node application + single DB → CP (a partition between the application and the database is unlikely, but if one occurs the system simply becomes unavailable while the data stays consistent).

Horizontal scaling + master‑slave DB → AC (both availability and consistency are met when no partition; partition handling is absent).

Horizontal scaling + master‑slave with HA → AC (availability via failover, consistency via single master, but partition handling still missing).

CAP Analysis of Ant’s LDC Architecture

Ant’s LDC relies on OceanBase (OB), a Paxos‑based distributed database, to provide partition tolerance.

OB commits each transaction only after a majority quorum of (N/2)+1 replicas acknowledges it, so when a minority of replicas is partitioned away the majority side keeps serving reads and writes — delivering partition tolerance and high availability, with eventual consistency across replicas.

Because OB uses Paxos, it guarantees that only one value can be committed during a partition, preventing split‑brain inconsistencies. After the partition heals, the system converges to a consistent state.
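The split‑brain guarantee follows from simple arithmetic: any two majority quorums of the same replica set must overlap in at least one node, so two conflicting values can never both be committed. A sketch of that property:

```python
# Why a majority quorum of (N // 2) + 1 prevents split-brain: two disjoint
# groups of replicas cannot both reach majority size, so at most one side
# of a partition can commit.

def majority(n: int) -> int:
    """Smallest quorum size that guarantees overlap: floor(n/2) + 1."""
    return n // 2 + 1

def quorums_intersect(n: int) -> bool:
    """Two quorums of size majority(n) must share at least one node."""
    return 2 * majority(n) > n

for n in (3, 5, 7, 9):
    assert quorums_intersect(n)

print(majority(5))  # 3: a 5-replica group tolerates 2 partitioned replicas
```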

Consequently, Ant’s LDC is classified as AP (high availability and partition tolerance) with eventual consistency.

Conclusion

The main take‑aways are:

Sharding users into RZones provides linear scalability for payment TPS.

OceanBase’s Paxos‑based consensus prevents split‑brain issues during network partitions or disaster‑recovery switches.

CZone enables fast local reads for data that exhibit a write‑read time lag, reducing cross‑region latency.

GZone stores truly global data; its access pattern is limited and does not dominate TPS.

These design principles, together with traffic‑shaping, pre‑warming, and extensive operational coordination, allow Ant Financial to handle 540,000+ TPS during Double‑11 and have room for further growth.

Tags: distributed systems, CAP theorem, sharding, unitization, OceanBase, high TPS
Written by

Practical DevOps Architecture

Hands‑on DevOps operations using Docker, K8s, Jenkins, and Ansible—empowering ops professionals to grow together through sharing, discussion, knowledge consolidation, and continuous improvement.
