
Design and Evolution of Fusion: Didi’s Distributed NoSQL Database

Fusion is Didi’s self‑developed distributed NoSQL database, fully compatible with the Redis protocol, that combines Redis‑level low latency, MySQL‑level durability, ACID transactions, multi‑replica high availability, FastLoad bulk import, NewSQL secondary indexes, and cross‑datacenter active‑active replication to serve petabyte‑scale ride‑hailing data.

Didi Tech

Mar 15, 2019


Fusion is a self‑developed distributed NoSQL database at Didi, fully compatible with the Redis protocol. It combines Redis‑level low latency with MySQL‑level durability, multi‑replica high availability, and ACID transactions, serving as the primary storage for core ride‑hailing services handling petabyte‑scale data.

Background : Didi’s rapid growth from 2012 to 2016 (including the Uber merger) created massive storage pressure, with hundreds of billions of orders and driver trajectory data that MySQL and pure Redis alone could not handle. Didi created Fusion to meet diverse storage requirements (throughput, latency, capacity, data size), and it evolved through four stages:

Massive storage

FastLoad (offline‑to‑online data import)

NewSQL (enhanced MySQL‑like features)

Cross‑datacenter active‑active replication

Core Architecture : Fusion stores data on SSDs with RocksDB as the storage engine, adds a cache layer for performance, and exposes a Redis‑compatible RPC interface via a proxy. A single‑node Fusion service is composed of these layers, and a cluster is built by adding a routing manager and load balancer.

The control plane runs on the SaltStack platform, providing user, operation, monitoring, and billing services.

Cluster Design : Data is sharded by hash. Clients using Redis libraries (e.g., jedis, redigo, hiredis) connect through a VIP load balancer, which forwards requests to a proxy that maps keys to slots and routes them to the appropriate storage node.
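The key-to-slot-to-node routing described above can be sketched as follows. This is a minimal illustration, not Fusion's actual implementation: the CRC32 hash, the slot count `NUM_SLOTS`, the contiguous slot assignment, and the node names are all assumptions, since the article does not specify them.

```python
import zlib

NUM_SLOTS = 1024  # hypothetical slot count; the article does not give Fusion's value


def key_to_slot(key: str) -> int:
    """Hash a key to a slot. CRC32 is an assumed hash function for illustration."""
    return zlib.crc32(key.encode("utf-8")) % NUM_SLOTS


def build_slot_table(nodes: list) -> list:
    """Assign contiguous slot ranges to storage nodes (one assumed placement policy)."""
    slots_per_node = -(-NUM_SLOTS // len(nodes))  # ceiling division
    return [nodes[slot // slots_per_node] for slot in range(NUM_SLOTS)]


def route(key: str, slot_table: list) -> str:
    """What the proxy does per request: key -> slot -> storage node."""
    return slot_table[key_to_slot(key)]


if __name__ == "__main__":
    table = build_slot_table(["node-a", "node-b", "node-c"])  # hypothetical node names
    for k in ("order:1001", "driver:42", "trip:7"):
        print(k, "-> slot", key_to_slot(k), "->", route(k, table))
```

Keeping the slot table separate from the hash function means that moving a slot to another node only changes one table entry, while clients and keys are unaffected.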

FastLoad : To bridge offline Hadoop data to the online Fusion service, FastLoad creates a DTS task that reads JSON or Hive tables from HDFS, generates sorted SST files, and loads them directly into Fusion without going through the Redis SDK. This reduces network round‑trips and eliminates impact on online traffic.
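The core idea of the offline step, partitioning records by shard and emitting each partition in key order so it can be ingested as a sorted file, can be sketched as below. This is a simplified model under stated assumptions: the record layout (`key` and `value` fields), the shard count, and the CRC32 hash are hypothetical, and the sorted lists stand in for real SST files, which the sketch does not write.

```python
import json
import zlib

NUM_SHARDS = 4  # hypothetical shard count


def shard_of(key: str) -> int:
    """Assumed shard function; it must match whatever the online cluster uses."""
    return zlib.crc32(key.encode("utf-8")) % NUM_SHARDS


def build_sorted_runs(json_lines):
    """Group records by shard, then sort each group by key bytes.

    Each returned list models the ordered contents of one SST file.
    Later records overwrite earlier ones with the same key.
    """
    per_shard = {}
    for line in json_lines:
        record = json.loads(line)
        key = record["key"]
        value = json.dumps(record["value"]).encode("utf-8")
        per_shard.setdefault(shard_of(key), {})[key.encode("utf-8")] = value
    return {shard: sorted(pairs.items()) for shard, pairs in per_shard.items()}


if __name__ == "__main__":
    lines = [
        '{"key": "driver:42", "value": {"city": "Beijing"}}',
        '{"key": "driver:7", "value": {"city": "Hangzhou"}}',
    ]
    for shard, run in build_sorted_runs(lines).items():
        print(shard, run)
```

Because each run is already sorted and limited to one shard, the storage node can take it in as a whole file instead of replaying individual write commands.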

NewSQL : Fusion addresses MySQL’s scalability limits by providing schema‑to‑key/value conversion, secondary indexes, and range queries while keeping the Redis protocol. Example schemas (student table) are stored as Redis hashes, with auxiliary indexes (e.g., age, sex) built on separate key spaces.
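One way to picture the schema-to-key/value conversion is the sketch below, which stores each student row under a primary key and keeps an age index in a separate, sorted key space so that range queries become a scan over adjacent index keys. The key formats (`student:<id>`, `idx:student:age:<age>:<id>`) and the `StudentTable` class are hypothetical; the article does not describe Fusion's actual encoding.

```python
import bisect


class StudentTable:
    """In-memory model: a dict stands in for Redis hashes, a sorted list for the index key space."""

    def __init__(self):
        self.rows = {}       # primary key -> field dict
        self.age_index = []  # sorted index keys

    @staticmethod
    def row_key(student_id: str) -> str:
        return f"student:{student_id}"

    @staticmethod
    def age_index_key(age: int, student_id: str) -> str:
        # Zero-padding keeps lexicographic order equal to numeric order (ages 0..998 assumed).
        return f"idx:student:age:{age:03d}:{student_id}"

    def upsert(self, student_id: str, name: str, age: int, sex: str) -> None:
        key = self.row_key(student_id)
        old = self.rows.get(key)
        if old is not None:
            # Remove the stale index entry so the index never points at an old age.
            self.age_index.remove(self.age_index_key(old["age"], student_id))
        self.rows[key] = {"name": name, "age": age, "sex": sex}
        bisect.insort(self.age_index, self.age_index_key(age, student_id))

    def ids_with_age_between(self, low: int, high: int) -> list:
        """Range query: scan index keys from age `low` up to and including age `high`."""
        start = bisect.bisect_left(self.age_index, f"idx:student:age:{low:03d}:")
        end = bisect.bisect_left(self.age_index, f"idx:student:age:{high + 1:03d}:")
        # Student ids are assumed not to contain ':'.
        return [k.rsplit(":", 1)[1] for k in self.age_index[start:end]]


if __name__ == "__main__":
    table = StudentTable()
    table.upsert("s1", "Li", 20, "F")
    table.upsert("s2", "Wang", 18, "M")
    print(table.ids_with_age_between(18, 20))
```

In a real system the row write and the index update would have to be applied atomically, which is where the transactional guarantees mentioned earlier matter.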

Cross‑Datacenter Active‑Active : Fusion replicates data asynchronously between two data centers. Writes in one region are cached locally and forwarded in batches with acknowledgments to the remote cluster, ensuring high throughput for the primary region while providing eventual consistency and automatic conflict resolution based on timestamps.
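The batch-forward-and-acknowledge flow with timestamp-based conflict resolution can be modeled roughly as follows. This is a sketch under assumptions: the `Region` and `Write` types, the batch size, and the tie-break on origin name for equal timestamps are hypothetical additions; the article only states that conflicts are resolved by timestamp.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Write:
    key: str
    value: str
    timestamp: int  # e.g. milliseconds at the originating region
    origin: str     # assumed tie-breaker when timestamps are equal


@dataclass
class Region:
    name: str
    store: dict = field(default_factory=dict)   # key -> winning Write
    outbox: list = field(default_factory=list)  # local writes not yet acknowledged by the peer

    def apply(self, write: Write) -> bool:
        """Last-write-wins: keep the write with the larger (timestamp, origin) pair."""
        current = self.store.get(write.key)
        if current is None or (write.timestamp, write.origin) > (current.timestamp, current.origin):
            self.store[write.key] = write
            return True
        return False

    def local_write(self, key: str, value: str, timestamp: int) -> None:
        """Serve the write locally first, then queue it for asynchronous forwarding."""
        write = Write(key, value, timestamp, self.name)
        self.apply(write)
        self.outbox.append(write)

    def receive_batch(self, batch: list) -> int:
        """Apply a remote batch and return how many writes are acknowledged."""
        for write in batch:
            self.apply(write)
        return len(batch)

    def replicate_to(self, peer: "Region", batch_size: int = 100) -> None:
        """Forward queued writes in batches; drop them only after the peer acknowledges."""
        while self.outbox:
            acked = peer.receive_batch(self.outbox[:batch_size])
            del self.outbox[:acked]


if __name__ == "__main__":
    a, b = Region("dc-a"), Region("dc-b")
    a.local_write("order:1", "created", 100)
    b.local_write("order:1", "paid", 200)
    a.replicate_to(b)
    b.replicate_to(a)
    print(a.store["order:1"].value, b.store["order:1"].value)
```

Because both regions apply the same comparison rule, they converge on the same value regardless of the order in which batches arrive, which is the eventual consistency property the text describes.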

Summary & Outlook : Over four development phases, Fusion has evolved from a simple massive‑storage engine to a full‑featured distributed database with fast data loading, NewSQL capabilities, and multi‑region active‑active replication. Future work aims to further advance Fusion as a general‑purpose distributed database to solve more business challenges.

