
Evolution of JD Baitiao’s Data Architecture: From MySQL to ShardingSphere

This article examines JD Baitiao’s journey from monolithic MySQL to NoSQL and DBRep, detailing how the team evaluated and ultimately adopted Apache ShardingSphere for scalable, low‑coupling database sharding, highlighting performance, maintenance, and business-driven architectural decisions.

Wukong Talks Architecture

May 8, 2022


Hello, I am Wukong.

Recently, some colleagues discussed how the data layer evolves when a system moves from a monolithic to a microservice architecture. This article offers some insights on that question.

Source: SphereEx | Link: https://segmentfault.com/a/1190000041107436 | Formatting: Wukong

JD Baitiao has grown rapidly to meet users' increasing consumption demand. Paying with JD Baitiao on JD.com has become a habit for millions of shoppers and a distinctive label for JD. The backend technology team behind JD Baitiao is crucial: it has supported the service since its launch in early 2014 and now serves hundreds of millions of users. Its long‑term effort has produced a distinctive methodology for selecting financial‑grade databases.

When JD Baitiao’s financial services launched, the then‑head of R&D, Zhang Dongfang, anticipated a massive increase in data volume but did not foresee the cascade of changes this would trigger in database selection and future development patterns.

As a flagship consumer finance application under JD Technology, JD Baitiao now serves hundreds of millions of users and generates massive traffic every day. The rapid growth of its business and data has kept the backend R&D team under constant pressure, and the evolution of its data architecture illustrates this well.

1. Technology lifecycle: From MySQL to NoSQL to DBRep

For engineers, there is no technology that is correct forever, only the choice that best suits the current context.

The early stage of JD Baitiao coincided with the rapid rise of internet finance and consumption. Over the years, it has progressed from a grassroots project to a professional, large‑scale, unified, and standardized system, mirroring the rapid iteration of China’s internet consumer finance industry.

The technical selection path of JD Baitiao also reflects this industry’s evolution.

From an architectural perspective, there is no absolute good or bad; choices must fit the business lifecycle, team size, capabilities, and infrastructure. Only when the architecture evolution aligns with the business lifecycle can optimal results be achieved.

2014‑2015

Solution: Solr + HBase, giving both core and non‑core systems access to key databases. Solr served as the index for searchable fields, while HBase stored the full data.

Advantages: The Solr cluster took over part of the read/write workload, relieving pressure on the core database.

Drawbacks: Solr was hard to extend, and it intruded heavily into business code.
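To make the division of labor concrete, here is a minimal in-memory sketch of the index-plus-full-store pattern: a query first asks the index for matching row keys, then fetches the full rows by key. Python dictionaries stand in for Solr and HBase, and the class name, field names, and row keys are hypothetical; a real system would call the Solr and HBase client libraries instead.

```python
from collections import defaultdict


class IndexedStore:
    """Toy model of the Solr + HBase split: an inverted index over the
    searchable fields (Solr's role) plus a row key -> full record map (HBase's role)."""

    def __init__(self, searchable_fields):
        self.searchable_fields = set(searchable_fields)
        self.index = defaultdict(set)  # (field, value) -> row keys, stands in for Solr
        self.full_store = {}           # row key -> full record, stands in for HBase

    def put(self, row_key, record):
        old = self.full_store.get(row_key)
        if old is not None:
            # Remove stale index entries before re-indexing an updated row.
            for field in self.searchable_fields.intersection(old):
                self.index[(field, old[field])].discard(row_key)
        # Every write touches both stores: the full record and the indexed fields.
        self.full_store[row_key] = dict(record)
        for field in self.searchable_fields.intersection(record):
            self.index[(field, record[field])].add(row_key)

    def search(self, field, value):
        # Step 1: ask the index which row keys match.
        row_keys = sorted(self.index.get((field, value), set()))
        # Step 2: fetch the full rows from the wide-column store by key.
        return [self.full_store[key] for key in row_keys]


store = IndexedStore(searchable_fields=["user_id"])
store.put("row-1", {"user_id": "u42", "amount": 100})
store.put("row-2", {"user_id": "u42", "amount": 250})
print([r["amount"] for r in store.search("user_id", "u42")])  # [100, 250]
```

Because every write has to update both stores, this dual-write logic tends to live in application code, which is one plausible source of the business intrusion noted above.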

2015‑2016

Solution: NoSQL was introduced. Business data were sharded by month and stored in a MongoDB cluster, which temporarily met the massive data import/export needs of settlement scenarios.

Advantages: High query efficiency for hot data, and schema‑less storage made structural changes easy.

Drawbacks: Scalability was still poor, business intrusion was still heavy, and memory consumption was high.
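Month-based sharding keeps writes simple, since each record goes to the collection for its month, but any query spanning a date range must fan out to every month it covers, and that routing logic lives in the application. The sketch below assumes a hypothetical `bill_YYYYMM` collection naming scheme; the article does not describe JD Baitiao's actual naming or routing code.

```python
from datetime import date


def month_collection(prefix: str, day: date) -> str:
    """Hypothetical monthly collection name for a record dated `day`, e.g. bill_201511."""
    return f"{prefix}_{day.year:04d}{day.month:02d}"


def collections_for_range(prefix: str, start: date, end: date) -> list:
    """Every monthly collection a query over [start, end] has to read."""
    if start > end:
        raise ValueError("start must not be after end")
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(month_collection(prefix, date(year, month, 1)))
        # Advance one calendar month, rolling over after December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return names


print(month_collection("bill", date(2015, 11, 11)))  # bill_201511
print(collections_for_range("bill", date(2015, 11, 20), date(2016, 2, 3)))
# ['bill_201511', 'bill_201512', 'bill_201601', 'bill_201602']
```

Note also that all writes for the current month land in a single collection, which keeps hot data together but concentrates load on one place.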

2016‑2017

With rapid growth, data volume passed hundreds of billions of records, straining MongoDB’s capacity and performance.

Solution: JD Baitiao’s big‑data platform used DBRep to capture change logs from MySQL slave replicas, forward them to a message hub, and finally persist them to Elasticsearch and HBase.

Advantages: Strong real‑time data capability and good scalability.

Drawbacks: Sharding data inside the business framework increased code maintenance costs.
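The DBRep flow described above is a form of change data capture: row changes read from a replica are published to a hub, and each downstream store consumes them in order. The article does not describe DBRep's internals, so the following is only a generic, in-memory sketch; the `ChangeEvent` shape, the sinks, and the use of a `deque` as the message hub are all assumptions.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChangeEvent:
    """Hypothetical row-level change read from a MySQL replica's change log."""
    op: str                      # "upsert" or "delete"
    table: str
    pk: str
    row: Optional[dict] = None   # full row image for upserts


class KeyValueSink:
    """Stand-in for a downstream store such as Elasticsearch or HBase."""

    def __init__(self, name: str):
        self.name = name
        self.data: dict = {}

    def apply(self, event: ChangeEvent) -> None:
        key = (event.table, event.pk)
        if event.op == "upsert":
            self.data[key] = dict(event.row)
        elif event.op == "delete":
            self.data.pop(key, None)  # deleting a missing row is a no-op
        else:
            raise ValueError(f"unknown op: {event.op}")


def replicate(events, sinks) -> None:
    """Publish events to an in-memory hub, then deliver each one to every sink in log order."""
    hub = deque(events)  # stands in for the message hub
    while hub:
        event = hub.popleft()
        for sink in sinks:
            sink.apply(event)


es, hbase = KeyValueSink("es"), KeyValueSink("hbase")
replicate(
    [
        ChangeEvent("upsert", "bill", "1", {"amount": 100}),
        ChangeEvent("upsert", "bill", "1", {"amount": 120}),
        ChangeEvent("delete", "bill", "2"),
    ],
    [es, hbase],
)
print(es.data)  # {('bill', '1'): {'amount': 120}}
print(es.data == hbase.data)  # True
```

Applying events in log order is what keeps the two sinks consistent with each other; a real pipeline also has to handle retries and consumer offsets, which this sketch omits.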

2. Decoupling the backend architecture for smoother upgrades

Online shopping comes down to speed and the depth of your wallet. JD Baitiao was created to address the wallet side, so the stability and reliability of its backend are now critical to user experience.

To maintain high performance as data surged, JD Baitiao’s team initially adopted a data‑sharding architecture, implementing the data split in the application framework to balance performance against control over the code.

However, as the product iterated, early solutions became obstacles. The framework‑driven sharding increased code complexity and maintenance cost, making each upgrade labor‑intensive.

The team therefore focused on four decoupling dimensions:

Data‑architecture decoupling to reduce cross‑service impact during changes.

Technical‑architecture decoupling to simplify upgrade processes.

Business‑relationship decoupling to ensure user actions remain unaffected and to support large‑scale events like “618” and “11.11”.

R&D‑process decoupling to free backend productivity and lower code complexity.

Given the high coupling between backend databases and business logic, the team evaluated mature sharding components. The chosen component needed to satisfy four criteria for a financial‑grade, high‑concurrency, massive‑data scenario:

Product maturity and stability.

Excellent performance.

Ability to handle massive data volumes.

Flexible, extensible architecture.

After comparing a self‑developed framework with the ShardingSphere middleware, the team selected Apache ShardingSphere for its superior decoupling, lower code coupling, and reduced business intrusion.

|  | Self‑developed framework | ShardingSphere |
| --- | --- | --- |
| Performance | High | High |
| Code coupling | High | Low |
| Business intrusion | High | Low |
| Upgrade difficulty | High | Low |
| Scalability | Average | Good |

Consequently, JD Baitiao adopted Apache ShardingSphere for its financial‑grade database sharding tasks.

3. Converging scenarios and the Apache ShardingSphere solution

JD Baitiao uses Apache ShardingSphere‑JDBC for its online applications. ShardingSphere‑JDBC is a lightweight Java framework that adds capabilities at the JDBC layer. It acts as an enhanced JDBC driver, is compatible with JDBC and common ORM frameworks, and ships as a JAR, so no additional deployment is needed.
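The key idea is that the application keeps the same calling pattern it would use with a single database, while the driver layer decides which physical database receives each statement. Python's built-in `sqlite3` stands in for JDBC in the following sketch, and `ShardedConnection` is a hypothetical class, not ShardingSphere's API. As a simplifying assumption, the sharding column is bound as a named parameter (for example `:user_id`); ShardingSphere itself extracts sharding values by parsing the SQL.

```python
import hashlib
import sqlite3
from typing import Optional


class ShardedConnection:
    """Toy 'enhanced driver': keeps sqlite3's execute(sql, params) call shape
    but routes each statement to one of several physical databases."""

    def __init__(self, shard_count: int, sharding_param: str):
        # One in-memory database per shard (stand-ins for physical MySQL instances).
        self.shards = [sqlite3.connect(":memory:") for _ in range(shard_count)]
        self.sharding_param = sharding_param

    def _route(self, params: dict):
        value = params.get(self.sharding_param)
        if value is None:
            # No sharding value bound: broadcast (fine for DDL and reads in this toy).
            return self.shards
        digest = hashlib.blake2b(str(value).encode("utf-8"), digest_size=8).digest()
        return [self.shards[int.from_bytes(digest, "big") % len(self.shards)]]

    def execute(self, sql: str, params: Optional[dict] = None):
        params = params or {}
        rows = []
        for conn in self._route(params):
            rows.extend(conn.execute(sql, params).fetchall())
            conn.commit()
        return rows


db = ShardedConnection(shard_count=4, sharding_param="user_id")
db.execute("CREATE TABLE bill (user_id TEXT, amount INTEGER)")  # broadcast DDL
db.execute("INSERT INTO bill VALUES (:user_id, :amount)", {"user_id": "u42", "amount": 100})
print(db.execute("SELECT amount FROM bill WHERE user_id = :user_id", {"user_id": "u42"}))  # [(100,)]
```

Broadcast queries here simply concatenate per-shard results; merging them properly (for example, summing `COUNT(*)` values or re-sorting for `ORDER BY`) is part of what a production sharding layer handles.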

Key advantages for JD Baitiao:

Product maturity: Years of polishing, active community.

Good performance: Micro‑kernel, lightweight design with minimal overhead.

Low migration effort: Native JDBC compatibility reduces development work.

Flexible extension: Works with data migration and synchronization components, making it easy to scale out data.

After extensive internal validation, Apache ShardingSphere became the preferred sharding middleware at the end of 2018.

Product adaptation

To fully support JD Baitiao’s business and improve the user experience, the ShardingSphere team enhanced its SQL engine to support a broader range of SQL. Through collaboration between the two teams, performance ended up nearly identical to native JDBC.

Details of the integration can be found in the article “Apache ShardingSphere Practical Integration with JD Baitiao”.

Business cut‑over

The sharding scheme uses a customized HASH strategy that distributes data across nearly ten thousand data nodes, avoiding hotspots. The migration took about four weeks.
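A hash-based strategy spreads keys evenly across a fixed set of nodes, so no single node absorbs all writes the way the current month's collection does under time-based sharding. The article does not specify the customized HASH function or the node layout; the sketch below assumes a hypothetical 32 databases × 300 tables (9,600 nodes) and uses BLAKE2b from Python's `hashlib` because, unlike the built-in `hash()` on strings, it gives the same result in every process.

```python
import hashlib
from collections import Counter

# Hypothetical layout: the article only says "nearly ten thousand" data nodes.
DB_COUNT = 32
TABLES_PER_DB = 300
NODE_COUNT = DB_COUNT * TABLES_PER_DB  # 9,600 data nodes


def route(shard_key: str) -> str:
    """Map a sharding key to one physical data node, e.g. 'ds_07.t_bill_0123'."""
    digest = hashlib.blake2b(shard_key.encode("utf-8"), digest_size=8).digest()
    slot = int.from_bytes(digest, "big") % NODE_COUNT    # stable across processes
    db_index, table_index = divmod(slot, TABLES_PER_DB)  # slot -> (database, table)
    return f"ds_{db_index:02d}.t_bill_{table_index:04d}"


# Rough hotspot check: spread 200,000 synthetic keys and compare the busiest node to the mean.
counts = Counter(route(f"user-{i}") for i in range(200_000))
print(f"nodes used: {len(counts)} / {NODE_COUNT}")
print(f"busiest node: {max(counts.values())} keys (mean {200_000 / NODE_COUNT:.1f})")
```

Pre-creating a large, fixed number of nodes is a common way to make later scale-out cheaper: capacity can grow by moving whole tables to new database servers without changing how keys map to nodes.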

DBRep reads data and synchronizes it to the target cluster via ShardingSphere.

Two clusters run in parallel; after migration, a self‑developed tool validates business and data consistency.

DBRep is the core of ShardingSphere‑Scaling, providing automation for future migration and scaling tasks.
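The article mentions a self-developed tool that validates data consistency after migration but does not describe how it works. A common approach is to fingerprint each row on both sides and compare by primary key; the sketch below shows that idea with hypothetical function names and does not reflect JD Baitiao's actual tool.

```python
import hashlib
import json


def row_digest(row: dict) -> str:
    """Fingerprint one row; keys are sorted so column order does not matter."""
    payload = json.dumps(row, sort_keys=True, default=str).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def compare(source_rows, target_rows, pk: str = "id") -> dict:
    """Compare two row sets by primary key and report the differences."""
    src = {row[pk]: row_digest(row) for row in source_rows}
    dst = {row[pk]: row_digest(row) for row in target_rows}
    return {
        "missing_in_target": sorted(src.keys() - dst.keys()),
        "extra_in_target": sorted(dst.keys() - src.keys()),
        "mismatched": sorted(k for k in src.keys() & dst.keys() if src[k] != dst[k]),
    }


old_cluster = [{"id": 1, "amount": 100}, {"id": 2, "amount": 50}]
new_cluster = [{"amount": 100, "id": 1}, {"id": 2, "amount": 55}, {"id": 3, "amount": 9}]
print(compare(old_cluster, new_cluster))
# {'missing_in_target': [], 'extra_in_target': [3], 'mismatched': [2]}
```

At JD Baitiao's data volume, a check like this would have to run in batches or key ranges rather than loading whole tables into memory.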

Value gains

Simplified upgrade path: The decoupled architecture shortens the path for technology‑stack changes, so developers can focus on the business.

Saved development effort: Adopting the mature ShardingSphere project removes the need to build a custom sharding component.

Flexible scaling: Combined with the Scaling sync component, the system easily handles peak events like “618” and “11.11”.

4. Facing new instability with a stable standard

How should we understand this instability, and how can we balance it?

As data importance grows and terminal scenarios become more granular, business lines continuously branch out, and a plethora of database products emerge. JD Baitiao, while still rapidly growing, is no longer at its original data scale; it now represents a mature, high‑traffic financial consumption scenario.

Future growth will inevitably force several painful architecture transitions, each of them a risky undertaking for a financial product that puts stability first.

Across today’s general‑purpose data‑architecture landscape, the industry has entered a new state of instability. JD Baitiao needs a relatively stable standard and ecosystem to counter this trend, which leads to the concept of “Database Plus”.

In 2018, Apache ShardingSphere creator Zhang Liang introduced “Database Plus”, aiming to provide a unified management layer above databases, reducing operational costs and improving efficiency.

Consequently, Sharding‑JDBC inside JD was upgraded to ShardingSphere, symbolizing the creation of a new ecosystem and the gradual establishment of the Database Plus direction, especially with the release of ShardingSphere 5.0.0.

Today, ShardingSphere’s plug‑in architecture enables a new data‑governance ecosystem on top of databases, offering horizontal scaling, encryption, and even distributed database capabilities without altering the underlying database architecture.

This approach is arguably one of the best answers to the growing fragmentation of databases.

Looking ahead, Zhang Dongfang believes that building a unified data‑management platform atop diverse databases, with plug‑in capabilities, will match functional and business needs, eliminate redundant features, and allow rapid adjustments while keeping the data architecture clean and focused.

5. Returning to the essence

China’s massive internet user base generates the world’s largest data volume, yet the domestic data‑service market remains homogeneous, lacking a disruptive challenger to overseas database giants.

Vendors focus on niche scenarios and often lose sight of the bigger picture. While new technologies are constantly discussed, they may not suit large‑scale financial, securities, manufacturing, or retail domains. In those domains, middleware that adds incremental capabilities to existing technology stacks is a better fit.

“Things need to return to their essence.” This principle also applies to data‑governance.

Thank you for reading. I am Wukong, looking forward to leveling up together!


Written by

Wukong Talks Architecture

Explaining distributed systems and architecture through stories. Author of the "JVM Performance Tuning in Practice" column, open-source author of "Spring Cloud in Practice PassJava", and independently developed a PMP practice quiz mini-program.
