
When Do Database Tables Reach Bottlenecks and How to Split Them: Strategies, Cases, and Best Practices

This article explains how massive e‑commerce platforms identify the point at which database tables become performance bottlenecks, discusses vertical and horizontal sharding strategies, evaluates hash, range and consistent‑hash partitioning, and presents real‑world case studies from Suning, JD, Meituan, Ant Financial and Taobao.


Part 1 – When Do Database Tables Reach Bottlenecks?

1.1 Suning Pintu: million-row tables before splitting

Suning Pintu, an e-commerce app under Suning.com, exceeded 30 million users in July 2018. Daily active users reached ten million plus, with millions of new SKUs and orders per day, generating over 100 million rows per day and peaking at 100 k QPS. The single database grew by more than 1 GB per day, so a single SQL query that missed the cache could take down the whole service. Stability and long-term scalability therefore became mandatory design goals.

1.2 JD Logistics: the platform table before splitting

JD Logistics initially ran on SQL Server, handling roughly 100 k daily transactions, then migrated to enterprise-grade Oracle on IBM AIX with RAC and Data Guard, supporting millions of orders. The monolithic architecture eventually became costly and inflexible, prompting a vertical split and containerised storage in 2015.

1.3 Meituan-Dianping: the order table before splitting

By 2016, the order database had exceeded 200 GB. Index optimisation and two read replicas helped, but flash-sale traffic soon overloaded the system, forcing rate limiting and message-queue throttling. Continuous schema changes caused lock contention, so a table-partitioning plan with a ten-year horizon was launched.

1.4 Summary: when to consider table splitting

There is no single quantitative metric; when hardware limits, operational cost, or business-growth risk become unacceptable, splitting should be evaluated. Indicators include hardware performance limits, daily operational effort (e.g., monthly data migrations), and the inability to handle traffic spikes or future feature expansion.

2 – Purpose and Plans for Splitting Tables

2.1 Business data decoupling – vertical splitting

Separate different business domains into independent databases to support micro-service isolation, rapid iteration, safe releases, and fault isolation.

2.2 Capacity and performance pressure – horizontal splitting

When a single table or database reaches its capacity or performance limits, split it horizontally. Start with table-level sharding, then consider database-level sharding as growth continues.

2.3 How many shards are appropriate?

Empirical rule: keep a single table under roughly 6 million rows and a single database under roughly 4 000 concurrent connections. For example, a platform with 100 k daily orders and an 8 000 QPS peak works out to roughly 64 tables spread across 4 databases.

2.4 How to split appropriately?

Hash modulo – even distribution across shards, but future traffic spikes may require re-hashing and data migration.
Range partitioning – no migration needed for growth, but it can cause data skew, where a few shards receive most of the load.
Consistent hashing – uniform distribution and minimal data movement when scaling, at the cost of more complex routing logic.
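To make the hash-modulo approach concrete, here is a minimal routing sketch in Python. The 4-database, 64-table layout matches the example calculation above; the `db_N.orders_M` naming and the routing helper are illustrative assumptions, not any specific platform's implementation.

```python
# Sketch: hash-modulo routing for a 4-database / 64-table layout.
# Shard counts and naming are illustrative.

DATABASES = 4        # number of physical databases
TABLES_PER_DB = 16   # 4 databases x 16 tables = 64 tables total

def route(order_id: int) -> tuple[int, int]:
    """Map an order ID to (database index, table index)."""
    table_global = order_id % (DATABASES * TABLES_PER_DB)  # 0..63
    db_index = table_global // TABLES_PER_DB
    table_index = table_global % TABLES_PER_DB
    return db_index, table_index

db, tbl = route(1_000_123)
print(f"order 1000123 -> db_{db}.orders_{tbl}")  # -> db_3.orders_11
```

Note the trade-off the text describes: changing `DATABASES` or `TABLES_PER_DB` changes `table_global` for almost every key, which is exactly why re-hashing forces a data migration.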

3 – New Problems Introduced by Splitting

Choosing partition keys – keys must be uniformly distributed (e.g., user ID, order ID, merchant ID). Time-based tables may use the date as the key.
Globally unique primary keys – requires a distributed ID generation strategy (see the author's article on distributed ID algorithms).
Data migration – two approaches: stop-the-world deployment (simple but risky) or smooth migration using double-write, checkpoints, and a gradual routing switch-over.
Transaction management – when data spans multiple databases, use distributed transaction patterns (TCC, saga, transactional messages) or application-level compensation logic.
Query limitations – cross-shard joins and aggregates must be handled at the application layer or via middleware that merges partial results.
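As a sketch of one common distributed ID strategy, here is a simplified snowflake-style generator in Python. The bit layout (41-bit timestamp, 10-bit worker, 12-bit sequence) and the custom epoch are illustrative assumptions; production systems also handle clock rollback.

```python
import threading
import time

class SnowflakeId:
    """Simplified snowflake-style IDs: 41-bit timestamp | 10-bit worker | 12-bit sequence."""
    EPOCH_MS = 1_600_000_000_000  # custom epoch in milliseconds, illustrative

    def __init__(self, worker_id: int):
        if not 0 <= worker_id < 1024:
            raise ValueError("worker_id must fit in 10 bits")
        self.worker_id = worker_id
        self.sequence = 0
        self.last_ms = -1
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            now = int(time.time() * 1000)
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & 0xFFF  # wrap at 4096 IDs/ms
                if self.sequence == 0:
                    while now <= self.last_ms:  # spin into the next millisecond
                        now = int(time.time() * 1000)
            else:
                self.sequence = 0
            self.last_ms = now
            return ((now - self.EPOCH_MS) << 22) | (self.worker_id << 12) | self.sequence

gen = SnowflakeId(worker_id=7)
ids = [gen.next_id() for _ in range(5)]
print(ids == sorted(ids))  # IDs increase monotonically under a steady clock
```

Because the high bits are the timestamp, these IDs sort roughly by creation time, which keeps B-tree inserts append-heavy; the worker ID bits are what make them unique across shards without coordination.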

4 – Large-Scale Company Cases

4.1 Ant Financial table routing rules

Ant Financial uses LDC unit deployment, routing on the last two or three digits of the user ID to achieve a "hundred-database, hundred-table" layout, with an occasional "thousand-database, thousand-table" layout for massive batch jobs.

4.2 Meituan-Dianping data migration

A three-stage migration: double-write with reconciliation, import historical data while continuing to double-write, then cut over and delete the old data.

4.3 Taobao trillion-level transaction order storage engine

Taobao adopts similar historical-order migration and sharding techniques to sustain ultra-high transaction volume.
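A digit-based rule like the one described for Ant Financial can be sketched as follows. The exact digit assignment here is an assumption for illustration: the last two digits of the user ID pick one of 100 databases, and the two digits before them pick one of 100 tables.

```python
def route_user(user_id: int) -> tuple[int, int]:
    """Digit-based routing sketch: last two digits -> database, next two -> table."""
    db_index = user_id % 100              # e.g. ...89 selects database 89
    table_index = (user_id // 100) % 100  # e.g. ..67.. selects table 67
    return db_index, table_index

print(route_user(123456789))  # -> (89, 67)
```

The appeal of digit routing over a hash is that an operator can tell at a glance which shard holds a given user, which simplifies manual troubleshooting and per-unit (LDC) traffic steering.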

5 – Summary

This article cannot cover every detail; readers are encouraged to supplement, correct, and discuss. Original content; please share responsibly.

Recommended reading:

How to Design Billion‑Scale Redis Caches

10 Design Principles to Move Toward Architecture Leadership

Why Kafka Needs a Leader but Redis Does Not

Deep Dive into Java Concurrency (AQS)

MySQL Index Principles Explained

Tags: distributed systems, scalability, database sharding, case studies, table splitting
Written by

Full-Stack Internet Architecture

Introducing full-stack Internet architecture technologies centered on Java
