Mastering Database Sharding: When and How to Split Databases and Tables
This article explains the concepts of database and table sharding, why they are needed, various vertical and horizontal splitting strategies, range and hash partitioning methods, their advantages and drawbacks, timing considerations, common challenges such as transactions and distributed IDs, and introduces popular sharding middleware.
Preface
Hello everyone, today we discuss database sharding (分库分表).
1. What Is Database Sharding
Sharding databases: splitting one database into multiple databases deployed on different machines.
Sharding tables: splitting a single table into multiple tables.
2. Why Sharding Is Needed
2.1 Why Split Databases?
When business volume surges, a single MySQL instance may hit performance bottlenecks. Splitting spreads storage across machines and relieves per-instance limits on disk space and concurrent connections, especially in micro‑service architectures where different modules (orders, users, products) are isolated into separate databases.
2.2 Why Split Tables?
Large tables slow down queries, especially when indexes are not used or when a table exceeds ten million rows, causing B+‑tree height to increase and query speed to degrade.
InnoDB stores data in 16 KB pages. Assuming an average row size of 1 KB, each leaf page holds about 16 records (16 KB / 1 KB). Assuming an 8‑byte bigint primary key and a 6‑byte page pointer, each internal (non‑leaf) page holds about 1 170 entries (16 KB / 14 B). Thus a height‑2 tree can store roughly 1 170 × 16 = 18 720 records, and a height‑3 tree about 1 170 × 1 170 × 16 = 21 902 400 records. When height reaches 4, the extra disk read per lookup makes queries noticeably slower.
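The capacity estimate above can be checked with a few lines of arithmetic. This is a sketch under the same assumptions as the text: 16 KB pages, ~1 KB rows, and 14 bytes per internal-node entry (8-byte bigint key plus 6-byte pointer).

```python
PAGE_SIZE = 16 * 1024           # bytes per InnoDB page
ROW_SIZE = 1024                 # assumed average row size
KEY_PLUS_POINTER = 8 + 6        # bigint key + page pointer

rows_per_leaf = PAGE_SIZE // ROW_SIZE                  # 16 rows per leaf page
entries_per_internal = PAGE_SIZE // KEY_PLUS_POINTER   # 1170 entries per internal page

def capacity(height: int) -> int:
    """Rough maximum row count for a B+-tree of the given height."""
    return entries_per_internal ** (height - 1) * rows_per_leaf

print(capacity(2))  # 18720
print(capacity(3))  # 21902400
```

This is why the common rule of thumb puts the practical single-table ceiling around twenty million rows: beyond that, the tree grows to height 4.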
3. How to Shard Databases and Tables
3.1 Vertical Splitting
3.1.1 Vertical Database Sharding
Early in development a single database may suffice. As the system grows, separate databases (user, order, points, product) can be created and deployed on different servers, reducing pressure on any single instance.
3.1.2 Vertical Table Sharding
If a table has many columns, frequently unused or large columns (e.g., email, address, user_desc) can be moved to a separate detail table, reducing I/O for common `SELECT *` queries.
3.2 Horizontal Splitting
3.2.1 Horizontal Database Sharding
Data is partitioned across multiple database servers, each holding the same schema but different row subsets, alleviating single‑node bottlenecks.
3.2.2 Horizontal Table Sharding
Rows are distributed to multiple tables based on a rule such as hash modulo or range. For example, an order table can be split by time range into several tables.
3.3 Sharding Strategies
3.3.1 Range Partitioning
Rows are divided by key ranges (e.g., 0‑10 million, 10‑20 million). This simplifies scaling because new tables can be added without moving existing data, but it creates hotspots: newly inserted rows all carry the highest IDs, so reads and writes concentrate on the newest table.
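Range routing reduces to integer division on the sharding key. A minimal sketch, assuming 10-million-row intervals and hypothetical table names `order_0`, `order_1`, ...:

```python
RANGE_SIZE = 10_000_000  # rows per table, as in the example above

def route_by_range(order_id: int) -> str:
    """Map an order id to its table by fixed key interval."""
    return f"order_{order_id // RANGE_SIZE}"

print(route_by_range(5_000_000))   # order_0
print(route_by_range(12_345_678))  # order_1
```

Adding capacity just means creating the next table (`order_2` for IDs 20‑30 million); no existing rows move.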
3.3.2 Hash Modulo Partitioning
Rows are assigned to tables by computing `key % N`. This distributes load evenly and avoids hotspots, but expanding from N to a larger number of tables requires data migration.
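The migration cost is easy to demonstrate: when the table count changes, most keys route to a different table. A sketch with hypothetical `order_*` table names:

```python
def route_by_hash(key: int, n_tables: int) -> str:
    """Map a key to a table by simple modulo."""
    return f"order_{key % n_tables}"

# Fraction of keys whose target table changes when growing from 4 to 5 tables.
moved = sum(1 for k in range(100_000)
            if route_by_hash(k, 4) != route_by_hash(k, 5))
print(moved / 100_000)  # 0.8 — 80% of rows would need to move
```

Consistent hashing can shrink that migration fraction, at the cost of a more complex routing layer.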
3.3.3 Combined Range + Hash
Use range partitioning to allocate large key intervals to different databases, then apply hash modulo within each database to spread rows across tables, balancing scalability and hotspot avoidance.
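The two strategies compose into one routing function: range selects the database, modulo selects the table within it. A minimal sketch with made-up names (`db_*`, `order_*`) and interval sizes:

```python
RANGE_SIZE = 20_000_000   # keys per database (assumed interval width)
TABLES_PER_DB = 4         # hash-modulo fan-out inside each database

def route(key: int) -> tuple[str, str]:
    """Return (database, table) for a sharding key."""
    db = f"db_{key // RANGE_SIZE}"           # range picks the database
    table = f"order_{key % TABLES_PER_DB}"   # modulo picks the table
    return db, table

print(route(25_000_001))  # ('db_1', 'order_1')
```

New databases can be appended as key ranges grow (no migration), while modulo keeps the write load inside each database spread across its tables.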
4. When to Consider Sharding
4.1 When to Shard Tables
If daily order volume reaches hundreds of thousands and query performance degrades, especially when table size approaches tens of millions (B+‑tree height > 3), sharding should be planned. Many practitioners start considering sharding around 5 million rows.
4.2 When to Shard Databases
When multiple services share a single monolithic database and it becomes a performance bottleneck, separate databases (order, user, etc.) can be created, often aligning with a micro‑service architecture.
5. Problems Introduced by Sharding
Transaction management – local transactions no longer work across databases; distributed transactions are required.
Cross‑database joins – need to be performed in multiple steps.
Sorting and aggregation – must be done per node and merged in the application.
Pagination – a global page cannot be read from one node; a common approach is to fetch the first offset + limit rows from every node, then merge and re‑sort in the application, which grows expensive for deep pages.
Distributed ID generation – cannot rely on auto‑increment; alternatives include UUID or Snowflake algorithms.
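A Snowflake-style generator illustrates the last point. This is a simplified sketch, not Twitter's exact implementation: a 41-bit millisecond timestamp, 10-bit worker id, and 12-bit per-millisecond sequence packed into one 64-bit integer.

```python
import threading
import time

class Snowflake:
    """Simplified Snowflake ID generator: time | worker | sequence."""
    EPOCH = 1288834974657  # custom epoch in ms (Twitter's original value)

    def __init__(self, worker_id: int):
        assert 0 <= worker_id < 1024      # worker id fits in 10 bits
        self.worker_id = worker_id
        self.sequence = 0
        self.last_ts = -1
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            ts = int(time.time() * 1000)
            if ts == self.last_ts:
                self.sequence = (self.sequence + 1) & 0xFFF  # 12-bit sequence
                if self.sequence == 0:           # sequence exhausted this ms:
                    while ts <= self.last_ts:    # spin to the next millisecond
                        ts = int(time.time() * 1000)
            else:
                self.sequence = 0
            self.last_ts = ts
            return ((ts - self.EPOCH) << 22) | (self.worker_id << 12) | self.sequence

gen = Snowflake(worker_id=1)
ids = [gen.next_id() for _ in range(5)]
assert ids == sorted(set(ids))  # unique and monotonically increasing
```

Because the high bits are a timestamp, IDs generated on one machine sort roughly by creation time, which keeps B+‑tree inserts appended at the right edge rather than scattered like UUIDs.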
6. Popular Sharding Middleware
Cobar
Mycat
Sharding‑JDBC (now part of Apache ShardingSphere)
Atlas
TDDL (Taobao)
Vitess
macrozheng
Dedicated to Java tech sharing and dissecting top open-source projects. Topics include Spring Boot, Spring Cloud, Docker, Kubernetes and more. Author’s GitHub project “mall” has 50K+ stars.