
How to Implement Database Sharding: Horizontal & Vertical Partitioning with Sharding-JDBC

This article explains the fundamentals of database sharding: vertical and horizontal partitioning, routing algorithms, advantages and drawbacks, practical implementation with sharding-jdbc, and related challenges such as distributed transactions, pagination, global IDs, and tool selection.

macrozheng

Previously, many Java beginners left comments asking for a systematic study of sharding, and since our company project used sharding-jdbc to refactor a MySQL architecture, I decided to write a series of practical articles on sharding as a summary of my architectural learning.

I have read many fragmented articles on sharding online, but most are repetitive and lack detailed real‑world cases. After purchasing a few paid courses, I found that while experienced developers could follow them, beginners still face a steep learning curve.

What is Sharding

In fact, sharding consists of two concepts: database sharding and table sharding. They are usually performed together, so we commonly refer to both as sharding.

Sharding solves performance degradation caused by excessively large databases or tables by splitting a massive database into multiple independent databases and a massive table into several smaller tables, thereby optimizing the performance of each individual database or table.

How to Shard

The core idea of sharding is to split data and then quickly locate and aggregate query results. Sharding can be performed along two dimensions: vertical (by business) and horizontal (by data range or hash).

Sharding diagram

Below we use an order‑related business example to illustrate vertical and horizontal splitting.

Vertical Split

Vertical split includes vertical database sharding and vertical table sharding.

1. Vertical Database Split

Vertical database split is easy to understand: each business gets its own dedicated database ("a dedicated database for each dedicated use").

Tables related to orders, payments, coupons, points, etc., are placed in separate databases. Developers cannot directly access other business databases; instead, they must use API interfaces, which is an early form of microservices.

Vertical splitting improves performance to some extent but does not solve the problem of a single table becoming too large, so it often needs to be combined with horizontal splitting.

Vertical database split diagram

2. Vertical Table Split

Vertical table split is based on columns, i.e., splitting a large table into smaller tables by separating frequently accessed columns from large or rarely used columns.

For example, an order table can keep frequently accessed fields (order amount, order number) in one table, while a separate table work_extend holds large blob fields. The split tables are then distributed across different databases.

Vertical table split diagram

Advantages:

- Since databases load data by rows, after vertical splitting the core tables contain only high-frequency, short fields, so more data can be cached in memory, improving the hit rate and reducing disk I/O.
- Business data is decoupled, enabling independent maintenance, monitoring, and scaling.
- In high-concurrency scenarios, database pressure is alleviated to some extent.

Drawbacks:

- Development complexity increases because cross-business data must be accessed via APIs.
- Distributed transaction management becomes harder.
- The single-table size problem remains; horizontal splitting is still required.

Horizontal Split

When vertical splitting cannot fully eliminate large‑scale data bottlenecks, horizontal splitting is needed to further improve performance.

1. Horizontal Database Split

Horizontal database split distributes the same table across multiple databases, each possibly on a different server, achieving horizontal scalability.

This approach solves single‑database storage and performance limits, but introduces routing complexity.

For example, three databases orderDB_1, orderDB_2, and orderDB_3 each contain an identical order table. An order's ID modulo 3 determines which database stores that order.
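As an illustrative sketch of this routing (the database names and modulo base of 3 come from the example above; the class and method names are hypothetical):

```java
// Hypothetical sketch: route an order to one of three databases by ID modulo 3.
public class OrderDbRouter {
    private static final String[] DATABASES = {"orderDB_1", "orderDB_2", "orderDB_3"};

    // orderId % 3 == 0 -> orderDB_1, == 1 -> orderDB_2, == 2 -> orderDB_3
    public static String routeDatabase(long orderId) {
        return DATABASES[(int) (orderId % DATABASES.length)];
    }

    public static void main(String[] args) {
        System.out.println(routeDatabase(1001L)); // 1001 % 3 = 2 -> orderDB_3
    }
}
```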

Horizontal database split diagram

2. Horizontal Table Split

Within the same database, a large table is split into multiple identical tables, each holding a portion of the data.

For an order table with 9 million rows, it can be divided into order_1, order_2, and order_3, each storing 3 million rows.

Horizontal table split diagram

Although horizontal table split reduces the size of a single table, all sub‑tables still reside on the same database instance, competing for the same CPU, memory, and network resources. To achieve true distributed scaling, the split tables must be placed on different database instances.
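One common way to realize this is a two-level routing function that first picks the database instance and then the sub-table inside it. The sketch below assumes two databases with three order sub-tables each; all names are illustrative:

```java
// Hypothetical sketch: two-level routing across 2 databases x 3 tables each.
public class ShardLocator {
    private static final int DB_COUNT = 2;
    private static final int TABLE_COUNT = 3;

    // Returns a physical location such as "orderDB_2.order_1".
    public static String locate(long orderId) {
        long dbIndex = orderId % DB_COUNT;                    // which database instance
        long tableIndex = (orderId / DB_COUNT) % TABLE_COUNT; // which sub-table inside it
        return "orderDB_" + (dbIndex + 1) + ".order_" + (tableIndex + 1);
    }
}
```

Dividing by the database count before taking the table modulo keeps consecutive IDs from always landing in the same sub-table of every database.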

Distributed sharding diagram

Advantages:

- Solves the problem of a single database becoming a bottleneck under high concurrency, improving system stability and load capacity.
- The amount of application-side refactoring required is relatively small.

Drawbacks:

- Cross-shard transaction consistency is difficult to guarantee.
- Cross-database join queries perform poorly.
- Scaling and maintenance become complex, especially when thousands of sub-tables are created.

What are the Rules

The routing rule used in the examples above is an algorithm that decides which database and table a given piece of data should reside in.

Common algorithms include the modulo algorithm and the range algorithm.

1. Modulo Algorithm

Taking the hash of a field and applying modulo N (where N is the number of database instances or sub‑tables) is the most common method.

For the order table, the work_no (order number) field is taken modulo the number of databases: the remainder i determines the target database (i = 0 for the first, i = 1 for the second, and so on).
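A minimal sketch of this step, assuming work_no is a string whose hash is taken before the modulo (the class and method names are hypothetical):

```java
// Hypothetical sketch: shard index from the hash of the order number.
public class ModuloRouter {
    // Math.floorMod keeps the result in [0, shardCount) even for negative hash codes.
    public static int shardIndex(String workNo, int shardCount) {
        return Math.floorMod(workNo.hashCode(), shardCount);
    }
}
```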

Advantages:

- Data is spread relatively uniformly across shards, avoiding hot-spot concentration.

Drawbacks:

- If a node fails or the cluster is resized, the modulo base changes and existing data routes to different databases, which can break consistency until the data is migrated.

2. Range Algorithm

Data can be split by time interval or ID range. For example, a User table may store IDs 1-9999 in the first database, 10000-19999 in the second, and so on.
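A sketch of this range rule, assuming fixed bands of 10,000 IDs per database (the names are illustrative):

```java
// Hypothetical sketch: range routing with a fixed band of 10,000 user IDs per database.
public class RangeRouter {
    private static final long BAND = 10_000L;

    // IDs 1-9999 -> userDB_1, 10000-19999 -> userDB_2, and so on.
    public static String routeDatabase(long userId) {
        return "userDB_" + (userId / BAND + 1);
    }
}
```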

Advantages:

- Each table's data volume is controllable.
- Horizontal scaling is simple: just add new nodes, with no need to migrate existing data.
- Data location can be determined quickly.

Drawbacks:

- Hot-spot issues may arise, for example when a particular time range receives a surge of orders.

Challenges of Sharding

1. Distributed Transactions

Cross-database transactions are inevitable. Traditional two-phase or three-phase commit solutions have poor performance and high development cost, so many systems adopt eventual consistency with compensation mechanisms, for example using Alibaba's Seata framework.

2. Pagination, Sorting, Cross‑DB Joins

These common operations become painful after sharding because data must be aggregated from multiple shards before being presented to the user.

3. Distributed Primary Keys

Auto‑increment IDs lose their global uniqueness across shards, so a distributed ID generator is required.
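One widely used approach is a Snowflake-style generator that packs a timestamp, a worker ID, and a per-millisecond sequence into a single long. The bit layout and epoch below are assumptions for illustration, not any particular library's implementation:

```java
// Sketch of a Snowflake-style ID: 41 timestamp bits | 10 worker bits | 12 sequence bits.
public class SnowflakeSketch {
    private static final long EPOCH = 1577836800000L; // 2020-01-01 UTC, assumed
    private final long workerId;   // must be unique per node, 0..1023
    private long lastTimestamp = -1L;
    private long sequence = 0L;

    public SnowflakeSketch(long workerId) {
        this.workerId = workerId & 0x3FF;
    }

    public synchronized long nextId() {
        long now = System.currentTimeMillis();
        if (now == lastTimestamp) {
            sequence = (sequence + 1) & 0xFFF; // 12-bit sequence within one millisecond
            if (sequence == 0) {               // sequence exhausted: spin to the next ms
                while (now <= lastTimestamp) {
                    now = System.currentTimeMillis();
                }
            }
        } else {
            sequence = 0;
        }
        lastTimestamp = now;
        return ((now - EPOCH) << 22) | (workerId << 12) | sequence;
    }
}
```

IDs from a single node are strictly increasing, and the worker bits keep nodes from colliding; a production generator would also need to handle clock drift.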

4. Read‑Write Separation

Most relational databases provide master‑slave high‑availability. When combined with sharding, both read and write replicas must also be sharded.

5. Data Masking

Sensitive information (ID numbers, phone numbers, etc.) must be masked according to privacy rules.

Sharding Tools

Instead of reinventing the wheel, you can pick from many mature sharding middleware solutions. Common choices include:

- sharding-jdbc (Dangdang; now part of Apache ShardingSphere)
- TSharding (Mogujie)
- Atlas (360)
- Cobar (Alibaba)
- MyCAT (based on Cobar)
- Oceanus (58.com)
- Vitess (Google)
Why Choose sharding-jdbc

sharding-jdbc is a lightweight Java framework delivered as a jar. It works as an enhanced JDBC driver on the client side, requiring no separate server deployment. It is compatible with any JDBC-based ORM (JPA, Hibernate, MyBatis, Spring JDBC Template) and with all major connection pools (DBCP, C3P0, BoneCP, Druid, HikariCP), which makes it easy to integrate.

- Minimal intrusion: no code or SQL changes are needed; only the sharding rules for each table have to be configured.
- Broad compatibility with various databases and connection pools.
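To give a flavor of what that configuration looks like, here is a sketch in the ShardingSphere-JDBC 4.x Spring Boot property style; the data source names, table name, and sharding columns are illustrative, and the exact keys vary between versions:

```properties
# Sketch only: two data sources, order table split into 3 sub-tables per database.
spring.shardingsphere.datasource.names=ds0,ds1
spring.shardingsphere.sharding.tables.order.actual-data-nodes=ds$->{0..1}.order_$->{0..2}
spring.shardingsphere.sharding.tables.order.database-strategy.inline.sharding-column=user_id
spring.shardingsphere.sharding.tables.order.database-strategy.inline.algorithm-expression=ds$->{user_id % 2}
spring.shardingsphere.sharding.tables.order.table-strategy.inline.sharding-column=order_id
spring.shardingsphere.sharding.tables.order.table-strategy.inline.algorithm-expression=order_$->{order_id % 3}
```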

Summary

This article reviewed the basic concepts of database sharding. In upcoming posts, I will demonstrate how sharding-jdbc implements each functional point of sharding in a real-world project.

Tags: Java, sharding, distributed transactions, horizontal scaling, vertical scaling, database partitioning
Written by macrozheng

Dedicated to Java tech sharing and dissecting top open-source projects. Topics include Spring Boot, Spring Cloud, Docker, Kubernetes and more. Author’s GitHub project “mall” has 50K+ stars.
