How to Implement Database Sharding: Horizontal & Vertical Partitioning with Sharding-JDBC
This article explains the fundamentals of database sharding, covering both vertical and horizontal partitioning concepts, routing algorithms, advantages and drawbacks, practical implementation steps using sharding-jdbc, and discusses related challenges such as distributed transactions, pagination, global IDs, and tool selection.
Previously, many Java beginners left comments asking for a systematic study of sharding, and since our company project used sharding-jdbc to refactor a MySQL architecture, I decided to write a series of practical articles on sharding as a summary of my architectural learning.
I have read many fragmented articles on sharding online, but most are repetitive and lack detailed real‑world cases. After purchasing a few paid courses, I found that while experienced developers could follow them, beginners still face a steep learning curve.
What is Sharding
In fact, sharding consists of two concepts: database sharding and table sharding. They are usually performed together, so we commonly refer to both as sharding.
Sharding solves performance degradation caused by excessively large databases or tables by splitting a massive database into multiple independent databases and a massive table into several smaller tables, thereby optimizing the performance of each individual database or table.
How to Shard
The core idea of sharding is to split data and then quickly locate and aggregate query results. Sharding can be performed along two dimensions: vertical (by business) and horizontal (by data range or hash).
Below we use an order‑related business example to illustrate vertical and horizontal splitting.
Vertical Split
Vertical split includes vertical database sharding and vertical table sharding.
1. Vertical Database Split
Vertical database split is easy to understand: each business gets its own dedicated database (专库专用, "a dedicated database for a dedicated purpose"). Tables related to orders, payments, coupons, points, and so on are placed in separate databases. Developers cannot directly access other business databases; instead, they must go through API interfaces, which is the initial form of microservices.
Vertical splitting improves performance to some extent but does not solve the problem of a single table becoming too large, so it often needs to be combined with horizontal splitting.
2. Vertical Table Split
Vertical table split is based on columns, i.e., splitting a large table into smaller tables by separating frequently accessed columns from large or rarely used columns.
For example, an order table can keep its frequently accessed fields (order amount, order number) in one table and move its large blob fields into a separate work_extend table. The split tables are then distributed across different databases.
Since databases load data by rows, after vertical splitting the core tables contain only high‑frequency, short‑length fields, allowing more data to be cached in memory, improving hit rate and reducing disk I/O.
Advantages:
- Business data is decoupled, enabling independent maintenance, monitoring, and scaling.
- In high-concurrency scenarios, database pressure is alleviated to some extent.

Disadvantages:
- Development complexity increases because cross-business data must be accessed via APIs.
- Distributed transaction management becomes harder.
- The single-table size problem remains; horizontal splitting is still required.
Horizontal Split
When vertical splitting cannot fully eliminate large‑scale data bottlenecks, horizontal splitting is needed to further improve performance.
1. Horizontal Database Split
Horizontal database split distributes the same table across multiple databases, each possibly on a different server, achieving horizontal scalability.
This approach solves single‑database storage and performance limits, but introduces routing complexity.
For example, three databases orderDB_1, orderDB_2, and orderDB_3 each contain an identical order table. An order ID modulo 3 determines which database stores a particular order.
2. Horizontal Table Split
Within the same database, a large table is split into multiple identical tables, each holding a portion of the data.
For an order table with 9 million rows, it can be divided into order_1, order_2, and order_3, each storing 3 million rows.
Although horizontal table split reduces the size of a single table, all sub‑tables still reside on the same database instance, competing for the same CPU, memory, and network resources. To achieve true distributed scaling, the split tables must be placed on different database instances.
Advantages:
- Solves the problem of a single database becoming a bottleneck under high concurrency, improving system stability and load capacity.
- The amount of refactoring work required is moderate.

Disadvantages:
- Cross-shard transaction consistency is difficult to guarantee.
- Cross-database join queries perform poorly.
- Scaling and maintenance become complex, especially when thousands of sub-tables are created.
What are the Rules
The sharding rule referred to above is a routing algorithm that decides which database and which table a given row should reside in.
Common algorithms include the modulo algorithm and the range algorithm.
1. Modulo Algorithm
Taking the hash of a field and applying modulo N (where N is the number of database instances or sub‑tables) is the most common method.
For the order table, hash the work_no (order number) field and take it modulo N: remainder 0 routes to the first database, remainder 1 to the second, and so on.
Advantage:
- Data is spread relatively uniformly across shards, avoiding hot-spot concentration.

Disadvantage:
- If a node is added or removed, the modulo base N changes, so most existing keys route to a different database; a large data migration is required to restore consistency.
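The migration cost of changing the modulo base is easy to quantify. This sketch (not tied to any framework) counts how many keys switch databases when the shard count drops from 3 to 2:

```java
public class ModuloRebalance {
    // Count how many of the first n keys route to a different shard
    // when the shard count changes from oldBase to newBase.
    public static int remappedKeys(int n, int oldBase, int newBase) {
        int moved = 0;
        for (int key = 0; key < n; key++) {
            if (key % oldBase != key % newBase) {
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        // With 3 shards reduced to 2, 4 of every 6 consecutive keys move.
        System.out.println(remappedKeys(600, 3, 2)); // 400
    }
}
```

Four of every six consecutive keys land on a different shard, so roughly two thirds of the data would have to be migrated.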
2. Range Algorithm
Data can be split by time intervals or ID ranges. For example, a User table may store IDs 1-9999 in the first database, 10000-19999 in the second, and so on.
Advantages:
- Each shard's data volume is controllable.
- Horizontal scaling is simple: new nodes can be added without migrating existing data.
- Data location can be determined quickly.

Disadvantage:
- Hot spots may arise when a particular range, such as the most recent time interval, receives a surge of orders.
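Range routing can be sketched similarly; the block size of 10,000 IDs mirrors the User example above, and the names are illustrative:

```java
public class RangeRouter {
    // Each shard holds a contiguous block of 10,000 IDs:
    // IDs 1-9999 -> shard 0, 10000-19999 -> shard 1, and so on.
    private static final long RANGE_SIZE = 10_000L;

    // Locate a shard with a single division; no hashing needed.
    public static int routeShard(long userId) {
        return (int) (userId / RANGE_SIZE);
    }

    public static void main(String[] args) {
        System.out.println(routeShard(9999L));  // last ID of the first shard -> 0
        System.out.println(routeShard(10000L)); // first ID of the second shard -> 1
    }
}
```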
Challenges of Sharding
1. Distributed Transactions
Cross-database transactions are inevitable. Traditional two-phase or three-phase commit solutions have poor performance and high development cost. Many systems adopt eventual consistency with compensation mechanisms, for example using Alibaba's Seata framework.
2. Pagination, Sorting, Cross‑DB Joins
These common operations become painful after sharding because data must be aggregated from multiple shards before being presented to the user.
3. Distributed Primary Keys
Auto‑increment IDs lose their global uniqueness across shards, so a distributed ID generator is required.
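A common choice is a snowflake-style generator: a timestamp, a worker ID, and a per-millisecond sequence packed into one 64-bit long. The sketch below is a simplified illustration (the custom epoch and bit layout are assumptions, and it does not handle clock rollback):

```java
public class SnowflakeIdGenerator {
    // Layout: 41 bits of timestamp, 10 bits of worker ID, 12 bits of sequence.
    private static final long EPOCH = 1600000000000L; // custom epoch, illustrative
    private final long workerId; // must be unique per node, range 0..1023
    private long lastTimestamp = -1L;
    private long sequence = 0L;

    public SnowflakeIdGenerator(long workerId) {
        if (workerId < 0 || workerId > 1023) {
            throw new IllegalArgumentException("workerId must be in [0, 1023]");
        }
        this.workerId = workerId;
    }

    public synchronized long nextId() {
        long now = System.currentTimeMillis();
        if (now == lastTimestamp) {
            sequence = (sequence + 1) & 0xFFF; // 12-bit sequence within one millisecond
            if (sequence == 0) {
                // Sequence exhausted for this millisecond: spin until the next one.
                while (now <= lastTimestamp) {
                    now = System.currentTimeMillis();
                }
            }
        } else {
            sequence = 0;
        }
        lastTimestamp = now;
        return ((now - EPOCH) << 22) | (workerId << 12) | sequence;
    }
}
```

IDs produced this way are globally unique across shards (as long as worker IDs are unique) and roughly time-ordered, which keeps B-tree inserts append-friendly.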
4. Read‑Write Separation
Most relational databases support master-slave replication for high availability. When combined with sharding, each shard typically has its own master for writes and slaves for reads, so read-write routing must be shard-aware.
5. Data Masking
Sensitive information (ID numbers, phone numbers, etc.) must be masked according to privacy rules.
Sharding Tools
Instead of reinventing the wheel, many mature sharding middleware solutions exist. Common choices include:
- sharding-jdbc (Dangdang; now part of Apache ShardingSphere)
- TSharding (Mogujie)
- Atlas (Qihoo 360)
- Cobar (Alibaba)
- MyCAT (based on Cobar)
- Oceanus (58.com)
- Vitess (Google)
Why Choose sharding-jdbc
sharding-jdbc is a lightweight Java framework delivered as a jar. It works as an enhanced JDBC driver on the client side, requiring no separate server deployment. Its strong compatibility with any JDBC-based ORM (JPA, Hibernate, MyBatis, Spring JDBC Template) and with all major connection pools (DBCP, C3P0, BoneCP, Druid, HikariCP) makes it easy to integrate.
- Minimal intrusion: no changes to business code or SQL are needed; only the sharding rules must be configured.
- Broad compatibility with various databases and connection pools.
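To give a feel for the "table configuration" mentioned above, here is a sketch of Spring Boot properties in the style of ShardingSphere-JDBC 4.x; the data-source and table names are assumptions, and exact property keys vary by version:

```yaml
# Two logical data sources (connection details omitted)
spring.shardingsphere.datasource.names=ds0,ds1
# t_order is spread across 2 databases x 3 tables
spring.shardingsphere.sharding.tables.t_order.actual-data-nodes=ds$->{0..1}.t_order_$->{0..2}
# Route databases by user_id, tables by order_id
spring.shardingsphere.sharding.tables.t_order.database-strategy.inline.sharding-column=user_id
spring.shardingsphere.sharding.tables.t_order.database-strategy.inline.algorithm-expression=ds$->{user_id % 2}
spring.shardingsphere.sharding.tables.t_order.table-strategy.inline.sharding-column=order_id
spring.shardingsphere.sharding.tables.t_order.table-strategy.inline.algorithm-expression=t_order_$->{order_id % 3}
```

With rules like these in place, the application issues plain SQL against the logical t_order table, and the driver rewrites and routes it to the physical shards.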
Summary
This article reviewed the basic concepts of database sharding. In upcoming posts, I will demonstrate how sharding-jdbc implements each of these capabilities in a real-world project.
macrozheng
Dedicated to Java tech sharing and dissecting top open-source projects. Topics include Spring Boot, Spring Cloud, Docker, Kubernetes and more. Author’s GitHub project “mall” has 50K+ stars.