Databases 12 min read

Database Sharding: Horizontal Partitioning, Hash Modulo and Range Schemes, and a Hybrid Design

The article explains vertical and horizontal database sharding, compares hash-modulo and range partitioning schemes with their advantages and drawbacks, and proposes a hybrid group‑based design that balances load, avoids hotspots, and simplifies scaling without costly data migrations.

Code Ape Tech Column
Code Ape Tech Column
Code Ape Tech Column
Database Sharding: Horizontal Partitioning, Hash Modulo and Range Schemes, and a Hybrid Design

Database Sharding Overview

In large‑scale projects, when data volume grows, developers split data either vertically (by business domain) or horizontally (by rows). Vertical sharding creates separate databases for modules such as orders and users, while horizontal sharding distributes rows of a single table across multiple tables or databases.

Horizontal sharding is illustrated with a 40 million‑row order table that exceeds MySQL’s recommended single‑table size; the data can be split into four or more tables, or further into separate databases.

1. Hash Modulo Scheme

Each row is routed by applying id % N where N is the total number of tables. For example, with four tables, id=12 maps to table 0 and id=13 to table 1, ensuring an even distribution and eliminating hotspot issues.

Advantages: Uniform data distribution, no hotspot.

Disadvantages: When the number of tables changes (e.g., expanding from 4 to 8), the modulo base changes, causing existing rows to map to different tables and requiring painful data migration.

2. Range Scheme

Rows are assigned to tables based on predefined ID ranges (e.g., IDs 0‑9 999 999 go to table 0, 10 000 000‑19 999 999 to table 1). This design avoids data migration when new tables are added because existing ranges remain unchanged.

Advantages: No data migration needed during scaling.

Disadvantages: Hotspot risk because sequential IDs concentrate recent data in the same table.

3. Hybrid Design Idea

The proposed solution combines the two schemes: use a range to assign an ID to a logical group , then within the group apply hash‑modulo across all tables of that group. This keeps existing data stable while achieving uniform distribution.

Group Definition

A group contains several databases and tables. For example, IDs 0‑40 million belong to group01 , which has three databases (DB_0, DB_1, DB_2) and a total of ten tables.

Routing steps:

Determine the group by ID range.

Compute id % total_table_count (e.g., 10). Map the remainder to a specific database (e.g., 0‑3 → DB_0, 4‑6 → DB_1, 7‑9 → DB_2).

Within the selected database, route to the appropriate table.

This approach lets servers with higher capacity store more tables (DB_0 holds four tables, DB_1 and DB_2 hold three each), achieving load balancing according to hardware capabilities.

Core Process Flow

Scaling to New Groups

When the data volume exceeds the current group’s range, a new group (e.g., group02 ) is defined with its own ID range and the same routing logic, eliminating the need for data migration.

4. System Design

The final architecture stores the relationships between groups, databases, and tables, and caches them locally (e.g., JVM cache) to avoid frequent lookups. For dynamic updates, a distributed configuration center (or Zookeeper) can refresh the cache without restarting services.

Group‑to‑DB and table‑to‑DB mappings are illustrated below.

With this design, horizontal scaling, hotspot avoidance, and hardware‑aware data distribution are achieved without costly migrations.

scalabilitydatabase shardinghorizontal partitioninghash modulorange partitioning
Code Ape Tech Column
Written by

Code Ape Tech Column

Former Ant Group P8 engineer, pure technologist, sharing full‑stack Java, job interview and career advice through a column. Site: java-family.cn

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.