
Practical Guide to Horizontal Database Sharding: Design Principles, Common Schemes, and Expansion Strategies

This article explains the background, design criteria, and practical implementations of horizontal database sharding (range, hash, gene, routing-table, and consistent-hash methods). It also discusses common pitfalls, data-skew mitigation, and both doubling and consistent-hash expansion techniques for MySQL clusters.

High Availability Architecture

Background

As data volume and concurrency grow, single-table performance degrades and a single MySQL instance can hit its processing limits, so the data has to be split across multiple databases and tables. There are two main ways to split: vertical (by business domain or by column) and horizontal (by row). This article focuses on horizontal sharding.

What Makes a Good Sharding Scheme?

Two criteria matter most: sustainability (the scheme can absorb future data growth without a redesign) and low data skew (rows are distributed uniformly across shards). A sustainable scheme allows smooth expansion when the current M-by-N layout (M databases, N tables per database) reaches capacity.
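To make "low data skew" measurable, one simple option is to compare the largest shard with the average shard. The helper below is a minimal sketch; the class name SkewMetric and the choice of max/mean as the metric are assumptions for illustration, not something the article prescribes.

```java
import java.util.Arrays;

/** Hypothetical helper: expresses skew as (largest shard) / (average shard). */
final class SkewMetric {
    /** Returns 1.0 for a perfectly even layout; larger values mean more skew. */
    static double skewRatio(long[] rowsPerShard) {
        long max = Arrays.stream(rowsPerShard).max().orElse(0L);
        double mean = Arrays.stream(rowsPerShard).average().orElse(0.0);
        // An empty or all-zero layout is treated as balanced.
        return mean == 0.0 ? 1.0 : max / mean;
    }
}
```

Four shards with 100 rows each give a ratio of 1.0; putting all 400 rows into one of the four shards gives 4.0.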

Common Sharding Schemes

1. Range Sharding

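Data is partitioned by value ranges, e.g., orders stored in a separate table per year. Example code:

/**
 * Shard into tables by year.
 * Assumes orderId starts with a four-digit year, e.g. "2023...".
 */
public static String rangeShardByYear(String orderId) {
    // The first four characters of the order ID carry the year.
    int year = Integer.parseInt(orderId.substring(0, 4));
    return "t_order_" + year;
}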

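Advantages: simple to implement and reason about, and compatible with TiDB/TiKV, which also organize data by key range. Drawbacks: the newest range becomes a hotspot (the current year's table takes almost all writes), new shards must be created ahead of time, and queries that span several ranges have to be fanned out and merged across tables.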
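The cross-range drawback can be made concrete with a small sketch that lists every yearly table a query over a span of years would have to touch. The class and method names are hypothetical; only the t_order_ prefix comes from the example above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for fanning a query out over yearly range shards. */
final class RangeRouting {
    /** Lists every yearly table that a query over [fromYear, toYear] must read. */
    static List<String> tablesForYears(int fromYear, int toYear) {
        if (fromYear > toYear) {
            throw new IllegalArgumentException("fromYear must not exceed toYear");
        }
        List<String> tables = new ArrayList<>();
        for (int year = fromYear; year <= toYear; year++) {
            tables.add("t_order_" + year);
        }
        return tables;
    }
}
```

A query covering 2021 through 2023 has to read t_order_2021, t_order_2022, and t_order_2023 and merge the results in the application layer.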

2. Hash Sharding

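The hash of the sharding key determines both the database index and the table index. A naive implementation that takes the same hash modulo DB_CNT and modulo TBL_CNT produces severe data skew whenever DB_CNT and TBL_CNT share a common factor, because the two remainders are then correlated. With DB_CNT = 10 and TBL_CNT = 100, for example, the table index already fixes the database index (dbIdx = tblIdx % 10), so each database only ever uses 10 of its 100 tables: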

public static ShardCfg shard(String userId) {
    int hash = userId.hashCode();
    int dbIdx = Math.abs(hash % DB_CNT);
    int tblIdx = Math.abs(hash % TBL_CNT);
    return new ShardCfg(dbIdx, tblIdx);
}
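The skew of the naive scheme is easy to observe by counting how many distinct (database, table) pairs are actually reached. The sketch below feeds consecutive integers in as hash values; the class name and the sample counts are assumptions for illustration.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical demo of how many slots the naive double-modulo scheme can reach. */
final class NaiveSkewDemo {
    /** Counts distinct (dbIdx, tblIdx) pairs hit by hash values 0..samples-1. */
    static int usedSlots(int dbCnt, int tblCnt, int samples) {
        Set<Integer> used = new HashSet<>();
        for (int hash = 0; hash < samples; hash++) {
            int dbIdx = Math.abs(hash % dbCnt);
            int tblIdx = Math.abs(hash % tblCnt);
            // Encode the pair as a single integer so it can be stored in a set.
            used.add(dbIdx * tblCnt + tblIdx);
        }
        return used.size();
    }
}
```

With 10 databases and 100 tables per database there are 1000 physical tables, yet usedSlots(10, 100, 100000) reports only 100 of them in use.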

Corrected approaches include:

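Standard Two-Step Sharding: compute a global slot = hash % (DB_CNT * TBL_CNT) from a non-negative hash, then derive dbIdx = slot / TBL_CNT and tblIdx = slot % TBL_CNT. Every (database, table) pair corresponds to exactly one slot, so a well-distributed hash fills all tables evenly.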
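A minimal sketch of the two-step calculation follows. The constants 10 and 100 and the int[] return type (standing in for ShardCfg) are assumptions; Math.floorMod is used so that negative hashCode values still produce a non-negative slot.

```java
/** Sketch of two-step sharding: one global slot, then split into db and table. */
final class TwoStepSharding {
    static final int DB_CNT = 10;   // assumed number of databases
    static final int TBL_CNT = 100; // assumed number of tables per database

    /** Returns {dbIdx, tblIdx} for a user ID. */
    static int[] shard(String userId) {
        return shardByHash(userId.hashCode());
    }

    /** Returns {dbIdx, tblIdx} for a raw hash value. */
    static int[] shardByHash(int hash) {
        // floorMod never returns a negative value for a positive modulus.
        int slot = Math.floorMod(hash, DB_CNT * TBL_CNT);
        return new int[] { slot / TBL_CNT, slot % TBL_CNT };
    }
}
```

Unlike the naive scheme, consecutive hash values reach all 1000 (database, table) pairs.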

Routing-Table Redundancy: store the dbIdx for each key in a separate routing table (cached for fast lookup).
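The routing-table idea can be sketched with an in-memory map standing in for the persisted routing table and its cache. The class name and the round-robin assignment policy for new keys are assumptions; the article only says that the database index is stored per key.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical in-memory stand-in for a persisted, cached routing table. */
final class RoutingTable {
    private final int dbCnt;
    private final Map<String, Integer> dbByKey = new ConcurrentHashMap<>();
    private final AtomicInteger next = new AtomicInteger();

    RoutingTable(int dbCnt) {
        this.dbCnt = dbCnt;
    }

    /** Returns the recorded database for a key, assigning one round-robin on first use. */
    int dbIdxFor(String userId) {
        return dbByKey.computeIfAbsent(
                userId, k -> Math.floorMod(next.getAndIncrement(), dbCnt));
    }
}
```

Because the assignment is recorded rather than computed, a key keeps its database even if the number of databases changes later; the price is an extra lookup on every request.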

Gene Method: use the first few characters of the key (its "gene") for dbIdx and the full hash for tblIdx, reducing but not eliminating skew.
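A sketch of the gene method, assuming a gene length of four characters and the same 10-by-100 layout as before (all three constants are illustrative assumptions):

```java
/** Sketch of the gene method: key prefix picks the database, full key picks the table. */
final class GeneSharding {
    static final int DB_CNT = 10;   // assumed number of databases
    static final int TBL_CNT = 100; // assumed number of tables per database
    static final int GENE_LEN = 4;  // assumed length of the key prefix used as the gene

    /** Returns {dbIdx, tblIdx} for a user ID. */
    static int[] shard(String userId) {
        // Keys shorter than GENE_LEN use the whole key as their gene.
        String gene = userId.substring(0, Math.min(GENE_LEN, userId.length()));
        int dbIdx = Math.floorMod(gene.hashCode(), DB_CNT);
        int tblIdx = Math.floorMod(userId.hashCode(), TBL_CNT);
        return new int[] { dbIdx, tblIdx };
    }
}
```

Keys that share a prefix always land in the same database, which is why the result depends on how evenly the prefixes themselves are distributed.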

Remove Common Factor Method: change the table-index calculation so that it no longer depends on the factor that DB_CNT and TBL_CNT share, for example by dividing the hash before taking the table modulus.
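One way to remove the shared factor is to divide the hash by DB_CNT before taking the table modulus, so the table index is derived from the part of the hash that the database index did not consume. This is a sketch of that variant under assumed constants; the article's exact formula may differ.

```java
/** Sketch: divide out DB_CNT before computing the table index. */
final class CommonFactorFreeSharding {
    static final int DB_CNT = 10;   // assumed number of databases
    static final int TBL_CNT = 100; // assumed number of tables per database

    /** Returns {dbIdx, tblIdx} for a raw hash value. */
    static int[] shardByHash(int hash) {
        int h = hash & Integer.MAX_VALUE; // clear the sign bit to get a non-negative value
        int dbIdx = h % DB_CNT;
        // The quotient is independent of h % DB_CNT, so the two indices decouple.
        int tblIdx = (h / DB_CNT) % TBL_CNT;
        return new int[] { dbIdx, tblIdx };
    }
}
```

With this variant, hash values 0 through 999 map to 1000 distinct (database, table) pairs.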

Consistent Hashing: map hash values to configurable ranges (or virtual nodes) to achieve balanced distribution and easier scaling.
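The range-based flavor of consistent hashing can be sketched with a sorted map from range upper bounds to database indices. The slot count of 1024, the class name, and the method names are assumptions for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

/** Sketch of range-configured hash routing: each slot range is owned by one database. */
final class RangeHashRouter {
    static final int SLOT_CNT = 1024; // assumed size of the hash slot space

    // Key: inclusive upper bound of a slot range. Value: database index owning that range.
    private final TreeMap<Integer, Integer> dbByUpperBound = new TreeMap<>();

    /** Declares that slots up to upperBoundInclusive (from the previous bound) belong to dbIdx. */
    void assign(int upperBoundInclusive, int dbIdx) {
        dbByUpperBound.put(upperBoundInclusive, dbIdx);
    }

    int dbIdxFor(String userId) {
        return dbIdxForHash(userId.hashCode());
    }

    int dbIdxForHash(int hash) {
        int slot = Math.floorMod(hash, SLOT_CNT);
        // The owning range is the one with the smallest upper bound >= slot.
        Map.Entry<Integer, Integer> owner = dbByUpperBound.ceilingEntry(slot);
        if (owner == null) {
            throw new IllegalStateException("no range covers slot " + slot);
        }
        return owner.getValue();
    }
}
```

Because ownership lives in configuration rather than in the modulus, a range can later be resized or reassigned without changing how any other key is routed.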

Common Expansion Schemes

4.1 Doubling Expansion

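Each expansion doubles the number of databases. A replica is attached to every existing database and kept in sync through master-slave replication; writes are then briefly disabled, each replica is promoted to a new standalone database, the routing configuration is switched to the doubled database count, and traffic is re-routed. Rows that no longer belong to a database after the switch can be cleaned up afterwards. The method grows capacity in clear steps, but it incurs a write-downtime window, and because every step doubles the whole cluster, repeated doublings may waste resources.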
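Doubling works with replication because of a simple modular property, sketched below under the assumption that the database index is computed as hash modulo the database count: after doubling, a key that lived in database i ends up either in i or in i plus the old count, and the latter is exactly the promoted replica of i. The class and method names are hypothetical.

```java
/** Sketch of the modular property behind doubling expansion. */
final class DoublingExpansion {
    static int dbIdx(int hash, int dbCnt) {
        return Math.floorMod(hash, dbCnt);
    }

    /** True if, after doubling, the key stays in its database or moves to that database's replica. */
    static boolean staysOrMovesToReplica(int hash, int oldDbCnt) {
        int before = dbIdx(hash, oldDbCnt);
        int after = dbIdx(hash, oldDbCnt * 2);
        return after == before || after == before + oldDbCnt;
    }
}
```

Since the replica already holds a full copy of its master, no row has to be copied between unrelated databases during the switch.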

4.2 Consistent‑Hash Expansion

Instead of doubling every shard, only the overloaded hash range is split and part of it is reassigned to a new database, so the cluster can be scaled selectively without a full cluster restart. The procedure mirrors the doubling method (replicate, briefly stop writes, switch), but only the configuration of the affected hash range changes.
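The configuration change for a selective split can be sketched as an operation on a sorted map from range upper bounds to database indices. The map layout, the halving policy, and all names here are assumptions for illustration; the data copy itself is out of scope.

```java
import java.util.TreeMap;

/** Sketch: split one overloaded slot range and hand its upper half to a new database. */
final class RangeSplit {
    /**
     * dbByUpperBound maps the inclusive upper bound of each slot range to its database index.
     * The range ending at upperBound is halved; the lower half keeps its owner and the
     * upper half is reassigned to newDbIdx. All other ranges are left untouched.
     */
    static void split(TreeMap<Integer, Integer> dbByUpperBound, int upperBound, int newDbIdx) {
        Integer owner = dbByUpperBound.get(upperBound);
        if (owner == null) {
            throw new IllegalArgumentException("no range ends at " + upperBound);
        }
        Integer prevUpper = dbByUpperBound.lowerKey(upperBound);
        int lower = (prevUpper == null) ? 0 : prevUpper + 1;
        if (lower >= upperBound) {
            throw new IllegalArgumentException("range is too small to split");
        }
        int mid = lower + (upperBound - lower) / 2;
        dbByUpperBound.put(mid, owner);           // lower half [lower, mid] keeps its owner
        dbByUpperBound.put(upperBound, newDbIdx); // upper half (mid, upperBound] moves
    }
}
```

Starting from four ranges ending at 255, 511, 767, and 1023, splitting the range that ends at 511 yields a new boundary at 383: slots 256 to 383 stay on the old database and slots 384 to 511 move to the new one.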

Summary

The article presents a comprehensive overview of horizontal sharding design for MySQL, covering range, hash, gene, routing‑table, and consistent‑hash techniques, their pros/cons, data‑skew considerations, and practical expansion procedures such as doubling and consistent‑hash based scaling.
