Databases 14 min read

Database Sharding: Concepts, Global ID Strategies, Partitioning Schemes, and Expansion Solutions

This article explains database sharding fundamentals, including terminology, when to adopt sharding, various global ID generation methods such as auto‑increment, UUID, COMB, and Snowflake, different partitioning strategies, challenges like distributed transactions, and practical expansion and migration solutions.

Top Architect

Feb 22, 2022

Database Sharding: Concepts, Global ID Strategies, Partitioning Schemes, and Expansion Solutions

1. Sharding Overview

When a single‑instance database cannot handle large data volume or high concurrency, sharding (splitting databases and tables) is considered to overcome the limitations.

1.1 Sharding Terminology

Read‑Write Separation: Different databases handle reads and writes respectively.

Partition: Records are divided into different physical partitions on the same server.

Database Sharding: Multiple tables of a system are stored across several database instances.

Table Sharding:

Vertical Sharding: Different fields are stored in separate tables.

Horizontal Sharding (most complex): Records are split across tables according to a sharding algorithm.

1.2 Should You Adopt Sharding?

Sharding introduces complexity and performance overhead; it should only be used when the projected workload truly requires it. Before sharding, consider:

If current data size is below a few million rows, sharding is usually unnecessary.

Increase storage or add more databases to handle data‑size issues.

Upgrade CPU/Memory, enable read‑write separation, tune MySQL configuration, optimize tables, indexes, SQL, use partitioning or vertical table splitting for performance problems.

Only if these measures fail, consider horizontal sharding.

2. Global ID Generation Strategies

2.1 Auto‑Increment Column

Advantages: built‑in, ordered, good performance. Drawbacks: in a sharded environment IDs may collide if not planned.

2.1.1 Set Auto‑Increment Offset and Step

### 假设总共有 10 个分表
### 级别可选: SESSION(会话级), GLOBAL(全局)
SET @SESSION.auto_increment_offset = 1; ### 起始值, 分别取值为 1~10
SET @SESSION.auto_increment_increment = 10; ### 步长增量

When using this method, data must be migrated to new shards during expansion.

2.1.2 Global ID Mapping Table

Store a key in a global Redis for each table that records the current maximum ID; each request increments the key and returns the new ID. The Redis data should be persisted to a durable store.

2.2 UUID (128‑bit)

Universally unique identifiers generated by the platform. Format example: 550e8400‑e29b‑41d4‑a716‑446655440000. Advantages: simple and globally unique. Disadvantages: large storage/transmission size, unordered, performance impact.

2.3 COMB (Combined GUID)

Combines a 10‑byte GUID with a 6‑byte timestamp to produce an ordered identifier, improving index performance.

2.4 Snowflake Algorithm

Twitter's open‑source distributed ID generator producing a 64‑bit integer. Structure (excluding the sign bit):

41 bits: timestamp in milliseconds (covers ~69 years).

10 bits: node identifier (5‑bit data‑center + 5‑bit machine, supporting 1024 nodes).

12 bits: sequence number (supports 4096 IDs per millisecond per node).

3. Sharding Strategies

3.1 Range Sharding

Data is divided based on a range of a specific field (e.g., user ID or order time). Advantage: after adding new nodes, only the new range needs to be assigned, avoiding data migration. Disadvantage: time‑based ranges can cause hotspot imbalance.

3.2 Consistent Hashing

Uses a hash ring to map keys to nodes; adding or removing nodes requires minimal data movement.

3.3 Modulo Sharding

Assigns records based on ID % N. Simple but requires data migration when N changes.

3.4 Snowflake Sharding

Leverages Snowflake's node ID bits to route records, achieving expansion without data migration.

4. Problems Introduced by Sharding

4.1 Distributed Transactions

Two‑phase/three‑phase commit incurs high performance cost; compensation mechanisms are often preferred.

4.2 Cross‑Node JOIN

MySQL supports JOIN on a single instance; for multi‑node joins, avoid using native JOIN and consider alternatives such as global tables, field redundancy, or application‑side assembly.

4.3 Cross‑Node Aggregation

Must be performed in the application layer; large aggregations followed by pagination can be inefficient.

4.4 Node Expansion

Adding nodes changes shard ownership, requiring data migration.

5. Node Expansion Plans

5.1 Conventional Expansion

When adding nodes without careful planning, most data will need to be moved. Typical steps:

Estimate migration time and announce downtime.

Stop services, run migration scripts.

Update sharding rules.

Restart services.

5.2 Migration‑Free Expansion

Double‑capacity strategy: each existing node (A, B) adds a replica (A2, B2) and switches to a new modulo rule.

Add A2 and B2 as read‑only replicas and sync data.

Change sharding rule from ID%2=0 => A, ID%2=1 => B to ID%4=0 => A, ID%4=2 => A2, ID%4=1 => B, ID%4=3 => B2.

Remove replication and let all four nodes serve traffic; redundant data can be cleaned later.

6. Sharding Implementation Options

6.1 Proxy Layer

Deploy a proxy server (e.g., MyCAT) that masquerades as a MySQL instance, routing queries to real backend nodes. The proxy is transparent to applications and supports read‑write separation, sharding, multi‑tenant, and cloud‑native scenarios.

6.2 Application Layer

Integrate a library/JAR into the application code. Examples:

Sharding‑JDBC (active) provides lightweight JDBC‑level sharding, supports Snowflake algorithm, flexible sharding keys, and full SQL parsing (joins, aggregates, limits, etc.).

Other historic solutions (e.g., TDDL) are no longer maintained.

Sharding‑JDBC works with any Java ORM (JPA, Hibernate, MyBatis) and connection pool (Druid, C3P0, etc.).

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Scalability Sharding Partitioning ID generation

Written by

Top Architect

Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.