Comprehensive Guide to Database Sharding: Vertical & Horizontal Partitioning, Global ID Generation, and Distributed Transaction Challenges
This article explains the concepts, methods, advantages, and drawbacks of database sharding, covering vertical and horizontal partitioning, sharding rules, global primary key generation, distributed transaction issues, cross‑node queries, pagination, data migration, and practical case studies, with code examples.
1. Data Partitioning
Relational databases can become performance bottlenecks when a single table reaches massive size (e.g., 10 million rows or 100 GB). Sharding (data partitioning) reduces load by distributing data across multiple databases, shortening query time.
1.1 Vertical (Logical) Partitioning
Vertical partitioning comes in two forms: vertical database splitting and vertical table splitting.
Vertical database splitting places loosely coupled tables in separate databases, analogous to how services are split in a microservice architecture.
Vertical table splitting moves rarely used or large columns into an extension table, improving the memory cache hit rate and reducing disk I/O.
Advantages:
Reduces business coupling and clarifies responsibilities.
Enables independent scaling and monitoring of different business domains.
Improves I/O and connection limits under high concurrency.
Disadvantages:
Cross‑database joins become impossible; aggregation must be done via service calls.
Distributed transaction handling becomes more complex.
Large tables may still require horizontal splitting.
1.2 Horizontal (Physical) Partitioning
When vertical splitting is insufficient, horizontal partitioning splits a large table into multiple tables or databases based on a sharding key.
Two common rules:
Range based: Split by time or ID intervals (e.g., month, userId range).
Hash/modulo based: Use hash(key) % N to distribute rows evenly.
Advantages of range‑based sharding: easy to add new nodes without data migration. Drawbacks: uneven load if newer data is hotter.
Advantages of hash‑based sharding: balanced data and request distribution. Drawbacks: re‑hashing required for scaling.
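The two routing rules above can be sketched in a few lines. The shard counts, database names, and uid intervals below are illustrative assumptions, not a real deployment:

```python
# Hypothetical range table: each shard owns a half-open uid interval.
RANGE_SHARDS = [
    (0, 10_000_000, "db0"),
    (10_000_000, 20_000_000, "db1"),
    (20_000_000, 30_000_000, "db2"),
]

def route_by_range(uid: int) -> str:
    # Range-based rule: walk the interval table until one contains uid.
    for lo, hi, shard in RANGE_SHARDS:
        if lo <= uid < hi:
            return shard
    raise ValueError(f"uid {uid} outside configured ranges")

def route_by_hash(uid: int, n_shards: int = 4) -> str:
    # Hash/modulo rule: hash(key) % N; uid is already an integer,
    # so the modulo is applied directly.
    return f"db{uid % n_shards}"
```

Range routing needs only a new table entry to add a shard, while hash routing spreads load evenly but ties every row's location to N.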
2. Global Primary Key Generation
Per-table auto‑increment IDs collide across shards, so a globally unique scheme is needed. Common solutions:
UUID – simple but large and index‑unfriendly.
Dedicated sequence table (MyISAM) with a unique stub column to generate globally unique IDs.
Multiple ID‑generation servers with staggered auto‑increment steps.
Twitter Snowflake – 64‑bit IDs composed of timestamp, datacenter ID, worker ID, and sequence.
Meituan‑Dianping Leaf – a production‑grade ID service handling high availability and clock rollback.
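The Snowflake bit layout can be sketched as follows. The epoch constant is Twitter's published one; the clock-rollback handling that production services such as Leaf add is deliberately omitted here:

```python
import time

class Snowflake:
    """Sketch of the Snowflake 64-bit layout: 41-bit millisecond
    timestamp, 5-bit datacenter ID, 5-bit worker ID, 12-bit sequence.
    Clock-rollback protection is omitted for brevity."""

    EPOCH = 1288834974657  # Twitter's original epoch (2010-11-04)

    def __init__(self, datacenter_id: int, worker_id: int):
        assert 0 <= datacenter_id < 32 and 0 <= worker_id < 32
        self.datacenter_id = datacenter_id
        self.worker_id = worker_id
        self.sequence = 0
        self.last_ms = -1

    def next_id(self) -> int:
        ms = int(time.time() * 1000)
        if ms == self.last_ms:
            self.sequence = (self.sequence + 1) & 0xFFF  # 12-bit wrap
            if self.sequence == 0:           # 4096 IDs used this ms:
                while ms <= self.last_ms:    # spin to the next millisecond
                    ms = int(time.time() * 1000)
        else:
            self.sequence = 0
        self.last_ms = ms
        return ((ms - self.EPOCH) << 22) | (self.datacenter_id << 17) \
               | (self.worker_id << 12) | self.sequence
```

Because the timestamp occupies the high bits, IDs generated on one machine are roughly time-ordered, which keeps B-tree index inserts append-like.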
3. Problems Introduced by Sharding
3.1 Distributed Transaction Consistency
Updates spanning multiple shards require XA or two‑phase commit, increasing latency and conflict probability.
3.2 Cross‑Node Joins
Joins across shards are costly; solutions include global tables, field redundancy, data assembly in two steps, or ER‑based sharding that keeps related tables in the same shard.
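The "assemble in two steps" option can be sketched like this, with hypothetical in-memory tables standing in for the order shard and the user shard:

```python
# Illustrative stand-ins for two separate shards; the rows are made up.
order_db = [  # order shard stores only user_id, no user details
    {"order_id": 1, "user_id": 42, "amount": 30},
    {"order_id": 2, "user_id": 7,  "amount": 55},
]
user_db = {   # user shard, keyed by user_id
    42: {"name": "alice"},
    7:  {"name": "bob"},
}

def orders_with_names():
    # Step 1: read the orders from the order shard.
    orders = list(order_db)
    # Step 2: batch-fetch the referenced users and join in the
    # application, replacing the cross-shard JOIN.
    uids = {o["user_id"] for o in orders}
    users = {uid: user_db[uid] for uid in uids}
    return [dict(o, name=users[o["user_id"]]["name"]) for o in orders]
```

Batching the second lookup (one IN query per shard rather than one query per row) keeps the round-trip count bounded.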
3.3 Pagination, Sorting, and Aggregation
Cross‑shard pagination requires sorting each shard and merging results, which can be CPU‑ and memory‑intensive for large page numbers. Aggregation functions (MAX, SUM, COUNT) must be executed per shard and then combined.
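A minimal sketch of the merge step, assuming each shard already returns its rows sorted by the paging key (the sample rows are invented):

```python
import heapq

# Each shard's result set, pre-sorted by (create_time, ...) locally.
shard_a = [(1, "u1"), (4, "u4"), (9, "u9")]
shard_b = [(2, "u2"), (3, "u3"), (8, "u8")]

def cross_shard_page(shards, page: int, size: int):
    # To serve page N correctly, every shard must supply its first
    # (N+1)*size rows; the coordinator merges them and re-slices.
    # This is why deep pages get expensive: the fetch window grows
    # linearly with the page number on every shard.
    limit = (page + 1) * size
    per_shard = [s[:limit] for s in shards]
    merged = heapq.merge(*per_shard)  # streaming k-way merge
    return list(merged)[page * size : limit]
```

Aggregates follow the same shape: COUNT is summed across shards, MAX is the max of per-shard maxima, and AVG must be rebuilt from per-shard SUM and COUNT rather than averaging averages.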
3.4 Global Primary Key Collision
Strategies such as UUID, sequence tables, staggered auto‑increment, or Snowflake avoid collisions while providing scalability.
3.5 Data Migration & Scaling
When capacity limits are reached, historical data must be migrated to new shards. Range‑based sharding eases scaling; hash‑based sharding requires re‑hashing.
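The re-hashing cost can be made concrete: a row must move whenever its modulo assignment changes, so the fraction of rows that migrate depends on the old and new shard counts. A small sketch over a uniform key space (the key range is illustrative):

```python
def moved_fraction(keys, old_n: int, new_n: int) -> float:
    # A row migrates when k % old_n != k % new_n.
    moved = sum(1 for k in keys if k % old_n != k % new_n)
    return moved / len(keys)

keys = range(10_000)
# Going from 4 to 5 shards relocates 80% of the rows...
naive = moved_fraction(keys, 4, 5)      # 0.8
# ...while doubling (4 -> 8) moves exactly half, since k % 8 == k % 4
# for every key whose new high bit is zero.
doubled = moved_fraction(keys, 4, 8)    # 0.5
```

This is why hash-sharded systems usually scale by doubling the shard count (or adopt consistent hashing) rather than adding one node at a time.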
4. When to Consider Sharding
Table size exceeds practical limits (e.g., >10 GB, >10 M rows).
Backup or DDL operations cause unacceptable downtime.
High write/read concurrency leads to lock contention.
Specific fields become hot spots and benefit from vertical splitting.
Business growth demands horizontal scaling.
Separating workloads improves availability and isolates failures.
5. Sharding Middleware Recommendations
sharding‑jdbc (Dangdang)
TSharding (Mogujie)
Atlas (360)
Cobar (Alibaba)
MyCAT (based on Cobar)
Oceanus (58.com)
Vitess (Google)
6. Case Study: User Center
A typical user table (uid, login_name, passwd, sex, age, nickname, etc.) grows from 100 k to billions of rows. Vertical splitting isolates frequently updated fields (e.g., last_login_time) into a separate table, while horizontal sharding distributes users across databases using range or modulo on uid.
For non‑uid queries (login_name, email), a mapping table or hash‑based gene function maps the attribute to uid, enabling efficient routing.
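The gene-function idea can be sketched as follows. Everything here is an assumption chosen for illustration: the 3-bit gene width (supporting up to 8 shards), the MD5 hash, and the function names:

```python
import hashlib

GENE_BITS = 3  # supports up to 2**3 = 8 shards (illustrative choice)

def gene(login_name: str) -> int:
    # Stable hash of the login name; its low GENE_BITS bits are the gene.
    digest = hashlib.md5(login_name.encode()).digest()
    return digest[0] & ((1 << GENE_BITS) - 1)

def make_uid(seq: int, login_name: str) -> int:
    # Splice the gene into the uid's low bits at generation time,
    # so uid and login_name always route to the same shard.
    return (seq << GENE_BITS) | gene(login_name)

def shard_of_uid(uid: int) -> int:
    return uid & ((1 << GENE_BITS) - 1)

def shard_of_login(login_name: str) -> int:
    return gene(login_name)
```

Queries by uid and by login_name now resolve to the same shard without any extra lookup; the trade-off is that the gene width fixes the maximum shard count at uid-generation time.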
Operational analytics that require massive scans are off‑loaded to a separate backend service or data warehouse (e.g., Elasticsearch, Hive) to avoid impacting the user‑facing service.
Overall, sharding improves scalability, availability, and performance but introduces complexity in transaction management, query routing, and data migration.
Top Architect
Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.