Databases 14 min read

Why and How to Split Databases: Vertical & Horizontal Sharding Explained

This article explains the reasons for database sharding, compares vertical (scale‑up) and horizontal (scale‑out) partitioning methods, illustrates vertical database/table splits and horizontal partition strategies such as HASH, RANGE, LIST, KEY and composite, and discusses the challenges like distributed transactions, cross‑database joins and increased management overhead.

Architecture & Thinking
Architecture & Thinking
Architecture & Thinking
Why and How to Split Databases: Vertical & Horizontal Sharding Explained

1 Why Split Databases

Physical servers have limited CPU, memory, storage and connection capacity; a sudden surge of concurrent operations can cause performance bottlenecks. Splitting large databases into smaller ones distributes load across multiple machines, improving performance. Two main approaches are vertical and horizontal splitting.

2 Vertical Splitting (Scale‑Up)

Vertical splitting separates databases or tables by functional modules to reduce resource contention. Examples include creating separate order, product, and user databases.

2.1 Vertical Database Split

Three databases are created: one for products, one for orders, and one for scores.

2.2 Vertical Table Split

When a table has many columns, frequently used fields can stay in a main table while rarely used fields move to an auxiliary table, linked by primary key.

Aligns with micro‑service principles, eases management and scaling.

Improves performance under high concurrency and enables hot‑cold data separation.

Drawbacks:

Some tables cannot be joined directly, requiring complex aggregation logic.

Data volume growth and distributed transactions remain challenging.

3 Horizontal Splitting (Scale‑Out)

Horizontal splitting divides a large table into multiple tables with identical structure, either within the same database (in‑database partitioning) or across databases (sharding).

3.1 In‑Database Partitioning

When the Orders table reaches tens of millions of rows, it can be partitioned by order_id ranges, e.g., 1‑10 M in order1 , 10‑20 M in order2 , etc.

Industry guidelines suggest splitting when a single table exceeds 5 million rows or 2 GB (Alibaba) or 10 million rows (Baidu). However, the number of columns and their types also affect performance.

3.2 In‑Database Partition Strategies

3.2.1 HASH

<code>DROP TABLE IF EXISTS `t_userinfo`;
CREATE TABLE `t_userinfo` (
  `id` int(10) unsigned NOT NULL,
  `personcode` varchar(20) DEFAULT NULL,
  `personname` varchar(100) DEFAULT NULL,
  `depcode` varchar(100) DEFAULT NULL,
  `depname` varchar(500) DEFAULT NULL,
  `gwcode` int(11) DEFAULT NULL,
  `gwname` varchar(200) DEFAULT NULL,
  `gravalue` varchar(20) DEFAULT NULL,
  `createtime` DateTime NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8
PARTITION BY HASH(YEAR(createtime)) PARTITIONS 10;</code>

This creates ten partitions based on the year of createtime .

3.2.2 RANGE

<code>DROP TABLE IF EXISTS `t_userinfo`;
CREATE TABLE `t_userinfo` (
  `id` int(10) unsigned NOT NULL,
  `personcode` varchar(20) DEFAULT NULL,
  `personname` varchar(100) DEFAULT NULL,
  `depcode` varchar(100) DEFAULT NULL,
  `depname` varchar(500) DEFAULT NULL,
  `gwcode` int(11) DEFAULT NULL,
  `gwname` varchar(200) DEFAULT NULL,
  `gravalue` varchar(20) DEFAULT NULL,
  `createtime` DateTime NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8
PARTITION BY RANGE(gwcode) (
  PARTITION P0 VALUES LESS THAN (101),
  PARTITION P1 VALUES LESS THAN (201),
  PARTITION P2 VALUES LESS THAN (301),
  PARTITION P3 VALUES LESS THAN MAXVALUE
);</code>

Rows with gwcode 1‑100 go to P0, 101‑200 to P1, etc.

3.2.3 LIST

<code>DROP TABLE IF EXISTS `t_userinfo`;
CREATE TABLE `t_userinfo` (
  `id` int(10) unsigned NOT NULL,
  `personcode` varchar(20) DEFAULT NULL,
  `personname` varchar(100) DEFAULT NULL,
  `depcode` varchar(100) DEFAULT NULL,
  `depname` varchar(500) DEFAULT NULL,
  `gwcode` int(11) DEFAULT NULL,
  `gwname` varchar(200) DEFAULT NULL,
  `gravalue` varchar(20) DEFAULT NULL,
  `createtime` DateTime NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8
PARTITION BY LIST(`gwcode`) (
  PARTITION P0 VALUES IN (46,77,89),
  PARTITION P1 VALUES IN (106,125,177),
  PARTITION P2 VALUES IN (205,219,289),
  PARTITION P3 VALUES IN (302,317,458,509,610)
);</code>

Only rows whose gwcode matches the listed values are placed in the corresponding partition.

3.2.4 KEY

<code>DROP TABLE IF EXISTS `t_userinfo`;
CREATE TABLE `t_userinfo` (
  `id` int(10) unsigned NOT NULL,
  `personcode` varchar(20) DEFAULT NULL,
  `personname` varchar(100) DEFAULT NULL,
  `depcode` varchar(100) DEFAULT NULL,
  `depname` varchar(500) DEFAULT NULL,
  `gwcode` int(11) DEFAULT NULL,
  `gwname` varchar(200) DEFAULT NULL,
  `gravalue` varchar(20) DEFAULT NULL,
  `createtime` DateTime NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8
PARTITION BY KEY(gwcode) PARTITIONS 10;</code>

MySQL’s internal hash function distributes rows based on gwcode . This method is less commonly used due to its nondeterministic nature.

3.2.5 Composite

Combines two or more partitioning methods, e.g., RANGE followed by HASH.

4 Problems of Sharding

4.1 Transaction Issues

Data spread across multiple databases makes distributed transaction management difficult and costly.

4.2 Cross‑Database Join Limitations

Joining tables located in different shards is not supported, requiring multiple queries and additional aggregation logic.

4.3 Extra Data Management and Computation Overhead

Operations such as ranking top‑N users now require querying each shard separately and merging results, increasing complexity and resource consumption.

MySQLdatabase shardinghorizontal partitioningvertical partitioningpartitioning
Architecture & Thinking
Written by

Architecture & Thinking

🍭 Frontline tech director and chief architect at top-tier companies 🥝 Years of deep experience in internet, e‑commerce, social, and finance sectors 🌾 Committed to publishing high‑quality articles covering core technologies of leading internet firms, application architecture, and AI breakthroughs.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.