Databases 13 min read

Using Hash Partitioning in MySQL: Scenarios, Queries, and Optimizations

This article explains MySQL hash partition tables, illustrating their suitable use cases such as equality queries on IDs or dates, demonstrating creation, data loading, and query plans, and comparing standard hash partitions with linear hash partitions to highlight performance and data distribution considerations.

Aikesheng Open Source Community

May 11, 2022

Using Hash Partitioning in MySQL: Scenarios, Queries, and Optimizations

When discussing partitioned tables, range partitioning is most common, while hash partitioning has limited, non‑standard scenarios. The following examples demonstrate MySQL hash partition tables, their use cases, and related adjustments.

For a hash partitioned table, the simplest method is to hash a single column. The example table hash_t1 contains 5 million rows and is partitioned by HASH on the auto‑increment id column into 1024 partitions:

mysql:ytt_new> show create table hash_t1\G
*************************** 1. row ***************************
       Table: hash_t1
Create Table: CREATE TABLE `hash_t1` (
  `id` bigint unsigned NOT NULL AUTO_INCREMENT,
  `r1` int DEFAULT NULL,
  `log_date` date DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=5000001 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
/*!50100 PARTITION BY HASH (`id`)
PARTITIONS 1024 */
1 row in set (0.00 sec)

mysql:ytt_new> select count(*) from hash_t1;
+----------+
| count(*) |
+----------+
|  5000000 |
+----------+
1 row in set (2.43 sec)

The partitioning method of hash_t1 is straightforward: the id value is taken modulo the number of partitions, resulting in a very even data distribution.

mysql:ytt_new> select max(table_rows),min(table_rows) from information_schema.partitions where table_name = 'hash_t1';
+-----------------+-----------------+
| max(table_rows) | min(table_rows) |
+-----------------+-----------------+
|            4883 |            4882 |
+-----------------+-----------------+
1 row in set (0.04 sec)

Consider the following SQL statements:

SQL 1: select count(*) from hash_t1 where id = 1;
SQL 2: select count(*) from hash_t1 where id in (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15);
SQL 3: select count(*) from hash_t1 where id <= 1;
SQL 4: select count(*) from hash_t1 where id <= 15;

SQL 1 and SQL 2 are well‑suited for hash‑partitioned tables, while SQL 3 and SQL 4 are not.

Execution plan for SQL 1 shows an optimal scenario: the query reaches a single partition (p8) with a constant filter.

mysql:ytt_new> explain select count(*) from hash_t1 where id = 8\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: hash_t1
   partitions: p8
         type: const
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 8
          ref: const
         rows: 1
     filtered: 100.00
        Extra: Using index
1 row in set, 1 warning (0.00 sec)

SQL 2 filters on 15 constant IDs, touching 15 partitions—a small proportion of the total, but still has room for improvement.

mysql:ytt_new> explain select count(*) from hash_t1 where id  in (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15)\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: hash_t1
   partitions: p1,p2,p3,p4,p5,p6,p7,p8,p9,p10,p11,p12,p13,p14,p15
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 8
          ref: NULL
         rows: 15
     filtered: 100.00
        Extra: Using where; Using index
1 row in set, 1 warning (0.00 sec)

SQL 3 and SQL 4 produce the same result as SQL 1 and SQL 2 but must scan all partitions, which is inefficient.

mysql:ytt_new> explain select count(*) from hash_t1 where  id <=1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: hash_t1
   partitions: p0,p1,p2,…,p1021,p1022,p1023
...

Therefore, hash partitioning is limited to equality‑based filters, similar to using a hash index on a regular table.

To reduce the number of partitions scanned by SQL 2, we can redesign the table by hashing on id DIV 1024, creating hash_t2 so that the first 1024 IDs are stored in a single partition (p0).

mysql:ytt_new> create table hash_t2 (id bigint unsigned auto_increment primary key, r1 int, log_date date) partition by hash(id div 1024) partitions 1024;
Query OK, 0 rows affected (10.54 sec)

mysql:ytt_new> load data infile '/var/lib/mysql-files/hash_sample.csv' into table hash_t2;
Query OK, 5000000 rows affected (3 min 20.11 sec)
Records: 5000000  Deleted: 0  Skipped: 0  Warnings: 0

Now SQL 2 can be satisfied by a single partition (p0):

mysql:ytt_new> explain select count(*) from hash_t2 where id in (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15)\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: hash_t2
   partitions: p0
         type: range
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 8
          ref: NULL
         rows: 15
     filtered: 100.00
        Extra: Using where; Using index
1 row in set, 1 warning (0.00 sec)

Hash partitioning also works well for equality queries on date columns. For example, creating hash_t3 partitioned by YEAR(log_date):

mysql:ytt_new> create table hash_t3 (id bigint unsigned , r1 int,log_date date, key idx_log_date(log_date));
Query OK, 0 rows affected (0.04 sec)

mysql:ytt_new> alter table hash_t3 partition by hash(year(log_date)) partitions 11;
Query OK, 0 rows affected (0.32 sec)

mysql:ytt_new> load data infile '/var/lib/mysql-files/hash_sample.csv' into table hash_t3;
Query OK, 5000000 rows affected (2 min 4.59 sec)
Records: 5000000  Deleted: 0  Skipped: 0  Warnings: 0

Execution plan for a date‑based equality query shows it is limited to a single partition (p7):

mysql:ytt_new> explain  select count(*) from hash_t3 where log_date = '2020-08-05'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: hash_t3
   partitions: p7
         type: r
possible_keys: idx_log_date
          key: idx_log_date
      key_len: 4
          ref: const
         rows: 1405
     filtered: 100.00
        Extra: Using index
1 row in set, 1 warning (0.00 sec)

Using YEAR may become sub‑optimal as data grows; partitioning by MONTH can provide finer granularity.

MySQL also offers a special KEY partition that uses the system function PASSWORD as the hash function, suitable for non‑integer primary keys.

If partitions need to be frequently expanded or shrunk, linear hash partitioning can be considered. Linear hash is MySQL’s implementation of consistent hashing, aiming to improve scaling performance, but it may cause uneven data distribution, hotspots, and query‑level record amplification.

Example comparison: reducing hash_t1 from 1024 to 10 partitions takes 2 min 46 sec, while a linear‑hash table hash_linear_t1 with the same reduction completes in 1 min 28 sec.

mysql:ytt_new> alter table hash_t1 coalesce partition 1014;
Query OK, 0 rows affected (2 min 46.01 sec)

mysql:ytt_new> create table hash_linear_t1 (id bigint unsigned auto_increment primary key, r1 int,log_date date) partition by linear hash(id) partitions 1024;
Query OK, 0 rows affected (34.13 sec)

mysql:ytt_new> load data infile '/var/lib/mysql-files/hash_sample.csv' into table hash_linear_t1 ;
Query OK, 5000000 rows affected (2 min 7.78 sec)

mysql:ytt_new> alter table hash_linear_t1 coalesce partition 1014;
Query OK, 0 rows affected (1 min 28.29 sec)

Data distribution inspection shows the linear‑hash table has significant hotspots compared with the regular hash table.

mysql:ytt_new> select table_rows from information_schema.partitions where table_name = 'hash_t1';
+------------+
| TABLE_ROWS |
+------------+
|     485723 |
|     537704 |
|     523017 |
|     470724 |
|     478982 |
|     512272 |
|     483190 |
|     455829 |
|     520512 |
|     461572 |
+------------+

mysql:ytt_new> select table_rows from information_schema.partitions where table_name = 'hash_linear_t1 ';
+------------+
| TABLE_ROWS |
+------------+
|     269443 |
|     340989 |
|     611739 |
|     584321 |
|     566181 |
|     624040 |
|     637801 |
|     688467 |
|     331397 |
|     317695 |
+------------+

In summary, MySQL hash partition tables are useful for equality‑based queries on IDs or dates, but they cannot be used for range queries. Careful design—such as using id DIV N or linear hash—can improve query performance and partition management.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

MySQL Database Optimization hash partition

Written by

Aikesheng Open Source Community

The Aikesheng Open Source Community provides stable, enterprise‑grade MySQL open‑source tools and services, releases a premium open‑source component each year (1024), and continuously operates and maintains them.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.