Databases 8 min read

Understanding ClickHouse MergeTree Partitioning and Merge Rules

This article explains how ClickHouse's MergeTree engine creates partition directories based on a partition key, details the naming convention PartitionID_MinBlockNum_MaxBlockNum_Level, and describes the automatic and manual merge processes that consolidate partitions for efficient storage.

Aikesheng Open Source Community
Aikesheng Open Source Community
Aikesheng Open Source Community
Understanding ClickHouse MergeTree Partitioning and Merge Rules

In a previous article the basic features, architecture, and use cases of ClickHouse were introduced; this note dives deeper into the MergeTree engine’s partitioning concept and the rules governing partition merges.

MergeTree Partition Rules

A table is created with a monthly partition key:

CREATE TABLE tab_partition(`dt` Date, `v` UInt8)
ENGINE = MergeTree PARTITION BY toYYYYMM(dt) ORDER BY v;

Sample data is inserted for several months, which triggers the creation of partition directories on each insert.

After data is written, MergeTree generates partition directories such as 202002_1_1_0 . No directories are created when only the table schema exists because there is no data.

The directory naming follows the pattern PartitionID_MinBlockNum_MaxBlockNum_Level . For example, 202002_4_4_0 where 202002 is the PartitionID, 4_4 are the minimum and maximum block numbers, and 0 indicates the current merge level.

PartitionID is derived from the partition key value. Four rules apply:

If no partition key is specified, the ID defaults to all .

For integer keys that cannot be formatted as YYYYMMDD , the integer’s string representation is used.

For date‑type keys (or integers convertible to YYYYMMDD ), the formatted date string becomes the ID.

For other types (e.g., Float, String), a 128‑bit hash of the inserted value is used as the ID.

MinBlockNum and MaxBlockNum are auto‑incrementing integers that start at 1 for each table. When a new partition directory is created, both numbers are equal; after merges, MinBlockNum becomes the smallest value among the merged parts and MaxBlockNum the largest.

Level records how many times a partition has been merged; it starts at 0 and increments by one each time a merge occurs for that partition.

MergeTree Partition Merge Rules

As data accumulates, many partition directories can appear. MergeTree, built on an LSM‑style engine, periodically merges directories belonging to the same partition (default every 10‑15 minutes) or when the OPTIMIZE TABLE statement is executed. Old parts become inactive ( active = 0 ) and are later removed by a background task (default 8 minutes).

Example directory listing before merging:

# ls
202002_1_1_0  202004_2_2_0  202009_3_3_0
202002_4_4_0  202002_5_5_0

Manual merge is triggered with:

OPTIMIZE TABLE tab_partition;

After optimization, querying system.parts shows which parts are active:

SELECT partition, name, part_type, active FROM system.parts WHERE table='tab_partition';
┌─partition─┬─name─────────┬─part_type─┬─active─┐
│ 202002    │ 202002_1_1_0 │ Wide      │      0 │
│ 202002    │ 202002_1_5_1 │ Wide      │      1 │
│ 202002    │ 202002_4_4_0 │ Wide      │      0 │
│ 202002    │ 202002_5_5_0 │ Wide      │      0 │
│ 202004    │ 202004_2_2_0 │ Wide      │      1 │
│ 202009    │ 202009_3_3_0 │ Wide      │      1 │
└───────────┴──────────────┴───────────┴────────┘

Active parts (active = 1) are the latest merged results; inactive parts are automatically ignored during queries.

The naming rule for a newly merged part is:

PartitionID remains unchanged.

MinBlockNum = smallest MinBlockNum among merged parts.

MaxBlockNum = largest MaxBlockNum among merged parts.

Level = maximum existing level plus one.

The resulting directory structure is illustrated below:

Conclusion

This note serves as a learning record for the book “ClickHouse Principles and Practical Applications”, demonstrating how MergeTree creates partition directories, the naming convention, and the basic merge process; readers are encouraged to try the steps themselves.

optimizationSQLDatabaseClickHouseData PartitioningMergeTree
Aikesheng Open Source Community
Written by

Aikesheng Open Source Community

The Aikesheng Open Source Community provides stable, enterprise‑grade MySQL open‑source tools and services, releases a premium open‑source component each year (1024), and continuously operates and maintains them.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.