How ClickHouse Replicates MySQL in Real-Time: A Step-by-Step Guide
This article explains how to configure ClickHouse as a MySQL replica using the MaterializeMySQL engine, covering code acquisition, MySQL master setup, ClickHouse slave configuration, handling of delete and update operations, and the underlying replication mechanism with practical code examples.
Code acquisition
Since the feature is still in the acceptance stage, we pull the pull‑request code from GitHub.
<code>git fetch origin pull/10851/head:mysql_replica_experiment</code>
Then start the compilation…
MySQL Master
We need a MySQL instance with binlog enabled as the master:
<code>docker run -d -e MYSQL_ROOT_PASSWORD=123 mysql:5.7 mysqld --datadir=/var/lib/mysql --server-id=1 --log-bin=/var/lib/mysql/mysql-bin.log --gtid-mode=ON --enforce-gtid-consistency</code>
Create a database and table, then insert data:
<code>mysql> create database ckdb;
mysql> use ckdb;
mysql> create table t1(a int not null primary key, b int);
mysql> insert into t1 values(1,1),(2,2);
mysql> select * from t1;</code>
ClickHouse Slave
Replication is performed per database; different databases can come from different MySQL masters, enabling multi‑source synchronization for OLAP analysis.
Create a replication channel:
<code>clickhouse :) CREATE DATABASE ckdb ENGINE = MaterializeMySQL('172.17.0.2:3306', 'ckdb', 'root', '123');
clickhouse :) use ckdb;
clickhouse :) show tables;
clickhouse :) select * from t1;</code>
Check the ClickHouse sync position:
<code>cat ckdatas/metadata/ckdb/.metadata
Version:1
Binlog File:mysql-bin.000001
Binlog Position:913
Data Version:0</code>
Delete
Delete a row on the MySQL master and verify the change on the ClickHouse slave.
<code>mysql> delete from t1 where a=1;
clickhouse :) select * from t1;</code>
The metadata shows that the Data Version has increased to 2:
<code>cat ckdatas/metadata/ckdb/.metadata
Version:1
Binlog File:mysql-bin.000001
Binlog Position:1171
Data Version:2</code>
Update
Update rows on the MySQL master and check the result on the ClickHouse slave.
<code>mysql> update t1 set b=b+1;
clickhouse :) select * from t1;</code>
Implementation mechanism
MySQL binlog events include four main types:
MYSQL_QUERY_EVENT – DDL
MYSQL_WRITE_ROWS_EVENT – insert
MYSQL_UPDATE_ROWS_EVENT – update
MYSQL_DELETE_ROWS_EVENT – delete
When a transaction commits, MySQL writes the corresponding events to the binlog. Consuming these events requires implementing the MySQL Replication Protocol.
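The event-to-action mapping described above can be sketched in a few lines of Python. This is a simplified in-memory model for illustration only, not ClickHouse's actual C++ consumer; the event names follow the list above, while the handler logic and data shapes are hypothetical:

```python
# Simplified model of a binlog consumer: each event type maps to an
# action on the replica. Event names come from the article's list;
# the handler logic is a hypothetical sketch.

def apply_event(event, replica):
    etype, payload = event
    if etype == "MYSQL_QUERY_EVENT":          # DDL: record/execute the statement
        replica["ddl"].append(payload)
    elif etype == "MYSQL_WRITE_ROWS_EVENT":   # insert: append the new rows
        replica["rows"].extend(payload)
    elif etype == "MYSQL_UPDATE_ROWS_EVENT":  # update: (before, after) row pairs
        for before, after in payload:
            replica["rows"].remove(before)
            replica["rows"].append(after)
    elif etype == "MYSQL_DELETE_ROWS_EVENT":  # delete: drop the matching rows
        for row in payload:
            replica["rows"].remove(row)
    else:
        raise ValueError(f"unknown event type: {etype}")

# Replay the article's scenario: create t1, insert two rows,
# delete a=1, then update b=b+1.
replica = {"ddl": [], "rows": []}
events = [
    ("MYSQL_QUERY_EVENT", "create table t1(a int not null primary key, b int)"),
    ("MYSQL_WRITE_ROWS_EVENT", [(1, 1), (2, 2)]),
    ("MYSQL_DELETE_ROWS_EVENT", [(1, 1)]),
    ("MYSQL_UPDATE_ROWS_EVENT", [((2, 2), (2, 3))]),
]
for ev in events:
    apply_event(ev, replica)
print(replica["rows"])  # [(2, 3)]
```

In the real implementation the events arrive over the wire via the MySQL Replication Protocol rather than from an in-memory list, and (as the next sections show) ClickHouse does not mutate rows in place but appends versioned rows instead.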
Three main challenges for ClickHouse consuming MySQL binlog are:
DDL compatibility
Delete/Update support
Query filtering
DDL
DDL compatibility requires a substantial amount of conversion code. An example of the table conversion:
MySQL master:
<code>show create table t1\G
CREATE TABLE `t1` (
`a` int(11) NOT NULL,
`b` int(11) DEFAULT NULL,
PRIMARY KEY (`a`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1</code>
ClickHouse slave:
<code>ATTACH TABLE t1 (
`a` Int32,
`b` Nullable(Int32),
`_sign` Int8,
`_version` UInt64
) ENGINE = ReplacingMergeTree(_version)
PARTITION BY intDiv(a, 4294967)
ORDER BY tuple(a)
SETTINGS index_granularity = 8192;</code>
The ClickHouse table adds two hidden columns: _sign (-1 for delete, 1 for insert) and _version (the data version). The engine is ReplacingMergeTree, using _version as the version column.
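The column conversion shown above can be sketched as a small type-mapping function. This is a hypothetical illustration covering only the types that appear in the example, not ClickHouse's actual DDL rewriter:

```python
# Hypothetical sketch of the MySQL -> ClickHouse column conversion:
# map the base type, wrap nullable columns in Nullable(), and append
# the two hidden columns the MaterializeMySQL engine adds.
MYSQL_TO_CLICKHOUSE = {
    "int": "Int32",
    "bigint": "Int64",
    "varchar": "String",
}

def convert_column(name, mysql_type, nullable):
    ch_type = MYSQL_TO_CLICKHOUSE[mysql_type]
    if nullable:
        ch_type = f"Nullable({ch_type})"
    return f"`{name}` {ch_type}"

# Columns of the MySQL table t1, plus the hidden columns.
columns = [
    convert_column("a", "int", nullable=False),  # PRIMARY KEY, NOT NULL
    convert_column("b", "int", nullable=True),   # DEFAULT NULL
    "`_sign` Int8",
    "`_version` UInt64",
]
print(",\n".join(columns))
```

Running this prints exactly the column list of the ATTACH TABLE statement above: `a` Int32, `b` Nullable(Int32), `_sign` Int8, `_version` UInt64.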
Update and Delete
Delete and update operations are represented in ClickHouse with the hidden columns:
<code>-- part1: initial insert
┌─a─┬─b─┬─_sign─┬─_version─┐
│ 1 │ 1 │ 1 │ 1 │
│ 2 │ 2 │ 1 │ 1 │
-- part2: delete a=1
┌─a─┬─b─┬─_sign─┬─_version─┐
│ 1 │ 1 │ -1 │ 2 │
-- part3: update b=b+1
┌─a─┬─b─┬─_sign─┬─_version─┐
│ 2 │ 3 │ 1 │ 3 │</code>
A FINAL query removes rows where _sign = -1:
<code>select a,b,_sign,_version from t1 final;</code>
Query
The MaterializeMySQL engine filters out rows with _sign = -1, so deleted records are not returned in query results.
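The ReplacingMergeTree mechanics described above can be modeled in a few lines of Python. A sketch under the assumption that FINAL keeps the row with the highest _version for each primary key, after which rows marked _sign = -1 are filtered out:

```python
# Rows are (a, b, _sign, _version), taken from the parts shown earlier.
parts = [
    (1, 1, 1, 1), (2, 2, 1, 1),   # part1: initial insert
    (1, 1, -1, 2),                # part2: delete a=1
    (2, 3, 1, 3),                 # part3: update b=b+1
]

def query_final(rows):
    # ReplacingMergeTree + FINAL: keep the highest-_version row per key...
    latest = {}
    for a, b, sign, version in rows:
        if a not in latest or version > latest[a][3]:
            latest[a] = (a, b, sign, version)
    # ...then the query layer drops rows marked as deleted.
    return [row for row in latest.values() if row[2] != -1]

print(query_final(parts))  # [(2, 3, 1, 3)]
```

The surviving row matches the final state on the MySQL master: a=2, b=3.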
Summary
Real‑time MySQL replication in ClickHouse was a 2020 upstream roadmap item that required deep knowledge of both the MySQL replication protocol and ClickHouse internals. Pull request 10851 implements an efficient binlog consumer inside ClickHouse, supports database‑level multi‑source replication, and plans to add a CRC function for data‑consistency verification.
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.