Databases 10 min read

How ClickHouse Replicates MySQL in Real-Time: A Step-by-Step Guide

This article explains how to configure ClickHouse as a MySQL replica using the MaterializeMySQL engine, covering code acquisition, MySQL master setup, ClickHouse slave configuration, handling of delete and update operations, and the underlying replication mechanism with practical code examples.

Efficient Ops
Efficient Ops
Efficient Ops
How ClickHouse Replicates MySQL in Real-Time: A Step-by-Step Guide

Code acquisition

Since the feature is still in the acceptance stage, we pull the pull‑request code from GitHub.

<code>git fetch origin pull/10851/head:mysql_replica_experiment</code>

Start compilation…

MySQL Master

We need a MySQL instance with binlog enabled as the master:

<code>docker run -d -e MYSQL_ROOT_PASSWORD=123 mysql:5.7 mysqld --datadir=/var/lib/mysql --server-id=1 --log-bin=/var/lib/mysql/mysql-bin.log --gtid-mode=ON --enforce-gtid-consistency</code>

Create a database and table, then insert data:

<code>mysql> create database ckdb;
mysql> use ckdb;
mysql> create table t1(a int not null primary key, b int);
mysql> insert into t1 values(1,1),(2,2);
mysql> select * from t1;</code>

ClickHouse Slave

Replication is performed per database; different databases can come from different MySQL masters, enabling multi‑source synchronization for OLAP analysis.

Create a replication channel:

<code>clickhouse :) CREATE DATABASE ckdb ENGINE = MaterializeMySQL('172.17.0.2:3306', 'ckdb', 'root', '123');
clickhouse :) use ckdb;
clickhouse :) show tables;
clickhouse :) select * from t1;</code>

Check ClickHouse sync position:

<code>cat ckdatas/metadata/ckdb/.metadata
Version:1
Binlog File:mysql-bin.000001
Binlog Position:913
Data Version:0</code>

Delete

Delete a row on the MySQL master and verify the change on the ClickHouse slave.

<code>mysql> delete from t1 where a=1;
clickhouse :) select * from t1;</code>

The metadata shows that the Data Version has increased to 2:

<code>cat ckdatas/metadata/ckdb/.metadata
Version:1
Binlog File:mysql-bin.000001
Binlog Position:1171
Data Version:2</code>

Update

Update rows on the MySQL master and check the result on the ClickHouse slave.

<code>mysql> update t1 set b=b+1;
clickhouse :) select * from t1;</code>

Implementation mechanism

MySQL binlog events include four main types:

MYSQL_QUERY_EVENT – DDL

MYSQL_WRITE_ROWS_EVENT – insert

MYSQL_UPDATE_ROWS_EVENT – update

MYSQL_DELETE_ROWS_EVENT – delete

When a transaction commits, MySQL writes the corresponding events to the binlog. Consuming these events requires implementing the MySQL Replication Protocol.

Three main challenges for ClickHouse consuming MySQL binlog are:

DDL compatibility

Delete/Update support

Query filtering

DDL

DDL compatibility demands substantial code. Example of table conversion:

MySQL master:

<code>show create table t1\G
CREATE TABLE `t1` (
  `a` int(11) NOT NULL,
  `b` int(11) DEFAULT NULL,
  PRIMARY KEY (`a`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1</code>

ClickHouse slave:

<code>ATTACH TABLE t1 (
    `a` Int32,
    `b` Nullable(Int32),
    `_sign` Int8,
    `_version` UInt64
) ENGINE = ReplacingMergeTree(_version)
PARTITION BY intDiv(a, 4294967)
ORDER BY tuple(a)
SETTINGS index_granularity = 8192;</code>

The ClickHouse table adds two hidden columns:

_sign

(‑1 for delete, 1 for insert) and

_version

(data version). The engine is

ReplacingMergeTree

using

_version

as the version column.

Update and Delete

Delete and update operations are represented in ClickHouse with the hidden columns:

<code>-- part1: initial insert
┌─a─┬─b─┬─_sign─┬─_version─┐
│ 1 │ 1 │     1 │        1 │
│ 2 │ 2 │     1 │        1 │
-- part2: delete a=1
┌─a─┬─b─┬─_sign─┬─_version─┐
│ 1 │ 1 │    -1 │        2 │
-- part3: update b=b+1
┌─a─┬─b─┬─_sign─┬─_version─┐
│ 2 │ 3 │     1 │        3 │</code>

A

FINAL

query removes rows where

_sign = -1

:

<code>select a,b,_sign,_version from t1 final;</code>

Query

The

MaterializeMySQL

engine filters out rows with

_sign = -1

, so deleted records are not returned in query results.

Summary

Real‑time MySQL replication in ClickHouse was a 2020 upstream roadmap item that required deep knowledge of both the MySQL replication protocol and ClickHouse internals. Pull request 10851 implements an efficient binlog consumer inside ClickHouse, supports database‑level multi‑source replication, and plans to add a CRC function for data‑consistency verification.

real-timeClickHouseMySQLReplicationDatabase SyncMaterializeMySQL
Efficient Ops
Written by

Efficient Ops

This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.