
Quick Guide to Deploying Alibaba Canal for Real‑Time MySQL Binlog Synchronization with Kafka and Zookeeper

This article provides a step‑by‑step tutorial on building a small‑scale data platform by installing MySQL, Zookeeper, Kafka and the open‑source Canal middleware, configuring Canal to capture MySQL binlog events, and forwarding the structured data to Kafka for downstream processing.


Premise

The author’s business system architecture is complete, but the data layer is weak; the goal is to build a lightweight data platform that can near‑real‑time sync MySQL changes (insert, update, delete) to another data source, clean the data, and create a model for analytics and tagging. After evaluating options, the author chose Alibaba’s open‑source Canal middleware.

About Canal

Introduction

Canal (pronounced “kə'næl”) is a tool for parsing MySQL binary logs and providing incremental data subscription and consumption. It originally simulated a MySQL slave to fetch changes via binlog, and later evolved into a generic binlog subscription framework supporting many downstream systems.

Working Principle

Canal mimics a MySQL slave: it sends a dump request to the master, receives the master’s binary log stream, parses the byte‑stream into events, and forwards them to configured connectors such as TCP, Kafka or RocketMQ.

Versions and Components

At the time of writing (2020‑03‑05) the latest stable release was v1.1.4. The stable version includes three core components:

canal‑admin – web UI for managing Canal instances.

canal‑adapter – adapters for persisting data to MySQL, HBase, Elasticsearch, etc.

canal‑deployer – the core service that parses binlog and sends messages to connectors.

Usually only canal‑deployer is required; the other two are optional.

Required Middleware for Deployment

Install MySQL

Using the official yum repository on CentOS 7:

cd /data/mysql
wget https://dev.mysql.com/get/mysql80-community-release-el7-3.noarch.rpm
sudo rpm -Uvh mysql80-community-release-el7-3.noarch.rpm
sudo yum install mysql-community-server
service mysqld start
# Find the temporary root password generated at first start
grep 'temporary password' /var/log/mysqld.log

Log in with mysql -uroot -p using the temporary password, then run the following SQL to change the root password, allow remote access, create a test database, and create a dedicated user for Canal:

-- Change root password and allow remote access
ALTER USER 'root'@'localhost' IDENTIFIED BY 'QWqw12!@';
UPDATE mysql.user SET host = '%' WHERE user = 'root';
FLUSH PRIVILEGES;
GRANT ALL PRIVILEGES ON *.* TO 'root'@'%';
ALTER USER 'root'@'%' IDENTIFIED WITH mysql_native_password BY 'QWqw12!@';
-- Create a test database
CREATE DATABASE `test` CHARSET `utf8mb4` COLLATE `utf8mb4_unicode_ci`;
-- Create a Canal user with replication privileges
CREATE USER canal IDENTIFIED BY 'QWqw12!@';
GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'canal'@'%';
FLUSH PRIVILEGES;
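Canal can only parse row-based binlog. MySQL 8 enables binary logging in ROW format by default, but it is worth confirming before wiring up Canal; in the mysql client:

```sql
-- Confirm binary logging is on and row-based (required by Canal)
SHOW VARIABLES LIKE 'log_bin';        -- expect ON
SHOW VARIABLES LIKE 'binlog_format';  -- expect ROW
SHOW VARIABLES LIKE 'server_id';      -- must be non-zero
```

If binlog_format is not ROW, set binlog_format=ROW in /etc/my.cnf and restart mysqld before continuing.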

Install Zookeeper

Download and extract version 3.6.0, then configure dataDir in zoo.cfg and start:

mkdir -p /data/zk/data
cd /data/zk
wget http://mirror.bit.edu.cn/apache/zookeeper/zookeeper-3.6.0/apache-zookeeper-3.6.0-bin.tar.gz
tar -zxvf apache-zookeeper-3.6.0-bin.tar.gz
cd apache-zookeeper-3.6.0-bin/conf
cp zoo_sample.cfg zoo.cfg
# edit zoo.cfg: set dataDir=/data/zk/data
sh /data/zk/apache-zookeeper-3.6.0-bin/bin/zkServer.sh start
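For a single node, the only change needed beyond the shipped sample is the data directory; the resulting zoo.cfg looks roughly like this (tickTime and clientPort below are the zoo_sample.cfg defaults, left as-is):

```properties
# zoo.cfg — minimal single-node settings
tickTime=2000
dataDir=/data/zk/data
clientPort=2181
```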

Install Kafka

Kafka 2.4.0 (Scala 2.13) is used. After extraction, adjust log.dirs to /data/kafka/data and start the broker:

mkdir -p /data/kafka/data
wget https://mirrors.tuna.tsinghua.edu.cn/apache/kafka/2.4.0/kafka_2.13-2.4.0.tgz
tar -zxvf kafka_2.13-2.4.0.tgz
# edit config/server.properties: set log.dirs=/data/kafka/data
sh /data/kafka/kafka_2.13-2.4.0/bin/kafka-server-start.sh -daemon /data/kafka/kafka_2.13-2.4.0/config/server.properties
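The corresponding edit to config/server.properties is equally small; a sketch of the relevant entries for this single-broker setup (broker.id and zookeeper.connect below are Kafka's shipped defaults, shown for context):

```properties
# config/server.properties — minimal single-broker settings
broker.id=0
log.dirs=/data/kafka/data
zookeeper.connect=localhost:2181
```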

Install and Configure Canal

Download the stable v1.1.4 deployer package and unpack:

mkdir /data/canal && cd /data/canal
wget https://github.com/alibaba/canal/releases/download/canal-1.1.4/canal.deployer-1.1.4.tar.gz
tar -zxvf canal.deployer-1.1.4.tar.gz

Key configuration files:

canal.properties – set canal.serverMode=kafka and canal.mq.servers=127.0.0.1:9092.

instance.properties – configure the MySQL connection, slave ID, subscribed database, and the target Kafka topic/partition (e.g., canal.mq.topic=test, canal.mq.partition=0).
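As a concrete sketch, the relevant entries for this single-node setup might look like the fragment below. The key names come from the sample configs shipped with canal-deployer 1.1.4; the password reuses the one chosen earlier, and the filter regex restricting capture to the test database is an assumption of this sketch (the shipped default captures all schemas):

```properties
# conf/canal.properties — send parsed binlog to Kafka
canal.serverMode = kafka
canal.mq.servers = 127.0.0.1:9092

# conf/example/instance.properties — source MySQL and target topic
canal.instance.master.address = 127.0.0.1:3306
canal.instance.dbUsername = canal
canal.instance.dbPassword = QWqw12!@
canal.instance.filter.regex = test\\..*
canal.mq.topic = test
canal.mq.partition = 0
```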

Start the service:

sh /data/canal/bin/startup.sh
# tail logs
tail -f /data/canal/logs/canal/canal.log
tail -f /data/canal/logs/example/example.log

Verification

Create a table in the test database and perform DML operations:

use `test`;
CREATE TABLE `order` (
  id BIGINT PRIMARY KEY AUTO_INCREMENT COMMENT 'primary key',
  order_id VARCHAR(64) NOT NULL COMMENT 'order ID',
  amount DECIMAL(10,2) NOT NULL DEFAULT 0 COMMENT 'order amount',
  create_time DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'creation time',
  UNIQUE KEY uniq_order_id (`order_id`)
) COMMENT 'order table';
INSERT INTO `order`(order_id, amount) VALUES ('10086', 999);
UPDATE `order` SET amount = 10087 WHERE order_id = '10086';
DELETE FROM `order` WHERE order_id = '10086';

Consume the generated events from Kafka:

sh /data/kafka/kafka_2.13-2.4.0/bin/kafka-console-consumer.sh --bootstrap-server 127.0.0.1:9092 --from-beginning --topic test

The console shows JSON messages representing the CREATE, INSERT, UPDATE and DELETE events, confirming that Canal successfully captures MySQL binlog changes and forwards them to Kafka.
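Downstream consumers typically deserialize these messages before applying them. The field names below (`type`, `database`, `table`, `data`, `old`, `isDdl`) follow Canal's flat-message JSON format as sent to Kafka, but the sample values are illustrative rather than captured output; a minimal Python sketch of dispatching on the event type:

```python
import json

# Illustrative Canal flat message for the UPDATE above (values are
# examples; field names follow Canal's flat-message format).
raw = json.dumps({
    "database": "test",
    "table": "order",
    "type": "UPDATE",
    "isDdl": False,
    "data": [{"id": "1", "order_id": "10086", "amount": "10087.0"}],
    "old": [{"amount": "999.0"}],
})

def handle_event(message: str) -> str:
    """Summarize a Canal flat message by its DDL/DML type."""
    event = json.loads(message)
    if event.get("isDdl"):
        return f"DDL on {event['database']}.{event['table']}"
    rows = event.get("data") or []
    return f"{event['type']} of {len(rows)} row(s) in {event['database']}.{event['table']}"

print(handle_event(raw))  # prints "UPDATE of 1 row(s) in test.order"
```

In a real pipeline the same function would be called for each record polled from the `test` topic; the `old` field carries the pre-image columns that changed, which is what makes row-level diffing possible downstream.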

Conclusion

The guide demonstrates that deploying Canal is straightforward once the supporting middleware (MySQL, Zookeeper, Kafka) is in place; only a few configuration items need adjustment, resulting in low operational and learning costs. Future work will cover ELT processing of the structured binlog events and high‑availability clustering for production use.

Tags: data pipeline, Zookeeper, Kafka, MySQL, Canal, real-time sync
Written by

Architecture Digest

Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.
