Debezium: Open‑Source Change Data Capture Platform – Overview, Architecture, Use Cases, and Installation Guide
This article introduces Debezium, an open‑source low‑latency change data capture platform that streams database row changes via Kafka, explains its architecture and common scenarios such as cache invalidation and CQRS, and provides step‑by‑step Docker commands to install ZooKeeper, Kafka, MySQL and the Debezium connector.
Introduction
Debezium is an open‑source, low‑latency data‑flow platform that provides Change Data Capture (CDC). By monitoring a database, applications receive an event for every committed row change; because only committed changes are captured, applications need not reason about in‑flight transactions or rollbacks. Debezium offers a unified model for all change events, abstracting away the complexities of individual DBMSs.
Debezium also persists change history in logs, allowing applications to stop and restart at any time while still receiving any events missed during downtime.
Monitoring databases and reacting to changes is complex; traditional triggers are limited to certain databases and usually only affect the same database. Various databases expose different APIs, lacking a standard approach, and implementing reliable, ordered change streams without impacting the source database is challenging.
Debezium provides modules that handle these tasks. Some modules are generic and work across multiple DBMSs, while others are specialized for specific systems, offering richer functionality and better use of native features.
Basic Information
Infrastructure
Debezium leverages Kafka and Kafka Connect for persistence, reliability, and fault tolerance. Each connector deployed in Kafka Connect monitors an upstream database, captures all changes, and writes them to one or more Kafka topics (typically one topic per table). Kafka replicates each topic and guarantees ordering within each topic partition. This design allows many clients to consume the same change events with minimal impact on the source database.
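By default, each captured table maps to a topic named after the connector's logical server name, the database, and the table. A minimal sketch of the naming convention, using the names that appear in the tutorial below:

```shell
# Debezium's default topic naming: <server name>.<database>.<table>
SERVER=dbserver1   # database.server.name from the connector config
DB=inventory
TABLE=customers
echo "${SERVER}.${DB}.${TABLE}"
# prints: dbserver1.inventory.customers
```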
For applications that do not require the full fault‑tolerance and scalability of Kafka, Debezium also offers an embedded connector engine that runs inside the application process, delivering change events directly without persisting them to Kafka.
Common Use Cases
Cache Invalidation
When source data changes, cache entries can be immediately invalidated. If the cache runs in a separate process (e.g., Redis, Memcached, Infinispan), simple invalidation logic can be placed in that process, simplifying the main application.
Simplifying Monoliths
Applications often perform dual‑writes after committing database changes (e.g., updating search indexes, sending notifications). Using CDC, these post‑commit actions can be handled by independent services, improving fault tolerance and scalability.
Shared Database
When multiple applications share a database, CDC allows each to monitor changes directly without a message bus, ensuring all services stay in sync.
Data Integration
Data stored in multiple systems can be synchronized using Debezium combined with simple event‑processing logic, providing an ETL‑like solution.
Command‑Query Responsibility Segregation (CQRS)
In CQRS architectures, write and read models differ. Debezium captures write‑side changes and streams them to update read‑side views, making reliable, ordered processing feasible.
Installation
Debezium requires three independent services: ZooKeeper, Kafka, and the Debezium connector service. The official recommendation is to use Docker; the examples below use MySQL as the source database.
Start ZooKeeper
$ docker run -it --rm --name zookeeper -p 2181:2181 -p 2888:2888 -p 3888:3888 quay.io/debezium/zookeeper:1.9
For Podman:
$ sudo podman pod create --name=dbz --publish "9092,3306,8083"
$ sudo podman run -it --rm --name zookeeper --pod dbz quay.io/debezium/zookeeper:1.9
Start Kafka
$ docker run -it --rm --name kafka -p 9092:9092 --link zookeeper:zookeeper quay.io/debezium/kafka:1.9
For Podman:
$ sudo podman run -it --rm --name kafka --pod dbz quay.io/debezium/kafka:1.9
Start MySQL
The container runs a pre‑configured MySQL server with an inventory database:
$ docker run -it --rm --name mysql -p 3306:3306 -e MYSQL_ROOT_PASSWORD=debezium -e MYSQL_USER=mysqluser -e MYSQL_PASSWORD=mysqlpw quay.io/debezium/example-mysql:1.9
For Podman:
$ sudo podman run -it --rm --name mysql --pod dbz -e MYSQL_ROOT_PASSWORD=debezium -e MYSQL_USER=mysqluser -e MYSQL_PASSWORD=mysqlpw quay.io/debezium/example-mysql:1.9
Start Kafka Connector
The connector service exposes a REST API for managing Debezium MySQL connectors:
$ docker run -it --rm --name connect -p 8083:8083 -e GROUP_ID=1 -e CONFIG_STORAGE_TOPIC=my_connect_configs -e OFFSET_STORAGE_TOPIC=my_connect_offsets -e STATUS_STORAGE_TOPIC=my_connect_statuses --link kafka:kafka --link mysql:mysql quay.io/debezium/connect:1.9
For Podman:
$ sudo podman run -it --rm --name connect --pod dbz -e GROUP_ID=1 -e CONFIG_STORAGE_TOPIC=my_connect_configs -e OFFSET_STORAGE_TOPIC=my_connect_offsets -e STATUS_STORAGE_TOPIC=my_connect_statuses quay.io/debezium/connect:1.9
Register MySQL Connector
Registering the Debezium MySQL connector starts monitoring of the MySQL binlog, emitting a change event for each committed row change. The connector configuration:
{
"name": "inventory-connector",
"config": {
"connector.class": "io.debezium.connector.mysql.MySqlConnector",
"tasks.max": "1",
"database.hostname": "mysql",
"database.port": "3306",
"database.user": "debezium",
"database.password": "dbz",
"database.server.id": "184054",
"database.server.name": "dbserver1",
"database.include.list": "inventory",
"database.history.kafka.bootstrap.servers": "kafka:9092",
"database.history.kafka.topic": "schema-changes.inventory"
}
}
Register via curl:
$ curl -i -X POST -H "Accept:application/json" -H "Content-Type:application/json" localhost:8083/connectors/ -d '{ "name": "inventory-connector", "config": { "connector.class": "io.debezium.connector.mysql.MySqlConnector", "tasks.max": "1", "database.hostname": "mysql", "database.port": "3306", "database.user": "debezium", "database.password": "dbz", "database.server.id": "184054", "database.server.name": "dbserver1", "database.include.list": "inventory", "database.history.kafka.bootstrap.servers": "kafka:9092", "database.history.kafka.topic": "schema-changes.inventory" } }'
Update Database and View Change Events
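Before changing any data, it is worth confirming the connector registered cleanly. Kafka Connect's standard REST endpoints list connectors and report task state; the connector name matches the registration JSON above:

```shell
# List registered connectors; the output should include "inventory-connector"
curl -s localhost:8083/connectors/

# Inspect connector and task state; a healthy connector reports "RUNNING"
curl -s localhost:8083/connectors/inventory-connector/status
```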
Use the watch-topic tool to observe the dbserver1.inventory.customers topic:
$ docker run -it --rm --name watcher --link zookeeper:zookeeper --link kafka:kafka quay.io/debezium/kafka:1.9 watch-topic -a -k dbserver1.inventory.customers
Execute a change in the MySQL client:
mysql> UPDATE customers SET first_name='Anne Marie' WHERE id=1004;
Verify the update:
mysql> SELECT * FROM customers;
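The mysql> statements above assume an open client session. One way to get a prompt is a throwaway client container linked to the running server; this is a sketch assuming the stock mysql:8.0 image and Docker's legacy --link environment variables:

```shell
# Start a disposable MySQL client linked to the server container started earlier;
# --link injects MYSQL_PORT_* / MYSQL_ENV_* variables that the command below uses
docker run -it --rm --name mysqlterm --link mysql:mysql mysql:8.0 \
  sh -c 'exec mysql -h"$MYSQL_PORT_3306_TCP_ADDR" -P"$MYSQL_PORT_3306_TCP_PORT" -uroot -p"$MYSQL_ENV_MYSQL_ROOT_PASSWORD"'
```

Once connected, select the sample database with use inventory; before running the statements above.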
Observe the change event in the watcher terminal. The event payload contains before and after structures, allowing you to see exactly what was modified.
Sample event payload:
{
"schema": { ... },
"payload": {
"before": { "id": 1004, "first_name": "Anne", "last_name": "Kretchmar", "email": "[email protected]" },
"after": { "id": 1004, "first_name": "Anne Marie", "last_name": "Kretchmar", "email": "[email protected]" },
"source": { "version": "1.9.5.Final", "name": "dbserver1", "server_id": 223344, "ts_sec": 1486501486, "file": "mysql-bin.000003", "pos": 364, "db": "inventory", "table": "customers" },
"op": "u",
"ts_ms": 1486501486308
}
}
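Downstream consumers typically compare before and after to decide what to do. A minimal sketch, assuming jq is installed, against a trimmed-down copy of the event above:

```shell
# Extract the operation type and the changed column from a captured event
EVENT='{"payload":{"before":{"first_name":"Anne"},"after":{"first_name":"Anne Marie"},"op":"u"}}'
echo "$EVENT" | jq -r '.payload | "\(.op): \(.before.first_name) -> \(.after.first_name)"'
# prints: u: Anne -> Anne Marie
```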