Introducing DTLE: An Open‑Source MySQL Data Transfer Middleware for CDC, Replication, and Cloud Synchronization
The article presents DTLE, an open‑source MySQL data‑transfer middleware that extends replication capabilities with high‑performance CDC, multi‑topology support, cloud‑to‑cloud synchronization, and robust cluster management, while comparing it with other open‑source solutions and showcasing real‑world demos.
Overview
This talk, originally delivered by Hong Bin at the 3306π technical meetup in Wuhan, introduces DTLE (Data‑Transformation‑le), an open‑source CDC tool released on Programmer's Day (Oct 24) that aims to address the limitations of MySQL replication for heterogeneous data‑store environments.
MySQL Replication Recap
MySQL replication works by streaming binlog events from a primary instance to a replica, where the replica's I/O thread writes events to a relay log and the SQL thread replays them. While widely used for high availability and read‑write splitting, it offers only coarse filtering (database/table level), incurs high storage overhead, supports a limited set of topologies, and is designed primarily for HA rather than for complex data‑migration scenarios.
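The two replica threads described above can be modeled as a small pipeline. This is a conceptual sketch of the mechanism, not MySQL or DTLE code: one function plays the I/O thread copying binlog events into a relay log, the other plays the SQL thread replaying them in order.

```python
from collections import deque

def io_thread(binlog_stream, relay_log):
    """Mimics the replica's I/O thread: copy binlog events into the relay log."""
    for event in binlog_stream:
        relay_log.append(event)

def sql_thread(relay_log, applied):
    """Mimics the SQL thread: replay relay-log events in commit order."""
    while relay_log:
        applied.append(relay_log.popleft())

binlog = [("INSERT", "db1.t1"), ("UPDATE", "db1.t2"), ("DELETE", "db2.t3")]
relay, applied = deque(), []
io_thread(binlog, relay)
sql_thread(relay, applied)
print(applied)  # events arrive at the replica in binlog order
```

Note that in this model filtering can only happen per database or table, which is exactly the coarseness the article points out.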
DTLE Core Scenarios
DTLE targets several use cases that go beyond traditional replication: remote multi‑active deployments, data aggregation and distribution across databases, real‑time data subscription via Kafka, and online data migration with minimal downtime.
Design Principles
DTLE is built around two key principles: ease of use (simple deployment without external dependencies) and reliability (distributed architecture with automatic failover and metadata consistency).
Architecture
DTLE consists of two process roles: a Manager that stores metadata, receives and dispatches jobs, and monitors agents; and Agents that handle binlog extraction, filtering, compression, transmission, and replay. Jobs are defined in JSON and submitted via HTTP to the Manager, which assigns them to available agents.
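To make the job flow concrete, here is a hedged sketch of building such a JSON job definition. The field names and the Manager endpoint in the comment are illustrative assumptions, not DTLE's actual schema:

```python
import json

# Illustrative job definition; DTLE's real JSON schema may differ.
job = {
    "name": "orders-sync",
    "source": {"host": "10.0.0.1", "port": 3306},
    "target": {"host": "10.0.1.1", "port": 3306},
    "filter": {"databases": ["shop"], "tables": ["orders"]},
}
payload = json.dumps(job)

# Submission would then be an HTTP POST to the Manager, roughly:
#   requests.post("http://manager:8190/v1/jobs", data=payload)
# (URL and port are assumptions for illustration only.)
print(payload)
```

The Manager parses the job, records it in its metadata store, and dispatches it to an available agent.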
Cluster Mechanism
Multiple Manager nodes form a Raft‑based consensus group for metadata replication and leader election. Worker agents report health status, and the leader reassigns tasks if an agent becomes unresponsive.
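The leader's reassignment behavior can be sketched as follows. This is an assumed simplification of the logic, not DTLE source: agents report heartbeat timestamps, and jobs on any agent that misses the deadline are moved to a healthy one.

```python
HEARTBEAT_TIMEOUT = 10.0  # seconds; illustrative value, not DTLE's actual setting

def reassign(jobs, heartbeats, now):
    """Return a job -> agent map with unresponsive agents' jobs moved."""
    alive = sorted(a for a, t in heartbeats.items()
                   if now - t < HEARTBEAT_TIMEOUT)
    result = {}
    for job, agent in jobs.items():
        if agent not in alive:
            agent = alive[hash(job) % len(alive)]  # naive rebalancing pick
        result[job] = agent
    return result

jobs = {"job-a": "agent1", "job-b": "agent2"}
heartbeats = {"agent1": 100.0, "agent2": 91.0}
print(reassign(jobs, heartbeats, now=105.0))
# agent2 last reported 14 s ago, so job-b moves to agent1
```

Because the Managers replicate metadata over Raft, the new assignment survives a leader failure as well.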
Supported Topologies
DTLE supports various topologies, including simple 1‑to‑1 sync, n‑to‑1 aggregation, and 1‑to‑n distribution, as well as cross‑data‑center bidirectional sync with link compression to reduce bandwidth usage.
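Link compression pays off on cross‑data‑center links because row events are highly repetitive. A minimal illustration with zlib (DTLE's actual codec and settings are implementation details not stated in the article):

```python
import zlib

# Row-change events tend to share structure, so they compress well.
events = b"INSERT INTO shop.orders VALUES (42, 'pending', 19.99);" * 200
packed = zlib.compress(events, level=6)
ratio = len(packed) / len(events)
print(f"{len(events)} -> {len(packed)} bytes ({ratio:.1%})")
```

Spending CPU on compression at the sending agent trades cheaply against WAN bandwidth, which is usually the scarcer resource between data centers.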
Technology Stack
The system is implemented in Go and leverages open‑source components such as HashiCorp Nomad (cluster scheduling), Consul (distributed KV store), Serf (gossip‑based node health detection), and NATS (lightweight messaging).
Key Features
Clustered deployment with automatic failover
Binlog and SQL replay modes
Parallel replay using MySQL 5.7 logical timestamps
Incremental checkpointing
Full and incremental sync
Database, table, and row‑level filtering
Link compression and cross‑network support
Automatic table creation and DDL handling
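The parallel-replay feature above relies on MySQL 5.7's logical clock: each binlog transaction carries a `last_committed` and a `sequence_number`, and transactions sharing a `last_committed` value did not conflict on the primary, so they can be applied concurrently. A simplified sketch of that grouping (the real dependency rule is more permissive than grouping by equal `last_committed`):

```python
from itertools import groupby

# (last_committed, sequence_number) pairs as they appear in the binlog.
txns = [(0, 1), (1, 2), (1, 3), (1, 4), (4, 5), (4, 6)]

# Adjacent transactions with the same last_committed form one parallel batch.
batches = [list(group) for _, group in groupby(txns, key=lambda t: t[0])]
for batch in batches:
    # Each batch could be fanned out to parallel applier workers.
    print([seq for _, seq in batch])
```

Batches must still be applied in order relative to each other; only transactions inside a batch run in parallel.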
Limitations
Supports only MySQL 5.6/5.7 (InnoDB)
Requires GTID and specific binlog settings
Limited character set support
No trigger or custom authentication support
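A pre-flight check against these prerequisites might look like the sketch below. The variable names are real MySQL system variables; the required values (row-format binlog with GTID enabled) reflect typical CDC requirements and are an assumption about DTLE's exact checks:

```python
# Assumed prerequisites; DTLE may check more or different settings.
REQUIRED = {"gtid_mode": "ON", "log_bin": "ON", "binlog_format": "ROW"}

def check_prerequisites(variables):
    """Return the names of settings that do not match the requirements."""
    return [name for name, want in REQUIRED.items()
            if variables.get(name) != want]

# Values as they might come back from SHOW GLOBAL VARIABLES:
server = {"gtid_mode": "ON", "log_bin": "ON", "binlog_format": "STATEMENT"}
print(check_prerequisites(server))  # ['binlog_format']
```

Running such a check before submitting a job fails fast instead of surfacing the misconfiguration mid-sync.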
Comparison with Similar Tools
Compared with Debezium, StreamSets, and Otter, DTLE offers full‑load plus incremental sync, global metadata consistency without global locks, multi‑level data filtering, bidirectional GTID tracking, and both single‑node and clustered deployment options.
Demo and Cloud‑Sync Case
Demo scripts showcase one‑way sync, table‑level aggregation, data distribution, and cross‑IDC bidirectional replication. A cloud‑sync benchmark synchronizes ~1 billion rows between Alibaba Cloud RDS and JD Cloud RDS across regions, sustaining more than 1,000 rows/s after a roughly 5‑hour full load.
Resources
GitHub repositories for DTLE source code, demo scripts, and PPT slides are provided, along with an invitation to join the DTLE technical community for support.
Aikesheng Open Source Community
The Aikesheng Open Source Community provides stable, enterprise‑grade MySQL open‑source tools and services, releases a premium open‑source component each year on Programmer's Day (Oct 24, "1024"), and continuously operates and maintains them.