Practical Experience and Tips with TiDB Data Migration (DM) Tool
This article shares a comprehensive overview of TiDB Data Migration (DM), covering its architecture, configuration, online DDL support, common pitfalls such as duplicate‑key errors, large‑scale import tuning, version limits, and cleanup recommendations to help DBAs efficiently migrate MySQL/MariaDB workloads to TiDB.
Background – Early MySQL‑to‑TiDB sync used mydumper+loader for full backups and syncer for incremental binlog replication, which required many configuration files. PingCAP later released TiDB Data Migration (DM), a unified platform that simplifies full and incremental data migration from MySQL/MariaDB to TiDB.
Reason – The author has used DM since its internal testing version up to 1.0.6, finding it essential for DBAs because most TiDB deployments involve migrating existing MySQL data for performance testing and comparison.
Architecture – DM consists of three core components: DM‑master (task management), DM‑worker (execution), and dmctl (command‑line control). The architecture diagram (image) illustrates their interaction.
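For context on how these components are driven in practice, here is a sketch of a typical dmctl session (the master address and task name are illustrative; the interactive commands shown are part of DM 1.0):

```
# connect dmctl to the DM-master (address is an example)
./dmctl --master-addr 172.16.10.71:8261
» start-task task.yaml      # create a migration task from a task config file
» query-status my-task      # check full/incremental progress on each DM-worker
» pause-task my-task
» resume-task my-task
```

DM-master forwards each command to the relevant DM-workers, which do the actual dump/load/sync work against the upstream MySQL and downstream TiDB.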
Cluster Configuration
Cluster version: v3.0.5
Cluster configuration: standard SSD disks, 128 GB of memory, 40-core CPU
tidb21 TiDB/PD/pump/prometheus/grafana/CCS
... (list of nodes) ...
Official guidance recommends deploying DM on dedicated machines, with one DM-worker per node unless the hardware exceeds TiDB's recommended CPU/memory and provides multiple disks.
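In the DM 1.0 era this layout was usually described in a DM-Ansible inventory.ini. A minimal sketch, assuming hypothetical hosts and credentials (the group names follow DM-Ansible's conventions; every address, `server_id`, and password below is illustrative):

```ini
; inventory.ini for DM-Ansible (all values are examples)
[dm_master_servers]
dm_master ansible_host=172.16.10.71

[dm_worker_servers]
; one DM-worker per upstream MySQL source; source_id is referenced by task files
dm_worker1 ansible_host=172.16.10.72 server_id=101 source_id="mysql-replica-01" mysql_host=172.16.10.81 mysql_user=dm_user mysql_password='encrypted-password' mysql_port=3306

[prometheus_servers]
prometheus ansible_host=172.16.10.71
```

The `mysql_password` field expects a value encrypted with DM's dmctl `--encrypt` helper rather than plaintext.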
DM Features
Table routing and merge migration
Whitelist/blacklist for tables
Binlog event filtering
Shard support for merging tables
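The features above are all expressed in the task configuration file. A minimal sketch of a DM 1.0 task, assuming a hypothetical sharded schema `orders_*` being merged into one downstream table (all names, hosts, and rule labels are illustrative):

```yaml
# task.yaml -- illustrative DM 1.0 task configuration
name: test-task
task-mode: all              # full dump + incremental binlog sync

target-database:
  host: "172.16.10.83"
  port: 4000
  user: "root"
  password: ""

mysql-instances:
  - source-id: "mysql-replica-01"
    black-white-list: "global"       # table whitelist/blacklist
    route-rules: ["sharding-route"]  # table routing / merge
    filter-rules: ["ignore-truncate"] # binlog event filtering

black-white-list:
  global:
    do-dbs: ["orders_*"]

routes:
  sharding-route:
    schema-pattern: "orders_*"
    table-pattern: "t_*"
    target-schema: "orders"
    target-table: "t"

filters:
  ignore-truncate:
    schema-pattern: "orders_*"
    table-pattern: "t_*"
    events: ["truncate table"]
    action: Ignore
```

Each feature in the list maps to one section: routing to `routes`, white/blacklisting to `black-white-list`, and event filtering to `filters`.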
New Feature (v1.0.5) – Online DDL support, enabling tools such as pt-online-schema-change and gh-ost to add columns without downtime. The article walks through a concrete example in which a missing column caused a binlog replication error, resolved by adding the column and skipping the failing event with sql-skip.
skip event because not in whitelist
... (error logs) ...
After adding the column and skipping the problematic binlog position, the task resumes successfully.
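The skip-and-resume step can be sketched as a dmctl session like the following (the worker address, binlog position, and task name are illustrative, and the exact flag spellings should be checked against your DM version's `sql-skip --help`):

```
# inside dmctl, after adding the missing column downstream
» sql-skip --worker=172.16.10.72:8262 --binlog-pos=mysql-bin.000024:3670 my-task
» resume-task my-task
» query-status my-task   # confirm the worker is past the skipped event
```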
Pitfall (Error 1062) – Duplicate‑key errors can interrupt sync. The author describes troubleshooting steps: querying MVCC keys, checking PD TSO timestamps, and correlating worker logs to pinpoint the duplicate entry, ultimately fixing the issue by switching to REPLACE INTO statements.
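The diagnostic steps can be sketched as follows. TiDB exposes MVCC details over its status port (10080 by default), and `TIDB_PARSE_TSO` converts a PD TSO into wall-clock time for correlating with worker logs; the database, table, row handle, and TSO value below are all illustrative:

```
# Inspect MVCC versions of the conflicting row (handle 123 is an example)
curl http://tidb-host:10080/mvcc/key/mydb/mytable/123
```

```sql
-- Convert a PD TSO into a timestamp to match against DM-worker log entries
SELECT TIDB_PARSE_TSO(411766551271670345);
```

For the fix, DM's syncer unit has a `safe-mode: true` option that rewrites incoming `INSERT` statements as `REPLACE`, which is the usual way to get the REPLACE INTO behavior the author describes.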
Large‑Scale Import Tuning – During massive DM imports, cluster latency spikes. Adjusting TiKV parameters such as raftstore.apply-pool-size, store-pool-size, scheduler-worker-pool-size, grpc-concurrency, and RocksDB background jobs mitigates the impact.
raftstore:
apply-pool-size: 3-4
store-pool-size: 3-4
storage:
scheduler-worker-pool-size: 4-6
server:
grpc-concurrency: 4-6
rocksdb:
max-background-jobs: 8-10
max-sub-compactions: 1-2
Limitations
Supported MySQL versions: 5.5 < MySQL < 8.0; MariaDB ≥ 10.1.2
Only DDL syntax supported by TiDB parser
Upstream must enable binlog with binlog_format=ROW
DM does not support dropping multiple partitions at once or dropping indexed columns directly
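The last limitation means such DDL must be issued as separate single-step statements upstream so each one can be replicated. An illustrative example (table, partition, and index names are hypothetical):

```sql
-- Instead of one multi-partition drop:
--   ALTER TABLE logs DROP PARTITION p2019_01, p2019_02;
-- issue one partition per statement:
ALTER TABLE logs DROP PARTITION p2019_01;
ALTER TABLE logs DROP PARTITION p2019_02;

-- Likewise, drop the index before dropping an indexed column:
ALTER TABLE logs DROP INDEX idx_trace_id;
ALTER TABLE logs DROP COLUMN trace_id;
```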
DM‑portal – Provides UI for task generation but initially lacked full‑database regex matching, causing temporary tables from online DDL tools to be ignored; this was fixed in version 1.0.5.
DM‑worker Cleanup
[purge]
interval = 3600
expires = 7
remain-space = 15
Configuring expires removes relay logs older than the specified number of days, preventing disk-space exhaustion.
Conclusion – The author reflects on the journey from a TiDB newcomer to a core contributor, emphasizing the importance of sharing technical knowledge and the practical benefits of DM for large‑scale data migration and cluster stability.
360 Tech Engineering
Official tech channel of 360, building the most professional technology aggregation platform for the brand.