TiDB on 360 Cloud Platform: Deployment, Migration, and Performance Tuning
This article shares the experience of deploying TiDB on the 360 Cloud Platform, covering background challenges with massive tables, TiDB's online DDL and high‑performance features, cluster architecture, DM migration workflow, common operational issues, and detailed tuning parameters to improve latency, QPS, and resource utilization.
When faced with tables containing billions of rows, traditional MySQL operations such as adding columns or indexes become painfully slow, often taking days; TiDB eliminates these bottlenecks by offering native online DDL, near‑instant column additions, and fast COUNT(*) queries.
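The difference is easy to see with a few statements. This is an illustrative sketch (the table and column names are invented for the example): in TiDB these DDL statements return quickly and run online, while on a billion-row MySQL table they could block or take days.

```sql
-- Illustrative table; in TiDB, DDL is online and does not lock writes.
ALTER TABLE orders ADD COLUMN discount DECIMAL(10,2) DEFAULT 0;
ALTER TABLE orders ADD INDEX idx_created_at (created_at);

-- COUNT(*) is pushed down and executed in parallel across TiKV Regions.
SELECT COUNT(*) FROM orders;
```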
360 Cloud Platform provides database services for multiple business lines, including MySQL, Redis, MongoDB, ES, and TiDB. Growing data volumes exposed MySQL's storage limits, prompting an evaluation of TiDB as a replacement. After testing, three TiDB clusters (over 50 nodes) were deployed, supporting nine business lines and handling tables exceeding 100 billion rows with total data volume around 80 TB.
Cluster Architecture
The TiDB deployment includes the TiDB cluster itself plus DM, Pump, and Drainer components. Two main workload types are supported: (1) legacy MySQL workloads that require high availability and 7×24 backups, migrated using DM (a full 1 TB import via DM took ~16 hours; for comparison, TiDB Lightning loaded 54 GB in ~37 minutes), and (2) brand‑new services that benefit from TiDB's ACID guarantees and distributed architecture.
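For the faster Lightning path, a minimal `tidb-lightning.toml` might look like the sketch below. All hosts, ports, and paths are placeholders, and exact keys vary by Lightning version:

```toml
[lightning]
level = "info"
file = "tidb-lightning.log"

[tikv-importer]
# Address of the tikv-importer process (importer backend).
addr = "127.0.0.1:8287"

[mydumper]
# Directory holding the exported SQL/CSV dump files.
data-source-dir = "/data/export"

[tidb]
host = "127.0.0.1"
port = 4000
user = "root"
status-port = 10080
pd-addr = "127.0.0.1:2379"
```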
DM Migration Details
DM requires specific privileges. The official manual lists SELECT, UPDATE, ALTER, CREATE, and DROP. In practice, additional privileges are needed:
Upstream: REPLICATION SLAVE (required for incremental sync)
Downstream: SUPER (required for checksum execution)
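Putting the documented and additional privileges together, the DM user might be provisioned as follows. This is a sketch: the user name and host pattern are placeholders, and only the privileges named above are granted.

```sql
-- Upstream (source MySQL): reads plus binlog-based incremental sync.
GRANT SELECT, REPLICATION SLAVE ON *.* TO 'dm_user'@'%';

-- Downstream (TiDB): schema and data changes for the migration,
-- plus SUPER so DM can run its checksum verification.
GRANT SELECT, UPDATE, ALTER, CREATE, DROP ON *.* TO 'dm_user'@'%';
GRANT SUPER ON *.* TO 'dm_user'@'%';
```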
Configuration example for purging relay‑log files:
[purge]
interval = 3600
expires = 7
remain-space = 15

During migration, issues such as skipped events, primary‑key conflicts (error 1062), and failures when dropping partitions or indexed columns were observed. Example of a skipped‑event log entry:
["skip event"] [task=task_20357] [unit="binlog replication"] [event=query] [statement="ALTER TABLE `360`.`_data_201910_gho` ADD INDEX `idx_rawurl_md5`(`raw_url_md5`)"]To mitigate large‑scale Region balance impact, PD parameters can be tuned in real time via pd-ctl :
high-space-ratio 0.7 # stores below 70% space usage are scheduled without regard to remaining capacity
region-schedule-limit 8 # max concurrent region schedules
merge-schedule-limit 12 # max concurrent merges
leader-schedule-limit 8 # max concurrent leader schedules
max-merge-region-keys 50000 # stop merging when key count > 50k
max-merge-region-size 16 # stop merging when size > 16 MiB

Other tuning knobs that improved performance include reducing tidb_ddl_reorg_worker_cnt and tidb_ddl_reorg_batch_size, adjusting RocksDB background jobs, and enabling raftstore.hibernate-regions to suppress idle Raft heartbeats.
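Applied to a running cluster, the PD and TiDB changes above might look like the following. The PD endpoint, TiDB address, and chosen values are examples only, and exact parameter names vary by version:

```shell
# Assumed PD endpoint; adjust to your cluster.
pd-ctl -u http://127.0.0.1:2379 config set high-space-ratio 0.7
pd-ctl -u http://127.0.0.1:2379 config set region-schedule-limit 8
pd-ctl -u http://127.0.0.1:2379 config set merge-schedule-limit 12
pd-ctl -u http://127.0.0.1:2379 config set leader-schedule-limit 8
pd-ctl -u http://127.0.0.1:2379 config set max-merge-region-keys 50000
pd-ctl -u http://127.0.0.1:2379 config set max-merge-region-size 16

# Throttle DDL backfill load via TiDB system variables.
mysql -h 127.0.0.1 -P 4000 -u root -e "
  SET GLOBAL tidb_ddl_reorg_worker_cnt = 4;
  SET GLOBAL tidb_ddl_reorg_batch_size = 256;"

# hibernate-regions is a TiKV config item, set in tikv.toml:
#   [raftstore]
#   hibernate-regions = true
```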
Common operational pain points, such as high disk usage (>50%), large‑scale INSERT … SELECT statements causing CPU spikes, enum‑type limitations, and the lack of partition‑metadata views, were documented, along with suggested workarounds and feature requests for upcoming releases.
Future Outlook
The team plans to integrate TiDB into the 360 HULK cloud platform, continue upgrading to newer TiDB versions (3.1, 4.0), and expand features like partition management and improved load‑data performance. Appreciation is given to PingCAP for rapid support throughout the migration journey.
360 Tech Engineering
Official tech channel of 360, building the most professional technology aggregation platform for the brand.