Evolution of ZzoNlineDDL: From 1.0 to 2.0 – Architecture, Features, and Lessons Learned
This article chronicles the development of ZzoNlineDDL's schema‑change platform from its early 1.0 version with limited capabilities to the robust 2.0 release, detailing architectural diagrams, intelligent alteration logic, partitioning, scheduling, monitoring, user interaction, problem analysis, and practical recommendations for MySQL online DDL operations.
Preface
We compare the role of a DBA managing a ferry crossing a river to developers needing schema changes, and introduce the "bridge" – the online DDL platform – that automates MySQL table alterations.
All discussion below focuses on MySQL schema changes.
1. 1.0 Era
The initial platform behaved like a narrow, fragile wooden bridge, offering limited functionality and requiring frequent DBA intervention.
1.1 Business Pain Points
Missing features such as unique index creation and column rename ( change ).
No interactive controls; tasks like terminating or monitoring required DBA assistance.
Inability to handle sharding; each shard required a separate ticket.
Error messages were opaque, forcing DBAs to investigate manually.
1.2 DBA Pain Points
Unclear error prompts caused time‑consuming troubleshooting.
Lack of interactive capabilities made progress checks cumbersome.
Manual handling of unique indexes, sharding, and other unsupported operations.
Low controllability – once a task started, only termination was possible.
1.3 Other Issues
Rapid binlog growth triggering alerts.
mdl lock problems.
Deadlocks and latency spikes.
2. 2.0 Era
After continuous upgrades, the platform reached a stable 2.0 version with richer capabilities.
Key capabilities include:
User interaction – pause, resume, terminate tasks.
Support for adding unique indexes.
Sharding (分库分表) automation.
Comprehensive monitoring (MDL, disk, load, latency).
Scheduled execution to avoid peak hours.
Fast alteration using ALTER TABLE for metadata‑only changes (e.g., VARCHAR expansion, index addition, column rename, default/value changes, MySQL 8.0 rapid column addition).
From 2023‑02‑17 to year‑end, failure rate dropped from 9.65% (1.0) to 4.25% (2.0), a 66% reduction.
Fast alteration accounted for 267 tasks (11% of total), demonstrating significant efficiency gains on large tables.
2.1 Feature Details
2.1.1 Intelligent Alteration
The platform automatically selects the optimal tool based on the DDL request: quick ALTER TABLE for metadata‑only changes, gh‑ost for complex operations, and falls back to pt‑osc only when necessary.
Fast alteration is the platform’s highlight.
[ 2024-02-06 14:15:12 ] [ 提示 ] [ 192.168.168.1 ] [ check/f_check_mode_opt.sh ] [ f_check_mode_opt:167 ] [ 开始执行 : 'main.sh' ]2.1.2 Sharding Support
For sharded projects, the platform auto‑generates tickets for each physical table, eliminating the need for dozens of manual tickets.
2.1.3 Scheduling
The scheduler respects both start and end times; if a task exceeds the user‑defined window, it is automatically paused.
2.1.4 Unique Index Handling
A hidden “extreme scenario” toggle forces users to acknowledge the risk of data loss when adding unique indexes; by default the platform disallows submission unless the user explicitly confirms.
For pt‑osc , use pt‑osc --check-unique-key-change to disable unique‑index addition.
2.1.5 User Interaction
Progress bars, estimated remaining time, and notifications via WeChat and email keep users informed. Users can pause, terminate, or resume tasks directly from the UI, with state changes reflected in the backend.
Only the ticket creator or DBA can perform these actions.
2.1.6 Monitoring & Alerts
MDL monitoring runs in a child thread; disk space, latency, and load are continuously checked. If thresholds are breached, the platform throttles or pauses the operation.
[ 2024-02-06 14:18:36 ] [ 成功 ] [ 192.168.1.1 ] [ `DBA`db/f_update_status_for_task.sh ] [ f_update_status_for_task:59 ] [ '192.168.1.2:3306' 改表修改chunk_size值, 从 '1500' 改为 '500' 完成 ]Disk monitoring ensures at least twice the target table size is available; otherwise the task fails.
Latency monitoring aborts the task if replica lag exceeds configurable limits (default 10 s, with a 600 s grace period).
Load monitoring checks active connections; sustained high load triggers a 30 s wait and eventual termination if not resolved.
2.1.7 Weak Atomicity
For multi‑statement tickets, the platform pre‑runs all statements in a test environment to approximate atomicity, then executes them sequentially in production.
2.1.8 Retry Mechanism
Failed tickets can be retried without re‑executing already successful statements, but scheduled tickets cannot use this feature and fast‑alter paths are bypassed.
2.1.9 Cleanup
After gh‑ost or pt‑osc runs, temporary tables and triggers are removed to avoid residual side effects.
2.2 Problem Analysis
2.2.1 SQL Parsing
The platform parses ALTER TABLE statements, stripping comments, ignoring parenthesized sections, and applying safety checks (e.g., disallowing unique‑index addition with other operations).
A 10‑second timeout acts as a safety net for mis‑parsed fast‑alter attempts.
2.2.2 VARCHAR Expansion
Because VARCHAR stores length in 1–2 bytes, expanding its size may require data copy if the column is indexed or under certain undocumented conditions.
2.2.3 Unique Index Risks
Unique indexes do not benefit from the change buffer, potentially degrading write performance; the platform recommends using ordinary indexes or gh‑ost hooks for large tables.
3. Summary
The article outlines the journey from a manual, error‑prone 1.0 tool to a mature 2.0 platform that automates MySQL schema changes, provides interactive controls, robust monitoring, and self‑healing mechanisms, ultimately delivering measurable reductions in failure rates and operational overhead.
While challenges such as deadlocks, MDL locks, and VARCHAR quirks remain, the platform now serves as an intelligent assistant for DBAs handling routine alterations.
Zhuanzhuan Tech
A platform for Zhuanzhuan R&D and industry peers to learn and exchange technology, regularly sharing frontline experience and cutting‑edge topics. We welcome practical discussions and sharing; contact waterystone with any questions.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.