Schema Change for a 400‑Million‑Row MySQL Table Using Cascade Replication
This article details a step‑by‑step strategy for expanding a VARCHAR column from 50 to 500 characters on a 400‑million‑row MySQL 8.0 table, evaluating Online DDL, gh‑ost, and pt‑osc before ultimately implementing a cascade replication workflow to meet a four‑hour downtime window.
Background: A customer required expanding the open_id column (VARCHAR(50) → VARCHAR(500)) on a ~400 million‑row table while limiting impact on the primary MySQL 8.0.22 instance to a maximum four‑hour outage.
Table Information: The table occupies ~280 GB of .ibd file space; the row count exceeds 400 million. Standard SHOW TABLE STATUS and information_schema queries are provided to verify size.
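The size check mentioned above can be sketched as follows (the schema and table names are placeholders; note that TABLE_ROWS is only an estimate for InnoDB, so run SELECT COUNT(*) on a replica if an exact figure is needed):

```sql
-- Hedged sketch: verify table size and approximate row count before planning
-- the change. 'mydb' and 'user_info' are placeholder names.
SELECT table_name,
       table_rows,
       ROUND((data_length + index_length) / 1024 / 1024 / 1024, 1) AS total_gb
FROM information_schema.tables
WHERE table_schema = 'mydb'
  AND table_name = 'user_info';
```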
Solution Options:

| Method | Pros | Cons | Feasibility |
| --- | --- | --- | --- |
| Online DDL | Native; uses a temporary table | ALGORITHM=COPY blocks DML; requires MySQL 5.7 or later | ★★★★★ |
| gh‑ost | Replays binlog events instead of using triggers | Third‑party; can take a long time on large tables | ★★★★ |
| pt‑osc | Broad version compatibility; uses triggers for consistency | Third‑party, with many restrictions | ★★★ |
| M‑S1‑S2 cascade | Predictable timing | Requires manual cascade replication setup | ★ |
The team rejected the first three methods because the table is busy around the clock and cannot tolerate the unpredictable DDL window of Online DDL or the long execution time of gh‑ost (a test run took 70 hours on the 400‑million‑row table). The chosen approach is a cascade replication topology (M → S1 → S2) that isolates the change to a downstream replica.
Operational Steps:
1. Create a new replica S1 and establish M → S1 → S2 cascade replication.
2. Run Online DDL on S2 to modify the column (no impact on M or S1).
3. After the schema change completes, let S2 catch up through the M → S1 → S2 chain.
4. Remove S1 from the chain so that S2 replicates directly from M (M → S2).
5. Back up S2, restore the backup to S1, and rebuild the cascade as M → S2 → S1.
6. Stop the application and verify source/replica consistency, then resume service.
7. Promote S2 to be the new primary with S1 as its replica; update the application connection strings.
8. Run regression verification.
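Step 1 above can be sketched as follows, run on S2 to attach it below S1. Host, credentials, and port are placeholders; GTID mode is assumed ON on all three instances, and S1 must run with log_slave_updates enabled so that changes from M are written to S1's binlog and flow on to S2:

```sql
-- Hedged sketch: point S2 at S1 to form the M -> S1 -> S2 chain.
-- (On MySQL 8.0.23+ the equivalent statements are
--  CHANGE REPLICATION SOURCE TO ... / START REPLICA.)
CHANGE MASTER TO
  MASTER_HOST = 's1.example.internal',  -- placeholder host
  MASTER_PORT = 3306,
  MASTER_USER = 'repl',                 -- placeholder replication account
  MASTER_PASSWORD = '***',
  MASTER_AUTO_POSITION = 1;             -- GTID auto-positioning
START SLAVE;
SHOW SLAVE STATUS\G
```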
Key configuration commands include enabling GTID, extending binlog_expire_logs_seconds, and tuning innodb_buffer_pool_size, sync_binlog, and various timeout parameters to accommodate the large DDL operation.
show global variables like 'binlog_expire_logs_seconds';
set global binlog_expire_logs_seconds=1209600;
... (additional MySQL settings and ALTER TABLE command) ...
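A minimal sketch of the schema change itself, run on S2 (the table name and column attributes are placeholders; preserve the column's original NULL/DEFAULT attributes in the real statement). Note that growing VARCHAR(50) to VARCHAR(500) changes the column's length prefix from 1 byte to 2 bytes, so InnoDB cannot perform it INPLACE and the statement falls back to a full table copy, which is exactly why it is isolated on a replica:

```sql
-- Hedged sketch: the column expansion, forced to the copy algorithm so the
-- fallback is explicit rather than silent. 'user_info' is a placeholder name.
ALTER TABLE user_info
  MODIFY COLUMN open_id VARCHAR(500),
  ALGORITHM = COPY;
```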
SELECT EVENT_NAME, WORK_COMPLETED, WORK_ESTIMATED FROM performance_schema.events_stages_current;

Additional Scenarios: Directly altering the primary on an NVMe disk takes ~5 hours, while on a mechanical‑disk VM it can exceed 40 hours, illustrating the importance of replica‑based changes.
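The progress query above only returns rows if the relevant stage instruments and consumers are enabled beforehand; a sketch of that setup (these are standard performance_schema setup tables, and the stage/innodb/alter table% instruments are the ones that track online DDL progress):

```sql
-- Enable ALTER TABLE stage instrumentation so the progress query has data.
UPDATE performance_schema.setup_instruments
   SET ENABLED = 'YES', TIMED = 'YES'
 WHERE NAME LIKE 'stage/innodb/alter table%';

-- Enable the stage-event consumers that feed events_stages_current.
UPDATE performance_schema.setup_consumers
   SET ENABLED = 'YES'
 WHERE NAME LIKE 'events_stages%';
```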
Conclusion:
Cascade replication isolates the workload and limits downtime to application‑side changes and verification.
OnlineDDL can be tuned for faster execution but may affect data safety.
Gh‑ost is suitable when longer change windows are acceptable.
Avoid running multiple DDLs concurrently on large tables.
Select the method that best fits the specific business constraints rather than chasing a theoretically optimal solution.
Aikesheng Open Source Community
The Aikesheng Open Source Community provides stable, enterprise‑grade MySQL open‑source tools and services, releases a premium open‑source component each year (1024), and continuously operates and maintains them.