
Schema Change for a 400‑Million‑Row MySQL Table Using Cascade Replication

This article details a step‑by‑step strategy for expanding a VARCHAR column from 50 to 500 characters on a 400‑million‑row MySQL 8.0 table, evaluating Online DDL, gh‑ost, and pt‑osc before implementing a cascade replication workflow to meet a four‑hour downtime window.

Aikesheng Open Source Community

Background: A customer required expanding the open_id column (VARCHAR(50) → VARCHAR(500)) on a table of roughly 400 million rows while limiting impact on the primary MySQL 8.0.22 instance to a maximum four‑hour outage.

Table Information: The table occupies ~280 GB of .ibd file space and holds more than 400 million rows. Standard SHOW TABLE STATUS and information_schema queries are provided to verify the size.
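A typical size check along the lines the article describes looks like the following; the schema and table names here are placeholders, and table_rows is only an optimizer estimate (an exact figure requires SELECT COUNT(*)):

```sql
-- Approximate size and row count of the target table
-- ('mydb' and 'user_open_id' are hypothetical names)
SELECT table_name,
       table_rows,                                            -- statistics-based estimate
       ROUND((data_length + index_length) / POW(1024, 3), 1) AS total_gb
  FROM information_schema.tables
 WHERE table_schema = 'mydb'
   AND table_name   = 'user_open_id';
```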

Solution Options:

| Method | Pros | Cons | Feasibility |
| --- | --- | --- | --- |
| Online DDL | Native, uses temporary table | ALGORITHM=COPY blocks DML; requires MySQL > 5.7 | ★★★★★ |
| gh‑ost | Uses binlog replay thread instead of triggers | Third‑party; may take long for large tables | ★★★★ |
| pt‑osc | Version‑compatible, uses triggers for consistency | Third‑party with many restrictions | ★★★ |
| M‑S1‑S2 cascade | Predictable timing | Requires manual cascade replication steps | |

The team rejected the first three methods: the table serves traffic around the clock, so it cannot tolerate the uncertain DDL window of Online DDL or the long execution time of gh‑ost (a test run took 70 hours on a 400‑million‑row table). The chosen approach is cascade replication (M‑S1‑S2), which isolates the change on a downstream replica.

Operational Steps:

1. Create a new replica S1 and establish M‑S1‑S2 cascade replication.
2. Run Online DDL on S2 to modify the column (no impact on M or S1).
3. After the schema change, allow S2 to catch up within the M‑S1‑S2 chain.
4. Remove S1 and promote S2 to form M‑S2.
5. Back up S2, restore the backup to S1, and rebuild the M‑S2‑S1 cascade.
6. Stop the application, ensure master‑slave consistency, then resume service.
7. Finalize S2 as the new master and S1 as the replica; update application connection strings.
8. Run regression verification.
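The cascade in steps 1–2 can be sketched as below. Host names and credentials are placeholders; all three instances need GTID enabled, and S1 must log replicated events (log_slave_updates, ON by default in 8.0) so they flow through to S2. On 8.0.23+ the CHANGE MASTER TO / START SLAVE syntax is superseded by CHANGE REPLICATION SOURCE TO / START REPLICA:

```sql
-- On S2: replicate from S1 instead of M, forming the M -> S1 -> S2 chain
CHANGE MASTER TO
  MASTER_HOST = 's1.example.internal',   -- placeholder host
  MASTER_USER = 'repl',                  -- placeholder replication account
  MASTER_PASSWORD = '***',
  MASTER_AUTO_POSITION = 1;              -- GTID-based positioning
START SLAVE;

-- On S2 only: widen the column. Growing VARCHAR past the 255-byte
-- length-prefix boundary forces a full table rebuild, which is why
-- this DDL is so heavy on 400 million rows.
ALTER TABLE mydb.user_open_id MODIFY open_id VARCHAR(500);
```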

Key configuration commands include enabling GTID, extending binlog_expire_logs_seconds, and tuning innodb_buffer_pool_size, sync_binlog, and various timeout parameters to accommodate the large DDL operation.

```sql
-- Extend binlog retention so replicas can catch up after the long DDL
show global variables like 'binlog_expire_logs_seconds';
set global binlog_expire_logs_seconds=1209600;

-- ... (additional MySQL settings and ALTER TABLE command) ...

-- Track DDL progress
SELECT EVENT_NAME, WORK_COMPLETED, WORK_ESTIMATED
  FROM performance_schema.events_stages_current;
```
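Note that the stage instruments feeding events_stages_current may be disabled by default; per the MySQL manual they can be switched on before the ALTER so the progress query actually returns rows:

```sql
-- Enable InnoDB ALTER TABLE stage instruments and stage event consumers
UPDATE performance_schema.setup_instruments
   SET ENABLED = 'YES', TIMED = 'YES'
 WHERE NAME LIKE 'stage/innodb/alter table%';

UPDATE performance_schema.setup_consumers
   SET ENABLED = 'YES'
 WHERE NAME LIKE 'events_stages_%';

-- Progress expressed as a rough percentage of estimated work
SELECT EVENT_NAME, WORK_COMPLETED, WORK_ESTIMATED,
       ROUND(100 * WORK_COMPLETED / WORK_ESTIMATED, 1) AS pct_done
  FROM performance_schema.events_stages_current;
```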

Additional Scenarios: Altering the primary directly takes ~5 hours on NVMe storage but can exceed 40 hours on a VM with mechanical disks, illustrating the importance of replica‑based changes.

Conclusion:

- Cascade replication isolates the workload and limits downtime to application‑side changes and verification.
- Online DDL can be tuned for faster execution but may affect data safety.
- gh‑ost is suitable when longer change windows are acceptable.
- Avoid running multiple DDLs concurrently on large tables.
- Select the most appropriate method based on the specific business constraints rather than chasing a universally optimal solution.

Tags: MySQL, online DDL, gh-ost, schema change, pt-osc, cascade replication
Written by

Aikesheng Open Source Community

The Aikesheng Open Source Community provides stable, enterprise‑grade MySQL open‑source tools and services, releases a premium open‑source component each year (1024), and continuously operates and maintains them.
