
Investigating Data Loss with gh-ost in MySQL AFTER_SYNC Semi‑Sync Replication and Applying a Fix

This article documents a reproducible test showing that gh-ost can lose rows when used with MySQL 5.7 AFTER_SYNC semi-synchronous replication, explains the underlying cause, and presents a source-code modification that prevents the loss.

Aikesheng Open Source Community

Background: A recent post claimed that using gh-ost for online DDL under MySQL AFTER_SYNC semi-sync replication can cause data loss. This article reproduces the issue on a MySQL 5.7 master-slave setup with semi-sync replication, using a 60-second artificial delay injected into gh-ost to widen the timing window.

Environment Preparation

Clone the gh-ost source (v1.1.2, the latest release at the time of writing) with git clone https://github.com/github/gh-ost.git and build it with the provided build.sh script.

Deploy a MySQL 5.7 master-slave pair (one master, one slave) and enable semi-sync replication with rpl_semi_sync_master_wait_point=AFTER_SYNC (the default in 5.7).

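Set rpl_semi_sync_master_timeout on the master to a value larger than the artificial delay, for example 120000 (the variable is in milliseconds, so 120 s), so that a transaction waiting for an ACK is still uncommitted when the delay ends.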

Validation Steps

Insert a 60-second sleep (time.Sleep(60 * time.Second)) at the start of addDMLEventsListener in ./gh-ost-master/go/logic/migrator.go, then rebuild gh-ost.

Set rpl_semi_sync_master_timeout on the master to 120000 (120 s).

Create a test table t and insert a row (id=1).

Run gh-ost to perform ALTER TABLE t ENGINE=InnoDB, which rebuilds the table without changing its schema.

Stop the slave's IO thread (STOP SLAVE IO_THREAD) so that the master receives no ACK, simulating a lost acknowledgment.

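While gh-ost is still in the 60-second sleep, insert a second row (id=2) on the master. With no ACK arriving, this INSERT does not return until the 120-second semi-sync timeout expires.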

The migration completes after about 120 seconds, but the row with id=2 is missing from the migrated table, confirming the data loss.

Principle Analysis

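The loss comes from the order in which AFTER_SYNC commits a transaction. The INSERT of id=2 is first written and synced to the binary log; the master then waits for the slave's ACK before committing the transaction in InnoDB. With the IO thread stopped, that commit happens only when rpl_semi_sync_master_timeout expires, and until then the row is invisible to other sessions. gh-ost reads the table's primary-key range (its minimum and maximum values) inside this window, so the range ends at id=1 and the row copy never reaches id=2. The binary log does not recover the row either: its event was written during the injected sleep, before gh-ost had started applying DML events for the table. Because neither the row copy nor the binlog replay covers id=2, the row is lost.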

Fix Implementation

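A pull request changes how gh-ost fetches the range: the range query takes a shared read lock, and a retry mechanism is added. The changes are in ./gh-ost-master/go/sql/builder.go and ./gh-ost-master/go/logic/migrator.go. A shared-lock (locking) read cannot skip a row whose insert is still uncommitted; it waits for that transaction's row lock, so the range query returns only after id=2 is committed and therefore includes it. After recompiling and rerunning the test with the same configuration, the row with id=2 is present in the migrated table, showing that the fix prevents the loss in this scenario.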

Precautions

Adjust rpl_semi_sync_master_timeout only on the master.

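Keep rpl_semi_sync_master_wait_no_slave=ON (the default) so that the master really waits for an ACK. With OFF, the master reverts to asynchronous replication as soon as the number of connected semi-sync slaves drops below rpl_semi_sync_master_wait_for_slave_count, so the insert commits immediately and the loss does not reproduce.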

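When several slaves are attached, account for rpl_semi_sync_master_wait_for_slave_count (default 1), the number of slave ACKs the master waits for. Stop the IO threads of enough slaves that fewer than this number can still acknowledge; otherwise the remaining slaves ACK the insert and it commits at once.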

Conclusion

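The experiment confirms that gh-ost can lose data under specific AFTER_SYNC timing conditions, namely when a transaction is already in the binary log but not yet committed in InnoDB at the moment gh-ost reads its range. The source-code fix resolves the issue in the reproduced scenario, making gh-ost safe to use with AFTER_SYNC semi-synchronous replication in this case.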
