MySQL 5.7 GTID Replication Issue: Slave Has More GTIDs Than Master – Analysis and GDB Reproduction
The article analyzes a MySQL 5.7 replication bug where, after consecutive crashes, the slave ends up with more GTIDs than the master because the last binlog’s GTIDs are not persisted, and demonstrates the problem using GDB debugging and step‑by‑step reproduction.
Problem Description
In a production environment running MySQL 5.7.26, a rare failure occurs when the primary server crashes twice in quick succession: during the slave's re-initialization, replication fails with the error "Slave has more GTIDs than the master has". Log excerpts show the slave's GTID set exceeding the master's by hundreds of thousands of transactions.
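To make "the slave has more GTIDs than the master" concrete, here is a minimal Python sketch that computes which GTIDs one set contains that another does not. It is only an illustration of the comparison, similar in spirit to MySQL's built-in GTID_SUBTRACT() function. The helper names and the UUID used in the example are hypothetical and do not come from the incident logs.

```python
def parse_gtid_set(text):
    """Parse 'uuid:1-5:7,uuid2:3' into {uuid: [(start, end), ...]} (inclusive)."""
    result = {}
    for part in text.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *ranges = part.split(":")
        intervals = result.setdefault(uuid.lower(), [])
        for r in ranges:
            lo, _, hi = r.partition("-")
            intervals.append((int(lo), int(hi or lo)))
    return result


def subtract_intervals(a, b):
    """Return the parts of intervals `a` that are not covered by intervals `b`."""
    remaining = []
    b_sorted = sorted(b)
    for start, end in sorted(a):
        cur = start
        for b_start, b_end in b_sorted:
            if b_end < cur or b_start > end:
                continue  # no overlap with the part still being examined
            if b_start > cur:
                remaining.append((cur, b_start - 1))
            cur = max(cur, b_end + 1)
            if cur > end:
                break
        if cur <= end:
            remaining.append((cur, end))
    return remaining


def gtid_subtract(a_text, b_text):
    """GTIDs present in set `a_text` but missing from set `b_text`."""
    a, b = parse_gtid_set(a_text), parse_gtid_set(b_text)
    diff = {}
    for uuid, intervals in a.items():
        left = subtract_intervals(intervals, b.get(uuid, []))
        if left:
            diff[uuid] = left
    return diff


if __name__ == "__main__":
    # Hypothetical server UUID and transaction ranges, for illustration only.
    uuid = "3e11fa47-71ca-11e1-9e33-c80aa9429562"
    master = uuid + ":1-100"
    slave = uuid + ":1-150"
    print(gtid_subtract(slave, master))
```

A non-empty result for "slave minus master" is the condition the error message describes; an empty result means the slave's set is contained in the master's.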
Root Cause Analysis
The issue stems from MySQL's GTID persistence mechanism. The gtid_executed variable holds the GTID set in memory, while the mysql.gtid_executed table holds the persisted GTID set. With binary logging enabled, that table is updated only when the binlog is rotated (or the server shuts down cleanly), so the GTIDs in the current binlog are not yet in the table. If a second crash happens during crash recovery, before the last binlog's GTIDs are written to the table, those GTIDs are lost and the slave appears to be ahead of the master.
GTID Persistence Principle
When MySQL starts, it initializes gtid_executed by reading persisted GTIDs from mysql.gtid_executed and merging them with GTIDs from the last binlog that have not yet been persisted.
Therefore, if the last binlog's GTIDs are not persisted before a second crash, the master's GTID set becomes smaller than the slave's.
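The startup rule described above can be sketched as a toy model in Python. This is a deliberate simplification of the article's description, not the server's actual code: the class ToyServer, its methods, and the GTID strings are hypothetical, binlog rotation during normal operation is not modeled, and "the last binlog" simply means the newest binlog file that existed before the current startup.

```python
class ToyServer:
    """Toy model of the GTID bookkeeping described in this article."""

    def __init__(self):
        self.gtid_table = set()     # stands in for the mysql.gtid_executed table
        self.binlogs = [set()]      # GTIDs per binlog file, oldest first
        self.gtid_executed = set()  # in-memory gtid_executed

    def commit(self, gtid):
        # A committed transaction lands in the current binlog and in memory,
        # but not in the table (no rotation has happened yet).
        self.binlogs[-1].add(gtid)
        self.gtid_executed.add(gtid)

    def crash(self):
        # In-memory state is lost; table and binlog files survive.
        self.gtid_executed = set()

    def start(self, killed_before_save=False):
        last_binlog = self.binlogs[-1]
        self.binlogs.append(set())      # a new binlog is created first
        if killed_before_save:
            return False                # process dies before persistence
        self.gtid_table |= last_binlog  # persist the last binlog's GTIDs
        self.gtid_executed = set(self.gtid_table)
        return True


def run_scenario(double_crash):
    """Return the GTIDs the slave has that the master lacks after recovery."""
    master = ToyServer()
    slave_gtids = set()
    for n in range(1, 4):
        gtid = f"uuid:{n}"              # placeholder GTID strings
        master.commit(gtid)
        slave_gtids.add(gtid)           # assume the slave replicated it
    master.crash()
    if double_crash:
        master.start(killed_before_save=True)
        master.crash()
    master.start()
    return slave_gtids - master.gtid_executed


if __name__ == "__main__":
    print("single crash:", run_scenario(False))
    print("double crash:", run_scenario(True))
```

In this model a single crash loses nothing, because the next startup persists the last binlog's GTIDs. After the interrupted startup, however, the last binlog is the empty one created by that startup, so the earlier GTIDs are never merged back in.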
GDB Debugging Reproduction
The problem was reproduced on MySQL 5.7.26 using GDB:
1. Start MySQL 5.7.26 with the debug binary.
2. Create a test database to generate a GTID; at this point the GTIDs in the initial binlog mysql-bin.000001 have not yet been persisted.
3. Kill the MySQL process (simulating an OOM kill).
4. Restart MySQL under GDB and set breakpoints at MYSQL_BIN_LOG::open_binlog (new binlog creation) and Gtid_state::save (GTID persistence).
5. At the first breakpoint, observe that a new binlog is being created while the GTIDs from the previous binlog have not been persisted.
6. Continue execution; at the second breakpoint, the GTID persistence step has not yet completed.
7. Terminate MySQL before Gtid_state::save finishes, leaving the previous binlog's GTIDs unpersisted.
8. Restart MySQL normally. The last binlog is now the one created in step 5, so the unpersisted GTIDs from mysql-bin.000001 are never read, and the slave ends up with more GTIDs than the master.
#mysql5.7.26 crash startup flow
|main
|mysqld_main
|ha_recover #mysqld.cc:4256 recovery process
|open_binlog #mysqld.cc:4282 generate new binlog
|Gtid_state::save #mysqld.cc:4870 read last binlog GTID into mysql.gtid_executed
|Gtid_table_persistor::save
|Gtid_table_persistor::write_row
The same steps reproduce the issue on MySQL 5.7.36 and 5.7.44, while MySQL 8.0 does not exhibit the bug due to improved GTID persistence.
Conclusion
In MySQL 5.7, if the server crashes and then crashes again during the recovery startup, before the last binlog's GTIDs are persisted, those GTIDs are lost. The master's GTID set then ends up smaller than the slave's.
Aikesheng Open Source Community
The Aikesheng Open Source Community provides stable, enterprise‑grade MySQL open‑source tools and services, releases a premium open‑source component each year (1024), and continuously operates and maintains them.