Analysis and Resolution of MySQL Slave Replication Deadlock Caused by Global Read Lock
The article investigates a MySQL replication issue where a continuously rising master‑slave lag was traced to a global read lock held by mysqldump, leading to a deadlock between the SQL thread and the stop‑slave operation, and explains how killing the lock‑waiting thread restores normal replication.
In a production environment an alarm indicated that the master‑slave replication delay was steadily increasing. Inspection of the replica showed the SQL thread stuck in waiting for global read lock . Further investigation revealed two problematic processes: a global read‑lock holder (identified as a long‑running mysqldump ) and a kill slave operation that was waiting for the same lock, creating a special deadlock situation.
Performance schema was not enabled, so lock details could not be observed directly. Enabling performance_schema and querying metadata_locks would expose the lock records.
The deadlock consisted of three locks that could not be released by ordinary UNLOCK TABLES commands. The fix was to kill the thread waiting for the global read lock, which immediately released the lock chain and allowed the replica to recover, after which the lingering mysqldump process disappeared.
Review: Why do STOP SLAVE and FLUSH TABLES WITH READ LOCK cause a deadlock?
During a mysqldump with --master-data , --single-transaction and FLUSH LOGS , the dump process acquires a global read lock and refreshes logs. Simultaneously, STOP SLAVE waits for the I/O thread to finish and releases LOCK_msp_map and master_info . Because both actions are issued by the same backup process, the SQL thread holds mi->stop_cond while FLUSH LOGS attempts to acquire a system lock, causing both sides to wait on each other and forming an infinite deadlock.
Reproduction Scenario
Using gdb to dump thread stacks of the MySQL process reveals that the stop‑slave thread is blocked on pthread_cond_timedwait while the SQL thread holds the lock needed by the dump operation. The stack traces show the functions terminate_slave_thread and stop_slave waiting on mi->stop_cond , confirming the lock dependency.
Key Functions
terminate_slave_thread is called after setting abort_slave and waits for the SQL thread to finish, holding term_lock . If need_lock_term is false, the caller must already own the mutex, which remains held after the function returns.
Backup Operations Involved
master-data=2 together with --single-transaction adds a global read lock.
FLUSH LOGS refreshes the relay log.
STOP SLAVE Operations
STOP SLAVE waits for the I/O thread to finish, then releases LOCK_msp_map and the associated master_info .
The sequence of events leads to the deadlock: mysqldump holds the global read lock, STOP SLAVE holds mi->stop_cond , and FLUSH LOGS attempts to acquire a system lock that is already held by the stop‑slave thread, causing both sides to wait indefinitely.
References:
https://bugs.mysql.com/bug.php?id=70307
https://bugs.mysql.com/file.php?id=20542&bug_id=70307
https://www.percona.com/blog/2012/03/23/how-flush-tables-with-read-lock-works-with-innodb-tables/
Aikesheng Open Source Community
The Aikesheng Open Source Community provides stable, enterprise‑grade MySQL open‑source tools and services, releases a premium open‑source component each year (1024), and continuously operates and maintains them.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.