
Analysis of MySQL Backup Blocking Issues and Optimization Strategies

The article investigates a production MySQL outage caused by FLUSH NO_WRITE_TO_BINLOG TABLES blocking during backups, presents three experimental scenarios to identify the root cause, and offers practical recommendations such as setting query timeouts, adding indexes, and using Percona XtraBackup options to avoid backup stalls.

Aikesheng Open Source Community

Background

During an early‑morning incident, a middle‑platform (coupon) database could not be accessed; many connections were stuck in the state "Waiting for table flush". The queries were simple SELECTs, yet MySQL reported them as waiting for a table flush.

| Id | User | Host | db | Command | Time | State | Info |
| 55279262 | xxx_app | 192.168.50.143:44623 | xxx_mam | Query | 206 | Waiting for table flush | SELECT count(*) FROM mam_user_coupon_package ... |

To quickly relieve the pressure, the author generated kill statements for the blocked sessions:

select concat('kill ', id, ';') from information_schema.processlist where user = 'xxx_app' and state = 'Waiting for table flush';

The issue recurred repeatedly, so the server was restarted, which temporarily restored normal operation.
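In hindsight, the blocker can be identified before killing the waiters: the sessions queued in "Waiting for table flush" are all stuck behind one long-running query (or a pending FLUSH ... TABLES). A diagnostic query along these lines, sketched here against information_schema.processlist, would have pointed at it:

```sql
-- Sketch: find the session everyone else is queued behind.
-- The longest-running non-idle sessions sort to the top.
SELECT id, user, host, db, time, state, LEFT(info, 80) AS query_head
FROM information_schema.processlist
WHERE command <> 'Sleep'
ORDER BY time DESC
LIMIT 10;
-- The top rows typically show the long-running query and the waiting
-- FLUSH ... TABLES; killing the long query unblocks everything behind it.
```

Killing only the waiters, as was done during the incident, leaves the actual blocker running, which is why the problem kept coming back until the restart.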

Post‑mortem Analysis

The root cause was a long‑running query that blocked the FLUSH NO_WRITE_TO_BINLOG TABLES command issued by Percona XtraBackup at the start of the backup's lock phase. While the flush waited for that query to finish, every new query against the affected tables queued up behind it in the "Waiting for table flush" state.

220425 01:01:48 Executing FLUSH NO_WRITE_TO_BINLOG TABLES...

The NO_WRITE_TO_BINLOG modifier keeps the statement out of the binary log, so replicas do not repeat the flush. FLUSH TABLES itself must close every open table: it waits for all queries currently using those tables to finish, and any query issued after it is queued until the flush completes.

Backup Procedure Overview

The backup workflow includes:

1. Check connectivity and the server version.

2. Read the configuration (data and log file locations).

3. Start the redo‑log copy thread from the latest checkpoint.

4. Copy the InnoDB files (.ibd, ibdata1, undo, etc.).

5. Execute FLUSH NO_WRITE_TO_BINLOG TABLES and then FLUSH TABLES WITH READ LOCK to close all tables and obtain the global read lock.

6. Copy the non‑InnoDB files.

7. Write the binlog position information.

8. Execute FLUSH NO_WRITE_TO_BINLOG ENGINE LOGS to flush the redo logs.

9. Stop the redo‑log copy thread.

10. Release the lock with UNLOCK TABLES and finalize the backup.
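Condensed to the statements that matter for locking, the sequence XtraBackup runs on the server looks roughly like this (a sketch based on the workflow above; exact statements vary by version):

```sql
-- Sketch of the locking phase of an XtraBackup run.
FLUSH NO_WRITE_TO_BINLOG TABLES;        -- close all tables; not binlogged
FLUSH TABLES WITH READ LOCK;            -- take the global read lock
-- ... copy non-InnoDB files, record the binlog position ...
FLUSH NO_WRITE_TO_BINLOG ENGINE LOGS;   -- flush the InnoDB redo logs
UNLOCK TABLES;                          -- release the global read lock
```

The first FLUSH is where the incident started: it cannot finish while any query still holds a table open, and every query arriving after it must wait.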

Experimental Scenarios

Scenario 1 – Uncommitted Write Transaction

A transaction that begins, updates a row, and never commits does not block the backup.

begin;
update tb1 set id =10 where id =1;
/usr/bin/innobackupex ...
... completed OK!

Scenario 2 – Slow Query Inside a Transaction

A long‑running SELECT (using SLEEP(1000)) inside an open transaction blocks the backup until the query times out.

begin;
select id,sleep(1000) from tb1;
/usr/bin/innobackupex ...
... backup completes only after query timeout ...

Scenario 3 – Slow Query Without a Transaction

A slow SELECT executed outside a transaction also blocks the backup, because the table remains locked until the query is killed or times out.

select id,sleep(1000) from tb1;
/usr/bin/innobackupex ...
... backup succeeds after query timeout ...
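The pile‑up seen in the scenarios can be reproduced without XtraBackup at all, using plain client sessions against a throwaway table tb1 (a sketch):

```sql
-- Session 1: a long-running query keeps tb1 open.
select id, sleep(1000) from tb1;

-- Session 2: must wait for session 1 before it can close the table.
flush tables;

-- Session 3: any new query on tb1 now queues behind the pending flush.
select count(*) from tb1;   -- state: Waiting for table flush

-- Session 4: observe the pile-up.
show processlist;
```

This is exactly the production symptom: one slow query, one waiting flush, and an ever-growing queue of ordinary SELECTs behind them.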

FLUSH TABLES Explanation

The statement FLUSH NO_WRITE_TO_BINLOG TABLES is equivalent to FLUSH TABLES except that it is not written to the binary log. Per the MySQL documentation:

The FLUSH statement has several variant forms that clear or reload internal caches, flush tables, or acquire locks.
By default the server writes FLUSH statements to the binary log; adding NO_WRITE_TO_BINLOG (or its alias LOCAL) suppresses this.
FLUSH TABLES closes all open tables, forces tables in use to be closed as soon as their current queries finish, and flushes the query cache and prepared‑statement cache.
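The binlog difference is easy to verify on a test instance (a sketch; the SHOW BINLOG EVENTS output depends on your binlog configuration):

```sql
FLUSH TABLES;                      -- written to the binary log by default
FLUSH NO_WRITE_TO_BINLOG TABLES;   -- not logged; replicas are unaffected
FLUSH LOCAL TABLES;                -- LOCAL is an alias for NO_WRITE_TO_BINLOG
SHOW BINLOG EVENTS;                -- only the first FLUSH appears
```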

Production‑Level Findings

A 10‑hour slow query was identified in the production slow‑query log:

# Time: 2022-04-25T09:12:53.416049+08:00
# User@Host: BlueMoon[BlueMoon] @  [192.168.50.144]  Id: 55128898
# Query_time: 38910.325000  Lock_time: 0.000328  Rows_sent: 0  Rows_examined: 14637241
SELECT d.coupon_master_code FROM mam_coupon_activity_interact a ...

The mam_coupon_send_log table lacked an index on interact_id, which forced the query to examine roughly 14.6 million rows and likely explains the long execution time. Adding the index is recommended:

alter table mam_coupon_send_log add index idx_interact_id(interact_id);
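After adding the index, EXPLAIN can confirm the optimizer actually uses it (a sketch; the WHERE clause below is an illustrative stand‑in for the real join condition):

```sql
-- Verify the new index is chosen (illustrative predicate).
EXPLAIN
SELECT *
FROM mam_coupon_send_log
WHERE interact_id = 12345;
-- Expect key = idx_interact_id and a small "rows" estimate,
-- rather than a full scan of ~14.6 million rows.
```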

The server’s max_execution_time variable was set to 0 (no timeout). The value is in milliseconds and applies only to read‑only SELECT statements; setting a global default (e.g., 120 seconds) helps prevent runaway queries:

set global max_execution_time = 120000;  -- milliseconds, i.e. 120 seconds
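For a single statement, the same limit can be applied with the MAX_EXECUTION_TIME optimizer hint (MySQL 5.7.8+) instead of changing the global default, with the value again in milliseconds:

```sql
-- Cap just this SELECT at 5 seconds; other statements are unaffected.
SELECT /*+ MAX_EXECUTION_TIME(5000) */ count(*)
FROM mam_user_coupon_package;
```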

Backup Optimization Options

Percona XtraBackup provides two useful options to avoid backup stalls caused by long‑running queries:

--ftwrl-wait-timeout=SECONDS          # how long to wait for blocking queries to finish before issuing FLUSH TABLES WITH READ LOCK; if they are still running when the timeout expires, the backup aborts instead of piling up waiters
--kill-long-queries-timeout=SECONDS   # how long to wait after issuing the lock before killing the queries that still block it (0 means never kill)

Example command:

/usr/bin/innobackupex --defaults-file=/etc/my.cnf --user=bigdata_admin --password=123456 \
  --socket=/mysqlData/data/mysql.sock --port=3306 \
  --ftwrl-wait-timeout=5 --kill-long-queries-timeout=5 /data/backup/

Summary

1. Business layer: Regularly optimize slow queries; aim for < 0.2 s execution time for OLTP workloads.

2. Database layer: Set a reasonable max_execution_time (e.g., 120 s) to automatically kill excessively long statements.

3. Backup layer: Use --ftwrl-wait-timeout and --kill-long-queries-timeout to limit lock‑wait time and prevent backup blockage.

Tags: MySQL, Database Optimization, Backup, slow query, Percona XtraBackup, FLUSH TABLES
Written by

Aikesheng Open Source Community

The Aikesheng Open Source Community provides stable, enterprise‑grade MySQL open‑source tools and services, releases a premium open‑source component each year (1024), and continuously operates and maintains them.
