Databases 16 min read

Analyzing MySQL Master‑Slave Replication Lag and Optimization Tips

This article examines a real‑world MySQL master‑slave replication delay case, analyzes the root causes using slave status, InnoDB metrics, OS‑level I/O and error logs, and provides concrete configuration and operational recommendations to reduce lag.

Aikesheng Open Source Community
Aikesheng Open Source Community
Aikesheng Open Source Community
Analyzing MySQL Master‑Slave Replication Lag and Optimization Tips

The author, a DBA from the iKangSheng delivery service team, shares a recent production incident where the replica lagged behind the master and explains how to diagnose and resolve the issue.

Phenomenon and Analysis

The monitoring system reported master‑slave delay, prompting an inspection of the replica using SHOW SLAVE STATUS\G and SHOW PROCESSLIST . The output showed a Seconds_Behind_Master of 808 seconds and many threads waiting for events from the coordinator.

show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 10.196.131.152
... 
Seconds_Behind_Master: 808

Further investigation of InnoDB engine status revealed high semaphore wait counts, indicating heavy lock contention.

show engine innodb status\G
... RW‑shared spins 0, rounds 2610682598, OS waits 181278068 ...

Operating‑system metrics collected with iostat -x 1 and top confirmed very high disk I/O and CPU usage by the MySQL process.

iostat -x 1
Device            r/s     w/s     rkB/s     wkB/s   %util
sdb             38.00 7235.00  316.00 249236.00  65.60

The MySQL error log showed repeated notes from the page_cleaner thread taking several seconds to flush pages and numerous multi‑threaded slave (MTS) coordinator wait messages, indicating large transactions and insufficient memory for pending jobs.

2021-09-26T11:10:44.341226+08:00 [Note] InnoDB: page_cleaner: 1000ms intended loop took 4504ms.
2021-09-26T11:12:14.456899+08:00 2 [Note] Multi‑threaded slave: Coordinator has waited 7481 times hitting slave_pending_jobs_size_max; current event size = 40906.

Findings and Conclusions

The page_cleaner thread took several seconds to flush ~2000 pages, indicating a large amount of dirty data and that the I/O subsystem cannot keep up; lowering innodb_io_capacity_max is advisable.

Multi‑threaded replication queues were frequently full, suggesting the need to increase slave_parallel_workers or adjust slave_pending_jobs_size_max .

The replica’s slave_pending_jobs_size_max was set to 16 MiB while the master’s max_allowed_packet allowed up to 1 GiB, causing memory pressure for large events.

show variables like '%slave_pending_jobs_size_max%';
+-----------------------------+----------+
| Variable_name               | Value    |
+-----------------------------+----------+
| slave_pending_jobs_size_max | 16777216 |
+-----------------------------+----------+

show variables like 'max_allowed_packet';
+--------------------+------------+
| Variable_name      | Value      |
+--------------------+------------+
| max_allowed_packet | 1073741824 |
+--------------------+------------+

The official documentation explains that slave_pending_jobs_size_max must be at least as large as max_allowed_packet on the source to avoid replication lag caused by large events.

Tips for Reducing MySQL Replication Lag

1. Transmission Lag

Avoid large batches of DML on the master.

Upgrade network bandwidth or hardware.

Improve the replica’s I/O thread performance (e.g., RAID + flash).

Use faster storage for the replica (PCIe SSD).

Consider switching to a different database engine if necessary.

2. Application Lag

Enable group commit in MySQL 5.7.

Ensure every table has a primary key and proper indexes.

Temporarily disable binlog for massive batch jobs, then re‑enable after completion.

Distribute load across multiple replicas.

By adjusting these parameters and monitoring the relevant metrics, the replication delay can be significantly reduced.

performanceMySQLReplicationdatabasestuninglag
Aikesheng Open Source Community
Written by

Aikesheng Open Source Community

The Aikesheng Open Source Community provides stable, enterprise‑grade MySQL open‑source tools and services, releases a premium open‑source component each year (1024), and continuously operates and maintains them.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.