Root Cause Analysis of MySQL Insert Latency Using blktrace and Disk I/O Monitoring
This article details a systematic investigation of MySQL insert slowdown caused by disk I/O anomalies, describing log analysis, blktrace data collection, RAID controller firmware bugs, and the final resolution through firmware upgrade, providing practical guidance for database performance troubleshooting.
The client reported that MySQL inserts were frequently slow. An earlier check had pointed to a disk problem, but no deeper investigation had been carried out.
Log Analysis
Slow Log: The slow-log entries were recorded in batches rather than in real time.
[root@big hao]# less slow_back.log | grep "# Time: 2022-08-26T" | awk -F "." '{print $1}' | termsql -0 "select COL2 time,COUNT(COL2) sql_count from tbl group by COL2"
Error Log: Approximately every 22 minutes a page_cleaner note appeared, indicating long flush times.
[root@dqynj142113 was]# grep "2022-08-26T" mysql-error.log | grep flush
2022-08-26T00:19:29.317569+08:00 0 [Note] [MY-011953] [InnoDB] Page cleaner took 10614ms to flush 9 and evict 0 pages
...
The page_cleaner message reads as follows: the duration (e.g., 12151 ms in a later entry) is the time spent flushing dirty pages, and "flush 19 and evict 0 pages" means 19 dirty pages were submitted for writing.
When long flushes ran back to back, only the first round was logged in mysql-error.log; the subsequent rounds were not, until the fourth round, which was logged again.
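To confirm the roughly 22-minute cadence of the page_cleaner notes, the timestamps can be extracted and differenced. The sketch below is self-contained: the first sample line is the log entry quoted above, the second is an illustrative follow-up entry, not taken from the customer's actual log. It assumes GNU `date` for the timestamp conversion.

```shell
# Sample log: first line is the real entry quoted in the article,
# the second is illustrative to demonstrate the interval calculation.
cat > /tmp/mysql-error-sample.log <<'EOF'
2022-08-26T00:19:29.317569+08:00 0 [Note] [MY-011953] [InnoDB] Page cleaner took 10614ms to flush 9 and evict 0 pages
2022-08-26T00:41:30.123456+08:00 0 [Note] [MY-011953] [InnoDB] Page cleaner took 12151ms to flush 19 and evict 0 pages
EOF

# Extract each warning's timestamp, convert to epoch seconds,
# then print the gap between consecutive warnings in minutes.
grep "Page cleaner took" /tmp/mysql-error-sample.log \
  | awk '{print $1}' | cut -d. -f1 \
  | while read ts; do date -d "$ts" +%s; done \
  | awk 'NR>1 {printf "%.0f minutes between warnings\n", ($1-prev)/60} {prev=$1}'
```

Run against the real mysql-error.log, the same pipeline makes the periodicity of the stalls obvious at a glance.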
Disk I/O Observation
Using the iostat command, periods of abnormal I/O were observed in which disk utilization (%util) was pinned at 100% while actual read/write activity was near zero.
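The "saturated but idle" pattern can be flagged automatically by filtering iostat output. The sketch below parses a captured sample (the numbers are illustrative, not the incident's real figures; the column layout assumes an `iostat -x`-style report with r/s, w/s, and %util):

```shell
# Illustrative iostat -x capture: row 1 shows the anomaly
# (100% util, near-zero requests), row 2 shows normal load.
cat > /tmp/iostat-sample.txt <<'EOF'
Device  r/s   w/s   rkB/s  wkB/s  await  %util
sda     0.00  0.20  0.00   0.80   950.0  100.00
sda     5.00  80.0  20.0   640.0  1.20   35.00
EOF

# Flag intervals where the device is reported saturated
# while fewer than one request per second is being issued.
awk 'NR>1 && $7==100 && ($2+$3)<1 {print $1": saturated but idle (r/s+w/s="$2+$3")"}' /tmp/iostat-sample.txt
```

A device that is "100% busy" while issuing almost no requests suggests the time is being lost below the block layer, which is what motivated the blktrace step that follows.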
Customer Requirement
The client confirmed that disk I/O issues were affecting normal business and asked for further investigation to pinpoint the problematic I/O stage.
blktrace Tool
blktrace can trace I/O paths and measure time spent in the scheduler queue versus on the hardware.
Data collection command (25-second capture):
// blktrace collection command
blktrace -w 25 -d /dev/sda1 -o sda1$(date "+%Y%m%d%H%M%S")
Analysis commands:
// Merge binary files
blkparse -i sda120220915112930 -d blkpares2930.out
// Show per‑stage timings
btt -i blkpares2930.out
The stages are defined as:
Q2G – time to generate I/O request (including remap/split)
G2I – time for request to enter I/O scheduler (including merge)
I2D – time waiting in scheduler (scheduler performance metric)
D2C – time on driver/hardware (hardware performance metric)
Q2C – total I/O time (equivalent to iostat await)
Q2Q – interval between two I/O requests
Analysis showed the bottleneck at the D2C stage, meaning requests spent excessive time on the driver/hardware.
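The conclusion above comes from comparing the per-stage averages that btt reports. Real btt output is a table with MIN/AVG/MAX columns per stage; the sketch below works on a simplified two-column extract (stage, average seconds) with illustrative values, not the incident's real figures, to show how the dominant stage stands out:

```shell
# Simplified per-stage averages (illustrative values): in a healthy
# system D2C is typically sub-millisecond; here it dwarfs every
# other stage, pointing at the driver/hardware.
cat > /tmp/btt-sample.txt <<'EOF'
Q2G 0.000001
G2I 0.000002
I2D 0.000150
D2C 1.250000
EOF

# Sort stages by average latency, largest first.
sort -k2 -gr /tmp/btt-sample.txt | head -1
```

When D2C dominates Q2C like this, the scheduler (I2D) is exonerated and attention shifts to the controller, firmware, and disks.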
Comparative Test
A similar machine with SATA disks in a non‑RAID configuration did not exhibit the I/O anomaly, suggesting the issue lay in the RAID controller.
Final Investigation
The RAID controller firmware version 25.5.6.0009 was identified as having a known bug that temporarily degrades disk I/O. The recommended fix is to upgrade to version 25.5.8.0001.
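As a hypothetical verification step, the installed firmware package can be checked against the fixed version. The MegaCli path below and the live command (shown commented out) are assumptions that vary by tool and controller; the sketch parses a sample of the output, and the naive lexical version comparison is only valid because both version strings share the same field widths:

```shell
# Live check on a MegaRAID/PERC controller might look like
# (path and output format depend on the installed tool version):
#   /opt/MegaRAID/MegaCli/MegaCli64 -AdpAllInfo -aALL | grep "FW Package Build"

# Sample output line for the affected firmware:
cat > /tmp/megacli-sample.txt <<'EOF'
FW Package Build: 25.5.6.0009
EOF

# Naive lexical compare against the fixed release; valid here
# because both versions use identical dotted field widths.
awk -F': ' '/FW Package Build/ {print (($2 < "25.5.8.0001") ? "upgrade recommended" : "up to date")}' /tmp/megacli-sample.txt
```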
Resolution
After upgrading the RAID firmware, the abnormal I/O spikes disappeared, and MySQL insert performance returned to normal.
References
[1] blktrace usage details: https://developer.aliyun.com/article/698568
[2] Firmware version 25.5.8.0001: https://www.dell.com/support/home/zh-cn/drivers/driversdetails?driverid=fy18x
Aikesheng Open Source Community
The Aikesheng Open Source Community provides stable, enterprise-grade MySQL open-source tools and services, releases a premium open-source component each year on October 24 (1024), and continuously operates and maintains them.