Locating Large Transactions in MySQL Binlog Using GTID and Shell Scripts
This article explains how to identify and extract large MySQL transactions from binary log files by monitoring binlog size, parsing GTID information, and using shell scripts to retrieve transaction details, timestamps, positions, and DML statistics, with optional use of the my2sql tool.
1 Introduction
Large transactions are common in MySQL, and the first step in handling them is locating them. By examining binlog file sizes, especially when a binlog exceeds max_binlog_size, you can suspect the presence of a large transaction.
The official documentation states: "A transaction is written in one chunk to the binary log, so it is never split between several binary logs. Therefore, if you have big transactions, you might see binary log files larger than max_binlog_size."
Observing unusually large binlog files allows you to target those files for further parsing, though large transactions can also be hidden in normally sized binlogs.
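As a first pass, you can flag binlog files that exceed the configured max_binlog_size directly on the filesystem. A minimal sketch (the directory, file-name pattern, and helper name are assumptions; adjust them to your instance's log_bin setting):

```shell
# find_big_binlogs DIR THRESHOLD_BYTES
# Print binlog files under DIR that are larger than THRESHOLD_BYTES.
# The 'mysql-bin.*' naming pattern is an assumption; match your log_bin basename.
find_big_binlogs() {
    dir=$1
    threshold=$2
    # find's -size +Nk matches files strictly larger than N KiB
    find "${dir}" -maxdepth 1 -name 'mysql-bin.[0-9]*' \
        -size +"$(( threshold / 1024 ))"k
}

# Example: flag binlogs larger than 1GiB (the default max_binlog_size)
# find_big_binlogs /data/mysql/3306/data 1073741824
```

Any file this prints is a candidate for the deeper parsing described below.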
2 Practice
For instances with GTID enabled, locating the GTID of a large transaction is sufficient. The following example demonstrates parsing a binlog to extract the top N largest transactions.
Environment

The test environment:

- binlog format: row
- binlog version: v4
- GTID: enabled
- Verified MySQL versions: MySQL 5.7.30, MySQL 8.0.28
First, extract the top N transactions by size:
# Define a few variables so this can be saved as a script
BINLOG_FILE_NAME=$1    # binlog file name
TRANS_NUM=$2           # number of transactions to retrieve
MYSQL_BIN_DIR='/data/mysql/3306/base/bin'    # basedir
# Get the top TRANS_NUM largest transactions
${MYSQL_BIN_DIR}/mysqlbinlog ${BINLOG_FILE_NAME} | grep "GTID$(printf '\t')last_committed" -B 1 | grep -E '^# at' | awk '{print $3}' | awk 'NR==1 {tmp=$1} NR>1 {print ($1-tmp,tmp);tmp=$1}' | sort -n -r -k 1 | head -n ${TRANS_NUM} > binlog_init.tmp

After obtaining the size and position information, parse each line to retrieve the GTID, start/end times, positions, and DML statistics:
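The size computation in that pipeline works by differencing consecutive "# at" byte offsets: each transaction's approximate size is the gap between its GTID event position and the next one. A quick illustration with made-up positions:

```shell
# Approximate transaction sizes by differencing consecutive byte offsets.
# The positions below are fabricated for illustration; each output line is
# "(size, start_pos)" for the transaction beginning at start_pos.
printf '%s\n' 100 350 400 1400 |
    awk 'NR==1 {tmp=$1} NR>1 {print ($1-tmp, tmp); tmp=$1}' |
    sort -n -r -k 1
```

Sorting descending on the first column puts the largest transaction (here, 1000 bytes starting at position 400) at the top, which is what `head -n ${TRANS_NUM}` then captures.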
while read line
do
    # The transaction size here is an approximation, since it is not
    # computed as (TRANS_END_POS - TRANS_START_POS)
    TRANS_SIZE=$(echo ${line} | awk '{print $1}')
    logWriteWarning "TRANS_SIZE: $(echo | awk -v TRANS_SIZE=${TRANS_SIZE} '{ print (TRANS_SIZE/1024/1024) }')MB"
    FLAG_POS=$(echo ${line} | awk '{print $2}')
    # Get the GTID
    ${MYSQL_BIN_DIR}/mysqlbinlog -vvv --base64-output=decode-rows ${BINLOG_FILE_NAME} | grep -m 1 -A3 -Ei "^# at ${FLAG_POS}" > binlog_parse.tmp
    GTID=$(cat binlog_parse.tmp | grep -i 'SESSION.GTID_NEXT' | awk -F "'" '{print $2}')
    # Use the GTID to extract the transaction's details
    ${MYSQL_BIN_DIR}/mysqlbinlog --base64-output=decode-rows -vvv --include-gtids="${GTID}" ${BINLOG_FILE_NAME} > binlog_gtid.tmp
    START_TIME=$(grep -Ei '^BEGIN' -m 1 -A 3 binlog_gtid.tmp | grep -i 'server id' | awk '{print $1,$2}' | sed 's/#//g')
    END_TIME=$(grep -Ei '^COMMIT' -m 1 -B 1 binlog_gtid.tmp | head -1 | awk '{print $1,$2}' | sed 's/#//g')
    TRANS_START_POS=$(grep -Ei 'SESSION.GTID_NEXT' -m 1 -A 1 binlog_gtid.tmp | tail -1 | awk '{print $3}')
    TRANS_END_POS=$(grep -Ei '^COMMIT' -m 1 -B 1 binlog_gtid.tmp | head -1 | awk '{print $7}')
    logWrite "GTID: ${GTID}"
    logWrite "START_TIME: $(date -d "${START_TIME}" '+%F %T')"
    logWrite "END_TIME: $(date -d "${END_TIME}" '+%F %T')"
    logWrite "TRANS_START_POS: ${TRANS_START_POS}"
    logWrite "TRANS_END_POS: ${TRANS_END_POS}"
    logWrite "DML statements and per-table statistics for this transaction:"
    grep -Ei '^### insert' binlog_gtid.tmp | sort | uniq -c
    grep -Ei '^### delete' binlog_gtid.tmp | sort | uniq -c
    grep -Ei '^### update' binlog_gtid.tmp | sort | uniq -c
done < binlog_init.tmp

These steps yield the GTID, transaction timestamps, start/end positions, and a per-table count of DML statements. The script can be packaged as a daily operations tool.
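The loop calls logWrite and logWriteWarning, which are not defined above. A minimal sketch of what they might look like (the names and timestamp format are assumptions; substitute your own ops toolkit's logging helpers):

```shell
# Hypothetical logging helpers assumed by the parsing script above.
# logWrite prints an INFO line, logWriteWarning a WARNING line,
# each prefixed with the current timestamp.
logWrite() {
    echo "$(date '+%F %T') [INFO] $*"
}

logWriteWarning() {
    echo "$(date '+%F %T') [WARNING] $*"
}
```

With these in place, the script's output matches the timestamped [INFO]/[WARNING] lines shown in the example below.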
Final Output Example
[root@localhost ~]$ sh parse_binlog.sh /opt/sandboxes/rsandbox_5_7_35/master/data/mysql-bin.000003 2
2023-12-12 15:15:40 [WARNING] Start parsing binlog: /opt/sandboxes/rsandbox_5_7_35/master/data/mysql-bin.000003
2023-12-12 15:15:53 [WARNING] TRANS_SIZE: 0.00161743MB
2023-12-12 15:16:06 [INFO] GTID: 00020236-1111-1111-1111-111111111111:362779
2023-12-12 15:16:06 [INFO] START_TIME: 2023-12-12 15:14:35
2023-12-12 15:16:06 [INFO] END_TIME: 2023-12-12 15:14:35
2023-12-12 15:16:06 [INFO] TRANS_START_POS: 362096066
2023-12-12 15:16:06 [INFO] TRANS_END_POS: 362097697
2023-12-12 15:16:06 [INFO] DML statements and per-table statistics for this transaction:
1 ### INSERT INTO `sbtest`.`sbtest100`
1 ### DELETE FROM `sbtest`.`sbtest100`
2 ### UPDATE `sbtest`.`sbtest100`
2023-12-12 15:16:06 [WARNING] TRANS_SIZE: 0.00161648MB
2023-12-12 15:16:25 [INFO] GTID: 00020236-1111-1111-1111-111111111111:505503
2023-12-12 15:16:25 [INFO] START_TIME: 2023-12-12 15:15:36
2023-12-12 15:16:25 [INFO] END_TIME: 2023-12-12 15:15:36
2023-12-12 15:16:25 [INFO] TRANS_START_POS: 603539112
2023-12-12 15:16:25 [INFO] TRANS_END_POS: 603540742
2023-12-12 15:16:25 [INFO] DML statements and per-table statistics for this transaction:
1 ### INSERT INTO `sbtest`.`sbtest100`
1 ### DELETE FROM `sbtest`.`sbtest100`
1 ### UPDATE `sbtest`.`sbtest100`
1 ### UPDATE `sbtest`.`sbtest87`

For more advanced analysis, the open-source tool my2sql can filter by row count or execution time, and its file mode with the stats work type can parse any MySQL instance's binlog without needing the original schema.
# Count DML operations per table in the given binlog (no row or time limits)
[root@localhost ~]$ mkdir tmpdir
[root@localhost ~]$ ./my2sql -user root -password xxx -host 127.0.0.1 -port 3306 -mode file -local-binlog-file ./mysql-bin.005375 -work-type stats -start-file mysql-bin.005375 -output-dir ./tmpdir
# Sort transactions by row count, descending
[root@localhost ~]$ less tmpdir/biglong_trx.txt | sort -nr -k 6 | less
# Sort transactions by execution time, descending
[root@localhost ~]$ less tmpdir/biglong_trx.txt | sort -nr -k 7 | less
# Sample output (binlog starttime stoptime startpos stoppos rows duration tables)
mysql-bin.005375 2023-12-12_16:04:06 2023-12-12_16:16:59 493014756 582840954 123336 53 [test.t1(inserts=61668, updates=0, deletes=0) test.t2(inserts=61668, updates=0, deletes=0)]

Aikesheng Open Source Community
The Aikesheng Open Source Community provides stable, enterprise-grade MySQL open-source tools and services, releases a premium open-source component each year on 1024, and continuously operates and maintains them.