Databases 14 min read

Analysis of Relay Log Recovery and Crash Safety in MySQL GTID‑Based and Binary‑Log‑File‑Position Replication

This article examines how MySQL slaves recover their replication state in GTID‑based and binary‑log‑file‑position scenarios, identifies unsafe factors that can cause crash‑unsafe behavior, and explains how enabling relay_log_recovery=on mitigates these risks while outlining remaining limitations.

Tencent Database Technology
Tencent Database Technology
Tencent Database Technology
Analysis of Relay Log Recovery and Crash Safety in MySQL GTID‑Based and Binary‑Log‑File‑Position Replication

The article continues a previous discussion on MySQL slave crash‑unsafe issues by focusing on the role of relay_log_recovery in GTID‑based and binary‑log‑file‑position replication. It first outlines two key questions: whether relay_log_recovery helps make GTID‑based replication crash‑safe, and how the parameter works in binary‑log‑file‑position replication.

GTID‑Based Replication

After a crash, a slave determines its held GTID set ( held_gtids ) as the union of @@GLOBAL.gtid_executed and the difference between Retrieved_gtid_set and last_received_GTID . The article provides the formula and shows pseudo‑code for building held_gtids , retrieving GTID sets from checkpoints, and filtering incomplete transactions.

held_gtids = UNION(@@GLOBAL.gtid_executed, Retrieved_gtid_set - last_received_GTID)

When relay_log_recovery=off , the recovery process may leave last_received_GTID non‑empty, leading to two unsafe factors: partially written events and incomplete transactions, which can cause errors or deadlocks during MTS replay.

Enabling relay_log_recovery=on skips the Retrieved_gtid_set reconstruction and instead adds any GTID gaps from MTS to held_gtids :

held_gtids = UNION(@@GLOBAL.gtid_executed, Gap GTID in MTS)

The slave then sends held_gtids to the master using the COM_BINLOG_DUMP_GTID packet, and the master selects the appropriate binlog file and position by scanning previous GTID events until it finds a suitable start point.

Binary‑Log‑File‑Position Replication

In this mode the slave reads mysql.slave_master_info to obtain held_file_and_pos . If sync_master_info is not 1, the stored position may lag behind the actual binlog, causing duplicate events and data inconsistency. Setting relay_log_recovery=on forces the slave to reset its position from mysql.slave_relay_log_info and perform MTS recovery, which eliminates most unsafe factors except for non‑atomic DDL transactions in versions prior to 8.0.

int4store(ptr_buffer, DBUG_EVALUATE_IF("request_master_log_pos_3", 3,
                                       static_cast<uint32>(mi->get_master_log_pos())));
ptr_buffer+= ::BINLOG_POS_OLD_INFO_SIZE;
memcpy(ptr_buffer, mi->get_master_log_name(), BINLOG_NAME_INFO_SIZE);
ptr_buffer+= BINLOG_NAME_INFO_SIZE;

The article also lists three primary unsafe scenarios when relay_log_recovery=off : partial event writes, partial transaction writes, and mismatched IO/SQL positions, each illustrated with error screenshots.

Conclusion and Outlook

Enabling relay_log_recovery=on resolves most crash‑unsafe factors, but in cases of network failure or master unavailability, large‑scale crashes may still cause data loss if relay logs are purged. The authors propose a conditional recovery strategy that only applies the relay_log_recovery logic when partial events or transactions are detected, minimizing data loss risk. The approach has been submitted to the MySQL community for consideration.

References to official MySQL documentation and bug reports are provided for further reading.

mysqlReplicationbinary logGTIDCrash SafetyMTSRelay Log Recovery
Tencent Database Technology
Written by

Tencent Database Technology

Tencent's Database R&D team supports internal services such as WeChat Pay, WeChat Red Packets, Tencent Advertising, and Tencent Music, and provides external support on Tencent Cloud for TencentDB products like CynosDB, CDB, and TDSQL. This public account aims to promote and share professional database knowledge, growing together with database enthusiasts.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.