Mastering MySQL Replication: Principles, Lag Fixes, and Failover Strategies

This article explains MySQL master‑slave replication fundamentals, the binlog‑based data flow, common consistency and latency problems, practical solutions such as row or mixed binlog formats, caching and query routing, and the trade‑offs of one‑master‑one‑slave versus one‑master‑many‑slaves architectures.

ITPUB

1. MySQL Master‑Slave Overview

MySQL master‑slave replication keeps two or more databases in sync: a master that handles read‑write operations and one or more slaves that hold a copy of its data and serve read‑only queries.

Benefits include:

Read‑Write Separation: slaves handle reads, reducing load on the master and improving query performance.

High Availability: if the master fails, a slave can be promoted to maintain service continuity.

Data Backup: slaves store a copy of the data, providing disaster‑recovery capability.

2. Replication Mechanics

Replication relies on the binary log (binlog), which records every data change made on the master. The flow has four steps:

Master writes binlog: every INSERT, UPDATE, and DELETE is appended to the binlog.

Master sends binlog: for each connected slave, the master runs a dump thread that streams binlog events to it.

Slave writes relay log: the slave’s I/O thread receives the events and stores them in a relay log.

Slave replays: the slave’s SQL thread reads the relay log and re‑executes the events, keeping the slave consistent with the master.
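
The four steps above can be sketched as a toy in‑memory model. This is only an illustration of the data flow, not how MySQL is implemented: the `Master` and `Slave` classes, their method names, and the `stock:42` key are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Master:
    data: dict = field(default_factory=dict)
    binlog: list = field(default_factory=list)

    def write(self, key, value):
        # Step 1: apply the change and append an event to the binlog.
        self.data[key] = value
        self.binlog.append(("set", key, value))

    def dump(self, from_pos):
        # Step 2: the dump thread sends every event after the slave's position.
        return self.binlog[from_pos:]


@dataclass
class Slave:
    data: dict = field(default_factory=dict)
    relay_log: list = field(default_factory=list)
    applied: int = 0  # number of relay-log events already replayed

    def io_thread(self, master):
        # Step 3: fetch new events and store them in the relay log.
        self.relay_log.extend(master.dump(len(self.relay_log)))

    def sql_thread(self):
        # Step 4: replay the events that have not been applied yet.
        for op, key, value in self.relay_log[self.applied:]:
            if op == "set":
                self.data[key] = value
        self.applied = len(self.relay_log)


master, slave = Master(), Slave()
master.write("stock:42", 10)
master.write("stock:42", 9)

slave.io_thread(master)
stale = dict(slave.data)  # events received but not replayed: slave is behind
slave.sql_thread()        # after replay the slave matches the master
```

The gap between `io_thread` and `sql_thread` is where a read from the slave would see old data, which is the lag discussed in section 3.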

2.1 Ensuring Consistency

When the binlog format is STATEMENT, the slave re‑executes the original SQL text. For a statement with LIMIT, the master and slave may choose different indexes, scan the matching rows in a different order, and therefore delete different rows. Example:

delete from t where a > '666' and create_time < '2022-03-01' limit 1;

To avoid this, switch the binlog format to ROW, which records the actual row changes (identifying each affected row, typically by primary key) instead of the raw SQL, so the slave applies the change to exactly the same rows. ROW can inflate the log for large transactions, because a statement that touches many rows is logged as one change per row, so MIXED is often used as a compromise: MySQL logs statements by default and switches to row events when it judges a statement unsafe for statement‑based replication.
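
The difference can be illustrated with a small simulation of the DELETE above. It is a sketch under stated assumptions: the three sample rows are made up, and "choosing an index" is modeled simply as picking the first matching row in that column's sort order.

```python
import copy

# Hypothetical sample rows for table t (not from the article).
ROWS = [
    {"id": 1, "a": "700", "create_time": "2022-02-20"},
    {"id": 2, "a": "900", "create_time": "2022-01-05"},
    {"id": 3, "a": "100", "create_time": "2022-02-01"},
]


def delete_limit_1(table, index_column):
    """Emulate the DELETE ... LIMIT 1 when rows are scanned in index_column order.

    Returns the id of the deleted row, or None if nothing matched.
    """
    matches = [
        r for r in table
        if r["a"] > "666" and r["create_time"] < "2022-03-01"
    ]
    if not matches:
        return None
    victim = min(matches, key=lambda r: r[index_column])
    table.remove(victim)
    return victim["id"]


# Master executes the statement using the index on column a.
master = copy.deepcopy(ROWS)
deleted_id = delete_limit_1(master, "a")

# STATEMENT format: the slave re-runs the SQL but picks the create_time index.
stmt_slave = copy.deepcopy(ROWS)
delete_limit_1(stmt_slave, "create_time")

# ROW format: the slave is told which row changed and removes exactly that row.
row_slave = [r for r in copy.deepcopy(ROWS) if r["id"] != deleted_id]
```

With this data the statement‑based slave ends up with a different row set than the master, while the row‑based slave stays identical.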

3. Replication Lag

Lag is the time between the master writing a binlog entry and the slave finishing its replay. During that window, reads served by the slave may return stale data.
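
A minimal sketch of turning that definition into a health check, assuming the caller has already fetched one row of `SHOW REPLICA STATUS` as a dict. MySQL reports the lag in the `Seconds_Behind_Source` column (`Seconds_Behind_Master` in `SHOW SLAVE STATUS` on older versions), and the value is NULL when replication is not running. The function names and the 5‑second threshold are assumptions for illustration.

```python
def replica_lag_seconds(status):
    """Return the reported lag in seconds, or None if it is unknown.

    `status` is one row of SHOW REPLICA STATUS (or SHOW SLAVE STATUS)
    as a dict; fetching it from the server is left to the caller.
    """
    for key in ("Seconds_Behind_Source", "Seconds_Behind_Master"):
        if key in status:
            return status[key]
    return None


def classify_lag(status, warn_after=5):
    """Map a status row to 'ok', 'lagging', or 'unknown'."""
    lag = replica_lag_seconds(status)
    if lag is None:
        # NULL usually means the replication threads are stopped or broken.
        return "unknown"
    return "lagging" if lag > warn_after else "ok"
```

For example, `classify_lag({"Seconds_Behind_Source": 12})` reports a lagging slave, while a NULL value is surfaced separately so that a stopped replica is not mistaken for a healthy one.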

3.1 Main Causes

Single‑threaded replay: the master executes transactions concurrently and appends to the binlog sequentially, which is fast, but a slave with a single SQL thread replays those events one at a time.

High TPS on the master generates more binlog entries than the slave can apply.

Large transactions or DDL statements block the slave’s SQL thread.

Resource‑limited slave hardware (CPU, disk I/O, network bandwidth).

Network latency between master and slave.

Older MySQL versions (before 5.6) that support only single‑threaded replay on the slave.

3.2 Mitigation Strategies

Monitor replication delay as a key metric; alerts should trigger when lag exceeds a few seconds.

Use caching: write data to both MySQL and a cache (e.g., Redis) and serve reads from the cache, so fresh writes are visible immediately and the slaves carry less read load.

Read from the master for critical paths: for inventory or payment flows, query the master directly, accepting the higher load on it.

Data redundancy via messaging: for asynchronous processing, put the full payload in the message itself so consumers do not have to read it back from a slave that may not have replayed it yet.

Batch large deletes and avoid massive DDL on busy tables.

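Upgrade MySQL to a version that supports multi‑threaded replication on the slave (MySQL 5.6 parallelizes across databases; 5.7 and later can also parallelize transactions within one database).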

Improve network bandwidth (e.g., upgrade from 20 Mbps to 100 Mbps).
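
The "read from the master for critical paths" strategy can be sketched as a small routing function. This is one possible shape, not a prescribed design: the function name, the `max_lag` threshold of 2 seconds, and the extra rule that sends reads to the master when the slave's lag is unknown or too high are assumptions added for illustration.

```python
def choose_target(is_write, is_critical, replica_lag, max_lag=2):
    """Decide whether a query should go to the master or a replica.

    is_write     -- True for INSERT/UPDATE/DELETE and other writes
    is_critical  -- True for reads that must see the latest data
                    (e.g. inventory or payment checks)
    replica_lag  -- replica lag in seconds, or None if unknown
    max_lag      -- assumed tolerance for ordinary reads
    """
    if is_write or is_critical:
        return "master"
    if replica_lag is None or replica_lag > max_lag:
        # Assumption: prefer a slower correct read over a stale one.
        return "master"
    return "replica"
```

An ordinary read with a healthy replica, `choose_target(False, False, 0)`, goes to the replica, while a payment lookup marked critical is sent to the master regardless of lag. Falling back to the master under lag protects correctness but shifts load onto it, so the threshold needs tuning per workload.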

4. Failover Patterns

4.1 One‑Master‑One‑Slave

Two machines: A (master) handles reads/writes, B (slave) handles reads. If A fails, B is promoted.

Pros: simple setup, failover to B can be automated, and the slave doubles as a backup.

Cons: only one slave, so read scalability is limited, and writes still depend on a single master.

4.2 One‑Master‑Many‑Slaves

One master with multiple slaves (B, C, D). If the master fails, any slave can be promoted, and the restored master becomes a slave.

Pros: significantly higher read throughput, better load distribution, widely used by large‑scale services.

Cons: write throughput remains limited to the single master.
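
With several slaves, something has to decide which one to promote. The article only says any slave can be promoted; the sketch below assumes a common heuristic of preferring the slave that is furthest caught up, so the least data is lost. The function name and the lag‑in‑seconds input are hypothetical.

```python
def pick_promotion_candidate(replicas):
    """Pick the replica to promote after the master fails.

    `replicas` maps a replica name to its last known lag in seconds,
    or None if its replication was broken. Returns the name with the
    smallest lag among the healthy ones.
    """
    healthy = {name: lag for name, lag in replicas.items() if lag is not None}
    if not healthy:
        raise RuntimeError("no healthy replica available for promotion")
    return min(healthy, key=healthy.get)


candidate = pick_promotion_candidate({"B": 3, "C": 0, "D": None})
```

After promotion, the remaining slaves are repointed at the new master, and the old master rejoins as a slave once it is restored.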

[Figure: one‑master‑one‑slave diagram]
[Figure: one‑master‑many‑slaves diagram]