Mastering MySQL Replication: Principles, Lag Fixes, and Failover Strategies
This article explains MySQL master‑slave replication fundamentals, the binlog‑based data flow, common consistency and latency problems, practical solutions such as row or mixed binlog formats, caching and query routing, and the trade‑offs of one‑master‑one‑slave versus one‑master‑many‑slaves architectures.
1. MySQL Master‑Slave Overview
MySQL master‑slave replication keeps two or more copies of the same database in sync: a master that handles read‑write operations and one or more slaves that serve read‑only queries.
Benefits include:
Read‑Write Separation : slaves handle reads, reducing load on the master and improving query performance.
High Availability : if the master fails, a slave can be promoted to maintain service continuity.
Data Backup : slaves store a copy of the data, providing disaster‑recovery capability.
2. Replication Mechanics
Replication relies on the binary log (binlog), which records every data change made on the master, either as the SQL statement or as the affected rows, depending on the binlog format.
Master writes binlog : every INSERT, UPDATE, DELETE is appended to the binlog.
Master sends binlog : a dedicated dump thread streams the binlog to each slave.
Slave writes relay log : the slave’s I/O thread receives the binlog and stores it in a relay log.
Slave replays : an SQL thread reads the relay log and re‑executes the events, keeping the slave consistent with the master.
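The four steps above can be sketched as a toy in‑memory model. This is only an illustration of the data flow, not how MySQL is implemented: the class names, the ("set", key, value) event shape, and the step methods are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Master:
    data: dict = field(default_factory=dict)
    binlog: list = field(default_factory=list)

    def write(self, key, value):
        # Apply the change locally, then append it to the binlog.
        self.data[key] = value
        self.binlog.append(("set", key, value))


@dataclass
class Slave:
    data: dict = field(default_factory=dict)
    relay_log: list = field(default_factory=list)
    fetched: int = 0  # binlog events already copied by the I/O thread
    applied: int = 0  # relay-log events already replayed by the SQL thread

    def io_thread_step(self, master):
        # Copy any new binlog events into the local relay log.
        new_events = master.binlog[self.fetched:]
        self.relay_log.extend(new_events)
        self.fetched += len(new_events)

    def sql_thread_step(self):
        # Replay not-yet-applied relay-log events in their original order.
        for op, key, value in self.relay_log[self.applied:]:
            if op == "set":
                self.data[key] = value
        self.applied = len(self.relay_log)


master = Master()
slave = Slave()
master.write("a", 1)
master.write("b", 2)

slave.io_thread_step(master)       # events received, but not yet visible to reads
visible_before_replay = dict(slave.data)
slave.sql_thread_step()            # after replay the slave matches the master
```

Between the two steps the slave has the events in its relay log but still serves the old data, which is the window in which replication lag becomes visible.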
2.1 Ensuring Consistency
When the binlog format is STATEMENT, the master and slave may choose different indexes for the same DELETE statement. Combined with a LIMIT clause, that means each server can delete a different row. Example:
delete from t where a > '666' and create_time < '2022-03-01' limit 1;
To avoid this, switch the binlog format to ROW, which records the actual rows that were changed rather than the raw SQL, so the slave applies exactly the same change as the master. ROW format can inflate the log considerably when a transaction touches many rows, so MIXED format is often used: MySQL logs statements by default and switches to ROW for statements it considers unsafe for statement‑based replication.
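To see why the DELETE above can diverge, the following sketch emulates the statement over a small in‑memory table and scans the rows in the order of two different indexes. The sample rows and the helper function are made up for illustration; real index selection is decided by the optimizer on each server.

```python
# Hypothetical sample data: three rows of table t.
rows = [
    {"id": 1, "a": "700", "create_time": "2022-02-20"},
    {"id": 2, "a": "900", "create_time": "2022-01-05"},
    {"id": 3, "a": "500", "create_time": "2022-02-01"},
]


def delete_limit_1(table, scan_key):
    """Emulate: DELETE FROM t WHERE a > '666' AND create_time < '2022-03-01' LIMIT 1,
    visiting rows in the order of the index on `scan_key`.
    Returns the id of the row that would be deleted, or None."""
    for row in sorted(table, key=lambda r: r[scan_key]):
        if row["a"] > "666" and row["create_time"] < "2022-03-01":
            return row["id"]  # LIMIT 1: the first match in scan order wins
    return None


deleted_on_master = delete_limit_1(rows, "a")            # master picks the index on a
deleted_on_slave = delete_limit_1(rows, "create_time")   # slave picks the index on create_time
```

Both servers run the same SQL, yet they remove different rows. With ROW format the slave would instead be told which row the master removed.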
3. Replication Lag
Lag is the time between the master writing a binlog entry and the slave finishing its replay. During that window, reads served by the slave can return stale data.
3.1 Main Causes
Single‑threaded replay: the master commits transactions from many sessions and appends them to the binlog sequentially, which is fast, while the slave’s I/O and SQL threads process events one at a time.
High TPS on the master generates more binlog entries than the slave can apply.
Large transactions or DDL statements block the slave’s SQL thread.
Resource‑limited slave hardware (CPU, disk I/O, network bandwidth).
Network latency between master and slave.
Older MySQL versions (before 5.6) that support only single‑threaded replication.
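The effect of the master out‑producing the slave can be estimated with simple arithmetic. The sketch below assumes constant rates and a slave that starts fully caught up; the function name and the example rates are hypothetical.

```python
def lag_after(seconds, master_events_per_sec, slave_events_per_sec):
    """Estimate lag (in seconds) after `seconds` of steady load.

    The backlog grows by the difference between the two rates, and the slave
    needs backlog / apply-rate seconds to work through it.
    """
    backlog = max(0, (master_events_per_sec - slave_events_per_sec) * seconds)
    return backlog / slave_events_per_sec


# Master writes 5000 events/s, slave applies 4000 events/s, for one minute.
lag_when_overloaded = lag_after(60, 5000, 4000)

# If the slave can apply faster than the master writes, no backlog builds up.
lag_when_keeping_up = lag_after(60, 3000, 4000)
```

Under these assumptions, one minute at a 20% rate deficit leaves the slave about 15 seconds behind, and the gap keeps widening for as long as the load lasts.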
3.2 Mitigation Strategies
Monitor replication delay as a key metric; alerts should trigger when lag exceeds a few seconds.
Use caching : write data to both MySQL and a cache (e.g., Redis) and read from the cache to reduce slave load.
Read from the master for critical paths : for inventory or payment flows, query the master directly, accepting the higher load.
Data redundancy via messaging : push full payloads to a message queue instead of relying on slave reads for asynchronous processing.
Batch large deletes and avoid massive DDL on busy tables.
Upgrade MySQL to a version that supports multi‑threaded slave replication (available from 5.6, with finer‑grained parallelism from 5.7).
Improve network bandwidth (e.g., upgrade from 20 Mbps to 100 Mbps).
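Several of these strategies come together in the read path: check each slave's lag, send critical reads to the master, and fall back to the master when no slave is fresh enough. The following is a minimal sketch of such a router. The 3‑second threshold, the class, and the function are assumptions for illustration; in practice the lag value might come from the Seconds_Behind_Master column of SHOW SLAVE STATUS, which is NULL when replication is not running.

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_LAG_SECONDS = 3.0  # hypothetical threshold; tune to what the application tolerates


@dataclass
class SlaveState:
    name: str
    lag_seconds: Optional[float]  # None means the lag is unknown (replication stopped)


def choose_read_target(slaves: List[SlaveState], critical: bool) -> str:
    """Return the name of the server that should serve a read query."""
    if critical:
        # Inventory or payment reads must not see stale data.
        return "master"
    fresh = [
        s for s in slaves
        if s.lag_seconds is not None and s.lag_seconds <= MAX_LAG_SECONDS
    ]
    if not fresh:
        # Every slave is too far behind or broken: fall back to the master.
        return "master"
    # Prefer the slave with the smallest lag.
    return min(fresh, key=lambda s: s.lag_seconds).name


slaves = [
    SlaveState("B", 0.5),
    SlaveState("C", 10.0),
    SlaveState("D", None),
]
normal_target = choose_read_target(slaves, critical=False)
critical_target = choose_read_target(slaves, critical=True)
fallback_target = choose_read_target([SlaveState("C", 10.0)], critical=False)
```

Falling back to the master protects correctness at the cost of extra master load, so the threshold should be paired with alerting rather than treated as a permanent fix.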
4. Failover Patterns
4.1 One‑Master‑One‑Slave
Two machines: A (master) handles reads/writes, B (slave) handles reads. If A fails, B is promoted.
Pros : simple setup, straightforward failover to the slave (automatic only if failover tooling is added), slave can serve as backup.
Cons : only one slave, limited read scalability, still a single point of failure for write traffic.
4.2 One‑Master‑Many‑Slaves
One master with multiple slaves (B, C, D). If the master fails, any slave can be promoted, and the restored master becomes a slave.
Pros : significantly higher read throughput, better load distribution, widely used by large‑scale services.
Cons : write throughput remains limited to the single master.
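When several slaves are available, a common practice (an assumption here, not something the text above prescribes) is to promote the slave that has applied the most of the old master's binlog, so the least data is lost. The sketch below compares hypothetical (binlog file, position) pairs; on a real server these would correspond to values such as Relay_Master_Log_File and Exec_Master_Log_Pos from SHOW SLAVE STATUS.

```python
def pick_promotion_candidate(applied_positions):
    """Return the name of the slave with the most advanced applied position.

    `applied_positions` maps a slave name to (binlog_file, position).
    Binlog file names carry a zero-padded sequence number, so comparing the
    tuples orders them first by file and then by offset within the file.
    """
    if not applied_positions:
        raise ValueError("no slaves available for promotion")
    return max(applied_positions, key=lambda name: applied_positions[name])


# Hypothetical state of slaves B, C, and D at the moment the master fails.
positions = {
    "B": ("mysql-bin.000012", 4096),
    "C": ("mysql-bin.000012", 8192),
    "D": ("mysql-bin.000011", 99999),
}
new_master = pick_promotion_candidate(positions)
```

Here C is chosen: it is on the same binlog file as B but further along, while D is still on the previous file despite its larger offset.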
ITPUB
