Databases 30 min read

Understanding MySQL Replication: Principles, Mechanisms, and Practical Applications

MySQL replication copies data changes from a primary server to one or more replicas using binlog events—supporting statement, row, or mixed formats and GTID positioning—to provide real‑time backup, read‑write separation, high‑availability failover, and integration pipelines via asynchronous, semi‑synchronous, or centralized binlog services.

vivo Internet Technology
vivo Internet Technology
vivo Internet Technology
Understanding MySQL Replication: Principles, Mechanisms, and Practical Applications

MySQL Replication (master‑slave replication) enables data changes made on one MySQL server to be copied to one or more other MySQL servers, improving high‑availability, scalability and load‑balancing of database services.

Background

In production environments MySQL is widely used and many services depend on its availability. A failure of the primary MySQL instance can cause severe business impact and data loss. MySQL provides a reliable, log‑based replication mechanism that can create one or more replicas to achieve:

Real‑time data backup : the primary’s writes are continuously persisted on redundant replicas, reducing data‑loss risk.

Horizontal scaling / read‑write separation : read traffic can be offloaded to replicas, improving read throughput.

High‑availability guarantee : when the primary fails, a replica can be promoted quickly without service interruption.

The current production MySQL service at vivo uses asynchronous master‑slave replication, offering 99.99% availability and 99.9999% data reliability.

2.1 Binlog Introduction

Replication relies on the binary log (binlog), which records every change to the MySQL data. Three binlog formats exist:

Statement : the original SQL statement is sent to the replica.

Row : each row change is recorded and applied on the replica.

Mixed : MySQL chooses between statement and row format per statement.

Row‑based replication is now the default because it provides higher accuracy and richer change information, even though it consumes more resources.

2.2 Binlog Details

Binlog events are categorized (e.g., XID_EVENT, QUERY_EVENT, GTID_EVENT, TABLE_MAP_EVENT, ROTATE_EVENT, FORMAT_DESCRIPTION_EVENT). The lifecycle of a binlog file is controlled by log_bin and sync_binlog ; files are rotated based on size and retained according to an expiration policy.

Two primary replication positions are used:

File:Position – the classic method where the master and replica agree on a specific binlog file and byte offset.

GTID – a globally unique transaction identifier (format server_uuid:id ) introduced in MySQL 5.6, allowing automatic position tracking.

Examples:

File: binlog.000001
Position: 381808617
CHANGE MASTER TO MASTER_LOG_FILE='binlog.000001', MASTER_LOG_POSITION=381808617;
Executed_Gtid_Set: e2e0a733-3478-11eb-90fe-b4055d009f6c:1-753
CHANGE MASTER TO MASTER_AUTO_POSITION=1;

2.3 Replication Workflow

When log_bin=ON on the primary, the basic asynchronous replication steps are:

The replica starts an I/O thread and connects to the primary.

The primary launches a binlog dump thread that streams binlog events to the replica’s I/O thread, which writes them into a local Relay Log .

The replica’s SQL thread reads the Relay Log and replays the events, updating the replica’s data.

Only one thread runs on the primary, while the replica runs two threads (I/O and SQL).

Diagram (original image retained):

2.4 Semi‑synchronous Replication

Pure asynchronous replication may lose data if the primary crashes before a replica receives the events. Semi‑synchronous replication (available since MySQL 5.5) makes the primary wait for an ACK from at least one replica before committing the transaction, reducing data‑loss risk with limited performance impact.

Key points of semi‑synchronous mode:

Improves data safety without requiring a full synchronous setup.

HA components can automatically handle failover for semi‑synchronous clusters.

Not all MySQL versions support it, and it may introduce slight latency.

3 High‑Availability & Data Reliability Solutions

Because MySQL’s native asynchronous replication lacks automatic failover, the following solutions are employed:

HA component : monitors cluster health and performs automatic failover.

Middleware Proxy : manages traffic, read/write splitting, and access control.

Remote Log Replication : HA nodes fetch missing binlog files from a failed primary (MHA‑style).

BinlogServer (centralized binlog storage) : stores binlog files centrally; replicas pull from the BinlogServer instead of the primary.

Switch to semi‑synchronous replication for stronger data safety.

Each approach has advantages (e.g., no extra trust relationships for remote log replication) and disadvantages (e.g., additional resource consumption for centralized storage).

3.2 Binlog‑Based Data Transmission

Binlog can be used to stream MySQL changes to other systems (ElasticSearch, Kafka, etc.). Two common ways to capture binlog:

Listen to binlog files directly (log‑collector style).

Register as a fake slave using the MySQL replication protocol.

Example of registering a slave (Go SDK):

data := make([]byte, 4+1+4+1+len(hostname)+1+len(b.cfg.User)+1+len(b.cfg.Password)+2+4+4)

Command layout (Register Slave):

Bytes 0‑3: unused (0).

Byte 4: Command_Register_Slave (value 21).

Bytes 5‑8: server_id (little‑endian).

Following bytes: hostname, user, password.

Next 2 bytes: port (little‑endian).

Last 8 bytes: usually zero; master_id set to 0 for a fake slave.

Example of issuing a binlog dump command:

data := make([]byte, 4+1+4+2+4+len(p.Name))

Command layout (Binlog Dump):

Bytes 0‑3: unused (0).

Byte 4: Command_Binlog_Dump (value 18).

Bytes 5‑8: binlog position (little‑endian).

Bytes 9‑10: dump flags (0 for BINLOG_DUMP_NEVER_STOP).

Bytes 11‑14: server_id (little‑endian).

Remaining bytes: binlog file name.

These commands establish a replication connection that can be observed on the primary.

4 Summary

MySQL replication greatly enhances database availability and reliability while providing a flexible binlog interface for downstream data pipelines. Future work will focus on strengthening the BinlogServer service to improve data safety and open more integration possibilities.

Reference:

MySQL Official Documentation

Database Kernel Monthly Report

MySQLbinlogReplicationData ReliabilityHigh AvailabilityGTIDSemi‑Synchronous
vivo Internet Technology
Written by

vivo Internet Technology

Sharing practical vivo Internet technology insights and salon events, plus the latest industry news and hot conferences.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.