Introducing the Binlog Server: Scaling MySQL Replication without Overloading the Master
The article presents the Binlog Server, a component that replaces intermediate masters in MySQL replication topologies to reduce network saturation, simplify disaster recovery, and improve high availability, while supporting large numbers of slaves, including slaves on remote sites.
At Booking.com, replication topologies often involve dozens or even hundreds of slaves replicating from a single master, which can saturate the master’s network interface when serving binary logs.
Two common scenarios that generate large binary logs are bulk row‑based deletions and online schema changes on large tables.
In a topology where a master (M) produces 1 MB/s of binary logs, a hundred directly attached slaves (Figure 1) would pull roughly 100 MB/s (800 Mbps) from the master, approaching the limit of a 1 Gbps link.
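The arithmetic can be made explicit with a small Python sketch. The 1 MB/s rate, the hundred slaves and the 1 Gbps link come from the example above; converting 1 Gbps to 125 MB/s in decimal units and ignoring TCP and protocol overhead are simplifying assumptions, so real headroom is somewhat smaller.

```python
# Back-of-the-envelope egress estimate for the Figure 1 topology, where
# every slave pulls its own copy of the binary logs from the master.

LINK_GBPS = 1.0  # master NIC speed, taken from the example in the text


def master_egress_mb_per_s(binlog_rate_mb_s: float, n_slaves: int) -> float:
    """Each directly attached slave receives a full copy of the binlog stream."""
    return binlog_rate_mb_s * n_slaves


def link_utilisation(egress_mb_s: float, link_gbps: float = LINK_GBPS) -> float:
    """Fraction of the link in use (1 Gbps = 1000 Mb/s = 125 MB/s, no overhead)."""
    link_mb_s = link_gbps * 1000 / 8
    return egress_mb_s / link_mb_s


egress = master_egress_mb_per_s(1.0, 100)
print(f"{egress:.0f} MB/s = {egress * 8:.0f} Mb/s, "
      f"{link_utilisation(egress):.0%} of a {LINK_GBPS:g} Gbps link")
```

With these numbers the master is already at 80% of its link, which leaves little room for bursts such as the bulk deletions and schema changes mentioned above.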
                -----
                | M |
                -----
                  |
    +------+------+--- ... ---+
    |      |      |           |
    V      V      V           V
  -----  -----  -----       -----
  | S1|  | S2|  | S3|       | Sn|
  -----  -----  -----       -----
Figure 1: Replication tree with many slaves.
The traditional mitigation is to insert intermediate masters between the master and its slaves (Figure 2). The master then serves one binary-log stream per intermediate master instead of one per slave, which multiplies the binary-log rate it can sustain before saturating its network interface.
                             -----
                             | M |
                             -----
                               |
              +----------------+---- ... ----+
              |                |             |
              V                V             V
            -----            -----         -----
            | M1|            | M2|         | Mm|
            -----            -----         -----
              |                |             |
       +------+ ...            +---- ...     +--- ... ---+
       |      |                |             |           |
       V      V                V             V           V
     -----  -----            -----         -----       -----
     | S1|  | S2|            | Si|         | Sj|       | Sn|
     -----  -----            -----         -----       -----
Figure 2: Replication tree with intermediate masters.
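The fan-out effect of Figure 2 can be quantified with a short Python sketch. It assumes slaves are spread as evenly as possible over the intermediate masters and, like the estimate for Figure 1, it only counts binary-log bytes, not the lag that intermediate masters add.

```python
import math


def tree_egress(binlog_rate_mb_s: float, n_slaves: int,
                n_intermediates: int) -> tuple[float, float]:
    """Return (master egress, busiest intermediate egress) in MB/s.

    The master streams its binary logs once per intermediate master; the
    busiest intermediate serves ceil(n_slaves / n_intermediates) slaves.
    """
    if n_intermediates < 1:
        raise ValueError("need at least one intermediate master")
    per_intermediate = math.ceil(n_slaves / n_intermediates)
    return (binlog_rate_mb_s * n_intermediates,
            binlog_rate_mb_s * per_intermediate)


# 100 slaves behind 10 intermediate masters, 1 MB/s of binary logs.
print(tree_egress(1.0, 100, 10))
```

With ten intermediate masters, the master's egress drops from 100 MB/s to 10 MB/s, so it could produce roughly ten times more binary logs before hitting the same network limit.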
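However, intermediate masters introduce replication lag, since each one must apply a transaction before passing it on, and a single point of failure: if an intermediate master fails, all its slaves must be reinitialized, because the intermediate master writes its own binary logs, whose file names and positions differ from the master's, so its slaves cannot simply be repointed elsewhere.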
Observing that the intermediate layer only needs to serve binary logs, the authors propose a Binlog Server that downloads binary logs from the master, stores them unchanged, and serves them to slaves as if they came directly from the master.
Downloads binary logs from the master.
Saves them to disk using the same filename and content structure.
Serves them to slaves, appearing as the master.
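The three steps above can be modelled in a few lines of Python. This is a toy, in-memory sketch rather than a MySQL implementation: the `BinlogStore` and `ToyBinlogServer` classes, the byte-string "events" and the `binlog.000001` file name are hypothetical, and a real Binlog Server speaks the MySQL replication protocol on both sides. What it does show is the key property: because bytes are stored unchanged under the same file names, a `(file, offset)` coordinate means the same thing on the master and on the Binlog Server.

```python
from dataclasses import dataclass, field


@dataclass
class BinlogStore:
    """Toy model of binary logs: file name -> raw bytes (hypothetical)."""
    files: dict = field(default_factory=dict)

    def append(self, name: str, event: bytes) -> int:
        """Append an event to `name` and return the offset where it starts."""
        buf = self.files.setdefault(name, bytearray())
        offset = len(buf)
        buf.extend(event)
        return offset

    def read_from(self, name: str, offset: int) -> bytes:
        """Bytes a slave would receive when asking for (name, offset)."""
        return bytes(self.files.get(name, b"")[offset:])


class ToyBinlogServer(BinlogStore):
    """Hypothetical Binlog Server: download from upstream, save verbatim."""

    def __init__(self, upstream: BinlogStore):
        super().__init__()
        self.upstream = upstream

    def sync(self) -> None:
        # Steps 1 and 2: fetch the bytes we do not have yet and store them
        # under the same file name, so offsets stay identical to upstream.
        for name, data in self.upstream.files.items():
            have = len(self.files.get(name, b""))
            if have < len(data):
                self.append(name, bytes(data[have:]))


master = BinlogStore()
master.append("binlog.000001", b"ev1")
pos = master.append("binlog.000001", b"ev2")
server = ToyBinlogServer(master)
server.sync()
# Step 3: a slave asking either host for the same coordinates gets the same bytes.
assert server.read_from("binlog.000001", pos) == master.read_from("binlog.000001", pos)
```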
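Because Binlog Servers do not apply changes locally, they add almost no replication delay. And because they keep the master's file names and positions, the slaves of a failed Binlog Server can be repointed to another Binlog Server, or to the master, at the same coordinates.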
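Since the coordinates do not need translating, repointing a slave reduces to reusing the position it already reports. The Python sketch below only builds the SQL, using the classic `CHANGE MASTER TO` syntax (newer MySQL releases also accept `CHANGE REPLICATION SOURCE TO`). The host names are hypothetical. It uses the executed coordinates (`Relay_Master_Log_File`, `Exec_Master_Log_Pos` from `SHOW SLAVE STATUS`) because changing the master normally purges the relay logs, so unexecuted relay-log content is not kept.

```python
def repoint_statements(host: str, port: int,
                       exec_file: str, exec_pos: int) -> list[str]:
    """SQL to move a slave to `host` without translating its position.

    exec_file / exec_pos: Relay_Master_Log_File and Exec_Master_Log_Pos from
    SHOW SLAVE STATUS. They are valid on the master and on every Binlog
    Server because all of them use the master's file names and offsets.
    Values are interpolated as-is: only use trusted, operator-supplied input.
    """
    return [
        "STOP SLAVE",
        f"CHANGE MASTER TO MASTER_HOST='{host}', MASTER_PORT={port}, "
        f"MASTER_LOG_FILE='{exec_file}', MASTER_LOG_POS={exec_pos}",
        "START SLAVE",
    ]


# A Binlog Server died; move one of its slaves to another Binlog Server
# (hypothetical host name and coordinates).
for stmt in repoint_statements("binlog-y.example", 3306, "binlog.000042", 1337):
    print(stmt + ";")
```

The same function covers repointing to the master itself: only `host` and `port` change, never the file name or offset.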
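The Binlog Server can also simplify disaster recovery. After a master failure, the slaves first apply all the binary logs the Binlog Server received, so they converge to the same state. Any slave can then be promoted to master: note its binary-log position (SHOW MASTER STATUS) and re-attach the other slaves at that position.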
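The promotion step can be sketched in Python as a function that refuses to act until the slaves have converged and then emits one statement per remaining slave. This is an illustration under stated assumptions: the slave names `b`, `c`, `d` and the coordinates are hypothetical, the candidate is assumed to have binary logging enabled, and its `SHOW MASTER STATUS` output is assumed to be captured before it accepts any new writes. Note that the new coordinates live in the promoted slave's own binary-log namespace, not the old master's.

```python
def promotion_plan(executed: dict[str, tuple[str, int]],
                   candidate: str,
                   candidate_status: tuple[str, int]) -> dict[str, str]:
    """Return the CHANGE MASTER TO statement for every other slave.

    executed: slave -> executed (file, pos) in the old master's namespace,
        read after the slaves finished applying what the Binlog Server had.
    candidate_status: (File, Position) from SHOW MASTER STATUS on the slave
        being promoted, read before it accepts any new writes.
    """
    if len(set(executed.values())) != 1:
        raise RuntimeError("slaves have not converged yet")
    new_file, new_pos = candidate_status
    return {
        host: (f"CHANGE MASTER TO MASTER_HOST='{candidate}', "
               f"MASTER_LOG_FILE='{new_file}', MASTER_LOG_POS={new_pos}")
        for host in executed
        if host != candidate
    }


converged = {h: ("binlog.000042", 1337) for h in ("b", "c", "d")}
for host, stmt in promotion_plan(converged, "b", ("b-bin.000007", 154)).items():
    print(f"on {host}: {stmt};")
```

The convergence check is what makes a single position valid for every slave: if one slave were behind, attaching it at the candidate's current position would silently skip transactions.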
Other use‑case: avoiding deep nested replication on remote sites
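For remote sites with limited WAN bandwidth, a common approach is an intermediate master on the remote site (Figure 3): only one binary-log stream crosses the WAN, but the intermediate master adds replication delay and is a single point of failure. Replacing the intermediate master with a Binlog Server on the remote site keeps WAN usage at a single stream while avoiding that replication lag.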
                -----
                | A |
                -----
                  |
    +------+------+------------+
    |      |      |            |
    V      V      V            V
  -----  -----  -----        -----
  | B |  | C |  | D |        | E |
  -----  -----  -----        -----
                               |
                               +------+------+
                               |      |      |
                               V      V      V
                             -----  -----  -----
                             | F |  | G |  | H |
                             -----  -----  -----
Figure 3: Remote site deployment with an intermediate master.
By inserting a Binlog Server (X) at the remote site and attaching all remote slaves (E, F, G, H) to it (Figure 5), the topology combines low WAN bandwidth usage with no intermediate-master delay.
                -----
                | A |
                -----
                  |
    +------+------+------------+
    |      |      |            |
    V      V      V            V
  -----  -----  -----         / \
  | B |  | C |  | D |        / X \
  -----  -----  -----        -----
                               |
                               +------+------+------+
                               |      |      |      |
                               V      V      V      V
                             -----  -----  -----  -----
                             | E |  | F |  | G |  | H |
                             -----  -----  -----  -----
Figure 5: Remote site deployment with a Binlog Server.
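One way to check the comparison between Figures 3 and 5 is to encode each topology as a parent map and count (a) replication streams that cross sites and (b) the regular MySQL servers that must apply a transaction before the furthest slave can receive it. The Python sketch below does that; the encoding is hypothetical, and treating each apply step as a source of delay, while a Binlog Server adds none because it only relays bytes, is a simplification of the description above.

```python
def analyse(parent: dict[str, str], site: dict[str, str],
            binlog_servers: set[str]) -> tuple[int, int]:
    """Return (WAN streams, worst-case apply hops) for a replication tree.

    parent: node -> the node it replicates from (the master has no entry).
    site:   node -> site name; a parent/child pair on different sites is
            one binary-log stream crossing the WAN.
    Apply hops count regular MySQL servers between the master and a node;
    Binlog Servers only relay bytes, so they are not counted.
    """
    wan = sum(1 for child, up in parent.items() if site[child] != site[up])

    def apply_hops(node: str) -> int:
        hops = 0
        up = parent[node]
        while up in parent:  # walk up until we reach the master
            if up not in binlog_servers:
                hops += 1
            up = parent[up]
        return hops

    return wan, max(apply_hops(n) for n in parent)


local = {n: "local" for n in "ABCD"}
remote = {n: "remote" for n in "EFGH"}

# Figure 3: E is an intermediate master on the remote site.
fig3 = {"B": "A", "C": "A", "D": "A", "E": "A", "F": "E", "G": "E", "H": "E"}
print(analyse(fig3, {**local, **remote}, set()))

# Figure 5: Binlog Server X on the remote site serves E, F, G and H.
fig5 = {"B": "A", "C": "A", "D": "A", "X": "A",
        "E": "X", "F": "X", "G": "X", "H": "X"}
print(analyse(fig5, {**local, **remote, "X": "remote"}, {"X"}))
```

Both layouts send one stream over the WAN, but only the Binlog Server layout has zero apply hops; replicating E to H directly from A would also have zero apply hops but four WAN streams.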
Running two Binlog Servers (X and Y) provides redundancy without requiring additional hardware, as they can share servers with existing nodes.
Other use‑case: easy high availability
Placing a Binlog Server between the master and many slaves (Figures 7 and 8) makes failover simpler. If the Binlog Server fails, the slaves can be repointed to the master at the same coordinates. If the master fails, all slaves first apply everything the Binlog Server received and therefore converge to a common state, which makes reorganizing the topology straightforward.
                -----
                | A |
                -----
                  |
    +------+------+------+------+------+
    |      |      |      |      |      |
    V      V      V      V      V      V
  -----  -----  -----  -----  -----  -----
  | B |  | C |  | D |  | E |  | F |  | G |
  -----  -----  -----  -----  -----  -----
Figure 7: Replication tree with six slaves.
                -----
                | A |
                -----
                  |
                  V
                 / \
                / X \
                -----
                  |
    +------+------+------+------+------+
    |      |      |      |      |      |
    V      V      V      V      V      V
  -----  -----  -----  -----  -----  -----
  | B |  | C |  | D |  | E |  | F |  | G |
  -----  -----  -----  -----  -----  -----
Figure 8: Replication tree with a Binlog Server.
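The "converge to a common state" step after a master failure amounts to waiting until every slave has executed up to the Binlog Server's last transaction. A minimal Python polling sketch follows. The `read_executed` callback is hypothetical (in practice it would read `Relay_Master_Log_File` and `Exec_Master_Log_Pos` from each slave), and the target is assumed to be the end of the last complete transaction on the Binlog Server, read after the master is down so it no longer moves.

```python
import time
from typing import Callable

Coord = tuple[str, int]  # (binlog file name, offset), as on the master


def wait_for_convergence(read_executed: Callable[[], dict[str, Coord]],
                         target: Coord,
                         timeout_s: float = 60.0,
                         poll_s: float = 0.5) -> dict[str, Coord]:
    """Poll until every slave has executed up to `target`.

    read_executed: hypothetical callback returning each slave's executed
        coordinates in the master's namespace.
    target: end of the last complete transaction on the Binlog Server.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        executed = read_executed()
        lagging = sorted(h for h, c in executed.items() if c != target)
        if not lagging:
            return executed
        if time.monotonic() >= deadline:
            raise TimeoutError(f"slaves still catching up: {lagging}")
        time.sleep(poll_s)


# Simulated polls: slave "c" catches up on the second read.
polls = iter([
    {"b": ("binlog.000042", 1337), "c": ("binlog.000042", 900)},
    {"b": ("binlog.000042", 1337), "c": ("binlog.000042", 1337)},
])
print(wait_for_convergence(lambda: next(polls), ("binlog.000042", 1337), poll_s=0))
```

Equality is enough here because no slave can execute past what the Binlog Server holds; once all of them report the same coordinates, they are in the same state.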
For extreme scaling, multiple Binlog Servers can be arranged in a tree, each serving a subset of slaves; if one fails, its slaves are repointed to the most up‑to‑date Binlog Server.
                             -----
                             | M |
                             -----
                               |
              +----------------+---- ... ----+
              |                |             |
              V                V             V
             / \              / \           / \
            / I1\            / I2\         / Im\
            -----            -----         -----
              |                |             |
       +------+ ...            +---- ...     +--- ... ---+
       |      |                |             |           |
       V      V                V             V           V
     -----  -----            -----         -----       -----
     | S1|  | S2|            | Si|         | Sj|       | Sn|
     -----  -----            -----         -----       -----
Figure 9: Replication tree with many Binlog Servers.
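Choosing "the most up-to-date Binlog Server" means comparing binary-log coordinates. A small Python sketch, assuming file names end in a numeric sequence suffix (as in `binlog.000042`) and that each reachable Binlog Server reports the end of its last downloaded file; the server names and coordinates are hypothetical. Comparing the numeric suffix rather than the raw string avoids relying on zero-padding.

```python
from typing import Optional

Coord = tuple[str, int]  # (binlog file name, offset), as on the master


def coord_key(coord: Coord) -> tuple[int, int]:
    """Order coordinates by the file's numeric suffix, then by offset."""
    name, offset = coord
    return int(name.rsplit(".", 1)[1]), offset


def most_up_to_date(ends: dict[str, Optional[Coord]]) -> str:
    """Pick the reachable Binlog Server holding the most binary logs.

    ends: server -> end of its last downloaded binlog, or None if unreachable.
    """
    alive = {s: c for s, c in ends.items() if c is not None}
    if not alive:
        raise RuntimeError("no reachable Binlog Server")
    return max(alive, key=lambda s: coord_key(alive[s]))


ends = {
    "I1": ("binlog.000041", 52_000),
    "I2": ("binlog.000042", 1_337),
    "Im": ("binlog.000042", 980),
    "I3": None,  # the failed Binlog Server
}
print(most_up_to_date(ends))  # I2
```

Repointing orphaned slaves to the most up-to-date server guarantees that the position each slave asks for already exists there.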
When a master fails, the Binlog Server with the most recent binary logs (e.g., I2) becomes the new hub, and other Binlog Servers are repointed to it, after which slaves converge to a common state.
                             -----
                             | M | <--- Failed master
                             -----
                                /--- Most up-to-date Binlog Server
                               V
             / \              / \           / \
            / I1\ <--------- / I2\ ------> / Im\
            -----            -----         -----
              |                |             |
       +------+ ...            +---- ...     +--- ... ---+
       |      |                |             |           |
       V      V                V             V           V
     -----  -----            -----         -----       -----
     | S1|  | S2|            | Si|         | Sj|       | Sn|
     -----  -----            -----         -----       -----
Figure 10: Converging slaves after master failure.
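Why can the other Binlog Servers simply resume from the hub? Because every Binlog Server stores the master's bytes unchanged, a lagging server's logs must be a prefix of the hub's, and it only needs the missing tail. The toy Python model below makes that check explicit; it is hypothetical and simplified, representing each server's binary logs as one byte string concatenated in file order and ignoring file boundaries.

```python
def converge(servers: dict[str, bytes], hub: str) -> dict[str, bytes]:
    """Bring every Binlog Server up to the hub's content.

    Each value models a server's binary logs concatenated in file order.
    A lagging server resumes at offset len(data) on the hub; this is only
    valid if its content is a prefix of the hub's, which holds when the
    hub really is the most up-to-date server.
    """
    hub_data = servers[hub]
    result = {}
    for name, data in servers.items():
        if not hub_data.startswith(data):
            raise ValueError(f"{name} is not a prefix of {hub}: wrong hub?")
        result[name] = data + hub_data[len(data):]
    return result


servers = {"I1": b"tx1tx2", "I2": b"tx1tx2tx3tx4", "Im": b"tx1tx2tx3"}
after = converge(servers, hub="I2")
# Every Binlog Server now holds the same logs, so their slaves can all
# catch up to the same state.
assert len(set(after.values())) == 1
```

Picking a hub that is behind another server fails the prefix check, which is why the most up-to-date Binlog Server has to be the hub.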
Conclusion
The Binlog Server enables horizontal scaling of MySQL slaves without overloading the master’s network interface and avoids the drawbacks of intermediate masters. It also supports remote‑site replication and simplifies high‑availability topology reorganization after failures.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Art of Distributed System Architecture Design
Introductions to large-scale distributed system architectures; insights and knowledge sharing on large-scale internet system architecture; front-end web architecture overviews; practical tips and experiences with PHP, JavaScript, Erlang, C/C++ and other languages in large-scale internet system development.
