Why Master‑Master MySQL Replication Is Risky and How to Avoid Its Pitfalls
This article explains why the dbops tool does not support MySQL master‑master architecture, describes the two common dual‑master patterns, outlines their drawbacks such as role‑confusion, dirty‑data risk, split‑brain, and update loss, and provides practical workarounds and step‑by‑step deployment instructions to safely implement or avoid dual‑master setups.
Background
The dbops toolset supports deploying MySQL under various systems or architectures, but it does not support a master‑master architecture. The author considers master‑master unsuitable for production and therefore does not implement it. If you still want a master‑master setup, you can deploy a master‑slave topology with dbops on two machines and manually create a reverse replication link by executing a <code>CHANGE MASTER TO</code> statement on the slave.
Why Use a Dual‑Master Architecture?
There are two typical dual‑master patterns:
Dual‑master single‑write
Dual‑master double‑write
1. Dual‑master Single‑Write (Recommended)
This pattern is often implemented with “keepalived + dual‑master”. Keepalived monitors the MySQL instances; if the primary server with a virtual IP (VIP) fails, keepalived moves the VIP to the other server. Applications connect via the VIP, ensuring a single write point while the underlying bidirectional replication remains active, providing seamless failover.
Although the architecture appears similar to “keepalived + master‑slave”, the key is to allow only one node to accept writes at any time. The dual‑master nature (each node being both master and slave) introduces several issues:
Master/Slave Role Determination: Backup scripts that rely on <code>SHOW SLAVE STATUS</code> may see both nodes reporting as slaves, leading to duplicate backups unless the script is enhanced to check which node holds the VIP.
Risk of Dirty Data on the Backup: Accidental writes on the backup server can propagate dirty data to the primary, a scenario not present in a pure master‑slave setup.
Split‑Brain Complications: In a split‑brain event, both nodes become independent partitions, each accepting writes. If bidirectional replication is not explicitly broken before the partitions reconnect, the conflicting writes will replicate in both directions and cause data inconsistency.
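One way to resolve the role ambiguity is to have the backup script first check whether the current node holds the VIP. A minimal Python sketch of that decision; the VIP address and the node address lists are illustrative assumptions, and in practice the local addresses would come from something like <code>ip -4 addr</code>:

```python
def should_run_backup(vip: str, local_addresses: list) -> bool:
    """Only the node currently holding the VIP performs the backup;
    the peer skips, so the backup is not taken twice."""
    return vip in local_addresses

# Hypothetical values: a VIP plus the addresses bound on each node.
VIP = "192.168.199.170"
node_a = ["127.0.0.1", "192.168.199.171", "192.168.199.170"]  # holds the VIP
node_b = ["127.0.0.1", "192.168.199.172"]                     # does not

print(should_run_backup(VIP, node_a))  # True  -> run the backup here
print(should_run_backup(VIP, node_b))  # False -> skip on this node
```

The same check works for any "only on the active node" task (monitoring writes, scheduled jobs), not just backups.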
2. Dual‑master Double‑Write
In this scheme, applications use a round‑robin‑with‑fallback JDBC URL, causing both masters to accept writes. This leads to “update loss” because each MySQL instance manages its own locks independently. Simultaneous updates to the same row can succeed on both sides, resulting in nondeterministic final values after replication.
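The update-loss effect can be simulated without MySQL at all. A hedged Python sketch, assuming row-based replication that ships full row images and applies the peer's version last: both masters commit a write to the same row before replication catches up, then each overwrites its own committed value with the other's.

```python
# Each "master" is a dict standing in for one row store.
master_a = {"balance": 100}
master_b = {"balance": 100}

# Concurrent writes: each master commits locally before replication catches up.
master_a["balance"] = 100 - 10   # client 1 on A: withdraw 10 -> 90
master_b["balance"] = 100 * 2    # client 2 on B: double -> 200

# Bidirectional replication now applies the peer's row image on each side.
a_image, b_image = master_a["balance"], master_b["balance"]
master_a["balance"] = b_image    # A receives B's write: 200 (A's withdrawal is lost)
master_b["balance"] = a_image    # B receives A's write: 90  (B's doubling is lost)

print(master_a["balance"], master_b["balance"])  # 200 90 -> the two masters diverge
```

Each node "wins" locally and loses remotely, so not only is one update lost on each side, the two masters no longer even agree on the final value.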
Typical mitigation attempts include:
Solution 1: Separate Auto‑Increment Keys (Not Recommended)
<code>auto_increment_offset = 1     # master 1; each master uses a different offset</code>
<code>auto_increment_increment = 3  # step, shared by all masters</code>
While this avoids insert conflicts, it does not solve update‑loss problems.
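The effect of these two settings can be checked offline. A small Python sketch of the interleaving (the three-way split matching increment = 3 is assumed for illustration):

```python
def auto_increment_sequence(offset: int, increment: int, count: int) -> list:
    """IDs one master generates under auto_increment_offset/auto_increment_increment."""
    return [offset + i * increment for i in range(count)]

m1 = auto_increment_sequence(1, 3, 5)  # [1, 4, 7, 10, 13]
m2 = auto_increment_sequence(2, 3, 5)  # [2, 5, 8, 11, 14]
m3 = auto_increment_sequence(3, 3, 5)  # [3, 6, 9, 12, 15]

# The sequences never collide, so concurrent INSERTs get distinct keys...
assert not (set(m1) & set(m2)) and not (set(m2) & set(m3)) and not (set(m1) & set(m3))
# ...but two UPDATEs to the same existing row can still race: update loss remains.
```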
Solution 2: Split by Table or Database (Recommended)
Assign specific tables or whole databases to each master, ensuring no cross‑dependency. Further isolation can be achieved by deploying separate MySQL instances (e.g., ports 3306 and 3307) on different hosts, allowing independent maintenance without affecting the other.
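A minimal Python sketch of the split-by-table idea; the ownership map and host addresses are illustrative assumptions. The application routes every write to the master that owns the table, so the two masters never update the same rows and update loss cannot occur:

```python
# Hypothetical ownership map: each table is written on exactly one master.
TABLE_OWNER = {
    "orders":    "192.168.199.172",  # master A owns order tables
    "inventory": "192.168.199.173",  # master B owns inventory tables
}

def write_host(table: str) -> str:
    """Pick the only master allowed to write this table; reject unknown tables
    rather than falling back to an arbitrary node."""
    try:
        return TABLE_OWNER[table]
    except KeyError:
        raise ValueError("no write owner configured for table %r" % table)

print(write_host("orders"))     # 192.168.199.172
print(write_host("inventory"))  # 192.168.199.173
```

Raising on unknown tables is a deliberate choice here: silently routing a new table to either master would reintroduce the double-write problem.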
For high‑availability across data centers, some teams consider asynchronous bidirectional replication, but this article focuses on local HA scenarios.
2. Replication Loops in Dual‑Master Setups
1. Deploying a Loop Architecture
The following steps use dbops to create a one‑master‑two‑slave topology and then modify it into a looped dual‑master configuration.
<code># Navigate to the playbooks directory</code>
<code>pwd</code>
<code>/usr/local/dbops/mysql_ansible/playbooks</code>
<code># Edit inventory</code>
<code>vi ../inventory/hosts.ini</code>
<code>[dbops_mysql]</code>
<code>192.168.199.171 ansible_user=root ansible_ssh_pass='gta@2015'</code>
<code>192.168.199.172 ansible_user=root ansible_ssh_pass='gta@2015'</code>
<code>192.168.199.173 ansible_user=root ansible_ssh_pass='gta@2015'</code>
<code># Define variables</code>
<code>master_ip: 192.168.199.171</code>
<code>slave_ips:</code>
<code> - 192.168.199.172</code>
<code> - 192.168.199.173</code>
<code># Run the playbook</code>
<code>ansible-playbook master_slave.yml</code>
<code># Confirm when prompted</code>
After the initial one‑master‑two‑slave deployment, execute two <code>CHANGE MASTER TO</code> statements to create the bidirectional links.
<code># On 172, set 173 as master</code>
<code>CHANGE MASTER TO MASTER_HOST='192.168.199.173', MASTER_PORT=3306, MASTER_USER='repl', MASTER_PASSWORD='Repl@8888', MASTER_AUTO_POSITION=1 FOR CHANNEL 'master173';</code>
<code>START SLAVE;</code>
<code># On 173, set 172 as master</code>
<code>STOP SLAVE;</code>
<code>CHANGE MASTER TO MASTER_HOST='192.168.199.172', MASTER_PORT=3306, MASTER_USER='repl', MASTER_PASSWORD='Repl@8888', MASTER_AUTO_POSITION=1;</code>
<code>START SLAVE;</code>
2. Verifying the Loop
Create a test database on the original master (171) and observe its propagation.
<code>CREATE DATABASE fander;</code>
All three nodes now contain the <code>fander</code> database. Inspect the relay log on 172 to see the replicated <code>CREATE DATABASE fander</code> statement with GTID <code>b0ae588e-0eb4-11ee-b23f-000c297b0a30:3</code>.
Because GTID mode records that this GTID has already been applied, the looped event is not re‑executed, avoiding duplicate errors. In traditional (non‑GTID) replication, the same statement would cause an error due to the database already existing.
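The skip behavior amounts to a membership check against the executed-GTID set. A hedged Python sketch of the idea, simplified by treating GTIDs as plain strings rather than MySQL's interval sets:

```python
class Replica:
    """Toy applier: tracks executed GTIDs and silently skips transactions
    it has already applied, mimicking GTID auto-skip."""
    def __init__(self):
        self.gtid_executed = set()
        self.applied = []

    def apply(self, gtid: str, statement: str) -> bool:
        if gtid in self.gtid_executed:   # already applied: skip, no error raised
            return False
        self.gtid_executed.add(gtid)
        self.applied.append(statement)
        return True

r = Replica()
gtid = "b0ae588e-0eb4-11ee-b23f-000c297b0a30:3"
print(r.apply(gtid, "CREATE DATABASE fander"))  # True  - first arrival, executed
print(r.apply(gtid, "CREATE DATABASE fander"))  # False - looped copy, skipped
```

A non-GTID replica has no such set to consult, which is why the looped statement would instead fail with a "database exists" error.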
3. Why Loops Occur
MySQL prevents replication loops primarily using the <code>server_id</code> identifier. Each binlog event carries the originating server's ID, and a replica ignores events whose <code>server_id</code> matches its own, so a looped event is dropped once it returns to its originator. In a three‑node ring (IDs 171, 172, 173), intermediate nodes still forward the event because its <code>server_id</code> differs from their own. GTID mode adds a second safeguard: even if an event is received multiple times, it is applied only once.
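The <code>server_id</code> filter can be illustrated with a toy ring. A Python sketch, assuming log_slave_updates is enabled so each node re-logs applied events for the next hop:

```python
def propagate(origin_id: int, ring: list) -> list:
    """Forward one event around the ring. Each node applies and re-logs it
    unless the event's server_id equals its own, which breaks the loop
    exactly when the event returns to its originator."""
    applied = []
    start = ring.index(origin_id)
    for step in range(1, len(ring) + 1):
        node = ring[(start + step) % len(ring)]
        if node == origin_id:      # event came home: dropped, loop ends
            break
        applied.append(node)       # applied here, then forwarded downstream
    return applied

print(propagate(171, [171, 172, 173]))  # [172, 173] - originator never re-applies
```

In this model every non-originating node applies the event exactly once per pass; GTID dedup matters when topology changes or errors deliver the same transaction along more than one path.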
Conclusion
Master‑master MySQL setups introduce complexity such as role ambiguity, risk of dirty data, split‑brain scenarios, and update‑loss problems. Recommended approaches are either single‑write dual‑master with VIP failover or, better yet, separating workloads by tables/databases or using dedicated instances. When a looped topology is unavoidable, GTID replication mitigates duplicate execution, but it still incurs extra network traffic.
Efficient Ops
This public account is maintained by Xiaotianguo and friends and regularly publishes original, widely read technical articles. We focus on the transformation of operations work and hope to accompany you throughout your operations career, growing together.