Why Master‑Master MySQL Replication Is Risky and How to Avoid Its Pitfalls
This article explains why the dbops tool does not support MySQL master‑master architecture, describes the two common dual‑master patterns, outlines their drawbacks such as role‑confusion, dirty‑data risk, split‑brain, and update loss, and provides practical workarounds and step‑by‑step deployment instructions to safely implement or avoid dual‑master setups.
Background
The dbops toolset supports deploying MySQL under various systems or architectures, but it does not support a master‑master architecture. The author considers master‑master unsuitable for production and therefore does not implement it. If you still want a master‑master setup, you can deploy a master‑slave topology with dbops on two machines and manually create a reverse replication link by executing a <code>CHANGE MASTER TO</code> statement on the slave.
Why Use a Dual‑Master Architecture?
There are two typical dual‑master patterns:
Dual‑master single‑write
Dual‑master double‑write
1. Dual‑master Single‑Write (Recommended)
This pattern is often implemented with “keepalived + dual‑master”. Keepalived monitors the MySQL instances; if the primary server with a virtual IP (VIP) fails, keepalived moves the VIP to the other server. Applications connect via the VIP, ensuring a single write point while the underlying bidirectional replication remains active, providing seamless failover.
Although the architecture appears similar to “keepalived + master‑slave”, the key is to allow only one node to accept writes at any time. The dual‑master nature (each node being both master and slave) introduces several issues:
Master/Slave Role Determination: Backup scripts that rely on <code>SHOW SLAVE STATUS</code> may see both nodes reporting as slaves, leading to duplicate backups unless the script is enhanced to check which node holds the VIP.
Risk of Dirty Data on the Backup: Accidental writes on the backup server can propagate dirty data to the primary, a scenario not present in a pure master‑slave setup.
Split‑Brain Complications: In a split‑brain event, both nodes become independent partitions, each accepting writes. If bidirectional replication is not explicitly broken before the partitions reconnect, the conflicting writes will replicate in both directions and cause data inconsistency.
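One way to resolve the role ambiguity is to have the backup script first check whether the current node holds the VIP. A minimal Python sketch of that decision; the VIP address and the node address lists are illustrative assumptions, and in practice the local addresses would come from something like <code>ip -4 addr</code>:

```python
def should_run_backup(vip: str, local_addresses: list) -> bool:
    """Only the node currently holding the VIP performs the backup;
    the peer skips, so the backup is not taken twice."""
    return vip in local_addresses

# Hypothetical values: a VIP plus the addresses bound on each node.
VIP = "192.168.199.170"
node_a = ["127.0.0.1", "192.168.199.171", "192.168.199.170"]  # holds the VIP
node_b = ["127.0.0.1", "192.168.199.172"]                     # does not

print(should_run_backup(VIP, node_a))  # True  -> run the backup here
print(should_run_backup(VIP, node_b))  # False -> skip on this node
```

The same check works for any "only on the active node" task (monitoring writes, scheduled jobs), not just backups.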
2. Dual‑master Double‑Write
In this scheme, applications use a round‑robin‑with‑fallback JDBC URL, causing both masters to accept writes. This leads to “update loss” because each MySQL instance manages its own locks independently. Simultaneous updates to the same row can succeed on both sides, resulting in nondeterministic final values after replication.
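The update-loss effect can be simulated without MySQL at all. A hedged Python sketch, assuming row-based replication that ships full row images and applies the peer's version last: both masters commit a write to the same row before replication catches up, then each overwrites its own committed value with the other's.

```python
# Each "master" is a dict standing in for one row store.
master_a = {"balance": 100}
master_b = {"balance": 100}

# Concurrent writes: each master commits locally before replication catches up.
master_a["balance"] = 100 - 10   # client 1 on A: withdraw 10 -> 90
master_b["balance"] = 100 * 2    # client 2 on B: double -> 200

# Bidirectional replication now applies the peer's row image on each side.
a_image, b_image = master_a["balance"], master_b["balance"]
master_a["balance"] = b_image    # A receives B's write: 200 (A's withdrawal is lost)
master_b["balance"] = a_image    # B receives A's write: 90  (B's doubling is lost)

print(master_a["balance"], master_b["balance"])  # 200 90 -> the two masters diverge
```

Each node "wins" locally and loses remotely, so not only is one update lost on each side, the two masters no longer even agree on the final value.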
Typical mitigation attempts include:
Solution 1: Separate Auto‑Increment Keys (Not Recommended)
<code>auto_increment_offset = 1     # master 1; each master uses a different offset</code>
<code>auto_increment_increment = 3  # step, shared by all masters</code>
While this avoids insert conflicts, it does not solve update‑loss problems.
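The effect of these two settings can be checked offline. A small Python sketch of the interleaving (the three-way split matching increment = 3 is assumed for illustration):

```python
def auto_increment_sequence(offset: int, increment: int, count: int) -> list:
    """IDs one master generates under auto_increment_offset/auto_increment_increment."""
    return [offset + i * increment for i in range(count)]

m1 = auto_increment_sequence(1, 3, 5)  # [1, 4, 7, 10, 13]
m2 = auto_increment_sequence(2, 3, 5)  # [2, 5, 8, 11, 14]
m3 = auto_increment_sequence(3, 3, 5)  # [3, 6, 9, 12, 15]

# The sequences never collide, so concurrent INSERTs get distinct keys...
assert not (set(m1) & set(m2)) and not (set(m2) & set(m3)) and not (set(m1) & set(m3))
# ...but two UPDATEs to the same existing row can still race: update loss remains.
```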
Solution 2: Split by Table or Database (Recommended)
Assign specific tables or whole databases to each master, ensuring no cross‑dependency. Further isolation can be achieved by deploying separate MySQL instances (e.g., ports 3306 and 3307) on different hosts, allowing independent maintenance without affecting the other.
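A minimal Python sketch of the split-by-table idea; the ownership map and host addresses are illustrative assumptions. The application routes every write to the master that owns the table, so the two masters never update the same rows and update loss cannot occur:

```python
# Hypothetical ownership map: each table is written on exactly one master.
TABLE_OWNER = {
    "orders":    "192.168.199.172",  # master A owns order tables
    "inventory": "192.168.199.173",  # master B owns inventory tables
}

def write_host(table: str) -> str:
    """Pick the only master allowed to write this table; reject unknown tables
    rather than falling back to an arbitrary node."""
    try:
        return TABLE_OWNER[table]
    except KeyError:
        raise ValueError("no write owner configured for table %r" % table)

print(write_host("orders"))     # 192.168.199.172
print(write_host("inventory"))  # 192.168.199.173
```

Raising on unknown tables is a deliberate choice here: silently routing a new table to either master would reintroduce the double-write problem.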
For high‑availability across data centers, some teams consider asynchronous bidirectional replication, but this article focuses on local HA scenarios.
2. Replication Loops in Dual‑Master Setups
1. Deploying a Loop Architecture
The following steps use dbops to create a one‑master‑two‑slave topology and then modify it into a looped dual‑master configuration.
<code># Navigate to the playbooks directory</code>
<code>pwd</code>
<code>/usr/local/dbops/mysql_ansible/playbooks</code>
<code># Edit inventory</code>
<code>vi ../inventory/hosts.ini</code>
<code>[dbops_mysql]</code>
<code>192.168.199.171 ansible_user=root ansible_ssh_pass='gta@2015'</code>
<code>192.168.199.172 ansible_user=root ansible_ssh_pass='gta@2015'</code>
<code>192.168.199.173 ansible_user=root ansible_ssh_pass='gta@2015'</code>
<code># Define variables</code>
<code>master_ip: 192.168.199.171</code>
<code>slave_ips:</code>
<code> - 192.168.199.172</code>
<code> - 192.168.199.173</code>
<code># Run the playbook</code>
<code>ansible-playbook master_slave.yml</code>
<code># Confirm when prompted</code>
After the initial one‑master‑two‑slave deployment, execute two <code>CHANGE MASTER TO</code> statements to create the bidirectional links.
<code># On 172, set 173 as master</code>
<code>CHANGE MASTER TO MASTER_HOST='192.168.199.173', MASTER_PORT=3306, MASTER_USER='repl', MASTER_PASSWORD='Repl@8888', MASTER_AUTO_POSITION=1 FOR CHANNEL 'master173';</code>
<code>START SLAVE;</code>
<code># On 173, set 172 as master</code>
<code>STOP SLAVE;</code>
<code>CHANGE MASTER TO MASTER_HOST='192.168.199.172', MASTER_PORT=3306, MASTER_USER='repl', MASTER_PASSWORD='Repl@8888', MASTER_AUTO_POSITION=1;</code>
<code>START SLAVE;</code>
2. Verifying the Loop
Create a test database on the original master (171) and observe its propagation.
<code>CREATE DATABASE fander;</code>
All three nodes now contain the <code>fander</code> database. Inspect the relay log on 172 to see the replicated <code>CREATE DATABASE fander</code> statement with GTID <code>b0ae588e-0eb4-11ee-b23f-000c297b0a30:3</code>.
Because GTID mode records that this GTID has already been applied, the looped event is not re‑executed, avoiding duplicate errors. In traditional (non‑GTID) replication, the same statement would cause an error due to the database already existing.
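The skip behavior amounts to a membership check against the executed-GTID set. A hedged Python sketch of the idea, simplified by treating GTIDs as plain strings rather than MySQL's interval sets:

```python
class Replica:
    """Toy applier: tracks executed GTIDs and silently skips transactions
    it has already applied, mimicking GTID auto-skip."""
    def __init__(self):
        self.gtid_executed = set()
        self.applied = []

    def apply(self, gtid: str, statement: str) -> bool:
        if gtid in self.gtid_executed:   # already applied: skip, no error raised
            return False
        self.gtid_executed.add(gtid)
        self.applied.append(statement)
        return True

r = Replica()
gtid = "b0ae588e-0eb4-11ee-b23f-000c297b0a30:3"
print(r.apply(gtid, "CREATE DATABASE fander"))  # True  - first arrival, executed
print(r.apply(gtid, "CREATE DATABASE fander"))  # False - looped copy, skipped
```

A non-GTID replica has no such set to consult, which is why the looped statement would instead fail with a "database exists" error.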
3. Why Loops Occur
MySQL prevents replication loops primarily using the <code>server_id</code> identifier. Each binlog event carries the originating server's ID, and a replica ignores events whose <code>server_id</code> matches its own, so a looped event is dropped once it returns to its originator. In a three‑node ring (IDs 171, 172, 173), intermediate nodes still forward the event because its <code>server_id</code> differs from their own. GTID mode adds a second safeguard: even if an event is received multiple times, it is applied only once.
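The <code>server_id</code> filter can be illustrated with a toy ring. A Python sketch, assuming log_slave_updates is enabled so each node re-logs applied events for the next hop:

```python
def propagate(origin_id: int, ring: list) -> list:
    """Forward one event around the ring. Each node applies and re-logs it
    unless the event's server_id equals its own, which breaks the loop
    exactly when the event returns to its originator."""
    applied = []
    start = ring.index(origin_id)
    for step in range(1, len(ring) + 1):
        node = ring[(start + step) % len(ring)]
        if node == origin_id:      # event came home: dropped, loop ends
            break
        applied.append(node)       # applied here, then forwarded downstream
    return applied

print(propagate(171, [171, 172, 173]))  # [172, 173] - originator never re-applies
```

In this model every non-originating node applies the event exactly once per pass; GTID dedup matters when topology changes or errors deliver the same transaction along more than one path.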
Conclusion
Master‑master MySQL setups introduce complexity such as role ambiguity, risk of dirty data, split‑brain scenarios, and update‑loss problems. Recommended approaches are either single‑write dual‑master with VIP failover or, better yet, separating workloads by tables/databases or using dedicated instances. When a looped topology is unavoidable, GTID replication mitigates duplicate execution, but it still incurs extra network traffic.
Efficient Ops
This public account is maintained by Xiaotianguo and friends and regularly publishes original, widely read technical articles. We focus on the transformation of operations work and hope to accompany you throughout your operations career, growing together.