
Why Master‑Master MySQL Replication Is Risky and How to Avoid Its Pitfalls

This article explains why the dbops tool does not support a MySQL master‑master architecture, describes the two common dual‑master patterns, outlines their drawbacks (role confusion, dirty‑data risk, split brain, and update loss), and gives practical workarounds plus step‑by‑step deployment instructions for safely implementing, or avoiding, dual‑master setups.

Efficient Ops

Background

The dbops toolset supports deploying MySQL under various systems and architectures, but it does not support a master‑master architecture: the author considers master‑master unsuitable for production and therefore does not implement it. If you still want a master‑master setup, deploy a master‑slave topology with dbops on two machines and manually create the reverse replication link by executing a <code>CHANGE MASTER TO</code> statement on the slave.

Why Use a Dual‑Master Architecture?

There are two typical dual‑master patterns:

Dual‑master single‑write

Dual‑master double‑write

1. Dual‑master Single‑Write (Recommended)

This pattern is often implemented with “keepalived + dual‑master”. Keepalived monitors the MySQL instances; if the primary server with a virtual IP (VIP) fails, keepalived moves the VIP to the other server. Applications connect via the VIP, ensuring a single write point while the underlying bidirectional replication remains active, providing seamless failover.
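This pattern is commonly wired up with a VRRP instance on each node. The following is a hedged sketch of one side's keepalived.conf; the interface name, the VIP 192.168.199.170, and the health‑check script path are assumptions for illustration, not values taken from this article:

```
vrrp_script chk_mysql {
    script "/usr/local/bin/check_mysql.sh"   # hypothetical MySQL health-check script
    interval 2
    weight -20                               # drop priority when MySQL is down
}

vrrp_instance VI_MYSQL {
    state BACKUP              # BACKUP + nopreempt on both nodes avoids VIP flapping
    nopreempt
    interface eth0            # assumed NIC name
    virtual_router_id 51
    priority 100              # use a lower value (e.g. 90) on the peer
    advert_int 1
    virtual_ipaddress {
        192.168.199.170       # assumed VIP
    }
    track_script {
        chk_mysql
    }
}
```

With both nodes configured as BACKUP plus nopreempt, the VIP stays wherever it last landed instead of bouncing back when a failed node recovers.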

Although the architecture appears similar to “keepalived + master‑slave”, the key is to allow only one node to accept writes at any time. The dual‑master nature (each node being both master and slave) introduces several issues:

Master/Slave Role Determination : Backup scripts that rely on <code>SHOW SLAVE STATUS</code> may see both nodes reporting as slaves and run duplicate backups, unless the script is enhanced to check which node currently holds the VIP.
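One way to enhance the script is to gate the backup on VIP ownership. A minimal sketch, assuming a VIP of 192.168.199.170 (the article does not name one) and a hypothetical `holds_vip` helper:

```shell
#!/bin/sh
# Run the backup only on the node that currently holds the VIP.
VIP="192.168.199.170"   # assumed VIP; adjust to your keepalived config

holds_vip() {           # $1 = space-separated local addresses, $2 = VIP
  echo "$1" | grep -qw "$2"
}

# Collect this host's IPv4 addresses (empty string if `ip` is unavailable).
ADDRS="$(ip -4 -o addr show 2>/dev/null | awk '{print $4}' | cut -d/ -f1 | tr '\n' ' ')"

if holds_vip "$ADDRS" "$VIP"; then
  echo "VIP held locally, running backup"
  # mysqldump --all-databases ... (actual backup command goes here)
else
  echo "VIP not held, skipping backup"
fi
```

The same guard can be dropped into an existing cron-driven backup script so only the current primary produces a dump.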

Risk of Dirty Data on the Backup : Accidental writes on the backup server can propagate dirty data to the primary, a scenario not present in a pure master‑slave setup.

Split‑Brain Complications : In a split‑brain event, both nodes become independent partitions. If bidirectional replication is not explicitly broken, the partitions will later synchronize and cause data inconsistency.
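Explicitly breaking the reverse link is something a failover hook (for example a keepalived notify script) can do. A hedged sketch; the commented `mysql` invocation is illustrative and assumes working client credentials:

```shell
#!/bin/sh
# Statements a failover hook might run on the node being isolated, so it
# stops replaying the other partition's writes after a split-brain.
break_reverse_link_sql() {
  echo "STOP SLAVE; RESET SLAVE ALL;"
}

# Example invocation (commented out: needs a live server and credentials):
# mysql -uroot -p"$MYSQL_PWD" -e "$(break_reverse_link_sql)"
echo "would run: $(break_reverse_link_sql)"
```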

2. Dual‑master Double‑Write

In this scheme, applications use a round‑robin‑with‑fallback JDBC URL, causing both masters to accept writes. This leads to “update loss” because each MySQL instance manages its own locks independently. Simultaneous updates to the same row can succeed on both sides, resulting in nondeterministic final values after replication.
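The update‑loss effect can be shown with a toy model: each master applies its own write first and the replicated write second, so after replication the two nodes hold different values for the same row.

```shell
#!/bin/sh
# Two masters receive concurrent UPDATEs to the same row's balance column.
# Each applies its local write first, then the one replicated from the peer.
write_a=100   # client write accepted on master A
write_b=200   # client write accepted on master B

balance_on_a=$write_a; balance_on_a=$write_b   # A: local 100, then replicated 200
balance_on_b=$write_b; balance_on_b=$write_a   # B: local 200, then replicated 100

echo "A=$balance_on_a B=$balance_on_b"   # diverged: A=200 B=100
```

Neither write failed and no error was reported, which is exactly why double‑write divergence is so hard to detect in production.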

Typical mitigation attempts include:

Solution 1: Separate Auto‑Increment Keys (Not Recommended)

<code>auto_increment_offset = 1     # 1 on master 1, 2 on master 2</code>
<code>auto_increment_increment = 2  # step = number of masters</code>

While this avoids insert conflicts, it does not solve update‑loss problems.
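With two masters, offsets 1 and 2 and a step equal to the number of masters give each node a disjoint key range; the resulting sequences can be sketched with `seq`:

```shell
#!/bin/sh
# Keys generated under auto_increment_increment=2 (two masters):
echo "master 1: $(seq 1 2 9 | tr '\n' ' ')"   # offset 1 -> 1 3 5 7 9
echo "master 2: $(seq 2 2 10 | tr '\n' ' ')"  # offset 2 -> 2 4 6 8 10
```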

Solution 2: Split by Table or Database (Recommended)

Assign specific tables or whole databases to each master so there are no cross‑dependencies. Further isolation can be achieved by deploying separate MySQL instances (e.g., ports 3306 and 3307) on different hosts, allowing each to be maintained independently without affecting the other.
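The table/database split amounts to a routing rule that the application (or a proxy in front of MySQL) must enforce. A hypothetical sketch; the schema names and addresses are illustrative only:

```shell
#!/bin/sh
# Each schema is written on exactly one master; the peer only replicates it.
master_for() {
  case "$1" in
    orders|users) echo "192.168.199.172" ;;   # writes for these go to master A
    logs|stats)   echo "192.168.199.173" ;;   # writes for these go to master B
    *)            echo "unrouted"; return 1 ;;
  esac
}

master_for orders   # -> 192.168.199.172
master_for logs     # -> 192.168.199.173
```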

For high‑availability across data centers, some teams consider asynchronous bidirectional replication, but this article focuses on local HA scenarios.

2. Replication Loops in Dual‑Master Setups

1. Deploying a Loop Architecture

The following steps use dbops to create a one‑master‑two‑slave topology and then modify it into a looped dual‑master configuration.

<code># Navigate to the playbooks directory</code>
<code>pwd</code>
<code>/usr/local/dbops/mysql_ansible/playbooks</code>
<code># Edit inventory</code>
<code>vi ../inventory/hosts.ini</code>
<code>[dbops_mysql]</code>
<code>192.168.199.171 ansible_user=root ansible_ssh_pass='gta@2015'</code>
<code>192.168.199.172 ansible_user=root ansible_ssh_pass='gta@2015'</code>
<code>192.168.199.173 ansible_user=root ansible_ssh_pass='gta@2015'</code>
<code># Define variables</code>
<code>master_ip: 192.168.199.171</code>
<code>slave_ips:</code>
<code>  - 192.168.199.172</code>
<code>  - 192.168.199.173</code>
<code># Run the playbook</code>
<code>ansible-playbook master_slave.yml</code>
<code># Confirm when prompted</code>

After the initial one‑master‑two‑slave deployment, execute two <code>CHANGE MASTER TO</code> statements to create the bidirectional links.

<code># On 172, add 173 as a second source via a named channel (172 already replicates from 171)</code>
<code>CHANGE MASTER TO MASTER_HOST='192.168.199.173', MASTER_PORT=3306, MASTER_USER='repl', MASTER_PASSWORD='Repl@8888', MASTER_AUTO_POSITION=1 FOR CHANNEL 'master173';</code>
<code>START SLAVE;</code>
<code># On 173, point the default channel (previously replicating from 171) at 172</code>
<code>STOP SLAVE;</code>
<code>CHANGE MASTER TO MASTER_HOST='192.168.199.172', MASTER_PORT=3306, MASTER_USER='repl', MASTER_PASSWORD='Repl@8888', MASTER_AUTO_POSITION=1;</code>
<code>START SLAVE;</code>
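Once both links are up, it is worth confirming that the IO and SQL threads are running on every channel. A small sketch: `check_running` is a hypothetical helper that parses `SHOW SLAVE STATUS\G` output from stdin; the `mysql` call is commented out because it needs a live server:

```shell
#!/bin/sh
# Succeeds only when both replication threads report Yes.
check_running() {
  [ "$(grep -cE 'Slave_(IO|SQL)_Running: Yes')" -eq 2 ]
}

# mysql -e "SHOW SLAVE STATUS FOR CHANNEL 'master173'\G" | check_running \
#   && echo "channel master173 OK"

# Offline demonstration with canned output:
printf 'Slave_IO_Running: Yes\nSlave_SQL_Running: Yes\n' | check_running && echo "OK"
```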

2. Verifying the Loop

Create a test database on the original master (171) and observe its propagation.

<code>CREATE DATABASE fander;</code>

All three nodes now contain the <code>fander</code> database. Inspecting the relay log on 172 shows the replicated <code>CREATE DATABASE fander</code> statement with GTID <code>b0ae588e-0eb4-11ee-b23f-000c297b0a30:3</code>.

Because GTID mode records that this GTID has already been applied, the looped event is not re‑executed, avoiding duplicate errors. In traditional (non‑GTID) replication, the same statement would cause an error due to the database already existing.
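This de‑duplication can be modeled as a set‑membership check on executed GTIDs. A toy sketch of the idea (not MySQL internals):

```shell
#!/bin/sh
# Apply a transaction only if its GTID has not been executed before.
EXECUTED=""
apply_once() {
  case " $EXECUTED " in
    *" $1 "*) echo "skip $1" ;;                         # looped copy: already applied
    *)        EXECUTED="$EXECUTED $1"; echo "apply $1" ;;
  esac
}

apply_once "b0ae588e:3"   # first arrival  -> apply b0ae588e:3
apply_once "b0ae588e:3"   # looped arrival -> skip b0ae588e:3
```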

3. Why Loops Occur

MySQL prevents replication loops primarily with the <code>server_id</code> identifier: each binlog event carries the ID of the server it originated on, and a replica ignores events whose <code>server_id</code> matches its own. That check only filters a node's own events, however. In the three‑node example (IDs 171, 172, 173), an event that originates on 171 can keep circulating between 172 and 173, because its <code>server_id</code> matches neither of theirs. GTID mode closes this gap: even if an event is received multiple times, it is applied only once.
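The limitation of the server_id check fits in a one‑line model: a node drops only events carrying its own ID, so a third node's events pass every filter in the 172↔173 ring.

```shell
#!/bin/sh
# A replica applies an event unless the event originated on this very server.
should_apply() {   # $1 = event's originating server_id, $2 = this node's server_id
  [ "$1" -ne "$2" ]
}

should_apply 171 172 && echo "172 applies the event from 171"
should_apply 171 173 && echo "173 applies it again"   # 171 never equals 172 or 173
should_apply 172 172 || echo "172 drops its own event"
```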

Conclusion

Master‑master MySQL setups introduce complexity such as role ambiguity, risk of dirty data, split‑brain scenarios, and update‑loss problems. Recommended approaches are either single‑write dual‑master with VIP failover or, better yet, separating workloads by tables/databases or using dedicated instances. When a looped topology is unavoidable, GTID replication mitigates duplicate execution, but it still incurs extra network traffic.

Tags: High Availability, MySQL, GTID, master-master replication, dbops, dual-master pitfalls
Written by

Efficient Ops

This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
