Analysis of Galera GTID vs MySQL GTID Issues During Percona XtraDB Cluster Split at Qunar
This article examines how differences between Galera GTID and MySQL GTID caused replication interruptions during a Percona XtraDB Cluster (PXC) split at Qunar, details the migration procedure, reproduces the problem with step‑by‑step commands, explains GTID concepts, and proposes operational improvements to avoid data inconsistency.
MySQL introduced GTID replication to simplify topology initialization and failover, but Qunar heavily uses Percona XtraDB Cluster (PXC) where Galera GTID differs from standard MySQL GTID; overlooking this difference caused issues during a cluster split.
Background: PXC cluster C1 had grown beyond 5 TB per node, making routine maintenance (backup, scaling, migration) time‑consuming. The decision was to split the two largest databases, DB1 and DB2, into a new PXC cluster C2.
Solution outline: a full copy of a C1 node was taken, two additional nodes were built to form a three‑node C2 cluster, and C2’s write node was set as a slave of a C1 node to keep data synchronized. DB1 was migrated first following four steps (shut down services, rename tables, confirm no lag, rename back), then DB2 was migrated similarly, after which unused databases would be removed.
Problem: after completing DB1 migration, replication between C1 and C2 broke; many tables from other databases were missing on C2, forcing the migration plan to be aborted. Investigation revealed that the source_id part of the GTID was identical on both clusters, causing the slave to treat the master’s transactions as already executed and skip them.
Analysis: because the source_id values matched, the slave generated GTIDs with a larger trx_number than the master’s, leading the slave to ignore those binlog events and resulting in data divergence.
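The skipping behaviour can be illustrated with a small model. This is a minimal sketch, assuming auto-position replication sends only the GTIDs the master has and the slave lacks; the source_id strings and transaction counts are hypothetical and are not taken from the incident.

```python
def transactions_to_send(master_executed, slave_executed):
    """Return what an auto-position master would send: GTIDs it has that the slave lacks."""
    pending = {}
    for source_id, gnos in master_executed.items():
        missing = gnos - slave_executed.get(source_id, set())
        if missing:
            pending[source_id] = sorted(missing)
    return pending


# Hypothetical source_id values, for illustration only.
SHARED_ID = "cluster-uuid"
SLAVE_OWN_ID = "slave-server-uuid"

# Both sides start from the same backup: trx_number 1..100 already executed.
baseline = set(range(1, 101))

# Case 1: slave-side writes are numbered under the shared source_id.
master = {SHARED_ID: baseline | set(range(101, 111))}  # 10 new master transactions
slave = {SHARED_ID: baseline | set(range(101, 121))}   # 20 unrelated local transactions
shared_case = transactions_to_send(master, slave)

# Case 2: slave-side writes are numbered under the slave's own source_id.
slave_distinct = {SHARED_ID: set(baseline), SLAVE_OWN_ID: set(range(1, 21))}
distinct_case = transactions_to_send(master, slave_distinct)

print("shared source_id   ->", shared_case)
print("distinct source_id ->", distinct_case)
```

In the shared case nothing is sent, even though the ten master transactions differ from anything the slave executed; in the distinct case all ten are delivered.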
Reproduction steps:

    # cat xtrabackup_binlog_info
    mysql-bin.000015 997 401cdbc9-e228-ee17-496f-5c53bc36ae5b:1-1123,
    c05582e9-dc11-ee14-6b06-c041b8b7ff2d:1-4,
    da5e0de8-dc13-ee14-76e6-f074e061cc69:1-2

    # cat xtrabackup_galera_info
    3faa7d16-23ee-11eb-94f9-3fbe474800d2:4

    # bootstrap PXC instance
    /etc/init.d/mysql.server -P 3311 bootstrap-pxc

    mysql> reset slave all;
    mysql> set wsrep_on = 0;
    mysql> reset master;
    mysql> set wsrep_on = 1;
    mysql> SET GLOBAL gtid_purged='401cdbc9-e228-ee17-496f-5c53bc36ae5b:1-1123,c05582e9-dc11-ee14-6b06-c041b8b7ff2d:1-4,da5e0de8-dc13-ee14-76e6-f074e061cc69:1-2';
    mysql> change master to master_host='10.86.41.xxx', master_port=3306, master_user='replication', master_password='xxxxxxxxxx', master_auto_position=1;
    mysql> start slave;

These commands demonstrate that after bootstrapping, the source_id of the new node matches that of the original cluster, causing GTID collisions.
GTID concepts: GTID = source_id:trx_number. The source_id (server_uuid) is a 128‑bit UUID written as 32 hexadecimal digits (36 characters including hyphens); the trx_number (gno) is a monotonically increasing transaction number. GTID sets can contain multiple intervals.
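To make the set notation concrete, the following sketch splits a GTID set string into its source_id values and trx_number intervals. The function name parse_gtid_set is hypothetical; the input is the set shown in xtrabackup_binlog_info in the reproduction steps.

```python
def parse_gtid_set(text):
    """Parse 'uuid:1-5:7,uuid2:1-3' into {uuid: [(start, end), ...]}."""
    result = {}
    for item in text.replace("\n", "").split(","):
        item = item.strip()
        if not item:
            continue
        source_id, *ranges = item.split(":")
        intervals = result.setdefault(source_id.lower(), [])
        for r in ranges:
            start, _, end = r.partition("-")
            # A single number such as '7' is the interval 7-7.
            intervals.append((int(start), int(end or start)))
    return result


gtid_set = (
    "401cdbc9-e228-ee17-496f-5c53bc36ae5b:1-1123,"
    "c05582e9-dc11-ee14-6b06-c041b8b7ff2d:1-4,"
    "da5e0de8-dc13-ee14-76e6-f074e061cc69:1-2"
)
parsed = parse_gtid_set(gtid_set)
for source_id, intervals in parsed.items():
    print(source_id, intervals)
```

Each source_id keeps its own trx_number sequence, which is why two servers that number transactions under the same source_id cannot be told apart from the set alone.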
GTID generation: MySQL creates a GTID at commit time using a global counter next_free_gno. The server_uuid is read from auto.cnf or generated at startup. Galera GTID in PXC uses the wsrep_cluster_state_uuid as the source_id, which is shared by all nodes, so transactions appear to originate from the same logical source.
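One detail is visible in the sample files from the reproduction steps: the UUID in xtrabackup_galera_info does not appear verbatim in the binlog GTID set, but its bitwise complement does. The sketch below checks this for the sample values only; it is an observation about this data, not a statement of how every PXC version derives the source_id.

```python
import uuid


def bitwise_complement(u):
    """Flip all 128 bits of a UUID."""
    return uuid.UUID(int=~u.int & ((1 << 128) - 1))


# Values copied from xtrabackup_galera_info and xtrabackup_binlog_info.
galera_uuid = uuid.UUID("3faa7d16-23ee-11eb-94f9-3fbe474800d2")
binlog_source_ids = [
    "401cdbc9-e228-ee17-496f-5c53bc36ae5b",
    "c05582e9-dc11-ee14-6b06-c041b8b7ff2d",
    "da5e0de8-dc13-ee14-76e6-f074e061cc69",
]

complement = bitwise_complement(galera_uuid)
matches = [s for s in binlog_source_ids if uuid.UUID(s) == complement]

print("complement of Galera UUID:", complement)
print("matching binlog source_id:", matches)
```

A check like this can help identify which entry in a GTID set belongs to the current Galera cluster when several source_id values are present.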
Improvement measures: define separate operation standards for adding nodes versus splitting clusters, automate the migration workflow, and consider using an intermediate filtered replication node to reduce the time needed to build the new cluster and avoid leftover databases.
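An automated workflow could include a pre-flight check along the following lines. This is a sketch under stated assumptions: the two strings stand in for gtid_executed as read from the master and the slave, the trx_number values are hypothetical, and the function names are illustrative. In SQL, MySQL's GTID_SUBTRACT() function performs the same set difference.

```python
def expand_gtid_set(text):
    """Expand 'uuid:1-3:7,uuid2:5' into {uuid: {1, 2, 3, 7}, uuid2: {5}}.

    Expanding to sets is fine for a sketch; very large sets call for interval arithmetic.
    """
    result = {}
    for item in filter(None, (part.strip() for part in text.split(","))):
        source_id, *ranges = item.split(":")
        gnos = result.setdefault(source_id.lower(), set())
        for r in ranges:
            start, _, end = r.partition("-")
            gnos.update(range(int(start), int(end or start) + 1))
    return result


def slave_only_gtids(slave_text, master_text):
    """Return GTIDs present on the slave but absent from the master, per source_id."""
    slave, master = expand_gtid_set(slave_text), expand_gtid_set(master_text)
    report = {}
    for source_id, gnos in slave.items():
        extra = gnos - master.get(source_id, set())
        if extra:
            report[source_id] = sorted(extra)
    return report


# Hypothetical gtid_executed values for the two clusters.
master_set = "c05582e9-dc11-ee14-6b06-c041b8b7ff2d:1-10"
slave_set = "c05582e9-dc11-ee14-6b06-c041b8b7ff2d:1-12"

problems = slave_only_gtids(slave_set, master_set)
shared_ids = sorted(set(problems) & set(expand_gtid_set(master_set)))
if shared_ids:
    print("stop: slave is ahead of master under shared source_id", shared_ids)
```

A non-empty result under a source_id that the master also uses is the warning sign described in the analysis: the slave has numbered its own transactions in the master's sequence.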
Comparison: both GTID formats use source_id:trx_number, but in Galera the source_id is the cluster UUID shared by all nodes, while in MySQL it is the individual server_uuid, allowing clear identification of the transaction source.
Qunar Tech Salon
Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.