Databases 19 min read

Troubleshooting MySQL Group Replication Transaction Certification Errors and Recovery

An in‑depth analysis of MySQL Group Replication (MGR) transaction certification failures, covering error symptoms, root‑cause investigation, replication‑group transaction set mismatches, and step‑by‑step recovery procedures with code examples and best‑practice recommendations to keep MGR clusters clean.

Aikesheng Open Source Community
Aikesheng Open Source Community
Aikesheng Open Source Community
Troubleshooting MySQL Group Replication Transaction Certification Errors and Recovery

1. Fault Phenomenon

A two‑year‑old MGR three‑node multi‑primary environment (5.7.25) imported a batch of data on node 1; subsequent updates on that data failed with the error:

update match_equip set name = ?, type = ?, equips = ?,score = ? where id = ? and
person_id = ?
Received #3101 error from MySQL server: "Plugin instructed the server to rollback
the current transaction."

1.1 Attempted Recovery Operation 1

Node 1 was restarted, but it could not rejoin the group, producing errors such as:

2021-05-27T07:37:53.290267Z 0 [ERROR] Plugin group_replication reported: 'This
member has more executed transactions than those present in the group. Local
transactions: 91f9d301-c234-11e9-b15f-fa163e13423a:1-156817757:156843131-157503127:158192163:158412212,

a71d98a2-c234-11e9-b6db-fa163e3407f8:1-92,

eba21052-c250-11e9-b0d0-fa163e134234:1-3 > Group transactions: 91f9d301-c234-
11e9-b15f-fa163e13423a:1-156817825:156843131-157503172:158192163:158412212,

eba21052-c250-11e9-b0d0-fa163e134234:1-3'
2021-05-27T07:37:53.290348Z 0 [ERROR] Plugin group_replication reported: 'The
member contains transactions not present in the group. The member will now exit
the group.'

Local transactions:

Local transactions:

91f9d301-c234-11e9-b15f-fa163e13423a:1-156817757:156843131-157503127:158192163-
158412212,

a71d98a2-c234-11e9-b6db-fa163e3407f8:1-92,

eba21052-c250-11e9-b0d0-fa163e134234:1-3

Group transactions:

Group transactions:

91f9d301-c234-11e9-b15f-fa163e13423a:1-156817825:156843131-157503172:158192163-
158412212,

eba21052-c250-11e9-b0d0-fa163e134234:1-3

The GTID a71d98a2-c234-11e9-b6db-fa163e3407f8:1-92 exists only locally, so the local transaction set is not a subset of the group set, causing the join failure.

1.2 Attempted Recovery Operation 2

Restoring from a physical backup also failed with errors indicating that the certification information is too large for transmission:

2021-05-27T08:35:01.331704Z 13 [ERROR] Plugin group_replication reported: 'The
certification information could not be set in this server: 'Certification
information is too large for transmission.''

2021-05-27T08:35:01.331752Z 13 [ERROR] Plugin group_replication reported: 'Error
when processing Certification information in the Recovery process'

2021-05-27T08:35:01.331762Z 13 [ERROR] Plugin group_replication reported: 'Fatal
error during the Recovery process of Group Replication. The server will leave the
group.

Querying the size of the certification table showed more than 11 million rows:

root@3306 performance_schema> select COUNT_TRANSACTIONS_ROWS_VALIDATING from
performance_schema.replication_group_member_stats;

+------------------------------------+
| COUNT_TRANSACTIONS_ROWS_VALIDATING |
+------------------------------------+
| 11239426 |
+------------------------------------+
1 row in set (0.00 sec)

This large amount of data is the real cause of the failure, not a MySQL bug.

2. Fault Analysis and Reproduction

2.1 Transaction Certification Mechanism

The MGR certification module decides whether a transaction should be committed or rolled back by checking whether the GTID set of the keys modified by the transaction (stored in certification_info ) is a subset of the transaction's snapshot version. If it is, the transaction commits; otherwise it rolls back.

Each committed transaction inserts or updates its writeset into certification_info . As more transactions are certified, the table grows.

Cleanup works by comparing each record's GTID set with the stable GTID set (the intersection of all nodes' gtid_executed collected every 60 seconds). Records whose GTID set is a subset of the stable set can be removed.

2.2 Fault Analysis

2.2.1 Why does certification_info contain over 11 million rows? Node 1 has a local transaction a71d98a2-c234-11e9-b6db-fa163e3407f8:1-92 . When DML is executed on node 1, the writeset is added to certification_info with a snapshot version that includes this local GTID. Because the local GTID never appears in the group’s gtid_executed intersection, the record is never cleaned, causing the table to grow indefinitely.

2.2.2 Why can other nodes not update data modified on node 1? The snapshot version on other nodes is G:1-XXX , while the certification info for the update contains G:1-XXX,A:1-XXX . Since the latter is not a subset of the former, certification fails and the transaction is rolled back.

2.3 Fault Simulation and Reproduction

2.3.1 Environment preparation – create a node (node 3) with a local transaction.

# Disable binary logging on node 2 and node 3
SET SQL_LOG_BIN=0;
CREATE USER rpl_user@'%' IDENTIFIED BY '*****';
GRANT REPLICATION SLAVE ON *.* TO rpl_user@'%';
FLUSH PRIVILEGES;
SET SQL_LOG_BIN=1;
CHANGE MASTER TO MASTER_USER='rpl_user', MASTER_PASSWORD='*****' FOR CHANNEL
'group_replication_recovery';
INSTALL PLUGIN group_replication SONAME 'group_replication.so';

Start the cluster with node 1 as the bootstrap node:

SET GLOBAL group_replication_bootstrap_group=ON;
START GROUP_REPLICATION;
SET GLOBAL group_replication_bootstrap_group=OFF;

When node 2 or node 3 tries to join, they fail because they contain local GTIDs not present in the group. Setting group_replication_allow_local_disjoint_gtids_join=1 lets them join, but this masks the underlying problem.

2.3.2 Fault Simulation Scenarios

Scenario 1 – Node 3 adds data, other nodes cannot update it.

# On node 3
CREATE TABLE mgr_test1(id INT PRIMARY KEY, name VARCHAR(10));
INSERT INTO mgr_test1 VALUES(1,'a');

# On node 1 and node 2
UPDATE mgr_test1 SET name='aa' WHERE id=1;
-- Both return ERROR 3101 (rollback)

Conclusion: nodes 1 and 2 cannot modify rows created by node 3 because the certification check fails.

Scenario 2 – Nodes 1 and 2 add data, node 3 updates it, then nodes 1/2 fail to update again.

# Nodes 1 and 2 insert rows
INSERT INTO mgr_test1 VALUES(11,'aa');
INSERT INTO mgr_test1 VALUES(22,'aa');

# Node 3 updates both rows successfully
UPDATE mgr_test1 SET name='11' WHERE id=11;
UPDATE mgr_test1 SET name='22' WHERE id=22;

# Nodes 1/2 try to update and receive ERROR 3101

2.3.2.5 Recovery – binlog for node 3’s local transaction still exists

Restarting group replication on node 1 synchronises the missing GTID:

STOP GROUP_REPLICATION;
START GROUP_REPLICATION;

After the restart, node 3’s local GTID 34668704-bf55-11eb-b120-000c29ed3768:1 appears in the group.

2.3.2.6 Recovery – binlog for node 3’s local transaction has been purged

When the binlog is removed, node 2 cannot recover the missing GTID and stays in RECOVERING state. The fix is to reset the master and manually set gtid_purged before restarting replication:

STOP GROUP_REPLICATION;
RESET MASTER;
SET GLOBAL gtid_purged='91f9d301-c234-11e9-b15f-fa163e13423a:1-36:1000029:2000029,34668704-bf55-11eb-b120-000c29ed3768:1';
START GROUP_REPLICATION;

3. Summary

Ensure the MGR cluster is "clean" before startup: only global GTIDs should exist, and local GTIDs must be removed. Disable binary logging during the initialization phase.

If a member fails to join with the error "This member has more executed transactions than those present in the group", it indicates the presence of local transactions. Do NOT enable group_replication_allow_local_disjoint_gtids_join ; this option was removed in MySQL 8.0.4 because it leaves hidden risks.

Regularly clean certification_info by allowing each node to broadcast its gtid_executed and pruning records whose GTID set is a subset of the stable GTID set.

DatabaseMySQLTroubleshootingGroup ReplicationTransaction Certification
Aikesheng Open Source Community
Written by

Aikesheng Open Source Community

The Aikesheng Open Source Community provides stable, enterprise‑grade MySQL open‑source tools and services, releases a premium open‑source component each year (1024), and continuously operates and maintains them.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.