Introduction to MySQL Group Replication (MGR) and Its Application in Private Deployments
This article explains the fundamentals, architecture, deployment patterns, recovery procedures, performance tuning, and common troubleshooting of MySQL Group Replication (MGR) as a high‑availability solution for private deployments, illustrated with real‑world examples and practical SQL commands.
Laizhe Technology uses MySQL Group Replication (MGR) as a private‑deployment high‑availability architecture, serving many customers with stable service for over a year.
The article assumes a basic understanding of MySQL and shares internal knowledge about MGR.
What is MGR?
MySQL Group Replication (MGR) is an official MySQL feature based on the Paxos protocol that provides distributed state‑machine replication, achieving eventual consistency, high availability, scalability, and reliability.
MGR supports single‑primary and multi‑primary modes; the single‑primary mode is recommended and used by Laizhe Technology.
Why Choose MGR?
Compared with traditional master‑slave architectures, MGR requires fewer machines, supports complex data structures, avoids brain‑split issues with Paxos, and provides built‑in failover, though it needs at least three nodes for HA and a complete monitoring system.
Limitations and Characteristics
Minimum 3 nodes, maximum 9 nodes.
More nodes increase fault tolerance but may affect efficiency.
Only InnoDB engine is supported.
All tables must have a primary key.
Nodes are added one at a time.
IPv4 only.
Row‑based binlog required.
Filtering specific information is not supported.
Application at Laizhe Technology
When customers provide their own HA environment, Laizhe uses it; otherwise, a three‑node MGR cluster is built, with ProxySQL as a routing layer for read/write splitting and fault detection.
The architecture ensures automatic primary failover, secondary node isolation, and continued service when one or two nodes fail.
Architecture and Operation
A typical three‑node cluster processes transactions through conflict detection, GTID allocation, and group‑commit distribution. Successful transactions are written to binlog on the primary and relay logs on secondaries; failures trigger rollbacks.
mysql> SELECT * FROM replication_group_members;Recovery of a failed node can be local (using its relay log) or global (dumping binlog from a donor). The recovery thread uses the node’s gtid_executed to request missing transactions.
Performance Tuning
Parallel replication: set slave_parallel_workers , slave_preserve_commit_order=1 , slave_parallel_type=LOGICAL_CLOCK .
Compression: SET GLOBAL group_replication_compression_threshold = 2097152;
Adjust GCT: SET GLOBAL group_replication_poll_spin_loops = 10000;
Security
Whitelist nodes via group_replication_ip_whitelist .
Enable SSL for recovery channel with group_replication_recovery_use_ssl and related certificate variables.
Common Errors and Troubleshooting
Key steps include checking cluster state on all nodes, identifying the primary, reviewing error logs, and using performance_schema views.
SELECT MEMBER_HOST, MEMBER_PORT, MEMBER_STATE FROM performance_schema.replication_group_members m INNER JOIN performance_schema.global_status g ON m.MEMBER_ID=g.VARIABLE_VALUE AND VARIABLE_NAME='group_replication_primary_member';Typical errors such as IP whitelist rejections, GTID mismatches, or network partitions are addressed by adjusting configuration, performing full data restores (e.g., mysqldump --set-gtid-purged=ON ), or re‑adding nodes after fixing the root cause.
Useful Queries
SELECT TABLE_SCHEMA, TABLE_NAME FROM information_schema.tables WHERE TABLE_SCHEMA NOT IN ('information_schema','mysql','performance_schema','test','sys') AND TABLE_NAME NOT IN (SELECT table_name FROM information_schema.TABLE_CONSTRAINTS WHERE TABLE_SCHEMA NOT IN ('information_schema','mysql','performance_schema','test','sys') AND CONSTRAINT_TYPE='PRIMARY KEY'); SELECT TABLE_SCHEMA, TABLE_NAME FROM information_schema.tables WHERE TABLE_SCHEMA NOT IN ('information_schema','mysql','performance_schema','test','sys') AND ENGINE <> 'InnoDB';Author: Liu Tongyong. Feel free to like, comment, and follow.
Laiye Technology Team
Official account of Laiye Technology, featuring its best tech innovations, practical implementations, and cutting‑edge industry insights.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.