Investigation of MySQL Group Replication Message Cache Behavior and Parameter Effects
This article examines how MySQL Group Replication's message cache fills under load and how the group_replication_message_cache_size and group_replication_member_expel_timeout parameters affect it. It reports experimental observations and offers practical recommendations for balancing reliability, memory usage, and data consistency.
The article begins by outlining a scenario with a three‑node MySQL Group Replication (MGR) cluster running version 8.0.21, where one node (C) experiences network instability, causing it to become UNREACHABLE while the other nodes continue processing transactions and storing messages in a cache.
In the experiment, all three nodes are configured with the minimum allowed group_replication_message_cache_size (128 MB), and group_replication_member_expel_timeout is increased so that the faulty node is not expelled. Cache usage is monitored while database load is applied until the cache is full.
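The configuration step can be sketched as a small Python helper that builds the two statements. The article does not state the timeout value it used, so the 3600 seconds below is a hypothetical choice (the documented upper bound in 8.0.21), and applying the settings with SET GLOBAL rather than a configuration file is likewise an assumption.

```python
# Minimal sketch: build the statements for the experiment's settings.
# Assumptions (not from the article): the timeout value of 3600 s and
# the use of SET GLOBAL instead of my.cnf or SET PERSIST.

MIN_CACHE_BYTES = 128 * 1024 * 1024   # 134217728 bytes, the 8.0.21 minimum
MAX_EXPEL_TIMEOUT_S = 3600            # upper bound of the timeout in 8.0.21


def build_settings(cache_bytes=MIN_CACHE_BYTES,
                   expel_timeout_s=MAX_EXPEL_TIMEOUT_S):
    """Return the SQL statements to run on every node of the group."""
    if cache_bytes < MIN_CACHE_BYTES:
        raise ValueError("cache size is below the 128 MB minimum")
    if not 0 <= expel_timeout_s <= MAX_EXPEL_TIMEOUT_S:
        raise ValueError("expel timeout must be between 0 and 3600 seconds")
    return [
        f"SET GLOBAL group_replication_message_cache_size = {cache_bytes};",
        f"SET GLOBAL group_replication_member_expel_timeout = {expel_timeout_s};",
    ]


for statement in build_settings():
    print(statement)
```

The same values should be applied to all three nodes, as in the experiment, so that every member keeps the same amount of message history.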
After the cache is confirmed full, the third node's network is deliberately disconnected. The remaining nodes keep processing transactions and keep writing new messages to the cache. Memory statistics are then reset on the primary node to observe eviction; after some time, the amount of memory the cache has released exceeds its configured size, indicating that the cached messages have been fully rotated out.
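The rotation check can be sketched as follows. The Performance Schema instrument memory/group_rpl/GCS_XCom::xcom_cache is the documented way to observe the cache, but the article does not show how the statistics were reset, so the TRUNCATE statement is an assumption, and the byte counts in the usage lines are hypothetical.

```python
# Minimal sketch: decide whether the cache has rotated completely since
# the memory statistics were reset. The SQL strings are only built here,
# not executed; running them requires a client connection to the primary.

XCOM_CACHE_EVENT = "memory/group_rpl/GCS_XCom::xcom_cache"

# Assumed reset method: truncating the summary table rebaselines counters.
RESET_SQL = (
    "TRUNCATE TABLE performance_schema.memory_summary_global_by_event_name;"
)

# Current cache usage and bytes freed since the last reset.
USAGE_SQL = (
    "SELECT CURRENT_NUMBER_OF_BYTES_USED, SUM_NUMBER_OF_BYTES_FREE "
    "FROM performance_schema.memory_summary_global_by_event_name "
    f"WHERE EVENT_NAME = '{XCOM_CACHE_EVENT}';"
)


def rotations(bytes_freed_since_reset, cache_size_bytes):
    """How many times the cache contents have been replaced since reset."""
    if cache_size_bytes <= 0:
        raise ValueError("cache size must be positive")
    return bytes_freed_since_reset / cache_size_bytes


def fully_rotated(bytes_freed_since_reset, cache_size_bytes):
    """True once the cache has freed at least its own size in bytes."""
    return rotations(bytes_freed_since_reset, cache_size_bytes) >= 1.0


# Hypothetical reading: 150 MiB freed against a 128 MiB cache.
cache_size = 128 * 1024 * 1024
freed = 150 * 1024 * 1024
print(fully_rotated(freed, cache_size))
```

Once this returns True, every message that was cached at the moment of the reset has been evicted, so a node that disconnected before the reset can no longer be served from the cache.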
When the failed node's network is restored, the node tries to retrieve the messages it missed from the cache, but they have already been evicted. It logs errors and leaves the cluster, and the auto‑rejoin mechanism then tries to bring it back using binlog replication.
The article concludes with two key MGR parameters:
group_replication_member_expel_timeout – determines how long the group waits before expelling a node it can no longer reach; a larger value improves automatic recovery but requires a larger message cache and increases the risk of reading stale data.
group_replication_message_cache_size – controls the amount of memory allocated for the message cache; a larger cache improves the chance of successful automatic catch‑up after failures but consumes more memory.
A set of practical tips is offered for choosing these parameters, balancing tolerance for unstable environments, automation needs, the probability of reading expired data, and physical resource consumption.
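The trade-off between the two parameters can be made concrete with a back-of-envelope estimate: the cache has to hold every message produced while a node is unreachable but not yet expelled. The sketch below is an approximation, not a formula from the article. It assumes a steady message rate (the 2 MiB/s figure is hypothetical), adds the 5-second detection period that precedes the expel timeout, and ignores any per-message overhead inside the cache.

```python
# Minimal sketch: relate cache size, expel timeout, and message rate.
# All inputs are illustrative; measure the real rate on your own cluster.

MIB = 1024 * 1024
MIN_CACHE_BYTES = 128 * MIB   # minimum allowed from MySQL 8.0.21
DETECTION_PERIOD_S = 5        # delay before a member is suspected


def required_cache_bytes(rate_bytes_per_s, expel_timeout_s):
    """Bytes of messages generated before an unreachable node is expelled."""
    window_s = DETECTION_PERIOD_S + expel_timeout_s
    return max(MIN_CACHE_BYTES, rate_bytes_per_s * window_s)


def max_covered_outage_s(cache_bytes, rate_bytes_per_s):
    """Longest outage whose messages still fit in the cache."""
    if rate_bytes_per_s <= 0:
        raise ValueError("message rate must be positive")
    return cache_bytes / rate_bytes_per_s


# Hypothetical workload: 2 MiB/s of group messages, 60 s expel timeout.
rate = 2 * MIB
print(required_cache_bytes(rate, 60) / MIB, "MiB needed")
print(max_covered_outage_s(MIN_CACHE_BYTES, rate), "s covered by 128 MiB")
```

Under these assumptions the minimum cache covers roughly one minute of outage, so raising the expel timeout beyond that without also raising the cache size would reproduce the failure seen in the experiment.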
Aikesheng Open Source Community
The Aikesheng Open Source Community provides stable, enterprise‑grade open‑source tools and services for MySQL, releases one high‑quality open‑source component each year on October 24 ("1024"), and continues to operate and maintain them.