Using Orchestrator for Automatic MySQL Cluster Failover: Configuration and Test Cases
This article demonstrates how to configure the open-source Orchestrator tool for automatic MySQL cluster failover, explains the key parameters, and presents three test cases: a normal failover, a failover blocked by replication lag, and the effect of disabling global recoveries.
Parameter Description:
Reference: https://github.com/openark/orchestrator/blob/master/go/config/config.go
Purpose:
Use Orchestrator to configure automatic failover for a MySQL cluster.
Managed database instances (one master, one slave):
10.186.65.5:3307
10.186.65.11:3307
Orchestrator-related parameters:
"RecoveryIgnoreHostnameFilters": [],
"RecoverMasterClusterFilters": ["*"],
"RecoverIntermediateMasterClusterFilters": ["*"],
"ReplicationLagQuery": "show slave status",
"ApplyMySQLPromotionAfterMasterFailover": true,
"FailMasterPromotionOnLagMinutes": 1,
All test scenarios below are executed on the raft-leader node.
Case 1:
Scenario:
Shut down the master and verify failover when the replication lag is less than FailMasterPromotionOnLagMinutes.
Operation:
# Confirm existing clusters
orchestrator-client -c clusters
# View topology, cluster is 10.186.65.11:3307
orchestrator-client -c topology -i 10.186.65.11:3307
# Stop master node
ssh [email protected] "service mysqld_3307 stop"
# Confirm clusters again; original cluster splits into two
orchestrator-client -c clusters
# View topology, now cluster is 10.186.65.5:3307
orchestrator-client -c topology -i 10.186.65.5:3307
Conclusion:
Failover succeeded.
The new master has read_only and super_read_only disabled, allowing read‑write operations.
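One way to confirm this from the shell is to query both variables on the new master. The sketch below is a small helper written for illustration, not part of Orchestrator: is_writable is a hypothetical function name, and the mysql invocation shown in the comment assumes client credentials are already configured (for example in ~/.my.cnf).

```shell
# Reads one line of "read_only<TAB>super_read_only" (each 0 or 1) from stdin
# and succeeds only when both are 0, i.e. the instance accepts writes.
# Returns 2 when no input line could be read.
is_writable() {
  read -r read_only super_read_only || return 2
  [ "$read_only" = "0" ] && [ "$super_read_only" = "0" ]
}

# Assumed usage against the new master (credentials are placeholders):
# mysql -h 10.186.65.5 -P 3307 -N -B \
#   -e 'SELECT @@global.read_only, @@global.super_read_only' \
#   | is_writable && echo "writable"
```

The -N and -B options make the mysql client print a bare, tab-separated row, which is what the helper expects.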
Case 2:
Scenario:
Shut down the master and verify failover when the replication lag exceeds FailMasterPromotionOnLagMinutes (configured as 1 minute).
Operation:
# Show current FailMasterPromotionOnLagMinutes value
orchestrator -c dump-config --ignore-raft-setup | jq .FailMasterPromotionOnLagMinutes
# Confirm existing clusters
orchestrator-client -c clusters
# View topology, cluster is 10.186.65.11:3307
orchestrator-client -c topology -i 10.186.65.11:3307
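# Create a delayed slave (e.g., 120 s); run these SQL statements on the slave (10.186.65.5:3307)
stop slave;
change master to master_delay=120;
start slave;
# or set the same delay from the shell with orchestrator-client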
orchestrator-client -c delay-replication -i 10.186.65.5:3307 -S 120
# Wait 120 s
sleep 120
# View topology, cluster remains 10.186.65.11:3307
orchestrator-client -c topology -i 10.186.65.11:3307
# Stop master node
ssh [email protected] "service mysqld_3307 stop"
# Confirm clusters and topology again
orchestrator-client -c clusters
orchestrator-client -c topology -i 10.186.65.11:3307
Conclusion:
No failover occurred.
When the slave lag exceeds FailMasterPromotionOnLagMinutes, failover is prevented.
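The lag check behind this result can be sketched in shell. This is a minimal illustration, not Orchestrator's own code: the function name promotion_blocked_by_lag is hypothetical, and the rule it encodes (block the promotion when the limit is greater than 0 and the candidate lags by at least that many minutes) follows the parameter's description in the config.go file referenced above.

```shell
# Sketch only. Arguments:
#   $1  candidate replica lag, in seconds
#   $2  value of FailMasterPromotionOnLagMinutes
# Succeeds (exit 0) when the promotion would be blocked by lag.
promotion_blocked_by_lag() {
  # A limit of 0 is treated as "lag check disabled".
  [ "$2" -gt 0 ] && [ "$1" -ge $(( $2 * 60 )) ]
}

# Case 2 values: 120 s of delay against a 1-minute limit.
promotion_blocked_by_lag 120 1 && echo "blocked" || echo "allowed"   # prints "blocked"
```

With the Case 1 situation (lag of a few seconds, limit of 1 minute) the same function reports "allowed".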
Case 3:
Scenario:
Disable global recoveries and shut down the master while lag is less than FailMasterPromotionOnLagMinutes.
Operation:
# Disable global recoveries
orchestrator-client -c disable-global-recoveries
orchestrator-client -c check-global-recoveries
# Confirm clusters and topology
orchestrator-client -c clusters
orchestrator-client -c topology -i 10.186.65.11:3307
# Stop master node
ssh [email protected] "service mysqld_3307 stop"
# Confirm clusters and topology again
orchestrator-client -c clusters
orchestrator-client -c topology -i 10.186.65.11:3307
Conclusion:
No failover occurred.
Disabling global recoveries prevents automatic failover.
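Because this switch is global, recoveries stay off until they are turned back on with orchestrator-client -c enable-global-recoveries. The sketch below is a hypothetical wrapper (the names with_recoveries_disabled and ORC_CLIENT are invented here) that disables recoveries, runs a command, and re-enables recoveries afterwards even if the command fails.

```shell
# Client binary to call; overridable so the wrapper can be exercised
# against a stub. ORC_CLIENT is an assumption of this sketch.
ORC_CLIENT="${ORC_CLIENT:-orchestrator-client}"

# Runs "$@" with global recoveries disabled, then re-enables them and
# returns the exit status of the wrapped command.
with_recoveries_disabled() {
  "$ORC_CLIENT" -c disable-global-recoveries || return 1
  "$@"
  wrapped_status=$?
  "$ORC_CLIENT" -c enable-global-recoveries
  return "$wrapped_status"
}

# Assumed usage (maintenance.sh is a placeholder script):
# with_recoveries_disabled ./maintenance.sh
```

Running orchestrator-client -c check-global-recoveries afterwards, as in the operation steps above, confirms which state the switch was left in.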
Summary:
After configuring Orchestrator, automatic failover is controlled by the filter parameters RecoveryIgnoreHostnameFilters, RecoverMasterClusterFilters, and RecoverIntermediateMasterClusterFilters, and by the lag-related settings FailMasterPromotionOnLagMinutes and ReplicationLagQuery. When the slave's lag exceeds the configured number of minutes, or when global recoveries are disabled, failover does not occur.
Testing scenarios are limited; further tests may be needed for specific cases.
Aikesheng Open Source Community
The Aikesheng Open Source Community provides stable, enterprise‑grade MySQL open‑source tools and services, releases a premium open‑source component each year (1024), and continuously operates and maintains them.