Analysis of MHA Master Crash Failover Process and Source Code
This article examines the MHA open‑source MySQL high‑availability solution, detailing the master‑crash failover workflow, source‑code analysis, configuration checks, binlog handling, new‑master selection, and slave recovery, and provides a concise step‑by‑step checklist for practitioners.
Introduction: MHA has been an open‑source MySQL HA solution for nearly a decade. Although it is now considered outdated, the author still walks through its master‑crash failover logic.
Source code analysis: The main failover routine is shown, highlighting configuration checks, SSH connectivity, binlog handling, GTID vs non‑GTID paths, selection of a new master, and recovery of slaves. Key functions such as do_master_failover, init_config, force_shutdown, save_master_binlog, and recover_slaves are described.
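The GTID and non‑GTID paths mentioned above differ mainly in whether the dead master's binlog is saved and in how the reference slave is found. The following is a minimal Python sketch of that branching and of the "assume failure until the last step succeeds" error‑code pattern. It is an illustration only, not MHA's Perl code: plan_phases and run_failover are hypothetical helpers, and only the phase names are taken from the routine discussed here.

```python
from typing import Callable, List, Tuple


def plan_phases(gtid_auto_pos: bool) -> List[str]:
    """Return the ordered step names for one failover run (hypothetical helper)."""
    phases = ["check_settings", "force_shutdown", "check_set_latest_slaves"]
    if gtid_auto_pos:
        # GTID path: no binlog copy from the dead master.
        phases.append("get_most_advanced_latest_slave")
    else:
        # Non-GTID path: save the dead master's binlog first.
        phases += ["save_master_binlog", "find_latest_base_slave"]
    phases += ["select_new_master", "recover_master", "recover_slaves"]
    return phases


def run_failover(steps: List[Tuple[str, Callable[[], None]]]) -> int:
    """Run steps in order; return 0 only if every step completes.

    Simplification: here a step signals failure by raising, whereas the
    Perl routine takes its final error code from recover_slaves.
    """
    error_code = 1  # pessimistic default
    try:
        for name, step in steps:
            step()
        error_code = 0
    except Exception as exc:
        print(f"Got ERROR in {name}: {exc}")
    return error_code
```

Starting from a failing error code means that any exception raised part‑way through leaves the caller with a non‑zero result, without each phase having to set it explicitly.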
sub main {
  ...
  # Run the failover inside eval so that a die() becomes an error code.
  eval { $error_code = do_master_failover(); };
  if ($@) { $error_code = 1; }
  if ($error_code) { finalize_on_error(); }
  return $error_code;
}
...
sub do_master_failover {
  my $error_code = 1;    # stays 1 unless recover_slaves() returns 0
  my ($dead_master, $new_master);
  eval {
    my ($servers_config_ref, $binlog_server_ref) = init_config();
    $log->info("Starting master failover.");

    # Phase 1: validate the configuration and identify the dead master.
    $log->info("* Phase 1: Configuration Check Phase..\n");
    MHA::ServerManager::init_binlog_server($binlog_server_ref, $log);
    $dead_master = check_settings($servers_config_ref);
    if ($_server_manager->is_gtid_auto_pos_enabled()) {
      $log->info("Starting GTID based failover.");
    } else {
      $_server_manager->force_disable_log_bin_if_auto_pos_disabled();
      $log->info("Starting Non-GTID based failover.");
    }
    $log->info("* Phase 1: Configuration Check Phase completed.\n");

    # Phase 2: make sure applications can no longer reach the dead master.
    $log->info("* Phase 2: Dead Master Shutdown Phase..\n");
    force_shutdown($dead_master);
    $log->info("* Phase 2: Dead Master Shutdown Phase completed.\n");

    # Phase 3: find the latest slaves, then pick and recover the new master.
    $log->info("* Phase 3: Master Recovery Phase..\n");
    check_set_latest_slaves();
    if (!$_server_manager->is_gtid_auto_pos_enabled()) {
      # Only the non-GTID path saves the dead master's binlog.
      $log->info("* Phase 3.2: Saving Dead Master's Binlog Phase..\n");
      save_master_binlog($dead_master);
    }
    $log->info("* Phase 3.3: Determining New Master Phase..\n");
    my $latest_base_slave;
    if ($_server_manager->is_gtid_auto_pos_enabled()) {
      $latest_base_slave = $_server_manager->get_most_advanced_latest_slave();
    } else {
      $latest_base_slave = find_latest_base_slave($dead_master);
    }
    $new_master = select_new_master($dead_master, $latest_base_slave);
    my ($master_log_file, $master_log_pos, $exec_gtid_set) =
      recover_master($dead_master, $new_master, $latest_base_slave, $binlog_server_ref);
    $new_master->{activated} = 1;
    $log->info("* Phase 3: Master Recovery Phase completed.\n");

    # Phase 4: recover the remaining slaves; this call sets the final error code.
    $log->info("* Phase 4: Slaves Recovery Phase..\n");
    $error_code = recover_slaves($dead_master, $new_master, $latest_base_slave,
      $master_log_file, $master_log_pos, $exec_gtid_set);
    if ($g_remove_dead_master_conf && $error_code == 0) {
      MHA::Config::delete_block_and_save($g_config_file, $dead_master->{id}, $log);
    }
    cleanup();
  };
  if ($@) {
    # A condition flagged not_error is logged as information, not as an error.
    if ($dead_master && $dead_master->{not_error}) { $log->info($@); }
    else { MHA::ManagerUtil::print_error("Got ERROR: $@", $log); }
    $_server_manager->disconnect_all() if $_server_manager;
    undef $@;
  }
  # Reporting runs in its own eval so a report failure cannot change the result.
  eval { send_report($dead_master, $new_master); };
  return $error_code;
}

Step‑by‑step summary: (1) Check configuration, node versions, SSH reachability, and slave status; (2) Isolate the failed master: stop the I/O threads on the surviving slaves and run the VIP failover scripts; (3) Gather slave status, save binlogs from the dead master (non‑GTID path only), and determine the most advanced slave; (4) Choose a new master based on GTID, replication lag, and candidate flags; (5) Recover the new master, start replication from it on the remaining slaves, and clean up.
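
Step (4) of the summary can be made concrete with a small sketch. The Python below shows one way to rank slaves by candidate flag and freshness while excluding unsuitable ones. It is loosely modeled on the criteria listed above, not a transcription of MHA's select_new_master: the Slave fields, the is_bad helper, and the exact tier order are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Slave:
    # All field names are hypothetical, chosen for this sketch.
    host: str
    is_latest: bool = False         # has received the most recent events
    candidate_master: bool = False  # operator prefers this host as master
    no_master: bool = False         # operator forbids promoting this host
    log_bin: bool = True            # binary logging is enabled
    too_far_behind: bool = False    # replication lag judged excessive


def is_bad(slave: Slave) -> bool:
    """A slave that must never be promoted, whatever its other flags say."""
    return slave.no_master or not slave.log_bin or slave.too_far_behind


def select_new_master(slaves: List[Slave]) -> Optional[Slave]:
    """Pick the first usable slave from the highest-priority tier."""
    usable = [s for s in slaves if not is_bad(s)]
    tiers = [
        lambda s: s.candidate_master and s.is_latest,  # preferred and up to date
        lambda s: s.candidate_master,                  # preferred
        lambda s: s.is_latest,                         # up to date
        lambda s: True,                                # any usable slave
    ]
    for matches in tiers:
        for slave in usable:
            if matches(slave):
                return slave
    return None  # nothing promotable; the caller should abort the failover
```

Filtering out unusable slaves before ranking keeps the veto rules separate from the preference rules, so a preferred host with binary logging disabled is never chosen by accident.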
The article concludes with a concise checklist of the failover process and references to related MySQL troubleshooting posts.
Aikesheng Open Source Community
The Aikesheng Open Source Community provides stable, enterprise‑grade MySQL open‑source tools and services, releases a premium open‑source component every year on October 24 ("1024"), and continuously operates and maintains them.