
MySQL High Availability Architecture and Practices at AutoHome

This article explains MySQL high‑availability concepts, defines HA, RPO and RTO, outlines common HA architectures such as master‑slave+VIP, MHA and MGR+Proxy, and details AutoHome's evolution from simple master‑slave setups to a container‑based MGR solution with automated failover and monitoring platforms.

HomeTech

MySQL, being open‑source, easy to operate, and high‑performance, is the most widely used database at AutoHome; as a critical backend storage component, its high availability (HA) is essential.

Compared with commercial databases, achieving HA with open‑source MySQL requires users to design and develop the solution themselves. This article introduces the development history and implementation practice of AutoHome's MySQL HA architecture.

1. HA definition and metrics

High availability (HA) is a system's ability to keep operating without interruption. Two metrics quantify it: Recovery Point Objective (RPO), the maximum amount of data loss that can be tolerated after a failure, usually expressed as a window of time; and Recovery Time Objective (RTO), the maximum acceptable time to restore the system to a running state.

Figure 1: RPO calculation

Figure 2: RTO calculation
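As a rough illustration of the two metrics, the sketch below measures the data-loss window and the recovery time actually achieved in one incident, which can then be compared against the RPO and RTO targets. The function names and timestamps are hypothetical and chosen only for this example.

```python
from datetime import datetime, timedelta


def achieved_data_loss_window(last_replicated_commit: datetime,
                              failure_time: datetime) -> timedelta:
    """Writes committed after the last replicated commit are lost on failover."""
    return max(failure_time - last_replicated_commit, timedelta(0))


def achieved_recovery_time(failure_time: datetime,
                           service_restored: datetime) -> timedelta:
    """Elapsed time between the failure and the service accepting traffic again."""
    return service_restored - failure_time


# Hypothetical incident: the master fails at 12:00:00, the last transaction
# that reached a slave was committed at 11:59:58, and service resumes at 12:02:30.
failure = datetime(2024, 1, 1, 12, 0, 0)
loss_window = achieved_data_loss_window(datetime(2024, 1, 1, 11, 59, 58), failure)
recovery_time = achieved_recovery_time(failure, datetime(2024, 1, 1, 12, 2, 30))

print(loss_window.total_seconds())    # 2.0
print(recovery_time.total_seconds())  # 150.0
```

An incident meets its objectives when the measured data-loss window does not exceed the RPO and the measured recovery time does not exceed the RTO.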

2. MySQL HA challenges

The core HA problem is that when a MySQL instance crashes, the service must stay available with no data loss (RPO) and a short recovery time (RTO). The challenges are preventing data loss when the master fails suddenly, keeping data consistent across nodes, and failing over automatically with minimal impact on the business.

3. Common MySQL HA architectures

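3.1 Master‑Slave Replication + VIP – applications connect through a virtual IP (VIP) that is moved to the promoted slave when the master fails, so clients follow the failover without reconfiguration; DBA scripts are used for manual switchovers.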

Figure 5: Master‑Slave Replication + VIP

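3.2 Master‑Slave Replication + MHA – MHA (Master High Availability) is a third‑party tool that, upon master failure, promotes the most up‑to‑date slave, applies the missing binary log events to the remaining slaves, and rebuilds the master‑slave topology. Data loss is avoided as long as the failed master's binary logs can still be retrieved or every committed transaction has already reached a slave (for example, with semi‑synchronous replication).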

Figure 6: Master‑Slave Replication + MHA
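The promotion step can be sketched as follows. This is a simplified illustration of the idea and not MHA's actual code: it assumes each slave's replication position has already been read (the field names mirror `Master_Log_File` and `Read_Master_Log_Pos` from `SHOW SLAVE STATUS`), and the `SlaveStatus` class and host names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SlaveStatus:
    host: str
    master_log_file: str      # e.g. "mysql-bin.000042"
    read_master_log_pos: int  # byte offset received within that file


def binlog_coordinate(slave: SlaveStatus) -> Tuple[int, int]:
    """Turn (file, position) into a tuple that sorts in replication order."""
    file_index = int(slave.master_log_file.rsplit(".", 1)[1])
    return (file_index, slave.read_master_log_pos)


def pick_new_master(slaves: List[SlaveStatus]) -> SlaveStatus:
    """Choose the slave that has received the most binary log events."""
    if not slaves:
        raise ValueError("no slave available for promotion")
    return max(slaves, key=binlog_coordinate)


def slaves_needing_catch_up(slaves: List[SlaveStatus],
                            new_master: SlaveStatus) -> List[str]:
    """Hosts that are behind the new master and need the missing events applied."""
    target = binlog_coordinate(new_master)
    return [s.host for s in slaves if binlog_coordinate(s) < target]


slaves = [
    SlaveStatus("db2", "mysql-bin.000042", 1200),
    SlaveStatus("db3", "mysql-bin.000042", 1800),
    SlaveStatus("db4", "mysql-bin.000041", 9000),
]
new_master = pick_new_master(slaves)
print(new_master.host)                              # db3
print(slaves_needing_catch_up(slaves, new_master))  # ['db2', 'db4']
```

Comparing the numeric file index before the offset matters: `db4` has the largest offset but is still in an older binary log file, so it is behind both other slaves.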

3.3 MySQL Group Replication (MGR) + Proxy – MGR provides HA, strong consistency, and automatic primary election; combined with a proxy, applications can switch to the new primary without reconfiguration.

Figure 7: MGR Replication + Proxy
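How a proxy might decide where to send writes can be sketched as below. The sketch assumes the proxy periodically reads the group membership, with rows shaped like MySQL 8.0's `performance_schema.replication_group_members` table (columns `MEMBER_HOST`, `MEMBER_PORT`, `MEMBER_STATE`, `MEMBER_ROLE`). The function names and sample rows are hypothetical, and a real proxy performs further checks.

```python
from typing import Dict, List, Optional, Tuple

Member = Dict[str, object]


def has_quorum(members: List[Member]) -> bool:
    """A group can only accept writes while a majority of its members is ONLINE."""
    online = sum(1 for m in members if m["MEMBER_STATE"] == "ONLINE")
    return online > len(members) // 2


def current_primary(members: List[Member]) -> Optional[Tuple[str, int]]:
    """Return (host, port) of the writable primary, or None if writes are unsafe."""
    if not has_quorum(members):
        return None
    for m in members:
        if m["MEMBER_ROLE"] == "PRIMARY" and m["MEMBER_STATE"] == "ONLINE":
            return (str(m["MEMBER_HOST"]), int(m["MEMBER_PORT"]))
    return None


def member(host: str, state: str, role: str) -> Member:
    return {"MEMBER_HOST": host, "MEMBER_PORT": 3306,
            "MEMBER_STATE": state, "MEMBER_ROLE": role}


before_failure = [member("db1", "ONLINE", "PRIMARY"),
                  member("db2", "ONLINE", "SECONDARY"),
                  member("db3", "ONLINE", "SECONDARY")]
# After db1 fails and is expelled, the group elects db2 as the new primary.
after_failover = [member("db2", "ONLINE", "PRIMARY"),
                  member("db3", "ONLINE", "SECONDARY")]
# View from an isolated old primary: it cannot see a majority.
isolated_view = [member("db1", "ONLINE", "PRIMARY"),
                 member("db2", "UNREACHABLE", "SECONDARY"),
                 member("db3", "UNREACHABLE", "SECONDARY")]

print(current_primary(before_failure))  # ('db1', 3306)
print(current_primary(after_failover))  # ('db2', 3306)
print(current_primary(isolated_view))   # None
```

Because the application only ever talks to the proxy, the change from `db1` to `db2` is invisible to it. The quorum check keeps the proxy from routing writes to a primary that has been cut off from the majority of the group.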

4. AutoHome MySQL HA practice

4.1 Development stages – (1) Master‑Slave + VIP era (pre‑2016), (2) Master‑Slave + MHA era (since 2016), (3) MGR + automation platform era (since 2020), each improving fault detection, automatic failover, and data consistency.

Figure 8: AutoHome MySQL HA evolution

4.2 HA operation platform – consists of three parts: MGR replication architecture, a Prometheus‑based monitoring platform that detects master failures, and an automated operation platform that performs failover within 2‑3 minutes.

Figure 9: HA design diagram

4.3 Containerized MySQL HA – MySQL runs in Kubernetes; the MySQL‑Operator monitors master status every 10 seconds and triggers the HA module after three consecutive failures, achieving failover in 1‑2 minutes.

Figure 11: Container deployment of MySQL HA
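The detection rule described above (probe every 10 seconds, act after three consecutive failures) can be sketched as a small counter. This is an illustrative model and not the MySQL‑Operator's code; the class and function names are hypothetical.

```python
from typing import Iterable, Optional

PROBE_INTERVAL_SECONDS = 10  # from the text: master status is checked every 10 s
FAILURE_THRESHOLD = 3        # from the text: three consecutive failures trigger HA


class ConsecutiveFailureDetector:
    """Reports a failure only after `threshold` failed probes in a row."""

    def __init__(self, threshold: int = FAILURE_THRESHOLD) -> None:
        self.threshold = threshold
        self.consecutive_failures = 0

    def observe(self, healthy: bool) -> bool:
        """Record one probe result; return True when failover should start."""
        if healthy:
            # A single successful probe resets the count, which filters out
            # short blips such as network jitter.
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        return self.consecutive_failures >= self.threshold


def first_trigger(probe_results: Iterable[bool]) -> Optional[int]:
    """Return the 1-based probe number at which failover would be triggered."""
    detector = ConsecutiveFailureDetector()
    for probe_number, healthy in enumerate(probe_results, start=1):
        if detector.observe(healthy):
            return probe_number
    return None


# Two failed probes followed by a recovery do not trigger; three in a row do.
print(first_trigger([True, False, False, True, False, False, False]))  # 7
print(first_trigger([False, True, False, False]))                      # None
```

With these settings, detection alone takes roughly 20 to 30 seconds after the master stops responding (ignoring probe timeouts), depending on where the failure falls between probes. The remainder of the 1‑2 minute failover is spent in the HA module.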

5. Future plans

AutoHome intends to further tune the cluster to avoid automatic master switches caused by network jitter or large transactions, and to explore intelligent self‑healing mechanisms for database faults.

In summary, AutoHome’s MySQL HA solution combines MGR replication, monitoring, and an automated operation platform to provide rapid, automatic failover for both physical and containerized MySQL instances, ensuring high service stability.
