MySQL High Availability Architecture and Practices at AutoHome

This article explains MySQL high‑availability concepts, defines HA, RPO and RTO, outlines common HA architectures such as master‑slave+VIP, MHA and MGR+Proxy, and details AutoHome's evolution from simple master‑slave setups to a container‑based MGR solution with automated failover and monitoring platforms.

HomeTech

Apr 6, 2022

Because MySQL is open‑source, easy to operate, and performant, it is the most widely used database at AutoHome. As a critical backend storage component, it must be highly available (HA).

Compared with commercial databases, achieving HA with open‑source MySQL requires users to design and develop the solution themselves. This article introduces the development history and implementation practice of AutoHome's MySQL HA architecture.

1. HA definition and metrics

High availability (HA) is a system's ability to keep operating without interruption. Two metrics quantify it: the Recovery Point Objective (RPO), the maximum amount of data loss, expressed as a window of time, that can be tolerated after a failure; and the Recovery Time Objective (RTO), the maximum acceptable time to restore the system to a running state.

Figure 1: RPO calculation

Figure 2: RTO calculation
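As a rough worked example of the two metrics, the sketch below computes the data-loss window and the recovery time actually achieved in one incident, which can then be compared against the RPO and RTO targets. The function names and timestamps are hypothetical and purely illustrative; they do not come from AutoHome's systems.

```python
from datetime import datetime, timedelta


def achieved_rpo(last_recoverable: datetime, failure: datetime) -> timedelta:
    """Data-loss window: writes made after the last recoverable point are lost."""
    return max(failure - last_recoverable, timedelta(0))


def achieved_rto(failure: datetime, service_restored: datetime) -> timedelta:
    """Downtime: time from the failure until the service is running again."""
    return service_restored - failure


# Hypothetical incident timeline (illustrative values only).
failure = datetime(2022, 1, 1, 10, 0, 0)
last_replicated = datetime(2022, 1, 1, 9, 59, 58)  # last transaction that reached a replica
restored = datetime(2022, 1, 1, 10, 2, 30)         # new master starts accepting writes

print(achieved_rpo(last_replicated, failure))  # 0:00:02
print(achieved_rto(failure, restored))         # 0:02:30
```

In this made-up timeline, two seconds of writes are lost and the service is down for two and a half minutes; an HA design is judged by how small it can keep both numbers.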

2. MySQL HA challenges

The core HA problem is keeping the service available when a MySQL instance crashes, with as little data loss as possible (low RPO) and as short a recovery time as possible (low RTO). The challenges are preventing data loss when the master fails suddenly, maintaining data consistency across nodes, and failing over automatically with minimal business impact.

3. Common MySQL HA architectures

3.1 Master‑Slave Replication + VIP – applications connect through a virtual IP (VIP) that moves to the new master on failover, so they need no reconfiguration; DBAs can also switch manually with scripts.

Figure 5: Master‑Slave Replication + VIP

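3.2 Master‑Slave Replication + MHA – MHA (Master High Availability) is a third‑party tool that, when the master fails, transfers the missing binary log events to the slaves and rebuilds the master‑slave topology around a new master, keeping data loss to a minimum. If the failed master's binary logs cannot be reached, events that had not yet been replicated may still be lost.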

Figure 6: Master‑Slave Replication + MHA
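One step in this kind of failover is working out which slave has received the most binary log events from the failed master. The sketch below is a simplified model of that comparison, not MHA's actual code: the `Replica` class, the helper functions, and the host names are hypothetical, and the two fields mirror the `Master_Log_File` and `Read_Master_Log_Pos` columns of `SHOW SLAVE STATUS`.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Replica:
    host: str
    master_log_file: str      # e.g. "mysql-bin.000012" (Master_Log_File)
    read_master_log_pos: int  # Read_Master_Log_Pos


def log_coordinates(replica: Replica) -> Tuple[int, int]:
    """Turn (file name, position) into a tuple that sorts in replication order."""
    file_index = int(replica.master_log_file.rsplit(".", 1)[1])
    return (file_index, replica.read_master_log_pos)


def most_advanced(replicas: List[Replica]) -> Replica:
    """Return the replica that has read furthest into the master's binary log."""
    if not replicas:
        raise ValueError("no replicas available for promotion")
    return max(replicas, key=log_coordinates)


# Hypothetical replica states captured right after the master failed.
replicas = [
    Replica("db-replica-1", "mysql-bin.000012", 4096),
    Replica("db-replica-2", "mysql-bin.000013", 120),
    Replica("db-replica-3", "mysql-bin.000012", 9000),
]

print(most_advanced(replicas).host)  # db-replica-2
```

The other slaves are then brought up to the same point before the topology is rebuilt, which is what keeps them consistent with the new master.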

3.3 MySQL Group Replication (MGR) + Proxy – MGR provides HA, strong consistency, and automatic primary election; combined with a proxy, applications can switch to the new primary without reconfiguration.

Figure 7: MGR Replication + Proxy
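MGR needs a majority of group members to agree before the group can keep accepting writes or elect a new primary, so a group of n members tolerates floor((n - 1) / 2) failures. The sketch below works through that arithmetic and shows how a proxy might pick the current primary from rows shaped like the `performance_schema.replication_group_members` table. It assumes MySQL 8.0, where that table has a `MEMBER_ROLE` column; the helper names and host names are hypothetical.

```python
from typing import Dict, List, Optional


def majority(group_size: int) -> int:
    """Smallest number of members that forms a majority."""
    return group_size // 2 + 1


def tolerated_failures(group_size: int) -> int:
    """How many members can fail while a majority still remains."""
    return (group_size - 1) // 2


def has_quorum(group_size: int, reachable: int) -> bool:
    return reachable >= majority(group_size)


def find_primary(members: List[Dict[str, str]]) -> Optional[str]:
    """Return the host of the ONLINE primary, or None if there is none."""
    for row in members:
        if row["MEMBER_ROLE"] == "PRIMARY" and row["MEMBER_STATE"] == "ONLINE":
            return row["MEMBER_HOST"]
    return None


# Hypothetical rows, as a proxy might read them after a primary election.
members = [
    {"MEMBER_HOST": "mgr-node-1", "MEMBER_STATE": "UNREACHABLE", "MEMBER_ROLE": "SECONDARY"},
    {"MEMBER_HOST": "mgr-node-2", "MEMBER_STATE": "ONLINE", "MEMBER_ROLE": "PRIMARY"},
    {"MEMBER_HOST": "mgr-node-3", "MEMBER_STATE": "ONLINE", "MEMBER_ROLE": "SECONDARY"},
]

print(tolerated_failures(3))  # 1
print(has_quorum(3, 2))       # True
print(find_primary(members))  # mgr-node-2
```

This is why MGR groups are usually deployed with an odd number of members: going from three to four nodes raises the majority from two to three without tolerating any additional failure.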

4. AutoHome MySQL HA practice

4.1 Development stages – (1) Master‑Slave + VIP era (pre‑2016), (2) Master‑Slave + MHA era (since 2016), (3) MGR + automation platform era (since 2020), each improving fault detection, automatic failover, and data consistency.

Figure 8: AutoHome MySQL HA evolution

4.2 HA operation platform – consists of three parts: MGR replication architecture, a Prometheus‑based monitoring platform that detects master failures, and an automated operation platform that performs failover within 2‑3 minutes.

Figure 9: HA design diagram

4.3 Containerized MySQL HA – MySQL runs in Kubernetes; the MySQL‑Operator monitors master status every 10 seconds and triggers the HA module after three consecutive failures, achieving failover in 1‑2 minutes.

Figure 11: Container deployment of MySQL HA
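The "three consecutive failures" rule can be sketched as a small counter that resets whenever a probe succeeds, so a single dropped check does not trigger a switch. This is a minimal illustration of the rule as described above, not the MySQL‑Operator's implementation; the class and constant names are hypothetical.

```python
from typing import Iterable, Optional

CHECK_INTERVAL_S = 10   # probe interval stated in the article
FAILURE_THRESHOLD = 3   # consecutive failures before the HA module is triggered


class FailureDetector:
    """Counts consecutive failed probes and fires once when the threshold is hit."""

    def __init__(self, threshold: int = FAILURE_THRESHOLD) -> None:
        self.threshold = threshold
        self.consecutive_failures = 0

    def observe(self, healthy: bool) -> bool:
        """Record one probe result; return True if failover should start now."""
        if healthy:
            self.consecutive_failures = 0  # any success resets the streak
            return False
        self.consecutive_failures += 1
        # Fire exactly once, on the probe that reaches the threshold.
        return self.consecutive_failures == self.threshold


def first_trigger_index(probes: Iterable[bool]) -> Optional[int]:
    """Index of the probe that triggers failover, or None if none does."""
    detector = FailureDetector()
    for index, healthy in enumerate(probes):
        if detector.observe(healthy):
            return index
    return None


# Two isolated failures are ignored; the third failure in a row triggers failover.
probes = [True, False, False, True, False, False, False]
print(first_trigger_index(probes))           # 6
print(CHECK_INTERVAL_S * FAILURE_THRESHOLD)  # 30
```

With a 10-second interval and a threshold of three, detection alone takes up to roughly 30 seconds after the master actually goes down, ignoring probe timeouts. The threshold is a trade-off: a lower value shortens detection time but makes spurious switches during brief network jitter more likely.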

5. Future plans

AutoHome intends to further tune the cluster to avoid automatic master switches caused by network jitter or large transactions, and to explore intelligent self‑healing mechanisms for database faults.

In summary, AutoHome’s MySQL HA solution combines MGR replication, monitoring, and an automated operation platform to provide rapid, automatic failover for both physical and containerized MySQL instances, ensuring high service stability.
