iQIYI's Custom MySQL High‑Availability Architecture
iQIYI’s custom MySQL HA solution replaces MHA's single-point-of-failure Manager with a Raft-based trio of HA Masters plus per-instance HA Agents. The system continuously monitors instances; automatically performs same-DC, cross-DC, or cross-region failovers; reconciles diff binlogs to preserve data consistency; and integrates with the company's CMDB, tracing, and chaos-engineering platforms.
iQIYI provides 24/7 video services to hundreds of millions of users. To ensure uninterrupted playback, its application and database services must adopt a high‑availability (HA) architecture.
The technical product team classifies applications by importance and assigns different SLA guarantees (e.g., S‑level applications have minute‑level RTO, A‑level applications have 10‑minute RTO). This article introduces iQIYI's MySQL HA solution.
Self‑Developed MySQL HA System
The solution is built on a secondary development of MHA (Master-HA), a mature open-source MySQL HA framework. MHA consists of a Manager and multiple Nodes. The Manager runs on a dedicated machine, monitors replication status and master health, and performs failover. Each Node runs alongside a MySQL instance; during failover it saves the dead master's binlog, applies relay-log differences to bring the other slaves into sync, and cleans up relay logs.
While MHA can complete a switchover within 30 seconds while preserving data consistency, it has drawbacks: the master-slave topology is defined in static configuration files, repeated switchovers are not supported, the Manager must be restarted whenever instances are added or removed, and the Manager itself is a single point of failure.
Because iQIYI's deployment spans multiple data centers and regions, and the cluster count is large, a custom HA solution was created.
MySQL HA Architecture Overview
The custom system comprises HA Master and HA Agent components. Three HA Masters form a minimal cluster unit, analogous to MHA’s Manager, and achieve HA via the Raft consensus protocol, eliminating the Manager single‑point‑of‑failure and supporting repeated failovers. HA Agents perform functions similar to MHA Nodes: fault detection, binlog parsing and transmission, relay‑log cleanup, and MGR (MySQL Group Replication) HA.
(1) HA Master
The HA Master handles the failover process. It periodically scans a list of “bad instances” and triggers automatic or manual switchover according to policies defined in the CMDB (same‑DC, cross‑DC, or cross‑region). The switchover flow is illustrated in the accompanying diagram.
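The scan-and-dispatch behavior described above can be sketched as follows. This is a minimal illustration, not iQIYI's actual implementation; all names (`choose_switch_scope`, `scan_bad_instances`, the policy keys) are hypothetical, and the fallback order mirrors the same-DC → cross-DC → cross-region policies mentioned in the text.

```python
def choose_switch_scope(policy: dict) -> str:
    """Map a CMDB switchover policy to a failover scope.

    Tries scopes in the order described in the article:
    same-DC first, then cross-DC, then cross-region.
    """
    for scope in ("same_dc", "cross_dc", "cross_region"):
        if policy.get(scope, False):
            return scope
    return "manual"  # no automatic policy configured: hand off to a DBA


def scan_bad_instances(bad_instances, cmdb):
    """Return (instance, scope) pairs for instances needing failover.

    bad_instances: list of instance names flagged as down.
    cmdb: mapping of instance name -> switchover policy dict.
    """
    actions = []
    for inst in bad_instances:
        policy = cmdb.get(inst, {})
        actions.append((inst, choose_switch_scope(policy)))
    return actions
```

In a real deployment this loop would run periodically inside the elected HA Master, with the "bad instances" list fed by agent heartbeat reports.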
In addition to master‑node failover, the system also supports slave‑node failover by detecting a failed slave and performing a DNS‑based switch to keep the slave service available.
(2) HA Agent
Agents monitor instances marked as “online” in the CMDB, checking the mysqld process and reporting heartbeat failures to the HA Master. If an instance is deemed down, the HA Master initiates a failover for that instance. To avoid false positives caused by network jitter, the agent timeout is set to 1 minute; short‑duration glitches are ignored.
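The jitter-tolerant detection described above can be sketched as a small state machine: an instance is reported down only after probe failures persist for the full timeout window (1 minute per the article), so any successful probe resets the clock. The class and method names here are hypothetical.

```python
JITTER_TIMEOUT_SECONDS = 60  # the article's 1-minute agent timeout


class HeartbeatMonitor:
    """Deems an instance down only after sustained probe failures."""

    def __init__(self, timeout: float = JITTER_TIMEOUT_SECONDS):
        self.timeout = timeout
        self.first_failure_at = None  # start of the current failure window

    def observe(self, healthy: bool, now: float) -> bool:
        """Record one probe result; return True if the instance is down."""
        if healthy:
            self.first_failure_at = None  # any success resets the window
            return False
        if self.first_failure_at is None:
            self.first_failure_at = now
        return (now - self.first_failure_at) >= self.timeout
```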
Agents also monitor the primary node of MySQL Group Replication (MGR). When a primary switch occurs, the agent rebinds the associated domain name to the new primary, making MGR failover transparent to business applications.
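A sketch of that rebinding logic is below. The `performance_schema` query is the standard way to find the MGR primary on MySQL 8.0+, but the DNS client and its `rebind` method are hypothetical stand-ins for iQIYI's internal DNS service.

```python
# Standard query for the current MGR primary (MySQL 8.0+).
MGR_PRIMARY_QUERY = (
    "SELECT MEMBER_HOST FROM performance_schema.replication_group_members "
    "WHERE MEMBER_ROLE = 'PRIMARY'"
)


class MgrDnsBinder:
    """Keeps a service domain pointed at the current MGR primary."""

    def __init__(self, domain: str, dns_client):
        self.domain = domain
        self.dns = dns_client          # hypothetical internal DNS API
        self.current_primary = None

    def on_poll(self, primary_host: str):
        """Rebind the domain only when the observed primary changes."""
        if primary_host != self.current_primary:
            self.dns.rebind(self.domain, primary_host)
            self.current_primary = primary_host
```

Because the agent rebinds only on change, steady-state polling generates no DNS churn, and applications resolving the domain follow the new primary automatically.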
(3) Master Election Rules
Given the complex multi‑DC environment, a detailed election algorithm is applied:
1. Exclude slaves listed in the “bad slaves” set.
2. Among the most up-to-date slaves, select the candidates with the highest configured priority.
3. If no priority is configured, consider all non-bad slaves.
4. Apply the switch-strategy preference: same-DC → same-region → cross-region.
5. Among eligible slaves, discard those on machines hosting excessive master/slave counts, then choose the slave on the machine with the most free disk space.
6. If no suitable slave is found, an alarm is sent to DBAs for manual intervention.
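The rules above can be sketched as a single filter-and-rank function. This is an illustrative reconstruction, not the production algorithm; every field name (`position`, `priority`, `instances_on_machine`, `free_disk_gb`) and the machine-load threshold are hypothetical.

```python
def elect_new_master(slaves, bad_slaves, failed_dc, failed_region,
                     max_instances_per_machine=8):
    """Pick a new master following the election rules in order."""
    # 1. Exclude slaves in the "bad slaves" set.
    candidates = [s for s in slaves if s["host"] not in bad_slaves]

    # 2. Keep only the most up-to-date slaves (largest applied position).
    if candidates:
        latest = max(s["position"] for s in candidates)
        candidates = [s for s in candidates if s["position"] == latest]

    # 3. Prefer the highest explicit priority; if none is set,
    #    all remaining non-bad candidates stay in play.
    prioritized = [s for s in candidates if s.get("priority") is not None]
    if prioritized:
        top = max(s["priority"] for s in prioritized)
        candidates = [s for s in prioritized if s["priority"] == top]

    # 4. Switch strategy: same-DC, then same-region, then cross-region.
    for pool in ([s for s in candidates if s["dc"] == failed_dc],
                 [s for s in candidates if s["region"] == failed_region],
                 candidates):
        if pool:
            candidates = pool
            break

    # 5. Drop overloaded machines, then pick the most free disk space.
    candidates = [s for s in candidates
                  if s["instances_on_machine"] <= max_instances_per_machine]
    if not candidates:
        return None  # 6. No suitable slave: alert DBAs for manual handling.
    return max(candidates, key=lambda s: s["free_disk_gb"])
```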
(4) Diff Binlog Completion
During master failover, three types of diff binlog may appear:
1. Incomplete relay logs on a slave (partial transactions or events).
2. Diff relay logs between the latest slave and the other slaves.
3. Unsent diff binlogs from the dead master, if it is still reachable.
The recovery order is illustrated in the diagram. For GTID-based replication, three diff binlog files are generated and applied sequentially. For non-GTID replication, the process first points the other slaves at the latest slave (via CHANGE MASTER TO), lets them catch up, and then applies the dead master's diff binlogs.
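The GTID-mode sequencing can be sketched as below. The kind names, file paths, and `apply_fn` hook are hypothetical; in practice each file would be replayed against MySQL (e.g., via mysqlbinlog piped into the client).

```python
# Fixed recovery order for the three kinds of diff binlog.
RECOVERY_ORDER = (
    "incomplete_relay_log",  # finish the slave's own partial relay log
    "latest_slave_diff",     # diffs between the latest slave and others
    "dead_master_diff",      # unsent binlogs pulled from the dead master
)


def apply_diff_binlogs(diff_files, apply_fn):
    """Apply available diff binlog files in the required order.

    diff_files: dict mapping a diff kind to its file path; a missing
    entry means that kind produced no diff during this failover.
    Returns the kinds actually applied, in order.
    """
    applied = []
    for kind in RECOVERY_ORDER:
        path = diff_files.get(kind)
        if path is not None:
            apply_fn(path)  # replay one diff file against the instance
            applied.append(kind)
    return applied
```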
(5) Data Consistency
If semi‑synchronous replication is used and the master crashes without a network timeout, the HA system can guarantee data consistency after failover. However, if a network timeout occurs at the moment of master failure, semi‑synchronous replication may degrade to asynchronous, potentially causing data loss. In such cases, business logic must provide compensation mechanisms. With MySQL Group Replication (MGR), data loss does not occur.
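The consistency reasoning above reduces to a small decision table, sketched here purely for illustration (the mode names are hypothetical labels, not MySQL settings):

```python
def failover_is_lossless(replication_mode: str,
                         degraded_to_async: bool = False) -> bool:
    """Can the HA system guarantee no data loss for this failover?"""
    if replication_mode == "mgr":
        return True                   # group replication: no data loss
    if replication_mode == "semi_sync":
        return not degraded_to_async  # lossless unless a timeout degraded it
    return False                      # plain async: loss is always possible
```

The `degraded_to_async` branch is the case the article warns about: a network timeout at the moment of master failure silently downgrades semi-sync, and only business-level compensation can recover the lost writes.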
Conclusion
iQIYI integrates multiple internal monitoring systems, asset management, CMDB, tracing, and a chaos‑engineering platform into a unified application‑operation platform. This platform offers one‑stop services such as probing, inspection, resource analysis, traceability, and fault‑injection drills. Continuous database attack‑defense exercises improve availability and security, achieving a state of “preparedness without fear”.
iQIYI Technical Product Team