Achieving High Availability for MySQL & Redis on MaShang Cloud with Distributed Sentinel
This article explains MaShang Cloud's RDS high‑availability design: the distributed sentinel monitoring system, the proxy layer, multi‑AZ disaster‑recovery strategies, and real‑world cases showing how MySQL and Redis services maintain continuous, consistent access with minimal RTO and RPO.
Introduction
MaShang Cloud provides relational database (RDS) products such as MySQL, TiDB, and PostgreSQL, offering transparent access through virtual IPs, EIPs, or domain names. This article focuses on the MySQL‑based high‑availability design, architecture, and comparisons with other cloud providers.
High‑Availability Architecture
Engineers designed a multi‑node distributed sentinel system that monitors both MySQL clusters and proxy instances, automatically detecting failures and performing failover without relying on simple ping checks.
Key components include:
DB Sentinel Cluster – monitors MySQL/RDS node health.
Proxy Sentinel Cluster – monitors proxy service health.
Synchronization Configuration Service – stores topology and role information in Zookeeper/etcd/Nacos.
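The synchronization configuration service can be pictured as a small key/value store that records which node currently holds which role. The sketch below uses a plain in‑memory dict in place of Zookeeper/etcd/Nacos, and the path layout (`/rds/<cluster>/<host>`) and method names are illustrative assumptions, not MaShang Cloud's actual schema.

```python
import json
from typing import Optional

class TopologyStore:
    """Minimal stand-in for the synchronization configuration service.
    A real deployment would back this with Zookeeper, etcd, or Nacos."""

    def __init__(self):
        self._kv = {}  # path -> JSON-encoded node metadata

    def register_node(self, cluster: str, host: str, role: str) -> None:
        # Record a node's role ("primary" or "replica") under the cluster path.
        self._kv[f"/rds/{cluster}/{host}"] = json.dumps({"role": role})

    def primary_of(self, cluster: str) -> Optional[str]:
        # Return the host currently registered as primary, if any.
        prefix = f"/rds/{cluster}/"
        for path, raw in self._kv.items():
            if path.startswith(prefix) and json.loads(raw)["role"] == "primary":
                return path[len(prefix):]
        return None

    def promote(self, cluster: str, new_primary: str) -> None:
        # Demote the old primary and promote the chosen replica,
        # as a sentinel would do after a confirmed failover.
        old = self.primary_of(cluster)
        if old is not None:
            self.register_node(cluster, old, "replica")
        self.register_node(cluster, new_primary, "primary")
```

In this model, sentinels write role changes after a failover and proxies watch the same paths to refresh their routing tables, which is what keeps the two clusters consistent without direct coupling.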
Proxy Layer
The stateless proxy cluster handles user requests, routes traffic based on read/write roles, enforces access control, maintains connection pools, and can filter or throttle SQL statements. Common proxy software includes ProxySQL, DBProxy, MySQL Proxy, ArkProxy, and Sharding‑Sphere Proxy.
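The read/write routing that the proxy layer performs can be sketched as a per‑statement decision: reads fan out across replicas, while writes and locking reads go to the primary. The endpoints and prefix list below are illustrative assumptions; real proxies such as ProxySQL use far richer rule engines.

```python
import random

# Hypothetical backends; a real proxy loads these from the config service.
PRIMARY = "10.0.0.1:3306"
REPLICAS = ["10.0.0.2:3306", "10.0.0.3:3306"]

READ_PREFIXES = ("SELECT", "SHOW", "EXPLAIN", "DESC")

def route(sql: str) -> str:
    """Pick a backend for one statement: plain reads go to a random
    replica; writes and anything ambiguous go to the primary."""
    stmt = sql.lstrip().upper()
    if stmt.startswith(READ_PREFIXES) and "FOR UPDATE" not in stmt:
        return random.choice(REPLICAS)
    return PRIMARY
```

Note the conservative default: any statement the router cannot classify as a pure read is sent to the primary, trading some replica capacity for correctness.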
Sentinel Clusters
Both DB and Proxy sentinel clusters consist of 3‑5 nodes distributed across fault domains. They use a custom RAFT‑based consensus to determine node status (SDOWN, ODOWN) and trigger failover or switchover actions.
Each sentinel runs two internal coroutines:
Prober – performs health checks.
Failover – processes SDOWN/ODOWN events, elects a leader, and executes failover.
Additional coroutines manage endpoint refreshes and RPC handling.
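The prober/failover split described above can be sketched with two asyncio coroutines sharing an event queue. Probe results and peer votes are injected as callables so the example runs without a real MySQL; the thresholds and every name here are illustrative assumptions.

```python
import asyncio

SDOWN_THRESHOLD = 3   # consecutive failed probes before subjective-down
QUORUM = 2            # sentinels that must agree before objective-down

class Sentinel:
    """Toy sketch of one sentinel node's two internal coroutines."""

    def __init__(self, probe, peer_votes):
        self.probe = probe            # async () -> bool, True if node healthy
        self.peer_votes = peer_votes  # async () -> int, peers reporting down
        self.misses = 0
        self.events = asyncio.Queue()
        self.state = "UP"

    async def prober(self, rounds: int):
        # Health-check loop: escalate to SDOWN after repeated failures.
        for _ in range(rounds):
            self.misses = 0 if await self.probe() else self.misses + 1
            if self.misses >= SDOWN_THRESHOLD and self.state == "UP":
                self.state = "SDOWN"
                await self.events.put("SDOWN")

    async def failover_worker(self):
        # Consume one SDOWN event; confirm ODOWN with a peer quorum.
        await self.events.get()
        if await self.peer_votes() + 1 >= QUORUM:
            # A real sentinel would now elect a leader and execute
            # the failover against the topology store.
            self.state = "ODOWN"
```

The key property is that no single sentinel's SDOWN verdict triggers a failover on its own; promotion happens only after the ODOWN quorum agrees, which is what keeps false positives low.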
High‑Availability Features
Supports common MySQL architectures (master‑slave replication, MGR, Galera) and versions 5.7/8.0.
Custom monitoring and failover logic reduces false positives.
Accurate failure detection via distributed consensus.
Redundant sentinel deployment tolerates the failure of a minority of nodes (one of three, two of five) while keeping a working quorum.
Network and data‑center partition tolerance.
Zero‑intrusion for tenant applications.
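The fault-tolerance figure above follows directly from majority-quorum arithmetic: a cluster of N sentinels stays decisive as long as a strict majority survives, so it tolerates ⌊(N−1)/2⌋ failures. A one-line sketch:

```python
def tolerated_failures(n: int) -> int:
    """Maximum sentinel failures a majority-quorum cluster of n nodes
    survives: a strict majority (n // 2 + 1) must remain reachable."""
    return (n - 1) // 2

# For the 3-5 node clusters described above:
for n in (3, 5):
    print(f"{n} sentinels tolerate {tolerated_failures(n)} failure(s)")
```

This is also why sentinel clusters use odd sizes: going from three nodes to four adds cost without raising the number of tolerated failures.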
Performance Metrics
RDS achieves an RTO of roughly 30 seconds and an RPO of approximately zero: failover completes within about half a minute, with effectively no committed transactions lost.
Data Access Middleware
A proxy layer sits between applications and databases, abstracting underlying architectures, managing read/write routing, enforcing permissions, and providing connection pooling and traffic control.
Use Cases
Examples include a lifestyle service platform using multi‑AZ read/write separation, and a bank employing proxy‑based read/write splitting for near‑zero‑loss failover.
Single‑AZ and Multi‑AZ Solutions
Single‑AZ designs combine RDS/Redis with distributed sentinel and dual‑node proxies. Multi‑AZ designs add cross‑AZ data replication via DTS, separate virtual IPs, and rapid failover to achieve 99.99% availability.
Applicability Beyond Cloud
The same high‑availability patterns can be applied to on‑premise IDC environments, offering low‑cost, non‑intrusive solutions for small‑to‑medium enterprises.
Conclusion
MaShang Cloud's self‑developed RDS/Redis high‑availability solution satisfies tenant requirements for reliability and disaster recovery, while being adaptable to external IDC deployments, providing a robust technical foundation for financial and internet enterprises.
Instant Consumer Technology Team