
Evolution and High‑Availability Construction of the Haodafu Offline Message Push System

This article describes how the Haodafu offline push service grew from a simple PHP notification tool into a robust, highly available micro‑service platform. The team redesigned the architecture, adopted vendor push channels, added a message queue for reliable delivery, and built monitoring, observability, and a fault‑diagnosis platform to sustain delivery rates and operational stability.

HaoDF Tech Team

Background

With the rapid development of the mobile Internet, most apps now use push notifications to deliver personalized information to users proactively. Haodafu’s push service carries doctor‑patient communication and subscription notifications, so it must guarantee high delivery rates and timely delivery. These requirements prompted a complete overhaul of its offline message push system.

System High‑Availability Construction

1. Service Prototype – Push Tool

Initially, the push function was a simple notification tool built with PHP, designed only to remind users of new messages. As user expectations and complaints grew, its shortcomings became clear: no retry on failure, problems with certificate management, poor support for Android channels, and no monitoring.

2. Service Evolution – Push System

2.1 Requirement Analysis

The redesign aimed to treat push as a full‑featured system rather than a mere tool, separating core functions, user‑experience features, and operational services.

2.2 Technical Selection

After evaluating third‑party SDKs (Jiguang, Getui, Umeng, Baidu) and vendor‑native channels (Mi Push, Huawei Push, Flyme Push), the team chose to integrate the vendor services directly for stability and data security, keeping a third‑party service only as a fallback.
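Preferring vendor channels and keeping a third‑party service as the fallback amounts to routing each device by platform and manufacturer. The following is a minimal Python sketch of that idea under stated assumptions: the manufacturer strings, channel labels such as `mi_push`, and the use of Umeng as the catch‑all for brands without a vendor integration are hypothetical, not the team's actual configuration.

```python
from typing import Dict

# Hypothetical manufacturer -> vendor channel mapping (labels are illustrative).
VENDOR_CHANNELS: Dict[str, str] = {
    "xiaomi": "mi_push",
    "huawei": "huawei_push",
    "meizu": "flyme_push",
}
# Assumed third-party fallback for Android brands without a vendor integration.
FALLBACK_CHANNEL = "umeng"


def select_channel(platform: str, manufacturer: str) -> str:
    """Pick the preferred push channel for one device."""
    if platform.strip().lower() == "ios":
        return "apns"  # iOS devices are reached through APNs
    # Prefer the manufacturer's own channel; otherwise use the fallback.
    return VENDOR_CHANNELS.get(manufacturer.strip().lower(), FALLBACK_CHANNEL)


if __name__ == "__main__":
    print(select_channel("android", "Huawei"))   # huawei_push
    print(select_channel("android", "Samsung"))  # umeng
```

Keeping the mapping as data rather than branching logic means adding another vendor integration is a one‑line change.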

2.3 High‑Availability Channel Optimization

To avoid single points of failure, the team introduced link‑level backup: multiple APNs egress points for iOS, and a fallback Umeng channel for Android that takes over when a vendor API fails.
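The link‑backup strategy can be sketched as an ordered list of channels tried in turn until one succeeds. This is a minimal Python sketch, assuming each channel is wrapped in a hypothetical `send(token, payload) -> bool` function; the channel names are illustrative, and real senders would call the APNs or vendor APIs.

```python
from typing import Callable, List, Sequence, Tuple

# A sender takes (device_token, payload) and returns True on success.
# An exception is treated the same as a failed send.
Sender = Callable[[str, dict], bool]


def send_with_failover(
    channels: Sequence[Tuple[str, Sender]], token: str, payload: dict
) -> Tuple[str, List[str]]:
    """Try each channel in order.

    Returns (channel_used, failed_channels); channel_used is "" if all fail.
    """
    failed: List[str] = []
    for name, send in channels:
        try:
            if send(token, payload):
                return name, failed
        except Exception:
            pass  # network or API error: fall through to the next channel
        failed.append(name)
    return "", failed
```

The list order expresses preference: for an Android device the vendor channel comes first and Umeng last, while for iOS the entries could be different APNs exits. Returning the failed channels lets the caller record them for monitoring.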

2.4 Guaranteeing No Message Loss

A message‑queue layer (with retry and compensation mechanisms) was added to ensure that transient network or machine failures do not cause message loss.
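The article does not name the queue product, so the following is only an in‑memory Python model of the control flow. It assumes a retry limit with exponential backoff and a separate compensation list for messages that exhaust their retries; `RetryingConsumer`, `PushMessage`, and the parameter values are hypothetical.

```python
import time
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, List


@dataclass
class PushMessage:
    msg_id: str
    token: str
    payload: dict
    attempts: int = 0


@dataclass
class RetryingConsumer:
    """In-memory stand-in for a queue consumer with retry and compensation.

    A real deployment would use a durable broker; this models control flow only.
    """
    send: Callable[[PushMessage], bool]
    max_attempts: int = 3
    base_delay: float = 0.0  # seconds before the first retry, doubled each time
    queue: Deque[PushMessage] = field(default_factory=deque)
    compensation: List[PushMessage] = field(default_factory=list)
    delivered: List[str] = field(default_factory=list)

    def publish(self, msg: PushMessage) -> None:
        self.queue.append(msg)

    def drain(self) -> None:
        while self.queue:
            msg = self.queue.popleft()
            msg.attempts += 1
            try:
                ok = self.send(msg)
            except Exception:
                ok = False  # transient network or machine failure
            if ok:
                self.delivered.append(msg.msg_id)
            elif msg.attempts < self.max_attempts:
                # Blocking sleep keeps the sketch short; a broker would use
                # delayed redelivery instead. The message is re-enqueued, not dropped.
                time.sleep(self.base_delay * 2 ** (msg.attempts - 1))
                self.queue.append(msg)
            else:
                # Out of retries: park it for a compensation job to resend later.
                self.compensation.append(msg)
```

The key property is that a message leaves the consumer only by being delivered or by being handed to compensation, never by being silently discarded.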

2.5 Other Optimizations

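Switched APNs authentication from certificates to token‑based authentication with a .p8 signing key, which does not expire, so provider tokens can be regenerated automatically instead of renewing certificates by hand.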

Adopted Pushy, an open‑source Java client for APNs, to improve iOS push stability.

Customized high‑priority channels for key business pushes.

Upgraded SDKs to increase payload limits.

Supported both single and batch push modes.

Unified push implementations across doctor and patient apps.

Implemented end‑to‑end message lifecycle tracking and click analytics.
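For the p8 item above, APNs token‑based authentication sends a signed JWT (the provider token) with each request. Apple's documentation asks providers to refresh this token no more than once every 20 minutes and no less than once every 60 minutes, which is what makes automatic renewal straightforward. The Python sketch below illustrates only that refresh rule; the `sign` callable standing in for ES256 signing with the .p8 key is hypothetical, and this is not how Pushy or any particular client implements it.

```python
import time
from typing import Callable, Optional

# Regenerate after 50 minutes: inside Apple's 20-to-60-minute refresh window.
REFRESH_AFTER_SECONDS = 50 * 60


class ProviderTokenCache:
    """Caches an APNs provider token and regenerates it before it goes stale.

    `sign` is a hypothetical callable that builds the ES256 JWT from the .p8
    key (key ID, team ID, issued-at); injecting it keeps this sketch free of
    any crypto dependency. `clock` is injectable for testing.
    """

    def __init__(self, sign: Callable[[int], str],
                 clock: Callable[[], float] = time.time) -> None:
        self._sign = sign
        self._clock = clock
        self._token: Optional[str] = None
        self._issued_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._token is None or now - self._issued_at >= REFRESH_AFTER_SECONDS:
            self._issued_at = now
            self._token = self._sign(int(now))  # "iat" claim, in seconds
        return self._token
```

Reusing one token for up to 50 minutes avoids both refreshing too often and sending a token that APNs would reject as expired.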

3. System Stability Operations

3.1 Monitoring & Alerting

Built a monitoring system based on Google SRE principles, focusing on message failure rate and delivery latency, with alert rules and on‑call escalation.
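Alerting on message failure rate and delivery latency can be sketched as evaluating each time window against thresholds. This Python sketch assumes a per‑push result record, a nearest‑rank p99 latency, and escalation keyed on consecutive alerting windows; every threshold and the escalation policy are placeholders, not values from the article.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class PushResult:
    ok: bool            # True if the push was accepted and delivered
    latency_ms: float   # time from send request to delivery acknowledgement


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty sequence."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def evaluate(window: Sequence[PushResult],
             max_failure_rate: float = 0.02,
             max_p99_ms: float = 3000.0) -> List[str]:
    """Return alert messages for one window; an empty list means healthy."""
    if not window:
        return []
    alerts: List[str] = []
    failure_rate = sum(not r.ok for r in window) / len(window)
    if failure_rate > max_failure_rate:
        alerts.append(f"failure rate {failure_rate:.1%} > {max_failure_rate:.1%}")
    p99 = percentile([r.latency_ms for r in window], 99)
    if p99 > max_p99_ms:
        alerts.append(f"p99 latency {p99:.0f} ms > {max_p99_ms:.0f} ms")
    return alerts


def escalation_target(consecutive_alerting_windows: int) -> str:
    """Hypothetical policy: page the primary on-call first, then escalate."""
    if consecutive_alerting_windows == 0:
        return "none"
    if consecutive_alerting_windows < 3:
        return "primary-oncall"
    return "secondary-oncall"
```

Alerting on a tail percentile rather than the mean keeps a small group of badly delayed pushes from being hidden by the fast majority.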

3.2 Observability

Collected metrics, traces, and logs, using Prometheus, Grafana, and ClickHouse to provide dashboards for overall health, device‑type analysis, long‑term trends, risk assessment, and anomaly detection.
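For the anomaly‑detection dashboard, one common and simple technique is a rolling z‑score: compare each new point with the mean and standard deviation of the points just before it. The Python sketch below assumes an hourly series of delivered‑message counts; the window of 24 and threshold of 3 are illustrative defaults, and the article does not say which detector the team uses.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def zscore_anomalies(series: Sequence[float], window: int = 24,
                     threshold: float = 3.0) -> List[int]:
    """Return indices whose value deviates from the trailing `window` points
    by more than `threshold` population standard deviations."""
    anomalies: List[int] = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu, sigma = mean(history), pstdev(history)
        if sigma == 0:
            # Perfectly flat baseline: treat any change as anomalous.
            if series[i] != mu:
                anomalies.append(i)
            continue
        if abs(series[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies
```

A trailing window adapts to slow trends such as daily growth, so only sudden jumps or drops, like a channel outage, stand out.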

3.3 Fault Diagnosis Platform

Developed a one‑click diagnosis portal for operations staff to view recent push stats, device status, notification‑switch state, send test messages, and trace message paths, dramatically reducing troubleshooting time.
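The checks such a portal runs can be sketched as a function over a per‑device snapshot. The `DeviceSnapshot` fields, the 30‑day staleness cutoff, and the finding texts below are hypothetical; they mirror the items the portal displays (device status, notification switch, recent push stats) rather than any real schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DeviceSnapshot:
    """Hypothetical per-device facts a diagnosis portal might gather."""
    device_token: Optional[str]
    notifications_enabled: bool  # OS-level notification switch
    last_active_days: int        # days since the app last reported in
    recent_sent: int             # pushes sent in the lookback window
    recent_delivered: int        # of those, acknowledged as delivered


def diagnose(d: DeviceSnapshot, stale_after_days: int = 30) -> List[str]:
    """Return human-readable findings; checks run from most to least fundamental."""
    findings: List[str] = []
    if not d.device_token:
        findings.append("no push token registered: device cannot be reached")
    if not d.notifications_enabled:
        findings.append("notifications disabled in system settings")
    if d.last_active_days > stale_after_days:
        findings.append(f"inactive for {d.last_active_days} days: token may be stale")
    if d.recent_sent and d.recent_delivered < d.recent_sent:
        lost = d.recent_sent - d.recent_delivered
        findings.append(f"{lost} of {d.recent_sent} recent pushes not delivered")
    # Nothing obvious: fall back to the portal's test-message and tracing tools.
    return findings or ["no issue found: send a test message and trace its path"]
```

Turning each manual check into a rule is what lets operations staff get an answer in one click instead of querying several systems.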

Summary

The push system has evolved from a basic PHP tool to a mature micro‑service architecture that satisfies both technical and operational requirements, delivering reliable notifications while minimizing user disturbance.

Future Plans

Future work will focus on simplifying push strategies, enhancing interactive UI, and adding conversion‑rate analytics to turn notifications into measurable business value.
