Design and Lessons from Meizu Real-Time Push Architecture
This article summarizes a talk given by Meizu architect Yu Xiaobo at the Meizu Technology Open Day on the design of the company's real-time push system. It covers the system's scale, its four-layer backend architecture, the pitfalls encountered (power consumption, unstable mobile networks, and massive connection counts), and the monitoring and gray-release strategies used to keep the service reliable.
System overview: the platform serves about 25 million online users, handles roughly 50 billion page views per day, and can push up to 6 million messages per minute.
Architecture design: the logical design has four layers. The bottom layer provides access for Meizu phones. The second layer is the message distribution service, which routes upstream messages and delivers downstream messages using a user routing table. The third layer manages subscription information. The fourth layer handles storage, including offline messages and subscription data.
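The role of the user routing table in the distribution layer can be illustrated with a minimal sketch. The class and method names below (`Distributor`, `on_connect`, `push`) are hypothetical and are not taken from the talk; the in-memory dictionaries merely stand in for the real routing and storage services.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Distributor:
    """Toy message-distribution layer; all names are illustrative assumptions."""
    # user_id -> access node currently holding that user's connection
    routes: Dict[str, str] = field(default_factory=dict)
    # user_id -> messages held for later delivery (stand-in for the storage layer)
    offline: Dict[str, List[str]] = field(default_factory=dict)

    def on_connect(self, user_id: str, access_node: str) -> List[str]:
        """Record the route and hand back messages stored while the user was offline."""
        self.routes[user_id] = access_node
        return self.offline.pop(user_id, [])

    def on_disconnect(self, user_id: str) -> None:
        """Forget the route so later pushes go to offline storage."""
        self.routes.pop(user_id, None)

    def push(self, user_id: str, message: str) -> Optional[str]:
        """Return the access node to deliver through, or None if stored offline."""
        node = self.routes.get(user_id)
        if node is None:
            self.offline.setdefault(user_id, []).append(message)
        return node
```

A push to a user with no route is stored and replayed on the next `on_connect`; a push to a connected user resolves to the access node that owns the connection.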
Pitfalls & insights – Power consumption: the two main issues are data traffic and battery drain. Traditional protocols such as XMPP and SIP are heavyweight and consume a lot of bandwidth, so Meizu designed a lightweight protocol, IDG, that reduces traffic by 50-70%; combined with an intelligent heartbeat strategy, it also lowers battery drain.
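The summary does not say how the intelligent heartbeat works. One common approach, shown here purely as an assumption, is to lengthen the heartbeat interval while heartbeats keep succeeding and to step back and settle once one times out, so the radio wakes up as rarely as the network path allows. The class name and the default intervals are illustrative only.

```python
class AdaptiveHeartbeat:
    """Grows the heartbeat interval until a heartbeat fails, then settles.

    Assumed strategy, not Meizu's documented algorithm.
    """

    def __init__(self, min_s: int = 60, max_s: int = 600, step_s: int = 60):
        self.min_s, self.max_s, self.step_s = min_s, max_s, step_s
        self.interval = min_s
        self.settled = False  # becomes True after the first timeout

    def on_ack(self) -> int:
        """Heartbeat acknowledged: try a longer interval unless already settled."""
        if not self.settled:
            self.interval = min(self.interval + self.step_s, self.max_s)
        return self.interval

    def on_timeout(self) -> int:
        """Heartbeat lost: step back to a shorter interval and stop growing."""
        self.interval = max(self.interval - self.step_s, self.min_s)
        self.settled = True
        return self.interval
```

Fewer heartbeats mean fewer radio wake-ups, which is where the battery saving would come from under this assumption.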
Mobile network issues: mobile networks are unstable and high-latency, which causes duplicate messages. The solution combines sequence-based interaction, DNS/IP fallback, and client-side IP ranking and probing to select the fastest server; the server also adds a small delay when its load exceeds a threshold.
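A sketch of how sequence numbers can suppress duplicates follows. It assumes the server stamps each message with a monotonically increasing sequence number per user and resends anything not yet acknowledged; the talk summary does not give the actual wire format, so `SequencedReceiver` and its fields are hypothetical.

```python
from typing import List


class SequencedReceiver:
    """Client-side receiver that drops messages it has already processed."""

    def __init__(self) -> None:
        self.last_seq = 0              # highest sequence number delivered so far
        self.delivered: List[str] = []

    def receive(self, seq: int, payload: str) -> int:
        """Deliver the payload if it is new; return the sequence number to acknowledge."""
        if seq > self.last_seq:
            self.delivered.append(payload)
            self.last_seq = seq
        # A retransmission (seq <= last_seq) is acknowledged again but not redelivered.
        return self.last_seq
```

If an acknowledgement is lost on a flaky link, the server's resend arrives with a sequence number the client has already seen and is silently discarded.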
Massive connection handling: the system achieves 4 million concurrent long-lived connections per machine by using C++ for performance, a multi-process model with epoll, TCMalloc for memory pooling, and kernel tuning (CPU affinity for NIC interrupts and raising the TCP RTO from 200 ms to about 3 s).
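The production service is written in C++, but the readiness-driven pattern behind epoll can be shown in a few lines. This sketch uses Python's standard `selectors` module, which selects epoll on Linux; the `serve_once` helper and its echo behavior are illustrative assumptions, not part of Meizu's code.

```python
import selectors
import socket


def serve_once(sel: selectors.BaseSelector, timeout: float = 1.0) -> int:
    """Run one readiness pass and echo back data on every socket that is readable.

    Returns the number of sockets that had data. One loop like this can watch
    many idle connections because it only touches the sockets the kernel
    reports as ready.
    """
    handled = 0
    for key, _events in sel.select(timeout):
        conn = key.fileobj
        data = conn.recv(4096)
        if data:
            conn.sendall(data)
            handled += 1
        else:
            # An empty read means the peer closed the connection.
            sel.unregister(conn)
            conn.close()
    return handled
```

In a multi-process design, each worker process would run its own loop of this kind over its own share of the connections.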
Load balancing: instead of relying on a single-point LVS, Meizu balances load on the client side. The client sorts its IP list by load, probes several IPs, and connects to whichever responds fastest; the server in turn adds a configurable delay based on its current connection count, so that no single node is overloaded.
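Both halves of this scheme can be sketched briefly. The function names, the `fanout` of three, the linear delay curve, and the 200 ms ceiling are all assumptions made for illustration; a real client would also probe concurrently rather than in a loop.

```python
from typing import Callable, Dict, Optional


def pick_server(loads: Dict[str, float],
                probe: Callable[[str], Optional[float]],
                fanout: int = 3) -> Optional[str]:
    """Probe the `fanout` least-loaded IPs and return the fastest responder.

    `probe` returns a round-trip time in seconds, or None if the IP is unreachable.
    """
    candidates = sorted(loads, key=loads.get)[:fanout]
    best_ip, best_rtt = None, None
    for ip in candidates:
        rtt = probe(ip)
        if rtt is not None and (best_rtt is None or rtt < best_rtt):
            best_ip, best_rtt = ip, rtt
    return best_ip


def accept_delay_ms(connections: int, threshold: int, capacity: int,
                    max_delay_ms: float = 200.0) -> float:
    """Server side: no delay below `threshold`, then a linear ramp up to `capacity`.

    Requires capacity > threshold.
    """
    if connections <= threshold:
        return 0.0
    over = min(connections, capacity) - threshold
    return max_delay_ms * over / (capacity - threshold)
```

Because a busy node answers probes more slowly, clients that pick the fastest responder drift toward less-loaded nodes without any central balancer.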
System monitoring and gray release: strict monitoring covers error counts, send/receive queue depth, request rate, interface latency, and service availability. Gray release makes deployments transparent to users: a change is rolled out to a subset of nodes, their health is monitored, and the rollout is then expanded to the full fleet.
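The staged rollout described above can be sketched as a loop that widens the deployment only while every updated node stays healthy. The stage fractions (5%, 25%, 100%) and the `deploy` and `healthy` callbacks are hypothetical; in practice `healthy` would consult the monitoring metrics listed above.

```python
from typing import Callable, List, Sequence, Tuple


def gray_release(nodes: Sequence[str],
                 deploy: Callable[[str], None],
                 healthy: Callable[[str], bool],
                 stages: Sequence[float] = (0.05, 0.25, 1.0)) -> Tuple[bool, List[str]]:
    """Deploy in widening stages and stop at the first stage with an unhealthy node.

    Returns (success, updated_nodes) so the caller knows what to roll back.
    """
    done: List[str] = []
    for fraction in stages:
        target = max(1, int(len(nodes) * fraction))
        for node in nodes[len(done):target]:
            deploy(node)
            done.append(node)
        # Check every node updated so far, not just the newest batch.
        if not all(healthy(n) for n in done):
            return False, done
    return True, done
```

Stopping early keeps a bad build confined to a small share of the fleet, which is what makes the release invisible to most users.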
These practices together enable Meizu's real‑time push service to handle massive scale while maintaining low latency, high reliability, and efficient resource usage.