Design and Implementation of Snowball Unified Push Platform
This article details the design, challenges, and solutions of Snowball's unified push platform, covering problem analysis, channel capability construction for Android and iOS, system and data architecture, business operations, and future enhancements to achieve high‑availability, scalable, and reliable push notifications across multiple mobile manufacturers.
1. Introduction
Snowball uses technology to connect users with stocks, news, content, and financial products. The Snowball community generates massive daily content, and users subscribe to topics of interest. The "Snowball Unified Push Platform" bridges community and users, providing broad coverage, timely delivery, and precise targeting.
1.1 What is the Push Platform
Early push capabilities lacked ACK, persistence, idempotent retransmission, unified management, monitoring, and dynamic strategies, and were tightly coupled to vendor SDKs.
1.2 Problems Solved
| Issue | Impact |
| --- | --- |
| Missing ACK | Asynchronous pushes cannot confirm delivery; third-party delays or losses go unnoticed. |
| No persistence | Messages are processed individually without storing state. |
| No idempotent retransmission | Long call chains cause message loss. |
| No unified management | Configuration, permissions, quotas, and rate limits are scattered. |
| SDK strong coupling | Changing the client requires rewriting. |
| Lack of monitoring | Insufficient metrics for analysis. |
| No dynamic strategy | Cannot adapt to user preferences, causing annoyance. |
By researching industry solutions and leveraging major vendors, Snowball built a self‑managed push channel to address these difficulties.
2. Core Construction and Design
2.1 Channel Capability Construction
The platform integrates the native channels of Apple, Huawei, Xiaomi, OPPO, VIVO, and Meizu, and uses third-party Umeng for other devices.
2.1.1 Android Channel
Each vendor imposes different content review, quota, and traffic controls. The Unified Push Alliance is not yet suitable for Snowball's current needs.
The optimization plan centers on the total daily push volume and on handling each vendor's specific limits.
Vendor daily limits:
| Channel | Status Code | Official Description |
| --- | --- | --- |
| Xiaomi | 200001 | Push exceeds daily limit, request fails. |
| OPPO | 33 | Message count exceeds daily limit. |
| VIVO | 10070 | Single/group push cannot exceed daily total. |
Solutions:
Tailor push logic per vendor to ensure critical content delivery.
Apply for increased quota based on app type and vendor rules.
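One way to keep critical content deliverable under a vendor's daily limit is to hold back part of the quota for high-priority pushes. The following is a minimal sketch under that assumption, not the platform's actual logic: the class `DailyQuotaGate`, the channel names, the limits, and the reserved percentage are all hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical per-vendor daily quota gate; all names and numbers are illustrative. */
class DailyQuotaGate {
    private final Map<String, Long> dailyLimit = new ConcurrentHashMap<>();
    private final Map<String, AtomicLong> sentToday = new ConcurrentHashMap<>();
    private final int reservedPercent; // share of each quota held back for critical pushes

    DailyQuotaGate(int reservedPercent) {
        if (reservedPercent < 0 || reservedPercent > 100) {
            throw new IllegalArgumentException("reservedPercent must be in [0, 100]");
        }
        this.reservedPercent = reservedPercent;
    }

    void setLimit(String channel, long limit) {
        // Register the counter first so tryAcquire never sees a limit without a counter.
        sentToday.putIfAbsent(channel, new AtomicLong());
        dailyLimit.put(channel, limit);
    }

    /** Returns true and counts the push if the channel still has quota for this priority. */
    boolean tryAcquire(String channel, boolean critical) {
        Long limit = dailyLimit.get(channel);
        if (limit == null) {
            return true; // no known daily limit for this channel
        }
        long ceiling = critical ? limit : limit * (100 - reservedPercent) / 100;
        AtomicLong counter = sentToday.get(channel);
        while (true) {
            long current = counter.get();
            if (current >= ceiling) {
                return false; // quota for this priority is used up
            }
            if (counter.compareAndSet(current, current + 1)) {
                return true;
            }
        }
    }

    /** Call at the vendor's day boundary to start a new quota window. */
    void resetAll() {
        for (AtomicLong counter : sentToday.values()) {
            counter.set(0);
        }
    }
}
```

With a limit of 10 and 20% reserved, ordinary pushes stop after 8 sends while critical pushes can still use the remaining 2. A production version would need the counters to be shared across push nodes rather than held in one process.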
Push message definition (protobuf):
    message Message {
        int64 messageId = 1;
        string title = 2;                          // push title
        string payload = 3;                        // content body
        string description = 4;                    // notification summary
        string callback = 5;                       // callback URL
        string summary_callback = 6;               // image URL
        Type type = 7;                             // business type for priority
        Application app = 8;                       // target client app
        repeated int64 target = 9;                 // target user IDs
        int64 created = 10;                        // creation time
        int32 ttl = 11;                            // expiration (ms)
        map<string, string> ext = 12;              // custom fields (key/value types assumed; missing in the source)
        map<string, string> version_filter = 13;   // version funnel (key/value types assumed; missing in the source)
        Application targetType = 14;               // ID type
    }
Rate-limit handling example for Xiaomi:
    // Xiaomi push rate-limit handling
    if ("200002".equals(obj.get("code").asText())) {
        // 200002: rate limit hit, retry later
        limitCounter.increment();
        LOGGER.warn("Xiaomi API rate-limit triggered, retrying users: {}", uidList);
        // Re-queue the same message, targeting only the affected users
        pushStatusProducer.sendMessageRetry(message.toBuilder().clearTarget().addAllTarget(uidList).build());
        return;
    }
2.1.2 iOS & Other Channels
iOS uses APNs, initially via JDK, now via the open‑source Pushy SDK. Occasionally network issues arise; deploying push nodes close to APNs can mitigate latency.
The Meizu channel is kept within the QPS and total-volume limits specified by its API.
Umeng/Jiguang act as fallback channels to improve coverage and robustness.
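The fallback idea can be sketched as a small router that prefers a device's native vendor channel and falls back to a third-party channel when no native channel is registered or the native one is unavailable. This is an illustrative sketch only: `ChannelRouter`, the brand strings, and the channel names are assumptions, not the platform's real types.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical router: native vendor channel first, third-party fallback otherwise. */
class ChannelRouter {
    private final Map<String, String> brandToChannel = new HashMap<>();
    private final Set<String> unavailable = new HashSet<>();
    private final String fallbackChannel;

    ChannelRouter(String fallbackChannel) {
        this.fallbackChannel = fallbackChannel;
    }

    /** Maps a device brand (case-insensitive) to its native channel. */
    void register(String brand, String channel) {
        brandToChannel.put(brand.toLowerCase(Locale.ROOT), channel);
    }

    /** Marks a channel as temporarily unusable, for example after repeated vendor errors. */
    void markUnavailable(String channel) {
        unavailable.add(channel);
    }

    /** Returns the channel to use for a device of the given brand. */
    String route(String deviceBrand) {
        String channel = deviceBrand == null
                ? null
                : brandToChannel.get(deviceBrand.toLowerCase(Locale.ROOT));
        if (channel == null || unavailable.contains(channel)) {
            return fallbackChannel;
        }
        return channel;
    }
}
```

For example, a router built with `new ChannelRouter("UMENG")` and `register("xiaomi", "XIAOMI")` sends a Xiaomi device through `XIAOMI` and an unregistered brand through `UMENG`.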
2.2 Platform Capability Construction
The platform now enriches system, data, and business capabilities on top of basic channel functions.
2.2.1 System Capability
Deployed on 8 servers (4 vCPU, 8 GiB each), the platform handles more than 800,000 messages per second and supports more than 10 billion pushes per day. The bottleneck now lies mainly in vendor limits.
Key issues include vendor channel selection and full‑chain tracing.
2.2.2 Data Capability
Message push is followed by closed‑loop management and effect tracking via dashboards covering dozens of business scenarios across three apps.
    // Message bus data format
    public void sendByDevice(PushResultEnum result, PushFailedTypeEnum failType, String reason,
                             UserStateProto.Device device, MessageProto.Message message) {
        MessageAck ack = new MessageAck();
        ack.setUploadTime(System.currentTimeMillis());
        // Message and recipient identity
        ack.setMsgId(message.getMessageId());
        ack.setUid(device.getUid());
        ack.setChannel(device.getDeviceChannel());
        // Push outcome
        ack.setResult(result.getTypeName());
        ack.setFailedType(failType.getTypeName());
        ack.setFailedReason(reason);
        // Device and business context for later analysis
        ack.setAppVersion(device.getAppVersion());
        ack.setToken(device.getDeviceToken());
        ack.setDescription(message.getDescription());
        ack.setApp(message.getApp().name());
        ack.setBizType(message.getExtMap().get(TrackingExtKey.BIZ_TYPE));
        ack.setExt(message.getExtMap());
        ack.setCallback(message.getCallback());
        sendMessageACK(ack);
    }
This data enables app uninstall analysis, content hotness tags, delivery-rate metrics, and user experience improvements.
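As an illustration of the delivery-rate metric, ACK records can be aggregated per channel. The sketch below uses a simplified, hypothetical `AckRecord` with only a channel name and a success flag; the real `MessageAck` carries more fields and a string result.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Simplified, hypothetical stand-in for an ACK record on the message bus. */
class AckRecord {
    final String channel;
    final boolean success;

    AckRecord(String channel, boolean success) {
        this.channel = channel;
        this.success = success;
    }
}

class DeliveryStats {
    /** Returns successful ACKs divided by total ACKs for each channel. */
    static Map<String, Double> deliveryRateByChannel(List<AckRecord> acks) {
        Map<String, int[]> counts = new TreeMap<>(); // value = {successes, total}
        for (AckRecord ack : acks) {
            int[] c = counts.computeIfAbsent(ack.channel, k -> new int[2]);
            if (ack.success) {
                c[0]++;
            }
            c[1]++;
        }
        Map<String, Double> rates = new TreeMap<>();
        for (Map.Entry<String, int[]> e : counts.entrySet()) {
            rates.put(e.getKey(), (double) e.getValue()[0] / e.getValue()[1]);
        }
        return rates;
    }
}
```

The same grouping could be keyed by business type or app version to feed the other dashboards mentioned above.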
2.2.3 Business Capability
The operation console records detailed lifecycle data for each push, providing funnel analysis and helping optimize topics and audiences.
Operation side: dynamic targeting, algorithmic personalization.
Review side: strict content standards and data governance.
3. Review and Summary
Key takeaways:
Decouple business logic from data via a message bus.
Standardize channel interaction using HTTP APIs for easier maintenance and performance tuning.
Corresponding solutions for the earlier issues:
Missing ACK: HTTP callbacks confirm delivery.
No persistence: state is stored per msg_id + uid.
No idempotent retransmission: vendor-provided idempotent parameters make retransmission safe.
No unified management: configuration is managed in one place.
SDK strong coupling: standardized data fields decouple clients from vendor SDKs.
No dynamic strategy: push strategies adapt to user preferences.
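To illustrate how a msg_id + uid key supports both persistence and safe retransmission, the following sketch tracks one state per (message, user) pair and only allows a resend after a recorded failure. It is an assumption-laden sketch: `PushStateStore`, its state names, and the in-memory map are hypothetical stand-ins for whatever durable store the platform uses.

```java
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical in-memory stand-in for a store keyed by msg_id + uid. */
class PushStateStore {
    static final String SENDING = "SENDING";
    static final String DELIVERED = "DELIVERED";
    static final String FAILED = "FAILED";

    private final ConcurrentHashMap<String, String> stateByKey = new ConcurrentHashMap<>();

    static String key(long msgId, long uid) {
        return msgId + ":" + uid;
    }

    /**
     * Returns true if the caller may send now: either this is the first attempt,
     * or the previous attempt was recorded as failed. Duplicate or in-flight
     * sends for the same msg_id + uid are rejected.
     */
    boolean tryBeginSend(long msgId, long uid) {
        String k = key(msgId, uid);
        if (stateByKey.putIfAbsent(k, SENDING) == null) {
            return true; // first attempt
        }
        return stateByKey.replace(k, FAILED, SENDING); // retry only after a failure
    }

    /** Records the outcome reported by the ACK callback. */
    void recordAck(long msgId, long uid, boolean delivered) {
        stateByKey.put(key(msgId, uid), delivered ? DELIVERED : FAILED);
    }

    /** Returns the stored state, or null if the pair has never been sent. */
    String state(long msgId, long uid) {
        return stateByKey.get(key(msgId, uid));
    }
}
```

The atomic `putIfAbsent` and `replace` calls keep two workers from sending the same message to the same user at once, which is the property the retransmission path relies on.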
4. Future Outlook
4.1 In‑site and Out‑site Push Synchronization
Combine offline vendor pushes with online long‑connection pushes to reduce platform pressure.
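A minimal sketch of this combination, assuming the platform can observe when a user's long connection opens and closes: online users are served over the long connection, everyone else through the vendor channel. `DeliveryPathSelector` and the path names are hypothetical.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical selector between the in-app long connection and vendor push. */
class DeliveryPathSelector {
    static final String LONG_CONNECTION = "LONG_CONNECTION";
    static final String VENDOR_PUSH = "VENDOR_PUSH";

    private final Set<Long> onlineUids = ConcurrentHashMap.newKeySet();

    /** Called when a user's long connection is established. */
    void onConnect(long uid) {
        onlineUids.add(uid);
    }

    /** Called when a user's long connection is closed. */
    void onDisconnect(long uid) {
        onlineUids.remove(uid);
    }

    /** Online users get the long connection; offline users get the vendor channel. */
    String select(long uid) {
        return onlineUids.contains(uid) ? LONG_CONNECTION : VENDOR_PUSH;
    }
}
```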
4.2 SMS and PUSH Complementary Design
Integrate SMS reminders to improve critical message reach.
4.3 Service Elasticity
Leverage scalable infrastructure, stateless services, and small‑batch sending to handle extreme traffic scenarios.
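Small-batch sending can be sketched as splitting a large target list into fixed-size chunks that stateless workers process independently. The helper below is illustrative; `Batcher` and the batch size are assumptions rather than the platform's actual values.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that splits a target list into small batches. */
class Batcher {
    /** Returns consecutive batches of at most batchSize items, preserving order. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the slice so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }
}
```

Because each batch is self-contained, a failed batch can be retried on any node without coordinating with the others.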
References
APNs: https://developer.apple.com/library/archive/documentation/NetworkingInternet/Conceptual/RemoteNotificationsPG/APNSOverview.html
MiPush: https://dev.mi.com/console/doc/detail?pId=40
HMS: https://developer.huawei.com/consumer/cn/service/hms/catalog/huaweipush_agent.html
OPush: https://storepic.oppomobile.com/openplat/resource/201908/23/OPPO%E6%8E%A8%E9%80%81%E5%B9%B3%E5%8F%B0%E6%9C%8D%E5%8A%A1%E7%AB%AFAPI-V1.7.pdf
VPush: https://dev.vivo.com.cn/documentCenter/doc/155
Meizu Push: http://open.res.flyme.cn/fileserver/upload/file/201803/be1f71eac562497f92b42c750196a062.pdf
Snowball Engineer Team
Proactivity, efficiency, professionalism, and empathy are the core values of the Snowball Engineer Team; curiosity, passion, and sharing of technology drive their continuous progress.