
Vivo Push Notification Platform: Architecture Evolution and Engineering Practices

This article describes the Vivo push notification platform and its evolution from a cloud-based deployment to a self-built, three-region architecture that supports over 100 million concurrent device connections and more than 10 billion messages per day. It covers optimizations including adaptive heartbeats, multi-strategy load balancing, distributed caching, multi-layer rate limiting, burst-traffic handling, and content security.

vivo Internet Technology

This article presents a comprehensive overview of the Vivo push notification platform, covering its product positioning, technical architecture evolution, and engineering practices for achieving high performance and stability at scale.

1. Platform Overview

From a product perspective, the Vivo push notification platform provides a stable, secure, and scalable message push service capable of 1 million pushes per second with over 100 million devices online concurrently. It uses long-lived connections to give smart devices real-time, bidirectional delivery of content and services.

From a technical perspective, the platform delivers messages to user devices over TCP long connections. Its core advantages are: over 100 million concurrent device connections, 1 million pushes per second, daily message throughput exceeding 10 billion, real-time analysis of push effectiveness, and real-time auditing of all pushed messages.

2. Architecture Evolution

Since its inception in 2015, the platform has undergone significant architectural changes. It was initially deployed on cloud infrastructure while still relying on internal data centers, an arrangement that led to latency issues. In 2018, the platform migrated all core logic modules to self-built data centers and adopted a three-region deployment so that users connect to a nearby region and the service can survive the loss of one region.

In 2019, the platform underwent comprehensive reconstruction to support higher concurrency requirements, providing richer product features with improved stability and performance.

3. System Stability and High Performance Practices

Long Connection Gateway Optimization: The gateway is designed to hold 1 million concurrent connections per server, compared with the traditional target of 10,000 connections per server. Optimizations include raising the system file descriptor limits, balancing network card interrupts across CPU cores, tuning TCP parameters such as keepalive, and using AES-NI hardware instructions to accelerate encryption. After optimization, an 8-core, 32 GB server can stably hold 1.7 million long connections.
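
To make the scale of these numbers concrete, the following minimal sketch works out the memory budget implied by 1.7 million connections on a 32 GB machine and checks whether the current process's file descriptor limit would allow a given connection count (each TCP connection consumes one descriptor). The function names are hypothetical and not part of the platform, and the `resource` module is available on Unix-like systems only.

```python
import resource


def per_connection_budget_bytes(total_ram_gib: float, connections: int) -> float:
    """Upper bound on memory per connection if all RAM were usable by the gateway."""
    return total_ram_gib * 1024**3 / connections


def fd_limit_allows(target_connections: int) -> bool:
    """True if this process's soft open-file limit is at least the target count."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft == resource.RLIM_INFINITY or soft >= target_connections


if __name__ == "__main__":
    budget = per_connection_budget_bytes(32, 1_700_000)
    print(f"budget per connection: {budget / 1024:.1f} KiB")
    print("fd limit allows 1,700,000 connections:", fd_limit_allows(1_700_000))
```

The budget comes out at under 20 KiB per connection, covering socket buffers, TLS state, and application data. On a default Linux configuration the second check usually fails, which is why raising the descriptor limits is the first item in the list above.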

Intelligent Heartbeat Mechanism: To keep long connections alive through intermediate network devices, which may silently drop idle connections, the platform uses an adaptive heartbeat strategy. The heartbeat frequency is tuned to each network environment, which reduces unnecessary heartbeats while keeping connections reliable.
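
The article does not spell out the adaptation algorithm. One common approach, sketched below purely as an assumption, is to probe upward: lengthen the interval after each acknowledged heartbeat, and on the first timeout step back and stop probing. The class name, field names, and the default values in seconds are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AdaptiveHeartbeat:
    """Probes for the longest heartbeat interval a network path tolerates."""

    min_interval: int = 60   # seconds; assumed floor
    max_interval: int = 300  # seconds; assumed ceiling
    step: int = 30           # seconds added per successful probe
    interval: int = 60       # current interval
    stable: bool = False     # True once a timeout has revealed the ceiling

    def on_ack(self) -> int:
        """Heartbeat was acknowledged: lengthen the interval while still probing."""
        if not self.stable:
            self.interval = min(self.interval + self.step, self.max_interval)
        return self.interval

    def on_timeout(self) -> int:
        """Heartbeat was lost: step back one notch and stop probing."""
        self.interval = max(self.interval - self.step, self.min_interval)
        self.stable = True
        return self.interval
```

A client would keep one such object per network environment (for example, one per Wi-Fi network and one for the cellular link), so that a value learned on one network is not applied to another.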

Load Balancing for Hundreds of Millions of Devices: The traffic scheduling system combines four strategies to select the best gateway for each device: proximity access, public network detection, machine load monitoring, and interface success rate evaluation.
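
How the four signals are combined is not described in the article. As a rough illustration, the sketch below treats reachability and success rate as filters, prefers gateways in the client's region, and then picks the least loaded candidate. The `Gateway` fields, the `pick_gateway` function, and the 0.95 threshold are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Gateway:
    name: str
    region: str
    reachable: bool      # outcome of a public-network probe
    load: float          # 0.0 (idle) to 1.0 (saturated)
    success_rate: float  # recent interface success rate, 0.0 to 1.0


def pick_gateway(
    gateways: Sequence[Gateway],
    client_region: str,
    min_success_rate: float = 0.95,
) -> Optional[Gateway]:
    """Return the preferred gateway, or None if no gateway is healthy."""
    healthy = [
        g for g in gateways
        if g.reachable and g.success_rate >= min_success_rate
    ]
    if not healthy:
        return None
    # Proximity first; fall back to any healthy gateway in another region.
    local = [g for g in healthy if g.region == client_region]
    pool = local or healthy
    # Lowest load wins; a higher success rate breaks ties.
    return min(pool, key=lambda g: (g.load, -g.success_rate))
```

Filtering before ranking keeps an unreachable but idle machine from ever being selected, which a single weighted score would not guarantee.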

High Concurrency Handling: The platform uses distributed caching with a 99.9% hit rate to shield the central storage (TiDB) from most requests, ensuring system stability even during temporary storage failures.
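
At a 99.9% hit rate, only about 1 in 1,000 lookups reaches storage. The sketch below shows the general read-through pattern with an in-memory dictionary standing in for the distributed cache; it is not the platform's implementation. `CachedLookup`, `StorageUnavailable`, and the loader callback are hypothetical names, and eviction and expiry are omitted for brevity.

```python
from typing import Any, Callable, Dict, Optional


class StorageUnavailable(Exception):
    """Raised by the loader when the backing store cannot be reached."""


class CachedLookup:
    """Read-through cache that keeps serving hits while the store is down."""

    def __init__(self, load_from_store: Callable[[str], Any]) -> None:
        self._load = load_from_store
        self._cache: Dict[str, Any] = {}
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> Optional[Any]:
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        self.misses += 1
        try:
            value = self._load(key)
        except StorageUnavailable:
            # Degrade gracefully: a miss during an outage yields no value
            # instead of propagating the failure to the caller.
            return None
        self._cache[key] = value
        return value

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

During a storage outage, every key already in the cache is still served normally, and only the small fraction of cold keys is affected.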

Rate Limiting and Flow Control: The platform implements multi-layer rate limiting: push gateway rate limiting using token bucket algorithm with dynamic adjustment, internal node rate limiting for smooth label-based push delivery, and application-level distributed rate limiting using Redis-based leaky bucket implementation.
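
As an illustration of the gateway-level limiter, here is a minimal single-process token bucket whose refill rate can be changed at runtime. It is a sketch under assumptions, not the platform's code: the class and method names are hypothetical, and the Redis-based distributed limiter mentioned above is not shown.

```python
import time
from typing import Callable


class TokenBucket:
    """Token bucket: allows bursts up to `capacity`, sustained `rate` per second."""

    def __init__(
        self,
        rate: float,
        capacity: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = float(capacity)  # start full
        self._last = clock()

    def _refill(self) -> None:
        now = self._clock()
        earned = (now - self._last) * self.rate
        self._tokens = min(self.capacity, self._tokens + earned)
        self._last = now

    def set_rate(self, rate: float) -> None:
        """Dynamic adjustment: credit tokens earned at the old rate, then switch."""
        self._refill()
        self.rate = rate

    def try_acquire(self, n: float = 1) -> bool:
        """Take n tokens if available; never blocks."""
        self._refill()
        if self._tokens >= n:
            self._tokens -= n
            return True
        return False
```

The injectable `clock` exists only to make the limiter testable without sleeping. The difference from a leaky bucket is that a token bucket lets short bursts through up to its capacity, whereas a leaky bucket smooths output to a constant rate.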

Circuit Breaker and Degradation: The platform uses message queues to buffer load and containerized services with automatic scaling to absorb burst traffic, eliminating the need for manual intervention.

Content Security: The platform implements a content auditing mechanism combining automated review (primary) with manual review (secondary), utilizing local rule-based auditing and the Diting anti-spam system for comprehensive content filtering.
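
The two-stage flow can be sketched as follows: cheap local rules run first, a remote scoring service is consulted next, and borderline messages are routed to manual review. This is an assumed structure rather than the platform's actual pipeline. The `remote_check` callback is only a stand-in for the anti-spam service, whose real interface is not described in the article, and both thresholds are hypothetical.

```python
from enum import Enum
from typing import Callable, Iterable


class Verdict(Enum):
    PASS = "pass"
    REJECT = "reject"
    MANUAL_REVIEW = "manual_review"


def audit(
    message: str,
    blocked_terms: Iterable[str],
    remote_check: Callable[[str], float],
    reject_at: float = 0.9,
    review_at: float = 0.5,
) -> Verdict:
    """Classify a push message using local rules, then a remote spam score."""
    text = message.lower()
    # Stage 1: local rule-based audit, no network call required.
    if any(term.lower() in text for term in blocked_terms):
        return Verdict.REJECT
    # Stage 2: remote score in [0, 1]; higher means more likely spam.
    score = remote_check(message)
    if score >= reject_at:
        return Verdict.REJECT
    if score >= review_at:
        return Verdict.MANUAL_REVIEW
    return Verdict.PASS
```

Running the local rules first means obviously bad content is rejected without spending a remote call, and human reviewers only see the messages the automated stages could not decide.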
