
Designing a Scalable Architecture for Million‑Level DAU Internet Applications

The article explains how to build a highly available, horizontally scalable architecture for million‑level daily active users by combining DNS routing, L4/L7 load balancing, micro‑service decomposition, caching, sharded databases, hybrid‑cloud deployment, elastic scaling and multi‑level degradation strategies.

High Availability Architecture

Recent incidents such as the Xi'an "One‑Code‑Pass" outage highlighted the importance of designing systems with proper scalability and automatic scaling capabilities, so they can absorb traffic spikes many times above normal load.

The typical request flow for a million‑level DAU internet application includes several layers:

DNS : Resolves the domain to servers in the nearest regional IDC based on the user's IP, leveraging DNS caching to keep lookups fast.

L4 Load Balancer : Forwards traffic at the transport layer (by IP and port), often implemented with software solutions like LVS (capacity >100k QPS per node) or hardware appliances such as F5.

L7 Load Balancer (Gateway) : Usually Nginx clusters that handle application‑level routing, authentication, logging, and monitoring.
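
An L7 gateway of this kind might be configured roughly as follows; this is a minimal Nginx sketch, and the upstream names, hosts, and domain are hypothetical:

```nginx
# Hypothetical pool of application servers behind the gateway.
upstream app_servers {
    least_conn;                      # prefer less-busy instances
    server 10.0.1.10:8080 weight=3;
    server 10.0.1.11:8080 weight=2;
}

server {
    listen 80;
    server_name api.example.com;

    # Application-level routing: /api traffic goes to the app pool.
    location /api/ {
        proxy_pass http://app_servers;
        proxy_set_header X-Request-ID $request_id;  # log correlation
    }

    # The access log doubles as a monitoring feed.
    access_log /var/log/nginx/api_access.log;
}
```

Authentication and rate limiting would typically be layered on at the same point, keeping cross-cutting concerns out of the business services.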

Server Layer : Hosts the business logic, either as a monolithic application for simple products and small teams, or as micro‑services once the codebase grows beyond a few hundred interfaces or the team exceeds ten developers.

Cache : Uses systems like Memcached or Redis (single‑node capacity ~100k QPS, latency 1‑2 ms) to offload frequent reads from the database.
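
The cache layer usually sits in front of the database in a cache-aside pattern: read from the cache, fall back to the database on a miss, then populate the cache for subsequent reads. A minimal sketch, using a plain dict as a stand-in for Redis and a hypothetical `load_user_from_db` loader:

```python
cache = {}  # stand-in for Redis/Memcached

def load_user_from_db(user_id):
    # Hypothetical database read; far slower than a cache hit.
    return {"id": user_id, "name": f"user-{user_id}"}

def get_user(user_id):
    key = f"user:{user_id}"
    if key in cache:                     # cache hit (~1-2 ms in Redis terms)
        return cache[key]
    value = load_user_from_db(user_id)   # cache miss: fall through to the DB
    cache[key] = value                   # populate for subsequent reads
    return value
```

In production the cache entry would also carry a TTL so that stale data expires rather than lingering indefinitely.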

Database : Uses master‑slave replication for read‑write splitting, plus sharding (both by time and by user ID) to handle terabyte‑scale data volumes.
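
Routing a query to the right shard can be as simple as hashing the user ID for user-keyed tables and bucketing by month for time-series tables. A sketch; the shard count and table names are illustrative assumptions:

```python
from datetime import date

NUM_USER_SHARDS = 16  # illustrative; deployments often pick a power of two

def user_shard(user_id: int) -> str:
    # Hash-style sharding by user ID keeps one user's rows on one shard.
    return f"user_db_{user_id % NUM_USER_SHARDS}"

def time_shard(day: date) -> str:
    # Time-based sharding: one table per month for append-heavy data.
    return f"orders_{day.year}{day.month:02d}"
```

The modulo scheme is simple but makes resharding expensive; consistent hashing or a lookup table is the usual next step when shard counts must change.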

While this architecture suffices for million‑level DAU, supporting tens or hundreds of millions of daily users requires additional improvements:

Hybrid Cloud Architecture : Combines private‑cloud IDC bandwidth with public‑cloud resources. During traffic surges that exceed private‑cloud egress capacity, a portion of the load is shifted to public‑cloud zones, requiring seamless network interconnects (often dedicated lines) and a platform such as BridgX to abstract resource differences.

Full‑Link Elastic Scaling : The L4/L7 layers, server instances, cache clusters, and databases must all be able to scale out on demand. For example, a public‑cloud SLB can handle millions of concurrent connections, while Nginx instances can be added dynamically based on weighted QPS calculations.
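
The "weighted QPS" sizing mentioned above can be sketched as dividing the expected peak QPS by each instance's usable capacity after reserving headroom; the numbers below are illustrative assumptions, not measured capacities:

```python
import math

def instances_needed(peak_qps: float, per_instance_qps: float,
                     headroom: float = 0.3) -> int:
    # Size the pool so steady-state load stays below
    # (1 - headroom) of total capacity.
    usable = per_instance_qps * (1.0 - headroom)
    return math.ceil(peak_qps / usable)

# e.g. a 500k-QPS peak against 50k-QPS Nginx instances with 30% headroom
# -> math.ceil(500_000 / 35_000) = 15 instances
```

An autoscaler would re-run this calculation on a short interval and add or drain instances as the observed peak moves.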

Three‑Level Degradation Mechanism :

Level 1: Transparent to users; frees up less than 30 % of capacity.

Level 2: User‑visible degradation; frees up to 50 % of capacity.

Level 3: Severe degradation; frees 50‑100 % of capacity, used only as a last resort.
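
A degradation controller can map the current overload ratio onto these levels. A minimal sketch; the thresholds are assumptions for illustration, not values from the article:

```python
def degradation_level(current_qps: float, capacity_qps: float) -> int:
    # Returns 0 (normal) through 3 (severe) based on the overload ratio.
    ratio = current_qps / capacity_qps
    if ratio <= 1.0:
        return 0  # within capacity: no degradation
    if ratio <= 1.3:
        return 1  # transparent: switch off low-value features
    if ratio <= 1.5:
        return 2  # user-visible: simplify responses, queue requests
    return 3      # last resort: shed non-core traffic aggressively
```

In practice the decision would also weigh error rates and latency, not QPS alone, and level 3 would require human sign-off.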

Additional supporting mechanisms such as decision‑support systems, on‑call alerting, and automated scaling tools (e.g., CudgX) are essential for maintaining high availability at massive scale.

Tags: microservices, high availability, load balancing, caching, scalable architecture, hybrid cloud, sharded databases
Written by High Availability Architecture (official account).