How Hupu Scaled to Millions: Inside the Flex Auto‑Scaling Platform
This article covers Hupu's sports-traffic environment and the design and implementation of the Flex auto-scaling platform: its architecture, its core functions (resource statistics, node and Pod scaling, scenario scheduling), and the performance optimizations that enable rapid, cost-effective scaling across multi-cloud Kubernetes clusters.
Overview
Hupu, a sports community founded in 2004, serves over 100 million registered users, with daily peak traffic of 230 million and 11 million monthly active users, demanding high reliability.
1. Hupu Business Scenario
Hupu runs about 500 online production services written in Java, Python, Go, PHP, and Node across four environments (development, test, pre-release, production). Since 2019, 80% of workloads have been containerized on Kubernetes; peak node utilization exceeds 60%, and cloud resource costs have been cut by 50%.
Multiple clusters are deployed on Tencent Cloud and Alibaba Cloud, covering test, pre‑release, production, operation, and disaster‑recovery environments.
Traffic Characteristics
Traffic is low at night and stable during the day, but spikes dramatically during sports events or unexpected incidents, making it hard to predict.
Challenges
Hupu must keep sufficient standby resources while minimizing cost (only a 15-20% buffer in production clusters), support multi-cloud deployment, and ensure compatibility between Pods and VMs.
2. Flex Auto‑Scaling Platform
Overview
Flex provides a dashboard with cluster‑level and application‑level resource views, scenario scheduling status, and hot‑match information for operators.
The left menu includes scaling for Pods, cloud VMs, nodes, database scaling, scenario switches, audit logs, and permission management.
Architecture
Flex stores application metadata (name, project, owner, department, resource configuration, replica count) in a CMDB. When scaling is required, Flex retrieves the target application and ensures a minimum number of ready instances.
For scaling up, Prometheus collects metrics from Kubernetes and cloud VMs, processes alerts via the Rig system, and passes decisions to Flex. An API layer abstracts calls to K8s, Tencent Cloud, and Alibaba Cloud, enabling unified Pod and VM scaling.
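The API layer described above can be sketched as a small routing abstraction. This is a hypothetical illustration, not Flex's actual code; all class and method names are invented, and the real implementation would call the Kubernetes and cloud-provider SDKs instead of returning strings.

```python
# Illustrative sketch of a unified scaling API layer: one interface,
# with backends for Kubernetes Pods and for cloud VMs. Names are invented.
from abc import ABC, abstractmethod

class ScalingBackend(ABC):
    @abstractmethod
    def scale(self, target: str, replicas: int) -> str: ...

class KubernetesBackend(ScalingBackend):
    def scale(self, target, replicas):
        # In production this would patch the Deployment's replica count
        # through the Kubernetes API; here we only describe the action.
        return f"k8s: scale deployment {target} to {replicas} replicas"

class CloudVMBackend(ScalingBackend):
    def __init__(self, provider):
        self.provider = provider  # e.g. "tencent" or "alibaba"
    def scale(self, target, replicas):
        return f"{self.provider}: resize scaling group {target} to {replicas} VMs"

class FlexScaler:
    """Routes a scaling decision to the matching backend."""
    def __init__(self):
        self.backends = {
            "k8s": KubernetesBackend(),
            "tencent": CloudVMBackend("tencent"),
            "alibaba": CloudVMBackend("alibaba"),
        }
    def scale(self, platform, target, replicas):
        return self.backends[platform].scale(target, replicas)
```

Keeping the decision logic above this interface is what lets one alert pipeline drive both Pod and VM scaling across clouds.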
Core Functions
Flex's core functions are resource statistics, node scaling, Pod and cloud-VM scaling, and scenario scheduling.
Resource Statistics
The dashboard shows node counts, distinguishing between subscription‑based and pay‑as‑you‑go nodes. Pay‑as‑you‑go nodes running more than 8 hours trigger evaluation to add subscription nodes for cost optimization.
Resource allocation is tuned via Requests; when usage approaches thresholds, scaling is considered, otherwise excess subscription capacity is reduced. Application‑level rankings display replica, resource usage, and cost, helping identify hot spots for performance tuning and cost reduction.
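The 8-hour rule above reduces to a simple filter over node billing records. A minimal sketch, with made-up node data and field names:

```python
# Hedged sketch of the cost-optimization rule from the article:
# pay-as-you-go nodes that stay up longer than 8 hours are flagged
# for evaluation as cheaper subscription (reserved) nodes.
def nodes_to_convert(nodes, threshold_hours=8):
    """Return names of pay-as-you-go nodes running past the threshold."""
    return [n["name"] for n in nodes
            if n["billing"] == "pay-as-you-go" and n["uptime_h"] > threshold_hours]

nodes = [
    {"name": "node-a", "billing": "pay-as-you-go", "uptime_h": 12.5},
    {"name": "node-b", "billing": "pay-as-you-go", "uptime_h": 2.0},
    {"name": "node-c", "billing": "subscription",  "uptime_h": 720.0},
]
print(nodes_to_convert(nodes))  # ['node-a']
```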
Node Scaling
Initially, scaling was triggered only by Pod Pending events. A reserved-resource dimension was added later: the reserve started at 35-40% and was compressed to 15-20% after cost analysis.
Standard resource packages (e.g., 2C4G, 8C16G) are reserved to avoid fragmentation; an 8C16G reserve allows at least ten instances to be created. Over three years, node-scaling operations have exceeded 200,000.
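The reserve-ratio logic can be illustrated with a small calculation: if free capacity drops below the target reserve, add whole standard-package nodes until the reserve holds again. The numbers and helper name are assumptions for illustration, not Flex's actual formula.

```python
# Illustrative reserve-capacity check: keep ~15% of cluster CPU free,
# and when the reserve is breached, add whole standard-package nodes
# (e.g. 8-core packages) rather than odd sizes, avoiding fragmentation.
import math

def nodes_needed(total_cpu, used_cpu, reserve_ratio=0.15, package_cpu=8):
    """How many package nodes to add so free capacity >= reserve_ratio."""
    free = total_cpu - used_cpu
    if free >= total_cpu * reserve_ratio:
        return 0
    # Adding n nodes must satisfy:
    #   free + n*package_cpu >= (total_cpu + n*package_cpu) * reserve_ratio
    deficit = total_cpu * reserve_ratio - free
    per_node_gain = package_cpu * (1 - reserve_ratio)
    return math.ceil(deficit / per_node_gain)
```

For example, a 100-core cluster with 95 cores in use has a 10-core deficit against the 15% target and would need two 8-core package nodes.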
Pod & Cloud‑VM Scaling
Scaling decisions consider CPU, memory, QPS, JVM thread count, and both real‑time and predictive metrics (10‑minute forecast). Predictive scaling can pre‑empt traffic spikes.
Black‑list rules can forbid scaling up or down for specific applications, and thresholds can be defined by percentage or absolute numbers.
An API allows services to control replica counts directly.
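Putting the pieces above together, a scale-up decision can be sketched as a check of both the real-time metric and the 10-minute forecast, gated by the blacklist. The threshold value and function shape are assumptions:

```python
# Sketch of a scale-up decision combining a real-time metric with a
# 10-minute predictive forecast, gated by a per-application blacklist,
# as described above. Threshold and signature are illustrative.
def should_scale_up(app, cpu_now, cpu_forecast_10m, threshold=0.7,
                    scale_up_blacklist=frozenset()):
    """Scale up if current OR predicted CPU crosses the threshold,
    unless the application is forbidden from scaling up."""
    if app in scale_up_blacklist:
        return False
    return cpu_now >= threshold or cpu_forecast_10m >= threshold
```

Because the forecast term can fire before the real-time term, this is what lets predictive scaling pre-empt a traffic spike instead of reacting to it.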
Scenario Scheduling
Pre‑defined strategies handle special events (e.g., NBA start) by activating specific scaling phases before operators arrive, ensuring resources are ready for sudden traffic surges.
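A scenario schedule of this kind boils down to event entries with a lead time, checked against the clock. The event, service names, replica targets, and lead window below are all invented for illustration:

```python
# Hypothetical scenario schedule: before a known event (e.g. an NBA
# game tipping off at 08:00), pre-scale key services so capacity is
# ready before operators arrive. All entries are invented examples.
from datetime import datetime, timedelta

SCENARIOS = [
    # (event start, lead time, service -> target replicas)
    (datetime(2023, 6, 2, 8, 0), timedelta(minutes=30),
     {"bbs-api": 40, "live-comment": 60}),
]

def actions_due(now):
    """Return the pre-scaling targets whose lead window has opened."""
    due = {}
    for start, lead, targets in SCENARIOS:
        if start - lead <= now < start:
            due.update(targets)
    return due
```

A scheduler polling `actions_due` every minute would start scaling at 07:30, half an hour before tip-off.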
Other Scaling
An alpha version adds MySQL/Redis scaling based on CPU, memory, disk, connections, and shard count.
3. Problems and Optimizations
Slow Node Join
Node join time on Alibaba Cloud was ~2 minutes; after script tuning it dropped to 10 seconds. Asynchronous batch creation of up to 50 nodes further improved speed.
Pod Scheduling Delays
Creating an initial batch of 200 Pods took ~20 seconds due to the arms-pilot component; after it was removed, creation fell to about 1 second. DaemonSet overhead was reduced by embedding the functionality into the nodes themselves.
A separate Scale call was added immediately after patching the HPA, shortening reaction time.
Image Pull Latency
Layered Docker images are used: base layer (CentOS), language layer (OpenJDK/Golang/Python), intermediate layer (common packages), and business layer (application code). A custom NodeImage plugin pre‑pulls core images, and Harbor proxy accelerates concurrent pulls.
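The layering scheme above might look like the following Dockerfile sketch; the registry host, image names, and paths are invented examples, not Hupu's actual images. Because the base and language layers are pre-pulled onto every node, a deploy only needs to download the small business layer.

```dockerfile
# Base + language layers (CentOS + OpenJDK), pre-pulled by NodeImage
FROM harbor.example.com/base/centos-openjdk:8
# Intermediate layer: common packages shared across services
COPY common-agent/ /opt/agent/
# Business layer: only this thin layer changes per release
COPY target/app.jar /opt/app/app.jar
CMD ["java", "-jar", "/opt/app/app.jar"]
```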
Pod Startup Slowness
Three tactics are applied: lazy‑load agents, adjust health checks (delay liveness, advance readiness), and move postStart/preStop logic to external processes.
Master Node Performance
During massive scaling, master CPU spiked to 100% and memory was exhausted. Profiling traced the overhead to JSON decoding; temporarily scaling the masters to 64C128G (or 64C256G) during peak periods balances cost and stability.
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.