Elastic Scaling Practices in Cloud‑Native Kubernetes Environments
To overcome the limits of the native HPA and business-specific constraints in a fully containerized, cloud-native Kubernetes environment, we implemented a dual-threshold water-level and scheduled scaling engine, a hybrid-cloud ClusterAutoScale, mixed-deployment resource prioritization, and comprehensive Prometheus-based observability. Together these delivered higher utilization and lower costs, and set a roadmap toward deeper optimization and AIOps.
After the organization completed full-network containerization, front-line developers ran into a series of operational issues around timing, capacity, efficiency, and cost. Elastic scaling therefore became an inevitable technology choice in a cloud-native containerized environment.
Problems with native HPA
Initial attempts to use the native Horizontal Pod Autoscaler (HPA) revealed many limitations: no support for custom metrics, no scheduled scaling, reliance on resources.requests, and a single-goroutine execution model. Business-specific constraints, such as non-interruptible job instances and downstream database availability, further complicated its use.
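The reliance on resources.requests is worth spelling out. The HPA's documented replica calculation scales on utilization relative to the declared request, so a missing or badly tuned request skews every decision. A minimal sketch of that formula (the 10% tolerance is the Kubernetes default):

```python
import math

def hpa_desired_replicas(current_replicas: int, current_metric: float,
                         desired_metric: float, tolerance: float = 0.1) -> int:
    """Native HPA calculation per the Kubernetes docs:
    desiredReplicas = ceil(currentReplicas * currentMetric / desiredMetric).
    currentMetric is utilization measured against resources.requests,
    which is why an inaccurate request distorts the whole result.
    """
    ratio = current_metric / desired_metric
    # The HPA skips scaling when the ratio is within the tolerance band.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)

# 4 replicas at 90% utilization against a 60% target -> scale out to 6.
print(hpa_desired_replicas(4, 90.0, 60.0))  # 6
```

Note also that the formula has no notion of a schedule or a lower bound from business rules, which is exactly the gap the sections below address.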
Business‑driven elastic capability
We built an elastic mechanism based on actual instance water‑level and effective load, featuring:
High‑low dual‑threshold control to bound stability for fluctuating workloads.
Ceiling‑based scaling (ceil) for expansion and floor‑based scaling (floor) for contraction.
Data denoising to exclude non‑ready instances, strong business‑relationship instances, and metric gaps.
Performance enhancement through namespace‑level listening and concurrency control.
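The first three points above can be sketched together. The function below is a simplified illustration with hypothetical names, not the production engine: it drops missing samples before averaging (denoising), expands with ceil when the water level breaches the high threshold, and contracts with floor only below the low threshold, so workloads fluctuating inside the band stay stable.

```python
import math

def scale_decision(replicas: int, samples: list, low: float,
                   high: float, target: float) -> int:
    """Dual-threshold water-level scaling sketch (hypothetical names).
    samples: per-instance load readings; None marks a metric gap or a
    non-ready instance that must be excluded before deciding."""
    clean = [s for s in samples if s is not None]  # denoise
    if not clean:
        return replicas  # no usable data: never act blind
    level = sum(clean) / len(clean)
    if level > high:
        # Expansion rounds up (ceil) to resolve overload quickly.
        return math.ceil(replicas * level / target)
    if level < low:
        # Contraction rounds down (floor) but never below one replica.
        return max(1, math.floor(replicas * level / target))
    return replicas  # inside the band: hold steady

# Overloaded pool with one metric gap -> expand from 10 to 15.
print(scale_decision(10, [0.9, 0.85, None], low=0.3, high=0.8, target=0.6))
```

The asymmetry (ceil out, floor in) biases the system toward availability: over-provisioning briefly is cheaper than an overload incident.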
Fusion of water‑level and timed elasticity
The solution merges water‑level thresholds with scheduled scaling, ensuring that expansion chooses the larger of the two triggers while contraction never falls below the scheduled replica count.
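The merge rule reduces to a single max: expansion takes the larger of the two triggers, and because the scheduled count participates in that max, contraction can never drop below it. A minimal sketch:

```python
def fused_replicas(water_level_target: int, scheduled_floor: int) -> int:
    """Fuse water-level and scheduled elasticity (sketch):
    the bigger trigger wins on expansion, and the scheduled
    replica count acts as a hard floor on contraction."""
    return max(water_level_target, scheduled_floor)

print(fused_replicas(12, 8))  # water level wants more -> 12
print(fused_replicas(3, 8))   # contraction bottoms out at the schedule -> 8
```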
Hybrid‑cloud ClusterAutoScale
To address the growing resource pool in a hybrid‑cloud scenario, we designed a ClusterAutoScale that integrates:
Image‑as‑a‑service.
CloudProvider adapters for private‑cloud APIs.
Node initialization and reclamation workflows.
Two trigger strategies are used: unschedulable pod events and resource‑pool water‑level thresholds. Additional challenges solved include private‑cloud capacity assessment, pod CIDR routing, and gray‑scale resource reclamation.
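The two trigger strategies compose with a simple OR: any unschedulable pod event fires immediately, and otherwise the pool water level is checked against its threshold. A hedged sketch with hypothetical parameter names:

```python
def cluster_autoscale_trigger(unschedulable_pods: int, pool_used: float,
                              pool_capacity: float,
                              threshold: float = 0.8) -> bool:
    """Sketch of the two ClusterAutoScale triggers (names are illustrative):
    1. event-driven: any pod the scheduler cannot place fires a scale-out;
    2. level-driven: the resource-pool water level crossing the threshold
       fires a scale-out before pods ever become unschedulable."""
    if unschedulable_pods > 0:
        return True  # reactive path: pending pods need capacity now
    return pool_used / pool_capacity >= threshold  # proactive path

print(cluster_autoscale_trigger(0, 85.0, 100.0))  # True
```

Combining the two matters: the event trigger alone reacts only after scheduling has already failed, while the water-level trigger buys node-initialization lead time.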
Operational considerations
When deploying in production, attention must be paid to business‑pool capacity, instance volatility standards, health‑probe vs. readiness‑probe differences, metric thresholds, rule inspections, minimal filtering, and independence from external platforms.
Middleware containerization and mixed deployment
We merged Redis and Flink resource pools for time‑shared reuse, eliminating resource fragmentation, reducing cross‑cluster data aggregation, and simplifying operations.
Mixed‑deployment strategy
The overall approach abstracts factors into three layers: application tiering, mixed‑deployment scheduling, and resource QoS. Key directions include:
Application tier labeling (S1‑S4) stored in CMDB and reflected as K8s priority labels.
Resource pool prioritization for critical services while dispersing lower‑priority workloads.
Request recommendation using VPA histogram percentile (P95) multiplied by a water‑level factor, combined with elasticity and health‑state machines.
Load scheduling based on ideal‑value weighting and bin‑packing algorithms, filtering high‑water‑level nodes and predicting future node water‑levels.
Resource dispersion strategies (host‑level, zone‑level, MDU) to maximize distribution.
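The request-recommendation direction above can be illustrated concretely. The sketch below is a simplified stand-in for the VPA-style histogram logic (parameter names are hypothetical): take the P95 of observed usage and multiply by a water-level factor to leave headroom for the elasticity and health state machines.

```python
def recommend_request(usage_samples: list, percentile: float = 0.95,
                      water_level_factor: float = 1.2) -> float:
    """Request recommendation sketch: P95 of observed usage times a
    water-level factor. A real VPA recommender uses a decaying
    histogram; a plain sorted-sample percentile suffices to show
    the idea."""
    ordered = sorted(usage_samples)
    idx = min(len(ordered) - 1, int(percentile * len(ordered)))
    return ordered[idx] * water_level_factor
```

Recommending near-P95 rather than peak usage is what frees the capacity that mixed deployment then reclaims; the water-level factor is the safety margin that keeps QoS-sensitive tiers healthy.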
Results and challenges
Resource utilization improved markedly and cost bills decreased year‑over‑year. However, larger physical‑machine failure radii introduced new stability concerns and increased difficulty in root‑cause analysis.
K8s observability and stability
We built a Prometheus-based monitoring platform comprising Thanos, Vertex Exporter, SentryD, CheckD, and an alerting system. Additional components include:
Event persistence with full‑resource list‑watch collection.
Log aggregation via Kafka.
Trace analysis with ID‑based querying, tag filtering, and topology inspection.
A stability dashboard tracking native component health, cluster capacity water‑level, resource load, abnormal instances, and cloud‑platform availability.
Future roadmap
The plan focuses on four areas: deeper mixed‑deployment and optimization, containerizing data stores (databases, NoSQL), exploring serverless scenarios for algorithmic and job workloads, and leveraging AIOps with time‑series prediction for proactive fault detection.
HelloTech
Official Hello technology account, sharing tech insights and developments.