Design Principles and Implementation Details of Kubernetes Horizontal Pod Autoscaler and Custom Water Pod Autoscaler
The article explains Kubernetes’ built‑in Horizontal Pod Autoscaler, then details the custom Water Pod Autoscaler (WPA) that extends HPA with dual‑signal (load and SOA registration) detection, dual‑threshold scaling, noise filtering, configurable cooldown, frequency limits, tolerance buffers, and integrated alerting for reliable elastic scaling.
What is Elastic Scaling
Elastic scaling in Kubernetes is provided by the Horizontal Pod Autoscaler (HPA), a built‑in controller that monitors pod load and automatically adjusts the number of service pods toward the replica count computed by the scaling algorithm. It has become a standard feature of the major cloud providers.
The scaling loop consists of three main components:
Metrics Server – collects runtime load metrics from the pods of Deployments/RCs and exposes them to the HPA.
RC/Deployment – the resources that HPA can scale (any resource exposing a Scale sub‑resource, such as Deployment, StatefulSet, RC).
HPA Controller – the brain that fetches metrics, applies the scaling algorithm, and updates the replica count.
Application Scenarios
Elastic scaling resolves the tension between capacity planning and sudden load spikes. A typical example is a hot search on a social platform: traffic surges require rapid scaling up, and after the peak the system must scale down to reduce cost.
Advantages
Automatic – the controller handles scaling without human intervention.
Stable – scaling up adds instances to sustain high load, keeping the service stable.
Cost‑effective – scaling down releases unused instances, reducing waste.
Cloud‑Native Platform Implementation Details
The native HPA is simple to deploy but cannot directly serve the specific business needs of the company because it does not understand SOA registration states. To address this, a custom Water Pod Autoscaler (WPA) was built, extending HPA with dual‑signal detection (load + SOA registration).
WPA is implemented as a CRD. It gathers metrics from the Metrics Server and SOA registration information from the hahas platform, aggregates them, and computes the desired replica count.
Core Algorithm
To avoid frequent scaling, WPA uses a dual‑threshold (upper and lower) instead of a single line. The upper threshold triggers scaling up, the lower threshold triggers scaling down.
Scale‑Up Algorithm (average mode)
In average mode the per‑pod load is multiplied back up by the replica count, so averaged = n; in absolute mode, averaged = 1. The proposal is upScaleProposal = ceil(averaged × load / highWatermark). Example: 5 current replicas, average load 1500m, upper threshold 1200m → upScaleProposal = ceil(5 × 1500 / 1200) = ceil(6.25) = 7, so WPA adds 2 replicas.
Scale‑Down Algorithm (average mode)
Example: 7 replicas, average load 300m, min threshold 400m → downScaleProposal = floor(7 × 300 / 400) = floor(5.25) = 5, so WPA removes 2 replicas. The algorithm uses ceil for scale‑up and floor for scale‑down to maximize responsiveness in both directions.
Noise Handling
Two main noise sources are:
Pods in Starting or Stopping states inflate the count.
New pods without metrics appear as empty values.
WPA filters out non‑running pods and handles missing metrics. The relevant code is preserved below:

// Inside the per‑pod filtering loop:
// Pods that are being deleted or have failed are ignored entirely.
if pod.DeletionTimestamp != nil || pod.Status.Phase == corev1.PodFailed {
    ignoredPods.Insert(pod.Name)
    continue
}
// Pending pods are counted as not yet ready.
if pod.Status.Phase == corev1.PodPending {
    unReadyPods.Insert(pod.Name)
    continue
}
// A pod with no Ready condition or no recorded start time is not ready.
if condition == nil || pod.Status.StartTime == nil {
    unReady = true
} else {
    if pod.Status.StartTime.Add(cpuInitializationPeriod).After(time.Now()) {
        // Within the CPU initialization window, also treat stale metrics as not ready.
        unReady = condition.Status == corev1.ConditionFalse || metric.Timestamp.Before(condition.LastTransitionTime.Time.Add(metric.Window))
    } else {
        unReady = condition.Status == corev1.ConditionFalse && pod.Status.StartTime.Add(delayOfInitialReadinessStatus).After(condition.LastTransitionTime.Time)
    }
}
if unReady {
    unReadyPods.Insert(pod.Name)
    continue
}

// After the loop, drop the metrics of every filtered pod.
if ignoredPods != nil && ignoredPods.Len() > 0 {
    removeMetricsForPods(metrics, ignoredPods)
}
if unReadyPods != nil && unReadyPods.Len() > 0 {
    removeMetricsForPods(metrics, unReadyPods)
}

For missing metrics, WPA substitutes a tolerant value or zero depending on the scaling direction:
// Pods missing metrics
// Inside the per‑pod loop: record pods that report no metric yet.
metric, found := metrics[pod.Name]
if !found {
    missingPods.Insert(pod.Name)
    continue
}

// After the loop: substitute a value for each missing metric.
if len(missingPods) > 0 {
    if action == v1alpha1.CronScaleDown {
        // Scale-down: assume the pod sits at the high watermark plus tolerance,
        // so a missing metric can never trigger an unwarranted scale-down.
        for podName := range missingPods {
            metrics[podName] = metricsclient.PodMetric{Value: metric.Resource.HighWatermark.MilliValue() + metric.Resource.HighWatermark.MilliValue()*wpa.Spec.Tolerance.MilliValue()/1000}
        }
    } else {
        // Scale-up: assume zero load, so a missing metric can never trigger
        // an unwarranted scale-up.
        for podName := range missingPods {
            metrics[podName] = metricsclient.PodMetric{Value: 0}
        }
    }
}

Cooldown Period
The cooldown period defines a waiting time after a scaling action to avoid rapid oscillations. It can be configured directly in the platform.
Frequency Control
Limits the number of replicas changed in a single scaling operation. The formula is:

newReplicas = min(currentReplicas + max(1, floor(currentReplicas × upScalePercent)), maxReplicas)

Example: 3 current replicas, target 6, up‑scale percent 20 %, max 7 → tentative total = min(3 + max(1, floor(3 × 0.2)), 7) = min(3 + 1, 7) = 4. After frequency control, only 1 replica is added even though the proposal asked for 3.
Similar logic applies to down‑scaling.
Tolerance
Tolerance adds a buffer to the upper and lower thresholds (default 1 %). This smooths out minor metric fluctuations.
Upper threshold: highWaterMark * (1 + Tolerance)
Lower threshold: lowWaterMark * (1 - Tolerance)
Alerting and Notification
The custom autoscaler integrates with monitoring and alerting systems, providing real‑time notifications for scaling actions and version mismatches, and alerts for critical inconsistencies.
HelloTech
Official Hello technology account, sharing tech insights and developments.