
Master Kubernetes HPA: Auto-Scale Pods Efficiently with Real-World Examples

This guide explains what the Kubernetes Horizontal Pod Autoscaler (HPA) is, how it works, and what its key features are, then walks through configuration, verification, and scaling‑policy details step by step, with practical code examples for cloud‑native applications.


What is Kubernetes HPA?

Kubernetes Horizontal Pod Autoscaler (HPA) is an automatic scaling mechanism that adjusts the number of Pods based on actual workload, ensuring stable performance while optimizing resource usage and avoiding waste.

How HPA Works

Monitoring: HPA periodically retrieves resource usage metrics from sources such as Metrics Server or custom providers.

Decision: Based on collected data and predefined targets (e.g., CPU utilization, memory), HPA decides whether to scale the Pods.

Execution: If usage exceeds the threshold, HPA updates the relevant controller (Deployment or ReplicaSet) to increase Pods; if usage is below a threshold, it reduces the number of Pods.

Key Features

Resource‑based scaling: By default, HPA scales on CPU utilization; it can also target memory, and other signals (such as network traffic or queue depth) through custom metrics.

Custom metrics: Supports external metrics from systems like Prometheus.

Stabilization (cooldown) window: Prevents rapid flapping by delaying scaling decisions for a configurable window during which recent recommendations are considered before acting.

Flexible configuration: Minimum and maximum replica counts and target metrics can be set via YAML or other methods.
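As a sketch of the custom‑metrics feature, an autoscaling/v2 HPA can target an external metric served by a metrics adapter. The metric name `queue_depth` below is purely illustrative and assumes an adapter (for example, prometheus-adapter) exposes it:

```yaml
# Illustrative external-metric target; assumes a metrics adapter
# (such as prometheus-adapter) exposes a metric named queue_depth.
metrics:
- type: External
  external:
    metric:
      name: queue_depth
    target:
      type: AverageValue
      averageValue: "30"   # scale until roughly 30 items per replica
```

This fragment would sit under `spec:` in an HPA manifest alongside `minReplicas`, `maxReplicas`, and `scaleTargetRef`.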

Practical HPA Configuration

The following example creates an HPA for a Deployment named simple that triggers scaling when average CPU usage across its Pods reaches 500m.

<code>$ cat <<'EOF' | kubectl apply -f -
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: simple
  namespace: default
spec:
  maxReplicas: 10
  minReplicas: 1
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: AverageValue
        averageValue: 500m  # average CPU usage across all replicas (absolute quantity)
  scaleTargetRef:
    apiVersion: apps/v1
    name: simple
    kind: Deployment
EOF</code>
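A percentage‑of‑request form is also available via averageUtilization; note that it requires resources.requests.cpu to be set on the target Pods. A minimal variant of the metrics section:

```yaml
# Percentage-based alternative: requires resources.requests.cpu on the Pods
metrics:
- type: Resource
  resource:
    name: cpu
    target:
      type: Utilization
      averageUtilization: 50   # target 50% of the requested CPU, on average
```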

Check the HPA resource:

<code>$ kubectl get hpa
NAME    REFERENCE          TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
simple  Deployment/simple  0/500m    1         10        1          8m24s</code>

Tip: If the Deployment sets replicas to 2 but the HPA minimum is 1 and current load only justifies one Pod, the HPA will scale the Deployment down to 1 replica; once the HPA is active, it owns the replica count.

Verification of HPA Functionality

Create a debugging container with a load‑testing tool:

<code>$ cat <<'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tools
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: tools
  template:
    metadata:
      labels:
        app: tools
    spec:
      containers:
      - name: tools
        image: core.jiaxzeng.com/library/tools:v1.2
EOF</code>

Inspect current CPU usage:

<code>$ kubectl get hpa
NAME    REFERENCE          TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
simple  Deployment/simple  0/500m    1         10        1          73m</code>

Run a load test against the HPA‑managed service:

<code>$ kubectl exec -it deploy/tools -- wrk -c 2 -t 1 -d 90s http://simple.default.svc/who/hostname
Running 2m test @ http://simple.default.svc/who/hostname
  1 threads and 2 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   219.94us  424.96us  19.51ms   98.82%
    Req/Sec    10.18k    1.18k   12.72k    67.78%
  911430 requests in 1.50m, 122.56MB read
Requests/sec:  10126.58
Transfer/sec:      1.36MB</code>

Watch scaling changes in real time:

<code>$ kubectl get hpa -w
... (output showing replica count adjustments as CPU usage rises and falls) ...</code>

Default Scaling Policies

Scale‑down: A 300‑second stabilization window applies by default; the single default policy allows removing up to 100% of the currently running replicas every 15 seconds, so the target can shrink all the way to minReplicas.

Scale‑up: No stabilization window by default, so Pods are added as soon as metrics indicate scaling up. Two default policies apply every 15 seconds: add up to 4 Pods, or add 100% of the current replicas, whichever allows the greater change, until the HPA stabilizes.
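These defaults can be written out explicitly with the behavior field of an autoscaling/v2 HPA and tuned per workload; the block below mirrors the documented defaults:

```yaml
# Explicit form of the default scaling behavior (autoscaling/v2)
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300
    policies:
    - type: Percent
      value: 100          # may remove all excess replicas
      periodSeconds: 15
  scaleUp:
    stabilizationWindowSeconds: 0
    policies:
    - type: Percent
      value: 100          # double the current replica count...
      periodSeconds: 15
    - type: Pods
      value: 4            # ...or add 4 Pods, whichever is greater
      periodSeconds: 15
    selectPolicy: Max
```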

Algorithm Details

The controller computes a scaling ratio from the current and desired metric values: desiredReplicas = ceil(currentReplicas × currentMetric / desiredMetric). For example, if the current metric is 200m and the target is 100m, the replica count doubles (200/100 = 2.0); if the current metric is 50m, it halves (50/100 = 0.5). When the ratio is within a configurable tolerance of 1.0 (0.1 by default), no scaling occurs.
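The ratio calculation can be sketched in a few lines of shell; the input values here are hypothetical:

```shell
# Hypothetical inputs: 2 replicas averaging 200m CPU against a 100m target
current_replicas=2
current_metric=200   # millicores
target_metric=100

# desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)
desired=$(awk -v r="$current_replicas" -v c="$current_metric" -v t="$target_metric" \
  'BEGIN { d = r * c / t; if (d > int(d)) d = int(d) + 1; print d }')
echo "$desired"   # prints 4
```

With a 2.0 ratio, two replicas become four; the ceiling ensures fractional results always round up so capacity is never under‑provisioned.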

Reference Documentation

Official documentation: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/

Conclusion

Kubernetes HPA is a key tool for achieving elastic scaling in cloud‑native environments. Understanding its concepts and practical configuration enables you to build more efficient and reliable applications. Experiment with your own workloads to explore advanced features.

Tags: cloud-native, kubernetes, DevOps, autoscaling, K8s, HPA
Written by Linux Ops Smart Journey

The operations journey never stops—pursuing excellence endlessly.
