Boosting Resource Utilization with Event‑Driven Autoscaling for Caribbean Panda’s Game System
By integrating a Flask‑based custom metric service with KEDA on Amazon EKS, Caribbean Panda reduced its pod count from 360 to 36 during idle periods while still scaling rapidly during traffic spikes, achieving significantly higher resource efficiency and lower costs.
Problem with Traditional Autoscaling
Caribbean Panda’s global game platform experiences highly variable traffic across time zones. The default Kubernetes Horizontal Pod Autoscaler (HPA) reacts only to CPU and memory metrics, which do not accurately reflect the load of asynchronous, message‑driven workloads. This leads to three concrete issues:
Scaling actions that are too slow or insufficient.
Resource waste and sudden cost spikes.
Inability to respond quickly to bursty request volumes.
Custom Metric Service and Event‑Driven Scaling
To address these gaps, the team built a Flask‑based custom metric service that serves as an external scaler for KEDA. The service exposes two business‑level metrics:
waittime: the queue waiting time, used for a graduated scaling policy that expands more aggressively when wait time grows.
tasklen: the number of in‑flight tasks, applied when waittime is zero to avoid premature down‑scaling.
KEDA consumes these metrics and forwards them to the native HPA, enabling precise, business‑aligned scaling decisions.
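The graduated policy can be sketched roughly as follows. This is an illustration only: the article does not publish the team's thresholds, so the tiers in `WAIT_TIERS`, the `TASKS_PER_POD` capacity, and the function name `desired_replicas` are all hypothetical, and the real Flask service may compute its values differently.

```python
# Hypothetical tiers: (minimum wait time in seconds, new size as a percentage
# of the current replica count). Longer waits trigger larger steps.
WAIT_TIERS = [(60, 200), (30, 150), (10, 120)]
TASKS_PER_POD = 4  # assumed number of in-flight tasks one Pod can handle


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-a // b)


def desired_replicas(waittime: float, tasklen: int, current: int) -> int:
    """Return a replica target from the waittime and tasklen metrics."""
    base = max(current, 1)  # so a scaled-to-zero workload can still grow
    if waittime > 0:
        # Graduated policy: pick the first tier whose threshold is reached.
        for threshold, percent in WAIT_TIERS:
            if waittime >= threshold:
                return ceil_div(base * percent, 100)
        return base + 1  # short wait: add a single Pod
    # Nothing is waiting: size for the tasks still in flight, so busy
    # Pods are not removed prematurely.
    return ceil_div(tasklen, TASKS_PER_POD)
```

With these assumed numbers, a 75-second wait on 10 Pods asks for 20, while an empty queue with 9 tasks still running keeps 3 Pods alive instead of dropping to zero.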
Observed Benefits
After several months in production, the system reduced the number of Pods running during idle periods from 360 to 36 while maintaining service‑level agreements. During peak periods the system scaled up quickly to absorb traffic spikes. The result was a marked decrease in compute‑resource costs and reduced manual intervention for the operations team.
Solution Overview
The architecture combines the following components:
Amazon EKS – managed Kubernetes control plane and node groups.
KEDA – extends Kubernetes autoscaling with event‑driven triggers.
Kubernetes HPA – executes the scaling decisions based on the metrics supplied by KEDA.
Cluster Autoscaler or Karpenter (optional) – provides node‑level auto‑scaling.
External event sources – e.g., Amazon SQS, HTTP request counters, Kafka streams.
These pieces preserve native Kubernetes observability while adding business‑event awareness.
Architecture Diagram and Scaling Flow
Typical event‑driven autoscaling proceeds as follows:
Business events occur (e.g., messages enter a queue, sudden request surge).
KEDA monitors the configured trigger and reads the custom metrics.
KEDA forwards the metric values to the Kubernetes HPA.
The HPA adjusts the number of Pod replicas.
If needed, the node‑level autoscaler (Cluster Autoscaler/Karpenter) adds or removes nodes.
The system stabilizes and scales down automatically as event rates decline.
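The replica adjustment in the flow above follows the HPA's documented ratio rule, which a short sketch can make concrete. The function name `hpa_desired` is illustrative, and the 10% tolerance is the Kubernetes default, assumed here to be unchanged in the cluster.

```python
import math

TOLERANCE = 0.1  # Kubernetes default HPA tolerance; assumed not overridden


def hpa_desired(current_replicas: int, current_value: float,
                target_value: float) -> int:
    """desired = ceil(current_replicas * current_value / target_value)."""
    ratio = current_value / target_value
    # Within the tolerance band the HPA leaves the replica count alone,
    # which prevents flapping on small metric changes.
    if abs(ratio - 1.0) <= TOLERANCE:
        return current_replicas
    return math.ceil(current_replicas * ratio)
```

For example, 4 replicas observing a metric at twice its target are doubled to 8, while a metric only 5% over target causes no change.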
Implementation Steps
1. Install KEDA
helm repo add kedacore https://kedacore.github.io/charts
helm repo update
helm install keda kedacore/keda \
  --namespace keda \
  --create-namespace
2. Define a ScaledObject
The ScaledObject links a Deployment to an external trigger. Below is an example that scales based on the length of an Amazon SQS queue:
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: sqs-scaledobject
spec:
  scaleTargetRef:
    name: worker-deployment
  minReplicaCount: 0
  maxReplicaCount: 10
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: https://sqs.ap-southeast-1.amazonaws.com/123456789012/my-queue
        queueLength: "5"
        awsRegion: ap-southeast-1
This configuration tells KEDA to target about five queued messages per replica, adding replicas (up to 10) as the backlog grows, while also supporting "Scale to Zero" when no events are present.
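To see how `queueLength`, `minReplicaCount`, and `maxReplicaCount` interact, here is a simplified model that treats `queueLength` as a per-replica target, as the KEDA SQS scaler documentation describes it. The function `replicas_for_queue` is hypothetical and ignores cooldown periods, polling intervals, and HPA tolerance.

```python
def replicas_for_queue(messages: int, queue_length: int = 5,
                       min_replicas: int = 0, max_replicas: int = 10) -> int:
    """Approximate replica count for a given SQS backlog."""
    if messages <= 0:
        # No events: fall back to the floor, which is zero in the example.
        return min_replicas
    wanted = -(-messages // queue_length)  # ceiling division
    # At least one replica once work exists, never above the ceiling.
    return max(1, min_replicas, min(wanted, max_replicas))
```

Under this model an empty queue yields 0 replicas, 3 messages yield 1, 30 messages yield 6, and a backlog of 500 is capped at 10.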
Best Practices
Combine multiple triggers in a single ScaledObject to react to different business dimensions (e.g., wait time and task count) simultaneously.
Avoid configuring both a traditional HPA and a KEDA ScaledObject for the same workload, as they may compete and produce inconsistent scaling behavior. Prefer KEDA’s built‑in triggers when you need event‑driven decisions alongside resource metrics.
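When a ScaledObject carries several triggers, the HPA evaluates each metric separately and applies the largest resulting replica count. The sketch below illustrates that rule; the trigger names, values, and targets are made up, and `combined_replicas` is a hypothetical helper rather than a KEDA API.

```python
import math
from typing import Dict, Tuple


def combined_replicas(triggers: Dict[str, Tuple[float, float]]) -> int:
    """Map trigger name -> (current value, per-replica target) to a count."""
    per_trigger = [math.ceil(value / target)
                   for value, target in triggers.values()]
    # The most demanding trigger wins, so no dimension is under-provisioned.
    return max(per_trigger, default=0)


# Hypothetical readings: wait time suggests 3 replicas, task count suggests 5.
replicas = combined_replicas({"waittime": (45, 15), "tasklen": (40, 8)})
```

Here `replicas` is 5, because the task-count trigger demands more capacity than the wait-time trigger.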
Conclusion
The integration of KEDA with Amazon EKS and the native HPA enables event‑driven, elastic autoscaling that aligns resource allocation with real business load. This approach is especially suited for message‑driven and asynchronous task processing, delivering higher resource utilization, lower costs, and faster response to traffic fluctuations while retaining the observability and control of standard Kubernetes mechanisms.
Amazon Cloud Developers
Official technical community of Amazon Cloud. Shares practical AI/ML, big data, database, modern app development, IoT content, offers comprehensive learning resources, hosts regular developer events, and continuously empowers developers.