KapacityStack: Open‑Source Cloud‑Native Intelligent Capacity Management and IHPA
KapacityStack is an open‑source, cloud‑native capacity platform from Ant Group. Its first core capability, the Intelligent Horizontal Pod Autoscaler (IHPA), provides predictive, multi‑level, and stable autoscaling for Kubernetes workloads, reducing resource waste, carbon emissions, and operational costs through an extensible, modular design.
Built on Ant Group's large‑scale production experience, KapacityStack packages a comprehensive set of cloud‑native capacity techniques that improve cost efficiency, reduce carbon emissions, and address capacity challenges with robust risk management.
The project’s source code is hosted at https://github.com/traas-stack/kapacity.
In the digital economy, rapid growth in data and compute demand leads to high resource consumption and carbon emissions. Ant Group has pursued "green computing" since 2019, developing technologies such as hybrid deployment, AI‑elastic capacity, cloud‑native time‑slice scheduling, and green AI.
During the 2022 Double‑11 event, Ant Group saved 1.538 M kWh of electricity and reduced 947 t of CO₂, equivalent to the annual carbon sequestration of 79 000 trees.
Leveraging cloud‑native architecture, Ant has researched and built AI‑elastic capacity capabilities—spanning elastic scaling, capacity data intelligence, stability, and operations—accumulating algorithms and battle‑tested risk mitigations that now save roughly 100 k CPU cores of compute annually.
KapacityStack open‑sources this technology, providing an extensible, intelligent capacity system for the community.
Key Technical Features
The native Kubernetes Horizontal Pod Autoscaler (HPA) has limitations: reactive scaling, linear metric assumptions, lack of risk controls, and tight coupling to specific K8s versions.
Kapacity’s first core open‑source capability, the Intelligent Horizontal Pod Autoscaler (IHPA), addresses all these issues.
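The "reactive" limitation of the native HPA is easiest to see from its documented scaling rule: desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), applied only after utilization has already drifted outside a tolerance band (0.1 by default in upstream Kubernetes). A minimal Python sketch of that rule (function name and structure are illustrative, not a Kubernetes API):

```python
import math

def hpa_desired_replicas(current_replicas: int,
                         current_metric: float,
                         target_metric: float,
                         tolerance: float = 0.1) -> int:
    """Kubernetes HPA's documented reactive rule:
    desired = ceil(current * currentMetric / targetMetric),
    skipped while the ratio stays within the tolerance band."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # within tolerance: no scaling
    return math.ceil(current_replicas * ratio)

# Reactive by construction: it only acts after load has already spiked.
print(hpa_desired_replicas(4, current_metric=90, target_metric=50))  # -> 8
```

Because the rule assumes replicas scale linearly with the metric and fires only after the fact, it cannot prepare capacity ahead of predictable traffic peaks—the gap IHPA's predictive algorithms target.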
▌ Intelligent Elasticity
IHPA treats elasticity as a data‑driven decision process, supporting multiple algorithms (timed, reactive, predictive, burst‑type) and allowing custom strategy composition for precise scaling.
For predictive scaling, IHPA uses a machine‑learning pipeline: Swish Net for Time Series Forecasting (SNTSF) forecasts the traffic streams that drive load, and a Linear‑Residual Model then combines those forecasts with capacity metrics to recommend replica counts, handling non‑linear relationships and multi‑period traffic patterns.
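To make the "linear plus residual" idea concrete, here is a toy sketch—not Kapacity's actual Linear‑Residual Model—that fits a linear map from traffic to replicas and then corrects it with an interpolated residual term to absorb non‑linear effects. All names are illustrative:

```python
import numpy as np

def recommend_replicas(traffic_forecast: float,
                       traffic_hist: np.ndarray,
                       replicas_hist: np.ndarray) -> int:
    """Toy linear-plus-residual recommendation (illustrative only)."""
    # Linear part: least-squares fit of replicas ~ a * traffic + b
    A = np.column_stack([traffic_hist, np.ones_like(traffic_hist)])
    (a, b), *_ = np.linalg.lstsq(A, replicas_hist, rcond=None)
    # Residual part: interpolate the leftover error over observed traffic,
    # capturing the non-linear relationship the linear fit misses
    residuals = replicas_hist - (a * traffic_hist + b)
    order = np.argsort(traffic_hist)
    corr = np.interp(traffic_forecast, traffic_hist[order], residuals[order])
    return max(1, int(np.ceil(a * traffic_forecast + b + corr)))
```

In the real pipeline the input would be SNTSF's traffic forecasts rather than a single number, and the residual component is a learned model rather than interpolation, but the division of labor is the same: a simple linear backbone plus a corrector for non‑linearity.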
▌ Multi‑Level Elasticity
IHPA defines four pod states to enable fine‑grained control:
Online – running and ready (default for new pods).
Cutoff – running but not ready; traffic is cut off so pods can be scaled down rapidly while kept through a stability observation period.
Standby – the pod is retained but its resources are swapped out and released, with minute‑level rollback to Online.
Deleted – pod fully removed.
Combining these states enables advanced techniques such as large‑scale time‑slice scheduling and hot‑pool management.
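The four states form a graduated scale‑down path (Online → Cutoff → Standby → Deleted) with rollback edges at each stage. A hypothetical sketch of such a state machine—the transition set here is an illustration, not Kapacity's exact rules:

```python
from enum import Enum

class PodState(Enum):
    ONLINE = "Online"
    CUTOFF = "Cutoff"
    STANDBY = "Standby"
    DELETED = "Deleted"

# Hypothetical transition map: each scale-down step is reversible until
# the pod is deleted, which is what makes gradual scale-down low-risk.
ALLOWED = {
    PodState.ONLINE:  {PodState.CUTOFF},
    PodState.CUTOFF:  {PodState.ONLINE, PodState.STANDBY, PodState.DELETED},
    PodState.STANDBY: {PodState.ONLINE, PodState.DELETED},
    PodState.DELETED: set(),
}

def transition(current: PodState, target: PodState) -> PodState:
    """Validate and perform a pod state transition."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target
```

Under a model like this, a hot pool is simply a set of pods parked in Standby, ready for minute‑level promotion back to Online.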
▌ Stability Assurance
IHPA incorporates Ant’s extensive production experience to provide stability guarantees, including gray‑scale rollout, multi‑stage gray‑scale using Cutoff/Standby, and custom stability checks with automatic circuit‑breakers for unattended elastic changes.
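The gray‑scale-with-circuit-breaker pattern described above can be sketched as follows. This is a minimal illustration under assumed hooks (`cutoff`, `restore`, `is_stable` are hypothetical stand‑ins, not Kapacity APIs): pods are moved to Cutoff in batches, a stability check runs after each batch, and a failed check trips the breaker and rolls everything back.

```python
from typing import Callable, List

def gray_scale_down(pods: List[str],
                    batch_size: int,
                    cutoff: Callable[[str], None],
                    restore: Callable[[str], None],
                    is_stable: Callable[[], bool]) -> bool:
    """Scale down in batches; abort and roll back if stability degrades."""
    done: List[str] = []
    for i in range(0, len(pods), batch_size):
        for pod in pods[i:i + batch_size]:
            cutoff(pod)       # move to Cutoff: stop serving, keep the pod
            done.append(pod)
        if not is_stable():   # circuit breaker after each batch
            for pod in reversed(done):
                restore(pod)  # roll back in reverse order
            return False
    return True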
▌ Extensible Design
IHPA is modular, split into control, decision, and execution components, each replaceable. Extensibility includes custom algorithms, pod state logic, stability checks, and pod‑priority policies, allowing integration with other open‑source solutions.
Current Status and Future Roadmap
Version 0.1 (early stage) provides multi‑level elasticity, gray‑scale changes, and basic timed/reactive algorithms. Version 0.2 will open‑source the predictive algorithm. Future work includes burst detection, enhanced stability checks, richer custom metrics, standby‑based time‑slice scheduling, intelligent resource recommendation (CPU/Memory, VPA), and a visual console for cost and carbon accounting.
For updates, see the roadmap at https://kapacity.netlify.app/zh-cn/docs/roadmap.
Join the Community
Kapacity aims to build an open, collaborative community. Contributions, issues, pull requests, and discussions are welcomed via the GitHub repository. Community groups on WeChat, DingTalk, and the official public account provide channels for further engagement.