Post-Cloud-Native Era: Scalable Operations Challenges and the KusionStack Solution
The article examines how the proliferation of heterogeneous cloud‑native, IaaS, and internal services in the post‑cloud‑native era has amplified the difficulty of large‑scale operations, critiques traditional PaaS approaches, and presents the open‑source KusionStack platform as a unified, automated solution for efficient, collaborative infrastructure management.
More than eight years have passed since the first Kubernetes commit, and cloud‑native technologies are now the standard for modern applications. Modern apps rely on a mix of Kubernetes ecosystems, IaaS services, and internal systems, often across multi‑cloud or hybrid environments, ushering in a "post‑cloud‑native era" where Kubernetes‑only operational tools are insufficient.
Within enterprises, these services are maintained by different teams, so large‑scale operations require extensive cross‑team coordination. Technical complexity combined with inefficient collaboration makes operating at scale far harder.
While the challenge of scaling heterogeneous infrastructure predates cloud‑native, the rise of DevOps and Platform Engineering has not fully resolved it. Questions about Dev and Ops collaboration, responsibility division, and rapid delivery of infrastructure capabilities remain, prompting a search for solutions that align with current technological trends.
Traditional PaaS platforms—web consoles that abstract infrastructure via front‑end, back‑end, and API layers—have served well for a decade but now suffer from two fatal drawbacks: they are "people‑intensive" and "time‑intensive." Adding even a small feature often requires weeks of coordinated development across multiple teams.
To address these issues, the authors propose two ideas: (1) enable App Developers to self‑serve infrastructure capabilities without PaaS mediation, and (2) build a centralized collaboration platform that standardizes communication and automates workflows.
After two years of internal experimentation at Ant Group, the team distilled their findings into an end‑to‑end open‑source solution called KusionStack. The system comprises three core components:
Konfig: a Git‑based monorepo serving as a centralized platform for multi‑team operational intent.
KCL: Ant’s custom configuration policy DSL that facilitates communication between teams.
Kusion: the KusionStack engine that executes all operational actions.
Platform developers define infrastructure capability models in KCL; App Developers then import and mix these models into their AppConfig, exposing only the properties they need while hiding underlying complexity. After compilation, AppConfig generates resources for heterogeneous infrastructures, which are delivered to the Kusion engine via CI, CLI, or GUI.
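The division of labor described above can be sketched in a few lines. The example below is written in Python rather than KCL, and every name in it (AppConfig, render, the field names) is a hypothetical stand-in chosen for illustration, not part of KusionStack. It assumes the platform side owns the expansion into Kubernetes resources while the app side only fills in a small model.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class AppConfig:
    """Hypothetical app-facing model: the only properties an App Developer sets."""
    name: str
    image: str
    replicas: int = 2
    ports: List[int] = field(default_factory=list)


def render(app: AppConfig) -> List[Dict[str, Any]]:
    """Platform-owned step (an assumption of this sketch): expand the small
    model into full resource definitions, hiding labels, selectors, etc."""
    labels = {"app": app.name}
    container = {
        "name": app.name,
        "image": app.image,
        "ports": [{"containerPort": p} for p in app.ports],
    }
    resources: List[Dict[str, Any]] = [{
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": app.name, "labels": labels},
        "spec": {
            "replicas": app.replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [container]},
            },
        },
    }]
    # A Service is generated only when the app exposes at least one port.
    if app.ports:
        resources.append({
            "apiVersion": "v1",
            "kind": "Service",
            "metadata": {"name": app.name},
            "spec": {
                "selector": labels,
                "ports": [{"port": p, "targetPort": p} for p in app.ports],
            },
        })
    return resources


if __name__ == "__main__":
    for res in render(AppConfig(name="web", image="nginx:1.25", ports=[80])):
        print(res["kind"], res["metadata"]["name"])
```

The App Developer touches four fields; everything else is derived. In this sketch the platform team could change how labels or selectors are generated without any application changing its configuration.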
The engine validates, orchestrates, previews, applies, and monitors resources, providing Kubernetes‑friendly workflows, observability, health checks, and simplified reconciliation feedback. A demo GIF in the original article shows the engine visualizing the reconciliation of resources until they become usable.
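To make the preview step concrete, here is a minimal Python sketch of one common way to compute a preview: compare the desired resources with the live ones and classify each as create, update, delete, or unchanged. This is an illustrative assumption about what a preview can look like, not Kusion's actual implementation; the function names and the choice of (kind, name) as a resource key are hypothetical.

```python
from typing import Any, Dict, List, Tuple

Resource = Dict[str, Any]
ResourceKey = Tuple[str, str]


def resource_id(res: Resource) -> ResourceKey:
    """Identify a resource by (kind, name); a simplification for this sketch."""
    return (res["kind"], res["metadata"]["name"])


def preview(desired: List[Resource], live: List[Resource]) -> Dict[str, List[ResourceKey]]:
    """Classify every resource by the action an apply step would need to take."""
    want = {resource_id(r): r for r in desired}
    have = {resource_id(r): r for r in live}
    plan: Dict[str, List[ResourceKey]] = {
        "create": [], "update": [], "delete": [], "unchanged": [],
    }
    for key, res in want.items():
        if key not in have:
            plan["create"].append(key)
        elif have[key] != res:
            plan["update"].append(key)
        else:
            plan["unchanged"].append(key)
    # Anything live that is no longer desired would be removed.
    plan["delete"] = [key for key in have if key not in want]
    return plan


if __name__ == "__main__":
    live = [
        {"kind": "Deployment", "metadata": {"name": "web"}, "spec": {"replicas": 2}},
        {"kind": "Service", "metadata": {"name": "old"}},
    ]
    desired = [
        {"kind": "Deployment", "metadata": {"name": "web"}, "spec": {"replicas": 3}},
        {"kind": "Service", "metadata": {"name": "web"}},
    ]
    print(preview(desired, live))
```

Showing such a plan before anything is applied lets a reviewer catch an unintended delete or update early, which is the kind of client‑side check the article's workflow relies on.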
The solution emphasizes three characteristics:
Application‑centric configuration management covering compute, network, storage, and full lifecycle from code to production.
Unified operation of heterogeneous infrastructure, offering Kubernetes‑friendly workflows and Terraform‑based multi‑runtime resource management.
Scalable collaborative platform with flexible workflows, separation of App Dev and Platform Dev concerns, and a shift‑left approach that catches risks on the client side before changes reach production.
Deployed across Ant Group’s multi‑cloud application delivery, compute and data infrastructure, site operations, and database management, KusionStack has involved over 400 developers, generated nearly 800K commits (mostly automated), executed about 1K pipelines daily, and compiled over 10K KCL instances, producing more than 3M lines of YAML.
The authors invite the broader community to contribute to this open‑source effort, aiming to build a solution that truly addresses enterprise‑scale operations in the emerging post‑cloud‑native era.
AntTech
Technology is the core driver of Ant's future creation.