KubeDoor: AI‑Driven Kubernetes Load‑Aware Scheduling & Capacity Management
KubeDoor is an open‑source platform built with Python and Vue that leverages Kubernetes admission control, AI recommendations, and expert experience to provide load‑aware scheduling, capacity governance, real‑time resource analytics, and automated scaling for microservices, featuring a web UI, Grafana dashboards, and extensible control mechanisms.
KubeDoor
KubeDoor is a microservice resource governance platform developed with Python and Vue. Built on Kubernetes admission control, it focuses on daily peak‑time resource views to ensure that resource requests match actual usage.
Overview
KubeDoor collects the daily P95 CPU and memory consumption of microservices during peak periods, along with request and limit values and pod counts, and presents them in a Grafana dashboard integrated into the web UI.
Project repository: https://github.com/CassInfra/KubeDoor
Architecture Diagram
Feature Description
Data Collection
Collect daily P95 CPU and memory usage of Kubernetes microservices during peak periods, as well as request and limit values and pod counts, and build a Grafana dashboard integrated into the web UI.
Because the data is reduced to one P95 value per day, long‑term resource trends stay readable even across a full year.
Global peak‑time resource statistics and top‑10 resource consumers.
Namespace‑level peak‑time P95 usage and its proportion of total resources.
Microservice‑level peak‑time overall resource and utilization analysis.
Microservice and pod‑level resource curves (request, limit, usage).
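As an illustration of the kind of query that produces these curves, the sketch below pulls a peak‑window P95 CPU series from the Prometheus HTTP API. The endpoint, labels, window, and step are assumptions for the example, not KubeDoor's actual queries.

import requests

PROM_URL = "http://prometheus.example.com"  # assumed Prometheus endpoint

# P95 of a workload's CPU usage rate over a one-hour peak window,
# computed with a PromQL subquery; label values are illustrative only.
query = (
    'quantile_over_time(0.95, '
    'rate(container_cpu_usage_seconds_total{namespace="demo", pod=~"demo-svc-.*"}[5m])'
    '[1h:1m])'
)

resp = requests.get(
    f"{PROM_URL}/api/v1/query",
    params={"query": query, "time": "2025-01-01T20:00:00Z"},  # end of the peak window
    timeout=10,
)
resp.raise_for_status()
for series in resp.json()["data"]["result"]:
    print(series["metric"], series["value"])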
Core Logic
Admission control ensures that each microservice’s pod count, request value, and limit value stay consistent with the database, providing unified governance and load‑aware scheduling.
Microservices not yet under governance fail to deploy; they must first be added through the web UI.
Extending the Kubernetes admission mechanism enables custom interception, management, policy enforcement, and labeling of microservices.
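Since the backend is Flask, a mutating admission webhook along the following lines is one way to picture this core logic. This is a minimal sketch under stated assumptions: the lookup function, patch layout, and single‑container patch path are illustrative, not KubeDoor's actual code.

# Minimal sketch of a mutating admission webhook that rewrites a Deployment's
# replicas and resources from a database of governed values. The lookup
# function and patch layout are assumptions for illustration only.
import base64
import json

from flask import Flask, request, jsonify

app = Flask(__name__)

def lookup_governed_spec(namespace, name):
    # Hypothetical lookup of governed pod count / request / limit values.
    return {
        "replicas": 3,
        "requests": {"cpu": "500m", "memory": "512Mi"},
        "limits": {"cpu": "1", "memory": "1Gi"},
    }

@app.route("/mutate", methods=["POST"])
def mutate():
    review = request.get_json()
    obj = review["request"]["object"]
    governed = lookup_governed_spec(obj["metadata"]["namespace"],
                                    obj["metadata"]["name"])
    # JSON Patch that forces replicas and the first container's resources
    # back to the governed values (single-container path for brevity).
    patch = [
        {"op": "replace", "path": "/spec/replicas", "value": governed["replicas"]},
        {"op": "replace",
         "path": "/spec/template/spec/containers/0/resources",
         "value": {"requests": governed["requests"], "limits": governed["limits"]}},
    ]
    return jsonify({
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": {
            "uid": review["request"]["uid"],
            "allowed": True,
            "patchType": "JSONPatch",
            "patch": base64.b64encode(json.dumps(patch).encode()).decode(),
        },
    })

if __name__ == "__main__":
    # A real admission webhook must be served over TLS, which is why
    # cert-manager is installed in the deployment guide below.
    app.run(host="0.0.0.0", port=8443)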
Web UI
Displays latest daily P95 resources, pod counts, and limit values for microservices.
Supports immediate, scheduled, and periodic scaling and restart operations.
Authentication is based on NGINX basic auth with LDAP support; all operations are audit‑logged with notifications.
Integrates Grafana dashboards for elegant data visualization.
Deployment Controls
When a microservice is updated or deployed, admission control validates pod count, request, and limit values against the database to ensure consistent usage and enable balanced scheduling.
Because requests and limits reflect real demand, Kubernetes QoS can prioritize pods appropriately and protect critical services.
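For context, Kubernetes derives a pod's QoS class directly from its request and limit values; the simplified single‑container classifier below illustrates the rule (the real algorithm also handles multiple containers and per‑resource coverage).

def qos_class(requests: dict, limits: dict) -> str:
    """Simplified QoS classification for a single-container pod.

    Guaranteed: CPU and memory requests and limits are all set and equal.
    Burstable:  at least one request or limit is set.
    BestEffort: nothing is set.
    """
    resources = ("cpu", "memory")
    if all(r in requests and r in limits and requests[r] == limits[r]
           for r in resources):
        return "Guaranteed"
    if requests or limits:
        return "Burstable"
    return "BestEffort"

print(qos_class({"cpu": "500m", "memory": "512Mi"},
                {"cpu": "500m", "memory": "512Mi"}))  # Guaranteed
print(qos_class({"cpu": "500m"}, {"cpu": "1"}))       # Burstable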
Roadmap (2025)
Multi‑cluster support with unified Web UI.
English version release.
Real‑time monitoring integration, one‑click deployment, AI‑driven anomaly analysis.
Microservice AI scoring to detect waste and suggest cost‑saving actions.
AI‑driven scaling based on peak‑period data.
Node resource‑based scheduling and control.
Collect additional metrics such as QPS, JVM, GC.
Fine‑grained pod operations: isolation, deletion, dump, jstack, jfr, jvm.
Deployment Guide
Prerequisites
Prometheus monitoring with cadvisor and kube-state-metrics, collecting the following metrics:
container_cpu_usage_seconds_total
container_memory_working_set_bytes
container_spec_cpu_quota
kube_pod_container_info
kube_pod_container_resource_limits
kube_pod_container_resource_requests
Step 1: Install Cert‑Manager
kubectl apply -f https://StarsL.cn/kubedoor/00.cert-manager_v1.16.2_cn.yaml
Step 2: Deploy ClickHouse
# Deploys ClickHouse with Docker Compose under /opt/clickhouse by default
curl -s https://StarsL.cn/kubedoor/install-clickhouse.sh | sudo bash
cd /opt/clickhouse && docker compose up -d
If ClickHouse already exists, run the initialization SQL from https://StarsL.cn/kubedoor/kubedoor-init.sql.
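If you are pointing KubeDoor at an existing ClickHouse instance, a quick connectivity check before running the init SQL can save a failed install. A minimal sketch with the clickhouse-driver package; host and credentials are placeholders.

# Sanity-check connectivity to an existing ClickHouse instance before
# applying kubedoor-init.sql; host and credentials are placeholders.
from clickhouse_driver import Client

client = Client(host="clickhouse.example.com", user="default", password="")
print(client.execute("SELECT version()"))
print(client.execute("SHOW DATABASES"))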
Step 3: Deploy KubeDoor
wget https://StarsL.cn/kubedoor/kubedoor.tgz
tar -zxvf kubedoor.tgz
# Edit values.yaml as needed
vim kubedoor/values.yaml
helm install kubedoor ./kubedoor
Step 4: Access Web UI and Initialize Data
Visit the NodeIP:NodePort of kubedoor-web with the default credentials kubedoor.
Open “Configuration Center”, set the historical data length, and click “Collect and Update” to load past data and refresh peak‑time records.
By default, ten days of data are collected from Prometheus (one month is recommended); the day with the highest consumption within that window is written to the control table. Re‑executing “Collect and Update” will not create duplicate entries.
Control Switch
Toggle “Control Status” to enable or disable governance. When enabled, only Deployment creation, update, and scaling are intercepted; pod count, request, and limit values are governed.
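Governance is toggled by installing or removing the mutating webhook configuration with the commands below. If you want to check the current state from a script, here is a small sketch with the official Kubernetes Python client; the configuration name is the one used by the delete command.

# Check whether the KubeDoor mutating webhook (i.e. governance) is enabled.
from kubernetes import client, config

config.load_kube_config()  # use config.load_incluster_config() inside a pod
api = client.AdmissionregistrationV1Api()

names = [w.metadata.name for w in api.list_mutating_webhook_configuration().items]
print("governance enabled:", "kubedoor-webhook-configuration" in names)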
# Enable control
kubectl apply -f https://StarsL.cn/kubedoor/99.kubedoor-Mutating.yaml
# Disable control
kubectl delete mutatingwebhookconfigurations kubedoor-webhook-configuration
Control Examples
Scaling a Deployment by 10 pods outside the platform triggers interception; the system queries the database for the correct pod count and applies that value, so scaling is performed through the KubeDoor web UI instead.
Updating a Deployment image triggers interception; the system retrieves pod count, request, and limit values from the database and applies the update accordingly.
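As a concrete illustration of the first example, an out‑of‑band scale request like the one below would be intercepted while governance is on, and the replica count reconciled to the database value. Namespace and Deployment name are placeholders.

# Attempt to scale a governed Deployment directly; with the KubeDoor webhook
# enabled, admission control rewrites the replica count back to the database
# value, so real scaling is done through the web UI instead.
from kubernetes import client, config

config.load_kube_config()
apps = client.AppsV1Api()

apps.patch_namespaced_deployment_scale(
    name="demo-svc",       # placeholder Deployment
    namespace="demo",      # placeholder namespace
    body={"spec": {"replicas": 10}},
)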
Control Principles
Operations that do not modify pod count or trigger a restart will update the Deployment without restarting.
Operations that modify pod count will update the Deployment based on database values without restarting.
Operations that trigger a restart will apply database values and then restart the Deployment.
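These principles line up with how Kubernetes treats Deployment updates: changes outside spec.template (such as spec.replicas) do not roll pods, while any change under spec.template triggers a rolling restart. A brief sketch with the Python client; names and image are placeholders.

# Changing only spec.replicas updates the Deployment without restarting pods;
# changing anything under spec.template (e.g. the image) rolls the pods.
from kubernetes import client, config

config.load_kube_config()
apps = client.AppsV1Api()

# No restart: replicas live outside spec.template.
apps.patch_namespaced_deployment(
    name="demo-svc", namespace="demo",
    body={"spec": {"replicas": 5}},
)

# Rolling restart: the pod template changed.
apps.patch_namespaced_deployment(
    name="demo-svc", namespace="demo",
    body={"spec": {"template": {"spec": {"containers": [
        {"name": "app", "image": "demo/app:v2"}]}}}},
)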
Acknowledgements
Thanks to the following projects that made KubeDoor possible:
Backend: Flask, Grafana, Nginx
Frontend: Vue, Element Plus, pure‑admin
Special thanks to CassTime (开思) for its support.
Ops Development Stories
Maintained by a like‑minded team, covering both operations and development. Topics span Linux ops, DevOps toolchain, Kubernetes containerization, monitoring, log collection, network security, and Python or Go development. Team members: Qiao Ke, wanger, Dong Ge, Su Xin, Hua Zai, Zheng Ge, Teacher Xia.