
KubeDoor: AI‑Driven Kubernetes Load‑Aware Scheduling & Capacity Management

KubeDoor is an open‑source platform built with Python and Vue that leverages Kubernetes admission control, AI recommendations, and expert experience to provide load‑aware scheduling, capacity governance, real‑time resource analytics, and automated scaling for microservices, featuring a web UI, Grafana dashboards, and extensible control mechanisms.

Ops Development Stories

KubeDoor

KubeDoor is a microservice resource governance platform developed with Python and Vue and built on Kubernetes admission control. It focuses on a daily peak‑time view of resources to ensure that each microservice's resource requests match its actual usage.

Overview

KubeDoor collects the daily P95 CPU and memory consumption of each microservice during peak periods, together with its request and limit values and pod count, and presents them in a Grafana dashboard integrated into the web UI.

Project repository: https://github.com/CassInfra/KubeDoor


Feature Description

Data Collection

Collects the daily P95 CPU and memory usage of Kubernetes microservices during peak periods, along with request and limit values and pod counts, and builds a Grafana dashboard integrated into the web UI.

Because only one P95 value per day is stored, long‑term resource trends stay smooth and easy to read, even across a full year of data.

Global peak‑time resource statistics and top‑10 resource consumers.

Namespace‑level peak‑time P95 usage and its proportion of total resources.

Microservice‑level peak‑time overall resource and utilization analysis.

Microservice and pod‑level resource curves (request, limit, usage).
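To make the "daily peak‑time P95" figure concrete, here is a minimal Python sketch. It is not KubeDoor's implementation. It assumes that raw samples arrive as (timestamp, value) pairs, uses a hypothetical peak window of 10:00 to 12:00, and applies the nearest‑rank percentile definition. The names `daily_peak_p95`, `PEAK_START`, and `PEAK_END` are illustrative only.

```python
from collections import defaultdict
from datetime import datetime, time

# Hypothetical peak window; the real window is whatever KubeDoor is configured with.
PEAK_START = time(10, 0)
PEAK_END = time(12, 0)


def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile for an integer pct in [0, 100]."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    # 1-based rank = ceil(pct * n / 100), computed with integers to avoid float error.
    rank = max((pct * len(ordered) + 99) // 100, 1)
    return ordered[rank - 1]


def daily_peak_p95(samples):
    """samples: iterable of (datetime, value). Returns {date: P95 inside the peak window}."""
    by_day = defaultdict(list)
    for ts, value in samples:
        if PEAK_START <= ts.time() < PEAK_END:  # off-peak samples are ignored
            by_day[ts.date()].append(value)
    return {day: nearest_rank_percentile(vals, 95) for day, vals in by_day.items()}


if __name__ == "__main__":
    # 60 one-minute samples with values 0.0 to 59.0 in the peak window, plus one off-peak spike.
    samples = [(datetime(2025, 1, 1, 10, m), float(m)) for m in range(60)]
    samples.append((datetime(2025, 1, 1, 3, 0), 999.0))
    print(daily_peak_p95(samples))  # {datetime.date(2025, 1, 1): 56.0}
```

Storing one such value per day keeps a year of history to roughly 365 points per metric and microservice.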

Core Logic

Admission control ensures that each microservice's pod count, request value, and limit value match the values stored in the database, providing unified governance and load‑aware scheduling.

Microservices that are not under control are rejected at deployment time; add them through the web UI before deploying.

Extending the Kubernetes admission mechanism allows custom interception, management, policies, and labeling of microservices.
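A mutating webhook enforces database values by returning an AdmissionReview whose response carries a base64‑encoded JSON Patch. The sketch below builds such a response for a Deployment request. The `(uid, allowed, patchType, patch)` envelope follows the Kubernetes `admission.k8s.io/v1` API. The `CONTROLLED` dict is a hypothetical stand‑in for KubeDoor's database, and the handler name and its field names are assumptions as well. The sketch also patches only the first container, which is a simplification.

```python
import base64
import json

# Hypothetical stand-in for KubeDoor's control table, keyed by (namespace, deployment name).
CONTROLLED = {
    ("shop", "order-api"): {
        "replicas": 4,
        "requests": {"cpu": "500m", "memory": "1Gi"},
        "limits": {"cpu": "2", "memory": "2Gi"},
    }
}


def review_deployment(review):
    """Return an admission.k8s.io/v1 AdmissionReview answering a Deployment request."""
    request = review["request"]
    name = request["object"]["metadata"]["name"]
    namespace = request["namespace"]
    response = {"uid": request["uid"]}  # the response must echo the request's uid

    record = CONTROLLED.get((namespace, name))
    if record is None:
        # Not under control: reject until the service is added through the web UI.
        response["allowed"] = False
        response["status"] = {"code": 403, "message": f"{namespace}/{name} is not under KubeDoor control"}
    else:
        # JSON Patch "add" replaces an existing object member, so it works whether or not the field is set.
        patch = [
            {"op": "add", "path": "/spec/replicas", "value": record["replicas"]},
            {
                "op": "add",
                "path": "/spec/template/spec/containers/0/resources",  # first container only
                "value": {"requests": record["requests"], "limits": record["limits"]},
            },
        ]
        response["allowed"] = True
        response["patchType"] = "JSONPatch"
        response["patch"] = base64.b64encode(json.dumps(patch).encode()).decode()

    return {"apiVersion": "admission.k8s.io/v1", "kind": "AdmissionReview", "response": response}
```

A request for a `deployments/scale` subresource carries a Scale object rather than a Deployment, so a real handler would need a separate branch for it.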

Web UI

Displays the latest daily P95 usage, pod count, and limit values for each microservice.

Supports immediate, scheduled, and periodic scaling and restart operations.

Uses NGINX basic authentication, supports LDAP, and keeps a full audit log of operations with notifications.

Integrates Grafana dashboards for elegant data visualization.

Deployment Controls

When a microservice is updated or deployed, admission control validates pod count, request, and limit values against the database to ensure consistent usage and enable balanced scheduling.

Because requests reflect real demand, Kubernetes QoS can prioritize pods accordingly and protect critical services.
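The QoS class Kubernetes assigns to a pod depends only on its containers' CPU and memory requests and limits. The rules are as follows:

- **Guaranteed:** every container sets CPU and memory limits, and its requests equal those limits.
- **Burstable:** at least one container sets a request or a limit, but the pod does not qualify as Guaranteed.
- **BestEffort:** no container sets any request or limit.

The helper below is a hypothetical sketch of those rules, not part of KubeDoor. For simplicity it compares quantity strings literally, so `"1"` and `"1000m"` count as different values, and it ignores init containers.

```python
def qos_class(containers):
    """Derive the Kubernetes QoS class from a list of container specs (plain dicts)."""
    any_set = False
    guaranteed = True
    for container in containers:
        resources = container.get("resources", {})
        limits = resources.get("limits", {})
        # Kubernetes defaults a missing request to the limit when only the limit is set.
        requests = {**limits, **resources.get("requests", {})}
        if requests or limits:
            any_set = True
        for name in ("cpu", "memory"):
            # Guaranteed needs both limits present and requests equal to them.
            if name not in limits or requests.get(name) != limits[name]:
                guaranteed = False
    if guaranteed and any_set:
        return "Guaranteed"
    return "Burstable" if any_set else "BestEffort"
```

Setting requests from measured peak usage, as KubeDoor does, mainly shapes how the scheduler places Burstable pods. Setting requests equal to limits instead yields Guaranteed pods.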

Roadmap (2025)

Multi‑cluster support with unified Web UI.

English version release.

Real‑time monitoring integration, one‑click deployment, AI‑driven anomaly analysis.

Microservice AI scoring to detect waste and suggest cost‑saving actions.

AI‑driven scaling based on peak‑period data.

Node resource‑based scheduling and control.

Collect additional metrics such as QPS, JVM, GC.

Fine‑grained pod operations: isolation, deletion, dump, jstack, jfr, jvm.

Deployment Guide

Prerequisites

Prometheus monitoring with cAdvisor and kube-state-metrics, collecting the following metrics:

container_cpu_usage_seconds_total
container_memory_working_set_bytes
container_spec_cpu_quota
kube_pod_container_info
kube_pod_container_resource_limits
kube_pod_container_resource_requests
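These metrics are enough to derive a peak‑window P95. As an illustration, the sketch below builds PromQL subqueries with `quantile_over_time` and a URL for Prometheus's `/api/v1/query` instant‑query endpoint. The source does not show KubeDoor's actual queries. The aggregation labels, the 5‑minute rate window, the 1‑minute subquery step, and the Prometheus address are all assumptions.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

# Assumed expressions: per-container CPU cores (counter -> rate) and working-set memory (gauge).
CPU_USAGE = 'sum by (namespace, pod, container) (rate(container_cpu_usage_seconds_total{container!=""}[5m]))'
MEMORY_USAGE = 'sum by (namespace, pod, container) (container_memory_working_set_bytes{container!=""})'


def peak_p95_query(expr, window_hours, step="1m"):
    """Wrap an instant expression in a subquery and take its P95 over the window."""
    return f"quantile_over_time(0.95, ({expr})[{window_hours}h:{step}])"


def instant_query_url(base_url, promql, eval_time):
    """Prometheus HTTP API instant query, evaluated at eval_time (the end of the peak window)."""
    params = {"query": promql, "time": eval_time.timestamp()}
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode(params)}"


if __name__ == "__main__":
    # Evaluating at 12:00 UTC with a 2h window covers a 10:00 to 12:00 peak.
    end = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
    for expr in (CPU_USAGE, MEMORY_USAGE):
        print(instant_query_url("http://prometheus:9090", peak_p95_query(expr, 2), end))
```

The request and limit side would come from `kube_pod_container_resource_requests` and `kube_pod_container_resource_limits` in the same way.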

Step 1: Install Cert‑Manager

<code>kubectl apply -f https://StarsL.cn/kubedoor/00.cert-manager_v1.16.2_cn.yaml</code>

Step 2: Deploy ClickHouse

<code># Sets up ClickHouse for Docker Compose under /opt/clickhouse by default, then starts it
curl -s https://StarsL.cn/kubedoor/install-clickhouse.sh | sudo bash
cd /opt/clickhouse && docker compose up -d</code>

If ClickHouse is already installed, run the initialization SQL from https://StarsL.cn/kubedoor/kubedoor-init.sql instead.

Step 3: Deploy KubeDoor

<code>wget https://StarsL.cn/kubedoor/kubedoor.tgz
tar -zxvf kubedoor.tgz
# Edit values.yaml as needed
vim kubedoor/values.yaml
helm install kubedoor ./kubedoor</code>

Step 4: Access Web UI and Initialize Data

Visit NodeIP:NodePort of the kubedoor-web service and log in with the default credentials kubedoor.

Open “Configuration Center”, set the length of historical data to collect, and click “Collect and Update” to load past data and update the peak‑time records.

By default, ten days of data are collected from Prometheus (one month is recommended). The day with the highest consumption within that period is written to the control table. Re‑running “Collect and Update” does not create duplicate entries.
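Selecting the peak day and re‑running without duplicates can be sketched as a keyed upsert. The function is hypothetical. It assumes the control table is keyed by service, so a repeated run overwrites the existing row instead of appending one. The source does not specify how the "highest consumption" day is ranked. Ranking by CPU first and memory second is an assumption.

```python
def update_control_table(table, service, daily_usage):
    """Write the peak day for `service` into `table`; keyed writes make re-runs idempotent.

    daily_usage: {"YYYY-MM-DD": {"cpu": cores, "mem": bytes}} for the collected window.
    Ranking by CPU, then memory, is an illustrative assumption.
    """
    if not daily_usage:
        return table
    day, usage = max(daily_usage.items(), key=lambda item: (item[1]["cpu"], item[1]["mem"]))
    table[service] = {"day": day, **usage}  # overwrite, never append
    return table
```

In a real database the same effect comes from a primary key on the service identifier, with the collected row replacing the previous one.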

Control Switch

Toggle “Control Status” to enable or disable governance. When enabled, KubeDoor intercepts only Deployment creation, update, and scaling, and governs their pod count, request, and limit values.

<code># Enable control
kubectl apply -f https://StarsL.cn/kubedoor/99.kubedoor-Mutating.yaml
# Disable control
kubectl delete mutatingwebhookconfigurations kubedoor-webhook-configuration</code>
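The scope described above can be expressed as a filter on the incoming AdmissionRequest. The sketch reads real AdmissionRequest fields: `resource` (group/version/resource), `operation`, and `subResource`. The helper itself is hypothetical. In practice, most of this scoping would live in the webhook configuration's rules rather than in handler code. Scaling with `kubectl scale` arrives as an `UPDATE` on the `deployments/scale` subresource, while controller writes to `deployments/status` are left alone.

```python
INTERCEPTED_OPERATIONS = {"CREATE", "UPDATE"}
INTERCEPTED_SUBRESOURCES = {"", "scale"}  # the Deployment itself, or its scale subresource


def should_intercept(request):
    """Return True only for Deployment create/update and deployments/scale requests."""
    resource = request.get("resource", {})
    if resource.get("group") != "apps" or resource.get("resource") != "deployments":
        return False
    if request.get("operation") not in INTERCEPTED_OPERATIONS:
        return False  # e.g. DELETE passes through untouched
    return request.get("subResource", "") in INTERCEPTED_SUBRESOURCES
```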

Control Examples

Scaling a Deployment to 10 pods triggers interception: the system looks up the pod count stored in the database and scales to that value instead. Pod counts are changed through the KubeDoor web UI.

Updating a Deployment's image triggers interception: the system retrieves the pod count, request, and limit values from the database and applies the update with those values.

Control Principles

Operations that do not modify pod count or trigger a restart will update the Deployment without restarting.

Operations that modify pod count will update the Deployment based on database values without restarting.

Operations that trigger a restart will apply database values and then restart the Deployment.
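The three cases above hinge on one Kubernetes rule: a Deployment rolls out new pods if and only if its pod template (`.spec.template`) changes. A hypothetical classifier for an incoming update could therefore compare the old and new objects. The function name and return labels are illustrative and not taken from KubeDoor.

```python
def classify_change(old, new):
    """Classify a Deployment update as 'restart', 'scale', or 'no_restart'."""
    old_spec, new_spec = old.get("spec", {}), new.get("spec", {})
    if old_spec.get("template") != new_spec.get("template"):
        return "restart"  # pod template changed: Kubernetes rolls out new pods
    if old_spec.get("replicas") != new_spec.get("replicas"):
        return "scale"  # only the pod count changed; existing pods keep running
    return "no_restart"  # e.g. Deployment-level labels or annotations
```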

Acknowledgements

Thanks to the following projects that made KubeDoor possible:

Backend: Flask, Grafana, Nginx

Frontend: Vue, Element Plus, pure‑admin

Special thanks to CassTime and to 开思 (Cass) for its support.

Written by

Ops Development Stories

Maintained by a like‑minded team, covering both operations and development. Topics span Linux ops, DevOps toolchain, Kubernetes containerization, monitoring, log collection, network security, and Python or Go development. Team members: Qiao Ke, wanger, Dong Ge, Su Xin, Hua Zai, Zheng Ge, Teacher Xia.
