Optimizing Kubernetes Cluster Load: From Static Scheduling to Advanced Resource Management
The article explains that Kubernetes' static scheduler leads to fragmented, under‑utilized clusters, then proposes dynamic remedies: Pod resource compression, Node resource oversell via a mutating admission webhook, and an enhanced per‑HPA autoscaling controller. It closes by outlining future scheduler extensions, monitoring integration with Tencent Cloud, and a recruitment notice for senior cloud‑native engineers.
Kubernetes uses a static scheduler that matches a Pod's requested resources against each Node's allocatable resources. While static scheduling is simple and efficient, it often leads to clusters that appear fully allocated even though actual load is low and unevenly distributed across nodes.
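The static predicate can be sketched in Go. The `Node` struct and `fits` helper below are illustrative stand-ins, not real scheduler types; the point is that only declared requests are compared, never actual usage:

```go
package main

import "fmt"

// CPU amounts are in millicores, as Kubernetes tracks them internally.
type Node struct {
	Allocatable int64 // total schedulable CPU on the node
	Requested   int64 // sum of requests of Pods already bound here
}

// fits mirrors the static predicate: the Pod fits if declared requests
// stay within allocatable. Real load on the node is never consulted.
func fits(n Node, podRequest int64) bool {
	return n.Requested+podRequest <= n.Allocatable
}

func main() {
	// A node that looks "full" by requests, even if real usage is low.
	n := Node{Allocatable: 4000, Requested: 3800}
	fmt.Println(fits(n, 500)) // false: 3800+500 > 4000
	fmt.Println(fits(n, 200)) // true:  3800+200 <= 4000
}
```

This is exactly why a lightly loaded node can still reject new Pods: the bookkeeping is about promises, not consumption.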
The article first explains why Kubernetes adopts static scheduling: dynamic scheduling that satisfies all enterprise workloads is extremely difficult, so a static approach is preferred for simplicity.
It then describes the composition of cluster resources, using CPU as an example. Each node reserves resources (system‑reserved, kube‑reserved, eviction‑hard) that are subtracted from its capacity to yield its allocatable resources. In practice, roughly 5%–10% of resources end up fragmented across nodes, making it hard to match user‑requested container specs with the remaining fragments.
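The subtraction can be shown with a minimal sketch following the kubelet's allocatable formula; the type name and the millicore figures are made up for illustration:

```go
package main

import "fmt"

// CPU amounts in millicores; values below are illustrative only.
type NodeCapacity struct {
	Capacity       int64
	SystemReserved int64
	KubeReserved   int64
	EvictionHard   int64
}

// allocatable follows the kubelet formula:
// allocatable = capacity - system-reserved - kube-reserved - eviction-hard.
func allocatable(c NodeCapacity) int64 {
	return c.Capacity - c.SystemReserved - c.KubeReserved - c.EvictionHard
}

func main() {
	// A hypothetical 8-core node with assumed reservations.
	c := NodeCapacity{Capacity: 8000, SystemReserved: 200, KubeReserved: 300, EvictionHard: 100}
	fmt.Println(allocatable(c)) // 7400 millicores schedulable
}
```

Whatever slice of those 7400 millicores no Pod request happens to fit into becomes the per-node fragment the article describes.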
To address low cluster utilization, several technical solutions are proposed:
Pod Resource Compression: Reduce a Pod's requested resources by a configurable ratio (stored in a workload annotation such as stke.platform/cpu-requests-ratio) at Pod creation or recreation. The compression ratio is dynamically adjusted by a custom reconciler based on historical workload metrics.
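A minimal sketch of how such admission-time compression might read the annotation. Only the annotation key comes from the article; the helper name and validation rules are assumptions:

```go
package main

import (
	"fmt"
	"strconv"
)

// Annotation key named in the article; carries the compression ratio.
const ratioAnnotation = "stke.platform/cpu-requests-ratio"

// compressRequest scales a CPU request (millicores) by the ratio stored in
// the workload's annotations. A missing or invalid ratio leaves the request
// unchanged, so misconfiguration fails safe.
func compressRequest(annotations map[string]string, requestMilli int64) int64 {
	s, ok := annotations[ratioAnnotation]
	if !ok {
		return requestMilli
	}
	ratio, err := strconv.ParseFloat(s, 64)
	if err != nil || ratio <= 0 || ratio > 1 {
		return requestMilli
	}
	return int64(float64(requestMilli) * ratio)
}

func main() {
	ann := map[string]string{ratioAnnotation: "0.5"}
	fmt.Println(compressRequest(ann, 2000)) // 1000: request halved at creation
	fmt.Println(compressRequest(nil, 2000)) // 2000: no annotation, untouched
}
```

Because only the request shrinks, the Pod's limit (and thus its ceiling under load) is unaffected; the scheduler simply packs more Pods per node.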
Node Resource Oversell: Apply a configurable oversell ratio (e.g., stke.platform/cpu-oversale-ratio) to Node allocatable and capacity resources via a Mutating Admission Webhook that patches Node status. The ratio is periodically tuned by a reconciler using node‑level load history.
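The value such a webhook would patch into Node status can be sketched as below. The annotation key is from the article; the helper name and the inflate-only guard are assumptions:

```go
package main

import (
	"fmt"
	"strconv"
)

// Annotation key named in the article; carries the oversell ratio.
const oversellAnnotation = "stke.platform/cpu-oversale-ratio"

// oversoldAllocatable returns the CPU allocatable (millicores) that a
// mutating admission webhook would patch into Node status. Ratios below
// 1 are rejected: oversell only ever inflates what the scheduler sees.
func oversoldAllocatable(annotations map[string]string, allocMilli int64) int64 {
	s, ok := annotations[oversellAnnotation]
	if !ok {
		return allocMilli
	}
	ratio, err := strconv.ParseFloat(s, 64)
	if err != nil || ratio < 1 {
		return allocMilli
	}
	return int64(float64(allocMilli) * ratio)
}

func main() {
	ann := map[string]string{oversellAnnotation: "1.5"}
	// An 8-core node is presented to the scheduler as 12 cores.
	fmt.Println(oversoldAllocatable(ann, 8000)) // 12000
}
```

The scheduler then bin-packs against the inflated figure, which is why the reconciler must keep the ratio in line with the node's real load history.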
Enhanced Autoscaling (HPAPlus‑Controller): Replace the built‑in HPA controller with a per‑HPA goroutine model, allowing independent scaling policies, custom response times, workload‑aware disabling, and scaling based on pod limits and historical metrics. It also integrates with multiple monitoring back‑ends via an Extension API server.
Additional ideas such as scheduler extenders, business‑level quota management, and online/offline workload mixing are mentioned as future work.
The article concludes that these dynamic resource‑management techniques rely heavily on robust container monitoring and that the author’s team is collaborating with Tencent Cloud Monitoring to improve large‑scale cloud migration.
At the end of the article, a recruitment notice for senior cloud‑native engineers is included.
Tencent Cloud Developer
Official Tencent Cloud community account that brings together developers, shares practical tech insights, and fosters an influential tech exchange community.