How Koordinator Enhances Kubernetes Scheduling for Mixed Workloads
Koordinator is a QoS‑based Kubernetes scheduler that boosts efficiency and reliability for latency‑sensitive services and batch jobs, offering fine‑grained resource coordination, flexible priority classes, load‑aware scheduling, and integrated monitoring tools to maximize cluster utilization.
Koordinator is a QoS‑based Kubernetes mixed‑workload scheduling system designed to improve runtime efficiency and reliability for latency‑sensitive workloads and batch jobs, simplify resource‑related configuration, and increase pod deployment density.
Mixed workloads run different types of jobs such as batch processing, interactive tasks, and real‑time data processing on the same hardware.
It is a high‑performance, scalable solution validated in large‑scale production environments.
Koordinator enhances the Kubernetes user experience with the following features:
Carefully designed priority and QoS mechanisms that allow different workload types to coexist on a cluster and run on the same node.
Resource over‑commitment to achieve high utilization while still meeting QoS guarantees through application analysis.
Fine‑grained resource coordination and isolation to improve efficiency of latency‑sensitive and batch workloads.
Flexible job scheduling supporting specific domains such as big data, AI, audio, and video.
A complete set of tools for monitoring, troubleshooting, and operations.
Koordinator QoS vs Kubernetes QoS
Kubernetes defines three QoS classes: Guaranteed, Burstable, and BestEffort. Koordinator is compatible with native QoS and adds many enhancements. To avoid interfering with native QoS semantics, Koordinator introduces an independent label, koordinator.sh/qosClass, to describe QoS in mixed‑workload scenarios.
Koordinator scheduler vs kube‑scheduler
The Koordinator scheduler does not replace kube‑scheduler; it improves mixed‑workload performance on Kubernetes by adding scheduling plugins for mixed‑workload and priority preemption, and aims to upstream these enhancements.
Architecture
Koordinator consists of two control‑plane components (Koordinator Scheduler and Koordinator Manager) and a DaemonSet component (Koordlet). It extends Kubernetes with mixed‑workload capabilities while remaining compatible with native workloads.
Components
The core components are:
Koord‑Scheduler
Deployed as a Deployment, it enhances Kubernetes with QoS‑aware scheduling, differentiated SLOs, and task scheduling features such as elastic quota management, gang scheduling, and heterogeneous resource scheduling.
QoS‑aware scheduling balances load across nodes and supports resource over‑commitment.
Differentiated SLOs provide fine‑grained CPU, memory, network, and disk I/O isolation.
Task scheduling includes elastic quota, gang scheduling, and heterogeneous resource support for big data and AI workloads.
Gang Scheduling treats a group of Pods with the same scheduling requirements as a single unit for scheduling, migration, and termination.
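As an illustration, gang membership is declared on the Pods themselves. The sketch below assumes the gang.scheduling.koordinator.sh annotation keys from the Koordinator gang‑scheduling documentation; verify the exact keys against your installed version:
<code>apiVersion: v1
kind: Pod
metadata:
  name: demo-gang-member
  annotations:
    # All Pods sharing this gang name are scheduled all-or-nothing.
    gang.scheduling.koordinator.sh/name: "demo-gang"
    # The gang is placed only once at least 3 members can be scheduled.
    gang.scheduling.koordinator.sh/min-available: "3"
spec:
  schedulerName: koord-scheduler
  containers:
  - name: worker
    image: nginx
</code>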
Additional capabilities include Reservation and Node Reservation for reserving resources for specific Pods or non‑container workloads.
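A Reservation is expressed as a CRD whose spec embeds a pod template describing the resources to hold. This sketch assumes the scheduling.koordinator.sh/v1alpha1 API and field names; treat them as assumptions to check against your version:
<code>apiVersion: scheduling.koordinator.sh/v1alpha1
kind: Reservation
metadata:
  name: reservation-demo
spec:
  # The reserved capacity is modeled as a pod template.
  template:
    spec:
      schedulerName: koord-scheduler
      containers:
      - name: placeholder
        image: nginx
        resources:
          requests:
            cpu: "2"
            memory: 4Gi
  # Only Pods matching an owner rule may consume the reservation.
  owners:
  - labelSelector:
      matchLabels:
        app: nginx
  ttl: 1h
</code>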
Koord‑Descheduler
Deployed as a Deployment, it provides an enhanced descheduler with a new framework for load‑aware rescheduling and safety‑watermark based eviction.
Redesign of the rescheduling framework improves scalability, determinism, and security.
Load‑aware rescheduling uses a safety watermark to trigger eviction of Pods from overloaded nodes.
Watermark: a resource usage threshold on a node that triggers rescheduling when reached.
Koord‑Manager
Deployed as a Deployment with leader and backup instances, it includes controllers and webhooks for mixed‑workload coordination, resource over‑commitment, and SLO management. It provides three main components: Colocation Profile, SLO controller, and a future Recommender that uses histograms to predict peak resource demand.
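For example, the Colocation Profile is a ClusterColocationProfile CRD served by the koord‑manager webhook, which injects Koordinator priority and QoS into matching Pods at admission time. Field names here follow the config.koordinator.sh/v1alpha1 API and should be verified against your version:
<code>apiVersion: config.koordinator.sh/v1alpha1
kind: ClusterColocationProfile
metadata:
  name: colocation-profile-demo
spec:
  # Only Pods carrying this label are mutated by the webhook.
  selector:
    matchLabels:
      koordinator.sh/enable-colocation: "true"
  # Injected settings: run matching Pods as best-effort batch workloads.
  qosClass: BE
  priorityClassName: koord-batch
  koordinatorPriority: 1000
  schedulerName: koord-scheduler
</code>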
Koordlet
Deployed as a DaemonSet on each node, it supports resource over‑commitment, interference detection, and QoS enforcement. Its modules include resource profiling, isolation, interference detection, QoS management, and resource tuning.
Koord‑RuntimeProxy
Deployed as a systemd service, it proxies CRI requests between Kubelet and containerd/docker, enabling fine‑grained cgroup configuration for different QoS Pods.
Resource Model
The mixed‑workload resource model uses four lines: limit (requested resources), usage (actual usage), short‑term reservation (estimated near‑future usage), and long‑term reservation (estimated longer‑term usage), allowing efficient utilization of idle resources.
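The reclaimed capacity between the usage and reservation lines is surfaced to low‑priority Pods as extended resources. Assuming the kubernetes.io/batch-cpu and kubernetes.io/batch-memory resource names used by Koordinator's colocation model (batch CPU is denominated in milli‑cores), a best‑effort Pod can request them like this:
<code>apiVersion: v1
kind: Pod
metadata:
  name: batch-demo
  labels:
    koordinator.sh/qosClass: BE
spec:
  schedulerName: koord-scheduler
  priorityClassName: koord-batch
  containers:
  - name: worker
    image: nginx
    resources:
      requests:
        # 1000 = 1 reclaimed CPU core
        kubernetes.io/batch-cpu: "1000"
        kubernetes.io/batch-memory: "2Gi"
      limits:
        kubernetes.io/batch-cpu: "1000"
        kubernetes.io/batch-memory: "2Gi"
</code>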
SLO Description
SLOs consist of priority (scheduling order) and QoS (runtime quality such as CPU share, cfs quota, memory limits, OOM priority).
Priority
Koordinator defines four PriorityClass values on top of Kubernetes priority:
koord-prod: latency‑sensitive production services.
koord-mid: long‑running workloads, such as AI training jobs, that hold resources for extended durations.
koord-batch: short‑duration offline batch jobs.
koord-free: low‑priority batch jobs that use any leftover capacity.
Example PriorityClass definitions:
<code>apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: koord-prod
value: 9000
description: "This priority class should be used for prod service pods only."
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: koord-mid
value: 7000
description: "This priority class should be used for mid service pods only."
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: koord-batch
value: 5000
description: "This priority class should be used for batch service pods only."
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: koord-free
value: 3000
description: "This priority class should be used for free service pods only."
</code>
Example Pod using a PriorityClass:
<code>apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    env: test
    koordinator.sh/priority: "5300"
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
  priorityClassName: koord-batch
</code>
QoS
Koordinator defines five QoS types (SYSTEM, LSE, LSR, LS, and BE) that provide more granular control than native Kubernetes QoS, enabling fine‑tuned resource isolation for mixed workloads.
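For instance, a latency‑sensitive online service can opt into the LS class via the koordinator.sh/qosClass label (the label key and values here assume Koordinator's documented QoS names):
<code>apiVersion: v1
kind: Pod
metadata:
  name: ls-service-demo
  labels:
    # LS: latency-sensitive, typical for colocated online services.
    koordinator.sh/qosClass: LS
spec:
  schedulerName: koord-scheduler
  priorityClassName: koord-prod
  containers:
  - name: app
    image: nginx
</code>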
Installation
Koordinator requires Kubernetes 1.18+ and a Linux kernel 4.19+. It is typically installed via Helm:
<code>helm repo add koordinator-sh https://koordinator-sh.github.io/charts/
helm repo update
helm install koordinator koordinator-sh/koordinator --version 1.5.0 --set imageRepositoryHost=registry.cn-beijing.aliyuncs.com --set manager.hostNetwork=true
</code>
Components are installed in the koordinator-system namespace; you can view them with kubectl get pods -n koordinator-system.
Usage
After installation, you can schedule workloads with Koordinator.
Load‑Aware Scheduling
This plugin selects nodes with the lowest load during scheduling, balancing resource usage and avoiding hotspots. It filters unhealthy nodes and scores nodes based on resource usage, considering both current usage and estimated future requests.
Global configuration can be set in the koord-scheduler-config ConfigMap, for example:
<code>apiVersion: v1
kind: ConfigMap
metadata:
  name: koord-scheduler-config
data:
  koord-scheduler-config: |
    apiVersion: kubescheduler.config.k8s.io/v1beta2
    kind: KubeSchedulerConfiguration
    profiles:
      - schedulerName: koord-scheduler
        plugins:
          filter:
            enabled:
              - name: LoadAwareScheduling
          score:
            enabled:
              - name: LoadAwareScheduling
                weight: 1
        pluginConfig:
          - name: LoadAwareScheduling
            args:
              filterExpiredNodeMetrics: true
              nodeMetricExpirationSeconds: 300
              resourceWeights:
                cpu: 1
                memory: 1
              usageThresholds:
                cpu: 75
                memory: 85
              prodUsageThresholds:
                cpu: 55
                memory: 65
              scoreAccordingProdUsage: true
              estimatedScalingFactors:
                cpu: 80
                memory: 70
              aggregated:
                usageThresholds:
                  cpu: 65
                  memory: 75
                usageAggregationType: "p99"
                scoreAggregationType: "p99"
</code>
Parameters such as filterExpiredNodeMetrics, resourceWeights, and usageThresholds can be tuned to cluster needs.
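To have a workload placed by these plugins at all, its Pod spec must opt in via schedulerName. A minimal sketch (deployment name and labels are illustrative):
<code>apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-demo
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-demo
  template:
    metadata:
      labels:
        app: web-demo
    spec:
      # Pods are placed by koord-scheduler, so LoadAwareScheduling applies.
      schedulerName: koord-scheduler
      containers:
      - name: web
        image: nginx
</code>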
Load‑Aware Descheduling
The LowNodeLoad plugin in koord-descheduler evicts Pods from nodes whose resource usage exceeds a high threshold and migrates them to idle nodes. It supports configurable high/low thresholds, namespace filters, pod selectors, and nodeFit checks.
<code>apiVersion: v1
kind: ConfigMap
metadata:
  name: koord-descheduler-config
data:
  koord-descheduler-config: |
    apiVersion: descheduler/v1alpha2
    kind: DeschedulerConfiguration
    deschedulingInterval: 60s
    profiles:
      - name: koord-descheduler
        plugins:
          balance:
            enabled:
              - name: LowNodeLoad
        pluginConfig:
          - name: LowNodeLoad
            args:
              lowThresholds:
                cpu: 20
                memory: 30
              highThresholds:
                cpu: 50
                memory: 60
              evictableNamespaces:
                exclude:
                  - "kube-system"
                  - "koordinator-system"
</code>
When a node exceeds the high thresholds, the descheduler evicts selected Pods (respecting filters) and places them on nodes below the low thresholds, ensuring the total migrated resource demand fits within the available capacity.
Conclusion
Koordinator provides a rich set of load‑aware scheduling and descheduling capabilities, fine‑grained QoS, priority classes, and resource reservation mechanisms that enable efficient mixed‑workload orchestration on Kubernetes.
Reference documentation: https://koordinator.sh/zh-Hans/docs
Ops Development Stories
Maintained by a like‑minded team, covering both operations and development. Topics span Linux ops, DevOps toolchain, Kubernetes containerization, monitoring, log collection, network security, and Python or Go development. Team members: Qiao Ke, wanger, Dong Ge, Su Xin, Hua Zai, Zheng Ge, Teacher Xia.