How Koordinator Enhances Kubernetes Scheduling for Mixed Workloads
Koordinator is a QoS‑based Kubernetes scheduler that boosts efficiency and reliability for latency‑sensitive services and batch jobs, offering fine‑grained resource coordination, flexible priority classes, load‑aware scheduling, and integrated monitoring tools to maximize cluster utilization.
Koordinator is a QoS‑based Kubernetes mixed‑workload scheduling system designed to improve runtime efficiency and reliability for latency‑sensitive workloads and batch jobs, simplify resource‑related configuration, and increase pod deployment density.
Mixed workloads run different types of jobs such as batch processing, interactive tasks, and real‑time data processing on the same hardware.
It is a high‑performance, scalable solution validated in large‑scale production environments.
Koordinator enhances the Kubernetes user experience with the following features:
Carefully designed priority and QoS mechanisms that allow different workload types to coexist on a cluster and run on the same node.
Resource over‑commitment to achieve high utilization while still meeting QoS guarantees through application analysis.
Fine‑grained resource coordination and isolation to improve efficiency of latency‑sensitive and batch workloads.
Flexible job scheduling supporting specific domains such as big data, AI, audio, and video.
A complete set of tools for monitoring, troubleshooting, and operations.
Koordinator QoS vs Kubernetes QoS
Kubernetes defines three QoS classes: Guaranteed, Burstable, and BestEffort. Koordinator is compatible with native QoS and adds many enhancements. To avoid interfering with native QoS semantics, Koordinator introduces an independent label, koordinator.sh/qosClass, to describe QoS in mixed‑workload scenarios.
Koordinator scheduler vs kube‑scheduler
The Koordinator scheduler does not replace kube‑scheduler; it improves mixed‑workload performance on Kubernetes by adding scheduling plugins for mixed‑workload and priority preemption, and aims to upstream these enhancements.
Architecture
Koordinator consists of two control‑plane components (Koordinator Scheduler and Koordinator Manager) and a DaemonSet component (Koordlet). It extends Kubernetes with mixed‑workload capabilities while remaining compatible with native workloads.
Components
The core components are:
Koord‑Scheduler
Deployed as a Deployment, it enhances Kubernetes with QoS‑aware scheduling, differentiated SLOs, and task scheduling features such as elastic quota management, gang scheduling, and heterogeneous resource scheduling.
QoS‑aware scheduling balances load across nodes and supports resource over‑commitment.
Differentiated SLOs provide fine‑grained CPU, memory, network, and disk I/O isolation.
Task scheduling includes elastic quota, gang scheduling, and heterogeneous resource support for big data and AI workloads.
Gang Scheduling treats a group of Pods with the same scheduling requirements as a single unit for scheduling, migration, and termination.
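As an illustration, gang membership is declared on the Pods themselves. The sketch below assumes the gang.scheduling.koordinator.sh annotation keys from the Koordinator gang‑scheduling documentation; verify the exact keys against your installed version:
<code>apiVersion: v1
kind: Pod
metadata:
  name: demo-gang-member
  annotations:
    # All Pods sharing this gang name are scheduled all-or-nothing.
    gang.scheduling.koordinator.sh/name: "demo-gang"
    # The gang is placed only once at least 3 members can be scheduled.
    gang.scheduling.koordinator.sh/min-available: "3"
spec:
  schedulerName: koord-scheduler
  containers:
  - name: worker
    image: nginx
</code>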
Additional capabilities include Reservation and Node Reservation for reserving resources for specific Pods or non‑container workloads.
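A Reservation is expressed as a CRD whose spec embeds a pod template describing the resources to hold. This sketch assumes the scheduling.koordinator.sh/v1alpha1 API and field names; treat them as assumptions to check against your version:
<code>apiVersion: scheduling.koordinator.sh/v1alpha1
kind: Reservation
metadata:
  name: reservation-demo
spec:
  # The reserved capacity is modeled as a pod template.
  template:
    spec:
      schedulerName: koord-scheduler
      containers:
      - name: placeholder
        image: nginx
        resources:
          requests:
            cpu: "2"
            memory: 4Gi
  # Only Pods matching an owner rule may consume the reservation.
  owners:
  - labelSelector:
      matchLabels:
        app: nginx
  ttl: 1h
</code>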
Koord‑Descheduler
Deployed as a Deployment, it provides an enhanced descheduler with a new framework for load‑aware rescheduling and safety‑watermark based eviction.
Redesign of the rescheduling framework improves scalability, determinism, and security.
Load‑aware rescheduling uses a safety watermark to trigger eviction of Pods from overloaded nodes.
Watermark: a resource usage threshold on a node that triggers rescheduling when reached.
Koord‑Manager
Deployed as a Deployment with leader and backup instances, it includes controllers and webhooks for mixed‑workload coordination, resource over‑commitment, and SLO management. It provides three main components: Colocation Profile, SLO controller, and a future Recommender that uses histograms to predict peak resource demand.
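For example, the Colocation Profile is a ClusterColocationProfile CRD served by the koord‑manager webhook, which injects Koordinator priority and QoS into matching Pods at admission time. Field names here follow the config.koordinator.sh/v1alpha1 API and should be verified against your version:
<code>apiVersion: config.koordinator.sh/v1alpha1
kind: ClusterColocationProfile
metadata:
  name: colocation-profile-demo
spec:
  # Only Pods carrying this label are mutated by the webhook.
  selector:
    matchLabels:
      koordinator.sh/enable-colocation: "true"
  # Injected settings: run matching Pods as best-effort batch workloads.
  qosClass: BE
  priorityClassName: koord-batch
  koordinatorPriority: 1000
  schedulerName: koord-scheduler
</code>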
Koordlet
Deployed as a DaemonSet on each node, it supports resource over‑commitment, interference detection, and QoS enforcement. Its modules include resource profiling, isolation, interference detection, QoS management, and resource tuning.
Koord‑RuntimeProxy
Deployed as a systemd service, it proxies CRI requests between Kubelet and containerd/docker, enabling fine‑grained cgroup configuration for different QoS Pods.
Resource Model
The mixed‑workload resource model uses four lines: limit (requested resources), usage (actual usage), short‑term reservation (estimated near‑future usage), and long‑term reservation (estimated longer‑term usage), allowing efficient utilization of idle resources.
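The reclaimed capacity between the usage and reservation lines is surfaced to low‑priority Pods as extended resources. Assuming the kubernetes.io/batch-cpu and kubernetes.io/batch-memory resource names used by Koordinator's colocation model (batch CPU is denominated in milli‑cores), a best‑effort Pod can request them like this:
<code>apiVersion: v1
kind: Pod
metadata:
  name: batch-demo
  labels:
    koordinator.sh/qosClass: BE
spec:
  schedulerName: koord-scheduler
  priorityClassName: koord-batch
  containers:
  - name: worker
    image: nginx
    resources:
      requests:
        # 1000 = 1 reclaimed CPU core
        kubernetes.io/batch-cpu: "1000"
        kubernetes.io/batch-memory: "2Gi"
      limits:
        kubernetes.io/batch-cpu: "1000"
        kubernetes.io/batch-memory: "2Gi"
</code>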
SLO Description
SLOs consist of priority (scheduling order) and QoS (runtime quality such as CPU share, cfs quota, memory limits, OOM priority).
Priority
Koordinator defines four PriorityClass values on top of Kubernetes priority:
koord-prod: latency‑sensitive production services.
koord-mid: long‑running workloads, such as AI training jobs, that hold resources for extended durations.
koord-batch: short‑duration offline batch jobs.
koord-free: low‑priority batch jobs that use any leftover capacity.
Example PriorityClass definitions:
<code>apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: koord-prod
value: 9000
description: "This priority class should be used for prod service pods only."
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: koord-mid
value: 7000
description: "This priority class should be used for mid service pods only."
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: koord-batch
value: 5000
description: "This priority class should be used for batch service pods only."
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: koord-free
value: 3000
description: "This priority class should be used for free service pods only."
</code>
Example Pod using a PriorityClass:
<code>apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    env: test
    koordinator.sh/priority: "5300"
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
  priorityClassName: koord-batch
</code>
QoS
Koordinator defines five QoS types (SYSTEM, LSE, LSR, LS, and BE) that provide more granular control than native Kubernetes QoS, enabling fine‑tuned resource isolation for mixed workloads.
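For instance, a latency‑sensitive online service can opt into the LS class via the koordinator.sh/qosClass label (the label key and values here assume Koordinator's documented QoS names):
<code>apiVersion: v1
kind: Pod
metadata:
  name: ls-service-demo
  labels:
    # LS: latency-sensitive, typical for colocated online services.
    koordinator.sh/qosClass: LS
spec:
  schedulerName: koord-scheduler
  priorityClassName: koord-prod
  containers:
  - name: app
    image: nginx
</code>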
Installation
Koordinator requires Kubernetes 1.18+ and a Linux kernel 4.19+. It is typically installed via Helm:
<code>helm repo add koordinator-sh https://koordinator-sh.github.io/charts/
helm repo update
helm install koordinator koordinator-sh/koordinator --version 1.5.0 --set imageRepositoryHost=registry.cn-beijing.aliyuncs.com --set manager.hostNetwork=true
</code>
Components are installed in the koordinator-system namespace; you can view them with kubectl get pods -n koordinator-system.
Usage
After installation, you can schedule workloads with Koordinator.
Load‑Aware Scheduling
This plugin selects nodes with the lowest load during scheduling, balancing resource usage and avoiding hotspots. It filters unhealthy nodes and scores nodes based on resource usage, considering both current usage and estimated future requests.
Global configuration can be set in the koord-scheduler-config ConfigMap, for example:
<code>apiVersion: v1
kind: ConfigMap
metadata:
  name: koord-scheduler-config
data:
  koord-scheduler-config: |
    apiVersion: kubescheduler.config.k8s.io/v1beta2
    kind: KubeSchedulerConfiguration
    profiles:
      - schedulerName: koord-scheduler
        plugins:
          filter:
            enabled:
              - name: LoadAwareScheduling
          score:
            enabled:
              - name: LoadAwareScheduling
                weight: 1
        pluginConfig:
          - name: LoadAwareScheduling
            args:
              filterExpiredNodeMetrics: true
              nodeMetricExpirationSeconds: 300
              resourceWeights:
                cpu: 1
                memory: 1
              usageThresholds:
                cpu: 75
                memory: 85
              prodUsageThresholds:
                cpu: 55
                memory: 65
              scoreAccordingProdUsage: true
              estimatedScalingFactors:
                cpu: 80
                memory: 70
              aggregated:
                usageThresholds:
                  cpu: 65
                  memory: 75
                usageAggregationType: "p99"
                scoreAggregationType: "p99"
</code>
Parameters such as filterExpiredNodeMetrics, resourceWeights, and usageThresholds can be tuned to cluster needs.
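To have a workload placed by these plugins at all, its Pod spec must opt in via schedulerName. A minimal sketch (deployment name and labels are illustrative):
<code>apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-demo
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-demo
  template:
    metadata:
      labels:
        app: web-demo
    spec:
      # Pods are placed by koord-scheduler, so LoadAwareScheduling applies.
      schedulerName: koord-scheduler
      containers:
      - name: web
        image: nginx
</code>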
Load‑Aware Descheduling
The LowNodeLoad plugin in koord-descheduler evicts Pods from nodes whose resource usage exceeds a high threshold and migrates them to idle nodes. It supports configurable high/low thresholds, namespace filters, pod selectors, and nodeFit checks.
<code>apiVersion: v1
kind: ConfigMap
metadata:
  name: koord-descheduler-config
data:
  koord-descheduler-config: |
    apiVersion: descheduler/v1alpha2
    kind: DeschedulerConfiguration
    deschedulingInterval: 60s
    profiles:
      - name: koord-descheduler
        plugins:
          balance:
            enabled:
              - name: LowNodeLoad
        pluginConfig:
          - name: LowNodeLoad
            args:
              lowThresholds:
                cpu: 20
                memory: 30
              highThresholds:
                cpu: 50
                memory: 60
              evictableNamespaces:
                exclude:
                  - "kube-system"
                  - "koordinator-system"
</code>
When a node exceeds the high thresholds, the descheduler evicts selected Pods (respecting filters) and places them on nodes below the low thresholds, ensuring the total migrated resource demand fits within the available capacity.
Conclusion
Koordinator provides a rich set of load‑aware scheduling and descheduling capabilities, fine‑grained QoS, priority classes, and resource reservation mechanisms that enable efficient mixed‑workload orchestration on Kubernetes.
Reference documentation: https://koordinator.sh/zh-Hans/docs
Ops Development Stories
Maintained by a like‑minded team, covering both operations and development. Topics span Linux ops, DevOps toolchain, Kubernetes containerization, monitoring, log collection, network security, and Python or Go development. Team members: Qiao Ke, wanger, Dong Ge, Su Xin, Hua Zai, Zheng Ge, Teacher Xia.