Understanding kube-scheduler and How to Create a Custom Scheduler in Kubernetes
This article explains the role of kube-scheduler in Kubernetes, details its scheduling workflow and plugin framework, and demonstrates how to define a custom scheduler by configuring KubeSchedulerConfiguration, creating a ConfigMap, deploying it via a Deployment, and verifying its operation alongside the default scheduler.
kube-scheduler is a core control‑plane component of a Kubernetes cluster that assigns unscheduled Pods to Nodes.
The scheduler's basic workflow has four steps. It watches for Pods that have not yet been assigned a Node, filters out Nodes that cannot run the Pod, scores the remaining (feasible) Nodes, and finally binds the Pod to the highest-scoring Node, choosing randomly among Nodes that tie for the top score.
These steps are visualised in the diagram below.
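The final bind step is just a write to the API server. As a sketch, this is roughly the v1 Binding object that the default bind plugin posts to a Pod's binding subresource. The Pod name `example-pod` and Node name `node-1` are hypothetical placeholders.

```yaml
# Sketch of the Binding submitted in the bind step, i.e. the body of
# POST /api/v1/namespaces/default/pods/example-pod/binding
apiVersion: v1
kind: Binding
metadata:
  name: example-pod        # hypothetical: the Pod being bound
  namespace: default
target:
  apiVersion: v1
  kind: Node
  name: node-1             # hypothetical: the Node that won scoring
```

Once this succeeds, the Pod's spec.nodeName is set and the kubelet on that Node starts the Pod.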
The scheduler is built around a series of extension points, and users can create a custom scheduler by writing plugins that implement these points. Kubernetes ships many default plugins, each implementing one or more extension points; the default scheduler is simply a particular combination of these plugins.
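When you do write plugin code, the v1 configuration API lets you enable a plugin once under `multiPoint`, and the scheduler then turns it on at every extension point the plugin implements. A minimal sketch follows; `MyPlugin` and the profile name are hypothetical, and such a plugin would have to be compiled into your own kube-scheduler build, since the stock image does not know it.

```yaml
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: multipoint-example   # hypothetical profile name
    plugins:
      multiPoint:
        enabled:
          # Hypothetical out-of-tree plugin: enabled at every extension
          # point it implements (e.g. filter and score) with one entry.
          - name: MyPlugin
```

Settings under a specific extension point (such as `score`) take precedence over `multiPoint`, which is what lets a profile tune a single phase without repeating the whole plugin list.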
Extension points are executed sequentially, and each plugin may act on one or more points. The main extension points are:
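queueSort: sorts Pods in the scheduling queue; only one queueSort plugin can be enabled at a time.
preFilter: pre-processes Pod information or checks conditions the cluster or Pod must meet; if a preFilter plugin returns an error, the scheduling cycle is aborted.
filter: filters out Nodes that cannot run the Pod (these plugins play the role of the legacy "predicates").
postFilter: called only when no feasible Node is found; a plugin can try to make the Pod schedulable, preemption being the typical example.
preScore: performs "pre-scoring" work, producing shared state that score plugins can use.
score: assigns a score to each Node that passed filtering; an optional NormalizeScore step can adjust a plugin's scores before the scheduler combines them using each plugin's weight.
reserve: notifies plugins when resources on a Node are reserved for the Pod; a matching Unreserve call cleans up if a later phase fails.
permit: can approve, deny, or delay (wait on) the Pod's binding.
preBind: performs any work required before the Pod is bound, for example provisioning a network volume and mounting it on the target Node.
bind: binds the Pod to a Node; a bind plugin may decline to handle a Pod, but once one bind plugin handles it, the remaining bind plugins are skipped.
postBind: informational hook called after the Pod has been bound successfully, for example to clean up associated resources.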
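A profile's `plugins` field is organized by these extension points, so plugins can be switched on or off per point. As a sketch adapted from the scheduler configuration docs, the profile below keeps the default filtering but disables every preScore and score plugin with the `'*'` wildcard; the profile name is illustrative.

```yaml
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: no-scoring-scheduler   # illustrative profile name
    plugins:
      # Filtering (preFilter/filter) still uses the default plugins.
      preScore:
        disabled:
          - name: '*'    # wildcard: disable all preScore plugins
      score:
        disabled:
          - name: '*'    # wildcard: disable all score plugins
```

With no score plugins, every feasible Node receives the same score, so the choice among them falls back to the random tie-break.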
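To create a custom scheduler without writing any code, we can reconfigure the default plugins. In the example below we change the scoring strategy of the NodeResourcesFit plugin from its default, LeastAllocated, to MostAllocated, which favors Nodes that are already heavily allocated and therefore packs Pods more tightly. We also set the VolumeBinding plugin's bind timeout to 60 seconds (the default is 600).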
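As a rough illustration of the MostAllocated strategy, assuming the upstream scorer's formula: each resource r gets a 0-100 score proportional to how much of it is requested (including the incoming Pod and the Pods already on the Node, with integer division), and the Node score is the weighted average using the cpu and memory weights of 1 from the configuration. The two Nodes below are hypothetical.

```latex
% MostAllocated, per resource r with weight w_r
s_r(n) = \left\lfloor \frac{100 \cdot \mathrm{req}_r(n)}{\mathrm{alloc}_r(n)} \right\rfloor ,
\qquad
S(n) = \left\lfloor \frac{\sum_r w_r \, s_r(n)}{\sum_r w_r} \right\rfloor

% Hypothetical Nodes, both 4 CPU / 8 GiB allocatable, w_cpu = w_memory = 1
% Node A after placing the Pod: 3 CPU, 2 GiB requested
S(A) = \frac{75 + 25}{2} = 50
% Node B after placing the Pod: 1 CPU, 2 GiB requested
S(B) = \frac{25 + 25}{2} = 25
```

MostAllocated therefore prefers Node A. Under the default LeastAllocated strategy, which scores 100 * (alloc - req) / alloc per resource, the same Nodes would score 50 and 75, so Node B would win. This is only NodeResourcesFit's contribution; other enabled score plugins add their own weighted scores.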
Configure KubeSchedulerConfiguration
```yaml
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: my-custom-scheduler # scheduler name
    plugins:
      score:
        enabled:
          - name: NodeResourcesFit
            weight: 1
    pluginConfig:
      - name: NodeResourcesFit
        args:
          scoringStrategy:
            type: MostAllocated
            resources:
              - name: cpu
                weight: 1
              - name: memory
                weight: 1
      - name: VolumeBinding
        args:
          bindTimeoutSeconds: 60
```

Because the configuration object is essentially a kube-scheduler config file, we store it in a ConfigMap for easy deployment.
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-scheduler-config
  namespace: kube-system
data:
  my-scheduler-config.yaml: |
    apiVersion: kubescheduler.config.k8s.io/v1
    kind: KubeSchedulerConfiguration
    profiles:
      - schedulerName: my-custom-scheduler
        plugins:
          score:
            enabled:
              - name: NodeResourcesFit
                weight: 1
        pluginConfig:
          - name: NodeResourcesFit
            args:
              scoringStrategy:
                type: MostAllocated
                resources:
                  - name: cpu
                    weight: 1
                  - name: memory
                    weight: 1
          - name: VolumeBinding
            args:
              bindTimeoutSeconds: 60
```

Deploy the custom scheduler with a Deployment
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-custom-kube-scheduler
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      component: my-custom-kube-scheduler
  template:
    metadata:
      labels:
        component: my-custom-kube-scheduler
    spec:
      # Set serviceAccountName to a ServiceAccount with scheduler RBAC
      # permissions (e.g. bound to the system:kube-scheduler and
      # system:volume-scheduler ClusterRoles); the namespace's default
      # ServiceAccount cannot watch Pods or bind them to Nodes.
      containers:
        - command:
            - kube-scheduler
            - --leader-elect=false
            - --config=/etc/kubernetes/my-scheduler-config.yaml
            - -v=5
          image: registry.k8s.io/kube-scheduler:v1.30.0
          name: kube-scheduler
          volumeMounts:
            # Mount only the config file from the ConfigMap via subPath
            - name: my-scheduler-config
              mountPath: /etc/kubernetes/my-scheduler-config.yaml
              subPath: my-scheduler-config.yaml
      volumes:
        - name: my-scheduler-config
          configMap:
            name: my-scheduler-config
```

The Deployment uses the official kube-scheduler image, ideally matching your cluster's Kubernetes version, and only supplies a different configuration file.
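The Deployment leaves the permissions to you. One way to grant them, modeled on the official "Configure Multiple Schedulers" task, is a dedicated ServiceAccount bound to the built-in `system:kube-scheduler` and `system:volume-scheduler` ClusterRoles, plus read access to the `extension-apiserver-authentication` ConfigMap. This is a sketch: the ServiceAccount name `my-custom-scheduler-sa` and all binding names are hypothetical, and the Deployment's Pod template would then set `serviceAccountName: my-custom-scheduler-sa`.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-custom-scheduler-sa            # hypothetical name
  namespace: kube-system
---
# Core scheduling permissions: watch Pods/Nodes, create bindings, etc.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: my-custom-scheduler-as-kube-scheduler
subjects:
  - kind: ServiceAccount
    name: my-custom-scheduler-sa
    namespace: kube-system
roleRef:
  kind: ClusterRole
  name: system:kube-scheduler
  apiGroup: rbac.authorization.k8s.io
---
# Permissions used by the VolumeBinding plugin (PVCs, PVs, StorageClasses)
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: my-custom-scheduler-as-volume-scheduler
subjects:
  - kind: ServiceAccount
    name: my-custom-scheduler-sa
    namespace: kube-system
roleRef:
  kind: ClusterRole
  name: system:volume-scheduler
  apiGroup: rbac.authorization.k8s.io
---
# Read the extension-apiserver-authentication ConfigMap in kube-system
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: my-custom-scheduler-auth-reader
  namespace: kube-system
subjects:
  - kind: ServiceAccount
    name: my-custom-scheduler-sa
    namespace: kube-system
roleRef:
  kind: Role
  name: extension-apiserver-authentication-reader
  apiGroup: rbac.authorization.k8s.io
```

Because the scheduler runs with `--leader-elect=false`, the lease permissions in `system:kube-scheduler` (scoped to the default scheduler's own lease) do not matter here; enabling leader election would require extending them.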
To verify that the custom scheduler runs independently of the default scheduler, we deploy two Pods, one that uses the default scheduler and one that sets schedulerName: my-custom-scheduler, and then inspect the custom scheduler's logs.
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-default
spec:
  containers:
    - image: nginx
      name: nginx
---
apiVersion: v1
kind: Pod
metadata:
  name: nginx-custom
spec:
  schedulerName: my-custom-scheduler
  containers:
    - image: nginx
      name: nginx
```

The logs show that the custom scheduler successfully schedules its Pod without interfering with the default scheduler.
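The same field works for workloads: setting schedulerName in a Pod template routes every replica to the custom scheduler, and the default scheduler ignores those Pods because no profile of its own has that name. A minimal sketch; the Deployment name and labels are hypothetical. If no scheduler with the given name is running, these Pods simply stay Pending.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-custom-deploy      # hypothetical name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx-custom          # hypothetical label
  template:
    metadata:
      labels:
        app: nginx-custom
    spec:
      # Every replica created from this template is handled by the
      # my-custom-scheduler profile instead of default-scheduler.
      schedulerName: my-custom-scheduler
      containers:
        - name: nginx
          image: nginx
```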
System Architect Go