Kubernetes GPU Scheduling: Device Plugin, CDI, NFD, and GPU Operator Overview
This article explains how Kubernetes manages and schedules GPU resources by introducing the Device Plugin framework, the Container Device Interface (CDI), Node Feature Discovery (NFD), and the GPU Operator, detailing their workflows, APIs, and practical usage with NVIDIA GPUs.
With the rapid development of artificial intelligence (AI) and machine learning (ML), GPUs have become indispensable resources in Kubernetes, yet the scheduler natively understands only conventional CPU and memory resources and lacks built-in support for heterogeneous hardware such as GPUs.
To efficiently manage and schedule GPUs and other specialized hardware, Kubernetes provides a set of extension mechanisms, including the Device Plugin, the Container Device Interface (CDI), Node Feature Discovery (NFD), and the GPU Operator.
This article uses GPU scheduling as an example to outline the working principles and applications of these extensions.
Device Plugin
The Device Plugin is a plugin mechanism for managing special hardware resources in Kubernetes. It abstracts devices such as GPUs, FPGAs, NICs, and InfiniBand adapters as resources recognizable by Kubernetes, enabling discovery, allocation, and scheduling.
The Device Plugin API uses gRPC to define the interaction between the kubelet and a device plugin and includes two services:
Registration service: the device plugin registers itself with the kubelet via the Register method.
DevicePlugin service: provides five methods:
- GetDevicePluginOptions: reports optional plugin capabilities (e.g., whether GetPreferredAllocation is implemented).
- ListAndWatch: streams the list of devices and their health to the kubelet, which updates the node's capacity accordingly.
- GetPreferredAllocation: lets the kubelet ask for an optimal allocation (e.g., the best GPU combination for a multi-GPU job).
- Allocate: called when a container is created; returns what is needed to expose the devices (environment variables, mounts, device nodes).
- PreStartContainer: an optional hook invoked before the container starts.
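As a rough illustration, the five methods can be modeled as a plain Go interface with a toy in-memory plugin. This is only a sketch: the real API is a gRPC service defined in the k8s.io/kubelet deviceplugin package, and the fakeGPUPlugin type and its return values here are invented for demonstration.

```go
package main

import "fmt"

// DevicePlugin mirrors the five methods of the DevicePlugin gRPC service,
// simplified to plain Go signatures for illustration.
type DevicePlugin interface {
	GetDevicePluginOptions() string                              // optional plugin settings
	ListAndWatch() []string                                      // device IDs and (implicitly) their health
	GetPreferredAllocation(available []string, size int) []string // preferred device combination
	Allocate(deviceIDs []string) string                          // per-container device assignment
	PreStartContainer(deviceIDs []string) error                  // optional pre-start hook
}

// fakeGPUPlugin is a toy implementation that pretends the node has two GPUs.
type fakeGPUPlugin struct{}

func (p *fakeGPUPlugin) GetDevicePluginOptions() string { return "{}" }
func (p *fakeGPUPlugin) ListAndWatch() []string         { return []string{"gpu-0", "gpu-1"} }
func (p *fakeGPUPlugin) GetPreferredAllocation(available []string, size int) []string {
	return available[:size] // naive policy: take the first N devices
}
func (p *fakeGPUPlugin) Allocate(ids []string) string {
	return fmt.Sprintf("allocated %v", ids)
}
func (p *fakeGPUPlugin) PreStartContainer(ids []string) error { return nil }

func main() {
	var plugin DevicePlugin = &fakeGPUPlugin{}
	devs := plugin.ListAndWatch()
	fmt.Println(devs)
	fmt.Println(plugin.Allocate(plugin.GetPreferredAllocation(devs, 1)))
}
```

A real plugin serves these methods over a Unix socket under /var/lib/kubelet/device-plugins/ and streams ListAndWatch responses continuously rather than returning a static slice.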
Example with an NVIDIA GPU:
The device plugin registers itself with the kubelet via Register and reports device state via ListAndWatch; the kubelet then advertises the GPU capacity in the node status on the kube-apiserver.
A user creates a pod that requests GPU resources; the scheduler places the pod on a node with free GPUs.
When the pod is scheduled, the node's kubelet:
- Calls Allocate on the device plugin to obtain the GPU.
- Communicates with the container runtime (e.g., containerd, CRI-O) via the CRI gRPC interface.
- The runtime ultimately invokes runc (or Kata Containers) through nvidia-container-runtime, a wrapper that injects an NVIDIA-specific prestart hook into runc so the container can use the GPU.
This flow demonstrates how Kubernetes manages and schedules GPU resources.
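A pod that triggers this flow simply requests the extended resource advertised by the device plugin. A minimal manifest might look like the following (the pod name and image tag are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test
spec:
  containers:
  - name: cuda
    image: nvidia/cuda:12.4.1-base-ubuntu22.04
    resources:
      limits:
        nvidia.com/gpu: 1   # extended resource reported by the device plugin
```

The scheduler treats nvidia.com/gpu like any other countable resource, so the pod lands only on nodes whose device plugin has reported free GPUs.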
Container Device Interface (CDI)
Because there is no universal third-party device standard, vendors often need to write multiple plugins or embed vendor-specific code directly into runtimes (e.g., nvidia-container-runtime wrapping runc). The community therefore introduced the Container Device Interface (CDI) to decouple and standardize the interaction between container runtimes and device plugins.
CDI defines a JSON‑formatted device description file that specifies device properties, environment variables, mount points, and other metadata.
The CDI workflow is roughly:
Device plugins or vendors provide a CDI description file.
The device name (a fully qualified name such as vendor.com/class=name) is passed to the container runtime.
The runtime updates the container configuration according to the CDI file.
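A minimal CDI description file for a single GPU might look like the sketch below (the device name, paths, and version are illustrative; such files conventionally live under /etc/cdi/):

```json
{
  "cdiVersion": "0.6.0",
  "kind": "nvidia.com/gpu",
  "devices": [
    {
      "name": "gpu0",
      "containerEdits": {
        "deviceNodes": [
          { "path": "/dev/nvidia0" }
        ],
        "env": [
          "NVIDIA_VISIBLE_DEVICES=0"
        ]
      }
    }
  ]
}
```

A CDI-aware runtime resolves the fully qualified name nvidia.com/gpu=gpu0 against this file and applies the listed edits (device nodes, environment variables, mounts) to the OCI container configuration, with no vendor code inside the runtime itself.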
CDI is not a replacement for the Device Plugin; the two work together, much as CNI complements the network plugin mechanism on the networking side.
Node Feature Discovery (NFD)
In some scenarios applications need nodes with specific hardware features (e.g., AVX/SSE instruction sets, GPUs, FPGAs, or particular CPU architectures). The default Kubernetes scheduler is unaware of these features, so it cannot make informed scheduling decisions. Node Feature Discovery fills this gap by automatically detecting node capabilities and exposing them as labels or annotations.
The NFD workflow:
NFD runs as a DaemonSet on each node, detecting hardware and software characteristics.
Detected features are added to the node as labels/annotations.
Users can schedule pods based on these labels using nodeSelector or nodeAffinity .
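The labels NFD publishes follow the feature.node.kubernetes.io/ prefix convention, so a pod can target them directly. A sketch (the pod name and image are illustrative; the AVX-512 label shown is one of NFD's standard CPU feature labels):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: avx512-workload
spec:
  nodeSelector:
    feature.node.kubernetes.io/cpu-cpuid.AVX512F: "true"
  containers:
  - name: app
    image: busybox
    command: ["sleep", "infinity"]
```

The same labels can be used in nodeAffinity expressions when softer or more complex placement rules are needed.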
NFD only discovers node features; how those labels are used is up to the user.
GPU Operator
As described in the Device Plugin section, using GPUs requires a driver, a device plugin, the nvidia‑container‑runtime , and monitoring tools, which is complex to manage manually. The GPU Operator automates this process by using the Operator pattern to uniformly manage and configure all GPU‑related components.
Operators are a Kubernetes extension mechanism that lets users define custom resources (CRDs) and matching controllers, automating management tasks beyond what the built-in resources cover.
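In practice the GPU Operator is typically installed with Helm; the commands below follow NVIDIA's documented convention (the namespace and release name are conventional choices, and the commands require a running cluster with Helm configured):

```shell
# Add NVIDIA's Helm repository and install the GPU Operator.
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update
helm install --wait gpu-operator nvidia/gpu-operator \
  -n gpu-operator --create-namespace
```

Once running, the operator deploys and reconciles the driver, device plugin, container toolkit, and monitoring components on every GPU node it detects (typically via NFD labels).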
Conclusion
By leveraging the Device Plugin, CDI, NFD, and Operator mechanisms, Kubernetes achieves automated management and efficient scheduling of special hardware resources such as GPUs. However, vendor-specific hardware still requires additional attention during deployment and configuration.