Deploying nvidia-docker2 on Kubernetes: Installation, Device Plugin Configuration, and GPU Scheduling
This article provides a step‑by‑step guide on installing nvidia‑docker2, configuring the NVIDIA container runtime, setting up the GPU device plugin on a Kubernetes cluster, and deploying a GPU‑enabled pod to demonstrate proper GPU scheduling and usage.
nvidia-docker2 enables containerizing legacy GPU‑accelerated applications, assigning specific GPU resources to containers, and sharing them across environments; this article documents its practical use on a large‑scale Kubernetes cluster.
1. Experimental Environment
CentOS Linux release 7.2.1511 (Core)
Kubernetes 1.9
GPU: nvidia‑tesla‑k80
2. Installation (version 2.0)
Follow the official installation guide:
https://github.com/nvidia/nvidia-docker/wiki/Installation-(version-2.0)
Prerequisites:
GNU/Linux x86_64 with kernel > 3.10
Docker >= 1.12
NVIDIA GPU with Architecture > Fermi (2.1)
NVIDIA drivers ~= 361.93
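The prerequisites above can be checked quickly on each node; this sketch only reads versions and reports missing tools instead of aborting:

```shell
# Quick prerequisite check for the versions listed above.
# Missing tools are reported rather than causing the script to fail.
kver=$(uname -r)
echo "kernel: $kver"                                              # needs > 3.10
docker --version 2>/dev/null || echo "docker: not installed"      # needs >= 1.12
nvidia-smi --query-gpu=driver_version --format=csv,noheader 2>/dev/null \
  || echo "nvidia driver: not installed"                          # needs ~= 361.93
```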
Installation commands:
# If you have nvidia-docker 1.0 installed: we need to remove it and all existing GPU containers
docker volume ls -q -f driver=nvidia-docker | xargs -r -I{} -n1 docker ps -q -a -f volume={} | xargs -r docker rm -f
sudo yum remove nvidia-docker
# Add the package repositories
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo | \
sudo tee /etc/yum.repos.d/nvidia-docker.repo
# Install nvidia-docker2 and reload the Docker daemon configuration
sudo yum install -y nvidia-docker2
sudo pkill -SIGHUP dockerd
# Test nvidia-smi with the latest official CUDA image
docker run --runtime=nvidia --rm nvidia/cuda:9.0-base nvidia-smi

After installation, configure Docker to use the NVIDIA container runtime by editing /etc/docker/daemon.json:
{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}

Restart Docker:
systemctl restart docker
3. GPU on Kubernetes
Kubernetes has supported scheduling NVIDIA GPUs since v1.6 and AMD GPUs since v1.9. GPUs are requested in whole units, and a single GPU cannot yet be shared between containers.
To schedule GPUs, the device plugin framework must be enabled (before v1.10 this required --feature-gates="DevicePlugins=true" on the kubelet), and the GPU drivers plus the vendor's device plugin must be installed on each node.
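On clusters older than v1.10, one way to switch the feature gate on is through the kubelet's extra arguments; a minimal sketch assuming a systemd-managed kubelet (the drop-in path and the KUBELET_EXTRA_ARGS variable are illustrative and vary by distribution):

```shell
# Illustrative systemd drop-in enabling the DevicePlugins feature gate
# on a pre-1.10 kubelet; adjust path and variable for your setup.
sudo tee /etc/systemd/system/kubelet.service.d/20-device-plugins.conf <<'EOF'
[Service]
Environment="KUBELET_EXTRA_ARGS=--feature-gates=DevicePlugins=true"
EOF
sudo systemctl daemon-reload
sudo systemctl restart kubelet
```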
Deploy the NVIDIA device plugin DaemonSet using the following manifest (nvidia-docker-plugin.yml):
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: nvidia-device-plugin-daemonset
  namespace: kube-system
spec:
  template:
    metadata:
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ""
      labels:
        name: nvidia-device-plugin-ds
    spec:
      tolerations:
      - key: CriticalAddonsOnly
        operator: Exists
      containers:
      - image: nvidia/k8s-device-plugin:1.9
        name: nvidia-device-plugin-ctr
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop: ["ALL"]
        volumeMounts:
        - name: device-plugin
          mountPath: /var/lib/kubelet/device-plugins
      volumes:
      - name: device-plugin
        hostPath:
          path: /var/lib/kubelet/device-plugins

Create the DaemonSet:
kubectl create -f nvidia-docker-plugin.yml

After successful creation, each GPU node runs an nvidia-device-plugin-daemonset pod.
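Once the plugin pods are running, the plugin registers with each node's kubelet and the node begins advertising nvidia.com/gpu capacity; a quick check, assuming kubectl is already configured for the cluster:

```shell
# One device-plugin pod per GPU node is expected
kubectl get pods -n kube-system -l name=nvidia-device-plugin-ds -o wide

# Each GPU node should now report nvidia.com/gpu in its capacity
kubectl get nodes -o custom-columns=NAME:.metadata.name,GPU:.status.capacity.nvidia\.com/gpu
```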
Test GPU scheduling with a simple pod manifest (nvidia-docker2-gpu-pod.yml):
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vector-add
spec:
  restartPolicy: OnFailure
  containers:
  - name: cuda-vector-add
    image: "k8s.gcr.io/cuda-vector-add:v0.1"
    resources:
      limits:
        nvidia.com/gpu: 1
  nodeSelector:
    accelerator: nvidia-tesla-k80 # or other supported GPU label

Create and verify the pod:
kubectl create -f nvidia-docker2-gpu-pod.yml

Enter the container to confirm that the GPU device and CUDA libraries are correctly mounted and that only one GPU is assigned.
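The result can also be checked from outside the container: the cuda-vector-add sample prints "Test PASSED" when the kernel ran on a GPU. If the pod stays Pending, the node is likely missing the accelerator label that the nodeSelector matches on (the node name below is illustrative):

```shell
# Label the GPU node so the nodeSelector can match it (node name is illustrative)
kubectl label nodes <gpu-node-name> accelerator=nvidia-tesla-k80

# The pod should reach Completed; the sample prints "Test PASSED" on success
kubectl get pod cuda-vector-add
kubectl logs cuda-vector-add

# The pod description should show exactly one nvidia.com/gpu in its limits
kubectl describe pod cuda-vector-add | grep nvidia.com/gpu
```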
4. Summary
With nvidia‑docker 1.0, GPU drivers had to be mounted into containers manually as volumes, which was error‑prone. nvidia‑docker 2.0 eliminates this step through the NVIDIA container runtime, and together with the Kubernetes device plugin framework it simplifies GPU resource management, illustrating how Kubernetes extends cleanly through standardized interfaces for container runtimes and device plugins.
360 Tech Engineering
Official tech channel of 360, building the most professional technology aggregation platform for the brand.