
Deploying nvidia-docker2 on Kubernetes: Installation, Device Plugin Configuration, and GPU Scheduling

This article provides a step‑by‑step guide on installing nvidia‑docker2, configuring the NVIDIA container runtime, setting up the GPU device plugin on a Kubernetes cluster, and deploying a GPU‑enabled pod to demonstrate proper GPU scheduling and usage.

360 Tech Engineering

nvidia-docker2 enables containerizing legacy GPU‑accelerated applications, assigning specific GPU resources to containers, and sharing them across environments; this article documents its practical use on a large‑scale Kubernetes cluster.

1. Experimental Environment

CentOS Linux release 7.2.1511 (Core)

Kubernetes 1.9

GPU: nvidia‑tesla‑k80

2. Installation (version 2.0)

Follow the official installation guide:

https://github.com/nvidia/nvidia-docker/wiki/Installation-(version-2.0)

Prerequisites:

GNU/Linux x86_64 with kernel > 3.10

Docker >= 1.12

NVIDIA GPU with Architecture > Fermi (2.1)

NVIDIA drivers ~= 361.93
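Before installing, it can help to confirm the kernel requirement is met. A minimal sketch (the Docker and driver versions still need `docker --version` and `nvidia-smi` on the target host, which are not checked here):

```shell
# Verify the running kernel meets the "> 3.10" requirement using a
# version-aware comparison; strip any distro suffix like "-generic" first.
kernel=$(uname -r | cut -d- -f1)
min="3.10"
if [ "$(printf '%s\n' "$min" "$kernel" | sort -V | head -n1)" = "$min" ]; then
  echo "kernel $kernel OK"
else
  echo "kernel $kernel too old"
fi
```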

Installation commands:

# If you have nvidia-docker 1.0 installed: we need to remove it and all existing GPU containers
docker volume ls -q -f driver=nvidia-docker | xargs -r -I{} -n1 docker ps -q -a -f volume={} | xargs -r docker rm -f
sudo yum remove nvidia-docker

# Add the package repositories
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.repo | \
sudo tee /etc/yum.repos.d/nvidia-docker.repo

# Install nvidia-docker2 and reload the Docker daemon configuration
sudo yum install -y nvidia-docker2
sudo pkill -SIGHUP dockerd

# Test nvidia-smi with the latest official CUDA image
docker run --runtime=nvidia --rm nvidia/cuda:9.0-base nvidia-smi

After installation, configure Docker to use the NVIDIA container runtime as the default by editing /etc/docker/daemon.json:

{
  "default-runtime":"nvidia",
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
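A malformed daemon.json will prevent the Docker daemon from starting, so it is worth validating the file before restarting. A minimal sanity check (written against a temporary copy at /tmp/daemon.json for illustration; the real file is /etc/docker/daemon.json):

```shell
# Write a candidate config to a scratch path for validation.
cat > /tmp/daemon.json <<'EOF'
{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
EOF

# Fail loudly on malformed JSON and confirm the default runtime is set.
python3 - <<'EOF'
import json
with open("/tmp/daemon.json") as f:
    cfg = json.load(f)
assert cfg["default-runtime"] == "nvidia"
assert "nvidia" in cfg["runtimes"]
print("daemon.json OK")
EOF
```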

Restart Docker:

sudo systemctl restart docker

3. GPU on Kubernetes

Kubernetes has supported scheduling NVIDIA GPUs since v1.6 and AMD GPUs since v1.9. GPUs are requested in whole units: a single GPU cannot be shared between containers or oversubscribed.

To schedule GPUs, the device plugin framework must be enabled on each kubelet (required before v1.10 via --feature-gates="DevicePlugins=true"), and every GPU node needs the NVIDIA drivers plus the device plugin installed.
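On pre-1.10 clusters the gate is passed as a kubelet argument. The exact drop-in file depends on how the cluster was installed (kubeadm, binaries, etc.), so the fragment below is only a sketch of the flag itself:

```shell
# kubelet argument enabling the device plugin framework on pre-1.10 clusters;
# where this line lives (systemd drop-in, env file) varies by installation.
KUBELET_EXTRA_ARGS="--feature-gates=DevicePlugins=true"
```

After adding the flag, reload systemd and restart the kubelet on each node.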

Deploy the NVIDIA device plugin DaemonSet using the following manifest (nvidia-docker-plugin.yml):

apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: nvidia-device-plugin-daemonset
  namespace: kube-system
spec:
  template:
    metadata:
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ""
      labels:
        name: nvidia-device-plugin-ds
    spec:
      tolerations:
      - key: CriticalAddonsOnly
        operator: Exists
      containers:
      - image: nvidia/k8s-device-plugin:1.9
        name: nvidia-device-plugin-ctr
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            drop: ["ALL"]
        volumeMounts:
        - name: device-plugin
          mountPath: /var/lib/kubelet/device-plugins
      volumes:
      - name: device-plugin
        hostPath:
          path: /var/lib/kubelet/device-plugins

Create the DaemonSet:

kubectl create -f nvidia-docker-plugin.yml

After successful creation, each GPU node runs the nvidia-device-plugin-daemonset pod.
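Once the plugin registers with the kubelet, each GPU node advertises `nvidia.com/gpu` in its capacity (visible via `kubectl describe node` or `kubectl get node <name> -o json`). As an illustrative sketch of that check against a saved node object (the JSON below is a hand-made sample, not real cluster output):

```shell
# Sample of the capacity stanza a GPU node reports once the plugin registers.
cat > /tmp/node-sample.json <<'EOF'
{"status": {"capacity": {"cpu": "8", "memory": "32Gi", "nvidia.com/gpu": "1"}}}
EOF

# Against a live cluster, pipe `kubectl get node <name> -o json` in instead.
python3 - <<'EOF'
import json
node = json.load(open("/tmp/node-sample.json"))
gpus = int(node["status"]["capacity"].get("nvidia.com/gpu", 0))
assert gpus >= 1, "node does not advertise any NVIDIA GPUs"
print("node advertises %d GPU(s)" % gpus)
EOF
```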

Test GPU scheduling with a simple pod manifest (nvidia-docker2-gpu-pod.yml):

apiVersion: v1
kind: Pod
metadata:
  name: cuda-vector-add
spec:
  restartPolicy: OnFailure
  containers:
  - name: cuda-vector-add
    image: "k8s.gcr.io/cuda-vector-add:v0.1"
    resources:
      limits:
        nvidia.com/gpu: 1
  nodeSelector:
    accelerator: nvidia-tesla-k80 # or other supported GPU label

Create and verify the pod:

kubectl create -f nvidia-docker2-gpu-pod.yml

Enter the container to confirm that the GPU device and CUDA libraries are correctly mounted and that only one GPU is assigned.
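Inside a GPU pod, the NVIDIA runtime exposes the allocated device(s) through the NVIDIA_VISIBLE_DEVICES environment variable. The check below simulates that variable with an exported placeholder value, since running it for real requires `kubectl exec` into a live GPU container:

```shell
# Simulate the variable the NVIDIA runtime sets inside a GPU container;
# "GPU-0" is a placeholder device identifier for illustration.
export NVIDIA_VISIBLE_DEVICES="GPU-0"

# Count the assigned devices: a pod that requested nvidia.com/gpu: 1
# should see exactly one entry in the comma-separated list.
count=$(echo "$NVIDIA_VISIBLE_DEVICES" | tr ',' '\n' | grep -c .)
echo "assigned GPUs: $count"
[ "$count" -eq 1 ] && echo "single GPU confirmed"
```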

4. Summary

nvidia-docker 1.0 required manually mounting the GPU driver into each container as a volume, which was error-prone. nvidia-docker 2.0 removes that step, and on Kubernetes the device plugin framework takes over GPU resource management entirely. Together they showcase the extensibility of Kubernetes through standardized interfaces for container runtimes and device plugins.

Tags: Docker, Kubernetes, GPU, installation, device plugin, nvidia-docker2
Written by

360 Tech Engineering

Official tech channel of 360, building the most professional technology aggregation platform for the brand.
