One-Click GPU-Enabled Kind Cluster Setup for Running Large AI Models
This tutorial walks through a one‑click script that creates a GPU‑enabled Kind Kubernetes cluster: it distributes GPU resources evenly across nodes with nvkind, installs the necessary driver and container toolkit, deploys a large language model served by vLLM, and verifies the deployment, on either a local machine or a cloud instance.
Prerequisites
The full script is available on GitHub and assumes an Ubuntu host. You will need:
A machine with an NVIDIA GPU
Administrator (sudo) privileges
A stable internet connection
If you lack a local GPU server, you can rent a GPU instance from cloud providers such as Alibaba Cloud, AWS, or Azure; Alibaba Cloud often offers the most cost‑effective option.
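Before going further, a quick pre‑flight check can confirm the host actually sees an NVIDIA GPU. The helper below is a hypothetical sketch, not part of the script:

```shell
# Pre-flight check (hypothetical helper, not part of install.sh):
# report whether an NVIDIA driver, or at least a GPU device, is present.
check_prereqs() {
  if command -v nvidia-smi >/dev/null 2>&1; then
    echo "driver-present"
  elif lspci 2>/dev/null | grep -qi nvidia; then
    echo "gpu-present-no-driver"
  else
    echo "no-gpu-detected"
  fi
}
check_prereqs
```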
Run the Setup Script
Execute the following command; the script will install all dependencies and create a GPU‑enabled Kind cluster.
<code>bash install.sh</code>
Script Details
Install Command‑Line Tools
Docker, kubectl, Helm, Kind, and nvkind are installed.
<code>sudo apt update
sudo apt install -y docker.io
sudo snap install kubectl --classic
# Add kubectl completion to bashrc
echo 'source <(kubectl completion bash)' >> ~/.bashrc
source ~/.bashrc
sudo snap install helm --classic
# Install kind
[ $(uname -m) = x86_64 ] && curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.25.0/kind-linux-amd64
chmod +x ./kind
sudo mv ./kind /usr/local/bin/kind
# Install nvkind
curl -L -o ~/nvkind-linux-amd64.tar.gz https://github.com/Jeffwan/kind-with-gpus-examples/releases/download/v0.1.0/nvkind-linux-amd64.tar.gz
tar -xzvf ~/nvkind-linux-amd64.tar.gz
sudo mv nvkind-linux-amd64 /usr/local/bin/nvkind</code>
Install NVIDIA GPU Driver
The script installs NVIDIA driver version 565.57.01.
<code>wget https://cn.download.nvidia.com/tesla/565.57.01/NVIDIA-Linux-x86_64-565.57.01.run
sh NVIDIA-Linux-x86_64-565.57.01.run --silent</code>
Install and Configure NVIDIA Container Toolkit
The toolkit mounts NVIDIA devices into containers.
<code>curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
&& curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo sed -i -e '/experimental/ s/^#//g' /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker --set-as-default --cdi.enabled
sudo nvidia-ctk config --set accept-nvidia-visible-devices-as-volume-mounts=true --in-place
sudo systemctl restart docker</code>
Verify GPU Availability
Run the following checks:
Execute nvidia-smi to list GPUs.
Run a Docker container with the NVIDIA runtime to ensure GPU detection.
Confirm containers can access GPU devices.
<code># Run nvidia-smi to list GPU devices
nvidia-smi -L
if [ $? -ne 0 ]; then
echo "nvidia-smi failed to execute."
exit 1
fi
# Run a Docker container with NVIDIA runtime
docker run --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=all ubuntu:20.04 nvidia-smi -L
if [ $? -ne 0 ]; then
echo "Docker command with NVIDIA runtime failed to execute."
exit 1
fi
# Run a Docker container with mounted /dev/null to check GPU accessibility
docker run -v /dev/null:/var/run/nvidia-container-devices/all ubuntu:20.04 nvidia-smi -L
if [ $? -ne 0 ]; then
echo "Docker command with mounted /dev/null failed to execute."
exit 1
fi</code>
Create the Kind GPU Cluster
A configuration file is generated based on the number of GPUs, then nvkind creates the cluster.
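nvkind computes the GPU count itself, but the value that drives the template's `until numGPUs` loop can be reproduced by hand; a minimal, purely illustrative sketch:

```shell
# Count GPUs on the host; nvkind's `until numGPUs` ranges over this value,
# producing one worker node per GPU.
numGPUs=$(nvidia-smi -L 2>/dev/null | wc -l)
echo "Detected ${numGPUs} GPU(s)"
```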
<code>cat <<'EOF' > one-worker-per-gpu.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
{{- range $gpu := until numGPUs }}
- role: worker
  extraMounts:
    # We inject all NVIDIA GPUs using the nvidia-container-runtime.
    # This requires `accept-nvidia-visible-devices-as-volume-mounts = true`
    # to be set in `/etc/nvidia-container-runtime/config.toml`.
    - hostPath: /dev/null
      containerPath: /var/run/nvidia-container-devices/{{ $gpu }}
{{- end }}
EOF
nvkind cluster create --name gpu-cluster --config-template=one-worker-per-gpu.yaml</code>
Install NVIDIA GPU Operator
The operator automates driver, device plugin, and DCGM exporter installation.
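Once the chart below finishes installing, each worker node should advertise nvidia.com/gpu among its allocatable resources. A hedged sketch of that check (the guard makes the helper safe to run even on hosts without kubectl or a reachable cluster):

```shell
# List per-node allocatable GPU counts (requires a working kubectl context);
# falls back to a diagnostic string when kubectl or the cluster is unavailable.
check_gpu_allocatable() {
  if ! command -v kubectl >/dev/null 2>&1; then
    echo "kubectl-not-found"
    return 0
  fi
  kubectl get nodes \
    -o custom-columns='NODE:.metadata.name,GPUS:.status.allocatable.nvidia\.com/gpu' \
    2>/dev/null || echo "no-cluster-reachable"
}
check_gpu_allocatable
```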
<code>helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update
helm install --wait --generate-name \
  -n gpu-operator --create-namespace \
  nvidia/gpu-operator \
  --set driver.enabled=false</code>
Install Cloud Provider Kind
This component enables LoadBalancer‑type service exposure.
<code># KIND_CLOUD_PROVIDER_URL should point at a cloud-provider-kind release tarball for your platform
curl -L ${KIND_CLOUD_PROVIDER_URL} -o cloud-provider-kind.tar.gz
tar -xvzf cloud-provider-kind.tar.gz
chmod +x cloud-provider-kind
sudo mv cloud-provider-kind /usr/local/bin/
echo "Starting cloud-provider-kind in the background..."
LOG_FILE="/tmp/cloud-provider-kind.log"
nohup cloud-provider-kind > $LOG_FILE 2>&1 &
echo $! > /tmp/cloud-provider-kind.pid
echo "Setup complete. All components have been installed successfully."
</code>
Run a Large Model with vLLM
The DeepSeek‑R1‑Distill‑Qwen‑1.5B model is deployed to verify the cluster. Save the manifest below to a file and apply it with kubectl apply -f.
<code>apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: deepseek-r1-distill-qwen-1-5b
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
  volumeMode: Filesystem
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deepseek-r1-distill-qwen-1-5b
  namespace: default
  labels:
    app: deepseek-r1-distill-qwen-1-5b
spec:
  replicas: 1
  selector:
    matchLabels:
      app: deepseek-r1-distill-qwen-1-5b
  template:
    metadata:
      labels:
        app: deepseek-r1-distill-qwen-1-5b
    spec:
      volumes:
        - name: cache-volume
          persistentVolumeClaim:
            claimName: deepseek-r1-distill-qwen-1-5b
      containers:
        - name: deepseek-r1-distill-qwen-1-5b
          image: vllm/vllm-openai:latest
          command: ["/bin/sh", "-c"]
          args:
            - "vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B --trust-remote-code --enable-chunked-prefill --max_num_batched_tokens 1024"
          ports:
            - containerPort: 8000
          resources:
            limits:
              nvidia.com/gpu: "1"
            requests:
              nvidia.com/gpu: "1"
          readinessProbe:
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 60
            periodSeconds: 5
          volumeMounts:
            - mountPath: /root/.cache/huggingface
              name: cache-volume
---
apiVersion: v1
kind: Service
metadata:
  name: deepseek-r1-distill-qwen-1-5b
  namespace: default
spec:
  ports:
    - name: deepseek-r1-distill-qwen-1-5b
      port: 80
      protocol: TCP
      targetPort: 8000
  selector:
    app: deepseek-r1-distill-qwen-1-5b
  type: LoadBalancer
</code>
The vLLM image (~8 GB) takes time to download, and the first pod start downloads the model weights (≈508 s). Subsequent restarts load the cached weights in ~0.55 s.
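The curl example below uses a hardcoded LoadBalancer IP (172.18.0.4); on your machine the address will differ. A sketch for looking it up, guarded so it degrades gracefully where kubectl or the cluster is unavailable:

```shell
# Discover the external IP that cloud-provider-kind assigned to the Service;
# prints a diagnostic string instead when kubectl or the cluster is missing.
get_lb_ip() {
  if ! command -v kubectl >/dev/null 2>&1; then
    echo "kubectl-not-found"
    return 0
  fi
  kubectl get svc deepseek-r1-distill-qwen-1-5b \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}' 2>/dev/null \
    || echo "no-cluster-reachable"
}
get_lb_ip
```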
<code># Check pod status
kubectl get pod
# Check service
kubectl get svc
# Access the model via LoadBalancer IP
curl --location 'http://172.18.0.4/v1/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
"model":"deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
"messages":[{"role":"user","content":"你是谁?"}]
}'
</code>
The response shows the model successfully processed the request.
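Picking the assistant's text out of the raw JSON by eye is tedious; a small helper (a sketch assuming the standard OpenAI‑compatible response shape that vLLM returns) can be piped after curl:

```shell
# Print only the assistant message from an OpenAI-style chat completion response
extract_reply() {
  python3 -c 'import sys, json; print(json.load(sys.stdin)["choices"][0]["message"]["content"])'
}
# Example with a canned response body:
echo '{"choices":[{"message":{"role":"assistant","content":"Hello!"}}]}' | extract_reply
# → Hello!
```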
Cleanup
When testing is complete, remove the cluster with:
<code>bash cleanup.sh</code>
Conclusion
This guide demonstrates how a one‑click script can rapidly provision a GPU‑enabled Kind cluster for large‑model development and testing, using nvkind for balanced GPU allocation and vLLM to serve the DeepSeek‑R1‑Distill‑Qwen‑1.5B model.
Ops Development Stories
Maintained by a like‑minded team, covering both operations and development. Topics span Linux ops, DevOps toolchain, Kubernetes containerization, monitoring, log collection, network security, and Python or Go development. Team members: Qiao Ke, wanger, Dong Ge, Su Xin, Hua Zai, Zheng Ge, Teacher Xia.