
One-Click GPU-Enabled Kind Cluster Setup for Running Large AI Models

This tutorial walks you through using a one-click script to create a GPU-enabled Kind Kubernetes cluster: it distributes GPU resources evenly across nodes with nvkind, installs the required driver and container toolkit, deploys a large language model served by vLLM, and verifies the deployment, all on a local or cloud machine.


Prerequisites

The full script is available on GitHub and assumes an Ubuntu host.

A machine with an NVIDIA GPU

Administrator (sudo) privileges

A stable internet connection

If you lack a local GPU server, you can rent a GPU instance from cloud providers such as Alibaba Cloud, AWS, or Azure; Alibaba Cloud often offers the most cost‑effective option.

Run the Setup Script

Execute the following command; the script will install all dependencies and create a GPU‑enabled Kind cluster.

<code>bash install.sh</code>
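Before diving into the details, the flow of install.sh can be sketched as a sequence of stages. This skeleton is illustrative only; the function names are hypothetical and not taken from the real script, but the stages mirror the sections below.

```shell
#!/usr/bin/env bash
# Illustrative skeleton of install.sh (function names are hypothetical).
set -euo pipefail

install_cli_tools()          { echo "==> Installing Docker, kubectl, Helm, Kind, nvkind"; }
install_gpu_driver()         { echo "==> Installing the NVIDIA GPU driver"; }
configure_container_toolkit(){ echo "==> Installing the NVIDIA Container Toolkit"; }
verify_gpu()                 { echo "==> Verifying GPU availability"; }
create_cluster()             { echo "==> Creating the Kind GPU cluster with nvkind"; }
install_gpu_operator()       { echo "==> Installing the NVIDIA GPU Operator"; }
install_cloud_provider()     { echo "==> Installing Cloud Provider Kind"; }

install_cli_tools
install_gpu_driver
configure_container_toolkit
verify_gpu
create_cluster
install_gpu_operator
install_cloud_provider
```

Each stage is covered in order in the sections that follow.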

Script Details

Install Command‑Line Tools

Docker, kubectl, Helm, Kind, and nvkind are installed.

<code>sudo apt update
sudo apt install -y docker.io
sudo snap install kubectl --classic
# Add kubectl completion to bashrc
echo 'source &lt;(kubectl completion bash)' >> ~/.bashrc
source ~/.bashrc
sudo snap install helm --classic
# Install kind
[ $(uname -m) = x86_64 ] && curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.25.0/kind-linux-amd64
chmod +x ./kind
sudo mv ./kind /usr/local/bin/kind
# Install nvkind
curl -L -o ~/nvkind-linux-amd64.tar.gz https://github.com/Jeffwan/kind-with-gpus-examples/releases/download/v0.1.0/nvkind-linux-amd64.tar.gz
tar -xzvf ~/nvkind-linux-amd64.tar.gz
chmod +x nvkind-linux-amd64
sudo mv nvkind-linux-amd64 /usr/local/bin/nvkind</code>

Install NVIDIA GPU Driver

The script installs NVIDIA driver version 565.57.01.

<code>wget https://cn.download.nvidia.com/tesla/565.57.01/NVIDIA-Linux-x86_64-565.57.01.run
sudo sh NVIDIA-Linux-x86_64-565.57.01.run --silent</code>

Install and Configure NVIDIA Container Toolkit

The toolkit mounts NVIDIA devices into containers.

<code>curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

sudo sed -i -e '/experimental/ s/^#//g' /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

sudo nvidia-ctk runtime configure --runtime=docker --set-as-default --cdi.enabled
sudo nvidia-ctk config --set accept-nvidia-visible-devices-as-volume-mounts=true --in-place
sudo systemctl restart docker</code>

Verify GPU Availability

Run the following checks:

Execute <code>nvidia-smi</code> to list GPUs.

Run a Docker container with the NVIDIA runtime to ensure GPU detection.

Confirm containers can access GPU devices through the volume-mount mechanism.

<code># Run nvidia-smi to list GPU devices
if ! nvidia-smi -L; then
  echo "nvidia-smi failed to execute."
  exit 1
fi

# Run a Docker container with the NVIDIA runtime
if ! docker run --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=all ubuntu:20.04 nvidia-smi -L; then
  echo "Docker command with NVIDIA runtime failed to execute."
  exit 1
fi

# Run a Docker container with mounted /dev/null to check GPU accessibility
if ! docker run -v /dev/null:/var/run/nvidia-container-devices/all ubuntu:20.04 nvidia-smi -L; then
  echo "Docker command with mounted /dev/null failed to execute."
  exit 1
fi</code>

Create the Kind GPU Cluster

A configuration file is generated based on the number of GPUs, then <code>nvkind</code> creates the cluster.

<code>cat <<'EOF' > one-worker-per-gpu.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
{{- range $gpu := until numGPUs }}
- role: worker
  extraMounts:
    # We inject all NVIDIA GPUs using the nvidia-container-runtime.
    # This requires `accept-nvidia-visible-devices-as-volume-mounts = true` in `/etc/nvidia-container-runtime/config.toml`
    - hostPath: /dev/null
      containerPath: /var/run/nvidia-container-devices/{{ $gpu }}
{{- end }}
EOF

nvkind cluster create --name gpu-cluster --config-template=one-worker-per-gpu.yaml</code>
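To make the template concrete, here is a plain-shell sketch of the config it renders; <code>NUM_GPUS</code> is a stand-in for the GPU count that nvkind detects on the host at runtime.

```shell
#!/usr/bin/env bash
# Render the kind config the template above produces for NUM_GPUS GPUs.
# NUM_GPUS is a stand-in; nvkind derives the real count from the driver.
NUM_GPUS=${NUM_GPUS:-2}

{
  echo "kind: Cluster"
  echo "apiVersion: kind.x-k8s.io/v1alpha4"
  echo "nodes:"
  echo "- role: control-plane"
  for gpu in $(seq 0 $((NUM_GPUS - 1))); do
    echo "- role: worker"
    echo "  extraMounts:"
    echo "  - hostPath: /dev/null"
    echo "    containerPath: /var/run/nvidia-container-devices/${gpu}"
  done
} | tee rendered-config.yaml
```

Each worker mounts /dev/null at the device path of exactly one GPU index, which is how every node ends up with a single dedicated GPU.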

Install NVIDIA GPU Operator

The operator automates driver, device plugin, and DCGM exporter installation.

<code>helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update
helm install --wait --generate-name \
    -n gpu-operator --create-namespace \
    nvidia/gpu-operator \
    --set driver.enabled=false  # host driver already installed by the script</code>

Install Cloud Provider Kind

This component enables LoadBalancer‑type service exposure.

<code># KIND_CLOUD_PROVIDER_URL must point at the cloud-provider-kind release tarball for your platform
curl -L ${KIND_CLOUD_PROVIDER_URL} -o cloud-provider-kind.tar.gz
tar -xvzf cloud-provider-kind.tar.gz
chmod +x cloud-provider-kind
sudo mv cloud-provider-kind /usr/local/bin/

echo "Starting cloud-provider-kind in the background..."
LOG_FILE="/tmp/cloud-provider-kind.log"
nohup cloud-provider-kind > $LOG_FILE 2>&1 &
echo $! > /tmp/cloud-provider-kind.pid

echo "Setup complete. All components have been installed successfully."
</code>

Run a Large Model with vLLM

The DeepSeek‑R1‑Distill‑Qwen‑1.5B model is deployed to verify the cluster.

<code>apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: deepseek-r1-distill-qwen-1-5b
  namespace: default
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
  volumeMode: Filesystem
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deepseek-r1-distill-qwen-1-5b
  namespace: default
  labels:
    app: deepseek-r1-distill-qwen-1-5b
spec:
  replicas: 1
  selector:
    matchLabels:
      app: deepseek-r1-distill-qwen-1-5b
  template:
    metadata:
      labels:
        app: deepseek-r1-distill-qwen-1-5b
    spec:
      volumes:
      - name: cache-volume
        persistentVolumeClaim:
          claimName: deepseek-r1-distill-qwen-1-5b
      containers:
      - name: deepseek-r1-distill-qwen-1-5b
        image: vllm/vllm-openai:latest
        command: ["/bin/sh", "-c"]
        args:
        - "vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B --trust-remote-code --enable-chunked-prefill --max_num_batched_tokens 1024"
        ports:
        - containerPort: 8000
        resources:
          limits:
            nvidia.com/gpu: "1"
          requests:
            nvidia.com/gpu: "1"
        readinessProbe:
          httpGet:
            path: /health
            port: 8000
          initialDelaySeconds: 60
          periodSeconds: 5
        volumeMounts:
        - mountPath: /root/.cache/huggingface
          name: cache-volume
---
apiVersion: v1
kind: Service
metadata:
  name: deepseek-r1-distill-qwen-1-5b
  namespace: default
spec:
  ports:
  - name: deepseek-r1-distill-qwen-1-5b
    port: 80
    protocol: TCP
    targetPort: 8000
  selector:
    app: deepseek-r1-distill-qwen-1-5b
  type: LoadBalancer
</code>

Save the manifest to a file and deploy it with <code>kubectl apply -f</code>. The vLLM image (~8 GB) takes time to pull, and the first pod start downloads the model weights (≈508 s in this test). Because the weights are cached on the PersistentVolumeClaim mounted at /root/.cache/huggingface, subsequent restarts load them in ~0.55 s.

<code># Check pod status
kubectl get pod
# Check the service and note its EXTERNAL-IP
kubectl get svc
# Access the model via the LoadBalancer IP (172.18.0.4 here; substitute yours)
curl --location 'http://172.18.0.4/v1/chat/completions' \
  --header 'Content-Type: application/json' \
  --data '{
    "model":"deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    "messages":[{"role":"user","content":"Who are you?"}]
}'
</code>
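If you plan to probe the endpoint repeatedly, it helps to keep the request body in a file and reuse it with curl. A minimal sketch; the file name request.json is arbitrary:

```shell
#!/usr/bin/env bash
# Write the chat-completions request body to a reusable file.
cat > request.json <<'EOF'
{
  "model": "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
  "messages": [{"role": "user", "content": "Who are you?"}]
}
EOF

# Sanity-check that the body is valid JSON before sending it.
python3 -m json.tool request.json > /dev/null && echo "request.json OK"

# Then send it to the service (substitute your service's EXTERNAL-IP):
# curl -s http://<EXTERNAL-IP>/v1/chat/completions \
#   -H 'Content-Type: application/json' -d @request.json
```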

The response shows the model successfully processed the request.

Cleanup

When testing is complete, remove the cluster with:

<code>bash cleanup.sh</code>

Conclusion

This guide demonstrates how a one‑click script can rapidly provision a GPU‑enabled Kind cluster for large‑model development and testing, using nvkind for balanced GPU allocation and vLLM to serve the DeepSeek‑R1‑Distill‑Qwen‑1.5B model.

Tags: Docker, Kubernetes, vLLM, GPU, NVIDIA, AI Model Deployment, Kind
Written by

Ops Development Stories

Maintained by a like‑minded team, covering both operations and development. Topics span Linux ops, DevOps toolchain, Kubernetes containerization, monitoring, log collection, network security, and Python or Go development. Team members: Qiao Ke, wanger, Dong Ge, Su Xin, Hua Zai, Zheng Ge, Teacher Xia.
