One-Click GPU-Enabled Kind Cluster Setup for Running Large AI Models
This tutorial walks through a one‑click script that creates a GPU‑enabled Kind Kubernetes cluster: it distributes GPU resources evenly across nodes with nvkind, installs the necessary driver and container toolkit, deploys a large language model served by vLLM, and verifies the deployment, on either a local machine or a cloud instance.
Prerequisites
The full script is available on GitHub and assumes an Ubuntu host. You will need:
A machine with an NVIDIA GPU
Administrator (sudo) privileges
A stable internet connection
If you lack a local GPU server, you can rent a GPU instance from cloud providers such as Alibaba Cloud, AWS, or Azure; Alibaba Cloud often offers the most cost‑effective option.
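Before going further, a quick pre‑flight check can confirm the host actually sees an NVIDIA GPU. The helper below is a hypothetical sketch, not part of the script:

```shell
# Pre-flight check (hypothetical helper, not part of install.sh):
# report whether an NVIDIA driver, or at least a GPU device, is present.
check_prereqs() {
  if command -v nvidia-smi >/dev/null 2>&1; then
    echo "driver-present"
  elif lspci 2>/dev/null | grep -qi nvidia; then
    echo "gpu-present-no-driver"
  else
    echo "no-gpu-detected"
  fi
}
check_prereqs
```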
Run the Setup Script
Execute the following command; the script will install all dependencies and create a GPU‑enabled Kind cluster.
<code>bash install.sh</code>
Script Details
Install Command‑Line Tools
Docker, kubectl, Helm, Kind, and nvkind are installed.
<code>sudo apt update
sudo apt install -y docker.io
sudo snap install kubectl --classic
# Add kubectl completion to bashrc
echo 'source <(kubectl completion bash)' >> ~/.bashrc
source ~/.bashrc
sudo snap install helm --classic
# Install kind
[ $(uname -m) = x86_64 ] && curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.25.0/kind-linux-amd64
chmod +x ./kind
sudo mv ./kind /usr/local/bin/kind
# Install nvkind
curl -L -o ~/nvkind-linux-amd64.tar.gz https://github.com/Jeffwan/kind-with-gpus-examples/releases/download/v0.1.0/nvkind-linux-amd64.tar.gz
tar -xzvf ~/nvkind-linux-amd64.tar.gz
sudo mv nvkind-linux-amd64 /usr/local/bin/nvkind</code>
Install NVIDIA GPU Driver
The script installs NVIDIA driver version 565.57.01.
<code>wget https://cn.download.nvidia.com/tesla/565.57.01/NVIDIA-Linux-x86_64-565.57.01.run
sh NVIDIA-Linux-x86_64-565.57.01.run --silent</code>
Install and Configure NVIDIA Container Toolkit
The toolkit mounts NVIDIA devices into containers.
<code>curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
&& curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo sed -i -e '/experimental/ s/^#//g' /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker --set-as-default --cdi.enabled
sudo nvidia-ctk config --set accept-nvidia-visible-devices-as-volume-mounts=true --in-place
sudo systemctl restart docker</code>
Verify GPU Availability
Run the following checks:
Execute nvidia-smi to list GPUs.
Run a Docker container with the NVIDIA runtime to ensure GPU detection.
Confirm containers can access GPU devices.
<code># Run nvidia-smi to list GPU devices
nvidia-smi -L
if [ $? -ne 0 ]; then
echo "nvidia-smi failed to execute."
exit 1
fi
# Run a Docker container with NVIDIA runtime
docker run --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=all ubuntu:20.04 nvidia-smi -L
if [ $? -ne 0 ]; then
echo "Docker command with NVIDIA runtime failed to execute."
exit 1
fi
# Run a Docker container with mounted /dev/null to check GPU accessibility
docker run -v /dev/null:/var/run/nvidia-container-devices/all ubuntu:20.04 nvidia-smi -L
if [ $? -ne 0 ]; then
echo "Docker command with mounted /dev/null failed to execute."
exit 1
fi</code>
Create the Kind GPU Cluster
A configuration file is generated based on the number of GPUs, then nvkind creates the cluster.
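nvkind computes the GPU count itself, but the value that drives the template's `until numGPUs` loop can be reproduced by hand; a minimal, purely illustrative sketch:

```shell
# Count GPUs on the host; nvkind's `until numGPUs` ranges over this value,
# producing one worker node per GPU.
numGPUs=$(nvidia-smi -L 2>/dev/null | wc -l)
echo "Detected ${numGPUs} GPU(s)"
```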
<code>cat <<'EOF' > one-worker-per-gpu.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
{{- range $gpu := until numGPUs }}
- role: worker
  extraMounts:
    # We inject all NVIDIA GPUs using the nvidia-container-runtime.
    # This requires `accept-nvidia-visible-devices-as-volume-mounts = true`
    # to be set in `/etc/nvidia-container-runtime/config.toml`.
    - hostPath: /dev/null
      containerPath: /var/run/nvidia-container-devices/{{ $gpu }}
{{- end }}
EOF
nvkind cluster create --name gpu-cluster --config-template=one-worker-per-gpu.yaml</code>
Install NVIDIA GPU Operator
The operator automates driver, device plugin, and DCGM exporter installation.
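Once the chart below finishes installing, each worker node should advertise nvidia.com/gpu among its allocatable resources. A hedged sketch of that check (the guard makes the helper safe to run even on hosts without kubectl or a reachable cluster):

```shell
# List per-node allocatable GPU counts (requires a working kubectl context);
# falls back to a diagnostic string when kubectl or the cluster is unavailable.
check_gpu_allocatable() {
  if ! command -v kubectl >/dev/null 2>&1; then
    echo "kubectl-not-found"
    return 0
  fi
  kubectl get nodes \
    -o custom-columns='NODE:.metadata.name,GPUS:.status.allocatable.nvidia\.com/gpu' \
    2>/dev/null || echo "no-cluster-reachable"
}
check_gpu_allocatable
```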
<code>helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update
helm install --wait --generate-name \
  -n gpu-operator --create-namespace \
  nvidia/gpu-operator \
  --set driver.enabled=false</code>
Install Cloud Provider Kind
This component enables LoadBalancer‑type service exposure.
<code># KIND_CLOUD_PROVIDER_URL should point at a cloud-provider-kind release tarball for your platform
curl -L ${KIND_CLOUD_PROVIDER_URL} -o cloud-provider-kind.tar.gz
tar -xvzf cloud-provider-kind.tar.gz
chmod +x cloud-provider-kind
sudo mv cloud-provider-kind /usr/local/bin/
echo "Starting cloud-provider-kind in the background..."
LOG_FILE="/tmp/cloud-provider-kind.log"
nohup cloud-provider-kind > $LOG_FILE 2>&1 &
echo $! > /tmp/cloud-provider-kind.pid
echo "Setup complete. All components have been installed successfully."
</code>
Run a Large Model with vLLM
The DeepSeek‑R1‑Distill‑Qwen‑1.5B model is deployed to verify the cluster. Save the manifest below to a file and apply it with kubectl apply -f.
<code>apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: deepseek-r1-distill-qwen-1-5b
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
  volumeMode: Filesystem
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deepseek-r1-distill-qwen-1-5b
  namespace: default
  labels:
    app: deepseek-r1-distill-qwen-1-5b
spec:
  replicas: 1
  selector:
    matchLabels:
      app: deepseek-r1-distill-qwen-1-5b
  template:
    metadata:
      labels:
        app: deepseek-r1-distill-qwen-1-5b
    spec:
      volumes:
        - name: cache-volume
          persistentVolumeClaim:
            claimName: deepseek-r1-distill-qwen-1-5b
      containers:
        - name: deepseek-r1-distill-qwen-1-5b
          image: vllm/vllm-openai:latest
          command: ["/bin/sh", "-c"]
          args:
            - "vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B --trust-remote-code --enable-chunked-prefill --max_num_batched_tokens 1024"
          ports:
            - containerPort: 8000
          resources:
            limits:
              nvidia.com/gpu: "1"
            requests:
              nvidia.com/gpu: "1"
          readinessProbe:
            httpGet:
              path: /health
              port: 8000
            initialDelaySeconds: 60
            periodSeconds: 5
          volumeMounts:
            - mountPath: /root/.cache/huggingface
              name: cache-volume
---
apiVersion: v1
kind: Service
metadata:
  name: deepseek-r1-distill-qwen-1-5b
  namespace: default
spec:
  ports:
    - name: deepseek-r1-distill-qwen-1-5b
      port: 80
      protocol: TCP
      targetPort: 8000
  selector:
    app: deepseek-r1-distill-qwen-1-5b
  type: LoadBalancer
</code>
The vLLM image (~8 GB) takes time to download, and the first pod start downloads the model weights (≈508 s). Subsequent restarts load the cached weights in ~0.55 s.
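The curl example below uses a hardcoded LoadBalancer IP (172.18.0.4); on your machine the address will differ. A sketch for looking it up, guarded so it degrades gracefully where kubectl or the cluster is unavailable:

```shell
# Discover the external IP that cloud-provider-kind assigned to the Service;
# prints a diagnostic string instead when kubectl or the cluster is missing.
get_lb_ip() {
  if ! command -v kubectl >/dev/null 2>&1; then
    echo "kubectl-not-found"
    return 0
  fi
  kubectl get svc deepseek-r1-distill-qwen-1-5b \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}' 2>/dev/null \
    || echo "no-cluster-reachable"
}
get_lb_ip
```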
<code># Check pod status
kubectl get pod
# Check service
kubectl get svc
# Access the model via LoadBalancer IP
curl --location 'http://172.18.0.4/v1/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
"model":"deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
"messages":[{"role":"user","content":"你是谁?"}]
}'
</code>
The response shows the model successfully processed the request.
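Picking the assistant's text out of the raw JSON by eye is tedious; a small helper (a sketch assuming the standard OpenAI‑compatible response shape that vLLM returns) can be piped after curl:

```shell
# Print only the assistant message from an OpenAI-style chat completion response
extract_reply() {
  python3 -c 'import sys, json; print(json.load(sys.stdin)["choices"][0]["message"]["content"])'
}
# Example with a canned response body:
echo '{"choices":[{"message":{"role":"assistant","content":"Hello!"}}]}' | extract_reply
# → Hello!
```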
Cleanup
When testing is complete, remove the cluster with:
<code>bash cleanup.sh</code>
Conclusion
This guide demonstrates how a one‑click script can rapidly provision a GPU‑enabled Kind cluster for large‑model development and testing, using nvkind for balanced GPU allocation and vLLM to serve the DeepSeek‑R1‑Distill‑Qwen‑1.5B model.
Ops Development Stories
Maintained by a like‑minded team, covering both operations and development. Topics span Linux ops, DevOps toolchain, Kubernetes containerization, monitoring, log collection, network security, and Python or Go development. Team members: Qiao Ke, wanger, Dong Ge, Su Xin, Hua Zai, Zheng Ge, Teacher Xia.