Deploying DeepSeek‑R1 Large Language Model on Knative with GPU A10
This guide explains how to deploy the DeepSeek‑R1 large language model on a Knative platform using an A10 GPU, covering preparation, service creation with appropriate annotations, YAML configuration, verification via curl, custom domain setup, and optional personal AI assistant deployment.
Traditional GPU-utilization-based autoscaling cannot accurately reflect the load on a large-model inference service. Knative's KPA (Knative Pod Autoscaler) can instead scale on QPS/RPS, a more direct measure of inference load.
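As an illustration, KPA can be switched from its default concurrency metric to requests per second with per-revision annotations; the target of 10 below is an example value, not a recommendation from this guide:

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: deepseek
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/class: "kpa.autoscaling.knative.dev"
        autoscaling.knative.dev/metric: "rps"  # scale on requests per second
        autoscaling.knative.dev/target: "10"   # target RPS per replica (example value)
```

The target is a per-replica soft limit: when observed RPS per replica exceeds it, KPA adds replicas.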
DeepSeek‑R1, developed by DeepSeek AI, is a high‑performance large language model designed for efficient natural language processing tasks.
Preparation: Ensure that Knative is deployed in an ACK cluster and that the ack-virtual-node component is installed.
Deploy the DeepSeek-R1 Model: Create a Knative Service with the label alibabacloud.com/eci: "true" and the annotation k8s.aliyun.com/eci-use-specs to specify the ECI GPU spec (e.g., ecs.gn7i-c8g1.2xlarge, which provides an A10 GPU). Apply the following YAML:
```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  labels:
    release: deepseek
  name: deepseek
  namespace: default
spec:
  template:
    metadata:
      annotations:
        k8s.aliyun.com/eci-use-specs: "ecs.gn7i-c8g1.2xlarge"  # A10 GPU spec
        autoscaling.knative.dev/min-scale: "1"
      labels:
        release: deepseek
        alibabacloud.com/eci: "true"
    spec:
      containers:
      - command:
        - /bin/sh
        - -c
        args:
        - vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B --max_model_len 2048
        image: registry.cn-hangzhou.aliyuncs.com/knative-sample/vllm-openai:v0.7.1
        imagePullPolicy: IfNotPresent
        name: vllm-container
        env:
        - name: HF_HUB_ENABLE_HF_TRANSFER
          value: "0"
        ports:
        - containerPort: 8000
        readinessProbe:
          httpGet:
            path: /health
            port: 8000
          initialDelaySeconds: 60
          periodSeconds: 5
        resources:
          limits:
            nvidia.com/gpu: "1"
          requests:
            nvidia.com/gpu: "1"
        volumeMounts:
        - mountPath: /root/.cache/huggingface
          name: cache-volume
        - name: shm
          mountPath: /dev/shm
      volumes:
      - name: cache-volume
        emptyDir: {}
      - name: shm
        emptyDir:
          medium: Memory
          sizeLimit: 2Gi
```
After deployment, retrieve the service's access gateway and default domain from the Service Management tab.
Verification: Use curl to test the model endpoint:
```shell
curl -H "Host: deepseek.default.example.com" \
     -H "Content-Type: application/json" \
     http://deepseek.knative.top/v1/chat/completions \
     -d '{"model": "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B", "messages": [{"role": "user", "content": "Tell me about DeepSeek-R1"}]}'
```
The response is a JSON document containing the model's generated answer.
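The same endpoint can be exercised from Python. This is a minimal sketch using only the standard library; the gateway URL and Host header are placeholders to replace with the values retrieved from your Service Management tab:

```python
import json
import urllib.request

# Placeholders -- replace with the access gateway and default domain
# shown for your own service.
GATEWAY = "http://deepseek.knative.top"
HOST_HEADER = "deepseek.default.example.com"
MODEL = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"

def build_chat_request(model: str, prompt: str) -> bytes:
    """Build an OpenAI-compatible /v1/chat/completions request body."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return json.dumps(payload).encode("utf-8")

def ask(prompt: str) -> str:
    """Send one chat turn to the vLLM server and return the answer text."""
    req = urllib.request.Request(
        GATEWAY + "/v1/chat/completions",
        data=build_chat_request(MODEL, prompt),
        headers={"Host": HOST_HEADER, "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # vLLM's OpenAI-compatible API returns the answer in choices[0].message.content
    return body["choices"][0]["message"]["content"]
```

Calling ask("Tell me about DeepSeek-R1") issues the same request as the curl command above.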
Custom Domain: Knative allows assigning a custom domain to a service; configure DNS to point the domain at the access gateway.
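If your Knative installation supports it, a DomainMapping resource is one way to bind a custom domain to the service created above; the domain name here is a placeholder:

```yaml
apiVersion: serving.knative.dev/v1beta1
kind: DomainMapping
metadata:
  name: chat.example.com   # your custom domain (placeholder)
  namespace: default
spec:
  ref:
    name: deepseek         # the Knative Service created above
    kind: Service
    apiVersion: serving.knative.dev/v1
```

DNS for the custom domain still has to resolve to the access gateway for requests to reach the service.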
Deploy a Personal AI Assistant: Using ChatGPTNextWeb, deploy a private ChatGPT-like web app that supports DeepSeek, Claude, GPT-4, and Gemini Pro. Example Knative Service YAML:
```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: chatgpt-next-web
spec:
  template:
    spec:
      containers:
      - name: chatgpt-next-web
        image: registry.cn-hangzhou.aliyuncs.com/knative-sample/chatgpt-next-web:v2.15.8
        ports:
        - containerPort: 3000
        readinessProbe:
          tcpSocket:
            port: 3000
          initialDelaySeconds: 60
          periodSeconds: 5
        env:
        - name: HOSTNAME
          value: "0.0.0.0"
        # Replace with your OpenAI API endpoint
```
Obtain the service's gateway address and default domain, bind the domain to the gateway IP in the hosts file, and configure the DeepSeek API endpoint and API key as described.
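The endpoint and key elided above are, based on ChatGPT-Next-Web's documented configuration (verify against the version you deploy), typically supplied as additional env entries; both values below are placeholders:

```yaml
        - name: BASE_URL
          value: "http://deepseek.knative.top"  # your DeepSeek-compatible endpoint (placeholder)
        - name: OPENAI_API_KEY
          value: "sk-xxxx"                      # your API key (placeholder)
```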
Finally, verify the assistant’s functionality via the provided UI. Interested users can join the Alibaba Cloud Knative DingTalk group (ID 23302777) for further discussion.