Deploying DeepSeek‑R1 Large Language Model on Knative with GPU A10
This guide explains how to deploy the DeepSeek‑R1 large language model on a Knative platform using an A10 GPU, covering preparation, service creation with appropriate annotations, YAML configuration, verification via curl, custom domain setup, and optional personal AI assistant deployment.
Traditional GPU-utilization-based autoscaling cannot accurately reflect the load on a large-model inference service. Knative's KPA (Knative Pod Autoscaler) can instead scale on QPS/RPS, a more direct measure of inference load.
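As an illustration, KPA can be switched from its default concurrency metric to requests per second with per-revision annotations; the target of 10 below is an example value, not a recommendation from this guide:

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: deepseek
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/class: "kpa.autoscaling.knative.dev"
        autoscaling.knative.dev/metric: "rps"  # scale on requests per second
        autoscaling.knative.dev/target: "10"   # target RPS per replica (example value)
```

The target is a per-replica soft limit: when observed RPS per replica exceeds it, KPA adds replicas.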
DeepSeek‑R1, developed by DeepSeek AI, is a high‑performance large language model designed for efficient natural language processing tasks.
Preparation: Ensure that Knative is deployed in an ACK cluster and that the ack-virtual-node component is installed.
Deploy the DeepSeek-R1 Model: Create a Knative Service with the label alibabacloud.com/eci: "true" and the annotation k8s.aliyun.com/eci-use-specs to specify the ECI GPU spec (e.g., ecs.gn7i-c8g1.2xlarge, which provides an A10 GPU). Apply the following YAML:
```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  labels:
    release: deepseek
  name: deepseek
  namespace: default
spec:
  template:
    metadata:
      annotations:
        k8s.aliyun.com/eci-use-specs: "ecs.gn7i-c8g1.2xlarge"  # A10 GPU spec
        autoscaling.knative.dev/min-scale: "1"
      labels:
        release: deepseek
        alibabacloud.com/eci: "true"
    spec:
      containers:
      - command:
        - /bin/sh
        - -c
        args:
        - vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B --max_model_len 2048
        image: registry.cn-hangzhou.aliyuncs.com/knative-sample/vllm-openai:v0.7.1
        imagePullPolicy: IfNotPresent
        name: vllm-container
        env:
        - name: HF_HUB_ENABLE_HF_TRANSFER
          value: "0"
        ports:
        - containerPort: 8000
        readinessProbe:
          httpGet:
            path: /health
            port: 8000
          initialDelaySeconds: 60
          periodSeconds: 5
        resources:
          limits:
            nvidia.com/gpu: "1"
          requests:
            nvidia.com/gpu: "1"
        volumeMounts:
        - mountPath: /root/.cache/huggingface
          name: cache-volume
        - name: shm
          mountPath: /dev/shm
      volumes:
      - name: cache-volume
        emptyDir: {}
      - name: shm
        emptyDir:
          medium: Memory
          sizeLimit: 2Gi
```
After deployment, retrieve the service's access gateway and default domain from the Service Management tab.
Verification: Use curl to test the model endpoint:
```shell
curl -H "Host: deepseek.default.example.com" \
     -H "Content-Type: application/json" \
     http://deepseek.knative.top/v1/chat/completions \
     -d '{"model": "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B", "messages": [{"role": "user", "content": "Tell me about DeepSeek-R1"}]}'
```
The response is a JSON document containing the model's generated answer.
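The same endpoint can be exercised from Python. This is a minimal sketch using only the standard library; the gateway URL and Host header are placeholders to replace with the values retrieved from your Service Management tab:

```python
import json
import urllib.request

# Placeholders -- replace with the access gateway and default domain
# shown for your own service.
GATEWAY = "http://deepseek.knative.top"
HOST_HEADER = "deepseek.default.example.com"
MODEL = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"

def build_chat_request(model: str, prompt: str) -> bytes:
    """Build an OpenAI-compatible /v1/chat/completions request body."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return json.dumps(payload).encode("utf-8")

def ask(prompt: str) -> str:
    """Send one chat turn to the vLLM server and return the answer text."""
    req = urllib.request.Request(
        GATEWAY + "/v1/chat/completions",
        data=build_chat_request(MODEL, prompt),
        headers={"Host": HOST_HEADER, "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # vLLM's OpenAI-compatible API returns the answer in choices[0].message.content
    return body["choices"][0]["message"]["content"]
```

Calling ask("Tell me about DeepSeek-R1") issues the same request as the curl command above.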
Custom Domain: Knative allows assigning a custom domain to a service; configure DNS to point the domain at the access gateway.
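If your Knative installation supports it, a DomainMapping resource is one way to bind a custom domain to the service created above; the domain name here is a placeholder:

```yaml
apiVersion: serving.knative.dev/v1beta1
kind: DomainMapping
metadata:
  name: chat.example.com   # your custom domain (placeholder)
  namespace: default
spec:
  ref:
    name: deepseek         # the Knative Service created above
    kind: Service
    apiVersion: serving.knative.dev/v1
```

DNS for the custom domain still has to resolve to the access gateway for requests to reach the service.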
Deploy a Personal AI Assistant: Using ChatGPTNextWeb, deploy a private ChatGPT-like web app that supports DeepSeek, Claude, GPT-4, and Gemini Pro. Example Knative Service YAML:
```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: chatgpt-next-web
spec:
  template:
    spec:
      containers:
      - name: chatgpt-next-web
        image: registry.cn-hangzhou.aliyuncs.com/knative-sample/chatgpt-next-web:v2.15.8
        ports:
        - containerPort: 3000
        readinessProbe:
          tcpSocket:
            port: 3000
          initialDelaySeconds: 60
          periodSeconds: 5
        env:
        - name: HOSTNAME
          value: "0.0.0.0"
        # Replace with your OpenAI API endpoint
```
Obtain the service's gateway address and default domain, bind the domain to the gateway IP in the hosts file, and configure the DeepSeek API endpoint and API key as described.
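The endpoint and key elided above are, based on ChatGPT-Next-Web's documented configuration (verify against the version you deploy), typically supplied as additional env entries; both values below are placeholders:

```yaml
        - name: BASE_URL
          value: "http://deepseek.knative.top"  # your DeepSeek-compatible endpoint (placeholder)
        - name: OPENAI_API_KEY
          value: "sk-xxxx"                      # your API key (placeholder)
```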
Finally, verify the assistant’s functionality via the provided UI. Interested users can join the Alibaba Cloud Knative DingTalk group (ID 23302777) for further discussion.