Artificial Intelligence 3 min read

Deploy DeepSeek, Llama, Qwen Models Fast on Baidu Baige AI Heterogeneous Platform

This guide walks you through creating a lightweight compute instance, adding it to Baidu Baige AI heterogeneous computing platform, deploying the vLLM tool, loading and serving small‑scale dense models such as DeepSeek, Llama and Qwen, and provides recommended configuration lists to achieve low‑cost, high‑performance inference.

Baidu Geek Talk

Feb 12, 2025

Deploy DeepSeek, Llama, Qwen Models Fast on Baidu Baige AI Heterogeneous Platform

1. Provision a lightweight compute instance

Use the Baidu Baige AI platform to create a compute instance of type H20 (identifier ebc.lgn7t.c208m2048.8h20.4d). This instance provides 8 vCPU, 20 GB memory, and GPU resources suitable for dense small‑scale models such as DeepSeek‑V3, DeepSeek‑R1, Llama, and Qwen.

2. Install vLLM from the Tool Market

In the left navigation panel select “Tool Market”, locate the vLLM tool and click Deploy . The platform pulls the vLLM container image and starts the service on the provisioned instance.

3. Prepare the model and start inference

After vLLM is running, SSH into the instance, download the desired model checkpoint from its official repository, and launch vLLM with appropriate arguments. Example commands:

git clone https://github.com/deepseek-ai/DeepSeek-Model.git
cd DeepSeek-Model
python -m vllm.entrypoints.openai \
    --model-path ./deepseek-v3 \
    --tensor-parallel-size 2 \
    --max-model-len 4096

Optionally install a WebUI client (e.g., an OpenAI‑compatible UI) and send a POST request to http://<instance_ip>:8000/v1/chat/completions with the standard JSON payload to start a conversation.

4. Recommended hardware configurations

The following table (illustrated) lists the minimum instance specifications for each model series. For example, DeepSeek‑V3 requires at least 8 GB GPU memory, while Llama‑2‑13B benefits from 16 GB GPU memory.

5. Platform capabilities

Baidu Baige AI offers full lifecycle management, proprietary inference acceleration, and automatic resource fragmentation handling. These features improve service stability, lower inference cost, and increase throughput for deployed models.

Reference: https://cloud.baidu.com/product/aihc.html

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

vLLM DeepSeek inference AI model deployment Cloud AI Baidu Baige Model configuration

Written by

Baidu Geek Talk

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.