Deploy DeepSeek, Llama, Qwen Models Fast on Baidu Baige AI Heterogeneous Platform
This guide walks you through creating a lightweight compute instance, adding it to Baidu Baige AI heterogeneous computing platform, deploying the vLLM tool, loading and serving small‑scale dense models such as DeepSeek, Llama and Qwen, and provides recommended configuration lists to achieve low‑cost, high‑performance inference.
1. Provision a lightweight compute instance
Use the Baidu Baige AI platform to create a compute instance of type H20 (identifier ebc.lgn7t.c208m2048.8h20.4d). This instance provides 8 vCPU, 20 GB memory, and GPU resources suitable for dense small‑scale models such as DeepSeek‑V3, DeepSeek‑R1, Llama, and Qwen.
2. Install vLLM from the Tool Market
In the left navigation panel select “Tool Market”, locate the vLLM tool and click Deploy . The platform pulls the vLLM container image and starts the service on the provisioned instance.
3. Prepare the model and start inference
After vLLM is running, SSH into the instance, download the desired model checkpoint from its official repository, and launch vLLM with appropriate arguments. Example commands:
git clone https://github.com/deepseek-ai/DeepSeek-Model.git
cd DeepSeek-Model
python -m vllm.entrypoints.openai \
--model-path ./deepseek-v3 \
--tensor-parallel-size 2 \
--max-model-len 4096Optionally install a WebUI client (e.g., an OpenAI‑compatible UI) and send a POST request to http://<instance_ip>:8000/v1/chat/completions with the standard JSON payload to start a conversation.
4. Recommended hardware configurations
The following table (illustrated) lists the minimum instance specifications for each model series. For example, DeepSeek‑V3 requires at least 8 GB GPU memory, while Llama‑2‑13B benefits from 16 GB GPU memory.
5. Platform capabilities
Baidu Baige AI offers full lifecycle management, proprietary inference acceleration, and automatic resource fragmentation handling. These features improve service stability, lower inference cost, and increase throughput for deployed models.
Reference: https://cloud.baidu.com/product/aihc.html
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
