Fine‑Tuning Qwen‑14B Large Language Model: A Complete Guide
This article provides a comprehensive tutorial on fine‑tuning the Qwen‑14B large language model, covering the motivation, fine‑tuning concepts, step‑by‑step workflow, required code, DeepSpeed training parameters, testing scripts, and deployment using FastChat and the 360AI platform.
Introduction: With the rise of ChatGPT and other large language models (LLMs) such as OpenAI GPT, Meta LLaMA, Alibaba Tongyi Qianwen, and Baidu Wenxin, it has become clear that generic models often give overly broad answers in specific scenarios, which motivates fine‑tuning them on domain data.
What is fine‑tuning: Fine‑tuning adapts a pre‑trained model to a target task or domain by further training on task‑specific data, leveraging the model’s general knowledge while specializing it.
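The idea can be sketched with a toy example: start from "pretrained" parameters learned on generic data, then take a few gradient steps on a small task‑specific dataset. This is a conceptual, pure‑Python illustration of further training from a pretrained starting point, not the actual LLM procedure.

```python
# Toy illustration of fine-tuning: start from "pretrained" weights,
# then take a few gradient steps on task-specific data.

def mse(w, b, data):
    """Mean squared error of the linear model y = w*x + b on (x, y) pairs."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def sgd_step(w, b, data, lr):
    """One full-batch gradient-descent step for the linear model."""
    n = len(data)
    gw = sum(2 * (w * x + b - y) * x for x, y in data) / n
    gb = sum(2 * (w * x + b - y) for x, y in data) / n
    return w - lr * gw, b - lr * gb

# "Pre-training": a generic relationship y = 2x learned elsewhere.
w, b = 2.0, 0.0

# Task-specific data: the target domain behaves like y = 2x + 1.
task_data = [(x, 2 * x + 1) for x in range(-3, 4)]

loss_before = mse(w, b, task_data)
for _ in range(200):            # brief further training ("fine-tuning")
    w, b = sgd_step(w, b, task_data, lr=0.05)
loss_after = mse(w, b, task_data)

print(f"loss before: {loss_before:.3f}, after: {loss_after:.6f}")
```

The pretrained slope is already right for the task, so fine‑tuning only has to learn the small task‑specific offset — the same intuition as adapting an LLM's general knowledge to a narrow domain.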
Why fine‑tune: The main reasons are transfer learning (reusing the model's general knowledge), data scarcity (good results from relatively little task data), and computational savings (far cheaper than training from scratch).
Typical fine‑tuning workflow: data preparation, model selection, hyper‑parameter setting, training, evaluation, and deployment.
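As an illustration of the data‑preparation step, one record of the conversations‑style JSONL used later in this guide might look like the following. This is a sketch — the `id`/`from`/`value` field names follow the FastChat convention and are an assumption; adjust them to whatever your training script actually expects.

```python
import json

# A sketch of one training record in the "conversations" JSONL format
# (field names assumed from the FastChat convention).
record = {
    "id": "sample-0001",
    "conversations": [
        {"from": "human", "value": "你是谁?"},
        {"from": "gpt", "value": "我是基于 Qwen-14B 微调的助手。"},
    ],
}

# Each line of the .jsonl training file is one such JSON object.
line = json.dumps(record, ensure_ascii=False)
print(line)

# Round-trip check: the serialized line parses back to the same structure.
assert json.loads(line) == record
```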
Case study – fine‑tuning Qwen‑14B: The article walks through environment setup (4 × A100 GPUs), choice of model (Qwen‑14B), framework (360AI platform, FastChat API), data format (JSONL with a “conversations” field), and the DeepSpeed command used for training, including a full list of relevant parameters.
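The training command below points at `ds_config_zero3.json`. A minimal ZeRO‑3 configuration in the style DeepSpeed's HuggingFace Trainer integration expects might look like this — a sketch only; `"auto"` values are filled in from the Trainer arguments, and the actual file used in the article may differ:

```json
{
  "bf16": { "enabled": "auto" },
  "zero_optimization": {
    "stage": 3,
    "overlap_comm": true,
    "contiguous_gradients": true,
    "stage3_gather_16bit_weights_on_model_save": true
  },
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto"
}
```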
# $DATA is the data path
# $MODEL is the model path
deepspeed finetune_merge.py \
--report_to "none" \
--data_path $DATA \
--lazy_preprocess False \
--model_name_or_path $MODEL \
--output_dir /hboxdir/output \
--model_max_length 2048 \
--num_train_epochs 24 \
--per_device_train_batch_size 1 \
--gradient_accumulation_steps 1 \
--save_strategy epoch \
--save_total_limit 2 \
--learning_rate 1e-5 \
--lr_scheduler_type "cosine" \
--adam_beta1 0.9 \
--adam_beta2 0.95 \
--adam_epsilon 1e-8 \
--max_grad_norm 1.0 \
--weight_decay 0.1 \
--warmup_ratio 0.01 \
--logging_steps 1 \
--gradient_checkpointing True \
--deepspeed "ds_config_zero3.json" \
--bf16 True \
--tf32 True

Testing the fine‑tuned model: Example Python code using HuggingFace Transformers to generate a response, and the expected output.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_dir = "/models/qwen-14b"

# Qwen releases ship custom model code, so trust_remote_code=True is required.
tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_dir, device_map="auto", trust_remote_code=True
).eval()

# Tokenize the prompt and move it to the same device as the model.
inputs = tokenizer('你好啊,介绍下你自己', return_tensors='pt')
inputs = inputs.to(model.device)
pred = model.generate(**inputs)
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))

Deployment with FastChat: Steps to launch the controller, model worker, OpenAI‑compatible API server, and a curl example to query the service.
# 1. Start the controller
python -m fastchat.serve.controller --host 0.0.0.0 --port 21001
# 2. Register the model worker (connects to the controller at its default address)
python -m fastchat.serve.model_worker --model-path /models/qwen-14b/ --host 0.0.0.0
# 3. Expose an OpenAI-compatible REST API
python -m fastchat.serve.openai_api_server --host 0.0.0.0 --port 8000
curl http://{{HOST}}:8000/v1/chat/completions -H "Content-Type: application/json" -d '{"model": "qwen-14b", "messages": [{"role": "user", "content": "你是谁"}]}'

360AI platform usage: Data upload, parameter configuration, resource allocation, and model serving screenshots are described.
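Because the FastChat server exposes an OpenAI‑compatible endpoint, the curl call above can also be issued from Python. A stdlib‑only sketch (host, port, and model name as configured in the deployment steps; a live server is required to actually send the request):

```python
import json
from urllib import request

def build_chat_request(model, content):
    """Payload for POST /v1/chat/completions (OpenAI-compatible schema)."""
    return {"model": model, "messages": [{"role": "user", "content": content}]}

payload = build_chat_request("qwen-14b", "你是谁")
print(json.dumps(payload, ensure_ascii=False))

def ask(host, payload, port=8000):
    """POST the payload to the FastChat API server and return the parsed JSON."""
    req = request.Request(
        f"http://{host}:{port}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

# Example (needs the server from the deployment steps to be running):
# reply = ask("localhost", payload)
# print(reply["choices"][0]["message"]["content"])
```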
References: Links to related articles on LLaMA‑2 self‑recognition fine‑tuning and Tongyi Qianwen fine‑tuning.
360 Smart Cloud
Official service account of 360 Smart Cloud, dedicated to building a high-quality, secure, highly available, convenient, and stable one‑stop cloud service platform.