
Fine‑Tuning Large Language Models: A Practical Guide Using Qwen‑14B on the 360AI Platform

This article explains the concept, motivations, and step‑by‑step workflow for fine‑tuning large language models—specifically Qwen‑14B—covering data preparation, training commands with DeepSpeed, hyper‑parameter settings, evaluation, and deployment via FastChat, all illustrated with code snippets and configuration details.

360 Tech Engineering

Introduction – With the rapid rise of ChatGPT and other large language models (LLMs) such as OpenAI GPT, Meta LLaMA, Alibaba Tongyi Qwen, and Baidu Wenxin, users find generic responses impressive but often too broad for specific scenarios. Fine‑tuning offers a way to adapt these models to particular domains.

What is Fine‑Tuning? – Fine‑tuning continues training a pre‑trained deep‑learning model on a task‑specific dataset, leveraging the model’s general knowledge while adapting it to specialized requirements.

Why Fine‑Tune? – Benefits include transfer learning (better language understanding), handling data scarcity, and saving computational resources compared to training from scratch.

Main Fine‑Tuning Steps

Prepare data: collect, clean, and preprocess task‑relevant data.

Select a base model appropriate for the task.

Set hyper‑parameters (epochs, learning rate, sequence length, etc.).

Run the fine‑tuning training loop.

Evaluate the fine‑tuned model on a test set.

Deploy the model as a service.
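The steps above can be sketched as a minimal workflow skeleton. Every function body here is a placeholder standing in for the real data pipeline, trainer, and serving stack, not the platform's actual API:

```python
# Minimal sketch of the fine-tuning workflow; each step is a placeholder.

def prepare_data(raw):
    """Clean and keep only non-empty records (placeholder preprocessing)."""
    return [r.strip() for r in raw if r.strip()]

def select_base_model(task):
    """Pick a base checkpoint for the task (hypothetical mapping)."""
    return {"chat": "qwen-14b"}.get(task, "qwen-14b")

def fine_tune(model_name, data, epochs=24, lr=1e-5):
    """Stand-in for the actual DeepSpeed training run."""
    return {"model": model_name, "epochs": epochs, "lr": lr, "samples": len(data)}

def evaluate(ckpt, test_set):
    """Illustrative stand-in metric, not a real evaluation."""
    return min(1.0, ckpt["samples"] / max(1, len(test_set)))

raw = ["你好", " ", "介绍下你自己"]
data = prepare_data(raw)
ckpt = fine_tune(select_base_model("chat"), data)
score = evaluate(ckpt, data)
```

The point is the shape of the pipeline: each stage consumes the previous stage's output, which is how the Qwen-14B case study below is organized.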

Case Study: Fine‑Tuning Qwen‑14B (Tongyi Qwen)

Environment: four NVIDIA A100 GPUs (verified with nvidia-smi).

Framework and tools: 360AI platform for training, FastChat for an OpenAI‑compatible RESTful API.

Data Format – Training data is stored in .jsonl files; each line is a JSON record containing an id, a source, and a conversations array whose entries have from and value fields. An example record, pretty-printed for readability:

[
    {
        "id": "112720",
        "source": "cot",
        "conversations": [
            {"from": "user", "value": "你好"},
            {"from": "assistant", "value": "您好,我是小智,一个由360智汇云开发的 AI 助手..."}
        ]
    }
]
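A record in this format can be loaded and sanity-checked with a few lines of standard-library Python. The field names match the example above; the specific validation rules are our own assumptions about what a well-formed record should contain:

```python
import json

REQUIRED_FIELDS = {"id", "source", "conversations"}

def validate_record(line: str) -> dict:
    """Parse one .jsonl line and check the fields used in the example above."""
    rec = json.loads(line)
    missing = REQUIRED_FIELDS - rec.keys()
    if missing:
        raise ValueError(f"missing fields: {missing}")
    for turn in rec["conversations"]:
        if turn["from"] not in ("user", "assistant"):
            raise ValueError(f"unexpected speaker: {turn['from']}")
    return rec

line = ('{"id": "112720", "source": "cot", "conversations": '
        '[{"from": "user", "value": "你好"}, {"from": "assistant", "value": "您好"}]}')
rec = validate_record(line)
```

Running a check like this over the whole file before training catches malformed lines early, rather than partway through an expensive multi-GPU run.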

Training Command (DeepSpeed)

# $DATA = data path, $MODEL = model path
deepspeed finetune_merge.py \
    --report_to "none" \
    --data_path $DATA \
    --lazy_preprocess False \
    --model_name_or_path $MODEL \
    --output_dir /hboxdir/output \
    --model_max_length 2048 \
    --num_train_epochs 24 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 1 \
    --save_strategy epoch \
    --save_total_limit 2 \
    --learning_rate 1e-5 \
    --lr_scheduler_type "cosine" \
    --adam_beta1 0.9 \
    --adam_beta2 0.95 \
    --adam_epsilon 1e-8 \
    --max_grad_norm 1.0 \
    --weight_decay 0.1 \
    --warmup_ratio 0.01 \
    --logging_steps 1 \
    --gradient_checkpointing True \
    --deepspeed "ds_config_zero3.json" \
    --bf16 True \
    --tf32 True

The key parameters fall into four groups: data-related (path, split, max_seq_len), model-related (LoRA options), training-related (batch size, learning rate, epochs, gradient accumulation), and DeepSpeed-specific (zero_stage, offload, gradient_checkpointing).
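The --deepspeed flag in the command above points at a ZeRO-3 JSON config. The article does not reproduce the exact file, so the following is an illustrative ds_config_zero3.json consistent with the command line ("auto" values are filled in by the HuggingFace/DeepSpeed integration from the training arguments):

```json
{
  "bf16": { "enabled": "auto" },
  "zero_optimization": {
    "stage": 3,
    "overlap_comm": true,
    "contiguous_gradients": true,
    "stage3_gather_16bit_weights_on_model_save": true
  },
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto"
}
```

Stage 3 shards optimizer state, gradients, and parameters across the four A100s, which is what makes full-parameter fine-tuning of a 14B model fit in memory.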

Model Evaluation – Example Python code using HuggingFace Transformers to generate a response after fine‑tuning:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Path to the fine-tuned checkpoint
model_dir = "/models/qwen-14b"
tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_dir, device_map="auto", trust_remote_code=True
).eval()

# Encode a test prompt ("Hello, introduce yourself") and move it to the model's device
inputs = tokenizer('你好啊,介绍下你自己', return_tensors='pt').to(model.device)
pred = model.generate(**inputs)
print(tokenizer.decode(pred.cpu()[0], skip_special_tokens=True))
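Qwen's chat variants are trained on prompts in the ChatML format. The helper below sketches how such a prompt is assembled; the authoritative template ships with the model's own tokenizer/remote code, so treat this as an illustration of the format rather than the model's exact logic:

```python
def build_chatml_prompt(messages, add_generation_prompt=True):
    """Assemble a ChatML-style prompt of <|im_start|>role ... <|im_end|> blocks."""
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = build_chatml_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "你好啊,介绍下你自己"},
])
```

Feeding a plain string to the tokenizer, as in the snippet above, works for a quick smoke test; for faithful chat behavior the prompt should follow this template.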

Deployment with FastChat

Start the FastChat controller: python -m fastchat.serve.controller --host 0.0.0.0 --port 21001

Launch the model worker: python -m fastchat.serve.model_worker --model-path /models/qwen-14b/ --host 0.0.0.0

Run the OpenAI‑compatible API server: python -m fastchat.serve.openai_api_server --host 0.0.0.0 --port 8000

Test via the official OpenAI SDK or a curl request, e.g.:

curl http://{{HOST}}:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen-14b", "messages": [{"role": "user", "content": "你是谁"}]}'
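The same request can be assembled from Python. This sketch only builds the JSON body the curl command sends; actually posting it requires the FastChat server above to be running (e.g. via requests.post, or the OpenAI SDK with its base URL pointed at the FastChat endpoint):

```python
import json

def chat_request(model: str, user_content: str) -> str:
    """Build the JSON body for a /v1/chat/completions call."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_content}],
    }
    return json.dumps(payload, ensure_ascii=False)

body = chat_request("qwen-14b", "你是谁")
# To send (server must be running):
# requests.post(f"http://{host}:8000/v1/chat/completions",
#               headers={"Content-Type": "application/json"}, data=body)
```

Because FastChat exposes an OpenAI-compatible API, any client that speaks that protocol can talk to the fine-tuned model without code changes.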

360AI Platform Usage – The platform provides a UI for uploading data, configuring fine‑tuning hyper‑parameters (learning rate, batch size, epochs), selecting resources, and monitoring training progress. After submission, the system logs the fine‑tuning status and allows model deployment with configurable resources.


Written by 360 Tech Engineering, the official tech channel of 360.