Prompt Engineering, LLM Supervised Fine‑Tuning, and Mobile Tmall AI Assistant Application
This article introduces prompt design, large language model (LLM) supervised fine‑tuning (SFT), and their practical deployment in the Mobile Tmall AI shopping assistant project.
ChatGPT basics: Generation proceeds in five steps – preprocessing and tokenizing the input text, mapping tokens to embeddings, passing them through a stack of Transformer layers, predicting the next token from a softmax distribution over the vocabulary, and appending that token and repeating until a stop token or the maximum length is reached.
Algorithm core – Transformer: The model is built from an encoder stack and a decoder stack, as illustrated in the accompanying diagram.
Prompt design covers four essential techniques:
Clarity – use explicit, unambiguous language.
Delimiters – separate instructions and content with symbols such as ###, """, <>, or '''.
Output format – specify the desired structure (e.g., JSON).
Role‑play – instruct the model to assume a specific persona (e.g., a professional sales assistant).
Examples of good vs. bad prompts are shown in tables, demonstrating how precise wording and formatting improve model responses.
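A single prompt can combine all four techniques. The sketch below builds such a prompt; the product review and the JSON field names are invented for illustration.

```python
# Illustrative review text (not from the article's tables).
product_review = "The jacket runs small but the fabric quality is excellent."

prompt = (
    "You are a professional e-commerce sales assistant. "            # role-play
    "Summarize the customer review delimited by triple quotes, "     # clarity
    'and reply ONLY with JSON: {"sentiment": "...", "sizing": "..."}.'  # output format
    f'\n"""{product_review}"""'                                      # delimiters
)
print(prompt)
```

Keeping the instruction outside the delimiters and the untrusted content inside them also makes prompt-injection attempts inside the review easier for the model to ignore.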
Advanced prompting includes few‑shot learning, chain‑of‑thought (CoT) reasoning, and in‑context learning. Sample one‑shot CoT prompts illustrate how step‑by‑step reasoning yields correct arithmetic answers.
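A one-shot chain-of-thought prompt in the spirit of those samples might look as follows; the arithmetic example is invented for illustration.

```python
# One solved example (the "shot") demonstrates step-by-step reasoning,
# then the trailing cue invites the model to reason the same way.
cot_prompt = """Q: A store sells 3 shirts at 20 yuan each. What is the total?
A: Each shirt costs 20 yuan and there are 3 shirts, so 3 * 20 = 60. The total is 60 yuan.

Q: A customer buys 4 mugs at 15 yuan each. What is the total?
A: Let's think step by step."""
print(cot_prompt)
```

The worked example shows the model the reasoning format, and the "Let's think step by step" cue reliably elicits intermediate steps before the final answer.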
Supervised fine‑tuning (SFT) vs. pre‑training:
Pre‑training learns next‑token prediction from massive unlabeled data, giving the model general language understanding.
Instruction fine‑tuning (SFT) uses labeled instruction‑response pairs to align the model with human intents, especially for domain‑specific tasks.
Related techniques such as P‑tuning, P‑tuning V2, and LoRA are described. LoRA adds low‑rank adapters to keep most model parameters frozen, enabling efficient adaptation with limited resources.
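The LoRA idea can be shown with plain numpy: the frozen weight W is augmented with a trainable low-rank product B·A, so only r·(d_in + d_out) parameters are trained instead of d_in·d_out. The dimensions and scaling factor below are illustrative, not the article's training settings.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r = 1024, 1024, 8            # r << d_in, d_out

W = rng.normal(size=(d_out, d_in))        # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01     # trainable down-projection
B = np.zeros((d_out, r))                  # trainable up-projection, zero-initialized

def lora_forward(x, alpha=16):
    # y = W x + (alpha / r) * B A x  -- base path frozen, adapter trainable
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=(d_in,))
# With B initialized to zero, the adapter starts as an exact no-op:
assert np.allclose(lora_forward(x), W @ x)

full_params = d_in * d_out
adapter_params = r * (d_in + d_out)
print(f"trainable: {adapter_params} vs full: {full_params}")  # → trainable: 16384 vs full: 1048576
```

Zero-initializing B guarantees training starts from the pretrained model's behavior; after fine-tuning, B·A can be merged back into W so inference costs nothing extra.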
C‑Eval is a comprehensive Chinese benchmark covering humanities, social sciences, and STEM subjects, containing 13,948 questions across 52 disciplines.
Data collection for the AI assistant involves:
Gathering seed e‑commerce queries from conversation logs.
Generalizing questions via prompt‑driven generation.
Human annotation of high‑quality data.
Self‑instruction to expand the dataset using LLMs.
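The generalization and self-instruction steps both amount to formatting seed queries into a generation prompt for an LLM. A hedged sketch, with invented seed questions and wording, since the article does not show its actual templates:

```python
# Seed e-commerce queries (illustrative, translated: "Does this come in
# other colors?" / "How soon does it ship?").
seed_queries = ["Does this jacket come in other colors?", "How soon does it ship?"]

def build_expansion_prompt(seeds, n=5):
    """Format seed questions into a self-instruction expansion prompt."""
    examples = "\n".join(f"- {q}" for q in seeds)
    return (
        "You are generating training data for an e-commerce shopping assistant.\n"
        f"Given these seed questions:\n{examples}\n"
        f"Write {n} new, diverse questions a shopper might realistically ask."
    )

print(build_expansion_prompt(seed_queries))
```

The LLM's outputs would then be deduplicated and passed to the human-annotation step before entering the SFT dataset.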
Model training uses the Qwen‑14B base model with the following command‑line configuration:
params="--stage sft \
  --model_name_or_path /data/oss_bucket_0/Qwen_14B_Chat_ms_v100/ \
  --do_train \
  --dataset_dir data \
  --dataset xuanji \
  --template chatml \
  --finetuning_type full \
  --output_dir file_path \
  --overwrite_cache \
  --per_device_train_batch_size 2 \
  --gradient_accumulation_steps 4 \
  --lr_scheduler_type cosine \
  --logging_steps 5 \
  --save_strategy epoch \
  --save_steps 10000 \
  --learning_rate 2e-6 \
  --num_train_epochs 3.0 \
  --warmup_ratio 0.15 \
  --warmup_steps 0 \
  --weight_decay 0.1 \
  --fp16 ${fp16} \
  --bf16 ${bf16} \
  --deepspeed ds_config.json \
  --max_source_length 4096 \
  --max_target_length 4096 \
  --use_fast_tokenizer False \
  --is_shuffle True \
  --val_size 0.0"
The training job is submitted on PAI with:
pai -name pytorch112z \
  -project algo_platform_dev \
  -Dscript='${job_path}' \
  -DentryFile='-m torch.distributed.launch --nnodes=${workerCount} --nproc_per_node=${node} ${entry_file}' \
  -DuserDefinedParameters="${params}" \
  -DworkerCount=${workerCount} \
  -Dcluster=${resource_param_config} \
  -Dbuckets=${oss_info}${end_point}
Model deployment & inference examples:
DashScope Python SDK:
import dashscope
from dashscope import Generation
from http import HTTPStatus

dashscope.api_key = 'your-dashscope-api-key'
response_generator = Generation.call(
    model='model_name',
    prompt=build_prompt([
        {'role': 'system', 'content': 'content_info'},
        {'role': 'user', 'content': 'query'}
    ]),
    stream=True,
    use_raw_prompt=True,
    seed=random_num
)
for resp in response_generator:
    if resp.status_code == HTTPStatus.OK:
        print(resp.output)
    else:
        print('Failed request_id: %s, status_code: %s, code: %s, message: %s'
              % (resp.request_id, resp.status_code, resp.code, resp.message))
Whale private‑cloud SDK:
from whale import TextGeneration
import json

TextGeneration.set_api_key("api_key", base_url="api_url")
config = {
    "pad_token_id": 0, "bos_token_id": 1, "eos_token_id": 2,
    "user_token_id": 0, "assistant_token_id": 0,
    "max_new_tokens": 2048, "temperature": 0.95,
    "top_k": 5, "top_p": 0.7, "repetition_penalty": 1.1,
    "do_sample": False, "transformers_version": "4.29.2"
}
prompt = [{"role": "user", "content": "content_info"}]
response = TextGeneration.call(
    model="model_name",
    prompt=json.dumps(prompt),
    timeout=120,
    streaming=True,
    generate_config=config
)
for event in response:
    if event.status_code == 200:
        if not event.finished:
            print(event.output['response'], end="")
    else:
        print('error_code: [%d], error_message: [%s]'
              % (event.status_code, event.status_message))
Evaluation combines public benchmarks (knowledge, reasoning, multilingual) and internal business tests (150 questions per task). Model performance is monitored via logging and periodic reviews.
References include seminal works such as "Attention Is All You Need" and recent open‑source LLM resources (Qwen‑14B, ChatGLM‑6B, Baichuan2, Stanford Alpaca, etc.).
DaTaobao Tech (official account of DaTaobao Technology)