Fine‑Tuning Local LLaMA‑Factory Models and Building Networked AI Applications
The article walks through preparing a GPU‑enabled environment, downloading and LoRA‑fine‑tuning a DeepSeek model with LLaMA‑Factory, merging the adapter, and then wrapping the model in a web UI that queries a ChromaDB vector store populated with crawled web data, illustrating security‑focused use cases and forecasting domain‑specific LLM adoption.
This article presents a step‑by‑step guide for customizing large language models (LLMs) locally using LLaMA‑Factory, and then extending the fine‑tuned model with network‑enabled capabilities.
1. LLaMA‑Factory Local Model Fine‑Tuning
Environment preparation: GPU T4 (8 GB), 36 GB RAM, Linux‑based OS.
Model download (DeepSeek‑R1‑Distill‑Qwen‑1.5B):
pip install -U "huggingface_hub[cli]"
export HF_ENDPOINT=https://hf-mirror.com # optional China mirror
huggingface-cli download --resume-download deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B --local-dir /data/llm/models/DeepSeek-R1-Distill-Qwen-1.5B
LLaMA‑Factory installation:
git clone https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
pip install -e .
Training command (LoRA fine‑tuning):
llamafactory-cli train \
--stage sft \
--do_train True \
--model_name_or_path /data/llm/models/DeepSeek-R1-Distill-Qwen-1.5B \
--finetuning_type lora \
--template deepseek3 \
--learning_rate 0.0002 \
--num_train_epochs 10.0 \
--per_device_train_batch_size 2 \
--gradient_accumulation_steps 8 \
--lr_scheduler_type cosine \
--output_dir saves/DeepSeek-R1-1.5B-Distill/lora/train_2025-03-02-15-57-16 \
--fp16 True \
--lora_rank 8 \
--lora_alpha 16 \
--lora_dropout 0 \
--lora_target all
Model merging (base model + LoRA adapter):
llamafactory-cli export cust/merge_deepseekr1_lora_sft.yaml
2. Local Model Network Function Development
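The export step reads a YAML config. A plausible sketch of `cust/merge_deepseekr1_lora_sft.yaml`, assuming LLaMA‑Factory's standard export keys (the adapter path and export directory below are placeholders, not taken from the article):

```yaml
### model: base weights plus the LoRA adapter produced by training
model_name_or_path: /data/llm/models/DeepSeek-R1-Distill-Qwen-1.5B
adapter_name_or_path: saves/DeepSeek-R1-1.5B-Distill/lora/train_2025-03-02-15-57-16
template: deepseek3
finetuning_type: lora

### export: write merged full weights, sharded into ~2 GB files
export_dir: /data/llm/models/DeepSeek-R1-1.5B-merged
export_size: 2
export_legacy_format: false
```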
Overall architecture: a web UI (built around the LLaMA‑Factory model) calls a local LLM, retrieves relevant documents from a ChromaDB vector store, and returns context‑aware answers.
LLM call wrapper (Python example):
import ollama

def call_llm(prompt: str, with_context: bool = True, context: str | None = None):
    # Build the message list, optionally prepending a system prompt and retrieved context
    messages = [{"role": "system", "content": "Answer using the provided context."}]
    content = f"Context:\n{context}\n\nQuestion: {prompt}" if with_context and context else prompt
    messages.append({"role": "user", "content": content})
    # Stream the response via ollama.chat(stream=True); the model tag here is a placeholder
    for chunk in ollama.chat(model="deepseek-r1:1.5b", messages=messages, stream=True):
        yield chunk["message"]["content"]
Vector database initialization (ChromaDB with nomic‑embed‑text embeddings):
client = chromadb.PersistentClient(path="./web-search-llm-db")
collection = client.get_or_create_collection(name="docs", metadata={"hnsw:space": "cosine"})
URL normalization (standardize IDs for vector storage):
def normalize_url(url):
url = url.replace("https://", "").replace("www.", "")
    return url.replace("/", "_").replace("-", "_").replace(".", "_")
Data ingestion: split crawled pages into 400‑character chunks with 100‑character overlap, and store the text, source URL, and normalized ID in ChromaDB.
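The ingestion step can be sketched as follows. `chunk_text` follows the 400/100 parameters stated above, while `ingest_page`, its `embed` argument, and the metadata layout are illustrative assumptions (with Ollama, `embed` could be `lambda t: ollama.embeddings(model="nomic-embed-text", prompt=t)["embedding"]`):

```python
def chunk_text(text: str, size: int = 400, overlap: int = 100) -> list[str]:
    # Fixed-size character chunks; the overlap preserves context across chunk boundaries
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def ingest_page(collection, url: str, text: str, embed) -> None:
    # `collection` is the ChromaDB collection created earlier; `embed` maps text to a vector
    for idx, chunk in enumerate(chunk_text(text)):
        collection.add(
            ids=[f"{normalize_url(url)}_{idx}"],  # normalize_url as defined above
            embeddings=[embed(chunk)],
            documents=[chunk],
            metadatas=[{"source": url}],
        )
```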
Web crawling (async crawler with BM25 filtering and robots.txt compliance):
async def crawl_webpages(urls: list[str], prompt: str) -> CrawlResult:
    # Fetch pages concurrently, rank and filter content against the prompt with BM25,
    # respect robots.txt, and extract text via UnstructuredMarkdownLoader
    pass
Search integration (DuckDuckGo API, video‑site exclusion, robots.txt double‑check):
from duckduckgo_search import DDGS

def get_web_urls(search_term: str, num_results: int = 10) -> list[str]:
    # Query DuckDuckGo, then drop video sites (youtube.com, vimeo.com) that yield little crawlable text;
    # robots.txt compliance is re-checked at crawl time
    results = DDGS().text(search_term, max_results=num_results)
    return [r["href"] for r in results if not any(d in r["href"] for d in ("youtube.com", "vimeo.com"))]
The article demonstrates the end‑to‑end workflow: fine‑tune, merge, launch the web UI, query the model, and observe the customized responses.
3. Business Scenario Exploration
Three security‑related use cases are discussed:
APK virus detection – using LLMs to spot malicious intent in app code and permissions.
URL security detection – AI‑driven analysis of phishing page language, certificate info, and navigation traces.
APP privacy inspection – comparing declared privacy policies with actual data‑access behavior, risk scoring, and anomaly detection.
4. Future Outlook and Practical Recommendations
The author predicts a shift toward vertical, domain‑specific LLMs, emphasizes the importance of data assets, application innovation, and solution integration as new competitive moats, and advocates for T‑shaped skill development (deep technical expertise + domain knowledge).
Overall, the guide provides a comprehensive, reproducible pipeline for turning a generic open‑source LLM into a customized, network‑aware AI assistant suitable for security‑oriented product scenarios.
Tencent Cloud Developer
Official Tencent Cloud community account that brings together developers, shares practical tech insights, and fosters an influential tech exchange community.