
Practices and Techniques for Vertical Domain Large Language Models

Vertical domain large language models, fine‑tuned on specialized data, deliver higher expertise and better task performance, but require continual knowledge updates and careful alignment. Techniques such as BPO‑guided instruction tuning (+1.8% accuracy), Reflexion‑based Text2API (+4% API‑call correctness), advanced RAG preprocessing, and SFT combined with ORPO (+5.2% gain) demonstrate notable improvements while underscoring remaining challenges and opportunities for collaboration.

DaTaobao Tech

Vertical domain large models are built on a base LLM and further trained with domain-specific knowledge, offering higher expertise and utility compared to general models.

Advantages: domain professionalism, high‑quality output, better performance on specific tasks.

Challenges: accuracy requirements, frequent knowledge‑base updates, limited applicability to unrelated queries.

Alignment Enhancement (BPO): Step 1 – given a standard Q&A pair (Q, A1), prompt the base model with an initial instruction to produce a weak answer A1′; Step 2 – use GPT‑4 to compare the good answer A1 against the bad answer A1′ in the context of the question, and rewrite the instruction accordingly; Step 3 – train a seq2seq model that maps a question Q to its tuned instruction; Step 4 – at inference time, prepend the generated instruction to the user query before feeding it to the LLM. This method improved answer accuracy by 1.8%.
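The four steps above can be sketched as two small functions: one that builds a training pair for the instruction optimizer (Steps 1–2), and one that applies the trained optimizer at inference time (Step 4). All callables here (`llm`, `gpt4`, `prompt_optimizer`) are hypothetical stand‑ins for the team's actual components.

```python
# Sketch of the BPO-style pipeline; model callables are hypothetical stubs.

def bpo_train_example(question, good_answer, llm, gpt4):
    """Steps 1-2: build one (question -> tuned instruction) training pair."""
    # Step 1: deliberately elicit a weak answer with a naive instruction.
    init_instruction = "Answer the question briefly."
    bad_answer = llm(f"{init_instruction}\n\nQ: {question}")

    # Step 2: ask a stronger model to rewrite the instruction so that it
    # steers the base model away from bad_answer toward good_answer.
    critique_prompt = (
        f"Question: {question}\n"
        f"Good answer: {good_answer}\n"
        f"Bad answer: {bad_answer}\n"
        "Rewrite the instruction so the model produces the good answer."
    )
    tuned_instruction = gpt4(critique_prompt)
    # Step 3 trains a seq2seq model on many such pairs (not shown).
    return question, tuned_instruction


def bpo_inference(user_query, prompt_optimizer, llm):
    """Step 4: prepend the optimizer's instruction before calling the LLM."""
    instruction = prompt_optimizer(user_query)
    return llm(f"{instruction}\n\n{user_query}")
```

The key design point is that only the small seq2seq optimizer is trained; the base LLM is left untouched, which keeps the alignment step cheap.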

Text2API: The initial implementation used LangChain's ReAct framework, which suffered from hallucinated parameters and overly long call chains. Switching to the Reflexion framework, which applies verbal reinforcement learning, and adding alignment prompts raised API‑call accuracy by 4%.
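A Reflexion‑style Text2API loop can be sketched as follows. Instead of a scalar reward, each failed attempt produces verbal feedback that is fed back into the next prompt; the three callables (`propose`, `execute`, `validate`) are hypothetical placeholders, not the team's actual interfaces.

```python
# Minimal sketch of a Reflexion-style Text2API retry loop.

def reflexion_text2api(query, propose, execute, validate, max_trials=3):
    reflections = []  # accumulated verbal feedback ("episodic memory")
    call, result = None, None
    for _ in range(max_trials):
        # Alignment prompt plus past reflections constrain the next attempt,
        # discouraging hallucinated parameters.
        prompt = (
            "Only use parameters documented in the API schema.\n"
            + "\n".join(reflections)
            + f"\nUser request: {query}"
        )
        call = propose(prompt)        # LLM drafts an API call
        result = execute(call)        # run it against the real API
        feedback = validate(call, result)
        if feedback is None:          # call was well-formed and succeeded
            return call, result
        # Verbal reinforcement: store the critique instead of a reward signal.
        reflections.append(f"Previous attempt failed: {feedback}")
    return call, result               # best effort after max_trials
```

Compared with a ReAct chain, the loop is bounded by `max_trials`, which also addresses the long‑call‑chain problem mentioned above.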

RAG : Retrieval‑augmented generation is effective for vertical LLMs, but handling heterogeneous documents (PDFs, flowcharts, screenshots) requires preprocessing, such as using LLMs to describe flowcharts and reorganizing text chunks via semantic clustering and recursive summarization.
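The chunk‑reorganization step can be illustrated with a simple sketch: cluster semantically similar chunks by embedding similarity, then recursively summarize until one root summary remains. The `embed` and `summarize` callables are hypothetical model‑backed functions; the greedy threshold clustering is an illustrative simplification, not necessarily the team's method.

```python
# Sketch of semantic clustering + recursive summarization for RAG chunks.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def cluster_chunks(chunks, embed, threshold=0.8):
    """Greedy clustering: attach each chunk to the first similar cluster."""
    clusters = []  # each cluster: {"centroid": vector, "chunks": [texts]}
    for chunk in chunks:
        vec = embed(chunk)
        for c in clusters:
            if cosine(vec, c["centroid"]) >= threshold:
                c["chunks"].append(chunk)
                break
        else:
            clusters.append({"centroid": vec, "chunks": [chunk]})
    return clusters

def recursive_summarize(chunks, embed, summarize, fanout=4):
    """Summarize cluster texts, then repeat on the summaries until one remains."""
    level = [" ".join(c["chunks"]) for c in cluster_chunks(chunks, embed)]
    while len(level) > 1:
        level = [summarize(" ".join(level[i:i + fanout]))
                 for i in range(0, len(level), fanout)]
    return level[0]
```

The same preprocessing idea applies to non‑text inputs: an LLM first converts a flowchart or screenshot into a textual description, which then enters this clustering pipeline like any other chunk.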

SFT & ORPO: The team collected tens of thousands of annotated evaluation cases to select base models and fine‑tuning methods, evaluating candidates with embedding similarity, human scoring, and GPT‑4 scoring. Public datasets (COIG‑CQIA, alpaca‑gpt4‑data‑cn) were mixed in to mitigate the capability degradation that domain‑specific fine‑tuning can cause. Adding ORPO as a penalty term to the SFT loss yielded a further ~5.2% improvement.
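The ORPO objective mentioned above augments the standard SFT loss with an odds‑ratio penalty that pushes the odds of the preferred answer above those of the dispreferred one. A minimal numeric sketch (the penalty weight and probability inputs are illustrative, not the team's configuration):

```python
# Sketch of the ORPO objective: SFT loss plus an odds-ratio penalty term.
import math

def odds(p):
    """Odds of a (length-normalized) sequence probability p in (0, 1)."""
    return p / (1.0 - p)

def orpo_loss(nll_chosen, p_chosen, p_rejected, lam=0.1):
    # Log odds ratio between the chosen and rejected answers.
    log_or = math.log(odds(p_chosen)) - math.log(odds(p_rejected))
    # Penalty = -log sigmoid(log_or): small when the chosen answer is
    # already much more likely, large when the rejected one wins.
    penalty = -math.log(1.0 / (1.0 + math.exp(-log_or)))
    # Total loss = standard SFT negative log-likelihood + weighted penalty.
    return nll_chosen + lam * penalty
```

Because the penalty rides on the same forward pass as SFT, no separate reference model is needed, which is what makes ORPO cheaper than a DPO‑style preference stage.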

The team’s one‑year experience demonstrates progress in vertical LLM deployment while highlighting remaining gaps and inviting further collaboration.

Written by DaTaobao Tech, the official account of DaTaobao Technology.