Research on Domain Large Models by Fudan University Knowledge Workshop Lab
This article presents the Fudan University Knowledge Workshop Lab's comprehensive research on domain large models, covering background, domain adaptation, capability enhancement, collaborative workflows, challenges such as inference cost and alignment, and proposed solutions including source‑enhanced training, self‑correction mechanisms, and hybrid retrieval‑augmented generation.
The presentation is organized into four sections: background, domain adaptation, capability improvement, and collaborative work.
Background: Large models like GPT‑4 have shown strong general knowledge and reasoning abilities, raising questions about their impact on knowledge engineering and whether they can replace traditional knowledge graphs.
Domain Adaptation: The lab discusses challenges in selecting and balancing training data for domain‑specific models, proposes a source‑enhanced tagging method to distinguish data origins, and demonstrates its effectiveness in downstream tasks. They also explore systematic corpus classification to improve pre‑training.
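The source-tagging idea can be sketched minimally: prepend a provenance token to each training example so the model learns to associate style and reliability with data origin. The tag format and source names below are illustrative assumptions, not the lab's actual scheme.

```python
def tag_with_source(text: str, source: str) -> str:
    """Prefix a training example with a token marking its origin,
    so the model can condition on data provenance during training."""
    return f"<src:{source}> {text}"

corpus = [
    ("Aspirin inhibits COX enzymes.", "textbook"),
    ("aspirin totally cures everything lol", "web"),
]
tagged = [tag_with_source(text, source) for text, source in corpus]
# At inference time, prompting with a trusted tag (e.g. "<src:textbook>")
# biases generation toward the higher-quality distribution.
```

The design choice here is that the tag becomes an ordinary token the model can attend to, so no architectural change is needed, only a preprocessing pass over the pre-training corpus.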
Capability Improvement: Emphasis is placed on enhancing models' ability to follow complex instructions and to self‑correct through multi‑step answer generation, with these techniques applied to command generation and unit‑aware reasoning, where they outperform existing baselines.
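Multi-step self-correction follows a draft-critique-revise loop. A minimal sketch, assuming `generate` and `critique` stand in for LLM calls (their interfaces are assumptions for illustration, not the lab's implementation):

```python
from typing import Callable

def self_correct(question: str,
                 generate: Callable[[str], str],
                 critique: Callable[[str, str], str],
                 max_rounds: int = 3) -> str:
    """Multi-step answer generation: draft an answer, have a critic
    inspect it, and revise until the critic is satisfied."""
    answer = generate(question)
    for _ in range(max_rounds):
        feedback = critique(question, answer)
        if feedback == "OK":  # critic found no issues
            break
        # Feed the critique back in so the next draft can address it.
        answer = generate(f"{question}\nPrevious answer: {answer}\nFix: {feedback}")
    return answer
```

Bounding the loop with `max_rounds` matters in practice: each round costs an extra model call, and critics can oscillate rather than converge.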
Collaborative Work: The authors argue that large models should complement, not replace, smaller models. They propose a hybrid workflow where traditional models handle routine extraction tasks while large models address knowledge verification, correction, and few‑shot learning. Strategies for knowledge extraction, integrated extraction pipelines, and domain‑specific verification are presented.
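The hybrid workflow amounts to a routing decision: a cheap extractor handles the routine case, and only low-confidence outputs are escalated to the large model for verification and correction. The sketch below is a plausible reading of that pipeline; the function names and the confidence threshold are assumptions.

```python
from typing import Callable, List, Tuple

Triple = Tuple[str, str, str]

def route_extraction(sentence: str,
                     small_model_extract: Callable[[str], Tuple[List[Triple], float]],
                     llm_verify: Callable[[str, List[Triple]], List[Triple]],
                     confidence_threshold: float = 0.8) -> List[Triple]:
    """Hybrid small/large model pipeline: the small extractor runs on
    every sentence; the LLM is called only when confidence is low."""
    triples, confidence = small_model_extract(sentence)
    if confidence >= confidence_threshold:
        return triples            # routine case: small model suffices
    return llm_verify(sentence, triples)  # LLM confirms or corrects
```

The threshold directly trades inference cost against accuracy, which is why the article's concern about LLM inference cost motivates this division of labor.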
Additional topics include the challenges of inference cost, limited applicability in complex decision‑making, and the need for reliable, traceable answers via retrieval‑augmented generation (RAG). The paper concludes with a discussion of hard‑constrained decoding to ensure factual grounding in generated responses.
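The traceability claim of RAG comes from the prompt shape: retrieved passages are numbered and prepended, so the answer can cite them. A toy sketch using word-overlap retrieval (real systems use dense embeddings; everything below is illustrative):

```python
def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Toy lexical retriever: rank documents by word overlap with the query."""
    query_words = set(query.lower().split())
    scored = sorted(docs, key=lambda d: -len(query_words & set(d.lower().split())))
    return scored[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    """Prepend numbered retrieved passages so the generated answer
    can be traced back to its sources via [n] citations."""
    passages = retrieve(query, docs)
    context = "\n".join(f"[{i + 1}] {d}" for i, d in enumerate(passages))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer citing [n]:"
```

Hard-constrained decoding would then restrict generation further, e.g. forcing cited spans to appear verbatim in the retrieved context.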
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.