Task‑Aware Decoding (TaD): A Plug‑and‑Play Method to Mitigate Hallucinations in Large Language Models
TaD, a task‑aware decoding technique jointly developed by JD.com and Tsinghua University and presented at IJCAI 2024, leverages the shift between an LLM's pre‑ and post‑fine‑tuning output distributions to construct a knowledge vector, significantly reducing hallucinations across models, tasks, and data‑scarce scenarios, especially when combined with RAG.
TaD: Task‑aware Decoding, a plug‑and‑play method proposed by JD.com and Tsinghua University to mitigate hallucinations in large language models (LLMs); the work was accepted at IJCAI 2024.
RAG: Retrieval‑Augmented Generation, a widely used system‑level approach to mitigating LLM hallucinations.
1. Background
Recent generative LLMs such as ChatGPT have sparked a new AI wave, achieving near‑human dialogue quality while also exhibiting hallucinations—incorrect or misleading outputs—that hinder reliable deployment in high‑stakes domains.
This article explores solutions to the LLM hallucination problem.
2. Related Research
LLMs are fundamentally language models that predict the probability of the next token. Their hallucinations arise at all three stages of the pipeline—data, training, and inference—and cannot be completely eliminated.
2.1 Data‑induced Hallucination
Low‑quality, incomplete, or outdated training data causes hallucinations. Mitigation strategies include data cleaning, knowledge editing, and RAG.
Data Cleaning – Collect more high‑quality factual data and clean existing corpora.
Knowledge Editing – Bridge knowledge gaps by adjusting model parameters directly or attaching external modules. Both variants carry risks, such as unintended side effects on unrelated knowledge, so neither is fully reliable on its own.
RAG – Introduces an external retrieval step, feeding the retrieved documents to the LLM together with the prompt. Because knowledge editing has the limitations above, RAG is often the preferred remedy in practice.
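To make the RAG flow above concrete, here is a minimal sketch of the retrieve‑then‑prompt step. The retriever and prompt template are hypothetical toy stand‑ins, not the system described in this article; a production system would use a dense retriever and a real model client.

```python
def retrieve(query, corpus, k=2):
    """Toy lexical retriever: rank documents by word overlap with the query."""
    q = set(query.lower().split())
    return sorted(corpus,
                  key=lambda d: len(q & set(d.lower().split())),
                  reverse=True)[:k]

def build_prompt(query, docs):
    """Pack retrieved documents into the prompt so the LLM answers from them."""
    context = "\n".join(f"- {d}" for d in docs)
    return (f"Answer using only the context below.\n"
            f"Context:\n{context}\nQuestion: {query}")

corpus = [
    "TaD was presented at IJCAI 2024.",
    "RAG feeds retrieved documents to the LLM alongside the prompt.",
    "An unrelated note about logistics.",
]
prompt = build_prompt("What is RAG?", retrieve("What is RAG?", corpus))
```

The prompt would then be sent to the LLM; grounding the answer in retrieved text is what reduces data‑induced hallucination.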
2.2 Training‑induced Hallucination
LLM training suffers from unidirectional representation limits, attention flaws, and exposure bias. Fine‑tuning methods (SFT, RLHF) can also introduce hallucinations when the labeled data exceed the model's knowledge boundary.
2.3 Inference‑induced Hallucination
Decoding randomness (e.g., sampling at high temperature) and attention deficiencies during inference increase hallucination risk.
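The temperature effect mentioned above is easy to see numerically: dividing the logits by a larger temperature flattens the output distribution, giving low‑probability (and potentially hallucinated) tokens more sampling mass. A small self‑contained illustration:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities, scaling by a temperature."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                         # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [4.0, 2.0, 1.0]                    # toy logits for three candidate tokens
sharp = softmax_with_temperature(logits, 0.5)  # low temperature: peaked
flat = softmax_with_temperature(logits, 2.0)   # high temperature: flattened
# the top token dominates less at high temperature, so sampling is riskier
```

Greedy or low‑temperature decoding reduces this particular risk, at the cost of less diverse output.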
DoLa (Decoding by Contrasting Layers) – Computes logits differences between upper and lower transformer layers to amplify factual knowledge and suppress hallucination.
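The core contrastive idea in DoLa can be sketched in a few lines. This is a simplified illustration only: real DoLa selects the "premature" lower layer dynamically (via Jensen–Shannon divergence) and applies an adaptive plausibility constraint, both of which are omitted here.

```python
import math

def log_softmax(logits):
    m = max(logits)
    z = math.log(sum(math.exp(l - m) for l in logits)) + m
    return [l - z for l in logits]

def dola_contrast(upper_logits, lower_logits):
    """Contrast final-layer and early-layer log-probabilities.

    Tokens whose probability grows as factual knowledge accumulates
    through the layers are boosted; tokens the shallow layer already
    favoured are suppressed.
    """
    up = log_softmax(upper_logits)
    low = log_softmax(lower_logits)
    return [u - l for u, l in zip(up, low)]

# toy example: token 0 gains probability mass in the upper layers,
# so the contrastive score ranks it first
scores = dola_contrast([3.0, 1.0, 0.5], [1.0, 1.0, 1.0])
```

The next token is then chosen from these contrastive scores instead of the raw final‑layer logits.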
3. Technical Breakthrough
RAG is effective but still depends on the quality of the LLM's own output. JD Retail and Tsinghua University propose Task‑aware Decoding (TaD), a plug‑and‑play method that contrasts the output distributions of an LLM before and after fine‑tuning to build a knowledge vector, improving factuality without altering the base model.
TaD Principle – After fine‑tuning, the probability of task‑relevant tokens (e.g., “catalyze”) increases while generic tokens (e.g., “engage”) decrease, indicating better alignment with downstream tasks.
Knowledge Vector – Represents the shift in conditional probability distributions between pre‑ and post‑fine‑tuned models, capturing the adaptation from general to task‑specific knowledge.
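The principle above can be sketched as an extrapolation along the knowledge vector at decoding time. This is a simplified illustration under my own assumptions, not the paper's exact formulation: the final distribution is taken as the fine‑tuned distribution pushed further in the direction of the pre‑to‑post‑fine‑tuning shift, and `lam` is a hypothetical strength parameter.

```python
def tad_adjust(p_base, p_ft, lam=1.0):
    """Push the fine-tuned distribution along the knowledge vector.

    The knowledge vector is the shift from the pre-fine-tuning
    distribution p_base to the post-fine-tuning distribution p_ft;
    `lam` (hypothetical) controls how far to extrapolate.
    """
    adjusted = [pf + lam * (pf - pb) for pb, pf in zip(p_base, p_ft)]
    adjusted = [max(a, 0.0) for a in adjusted]  # clip any negative mass
    total = sum(adjusted)
    return [a / total for a in adjusted]        # renormalize to a distribution

# toy distributions over ("catalyze", "engage", "the")
p_base = [0.2, 0.5, 0.3]   # generic model prefers "engage"
p_ft   = [0.5, 0.3, 0.2]   # fine-tuned model prefers "catalyze"
p_tad  = tad_adjust(p_base, p_ft, lam=0.5)
```

Here the task‑relevant token "catalyze" is boosted beyond what fine‑tuning alone achieved, while the generic token is further suppressed, matching the intuition described above.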
3.1 Experimental Results
TaD was evaluated on multiple LLMs with parameter‑efficient fine‑tuning methods such as LoRA and Adapter, across a variety of tasks. Tables 1–4 (presented as images in the original post) show consistent performance gains, with the largest improvements when training data are scarce.
4. Deployment Cases
Combining RAG with TaD yields a low‑hallucination LLM for JD's knowledge‑based question‑answering system, which now serves over 6,000 business scenarios.
5. Outlook
Future work may involve more complex systems (RAG + Agent + tools), deeper integration of external knowledge with LLM reasoning, and continued research on low‑hallucination LLMs such as TaD.
6. Conclusion
Mitigating LLM hallucination requires a multi‑level approach; while no single solution is definitive, techniques like TaD and RAG together substantially improve reliability and enable broader AI adoption.
JD Tech Talk
Official JD Tech public account delivering best practices and technology innovation.