Artificial Intelligence 7 min read

Comparison of Base LLM and Instruction Tuned LLM

The diagram contrasts a Base LLM, which merely predicts the next word from training data and can continue stories or answer simple facts but may generate unsafe text, with an Instruction‑Tuned LLM that is fine‑tuned via RLHF to understand and follow commands, delivering more accurate, useful, and safe responses.

Sohu Tech Products

The figure compares two kinds of large language models (LLMs): the Base LLM and the Instruction Tuned LLM.

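Base LLM. Function: a base language model predicts the next word, and that prediction is based on its text training data. Example: the figure gives the opening of a story, "Once upon a time, there was a unicorn that lived in a magical forest with all her unicorn friends", which shows how the model continues a given piece of text. Question answering: it can answer basic questions such as "What is the capital of France?", but it may also produce problematic text, including harmful output. This follows from the nature of a base model: it only predicts the most likely next word and does not follow specific instructions.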
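To make "predicting the next word from training data" concrete, here is a minimal sketch using a toy bigram counter. It is not how a real LLM works internally (real models use neural networks over tokens); the corpus and the function names `train_bigram` and `continue_text` are assumptions made up for this illustration.

```python
from collections import Counter, defaultdict


def train_bigram(corpus):
    """Count, for each word, which words follow it in the toy corpus."""
    follows = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current, nxt in zip(words, words[1:]):
            follows[current][nxt] += 1
    return follows


def continue_text(follows, prompt, max_new_words=5):
    """Greedily append the most frequent next word, one word at a time."""
    words = prompt.lower().split()
    if not words:
        return ""
    for _ in range(max_new_words):
        candidates = follows.get(words[-1])
        if not candidates:
            break  # the last word was never followed by anything in training
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)


# Hypothetical training text, chosen only for illustration.
corpus = [
    "once upon a time there was a unicorn",
    "once upon a time there was a dragon",
    "the unicorn lived in a magical forest",
]
model = train_bigram(corpus)
print(continue_text(model, "once upon", max_new_words=2))
```

The toy model simply continues the prompt with whatever most often came next in its training text. It has no notion of being asked to do something, which is the limitation the base-model description points at.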

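Instruction Tuned LLM. Function: this kind of model tries to follow instructions. It is fine-tuned on instructions (fine-tuning is a term Zhou Hongyi often brings up) and optimized on attempts to follow those instructions. Fine-tuning method: it is fine-tuned using RLHF (Reinforcement Learning from Human Feedback), which combines reinforcement learning with human feedback.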
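As a rough sketch of the kind of data this process relies on, the snippet below formats an instruction and response pair into training text and records a human preference between two candidate responses. The prompt template, the `InstructionExample` class, and the `chosen`/`rejected` field names are assumptions for illustration, not a format given in the figure. The actual RLHF training loop (reward model plus reinforcement learning) is not shown.

```python
from dataclasses import dataclass


@dataclass
class InstructionExample:
    instruction: str
    response: str


# Hypothetical template; real projects each define their own.
PROMPT_TEMPLATE = "### Instruction:\n{instruction}\n\n### Response:\n"


def build_training_text(example):
    """Return (prompt, full_text) for supervised fine-tuning on instructions."""
    prompt = PROMPT_TEMPLATE.format(instruction=example.instruction)
    return prompt, prompt + example.response


def to_preference_pair(prompt, response_a, response_b, human_prefers_a):
    """Record which of two responses a human rater preferred."""
    if human_prefers_a:
        chosen, rejected = response_a, response_b
    else:
        chosen, rejected = response_b, response_a
    return {"prompt": prompt, "chosen": chosen, "rejected": rejected}


example = InstructionExample(
    instruction="What is the capital of France?",
    response="The capital of France is Paris.",
)
prompt, full_text = build_training_text(example)
pair = to_preference_pair(
    prompt,
    "The capital of France is Paris.",
    "What is the capital of Germany?",
    human_prefers_a=True,
)
print(full_text)
print(pair["chosen"])
```

Records like `pair` are the human feedback signal: they say which output people prefer, and the reinforcement learning step then pushes the model toward such outputs.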

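In summary, the main difference shown in the figure is that the base model focuses on next-word prediction over text data, while the instruction-tuned model focuses on understanding and following instructions, producing more accurate, more useful, and safer output.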

machine learning, AI, LLM, Prompt Engineering, AI applications, language models, instruction tuning, BASE model
Written by

Sohu Tech Products

A knowledge-sharing platform for Sohu's technology products. As a leading Chinese internet brand with media, video, search, and gaming services and over 700 million users, Sohu continuously drives tech innovation and practice. We’ll share practical insights and tech news here.
