Artificial Intelligence · 20 min read

Llama 2: Open Foundation and Fine‑Tuned Chat Models – Ghost Attention, RLHF Results, and Safety Evaluation

This article summarizes the Llama 2 series, describing the Ghost Attention technique for maintaining system‑message consistency across multi‑turn dialogs, presenting RLHF and human evaluation results, and discussing extensive safety pre‑training, benchmark assessments, and model release details.

Rare Earth Juejin Tech Community

The article introduces Llama 2, an open‑source family of large language models and chat variants, providing links to the original arXiv paper and code repositories.

Ghost Attention (GAtt) is presented as a simple training trick inspired by context distillation that keeps the model attentive to system‑message instructions throughout multi‑turn conversations, preventing the RLHF model from forgetting initial directives after a few turns.

The authors describe how synthetic system‑message data are constructed by merging user‑assistant dialogues with varied constraints (e.g., hobbies, languages, personas) and how loss masking for early‑turn tokens is used during fine‑tuning.
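The construction described above can be sketched as follows. This is a minimal illustration, not the authors' actual pipeline: `build_gatt_sample` and the dialogue/field names are hypothetical, and real training data would be tokenized rather than kept as strings.

```python
# Hedged sketch of Ghost Attention (GAtt) data construction, assuming a
# generic chat-style dataset of alternating user/assistant turns.
def build_gatt_sample(system_msg, dialogue):
    """Return (sampling_view, training_view) of a dialogue.

    Sampling view: the system message is prepended to every user turn, so
    sampled responses respect the instruction throughout the conversation.
    Training view: the instruction is kept only on the first turn, and loss
    is masked out (compute_loss=False) for all but the final assistant turn.
    """
    # 1) Sampling: concatenate the instruction to each user message.
    sampling_view = [
        {"role": t["role"], "content": f"{system_msg}\n{t['content']}"}
        if t["role"] == "user" else dict(t)
        for t in dialogue
    ]
    # 2) Training: keep the instruction only in the first turn, and zero the
    #    loss on earlier-turn tokens so only the last response is supervised.
    training_view = []
    for i, t in enumerate(dialogue):
        if t["role"] == "user" and i == 0:
            content = f"{system_msg}\n{t['content']}"
        else:
            content = t["content"]
        is_final_assistant = (t["role"] == "assistant" and i == len(dialogue) - 1)
        training_view.append({"role": t["role"], "content": content,
                              "compute_loss": is_final_assistant})
    return sampling_view, training_view
```

The key idea is the asymmetry between the two views: the model is shown behavior that obeys the instruction on every turn, but is trained as if the instruction appeared only once at the start.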

RLHF Results show that GAtt enables dialogs to stay consistent for over 20 turns before reaching the maximum context length, and quantitative analyses indicate improved attention activation on system messages compared with baseline models.

Model‑based evaluation uses reward models to select the best RLHF checkpoint, while human evaluation over more than 4,000 prompts compares Llama 2‑Chat against other open‑source models and closed‑source baselines (ChatGPT, PaLM). Results demonstrate that Llama 2‑Chat outperforms most open models and approaches proprietary systems in helpfulness and safety.
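The model-based selection step amounts to scoring each candidate checkpoint's generations with a reward model and keeping the highest-scoring one. A minimal sketch, assuming caller-supplied `generate` and `score` callables standing in for real model inference and reward-model calls:

```python
# Hedged sketch of reward-model-based checkpoint selection. The function
# names and signatures are illustrative, not Meta's actual evaluation code.
def select_best_checkpoint(checkpoints, prompts, generate, score):
    """Return the checkpoint whose generations get the highest mean reward.

    generate(ckpt, prompt) -> response string (stand-in for model sampling)
    score(prompt, response) -> float reward (stand-in for the reward model)
    """
    best_ckpt, best_reward = None, float("-inf")
    for ckpt in checkpoints:
        rewards = [score(p, generate(ckpt, p)) for p in prompts]
        mean_reward = sum(rewards) / len(rewards)
        if mean_reward > best_reward:
            best_ckpt, best_reward = ckpt, mean_reward
    return best_ckpt
```

Because the reward model is itself trained on human preferences, this gives a cheap proxy for human evaluation between the expensive rounds of annotation.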

The safety section details pre‑training data analysis, bias and toxicity measurements (TruthfulQA, ToxiGen, BOLD), and language distribution, highlighting that the dataset is primarily English with minimal harmful content.

Benchmark evaluations of truthfulness, toxicity, and bias are reported, showing that Llama 2 improves truthfulness over Llama 1 but exhibits mixed results on toxicity and bias, partly due to less aggressive data filtering.

In conclusion, the authors summarize the architectural choices (RoPE, RMSNorm, SwiGLU, AdamW), training scale (up to 2 trillion tokens, context length 4096), and safety‑aligned fine‑tuning methods, emphasizing responsible release and future work.
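Of the architectural components listed above, RMSNorm is the simplest to show concretely: it rescales activations by their root mean square instead of subtracting a mean and dividing by a standard deviation as LayerNorm does. A pure-Python sketch for clarity (a real implementation would operate on tensors):

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: divide each element by the vector's root mean square,
    then scale by a learned per-dimension weight. No mean-centering,
    which is what distinguishes it from LayerNorm."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for w, v in zip(weight, x)]
```

Llama 2 applies this normalization to the *input* of each attention and feed-forward sub-layer (pre-normalization), rather than to the output as in the original Transformer.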

The article also lists URLs for various Llama 2 model checkpoints (7B, 13B, 70B, chat variants) and defines key terminology such as Red Teaming, PPO, RMSNorm, Ghost Attention, and others.

Large Language Models · AI evaluation · RLHF · Ghost Attention · Llama 2 · model safety
Written by

Rare Earth Juejin Tech Community

Juejin, a tech community that helps developers grow.
