Claude Opus 4.8 Hits Two 0% Honesty Scores in Just 41 Days

Anthropic released Claude Opus 4.8 only 41 days after Opus 4.7. The release delivers an unprecedented 0% lie rate and 0% lazy-answer rate, a four-fold improvement in code-defect silence, 69.2% on SWE-bench Pro, and 1890 Elo on GDPval-AA. It also adds Dynamic Workflows, Effort Control, a richer Messages API, and a fast mode that runs 2.5× faster for a third of the cost.

AI honesty · Claude Opus 4.8 · Dynamic Workflows

0 likes · 11 min read


PaperAgent

Dec 5, 2025 · Artificial Intelligence

Can LLMs Be Trained to Confess? Inside the “Confession” Method for Honest AI

The article reviews OpenAI’s “Confession” training approach for large language models, explains why traditional RLHF fails to ensure honesty, details the confession methodology and PPO update, presents experimental results showing higher honesty rates, analyzes error cases, and discusses limitations and future risks.

AI honesty · Artificial Intelligence · Confession Training

0 likes · 6 min read
