Anthropic Unveils Claude Opus 4.8: Same Price, Agent Power Beats GPT‑5.5
Anthropic released Claude Opus 4.8 with unchanged pricing, new inference‑strength controls, Dynamic Workflows for massive tasks, a fast mode 2.5× quicker and three‑times cheaper, and benchmark results showing its agent capabilities surpass GPT‑5.5 while improving honesty and alignment.
Model Release
Anthropic released Claude Opus 4.8, built on Opus 4.7, and made it available in all scenarios with pricing unchanged.
claude.ai users can freely control the model’s inference strength.
Claude Code adds a Dynamic Workflows feature that supports handling ultra‑large tasks.
Fast mode runs 2.5× faster than before and costs three times less than the prior fast mode.
Core Capability Comparison
Across code generation, agent abilities, reasoning, and practical knowledge work, Opus 4.8 outperforms both its predecessor and competing models. Full evaluation data are available in the Claude Opus 4.8 System Card: https://www.anthropic.com/claude-opus-4-8-system-card.
Claude Opus 4.8’s judgment is noticeably better. In Claude Code it always asks the right questions, catches errors, and proactively refutes unreasonable solutions, making it ideal for collaborative development. — Tom Pritchard, Staff Engineer
In our Super‑Agent benchmark, Claude Opus 4.8 was the only model to complete every end‑to‑end test case, beating the prior Opus and GPT‑5.5 at equal cost. Its reliability shines in translation, deep research, PPT creation, and analysis tasks. — Kay Zhu, Co‑Founder and CTO
On CursorBench, Claude Opus 4.8 surpasses the previous Opus at all effort levels, showing a clear efficiency gain in tool calls and requiring fewer steps to finish end‑to‑end tasks. — Michael Truell, Co‑Founder and CEO
In our legal‑agent benchmark, Opus 4.8 achieved the highest score ever and was the first model to exceed a 10% pass rate under strict standards, dramatically raising confidence for AI‑assisted lawyer work. — Niko Grupen, Head of Applied Research
Compared with Opus 4.7, Opus 4.8 feels like a genuine upgrade: faster, easier to collaborate with, and it maintains context and style throughout long conversations. I now trust only Opus 4.8 for style‑sensitive, technical work. — Katie Parrott, Staff Writer
It is the strongest computer‑operation and browser‑agent model we have tested, scoring 84% on Online‑Mind2Web—well above Opus 4.7 and GPT‑5.5. It stays reflective, never drops offline, and meets the reliability demands of client agent workflows. — Miguel Gonzalez, Tech Lead
Opus 4.8 uses tools cleanly and consistently, meeting the unattended‑run requirements of our autonomous engineering workflow. It fixes annotation redundancy and tool‑call issues present in Opus 4.7, accelerating developer productivity built on Devin. — Scott Wu, CEO
Our long‑term tests show Opus 4.8’s analysis quality is higher than the previous generation: faster completion, denser information output, and a markedly better signal‑to‑noise ratio. It also flags input‑output problems that other models often miss. — Michael Ran, Sr. Investment Associate
In CoCounsel Legal, Opus 4.8 improves consistency and reasoning quality over Opus 4.7, raising the trust bar for high‑risk professional workflows. — Joel Hron, Chief Technology Officer
Within Databricks’ Genie AI agent, the new Opus lifts agent reasoning by one level, handling deeper multi‑step problems faster than any prior Opus. Its multimodal ability processes PDFs, charts, and other unstructured content while cutting token cost by 61% versus Opus 4.7. — Hanlin Tang, CTO, Neural Networks
In Hebbia’s financial‑document pipeline, Opus 4.8 matches Opus 4.7’s quality while markedly improving citation precision and retrieval token efficiency, fitting dense document‑processing use cases. — Aabhas Sharma, CTO
Most Notable Improvement: Honesty
Opus 4.8 is far more likely to flag uncertainty and makes far fewer unfounded assertions. Internal testing shows code with problems passes only one‑quarter as often as with the previous model.
Alignment tests indicate Opus 4.8 reaches new highs on user‑autonomy and pro‑social traits, while the probabilities of deception and misuse are significantly lower than Opus 4.7, matching Anthropic’s top‑ranked Claude Mythos Preview. Full evaluation is in the System Card.
Other Features Launched Simultaneously
Dynamic Workflows : Research preview in Claude Code. Claude can launch hundreds of parallel sub‑agents within a single session, keep them running longer, verify outputs, and feed results back to the user. Example: paired with Opus 4.8, Claude Code can migrate a codebase of hundreds of thousands of lines, run the full test suite, and merge changes automatically. Available to Claude Code Enterprise, Team, and Max plans. Details: https://claude.com/blog/introducing-dynamic-workflows-in-claude-code
Inference‑Strength Control on claude.ai and Cowork : Users select an “effort” level. High effort yields deeper, higher‑quality reasoning; low effort gives faster, cheaper responses. All subscription tiers have access.
Messages API Supports System Entries : Developers can insert a system entry into the messages array, updating Claude’s instructions mid‑task without breaking the prompt cache. Enables dynamic changes to permissions, token budgets, or context during an agent run.
About Inference Strength
Opus 4.8 defaults to high effort, which Anthropic found to be the best quality‑experience balance. Token consumption for coding tasks is comparable to Opus 4.7, but performance is better.
Users may choose extra (called xhigh in Claude Code) or max for even more token usage and stronger results. The extra setting is recommended for difficult or long‑running asynchronous workflows. Claude Code has raised rate limits for high‑effort scenarios, allowing users to pick the setting that fits their projects.
Pricing and Availability
Claude Opus 4.8 is fully available today. Regular‑use pricing remains $5 per million input tokens and $25 per million output tokens. Fast mode costs $10 per million input tokens and $50 per million output tokens, three times cheaper than the previous fast mode. Developers can invoke the model via the Claude API using the model name claude-opus-4-8.
Anthropic describes Opus 4.8 as a modest yet tangible upgrade. The roadmap includes a lower‑cost model with comparable capabilities and a next‑generation model with higher intelligence. Under Project Glasswing, a limited set of organizations already use the Claude Mythos Preview for cybersecurity tasks; a full rollout is expected within weeks.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
AI Engineering
Focused on cutting‑edge product and technology information and practical experience sharing in the AI field (large models, MLOps/LLMOps, AI application development, AI infrastructure).
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
