Anthropic Releases Claude Opus 4.8, Beats Opus 4.7, GPT-5.5 On Many Benchmarks

New model releases are still coming in thick and fast from the frontier labs.

Anthropic has announced Claude Opus 4.8, the latest addition to its flagship model family, and the numbers are compelling. The new model outperforms its predecessor, Opus 4.7, across most major benchmarks — and beats OpenAI’s GPT-5.5 and Google’s Gemini 3.1 Pro in several key categories. What makes the launch particularly notable for enterprise customers: it arrives at the same price as Opus 4.7.

What the benchmarks show

According to Anthropic’s own evaluation data, Opus 4.8 leads the pack on agentic coding (SWE-Bench Pro) with a score of 69.2%, compared to 64.3% for Opus 4.7, 58.6% for GPT-5.5, and 54.2% for Gemini 3.1 Pro. On multidisciplinary reasoning (Humanity’s Last Exam), it scores 49.8% without tools and 57.9% with tools — ahead of all three rivals. Agentic computer use (OSWorld-Verified) sees Opus 4.8 reach 83.4%, nudging past Opus 4.7’s 82.8% and comfortably ahead of GPT-5.5 (78.7%) and Gemini 3.1 Pro (76.2%). On knowledge work (GDPval-AA), it scores 1890, above GPT-5.5’s 1769 and Opus 4.7’s 1753. Agentic financial analysis (Finance Agent v2) rounds things out at 53.9%, leading the field.

The one benchmark where Opus 4.8 does not top the table is agentic terminal coding (Terminal-Bench 2.1), where GPT-5.5 leads at 78.2%. Opus 4.8 scores 74.6% there — still ahead of Gemini 3.1 Pro’s 70.3% and Opus 4.7’s 66.1%.

Speed, price, and a new Fast Mode

Alongside the model itself, Anthropic is introducing Fast Mode for Opus 4.8 — the same model running at approximately 2.5x the speed, and priced at one-third the previous cost. Developers can activate it directly in Claude Code using the /fast command. API access is available by contacting an account manager or joining the waitlist.

Built for longer, more autonomous work

Anthropic is positioning Opus 4.8 squarely at the agentic use case — tasks that require a model to work independently over extended sessions without constant human input. In Claude Code, the company says the model “makes calls like an experienced engineer without needing constant check-ins,” staying on track across long-running sessions and following work through in a repository. The pitch to developers and engineering teams is straightforward: hand off a feature build or a bug sweep and focus elsewhere while the model handles the execution.

Dynamic workflows: a research preview

Also shipping today is a research preview of dynamic workflows inside Claude Code. Designed for the most complex tasks — think a migration touching hundreds of files — the system allows Claude to generate a plan, spin up hundreds of parallel subagents to execute it, and verify the results before reporting back. It is Anthropic’s most direct answer yet to the growing demand for AI that can tackle large, multi-step engineering problems end-to-end.

The broader picture

Opus 4.8 continues a pattern from Anthropic: incremental but meaningful gains on core capabilities, paired with a sharper focus on agentic and enterprise workflows. The combination of benchmark improvements, same-tier pricing, and a significantly cheaper fast variant gives teams a straightforward upgrade path. For businesses already running Claude in production, the case for switching to 4.8 requires little deliberation. For those still evaluating the frontier, the benchmark performance against GPT-5.5 and Gemini 3.1 Pro adds a fresh data point to what has become an increasingly competitive field.