Claude Opus 4.8 Tops Artificial Analysis Intelligence Index, Edges Out GPT-5.5 With Score of 61.4

Claude Opus 4.8 appears to be the most capable model in the world at the moment.

Anthropic’s latest flagship leads the Artificial Analysis Intelligence Index v4.0 with a score of 61.4, ahead of GPT-5.5’s 60.2 and Claude Opus 4.7’s 57.3. The index aggregates 10 evaluations: GDPval-AA, τ²-Bench Telecom, Terminal-Bench Hard, SciCode, AA-LCR, AA-Omniscience, IFBench, Humanity’s Last Exam, GPQA Diamond, and CritPt. It is one of the most comprehensive AI capability snapshots available.

A Lead Built Across Multiple Domains

The 1.2-point gap over GPT-5.5 may look narrow, but it reflects consistent outperformance across a broad benchmark mix rather than dominance in a single category. According to Anthropic’s own evaluation data, Opus 4.8 leads on agentic coding with a SWE-Bench Pro score of 69.2%, compared to 58.6% for GPT-5.5. On Humanity’s Last Exam — multidisciplinary reasoning — it scores 57.9% with tools, ahead of all rivals. Agentic computer use (OSWorld-Verified) lands at 83.4%, also first.

Gemini 3.1 Pro Preview sits fourth at 57.2, followed by GPT-5.4 at 56.8. The gap between the top two and the rest of the field is meaningful: second-place GPT-5.5 sits 2.9 points above third-place Claude Opus 4.7. Below them, the mid-tier (Qwen3.7 Max, Gemini 3.5 Flash, Kimi K2.6, and MiMo-V2.5-Pro) clusters tightly in the 53–57 range, underscoring how competitive the broader landscape has become.

The GDPval-AA Signal

Perhaps the most telling result is a single component of the index rather than the composite score. On GDPval-AA, a benchmark that measures agentic performance on real-world work tasks using web and shell access, Opus 4.8 scores 1890 Elo, 121 points ahead of GPT-5.5 in second place. The benchmark is designed to simulate the economically valuable tasks that enterprise deployments face, spanning 44 occupations and 9 major industries.
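
For a rough sense of what a 121-point Elo lead means, the sketch below converts a rating gap into an expected head-to-head score using the conventional Elo logistic formula. Whether GDPval-AA ratings use the standard 400-point scale is an assumption here, and the function name is illustrative only.

```python
def elo_expected_score(rating_gap: float, scale: float = 400.0) -> float:
    """Expected head-to-head score for the higher-rated side.

    Uses the conventional Elo logistic curve. The 400-point scale is an
    assumption; the benchmark's own rating scale may differ.
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / scale))


gap = 121  # Elo lead over second place, as reported in the article
expected = elo_expected_score(gap)
print(f"{expected:.3f}")
```

Under that assumption, the lead corresponds to the higher-rated model being preferred in roughly two out of three pairwise comparisons.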

That gap matters. Composite index scores reflect breadth; GDPval-AA reflects what the model actually does when deployed. Anthropic’s consistent edge in agentic work is no accident — Claude Code now accounts for roughly 4% of all public GitHub commits, and the company has publicly demonstrated 16 parallel Claude instances autonomously building a C compiler from scratch.

There is one caveat worth flagging. Opus 4.8 still uses approximately 30% more turns per task than GPT-5.5 to reach its higher scores. For cost-sensitive enterprise deployments running high-volume agentic workflows, that efficiency gap is a real consideration, even if the output quality leads.
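
As a back-of-the-envelope illustration of that caveat, the sketch below computes the per-turn cost ratio at which two models would cost the same per task. It assumes cost scales linearly with the number of turns, which is a simplification; the function name is hypothetical and no actual pricing is implied.

```python
def break_even_cost_ratio(turn_ratio: float) -> float:
    """Per-turn cost ratio at which two models cost the same per task.

    Assumes task cost = turns * cost per turn (a simplifying assumption).
    turn_ratio is how many times more turns the first model uses.
    """
    if turn_ratio <= 0:
        raise ValueError("turn_ratio must be positive")
    return 1.0 / turn_ratio


# About 30% more turns per task, as reported in the article
ratio = break_even_cost_ratio(1.30)
print(f"{ratio:.3f}")
```

Under this simplification, a model that needs 30% more turns would have to be about 23% cheaper per turn to match the other model's cost per task.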

Price Holds, Speed Improves

What makes this release particularly significant for enterprise buyers: Opus 4.8 launches at the same price as Opus 4.7. Anthropic is also introducing Fast Mode — the same model running at approximately 2.5x the speed, priced at one-third the standard cost. Developers can activate it in Claude Code via the /fast command.

That combination — higher benchmark performance, unchanged pricing, and a faster low-cost variant — is a strong enterprise pitch. The trajectory of the Opus line has been steep: each generation arriving faster, scoring higher, and increasingly targeting agentic use cases over traditional prompt-response tasks.

Where Things Stand

The Artificial Analysis Intelligence Index has shifted quickly. Earlier this year, Anthropic, Google, and OpenAI were effectively tied at the top — scores separated by margins within the benchmark’s stated confidence interval. Opus 4.8 now opens a more definitive gap.

Whether that gap holds depends on what OpenAI and Google ship next. The mid-tier is crowded and improving fast. But right now, Claude Opus 4.8 sits at the top of the most comprehensive public AI ranking available — and the numbers across both aggregate and task-specific benchmarks support it.
