Claude Sonnet 4.6 Takes Second Spot In Artificial Analysis Intelligence Index, Beats GPT-5.2

The top two smartest AI models in the world currently belong to the same company.

Anthropic’s Claude Sonnet 4.6 has claimed second place on the Artificial Analysis Intelligence Index (AAII), scoring 51 points and sitting just two points behind its stablemate, Claude Opus 4.6 (53). Interestingly, Claude Sonnet 4.6, Anthropic’s cheaper, faster model, now ranks ahead of OpenAI’s flagship GPT-5.2 (xhigh) model on the Artificial Analysis index.

A Narrowing Gap at the Top

The Artificial Analysis Intelligence Index v4.0 is a synthesis metric incorporating ten evaluations spanning agentic task performance, coding, scientific reasoning, and more — including GDPval-AA, Terminal-Bench Hard, SciCode, GPQA Diamond, and Humanity’s Last Exam. Claude Sonnet 4.6’s score of 51 represents an 8-point jump from its predecessor, Sonnet 4.5 (Reasoning, 43), and puts it essentially tied with OpenAI’s GPT-5.2 (xhigh, also 51).

Perhaps more telling than the raw score is what it signals about the competitive gap between Anthropic’s own models. When Opus 4.5 and Sonnet 4.5 were compared, Opus led by 7 points. That gap has now compressed to just 2 points, showing how smaller, faster models are catching up to their larger, more deliberate counterparts.

Agentic Dominance

While Sonnet 4.6 trails Opus 4.6 on the overall index, it actually surpasses its bigger sibling on two specific evaluations: GDPval-AA, which measures performance on real-world agentic work tasks, and Terminal-Bench Hard, which tests agentic coding and terminal use. On GDPval-AA, Sonnet 4.6 scored 1,633 versus Opus 4.6’s 1,606. On Terminal-Bench Hard, it posted 53% against Opus’s 46%.

For developers building agentic applications — autonomous systems that take sequences of actions in the real world — this makes Sonnet 4.6 the strongest model Artificial Analysis has tested on those tasks, from any provider.

The Price-Performance Equation

For cost-conscious enterprises, the calculus is nuanced. Sonnet 4.6 is priced at $3/$15 per million input/output tokens — 40% lower than Opus 4.6’s $5/$25 — and its per-token pricing is unchanged from Sonnet 4.5. On paper, that makes it the more economical choice.

In practice, however, Sonnet 4.6 is substantially less token-efficient than its predecessors. Running the Artificial Analysis Intelligence Index evaluations consumed approximately 74 million output tokens in max effort mode — roughly three times the ~25 million used by Sonnet 4.5 (Reasoning), and 28% more than the ~58 million consumed by Opus 4.6. In total, the benchmark run for Sonnet 4.6 cost $2,088, compared to $733 for Sonnet 4.5 and $2,486 for Opus 4.6.

The result: Sonnet 4.6 remains cheaper than Opus 4.6 in absolute terms, but only modestly so. The 40% per-token discount is significantly eroded by higher token usage, narrowing the pool of use cases where Sonnet clearly beats Opus on cost grounds.
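The erosion of the discount can be verified directly from the figures reported above. The run totals include input-token costs as well as output, which is why they exceed output cost alone:

```python
# Cost comparison built from the numbers in this article.
# Per-token list prices are (input, output) in USD per 1M tokens;
# run costs are the reported totals for each model's full
# Intelligence Index benchmark run.

PRICES = {
    "sonnet-4.6": (3.0, 15.0),
    "opus-4.6": (5.0, 25.0),
}

BENCHMARK_RUN_COST = {  # USD, as reported
    "sonnet-4.6": 2088.0,
    "opus-4.6": 2486.0,
}

# Headline per-token discount of Sonnet vs Opus (output side):
per_token_discount = 1 - PRICES["sonnet-4.6"][1] / PRICES["opus-4.6"][1]

# Realized discount once Sonnet's heavier token usage is priced in:
realized_discount = 1 - BENCHMARK_RUN_COST["sonnet-4.6"] / BENCHMARK_RUN_COST["opus-4.6"]

print(f"per-token discount: {per_token_discount:.0%}")  # 40%
print(f"realized discount:  {realized_discount:.0%}")   # 16%
```

In other words, the 40% sticker discount shrinks to roughly a 16% saving on this workload once Sonnet 4.6’s higher token consumption is accounted for.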

Adaptive Thinking: A New Control Paradigm

Claude Sonnet 4.6 adopts the “adaptive thinking” mode that debuted with Opus 4.6, replacing Anthropic’s earlier “extended thinking” framework. Rather than specifying a thinking token budget, developers now control model reasoning intensity through an “effort” setting — with options for low, medium, high, and max. Artificial Analysis evaluated Sonnet 4.6 in adaptive thinking mode at max effort.
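As a rough illustration of the shift from a token budget to an effort level, a request might be assembled like this — a minimal sketch only. The field name (`effort`), its placement in the request body, and the model ID `claude-sonnet-4-6` are assumptions for illustration; consult Anthropic’s API documentation for the actual request shape.

```python
# Hypothetical Messages API payload illustrating the "effort" control
# described above. "effort" replaces the earlier thinking-token budget.
# The parameter name, its placement, and the model ID are assumed, not
# confirmed API details.

def build_request(prompt: str, effort: str = "medium") -> dict:
    allowed = {"low", "medium", "high", "max"}
    if effort not in allowed:
        raise ValueError(f"effort must be one of {sorted(allowed)}")
    return {
        "model": "claude-sonnet-4-6",  # assumed model ID
        "max_tokens": 4096,
        "effort": effort,              # low | medium | high | max
        "messages": [{"role": "user", "content": prompt}],
    }

payload = build_request("Summarize this log file.", effort="max")
print(payload["effort"])  # max
```

The appeal of the design is that developers state intent (how hard the model should think) rather than guessing a token number, leaving the model to decide how much reasoning a given request actually needs.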

Expanded Capabilities

Alongside its intelligence gains, Sonnet 4.6 ships with meaningful infrastructure upgrades. Its context window expands to 1 million tokens (currently in beta), up from Sonnet 4.5’s 200K standard window. Maximum output tokens double from 64K to 128K, matching Opus 4.6’s ceiling. Pricing remains unchanged at $3/$15 per million input/output tokens.

The model is available via Anthropic’s first-party API, Google Vertex, AWS Bedrock, and Microsoft Azure, and is also accessible through Claude Chat, Claude Cowork, and Claude Code.

What This Means for the Market

The fact that Anthropic now holds the top two positions on a major third-party intelligence index is a significant commercial signal. It validates Anthropic’s technical roadmap at a moment when the frontier AI race has never been more competitive, with GPT-5.2, DeepSeek V3.2, and Google’s Gemini 3 Pro all vying for positions in the top ten.

For enterprises evaluating AI vendors, the picture that emerges is one of increasing complexity. Sonnet 4.6 is the superior model for agentic use cases, but its token intensity means deployment costs may be higher than the headline pricing suggests. Organizations with latency-sensitive or cost-sensitive workloads may find Sonnet 4.5 still competitive for many tasks, while those prioritizing raw capability across the broadest range of benchmarks now have two compelling Anthropic options to weigh against each other — and the rest of the field.
