GPT-5.4 Tied With Gemini 3.1 Pro On Artificial Analysis Intelligence Index, First Time A New OpenAI Model Hasn’t Topped Index Outright

For much of the last couple of years, new OpenAI model releases meant a new high on the intelligence indexes. That appears to have finally changed.

GPT-5.4, OpenAI’s latest and most capable model, has debuted on the Artificial Analysis Intelligence Index with a score of 57 — tied with Google’s Gemini 3.1 Pro Preview, which had held the top spot since its release two weeks ago. It is the first time that a new OpenAI frontline model has failed to push past the existing leader on the index. The benchmark covers ten evaluations: GDPval-AA, τ²-Bench Telecom, Terminal-Bench Hard, SciCode, AA-LCR, AA-Omniscience, IFBench, Humanity’s Last Exam, GPQA Diamond, and CritPt.

A Streak Interrupted

To understand the significance of the tie, it helps to look at the trajectory of OpenAI’s recent model releases. When GPT-5 launched last August, it claimed the top spot on the Intelligence Index with a score of 69 — ahead of Grok 4, o3, and Gemini 2.5 Pro. GPT-5.2 continued that momentum: when Artificial Analysis revamped its index to v4.0 in early January, GPT-5.2 (xhigh) emerged at the top with a score of 50, ahead of Claude Opus 4.5 and Gemini 3 Pro.

That pattern was already under pressure before GPT-5.4 arrived. Gemini 3.1 Pro Preview claimed the top spot two weeks ago with a score of 57 — four points ahead of Claude Opus 4.6 and five ahead of Claude Sonnet 4.6. What made that more notable was that Gemini took the leading position while costing less than half as much as Opus 4.6 (max) and GPT-5.2 (xhigh) to run the full index. GPT-5.4 has now matched that score, but not surpassed it.

The Current Standings

The current top of the index reads as follows: Gemini 3.1 Pro Preview and GPT-5.4 both sit at 57. GPT-5.3 Codex is at 54, Claude Opus 4.6 (max) at 53, and Claude Sonnet 4.6 (max) at 52. GPT-5.2 (xhigh) sits at 51. GLM-5 from Zhipu AI scores 50.

The intelligence-versus-cost chart tells an equally interesting story. GPT-5.4 sits in the upper-right quadrant — high intelligence, but also among the more expensive models to run. Gemini 3.1 Pro, sitting at the same intelligence score, runs at a meaningfully lower cost, reinforcing the efficiency advantage Google established with the model’s original release.

Google’s Sustained Challenge

This is not the first time Google has applied pressure at the top of the index. Gemini 3 Pro took the top spot when it launched in November 2025, after which OpenAI and Anthropic traded positions with successive releases. The pattern has been a leapfrog of sorts: each new major model from one lab would nudge ahead, only for a rival to reclaim the lead weeks later.

What is different this time is that OpenAI has not managed the nudge. GPT-5.4 (xhigh) scores 57, identical to Gemini 3.1 Pro’s mark, and the over-time chart of frontier model intelligence shows the gap between the top few models has narrowed considerably since late 2025. All three major labs are now clustered within a few index points of each other — a sign of both how competitive the field has become and how difficult it is to pull decisively ahead.

What GPT-5.4 Does Bring

A tied Intelligence Index score does not mean GPT-5.4 is a disappointment. On several specific benchmarks outside the index, OpenAI’s model leads convincingly: its 75.0% on OSWorld-Verified for computer use far surpasses Claude Opus 4.6’s 72.7%, and on GDPval for professional knowledge work, GPT-5.4 reaches 83.0% against Opus 4.6’s 78.0%. OpenAI also says GPT-5.4 is its most factual model yet, with individual claims 33% less likely to be false than with GPT-5.2.

The model is also OpenAI’s first general-purpose release with native computer-use capabilities — the ability to operate desktop environments, navigate applications, and complete multi-step agentic tasks across software. That puts it ahead of most rivals on a capability that is increasingly relevant for enterprise deployment, even if the headline index number is a tie.

The Cost Question

One area where GPT-5.4 clearly trails Gemini 3.1 Pro is cost efficiency at scale. OpenAI has priced GPT-5.4 at $2.50 per million input tokens and $15 per million output tokens — higher than GPT-5.2’s $1.75/$14. Gemini 3.1 Pro, by contrast, costs roughly half as much as GPT-5.2 to run the full Intelligence Index. For enterprises running large-scale workloads, the cost gap between equal-intelligence models matters significantly.

OpenAI argues that GPT-5.4’s greater token efficiency partially offsets the higher per-token price — the model is designed to complete tasks with fewer total tokens than GPT-5.2. Whether that translates to a competitive total cost of ownership in real-world workloads, against a Gemini 3.1 Pro that was already cheaper and is now equally capable by this measure, remains a live question for enterprise buyers.
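That trade-off can be sketched with back-of-envelope arithmetic. Using the per-token prices quoted above, the snippet below estimates how many fewer output tokens GPT-5.4 would need per task to break even with GPT-5.2 on cost; the 10,000-input / 5,000-output workload is a hypothetical assumption, not a published figure.

```python
def task_cost(in_tok: int, out_tok: int, in_price: float, out_price: float) -> float:
    """Dollar cost of one task; prices are in dollars per million tokens."""
    return (in_tok * in_price + out_tok * out_price) / 1_000_000

def breakeven_output_fraction(in_tok: int, out_tok: int) -> float:
    """Fraction of GPT-5.2's output tokens GPT-5.4 can emit and still match
    GPT-5.2's cost, at the quoted prices ($1.75/$14 vs. $2.50/$15)."""
    gpt52_cost = task_cost(in_tok, out_tok, 1.75, 14.0)
    output_budget = gpt52_cost - in_tok * 2.50 / 1_000_000  # dollars left for output
    return output_budget / (out_tok * 15.0 / 1_000_000)

# Hypothetical task: 10k input tokens, 5k output tokens on GPT-5.2.
f = breakeven_output_fraction(10_000, 5_000)
# f ≈ 0.83 — GPT-5.4 would need roughly 17% fewer output tokens to break even.
```

On these assumed numbers, OpenAI’s efficiency argument requires a meaningful but plausible token reduction; the break-even point shifts with the input/output mix of each workload, which is why the question stays open for enterprise buyers.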

What Comes Next

The tied result at 57 points reflects a broader dynamic visible in the over-time chart: frontier model intelligence growth has not stalled, but the gaps between the leading labs have compressed. The chart shows all three major providers — OpenAI, Google, and Anthropic — in a tight cluster at the top as of March 2026, with a steep upward trajectory continuing across all of them.

Anthropic appears to be next in line to release a top model. Claude Opus 4.6 (max) currently sits at 53 on the index, four points behind the leaders. Given the pace of recent releases — new frontline models arriving roughly every four to six weeks across the three labs — the current standings are unlikely to hold for long. Whether the next move comes from Anthropic, a further Google update, or an OpenAI revision, the top of the intelligence index has rarely been this competitive or this hard to hold.

Posted in AI