Google is back to the top of the AI model pile, and it’s back with a bang.
Gemini 3.1 Pro Preview has claimed the top position on the Artificial Analysis Intelligence Index v4.0, scoring 57 points, four ahead of Claude Opus 4.6 in second place and six clear of Claude Sonnet 4.6 in third. What makes the result all the more striking is that Google has taken the lead while keeping costs well below those of its nearest competitors: Gemini 3.1 Pro Preview costs less than half as much as Opus 4.6 (max) and GPT-5.2 (xhigh) to run the full Intelligence Index.

Leading Intelligence at Lower Cost
The Artificial Analysis Intelligence Index v4.0 incorporates ten evaluations: GDPval-AA, τ²-Bench Telecom, Terminal-Bench Hard, SciCode, AA-LCR, AA-Omniscience, IFBench, Humanity’s Last Exam, GPQA Diamond, and CritPt. Gemini 3.1 Pro Preview leads in six of the ten: Terminal-Bench Hard (agentic coding), AA-Omniscience (knowledge and hallucination reduction), Humanity’s Last Exam (reasoning and knowledge), GPQA Diamond (scientific reasoning), SciCode (coding), and CritPt (research-level physics reasoning).
The CritPt score is particularly noteworthy. Gemini 3.1 Pro Preview scored 18% on unpublished, research-level physics reasoning problems — more than five percentage points above the next best model. In total cost terms, running the full Artificial Analysis Intelligence Index costs $892 with Gemini 3.1 Pro Preview, compared to over $1,800 for frontier peers from Anthropic and OpenAI at their highest reasoning settings.
Token Efficiency and Speed
One of the more remarkable aspects of Gemini 3.1 Pro Preview’s performance is that it has improved without becoming significantly more verbose or expensive. The model uses approximately 57 million tokens to run the full Intelligence Index, only around one million more than its predecessor, Gemini 3 Pro Preview. That works out to a cost increase of roughly $72 to complete the benchmark suite, a modest premium for a substantial capability jump.
By contrast, frontier models from OpenAI and Anthropic at maximum reasoning settings consume considerably more tokens. This token efficiency, combined with Gemini 3.1 Pro Preview’s pricing of $2 per million input tokens and $12 per million output tokens (for contexts up to 200,000 tokens), keeps it cost-competitive even as it leads the intelligence leaderboard. It does, however, still cost substantially more to run than leading open-weights models such as GLM-5 ($547 for the full index, against Gemini’s $892) and Kimi K2.5.
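To make the economics concrete, here is a minimal sketch of how the per-token rates quoted above translate into a benchmark bill. The input/output split in the example is a hypothetical illustration, since Artificial Analysis does not break down the roughly 57 million total tokens in the figures cited here.

```python
# Minimal sketch: turning per-token pricing into a benchmark bill.
# Rates are Gemini 3.1 Pro Preview's published prices for contexts up to
# 200,000 tokens; the input/output split is a hypothetical example, not
# Artificial Analysis's actual breakdown.

INPUT_USD_PER_M = 2.00    # $ per million input tokens
OUTPUT_USD_PER_M = 12.00  # $ per million output tokens

def run_cost_usd(input_tokens: float, output_tokens: float) -> float:
    """API cost in USD for a given token budget."""
    return (input_tokens * INPUT_USD_PER_M
            + output_tokens * OUTPUT_USD_PER_M) / 1e6

# Hypothetical split of a 57M-token run: 20M input, 37M output.
print(f"${run_cost_usd(20e6, 37e6):,.2f}")  # -> $484.00
```

Note that even 57 million tokens billed entirely at the output rate would come to $684, so the reported $892 total implies some of the run is billed outside this simple two-rate formula, for instance at the higher rates that apply beyond the 200,000-token context threshold.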
On speed, Gemini 3.1 Pro Preview averages 114 output tokens per second. While this is around 10 tokens per second slower than its predecessor, it remains one of the fastest models in the top ten of the Intelligence Index, trailing only other Google models — Gemini 3 Flash and Gemini 3 Pro Preview.
Dramatic Reduction in Hallucinations
Perhaps the single most impressive improvement in Gemini 3.1 Pro Preview is its reduction in hallucination rate. On Artificial Analysis’s AA-Omniscience benchmark, which rewards models for knowing the answer and penalises incorrect guesses, the model’s hallucination rate dropped from 88% to 50% — a reduction of 38 percentage points compared to Gemini 3 Pro Preview.
The improvement lifted the model’s Omniscience Index score by 17 points, driven primarily by the model becoming far less likely to confabulate answers when uncertain. Its factual accuracy on AA-Omniscience, at 53%, remained broadly comparable to its predecessor’s, meaning the gains came from better calibration rather than from simply knowing more.
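A toy version of a penalised-knowledge score makes the calibration point concrete. The scheme below, plus one point for a correct answer, minus one for a wrong one, zero for abstaining, is an illustrative assumption rather than Artificial Analysis’s published formula, but it shows why declining to guess can lift the index even when accuracy is flat.

```python
# Illustrative penalised-knowledge scoring in the spirit of AA-Omniscience:
# +1 correct, -1 incorrect, 0 abstain. Not Artificial Analysis's actual formula.

def knowledge_index(correct: int, incorrect: int, abstained: int) -> float:
    """Accuracy minus penalised guessing, scaled to points out of 100."""
    total = correct + incorrect + abstained
    return 100 * (correct - incorrect) / total

def hallucination_rate(incorrect: int, abstained: int) -> float:
    """Of the questions the model did not answer correctly, how often it guessed."""
    return 100 * incorrect / (incorrect + abstained)

# Same 53% accuracy, but the second model abstains instead of guessing:
print(knowledge_index(53, 41, 6), hallucination_rate(41, 6))    # 12.0, ~87%
print(knowledge_index(53, 23, 24), hallucination_rate(23, 24))  # 30.0, ~49%
```

Holding accuracy at 53 out of 100 while converting wrong guesses into abstentions drops this toy hallucination rate from roughly 87% to 49% and lifts the index by 18 points, mirroring the shape of the reported improvement.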
Agentic Performance: Improved, But Not Leading
Not every category sees Google on top. In GDPval-AA, Artificial Analysis’s agentic evaluation focused on real-world tasks, Gemini 3.1 Pro Preview improved its Elo score by more than 100 points over its predecessor, reaching 1,316. It still sits behind Claude Sonnet 4.6, Claude Opus 4.6, GPT-5.2 (xhigh), and even GLM-5 in this area, however. For organisations that depend heavily on autonomous agentic workflows, where models must plan and execute multi-step real-world tasks, Anthropic’s Claude models and OpenAI’s GPT-5.2 retain an edge.
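For readers less familiar with Elo-style leaderboards, a rating gap maps directly to an expected head-to-head win rate. The sketch below uses the standard Elo formulas; the K-factor and pairwise setup are generic illustrations, not GDPval-AA’s specific methodology.

```python
# Standard Elo mechanics (a generic illustration, not GDPval-AA's exact method).

def expected_score(r_a: float, r_b: float) -> float:
    """Expected win rate of A against B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def updated_rating(r_a: float, r_b: float, score_a: float, k: float = 32) -> float:
    """New rating for A after one comparison (score_a: 1 win, 0.5 draw, 0 loss)."""
    return r_a + k * (score_a - expected_score(r_a, r_b))

# A 100-point gain implies roughly a 64% expected win rate over the model's
# former self:
print(f"{expected_score(1316, 1216):.2f}")  # -> 0.64
```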
Multimodal Leadership Reinforced
Beyond text and reasoning benchmarks, Google continues to extend its dominance in multimodal AI. Gemini 3.1 Pro Preview ranks first on MMMU-Pro, Artificial Analysis’s multimodal understanding and reasoning benchmark, ahead of Gemini 3 Pro Preview and Gemini 3 Flash. Google now occupies the top three positions in multimodal reasoning — a clean sweep that underscores the company’s sustained investment in vision-language capability.
What This Means for the AI Landscape
The Artificial Analysis Intelligence Index leaderboard has been highly competitive, with Google, Anthropic, and OpenAI trading positions as new models arrive. Gemini 3.1 Pro Preview’s four-point lead over Claude Opus 4.6 is meaningful, but the real story may be in the economics. With frontier AI increasingly central to enterprise operations, the cost to run demanding workloads matters enormously — and a model that leads the intelligence rankings at less than half the price of its closest rivals represents a significant commercial proposition.
Whether this advantage holds as Anthropic and OpenAI respond with future releases remains to be seen. But for now, Google’s return to the top of the intelligence leaderboard is not just a technical milestone. At these prices, it is a statement of intent.