GPT-5 Becomes Top Model On Artificial Analysis Intelligence Index, LMArena

As expected, GPT-5 has jumped to the top of two closely watched AI leaderboards.

GPT-5 has set a new top score on both the Artificial Analysis Intelligence Index and LMArena. On Artificial Analysis’ Intelligence Index, GPT-5 with ‘max’ reasoning scored 69. This is higher than the 68 managed by Grok 4, the 67 managed by o3 and o4-mini, and the 65 scored by Gemini 2.5 Pro. On the ‘medium’ reasoning mode, GPT-5 scored 68, while on the ‘low’ reasoning mode, it scored 63. On the new ‘minimal’ reasoning mode, it scored 44, roughly on par with Llama 4 Maverick.

“GPT-5 sets a new standard with a score of 68 on our Artificial Analysis Intelligence Index (MMLU-Pro, GPQA Diamond, Humanity’s Last Exam, LiveCodeBench, SciCode, AIME, IFBench & AA-LCR) at High reasoning effort. Medium (67) is close to o3, Low (64) sits between DeepSeek R1 and o3, and Minimal (44) is close to GPT-4.1. While High sets a new standard, the increase over o3 is not comparable to the jump from GPT-3 to GPT-4 or GPT-4o to o1,” Artificial Analysis said.

“GPT-5 with High reasoning effort used more tokens than o3 (82M vs. 50M) to complete our Index, but still fewer than Gemini 2.5 Pro (98M) and DeepSeek R1 0528 (99M). However, Minimal reasoning effort used only 3.5M tokens which is substantially less than GPT-4.1, making GPT-5 Minimal significantly more token-efficient for similar intelligence,” it added.
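
The token figures in the quote are easier to compare as ratios. Below is a minimal Python sketch that uses only the numbers quoted above; the dictionary and helper names are illustrative and do not come from Artificial Analysis.

```python
# Token counts, in millions, as quoted by Artificial Analysis for running its Index.
tokens_m = {
    "GPT-5 (High)": 82,
    "o3": 50,
    "Gemini 2.5 Pro": 98,
    "DeepSeek R1 0528": 99,
    "GPT-5 (Minimal)": 3.5,
}


def ratio(a: str, b: str) -> float:
    """Return how many times more tokens model `a` used than model `b`."""
    return tokens_m[a] / tokens_m[b]


high_vs_o3 = ratio("GPT-5 (High)", "o3")
high_vs_gemini = ratio("GPT-5 (High)", "Gemini 2.5 Pro")
high_vs_minimal = ratio("GPT-5 (High)", "GPT-5 (Minimal)")

print(f"High vs o3:             {high_vs_o3:.2f}x")
print(f"High vs Gemini 2.5 Pro: {high_vs_gemini:.2f}x")
print(f"High vs Minimal:        {high_vs_minimal:.1f}x")
```

On these figures, GPT-5 at High effort used about 1.64 times as many tokens as o3, about 0.84 times as many as Gemini 2.5 Pro, and roughly 23 times as many as GPT-5 at Minimal effort.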

On LMArena, GPT-5 was revealed to be the Summit model that had been appearing on the platform for a while.

“GPT-5 is here – and it’s #1 across the board. #1 in Text, WebDev, and Vision Arena. #1 in Hard Prompts, Coding, Math, Creativity, Long Queries, and more. Tested under the codename “summit”, GPT-5 now holds the highest Arena score to date. Huge congrats to @OpenAI on this record-breaking achievement!” LMArena said on X.

These are undeniably impressive results. GPT-5 was expected to do well, and it has put itself on top of both leaderboards. But the gap isn’t as big as some people predicted. On the Artificial Analysis Intelligence Index in particular, GPT-5 only narrowly pushed the frontier past o3 and Grok 4. This suggests that other labs aren’t far behind with their own models, and have all but caught up to OpenAI. Polymarket, a prediction market, offers an interesting signal: even after GPT-5’s release, most people on the platform seem to believe that Google will have the best model by the end of the year. It remains to be seen how this race shapes up, but it’s clear that OpenAI no longer has the lead over its AI rivals that it once did.