OpenAI’s models generated plenty of buzz on Arena when they were released under codenames, and the official model has more than lived up to it.
GPT-Image-2, the model powering ChatGPT Images 2.0, has claimed the #1 spot across every Image Arena leaderboard — Text-to-Image, Single-Image Edit, and Multi-Image Edit. Arena ranks AI models based on blind human preference votes, making it one of the most credible third-party benchmarks in the industry: users judge outputs without knowing which model produced them.

The Numbers
The margin at the top of the Text-to-Image leaderboard is striking. GPT-Image-2 scored 1,512 — a +242 point lead over second-place Nano Banana 2 (Google’s Gemini 3.1 Flash Image), which scored 1,271. Arena called it the largest gap between #1 and #2 ever recorded on the leaderboard. The sweep extended across all three image categories: a 1,513 score in Single-Image Edit (+125 over Nano Banana Pro) and 1,464 in Multi-Image Edit (+90 over Nano Banana 2). “No model has dominated Image Arena with margins this wide,” Arena wrote in its announcement.
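For intuition on what a gap that size means in practice, the classic Elo expected-score formula converts a rating difference into a predicted head-to-head win rate. A minimal sketch, assuming Arena’s scores behave like standard Elo ratings (its actual fitting procedure may differ, e.g., a Bradley–Terry model):

```python
# Back-of-the-envelope Elo math: what a ~240-point rating gap implies
# for head-to-head preference votes. Assumes the classic Elo formula;
# Arena's actual rating procedure may differ.

def expected_win_rate(rating_a: float, rating_b: float) -> float:
    """Expected probability that model A is preferred over model B."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))

# GPT-Image-2 (1,512) vs. Nano Banana 2 (1,271) on Text-to-Image
print(f"{expected_win_rate(1512, 1271):.1%}")  # ~80% of blind matchups
```

Under that assumption, a lead this size corresponds to GPT-Image-2 winning roughly four out of five blind matchups against the runner-up.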
A Clean Sweep Across All 7 Sub-Categories
The dominance wasn’t limited to the top-line leaderboard. GPT-Image-2 ranked #1 in every single one of Arena’s seven Text-to-Image sub-categories, with particularly large jumps over its predecessor, GPT-Image-1.5 High Fidelity. The gains ranged from +197 points in Art to +316 points in Text Rendering, with Cartoon/Anime/Fantasy and Portraits both seeing +296 point improvements, and Product/Branding, 3D Imaging, and Photorealistic/Cinematic imagery all registering gains in the +247–277 range.
The text rendering improvement deserves particular attention. Accurate in-image text — especially in non-Latin scripts like Hindi, Korean, and Japanese — has been a persistent weakness across image models. A +316 point jump in that category suggests OpenAI has made a meaningful structural fix, not just an incremental one.
Context: The Tape Leaks
The result won’t entirely surprise anyone who was watching Arena closely before the official launch. Earlier this month, three models named maskingtape, gaffertape, and packingtape appeared on the platform with no explanation, quickly drawing attention for their unusually strong world knowledge and text rendering. The AI community suspected they were OpenAI’s, and the GPT-Image-2 launch confirmed it. It’s the same playbook Google used with Nano Banana: stress-test anonymously on Arena, let the Elo scores build the case, then launch publicly.
The difference this time is the scale of the lead. Nano Banana had amassed over 2.5 million votes and built what Arena described as the largest Elo score gap in its history before Google said a word publicly. GPT-Image-2 has now surpassed that record — with a +242 point lead in Text-to-Image and a clean sweep of every sub-category on the board.
What It Means for the Image Wars
The broader competitive picture is worth noting. Google’s Nano Banana models brought 10 million new users to Gemini and briefly pushed it to the top of the App Store — a genuine cultural moment that caught OpenAI off guard. OpenAI responded with GPT-Image-1.5, which topped the Arena leaderboard in December 2025 but never generated comparable mainstream momentum. GPT-Image-2 appears to be a more definitive answer — not just catching up, but pulling away by a margin the industry hasn’t seen before.
Whether Arena scores translate into viral adoption is a separate question. But for developers, enterprises, and anyone choosing an image generation backend, a +242 point margin in blind human preference testing is hard to argue with.
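For those developers, here is a minimal sketch of what calling the model through OpenAI’s Images API (Python SDK) might look like. The model identifier “gpt-image-2” is an assumption based on the announced name, and the base64 response handling mirrors gpt-image-1’s behavior; check OpenAI’s documentation for the exact details.

```python
# A minimal sketch of generating an image via OpenAI's Images API.
# The model identifier "gpt-image-2" is an assumption based on the
# announced name; verify the exact string against OpenAI's docs.
import base64

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

result = client.images.generate(
    model="gpt-image-2",  # assumed identifier
    prompt="A storefront sign that reads 'Open 24 Hours' in Korean",
    size="1024x1024",
)

# gpt-image-1 returns base64-encoded image data; assuming the same here
with open("sign.png", "wb") as f:
    f.write(base64.b64decode(result.data[0].b64_json))
```

The prompt deliberately exercises non-Latin in-image text, the sub-category where GPT-Image-2 posted its largest gain.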