OpenAI had been stung by Google’s release of Gemini 3 Pro, which eclipsed it on most benchmarks, but it has thrown a counterpunch with GPT-5.2.
The new model, which OpenAI is calling GPT-5.2 Thinking, represents a significant leap forward in reasoning capabilities and marks the company’s most competitive offering against rivals in months. According to benchmark results released today, the model outperforms both Google’s Gemini 3 Pro and Anthropic’s Claude Opus 4.5 across several critical metrics.
GPT-5.2 Thinking Benchmark Performance

GPT-5.2 Thinking achieved particularly strong results in scientific and mathematical reasoning tasks. On the GPQA Diamond benchmark, which tests graduate-level science knowledge without tool assistance, the model scored 92.4%, surpassing Gemini 3 Pro’s 91.9% and significantly ahead of Claude Opus 4.5’s 87.0%.
In software engineering, GPT-5.2 posted a 55.6% score on SWE-Bench Pro, edging out both Claude Opus 4.5 (52.0%) and Gemini 3 Pro (43.3%). The model also demonstrated superior performance on CharXiv Reasoning, a benchmark for scientific figure interpretation, with an 82.1% score compared to Gemini 3 Pro’s 81.4%.
Perhaps most impressively, GPT-5.2 achieved a perfect 100% on AIME 2025, a competition mathematics benchmark, marking the first time any major model has reached this milestone. The model also scored 86.2% on ARC-AGI 1, an abstract reasoning test, considerably ahead of Gemini 3 Pro’s 75.0%.
Advanced Mathematics Capabilities
The model showed particularly dramatic improvements in advanced mathematics. On FrontierMath, which tests cutting-edge mathematical problem-solving, GPT-5.2 scored 40.3% on the main Tier 1-3 benchmark and 14.6% on the most difficult Tier 4 problems. While Gemini 3 Pro performed better on that hardest tier with 18.8%, GPT-5.2’s overall FrontierMath score represents a substantial advancement in AI mathematical reasoning.
Areas of Competition
While GPT-5.2 leads in several categories, the competition remains fierce and the margins vary by task. On ARC-AGI 2, a harder abstract reasoning test, GPT-5.2 scored 52.9% to Gemini 3 Pro’s 31.1%. On knowledge work tasks measured by GDPval, GPT-5.2 achieved 70.9%, ahead of both Gemini 3 Pro (53.5%) and Claude Opus 4.5 (59.6%).
GPT-5.2 Pricing
GPT-5.2 is more expensive than GPT-5.1. It costs $1.75 per million input tokens and $14 per million output tokens, compared with $1.25/$10 for GPT-5.1. GPT-5.2 Pro costs $21/$168 per million input/output tokens, up from $15/$120 for GPT-5.1 Pro, a 40% increase across the board.
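For a rough sense of what the new rates mean in practice, here is a minimal back-of-the-envelope sketch in Python using the per-million-token prices quoted above. The 800k-input / 200k-output workload is an illustrative assumption, not a measured figure.

```python
# Rough cost comparison using the per-million-token rates quoted in this article.
# The 800k-input / 200k-output workload is an illustrative assumption.

RATES = {  # model: (input $/1M tokens, output $/1M tokens)
    "GPT-5.1": (1.25, 10.00),
    "GPT-5.2": (1.75, 14.00),
    "GPT-5.1 Pro": (15.00, 120.00),
    "GPT-5.2 Pro": (21.00, 168.00),
}

def workload_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a workload at the listed rates."""
    in_rate, out_rate = RATES[model]
    return (input_tokens / 1_000_000) * in_rate + (output_tokens / 1_000_000) * out_rate

if __name__ == "__main__":
    input_tokens, output_tokens = 800_000, 200_000  # hypothetical monthly volume
    for model in ("GPT-5.1", "GPT-5.2"):
        print(f"{model}: ${workload_cost(model, input_tokens, output_tokens):.2f}")
    # GPT-5.1: $3.00, GPT-5.2: $4.20 -- a 40% increase for the same workload
```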

The AI Arms Race Intensifies
The release underscores the increasingly competitive landscape among AI labs. With each company leapfrogging the others every few months, the pace of improvement in large language models shows no signs of slowing. OpenAI has positioned GPT-5.2 as optimized for maximum reasoning effort, suggesting the company has prioritized depth of analysis over speed.
For enterprise customers and developers, the benchmark results indicate that model selection may increasingly depend on specific use cases. While GPT-5.2 demonstrates clear advantages in scientific reasoning and software engineering, organizations will need to evaluate performance on tasks most relevant to their needs.
GPT-5.2 Thinking is available in ChatGPT and through the API. The release sets up an intriguing next move from Google and Anthropic, both of which are expected to respond with their own model updates in the coming months.
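For developers who want to try the model over the API, the sketch below uses the standard OpenAI Python SDK. The exact model string ("gpt-5.2") is an assumption based on OpenAI’s usual naming conventions and should be checked against the published model list before use.

```python
# Minimal sketch of calling the new model via the OpenAI Python SDK.
# "gpt-5.2" is an assumed model identifier, not confirmed by this announcement;
# verify the exact string in OpenAI's model list before relying on it.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-5.2",  # assumed identifier
    messages=[
        {"role": "user", "content": "Prove that the square root of 2 is irrational."},
    ],
)

print(response.choices[0].message.content)
```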