An OpenAI Model Has Delivered A Gold-Medal Performance At The Math Olympiad

OpenAI has seen the limelight shift to Gemini and Grok in recent months, but it’s showing that it’s still very much in the AI game.

An experimental OpenAI reasoning model has delivered a gold-medal-winning performance at the International Mathematical Olympiad, the company has announced. The International Mathematical Olympiad has been held since 1959 and is regarded as the most prestigious math competition in the world. More than 100 countries participate in the event, with teams consisting of pre-university students.

Photo: Alexander Wei

OpenAI says that it administered the problems from the 2025 Olympiad to an experimental model, and it scored enough to have earned a gold medal at the event. “We evaluated our models on the 2025 IMO problems under the same rules as human contestants: two 4.5 hour exam sessions, no tools or internet, reading the official problem statements, and writing natural language proofs,” OpenAI researcher Alexander Wei posted on X.

“In our evaluation, the model solved 5 of the 6 problems on the 2025 IMO. For each problem, three former IMO medalists independently graded the model’s submitted proof, with scores finalized after unanimous consensus. The model earned 35/42 points in total, enough for gold!” he added.

Wei clarified, though, that the result had been obtained on an experimental model, and that OpenAI wasn’t going to publicly release a model of this capability anytime soon. “Just to be clear: the IMO gold LLM is an experimental research model. We don’t plan to release anything with this level of math capability for several months,” he said.

International Mathematical Olympiad problems have been hard for current AI models to solve. They take much longer to crack (roughly 100 minutes per problem, as opposed to around a minute for a typical query), so it’s harder for AI models to sustain coherent reasoning over those periods. Also, the solutions to IMO problems are multi-page proofs rather than a single final answer, which makes it hard to construct the automatic reward signals that reinforcement learning training relies on.
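The reward-signal difficulty can be made concrete with a small sketch. The snippet below is a hypothetical illustration only (it is not OpenAI’s actual training setup, and the function names are invented for this example): it contrasts how easily a reward can be computed for a final-answer contest like AIME with the absence of any simple automatic check for an IMO-style proof.

```python
# Illustrative sketch (not OpenAI's method): reward signals for reinforcement
# learning are trivial to compute when a problem has a single checkable answer,
# but not when the required output is a multi-page natural-language proof.

def reward_final_answer(model_output: str, ground_truth: str) -> float:
    """For contests like AIME, the answer is a single integer, so a reward
    can be computed automatically by direct comparison."""
    return 1.0 if model_output.strip() == ground_truth.strip() else 0.0


def reward_proof(model_proof: str) -> float:
    """For IMO-style problems, the output is a long proof. There is no simple
    automatic check; in OpenAI's evaluation, proofs were graded by three former
    IMO medalists, which is much harder to turn into a scalable training signal."""
    raise NotImplementedError("Proof grading requires expert (or formally verified) judging")


if __name__ == "__main__":
    # The final-answer case is trivially gradable at scale.
    print(reward_final_answer("204", "204"))  # prints 1.0
```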

The OpenAI model’s performance appears to be a breakthrough: on prediction markets, the chances of an AI winning a gold medal at the Mathematical Olympiad had been hovering at around 20%, but they jumped to 86% right after OpenAI announced its results.


AI models seem to be making big strides in the fields of programming and mathematics. Two days ago, an AI model from OpenAI placed second in the AtCoder coding competition, doing better than every human coder except one. Two weeks ago, Grok 4 Heavy had saturated the mathematics-focused AIME 25 benchmark, scoring a perfect 100% on the exam. And with OpenAI now showing that it is possible for an AI to win a gold medal in the notoriously hard IMO exam, it appears that it might not be long before AI is better than all humans in critical fields like mathematics and programming.
