DeepSeek Releases DeepSeek Math V2, An Open Model That Delivers Gold-Medal Performance At The International Math Olympiad

DeepSeek had kept a relatively low profile since its V3 and R1 releases first put China's AI capabilities in the spotlight, but it is now back with another impressive model.

The Chinese AI lab has unveiled DeepSeek Math V2, an open-source large language model that achieved gold-medal-level performance at the International Math Olympiad 2025. The release is particularly significant because while OpenAI and Google have also developed models capable of winning gold at the prestigious mathematics competition, those systems remain proprietary and closed-source. In contrast, DeepSeek’s model is open, with its weights available for anyone to use and fine-tune.

Beyond Answer Accuracy: A New Approach to Mathematical Reasoning

DeepSeek Math V2 represents a departure from the dominant paradigm in AI mathematical reasoning. While recent models have made remarkable progress by using reinforcement learning to reward correct final answers—improving from poor performance to saturating competitions like AIME and HMMT in just one year—DeepSeek argues this approach has fundamental limitations.

The problem, according to the company, is that correct answers don’t guarantee correct reasoning. For many mathematical tasks, particularly theorem proving, what matters is the rigor and comprehensiveness of the step-by-step derivation, not just arriving at the right numerical result. This makes traditional final-answer reward systems inadequate for developing truly advanced mathematical AI.

DeepSeek’s solution centers on what it calls “self-verifiable mathematical reasoning.” The team trained an LLM-based verifier to assess the accuracy and faithfulness of mathematical proofs. They then used this verifier as a reward model to train a proof generator, incentivizing the model to identify and resolve issues in its own proofs before finalizing them. As the generator improved, DeepSeek scaled verification compute to automatically label difficult-to-verify proofs, creating new training data to further strengthen the verifier.
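The generate-verify-revise loop described above can be sketched in a few lines. This is a toy illustration of the idea, not DeepSeek's implementation: the function names, the scoring scheme, and the `threshold` parameter are all assumptions, with trivial stand-ins for the generator and the LLM-based verifier.

```python
from typing import Optional, Tuple

def verify(proof: str) -> float:
    """Stand-in for the LLM-based verifier: returns a score in [0, 1]
    reflecting how rigorous the proof looks. This toy version simply
    rewards proofs that contain an explicit justification step."""
    return 1.0 if "because" in proof else 0.2

def generate(problem: str, draft: Optional[str] = None) -> str:
    """Stand-in for the proof generator. Given a weak draft, it revises;
    otherwise it produces an initial attempt."""
    if draft is None:
        return f"claim for {problem}"
    return draft + " because <justification>"

def self_verifying_generate(problem: str, threshold: float = 0.9,
                            max_revisions: int = 3) -> Tuple[str, float]:
    """Generate a proof, score it with the verifier, and revise until the
    verifier is satisfied -- the 'identify and resolve issues in its own
    proofs before finalizing' loop described in the article."""
    proof = generate(problem)
    score = verify(proof)
    for _ in range(max_revisions):
        if score >= threshold:
            break
        proof = generate(problem, draft=proof)  # revise the flagged draft
        score = verify(proof)
    return proof, score

proof, score = self_verifying_generate("P1")
print(score)  # the revised proof passes the toy verifier: 1.0
```

In the real system, the verifier's score would also serve as the reward signal for training the generator, and proofs the verifier struggles to judge would be labeled with extra verification compute to produce new verifier training data.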

Impressive Competition Results

The results demonstrate the effectiveness of this approach. DeepSeek Math V2 achieved gold-level scores on both IMO 2025 and the China Mathematical Olympiad 2024. Perhaps most strikingly, the model scored a near-perfect 118 out of 120 on the notoriously difficult Putnam 2024 competition when using scaled test-time compute. The model also performed strongly on IMO-ProofBench, a benchmark developed by the DeepMind team behind the DeepThink IMO-Gold system.
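"Scaled test-time compute" here typically means sampling many candidate solutions and letting a verifier keep only the best one. A minimal best-of-n sketch, with a mock scorer standing in for the verifier (the names and the numeric candidates are illustrative assumptions, not DeepSeek's setup):

```python
from typing import Callable, Iterator

def verifier_score(candidate: int) -> float:
    """Mock verifier: pretend higher numbers are more rigorous proofs."""
    return candidate / 100.0

def best_of_n(sample: Callable[[], int],
              score: Callable[[int], float], n: int) -> int:
    """Draw n candidates and keep the one the verifier scores highest."""
    return max((sample() for _ in range(n)), key=score)

# Deterministic demo: five candidate "proofs" with known scores.
candidates: Iterator[int] = iter([10, 85, 40, 92, 7])
best = best_of_n(lambda: next(candidates), verifier_score, n=5)
print(best)  # → 92
```

Spending more compute simply means raising `n`: more samples give the verifier more chances to find a fully rigorous candidate, which is one plausible reading of how the near-perfect Putnam score was reached.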

These achievements place DeepSeek Math V2 among the elite tier of mathematical reasoning systems, alongside closed models from major Western AI labs. The difference is that DeepSeek has released its model under the Apache 2.0 license, making it freely available for research and commercial use.

Built on DeepSeek’s Foundation Models

DeepSeek Math V2 is built on top of DeepSeek-V3.2-Exp-Base, the company’s latest foundation model. This connection to DeepSeek’s broader model family suggests the mathematical reasoning capabilities could potentially be integrated into the company’s general-purpose AI systems.

The company acknowledges that “much work remains” in developing truly capable mathematical AI systems. However, the results suggest that self-verifiable mathematical reasoning is a viable research direction that could advance the field beyond simply optimizing for correct answers.

Implications for AI Development

The release comes at a pivotal moment for AI development. Mathematical reasoning has emerged as a crucial testbed for AI capabilities, with implications extending far beyond solving competition problems. If sufficiently advanced, these systems could accelerate scientific research across fields that rely on rigorous mathematical reasoning.

DeepSeek’s decision to release such a capable model as open source also highlights an ongoing tension in AI development between proprietary and open approaches. While Western labs have increasingly kept their most advanced systems closed, Chinese companies like DeepSeek have been more willing to release state-of-the-art models publicly. These open-source models are getting increasingly capable: Kimi K2 Thinking beats GPT-5 and Claude 4.5 on some benchmarks, and now DeepSeek has made a top mathematical model open weights. With this release, researchers and developers have access to a gold-medal-level mathematical reasoning system they can study, modify, and build upon.
