Google DeepMind’s AlphaProof Nexus Agent Has Solved 9 Open Erdős Problems At A Cost Of A Few Hundred Dollars Each

Erdős problems are now falling to AI systems like ninepins.

Google DeepMind researchers have published a paper describing an AI agent that autonomously solved 9 out of 353 open problems from the famous Erdős catalog — a collection of unsolved mathematical challenges posed by the prolific Hungarian mathematician Paul Erdős — at an inference cost of just a few hundred dollars per problem. Two of the nine problems had been open for 56 years. The results mark the most credible large-scale demonstration yet of AI doing genuine mathematical research, not just pattern-matching on known solutions.

What the Agent Does

The system, called AlphaProof Nexus, works by combining a large language model (Gemini 3.1 Pro) with Lean, a formal proof verification language. Rather than generating natural-language proofs that require expensive expert review — and can contain subtle errors — the agent generates machine-checkable formal proofs. Lean’s compiler verifies every logical step automatically. If a proof is wrong, the compiler says so, and the agent tries again.

This is the key distinction from OpenAI’s earlier Erdős claims, which Google DeepMind CEO Demis Hassabis called “embarrassing” — GPT-5 had reportedly found existing references to solved problems, not proved anything new. DeepMind’s agent actually constructs novel proofs.

The full-featured agent (“Agent D”) layers two additional capabilities on top of a basic LLM-generates, Lean-verifies loop: an evolutionary search that maintains a population of proof sketches ranked by an Elo-style rating system, and AlphaProof, DeepMind’s reinforcement learning-based olympiad theorem prover, which subagents can call as a focused tool on specific subgoals.

The Results

The nine Erdős problems solved include some that required genuinely sophisticated mathematics. Problem #12(i), posed by Erdős and Sárközy in 1970, asks for the existence of an infinite set satisfying a restrictive divisibility constraint while meeting a specific density condition. The agent constructed the required set using a block-based approach that integrates the Chinese Remainder Theorem with properties of sets that avoid length-3 arithmetic progressions.

Problem #125 — open since 1996 — concerns the “sumset” of two sets of integers defined by their digit representations in base 3 and base 4. The agent resolved the conjecture by exploiting the Diophantine proximity of the two bases (the fact that 3^m ≈ 4^k for certain integer pairs), a non-obvious and mathematically substantive insight.

Beyond Erdős problems, the same agent proved 44 out of 492 open conjectures from the Online Encyclopedia of Integer Sequences (OEIS), resolved a 15-year-old open question in algebraic geometry concerning Hilbert functions, improved a bound in convex optimization by discovering a novel algorithmic parameter schedule, and helped crack a problem from Ben Green’s well-known open conjectures list. It is also contributing to ongoing research in quantum optics and graph theory.

Architecture and Cost

The per-problem inference cost of “a few hundred dollars” — combined with the fact that the basic agent (just LLM + Lean, no evolution, no AlphaProof) also solved all 9 problems in the post-hoc analysis — is the paper’s most striking economic finding. You don’t necessarily need the full machine to get results; you just need the compiler feedback loop.

The full-featured agent does offer meaningful advantages on harder problems. On Erdős #125 and #138, it was 2x to 5x more cost-efficient than the basic agent at equivalent solve rates. But for easier problems in the set, the basic agent was actually more efficient per dollar. Smaller models — Gemini 3.0 Flash, Gemini 3.1 Flash-Lite — solved nothing.

This fits a broader pattern that Google DeepMind’s David Silver has described: mathematics is ideal for AI because it’s fully digital, self-verifying, and amenable to experience-driven improvement loops. The compiler feedback is the crucial ingredient — it grounds the LLM’s reasoning in a way that natural language cannot.

A Race That Is Accelerating

DeepMind’s nine solves don’t exist in isolation. Over the past several months, Erdős problems have been falling with increasing frequency across the industry — though the quality and credibility of those claims has varied considerably.

Harmonic, the math AI startup backed by Robinhood CEO Vlad Tenev, claimed in late 2025 that its Aristotle system solved Erdős Problem #124, open for 30 years. The mathematical community received it with measured enthusiasm — Thomas Bloom, who maintains the Erdős problems catalog, noted the solved variant was the easier of two posed by Erdős, and relatively straightforward in hindsight. Around the same time, GPT-5.2 appeared to autonomously solve Erdős Problem #728, concerning divisibility conditions involving factorials — in what was described as a first instance of AI resolving an open conjecture before any human mathematician.

Then came GPT-5.2 Pro and Erdős Problem #281, a number theory puzzle open since 1980 — with Fields Medalist Terence Tao calling it “perhaps the most unambiguous instance” of AI solving an open mathematical problem. (A wrinkle emerged later: a prior solution was found in the literature, though Tao noted the AI’s proof was “rather different.”)

Most recently, just days ago, an OpenAI model cracked the planar unit distance problem — a foundational question in combinatorial geometry that Erdős posed in 1946 and attached a cash prize to. Fields Medalist Tim Gowers said the proof would have been accepted at the Annals of Mathematics without hesitation.

What This Is Not

The 9 problems solved out of 353 attempted is a 2.5% solve rate. The vast majority of Erdős problems remain out of reach. The agent also exhibited characteristic failure modes: it frequently offloaded core difficulty into a helper lemma that essentially restated the problem with a sorry placeholder, and sometimes “proved” things using lemmas it hallucinated as established results. End-to-end formal verification caught both — which is the point.

The paper is also honest that successes are concentrated in areas where Lean’s mathematical library is mature — combinatorics, number theory, convex optimization — and that problems requiring extensive new theory remain beyond current reach.

Why It Matters

The significance is less the raw number of problems solved and more what the architecture demonstrates. AlphaEvolve had already shown that evolutionary AI search could crack Ramsey problems that stumped researchers for decades. AlphaProof Nexus extends that approach into a formal verification framework that produces machine-checkable proofs — removing the expensive human review bottleneck.

The AI co-mathematician framework DeepMind released recently sits alongside this: one system is a research partner, the other is an autonomous solver. Used together, they represent a genuine shift in what mathematicians can delegate to machines.

As the paper puts it, even failed proof attempts proved useful — researchers found that the formal sketches helped them understand which subgoals were actually hard, without re-verifying the parts that compiled correctly. The tool is valuable whether or not it crosses the finish line.

At a few hundred dollars per attempt, that’s a cheap research assistant.

Posted in AI