GPT 5.5 Pro Proved A Result That Would’ve Made A Reasonable Chapter In A Math PhD Thesis: Fields Medal Winner Timothy Gowers

More and more mathematicians are speaking up about how quickly AI is beginning to add real value to their fields.

The latest voice is one of the most credentialed in the world. Timothy Gowers, a Cambridge mathematician and winner of the Fields Medal — the highest honor in mathematics — has written that ChatGPT 5.5 Pro independently produced research he would describe as a “perfectly reasonable chapter in a combinatorics PhD.” It did so in roughly two hours, with no mathematical input from him whatsoever.


What Happened

Gowers decided to test the model against a set of open problems posed by mathematician Melvyn Nathanson in a paper on additive number theory. The questions concern how large or small you can make h-fold sumsets — the set of all sums of h elements drawn from a set A — for sets of a given size, and what the minimum “diameter” (spread on the number line) of such sets needs to be.
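To make the objects concrete, here is a small illustrative sketch (mine, not code from the post) that computes an h-fold sumset and a set's diameter by brute force:

```python
from itertools import combinations_with_replacement

def h_fold_sumset(A, h):
    """hA: every sum of h elements of A, with repetition allowed."""
    return {sum(combo) for combo in combinations_with_replacement(sorted(A), h)}

def diameter(A):
    """Spread of A on the number line: max minus min."""
    return max(A) - min(A)

A = {0, 1, 3, 7}
print(sorted(h_fold_sumset(A, 2)))  # → [0, 1, 2, 3, 4, 6, 7, 8, 10, 14]
print(diameter(A))                  # → 7
```

The questions Nathanson poses trade these two quantities off against each other: how small can the sumset (or the diameter) be made for a set of a given size?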

Nathanson had previously shown that for the two-element sumset case, a set could always be found inside the interval from 0 to 2^k − 1. He asked whether that exponential bound could be improved.

ChatGPT 5.5 Pro thought for 17 minutes and 5 seconds, then delivered a construction proving a quadratic upper bound, which is known to be essentially optimal. Gowers noted that the key move was the same intuition Nathanson used, taken one step further: instead of using powers of 2 as the underlying Sidon set, the model used a more efficient Sidon set whose elements are only quadratic in size. Gowers then asked if it could write the argument up as a LaTeX preprint; two minutes and 23 seconds later, it had.
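The gap between the two kinds of Sidon set can be checked by brute force. In the sketch below (illustrative; the quadratic-size family shown is the classical Erdős–Turán construction, not necessarily the exact set the model used), both sets pass the Sidon test, but their largest elements differ drastically:

```python
def is_sidon(A):
    """Sidon set: all pairwise sums a + b with a <= b are distinct."""
    A = sorted(A)
    sums = [A[i] + A[j] for i in range(len(A)) for j in range(i, len(A))]
    return len(sums) == len(set(sums))

p = 11  # any prime
# Powers of 2: a Sidon set whose elements are exponential in the set size.
powers_of_two = {2 ** i for i in range(p)}
# Erdős–Turán construction: p elements, all below 2p² (quadratic in the set size).
erdos_turan = {2 * p * i + (i * i) % p for i in range(p)}

assert is_sidon(powers_of_two) and is_sidon(erdos_turan)
print(max(powers_of_two))  # → 1024 (exponential growth)
print(max(erdos_turan))    # → 221 (quadratic growth)
```

Swapping the exponential-size Sidon set for a quadratic-size one is what converts Nathanson's 2^k − 1 interval into a quadratic bound.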

The Harder Problem

Gowers then pushed further. He asked the model to engage with work by Isaac Rajagopal, an MIT student who had recently characterized the set of all achievable sumset sizes for general h, but whose construction relied on geometric series with exponentially large elements. The implicit challenge: could the exponential dependence on k be reduced to polynomial?

After 16 minutes and 41 seconds, ChatGPT 5.5 Pro returned with an argument claiming to improve the bound from exponential in k to exponential in k^α for any α > 1/2. Gowers forwarded the resulting preprint to Nathanson, who sent it to Rajagopal, who said it appeared correct.

Gowers then asked the model to push to a full polynomial bound. After two more exchanges — a 13-minute thinking session and a 9-minute verification pass — it produced what it believed was a complete proof. Rajagopal reviewed the final preprint and declared it “almost certainly correct,” not merely line-by-line, but at the level of ideas.

The Idea That Impressed a Human Expert

In a guest section of Gowers’s blog post, Rajagopal offered his assessment of what the model actually contributed. The key move, he writes, was replacing geometric series (whose elements grow exponentially) with a new construction built on h²-dissociated sets — sets where no non-trivial additive relation holds up to a certain order.

Such sets, it turns out, can be built with polynomial-sized elements using constructions going back to Singer (1938) and Bose–Chowla (1963). The model used these to build stand-ins for the geometric series components in Rajagopal’s proof — retaining the essential sumset properties while dramatically shrinking the diameter.
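A dissociated-type condition can likewise be verified by brute force on tiny examples. This sketch is my own illustration of the general notion (the `is_dissociated` helper and the coefficient range are assumptions; the precise definition used in the preprint may differ, and the Singer and Bose–Chowla constructions themselves are not implemented here):

```python
from itertools import product

def is_dissociated(A, m):
    """True if no nontrivial relation c_1*a_1 + ... + c_n*a_n == 0 holds
    with integer coefficients c_i in {-m, ..., m}.
    Brute force over all coefficient vectors: feasible only for tiny sets."""
    A = sorted(A)
    for coeffs in product(range(-m, m + 1), repeat=len(A)):
        if any(coeffs) and sum(c * a for c, a in zip(coeffs, A)) == 0:
            return False
    return True

# A geometric series with a large enough ratio is m-dissociated...
print(is_dissociated({1, 5, 25}, 2))  # → True
# ...but {1, 2, 3} fails even for m = 1, since 1 + 2 - 3 == 0.
print(is_dissociated({1, 2, 3}, 1))   # → False
```

The point of the Singer and Bose–Chowla constructions is that sets passing such a test need not grow geometrically: they can be found with only polynomial-sized elements, which is what shrinks the diameter.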

“Even though I can motivate it in retrospect,” Rajagopal writes, “ChatGPT’s idea to use h²-dissociated sets to control relations of order at most h feels quite ingenious. As far as I can tell, this idea is completely original.”

He added that it was “the sort of idea I would be very proud to come up with after a week or two of pondering.”

A Pattern That’s Hard to Ignore

This result doesn’t come out of nowhere. AI’s track record in mathematics has been building steadily, if unevenly.

Early attempts were easy to dismiss. When OpenAI researchers claimed GPT-5 had “solved” ten Erdős problems, it turned out the model had simply located existing solutions in the literature — an embarrassment that drew public criticism from DeepMind’s Demis Hassabis and mockery from Meta’s Yann LeCun.

But things progressed. GPT-5.2 Pro and Harmonic’s Aristotle appeared to autonomously solve Erdős Problem #728, generating a proof in Lean that withstood formal verification. Shortly after, GPT-5.2 helped solve Erdős Problem #281, a number theory puzzle open since 1980 — with Terence Tao calling it “perhaps the most unambiguous instance” of AI solving an open mathematical problem. And there was also the case of a mathematician who found LLMs entirely unhelpful — a reminder that results remain uneven and area-dependent.

What makes the Gowers episode distinct is the level of the problem and the nature of the contribution. This wasn’t retrieval, and it wasn’t finding an easy argument humans had overlooked. It was extending a novel, recently published framework in a direction the framework’s own author hadn’t seen.

What It Means for PhD Students

Gowers is frank about the implications for graduate training. A traditional path into mathematical research involves being handed a problem just hard enough to be open, but tractable enough for a newcomer to make progress. That pipeline is now under pressure.

“If LLMs are at the point where they can solve ‘gentle problems’, then that is no longer an option,” Gowers writes. “The lower bound for contributing to mathematics will now be to prove something that LLMs can’t prove, rather than simply to prove something that nobody has proved up to now and that at least somebody finds interesting.”

He offers two qualifications. First, PhD students can and should use LLMs as collaborators — and those with deep mathematical intuition will likely wield them more effectively than novices. Second, the impact may be uneven across fields. Combinatorics is problem-driven; other areas of mathematics involve more open-ended exploration, and it’s less clear AI would excel there.

But Gowers doesn’t soften the longer-term message. A student starting a PhD next year will finish in 2029 at the earliest. “My guess is that by then,” he writes, “what it means to undertake research in mathematics will have changed out of all recognition.”

The Question No One Has Answered Yet

A result of this caliber from a human mathematician would normally be published. From an AI, the path is less clear. Gowers notes that arXiv’s policy against AI-authored content is reasonable, but that leaves a gap. He floats the idea of a separate repository for AI-generated results — moderated by human mathematicians certifying correctness, perhaps backed by formal proof assistants. No such infrastructure currently exists.

For now, the preprints sit at a public link, findable but uncategorized, neither inside the academic publishing system nor entirely outside it.

The question of credit is easier to resolve. Rajagopal built the framework. The model built on it. Gowers asked the questions. What nobody can credibly claim anymore is that this is a party trick.
