AI systems are already beginning to help with research that only a handful of humans understand in the first place.
In a milestone moment for AI-assisted research, quantum computing theorist Scott Aaronson has revealed that OpenAI’s GPT-5 Thinking model contributed a crucial technical breakthrough to his latest academic paper. Aaronson says this is the first time an AI model has helped him write a scientific paper. He called the AI’s approach “clever”.

The Research: Pushing Quantum Complexity to Its Limits
Aaronson, who holds the Schlumberger Centennial Chair of Computer Science at the University of Texas at Austin, recently published a paper with Freek Witteveen of CWI Amsterdam titled “Limits to black-box amplification in QMA.” The work explores fundamental boundaries in quantum complexity theory, specifically a problem class called QMA (Quantum Merlin-Arthur), essentially the quantum analogue of the classical class NP: problems whose proposed solutions can be efficiently verified.
Without diving too deep into the mathematics, the paper addresses a longstanding puzzle: how far can the error of a QMA verification procedure be driven down, a process known as amplification? The researchers proved that certain limits on black-box amplification discovered by Witteveen and Stacey Jeffery earlier in 2025 are actually optimal: you can’t do better using standard techniques. It’s the kind of result that excites quantum complexity theorists while remaining largely invisible to everyone else.
Until now, that is.
Where AI Entered the Picture
The breakthrough moment came when Aaronson and Witteveen encountered a thorny mathematical problem involving eigenvalues, the numbers from linear algebra that describe how a matrix stretches space along its characteristic directions. Specifically, they needed to analyze how the largest eigenvalue of a matrix depending on a parameter θ behaved as θ changed, and to prove that it couldn’t hover impossibly close to a particular value for too long.
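To give non-specialists a feel for the shape of that question, here is a minimal Python sketch. Everything in it, the 2×2 matrix family, the target value, and the tolerance, is invented for illustration; it is not the construction from the paper.

```python
# Illustrative toy example only: the matrix family, target value c, and
# tolerance eps are invented and are not from the Aaronson-Witteveen paper.
import numpy as np

def largest_eigenvalue(theta: float) -> float:
    # A small Hermitian matrix whose entries vary smoothly with theta.
    m = np.array([[np.cos(theta), 0.5],
                  [0.5, np.sin(theta)]])
    return float(np.linalg.eigvalsh(m)[-1])  # eigvalsh returns ascending order

# The shape of the question: as theta sweeps an interval, for how much of
# that interval can the largest eigenvalue linger within eps of a value c?
thetas = np.linspace(0.0, np.pi, 201)
c, eps = 1.0, 0.05
near = [t for t in thetas if abs(largest_eigenvalue(t) - c) < eps]
print(f"{len(near)} of {len(thetas)} sample points have |lambda_max - c| < eps")
```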
“Given a week or two to try out ideas and search the literature, I’m pretty sure that Freek and I could’ve solved this problem ourselves,” Aaronson wrote candidly on his blog. “Instead, though, I simply asked GPT-5 Thinking.”
What happened next looked less like a human consulting an oracle and more like a researcher working with a capable graduate student.
The Collaborative Process
Aaronson’s experience with GPT-5 Thinking unfolded over about thirty minutes of iterative back-and-forth. The AI provided a confident, plausible-looking answer within five minutes. Aaronson immediately recognized it was wrong. Rather than dismissing the AI as incompetent, Aaronson explained why the answer didn’t work, just as he would with a human collaborator. GPT-5 Thinking apologized, reconsidered, and tried again. And again. Each attempt got closer.

Within half an hour, the AI suggested examining a specific mathematical function: a trace of an inverse matrix that encoded exactly the information they needed about eigenvalue behavior. “It pointed out, correctly, that this was a rational function in θ of controllable degree, that happened to encode the relevant information,” Aaronson explained.
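Why would a trace of an inverse matrix encode eigenvalue behavior? The identity below is standard linear algebra; the random 4×4 matrix and the evaluation point z in this sketch are invented for illustration and are not taken from the paper. For a Hermitian matrix M with eigenvalues λ₁, …, λₙ, the trace of (zI − M)⁻¹ equals the sum of 1/(z − λᵢ), so it blows up precisely when some eigenvalue approaches z.

```python
# Standard identity, checked numerically on a toy matrix: for Hermitian M
# with eigenvalues lam_1..lam_n,  tr((z*I - M)^(-1)) = sum_i 1/(z - lam_i).
# The random 4x4 matrix and the point z are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal((4, 4))
m = (a + a.T) / 2                     # a toy Hermitian (real symmetric) matrix

z = 5.0                               # a point safely away from the spectrum
lhs = np.trace(np.linalg.inv(z * np.eye(4) - m))
rhs = np.sum(1.0 / (z - np.linalg.eigvalsh(m)))
print(lhs, rhs)                       # agree up to floating-point error
```

If the entries of M(θ) are polynomials in θ, this trace becomes a rational function of θ whose degree is controlled by the matrix size, which appears to be the “controllable degree” structure Aaronson describes: a rational function of bounded degree can only hug a fixed value over a limited portion of the parameter range.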
The solution worked. Aaronson and Witteveen verified it themselves, and it became a key technical component of their proof.
What This Means for AI and Research
Aaronson’s reflection on the experience is notably measured. He acknowledges that GPT-5 might have encountered similar constructions in its training data, but adds a crucial observation: “If a student had given it to me, I would’ve called it clever. Obvious with hindsight, but many such ideas are.”
He contrasts this with his attempts a year earlier, using the reasoning models available at the time, which “didn’t get results that were nearly as good.” The implication is clear: something fundamental has shifted in AI capabilities.
Importantly, Aaronson emphasizes that AI hasn’t replaced the researchers. “Right now, it almost certainly can’t write the whole research paper (at least if you want it to be correct and good),” he notes, “but it can help you get unstuck if you otherwise know what you’re doing, which you might call a sweet spot.”
This “sweet spot” represents a new paradigm in technical research—AI as a collaborative tool that can generate genuinely useful ideas when guided by expert knowledge, rather than either a fully autonomous researcher or a mere search engine.
The Bigger Picture
For those tracking AI progress, Aaronson’s account offers a data point as valuable as any benchmark: a leading expert in one of the most abstract and demanding fields of mathematics reporting that AI has crossed a meaningful threshold in its ability to contribute to cutting-edge research. “Who knows how long this state of affairs will last? I guess I should be grateful that I have tenure,” Aaronson said in his blog post. AI leaders have long predicted that AI will transform scientific research: Anthropic CEO Dario Amodei has said that AI will lead to a century’s worth of progress in a decade, while Google DeepMind CEO Demis Hassabis has said that AI will give humanity a crack at solving all diseases. And with respected researchers saying they’re grateful for tenure after seeing early results from AI models, it seems AI could soon start accelerating scientific research in a big way.