AI has made plenty of progress in the last couple of years, but subject matter experts aren’t necessarily impressed with what it’s capable of at the moment.
Eric Weinstein, an investor who holds a PhD in Mathematics from Harvard, has said that current AI systems aren't even close to the humans of 70 years ago in some STEM fields. Weinstein based his observations on a peer-review exercise he conducted using Google's Gemini and OpenAI's ChatGPT.

“LLM Peer Review: Gemini was asked to write a journal submission on a STEM topic I know a bit about. ChatGPT was asked to Peer Review that article for publication and make suggestions. I recommend every STEM academic try this adversarial exercise in an area he/she knows well,” Weinstein wrote on X.
Weinstein said that the results weren't particularly impressive. "ChatGPT had all kinds of issues with it and complained while rewriting it to improve it. Gemini was given the reviewer report and it more or less freaked out: 'How is this reviewer even competent to make these claims?' and pointed out how lousy the reviewer report was. In fact ChatGPT as reviewer had, in fact, misrepresented Gemini's work as well as engaged in making claims to have improved the work…which were false. Gemini was basically far more correct. ChatGPT suggestions made Gemini worse. It may be that ChatGPT's output tokens are far fewer so it gutted Gemini's work," he said.
And while Weinstein concluded that Gemini did better than ChatGPT, its performance was far from perfect. “I wanted to see if Gemini was the clear winner. So I then asked Gemini for specific references to back up its claims in a comprehensive literature search. And it promptly totally fabricated convincing quotes, page numbers, article titles and mathematical equations,” he added.
“That’s where the state of play is as of June 2025 in STEM. It’s not close to humans of 70 years ago (before modern peer review) as of these iterations. Maybe it’s a bit closer to simulating today’s humans in the Claudine Gay/Peer Review academic era. We are converging down as it moves up. And we may soon meet in the middle it seems,” he concluded.
Weinstein's conclusion was that two of the top LLMs were both currently unable to perform high-level scientific research. While Gemini came up with a good paper, ChatGPT misunderstood it and gave suggestions that made it worse. Gemini, though, hallucinated, fabricating the references it used to back up its claims. Weinstein dryly remarked that current LLMs were closer to simulating today's academics, invoking Claudine Gay, the former Harvard President who was accused of plagiarism in her research. He hinted that even as AI systems were getting smarter with time, reliance on such systems would likely degrade the abilities of humans, meaning that human and AI abilities would converge over time.
AI leaders, meanwhile, have some ambitious plans for AI in scientific research. Anthropic CEO Dario Amodei has said that AI could conduct a century's worth of research in a decade, and Google DeepMind CEO Demis Hassabis believes AI can have a crack at curing all diseases. OpenAI CEO Sam Altman has similarly said that AI will one day enable a century's worth of scientific research to happen in a single year. But if one takes Eric Weinstein's isolated experiment as evidence, that single year might still be a long way away.