AI is already taking steps towards generating proofs in mathematics, the purest scientific field of them all, but it may have some way to go before it can start making real contributions.
Speaking on the Lex Fridman podcast, Terence Tao, a Fields Medal recipient and one of the world’s leading mathematicians, offered a perspective on the current capabilities and limitations of artificial intelligence in mathematical proofs. His remarks point to a crucial gap between the flawless facade AI can create and the subtle yet significant errors hidden beneath it, errors that a human mathematician would be unlikely to make.

Tao articulated a novel challenge presented by advanced AI, stating, “We have not had in the past, assistants that are competent enough to understand complex instructions, that can work at massive scale, but are also unreliable in subtle ways. It’s an interesting combination. Where the AI really struggles right now is knowing when it’s made a wrong turn.” This unreliability, despite the impressive scale and comprehension, forms the crux of the issue.
He further elaborated on the deceptive nature of AI-generated mathematics: “So this is one annoying thing about LLM-generated mathematics. We’ve had human-generated mathematics as very low quality submissions, people who don’t have the formal training and so forth. But if a human proof is bad, you can tell it’s bad pretty quickly. It makes really basic mistakes. But the AI-generated proofs, they can look superficially flawless.”
According to Tao, this apparent perfection stems partly from the training process: “And it’s partly because that’s what the reinforcement learning has actually trained them to do, to make things that look correct, which for many applications is good enough. So the errors are often really subtle, and then when you spot them, they’re really stupid. Like no human would have actually made that mistake.”
Tao then introduced an intriguing concept: the “sense of smell” in mathematics. “This is one thing that humans have. And there is a metaphorical mathematical smell that it’s not clear how to duplicate.” He drew a parallel with AI advancements in games like Go and chess: “AlphaZero and so forth that make progress on Go and chess and so forth. This is in some sense, they have developed a sense of smell for Go and chess positions. They can’t enunciate why, but just having that sense of smell lets them strategize.”
He concluded by suggesting a potential pathway for AI to truly compete with human mathematicians: “So if AIs gained that ability to sort of assess the viability of certain proof strategies, so you can say, I’m going to try to break up this problem into two small subtasks, and they can say, ‘oh, this looks good. Two tasks look like they’re simpler tasks than your main task and they’ve still got a good chance of being true.’ And so if they can pick up a sense of smell, then they could maybe start competing with a human-level mathematician.”
Tao’s observations carry significant implications for the use of AI in mathematics and in other fields that demand precise reasoning. AI excels at processing vast amounts of information and executing complex instructions, but its difficulty in recognizing its own subtle errors and its lack of an intuitive “sense of smell” remain considerable hurdles. This resonates with ongoing discussions in the AI research community, where explainability and trustworthiness remain key challenges. Research in AI safety and interpretability is attempting to address these issues, aiming to give AI systems a better understanding of their own reasoning and the ability to identify potential flaws. Tao’s analogy of a “mathematical smell” points to the tacit knowledge and intuition that human experts possess, and suggests that truly advanced AI in mathematics will need to move beyond pattern recognition towards a deeper understanding of mathematical structures and concepts. As AI continues to evolve, bridging the gap between superficial correctness and genuine understanding will be crucial for its successful application in fields where a single subtle error can invalidate the whole result.