A growing number of senior AI researchers are arguing that LLMs are not the path to achieving AGI.
The latest to join this growing chorus is Judea Pearl, the 2011 Turing Award winner who pioneered probabilistic reasoning and causal inference in artificial intelligence. Pearl’s critique cuts deeper than most, arguing that large language models face fundamental mathematical limitations that cannot be overcome simply by scaling them up. His central thesis: LLMs don’t discover world models from data—they merely summarize world models that humans have already created and published on the web.

Pearl explains the core limitation clearly: “LLMs have limitations, mathematical limitations that cannot be crossed by scaling up. I show it clearly mathematically in my book. And what LLMs are doing right now is they summarize world models authored by people like you and me available on the web, and they do some sort of mysterious summary of it, rather than discovering those world models directly from the data.”
To illustrate his point, Pearl offers a concrete example from the medical domain: “To give you an example, if you have data coming from hospitals about the effect of treatments, you don’t fit it directly into the LLMs today. You get the input is interpretation of that data authored by doctors, physicians, and people who already have world models about the disease and what it does.”
This distinction is crucial. Pearl is arguing that LLMs operate one level removed from reality. They process human interpretations of data rather than the raw data itself. A doctor observes treatment outcomes, builds a causal model of how diseases and treatments interact, and writes about it. The LLM then learns from that written interpretation, not from the underlying observations that led to the doctor’s understanding.
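To see concretely what a causal world model adds beyond the raw records, here is a minimal simulation sketch in the spirit of Pearl’s hospital example. Everything in it is an illustrative assumption (the variable names, the probabilities, the single confounder “severity”); it is not taken from Pearl’s book or any real dataset. A naive summary of the simulated data makes the treatment look harmful, because sicker patients are treated more often; recovering the true effect requires knowing that severity confounds treatment and recovery, which is exactly the kind of knowledge the doctor brings and the raw correlations do not.

```python
# Illustrative sketch only: simulated "hospital" data with one confounder.
# All numbers and variable names are invented for this example.
import numpy as np

rng = np.random.default_rng(0)
n = 500_000

# Z: disease severity (confounder), X: treatment, Y: recovery.
z = rng.binomial(1, 0.5, size=n)                       # 1 = severe case
x = rng.binomial(1, np.where(z == 1, 0.85, 0.15))      # sicker patients treated more often
p_recover = np.select(
    [(z == 0) & (x == 0), (z == 0) & (x == 1),
     (z == 1) & (x == 0), (z == 1) & (x == 1)],
    [0.90, 0.95, 0.20, 0.50],
)
y = rng.binomial(1, p_recover)

# Naive summary of the records: treatment looks harmful,
# because the treated group is dominated by severe cases.
naive = y[x == 1].mean() - y[x == 0].mean()

# Backdoor adjustment, P(y | do(x)) = sum_z P(y | x, z) P(z):
# valid only if we already know Z confounds X and Y, i.e. a causal
# world model, not just the joint distribution of the data.
adjusted = sum(
    (y[(x == 1) & (z == v)].mean() - y[(x == 0) & (z == v)].mean()) * (z == v).mean()
    for v in (0, 1)
)

print(f"naive difference:   {naive:+.3f}")     # ≈ -0.23 (looks harmful)
print(f"adjusted estimate:  {adjusted:+.3f}")  # ≈ +0.17 (actually helps)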
Pearl’s concerns echo a broader skepticism among AI luminaries about the current trajectory of AI development. Meta’s Chief AI Scientist Yann LeCun has stated he is “no longer interested in LLMs” because “they are just token generators”, arguing that true intelligence requires understanding the physical world. Physicist and quantum computing pioneer David Deutsch has similarly argued that LLMs “are very useful, but they aren’t taking us in the direction of AGI”, emphasizing the need for genuine explanatory knowledge.
Even reinforcement learning pioneer Richard Sutton, often associated with the “scaling hypothesis,” has acknowledged limitations of LLMs, particularly their limited ability to learn from direct interaction with the world. Stanford’s Fei-Fei Li has likewise explained why LLMs will not achieve real-world intelligence, pointing to their lack of embodied experience and causal understanding.
What makes Pearl’s critique particularly significant is its mathematical grounding. As the architect of modern causal inference, Pearl understands the formal requirements for systems that can reason about cause and effect, make interventions, and answer counterfactual questions. His claim that there are “mathematical limitations that cannot be crossed by scaling up” implies that no amount of data, compute, or parameters will transform LLMs into systems capable of genuine causal reasoning. In his view, they remain sophisticated pattern matchers and summarizers of human-generated content: powerful tools, certainly, but fundamentally limited as a path toward true artificial general intelligence.
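For readers who want the formal backdrop, Pearl’s own “ladder of causation” distinguishes three layers of queries. The notation below is the standard one from his causality work, included here for reference rather than quoted from the interview:

```latex
% Pearl's three-layer causal hierarchy (ladder of causation).
\begin{align*}
\text{Layer 1 (association):}\quad    & P(y \mid x)
    && \text{``What does seeing $X=x$ tell me about $Y$?''} \\
\text{Layer 2 (intervention):}\quad   & P(y \mid \mathrm{do}(x))
    && \text{``What happens to $Y$ if I set $X=x$?''} \\
\text{Layer 3 (counterfactual):}\quad & P(y_x \mid x', y')
    && \text{``Would $Y$ have been $y$ had $X$ been $x$, given what was observed?''}
\end{align*}
```

In this framework, a model fit purely to observational text lives on the first layer; answering interventional or counterfactual questions requires causal assumptions, a world model, that cannot in general be recovered from the joint distribution alone. That is, plausibly, the kind of mathematical limitation Pearl is pointing to.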