LLMs Have A Lot Of Knowledge, But They Aren’t Necessarily Very Intelligent: Evolutionary Biologist David Krakauer

Modern LLMs are rapidly becoming more capable, but they may not be getting more intelligent.

That’s the provocative argument from David Krakauer, an evolutionary biologist and President of the Santa Fe Institute, who recently challenged a fundamental assumption about how we evaluate AI systems. Speaking to an audience, Krakauer drew on the history of intelligence testing to make a distinction that has profound implications for how we think about large language models: knowledge and intelligence are not the same thing.

“Who’s actually studied the history of the IQ test?” Krakauer asked. “One of the things I want to say very clearly here is intelligence is not knowledge. In fact, in most of these tests, what you’re trying to do is factor out contributions from knowledge. The relationship is deep with knowledge, but it would be completely unfair to say that someone isn’t intelligent because they were ignorant of certain facts.”

He continued: “In fact, in my life when I was at school, the people who were described as intelligent were the ones who could solve problems without knowing anything. So notice what an LLM is here. An LLM is a library. It knows everything, is not intelligent. Intelligence is actually, as I think of it anyway, more with less, not more with more. We’ve always known that. Why are we so confused when it comes to LLMs?”

Krakauer then offered a vivid analogy to drive his point home: “If you were sitting next to someone in a classroom and you were given an exam and they went to the library and looked up the answer, you would not say they’re clever. You’d say they were cheats. There would be another kind of language you’d use, but you wouldn’t call it intelligent. I don’t know why we’re doing this now. It annoys me.”

The implications of this distinction are significant for the AI industry. While companies race to pack more parameters and training data into their models—with OpenAI’s GPT-5 reportedly containing over a trillion parameters and being trained on vast swaths of internet text—they may be building increasingly sophisticated libraries rather than increasingly intelligent systems. Current benchmarks often measure knowledge retrieval and pattern matching rather than genuine problem-solving with minimal information.

When LLMs are tested on actual IQ tests, the results are telling. Grok 4 and Claude Opus score around 120 on offline IQ tests, while others, such as DeepSeek, score only 91. These scores may be misleading, however: the models could be performing well precisely because they can draw on vast stores of memorized information, not because they can reason abstractly from minimal context, which is what IQ tests were originally designed to measure.

Krakauer’s critique arrives at a moment when the industry is grappling with diminishing returns from simply scaling up models. Recent reports suggest that improvements in capability are plateauing relative to increases in computational resources. Perhaps this is because, as Krakauer suggests, we’ve been optimizing for the wrong metric all along—confusing the accumulation of knowledge with the emergence of intelligence. True intelligence, by his definition—doing more with less—might require fundamentally different architectures that prioritize reasoning and abstraction over memorization and retrieval.
