Pre-training Sits Somewhere Between Human Learning And Human Evolution: Anthropic CEO Dario Amodei

There are many parallels between human intelligence and AI, and some interesting ones in how they're created, too.

Anthropic CEO Dario Amodei has offered a fascinating way to think about how large language models are trained — and it challenges some of the most common assumptions people make when comparing AI to human intelligence. Rather than drawing a straight line between how humans learn and how models are built, Amodei places the process of pre-training in more unusual conceptual territory: somewhere between learning and evolution. It is a distinction that has real implications for how researchers, developers, and business leaders should think about AI development.

Amodei begins by noting a fundamental difference in scale. “When we train the model on pre-training, we use trillions of tokens,” he explains. “Humans don’t see trillions of words. So there is an actual sample efficiency difference. There is actually something different that’s happening here, which is that the models start from scratch and they have to get much more, much more training.”
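To get a rough sense of the scale gap Amodei is describing, here is a back-of-envelope sketch in Python. The figures are illustrative assumptions rather than numbers from the interview: frontier pre-training runs are commonly described as using on the order of ten trillion tokens, while estimates of the words a person hears or reads over a lifetime are closer to a billion.

```python
# Back-of-envelope comparison of the sample-efficiency gap.
# Both figures are assumed orders of magnitude, not numbers from the interview.
llm_pretraining_tokens = 1e13  # ~10 trillion tokens, a commonly cited scale for frontier runs
human_lifetime_words = 1e9     # ~1 billion words, a generous estimate of lifetime exposure

ratio = llm_pretraining_tokens / human_lifetime_words
print(f"Pre-training sees roughly {ratio:,.0f}x more text than a person ever does")
# -> Pre-training sees roughly 10,000x more text than a person ever does
```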

But raw scale alone does not fully explain what is going on. Amodei points to something striking about how models behave once trained: “Once they’re trained, if we give them a long context length — the only thing blocking a long context length is inference — but if we give them a context length of a million, they’re very good at learning and adapting within that context length.”

This leads him to his central and most thought-provoking observation. “I don’t know the full answer to this, but I think there’s something going on that pre-training is not like the process of humans learning. It’s somewhere between the process of humans learning and the process of human evolution. We get many of our priors from evolution. Our brain isn’t just a blank slate — whole books have been written about this. I think the language models are much more blank slates. They literally start as random weights, whereas the human brain starts with all these regions, connected to all these inputs and outputs.”
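His "random weights" point is literal. In standard deep-learning frameworks, a freshly constructed model has no structure beyond its architecture; its parameters are drawn from an initialization distribution before any data is seen. A minimal PyTorch sketch (illustrative only, not Anthropic's code):

```python
import torch.nn as nn

# A freshly constructed layer is a "blank slate": its weights are random
# draws from an initialization distribution, with nothing learned yet.
layer = nn.Linear(in_features=512, out_features=512)
print(layer.weight[0, :5])  # five small random values, no built-in priors
```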

From there, Amodei extends the framework further to include reinforcement learning. “Maybe we should think of pre-training, and for that matter RL as well, as being something that exists in the middle space between human evolution and human on-the-spot learning, and the in-context learning that the models do as something between long-term human learning and short-term human learning. There’s this hierarchy: there’s evolution, there’s long-term learning, there’s short-term learning, and there’s just human reaction. The LLM phases exist along this spectrum, but not necessarily exactly at the same points — there’s no analog to some of the human modes of learning. The LLMs are kind of falling between the points.”

Amodei’s framing has meaningful implications for the AI field and for businesses building on top of these models. If pre-training occupies a unique space that has no direct human equivalent, it suggests that intuitions borrowed from human psychology or pedagogy may only go so far when applied to AI systems. It also raises deeper questions about what it actually means for a model to “know” something versus to “learn” something in context — a distinction that matters enormously for enterprise applications where reliability and consistency are paramount.

His comments come at a time when the AI industry is grappling with the limits of scaling pre-training alone, with companies like Anthropic, OpenAI, and Google DeepMind increasingly turning to post-training techniques, long-context capabilities, and reinforcement learning from human feedback to push model performance further. The suggestion that RL occupies the same evolutionary middle ground as pre-training is particularly noteworthy, as it implies that the most transformative stages of model development may be the ones that are hardest to map onto anything we already understand.
