There are many parallels between how humans and AI systems learn, yet some learning paradigms that humans rely on appear to be missing from today’s AI systems.
Andrej Karpathy, former Director of AI at Tesla and one of the most influential voices in artificial intelligence, recently highlighted a fundamental gap in how current large language models process and retain information. Drawing a striking parallel between human cognition and AI architecture, Karpathy pointed to the memory consolidation that occurs during sleep as a capability today’s AI systems have yet to replicate.

“I feel like when I’m awake, I’m building up a context window of stuff that’s happening during the day,” Karpathy explained. “But I feel like when I go to sleep, something magical happens where I don’t actually think that that context window stays around. I think there’s some process of distillation into weights of my brain. And this happens during sleep and all this kind of stuff.”
This biological process, Karpathy argues, has no equivalent in current AI systems. “We don’t have any equivalent of that in large language models, and that’s, to me, more adjacent to when you talk about continual learning and so on. These models don’t really have this distillation phase of taking what happened, analyzing it obsessively, thinking through it, basically doing some kind of a synthetic data generation process and distilling it back into the weights.”
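To make that idea concrete, here is a minimal sketch of what such a “sleep phase” might look like in code, assuming a Hugging Face-style causal language model. The model choice, prompt, and three-step loop are illustrative assumptions, not Karpathy’s proposal or any production system: the day’s experience starts in the context window, a synthetic reflection is generated from it, and a few gradient steps distill that reflection into the weights.
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# "Waking" phase: the day's experience lives only in the context window.
days_context = "User asked about sparse attention; we compared it to dense attention."

# "Sleep" phase: replay the context and generate synthetic data from it
# (a single sampled reflection stands in for obsessive re-analysis here).
prompt = "Reflect on and summarize today's interaction:\n" + days_context + "\n"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    generated = model.generate(
        **inputs,
        max_new_tokens=64,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )
reflection = tokenizer.decode(generated[0], skip_special_tokens=True)

# Distillation: a few gradient steps push the reflection into the weights,
# so the knowledge persists after the context window is discarded.
batch = tokenizer(reflection, return_tensors="pt")
model.train()
for _ in range(3):
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```
A real system would need far more care (quality filtering of the synthetic data, safeguards against drift), but the loop captures the shape of the process Karpathy describes.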
The implications extend beyond simply mimicking sleep. Karpathy envisions a future where AI systems could maintain personalized learning over extended periods. “Maybe having specific neural nets per person. But basically we do want to create ways of creating these individuals that have very long contexts. It’s not only remaining in the context window because the context windows grow very, very long.”
He also noted that human cognition employs sophisticated mechanisms that AI is only beginning to explore. “I do also think that humans have some kind of a very elaborate, sparse attention scheme, which I think we’re starting to see some early hints of. So DeepSeek V3.2 just came out and I saw that they have sparse attention as an example, and this is one way to have very, very long context windows.”
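The core idea behind sparse attention can be shown in a few lines: instead of every query attending to every key, each query keeps only its best-matching subset. The toy function below illustrates a top-k variant and is not DeepSeek V3.2’s actual mechanism, whose published design is more elaborate; the scores here are still computed densely, whereas real systems use a cheaper selection stage so cost does not grow with the full sequence length.
```python
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, top_k):
    """One query attends to only its top_k best-matching keys.
    q: (d,)   k, v: (seq_len, d)
    """
    scores = (k @ q) / (k.shape[-1] ** 0.5)   # similarity to every key
    idx = scores.topk(top_k).indices          # positions that survive
    weights = F.softmax(scores[idx], dim=-1)  # softmax over survivors only
    return weights @ v[idx]                   # weighted sum of kept values

seq_len, d = 100_000, 64                      # a very long context
q = torch.randn(d)
k, v = torch.randn(seq_len, d), torch.randn(seq_len, d)
out = topk_sparse_attention(q, k, v, top_k=512)
print(out.shape)  # torch.Size([64])
```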
Karpathy concluded with an optimistic view of convergent evolution in intelligence: “So I almost feel like we are redoing a lot of the cognitive tricks that evolution came up with through a very different process, but we’re, I think, converged in a similar architecture cognitively.”
The observations come at a pivotal moment in AI development. While context windows have expanded dramatically, with models like Claude and Gemini now supporting contexts of hundreds of thousands of tokens, the industry is grappling with how to move beyond simply storing more information to actually learning from it in a persistent, efficient way. The challenge of continual learning, where models can update their knowledge without catastrophic forgetting or requiring complete retraining, remains one of AI’s most significant unsolved problems.
Karpathy’s mention of DeepSeek’s sparse attention mechanisms represents one avenue of exploration, but his broader point suggests that the field may need to look more closely at biological inspiration, not just for architecture but for fundamental learning processes. The distillation phase he describes, where experiences are compressed and integrated into long-term memory during sleep, could represent the next frontier in making AI systems that don’t just process information, but truly learn and evolve from their interactions over time.
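To give the catastrophic-forgetting problem a concrete shape, one long-studied mitigation is elastic weight consolidation (EWC), which penalizes changes to parameters that mattered for earlier knowledge. The sketch below is illustrative only: the importance values are placeholders where real EWC would estimate Fisher information, and nothing here reflects how frontier labs actually train.
```python
import torch

def ewc_penalized_loss(task_loss, params, old_params, importance, lam=0.1):
    """New-task loss plus a quadratic penalty that resists moving
    weights the old knowledge depended on heavily."""
    penalty = sum(
        (f * (p - p_old) ** 2).sum()
        for p, p_old, f in zip(params, old_params, importance)
    )
    return task_loss + lam * penalty

# Toy usage with placeholder tensors standing in for a real model's state.
params = [torch.randn(10, requires_grad=True)]
old_params = [p.detach().clone() for p in params]
importance = [torch.ones_like(p) for p in params]  # real EWC estimates these
task_loss = (params[0] ** 2).mean()                # stand-in for a model loss
loss = ewc_penalized_loss(task_loss, params, old_params, importance)
loss.backward()
```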