Yann LeCun Explains How Current LLMs Are Only Trained On As Much Data As A 4-Year-Old Child

Modern LLMs are trained on massive amounts of data, but this data pales in comparison to the data a human child is exposed to.

That’s the striking argument from Yann LeCun, Meta’s former Chief AI Scientist and a Turing Award winner, who has emerged as one of the tech industry’s most vocal critics of the current LLM paradigm. In a recent talk, LeCun broke down the numbers in a way that challenges the prevailing narrative around artificial intelligence capabilities—and limitations. His analysis reveals a fundamental mismatch between how machines learn and how biological intelligence develops, with profound implications for the future of AI development.

The Numbers Behind the Training

LeCun begins with the scale of modern LLM training. “A typical LLM is trained on tens of trillions of words. It’s 30 trillion words, a typical size for the training set, pre-training of LLMs,” he explained. “A word is represented actually as sequences of tokens. The token is about three bytes. So the total is about 10 to the 14 bytes of training data to train those LLMs. And that corresponds to basically all the text that is publicly available on the internet plus some other stuff.”

To put this in human terms, LeCun notes: “It would take any of us something like half a million years to read through that material. So it’s an enormous amount of textual data.”
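LeCun's figures are easy to check with back-of-the-envelope arithmetic. The sketch below uses his stated numbers (30 trillion words, roughly three bytes per word once tokenized) plus an assumed reading speed of about 250 words per minute, which is our assumption, not his:

```python
# LeCun's corpus figures: 30 trillion words, ~3 bytes each
words = 30e12
bytes_per_word = 3
total_bytes = words * bytes_per_word
print(f"Training data: {total_bytes:.0e} bytes")  # on the order of 10^14

# Sanity-check the "half a million years" reading claim,
# assuming ~250 words per minute (our assumption)
reading_minutes = words / 250
reading_years = reading_minutes / (60 * 24 * 365)
print(f"Nonstop reading time: {reading_years:,.0f} years")
```

Reading around the clock works out to roughly 230,000 years; at a more human twelve hours a day, the figure roughly doubles, landing near LeCun's half-million-year estimate.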

What a Four-Year-Old Actually Sees

But here’s where the comparison becomes startling. “Psychologists tell us that a four-year-old has been awake a total of 16,000 hours, and there’s about one byte per second going through the optic nerve. Every single fiber of the optic nerve—we have two million of them. So it’s about two megabytes per second getting to the visual cortex,” LeCun calculated.

“It’s about 10 to the 14 bytes. A four-year-old has seen as much visual data as the biggest LLM trained on the entire text ever produced.”
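The same quick arithmetic confirms the visual-data side of the comparison. Multiplying LeCun's stated inputs (16,000 waking hours, two million optic-nerve fibers at one byte per second each, i.e. about two megabytes per second) gives a total on the same order as the LLM corpus:

```python
# LeCun's figures for a four-year-old's visual input
hours_awake = 16_000
optic_fibers = 2_000_000          # ~2 million fibers across both optic nerves
bytes_per_fiber_per_sec = 1

bandwidth = optic_fibers * bytes_per_fiber_per_sec   # ~2 MB/s to the cortex
total_bytes = hours_awake * 3600 * bandwidth
print(f"Visual data by age four: {total_bytes:.2e} bytes")  # ~1.15e14
```

That is roughly 1.15 × 10^14 bytes, matching the ~10^14 bytes of text used to pre-train a frontier LLM.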

The implications are profound. “What that tells you is that there is way more information in the real world, but it’s also much more complicated—it’s noisy, it’s high-dimensional, it’s continuous. And basically the methods that are employed to train LLMs do not work in the real world.”

The Reality Gap

LeCun drives the point home with concrete examples: “That explains why we have LLMs that can pass the bar exam or solve equations or compute integrals like college students and solve math problems. But we still don’t have a household robot that can do the chores in the house. We don’t even have Level 5 self-driving cars. I mean, we have them, but we cheat. We certainly don’t have self-driving cars that can learn to drive in 20 hours of practice like any teenager.”

His conclusion is unambiguous: “Obviously we’re missing something very big to get machines to the level of human or even animal intelligence. We’re not even at that level with AI systems.”

The Broader Implications

LeCun’s analysis arrives at a critical moment in AI development. While companies race to scale up LLMs, the fundamental limitations he describes remain stubbornly persistent. Tesla’s Full Self-Driving system, despite years of development and billions of miles of data, has only recently begun limited operation without a safety driver. Robotics companies like Figure AI and Boston Dynamics are making progress on humanoid robots, but none can match the adaptability of a human toddler navigating a cluttered living room.

The insight also helps explain the current AI investment landscape. While billions pour into scaling language models, there’s growing recognition that different approaches may be needed for embodied AI. Researchers like Ilya Sutskever and Yann LeCun himself are increasingly exploring alternative architectures, from world models to neuromorphic computing, that might better capture how biological systems process rich, multimodal sensory information.

LeCun’s message is clear: the path to artificial general intelligence won’t simply be paved by making LLMs bigger. The real challenge lies in developing systems that can learn from the kind of rich, continuous, high-dimensional data that even a four-year-old processes effortlessly—and that remains largely untapped by current AI approaches.