Fei-Fei Li Explains Why LLMs Will Not Be Able To Have Real-World Intelligence

LLMs have become increasingly capable over the last few years, but they may hit an upper bound in how well they can model the real world.

AI pioneer Fei-Fei Li, who created ImageNet and was the doctoral advisor of Andrej Karpathy, has said that LLMs will struggle to develop spatial understanding. In her view, LLMs operate purely on language, which is a generated signal, and it isn’t possible to understand the world through language alone.

“Language is fundamentally a purely generated signal,” Fei-Fei Li said in an interview. “There’s no language out there. You don’t go out in nature and there’s words written in the sky for you. Whatever data you’re feeding, you pretty much can just somehow regurgitate with enough generalizability the same data out, and that’s language to language,” she added.

“But… there is a 3D world out there that follows laws of physics, that has its own structures due to materials and many other things. And to fundamentally back that information out and be able to represent it and be able to generate it is just fundamentally quite a different problem. We will be borrowing similar ideas or useful ideas from language and LLMs, but this is fundamentally, philosophically to me, a different problem,” she said.

Fei-Fei Li is yet another AI pioneer who believes that simply scaling LLMs won’t achieve AGI. Meta’s AI Chief Yann LeCun has been bearish on LLMs for more than a year and has said that he’s no longer interested in them; he has also said that it’s entirely possible that current technologies, including LLMs, won’t get us to AGI. Richard Sutton, meanwhile, has said that LLMs will feel like a “momentary fixation” in retrospect.

And there’s plenty of evidence that LLMs struggle to understand the real world. Current AI systems famously can’t read analog clocks or draw people with watches on their left hands, which might reflect the lack of spatial understanding that Fei-Fei Li describes. On the other hand, approaches like Google’s Genie 3 world model seem to have developed some understanding of the physical world. It remains to be seen how this plays out, but many of the most experienced people in AI believe that simply scaling LLMs will likely not be enough to achieve AGI.
