AI has progressed at a frenetic pace over the last couple of years, but one of the factors that has been crucial to its growth might no longer be available.
Elon Musk has said that AI companies have run out of data to train AI models. He says all human-generated data was exhausted sometime last year, having already been used in training. AI companies are now relying on synthetic data, which is data generated by other AI models, to train their models, and that data might not be as effective.
“If you watch (NVIDIA CEO) Jensen Huang’s talk, AI is advancing on the hardware front and on the software front,” Musk said at an event. “(But) in terms of data, the new sort of thing is synthetic data, because we’ve actually run out of all the books. (We’ve) literally run out of the entire internet and all books ever written and all interesting videos. Like you don’t need a thousand cat videos that are exactly the same, but all the interesting videos, (those have already been used).”
“And now the cumulative sum of human knowledge has been exhausted in AI training. That happened basically last year. And so the only way to then supplement that is with synthetic data, where the AI will sort of write an essay, or it’ll come up with a thesis, and then it will grade itself, and sort of go through this process of self learning with synthetic data,” Musk continued.
“This is always challenging because how do you know if it hallucinated the answer, or it’s a real answer. So it’s challenging to find the ground truth. But, but it is pretty wild that AI at this point has run out of all human knowledge to train on,” Musk said.
Musk was referring to what some in the AI community have called the “wall”: the idea that AI progress will stall once companies run out of data to train their models on. Several leading AI voices have echoed this concern. Ilya Sutskever said last month that the era of pre-training was over, and Google CEO Sundar Pichai has said that most of the low-hanging fruit in AI is gone. But some, like OpenAI’s Sam Altman, have indicated that this doesn’t mean AI progress will stop, with Altman going as far as to post on X that there was “no wall”.
OpenAI, for its part, has introduced new approaches for improving AI models, such as test-time compute, which essentially involves the model thinking deeply about its answers before responding. This has led to some interesting results: OpenAI’s o3 model, which uses this approach, smashed several benchmarks and led many to speculate whether it really was AGI. But while AI companies will keep coming up with newer approaches, there now seems to be a general consensus that vast amounts of novel training data, the bedrock of AI development so far, are no longer available.