Data Is The Fossil Fuel Of AI, It Will Get Exhausted: Ilya Sutskever

AI has been progressing at a breakneck pace over the last couple of years, but one of the chief resources behind this progress might be running out.

Ilya Sutskever, former OpenAI Chief Scientist and founder of SSI, has said that data is the “fossil fuel” of AI and will inevitably be exhausted soon. He said that even as compute keeps growing through better chips and algorithms, the gains from these advances could be stymied by the lack of fresh data to train models on.

“Pre-training as we know it will unquestionably end,” he said at a talk at NeurIPS. “Pre-training will end. Why will it end? Because while compute is growing through better hardware, better algorithms, and larger clusters, and all those things keep increasing your compute, the data is not growing, because we have but one internet,” he added.

“You can even go as far as to say that data is the fossil fuel of AI,” Sutskever went on. “It was, like, created somehow, and now we use it, and we’ve achieved peak data, and there’ll be no more. We have to deal with the data that we have. It will still let us go quite far, but there’s only one internet,” he said.

Pre-training is one of the initial steps in training an AI model, in which the model is fed large amounts of data. This enables it to learn language patterns, facts and other concepts. Thus far, researchers have been using data scraped from the internet to train these models. But they have already used up most of the data that was available, which means that there will soon be no fresh data left to train even more sophisticated AI models.

But Sutskever says that this doesn’t mean that progress in AI is about to end. He said there are new interfaces, such as agents, being created, which could help AI solve newer problems. Researchers could also create “synthetic data”, which would likely be data created by existing AI models to train future models. There could also be newer approaches such as the inference-time compute used in OpenAI’s o1 model, which causes the model to think deeply before it responds and has led to some better outcomes. Sutskever said that in spite of our rapidly depleting data stores, these approaches, along with others, could eventually take humanity to superintelligence.