100,000X Better Results In 20 Seconds: OpenAI Exec Reveals How They Were Blown Away With Results Of “System 2” Thinking For AI Models

Chain-of-thought reasoning has brought about a big leap in the capabilities of AI models. While it was earlier believed that the best way to improve model capability was simply to throw more compute at training, researchers discovered that getting a model to think deeply before answering improved performance even more. This approach was used in OpenAI's o1 series of models, and to even greater effect in o3, which some have speculated could be AGI. And OpenAI insiders are now revealing what it was like to discover the effects of chain-of-thought reasoning on AI models.

"System 1 thinking is the faster, more intuitive kind of thinking that you might use, for example, to recognize a friendly face or laugh at a funny joke," OpenAI executive Noam Brown said in a TED talk. "System 2 thinking is the slower, more methodical thinking that you might use for things like planning a vacation or writing an essay or solving a hard math problem," he explained.

"I wondered whether the System 2 thinking might be what's missing from a bot," Brown said. "It might explain the difference in the performance between our bot and the human experts. So I ran some experiments to see just how much of a difference this System 2 thinking makes in poker. And the results that I got blew me away," he said.

"It turned out that having the bot think for just 20 seconds in a hand of poker got the same boost in performance as scaling up the model by 100,000x and training it for 100,000x longer. Let me say that again. Spending 20 seconds thinking in a hand of poker got the same boost in performance as scaling up the size of the model and the training by 100,000x," he said.

“When I got this result, I literally thought it was a bug. For the first three years of my PhD, I had managed to scale up these models by 100x. I was proud of that work. I had written multiple papers on how to do that scaling. But I knew pretty quickly that all of that would be a footnote compared to just scaling up system 2 thinking,” he added.

And System 2 thinking, or chain-of-thought reasoning, has been a game-changer for AI models. Just as AI models were thought to be hitting a wall, with data running out and performance gains from more compute plateauing, researchers discovered that simply getting the model to think deeply before answering a question dramatically improved performance. This seemed to mirror human behaviour: humans, too, make better decisions when they think for a while rather than reacting to a situation instinctively. With AI models improving under the same paradigm, the parallels between humans and AI models seem to keep increasing with time.
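To make the idea concrete, here is a minimal sketch of what chain-of-thought prompting looks like in practice. This is an illustration only, not OpenAI's actual method: the model call is omitted, and the prompt wording, function names, and example response are all hypothetical. The point is simply that a "System 2" prompt asks the model to reason step by step before committing to an answer, and the final answer is then parsed out of the longer response.

```python
# Toy illustration of "System 1" (direct) vs "System 2" (chain-of-thought)
# prompting. No real model is called; this only shows how the two prompt
# styles differ and how a final answer would be extracted from a
# step-by-step response. All names and formats here are hypothetical.

def direct_prompt(question: str) -> str:
    """System 1 style: ask for the answer immediately."""
    return f"Q: {question}\nA:"

def cot_prompt(question: str) -> str:
    """System 2 style: ask the model to reason before answering."""
    return (
        f"Q: {question}\n"
        "Think step by step, then give the result on a line "
        "starting with 'Final answer:'.\nA:"
    )

def extract_final_answer(response: str) -> str:
    """Pull the answer out of a step-by-step response."""
    for line in response.splitlines():
        if line.startswith("Final answer:"):
            return line[len("Final answer:"):].strip()
    return response.strip()  # fall back to the whole response

# An example of the kind of step-by-step response a reasoning model
# might produce for "What is 23 * 17?":
response = (
    "23 * 17 = 23 * 10 + 23 * 7\n"
    "= 230 + 161\n"
    "= 391\n"
    "Final answer: 391"
)
print(extract_final_answer(response))  # prints 391
```

The extra tokens spent on the intermediate steps are the "thinking time" Brown describes: the model trades inference-time compute for accuracy, instead of relying on a bigger model to answer in one shot.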
