AI has exploded in the public consciousness in recent years, but the ground for much of this progress was laid more than a decade ago.
The story of modern artificial intelligence is often told through complex charts and technical jargon. Yet, behind the algorithms and datasets are moments of human discovery that forever altered the course of technology. Dr. Fei-Fei Li, a leading figure in AI and the creator of ImageNet, the dataset that revolutionized computer vision, recently recounted one such pivotal moment. It was a late-night ping from a graduate student that signaled a seismic shift in the world of AI, a discovery that would pave the way for the generative AI and deep learning applications we see today.

The year was 2012, and the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) was underway. The challenge, spearheaded by Li, was designed to benchmark the performance of computer vision algorithms. It was a Herculean task, asking machines to identify objects in millions of diverse and complex images. Li describes the unassuming beginnings of a breakthrough:
“And I remember it was late night. One day I got a ping from my graduate student. I was home and [he] said, ‘We got a result that really, really stands out and you should take a look.’ And we looked into it. It was a convolutional neural network.”
At the time, the leading approaches to computer vision were not dominated by neural networks. The result that had her student so excited came from a then lesser-known team led by Geoffrey Hinton at the University of Toronto that included Ilya Sutskever, who would later become Chief Scientist at OpenAI. As Li explains, their model wasn’t yet the famed “AlexNet” that would go down in history.
“It wasn’t called AlexNet at that time. That team, Jeff Hinton’s team, was called SuperVision. It was a very clever play of the word ‘super’ as well as ‘supervised learning.’ So, SuperVision. And we looked at what SuperVision did. It was an old algorithm; the convolutional neural network was published in the 1980s.”
The initial surprise was palpable. The foundational technology behind this remarkable result was not some brand-new, radical invention. Instead, it was a concept that had been around for decades but had been largely sidelined.
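The “old algorithm” Li refers to is easy to state: a convolutional layer slides a small learned filter across an image and takes a weighted sum at every position, so the same few weights detect a pattern wherever it appears. A minimal sketch of that core operation (technically cross-correlation, as deep-learning frameworks implement it), with a hand-picked edge-detecting kernel standing in for the filters a network would learn:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2D cross-correlation: slide the kernel over the image
    and take a weighted sum of the overlapped pixels at each position.
    A convolutional layer applies many such kernels, learned from data."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy image: bright left half, dark right half.
image = np.hstack([np.ones((4, 2)), np.zeros((4, 2))])
# A hypothetical vertical-edge detector: responds where brightness drops.
edge_kernel = np.array([[1.0, -1.0]])
print(conv2d(image, edge_kernel))  # nonzero only at the light/dark boundary
```

Real implementations vectorize this heavily and run it on GPUs, but the arithmetic is the same as this double loop.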
“There was a couple of tweaks in terms of the algorithm, but it was pretty surprising at the beginning for us to see that there was such a step change,” Li recalls. That “step change” was an understatement. The SuperVision model, which was the work of Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, and would soon be christened “AlexNet,” achieved a top-five error rate of 15.3% in the ImageNet challenge. The runner-up languished at 26.2%. It wasn’t just a win; it was a demolition.
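Those percentages are top-five error rates: an answer counts as correct if the true label appears anywhere in the model’s five highest-scoring classes. A minimal sketch of how such a metric is computed from a matrix of class scores:

```python
import numpy as np

def top_k_error(scores, labels, k=5):
    """ILSVRC-style top-k error: the fraction of examples whose true
    label is absent from the model's k highest-scoring classes.
    scores: (n_examples, n_classes) array; labels: (n_examples,) ints."""
    # Indices of the k largest scores per row (unordered within the k).
    top_k = np.argpartition(scores, -k, axis=1)[:, -k:]
    hits = (top_k == labels[:, None]).any(axis=1)
    return 1.0 - hits.mean()

# Toy example: 4 examples, 10 classes, with hypothetical scores.
rng = np.random.default_rng(0)
scores = rng.standard_normal((4, 10))
labels = np.array([0, 1, 2, 3])
print(top_k_error(scores, labels, k=5))
```

With 1,000 ImageNet classes, random guessing yields roughly 99.5% top-five error, which puts AlexNet’s 15.3% in perspective.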
The implications of that late-night ping were monumental. AlexNet’s success was a resounding validation for the power of deep learning, particularly when combined with two other key ingredients: massive datasets, like ImageNet, and the computational power of graphics processing units (GPUs), which were originally designed for gaming. The “tweaks” Li mentions, such as the use of the ReLU activation function and dropout regularization, proved to be critical in training deep neural networks effectively and preventing them from overfitting the data. This triumph single-handedly pulled neural networks from the relative obscurity of “AI winter” into the forefront of computer science. It triggered an explosion of research and investment in deep learning that has led directly to the AI-powered tools revolutionizing industries from healthcare and finance to transportation and entertainment. The success of AlexNet is the direct ancestor of today’s large language models and generative AI, a testament to how a “surprising” result, communicated through a simple ping, can indeed change the world forever.
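The two “tweaks” named above are simple enough to sketch directly. ReLU replaces saturating sigmoid/tanh units with a function whose gradient does not vanish for positive inputs, and dropout randomly zeroes activations during training to prevent co-adaptation. A minimal NumPy sketch, using the modern “inverted” dropout variant (which rescales survivors at training time, rather than rescaling weights at test time as the original AlexNet paper did):

```python
import numpy as np

def relu(x):
    """ReLU: zero out negatives, pass positives through unchanged.
    Its non-saturating gradient made deep networks far easier to train
    than the sigmoid/tanh units common before AlexNet."""
    return np.maximum(0.0, x)

def dropout(x, p=0.5, rng=None, training=True):
    """Inverted dropout: during training, zero each activation with
    probability p and scale the survivors by 1/(1-p), so the expected
    activation matches the unmodified values used at test time."""
    if not training or p == 0.0:
        return x
    rng = rng or np.random.default_rng()
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

# Training-time forward pass through one hypothetical hidden layer.
rng = np.random.default_rng(42)
activations = relu(rng.standard_normal(8))
print(dropout(activations, p=0.5, rng=rng))
```

At inference, `training=False` makes dropout a no-op, which is why the training-time rescaling is needed to keep the two regimes consistent.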