Jeff Dean On How A Compute-Intensive Speech Recognition Feature Made Google Develop Its Own TPUs In 2015

Google’s TPUs are now seen as stiff competition to NVIDIA’s GPUs for AI use-cases: Google’s Gemini 3 models were trained entirely on TPUs, and Anthropic has signed a deal to get 1 million TPUs from Google. But their origin goes back to 2015, when Google realized it didn’t have enough computing capacity to roll out an improved speech recognition feature.

Jeff Dean, one of Google’s most legendary engineers and a key architect of its AI infrastructure, recently shared the origin story of Google’s Tensor Processing Units (TPUs) during a talk at the Stanford AI Club in 2025. His account reveals how a simple back-of-the-envelope calculation about a speech recognition feature led to one of the most significant hardware innovations in the history of artificial intelligence.

“One of the other things we started to realize as we were getting more and more success in using neural nets for all kinds of interesting things in speech recognition and vision and language, was that we had just produced a really high quality speech recognition model that we hadn’t rolled out, but we could see that it was much lower error rate than the current production speech recognition system at Google, which at that time ran in our data centers,” Dean explained.

The problem became apparent when Dean did a quick calculation. “I said, well, if speech recognition gets a lot better, people are gonna want to use it more. And so what if a hundred million people want to start to talk to their phones for three minutes a day? Just as random numbers pulled out of my head. And it turned out if we wanted to run this high quality model on CPUs, which is what we had in the data centers at that time, we would need to double the number of computers Google had in order just to roll out this improved speech recognition feature.”
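Dean’s quote gives the usage figures but not the compute figures, so the following is only a minimal sketch of the arithmetic, with a purely hypothetical assumption that one CPU decodes speech at roughly real time. It is not the actual calculation Dean ran, but it shows why the numbers were alarming:

```python
# Back-of-the-envelope sketch of the scale Dean describes. The usage figures
# come from his quote; the real-time decode rate is a hypothetical placeholder,
# since the talk gives no per-CPU throughput numbers.

users = 100_000_000            # "a hundred million people" (from the quote)
minutes_per_user_per_day = 3   # "three minutes a day" (from the quote)

# Hypothetical assumption: one CPU processes audio at real time, i.e. one
# minute of speech takes about one minute of compute with the new model.
realtime_factor = 1.0

audio_minutes_per_day = users * minutes_per_user_per_day
cpu_minutes_per_day = audio_minutes_per_day * realtime_factor
cpus_needed = cpu_minutes_per_day / (24 * 60)  # CPUs running flat out, all day

print(f"{audio_minutes_per_day:,} minutes of audio per day")
print(f"~{cpus_needed:,.0f} CPUs needed at a real-time decode rate")
```

Under those assumptions the feature alone would demand on the order of hundreds of thousands of machines running continuously, which is how a single product improvement ends up implying a doubling of the fleet.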

This stark realization forced Google to reconsider its approach. “So I said, well, we really should think about specialized hardware because there’s all kinds of nice properties for neural net computations that we could take advantage of by building specialized hardware,” Dean said. “In particular, they’re very tolerant of very low precision computations, so you don’t need 32 bit floating point numbers or anything like that. And all the neural nets that we’d been looking at at the time were just different compositions of essentially dense linear algebra operations, matrix multiplies, vector products and so on. So if you can build specialized hardware that is really, really good at reduced precision linear algebra, then all of a sudden you can have something that’s much more efficient.”
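To make the reduced-precision point concrete, here is a minimal NumPy sketch, not Google’s code and not the TPU’s actual numerics, of the kind of 8-bit linear algebra Dean is describing: quantize float32 matrices to int8, multiply with integer arithmetic, rescale, and compare against the full-precision result. The symmetric per-tensor quantization scheme here is chosen purely for illustration.

```python
import numpy as np

def quantize(x: np.ndarray) -> tuple[np.ndarray, float]:
    """Map a float32 tensor onto int8 with a single symmetric scale."""
    scale = np.max(np.abs(x)) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
a = rng.standard_normal((64, 128)).astype(np.float32)  # e.g. activations
b = rng.standard_normal((128, 32)).astype(np.float32)  # e.g. weights

qa, sa = quantize(a)
qb, sb = quantize(b)

# Integer matrix multiply with int32 accumulation, then rescale to float.
approx = (qa.astype(np.int32) @ qb.astype(np.int32)) * (sa * sb)
exact = a @ b

rel_err = np.linalg.norm(approx - exact) / np.linalg.norm(exact)
print(f"relative error of the int8 matmul: {rel_err:.3%}")  # typically around 1%
```

The neural net tolerates that small error, and multiplying 8-bit integers takes far less silicon area and energy than 32-bit floating point, which is the efficiency gap specialized hardware can exploit.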

Dean described how Google assembled a team to tackle the challenge. “We started to work with a team of people who are chip designers and board designers. And this is kind of the paper we ended up publishing a few years later, but in 2015 we ended up having these TPU v1, so the Tensor Processing Unit, which was really designed to accelerate inference, roll out into our data center.”

The results were remarkable. “We were able to do a bunch of nice empirical comparisons and show that it was 15 to 30 times faster than CPUs and GPUs at the time, and 30 to 80 times more energy efficient,” Dean noted. The impact of this work has been substantial — the TPU paper presented at ISCA in 2017 became the most cited paper in the conference’s 50-year history.

The implications of Google’s decision to build custom AI hardware have reverberated throughout the tech industry. What began as a pragmatic solution to a resource constraint has evolved into a strategic advantage, enabling Google to train massive models like Gemini without relying on external chip suppliers. The TPU story also foreshadowed a broader trend: as AI workloads have grown exponentially, major tech companies including Amazon, Microsoft, and Meta have all launched their own custom chip initiatives. Google’s early bet on specialized hardware, born from Jeff Dean’s napkin math about speech recognition usage, ultimately helped shape the infrastructure that powers today’s AI revolution.
