Google eyes new chips to speed up AI results, challenging Nvidia
The Alphabet-owned company has previously touted inference capabilities for its chips
[SEATTLE] In a matter of months, Google’s artificial intelligence chips have become one of the hottest commodities in the tech sector. Leading AI developers, including some of the firm’s biggest rivals, are stocking up on them.
Now, the Alphabet-owned company aims to build on its momentum with the likely introduction of new chips dedicated to inference, or running AI models after they have been trained. With this push, Google is poised to further challenge market leader Nvidia in a fast-growing category for semiconductors that’s fuelled by surging adoption of AI software.
As demand grows for quickly processing AI queries, “it now becomes sensible to specialise chips more for training or more for inference workloads”, Google chief scientist Jeff Dean said. “We are looking at a whole bunch of different things,” he added, including the speed of AI results it wants to enable.
The company plans to announce its new generation of custom-designed chips, known as tensor processing units, or TPUs, at the Google Cloud Next conference in Las Vegas this week. Amin Vahdat, who oversees Google’s AI infrastructure and chip work, declined to comment on plans for an inference chip that can speed up AI outputs, but said more will likely be shared “in the relatively near future”.
Nvidia’s graphics processing units, or GPUs, remain the gold standard for AI, particularly for training more advanced models. But a growing number of up-and-comers are vying to take on the chipmaker for inference uses, including by offering chips meant to cut down response times for chatbots and AI agents. Last month, Nvidia began selling a chip intended for faster inference based on technology it acquired from Groq as part of a reported US$20 billion licensing deal.
Google brings unique strengths to that competitive landscape, including a decade of experience designing chips, vast resources from its online search profits and firsthand insights on AI models. Among the top AI developers, only Google makes its own chips at a significant scale, allowing it to share vital feedback between teams to better customise hardware. (OpenAI is only now starting to design its own.)
In a recent podcast interview, Nvidia’s Jensen Huang stressed the advantages of his company’s chips, saying they can do “a whole bunch of applications” that “you can’t do with TPUs”. Google, for its part, relies on a mix of TPUs and GPUs for its own work. “A lot of people would like to run on both,” Demis Hassabis, CEO of Google DeepMind, told Bloomberg. Interest in TPUs is particularly high from leading AI labs, he said.
Google has previously touted inference capabilities for its chips. It also considered releasing separate chips for training and inference early on, according to Partha Ranganathan, a vice-president and engineering fellow at Google, but so far it’s resisted that approach. That might change soon as the AI spending boom moves from training to inference.
“The battleground is shifting towards inference,” said Chirag Dekate, an analyst at Gartner, who notes that in his experience, Google’s Gemini model is the fastest at responding to complex reasoning tasks. “In that battleground, Google has an infrastructure advantage.”
Already, today’s TPUs are a strong choice for processing results for the emerging crop of AI agents that field more complex work on a user’s behalf, according to Natalie Serrino, co-founder at Gimlet Labs, a startup that makes software for routing AI tasks to the best chip for each job. “They are very good tools for the workload that is exploding,” she said.
An overnight success that took a decade
Google’s long-simmering chip efforts gained new attention in October when Anthropic PBC, one of the most closely watched AI developers, unveiled an expanded agreement to access as many as one million TPUs. The next month, Google debuted the more advanced Gemini 3 model, trained and run on TPUs, to rave reviews.
Since then, demand for Google’s chips has only grown among large firms. Meta Platforms signed a multibillion-dollar deal to use TPUs through Google Cloud over several years. The company just received access to its first significant supply and is testing them out to see what tasks they are best suited for, said Santosh Janardhan, Meta’s head of infrastructure. “It does look like there might be inference advantages,” he said, while noting that “no new platform is without hurdles and a learning curve”.
Anthropic also signed a deal with Broadcom, Google’s TPU partner, for chips that will enable it to tap into about 3.5 gigawatts of computing power starting in 2027. Citadel Securities plans to present at the Google conference about how TPUs let the company train models faster than previous work with GPUs. And G42, the Abu Dhabi technology conglomerate, has held “multiple discussions” with Google about using its TPUs, according to Talal Al Kaissi, the interim CEO of Core42, the firm’s cloud unit. “I’m very bullish,” Al Kaissi said about the talks.
Google is already taking new steps to meet customers where they are. The company is testing out letting companies such as Anthropic run some of their TPUs in their own data centres rather than Google’s facilities, according to a source familiar with the matter. It has also enabled TPU customers to use outside tools such as PyTorch as well as other scheduling software rather than solely relying on Google’s products, Vahdat said.
Those changes are helping shift perception for chips that were born out of Google’s computing bottlenecks and long thought of as primarily useful for the company to meet its own needs.
After Dean, Google’s chief scientist, started building an early AI software system to power language translation and voice recognition services, he realised there was no way that even Google could afford to deliver it using the chips and hardware then available. At the same time, the central processing units Google relied on for AI were improving at a slower rate.
The company decided it should build an accelerator that focused on a narrower set of tasks that might rack up the biggest bills for AI. The key idea behind the TPU is that it “solves a small number of problems but the amount of computation required for them was enormous”, said Vahdat, a former computer science professor who played an early, key role in pushing Google to adopt the optical switches that help connect TPUs into supercomputers. “The conventional wisdom at the time was you don’t build specialised hardware.”
Over the years, Google’s TPUs have evolved alongside its AI work. A seminal 2017 Google research paper that gave rise to today’s large language models also pushed the TPU team to focus on chips for training bigger AI systems. Later, Google DeepMind and the chips team noticed that TPUs were sitting unused too often when deployed for reinforcement learning, a popular method for improving AI systems at specific tasks. The TPU team adjusted how they network various semiconductors to get the data flowing faster and avoid chips sitting idle.
That dynamic continues today as Google debates how many chips to link together in a single pod or whether the hardware can be less precise in order to save money. “A lot of those things are informed by the model experiments,” Hassabis said. In the future, he would love the TPU team to consider making an accelerator for edge-of-network cases, where the chip is placed closer to users, rather than being accessed via the cloud to reduce latency.
Along the way, Google has also built systems to more rapidly spot manufacturing flaws that can have an outsize impact on software. When working with AI accelerator chips that manage massive amounts of math, even a subtle failure can metastasise and cause a model to “completely self-destruct,” said Paul Barham, the Google distinguished scientist who co-leads the Gemini infrastructure team. An issue like that arose at Google about two years ago, and it took weeks to sort out what had happened, he said, describing these as “bugs from hell”.
“We now have to do that with hundreds of thousands of accelerator chips within 10 seconds,” he said.
The guessing game
For all its expertise in AI development, Google faces a similar challenge to other chipmakers: Chips usually take about three years to develop from start to finish, but AI models are evolving much faster. That makes it difficult to predict what customers will want several years out.
“If anybody claims they know what Gemini 10 is going to look like, I’m like, ‘Please give me whatever you are smoking’,” Ranganathan said.
Barham also worries that the tight feedback loop between the AI model creators and the hardware designers can run the risk of missing new ideas. There’s “this cycle that traps you into what works well on the current software and hardware”, he said.
To strike a middle ground, the TPU team sometimes aims for a chip that is good enough across various uses, even if it’s not perfect for any one of them. The other option, Vahdat said, is to plan two different designs. Neither is guaranteed to ship, but both could if the use case for each proves compelling enough.
As Google’s chips become more popular, the company risks supply constraints, not unlike Nvidia. One startup executive, who spoke on condition of anonymity to discuss internal matters, said their company’s use of TPUs has been limited by availability and complained that Google had effectively given all its chips to Anthropic.
“Mostly we are sort of favouring what supply we do have to the more elite teams who obviously are the ones that could maybe take the most advantage out of what the TPUs do best,” Hassabis said, referring to top AI firms. Going forward, Google will also need to decide how to allocate TPUs between its own growing slate of competitive AI services and its burgeoning roster of customers.
“There are benefits to making TPUs only for Google, but there are substantial downsides,” Vahdat said. “Eventually, you wind up on what we refer to as a tech island. It might be a beautiful island, but it’s going to be limited in population and it’s going to be limited in diversity. In the end, it’s probably going to be less good.” BLOOMBERG