We're A Bit Behind In Agentic Coding With Tool Use: Google CEO Sundar Pichai

Coding has been the killer use case of AI so far, and Google appears to acknowledge that it’s not quite at the frontier in this particular area.

In a candid interview on the Hard Fork podcast, Google CEO Sundar Pichai admitted that while Google’s models are competitive across most dimensions, the company lags behind in agentic coding with tool use — one of the most commercially consequential battlegrounds in AI right now.

Pichai was measured in assessing where Google stands. “Our models are at the frontier in some areas, and there are areas where we are behind the frontier — it’s a combination. If you look at overall capabilities, including text, multimodality, voice or audio, and reasoning in general, I think we are very capable. When it comes to agentic coding with tool use, or instruction following, long-horizon tasks — I think we are a bit behind at this moment,” he said.

Pichai attributed the gap to the lack of a product surface through which to gather feedback. “Coding was the area where getting access to data flows was important. We maybe quite didn’t have the surface, like Claude Code as an example, or what Anthropic had with Cursor. Getting Antigravity with 2.0, we’ve been using it internally at Google for a while. I shared the token usage at Google I/O. I’ve never seen anything like it internally; we are doubling every week, and people are really putting the models to work. That is helping us hill climb quite a bit.”

He remained bullish on Google’s trajectory, pointing to Gemini 3.5 Flash as a meaningful step forward: “I think we took a big step forward with 3.5 Flash. It does address some of the areas where we have been behind. Getting it out in the real world and iterating with data coming back is going to really help us.”

Pichai also pushed back against the notion that short-term rankings define the race: “The space is so dynamic. All leading labs have their own pre-training cycles, so you have these cadences, and they may not exactly match up. The moment is intense enough that if you’re slightly off — three months ago, people were saying ‘we are ahead and no one can catch up,’ and now the conversation flips. But that’s part of the territory of being at the frontier. We are the only large company actually at that frontier. There are a couple of startups which have made extraordinary progress, and we have been deeply working on this for a long time. I’m very, very optimistic and confident we’ll push through.”

Pichai’s remarks reflect a broader, well-documented pattern. On Code Arena’s agentic webdev leaderboard, Anthropic’s models currently hold the top two spots, with Gemini 3.1 Pro at rank 8. Similarly, in real-world agentic evaluations, Claude models and GPT-5.2 have retained an edge over Google’s models even when Gemini leads on raw benchmarks. This gap between benchmark performance and the agentic, tool-use tasks that actually ship software has been Google’s persistent blind spot.

The Antigravity platform Pichai references is Google’s answer to Cursor and Claude Code. Launched in November 2025, Google Antigravity bills itself as an “agentic development platform for the agent-first era.” Notably, Google simultaneously invested in Cursor, a direct competitor, underscoring just how strategically fraught this space is for the company.

The good news for Google, as Pichai suggests, is that Gemini 3.5 Flash, announced at Google I/O 2026, shows real progress. Purpose-built for agents and long-horizon tasks, it reportedly outperforms Gemini 3.1 Pro on coding and agentic benchmarks, a notable result for a model in the Flash tier. The race, as Pichai puts it, is dynamic enough that today’s gap can close fast, but only if Google keeps getting the real-world feedback loops that agentic coding tools provide.