Google not only has the top AI model on benchmarks in Gemini 3.1 Pro and the top on-device model in Gemma 4; it has also released a top model for robotics.
Google has unveiled Gemini Robotics-ER 1.6, a significant upgrade to its embodied reasoning model that enables robots to understand their physical environments with far greater precision than before. The model is now available to developers via the Gemini API and Google AI Studio.
What’s New
Gemini Robotics-ER 1.6 is designed to serve as the high-level reasoning brain for a robot, capable of executing tasks by natively calling tools like Google Search, vision-language-action models (VLAs), and user-defined functions. The 1.6 release builds on its predecessor, Gemini Robotics-ER 1.5, with meaningful improvements across three core capabilities: spatial pointing, success detection, and an entirely new skill, instrument reading.
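That orchestration pattern, a reasoning model emitting tool calls that get routed to search, a VLA, or user code, can be sketched locally. Everything below is hypothetical: the stub tools and the hard-coded plan stand in for the model's actual function-calling interface, which is documented in the Gemini API.

```python
from typing import Callable

# Registry of tools the high-level planner is allowed to call.
TOOLS: dict[str, Callable] = {}

def tool(fn: Callable) -> Callable:
    """Register a function under its own name."""
    TOOLS[fn.__name__] = fn
    return fn

@tool
def search_web(query: str) -> str:
    # Stub standing in for a Google Search tool call.
    return f"results for {query!r}"

@tool
def execute_skill(skill: str, target: str) -> str:
    # Stub standing in for handing a low-level action to a VLA.
    return f"VLA executed {skill} on {target}"

def dispatch(call: dict) -> str:
    """Route a model-emitted tool call to the registered function."""
    return TOOLS[call["name"]](**call["args"])

# In practice the model produces these steps; here they are hard-coded.
plan = [
    {"name": "search_web", "args": {"query": "torque limit for M6 bolt"}},
    {"name": "execute_skill", "args": {"skill": "pick", "target": "wrench"}},
]
results = [dispatch(step) for step in plan]
```

The point of the registry-plus-dispatch shape is that the planner never touches robot hardware directly; it only names tools, and the runtime decides what each name actually does.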
Pointing may sound basic, but it’s foundational to how a robot reasons about the world. The model uses points as intermediate steps to count objects, identify grasp points, map motion trajectories, and reason through constraint-based prompts like “point to every object small enough to fit inside the blue cup.” In benchmark testing, ER 1.6 correctly identified the number of hammers, scissors, paintbrushes, and pliers in a cluttered scene and, crucially, did not hallucinate objects that weren’t there. Its predecessor, ER 1.5, failed on several of those same counts and hallucinated a wheelbarrow that wasn’t in the image.
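Earlier Gemini Robotics-ER releases returned points as JSON lists of normalized [y, x] coordinates on a 0–1000 scale. Assuming ER 1.6 keeps that convention (check the current API docs), mapping the model's reply onto an actual image frame is a small transformation:

```python
import json

def to_pixels(points_json: str, width: int, height: int) -> list[tuple]:
    """Convert normalized [y, x] points (0-1000 scale) to (label, x_px, y_px)."""
    points = json.loads(points_json)
    out = []
    for p in points:
        y, x = p["point"]  # note the documented y-first ordering
        out.append((p["label"], round(x / 1000 * width), round(y / 1000 * height)))
    return out

# A reply shaped like the ER 1.5 pointing format (illustrative values):
reply = '[{"point": [500, 250], "label": "blue cup"}]'
print(to_pixels(reply, width=640, height=480))  # [('blue cup', 160, 240)]
```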
Success detection, knowing when a task is actually finished, is equally critical for autonomous operation. Most real-world robotics setups use multiple camera streams simultaneously (e.g., overhead and wrist-mounted), and the model must reason across all of them coherently, even in occluded or poorly lit conditions. ER 1.6 advances multi-view reasoning substantially, allowing robots to combine these feeds into a coherent, moment-by-moment picture of task completion.
The Standout: Instrument Reading
The headline new capability is instrument reading, and it’s a direct result of Google’s partnership with Boston Dynamics.
Industrial facilities are packed with instruments (pressure gauges, thermometers, chemical sight glasses) that require constant monitoring. Spot, Boston Dynamics’ quadruped robot, already roams facilities capturing images of these instruments. The missing piece was a model smart enough to actually interpret them.
Instrument reading is harder than it sounds. A robot must precisely perceive needles, liquid levels, tick marks, and container boundaries, and then combine that with world knowledge to interpret units, account for camera distortion, and even handle gauges with multiple needles referring to different decimal places.
ER 1.6 tackles this through agentic vision, a combination of visual reasoning and code execution. The model zooms into an image to resolve fine detail, uses pointing and code execution to estimate proportions and intervals, and applies world knowledge to arrive at a final reading. The results are striking: Gemini Robotics-ER 1.5 managed only a 23% success rate on instrument reading; Gemini 3.0 Flash reached 67%; ER 1.6 hit 86%; and ER 1.6 with agentic vision enabled reached 93%.
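The final arithmetic step in that pipeline, turning a perceived needle angle into a value, reduces to linear interpolation between the gauge's endpoint ticks. The sketch below is a simplification (it assumes a linear, single-needle dial; the article notes real gauges can be multi-needle and harder than this):

```python
def gauge_reading(needle_deg: float, min_deg: float, max_deg: float,
                  min_val: float, max_val: float) -> float:
    """Linearly interpolate a dial reading from a needle angle.

    min_deg/max_deg: angles of the dial's first and last tick marks.
    min_val/max_val: the values printed at those ticks.
    """
    frac = (needle_deg - min_deg) / (max_deg - min_deg)
    return min_val + frac * (max_val - min_val)

# A pressure gauge sweeping from -45 deg (0 bar) to 225 deg (10 bar),
# with the needle perceived at 90 deg: halfway around the dial.
print(gauge_reading(90, -45, 225, 0.0, 10.0))  # 5.0
```

In the agentic-vision loop described above, the pointing and code-execution steps would supply the angles; this interpolation is only the last, easy part.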

“Capabilities like instrument reading and more reliable task reasoning will enable Spot to see, understand, and react to real-world challenges completely autonomously,” said Marco da Silva, Vice President and General Manager of Spot at Boston Dynamics.
This is a concrete, commercial-grade use case, not a demo. Facilities like oil refineries, chemical plants, and data centers have long relied on manual inspection rounds. A robot that can reliably read a pressure gauge autonomously changes that equation significantly.
Safety Is Baked In
Google is also positioning ER 1.6 as its safest robotics model to date. It shows improved compliance with safety policies on adversarial spatial reasoning tasks, and makes better decisions around physical constraints; for instance, correctly identifying which objects should not be picked up based on gripper or material limitations.
On tasks modeled after real-life injury reports, the Gemini Robotics-ER models improved over baseline Gemini 3.0 Flash performance by 6% in text-based scenarios and 10% in video-based ones. Safety in robotics isn’t an afterthought; a model that misjudges the weight of an object or ignores a constraint like “don’t handle liquids” can cause real-world damage.

The Bigger Picture
Google’s robotics ambitions have been building steadily. The company has already released an on-device robotics model that works without an internet connection, lowering the barrier for deployment in environments with limited connectivity. ER 1.6 is the cloud-side complement: a powerful reasoning engine that doesn’t need to run locally but can orchestrate complex multi-step physical tasks with the help of external tools and real-time web data.
Google’s broader AI momentum makes this release more significant. A year ago, Google was widely seen as lagging in the AI race. Today it holds top spots across general reasoning (Gemini 3.1), open models (Gemma 4), and now physical-world AI. The robotics space, unlike the crowded consumer chatbot market, is still early enough that a technically superior platform can define the ecosystem. If Google can make Gemini Robotics-ER the default reasoning layer for industrial and commercial robots, the downstream value could dwarf what’s happening in the AI assistant space.
The model is available now via the Gemini API, with a developer Colab notebook to help teams get started with embodied reasoning tasks.