Frontier labs keep releasing ever-better AI models with predictable regularity.
Google DeepMind’s latest move is the launch of Deep Research Max — a new autonomous research agent built on Gemini 3.1 Pro that the company says represents a material step forward in long-horizon research quality, benchmark performance, and enterprise readiness.
Two Agents, Two Use Cases
Google is introducing two distinct configurations. Deep Research replaces the preview agent released in December 2025, delivering lower latency and reduced cost — designed for real-time, interactive surfaces. Deep Research Max, the headline release, is built for depth over speed. It leverages extended test-time compute to iteratively reason, search, and refine its outputs, making it best suited for asynchronous workflows: think a nightly cron job that produces exhaustive due diligence reports by morning.
The distinction matters — Deep Research is a tool for end-user interfaces; Deep Research Max is infrastructure for enterprise pipelines.
Benchmark Numbers That Stand Out
On three industry-standard benchmarks, Deep Research Max posts results that outpace both its predecessor and competing models:
- DeepSearchQA (comprehensive web research): 93.3%, versus 81.8% for the new Deep Research, 66.1% for the December release, and 88.5% for GPT 5.4 Thinking (xhigh)
- Humanity’s Last Exam (reasoning and knowledge): 54.6%, versus 50.4% for Deep Research, 46.4% for the December release, and 53.4% for GPT 5.4
- BrowseComp (locating hard-to-find facts): 85.9%, versus 61.9% for Deep Research, 59.2% for the December release, and 58.9% for GPT 5.4

The gains over the December 2025 version are significant across the board, not marginal. Google also reports wins over Anthropic's Opus 4.6 on all three benchmarks, a result consistent with Gemini's building momentum in the broader AI market.
MCP Support and Native Visualizations
Beyond raw benchmark performance, the more consequential updates are functional. Deep Research can now connect to arbitrary remote MCP (Model Context Protocol) servers, transforming it from a web search engine into an agent capable of navigating proprietary and specialized data repositories — financial data providers, internal databases, custom document stores.
Partners already in the pipeline include FactSet, S&P, and PitchBook, with active collaboration underway on MCP server designs for financial workflows. For professionals in sectors like investment research or life sciences, this is the part of the announcement worth paying close attention to.
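As a rough illustration of what wiring an agent to remote MCP servers could look like: every field name below (`agent`, `mcp_server`, and so on) is an assumption made for the sketch, not the documented Gemini API shape.

```python
# Hypothetical sketch: a research-agent configuration that mixes remote
# MCP servers with open-web search. All field names are illustrative
# assumptions; consult the actual API documentation for the real schema.
research_config = {
    "agent": "deep-research-max",
    "tools": [
        # Each entry points the agent at a remote MCP server it may call,
        # e.g. a financial data provider or an internal document store.
        {"mcp_server": {"url": "https://mcp.example-data-provider.com",
                        "auth": "BEARER_TOKEN"}},
        {"mcp_server": {"url": "https://internal.example.com/mcp"}},
        # Open-web search can run alongside the proprietary sources.
        {"google_search": {}},
    ],
}

mcp_sources = [t for t in research_config["tools"] if "mcp_server" in t]
print(len(mcp_sources), "MCP servers configured")  # → 2 MCP servers configured
```

The design point is that proprietary repositories and the open web become interchangeable tool entries, so the same agent loop can draw on both in a single research run.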
The agent also now generates native charts and infographics inline — a first for Deep Research in the Gemini API. Reports are no longer purely textual; the system can now visualize complex datasets within the document itself, making outputs more immediately usable for stakeholder presentations.
Other Capability Upgrades
Additional features shipped alongside the main release. Users can review and modify the agent’s research plan before execution begins, giving granular control over scope. Deep Research can run simultaneously with Google Search, URL Context, Code Execution, and File Search — or with web access disabled entirely for fully private, internal-only research. The agent accepts PDFs, CSVs, images, audio, and video as input context. Intermediate reasoning steps stream live, useful for building interactive research products.
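Live-streamed intermediate steps imply an event-consumer loop on the client side, separating progress events (for a live UI) from the final report. A minimal sketch, with a stub generator standing in for the real stream and invented event types:

```python
# Hypothetical sketch of consuming streamed research events. The event
# types and the stub generator are illustrative, not the real API.

def stream_research_events():
    """Stub standing in for a live event stream from the agent."""
    yield {"type": "plan", "text": "Outline: 3 sub-questions"}
    yield {"type": "search", "text": "Querying source A"}
    yield {"type": "reasoning", "text": "Cross-checking figures"}
    yield {"type": "report", "text": "Final summary"}

def collect_report(events):
    """Split intermediate steps (for a progress UI) from the final report."""
    steps, report = [], None
    for event in events:
        if event["type"] == "report":
            report = event["text"]
        else:
            steps.append((event["type"], event["text"]))
    return steps, report

steps, report = collect_report(stream_research_events())
print(len(steps), report)  # → 3 Final summary
```

An interactive research product would render `steps` as they arrive rather than buffering them, but the split between intermediate reasoning and final output is the same.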
The Bigger Picture
Google has been on a sustained upward trajectory in the generative AI market. Deep Research Max is not a consumer product — it’s a developer and enterprise offering available via the Gemini API’s Interactions API on paid tiers. But it signals where Google is placing its bets for enterprise AI: not just capable models, but end-to-end autonomous research infrastructure that plugs into real workflows.
The same infrastructure, Google notes, already powers research capabilities inside the Gemini App, NotebookLM, Google Search, and Google Finance. Deep Research Max is, in effect, a productized version of that research stack — now available via API to developers building in finance, life sciences, and market research.
Whether the benchmark numbers translate to production-grade utility at scale remains to be seen. But the architecture — blending open web search with proprietary data streams, native visualizations, and MCP extensibility — is a serious enterprise proposition, not just a benchmark play.