Gemini 3.5 Flash Places Fifth On Artificial Analysis Intelligence Index, Is Faster Than Other Models

Google hasn’t released Gemini 3.5 Pro yet — it said it would release it next month — but it’s released a competent new model in Gemini 3.5 Flash.

Announced at Google I/O 2026, Gemini 3.5 Flash scores 55 on the Artificial Analysis Intelligence Index, placing it fifth overall. GPT-5.5 leads the index at 60, followed by Claude Opus 4.7 and Gemini 3.1 Pro Preview at 57 each, and GPT-5.4 at 57 as well. The model sits ahead of Grok 4.3 (high) at 53 and Claude Sonnet 4.6 (max) at 52. For a Flash-tier model — historically Google’s faster, cheaper alternative to its Pro lineup — that’s a striking result.

A Step Change in Agentic Performance

The headline improvement is in agentic capability, which has traditionally been a relative weakness for Gemini models. On GDPval-AA, Artificial Analysis’s benchmark for real-world economically valuable tasks, Gemini 3.5 Flash achieves an Elo of 1,656 — well ahead of Gemini 3 Flash (1,204) and Gemini 3.1 Pro (1,314), and just behind GPT-5.4 (xhigh) at 1,674. It also posts strong gains on Tau2-Bench Telecom, an agentic tool-use benchmark.

That GDPval-AA result is notable in context. When Gemini 3.1 Pro launched in February, it still trailed Claude Sonnet 4.6 and Claude Opus 4.6 on agentic evaluations. Gemini 3.5 Flash clears both of them. Google appears to have made a deliberate architectural push toward long-horizon tasks: the model can plan across large codebases, deploy subagents in parallel, and sustain complex multi-step workflows. The tradeoff is that it uses an average of 49 turns per task — one of the highest recorded, and ahead of Claude Opus 4.7 (max) at 45 turns and GPT-5.4 (xhigh) at 40.

Speed Is the Real Story

Where Gemini 3.5 Flash is unambiguously strong is on the speed-intelligence frontier. At over 280 output tokens per second, it is roughly 70% faster than Gemini 3 Flash and significantly faster than GPT-5.4 mini (xhigh). On Artificial Analysis’s intelligence-vs-speed scatter plot, it sits firmly in the “most attractive” upper-right quadrant — high intelligence, high speed — alongside Gemini 3.1 Pro and Gemini 3.1 Flash-Lite. Google’s dominance on this frontier is becoming a structural advantage. No other lab has multiple models on the Pareto line.

The Multimodal Lead Extends

Google continues to lead on multimodal benchmarks. Gemini 3.5 Flash scores 84% on MMMU-Pro, the highest recorded on that evaluation, nudging past Gemini 3.1 Pro’s 82%. It supports image, video, and speech input alongside text — a broader modality set than Claude Opus 4.7, Grok 4.3, and GPT-5.5, which handle image input only.

Hallucinations Down, But Not Gone

On AA-Omniscience, Artificial Analysis’s knowledge and hallucination benchmark, Gemini 3.5 Flash improves by 11 points. Its hallucination rate falls to 61%, a 31-point drop from Gemini 3 Flash’s 92%. That’s a meaningful improvement, though it still trails GPT-5.5’s 86% hallucination rate — which itself remains a concern for knowledge-intensive enterprise deployments — and Claude Opus 4.7’s 36%.

The Cost Problem

The major caveat is price. Gemini 3.5 Flash is priced at $1.50 per million input tokens and $9 per million output tokens — a 3x increase over Gemini 3 Flash’s $0.50/$3.00. Combined with higher input token usage from longer agentic turn counts, the total cost to run the Artificial Analysis Intelligence Index comes to $1,552 — 5.5x more expensive than Gemini 3 Flash, and 75% more than Gemini 3.1 Pro. It costs roughly $200 more to run than GPT-5.4 mini (xhigh), but scores 7 intelligence index points higher.

For enterprises running high-volume workloads, the jump from Gemini 3 Flash will be jarring. Gemini 3.5 Flash is no longer a budget option. It’s a mid-tier model priced above its predecessor’s Pro sibling. Whether the agentic gains justify the cost will depend heavily on use case. Google offers a 90% discount on cached input tokens, which could help applications where context is heavily reused.

What Comes Next

Gemini 3.5 Flash is a genuinely capable release — it out-agentifies Gemini 3.1 Pro, leads on multimodal benchmarks, and sits on the speed-intelligence Pareto frontier. But it is a Flash model competing against Pro models, and it doesn’t top the overall index. GPT-5.5 leads at 60, and Gemini 3.5 Pro — when it arrives — will be the real test of whether Google can reclaim the intelligence crown. The Flash release suggests the Pro model will be formidable. For now, Google’s Flash tier has effectively caught up to the previous generation of frontier Pro models, which is itself a statement about how quickly AI is moving.

Posted in AI