Even as Google's newly released Gemini 3.1 tops intelligence benchmarks, the company is also innovating on its open-model Gemma series.
Google DeepMind has launched Gemma 4, the latest generation of its open-weight model family, claiming it delivers the highest intelligence-per-parameter of any open model available. “Byte for byte, the most capable open models,” the company said in its announcement. Demis Hassabis, Google DeepMind CEO, called them “the best open models in the world for their respective sizes.” Hassabis had previously teased the release by posting four cryptic diamonds in an X post.
Four Sizes, Two Strategies
Gemma 4 ships in four variants: Effective 2B (E2B), Effective 4B (E4B), a 26B Mixture-of-Experts (MoE), and a 31B Dense model. Each targets a distinct use case.
The 31B Dense model currently ranks third among all open models on Arena AI’s chat leaderboard, while the 26B MoE sits at sixth — reportedly outcompeting models 20 times its size. The 26B activates only 3.8 billion parameters during inference, prioritizing low latency, while the 31B maximizes raw quality and serves as a stronger fine-tuning base.
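The latency advantage of the MoE design follows from a common rule of thumb: per-token decode compute scales roughly with the number of *active* parameters, not the total. The sketch below works that arithmetic out for the figures quoted above; the 2×-parameters FLOPs estimate is a standard approximation, not an official Gemma 4 figure.

```python
# Rough rule of thumb (an approximation, not an official figure):
# per-token decode compute is about 2 FLOPs per active parameter.
dense_active = 31e9   # 31B Dense: all parameters active per token
moe_active = 3.8e9    # 26B MoE: only 3.8B parameters active per token

flops_dense = 2 * dense_active
flops_moe = 2 * moe_active
ratio = flops_moe / flops_dense

print(f"MoE per-token compute vs. 31B dense: {ratio:.0%}")  # → 12%
```

By this estimate the 26B MoE does roughly an eighth of the dense model's per-token work, which is why it prioritizes latency while the 31B Dense maximizes quality.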

The E2B and E4B models are engineered for mobile and IoT. Built in collaboration with Google’s Pixel team and hardware partners including Qualcomm and MediaTek, both run fully offline on edge devices — phones, Raspberry Pi, and NVIDIA Jetson Orin Nano — with near-zero latency. Android developers can prototype agentic flows today via the AICore Developer Preview.
What’s New
All Gemma 4 models support multimodal input (video and images), context windows from 128K tokens (edge models) to 256K tokens (larger models), and native training coverage of 140+ languages. The E2B and E4B models add native audio input for speech recognition.
On the capability side, Google highlights multi-step reasoning, native function-calling and structured JSON output for agentic workflows, and offline code generation. The larger models’ unquantized weights fit on a single 80GB NVIDIA H100 GPU; quantized versions run on consumer GPUs.
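The agentic pattern Google describes, where the model emits structured JSON naming a tool and its arguments, and the host application dispatches the call, can be sketched in a few lines. The JSON shape, tool name, and model reply below are illustrative assumptions, not Gemma 4's actual function-calling format.

```python
import json

# Hypothetical tool the application exposes to the model.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"

# Registry mapping tool names (as the model would emit them) to callables.
TOOLS = {"get_weather": get_weather}

# An illustrative structured-output reply; the real schema may differ.
model_reply = '{"tool": "get_weather", "arguments": {"city": "Paris"}}'

# Parse the model's JSON and dispatch to the named tool.
call = json.loads(model_reply)
result = TOOLS[call["tool"]](**call["arguments"])
print(result)  # → Sunny in Paris
```

In a real agent loop, `result` would be fed back to the model as a tool response for the next reasoning step.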
“On GPQA Diamond, our scientific reasoning evaluation, Gemma 4 31B (Reasoning) scores 85.7%, the second highest result we have recorded for an open-weights model with fewer than 40B parameters, just behind Qwen3.5 27B (Reasoning, 85.8%),” said Artificial Analysis. “It reaches this score using only ~1.2M output tokens, fewer than Qwen3.5 27B (~1.5M) and Qwen3.5 35B A3B (~1.6M). Gemma 4 26B A4B (Reasoning) scores 79.2%, ahead of gpt-oss-120B (high, 76.2%) but behind Qwen3.5 9B (Reasoning, 80.6%),” it added.

The Gemma series has built a strong track record of enabling specialized derivatives. MedGemma, built on Gemma 3, has been used for medical imaging and report generation. Google also released DolphinGemma for dolphin vocalization analysis and SignGemma for sign language translation — illustrating how the open model architecture enables applications well beyond general-purpose chat.
Apache 2.0 and Ecosystem
In a direct response to developer feedback, Gemma 4 is released under an Apache 2.0 license — a shift from the more restrictive terms on earlier Gemma releases. Hugging Face co-founder Clément Delangue called it “a huge milestone.” The models are available immediately on Hugging Face, Kaggle, and Ollama, with support from vLLM, llama.cpp, MLX, LM Studio, Unsloth, and others on day one.
For scale, Google Cloud users can deploy through Vertex AI or GKE with TPU-accelerated serving.
Context
The Gemma series has crossed 400 million downloads and 100,000 community variants since its first release. Gemini 3 had established Google at the top of the proprietary model leaderboards, and Gemini 3.1 extended that lead. Gemma 4 signals that Google is investing equally in the open-weight ecosystem, targeting developers who need on-device performance, data sovereignty, and fine-tuning flexibility — areas where proprietary cloud models can’t compete.
The 400 million downloads suggest strong developer adoption, though as OpenRouter usage data indicated, Google's open models have historically lagged Meta's Llama and DeepSeek in actual deployment. Gemma 4's combination of competitive benchmark performance, an Apache 2.0 license, and strong mobile-first engineering could make a more compelling case for the ecosystem than any previous Gemma generation.