Thus far, most AI adoption has happened through cloud services, but Google’s latest Gemma 4 model seems to be bringing on-device AI to the mainstream.
Google’s AI Edge Gallery app has climbed to #8 among the top downloaded productivity apps on Apple’s App Store — a signal of mainstream curiosity about running AI models directly on personal devices. Google’s Developer Relations head Logan Kilpatrick noted the milestone on social media: “Lots of people want Gemma 4! Google AI Edge is #8 on the iOS App Store for productivity apps.”

What Is Google AI Edge Gallery?
Google AI Edge Gallery is an open-source app available on both iOS and Android that lets users download and run large language models entirely on their devices — no internet connection required. It is effectively Google’s showcase for what on-device AI can do today.
The latest version of the app centers on Gemma 4, Google DeepMind’s newest family of open-weight models. Google describes these as “byte for byte, the most capable open models” it has built. The Gemma series has now crossed 400 million downloads and 100,000 community variants since its first release — and Gemma 4 represents a significant leap in capability and accessibility.
The Edge Models: Designed For Your Pocket
Gemma 4 ships in four variants. The two designed for smartphones — the Effective 2B (E2B) and Effective 4B (E4B) — are built specifically for low-latency, offline inference on mobile hardware. Google engineered these models to run in under 1.5GB of memory in some configurations, thanks to 2-bit and 4-bit weight quantization. They support a 128K context window, multimodal input (text, images, and audio), and over 140 languages.
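A rough back-of-the-envelope calculation (my own arithmetic, not a figure from Google) shows why low-bit quantization matters for that memory budget: weight storage scales linearly with bits per weight, so dropping from 16-bit to 2-bit weights cuts an 8GB footprint to about 1GB, before counting activations and the KV cache.

```python
# Back-of-the-envelope estimate of weight memory for a quantized model.
# The 4-billion-parameter count is an illustrative assumption, not an
# official spec for the E4B model.

def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate memory needed to hold the weights alone, in GB."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 1e9

for bits in (16, 4, 2):
    print(f"4B params @ {bits}-bit: {weight_memory_gb(4e9, bits):.2f} GB")
# 16-bit: 8.00 GB, 4-bit: 2.00 GB, 2-bit: 1.00 GB
```

This is weights only; a real deployment also needs room for activations and the context cache, which is consistent with the sub-1.5GB figure applying only "in some configurations."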
In demonstrations, both models responded with near-zero latency even in airplane mode — meaning all processing happened on the device itself, with no data leaving the phone.
The models were developed in close collaboration with Google’s Pixel team, Qualcomm Technologies, and MediaTek, and are optimized for the latest generation of mobile AI accelerators.
What Users Can Do In The App
The app offers several modes out of the box:
- AI Chat with Thinking Mode — multi-turn conversations with a visible step-by-step reasoning trace, available for Gemma 4 models
- Ask Image — multimodal queries using the device camera or photo library
- Audio Scribe — real-time voice transcription and translation
- Agent Skills — an agentic layer that gives the model access to tools like Wikipedia and interactive maps, enabling multi-step autonomous workflows entirely on-device
- Prompt Lab — a sandbox for testing prompts with control over parameters like temperature
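The temperature knob exposed in a sandbox like Prompt Lab conventionally rescales the model's output logits before sampling — lower values sharpen the distribution toward the most likely token, higher values flatten it. A minimal sketch of that standard mechanism (my illustration of the general technique, not the app's actual code):

```python
import math
import random

def sample_with_temperature(logits, temperature, rng=None):
    """Sample a token index from logits rescaled by temperature.

    Low temperature -> near-deterministic (argmax-like) output;
    high temperature -> flatter distribution, more varied output.
    """
    rng = rng or random.Random(0)  # fixed seed for a reproducible demo
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Inverse-CDF sampling over the softmax distribution
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r <= acc:
            return i
    return len(probs) - 1

# At very low temperature, sampling collapses to the argmax token:
print(sample_with_temperature([2.0, 0.5, -1.0], temperature=0.01))  # → 0
```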
The Agent Skills feature is particularly notable: it represents one of the first implementations of multi-step agentic AI running fully offline on a consumer phone.
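An agentic layer of this kind is typically a loop: the model either answers directly or emits a tool call, the runtime executes the tool, and the result is appended to the conversation for the next model turn. A minimal offline sketch with a stubbed model and a stubbed "wikipedia" tool (both placeholders I've invented for illustration, not the app's real implementation):

```python
# Minimal agent loop sketch. `fake_model` and the stub tool stand in for
# a real on-device LLM and the app's actual tool integrations.

TOOLS = {
    "wikipedia": lambda query: f"[stub article text about {query}]",
}

def fake_model(messages):
    """Stand-in for an LLM: request a tool once, then answer from its result."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "wikipedia", "args": "Mount Everest"}
    return {"answer": "Summary based on: " + messages[-1]["content"]}

def run_agent(question, max_steps=5):
    messages = [{"role": "user", "content": question}]
    for _ in range(max_steps):
        out = fake_model(messages)
        if "answer" in out:
            return out["answer"]
        # Execute the requested tool and feed the result back to the model
        result = TOOLS[out["tool"]](out["args"])
        messages.append({"role": "tool", "content": result})
    return "step limit reached"

print(run_agent("How tall is Mount Everest?"))
```

The notable point is that nothing in this loop requires a network round-trip: if the model and the tools (a local Wikipedia snapshot, offline maps) live on the device, the whole multi-step workflow can run in airplane mode.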
Why The Apache 2.0 License Matters
One of the most consequential decisions Google made with Gemma 4 was releasing it under an Apache 2.0 license — a commercially permissive open-source license that lets businesses use, modify, and build on the models, subject only to its attribution and patent-grant conditions. Previous Gemma releases came with more limiting terms. Hugging Face co-founder Clément Delangue called the licensing shift “a huge milestone.”
For enterprises and developers, this removes a significant barrier to deploying Gemma 4 in production applications.
The Bigger Picture: On-Device AI Goes Mainstream
The app’s App Store ranking is more than a vanity metric. It reflects a genuine and growing appetite for AI that runs locally — driven by concerns over data privacy, the desire for offline functionality, and the appeal of faster, cheaper inference that doesn’t depend on cloud APIs.
Google’s move comes as the broader on-device AI market heats up. Apple has been expanding Apple Intelligence, and several startups are building local-first AI tools. But Google is one of the few companies with the hardware partnerships, open-model ecosystem, and developer tooling to make on-device AI genuinely accessible to non-developers.
The 31B Dense variant of Gemma 4 currently ranks third among all open models on the Arena AI chat leaderboard, while the 26B MoE sits sixth, reportedly outperforming models twenty times its size. That the smaller, mobile-optimized siblings are powering a top-10 App Store hit is a sign that Google’s bet on the edge is beginning to pay off.
For developers, Google’s LiteRT-LM runtime — available on Linux, macOS, Raspberry Pi, and Android — provides a path from experimentation in the app to production deployment across a much broader range of hardware.
The era of AI running silently on your pocket device, with no server involved, is no longer a research demo. It just hit #8 on the App Store.