Google DeepMind has launched Gemini Omni Flash — the first model in a new family designed to collapse video, image, and audio generation into a single system.
Announced at Google I/O 2026, Omni Flash is now accessible in the Gemini app, Flow by Google, and YouTube Shorts, with API access expected in the coming weeks.
The release marks a structural change. Until now, Google ran a split-model architecture: Veo for video, separate systems for image generation. Omni merges that stack, combining Gemini’s intelligence with Google’s generative media systems to handle creation across modalities from a single model. The goal, in DeepMind’s framing, is a model that can “create anything from anything.”

What Omni Flash Can Do
The announced capabilities go well beyond an incremental improvement in video quality.
Physics and world understanding. The model brings an improved grasp of how environments behave: actions have consequences, objects respond to events, and narratives evolve in ways that are logically consistent. Getting physics right in generated video is a genuinely hard problem; early outputs show the model handling cases that have tripped up prior models, such as math written on a blackboard, where the equations are correct and the video is lifelike.
Consistent characters. Define a character once and Omni maintains visual consistency across locations, lighting conditions, and actions. This has been a stubborn limitation of video generation models, and solving it meaningfully expands the practical uses — from marketing to storytelling to game prototyping.
Reference-based editing. Users can apply styles, motion, or effects using input references or plain language. The model accepts natural language instructions to change environments, add objects, or reimagine existing footage. Point your phone at something; Omni can transform it.
Video remixing. Perhaps the most consumer-facing capability: take a video you shot and ask Gemini Omni to reimagine the action in it. This is real-time, conversational editing — no timeline, no keyframes, no exports.
Where It Sits Competitively
Google has been building toward this with a clear trajectory. Gemini 3 Deep Think topped the ARC-AGI-2 benchmark at 84.6%. Gemini 3.1 Pro posted leading scores on 13 of 16 evaluated benchmarks. Veo 3.1 is already considered one of the strongest video generation models available. Omni builds on all of that — but the structural shift is what matters.
The competition isn’t just about benchmark rankings anymore. OpenAI has Sora. Runway has Gen-4.5. ByteDance’s Seedance has ranked highly on public evaluations. But Omni isn’t trying to win a video-model leaderboard; it’s positioning Google as the single destination for AI-native content creation across formats.
That is a play only Google can credibly attempt. As Demis Hassabis has noted, the company’s ecosystem (Search, YouTube, Workspace, Chrome) gives it integration surfaces no pure-play AI startup can match. Omni landing inside YouTube Shorts and the Gemini app on day one is not incidental. It is the strategy.
The “Flash” Naming
Calling this Omni Flash, rather than just Omni, signals a deliberate tiering of the product line. Flash, in Google’s naming convention, has historically meant a faster, more efficient model optimized for consumer-facing deployment (as with Gemini 1.5 Flash). Positioning Flash as the first in the Omni family suggests a full-capability Omni model is in development, likely targeting enterprise and API users who need heavier lifting.
What Comes Next
API access is expected in the coming weeks. That will let developers build on top of Omni’s generation and editing capabilities, particularly the character consistency and remixing features, which have obvious applications in advertising, entertainment, and education.
Google DeepMind’s track record of shipping ambitious generative systems, from Genie 3’s interactive world models to Gemini’s multimodal reasoning, suggests this is more than a demo. Omni Flash is live, and the broader Omni family appears to be the vehicle through which Google intends to compete in the generative AI era.