Most frontier AI labs are working on the best text, image, or video generation models, but trust Google DeepMind to come up with something completely different.
Google DeepMind has launched its Genie 3 world model. The model can create “worlds” from a single prompt, such as a world on Mars or one under the ocean. Users can navigate these worlds for several minutes, as they would in a video game.
Unlike video games, in which the game world is decided and coded in advance, the worlds generated by Genie 3 are created on the fly. The user can navigate through these worlds, walk along surfaces, and even interact with them. The Genie 3 model also has memory, which means it remembers any action you’ve taken in a world: if you’ve painted on a wall, then pan away and look back at the wall, the patch of paint you added is still there.
The worlds also appear to follow physics. In a Genie 3 world that simulates a buggy on Mars, the buggy bounces realistically over craters and responds to the terrain. In a world set next to a river, the water splashes realistically when a jet ski goes through it.
Google already had a similar world model named Genie 2, but Genie 3 is better in several respects. Genie 2 created worlds in 360p resolution, while Genie 3’s worlds are rendered in 720p. Genie 2 only created game-specific worlds, while Genie 3 can create any kind of world. And most crucially, while Genie 2’s interaction horizon was only 10-20 seconds, Genie 3’s worlds can last for several minutes.
Google already has an extremely capable video generation model in Veo3, but Genie 3 differs in several ways. Veo3 can only create 8-second videos, while Genie 3’s worlds last a lot longer. Also, Veo3’s videos are static, while Genie 3’s worlds let users interact and play with them, much like in a video game.
Technology like Genie 3 could radically transform the video game industry. Right now, game developers spend years building realistic game worlds. Once it is sufficiently advanced, a model like Genie 3 could create such worlds on demand from simple prompts. These worlds could come in radically different styles, and players could even create their own.
It’s because of projects like these that Google might end up being the big winner in the AI space. Google already has capable text, code and video generation models, and its vast resources and decades-long focus on AI have enabled it to build projects that have no equivalent at any other lab. Apart from Genie 3, Google has released a model that interprets sign language, and one that talks to dolphins. It only takes one of these models to succeed for Google to benefit disproportionately. And Genie 3, which could upend the entire video game industry, could be a strong contender.