Companies including Google and Adobe want you to see what generative AI can do next, even if you can’t access it.
We’ve seen generative AI tackle writing, imagery and video — now it’s coming for gaming and music.
Google’s DeepMind subsidiary teased yet another AI model, which a DeepMind research paper published Feb. 23 called “the first generative interactive environment trained in an unsupervised manner from unlabeled Internet videos.”
In other words, the model — called Genie — can create playable, virtual worlds.
Users can input text or images to generate “an endless variety of action-controllable 2D worlds.”
Examples shared in a Feb. 26 tweet by DeepMind’s Tim Rocktäschel include playable worlds that look as if they were built from clay, rendered in the style of a sketch or set in a futuristic city. Rocktäschel is DeepMind’s open-endedness team lead (open-ended algorithms are those that seek to solve increasingly complex tasks).
According to a DeepMind spokesperson, when a user selects an action in this action-controllable world model, Genie generates the next frame.
“It does not have any way to know which part of the image corresponds to the character,” she said. “Instead, Genie figures this out by itself during training time.”
The technology isn’t limited to 2D environments. Genie could, for example, generate simulations to be used for training “embodied agents such as robots,” the spokesperson added.
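The select-an-action, generate-the-next-frame loop the spokesperson describes is an autoregressive rollout: each user action conditions the next generated frame, which is then fed back in as context. The sketch below illustrates only that control flow; the `next_frame` stub, the tiny frame sizes and the integer actions are hypothetical stand-ins, not DeepMind’s actual model or API:

```python
import numpy as np

def next_frame(history, action, rng):
    """Stand-in for a learned world model: given the frame history and a
    discrete action, produce the next frame. A real system like Genie
    would run a neural network here; this stub just perturbs the last
    frame so the autoregressive control flow is visible."""
    last = history[-1]
    return np.clip(last + rng.normal(0, 0.01, last.shape) + 0.1 * action, 0, 1)

def rollout(first_frame, actions, rng):
    """Autoregressive rollout: each chosen action conditions the next
    generated frame, and every new frame joins the history."""
    frames = [first_frame]
    for action in actions:
        frames.append(next_frame(frames, action, rng))
    return frames

rng = np.random.default_rng(0)
start = rng.random((8, 8))  # tiny stand-in for a text- or image-prompted first frame
frames = rollout(start, actions=[0, 1, 1, 0], rng=rng)
print(len(frames))  # 5: the prompt frame plus one frame per action
```

The key design point the paper highlights is that the action space is learned without labels, so in this framing the integers passed to `rollout` would be latent actions discovered during training rather than hand-defined controls.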
Unfortunately, you probably won’t be able to test it out yourself. The spokesperson called Genie “early-stage research” and said it isn’t designed to be a public product.
Meanwhile, both voice technology startup ElevenLabs and Adobe have teased generative audio tools, albeit for different sounds.
In the first case, ElevenLabs is focused more on sound effects. In a blog post last week, the startup said it can use prompts like “waves crashing,” “metal clanging,” “birds chirping” and “racing car engine” to create audio. However, it didn’t share a release date for the tool.
And then there’s Adobe’s Project Music GenAI Control, which lets creators generate music from text prompts like “powerful rock,” “happy dance” or “sad jazz” and then edit the audio. Options include adjusting the tempo, structure and repeating patterns; increasing or decreasing intensity; extending a clip; remixing a section; or generating a repeatable loop.
Nicholas Bryan, senior research scientist at Adobe Research, likened the technology to Photoshop for sound.
“Instead of manually cutting existing music to make intros, outros and background audio, Project Music GenAI Control could help users to create exactly the pieces they need,” Adobe added in a Wednesday blog post.
Adobe didn’t specify a release date for the tool.