Google researchers have presented an AI tool that can generate videos from text prompts at a resolution of 1280×768 pixels and 24 frames per second. The tool, Imagen Video, is currently in the research phase.
The research paper, which describes how Imagen Video works, shows that the AI tool can generate both videos and text animations in a variety of artistic styles. The researchers share images showing that the tool understands concepts such as studio lighting, origami, pixel art and watercolor, and that it can turn these concepts into moving images. According to the researchers, the tool also understands how a three-dimensional object is constructed and takes this into account when generating 3D objects.
According to the researchers, this text-to-video AI tool could be used to boost human creativity. They state that the tool has been given filters to prevent possible abuse. Imagen Video builds on a pre-trained language model that was ‘frozen’, much like the Imagen tool that Google researchers presented earlier this year. Imagen Video is therefore partly based on Imagen, an AI tool that can create realistic images from text input. It is not clear if or when the researchers will make Imagen Video available to a wider audience.
At the end of September, Meta presented an AI tool that can also generate videos from text. For now, the videos that tool produces have a resolution of 768×768 pixels. Also at the end of September, OpenAI decided to open Dall-E to the general public. Like Google’s Imagen, that AI tool can convert text into images.