
OpenAI introduces new tool to create video from text

The tool can generate entire videos all at once or extend generated videos to make them longer, the company said

OpenAI, the company behind ChatGPT, has introduced a new tool that uses generative AI to create videos from text.

According to OpenAI, Sora generates a video by starting with one that looks like static noise and gradually transforming it by removing the noise over many steps.
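The iterative denoising process the company describes can be illustrated with a toy sketch. This is not OpenAI's model: `toy_denoiser` is a hypothetical stand-in for the learned noise-prediction network, chosen only so the loop visibly converges from static noise towards a clean result.

```python
import numpy as np

def toy_denoiser(frames, step, total_steps):
    # Hypothetical stand-in for a learned noise-prediction network:
    # it "predicts" a fraction of the current noise so the loop converges.
    return frames * (1.0 / (total_steps - step))

def generate_video(num_frames=8, height=4, width=4, steps=50, seed=0):
    """Start from pure static noise and progressively remove noise."""
    rng = np.random.default_rng(seed)
    frames = rng.standard_normal((num_frames, height, width))
    for step in range(steps):
        predicted_noise = toy_denoiser(frames, step, steps)
        frames = frames - predicted_noise  # one denoising step
    return frames
```

With this toy denoiser the loop drives every pixel towards zero; a real diffusion model would instead be guided by the text prompt towards a plausible video.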

The tool can generate entire videos all at once or extend generated videos to make them longer, the company said. By giving the model foresight of many frames at a time, OpenAI says it has solved the problem of keeping a subject consistent even when it temporarily goes out of view.

So far, Sora is only available to a small group of researchers and video creators; however, the company has showcased its capabilities on X, formerly known as Twitter.

According to a blog post from the company, Sora takes inspiration from large language models, which acquire generalist capabilities by training on internet-scale data.

The success of the LLM paradigm is enabled in part by the use of tokens that unify diverse modalities of text, including code, math and various natural languages, the post said.

Screenshot from a video created by OpenAI using the prompt: "The camera rotates around a large stack of vintage televisions all showing different programmes — 1950s sci-fi movies, horror movies, news, static, a 1970s sitcom, etc, set inside a large New York museum gallery"

Instead of using text tokens, Sora has visual patches, said OpenAI. “We find that patches are a highly-scalable and effective representation for training generative models on diverse types of videos and images,” it added.

The technology turns videos into patches by first compressing them into a lower-dimensional latent space, and subsequently decomposing the representation into spacetime patches, said the post.
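The decomposition into spacetime patches can be sketched with array reshaping. This is an illustrative assumption, not OpenAI's code: the patch sizes and the `(T, H, W, C)` layout are hypothetical, and each patch here simply spans a few frames and a small spatial window, then gets flattened into a token-like vector.

```python
import numpy as np

def to_spacetime_patches(latent, pt=2, ph=2, pw=2):
    """Decompose a latent video of shape (T, H, W, C) into spacetime patches.

    Each patch spans pt frames and a ph x pw spatial window; the result is
    a sequence of flattened patch vectors, analogous to text tokens.
    """
    T, H, W, C = latent.shape
    assert T % pt == 0 and H % ph == 0 and W % pw == 0
    x = latent.reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)  # group the patch axes together
    return x.reshape(-1, pt * ph * pw * C)  # (num_patches, patch_dim)
```

A 4-frame, 8x8, 3-channel latent would yield 32 patches of 24 values each under these settings.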

OpenAI has “trained” a network that reduces the dimensionality of visual data. It takes raw video as input and outputs a “latent representation that is compressed both temporally and spatially”.
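The effect of such a network can be mimicked with simple average pooling. This is a stand-in, not the trained encoder: the pooling factors are hypothetical, and the point is only that the output is smaller than the input along both the time axis and the spatial axes.

```python
import numpy as np

def compress_video(video, t_factor=2, s_factor=4):
    """Stand-in 'encoder': average-pool a raw video of shape (T, H, W, C)
    so the latent is compressed both temporally and spatially."""
    T, H, W, C = video.shape
    v = video.reshape(T // t_factor, t_factor,
                      H // s_factor, s_factor,
                      W // s_factor, s_factor, C)
    return v.mean(axis=(1, 3, 5))
```

An 8-frame 16x16 clip becomes a 4-frame 4x4 latent here; the real encoder is learned, but the shape change is the same idea.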

Sora is trained on and subsequently generates videos within this compressed latent space.