
OpenAI introduces new tool to create video from text

The tool can generate entire videos all at once or extend generated videos to make them longer, the company said

OpenAI, the company behind ChatGPT, has introduced a new tool that uses generative AI to create videos from text.

According to OpenAI, Sora generates a video by starting with one that looks like static noise and gradually transforming it by removing the noise over many steps.
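The iterative denoising process the company describes can be illustrated with a toy sketch. This is not OpenAI's model: `toy_denoiser` is a hypothetical stand-in for the learned noise-prediction network, chosen only so the loop visibly converges from static noise towards a clean result.

```python
import numpy as np

def toy_denoiser(frames, step, total_steps):
    # Hypothetical stand-in for a learned noise-prediction network:
    # it "predicts" a fraction of the current noise so the loop converges.
    return frames * (1.0 / (total_steps - step))

def generate_video(num_frames=8, height=4, width=4, steps=50, seed=0):
    """Start from pure static noise and progressively remove noise."""
    rng = np.random.default_rng(seed)
    frames = rng.standard_normal((num_frames, height, width))
    for step in range(steps):
        predicted_noise = toy_denoiser(frames, step, steps)
        frames = frames - predicted_noise  # one denoising step
    return frames
```

With this toy denoiser the loop drives every pixel towards zero; a real diffusion model would instead be guided by the text prompt towards a plausible video.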

The tool can generate entire videos all at once or extend generated videos to make them longer, the company said. By giving the model foresight of many frames at a time, OpenAI says it has solved the problem of keeping a subject consistent even when it temporarily goes out of view.

So far, Sora is only available to a small group of researchers and video creators; however, the company has showcased its capabilities on X, formerly known as Twitter.

According to a blog post from the company, Sora takes inspiration from large language models, which acquire generalist capabilities by training on internet-scale data.

The success of the LLM paradigm is enabled in part by the use of tokens that unify diverse modalities of text, including code, math and various natural languages, the post said.

Screenshot from a video created by OpenAI using the prompt: "The camera rotates around a large stack of vintage televisions all showing different programmes — 1950s sci-fi movies, horror movies, news, static, a 1970s sitcom, etc, set inside a large New York museum gallery"

Instead of using text tokens, Sora has visual patches, said OpenAI. “We find that patches are a highly-scalable and effective representation for training generative models on diverse types of videos and images,” it added.

The technology turns videos into patches by first compressing them into a lower-dimensional latent space, and subsequently decomposing the representation into spacetime patches, said the post.
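The decomposition into spacetime patches can be sketched with array reshaping. This is an illustrative assumption, not OpenAI's code: the patch sizes and the `(T, H, W, C)` layout are hypothetical, and each patch here simply spans a few frames and a small spatial window, then gets flattened into a token-like vector.

```python
import numpy as np

def to_spacetime_patches(latent, pt=2, ph=2, pw=2):
    """Decompose a latent video of shape (T, H, W, C) into spacetime patches.

    Each patch spans pt frames and a ph x pw spatial window; the result is
    a sequence of flattened patch vectors, analogous to text tokens.
    """
    T, H, W, C = latent.shape
    assert T % pt == 0 and H % ph == 0 and W % pw == 0
    x = latent.reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)  # group the patch axes together
    return x.reshape(-1, pt * ph * pw * C)  # (num_patches, patch_dim)
```

A 4-frame, 8x8, 3-channel latent would yield 32 patches of 24 values each under these settings.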

OpenAI has “trained” a network that reduces the dimensionality of visual data. It takes raw video as input and outputs a “latent representation that is compressed both temporally and spatially”.
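The effect of such a network can be mimicked with simple average pooling. This is a stand-in, not the trained encoder: the pooling factors are hypothetical, and the point is only that the output is smaller than the input along both the time axis and the spatial axes.

```python
import numpy as np

def compress_video(video, t_factor=2, s_factor=4):
    """Stand-in 'encoder': average-pool a raw video of shape (T, H, W, C)
    so the latent is compressed both temporally and spatially."""
    T, H, W, C = video.shape
    v = video.reshape(T // t_factor, t_factor,
                      H // s_factor, s_factor,
                      W // s_factor, s_factor, C)
    return v.mean(axis=(1, 3, 5))
```

An 8-frame 16x16 clip becomes a 4-frame 4x4 latent here; the real encoder is learned, but the shape change is the same idea.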

Sora is trained on and subsequently generates videos within this compressed latent space.