Disney Research and the University of California, Irvine have developed a new AI-enhanced video compression model that, the researchers say, shows deep learning can compete with established video compression technology.
The compressor, still at an early stage of development, yields less distortion and lower bits-per-pixel rates than classical codecs such as H.265 when trained on specialised video content. The research team added that it achieves comparable results on downscaled, publicly available YouTube videos.
The research team began by reducing the dimensions of the video using what they describe as a variational autoencoder, a neural network that passes each video frame through a sequence of operations to produce a condensed array of numbers. The autoencoder then tries to undo this operation, ensuring that the array contains enough information to restore the video frame.
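The published model is considerably more sophisticated, but a minimal sketch in PyTorch conveys the idea. The FrameVAE class, its layer sizes, and the 64-number latent array below are illustrative assumptions, not the team’s architecture:

```python
import torch
import torch.nn as nn

class FrameVAE(nn.Module):
    """Minimal sketch of a per-frame variational autoencoder.
    The encoder condenses a frame into a small latent array; the
    decoder tries to undo the operation. All sizes are illustrative."""
    def __init__(self, latent_dim=64):
        super().__init__()
        # Encoder: strided convolutions shrink the frame, then a linear
        # layer emits the mean and log-variance of the latent array.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.Flatten(),
        )
        self.to_stats = nn.Linear(64 * 16 * 16, 2 * latent_dim)
        # Decoder mirrors the encoder to restore the frame.
        self.from_latent = nn.Linear(latent_dim, 64 * 16 * 16)
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, frame):
        stats = self.to_stats(self.encoder(frame))
        mu, logvar = stats.chunk(2, dim=-1)
        # Reparameterisation trick: sample a latent during training.
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        recon = self.decoder(self.from_latent(z).view(-1, 64, 16, 16))
        return recon, mu, logvar

vae = FrameVAE()
frame = torch.rand(1, 3, 64, 64)   # one 64x64 RGB frame
recon, mu, logvar = vae(frame)
print(mu.shape)                    # the condensed array: (1, 64)
```

Training such a network to reconstruct its own input is what forces the condensed array to retain the information needed to restore the frame.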
The algorithm then attempts to predict the compressed version of the next frame given what has gone before, relying on an AI-based technique called a “deep generative model.”
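As another hedged illustration rather than the team’s model, a small recurrent network can play the role of such a sequential predictor; the LatentPredictor name and all sizes below are assumptions:

```python
import torch
import torch.nn as nn

class LatentPredictor(nn.Module):
    """Sketch of a sequential prior: an LSTM reads the latent arrays
    of previous frames and predicts a distribution (mean and scale)
    for the next frame's latent. All sizes are illustrative."""
    def __init__(self, latent_dim=64, hidden=128):
        super().__init__()
        self.rnn = nn.LSTM(latent_dim, hidden, batch_first=True)
        self.to_mean = nn.Linear(hidden, latent_dim)
        self.to_log_scale = nn.Linear(hidden, latent_dim)

    def forward(self, past_latents):
        # past_latents: (batch, time, latent_dim)
        h, _ = self.rnn(past_latents)
        last = h[:, -1]   # summary of what has gone before
        return self.to_mean(last), self.to_log_scale(last).exp()

predictor = LatentPredictor()
past = torch.randn(1, 5, 64)     # latent arrays of 5 previous frames
mean, scale = predictor(past)    # predicted distribution for frame 6
```

The output is not a single guess but a probability distribution over what the next compressed frame should look like, which is what the later compression step exploits.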
Next, the algorithm encodes the frame content by rounding the autoencoder’s real-valued array to integers, which, the research team notes, are easier to store than real numbers with their many decimal places. The final step is to apply lossless compression to the array, allowing for its exact restoration. “Crucially, this algorithm is informed by the neural network about which video frame to expect next, making the lossless compression aspect extremely efficient,” said the researchers.
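This rounding-plus-lossless-compression step can be sketched as follows. A production system would use an actual entropy coder such as arithmetic coding; the illustrative estimate_bits helper below merely computes the Shannon bit count implied by the predicted distribution:

```python
import torch
from torch.distributions import Normal

def estimate_bits(latent, mean, scale):
    """Sketch of the quantise-then-compress-losslessly step. Rounds the
    real-valued latent to integers, then estimates how many bits an
    entropy coder would need when informed by the network's prediction:
    probable integers cost few bits, surprising ones cost many."""
    q = torch.round(latent)          # integers are easier to store
    dist = Normal(mean, scale)
    # Probability mass the prediction assigns to each rounded value.
    prob = dist.cdf(q + 0.5) - dist.cdf(q - 0.5)
    bits = -torch.log2(prob.clamp_min(1e-9)).sum()  # Shannon estimate
    return q, bits

latent = torch.randn(64)
q, bits = estimate_bits(latent, mean=torch.zeros(64), scale=torch.ones(64))
print(f"{bits.item():.1f} bits for this frame's latent")
```

This is where the prediction pays off: the more accurately the network anticipates the next latent array, the fewer bits the lossless coder spends, which is the efficiency the researchers describe.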
According to Stephan Mandt, a UCI assistant professor of computer science who began the project while employed at Disney Research, these steps taken together make this approach an “end-to-end” video compression algorithm: “The real contribution here was to combine this neural network-based deep generative video prediction model with everything else that belongs to compression algorithms, such as rounding and model-based lossless compression,” he said.
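Putting the hypothetical pieces from the sketches above together gives a rough picture of such an end-to-end pipeline. This outline reuses the FrameVAE, LatentPredictor, and estimate_bits sketches and is not the published algorithm:

```python
import torch

# One pass over a short clip: condense each frame, predict its latent
# from the past, then estimate the lossless-coding cost.
vae, predictor = FrameVAE(), LatentPredictor()
frames = torch.rand(10, 3, 64, 64)   # a 10-frame clip
latents, total_bits = [], 0.0
for frame in frames:
    _, mu, _ = vae(frame.unsqueeze(0))          # condense the frame
    if latents:                                 # prediction needs a past
        past = torch.stack(latents, dim=1)
        mean, scale = predictor(past)           # network's guess for this frame
    else:                                       # no history for the first frame
        mean, scale = torch.zeros_like(mu), torch.ones_like(mu)
    _, bits = estimate_bits(mu.squeeze(0), mean.squeeze(0), scale.squeeze(0))
    total_bits += bits.item()
    latents.append(mu.detach())
print(f"estimated {total_bits / (10 * 64 * 64):.3f} bits per pixel")
```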
Mandt added that the research team will continue to work toward a practical, deployable version of the video compressor. One challenge is that they may need to compress the neural network itself along with the video.
“Because the receiver requires a trained neural network for reconstructing the video, you might also have to think about how you transmit it along with the data,” said Mandt. “There are lots of open questions still. It’s a very early stage.”