Microservices: Audio processing to enhance viewer experience

Emotion Systems' MC Patel looks at how audio standards can be raised and maintained through automated deliverables, raising consumer satisfaction, eliminating costly reworks and ensuring revenues are maintained

With the growth of online delivery platforms, content distribution has become a complex and time-consuming process. Currently, the focus is on picture quality, as evidenced by the growth in 4K services and the plethora of HDR solutions designed to provide a better viewing experience. There is also the requirement for subtitling, audio description and, of course, audio, which can range from stereo through 5.1 surround to, increasingly, immersive formats.

Given that content needs to be quickly monetised across multiple platforms and localisations for different countries, there is a need to provide a large number of versions to satisfy the needs and specifications of the end user. 

A media supply chain provides media asset management (MAM) services that enable the creation and posting of jobs, billing, and the management of signal processing for video, audio and metadata.

The good news is that a large number of these processes have become commoditised. Tasks like encoding and transcoding, or inserting audio description or captions, are readily available from a number of vendors.

It is becoming the norm that all these services are parcelled up into single workflows in the cloud. This is appealing to production companies and broadcasters who want to speed up and simplify deliverables. Best practice is a single media work order which automatically triggers all the processes, passes everything through a final QC check, and despatches the files to the right recipients.
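The single work order idea can be sketched in a few lines of Python. This is only an illustration of the pattern described above, not any vendor's actual system: `WorkOrder`, `run_work_order` and the step and check functions are all hypothetical names, and the processing steps simply edit a dictionary rather than touch real media.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical types for illustration only; not any vendor's API.
Asset = Dict[str, object]
Step = Callable[[Asset], Asset]
Check = Callable[[Asset], bool]


@dataclass
class WorkOrder:
    asset: Asset
    steps: List[Step]          # processes triggered by the order, in sequence
    qc_checks: List[Check]     # final QC gate
    recipients: List[str]      # where the finished files go
    log: List[str] = field(default_factory=list)


def run_work_order(order: WorkOrder) -> Dict[str, Asset]:
    """Run every step, apply the final QC checks, then 'despatch' per recipient."""
    asset = dict(order.asset)
    for step in order.steps:
        asset = step(asset)
        order.log.append(f"done: {step.__name__}")
    failed = [check.__name__ for check in order.qc_checks if not check(asset)]
    if failed:
        raise ValueError(f"QC failed: {', '.join(failed)}")
    order.log.append("qc: passed")
    return {recipient: asset for recipient in order.recipients}


# Placeholder steps and checks: stand-ins for real transcoding and audio services.
def transcode_to_hd(asset: Asset) -> Asset:
    return {**asset, "video": "HD"}


def downmix_to_stereo(asset: Asset) -> Asset:
    return {**asset, "audio": "stereo"}


def has_audio(asset: Asset) -> bool:
    return "audio" in asset
```

In a real supply chain each step would be a call to a separate service, but the shape stays the same: one order, an ordered list of processes, a QC gate, and a list of recipients.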

This is fine for all of those commoditised services. The challenge comes with functions such as HDR, immersive audio and loudness. To get the best viewing and listening experience, it is important that these processes are error-free and that quality is not compromised. Frequently these needs are satisfied manually or not at all. Given the volume and the growth of diverse media delivery, manual operations are too expensive.

Failure to get this right can be catastrophic. France, for example, has zero tolerance for loudness errors. A single excursion and a programme will be rejected, which is a problem if you are up against a deadline.
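To make the idea of a "single excursion" concrete, here is a minimal sketch of a zero-tolerance check. It assumes the loudness readings (in LUFS) have already been measured by a proper meter, for example one following ITU-R BS.1770, at a fixed interval. The function names, the interval and the ceiling value are all placeholders for illustration and do not represent any country's delivery specification.

```python
from typing import List, Tuple


def find_excursions(
    readings_lufs: List[float],
    ceiling_lufs: float,
    interval_s: float = 1.0,
) -> List[Tuple[float, float]]:
    """Return (time in seconds, loudness) for every reading above the ceiling.

    The readings are assumed to come from an upstream loudness meter;
    this function only compares them against a limit.
    """
    return [
        (index * interval_s, value)
        for index, value in enumerate(readings_lufs)
        if value > ceiling_lufs
    ]


def passes_zero_tolerance(readings_lufs: List[float], ceiling_lufs: float) -> bool:
    """Zero tolerance: one excursion is enough to fail the programme."""
    return not find_excursions(readings_lufs, ceiling_lufs)
```

An automated workflow would run a check of this kind before despatch, so that a rejection is caught in the QC stage rather than by the recipient.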

Looking more generally at the question of audio processing, consider for a moment a premium movie channel. The newest titles come to it with a Dolby Atmos or DTS:X soundtrack. The automated cloud processes create 4K HDR Ultra HD masters with Dolby immersive sound, ready for broadcast. There are other processes in the automated workflow that will create HD video, and down-mix the audio to 5.1 and stereo.
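As an illustration of what a 5.1 to stereo down-mix involves, the sketch below folds the centre and surround channels into left and right at -3 dB (a gain of about 0.707) and drops the LFE channel. These are commonly used starting coefficients in the style of ITU-R BS.775; they are an assumption here, since a given delivery specification may call for different gains, and a production tool would also guard against clipping. The function name and channel labels are hypothetical.

```python
import math
from typing import Dict, List, Tuple

MINUS_3_DB = 1 / math.sqrt(2)  # about 0.707


def downmix_51_to_stereo(
    channels: Dict[str, List[float]],
    centre_gain: float = MINUS_3_DB,
    surround_gain: float = MINUS_3_DB,
) -> Tuple[List[float], List[float]]:
    """Fold L, R, C, Ls, Rs sample lists into a stereo pair.

    The LFE channel is ignored in this sketch. No level normalisation is
    applied, so the output can exceed the input peak level.
    """
    left = [
        l + centre_gain * c + surround_gain * ls
        for l, c, ls in zip(channels["L"], channels["C"], channels["Ls"])
    ]
    right = [
        r + centre_gain * c + surround_gain * rs
        for r, c, rs in zip(channels["R"], channels["C"], channels["Rs"])
    ]
    return left, right
```

The point for the workflow is that the coefficients are parameters: each deliverable can carry its own values in the work order rather than relying on an operator to set them by hand.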

The viewer is very happy with the quality of the movie. Then up comes a commercial break, where some of the ads have been produced or delivered in stereo, or maybe there is a trailer for a classic movie.

Suddenly the sound field collapses. The viewer is at best confused, and at worst will reach for the controls to try to find out what is wrong. The advertiser is also very displeased as the commercial loses all its impact. The sales house will quickly receive calls suggesting that the invoice will not be paid.

Having identified that a consistent sound field is vital for customer satisfaction, the broadcaster will then look to update its archives. The back catalogue needs reprocessing to up-mix the audio, so that everything has a soundtrack that will feed the immersive expectations.

Broadcasters and content deliverers increasingly recognise that audio processing, including up-mixing and loudness control, is a specialist service, and it must be done well. These services tend to be supplied by small, dedicated businesses employing skilled and experienced people as well as dedicated software systems.

But for the operational and commercial reasons we have already talked about, these specialist services must be provided as part of the media supply chain, called by the single work order. The specialist supplier must have the same microservices architecture and a good API so the supply chain provider can integrate the services, progress and monitor the work, and bill the end user appropriately.
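The three things the supply chain provider needs from a specialist service (submitting work, monitoring progress and billing) can be sketched as a small interface. Everything below is hypothetical: the class and method names, the job states, the workflow name and the per-minute rate are invented for illustration, and the in-memory class merely stands in for a real service that would sit behind a network API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class JobState(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    DONE = "done"


@dataclass
class Job:
    job_id: str
    workflow: str          # e.g. a named audio sub-workflow
    duration_min: float    # programme duration, used here as the billing basis
    state: JobState = JobState.QUEUED


class InMemoryAudioService:
    """Stand-in for a specialist audio microservice (hypothetical)."""

    def __init__(self, rate_per_min: float) -> None:
        self.rate_per_min = rate_per_min
        self.jobs: Dict[str, Job] = {}

    def submit(self, workflow: str, duration_min: float) -> str:
        job_id = f"job-{len(self.jobs) + 1}"
        self.jobs[job_id] = Job(job_id, workflow, duration_min)
        return job_id

    def advance(self, job_id: str) -> None:
        """Simulate progress: QUEUED -> RUNNING -> DONE."""
        job = self.jobs[job_id]
        if job.state is JobState.QUEUED:
            job.state = JobState.RUNNING
        elif job.state is JobState.RUNNING:
            job.state = JobState.DONE

    def status(self, job_id: str) -> JobState:
        return self.jobs[job_id].state

    def charge(self, job_id: str) -> float:
        """Billing is only available once the job has finished."""
        job = self.jobs[job_id]
        if job.state is not JobState.DONE:
            raise RuntimeError("job not finished")
        return job.duration_min * self.rate_per_min
```

The supply chain platform only ever sees `submit`, `status` and `charge`; how the audio is actually processed remains the specialist's concern.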

What may be surprising is that, in our experience, there may be 60 or 70 audio-only workflows within a typical media supply chain infrastructure. Emotion Systems has worked with leading audio specialists to find out what users really need.

The clear message is that managing audio for delivery is a complex and varied process. These variations need to be accurately described, then processed without errors while preserving the highest quality. And these tasks must be carried out repeatedly, hundreds or thousands of times a month.

Emotion Systems enables audio specialists to design their own sub-workflows and package them as shrink-wrapped software that drops into the media supply chain, communicating through standardised work orders and allowing as much or as little operator input as appropriate.

This is the practical way that audio standards can be raised and maintained through the automated deliverables process, raising consumer satisfaction, eliminating costly reworks and ensuring revenues are maintained.