That was before the pandemic, which most of us have already forgotten, along with our resolutions to cut down on air travel, adopt sustainable consumption patterns, and generally make the world a more decent place. Be that as it may, the need to produce content under exceptional circumstances encouraged broadcasters to be creative with the tools at their disposal.
Some were surprised: the difference wasn’t as striking as operators and their line management had feared. Some “consumer” audio tools come reasonably close to a professionally calibrated signal chain. It would nevertheless be a stretch to argue that pristine audio—or picture—quality is no longer necessary at source when more and more productions are streamed or consumed via heavily compressed OTT or social media channels.
Most of us know that things are not that easy: a high-quality production will likely look and sound better after heavy compression than a project created with more budget-friendly production tools, or a smartphone.
Speaking of which, today’s high-end smartphones are capable of recording good-quality audio and video that is deemed acceptable in situations where covering breaking news is more important than waiting for the OB van to arrive. The same is true of Skype, Teams, Zoom, or vMix for interviews, which have become generally accepted formats.
One important milestone in the evolution of the audio landscape over the last few years has been the ability to listen to immersive audio mixes on your smartphone. Festivals like Tomorrowland allow their fans to experience the event via live footage streamed from Belgium or elsewhere with a glorious Dolby Atmos audio rendition. This and similar initiatives have arguably contributed to establishing the Dolby Atmos audio format (usually in a 5.1.4 configuration) as the de facto immersive audio standard.
Something similar applies to soundbars that allow listeners to enjoy an enveloping audio experience in a compact, yet coherent way. Size matters after all: 5.1 Surround probably failed to go mainstream because most people felt that six bulky speakers were just a little too much.
More developments
Another trend we need to touch on is the ability of sound supervisors to work in a distributed production setup, sharing some, or all, of the signals that are available on the network, but processing them for different destinations. In broadcast, this is increasingly common, and live immersive mixing from a fixed location that is hundreds of miles away from the venue is gaining serious traction.
Top-tier global event coverage keeps pushing the envelope with ever more cameras and assorted microphones whose signals need to be audible when the director cuts to a given camera. Audio-follows-Video makes this easy, even though the number of channels can be daunting.
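To make the idea concrete, here is a minimal Python sketch of Audio-follows-Video logic: when the director cuts to a camera, the microphones associated with it are opened and all others are closed. The class name `AfvRouter`, the camera and microphone labels, and the gain values are illustrative assumptions and do not describe any particular console or product. Real systems typically fade between states over a configurable time rather than switching hard, which this sketch omits.

```python
from dataclasses import dataclass, field


@dataclass
class AfvRouter:
    """Hypothetical Audio-follows-Video helper (illustration only)."""
    camera_mics: dict            # camera name -> list of associated mic names
    open_gain_db: float = 0.0    # assumed gain for mics of the on-air camera
    closed_gain_db: float = -90.0  # assumed gain for every other mic
    gains: dict = field(default_factory=dict)

    def __post_init__(self):
        # Start with every known microphone closed.
        for mics in self.camera_mics.values():
            for mic in mics:
                self.gains[mic] = self.closed_gain_db

    def cut_to(self, camera):
        """Open the mics tied to `camera`, close the rest, return all gains."""
        active = set(self.camera_mics.get(camera, []))
        for mic in self.gains:
            self.gains[mic] = (
                self.open_gain_db if mic in active else self.closed_gain_db
            )
        return dict(self.gains)


if __name__ == "__main__":
    router = AfvRouter({"cam1": ["mic_a", "mic_b"], "cam2": ["mic_c"]})
    print(router.cut_to("cam2"))
```

With more than 100 channels in play, the mapping table grows, but the principle stays the same: the video cut is the trigger and the audio state is derived from it.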
Live productions providing immersive audio easily require more than 100 audio channels, which include the field-of-play microphones controlled by the automated KICK tracking software for crisp ball, whistle and other on-pitch noises, ambience microphones, plus more mics in the mixed zone, around the pre- and post-match desks alongside the field of play, at press conferences, etc.
Mixing all these signals in such an expert way is something for which sound supervisors deserve a lot of praise, especially when you consider that they may be expected to provide different formats of their mixes simultaneously (stereo, 5.1, Dolby Atmos, etc.). A lot can be automated here, but some checking by the sound supervisor is always required, and that may become a distraction.
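As a small illustration of what deriving one format from another involves, the following Python sketch folds a single 5.1 sample frame down to stereo. It assumes the commonly cited ITU-R BS.775 style coefficients (centre and surrounds at about -3 dB, LFE left out); the function name and the dictionary keys are made up for this example, and a real production downmix would add level management and limiting. Rendering Dolby Atmos is a different, object-based process that is not shown here.

```python
import math

# About -3 dB; an assumed downmix coefficient in the style of ITU-R BS.775.
K = 1 / math.sqrt(2)


def downmix_51_to_stereo(frame):
    """Fold one 5.1 sample frame down to a (left, right) stereo pair.

    `frame` is a dict with the hypothetical keys L, R, C, LFE, Ls and Rs.
    The LFE channel is deliberately omitted from the result.
    """
    left = frame["L"] + K * frame["C"] + K * frame["Ls"]
    right = frame["R"] + K * frame["C"] + K * frame["Rs"]
    return left, right


if __name__ == "__main__":
    centre_only = {"L": 0.0, "R": 0.0, "C": 1.0, "LFE": 0.0, "Ls": 0.0, "Rs": 0.0}
    print(downmix_51_to_stereo(centre_only))
```

Automating this step is straightforward; judging whether the resulting stereo mix still tells the story is the part that remains with the sound supervisor.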
The ability to run audio processing on the same generic servers as video processing apps will prove a major plus for deliverables involving a variety of transport formats (ST 2110, NDI, SRT, more to be announced at IBC2024). An audio app of this kind, with the same sound, look and feel as a dedicated hardware device, allows users to instantiate audio processing at short notice just about anywhere, complete with a host of options only the server-based incarnation can provide.
Finally, if we assume that most of today’s large-scale live production setups are based on an IP infrastructure with almost endless routing possibilities, consider this: every video feed and every audio signal is a separate essence. That is usually a good thing. Flexibility regarding how audio is packaged with video footage (which is called ‘embedding’ or ‘audio shuffling’) is more important than ever. Allowing users to perform this shuffling as close to the sources and destinations as possible (‘edge computing’) keeps things clear in an increasingly multi-faceted live production environment.
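Audio shuffling itself is conceptually simple, as the Python sketch below suggests: the output channel order is described by a mapping of source indices, and each output slot either takes one source channel or is filled with silence. The function name `shuffle_audio`, the mapping convention and the channel labels are assumptions chosen for illustration and are not taken from any specific embedder or gateway.

```python
def shuffle_audio(channels, mapping, silence=None):
    """Return a re-ordered channel list.

    `mapping[i]` is the index of the source channel that should appear at
    output position i, or None to place `silence` there instead.
    """
    out = []
    for src in mapping:
        if src is None:
            out.append(silence)
        else:
            out.append(channels[src])
    return out


if __name__ == "__main__":
    source = ["L", "R", "C", "LFE"]
    # Swap the two channel pairs before the audio is packaged with the video.
    print(shuffle_audio(source, [2, 3, 0, 1]))
```

Keeping such a mapping next to each source or destination, rather than in one central place, is what makes the routing easier to follow as the number of essences grows.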