We all learned something during lockdown, whether that was how to make banana bread, that the upstairs neighbours are into Riverdance or that there is actually a park nearby. AI-driven audio specialists Salsa Sound learned that people love to hear the sound of sport. Fixtures going behind closed doors opened up the possibilities for sound to play a bigger part than ever, transporting fans into the arena through the power of audio.
“People love to hear what’s going on, on the pitch,” says Rob Oldfield, co-founder and CEO of Salsa Sound. “Hearing all of those kicks, all of the whistles, all the racket strikes, all the punches, whatever it is, whatever the sporting context, people like to hear the sound of that sport. That’s something that I’m hoping stays forever. We want to get the full narrative. I always think it is about storytelling, and all of those impact sounds in sport, people shouting or racket strikes or footsteps, that’s part of your script of the story.”
To that end, Salsa Sound developed vCROWD in 2020 – a dynamic audio tool for creating crowd atmosphere when the crowd were actually at home, desperately trying to drown out the sound of Irish dancing coming through the ceiling. “With any story, any good movie, you’ve got the diegetic sounds, you’ve got the speech and the sound effects, but you’ve also got this incredible backing track that provides the emotion, and during Covid that’s the thing that got removed,” recalls Oldfield. “The emotion provided by the crowd just got completely deleted.” vCROWD was the company’s way of putting that emotional soundtrack back in.
The other great lesson from Covid was the potential for personalisation within sound for sport broadcasting. Being able to choose your audio feed, right down to recreating the sound from a specific part of the stand, has clear benefits both for accessibility and for the all-round consumer experience. “I think that being able to select and choose how you want your mix to be, very much facilitated now over OTT platforms, is becoming a thing of the present rather than how we’ve always spoken about it as something that’s coming in the future,” comments Oldfield. “I think we’re seeing the first fruits of it now and I think in a couple of years it’s going to be standard practice to be able to chop and change what you want.”
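The article doesn’t detail how such personalised feeds are put together, but conceptually it comes down to delivering separate stems (commentary, crowd, pitch sounds) and letting the viewer set their own balance. A minimal, hypothetical sketch in Python, with all stem names and gain values invented for illustration rather than taken from any real OTT platform:

```python
# Hypothetical sketch of a personalised mix: the broadcaster delivers separate
# stems and the viewer chooses the balance. Stem names and gains are invented
# for illustration; they are not Salsa Sound's or any OTT platform's API.
import numpy as np

def personal_mix(stems: dict[str, np.ndarray], gains: dict[str, float]) -> np.ndarray:
    """Sum time-aligned audio stems, each scaled by the viewer's chosen gain."""
    length = min(len(audio) for audio in stems.values())
    mix = np.zeros(length)
    for name, audio in stems.items():
        mix += gains.get(name, 1.0) * audio[:length]
    return np.clip(mix, -1.0, 1.0)  # keep the result within valid sample range

# e.g. a viewer who wants less commentary and more of the away end:
# mix = personal_mix(stems, {"commentary": 0.4, "home_crowd": 0.8,
#                            "away_crowd": 1.2, "pitch": 1.0})
```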
Another not-so-futuristic feature of audio in sport broadcasting is AI, already a key component of Salsa Sound’s MIXaiR – a live mixing platform revamped at the end of 2021 to remove manual processes from the equation. AI is used to detect and enhance on-pitch sounds and create mixes across any format, all from a standard microphone setup. But for Oldfield the implications of AI reach beyond automating and streamlining current mixing techniques. “Microphones are more about data gatherers than they are about sound recorders,” he notes. “It’s a slightly different way of thinking about it but if you’ve got 20 microphones around an event, actually you’ve got 20 points of data capturing.”
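Salsa Sound hasn’t published how MIXaiR’s detection actually works, but the “data gatherer” idea can be illustrated with a crude, hypothetical energy-based detector that turns each microphone’s signal into a stream of timestamped impact events. Everything here (frame size, threshold, function names) is an illustrative assumption, not the company’s method:

```python
# Hypothetical sketch: treating each pitch-side microphone as a "data gatherer"
# by flagging short, energetic impact sounds (kicks, whistles, racket strikes).
import numpy as np

def detect_impacts(signal: np.ndarray, sample_rate: int,
                   frame_ms: float = 10.0, threshold_db: float = 12.0) -> np.ndarray:
    """Return timestamps (seconds) of frames that jump well above the
    running background level -- a crude stand-in for on-pitch sound detection."""
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    # Per-frame energy in dB, compared against the median as a noise floor.
    energy_db = 10 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)
    noise_floor = np.median(energy_db)
    hits = np.where(energy_db > noise_floor + threshold_db)[0]
    return hits * frame_ms / 1000  # frame index -> seconds

# Twenty microphones become twenty streams of timestamped events:
# events = {mic_id: detect_impacts(audio[mic_id], 48_000) for mic_id in mics}
```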
From crowd excitement levels to speech-to-text metadata from commentary tracks, all that data can be mined through AI and used for automatic graphics triggering, metadata tagging and highlight generation. “I think it’s time for audio to step up a little bit in terms of providing bits of data into other parts of the broadcast chain that will help speed up and augment things,” Oldfield remarks. “You can use the audio to figure out when and where the exciting points were on the pitch, so we do a triangulation of all the sounds that we detect so we know where they are on the field… That actually gives you a very clear timeline of where the exciting events were and when they happened. So you can use that for example to put together an automatic highlights package.”
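Again, the company’s actual pipeline isn’t public, but the triangulation-and-highlights idea Oldfield describes can be sketched roughly: estimate where a detected sound came from by comparing its arrival times at microphones in known positions, then rank time windows by how many located events they contain. The grid search, window length and parameters below are illustrative assumptions only:

```python
# Hypothetical sketch of locating detected sounds from arrival-time differences
# (TDOA) and turning the resulting event timeline into candidate highlights.
import numpy as np

SPEED_OF_SOUND = 343.0  # metres per second

def locate_event(mic_positions: np.ndarray, arrival_times: np.ndarray,
                 pitch_size=(105.0, 68.0), grid_step=1.0) -> tuple[float, float]:
    """Brute-force localisation: try every grid point on the pitch and keep the
    one whose predicted arrival-time differences (relative to the first mic)
    best match the measured ones."""
    xs = np.arange(0.0, pitch_size[0] + grid_step, grid_step)
    ys = np.arange(0.0, pitch_size[1] + grid_step, grid_step)
    measured_tdoa = arrival_times - arrival_times[0]
    best, best_err = (0.0, 0.0), np.inf
    for x in xs:
        for y in ys:
            dists = np.hypot(mic_positions[:, 0] - x, mic_positions[:, 1] - y)
            predicted_tdoa = (dists - dists[0]) / SPEED_OF_SOUND
            err = np.sum((predicted_tdoa - measured_tdoa) ** 2)
            if err < best_err:
                best, best_err = (x, y), err
    return best

def highlight_windows(event_times: list[float], window_s: float = 30.0,
                      top_n: int = 5) -> list[float]:
    """Rank fixed-length windows by how many located events they contain --
    a crude automatic-highlights timeline."""
    times = np.asarray(event_times)
    starts = np.arange(0.0, times.max() + window_s, window_s)
    counts = [(np.sum((times >= s) & (times < s + window_s)), s) for s in starts]
    return [s for _, s in sorted(counts, reverse=True)[:top_n]]
```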
Along with personalisation and AI, the third point in the triangle pointing the way forward for sport audio is 5G. In 2020 Salsa Sound became the lead audio partner on the DCMS-funded 5G Edge-XR project, designed to explore 5G-enabled immersive experiences for sporting events. “The idea is to basically democratise our experiences so that you can watch AR content on a standard phone,” Oldfield explains. “Rather than needing to have a massive server in your bedroom, the server in your bedroom is actually on the edge network, and your device communicates to the network via 5G so that no matter what device you’ve got, you’ve got access to great AR content.”
By putting its AI in the cloud, the company can capture audio from multiple locations around a football pitch so that when a user selects a 360-degree camera angle, the audio rotates with their phone. “It actually feels like you’re sat at that point in the stadium with all the pitch sounds in the correct place as well,” describes Oldfield. “It means that you’re getting these incredible experiences that require a lot of heavy computation, but you don’t have to do the heavy computation on your device at home. You just let the cloud do that for you. We’re sort of at the start of the 5G revolution, but it’s exciting to see what could come next.”
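The article doesn’t say what spatial format the 5G Edge-XR demos deliver, but the “audio rotates with your phone” effect can be illustrated by assuming a first-order ambisonic (B-format) feed and rotating its horizontal components to follow the device’s yaw. The format choice and sign convention here are assumptions for the sake of the sketch:

```python
# Illustrative only: assumes a first-order ambisonic (B-format: W, X, Y, Z)
# scene and rotates the horizontal sound field to follow the phone's heading,
# keeping pitch sounds "in the correct place" as the viewer turns.
import numpy as np

def rotate_bformat(w, x, y, z, yaw_radians: float):
    """Rotate a first-order ambisonic scene about the vertical axis.
    W (omni) and Z (height) are unchanged; X and Y counter-rotate against the
    listener's turn. Flip the sine terms if your platform's yaw sign differs."""
    cos_a, sin_a = np.cos(yaw_radians), np.sin(yaw_radians)
    x_rot = cos_a * x + sin_a * y
    y_rot = -sin_a * x + cos_a * y
    return w, x_rot, y_rot, z

# e.g. applied each time the phone reports a new heading:
# w, x, y, z = rotate_bformat(w, x, y, z, np.deg2rad(device_yaw_deg))
```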