It was during a trial with Chelsea Football Club that Salsa co-founders Dr Rob Oldfield and Dr Ben Shirley began to appreciate the complexities involved in capturing rich, immersive sound for live football broadcasts. “The audience wants to hear the kicks and the whistles to feel part of the action, but it can be tricky to capture the on-pitch sound without letting crowd noise envelop the mix,” says Oldfield.
Traditionally this work is carried out by skilled sound engineers in an OB van at the match, who raise and lower the faders so that only the mics nearest the ball are active, capturing the on-pitch sound with a minimum of crowd noise. “The problem with this,” Oldfield points out, “is that it’s hugely labour-intensive and requires attention for the duration of the match. And lots of gameplay sounds don’t get picked up at all.”
Another issue, he adds, is that in a surround sound mix, the crowd sounds go through the centre channel, along with the pitch sounds and audio commentary. “This can make the pitch sounds less immersive and flatter, and results in a less than optimal mix,” he says. The Chelsea research was part of a larger, EU-funded next-gen interactive TV project, based at Salford University between 2012 and 2015.
The duo then set to work on developing a technology that could separate out key audio events during the match and localise them so that audio content could be extracted, along with accompanying metadata. In the resulting product this metadata is extracted automatically and generates a triangulation that maps out each ball kick of a game.
“That’s an incredibly valuable source of metadata – visual tracking doesn’t provide you with all this information,” explains Oldfield. “For years, people haven’t used mics for data, when actually it’s one of the cheapest sources.” This feature, which is scalable because it uses audio sensors that work off what is already in place on site, can be deployed for auto highlight generation, triggering graphics or controlling cameras.
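To picture how a set of pitch-side mics could place a ball kick on the pitch, here is a minimal Python sketch. It is not Salsa's method, which the article does not detail. It assumes a hypothetical layout of six mics around a 105 m by 68 m pitch, assumes the arrival time of the kick at each mic is already known, and uses a simple grid search over time differences of arrival. The names `locate_event`, `mics` and `arrivals` are illustrative only.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air at about 20 °C


def locate_event(mic_positions, arrival_times, pitch_size=(105.0, 68.0), step=0.5):
    """Estimate where a sound was made from its arrival time at each mic.

    Searches a grid of candidate positions and returns the one whose
    predicted arrival-time differences best match the observed ones.
    """
    mics = np.asarray(mic_positions, dtype=float)
    times = np.asarray(arrival_times, dtype=float)

    # Candidate positions covering the pitch, in metres.
    xs = np.arange(0.0, pitch_size[0] + step, step)
    ys = np.arange(0.0, pitch_size[1] + step, step)
    gx, gy = np.meshgrid(xs, ys, indexing="ij")

    # Travel time from every candidate position to every mic.
    dists = np.hypot(gx[..., None] - mics[:, 0], gy[..., None] - mics[:, 1])
    predicted = dists / SPEED_OF_SOUND

    # The moment of the kick is unknown, so compare times relative to their mean.
    predicted_rel = predicted - predicted.mean(axis=-1, keepdims=True)
    observed_rel = times - times.mean()

    error = ((predicted_rel - observed_rel) ** 2).sum(axis=-1)
    ix, iy = np.unravel_index(np.argmin(error), error.shape)
    return float(xs[ix]), float(ys[iy])


# Hypothetical mic layout: corners and halfway line on both touchlines.
mics = [(0, 0), (52.5, 0), (105, 0), (0, 68), (52.5, 68), (105, 68)]

# Simulate a kick at a known spot to check the estimate.
true_pos = np.array([30.0, 20.0])
emitted_at = 1.234  # seconds; not given to the estimator
arrivals = [
    emitted_at + np.hypot(*(true_pos - np.array(m))) / SPEED_OF_SOUND for m in mics
]

estimate = locate_event(mics, arrivals)
print(estimate)
```

A grid search is slow but easy to follow. A production system would also have to cope with noise, wind and uncertain arrival times, none of which this sketch models.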
The technology also uses machine learning to automate the mixing process with an algorithm that ‘listens’ to the audio and compares the incoming signal with a set of predefined acoustic-feature templates, such as the sound of a ball-kick or a whistle-blow.
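The idea of comparing incoming audio against stored templates can be sketched in a few lines of Python. This is a simplified stand-in rather than the company's algorithm: it uses plain cosine similarity instead of a trained model, and pure tones stand in for real recordings of a kick and a whistle. The feature (energy in eight frequency bands), the 0.9 threshold and the names `band_energies` and `classify_frame` are all assumptions made for illustration.

```python
import numpy as np


def band_energies(frame, n_bands=8):
    """Summarise one audio frame as a unit-length vector of energy per frequency band."""
    windowed = frame * np.hanning(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    feats = np.array([band.sum() for band in np.array_split(power, n_bands)])
    norm = np.linalg.norm(feats)
    return feats / norm if norm > 0 else feats


def classify_frame(frame, templates, threshold=0.9):
    """Return the label of the best-matching template, or None if nothing is close enough."""
    feats = band_energies(frame)
    best_label, best_score = None, threshold
    for label, template in templates.items():
        score = float(np.dot(feats, template))  # cosine similarity of unit vectors
        if score >= best_score:
            best_label, best_score = label, score
    return best_label


SAMPLE_RATE = 8000  # Hz, chosen only to keep the example small
t = np.arange(1024) / SAMPLE_RATE

# Stand-in templates: a low tone for a "kick" and a high tone for a "whistle".
templates = {
    "kick": band_energies(np.sin(2 * np.pi * 200 * t)),
    "whistle": band_energies(np.sin(2 * np.pi * 3100 * t)),
}

rng = np.random.default_rng(0)
noisy_whistle = np.sin(2 * np.pi * 3100 * t + 0.7) + 0.05 * rng.standard_normal(t.size)
crowd_like_noise = rng.standard_normal(t.size)

print(classify_frame(noisy_whistle, templates))
print(classify_frame(crowd_like_noise, templates))
```

In an automated mix, a match like this could be used to open the fader on the mic that heard the event, while frames that match nothing stay low in the mix.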
The technology received its UK patent last October, with others, including a US one, pending. Right now, Oldfield is making the transition from developing algorithms to developing business models and it’s one that he appears to be embracing. This transition was aided last year by a £60,000 enterprise fellowship from the Royal Academy of Engineering to further research the project and transform it into a commercial product.
The technology exists in both hardware and software forms, for which customers will pay a nominal fee. However, Salsa hopes to make the bulk of its revenue by licensing the feed to the rights holder to broadcast Salsa’s audio content. While the company is initially targeting broadcasters and OB companies – the tech is currently in trial at a UK broadcaster – Oldfield is excited about IP-based use cases in the streaming market. “As the remote production of matches becomes standard, this product will allow broadcasters to create a rich on-pitch mix automatically and feed that back to the broadcast centre,” he says.
Oldfield adds that the product’s ability to assign metadata will also aid Next Generation Audio (NGA)/object-based sound formats (such as Atmos, MPEG-H, Auro) to create avenues for interactivity and personalisation.
“The technology is also capable of swapping out commentaries, changing the level of on-pitch sound, and providing a service for impaired users – all options that are better suited to a live streaming environment,” he adds.
This article originally appeared in the May 2018 issue of FEED magazine.