
BBC R&D: Derive Wikipedia topics from audio

As part of its remit to deliver the future of public service media, the BBC's estimable scientists can always be relied on to put theory into practice faster than nearly anybody. Collaboration is a key part of that success, helping the researchers better understand the potential challenges and benefits of emerging and established technologies, and driving open standardisation.

“This vital work enables R&D to advise the BBC on what is coming in the future; what it needs to be involved in and influence; what the likely winning and losing technologies are; and what the BBC needs to lead, follow or ignore,” explains Andy Bower, head of external relations, BBC R&D.

Take a close look at its exhibit over in the IBC Future Zone and you will see COMMA. This is an intriguing project undertaken with partners Kite and Somethin’ Else, funded by the UK’s Technology Strategy Board and aimed at metadata extraction.

“High quality metadata is required to allow audiences to search, discover and personalise content. We are developing techniques to address the challenges of creating this metadata, whether from today’s content or that taken from the vast wealth of content held in archives.”

Cloud Marketplace for Media Analysis (COMMA) is showing an automated metadata generation workflow and the ‘Kiwi’ topic extraction algorithm, which can derive Wikipedia topics from audio.

"We are aiming to show how we think the production, distribution and delivery of content could change as networks and systems for production and distribution become increasingly based on flexible IP networks, architectures and tools, and how this will bring massive benefits and opportunities for content producers and audiences alike," he adds.

There are also demonstrations of the RESTful API, including an example interface for navigating within radio programmes using data generated by COMMA.

"COMMA is a completely generic metadata extraction platform that can run in practically any environment, scale fluidly across public and private cloud infrastructure, and run nearly any Linux-compatible algorithm code, usually without modification, against single files or petabyte-scale archives," says Bower. "The simple HTTP-based API allows trivial integration into existing ingest workflows and tool-chains, allowing metadata to be generated and delivered without any additional human intervention."

Streaming UHD content for interactive navigation on a tablet has only recently become possible, and at IBC2014 the BBC is one of the first to show it working in HTML5 rather than in a native app.