Bird watching without binoculars

How the BBC’s R&D team put the open-source machine-learning object-detection system YOLO (You Only Look Once) to work for Autumnwatch.

One of the major problems with making natural history programmes is that the ‘stars’ will not take direction from the production team. Try getting a badger to appear on cue, or a bird to feed its young when the programme goes live, and you will appreciate the difficulty. And yet it is these principal characters that the audience wants to see – programmes such as the BBC’s Springwatch and Autumnwatch prove the point.

So-called camera traps have, of course, been around for some time. These are fitted with motion sensors that start the camera shooting when movement is detected. However, that movement might be a tree swaying in the background – which can mean a memory card full of shots of branches waving to and fro, with little sign of wildlife.

Beyond that, many hours are needed to check through the footage from upwards of 30 Springwatch cameras to determine what can and cannot be used.

Using technology

To make the work of natural history production teams a little easier, the BBC’s Research and Development team has been working on a solution. And it involves Artificial Intelligence (AI) and Machine Learning (ML). Trials took place during the 2019 Springwatch and Autumnwatch productions, with the technology being used more fully for this year’s programming. 

“We became involved in this project when a natural history producer approached us some time ago,” explains Robert Dawes, senior research engineer, BBC R&D. “He asked us to look into AI technology to see how we could improve the performance of those sensors and use these resources more effectively.”

Dawes continues, “We did some initial Artificial Intelligence work using computer vision processing techniques. This involved building a rig that had a camera attached to a small Raspberry Pi computer. This computer continuously monitored the camera’s output and enabled us to determine when birds or animals appeared in shot. By using these computer vision techniques, we were able to filter out unwanted triggers such as moving trees. But because ML was not involved, it still did not tell us what kind of animal or bird was there.”
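
The article does not publish the BBC’s code, but a minimal sketch of that kind of frame-differencing check – written here in Python with OpenCV, with the camera index, blur kernel and thresholds as purely illustrative assumptions – might look like this:

```python
# Illustrative sketch only: detect large changes between consecutive frames
# and ignore small, scattered movement such as leaves or sensor noise.
import cv2

cap = cv2.VideoCapture(0)        # camera attached to the Raspberry Pi (assumed index)
prev_gray = None

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    gray = cv2.GaussianBlur(gray, (21, 21), 0)      # smooth out noise

    if prev_gray is not None:
        delta = cv2.absdiff(prev_gray, gray)        # what changed since the last frame
        mask = cv2.threshold(delta, 25, 255, cv2.THRESH_BINARY)[1]
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        # Only treat large connected blobs as "something is in shot"
        if any(cv2.contourArea(c) > 5000 for c in contours):
            print("possible animal or bird in shot")

    prev_gray = gray
```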

Obviously, more was needed, and when the R&D team became involved with the Springwatch operation it provided the opportunity they had been looking for. Because the local remote wildlife cameras were cabled to the OB unit, power was immediately available, and that opened up the possibility of using higher-powered computer technology to help with the monitoring.

“We wanted to create a system of tools that would keep an eye on multiple cameras at the same time. This needed to be quite sophisticated and required a method to trigger recordings. This was not only helpful for the live programme production teams, but also for the digital output that allows viewers at home to watch cameras 24 hours a day.”
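
To give a flavour of that kind of multi-camera watcher, here is a rough Python sketch; the stream URLs and the detect_activity() helper are hypothetical placeholders rather than the BBC’s actual tooling:

```python
# Illustrative sketch only: watch several feeds at once and raise a trigger
# when activity is detected on any of them.
import threading
import cv2

STREAMS = {
    "nest_box": "rtsp://example.local/nest",     # assumed URLs
    "badger_sett": "rtsp://example.local/sett",
}

def detect_activity(frame):
    """Placeholder for the motion/ML check described in the article."""
    return False

def watch(name, url):
    cap = cv2.VideoCapture(url)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if detect_activity(frame):
            print(f"trigger recording on camera '{name}'")

for name, url in STREAMS.items():
    threading.Thread(target=watch, args=(name, url), daemon=True).start()

threading.Event().wait()    # keep the watcher threads running
```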

As it turned out, the solution that Dawes and his team devised worked not only for the BBC cameras, but also for those operated by third-party wildlife organisations across the United Kingdom – such as the RSPB (Royal Society for the Protection of Birds) – to which the broadcaster has access. “Our answer removes the need for someone to be monitoring all these cameras on a continual basis.”

So, how does it work?

Dawes explains that an open-source machine-learning object-detection network called YOLO (You Only Look Once) was employed. “This technology enables the system to recognise objects. For example, if it is used in an office, it can be taught to identify a chair, a monitor, a fridge or a person. In the natural history application, we can teach it to recognise different types of creatures and then put a box around that object. Once that box is in place, it is possible to track that animal, bird or whatever it may be as it wanders around the screen.”
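
The piece does not say which YOLO build the BBC used; as a hedged illustration, the widely used open-source ultralytics package can run a pretrained YOLO model over a frame and draw those boxes – the model file and input image below are assumptions:

```python
# Illustrative sketch only: run a pretrained YOLO model on a still and draw
# a labelled box around each detected object.
import cv2
from ultralytics import YOLO

model = YOLO("yolov8n.pt")                 # small pretrained YOLO model (assumed)
frame = cv2.imread("camera_still.jpg")     # assumed input image

results = model(frame)[0]
for box in results.boxes:
    x1, y1, x2, y2 = map(int, box.xyxy[0])         # bounding-box corners
    label = results.names[int(box.cls[0])]         # e.g. "bird"
    cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
    cv2.putText(frame, label, (x1, y1 - 5),
                cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)

cv2.imwrite("camera_still_boxed.jpg", frame)
```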

To enable the system to ‘learn’ about the animals, multiple stills of the creatures in question are fed into the computer. These stills can run into many thousands, as it is important for the system to recognise the subject from many different angles. The computer uses the images to train a system known as a “neural network”, loosely modelled on the structure of the brain, to recognise what those kinds of objects look like. This is a good example of Machine Learning. When the system ‘sees’ an object that it recognises as being an animal, it tracks that creature in real time on live video. The set-up means that it is a creature that is being tracked rather than something else that is moving within the camera’s view. In other words, producers can advance from knowing ‘something has happened in the scene’ to ‘a creature has moved in the scene’.
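
In practice, that training step often amounts to fine-tuning a pretrained detector on the labelled stills. A minimal sketch, again using ultralytics purely as an example and with a hypothetical dataset file, might be:

```python
# Illustrative sketch only: fine-tune a pretrained detector on thousands of
# labelled wildlife stills so it learns what each creature looks like from
# many different angles.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")       # start from a pretrained network
model.train(
    data="wildlife.yaml",        # hypothetical dataset of images + box labels
    epochs=100,
    imgsz=640,
)
```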

In each case, it takes up to three days to feed in all this training data and train the neural network. “One benefit is that all this can be carried out on a powerful domestic PC – it doesn’t require a million-dollar system,” emphasises Dawes. “It also means it is easy to change the images the system needs to recognise as and when the circumstances change. Clearly, if that process took six months, the system would not be viable.”

Creating the data

The technology also generates data about each detected event, so that information can be logged for review by those who may wish to use the footage.

“We store the data related to the timing and content of the events, and this is used as the basis for a timeline provided to the production unit. The team can then use this timeline to scroll through the activity on a particular camera’s output. In addition, the system enables us to supply video clips of each event. One is a small preview to allow for easy reviewing of the content; the second is recorded at original quality, with a few seconds of extra video either side of the activity. This can be downloaded into the editing system.”
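
As a rough illustration of how an event record might drive that clip-cutting, here is a hedged Python sketch; the field names, file paths and padding length are assumptions, and ffmpeg is used simply because it is a common choice for this kind of job:

```python
# Illustrative sketch only: log a detection event and cut a padded clip at
# original quality using ffmpeg's stream copy.
import subprocess
from dataclasses import dataclass

@dataclass
class Event:
    camera: str
    label: str      # e.g. "badger"
    start: float    # seconds into the source recording
    end: float

def cut_clip(source: str, event: Event, out: str, pad: float = 3.0) -> None:
    """Extract the event, padded by a few seconds either side."""
    start = max(0.0, event.start - pad)
    duration = (event.end - event.start) + 2 * pad
    subprocess.run([
        "ffmpeg", "-ss", str(start), "-i", source,
        "-t", str(duration), "-c", "copy", out,
    ], check=True)

cut_clip("nest_box.mp4", Event("nest_box", "blue tit", 120.0, 135.5),
         "nest_box_event.mp4")
```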

Once the system was found to be robust, the BBC’s Natural History Unit staff were trained to use this technology.

He goes on, “This may be clever technology, but one important part of our work was to ensure that it fitted into the existing workflows of the production cycle. We built a simple web-based interface to allow access to the clips and data. This proved particularly helpful when so much of the production was working remotely.”
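
That web interface could be as simple as a couple of routes serving the event log and the clips. A minimal sketch, assuming a Flask app with hypothetical paths (not the BBC’s actual interface), might look like this:

```python
# Illustrative sketch only: a tiny web service exposing the event timeline
# and the saved clips to the production team's browsers.
import json
from pathlib import Path
from flask import Flask, jsonify, send_from_directory

app = Flask(__name__)
CLIP_DIR = Path("clips")           # assumed directory of saved clips
EVENT_LOG = Path("events.json")    # assumed JSON log of detections

@app.route("/events")
def events():
    """Return the event timeline for the team to scroll through."""
    return jsonify(json.loads(EVENT_LOG.read_text()))

@app.route("/clips/<path:name>")
def clip(name):
    """Serve a preview or original-quality clip for download."""
    return send_from_directory(CLIP_DIR, name)

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```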

Of course, the Machine Learning technology has far wider applications than just natural history programming. As Dawes points out, most television output involves people and the system allows logging of specific activities. “We are also using these techniques to search through archive material – and that can certainly save many hours of tedious viewing to find a particular person or action.”

Ben Morrison, digital producer for the ‘Watches’ adds, “It was fantastic to work with the R&D team on Springwatch this year – their technology made aspects of our workflow a lot simpler, and even provided us with access to footage that we weren’t expecting to be able to use in that way. Their AI was able to record key moments of behaviour from our network of partner cameras across the country, which we could then weave into our live shows. We’re really looking forward to seeing how the technology develops moving forward!”