The promise of voice control

Today consumers are increasingly comfortable using voice commands to control some devices. The next phase of development is bringing that control to a multiplicity of devices.

Almost 50 years ago, Stanley Kubrick showed us a vision of a unified voice-controlled environment in his movie 2001: A Space Odyssey. The reality took a while to arrive, and at first provided a less than convincing practical experience, as anyone who has ever tried booking cinema tickets or getting through a customer response system by phone will attest.

The fact is that understanding natural speech in real time takes a lot of computing power, and the first practical propositions used the cloud to deliver the necessary processing. Apple’s Siri led the way, and others are now following.

Today consumers are increasingly comfortable using voice commands to control some devices. The next phase of development will inevitably be in bringing control to a multiplicity of devices.

Think of the typical coffee table for a moment. On it you would be likely to find separate – and bulky – remote controls for the television, set-top box, DVD player, games console and OTT box. Although we probably do it intuitively, getting the right source on the screen and speakers generally involves a sequence of commands on different controllers. And that is before we consider what it is we might want to watch – and that’s only for our entertainment system. One of the benefits of the internet of things (IoT), we are told, is that we will be able to remotely control other devices around the home, like lights, heating and security. Does this mean more controllers, or at least a succession of apps on a phone?

The logical goal is a single controller which manages everything. Using traditional technology that would mean a very large box with an extremely large number of buttons. Which brings us back to voice control: the ideal way of controlling a broad swathe of devices around the home is by telling the controller what to do.

If this meant learning a specific set of instructions for each piece of functionality then it would be as ineffective as the big box of buttons in terms of user experience. We have to support natural language instructions, letting the device work out the details.

So rather than turning on the television, then the set-top box, then remembering that BBC1 HD is channel 115, we should simply tell the remote control “I want to watch BBC1.”

More likely, though, we will be less specific. We might come home and say “I want to watch a Steven Spielberg movie.” The controller will do the heavy lifting of seeing what is available on movie channels, on your PVR and on Netflix, and will present the results on screen for you to tell the controller which you choose.

The next phase would be to say “play the movie”, which as well as starting the content would also close the curtains, turn off the main lights and turn on the standard lamp. At the end, you might say “good night”, which will be the cue for the system to shut down the various entertainment boxes, but also switch the thermostat to the night setting and turn off the lights, giving you time to move to the bedroom.
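The idea of one phrase triggering several actions can be sketched as a simple lookup. This is only an illustration: the scene table, the device names and the `expand_scene` helper are all hypothetical, and a real system would take its input from the speech recogniser rather than from an exact string match.

```python
# Hypothetical scene table: a spoken phrase maps to an ordered list of
# (device, action) pairs. All names here are illustrative assumptions.
SCENES = {
    "play the movie": [
        ("player", "play"),
        ("curtains", "close"),
        ("main_lights", "off"),
        ("standard_lamp", "on"),
    ],
    "good night": [
        ("entertainment", "off"),
        ("thermostat", "night"),
        ("lights", "off_after_delay"),
    ],
}


def normalise(utterance: str) -> str:
    """Lower-case, collapse whitespace and drop surrounding punctuation."""
    return " ".join(utterance.lower().split()).strip(" .!?")


def expand_scene(utterance: str) -> list[tuple[str, str]]:
    """Return the actions for a known scene, or an empty list otherwise."""
    return list(SCENES.get(normalise(utterance), []))


if __name__ == "__main__":
    for device, action in expand_scene("Play the movie."):
        print(f"{device}: {action}")
```

Keeping the scenes in a table rather than in code means a service provider or the user could, in principle, add or change them without touching the logic.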

It is important to understand that this is a two-part system. The remote control itself has to remain small, compact and practical, even if the intention is that most users will do little other than talk to it. The remote talks to a device which interprets the instructions and routes them to the device or IoT service which will implement them.

We specialise in RF connectivity between the remote and the device. This link is necessary because the speech commands are digitised in the remote control and transferred to the host device, which does the voice processing. Obviously the ergonomics of the remote control have to be designed around good acoustic performance: there is no sense in burdening the processing with sound that has poor frequency response or is over-reverberant.

The remote itself will probably sit on the coffee table most of the time, rather than being picked up by the user. The processing, therefore, has to be able to discriminate between voices, particularly as watching television in the living room is generally a social experience. You may need to train the system on which voices to listen to.

Applying voice biometrics can provide the ability to offer different functionality to different users. Children, for example, can be presented with just the content that is suitable for them. A smart system could even be programmed to limit the hours children can watch television or play games.
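As a rough sketch of how per-user rules might look once the speaker has been identified, consider the following. It assumes the voice biometrics step (not shown) has already produced a profile; the `Profile` fields and the `may_watch` function are hypothetical names chosen for illustration, not part of any particular product.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class Profile:
    """Hypothetical per-user settings selected after speaker identification."""
    name: str
    max_age_rating: int   # highest content age rating this user may see
    allowed_from: time    # start of permitted viewing window
    allowed_until: time   # end of permitted viewing window (exclusive)


def may_watch(profile: Profile, age_rating: int, now: time) -> bool:
    """Allow content only inside the viewing window and within the rating."""
    within_hours = profile.allowed_from <= now < profile.allowed_until
    return within_hours and age_rating <= profile.max_age_rating


if __name__ == "__main__":
    child = Profile("child", max_age_rating=12,
                    allowed_from=time(7, 0), allowed_until=time(20, 0))
    print(may_watch(child, age_rating=7, now=time(18, 0)))
    print(may_watch(child, age_rating=15, now=time(18, 0)))
```

This simple check assumes the viewing window does not run past midnight; a window that does would need an extra case.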

This technology is available today. We have already delivered more than 250 million multi-function, voice-controlled remotes. The next phase is down to broadcasters or service providers seeing this as a commercial edge to attract and retain subscribers.

By Menno Koopmans, senior VP subscription broadcasting, Universal Electronics