From the minute we wake up in the morning, our daily lives are dominated by voices. Just think about it: we put the radio on while we’re getting ready for work, we’re told of updates and delays to services at the train station via an overhead speaker, and the lift up to the office informs us of which floors it will be stopping at. And that’s just your average morning.
Throughout the rest of the day, the average person is likely to chat to work colleagues, make numerous telephone calls (perhaps having to interact with an automated voice service in the process) and use the voice assistant on their smartphone or wireless speaker at home to catch up on the latest news or set their alarm for the next morning.
Of course, many of these voices will be real, but an increasing percentage of those we hear on a daily basis are synthetic, pre-recorded or activated through the use of artificial intelligence (AI). While synthetic voices have been around for a long time (they have been used for years on television, on public transport and on customer service telephone lines), voice-activated AI has taken longer to find its way into the mainstream. Early versions were quickly dismissed by the public as inconvenient and frustrating, but thanks to the rapid advancement of AI we can now hold conversations with AI-activated voices in a much more natural, human way.
Importantly, there is a difference between synthetic voices and voices activated through AI: the former is pre-scripted and linear (a train announcing which station is the next stop, for example), while the latter responds dynamically to what humans ask of it (Amazon's Alexa and Apple's Siri are two good examples).
Despite these differences, however, both synthetic and AI-activated voices are reaching a level of popularity where they can deliver remarkable benefits to users. Not only do they educate and inform us through voice assistant apps and public speaker systems, but they have also proved themselves to be vital safety and accessibility tools: countless visually impaired people have had their quality of life improved immeasurably through both synthetic and AI-activated voices.
As these voices reach an unprecedented level of mainstream popularity and acceptance, many are left wondering what the future of voice technology might look (or should that be sound?) like.
Just like any other new, emerging technology, the success of synthetic and AI-activated voices ultimately depends on whether they are fully adopted by users. For this to happen, two responsibilities must be fulfilled. Firstly, the companies, organisations and innovators creating these voices must do so with the intention of using them to create a better, more advanced world. Secondly, the public must be willing to trust and try out the technology, because new technology does not become standard simply because someone built it; it always relies on public acceptance.
On top of this, the technology must be implemented in a way that makes our lives easier. Once again, Amazon's Alexa voice assistant is a good example. Just as if they were talking to a friend, users can ask it to play their favourite playlist or give them an update on the weather while carrying out other tasks at the same time. Nor has Amazon forgotten Alexa's future potential: it has introduced the 'Alexa Skills Kit', which allows designers, developers and brands to 'teach' the voice assistant new skills, dramatically expanding what the technology can achieve.
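To make that idea a little more concrete, the sketch below shows roughly what 'teaching' Alexa a new skill looks like from a developer's point of view, using Amazon's ASK SDK for Python. It is a minimal illustration under stated assumptions, not a definitive implementation: the skill's purpose, the 'GetHeadlinesIntent' name and the spoken responses are placeholders invented for this example, not anything Amazon ships.

# Minimal sketch of a custom Alexa skill using the ASK SDK for Python
# (ask-sdk-core). The skill's purpose, intent name and speech text are
# illustrative placeholders only.
from ask_sdk_core.skill_builder import SkillBuilder
from ask_sdk_core.dispatch_components import AbstractRequestHandler
from ask_sdk_core.handler_input import HandlerInput
from ask_sdk_core.utils import is_request_type, is_intent_name
from ask_sdk_model import Response

sb = SkillBuilder()

class LaunchRequestHandler(AbstractRequestHandler):
    # Runs when the user opens the skill (e.g. "Alexa, open daily briefing").
    def can_handle(self, handler_input: HandlerInput) -> bool:
        return is_request_type("LaunchRequest")(handler_input)
    def handle(self, handler_input: HandlerInput) -> Response:
        speech = "Welcome to your daily briefing. What would you like to hear?"
        return (
            handler_input.response_builder
            .speak(speech)
            .set_should_end_session(False)
            .response
        )

class HeadlinesIntentHandler(AbstractRequestHandler):
    # Handles a hypothetical 'GetHeadlinesIntent' defined in the skill's
    # interaction model.
    def can_handle(self, handler_input: HandlerInput) -> bool:
        return is_intent_name("GetHeadlinesIntent")(handler_input)
    def handle(self, handler_input: HandlerInput) -> Response:
        # A real skill would fetch headlines from a news source; stubbed here.
        speech = "Here are today's top stories."
        return handler_input.response_builder.speak(speech).response

sb.add_request_handler(LaunchRequestHandler())
sb.add_request_handler(HeadlinesIntentHandler())

# Entry point when the skill is hosted as an AWS Lambda function.
lambda_handler = sb.lambda_handler()

Broadly speaking, the phrases users can say are defined in the skill's interaction model in Amazon's developer console, while handler code like this simply decides how the assistant should respond to each recognised intent.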
Audio description is another area where there is significant potential for voice technology and AI to combine in unique and beneficial ways. People with impaired hearing could rely on more detailed and dynamic subtitling that uses AI to analyse the on-screen action of their favourite shows, while those who are visually impaired could rely on AI-activated voice descriptions to add colour and an extra dimension to their viewing experience. Perhaps the technology could identify important moments within a scene and relay them to the viewer, or describe a particular setting on screen in more detail. Voices and AI are set to keep improving accessibility features across home entertainment devices until those that exist today seem old-fashioned by comparison.
If AI-activated voice technology is indeed embraced by the public, it has the potential to offer remarkable benefits that improve and enrich our daily experiences, but only if it is delivered within the right context. People will not interact with voice technology simply because it is there: it must have a well-defined purpose and offer innovative solutions to age-old problems.
By David Ciccarelli, CEO, Voices.com