Recent research indicates that users would accept computer-generated speech for audio description if it leads to more provision, according to Screen Systems. With provision increasingly being mandated, the economics suggest that text-to-speech is a viable alternative to using voice talent, particularly for non-premium channels.
As part of its work on spoken subtitles, Screen Systems has extended the technology to provide what the company says is cost-effective audio description. It has developed an output driver for its Polistream subtitle and caption transmission system that connects internally in the same way as all other output encoders. This specialist Polistream module receives 'subtitle texts' and renders them via SAPI 5, driving a text-to-speech engine to produce an audio snippet. If the rendered audio runs too long, it is re-rendered at a faster spoken rate, up to a maximum configured speed increase. The snippet is then presented for output when the 'subtitle text' goes on-air. If a snippet is queued for output behind more than four seconds of existing audio data, the previous snippet is cut with a 50 ms fade, so that the audio does not fall progressively late in abnormal conditions.
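The timing rules described above can be sketched as follows. This is an illustrative model only, assuming a 1.5x maximum speed increase; the function names, the speed cap, and the backlog bookkeeping are hypothetical and do not reflect Screen's actual Polistream implementation or the SAPI 5 API.

```python
# Illustrative sketch of the spoken-subtitle timing rules; names and the
# speed cap are assumptions, not Screen's API.

MAX_SPEEDUP = 1.5   # assumed maximum configured speed increase
MAX_BACKLOG = 4.0   # seconds of queued audio before the cut rule applies
FADE = 0.050        # fade length (50 ms) used when cutting the previous snippet

def choose_rate(rendered_duration, available_window, max_speedup=MAX_SPEEDUP):
    """Pick a spoken-rate multiplier so the snippet fits its on-air window.

    If audio rendered at the normal rate (1.0) is too long for the window,
    it is re-rendered faster, capped at the configured maximum increase.
    """
    if rendered_duration <= available_window:
        return 1.0
    return min(rendered_duration / available_window, max_speedup)

def enqueue(queue_backlog, snippet_duration):
    """Apply the backlog rule: if the new snippet would sit behind more than
    MAX_BACKLOG seconds of queued audio, cut the previous snippet with a
    short fade so output does not drift progressively late.

    Returns (new_backlog_seconds, cut_previous).
    """
    if queue_backlog > MAX_BACKLOG:
        queue_backlog = FADE  # keep only the 50 ms fade-out of the old audio
        cut_previous = True
    else:
        cut_previous = False
    return queue_backlog + snippet_duration, cut_previous
```

A snippet that renders to 3 s against a 2 s window would be re-rendered at 1.5x; one needing more than the cap would still be limited to 1.5x and simply overrun slightly.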
Screen has also developed a module for its MediaMate offline processing framework. The spoken-subtitles module renders 'subtitle text' files to 48 kHz stereo .wav files, with or without a control track, and behaves like the Polistream implementation in terms of audio timing and use of audio files. The generated audio file may then be attached to the media and played out alongside the existing audio.
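For reference, a 48 kHz stereo WAV container of the kind the MediaMate module produces can be written with Python's standard wave module. This is a minimal sketch: the silent placeholder payload and the function name are illustrative, not part of Screen's implementation.

```python
# Minimal sketch: write a 16-bit, 48 kHz stereo WAV file. The silence
# payload stands in for the rendered speech audio.
import struct
import wave

def write_stereo_wav(path, frames_per_channel, sample_rate=48_000):
    """Write a 16-bit PCM stereo WAV file filled with silence."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(2)           # stereo
        wav.setsampwidth(2)           # 16-bit samples
        wav.setframerate(sample_rate) # 48 kHz
        frame = struct.pack("<hh", 0, 0)  # one silent stereo frame
        wav.writeframes(frame * frames_per_channel)
```

Calling write_stereo_wav("description.wav", 48_000) produces one second of silence in the same container format, ready to be attached to media.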