Adobe and Speechmatics have extended their collaboration with the launch of a new on-device speech-to-text (STT) model in Adobe Premiere. Providing studios, production companies and more with secure, local transcription, the model delivers near-cloud levels of accuracy and aims to optimise workflows, enable faster content creation and create a pathway for agentic AI solutions.
Trained on millions of hours of speech, the Speechmatics model delivers high accuracy when transcribing accented speech, non-native speakers and recordings made in noisy environments such as field reporting or film sets. Users can securely and accurately edit video and audio with text, create captions and label speakers from anywhere, without depending on an internet connection.
Key features of the new model include:
- Processes 1 hour of audio in about 55 seconds
- Outperforms its closest competitor, with a 12-16 per cent improvement over Whisper-powered creative solutions
- Runs on Windows and Mac, using the latest AI acceleration techniques to process efficiently across a broad range of hardware, from the latest Mac M5, NVIDIA RTX and AMD GPUs to older machines such as Intel Macs
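
To put the quoted processing speed in context, the short Python sketch below converts "1 hour of audio in about 55 seconds" into a speed factor relative to real time. The function names are illustrative, and the estimate for other recording lengths assumes processing time scales linearly with duration, which the announcement does not state and which is likely to vary by hardware.

```python
# Figures from the feature list: 1 hour of audio in about 55 seconds.
AUDIO_SECONDS = 60 * 60
PROCESSING_SECONDS = 55  # "about 55 seconds", so treat this as approximate


def speed_factor(audio_seconds=AUDIO_SECONDS, processing_seconds=PROCESSING_SECONDS):
    """Return how many times faster than real time the transcription runs."""
    return audio_seconds / processing_seconds


def estimate_processing_seconds(audio_seconds):
    """Estimate processing time for a recording.

    ASSUMPTION: processing time scales linearly with audio duration.
    Real results will depend on the hardware and the content.
    """
    return audio_seconds / speed_factor()


if __name__ == "__main__":
    print(f"Speed factor: about {speed_factor():.0f}x real time")
    print(f"90-minute recording: about {estimate_processing_seconds(90 * 60):.0f} s")
```

On the quoted figures, that works out to roughly 65 times faster than real time.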
Katy Wigdahl, CEO, Speechmatics, commented, “Adobe’s global creator community speaks hundreds of languages and dialects. Since 2021, our partnership has focused on making sure speech technology works for everyone – whether you’re editing in Scottish English, Mexican Spanish, or Cantonese. Today, millions of users can benefit from accurate transcription that works anywhere – on-device for privacy, and in the cloud for scale – without compromising performance. As Adobe builds toward LLM-powered creative workflows, having a speech foundation that truly understands diverse voices becomes even more critical. We’re proud to be part of that future.”