Content localisation company CAMB.AI has unveiled MARS8, which it describes as the first text-to-speech (TTS) system designed not as a single model but as a family of specialised architectures built for production.
According to the company, it has identified a critical flaw in the current market: no single TTS architecture performs best across every use case.
“The market forces developers to choose between speed, quality, accuracy, and cost. We realised that was a false choice,” said Akshat Prakash, CTO at CAMB.AI. “A live voice assistant needs sub-150ms latency. A movie dubbing pipeline needs director-level emotional control. An automotive system has strict memory constraints. You cannot solve all three with one generic API.”
MARS8 moves away from the “black box” API model by offering four distinct architectures, each optimised for specific production constraints. They include MARS-Pro, which can be used for expressive dubbing and digital media, and MARS-Instruct, which gives users “director-level control for high-end film production”, allowing independent tuning of speaker and prosody.
The system allows customers to run the models on their own infrastructure, whether that is AWS Bedrock, Google Cloud Vertex AI, or specialised GPU platforms.
MARS8 is launching on more than 25 compute platforms and on-device SDKs, which the company said marks a historic first in the voice AI space.
CAMB.AI’s MARS models have been used by companies such as Eurovision Sport and Comcast NBCUniversal.