French AI firm Mistral has released Voxtral TTS, an open-source text-to-speech model that supports nine languages. The compact model is aimed at applications such as voice assistants and customer support systems.
Mistral VP Pierre Stock said the model can adapt to a custom voice from a sample of just five seconds, preserving natural inflections and accents across multiple languages. A 10-second clip takes only 1.6 seconds to render, roughly six times faster than real time, which makes the model suited to real-time applications such as dubbing or translation.
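The rendering figure above can be expressed as a real-time factor (RTF), the ratio of synthesis time to audio duration. The short Python sketch below works through that arithmetic using only the two numbers quoted in the article; the function name `real_time_factor` is illustrative and is not part of any Mistral API.

```python
def real_time_factor(render_seconds: float, audio_seconds: float) -> float:
    """Return render time divided by audio duration (lower means faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return render_seconds / audio_seconds


# Figures quoted in the article: 1.6 s to render a 10 s clip.
rtf = real_time_factor(1.6, 10.0)
speedup = 1 / rtf

print(f"RTF = {rtf:.2f}, about {speedup:.2f}x faster than real time")
```

An RTF below 1 means audio is generated faster than it plays back. The figure says nothing about network latency or time to first audio, which also matter in live use.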
Mistral aims to compete directly with established providers such as ElevenLabs and OpenAI by offering state-of-the-art performance at a fraction of the cost. The company claims its end-to-end platform will handle audio inputs alongside text and images, giving enterprises a single multimodal offering.
With this release, Mistral hopes to attract businesses with the flexibility of open source, which lets them customise voice models as needed. The company's vision is a future in which every enterprise can build its own voice agents without hefty licensing fees.
It remains to be seen whether Mistral’s approach will democratise speech generation or merely add another player to a crowded field of AI products.