Mistral AI releases Voxtral TTS open-weight speech model

11 articles · Updated · KDnuggets · May 1

The 4-billion-parameter system supports nine languages, clones voices from three seconds of audio and runs on local hardware, with commercial access also available through Mistral's API.
Mistral says Voxtral TTS delivers 70 ms model latency, about 100 ms time-to-first-audio and generation at 9.7 times real-time speed, targeting conversational AI, customer support, translation and accessibility tools.
The model's weights are available under a CC BY-NC 4.0 non-commercial licence, while paid API use costs $0.016 per 1,000 characters and commercial self-hosting requires separate licensing.
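
To put the quoted figures in context, here is a minimal back-of-the-envelope sketch in Python. It uses only the API price and real-time figure reported above. The function names and the sample volume are illustrative assumptions, and the 9.7x figure is assumed to mean seconds of audio produced per second of compute.

```python
# Figures quoted in the summary above.
PRICE_PER_1K_CHARS_USD = 0.016  # paid API price per 1,000 characters
REAL_TIME_FACTOR = 9.7          # assumption: audio seconds produced per second of compute


def api_cost_usd(num_chars: int) -> float:
    """Estimate the API cost in US dollars for synthesising num_chars characters."""
    return num_chars / 1000 * PRICE_PER_1K_CHARS_USD


def generation_seconds(audio_seconds: float) -> float:
    """Estimate compute time needed to generate a clip of the given length."""
    return audio_seconds / REAL_TIME_FACTOR


if __name__ == "__main__":
    chars = 1_000_000  # hypothetical monthly volume, not a figure from the reporting
    print(f"{chars:,} characters: ${api_cost_usd(chars):.2f}")
    print(f"60 s of audio: about {generation_seconds(60):.1f} s to generate")
```

The estimate covers API usage only. Self-hosting costs depend on hardware and on the separate commercial licence, neither of which is priced in the reporting.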

Mistral is betting its future on an open model. Will this strategy truly disrupt the established AI voice market leaders?

Voxtral TTS offers a self-hosted alternative to pricey APIs. But what are the hidden operational costs for businesses?

Mistral's AI clones voices in seconds. Can new US and EU laws actually prevent a crisis of deepfake fraud?