Voxtral TTS
Mistral text to speech in your hands.
Voxtral TTS transforms text into natural, expressive speech with voice cloning, emotion control, and broadcast-quality audio. The best Mistral TTS alternative to ElevenLabs and Kokoro TTS.
Mistral Voxtral TTS Features — What It Can Do for You
Everything you need for production-grade Mistral text to speech, in one API. A powerful alternative to ElevenLabs, Kokoro TTS, and Ollama voice generation.
Zero-Shot Voice Cloning
Clone any voice from just 5 seconds of reference audio. No training, no fine-tuning, instant replication.
Emotion Control
Choose from 7 emotions — happy, calm, sad, angry, fearful, disgusted, surprised. More expressive than Kokoro TTS or ElevenLabs.
32+ Languages — Multilingual TTS
Native support for 32+ languages, including English, Chinese, Japanese, Korean, Spanish, French, German, and Arabic. Wider language coverage than Voxtral Mini.
Voxtral Realtime — Ultra-Low Latency
Under 200ms median latency with streaming support. Voxtral realtime voice synthesis for live agents and applications.
Natural Interjections
Add (laughs), (sighs), (coughs), and 20+ human sounds that render naturally — a feature missing in Ollama and other local TTS tools.
Broadcast Quality Audio
Studio-grade output up to 44.1kHz. Ranked #1 on Artificial Analysis and Hugging Face TTS Arena, outperforming Kokoro.
Fine-Grained Control
Adjust speed (0.5x–2x), pitch (-12 to +12), volume, custom pauses. Compatible with vLLM Omni and vLLM serving pipelines.
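As a rough illustration of how these controls might be combined in one request, the sketch below checks values against the ranges listed on this page before building a JSON body. The `speed` and `pitch` field names and the `build_payload` helper are assumptions made for illustration; only `text`, `voice`, `emotion`, and `format` appear in the quick start example.

```python
# Hypothetical helper: the "speed" and "pitch" field names are assumptions,
# not confirmed API parameters. The ranges come from the feature list above.
EMOTIONS = {"happy", "calm", "sad", "angry", "fearful", "disgusted", "surprised"}


def build_payload(text, voice, emotion="calm", speed=1.0, pitch=0, fmt="mp3"):
    """Return a request body after checking values against the listed ranges."""
    if emotion not in EMOTIONS:
        raise ValueError(f"unknown emotion: {emotion!r}")
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be between 0.5 and 2.0")
    if not -12 <= pitch <= 12:
        raise ValueError("pitch must be between -12 and 12")
    return {
        "text": text,
        "voice": voice,
        "emotion": emotion,
        "speed": speed,
        "pitch": pitch,
        "format": fmt,
    }


payload = build_payload("Hello!", "casual_male", emotion="happy", speed=1.25, pitch=-2)
print(payload)
```

Validating on the client keeps out-of-range values from costing a round trip, though the service may apply its own limits or use different field names.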
Production Ready — Mistral AI Powered
Enterprise-grade Mistral AI TTS API with high throughput. Deploy via Hugging Face, vLLM, or our managed cloud.
Mistral TTS Playground — Try Voxtral Text to Speech
Type or paste text, pick a voice or clone your own, and hear Mistral text to speech come to life. Free to use — no API key required.
Voxtral TTS Pricing — Mistral Text to Speech Plans
Start free. Scale as you grow. More affordable than ElevenLabs with better quality than Kokoro TTS.
Free
Get started with text-to-speech for personal projects.
- 10,000 characters/month
- 5 preset voices
- 3 languages
- MP3 output
- Community support
Pro
For developers and creators who need more power.
- 500,000 characters/month
- All preset voices
- 32+ languages
- Voice cloning (5 voices)
- Emotion control
- Streaming API
- All audio formats
- Priority support
Enterprise
Unlimited scale with dedicated infrastructure.
- Unlimited characters
- Unlimited voice clones
- Custom fine-tuning
- On-premise deployment
- SLA guarantee
- Dedicated account manager
- SSO & audit logs
- Custom integrations
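To get a feel for how far a monthly allowance goes, the sketch below subtracts the length of a batch of scripts from the Free and Pro limits listed above. It assumes each input character counts once toward the quota, which may not match how usage is actually metered; `remaining_characters` is a hypothetical helper, not part of any SDK.

```python
# Monthly character allowances taken from the plan list above.
PLAN_LIMITS = {"free": 10_000, "pro": 500_000}


def remaining_characters(plan, texts, already_used=0):
    """Return characters left after synthesizing `texts`; negative means over quota.

    Assumes one input character consumes one unit of quota (an assumption).
    """
    needed = sum(len(t) for t in texts)
    return PLAN_LIMITS[plan] - already_used - needed


scripts = ["Hello! Welcome to Voxtral TTS.", "Thanks for listening."]
print(remaining_characters("free", scripts))
```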
Mistral Voxtral TTS API — Quick Start in Minutes
A few lines of Python generate your first speech with the Voxtral TTS API. Works with vLLM, vLLM Omni, or our hosted endpoint.
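import base64

import requests

# Send the text and voice settings to the hosted TTS endpoint.
response = requests.post(
    "https://voxtralttsai.com/api/tts",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "text": "Hello! Welcome to Voxtral TTS.",
        "voice": "casual_male",
        "emotion": "happy",
        "format": "mp3",
    },
)
response.raise_for_status()  # Fail on an HTTP error instead of decoding an error body.

# The response JSON carries the audio as a base64-encoded string.
audio = base64.b64decode(response.json()["audio"])
with open("output.mp3", "wb") as f:
    f.write(audio)
REST API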
Simple HTTP endpoints with JSON payloads. Compatible with Mistral AI ecosystem.
Voxtral Realtime Streaming
WebSocket & SSE for real-time audio delivery. Under 200ms latency.
SDKs & vLLM Omni
Python, TypeScript, and cURL examples. Deploy with vLLM or Hugging Face.
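When wrapping the REST call in your own client, it can help to isolate the decoding step so a malformed response fails with a clear message. The sketch below assumes the response body is JSON with a base64 `audio` field, as in the quick start example; `extract_audio` is a hypothetical helper, and the shape of error responses is not documented here.

```python
import base64


def extract_audio(response_json):
    """Return decoded audio bytes from a TTS response body.

    Raises ValueError when the "audio" field is missing or empty.
    """
    encoded = response_json.get("audio")
    if not encoded:
        raise ValueError("response has no 'audio' field")
    return base64.b64decode(encoded)


# Stand-in for response.json(); a real body would come from requests.post.
fake_response = {"audio": base64.b64encode(b"not real mp3 data").decode("ascii")}
audio_bytes = extract_audio(fake_response)
print(len(audio_bytes))
```

The returned bytes can then be written to disk or passed to an audio player, independent of how the request itself was made.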
Deploy Voxtral TTS with Hugging Face & vLLM
Self-host Mistral Voxtral TTS on your own infrastructure using Hugging Face open weights and vLLM Omni. The model runs on a single GPU with 16GB+ VRAM — no Ollama required. Alternatively, use our managed API for zero-setup deployment, or compare with Kokoro TTS and ElevenLabs on the playground above.
The next chapter of Mistral voice AI is yours.
Start building with Voxtral TTS today. Free tier available with full Mistral text to speech capabilities — no credit card required. Outperforms Kokoro TTS, Ollama, and Gemini 3.1 Flash Live in voice quality benchmarks.