Voxtral TTS
Mistral text to speech in your hands.

Voxtral TTS transforms text into natural, expressive speech with voice cloning, emotion control, and broadcast-quality audio. The best Mistral TTS alternative to ElevenLabs and Kokoro TTS.


32+ languages
<200ms realtime latency
Voice clone
44.1kHz max quality

Mistral Voxtral TTS Features — What It Can Do for You

Everything you need for production-grade Mistral text to speech, in one API. A powerful alternative to ElevenLabs, Kokoro TTS, and Ollama voice generation.

Zero-Shot Voice Cloning

Clone any voice from just 5 seconds of reference audio. No training, no fine-tuning, instant replication.
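As a small illustration of the 5-second figure above, the sketch below checks a reference clip's duration before it is submitted for cloning. It assumes the clip is a WAV file (the accepted upload formats are not stated here), and the helper names are hypothetical, not part of any Voxtral SDK.

```python
import io
import wave

MIN_REFERENCE_SECONDS = 5.0  # figure quoted in the text above


def wav_duration_seconds(fileobj):
    """Return the duration of a WAV file object in seconds."""
    with wave.open(fileobj, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_long_enough(fileobj):
    """True if the clip meets the assumed 5-second minimum."""
    return wav_duration_seconds(fileobj) >= MIN_REFERENCE_SECONDS


def make_silence(seconds, rate=16000):
    """Build an in-memory mono 16-bit WAV of silence, for demonstration only."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    buf.seek(0)
    return buf


print(is_long_enough(make_silence(6)))
print(is_long_enough(make_silence(3)))
```

Checking the length locally avoids spending a request on a clip that is too short.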

Emotion Control

Choose from 7 emotions — happy, calm, sad, angry, fearful, disgusted, surprised. More expressive than Kokoro TTS or ElevenLabs.

32+ Languages — Multilingual TTS

Native support for English, Chinese, Japanese, Korean, Spanish, French, German, Arabic, and more. Wider language coverage than Voxtral Mini.

Voxtral Realtime — Ultra-Low Latency

Under 200ms median latency with streaming support. Voxtral realtime voice synthesis for live agents and applications.

Natural Interjections

Add (laughs), (sighs), (coughs), and 20+ human sounds that render naturally — a feature missing in Ollama and other local TTS tools.
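To show how interjection tags might sit inside input text, here is a minimal sketch that finds parenthesized tags in a script and sorts them into recognized and unrecognized. The allow-list holds only the three tags named above, since the full set of 20+ sounds is not listed here, and `find_interjections` is a hypothetical helper.

```python
import re

# Only the three tags named in the text; the full supported set is not listed here.
KNOWN_TAGS = {"laughs", "sighs", "coughs"}


def find_interjections(text):
    """Return (known, unknown) lists of parenthesized lowercase tags in text."""
    tags = re.findall(r"\(([a-z]+)\)", text)
    known = [t for t in tags if t in KNOWN_TAGS]
    unknown = [t for t in tags if t not in KNOWN_TAGS]
    return known, unknown


script = "That was close (sighs). Anyway (laughs), let's go (whistles)."
known, unknown = find_interjections(script)
print("known:", known)
print("unknown:", unknown)
```

A check like this can flag a mistyped or unsupported tag before it is read aloud as literal text.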

Broadcast Quality Audio

Studio-grade output up to 44.1kHz. Ranked #1 on Artificial Analysis and Hugging Face TTS Arena, outperforming Kokoro.

Fine-Grained Control

Adjust speed (0.5x–2x), pitch (-12 to +12), volume, custom pauses. Compatible with vLLM Omni and vLLM serving pipelines.
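The ranges above can be enforced on the client before a request is sent. The sketch below is one way to do that. The field names `speed` and `pitch` are assumptions, since the actual request parameters are not given here, and volume and pauses are left out because no ranges are stated for them.

```python
def build_controls(speed=1.0, pitch=0):
    """Validate speed and pitch against the ranges quoted above."""
    if not 0.5 <= speed <= 2.0:
        raise ValueError(f"speed must be between 0.5 and 2.0, got {speed}")
    if not -12 <= pitch <= 12:
        raise ValueError(f"pitch must be between -12 and 12, got {pitch}")
    # Hypothetical field names; the real API may name these differently.
    return {"speed": speed, "pitch": pitch}


print(build_controls(speed=1.25, pitch=-3))
```

Rejecting out-of-range values locally gives a clearer error than waiting for the server to refuse the request.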

Production Ready — Mistral AI Powered

Enterprise-grade Mistral AI TTS API with high throughput. Deploy via Hugging Face, vLLM, or our managed cloud.

Mistral TTS Playground — Try Voxtral Text to Speech

Type or paste text, pick a voice or clone your own, and hear Mistral text to speech come to life. Free to use — no API key required.

Guest mode: 100 characters, 2 tries per day.

Playground options: voice source, voice, emotion (Pro), and format.

Voxtral TTS Pricing — Mistral Text to Speech Plans

Start free. Scale as you grow. Same quality as ElevenLabs with voice cloning and emotion control, rivaling Kokoro TTS and Ollama local models.


Free

$0 forever

Try Voxtral TTS with no commitment. Perfect for personal projects and evaluation.

10,000 characters / month
5 preset voices
Standard quality
MP3 output
Community support
Voice cloning
Emotion control
API access


Starter

$5/month

For hobbyists and small projects. Great value to get started with premium TTS.

30,000 characters / month
All 17+ preset voices
Turbo model quality
All audio formats
Commercial license
Email support
Voice cloning
Emotion control

Creator

$22/month

For content creators and indie developers who need more volume and HD quality.

100,000 characters / month
All 17+ preset voices
HD model quality
All audio formats
REST API access
Voice cloning (3 voices)
Emotion control
Priority support

Pro

$99/month

Full power for professionals. Maximum quota, voice cloning, and streaming API.

500,000 characters / month
All 17+ preset voices
HD model quality
Voice cloning (10 voices)
Emotion control
Streaming API
Webhook callbacks
Dedicated support

Need more than 500K characters/month or custom deployment?
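For a rough comparison of the plans listed above, the following sketch picks the cheapest plan whose monthly quota covers a given volume and computes the effective price per 1,000 characters. It uses only the prices and quotas quoted on this page and ignores feature differences such as API access or the number of cloned voices.

```python
# Plan name, price in USD per month, and character quota, as listed above.
PLANS = [
    ("Free", 0, 10_000),
    ("Starter", 5, 30_000),
    ("Creator", 22, 100_000),
    ("Pro", 99, 500_000),
]


def cheapest_plan(chars_per_month):
    """Return the cheapest plan whose quota covers the usage, or None."""
    for name, _price, quota in PLANS:  # list is ordered by price
        if chars_per_month <= quota:
            return name
    return None  # above 500K characters: custom deployment territory


def price_per_1k(price, quota):
    """Effective USD per 1,000 characters if the full quota is used."""
    return round(price / quota * 1000, 3)


print(cheapest_plan(80_000))
for name, price, quota in PLANS[1:]:
    print(name, price_per_1k(price, quota))
```

On these numbers, Starter has the lowest per-character rate of the paid plans and Creator the highest, so the choice between tiers depends on volume and features, not on unit price alone.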

Mistral Voxtral TTS API — Quick Start in Minutes

A short Python script is enough to generate your first speech with the Voxtral TTS API. Works with vLLM, vLLM Omni, or our hosted endpoint.

generate_speech.py

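import base64

import requests

# Send the text and voice settings to the hosted endpoint.
response = requests.post(
    "https://voxtralttsai.com/api/tts",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "text": "Hello! Welcome to Voxtral TTS.",
        "voice": "casual_male",
        "emotion": "happy",
        "format": "mp3",
    },
    timeout=30,
)
response.raise_for_status()  # stop on HTTP errors before parsing the body

# The audio comes back base64-encoded in the JSON body.
audio = base64.b64decode(response.json()["audio"])
with open("output.mp3", "wb") as f:
    f.write(audio)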
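The decoding step in the quick start can be pulled into a small helper that fails with a clear message when the response is not what was expected. This is a sketch that assumes the response body is JSON with a base64 `audio` field, as the example above implies. The shape of error responses is not documented here, so a missing field is simply reported.

```python
import base64
import binascii


def decode_audio(payload):
    """Decode the base64 'audio' field of a parsed JSON response into bytes."""
    if "audio" not in payload:
        raise KeyError("response has no 'audio' field")
    try:
        # validate=True rejects characters outside the base64 alphabet.
        return base64.b64decode(payload["audio"], validate=True)
    except binascii.Error as exc:
        raise ValueError("'audio' field is not valid base64") from exc


# Stand-in payload for demonstration; a real one would come from response.json().
sample = {"audio": base64.b64encode(b"fake-mp3-bytes").decode("ascii")}
print(len(decode_audio(sample)))
```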

REST API

Simple HTTP endpoints with JSON payloads. Compatible with Mistral AI ecosystem.

Voxtral Realtime Streaming

WebSocket & SSE for real-time audio delivery. Under 200ms median latency.
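The wire format of the streaming endpoint is not described on this page. Purely as an illustration of consuming SSE, the sketch below parses standard `data:` lines and joins the chunks, under the loud assumption that each event carries one base64-encoded audio chunk. Both helper names are hypothetical.

```python
import base64


def iter_sse_data(lines):
    """Yield the data payload of each complete SSE event from text lines."""
    buffer = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("data:"):
            value = line[5:]
            if value.startswith(" "):
                value = value[1:]  # SSE strips one leading space after the colon
            buffer.append(value)
        elif line == "" and buffer:
            yield "\n".join(buffer)  # a blank line ends the event
            buffer = []
        # Other lines (comments starting with ':', other fields) are ignored.


def collect_audio(lines):
    """Assumption: each event's data is one base64-encoded audio chunk."""
    return b"".join(base64.b64decode(data) for data in iter_sse_data(lines))


stream = [
    "data: " + base64.b64encode(b"chunk-1|").decode("ascii"),
    "",
    ": keep-alive comment",
    "data: " + base64.b64encode(b"chunk-2").decode("ascii"),
    "",
]
print(collect_audio(stream))
```

In a live client the chunks would be handed to an audio player as they arrive, not joined at the end. Joining keeps the sketch short.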

SDKs & vLLM Omni

Python, TypeScript, and cURL examples. Deploy with vLLM or Hugging Face.

Deploy Voxtral TTS with Hugging Face & vLLM

Self-host Mistral Voxtral TTS on your own infrastructure using Hugging Face open weights and vLLM Omni. The model runs on a single GPU with 16GB+ VRAM — no Ollama required. Alternatively, use our managed API for zero-setup deployment, or compare with Kokoro TTS and ElevenLabs on the playground above.

The next chapter of Mistral voice AI is yours.

Start building with Voxtral TTS today. Free tier available with full Mistral text to speech capabilities — no credit card required. Outperforms Kokoro TTS, Ollama, and Gemini 3.1 Flash Live in voice quality benchmarks.

