TTS Models

Configure Text-to-Speech models to give your agents the ability to speak responses aloud.

Overview

TTS (Text-to-Speech) models convert the agent's text responses into spoken audio. When enabled, the agent can automatically play back responses as speech, or users can selectively listen to individual messages using the Message Read Aloud feature.

TTS is configured per-agent under Agent Settings > Audio and Speech Settings.

TTS Models are managed under Settings > AI Models > TTS Models.

[Screenshot: TTS Models List]

Enabling TTS on an Agent

[Screenshot: Audio and Speech Settings]

To enable text-to-speech for an agent:

1. Navigate to Agent Settings > Audio and Speech Settings.
2. Enable the "Text to Speech" toggle.
3. Select a TTS model from the models configured in your workspace.
4. Click "Save".
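The rule behind step 3 (when "Text to Speech" is on, the model must be one of the workspace's configured TTS models) can be sketched in Python. This is a hypothetical illustration only: `AudioSpeechSettings`, `validate_tts`, and the model names are assumptions, not a product API, and the sketch assumes that "Save" is blocked until a configured model is selected.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class AudioSpeechSettings:
    """Hypothetical mirror of Agent Settings > Audio and Speech Settings."""
    text_to_speech: bool = False
    tts_model: Optional[str] = None


def validate_tts(settings: AudioSpeechSettings, workspace_tts_models: Set[str]) -> List[str]:
    """Return the problems that would block "Save"; an empty list means OK."""
    errors: List[str] = []
    if not settings.text_to_speech:
        return errors  # TTS is off, so no model is required
    if settings.tts_model is None:
        errors.append("Select a TTS model")
    elif settings.tts_model not in workspace_tts_models:
        # Only models from Settings > AI Models > TTS Models are selectable.
        errors.append(f"{settings.tts_model!r} is not a configured TTS model")
    return errors


# Usage with hypothetical model names:
models = {"model-a", "model-b"}
print(validate_tts(AudioSpeechSettings(True, "model-a"), models))  # []
print(validate_tts(AudioSpeechSettings(True), models))  # ['Select a TTS model']
```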

Auto-Play with ASR

When TTS is enabled alongside ASR (Speech-to-Text), the agent supports automatic voice playback:

1. The user speaks a message via the Mic button (ASR).
2. The agent processes the transcribed input.
3. The agent's response is automatically played back as speech (TTS).

This creates a seamless voice conversation experience without requiring the user to read the response.
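The three-step flow of a spoken exchange can be sketched as a single function in Python. This is a hypothetical sketch: `voice_turn` and its callback parameters stand in for the ASR model, the agent, the TTS model, and client playback, and are not part of any product API. The stubs in the usage example replace real speech models.

```python
from typing import Callable


def voice_turn(
    mic_audio: bytes,
    transcribe: Callable[[bytes], str],   # ASR: speech -> text
    run_agent: Callable[[str], str],      # agent: text in -> text reply
    synthesize: Callable[[str], bytes],   # TTS: text -> audio
    play: Callable[[bytes], None],        # audio output on the client
    tts_enabled: bool = True,
) -> str:
    """One spoken exchange: transcribe, respond, then auto-play the reply."""
    user_text = transcribe(mic_audio)      # 1. user speaks via the Mic button
    reply = run_agent(user_text)           # 2. agent processes the transcript
    if tts_enabled:
        play(synthesize(reply))            # 3. reply is played back as speech
    return reply                           # the text reply is returned as well


# Usage with stub functions in place of real ASR/TTS models:
played = []
reply = voice_turn(
    b"hello",
    transcribe=lambda audio: audio.decode(),
    run_agent=lambda text: f"You said: {text}",
    synthesize=lambda text: text.encode(),
    play=played.append,
)
print(reply)   # You said: hello
print(played)  # [b'You said: hello']
```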

Message Read Aloud

[Screenshot: Agent Chat Interface with Read Aloud]

When TTS is enabled, Agent Admins can also enable the "Message Read Aloud" feature:

1. Navigate to Agent Settings > Audio and Speech Settings.
2. Enable "Text to Speech" (required for Message Read Aloud).
3. Enable "Message Read Aloud".

Once enabled, a Speaker button appears next to each of the agent's responses in the chat interface. Users can click this button to have any individual message read aloud on demand.
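The visibility rule for the Speaker button (agent responses only, and only when both toggles are on) can be expressed as a small Python predicate. This is an illustrative sketch; `ChatMessage`, its `role` field, and `shows_speaker_button` are hypothetical names, not the product's implementation.

```python
from dataclasses import dataclass


@dataclass
class ChatMessage:
    role: str  # "user" or "agent" (hypothetical field)
    text: str


def shows_speaker_button(msg: ChatMessage, text_to_speech: bool, message_read_aloud: bool) -> bool:
    """Speaker button: agent responses only, and only when both toggles are on."""
    return msg.role == "agent" and text_to_speech and message_read_aloud


chat = [ChatMessage("user", "Summarize the report"), ChatMessage("agent", "Here is the summary...")]
print([shows_speaker_button(m, True, True) for m in chat])  # [False, True]
print(shows_speaker_button(chat[1], False, True))            # False: TTS is required
```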

When to use

Message Read Aloud is useful for accessibility, hands-free workflows, or situations where users prefer listening over reading — such as long-form responses or mobile usage.

Supported Providers

TTS models are configured at the workspace level under Settings > AI Models > TTS Models. The available models depend on the providers configured for your workspace.

Provider	Models / Notes
OpenAI	OpenAI TTS
Azure OpenAI	Azure-hosted OpenAI TTS
Azure AI Speech	Azure AI Speech Services
Google Cloud	Google Cloud Text-to-Speech
ElevenLabs	High-quality neural voices with wide voice library
Sarvam	Bulbul v3 — optimized for Indian languages
LiveKit Inference	Provides access to multiple TTS providers through a unified gateway, addressed by simple model ID strings. Requires a LiveKit Inference API credential under Settings > Credentials.
OpenAI	(Deprecated) Legacy OpenAI TTS
Azure OpenAI	(Deprecated) Legacy Azure-hosted OpenAI TTS
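Because the models available to you depend on which providers your workspace has configured, and two rows of the Supported Providers table are deprecated legacy entries, a short Python sketch can show how a model picker might filter the table. Everything here is hypothetical: `TTSEntry`, `selectable`, and the default of hiding deprecated entries are assumptions for illustration, not documented product behavior.

```python
from typing import List, NamedTuple, Set


class TTSEntry(NamedTuple):
    provider: str
    deprecated: bool


# One entry per row of the Supported Providers table.
CATALOG = [
    TTSEntry("OpenAI", False),
    TTSEntry("Azure OpenAI", False),
    TTSEntry("Azure AI Speech", False),
    TTSEntry("Google Cloud", False),
    TTSEntry("ElevenLabs", False),
    TTSEntry("Sarvam", False),
    TTSEntry("LiveKit Inference", False),
    TTSEntry("OpenAI", True),         # legacy OpenAI TTS
    TTSEntry("Azure OpenAI", True),   # legacy Azure-hosted OpenAI TTS
]


def selectable(configured: Set[str], include_deprecated: bool = False) -> List[TTSEntry]:
    """Entries whose provider is configured for the workspace.

    Skipping deprecated entries by default is an assumed policy, not product behavior.
    """
    return [
        e for e in CATALOG
        if e.provider in configured and (include_deprecated or not e.deprecated)
    ]


print(selectable({"OpenAI", "Sarvam"}))
# [TTSEntry(provider='OpenAI', deprecated=False), TTSEntry(provider='Sarvam', deprecated=False)]
```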

Voice Selection

Different providers offer various voice options with different characteristics — tone, accent, speed, and naturalness. Choose a voice that matches your agent's personality and your audience's preferences. For Indian language support, consider Sarvam's Bulbul v3.

Back to AI Models
ASR Models — Pair with TTS for full voice conversations
Realtime Voice Models — For phone/SIP-based voice agents
Voice Guides — End-to-end voice workflow setup
Agent Builder — Advanced Configuration — Audio and Speech settings in the agent builder