
TTS Models

Configure Text-to-Speech models to give your agents the ability to speak responses aloud.


Overview

TTS (Text-to-Speech) models convert the agent's text responses into spoken audio. When enabled, the agent can automatically play back responses as speech, or users can selectively listen to individual messages using the Message Read Aloud feature.

TTS is enabled per agent under Agent Settings > Audio and Speech Settings.

The TTS models themselves are managed at the workspace level under Settings > AI Models > TTS Models.

[Figure: TTS Models list]


Enabling TTS on an Agent

[Figure: Audio and Speech Settings]

To enable text-to-speech for an agent:

  1. Navigate to Agent Settings > Audio and Speech Settings
  2. Enable the "Text to Speech" toggle
  3. Select a TTS model from the configured models in your workspace
  4. Click "Save"

Auto-Play with ASR

When TTS is enabled alongside ASR (automatic speech recognition, i.e. speech-to-text), the agent supports automatic voice playback:

  1. User speaks a message via the Mic button (ASR)
  2. The agent processes the transcribed input
  3. The agent's response is automatically played back as speech (TTS)

The result is a fully spoken conversation: the user speaks the request and hears the reply, without having to read the response.
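The auto-play flow above can be sketched as a small decision function. This is an illustrative model only: the names `AudioSettings`, `tts_enabled`, `asr_enabled`, and `should_auto_play` are hypothetical, and the rule that auto-play applies only to messages spoken via the Mic button is an assumption drawn from the steps listed, not a documented schema or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioSettings:
    """Hypothetical model of an agent's Audio and Speech Settings."""
    tts_enabled: bool = False  # the "Text to Speech" toggle
    asr_enabled: bool = False  # speech-to-text input (Mic button)


def should_auto_play(settings: AudioSettings, input_was_spoken: bool) -> bool:
    """Return True when the agent's response should be spoken automatically.

    Assumption: auto-play requires both TTS and ASR to be enabled, and the
    user's message must have arrived through the Mic button.
    """
    return settings.tts_enabled and settings.asr_enabled and input_was_spoken


if __name__ == "__main__":
    voice_agent = AudioSettings(tts_enabled=True, asr_enabled=True)
    print(should_auto_play(voice_agent, input_was_spoken=True))
    print(should_auto_play(voice_agent, input_was_spoken=False))
```

Under this sketch, a typed message to the same agent would not trigger playback; the user could still listen to it on demand if Message Read Aloud is enabled.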


Message Read Aloud

[Figure: Agent chat interface with Read Aloud]

When TTS is enabled, Agent Admins can also enable the "Message Read Aloud" feature:

  1. Navigate to Agent Settings > Audio and Speech Settings
  2. Enable "Text to Speech" (required)
  3. Enable "Message Read Aloud"

Once enabled, a Speaker button appears next to each of the agent's responses in the chat interface. Users can click this button to have any individual message read aloud on demand.
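The dependency between the two toggles can be expressed as a simple validation check. The sketch below is a hypothetical model, not the product's settings API: `SpeechToggles`, `validate`, and `shows_speaker_button` are names invented for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SpeechToggles:
    """Hypothetical representation of the two toggles described above."""
    text_to_speech: bool = False
    message_read_aloud: bool = False


def validate(toggles: SpeechToggles) -> List[str]:
    """Return a list of problems; an empty list means the settings are consistent."""
    errors: List[str] = []
    # "Message Read Aloud" depends on "Text to Speech" being enabled.
    if toggles.message_read_aloud and not toggles.text_to_speech:
        errors.append('"Message Read Aloud" requires "Text to Speech" to be enabled.')
    return errors


def shows_speaker_button(toggles: SpeechToggles) -> bool:
    """The Speaker button appears only when both toggles are on."""
    return toggles.text_to_speech and toggles.message_read_aloud


if __name__ == "__main__":
    print(validate(SpeechToggles(text_to_speech=False, message_read_aloud=True)))
    print(shows_speaker_button(SpeechToggles(text_to_speech=True, message_read_aloud=True)))
```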

When to use

Message Read Aloud is useful for accessibility, hands-free workflows, and situations where users prefer listening to reading, such as long-form responses or mobile use.


Supported Providers

TTS models are configured at the workspace level under Settings > AI Models > TTS Models. The available models depend on the providers configured for your workspace.

Provider                    Models / Notes
--------------------------  ----------------------------------------------------
OpenAI                      OpenAI TTS
Azure OpenAI                Azure-hosted OpenAI TTS
Azure AI Speech             Azure AI Speech Services
Google Cloud                Google Cloud Text-to-Speech
ElevenLabs                  High-quality neural voices with a wide voice library
Sarvam                      Bulbul v3, optimized for Indian languages
LiveKit Inference           Access various TTS providers via a unified gateway using simple model ID strings. Requires a LiveKit Inference API credential under Settings > Credentials.
OpenAI (Deprecated)         Legacy OpenAI TTS
Azure OpenAI (Deprecated)   Legacy Azure-hosted OpenAI TTS

Voice Selection

Voices differ between providers in tone, accent, speed, and naturalness. Choose a voice that matches your agent's personality and your audience's preferences. For Indian languages, consider Sarvam's Bulbul v3.