Voice

The voice feature allows the agent to convert its responses into audio using Text-to-Speech (TTS). This is especially useful for telephony integrations, voice assistants, and channels that support audio messages.

Overview

Voice is configured in the Voice tab of the agent settings. You can choose a voice from the catalog or clone one from an audio sample recorded by your team.

The voice feature is only available on plans that include AI Audio. If your plan does not include this feature, the tab will display an upgrade option.

TTS providers

Timely.ai integrates with three high-quality speech synthesis providers:

Cartesia

Ultra-realistic voices with very low latency. Recommended for real-time service where naturalness and response speed are critical.

ElevenLabs

The reference for voice cloning quality and expressiveness. Ideal for agents that need a highly personalized and emotive voice.

Fish Audio

High-performance provider with broad multilingual support, including Brazilian Portuguese with good prosody.

The voice catalog on the platform includes male and female voices with a range of tones and accents.

Configure a voice

Open the Voice tab

Access the agent settings and click the Voice tab.

Click Add voice

The voice configuration panel opens with three fields: voice selector, instructions, and speed.

Select the voice

In the selector, you will see two sections:

Catalog: predefined voices from the providers, identified by name and gender (e.g., “Aria – Female”).
My Voices: cloned voices you have created (shown with the “cloned” badge).

Adjust the instructions (optional)

The instructions field lets you describe the tone you expect from the voice. Examples:

“Speak calmly and empathetically, like an experienced support attendant”
“Energetic and enthusiastic tone, like a sales presenter”
“Slow and articulate, to ensure the customer understands every detail”

Limit: 1,000 characters.

Set the speed

Use the slider to adjust speaking speed between 0.5x (slow) and 2.0x (fast). The default is 1.0x.

Generate a preview

Click Generate preview to hear a sample of the voice with the current settings before saving.

Save

Click Add voice. The voice becomes active for the agent immediately.
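The three fields and their limits can be checked before you type them into the panel. The following is a minimal sketch in Python, assuming a purely local helper: the `VoicePreset` class and its field names are hypothetical and are not part of any Timely.ai API. Only the 1,000-character limit, the 0.5x to 2.0x range, and the 1.0x default come from this guide.

```python
from dataclasses import dataclass

MAX_INSTRUCTION_CHARS = 1000      # instructions limit stated in this guide
MIN_SPEED, MAX_SPEED = 0.5, 2.0   # slider range stated in this guide


@dataclass(frozen=True)
class VoicePreset:
    """Hypothetical local model of a voice preset (not a Timely.ai type)."""

    voice_name: str
    instructions: str = ""   # optional tone description
    speed: float = 1.0       # default speed from this guide

    def __post_init__(self) -> None:
        # Reject values the configuration panel would not accept.
        if not self.voice_name.strip():
            raise ValueError("voice_name must not be empty")
        if len(self.instructions) > MAX_INSTRUCTION_CHARS:
            raise ValueError(
                f"instructions exceed {MAX_INSTRUCTION_CHARS} characters"
            )
        if not MIN_SPEED <= self.speed <= MAX_SPEED:
            raise ValueError(
                f"speed must be between {MIN_SPEED}x and {MAX_SPEED}x"
            )


preset = VoicePreset(
    voice_name="Aria",
    instructions="Speak calmly and empathetically, like an experienced support attendant",
)
print(preset.speed)
```

A frozen dataclass is used so that a preset cannot drift out of range after it has been validated.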

Voice cloning

You can create a custom voice from an audio recording — useful for maintaining your brand’s sonic identity or using the voice of a team member.

Voice cloning is available on the Enterprise plan and is subject to a monthly clone limit defined by your plan. Check your limit in Settings > Billing.

Start cloning

In the voice selector of the configuration dialog, click + Clone my voice (last option in the list). The cloning dialog opens.

Upload the audio sample

Upload an audio recording of the voice to be cloned. Requirements for best results:

Minimum recommended duration: 30 seconds
Quiet environment, no background noise
Clear and natural voice, as in a normal conversation
Accepted formats: MP3, WAV, M4A
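You can check a recording against the format and duration points above before uploading it. This is a rough sketch using only the Python standard library. The function names are hypothetical, and the duration check covers WAV files only, because the standard library cannot read MP3 or M4A durations. Noise and clarity still have to be judged by ear.

```python
import wave
from pathlib import Path

ACCEPTED_EXTENSIONS = {".mp3", ".wav", ".m4a"}   # formats listed in this guide
MIN_RECOMMENDED_SECONDS = 30.0                   # recommended minimum duration


def wav_duration_seconds(path: Path) -> float:
    """Return the duration of a WAV file in seconds."""
    with wave.open(str(path), "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def check_sample(path: Path) -> list[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems: list[str] = []
    suffix = path.suffix.lower()
    if suffix not in ACCEPTED_EXTENSIONS:
        problems.append(f"unsupported format: {suffix or 'no extension'}")
    elif suffix == ".wav":
        # Duration is only measured for WAV; other formats are not inspected.
        duration = wav_duration_seconds(path)
        if duration < MIN_RECOMMENDED_SECONDS:
            problems.append(
                f"sample is {duration:.1f}s; "
                f"{MIN_RECOMMENDED_SECONDS:.0f}s or more is recommended"
            )
    return problems
```

For example, `check_sample(Path("maria.wav"))` returns an empty list when the file is a WAV recording of at least 30 seconds.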

Name the clone

Give the cloned voice a name to identify it (e.g., “Maria’s Voice - Support”).

Wait for processing

Cloning takes a few seconds. When it completes, the cloned voice appears in the My Voices section of the selector.

Manage voices

In the Voice tab table, each preset displays:

Field	Description
Name	Name of the selected voice
Instructions	Summary of tone instructions
Speed	Configured speed factor (e.g., 1.2x)

Through the actions menu (... icon) you can edit, preview, or remove the configured voice.

Common use cases

Phone service

Use a neutral catalog voice, female or male, at 1.0x speed. Instructions: “Professional and patient tone.”

Brand assistant

Clone the voice of a company spokesperson to keep a consistent sonic identity across all touchpoints.

Animated sales agent

Select a voice with a more expressive character and set 1.1x speed. Instructions: “Enthusiastic, but not aggressive.”

Educational content

Set the speed to 0.9x. Instructions: “Articulate and didactic speech.” Pairs well with Gemini for multimodal content.
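If you keep these recipes in a shared document or script, a small lookup table makes them easy to reuse. The sketch below records the three use cases that state an explicit speed. The dictionary layout and the `summarize` helper are illustrative assumptions, not a Timely.ai import or export format.

```python
# Speeds and instructions copied from the use cases in this guide.
# The structure and key names are hypothetical.
USE_CASE_SETTINGS = {
    "Phone service": {
        "speed": 1.0,
        "instructions": "Professional and patient tone.",
    },
    "Animated sales agent": {
        "speed": 1.1,
        "instructions": "Enthusiastic, but not aggressive.",
    },
    "Educational content": {
        "speed": 0.9,
        "instructions": "Articulate and didactic speech.",
    },
}


def summarize(use_case: str) -> str:
    """Format one use case as a single line, with speed shown as e.g. 1.1x."""
    settings = USE_CASE_SETTINGS[use_case]
    return f"{use_case}: {settings['speed']:.1f}x, {settings['instructions']}"


for name in USE_CASE_SETTINGS:
    print(summarize(name))
```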

Limits and best practices

Each agent supports one active voice preset at a time. To switch, edit or remove the existing preset.
The preview uses generic sample text, so also test the voice with a representative excerpt of the agent’s actual content.
Cloned voices are tied to the workspace and can be used by multiple agents.
Cloning quality depends directly on the quality of the provided audio — recordings with heavy noise result in low-fidelity clones.
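The first and third limits can be pictured as a small data model: each agent holds at most one active preset, while cloned voices belong to the workspace and can be referenced by any agent. The sketch below is a hypothetical in-memory illustration. The `VoiceRegistry` class is not a Timely.ai API, and rejecting a second preset with an error is an assumption about how the rule might be enforced.

```python
class VoiceRegistry:
    """Hypothetical in-memory model of the limits described above."""

    def __init__(self) -> None:
        self.cloned_voices: set[str] = set()     # shared across the workspace
        self.active_voice: dict[str, str] = {}   # agent name -> voice name

    def add_clone(self, voice_name: str) -> None:
        self.cloned_voices.add(voice_name)

    def add_preset(self, agent: str, voice_name: str) -> None:
        # One active preset per agent: a second one must replace the first.
        if agent in self.active_voice:
            raise ValueError(
                f"{agent} already has an active preset; edit or remove it first"
            )
        self.active_voice[agent] = voice_name

    def edit_preset(self, agent: str, voice_name: str) -> None:
        if agent not in self.active_voice:
            raise KeyError(f"{agent} has no preset to edit")
        self.active_voice[agent] = voice_name

    def remove_preset(self, agent: str) -> None:
        self.active_voice.pop(agent, None)


registry = VoiceRegistry()
registry.add_clone("Maria's Voice - Support")
# The same cloned voice can serve more than one agent.
registry.add_preset("support-agent", "Maria's Voice - Support")
registry.add_preset("billing-agent", "Maria's Voice - Support")
# Switching voices goes through an edit, not a second preset.
registry.edit_preset("support-agent", "Aria")
```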


Next steps

Transfer rules

Test the agent
