Overview
Voice configuration lives in the Voice tab of the agent settings. You can choose a voice from the catalog or clone a voice from an audio sample recorded by someone on your team.
The voice feature is only available on plans that include AI Audio. If your plan does not include this feature, the tab will display an upgrade option.
TTS providers
Timely.ai integrates with three high-quality speech synthesis providers:
Cartesia
Ultra-realistic voices with very low latency. Recommended for real-time service where naturalness and response speed are critical.
ElevenLabs
The reference for voice cloning quality and expressiveness. Ideal for agents that need a highly personalized and emotive voice.
Fish Audio
High-performance provider with strong multilingual support, including Brazilian Portuguese with good prosody.
Configure a voice
Click Add voice
The voice configuration panel opens with three fields: voice selector, instructions, and speed.
Select the voice
In the selector, you will see two sections:
- Catalog — pre-defined voices from providers, identified by name and gender (e.g., “Aria – Female”).
- My Voices — cloned voices you have created (appear with the “cloned” badge).
Adjust the instructions (optional)
The instructions field lets you describe the tone you expect from the voice. Examples:
- “Speak calmly and empathetically, like an experienced support attendant”
- “Energetic and enthusiastic tone, like a sales presenter”
- “Slow and articulate, to ensure the customer understands every detail”
Set the speed
Use the slider to adjust speaking speed between 0.5x (slow) and 2.0x (fast). The default is 1.0x.
Generate a preview
Click Generate preview to hear a sample of the voice with the current settings before saving.
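The three panel fields above map onto a small data record. The sketch below is illustrative only: this page describes a UI, not an API, so the `VoicePreset` class and its field names are assumptions. It shows the documented speed range (0.5x to 2.0x, default 1.0x) being enforced when a preset is created.

```python
from dataclasses import dataclass

# Slider bounds and default as documented for the speed field.
MIN_SPEED = 0.5
MAX_SPEED = 2.0
DEFAULT_SPEED = 1.0


@dataclass(frozen=True)
class VoicePreset:
    """Hypothetical record mirroring the three panel fields."""

    voice: str                    # name of a catalog or cloned voice
    instructions: str = ""        # optional tone description
    speed: float = DEFAULT_SPEED  # speaking speed factor

    def __post_init__(self) -> None:
        if not self.voice:
            raise ValueError("a voice must be selected")
        if not MIN_SPEED <= self.speed <= MAX_SPEED:
            raise ValueError(
                f"speed must be between {MIN_SPEED}x and {MAX_SPEED}x, "
                f"got {self.speed}x"
            )


preset = VoicePreset(voice="Aria", instructions="Professional and patient tone.")
print(preset.speed)  # 1.0
```

Rejecting out-of-range values, rather than silently clamping them, mirrors a slider that cannot be dragged past its limits.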
Voice cloning
You can create a custom voice from an audio recording, which is useful for maintaining your brand’s sonic identity or using the voice of a team member.
Start cloning
In the voice selector of the configuration dialog, click + Clone my voice (last option in the list). The cloning dialog opens.
Upload the audio sample
Upload an audio recording of the voice to be cloned. Requirements for best results:
- Minimum recommended duration: 30 seconds
- Quiet environment, no background noise
- Clear and natural voice, as in a normal conversation
- Accepted formats: MP3, WAV, M4A
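Two of the requirements above (format and duration) can be checked before uploading. The following is a minimal sketch, not part of Timely.ai: the `check_sample` function is hypothetical and expects you to supply the recording's duration yourself. Background noise and vocal clarity still need a human listener.

```python
from pathlib import Path

# Values taken from the requirements list; the names are illustrative.
ACCEPTED_EXTENSIONS = {".mp3", ".wav", ".m4a"}
RECOMMENDED_MIN_SECONDS = 30.0


def check_sample(filename: str, duration_seconds: float) -> list[str]:
    """Return a list of problems; an empty list means no issue was found."""
    problems = []
    extension = Path(filename).suffix.lower()
    if extension not in ACCEPTED_EXTENSIONS:
        problems.append(f"unsupported format: {extension or 'no extension'}")
    if duration_seconds < RECOMMENDED_MIN_SECONDS:
        problems.append(
            f"sample is {duration_seconds:.0f}s; at least "
            f"{RECOMMENDED_MIN_SECONDS:.0f}s is recommended"
        )
    return problems


print(check_sample("spokesperson.wav", 45.0))  # []
print(check_sample("memo.ogg", 12.0))          # two problems reported
```

Because 30 seconds is a recommendation rather than a hard limit, a short sample is reported as a problem to review, not as a rejected upload.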
Manage voices
In the Voice tab table, each preset displays:
| Field | Description |
|---|---|
| Name | Name of the selected voice |
| Instructions | Summary of tone instructions |
| Speed | Configured speed factor (e.g., 1.2x) |
... icon) you can edit, preview, or remove the configured voice.
Common use cases
Phone service
Use a neutral female or male catalog voice with 1.0x speed. Instructions: “Professional and patient tone.”
Brand assistant
Clone the voice of a company spokesperson to keep a consistent sonic identity across all touchpoints.
Animated sales agent
Select a voice with a more expressive character and set 1.1x speed. Instructions: “Enthusiastic, but not aggressive.”
Educational content
Set 0.9x speed with the instructions “Articulate and didactic speech.” This combines well with Gemini for multimodal content.
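For teams that keep their agent settings in a shared reference, the use cases above can be written down as plain data. This is only a sketch: the dictionary keys and the `describe` helper are hypothetical, and the brand assistant is omitted because no speed or instructions are specified for it.

```python
# Hypothetical lookup table; speeds and instructions come from the use cases.
USE_CASE_PRESETS = {
    "phone_service": {
        "speed": 1.0,
        "instructions": "Professional and patient tone.",
    },
    "animated_sales_agent": {
        "speed": 1.1,
        "instructions": "Enthusiastic, but not aggressive.",
    },
    "educational_content": {
        "speed": 0.9,
        "instructions": "Articulate and didactic speech.",
    },
}


def describe(use_case: str) -> str:
    """Format one preset as a single human-readable line."""
    preset = USE_CASE_PRESETS[use_case]
    return f"{use_case}: {preset['speed']}x, {preset['instructions']}"


for name in USE_CASE_PRESETS:
    print(describe(name))
```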
Limits and best practices
- Each agent supports one active voice preset at a time. To switch, edit or remove the existing preset.
- The preview uses a generic sample text — always listen with a representative excerpt of the agent’s actual content.
- Cloned voices are tied to the workspace and can be used by multiple agents.
- Cloning quality depends directly on the quality of the provided audio — recordings with heavy noise result in low-fidelity clones.
Next steps
Transfer rules
Configure when and to whom the agent should transfer the conversation.
Test the agent
Validate voice, tools, and behavior in the Playground before publishing.