Voice Studio
Advanced voice synthesis: clone your own voice, or use premium HD voices on paid plans.
What you can do
Voice Studio turns text into audio with the voice you choose:
- Text → voice: have any text read aloud
- Personal cloned voice: record YOUR voice and clone it
- Premium voices: ultra-realistic, natural-sounding voices
- Preview and controls: try a voice instantly and adjust every detail
How to record and clone YOUR voice
This is Voice Studio's distinctive feature: you can record an audio sample and use it to generate speech from any text in YOUR voice.
Step 1: Recording
- Click Record personal voice
- Grant the browser permission to use the microphone
- Speak in a natural, clear tone for 10–30 seconds
- Watch the live equalizer: it shows audio levels in real time as a dynamic graph. Avoid red peaks, which indicate distortion
- Click End recording when done
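If you also keep a copy of a recording as a WAV file, you can check it yourself against the guidelines above. The sketch below is only an illustration and not part of Voice Studio: the function name `check_sample` is hypothetical, it assumes a 16-bit PCM WAV file, and it treats any sample that reaches full scale as a "red peak".

```python
import array
import sys
import wave

MIN_SECONDS = 10.0   # lower bound suggested in the guide
MAX_SECONDS = 30.0   # upper bound suggested in the guide
FULL_SCALE = 32767   # assumption: reaching 16-bit full scale counts as clipping


def check_sample(path: str) -> dict:
    """Report duration and clipping for a 16-bit PCM WAV file."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        rate = wav.getframerate()
        frames = wav.getnframes()
        samples = array.array("h", wav.readframes(frames))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    duration = frames / rate
    peak = max((abs(s) for s in samples), default=0)
    return {
        "duration_s": duration,
        "duration_ok": MIN_SECONDS <= duration <= MAX_SECONDS,
        "clipped": peak >= FULL_SCALE,
    }
```

A sample that reports `duration_ok` as true and `clipped` as false meets the length and distortion guidelines; whether it sounds natural is still something only listening to the preview can tell you.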
Step 2: Save the sample
Once recorded:
- Listen to the preview (▶ button)
- Click Save sample — it's saved to your profile
You can record multiple samples (different voices, tones, languages). Each will be usable in the future.
Step 3: Use the sample
- Choose your voice sample from the list
- Write the text you want to hear with YOUR voice
- Press Generate
- The audio is created in a few seconds, with your cloned voice
Preset voices and premium voices
If you don't want to clone your voice, you can use a preset voice or a premium voice:
Preset voices (free)
Models like Chatterbox TTS and Qwen3-TTS come with built-in voices:
- Choose from the list (e.g. "Serena", "Luca", "Maria")
- Write the text
- Generate instantly
Premium voices (with a paid plan)
With a Pro or Enterprise plan, you can also use:
- IndexTTS2 — Premium HD — ultra-natural and emotional voice, generated locally
- ElevenLabs v3 — studio-quality voice, 32 presets + custom voice cloning
- MiniMax Speech 2.8 HD — with emotions (happy, sad, angry, etc.)
Premium voices cost more credits, but the quality is cinema-grade: almost indistinguishable from a real human voice.
Generation parameters
Depending on the voice chosen, you can adjust:
- Text: what the AI will say
- Voice/Sample: which voice to use
- Speed: from 0.5× (very slow) to 2.0× (very fast). Default 1.0×
- Language (some models): auto-detect or manual choice (IT, EN, ES, FR, etc.)
- Emotion (premium models): happy, sad, angry, neutral, etc.
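The parameters above can be pictured as a small settings object. This is a minimal sketch for illustration only: Voice Studio is used through its interface, and the class and field names here (`GenerationRequest`, `voice`, and so on) are hypothetical. Only the speed range and its default come from this guide.

```python
from dataclasses import dataclass
from typing import Optional

SPEED_MIN, SPEED_MAX = 0.5, 2.0  # range stated in the guide


@dataclass(frozen=True)
class GenerationRequest:
    text: str                       # what the AI will say
    voice: str                      # preset name or saved sample name
    speed: float = 1.0              # default stated in the guide
    language: Optional[str] = None  # None stands for auto-detect
    emotion: Optional[str] = None   # only meaningful for premium models

    def __post_init__(self) -> None:
        # Reject settings the guide does not allow.
        if not self.text.strip():
            raise ValueError("text must not be empty")
        if not SPEED_MIN <= self.speed <= SPEED_MAX:
            raise ValueError(
                f"speed must be between {SPEED_MIN} and {SPEED_MAX}"
            )


request = GenerationRequest(text="Welcome back!", voice="Serena", speed=1.2)
```

Validating the speed when the object is created mirrors what a slider does in the interface: values outside 0.5× to 2.0× simply cannot be chosen.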
Preview and download
- Press Generate: the audio appears in the Gallery on the right
- Click an audio to listen
- Press Download to save to your computer
- Press Use in to send it to another studio (Cinema, Avatar, Video)
Voice synthesis models
| Model | Cost | Voice quality | Languages | When to use |
|---|---|---|---|---|
| Chatterbox TTS | Free | Good | IT, EN, ES, FR, DE, etc. | Fast and simple TTS |
| Chatterbox HD | Free | Good HD | IT, EN, ES, FR | Local TTS, more natural |
| Qwen3-TTS | Premium | Excellent | 11 languages | Multilingual, realistic voice |
| IndexTTS2 HD | Premium | Studio-quality | IT, EN, ES, FR | Ultra-natural, emotional voice |
| ElevenLabs v3 | Premium | Cinematic | 32 languages | Professional voices, premium voice clone |
Recording: how to get the best results
Before recording, find a quiet place:
- Close doors and windows
- Turn off fans, AC, and any other source of background noise
- Use a dedicated microphone (headset mic, lavalier, USB condenser) if possible
- If you use a laptop or phone mic, keep it close to your mouth (about 15 cm)
During recording:
- Speak in a natural and clear way, not robotic
- Keep a consistent tone (don't shout, don't whisper)
- Read 2–3 simple sentences
- Watch the live equalizer: keep the level in the green, reaching yellow at most
After recording:
- Listen to the preview
- If you hear noise or distortion, record again
The best sample = natural voice, no noise, moderate volume.
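To make the green, yellow, and red zones more concrete, here is a rough sketch of how a level meter might classify a chunk of audio by its peak level. The thresholds (-12 dBFS and -3 dBFS) are assumptions chosen for illustration; this guide does not state which values Voice Studio's equalizer actually uses.

```python
import math
from typing import Iterable


def peak_dbfs(samples: Iterable[int], full_scale: int = 32768) -> float:
    """Peak level of 16-bit samples in dB relative to full scale."""
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return float("-inf")  # pure silence
    return 20 * math.log10(peak / full_scale)


def zone(db: float, yellow_from: float = -12.0, red_from: float = -3.0) -> str:
    """Map a peak level to a meter color (thresholds are assumptions)."""
    if db >= red_from:
        return "red"
    if db >= yellow_from:
        return "yellow"
    return "green"
```

With these assumed thresholds, a peak at half of full scale (about -6 dBFS) lands in yellow, which is as loud as you should get. Anything close to full scale lands in red.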
Common issues
"The recording doesn't start"
→ The browser doesn't have microphone permission. Click the lock or microphone icon next to the address bar (top left) and allow access.
"The recorded audio sounds robotic"
→ Try again with a more natural, less rushed delivery. Avoid sounding "like AI".
"The cloned voice doesn't sound like my sample"
→ Try recording again with better-quality audio (quieter surroundings, clearer speech). The model you use (e.g. F5-TTS) will do its best, but fidelity depends on the sample.
"I want a premium voice but I don't have credits"
→ Buy credits from Account → Credits at the top right. Premium voices cost a bit more but the quality is worth it.
"The text isn't pronounced well (e.g. names, acronyms)"
→ Some models have limitations with acronyms and foreign names. Try rewriting the text to be phonetically clearer, or use a cloud model (ElevenLabs is better at this).
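If you often generate texts with the same tricky names or acronyms, you can rewrite them before pasting the text into Voice Studio. The sketch below shows one possible approach and is not a Voice Studio feature: the `respell` function and the example spellings are hypothetical, and you would supply your own replacement list.

```python
import re
from typing import Mapping


def respell(text: str, pronunciations: Mapping[str, str]) -> str:
    """Replace whole-word terms with a phonetic spelling (case-sensitive)."""
    if not pronunciations:
        return text
    # Try longer terms first so a longer match wins over a shorter one.
    terms = sorted(pronunciations, key=len, reverse=True)
    alternatives = "|".join(re.escape(t) for t in terms)
    # Lookarounds keep "SQL" from matching inside "SQLite".
    pattern = re.compile(r"(?<!\w)(" + alternatives + r")(?!\w)")
    return pattern.sub(lambda m: pronunciations[m.group(1)], text)


spoken = respell(
    "Query the SQL database at NASA.",
    {"SQL": "sequel", "NASA": "nassa"},
)
# spoken == "Query the sequel database at nassa."
```

Keep the original text for captions or transcripts and use the respelled version only for audio generation.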
Pro tip
Record 5–10 different samples (different tones, languages, speeds) and save them. You can use them anytime to generate new texts with your voice. Perfect for narration, podcasts, YouTube videos.