Voice Studio

Advanced voice synthesis: clone your own voice, or use premium HD voices on paid plans.

What you can do

Voice Studio turns text into audio in the voice you choose:

Text → voice: have any text read aloud
Personal cloned voice: record YOUR voice and clone it
Premium voices: ultra-realistic, natural-sounding voices
Preview and controls: listen instantly and adjust every detail

How to record and clone YOUR voice

This is Voice Studio's distinctive feature: you can record an audio sample and use it to generate speech in YOUR voice.

Step 1: Recording

Click Record personal voice
Grant the browser permission to use the microphone
Speak naturally and clearly for 10–30 seconds
Watch the live equalizer, which shows audio levels in real time as a moving graph. Avoid red peaks, which indicate distortion
Click End recording when done

Step 2: Save the sample

Once recorded:

Listen to the preview (▶ button)
Click Save sample — it's saved to your profile

You can record multiple samples (different voices, tones, languages). Each one stays available for later use.

Step 3: Use the sample

Choose your voice sample from the list
Write the text you want to hear in YOUR voice
Press Generate
The audio is created in a few seconds, in your cloned voice

Preset voices and premium voices

If you don't want to clone your voice, you can use a preset voice or a premium voice:

Preset voices (free)

Models like Chatterbox TTS and Qwen3-TTS offer ready-made voices:

Choose from the list (e.g. "Serena", "Luca", "Maria")
Write the text
Generate instantly

Premium voices (with a paid plan)

If you have a Pro or Enterprise plan, you can also use:

IndexTTS2 (Premium HD): ultra-natural, emotional voice, generated locally
ElevenLabs v3: studio-quality voice, 32 presets plus custom voice cloning
MiniMax Speech 2.8 HD: supports emotions (happy, sad, angry, etc.)

Premium voices cost more credits, but the result is almost indistinguishable from a real human voice.

Generation parameters

Depending on the voice chosen, you can adjust:

Text: what the AI will say
Voice/Sample: which voice to use
Speed: from 0.5× (very slow) to 2.0× (very fast). Default 1.0×
Language (some models): auto-detect or manual choice (IT, EN, ES, FR, etc.)
Emotion (premium models): happy, sad, angry, neutral, etc.
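
If you keep your settings in a script or a notes file, the parameters above can be pictured as a small validated record. The following is a minimal Python sketch, not Voice Studio's API: the class name GenerationRequest and its field names are hypothetical, and only the speed range (0.5× to 2.0×) and the 1.0× default come from the list above.

```python
from dataclasses import dataclass
from typing import Optional

# Range and default taken from the parameter list above.
MIN_SPEED = 0.5
MAX_SPEED = 2.0
DEFAULT_SPEED = 1.0


@dataclass
class GenerationRequest:
    """Hypothetical container for the generation settings (illustration only)."""

    text: str                       # what the AI will say
    voice: str                      # preset name or saved sample name
    speed: float = DEFAULT_SPEED    # playback speed multiplier
    language: Optional[str] = None  # None stands for auto-detect
    emotion: Optional[str] = None   # only meaningful for premium models

    def __post_init__(self) -> None:
        # Reject empty text and speeds outside the documented range.
        if not self.text.strip():
            raise ValueError("text must not be empty")
        if not MIN_SPEED <= self.speed <= MAX_SPEED:
            raise ValueError(
                f"speed must be between {MIN_SPEED} and {MAX_SPEED}, got {self.speed}"
            )


request = GenerationRequest(text="Welcome to the show.", voice="Serena", speed=1.25)
print(request)
```

Validating on construction means an out-of-range speed such as 3.0 fails immediately instead of producing an unexpected result later.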

Preview and download

Press Generate: the audio appears in the Gallery on the right
Click an audio clip to listen
Press Download to save to your computer
Press Use in to send it to another studio (Cinema, Avatar, Video)

Voice synthesis models

Model	Cost	Voice quality	Languages	When to use
Chatterbox TTS	Free	Good	IT, EN, ES, FR, DE, etc.	Fast and simple TTS
Chatterbox HD	Free	Good HD	IT, EN, ES, FR	Local TTS, more natural
Qwen3-TTS	Premium	Excellent	11 languages	Multilingual, realistic voice
IndexTTS2 HD	Premium	Studio-quality	IT, EN, ES, FR	Ultra-natural, emotional voice
ElevenLabs v3	Premium	Cinematic	32 languages	Professional voices, premium voice clone

Recording: how to get the best results

Before recording, find a quiet place:

Close doors and windows
Turn off fans, air conditioning, and other sources of background noise
Use a dedicated microphone (headset mic, lavalier, USB condenser) if possible
If you use a laptop or phone microphone, keep it close to your mouth (about 15 cm)

During recording:

Speak naturally and clearly, not robotically
Keep a consistent tone (don't shout, don't whisper)
Read 2–3 simple sentences
Watch the live equalizer and keep the level in the green zone (yellow at most)

After recording:

Listen to the preview
If you hear noise or distortion, record again

The best sample = natural voice, no noise, moderate volume.
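
To check a recording made with another tool against the advice above, a short script can measure its length and peak level. This is a rough Python sketch under stated assumptions: it only handles 16-bit PCM WAV files, the function names are illustrative, and the -1 dBFS threshold for a "red peak" is an assumption, since this page does not state the level at which the live equalizer turns red. The 10–30 second range comes from Step 1.

```python
import math
import sys
import wave
from array import array

MIN_SECONDS = 10.0  # Step 1 recommends 10-30 seconds of speech
MAX_SECONDS = 30.0
CLIP_DBFS = -1.0    # assumed "red peak" threshold, not a documented value


def peak_dbfs(samples, full_scale=32768):
    """Peak level of 16-bit samples in dB relative to full scale (0 = maximum)."""
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return float("-inf")  # pure silence
    return 20.0 * math.log10(peak / full_scale)


def check_sample(path):
    """Return a list of problems found in a 16-bit PCM WAV file (empty if none)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV files")
        frame_count = wav.getnframes()
        seconds = frame_count / wav.getframerate()
        samples = array("h", wav.readframes(frame_count))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    problems = []
    if seconds < MIN_SECONDS:
        problems.append(f"too short: {seconds:.1f} s")
    elif seconds > MAX_SECONDS:
        problems.append(f"too long: {seconds:.1f} s")
    if peak_dbfs(samples) > CLIP_DBFS:
        problems.append("peaks close to full scale: likely distortion")
    return problems
```

Calling check_sample with the path to a WAV file returns an empty list when the file passes both checks. Background noise is not measured here, so listening to the preview is still necessary.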

Common issues

"The recording doesn't start"

→ The browser doesn't have permission to use the microphone. Click the lock or microphone icon at the top left and allow access.

"The recorded audio sounds robotic"

→ Record again, speaking in a more natural tone and at a slower pace. Avoid sounding "like AI".

"The cloned voice doesn't sound like my sample"

→ Record again with better-quality audio (a quieter room, clearer speech). The model you use (e.g. F5-TTS) will do its best, but fidelity depends on the sample.

"I want a premium voice but I don't have credits"

→ Buy credits from Account → Credits at the top right. Premium voices cost a bit more but the quality is worth it.

"The text isn't pronounced well (e.g. names, acronyms)"

→ Some models have limitations with acronyms and foreign names. Try rewriting the text to be phonetically clearer, or use a cloud model (ElevenLabs is better at this).

Pro tip

Record 5–10 different samples (different tones, languages, speeds) and save them. You can use them anytime to generate new audio in your voice. Perfect for narration, podcasts, and YouTube videos.