Voice Studio

Advanced voice synthesis: clone your own voice, or use premium HD voices on paid plans.

What you can do

Voice Studio turns text into audio in the voice you choose:

  • Text → voice: have any text read aloud
  • Personal cloned voice: record YOUR voice and clone it
  • Premium voices: ultra-realistic, natural-sounding voices
  • Preview and controls: try a voice instantly and adjust every detail

How to record and clone YOUR voice

This is Voice Studio's distinctive feature: you can record an audio sample and use it to generate speech in YOUR voice.

Step 1: Recording

  1. Click Record personal voice
  2. Grant the browser permission to use the microphone
  3. Speak in a natural and clear tone, for 10–30 seconds
  4. Watch the live equalizer: it shows audio levels in real time (dynamic graph). Avoid red peaks (distortion)
  5. Click End recording when done
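
The green, yellow, and red zones of the live equalizer can be pictured with a small sketch. It is only an illustration of how a level meter typically works, not Voice Studio's actual code: the function names and the two thresholds (-12 dBFS and -3 dBFS) are assumptions chosen for the example.

```python
import math

# Hypothetical thresholds in dBFS; Voice Studio's real values are not documented.
YELLOW_DBFS = -12.0
RED_DBFS = -3.0


def peak_dbfs(samples):
    """Return the peak level of float samples in [-1.0, 1.0], in dBFS."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return float("-inf")  # digital silence
    return 20.0 * math.log10(peak)


def level_zone(samples):
    """Map one block of samples to a meter zone: 'green', 'yellow', or 'red'."""
    level = peak_dbfs(samples)
    if level >= RED_DBFS:
        return "red"
    if level >= YELLOW_DBFS:
        return "yellow"
    return "green"
```

For example, a block whose loudest sample is 0.2 lands in green, 0.5 lands in yellow, and 0.99 lands in red. With these assumed thresholds, speaking so that the meter stays in green and only occasionally touches yellow leaves headroom before distortion.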

Step 2: Save the sample

Once recorded:

  • Listen to the preview (▶ button)
  • Click Save sample — it's saved to your profile

You can record multiple samples (different voices, tones, languages). Each one stays in your profile for later use.

Step 3: Use the sample

  1. Choose your voice sample from the list
  2. Write the text you want to hear in YOUR voice
  3. Press Generate
  4. The audio is created in a few seconds, in your cloned voice

Preset voices and premium voices

If you don't want to clone your voice, you can use a preset voice or a premium voice:

Preset voices (free)

Models like Chatterbox TTS and Qwen3-TTS offer preset voices:

  • Choose from the list (e.g. "Serena", "Luca", "Maria")
  • Write the text
  • Generate instantly

Premium voices (with a paid plan)

If you have the Pro or Enterprise plan:

  • IndexTTS2 — Premium HD — ultra-natural and emotional voice, generated locally
  • ElevenLabs v3 — studio-quality voice, 32 presets + custom voice cloning
  • MiniMax Speech 2.8 HD — with emotions (happy, sad, angry, etc.)

Premium voices cost more credits, but the quality is cinema-grade, almost indistinguishable from a real human voice.

Generation parameters

Depending on the voice chosen, you can adjust:

  • Text: what the AI will say
  • Voice/Sample: which voice to use
  • Speed: from 0.5× (very slow) to 2.0× (very fast). Default 1.0×
  • Language (some models): auto-detect or manual choice (IT, EN, ES, FR, etc.)
  • Emotion (premium models): happy, sad, angry, neutral, etc.
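
To make the ranges above concrete, here is a minimal sketch of the parameters as a validated data structure. Voice Studio is a web interface, and this guide does not describe a programmatic API, so the class and field names are hypothetical. Only the speed range (0.5× to 2.0×, default 1.0×) comes from the list above.

```python
from dataclasses import dataclass
from typing import Optional

MIN_SPEED, MAX_SPEED, DEFAULT_SPEED = 0.5, 2.0, 1.0  # values from this guide


@dataclass(frozen=True)
class GenerationParams:
    """Hypothetical container for the settings listed above."""
    text: str                       # what the AI will say
    voice: str                      # preset name or saved sample
    speed: float = DEFAULT_SPEED    # 0.5 (very slow) to 2.0 (very fast)
    language: Optional[str] = None  # None means auto-detect
    emotion: Optional[str] = None   # premium models only

    def __post_init__(self):
        if not self.text.strip():
            raise ValueError("text must not be empty")
        if not MIN_SPEED <= self.speed <= MAX_SPEED:
            raise ValueError(
                f"speed must be between {MIN_SPEED} and {MAX_SPEED}, got {self.speed}"
            )


# Usage: omit speed to get the default of 1.0
params = GenerationParams(text="Welcome to the show", voice="Serena", language="EN")
```

A speed outside the range, such as 3.0, raises a ValueError in this sketch. The app's slider presumably just stops at the limits instead.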

Preview and download

  1. Generate → the audio appears in Gallery on the right
  2. Click an audio clip to listen to it
  3. Press Download to save to your computer
  4. Press Use in to send it to another studio (Cinema, Avatar, Video)

Voice synthesis models

Model          | Cost    | Voice quality  | Languages                | When to use
Chatterbox TTS | Free    | Good           | IT, EN, ES, FR, DE, etc. | Fast and simple TTS
Chatterbox HD  | Free    | Good HD        | IT, EN, ES, FR           | Local TTS, more natural
Qwen3-TTS      | Premium | Excellent      | 11 languages             | Multilingual, realistic voice
IndexTTS2 HD   | Premium | Studio-quality | IT, EN, ES, FR           | Ultra-natural, emotional voice
ElevenLabs v3  | Premium | Cinematic      | 32 languages             | Professional voices, premium voice clone
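
The table can also be read as a simple lookup: free models are available to everyone, premium models only on a paid plan. The sketch below encodes that reading. The data is copied from the table, while the function name and the `has_paid_plan` flag are hypothetical, and credit balance is deliberately left out.

```python
from typing import NamedTuple


class Model(NamedTuple):
    name: str
    cost: str      # "Free" or "Premium", as in the table
    quality: str


MODELS = [
    Model("Chatterbox TTS", "Free", "Good"),
    Model("Chatterbox HD", "Free", "Good HD"),
    Model("Qwen3-TTS", "Premium", "Excellent"),
    Model("IndexTTS2 HD", "Premium", "Studio-quality"),
    Model("ElevenLabs v3", "Premium", "Cinematic"),
]


def available_models(has_paid_plan):
    """Return model names usable on the account (assumes Premium requires a paid plan)."""
    return [m.name for m in MODELS if has_paid_plan or m.cost == "Free"]
```

Without a paid plan this returns the two Chatterbox models. With one it returns all five.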

Recording: how to get the best results

Before recording, find a quiet place:

  • Close doors and windows
  • Turn off fans, AC, background noise
  • Use a dedicated microphone (headset mic, lavalier, USB condenser) if possible
  • If you use a laptop or phone mic, keep it close to your mouth (about 15 cm)

During recording:

  • Speak in a natural and clear way, not robotic
  • Keep a consistent tone (don't shout, don't whisper)
  • Read 2–3 simple sentences
  • Watch the live equalizer — keep the level in green (max yellow)

After recording:

  • Listen to the preview
  • If you hear noise or distortion, record again

The best sample = natural voice, no noise, moderate volume.
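
If you keep a copy of a recording as a WAV file, you can check it against the advice above before uploading it anywhere. The sketch below uses only Python's standard library. The 10 to 30 second range comes from this guide. The clipping threshold, the function name, and the 16-bit PCM format are assumptions made for illustration, not something Voice Studio is known to enforce.

```python
import struct
import wave

MIN_SECONDS, MAX_SECONDS = 10.0, 30.0  # recommended length from this guide
CLIP_PEAK = 32000  # assumption: 16-bit peaks this close to 32767 count as distortion


def check_sample(path):
    """Return a list of problems found in a 16-bit PCM WAV file (empty list = looks fine)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            return ["expected 16-bit PCM audio"]
        frames = wav.getnframes()
        seconds = frames / wav.getframerate()
        raw = wav.readframes(frames)

    # WAV stores 16-bit samples as little-endian signed integers.
    samples = struct.unpack(f"<{len(raw) // 2}h", raw)
    peak = max((abs(s) for s in samples), default=0)

    problems = []
    if seconds < MIN_SECONDS:
        problems.append(f"too short: {seconds:.1f} s (aim for 10-30 s)")
    elif seconds > MAX_SECONDS:
        problems.append(f"too long: {seconds:.1f} s (aim for 10-30 s)")
    if peak >= CLIP_PEAK:
        problems.append("possible distortion: peaks near full scale")
    return problems
```

An empty list means the file passes both checks. It says nothing about background noise, so listening to the preview is still the final test.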

Common issues

"The recording doesn't start"

→ The browser doesn't have permission to use the microphone. Click the lock or microphone icon at the top left and allow access.

"The recorded audio sounds robotic"

→ Record again in a more natural tone and at a slower pace. Avoid sounding "like AI".

"The cloned voice doesn't sound like my sample"

→ Record again with better-quality audio (a quieter room, clearer speech). The model you use (e.g. F5-TTS) will do its best, but fidelity depends on the sample.

"I want a premium voice but I don't have credits"

→ Buy credits from Account → Credits at the top right. Premium voices cost a bit more but the quality is worth it.

"The text isn't pronounced well (e.g. names, acronyms)"

→ Some models have limitations with acronyms and foreign names. Try rewriting the text to be phonetically clearer, or use a cloud model (ElevenLabs is better at this).

Pro tip

Record 5–10 different samples (different tones, languages, speeds) and save them. You can use them anytime to generate new audio in your voice. Perfect for narration, podcasts, and YouTube videos.