Audio Studio

Generate voice, instrumental music, music with vocals and audio effects. Post-production included.

What you can do

Audio Studio creates and edits audio in every form:

  • Synthesized voice — read texts aloud (TTS)
  • Instrumental music — genre and mood (jazz, lo-fi, cinematic, etc.)
  • Music with voice — description + sung lyrics
  • Post-production — stem separation, noise reduction, mastering

Part 1: Voice and music generation

Choose the type

At the top left, three buttons select the type:

  • Voice — voice synthesis from text
  • Instrumental music — genre and atmosphere
  • Music with voice — full track with lyrics

Synthesized voice

  1. Choose the model:
     • Chatterbox TTS — fast, good quality, free
     • Chatterbox HD — local HD quality, free, more natural voice
     • IndexTTS2 (Premium HD) — ultra-realistic, emotional voice; requires a paid plan
     • Cloud models (MiniMax, F5-TTS, Qwen3-TTS) — excellent quality, rich voice options
  2. Write the text you want to hear read aloud
  3. Optional: select the voice
     • Cloud models have preset voices (Maria, Luca, etc.)
     • Some models allow a custom voice: describe the voice you want (*"deep, calm male voice"*)
  4. Generate — the voice is created in a few seconds

Instrumental music

  1. Choose the model:
     • Stable Audio — free, quality instrumental
     • Lyria 2, Eleven Music — cloud, premium quality
  2. Describe the genre and mood
     • Example: *"lo-fi hip hop, relaxing, with piano and soft drums"*
     • Example: *"epic cinematic, strings, brass, dramatic"*
  3. Duration: 10–120 seconds (depends on the model)
  4. Generate — the music is created

Music with voice

  1. Choose the model:
     • ACE-Step — free, generates full tracks with lyrics
     • Yue, Lyria 2 (cloud) — premium, studio quality
  2. Write the style/genre
     • Example: *"melodic pop, energetic, female vocals"*
  3. Paste lyrics (optional)
     • Use tags like [verse], [chorus] to structure the song
     • If empty, the model generates instrumental music
  4. Duration: 15–300 seconds
  5. Generate — the track is created with synthesized voice
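
To check that pasted lyrics are structured the way you intend before generating, the tag convention above can be sketched in a few lines of Python. This is only an illustration: the function name split_sections, the "untagged" label, and the sample lyrics are assumptions, and the parsing Audio Studio or the underlying models actually perform may differ.

```python
import re

# Matches a structure tag such as [verse] or [chorus] standing alone on a line.
TAG_LINE = re.compile(r"^\[(\w[\w\s-]*)\]$")


def split_sections(lyrics: str) -> list[tuple[str, list[str]]]:
    """Group lyric lines under the most recent [tag] line (hypothetical helper)."""
    sections: list[tuple[str, list[str]]] = []
    for raw in lyrics.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines carry no structure here
        match = TAG_LINE.match(line)
        if match:
            # A tag line opens a new, initially empty section.
            sections.append((match.group(1).lower(), []))
        elif sections:
            sections[-1][1].append(line)
        else:
            # Lines before the first tag are kept in an "untagged" section.
            sections.append(("untagged", [line]))
    return sections


example = """
[verse]
City lights are fading slow
[chorus]
We run, we run tonight
"""

sections = split_sections(example)
for name, lines in sections:
    print(name, len(lines))
```

A section that comes back with zero lines, or a large "untagged" section, usually means a tag is misspelled or missing its brackets.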

Part 2: Audio post-production

After generating audio or uploading your own, you can apply effects:

  1. Generate or upload audio
  2. In the Gallery, click Use in → Audio post-production
  3. Choose the effect:
     • Stem split (Demucs) — separate voice, drums, bass, and other instruments
     • Denoise voice — remove background noise from a voice recording
     • Master to reference — automatic mastering by matching: provide a reference audio and its sound shaping is applied to yours

Available effects

Effect       What it does                 When to use
Demucs       Separates voice from music   If you want only the voice, or only the music from a song
Denoise      Removes noise and hum        Voice recorded poorly, cheap microphone recording
Matchering   Copies mastering style       You want your track to sound like another artist's
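
For readers who want to reproduce a stem split outside the app, a comparable result can be sketched with the open-source Demucs command-line tool. This is a minimal sketch, not a description of how Audio Studio invokes Demucs: the helper name demucs_command, the file name song.mp3, and the output folder are illustrative assumptions, while -o and --two-stems are options of the Demucs CLI.

```python
import shlex


def demucs_command(audio_path: str, out_dir: str = "separated",
                   vocals_only: bool = False) -> list[str]:
    """Build a Demucs CLI invocation (hypothetical helper, does not run it)."""
    cmd = ["demucs", "-o", out_dir]
    if vocals_only:
        # Produce two files (vocals and everything else) instead of four stems.
        cmd.append("--two-stems=vocals")
    cmd.append(audio_path)
    return cmd


cmd = demucs_command("song.mp3", vocals_only=True)
print(shlex.join(cmd))
```

Running the command requires a local Demucs installation; the list can be passed to subprocess.run(cmd, check=True). Without vocals_only, Demucs writes separate vocals, drums, bass, and other stems.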

Session management

On the right, the Audio Gallery shows:

  • All generated audio in the current session
  • Duration and model used
  • Buttons: Listen, Download, Use in, Delete

Press New session to clear the grid and start over.

Free vs Premium models

Model                                      Cost      Quality     When to use
Chatterbox TTS                             Free      Good        Fast TTS, narrative
Chatterbox HD                              Free      Good HD     More natural TTS, local
Stable Audio                               Free      Good        Fast instrumental music
ACE-Step                                   Free      Good        Tracks with synthesized vocals
Cloud (MiniMax, F5, Qwen, Lyria, Eleven)   Credits   Excellent   Studio quality, very natural voice

Typical generation times

  • Voice TTS: 5–20 seconds
  • Instrumental music: 20–60 seconds
  • Music with voice: 1–3 minutes
  • Post-production (stem/denoise/master): 30–120 seconds

Common issues

"I can't hear the voice"

→ Try a different model. Models marked with a lock 🔒 require credits.

"The audio is too fast/slow"

→ Cloud models expose a Speed parameter: adjust it from 0.5× to 2.0×.
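
As a rough guide to what a given Speed value does to clip length, the arithmetic can be sketched as below. It assumes Speed acts as a plain playback-rate multiplier, which this page does not state explicitly; the function name playback_duration is hypothetical.

```python
def playback_duration(original_seconds: float, speed: float) -> float:
    """Estimate clip length after applying a speed multiplier (assumed model)."""
    if not 0.5 <= speed <= 2.0:
        # The documented range for the Speed parameter is 0.5x to 2.0x.
        raise ValueError("speed must be between 0.5 and 2.0")
    return original_seconds / speed


print(playback_duration(12.0, 2.0))  # twice as fast, half as long
print(playback_duration(12.0, 0.5))  # half speed, twice as long
```

Under that assumption, a 12-second narration lasts about 6 seconds at 2.0× and about 24 seconds at 0.5×.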

"I want to clone MY voice"

→ Record a 10–20 second audio sample and upload it as Voice sample. F5-TTS and Qwen3-TTS support zero-shot cloning.

"The mastering sounds 'weird'"

→ Use a high-quality, well-mastered reference track (for example from Spotify or YouTube). The matching follows the reference faithfully, so any flaws in the reference carry over to your track.

Pro tip

Generate the narration voice in Audio Studio, then send it to Cinema Studio to sync it with a video using lip sync. Or generate a complete track with music and voice and use it as the soundtrack for your videos.