Audio Studio

Generate voice, instrumental music, music with vocals and audio effects. Post-production included.

What you can do

Audio Studio creates and edits audio in every form:

Synthesized voice — reads text aloud (text-to-speech, TTS)
Instrumental music — genre and mood (jazz, lo-fi, cinematic, etc.)
Music with voice — description + sung lyrics
Post-production — stem separation, noise reduction, mastering

Part 1: Voice and music generation

Choose the type

At the top left are three buttons:

Voice — voice synthesis from text
Instrumental music — genre and atmosphere
Music with voice — full track with lyrics

Synthesized voice

Choose the model:

Chatterbox TTS — fast, good quality, free
Chatterbox HD — runs locally, HD quality, free, more natural voice
IndexTTS2 — Premium HD — ultra-realistic, emotional voice, gated (requires a paid plan)
Cloud models (MiniMax, F5-TTS, Qwen3-TTS) — excellent quality, rich voice options

Write the text you want to hear read aloud

Optional: select the voice

Cloud models have preset voices (Maria, Luca, etc.)
Some models accept a custom voice: describe the voice you want (*"deep, calm male voice"*)

Generate — the voice is created in a few seconds

Instrumental music

Choose the model:

Stable Audio — free, quality instrumental
Lyria 2, Eleven Music — cloud, premium quality

Describe the genre and mood

Example: *"lo-fi hip hop, relaxing, with piano and soft drums"*
Example: *"epic cinematic, strings, brass, dramatic"*

Duration: 10–120 seconds (depends on the model)

Generate — the music is created

Music with voice

Choose the model:

ACE-Step — free, generates full tracks with lyrics
Yue, Lyria 2 (cloud) — premium, studio quality

Write the style/genre

Example: *"melodic pop, energetic, female vocals"*

Paste lyrics (optional)

Use tags like [verse], [chorus] to structure the song
If empty, the model generates instrumental music

Duration: 15–300 seconds

Generate — the track is created with synthesized voice
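The [verse] and [chorus] tags mentioned above can be checked before pasting lyrics into the field. The following is a minimal sketch, not part of Audio Studio: `split_lyrics` is a hypothetical helper, and it assumes each tag sits alone on its own line. The exact tag syntax the models accept is not specified in this guide.

```python
import re

# A line that consists solely of a section tag such as [verse] or [chorus].
# Assumption: tags are written alone on their own line.
TAG_LINE = re.compile(r"^\[([A-Za-z][A-Za-z0-9 _-]*)\]$")


def split_lyrics(lyrics: str) -> list[tuple[str, list[str]]]:
    """Group lyric lines under the most recent [tag] line.

    Lines that appear before any tag are grouped under "untagged".
    """
    sections = []
    current = None
    for raw in lyrics.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines carry no lyrics
        match = TAG_LINE.match(line)
        if match:
            current = []
            sections.append((match.group(1).lower(), current))
        else:
            if current is None:
                current = []
                sections.append(("untagged", current))
            current.append(line)
    return sections


example = """
[verse]
City lights are fading out
I keep walking anyway

[chorus]
Hold on, hold on
"""

for tag, lines in split_lyrics(example):
    print(f"{tag}: {len(lines)} line(s)")
```

An "untagged" section in the result is a hint that some lyrics sit above the first tag and may not be structured as intended.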

Part 2: Audio post-production

After generating audio or uploading your own, you can apply effects:

From Gallery → "Use in"

Generate or upload audio
In Gallery, click Use in → Audio post-production
Choose the effect:

Stem split (Demucs) — separate voice, drums, bass, other instruments
Denoise voice — remove background noise from voice
Master to reference — automatic mastering by matching: provide a reference audio file, and its sound shaping is applied to your track

Available effects

Effect	What it does	When to use
Stem split (Demucs)	Separates a track into voice, drums, bass, and other instruments	You want only the voice, or only the music, from a song
Denoise voice	Removes background noise and hum	Poorly recorded voice, or a recording made with a cheap microphone
Master to reference (Matchering)	Matches the mastering of a reference track	You want your track to sound like another artist's

Session management

On the right, the Audio Gallery shows:

All generated audio in the current session
Duration and model used
Buttons: Listen, Download, Use in, Delete

Press New session to clear the grid and start over.

Free vs Premium models

Model	Cost	Quality	When to use
Chatterbox TTS	Free	Good	Fast TTS, narration
Chatterbox HD	Free	Good HD	More natural TTS, local
Stable Audio	Free	Good	Fast instrumental music
ACE-Step	Free	Good	Tracks with synthesized vocals
Cloud (MiniMax, F5, Qwen, Lyria, Eleven)	Credits	Excellent	Studio quality, very natural voice

Typical timings

Voice TTS: 5–20 seconds
Instrumental music: 20–60 seconds
Music with voice: 1–3 minutes
Post-production (stem/denoise/master): 30–120 seconds

Common issues

"I can't hear the voice"

→ Try a different model. Some models require credits and are marked with a lock (🔒).

"The audio is too fast/slow"

→ Cloud models provide a Speed parameter, adjustable from 0.5× to 2.0×.
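To pick a Speed value for a clip that has to fit a given length, a rough calculation helps. The sketch below assumes that playback length is inversely proportional to Speed (2.0× halves the duration), which is the usual meaning of such a setting but is not confirmed in this guide. `speed_for_target` is a hypothetical helper, not an Audio Studio function.

```python
MIN_SPEED = 0.5  # lower bound of the Speed range stated above
MAX_SPEED = 2.0  # upper bound of the Speed range stated above


def speed_for_target(current_seconds: float, target_seconds: float) -> float:
    """Return the Speed value that would bring a clip to the target length.

    Assumption: length scales as 1 / Speed. The result is clamped to the
    0.5x-2.0x range, so very large corrections cannot be reached.
    """
    if current_seconds <= 0 or target_seconds <= 0:
        raise ValueError("durations must be positive")
    ideal = current_seconds / target_seconds
    return max(MIN_SPEED, min(MAX_SPEED, ideal))


# A 24-second narration that has to fit a 20-second slot.
print(speed_for_target(24.0, 20.0))
```

If the result sits at 0.5 or 2.0, the target length is out of reach with Speed alone, and shortening or extending the text is the better fix.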

"I want to clone MY voice"

→ Record an audio sample (10–20 seconds) and upload it as a Voice sample. F5-TTS and Qwen3-TTS support zero-shot cloning.
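Before uploading, the length of the recording can be checked locally. This is a minimal sketch using Python's standard `wave` module, which reads uncompressed PCM WAV files only. The helper names are hypothetical, and this guide does not state which file formats the Voice sample upload accepts.

```python
import wave

MIN_SECONDS = 10.0  # lower bound suggested above
MAX_SECONDS = 20.0  # upper bound suggested above


def wav_duration_seconds(path: str) -> float:
    """Return the length of an uncompressed PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_good_sample_length(seconds: float) -> bool:
    """True if the clip falls in the 10-20 second range suggested above."""
    return MIN_SECONDS <= seconds <= MAX_SECONDS
```

For example, `is_good_sample_length(wav_duration_seconds("my_voice.wav"))` would report whether a recording named `my_voice.wav` (a placeholder file name) is within the suggested range.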

"The mastering sounds "weird""

→ Use a high-quality, well-mastered reference track (for example, from Spotify or YouTube). The matching follows the reference faithfully, so flaws in the reference carry over to your track.

Pro tip

Generate the narration voice in Audio Studio, then send it to Cinema Studio to sync it with a video using lip sync. Alternatively, generate a complete track with music and vocals and use it as the soundtrack for your videos.