Lip Sync Studio

Synchronize a face with audio. Use a photo or video, choose the model, get a video with lips synced.

What you can do

Lip Sync Studio synchronizes mouth and facial movement with audio:

Photo → video: upload a photo of a face, add audio, get a video with lips synced
Video → video: upload a video, replace the audio, sync automatically
Quality mode: choose between fast (basic lip-sync) and high quality (realistic facial expressions)

Perfect for: dubbing into other languages, talking avatars, narrated videos, and giving a face a cloned voice.

How to use it

Step 1: Upload the face

Click on Upload photo (or video):

Supported formats: JPG, PNG for photos; MP4, WebM for video
Max size: 100 MB for video, 10 MB for photo
The face must be visible and looking toward the camera (roughly frontal)
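
If you prepare files in bulk, it can help to check them against these limits before uploading. The following is a minimal sketch, not part of Lip Sync Studio: the function and constant names are made up for illustration, it assumes that 1 MB means 1024 × 1024 bytes, and it assumes that .jpeg is accepted alongside .jpg.

```python
from pathlib import Path

# Formats and size limits copied from the list above.
# Assumptions: ".jpeg" is treated like ".jpg", and 1 MB = 1024 * 1024 bytes.
PHOTO_EXTS = {".jpg", ".jpeg", ".png"}
VIDEO_EXTS = {".mp4", ".webm"}
MB = 1024 * 1024
MAX_PHOTO_BYTES = 10 * MB
MAX_VIDEO_BYTES = 100 * MB


def check_face_file(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    ext = Path(filename).suffix.lower()
    if ext in PHOTO_EXTS:
        kind, limit = "photo", MAX_PHOTO_BYTES
    elif ext in VIDEO_EXTS:
        kind, limit = "video", MAX_VIDEO_BYTES
    else:
        return [f"unsupported format: {ext or '(no extension)'}"]

    problems = []
    if size_bytes > limit:
        problems.append(f"{kind} is larger than {limit // MB} MB")
    return problems
```

For a file on disk, pass Path(path).stat().st_size as size_bytes. The check covers format and size only; whether the face is visible and frontal still has to be judged by eye.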

Step 2: Upload the audio

Click on Upload audio:

Supported formats: MP3, WAV, M4A
Max size: 50 MB
The audio will be synced with the face's lips

You can also:

Provide the URL of audio generated in Audio Studio
Record live voice from the microphone

Step 3: Choose the model

Two options:

Wav2Lip — fast, economical

Syncs mouth movement only
Doesn't change facial expression
Generation: ~2–3 minutes for a 30-second clip
Free

MuseTalk — high quality, with expressions

Syncs lips + facial expression (eyes, eyebrows, chin)
Preserves face identity
Generation: ~3–5 minutes for a 30-second clip
Availability: paid (Premium) plans only
Supports longer videos (up to 5 minutes)

Model	Cost	Speed	Quality	When to use
Wav2Lip	Free	~2–3 min/30s	Simple lip-sync	Quick test, basic lip-sync
MuseTalk	Premium	~3–5 min/30s	Natural expressions	Professional videos, realistic avatars
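
The choice in the table can be written down as a small decision rule. This is only a sketch of the reasoning, assuming the duration limits given under Audio requirements (1 minute for Wav2Lip, 5 minutes for MuseTalk); the function name and parameters are hypothetical and do not correspond to anything in the app.

```python
def choose_model(duration_s: float, has_premium: bool,
                 want_expressions: bool = False) -> str:
    """Pick a model name from clip length, plan, and desired quality."""
    if duration_s > 300:
        # Neither model accepts more than 5 minutes of audio.
        raise ValueError("audio is longer than the 5-minute MuseTalk limit")
    if duration_s > 60:
        # Over 1 minute only MuseTalk applies, and it needs a paid plan.
        if not has_premium:
            raise ValueError("clips over 1 minute need MuseTalk (paid plan)")
        return "MuseTalk"
    if want_expressions and has_premium:
        return "MuseTalk"
    # Short clip, or no access to MuseTalk: fall back to the free model.
    return "Wav2Lip"
```

For example, choose_model(20, has_premium=False) returns "Wav2Lip", while choose_model(120, has_premium=True) returns "MuseTalk".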

Step 4: Generate

Press Generate. The video is processed in the background. A desktop notification will alert you when it's ready.

The Gallery on the right shows the generation status: Processing → Completed.
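
The wait can be pictured as a polling loop with a time limit. The sketch below is purely illustrative: this guide does not describe any programmatic way to read the status, so get_status is a hypothetical function you would have to supply yourself. The 10-minute limit is the threshold mentioned under Common issues, and the polling interval is an arbitrary choice.

```python
import time
from typing import Callable


def wait_for_completion(get_status: Callable[[], str],
                        timeout_s: float = 600,
                        poll_s: float = 15,
                        sleep: Callable[[float], None] = time.sleep,
                        clock: Callable[[], float] = time.monotonic) -> str:
    """Poll until the status is "Completed" or the timeout is exceeded.

    get_status is a hypothetical callable returning "Processing" or
    "Completed", the two states shown in the Gallery.
    """
    start = clock()
    while True:
        status = get_status()
        if status == "Completed":
            return status
        if clock() - start > timeout_s:
            raise TimeoutError(
                f"still {status!r} after {timeout_s:.0f} s; "
                "reload the page and try again"
            )
        sleep(poll_s)
```

The sleep and clock parameters exist only so that the loop can be exercised without real waiting.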

Step 5: Download or use in other studios

Once ready:

Download → save to your computer
Use in → send to Cinema Studio to edit it with other clips

Face requirements

For best results:

Lighting: the face must be well-lit (not backlit, not in shadow)
Position: looking at the camera (frontal or slight profile, not 90°)
Size: the face should occupy at least 30–50% of the image/video
Quality: sharp photo/video, not blurry, not pixelated
Expression: neutral or natural; the model will adapt the expression to the audio

If the face is in full profile, turned to the side, or partly covered, sync quality will suffer.
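
The size requirement can be checked with simple arithmetic once you know the face's bounding box. This sketch assumes that "occupy 30–50%" refers to the share of the frame's area, which the guide does not state explicitly; the bounding box would come from any face detector or from measuring by hand, and the function names are illustrative.

```python
def face_area_fraction(face_w: int, face_h: int,
                       frame_w: int, frame_h: int) -> float:
    """Share of the frame's area covered by the face bounding box (0.0 to 1.0)."""
    return (face_w * face_h) / (frame_w * frame_h)


def face_is_large_enough(face_w: int, face_h: int,
                         frame_w: int, frame_h: int,
                         minimum: float = 0.30) -> bool:
    """True if the face covers at least `minimum` of the frame (default 30%)."""
    return face_area_fraction(face_w, face_h, frame_w, frame_h) >= minimum
```

A 600 × 800 pixel face in a 1000 × 1200 photo covers 40% of the area and passes; a 200 × 200 face in a 1920 × 1080 frame covers about 2% and does not, so cropping closer to the face would be the fix.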

Audio requirements

Language: detected automatically (Italian, English, Spanish, etc.)
Quality: clear audio, no heavy echo or background noise
Duration: up to 5 minutes for MuseTalk, 1 minute for Wav2Lip
Format: MP3, WAV, M4A supported

If the audio is too quiet or distorted, sync quality will suffer.
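
To check the duration limit before uploading, you can read the length of a WAV file from its header. This is a minimal sketch using Python's standard wave module, which handles uncompressed PCM WAV only; MP3 and M4A would need another tool. The helper names are illustrative and not part of Lip Sync Studio.

```python
import wave

# Duration limits from the list above, in seconds.
MAX_AUDIO_SECONDS = {"Wav2Lip": 60, "MuseTalk": 300}


def wav_duration_seconds(path: str) -> float:
    """Length of an uncompressed WAV file: frame count divided by sample rate."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def fits_model(duration_s: float, model: str) -> bool:
    """True if audio of this length is within the stated limit for the model."""
    return duration_s <= MAX_AUDIO_SECONDS[model]
```

For example, fits_model(wav_duration_seconds("narration.wav"), "Wav2Lip") tells you whether a narration file (the file name here is just an example) is short enough for the free model.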

Typical workflow

Case 1: Dubbing an English video to Italian

Download the original video (English)
Audio Studio → Generate Italian narration voice
Upload original video to Lip Sync Studio
Upload Italian audio as "Audio"
Generate synced video (dubbed to Italian)
Download the final synced video

Case 2: Talking avatar from your photo

Take a photo of your face (well-lit, looking at camera)
Audio Studio → Record cloned voice or generate TTS
Lip Sync Studio → Upload your photo + audio
Generate lip-sync video (you talking)
Download your talking avatar

Case 3: Professional editing

Generate video/avatar in Lip Sync with final audio
Cinema Studio → Upload the synced video
Edit with other clips, add background music
Download the edited video

Typical processing times

Wav2Lip: 2–3 minutes for 30 seconds of video
MuseTalk: 3–5 minutes for 30 seconds of video
If the queue is full, wait your turn (may add 1–2 minutes)
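
These figures can be turned into a rough estimate for clips of other lengths. The sketch below assumes that processing time grows linearly with clip length, which this guide does not confirm, so treat the result as a ballpark only. The names are illustrative.

```python
# Minutes of processing per 30 seconds of video, as (low, high), from the list above.
MINUTES_PER_30S = {"Wav2Lip": (2, 3), "MuseTalk": (3, 5)}
QUEUE_DELAY_MIN = (1, 2)  # extra wait when the queue is full


def estimate_minutes(model: str, clip_seconds: float,
                     queue_full: bool = False) -> tuple[float, float]:
    """Return a (low, high) estimate of generation time in minutes."""
    low, high = MINUTES_PER_30S[model]
    scale = clip_seconds / 30  # assumption: time scales linearly with length
    est_low, est_high = low * scale, high * scale
    if queue_full:
        est_low += QUEUE_DELAY_MIN[0]
        est_high += QUEUE_DELAY_MIN[1]
    return est_low, est_high
```

Under that assumption, estimate_minutes("MuseTalk", 90) gives (9.0, 15.0), and estimate_minutes("Wav2Lip", 30, queue_full=True) gives (3.0, 5.0).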

Common issues

"The face is twisted / lips don't move well"

→ Check that the face in the photo is straight and well-lit. Try again with a sharper, more frontal photo.

"The audio isn't synced"

→ Use clear, good-quality audio. If the audio is too quiet or distorted, sync fails. Try regenerating.

"The video takes too long"

→ MuseTalk can take up to 5 minutes for a 30-second clip. If a clip of that length exceeds 10 minutes, the backend has likely crashed → reload the page.

"MuseTalk doesn't appear (only Wav2Lip)"

→ MuseTalk is available only on the Premium plan. If you don't see the button, your account doesn't have access. Contact your administrator.

"I want to sync a long video (3+ minutes)"

→ Use MuseTalk, which supports up to 5 minutes but takes correspondingly longer to generate. Wav2Lip is optimized for clips under 1 minute.

Pro tip

Combine Lip Sync + Cinema: generate a talking avatar in Lip Sync, then send it to Cinema and edit it together with an instrumental soundtrack generated in Audio Studio. Result: a completely AI-generated narrative video.