Lip Sync Studio
Synchronize a face with audio. Upload a photo or video, choose a model, and get a video whose lips match the audio.
What you can do
Lip Sync Studio synchronizes mouth and facial movement with an audio track:
- Photo → video: upload a photo of a face, add audio, and get a video with synced lips
- Video → video: upload a video, replace its audio, and the lips are re-synced automatically
- Quality mode: choose between fast (basic lip-sync) and high quality (realistic facial expressions)
Perfect for: dubbing into other languages, talking avatars, narrated videos, and pairing a face with a cloned voice.
How to use it
Step 1: Upload the face
Click on Upload photo (or video):
- Supported formats: JPG, PNG for photos; MP4, WebM for video
- Max size: 100 MB for video, 10 MB for photo
- The face must be visible and looking toward the camera (roughly frontal)
Step 2: Upload the audio
Click on Upload audio:
- Supported formats: MP3, WAV, M4A
- Max size: 50 MB
- The audio will be synced with the face's lips
You can also:
- Provide the URL of audio generated in Audio Studio
- Record live voice from the microphone
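If you prepare files in bulk, it can help to check them against the limits from Steps 1 and 2 before uploading. The sketch below is only a local pre-check, not part of Lip Sync Studio: the names `UPLOAD_RULES` and `check_upload` are made up for illustration, "MB" is assumed to mean 1024 × 1024 bytes, and `.jpeg` is assumed to be accepted alongside `.jpg`.

```python
from pathlib import Path

MB = 1024 * 1024  # assumption: the documented "MB" limits use 2**20 bytes

# Formats and size limits copied from Steps 1 and 2.
# ".jpeg" is an assumption; the guide only lists JPG.
UPLOAD_RULES = {
    "photo": ({".jpg", ".jpeg", ".png"}, 10 * MB),
    "video": ({".mp4", ".webm"}, 100 * MB),
    "audio": ({".mp3", ".wav", ".m4a"}, 50 * MB),
}


def check_upload(kind: str, filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file passes the pre-check."""
    extensions, max_bytes = UPLOAD_RULES[kind]
    problems = []
    suffix = Path(filename).suffix.lower()
    if suffix not in extensions:
        problems.append(f"'{suffix}' is not a supported {kind} format")
    if size_bytes > max_bytes:
        problems.append(f"{kind} is larger than {max_bytes // MB} MB")
    return problems


print(check_upload("photo", "portrait.PNG", 2 * MB))  # passes: empty list
print(check_upload("audio", "voice.flac", 60 * MB))   # wrong format and too large
```

The check only looks at the file name and size; it cannot tell whether the face is visible or the audio is clean.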
Step 3: Choose the model
Two options:
Wav2Lip — fast, economical
- Syncs the mouth only
- Doesn't change facial expression
- Generation: ~2–3 minutes for a 30-second clip
- Free
MuseTalk — high quality, with expressions
- Syncs lips + facial expression (eyes, eyebrows, chin)
- Preserves face identity
- Generation: ~3–5 minutes for a 30-second clip
- Availability: Premium (paid) plan only
- Supports longer videos (e.g. 2–3 minutes, up to the 5-minute audio limit)
| Model | Cost | Speed | Quality | When to use |
|---|---|---|---|---|
| Wav2Lip | Free | ~2–3 min/30s | Simple lip-sync | Quick test, basic lip-sync |
| MuseTalk | Premium | ~3–5 min/30s | Natural expressions | Professional videos, realistic avatars |
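The table boils down to a simple rule of thumb, sketched here in Python. This is an illustration of the decision, not how the studio works internally; `ModelInfo` and `pick_model` are hypothetical names, and the values are transcribed from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelInfo:
    name: str
    premium_only: bool   # True if the model requires the Premium plan
    expressions: bool    # True if the model also animates facial expressions


# Values transcribed from the comparison table.
WAV2LIP = ModelInfo("Wav2Lip", premium_only=False, expressions=False)
MUSETALK = ModelInfo("MuseTalk", premium_only=True, expressions=True)


def pick_model(has_premium: bool, wants_expressions: bool) -> ModelInfo:
    """Pick MuseTalk only when expressions are wanted and the plan allows it."""
    if wants_expressions and has_premium:
        return MUSETALK
    return WAV2LIP


print(pick_model(has_premium=True, wants_expressions=True).name)
print(pick_model(has_premium=False, wants_expressions=True).name)
```

Without a Premium plan the sketch always falls back to Wav2Lip, which matches what you see in the interface: the MuseTalk option is simply not offered.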
Step 4: Generate
Press Generate. The video is processed in the background. A desktop notification will alert you when it's ready.
In the Gallery on the right, you can follow the generation status: Processing → Completed.
Step 5: Download or use in other studios
Once ready:
- Download → save to your computer
- Use in → send to Cinema Studio to edit it with other clips
Face requirements
For best results:
- Lighting: the face must be well-lit (not backlit, not in shadow)
- Position: looking at the camera (frontal or slight profile, not 90°)
- Size: the face should occupy at least 30–50% of the image/video
- Quality: sharp photo/video, not blurry, not pixelated
- Expression: neutral or natural; the model will adapt the expression to the audio
If the face is in full profile, turned to the side, or partly covered, sync quality will be worse.
Audio requirements
- Language: detected automatically (IT, EN, ES, etc.)
- Quality: clear audio, no heavy echo or background noise
- Duration: up to 5 minutes for MuseTalk, 1 minute for Wav2Lip
- Format: MP3, WAV, M4A supported
If the audio is too quiet or distorted, sync quality will be worse.
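To see in advance which model can take a given recording, you can measure its length locally. The sketch below uses Python's standard `wave` module, so it only handles uncompressed WAV files (MP3 and M4A would need another tool). `MAX_AUDIO_SECONDS`, `wav_duration_seconds`, and `models_accepting` are illustrative names, and the limits are the ones from the Duration bullet above.

```python
import wave

# Per-model audio limits from the "Duration" requirement, in seconds.
MAX_AUDIO_SECONDS = {"Wav2Lip": 60, "MuseTalk": 300}


def wav_duration_seconds(path: str) -> float:
    """Duration of a WAV file: number of frames divided by the sample rate."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def models_accepting(duration_seconds: float) -> list[str]:
    """Names of the models whose duration limit covers this audio length."""
    return [
        name
        for name, limit in MAX_AUDIO_SECONDS.items()
        if duration_seconds <= limit
    ]
```

For example, `models_accepting(wav_duration_seconds("narration.wav"))` returns both model names for a 45-second file and only `"MuseTalk"` for a 2-minute one (`narration.wav` is a placeholder path).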
Typical workflows
Case 1: Dubbing an English video into Italian
- Download the original video (English)
- Audio Studio → Generate Italian narration voice
- Upload original video to Lip Sync Studio
- Upload Italian audio as "Audio"
- Generate the synced video (dubbed into Italian)
- Download the final synced video
Case 2: Talking avatar from your photo
- Take a photo of your face (well-lit, looking at camera)
- Audio Studio → Record cloned voice or generate TTS
- Lip Sync Studio → Upload your photo + audio
- Generate lip-sync video (you talking)
- Download your talking avatar
Case 3: Professional editing
- Generate video/avatar in Lip Sync with final audio
- Cinema Studio → Upload the synced video
- Edit with other clips, add background music
- Download the edited video
Real timings
- Wav2Lip: 2–3 minutes for 30 seconds of video
- MuseTalk: 3–5 minutes for 30 seconds of video
- If the queue is full, your job waits its turn, which may add 1–2 minutes
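For a rough idea of how long a longer clip might take, the figures above can be scaled by clip length. This is a back-of-the-envelope sketch: it assumes generation time grows linearly with clip length, which this guide does not state, and `estimate_minutes` is a hypothetical helper, not a studio feature.

```python
# (low, high) generation minutes per 30 seconds of video, from the list above.
MINUTES_PER_30S = {"Wav2Lip": (2, 3), "MuseTalk": (3, 5)}

# Extra wait when the queue is full, from the list above.
QUEUE_EXTRA_MINUTES = (1, 2)


def estimate_minutes(model: str, clip_seconds: float,
                     queue_full: bool = False) -> tuple[float, float]:
    """Return a (low, high) estimate in minutes.

    Assumption: time scales linearly with clip length.
    """
    low_rate, high_rate = MINUTES_PER_30S[model]
    blocks = clip_seconds / 30  # number of 30-second blocks in the clip
    low, high = low_rate * blocks, high_rate * blocks
    if queue_full:
        low += QUEUE_EXTRA_MINUTES[0]
        high += QUEUE_EXTRA_MINUTES[1]
    return low, high


print(estimate_minutes("Wav2Lip", 30))                    # (2.0, 3.0)
print(estimate_minutes("MuseTalk", 90, queue_full=True))  # (10.0, 17.0)
```

Treat the result as a planning aid only; actual times depend on server load.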
Common issues
"The face is twisted / lips don't move well"
→ Check that the face in the photo is straight and well-lit. Try again with a sharper, more frontal photo.
"The audio isn't synced"
→ Use clear, good-quality audio. If the audio is too quiet or distorted, sync quality suffers. Clean up the audio, then try regenerating.
"The video takes too long"
→ MuseTalk can take up to 5 minutes for a 30-second clip. If a short clip is still processing after 10 minutes, the backend has likely crashed: reload the page.
"MuseTalk doesn't appear (only Wav2Lip)"
→ MuseTalk is available only on the Premium plan. If you don't see the button, your account doesn't have access. Contact your administrator.
"I want to sync a long video (3+ minutes)"
→ Use MuseTalk (it supports up to 5 minutes of audio, but generation takes correspondingly longer). Wav2Lip is optimized for clips under 1 minute.
Pro tip
Combine Lip Sync + Cinema: generate a talking avatar in Lip Sync, then send it to Cinema and edit it together with an instrumental soundtrack generated in Audio Studio. Result: a fully AI-generated narrative video.