Lip Sync Studio

Synchronize a face with audio: upload a photo or video, choose a model, and get a video with synced lips.

What you can do

Lip Sync Studio synchronizes mouth and facial movement with an audio track:

  • Photo → video — upload a photo of a face, add audio, get a video with lips synced
  • Video → video — upload a video, replace the audio, sync automatically
  • Quality mode — choose between fast (basic lip-sync) or high quality (realistic facial expressions)

Perfect for: dubbing into other languages, talking avatars, narrated videos, and pairing a cloned voice with a face.

How to use it

Step 1: Upload the face

Click on Upload photo (or video):

  • Supported formats: JPG, PNG for photos; MP4, WebM for video
  • Max size: 100 MB for video, 10 MB for photo
  • The face must be visible and looking toward the camera (roughly frontal)

Step 2: Upload the audio

Click on Upload audio:

  • Supported formats: MP3, WAV, M4A
  • Max size: 50 MB
  • The audio will be synced with the face's lips

You can also:

  • Pass a URL of audio generated in Audio Studio
  • Record live voice from the microphone
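The upload limits from Steps 1 and 2 can be checked locally before uploading. This is only a minimal sketch: the function name `check_upload` and the table `UPLOAD_LIMITS` are hypothetical, and two details are assumptions not stated above (1 MB is taken as 1,048,576 bytes, and `.jpeg` is treated as JPG).

```python
from pathlib import Path

MB = 1024 * 1024  # assumption: 1 MB = 1,048,576 bytes; the docs do not specify

# Limits copied from Steps 1 and 2: allowed extensions and max size per kind.
UPLOAD_LIMITS = {
    "photo": ({".jpg", ".jpeg", ".png"}, 10 * MB),  # assumption: .jpeg counts as JPG
    "video": ({".mp4", ".webm"}, 100 * MB),
    "audio": ({".mp3", ".wav", ".m4a"}, 50 * MB),
}


def check_upload(kind: str, filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    extensions, max_bytes = UPLOAD_LIMITS[kind]
    problems = []
    suffix = Path(filename).suffix.lower()
    if suffix not in extensions:
        problems.append(f"{suffix or 'no extension'} is not a supported {kind} format")
    if size_bytes > max_bytes:
        problems.append(f"{kind} exceeds {max_bytes // MB} MB")
    return problems
```

For example, `check_upload("video", "clip.mp4", 101 * MB)` reports that the video exceeds 100 MB, while a 2 MB PNG photo passes with an empty list.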

Step 3: Choose the model

Two options:

Wav2Lip — fast, economical

  • Syncs the mouth only (lip movement)
  • Doesn't change the facial expression
  • Generation: ~2–3 minutes for a 30-second clip
  • Free

MuseTalk — high quality, with expressions

  • Syncs lips + facial expression (eyes, eyebrows, chin)
  • Preserves face identity
  • Generation: ~3–5 minutes per 30-second clip
  • Availability: paid (Premium) plan only
  • Supports longer videos (e.g. 2–3 minutes; audio up to 5 minutes)

Model     Cost     Speed          Quality              When to use
Wav2Lip   Free     ~2–3 min/30s   Simple lip-sync      Quick test, basic lip-sync
MuseTalk  Premium  ~3–5 min/30s   Natural expressions  Professional videos, realistic avatars

Step 4: Generate

Press Generate. The video is processed in the background. A desktop notification will alert you when it's ready.

The Gallery on the right shows the generation status: Processing → Completed.
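The Processing → Completed transition can be thought of as a polling loop with a timeout. The sketch below is purely illustrative: this page does not describe any API, so `get_status` is a hypothetical callable you would have to supply, and the function name `wait_for_video` is made up. The 10-minute default timeout comes from the "Common issues" section; the 5-second polling interval is an assumption.

```python
import time
from typing import Callable


def wait_for_video(
    get_status: Callable[[], str],  # hypothetical: returns "Processing" or "Completed"
    timeout_s: float = 600.0,       # 10 minutes, per "Common issues"
    poll_every_s: float = 5.0,      # assumption: polling interval is not documented
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the status is "Completed"; return False if timeout_s elapses first."""
    deadline = clock() + timeout_s
    while True:
        if get_status() == "Completed":
            return True
        if clock() >= deadline:
            return False
        sleep(poll_every_s)
```

The `clock` and `sleep` parameters exist only so the loop can be exercised without real waiting. A `False` result corresponds to the case where reloading the page is the suggested remedy.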

Step 5: Download or use in other studios

Once ready:

  • Download → save to your computer
  • Use in → send to Cinema Studio to edit it with other clips

Face requirements

For best results:

  • Lighting: the face must be well-lit (not backlit, not in shadow)
  • Position: looking at the camera (frontal or slight profile, not 90°)
  • Size: the face should occupy at least 30–50% of the image/video
  • Quality: sharp photo/video, not blurry, not pixelated
  • Expression: neutral or natural; the model will adapt the expression to the audio

If the face is in full profile, turned sharply to the side, or partly covered, sync quality will be worse.
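The size requirement is simple arithmetic once you know the face's bounding box. A rough sketch follows, assuming the 30–50% guideline refers to the share of the frame area (it could equally mean height or width; the text above does not say). Both function names are hypothetical, and obtaining the bounding box, for instance from a face detector, is outside the scope of this example.

```python
def face_area_fraction(face_w: int, face_h: int, frame_w: int, frame_h: int) -> float:
    """Fraction of the frame area covered by the face bounding box."""
    if min(face_w, face_h, frame_w, frame_h) <= 0:
        raise ValueError("all dimensions must be positive")
    return (face_w * face_h) / (frame_w * frame_h)


def face_is_large_enough(fraction: float, minimum: float = 0.30) -> bool:
    """Compare against the lower end of the 30-50% guideline (area is an assumption)."""
    return fraction >= minimum
```

A 540×540 face in a 1080×1080 photo covers 25% of the frame, which falls below the guideline; cropping the photo closer to the face would raise the fraction.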

Audio requirements

  • Language: auto-detected by the model (Italian, English, Spanish, etc.)
  • Quality: clear audio, no heavy echo or background noise
  • Duration: up to 5 minutes for MuseTalk, 1 minute for Wav2Lip
  • Format: MP3, WAV, M4A supported

If the audio is too quiet or distorted, sync quality will be worse.
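The duration limits and plan gating together determine which model can handle a given audio file. A minimal sketch of that decision, assuming the limits above (1 minute for Wav2Lip, 5 minutes for MuseTalk) are hard cut-offs; `pick_model` and its parameters are hypothetical names, not part of the product.

```python
# Duration limits from "Audio requirements", in seconds.
MAX_AUDIO_SECONDS = {"Wav2Lip": 60, "MuseTalk": 300}


def pick_model(audio_seconds: float, has_paid_plan: bool, want_expressions: bool = False) -> str:
    """Return the model name that fits the audio length, plan, and quality wish."""
    if audio_seconds > MAX_AUDIO_SECONDS["MuseTalk"]:
        raise ValueError("audio exceeds the 5-minute MuseTalk limit")
    # MuseTalk is needed for expressions or for audio beyond the Wav2Lip limit.
    needs_musetalk = want_expressions or audio_seconds > MAX_AUDIO_SECONDS["Wav2Lip"]
    if needs_musetalk:
        if not has_paid_plan:
            raise ValueError("MuseTalk requires a paid plan")
        return "MuseTalk"
    return "Wav2Lip"
```

With these rules, a 30-second clip on a free account resolves to Wav2Lip, while a 2-minute clip requires MuseTalk and therefore a paid plan.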

Typical workflow

Case 1: Dubbing an English video to Italian

  1. Download the original video (English)
  2. Audio Studio → Generate Italian narration voice
  3. Upload original video to Lip Sync Studio
  4. Upload Italian audio as "Audio"
  5. Generate synced video (dubbed to Italian)
  6. Download the final synced video

Case 2: Talking avatar from your photo

  1. Take a photo of your face (well-lit, looking at camera)
  2. Audio Studio → Record cloned voice or generate TTS
  3. Lip Sync Studio → Upload your photo + audio
  4. Generate lip-sync video (you talking)
  5. Download your talking avatar

Case 3: Professional editing

  1. Generate video/avatar in Lip Sync with final audio
  2. Cinema Studio → Upload the synced video
  3. Edit with other clips, add background music
  4. Download the edited video

Real timings

  • Wav2Lip: 2–3 minutes for 30 seconds of video
  • MuseTalk: 3–5 minutes for 30 seconds of video
  • If the queue is full, your job waits its turn, which may add 1–2 minutes
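These figures can be turned into a rough estimate for other clip lengths. The sketch below assumes processing time scales linearly with clip length, which this page does not confirm, so treat the result as a ballpark only. The name `estimate_minutes` is hypothetical.

```python
# Minutes of processing per 30 seconds of video (low, high), from the list above.
MINUTES_PER_30S = {"Wav2Lip": (2, 3), "MuseTalk": (3, 5)}
QUEUE_EXTRA_MINUTES = (1, 2)  # added only when the queue is full


def estimate_minutes(model: str, clip_seconds: float, queue_full: bool = False) -> tuple[float, float]:
    """Return a (low, high) estimate in minutes, assuming linear scaling."""
    low, high = MINUTES_PER_30S[model]
    scale = clip_seconds / 30  # assumption: time grows linearly with clip length
    extra_low, extra_high = QUEUE_EXTRA_MINUTES if queue_full else (0, 0)
    return (low * scale + extra_low, high * scale + extra_high)


print(estimate_minutes("MuseTalk", 90))                    # (9.0, 15.0)
print(estimate_minutes("Wav2Lip", 30, queue_full=True))    # (3.0, 5.0)
```

Under the linear assumption, a 90-second MuseTalk clip lands at roughly 9–15 minutes, and a 30-second Wav2Lip clip behind a full queue at roughly 3–5 minutes.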

Common issues

"The face is twisted / lips don't move well"

→ Check that the face in the photo is straight and well-lit. Try again with a sharper, more frontal photo.

"The audio isn't synced"

→ Use clear, good-quality audio. If the audio is too quiet or distorted, sync fails. Try regenerating with a cleaner file.

"The video takes too long"

→ MuseTalk can take up to 5 minutes for a 30-second clip. If generation exceeds 10 minutes, the backend has likely crashed; reload the page.

"MuseTalk doesn't appear (only Wav2Lip)"

→ MuseTalk is available only on the Premium plan. If you don't see the button, your account doesn't have access. Contact your administrator.

"I want to sync a long video (3+ minutes)"

→ Use MuseTalk (supports up to 5 minutes, though processing takes longer). Wav2Lip is optimized for clips under 1 minute.

Pro tip

Combine Lip Sync + Cinema: generate a talking avatar in Lip Sync Studio, then send it to Cinema Studio and edit it together with an instrumental soundtrack generated in Audio Studio. Result: a fully AI-generated narrative video.