Synchronized Narrated Video

Create multi-scene videos with perfectly synchronized text, voice, and images

What is Narrated Video

Narrated Video transforms a topic or idea into a complete multi-scene video:

  • AI writes the script (scenes and narration).
  • AI generates visually consistent images for each scene.
  • Text-to-speech (TTS) produces a narration voice synchronized with the scenes.
  • Everything is assembled with karaoke captions and background music.

It is well suited to explaining a topic, presenting a product, or telling a story. The result is a professional-looking, coherent, and synchronized video.

How to use it

Narrated Video is a 3-step wizard:

Step 1: Define the topic

Describe what your video should be about:

  • Topic: what should the video be about? (e.g., "3 tips to sleep better", "How to choose a backpack")
  • Product or character (optional):
      • Upload a product image to keep the product consistent across all scenes.
      • Or upload a character image so the character appears throughout the video.
      • If you upload nothing, AI generates abstract scenic images for the scenes.

Click "Next →" to move to step 2.

Step 2: Configure rendering

Set technical and narrative aspects:

  • Number of scenes: how many scenes the video should have (e.g., 3, 4, 5).
  • Motion:
      • Hybrid: motion effects (zoom, pan) on images.
      • Ken Burns: classic slow zoom and pan effect.
      • Image-to-Video: each image becomes a short animated video.
  • Video engine for scenes:
      • FastWan (local · free): generates video locally, quickly and at no cost.
      • Seedance, Wan, Kling, Veo (premium): cloud engines that use credits.
  • Format: 9:16 (vertical), 16:9 (horizontal), 1:1 (square).

Click "Next →" to move to step 3.

Step 3: Audio and generate

Add music and generate the final video:

  • Background music (optional): describe the type of music you want (e.g., "relaxing classical", "upbeat electronic").
  • Karaoke caption: enable synchronized subtitles.
  • Language: Italian or English (determines voice and tone).
  • Click "🎙 Generate narrated video".

Generation happens in the background. When ready, you'll get a notification and the video will appear in the Gallery.
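
The three steps above collect a small, fixed set of choices. As a planning aid, they can be written down as a plain data structure before opening the wizard. The sketch below is hypothetical: the guide does not describe any API, so the class name, field names, default values, and lowercase option identifiers are all assumptions made for illustration. Only the option values themselves (motion modes, engines, formats, languages) come from the text.

```python
from dataclasses import dataclass
from typing import List, Optional

# Option values come from the wizard description; the identifiers are hypothetical.
MOTION_MODES = {"hybrid", "ken_burns", "image_to_video"}
FREE_ENGINES = {"fastwan"}
PREMIUM_ENGINES = {"seedance", "wan", "kling", "veo"}
FORMATS = {"9:16", "16:9", "1:1"}
LANGUAGES = {"italian", "english"}


@dataclass
class NarratedVideoSettings:
    # Step 1: topic and optional product/character image (a file path here)
    topic: str
    reference_image: Optional[str] = None
    # Step 2: rendering (defaults are assumptions, not the wizard's real defaults)
    scenes: int = 4
    motion: str = "ken_burns"
    engine: str = "fastwan"
    video_format: str = "9:16"
    # Step 3: audio
    music_prompt: Optional[str] = None
    karaoke_caption: bool = True
    language: str = "english"

    def problems(self) -> List[str]:
        """Return a list of human-readable issues; empty means the plan is consistent."""
        issues = []
        if not self.topic.strip():
            issues.append("topic is empty")
        if self.scenes < 1:
            issues.append("scenes must be at least 1")
        if self.motion not in MOTION_MODES:
            issues.append(f"unknown motion mode: {self.motion}")
        if self.engine not in FREE_ENGINES | PREMIUM_ENGINES:
            issues.append(f"unknown engine: {self.engine}")
        if self.video_format not in FORMATS:
            issues.append(f"unknown format: {self.video_format}")
        if self.language not in LANGUAGES:
            issues.append(f"unknown language: {self.language}")
        return issues

    def uses_credits(self) -> bool:
        """Premium cloud engines consume credits; FastWan does not."""
        return self.engine in PREMIUM_ENGINES


if __name__ == "__main__":
    plan = NarratedVideoSettings(
        topic="3 tips to sleep better",
        scenes=3,
        music_prompt="relaxing ambient",
    )
    print(plan.problems())      # []
    print(plan.uses_credits())  # False
```

Checking `uses_credits()` before generating is a quick reminder of whether the chosen engine is one of the premium ones.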

Available options

Option                          | Free | Premium | Note
FastWan (local)                 | yes  |         | Fast, free
Seedance/Wan/Kling/Veo (cloud)  |      | yes     | Superior quality, uses credits
OmniAvatar (full-motion local)  | yes  |         | Free but slow
Background music                |      |         | Generated from text
Karaoke caption                 |      |         | Synchronized

Tips

Topic description: The more detailed, the better the generated scenes. Instead of "Fitness", write "Quick warm-up exercises (2 min) without equipment for busy people".

Coherent product: If you upload a product image, it appears in every scene. Perfect for presentations and marketing.

Coherent character: If you upload a person's image, that person guides the narration throughout.

FastWan vs premium: FastWan is free and fast (~5-10 min). Premium engines are slower but deliver higher visual quality, and they use credits.

Processing time: Depends on the number of scenes and the engine. With FastWan and 4 scenes, expect roughly 8-12 minutes of processing.
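
As a rough planning aid, the FastWan figure above (4 scenes in about 8-12 minutes) works out to roughly 2-3 minutes per scene. The sketch below scales that reference point to other scene counts. Linear scaling is an assumption, not something the guide states, and the function name is hypothetical. Real times will also depend on hardware and on the motion mode.

```python
from typing import Tuple

# Reference point from the guide: FastWan, 4 scenes, about 8-12 minutes.
REFERENCE_SCENES = 4
REFERENCE_MINUTES = (8.0, 12.0)


def estimate_fastwan_minutes(scenes: int) -> Tuple[float, float]:
    """Scale the reference range linearly with the scene count (assumption)."""
    if scenes < 1:
        raise ValueError("scenes must be at least 1")
    low, high = REFERENCE_MINUTES
    return (scenes * low / REFERENCE_SCENES, scenes * high / REFERENCE_SCENES)


if __name__ == "__main__":
    print(estimate_fastwan_minutes(3))  # (6.0, 9.0)
    print(estimate_fastwan_minutes(5))  # (10.0, 15.0)
```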

Common issues

"No video produced"

  • The topic may be too vague. Describe in more detail what the video should cover.

"Incoherent images"

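  • AI generated scenes that look different from one another. Retry, upload a product or character image in step 1 to keep scenes consistent, or use a premium engine (Kling, Veo) for better coherence.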

"Voice in wrong language"

  • Select the correct Language (Italian or English) in step 3 before generating.

"Music doesn't fit"

  • Describe the genre/mood in "Background music" (e.g., "energetic rock", "relaxing ambient"). AI will generate fitting music.

"Very slow processing"

  • If you are using OmniAvatar full-motion, processing is very slow (it can take more than 30 minutes). Use a faster engine.