Synchronized Narrated Video

Create multi-scene videos with perfectly synchronized text, voice, and images

What is Narrated Video?

Narrated Video turns a topic or idea into a complete multi-scene video. The AI:

writes the script (scenes and narration);
generates consistent images for each scene;
creates a synchronized text-to-speech (TTS) voiceover;
assembles everything with karaoke captions and background music.
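This guide doesn't say how Narrated Video aligns captions with the voice. As a rough illustration only, one simple way to time karaoke captions is to spread a narration clip's duration across its words in proportion to word length. The helper name `karaoke_timings` and the proportional rule are assumptions made for this sketch. Real systems more often use word timestamps reported by the TTS engine or obtained through forced alignment.

```python
from dataclasses import dataclass


@dataclass
class WordTiming:
    word: str
    start: float  # seconds from the start of the scene's narration
    end: float


def karaoke_timings(text: str, duration: float) -> list[WordTiming]:
    """Hypothetical helper: split a narration clip's duration across its
    words, weighting each word by its character count."""
    words = text.split()
    if not words or duration <= 0:
        return []
    total_chars = sum(len(w) for w in words)
    timings = []
    t = 0.0
    for w in words:
        span = duration * len(w) / total_chars  # longer words stay highlighted longer
        timings.append(WordTiming(w, t, t + span))
        t += span
    # Snap the last word to the clip end to absorb floating-point rounding.
    timings[-1].end = duration
    return timings


for wt in karaoke_timings("Sleep better tonight", 3.0):
    print(f"{wt.word}: {wt.start:.2f}-{wt.end:.2f} s")
```

Each word's highlight starts exactly where the previous one ends, so the caption never runs ahead of or behind the clip as a whole.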

Use it to explain a topic, present a product, or tell a story. The result is a professional, coherent, synchronized video.

How to use it

Narrated Video is a 3-step wizard:

Step 1: Define the topic

Describe what your video should be about:

Topic: what should the video be about? (e.g., "3 tips to sleep better" or "How to choose a backpack")
Product or character (optional):
Upload a product image to keep the product consistent across all scenes.
Or upload a character image so the same character appears throughout the video.
If you upload nothing, the AI generates abstract or scenic imagery for each scene.

Click "Next →" to move to step 2.

Step 2: Configure rendering

Set the technical and narrative options:

Number of scenes: how many scenes the video should have (e.g., 3, 4, or 5).
Motion:
Hybrid: applies motion effects (zoom, pan) to the images.
Ken Burns: the classic slow zoom-and-pan effect.
Image-to-Video: turns each image into a short animated clip.
Video engine for scenes:
FastWan (local · free): generates video locally, quickly, and at no cost.
Seedance, Wan, Kling, Veo (premium): cloud engines that use credits.
Format: 9:16 (vertical), 16:9 (horizontal), 1:1 (square).
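The output resolution isn't stated in this guide. Assuming, purely as an example, that the long side of the frame is 1920 pixels (a common Full HD size), the three formats work out as sketched below. This may help when cropping a product or character image before uploading. The function `frame_size` and the 1920-pixel default are hypothetical.

```python
from fractions import Fraction


def frame_size(aspect: str, long_side: int = 1920) -> tuple[int, int]:
    """Return (width, height) for a 'W:H' aspect ratio, with the longer
    side fixed at long_side pixels. long_side=1920 is an assumption."""
    w, h = (int(x) for x in aspect.split(":"))
    ratio = Fraction(w, h)
    if ratio >= 1:
        # Horizontal or square: width is the long side.
        return long_side, round(long_side / ratio)
    # Vertical: height is the long side.
    return round(long_side * ratio), long_side


for fmt in ("9:16", "16:9", "1:1"):
    print(fmt, frame_size(fmt))
```

Using `Fraction` keeps the ratio exact, so 16:9 yields exactly 1080 pixels rather than a rounded float.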

Click "Next →" for step 3.

Step 3: Audio and generate

Add music and generate the final video:

Background music (optional): describe the type of music you want (e.g., "relaxing classical", "upbeat electronic").
Karaoke caption: enable synchronized subtitles.
Language: Italian or English (determines voice and tone).
Click "🎙 Generate narrated video".

Generation happens in the background. When ready, you'll get a notification and the video will appear in the Gallery.

Available options

Option	Free (no credits)	Premium (uses credits)	Note
FastWan (local)	✓	—	Fast, free
Seedance/Wan/Kling/Veo (cloud)	—	✓	Superior quality, uses credits
OmniAvatar (full-motion local)	✓	—	Free but slow
Background music	✓	—	Generated from text
Karaoke caption	✓	—	Synchronized

Tips

Topic description: The more detailed the description, the better the generated scenes. Instead of "Fitness", write "Quick warm-up exercises (2 min) without equipment for busy people".

Consistent product: If you upload a product image, it appears in every scene. Ideal for product presentations and marketing.

Consistent character: If you upload a person's image, that person guides the narration throughout the video.

FastWan vs premium: FastWan is free and fast (~5-10 min). Premium engines are slower but deliver higher visual quality, and they use credits.

Processing time: Depends on the number of scenes and the engine. With FastWan and 4 scenes, expect ~8-12 minutes of processing.
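The FastWan figure for 4 scenes works out to roughly 2 to 3 minutes per scene. The sketch below scales that rate to other scene counts. Linear scaling is an assumption, and actual times will also depend on your hardware, the motion mode, and the narration length. The function name is hypothetical.

```python
def estimate_fastwan_minutes(scenes: int) -> tuple[float, float]:
    """Rough (low, high) processing estimate in minutes for FastWan.

    The per-scene rate comes from the guide's "4 scenes: ~8-12 minutes";
    assumes processing time grows linearly with the number of scenes.
    """
    if scenes < 1:
        raise ValueError("scenes must be at least 1")
    low_per_scene, high_per_scene = 8 / 4, 12 / 4  # 2.0 and 3.0 minutes
    return scenes * low_per_scene, scenes * high_per_scene


low, high = estimate_fastwan_minutes(5)
print(f"5 scenes: ~{low:.0f}-{high:.0f} minutes")
```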

Common issues

"No video produced"

The topic may be too vague. Describe more precisely what the video should cover.

"Incoherent images"

The AI generated scenes that don't match each other visually. Retry, or use a premium engine (Kling, Veo) for better coherence.

"Voice in wrong language"

Select the correct Language (Italian or English) in step 3 before generating.

"Music doesn't fit"

Describe the genre or mood in "Background music" (e.g., "energetic rock", "relaxing ambient"), and the AI will generate music to match.

"Very slow processing"

If you are using OmniAvatar full-motion, processing is very slow (it can take more than 30 min). Use a faster engine such as FastWan.