Synchronized Narrated Video
Create multi-scene videos with perfectly synchronized text, voice, and images
What is Narrated Video
Narrated Video transforms a topic or idea into a complete multi-scene video:
- AI writes the script (scenes and narration).
- It generates consistent images for each scene.
- It creates a narrated voice-over (TTS) synchronized with the scenes.
- It assembles everything with karaoke captions and background music.
It is ideal for explaining a topic, presenting a product, or telling a story. The result is a professional, coherent, and synchronized video.
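One way to picture the synchronization: each scene stays on screen exactly as long as its narration clip, so voice and visuals start and end together. The minimal Python sketch below assumes the TTS step reports the duration of each clip. The `Scene`, `TimelineEntry`, and `build_timeline` names are hypothetical and do not describe Narrated Video's actual implementation.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class Scene:
    narration: str        # text read by the TTS voice (hypothetical field)
    audio_seconds: float  # length of the generated voice clip, assumed known after TTS

@dataclass
class TimelineEntry:
    index: int    # position of the scene in the video
    start: float  # seconds from the beginning of the video
    end: float

def build_timeline(scenes: list[Scene]) -> list[TimelineEntry]:
    """Give each scene exactly as much screen time as its narration clip,
    so the visuals and the voice start and end together."""
    timeline: list[TimelineEntry] = []
    t = 0.0
    for i, scene in enumerate(scenes):
        if scene.audio_seconds < 0:
            raise ValueError("audio duration cannot be negative")
        timeline.append(TimelineEntry(i, t, t + scene.audio_seconds))
        t += scene.audio_seconds  # the next scene starts where this one ends
    return timeline

if __name__ == "__main__":
    scenes = [
        Scene("Keep a regular bedtime.", 3.2),
        Scene("Avoid screens an hour before sleep.", 4.1),
        Scene("Keep your bedroom cool and dark.", 3.7),
    ]
    for entry in build_timeline(scenes):
        print(f"scene {entry.index}: {entry.start:.1f}s -> {entry.end:.1f}s")
```

Under this assumption, the total video length is simply the sum of the narration clips.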
How to use it
Narrated Video is a 3-step wizard:
Step 1: Define the topic
Describe what your video should be about:
- Topic: what do you want to tell? (e.g., "3 tips to sleep better", "How to choose a backpack")
- Product or character (optional):
  - Upload a product image to keep it consistent across all scenes.
  - Or upload a character image to have that person appear throughout the video.
  - If you upload nothing, the AI generates abstract, scenic images.
Click "Next →" to move to step 2.
Step 2: Configure rendering
Set the technical and narrative options:
- Number of scenes: how many scenes the video should have (e.g., 3, 4, 5).
- Motion:
  - Hybrid: motion effects (zoom, pan) on images.
  - Ken Burns: classic slow zoom and pan effect.
  - Image-to-Video: each image becomes a short animated video.
- Video engine for scenes:
  - FastWan (local · free): generates video locally, quickly, and at no cost.
  - Seedance, Wan, Kling, Veo (premium): premium cloud engines that use credits.
- Format: 9:16 (vertical), 16:9 (horizontal), 1:1 (square).
Click "Next →" for step 3.
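The Ken Burns motion and the output format work together: each frame can be seen as a crop of the scene image with the chosen aspect ratio, which slowly shrinks (zoom) and drifts (pan). The sketch below is a hypothetical illustration, not the product's code; the function names, the 1.2 final zoom, and the pan parameters are assumptions.

```python
from __future__ import annotations

# Output formats from step 2, as width / height ratios.
FORMATS = {"9:16": 9 / 16, "16:9": 16 / 9, "1:1": 1.0}

def base_crop(src_w: float, src_h: float, aspect: float) -> tuple[float, float]:
    """Largest (width, height) with the target aspect ratio that fits in the source image."""
    if src_w / src_h > aspect:
        # Source is wider than the target: keep full height, trim the sides.
        return src_h * aspect, float(src_h)
    # Source is taller than (or equal to) the target: keep full width, trim top/bottom.
    return float(src_w), src_w / aspect

def ken_burns_box(src_w, src_h, aspect, t, zoom_end=1.2, pan=(0.0, 0.0)):
    """Crop box (x, y, w, h) at progress t in [0, 1] of the scene.

    Zoom grows linearly from 1.0 to zoom_end; pan components in [-1, 1]
    move the crop toward the image edges (-1 = left/top, +1 = right/bottom)."""
    if zoom_end < 1.0:
        raise ValueError("zoom_end must be >= 1.0 so the crop stays inside the image")
    if not all(-1.0 <= p <= 1.0 for p in pan):
        raise ValueError("pan components must be in [-1, 1]")
    t = min(max(t, 0.0), 1.0)
    base_w, base_h = base_crop(src_w, src_h, aspect)
    zoom = 1.0 + (zoom_end - 1.0) * t
    w, h = base_w / zoom, base_h / zoom
    # Free space on each side of a centered crop.
    margin_x, margin_y = (src_w - w) / 2, (src_h - h) / 2
    # Shift from the centered position by a fraction of the margin, so the box stays in bounds.
    x = margin_x * (1 + pan[0] * t)
    y = margin_y * (1 + pan[1] * t)
    return x, y, w, h

if __name__ == "__main__":
    # A 1920x1080 landscape image rendered as vertical 9:16, zooming in while panning right.
    for t in (0.0, 0.5, 1.0):
        box = ken_burns_box(1920, 1080, FORMATS["9:16"], t, pan=(0.5, 0.0))
        print(t, tuple(round(v, 1) for v in box))
```

Each box would then be resized to the output resolution. Tying the pan to the free margin is what keeps the crop inside the image at every step.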
Step 3: Audio and generate
Add music and generate the final video:
- Background music (optional): describe the type of music you want (e.g., "relaxing classical", "upbeat electronic").
- Karaoke caption: enable synchronized subtitles.
- Language: Italian or English (determines voice and tone).
- Click "🎙 Generate narrated video".
Generation happens in the background. When ready, you'll get a notification and the video will appear in the Gallery.
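Karaoke captions highlight each word as it is spoken. A simple way to approximate word timing, assuming only the start and end of a scene's narration are known, is to split that interval in proportion to word length. This is a hypothetical sketch: real pipelines usually take word timestamps from the TTS engine or from forced alignment, and the exact method Narrated Video uses is not documented here.

```python
from __future__ import annotations

def word_timings(text: str, start: float, end: float) -> list[tuple[str, float, float]]:
    """Split the narration interval [start, end] among words in proportion
    to their length in characters (a rough stand-in for real word timestamps)."""
    words = text.split()
    if not words:
        return []
    total_chars = sum(len(w) for w in words)
    duration = end - start
    timings = []
    t = start
    for w in words:
        d = duration * len(w) / total_chars
        timings.append((w, t, t + d))
        t += d
    # Snap the last word to the exact end to avoid floating-point drift.
    word, s, _ = timings[-1]
    timings[-1] = (word, s, end)
    return timings

def highlighted_word(timings: list[tuple[str, float, float]], t: float) -> str | None:
    """Word to highlight at time t, or None outside the narration."""
    for word, s, e in timings:
        if s <= t < e:
            return word
    return None

if __name__ == "__main__":
    tm = word_timings("Keep a regular bedtime.", 0.0, 3.2)
    for word, s, e in tm:
        print(f"{word:<10} {s:.2f}s -> {e:.2f}s")
    print("at 1.0s:", highlighted_word(tm, 1.0))
```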
Available options
| Option | Free | Premium | Note |
|---|---|---|---|
| FastWan (local) | ✓ | — | Fast, free |
| Seedance/Wan/Kling/Veo (cloud) | — | ✓ | Superior quality, uses credits |
| OmniAvatar (full-motion local) | ✓ | — | Free but slow |
| Background music | ✓ | — | Generated from text |
| Karaoke caption | ✓ | — | Synchronized |
Tips
Topic description: The more detailed, the better the generated scenes. Instead of "Fitness", write "Quick warm-up exercises (2 min) without equipment for busy people".
Coherent product: If you upload a product image, it appears in every scene. Perfect for presentations and marketing.
Coherent character: If you upload a person's image, that person guides the narration throughout.
FastWan vs premium: FastWan is free and fast (~5-10 min). Premium engines are slower but deliver higher visual quality, and they use credits.
Processing time: Depends on the number of scenes and the engine. With FastWan and 4 scenes, expect ~8-12 minutes of processing.
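To extrapolate the 4-scene FastWan figure to other scene counts, a rough rate is about 2-3 minutes per scene (8-12 minutes divided by 4 scenes). The sketch below assumes processing time grows linearly with the number of scenes and ignores any fixed overhead (script, voice, music); the function name is hypothetical, so treat the result as a ballpark, not a guarantee.

```python
from __future__ import annotations

def estimate_fastwan_minutes(scenes: int, low_per_scene: float = 2.0,
                             high_per_scene: float = 3.0) -> tuple[float, float]:
    """Ballpark (min, max) FastWan processing time in minutes.

    Assumes time scales linearly with the number of scenes; the default
    per-scene rates are derived from "4 scenes ~ 8-12 minutes"."""
    if scenes < 1:
        raise ValueError("a video needs at least one scene")
    return scenes * low_per_scene, scenes * high_per_scene

if __name__ == "__main__":
    for n in (3, 4, 6):
        low, high = estimate_fastwan_minutes(n)
        print(f"{n} scenes: ~{low:.0f}-{high:.0f} min")
```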
Common issues
"No video produced"
- The topic may be too vague. Describe more precisely what you want the video to tell.
"Incoherent images"
- The AI generated scenes that don't match each other. Retry, or use a premium engine (Kling, Veo) for better consistency.
"Voice in wrong language"
- Select the correct Language (Italian or English) in step 3 before generating.
"Music doesn't fit"
- Describe the genre/mood in "Background music" (e.g., "energetic rock", "relaxing ambient"). AI will generate fitting music.
"Very slow processing"
- OmniAvatar full-motion processing is very slow (it can take more than 30 minutes). Use a faster engine, such as FastWan.