Overview
The Video Generation API creates high-quality short videos from text descriptions or reference images. Powered by advanced AI diffusion models, this endpoint produces smooth, realistic video clips that bring your concepts to life.
Key capabilities:
- Text-to-video - Generate videos from natural language descriptions
- Image-to-video - Animate static images into dynamic videos
- Multiple resolutions - From social media to high-definition formats
- Consistent motion - Smooth, natural movement and transitions
- Temporal coherence - Maintains visual consistency across frames
Use cases:
- Social media content - Create eye-catching videos for posts and Stories
- Marketing campaigns - Generate product demos and promotional videos
- Content creation - B-roll footage for videos and presentations
- E-commerce - Product showcases with dynamic movement
- Prototyping - Visualize concepts before expensive production
- Animation - Quick animated sequences for various purposes
- Education - Visual demonstrations and explainer content
How Video Generation Works
Video generation is significantly more complex than image generation because it must maintain consistency across multiple frames while creating natural motion:
- Prompt/Image Analysis - Understanding what needs to be generated and how it should move
- Motion Planning - Determining camera movement, object motion, and scene dynamics
- Frame Generation - Creating each frame with temporal consistency
- Motion Smoothing - Ensuring smooth transitions between frames
- Upscaling & Enhancement - Improving quality and resolution
- Encoding - Compressing into MP4 format
- Delivery - Encoded as base64 data URI for transmission
Writing Effective Video Prompts
Video prompts require thinking about motion and time, not just static scenes.
Prompt Structure for Videos
Template:
- ✅ “Ocean waves crashing against rocky cliffs, slow motion, cinematic”
- ✅ “Golden retriever running through meadow, camera following, sunny day”
- ✅ “City traffic at night, time lapse, neon lights, aerial view”
- ✅ “Waterfall cascading down mountain, morning mist, drone footage”
- ❌ “Beach” (no motion described)
- ❌ “Person standing still” (minimal movement)
- ❌ “Multiple complex scenes” (too ambitious for 8 seconds)
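The good examples above share a common shape: a subject in motion, then a setting, camera, or style cue, separated by commas. A minimal sketch of a helper that assembles prompts this way is shown below. The function name `build_video_prompt` and its argument names are illustrative assumptions, not part of the API; the API only receives the final string.

```python
def build_video_prompt(subject_and_action, setting="", camera="", style=""):
    """Join the non-empty parts of a video prompt with commas.

    Hypothetical helper for illustration only.
    """
    parts = [subject_and_action, setting, camera, style]
    prompt = ", ".join(p.strip() for p in parts if p and p.strip())
    if not prompt:
        raise ValueError("prompt must describe at least a subject and its motion")
    return prompt


prompt = build_video_prompt(
    "Ocean waves crashing against rocky cliffs",
    camera="slow motion",
    style="cinematic",
)
print(prompt)
```

Keeping the subject and its action as the only required argument reflects the guidance that a prompt without motion (such as “Beach”) tends to work poorly.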
Key Elements for Video Prompts
Motion & Action - What’s moving?
- Natural motion: “leaves rustling in wind,” “water flowing”
- Object motion: “car driving,” “bird flying,” “person walking”
- Camera motion: “camera panning,” “zooming in,” “orbiting around”
Environment & Atmosphere - Where is it happening?
- Set the scene: “in a bustling city,” “on a quiet beach,” “through a forest”
- Include atmosphere: “foggy morning,” “golden hour,” “stormy weather”
Camera & Technique - How is it filmed?
- “Slow motion,” “time lapse,” “steady cam,” “drone footage”
- “Close-up,” “wide shot,” “tracking shot,” “aerial view”
- “Handheld,” “smooth pan,” “dolly zoom”
Style & Mood - What should it feel like?
- “Cinematic,” “documentary style,” “dreamy,” “dramatic”
- “Realistic,” “artistic,” “commercial quality”
Video-Specific Tips
Do’s:
- ✅ Describe movement clearly
- ✅ Keep it simple - one main action or scene
- ✅ Specify camera movement if important
- ✅ Use temporal terms (slow motion, time lapse)
- ✅ Focus on visually interesting subjects
Don’ts:
- ❌ Try to fit multiple scenes in 8 seconds
- ❌ Describe complex narratives
- ❌ Request rapid scene changes
- ❌ Include dialogue or specific audio
- ❌ Expect photorealistic human faces/actions (challenging for AI)
Examples
Request Parameters
prompt (required)
A detailed description of the video you want to generate, including motion, environment, and mood. Focus on describing dynamic elements and movement. Prompt length: 1-1000 characters. Optimal: 30-150 characters with clear action description. See “Writing Effective Video Prompts” above for detailed guidance.
size (optional)
Video resolution and aspect ratio. Default: 1280x720 (720p landscape).
Available resolutions:
1280x720 (720p Landscape) - Default
- Aspect ratio: 16:9
- Best for: YouTube, presentations, website embeds
- File size: Moderate
- Generation time: Standard
720p Portrait
- Aspect ratio: 9:16
- Best for: Instagram/TikTok Stories, mobile-first content
- File size: Moderate
- Generation time: Standard
1080p Landscape
- Aspect ratio: 16:9
- Best for: High-quality YouTube, professional content
- File size: Larger
- Generation time: Longer
1080p Portrait
- Aspect ratio: 9:16
- Best for: Premium mobile content, vertical video platforms
- File size: Larger
- Generation time: Longer
Square
- Aspect ratio: 1:1
- Best for: Instagram posts, social feeds
- File size: Moderate
- Generation time: Standard
Choosing a size:
- Mobile/Social: Use portrait (9:16) or square (1:1)
- YouTube/Web: Use landscape (16:9)
- Quality vs Speed: Lower resolutions generate faster
- File size: Higher resolutions = larger files
input_reference (optional)
A base64-encoded reference image to animate into video. This enables image-to-video generation where you provide a starting frame and the AI adds motion.
Use cases for image-to-video:
- Animate photos - Bring static images to life
- Product demonstrations - Show products in motion
- Brand consistency - Start from your existing visual assets
- Character animation - Animate illustrations or artwork
- Architectural visualization - Add life to renderings
Tips for image-to-video:
- Start with high-quality, well-composed images
- Describe the motion/animation you want
- Keep motion expectations realistic (subtle movements work best)
- Avoid images with too much complexity
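The three parameters above can be combined into a single request body. The sketch below is a minimal illustration under stated assumptions: that the endpoint accepts a JSON object, and that `input_reference` takes the bare base64 string rather than a data URI. The helper name `build_request_body` is hypothetical, and the endpoint URL and authentication are deliberately left out because they are not given in this section.

```python
import base64
import json

MAX_PROMPT_CHARS = 1000  # documented prompt limit: 1-1000 characters


def build_request_body(prompt, size=None, image_bytes=None):
    """Assemble a request body from the documented parameters.

    Hypothetical helper; the JSON shape is an assumption.
    """
    if not 1 <= len(prompt) <= MAX_PROMPT_CHARS:
        raise ValueError("prompt must be 1-1000 characters")
    body = {"prompt": prompt}
    if size is not None:
        body["size"] = size  # e.g. "1280x720", the documented default
    if image_bytes is not None:
        # Assumption: the field takes the bare base64 string, not a data URI.
        body["input_reference"] = base64.b64encode(image_bytes).decode("ascii")
    return body


body = build_request_body(
    "Waterfall cascading down mountain, morning mist, drone footage",
    size="1280x720",
    image_bytes=b"\x89PNG placeholder bytes",  # stand-in for a real image file
)
print(json.dumps(body, indent=2))
```

Omitting `size` and `image_bytes` yields a plain text-to-video request that relies on the default resolution.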
Best Practices
Prompt Engineering for Videos:
- Focus on a single, clear action or scene
- Describe motion explicitly (“waves crashing,” “camera panning”)
- Include camera movement for more dynamic results
- Use cinematic terms for professional looks
- Specify pace (slow motion, time lapse)
Resolution Selection:
- Lower resolutions generate faster and use less bandwidth
- Portrait formats are ideal for mobile platforms
- Landscape formats for traditional viewing
- Square for universal social media compatibility
Performance:
- Video generation takes 1-10 minutes - implement proper timeout handling
- Consider queueing or background processing for user-facing apps
- Cache frequently requested videos
- Provide loading indicators and progress updates to users
Quality:
- Use clear, simple prompts
- Avoid overly complex scenes
- Natural phenomena work well (water, fire, clouds)
- Camera movements are easier than complex object actions
- Test with various prompts to learn what works best
Integration Patterns:
- Async processing - Start generation, notify user when complete
- Preview thumbnails - Generate image first to preview before video
- Batch generation - Queue multiple videos for efficient processing
- Fallback strategy - Have backup content if generation fails
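Timeout handling, caching, and a fallback can be combined in one small wrapper. The following is a sketch, not a reference implementation: `generate` is a hypothetical callable that performs the actual API call and returns the video data URI, the cache is a plain in-memory dictionary, and the 600-second default simply mirrors the 10-minute upper bound quoted above.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

_cache = {}  # in-memory cache; a real app would likely use a persistent store
_executor = ThreadPoolExecutor(max_workers=2)


def cache_key(prompt, size):
    """Stable key for a (prompt, size) pair."""
    return hashlib.sha256(f"{size}|{prompt}".encode("utf-8")).hexdigest()


def generate_with_fallback(generate, prompt, size="1280x720",
                           timeout_s=600, fallback=None):
    """Run generate(prompt, size) in a worker thread with a timeout.

    Returns the cached or freshly generated result, or `fallback` if the
    call times out or raises.
    """
    key = cache_key(prompt, size)
    if key in _cache:
        return _cache[key]
    future = _executor.submit(generate, prompt, size)
    try:
        video = future.result(timeout=timeout_s)
    except FutureTimeout:
        return fallback  # the worker keeps running; we just stop waiting
    except Exception:
        return fallback  # generation failed; serve backup content
    _cache[key] = video
    return video


def fake_generate(prompt, size):
    # Stand-in for the real API call; the MIME type shown is an assumption.
    return "data:video/mp4;base64,AAAA"


print(generate_with_fallback(fake_generate, "City traffic at night, time lapse"))
```

For user-facing apps, the same pattern extends naturally to a job queue: submit the work, return immediately, and notify the user when the future completes.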
Handling Video Output
Videos are returned as base64-encoded data URIs. Benefits of this format:
- Ensures reliable transmission over HTTP
- No need for separate file hosting
- Immediate availability
- Simplified API response structure
Approximate file sizes:
- 720p: ~1-3 MB for 8 seconds
- 1080p: ~2-5 MB for 8 seconds
- Depends on complexity and motion
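Base64 represents every 3 bytes of binary data as 4 text characters, so the encoded payload is roughly one third larger than the underlying file. The short calculation below illustrates this, assuming the sizes quoted above refer to the decoded MP4 (the source does not say which). It is useful when sizing response buffers or client timeouts.

```python
import base64
import math


def base64_length(raw_bytes):
    """Length of the base64 text for `raw_bytes` bytes of binary data."""
    return 4 * math.ceil(raw_bytes / 3)


MB = 1_000_000
for label, raw_mb in [("720p, low end", 1), ("720p, high end", 3), ("1080p, high end", 5)]:
    encoded = base64_length(raw_mb * MB)
    print(f"{label}: {raw_mb} MB video -> ~{encoded / MB:.2f} MB of base64 text")

# Sanity check against the standard library on a small buffer.
assert base64_length(10) == len(base64.b64encode(b"x" * 10))
```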
Important Notes
Generation Time: Video generation is computationally intensive. Expect 1-10 minutes depending on:
- Resolution requested
- Scene complexity
- Current server load
- Whether using image-to-video
Limitations:
- Video length: Fixed at 8 seconds
- No audio generation (silent videos)
- Best results with natural phenomena and simple motions
- Complex human actions may not be photorealistic
Response
video_url is a base64 data URI. Extract the base64 payload (the part after the comma) and decode it to save the video as an MP4 file.
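Decoding the data URI takes only a few lines. The sketch below assumes the value follows the standard `data:<mime>;base64,<payload>` layout; the `video/mp4` MIME type in the example and the helper name `save_data_uri` are assumptions for illustration.

```python
import base64


def save_data_uri(data_uri, path):
    """Decode a base64 data URI and write the bytes to `path`.

    Returns the number of bytes written.
    """
    header, sep, payload = data_uri.partition(",")
    if not sep or not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64 data URI")
    video_bytes = base64.b64decode(payload)
    with open(path, "wb") as f:
        f.write(video_bytes)
    return len(video_bytes)


# Stand-in for the video_url field of a real response.
example_uri = "data:video/mp4;base64," + base64.b64encode(b"not a real video").decode("ascii")
print(save_data_uri(example_uri, "output.mp4"), "bytes written")
```

Checking the header before decoding guards against accidentally treating an ordinary URL or an error message as video data.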