Introduction
Sequencer is a creative platform for AI filmmaking. Generate images, video, audio, and 3D assets from 100+ AI models. Edit, compose, and export from one interface.
This documentation covers every tool in the platform and the complete REST API for building integrations.
Base URL
https://api.sequencer.media
Credits & Billing
Sequencer uses pay-per-use credits. Each operation costs a set amount based on the model and output type. Your balance is shown in the top-right of the dashboard.
Credits are consumed on successful completion only. Failed generations are not charged. Top up anytime from the Billing page.
Team sponsors can cover credits for members. If sponsor funds run out, the member's personal balance is used as fallback.
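The sponsor-then-personal fallback can be pictured as a two-bucket charge. This is an illustrative sketch only; the actual billing logic and field names are internal to Sequencer:

```python
def charge_credits(cost, sponsor_balance, personal_balance):
    """Charge `cost` credits, drawing from the sponsor first and
    falling back to the member's personal balance for the remainder.
    Raises if the combined balances cannot cover the cost (mirroring
    the rule that failed operations are never charged)."""
    if sponsor_balance + personal_balance < cost:
        raise ValueError("insufficient credits")
    from_sponsor = min(cost, sponsor_balance)
    from_personal = cost - from_sponsor
    return sponsor_balance - from_sponsor, personal_balance - from_personal
```

For example, a 10-credit generation against a sponsor balance of 6 and a personal balance of 8 drains the sponsor to 0 and leaves 4 personal credits.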
Generate Images
Generate images from text prompts using 20+ AI models. Select a model, set your aspect ratio, and click Generate.
Models: Gemini, Flux Pro/Dev/Schnell, DALL-E 3, GPT Image, SDXL, Ideogram, Recraft, Midjourney, and more. Each model shows supported features in the dropdown.
Reference images: Drag and drop onto the prompt area, paste from clipboard (Cmd+V), or click the attachment button. Up to 5 references supported.
Batch generation: Use the 1x/2x/4x/8x selector to generate multiple variations simultaneously.
Variations: Click the variation button on any result to generate alternatives that maintain the original composition.
Camera control: For supported models, specify shot types (close-up, wide, aerial, low angle, etc.) from the camera dropdown.
Prompt enhancement: Toggle the enhance button to let AI rewrite your prompt with detail, lighting, and model-specific optimizations.
Tips
Drag an image directly into the prompt area to use it as a style reference.
Use batch generation (4x or 8x) to explore different interpretations quickly.
Camera control works best with Flux and Ideogram models.
Paste images from clipboard with Cmd+V - no need to save files first.
Generation Modes
Text to image from any prompt
Image to image using a reference
Drag, paste, or upload reference images
Models and Quality
20+ models including Gemini, Flux, DALL-E, SDXL, and more
Aspect ratio selection (1:1, 16:9, 9:16, 4:3)
Resolution control for supported models
Camera style control for applicable models
Workflow
AI prompt enhancement to improve your descriptions
Batch generation of 1, 2, 4, or 8 images at once
Regenerate or create variations from any output
Reuse prompts across generations
Version history to track every iteration
Export and Integrations
Download images directly
Open in Photoshop or Photopea for further editing
Send directly to the Image Editor for AI refinement
How to Generate
Write a text prompt describing what you want to create. Select a model from the dropdown at the top of the prompt panel - each model displays its name, provider, and supported features. Choose an aspect ratio (1:1, 16:9, 9:16, 4:3, 3:4, 3:2, 2:3) and click Generate. Results appear in your feed.
Models
Over 20 models available: Gemini Imagen, Flux Pro, Flux Dev, Flux Schnell, DALL-E 3, GPT Image, Stable Diffusion XL, Ideogram, Recraft, Midjourney, and more. Each model has different strengths - photorealism, illustration, text rendering, speed, etc. The model dropdown shows which features each model supports.
Reference Images
Drag and drop an image directly onto the prompt area, paste from clipboard (Cmd+V / Ctrl+V), or click the attachment button to upload. You can attach up to 5 reference images to guide style, composition, or content. The AI uses your references as visual guides - matching colors, style, composition, or specific elements depending on the model.
Batch Generation
Use the batch selector (1x, 2x, 4x, 8x) to generate multiple variations from the same prompt simultaneously. Each variation uses the same settings but produces a unique interpretation. This is ideal for exploring different directions quickly before committing to a style.
Variations
Click the variation button on any generated image to create alternative versions. The AI uses your original image as a starting point and generates new interpretations while maintaining the core composition and style. This is different from regenerating - variations are intentionally similar to the source.
Camera Control
For supported models, the camera style dropdown lets you specify shot types: close-up, medium shot, wide shot, extreme wide, aerial/drone, low angle, high angle, over-the-shoulder, POV, Dutch angle, and more. This gives you cinematic framing without needing to describe it in your prompt text.
AI Prompt Enhancement
Toggle the enhance button (sparkle icon) to let AI rewrite your prompt before generation. Enhancement adds photographic detail, composition guidance, lighting descriptions, and model-specific optimizations automatically. The enhanced prompt is shown so you can learn what works well with each model.
Image Editor
AI-powered image editing with instruction editing, inpainting, object erasing, background removal, 3D camera control, cropping, and upscaling.
Instruction editing: Describe changes in natural language. The AI applies edits across the image without masking.
Inpainting: Paint a mask over the target area, then describe what should replace it. Adjust brush size with [ and ] keys.
Object erasing: Paint over any object to remove it. The AI fills the background naturally.
Background removal: One-click extraction to transparent background.
3D camera: Rotate and reposition the virtual camera to adjust perspective from a single image.
Multiple AI models available per editing mode. Version history preserves originals.
Shortcuts
B: Brush tool
E: Eraser
I: Inpaint
Cmd+Z: Undo
Cmd+Shift+Z: Redo
[ / ]: Decrease / Increase brush size
Cmd+S: Save
Space+Drag: Pan canvas
Cmd++ / Cmd+-: Zoom in / out
Editing Tools
Instruction-based editing: describe what you want changed
Inpainting with a brush mask and a prompt
Object erasing via painted mask selection
Background removal
3D camera movement and rotation control
Crop with preset ratios or custom dimensions
AI upscaling to enhance resolution
Brush and Mask
Adjustable brush size and softness
Add and subtract mask modes
One-click mask clear
Reference image support: drag, paste, or upload up to 5
Workflow
Multiple AI models available per editing mode
Resolution selection for supported models
Version history to preserve your originals
Photoshop and Photopea bridge integration
Instruction Editing
Describe what you want changed in natural language - no masking required. The AI interprets your instructions and applies edits across the entire image. Examples: "Make it nighttime", "Add snow on the mountains", "Change the color of the car to red".
Inpainting
Paint a mask over the area you want to change, then describe what should appear there. Adjust brush size with [ and ] keys. Use add/subtract mask modes for complex selections. The AI generates content that blends seamlessly with the surrounding area.
Object Erasing
Paint over any object to remove it. The AI fills in the background naturally, maintaining scene coherence. Works best with objects that have clear boundaries against their background.
Background Removal
One-click background removal extracts the subject cleanly onto a transparent background. Useful for compositing, product photography, and creating assets for other projects.
3D Camera Control
Rotate and move the virtual camera in 3D space to adjust perspective. This creates parallax and depth effects from a single 2D image. Control pitch, yaw, and distance to reframe your shot without regenerating.
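The pitch/yaw/distance controls can be understood as positioning a camera on a sphere around the subject. This spherical-coordinate sketch is an illustration of the concept, not the editor's actual implementation:

```python
import math

def camera_position(pitch_deg, yaw_deg, distance):
    """Position of a virtual camera orbiting the origin, given pitch
    and yaw in degrees plus a distance. Yaw swings the camera around
    the vertical axis; pitch tilts it above or below the horizon."""
    p, y = math.radians(pitch_deg), math.radians(yaw_deg)
    return (distance * math.cos(p) * math.sin(y),
            distance * math.sin(p),
            distance * math.cos(p) * math.cos(y))
```

At zero pitch and yaw the camera sits directly in front of the subject at the given distance; raising pitch to 90 degrees moves it overhead.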
Cropping and Resizing
Crop with preset aspect ratios or custom dimensions. The crop tool supports standard ratios (1:1, 16:9, 4:3, etc.) and freeform cropping.
Upscaling
Enhance resolution using AI upscaling models. Increase image dimensions while preserving and enhancing detail. Multiple upscale models are available.
Generate Video
Generate video clips with text-to-video, image-to-video, video-to-video, and video extension.
Models: Sora, Veo 3.1, Kling, Runway, Seedance, Hailuo, Minimax, Luma, Pika, and more. Each model shows supported modes and duration options.
Image-to-video: Use any image as the first frame. Drag onto the prompt area, paste, or click the start frame slot. Some models support end frame for shot transitions.
Duration: 5s or 10s depending on the model. After generation, adjust speed non-destructively.
Post-generation: Extend clips, upscale to 4K with Topaz, add lip sync, adjust speed, or send to the Story editor.
Tips
Image-to-video produces more consistent results than text-to-video for specific scenes.
Extend works best when the last frame has clear motion direction.
Use Topaz upscaling for final delivery - dramatic quality improvement.
Generation Modes
Text to video from a prompt
Image to video with start frame control
Video to video on supported models
End frame support for precise shot transitions
Audio input on supported models
Models and Control
Leading models: Sora, Veo 3.1, Kling, Runway, Seedance, Hailuo, and more
Duration selection per model
Aspect ratio control (16:9, 9:16, 1:1, 4:3)
AI prompt enhancement
Workflow
Batch generation support
Regenerate and track versions
Prefill from other tools like the Storyboard
Text to Video
Describe the scene you want in your prompt. The AI generates a video clip matching your description. Most models produce 5-10 second clips at 720p-1080p. Use AI prompt enhancement for better cinematic results.
Image to Video
Use any generated or uploaded image as the first frame. Drag an image onto the prompt area, paste from clipboard, or click the start frame slot. Some models also support an end frame for precise shot transitions - the AI interpolates between your two frames.
Models
Leading video models: Sora, Veo 3.1, Kling 2.0, Runway Gen-4, Seedance, Hailuo Minimax, Luma Dream Machine, Pika, and more. Each model has different strengths in motion quality, prompt adherence, and style.
Extending Clips
Extend any generated video to add more frames. The AI continues the motion and scene from the last frame. Works best when the final frame has clear motion direction. You can extend multiple times to build longer sequences.
Upscaling to 4K
Upscale any video to 4K resolution using Topaz AI. The upscaler dramatically enhances detail and sharpness, and supports the 10-bit HEVC (Main10) profile for HDR workflows. A 2K intermediate upscale is also available.
Duration and Speed
Choose clip length per model - typically 5s or 10s. After generation, adjust playback speed with slow-motion or speed-up controls. Speed changes are non-destructive and can be applied in the Story editor.
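Because speed changes are non-destructive, the rendered runtime is simply the source duration divided by the speed factor:

```python
def playback_duration(source_seconds, speed):
    """Runtime of a clip after a non-destructive speed change:
    2x speed halves the runtime, 0.5x slow motion doubles it."""
    return source_seconds / speed
```

A 10-second clip played at 0.5x slow motion, for instance, occupies 20 seconds on the timeline.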
Generate Audio
Generate speech, music, sound effects, and clone voices from one interface.
Speech: Natural-sounding TTS with dozens of preset voices. Control emotion, pacing, and style.
Music: Describe mood, genre, and instrumentation. Generate tracks in any length for film, ads, or social.
Sound effects: Create foley SFX from text descriptions. Ambient, impacts, transitions, environmental.
Voice cloning: Upload a short sample and generate new speech in that voice. Providers include ElevenLabs, Hume Octave, Fish Audio.
Processing: Mix tracks, separate into stems (vocals, drums, bass), transcribe with speaker diarization.
Audio Types
Speech synthesis with custom voice selection
Sound effects generation
Music generation
Voice cloning and changer
Models and Styles
Multiple providers including ElevenLabs, Hume Octave, and more
AI prompt enhancement for style and emotion
Waveform visualization for quick previewing
Workflow
Batch generation
Regenerate and version history
Download audio directly
Speech Synthesis
Generate natural-sounding speech with custom voice selection. Choose from dozens of preset voices or clone your own. Control emotion, pacing, and style per voice provider.
Music Generation
Describe the mood, genre, and instrumentation you want. Generate background tracks for films, ads, or social content in any duration. Supports multiple music generation models.
Sound Effects
Create foley SFX from text descriptions. Great for ambient sounds, impacts, transitions, and environmental audio for video projects. Can also generate SFX from video input on supported models.
Voice Cloning
Upload a short audio sample and the AI learns the voice. Then generate new speech in that voice with any text. Multiple providers available including ElevenLabs, Hume Octave, and Fish Audio.
Stem Separation
Split any audio track into individual stems: vocals, drums, bass, and other instruments. Useful for remixing, isolating dialogue, or creating karaoke versions.
Transcription
Convert speech to text with speaker diarization (identifying who said what). Supports multiple languages and outputs timestamped transcripts.
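A diarized, timestamped transcript is naturally a list of speaker-labelled segments. The segment fields below are illustrative, not Sequencer's output schema; the sketch shows one common post-processing step, collapsing consecutive segments into speaker turns:

```python
def to_turns(segments):
    """Collapse timestamped, speaker-labelled segments into turns:
    consecutive segments from the same speaker are merged, keeping
    the first start time and the last end time."""
    turns = []
    for seg in segments:
        if turns and turns[-1]["speaker"] == seg["speaker"]:
            turns[-1]["text"] += " " + seg["text"]
            turns[-1]["end"] = seg["end"]
        else:
            turns.append(dict(seg))
    return turns
```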
Canvas
Infinite zoomable workspace for brainstorming, moodboarding, and planning. Arrange images, videos, audio, sticky notes, shapes, and text labels on a freeform surface.
Elements: Sticky notes (8 colors), text labels, media embeds, shapes (rectangle, circle, diamond, star, arrow), and connector arrows.
AI generation: Generate images directly on canvas at your viewport position.
Collaboration: Real-time editing with live cursors. Group elements with frames.
Export: Save as PDF or image.
Shortcuts
T: Text tool
R: Shape tool (rectangle)
L: Connector (line) tool
N: Sticky note tool
Delete / Backspace: Delete selected elements
Escape: Deselect all / cancel active tool
Cmd+G: Group selected elements
Cmd+Shift+G: Ungroup selected elements
Right-click drag: Pan canvas
Scroll: Pan (trackpad) or Zoom (mouse wheel / Ctrl+scroll)
Elements
Sticky notes with 8 color presets
Text labels with customizable font and weight
Images, video, and audio embeds
Shapes including rectangle, circle, diamond, star, and arrow
Connector arrows between any elements
AI and Collaboration
Generate images directly on the canvas
Real-time collaboration with live cursors
Group elements together with frames
Canvas Tools
Infinite zoomable surface
Drag, resize, and rotate any element
Export as PDF or image
Elements
Place sticky notes (8 color presets: yellow, pink, blue, green, orange, purple, red, gray), text labels with customizable fonts and weights, images/video/audio embeds, and shapes (rectangle, circle, diamond, star, arrow). Each element can be repositioned, resized, and styled independently.
Tool Shortcuts
Press T for the text tool, R for shape (rectangle), L for connector (line) tool, and N for sticky note. Press Escape to cancel the active tool or deselect all elements. Click on the canvas to place the element. For shapes, drag to set the size.
Connectors
Draw arrows between any two elements by activating the connector tool (L key) and dragging from one element to another. Connectors follow the elements when they are moved, creating flow charts and relationship diagrams.
Drag and Drop
Drag images, videos, or audio files directly from your file system onto the canvas. You can also paste media from the clipboard (Cmd+V). Dropped files are uploaded and placed at the drop location. Drag media from the FileBrowser sidebar to place them.
Grouping
Select multiple elements, then press Cmd+G to group them into a frame. Grouped elements move together. Press Cmd+Shift+G to ungroup. Right-click for additional context menu options.
Export
Export your canvas as a PDF or image for presentations and sharing. The export captures all visible elements at high resolution.
Storyboard & Video Editor
Organize shots into scenes, tag characters for consistency, and generate storyboards with AI. Includes a full video timeline editor.
Story structure: Scenes contain shots. Each shot has its own prompt, model, and generated media. Character library with @mention tagging for visual consistency.
Timeline: Multi-shot editor with trim sliders, per-shot speed control, and real-time preview. Apply AI tools directly: upscale, lip sync, voice changer, SFX.
Audio: Up to 5 concurrent tracks. Generate, browse, or upload.
Export: MP4 with optional 4K Topaz upscaling, CapCut project files, or Premiere Pro XML.
Story Structure
Organize shots into ordered scenes
Character library with @mention tagging for consistency
Location tagging per scene
Scene reference images, auto-generated or uploaded
Generation
Per-shot image generation with AI models
Per-shot video generation with start frame control
Lip sync generation for speech-driven shots
Auto-generate toggle for batch creation
Timeline Editing
Multi-shot timeline with trim sliders
Per-shot playback speed control
Shot navigation with previous and next controls
Real-time preview with play and pause
AI Tools
Upscale video to higher resolution
Lip sync using an attached audio file
Generate audio or SFX directly from the editor
Voice changer powered by ElevenLabs, Hume, and Fish Audio
Run any saved workflow directly on your video
Audio and Export
Up to 5 concurrent audio tracks
Add to or replace existing audio
Export to MP4 with optional 4K upscaling
Export to CapCut or Premiere Pro
Scenes and Shots
Projects are organized into scenes, and scenes contain individual shots. Each shot has its own prompt, model selection, and generated media. Reorder shots by dragging, add new scenes to organize narrative beats, and collapse scenes to focus on specific parts of your project.
Character Consistency
Define characters in the character library with a name, description, and reference images. Use @mention tagging in shot prompts to reference characters - the AI maintains visual consistency across all shots that reference the same character. Add location tags for environmental consistency.
Per-Shot Generation
Generate images or videos for each shot individually. For video shots, use start frame control to ensure smooth transitions between shots. Toggle auto-generate to batch create media across all shots in a scene automatically.
Timeline Editor
The timeline shows all shots sequentially with trim sliders for each. Adjust per-shot speed, navigate between shots with previous/next controls, and preview the full sequence with play/pause. The timeline updates in real-time as you generate new media.
Audio Tracks
Add up to 5 concurrent audio tracks. Browse your audio library, generate music/SFX/speech inline, or upload audio files. Each track can be positioned independently on the timeline. Add to or replace existing audio per shot.
AI Tools
Apply AI enhancements directly from the editor: upscale video to higher resolution, add lip sync from an audio file, change voices with ElevenLabs/Hume/Fish Audio, generate SFX, or run any saved workflow on selected clips.
Export Formats
Export your completed project as high-quality MP4 with optional 4K Topaz upscaling, CapCut project files for further editing, or Premiere Pro XML for professional post-production workflows.
Node Editor
Visual pipeline builder. Connect nodes in a graph editor to chain 65+ node types and 100+ AI models into reusable workflows.
Building: Drag nodes from the palette or press Tab for Quick Add. Connect outputs to inputs. Configure parameters per node. Run to test.
Publishing: Save any workflow as an API endpoint. Enable it, copy the Endpoint ID, call via REST. This powers all Sequencer plugins.
Node categories: Image, Video, Audio, 3D, Text/LLM, Logic, I/O, Effects. Plus 19 video transition types and sticky notes for documentation.
Shortcuts
Tab: Open Quick Add node menu at cursor position
F: Fit all nodes in view (animated zoom to fit)
Delete / Backspace: Remove selected nodes
Cmd+C: Copy selected nodes
Cmd+V: Paste nodes at cursor position
Cmd+D: Duplicate selected nodes
Shift+Click: Multi-select nodes
Scroll: Pan (trackpad) or Zoom (mouse wheel / Ctrl+scroll)
Node Types
65+ node types across generation, processing, and logic categories
Generate image, video, audio, 3D models, and text via LLM
Video: trim, split, stitch, adjust speed, upscale, extend, and composite
Audio: extract, separate stems, mix tracks, and replace audio
Image: inpaint, erase, mask, color grade, composite, and convert
Models and Logic
100+ AI models available as nodes
Logic nodes for math, comparisons, array packing, and more
HTTP request node for calling external APIs
LLM text generation with structured output support
Automation and Sharing
Save any workflow as a reusable API endpoint
Publish and share workflows publicly
19 video transition types including morph, fade, and dissolve
Sticky notes to document your workflow
Adding Nodes
Press Tab anywhere on the canvas to open the Quick Add menu at your cursor position. The menu shows all available node types grouped by category (Image, Video, Audio, 3D, Text, Logic, I/O, Effects). Search by typing to filter. Press Enter to add the selected node. You can also right-click the canvas to open the full node palette.
Connecting Nodes
Drag from an output socket (right side of a node) to an input socket (left side of another node) to create a connection. Sockets are color-coded by type: string, number, boolean, image, video, audio, 3D, etc. Only compatible types can be connected. Connections are drawn as curved lines showing data flow direction.
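The type-checking rule can be sketched as a simple compatibility test. The socket types come from the docs above, but the exact-match rule is an assumption; real graph editors often also allow widening coercions such as number to string:

```python
# Socket types listed in the docs; exact-match compatibility is assumed.
SOCKET_TYPES = {"string", "number", "boolean", "image", "video", "audio", "3d"}

def can_connect(output_type, input_type):
    """A connection is valid only between sockets of compatible types."""
    if output_type not in SOCKET_TYPES or input_type not in SOCKET_TYPES:
        raise ValueError("unknown socket type")
    return output_type == input_type
```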
Quick Add from Output
Drag from an output socket into empty space to open the Quick Add menu, pre-filtered to show only nodes with compatible input sockets. Select a node and it will be created and automatically connected to the output you dragged from. This is the fastest way to build a chain of nodes.
Node Type Reference
Image nodes: Generate, Inpaint, Erase, Variations, Variation Grid, Depth Estimation, ControlNet, Composition, Image Filter, Convert Image, Paint Mask.
Video nodes: Generate, Extend, Edit, Stitch, Split, Speed, Merge, Upscale, Timeline Composite, Timeline Editor, Lip Sync, Media Pad, Convert Video.
Audio nodes: Generate SFX, Extract Audio, Separate Audio, Mix Audio, Clone Voice, Generate Voice, Transcribe, Split Timestamps.
3D nodes: Generate 3D, Render 3D.
Text nodes: LLM Text (GPT/Claude/Gemini), String Join, String Replace, Separate Text.
Logic nodes: Math, Compare, Select Index, Pack Array, Pluck Array, Map Entities.
I/O nodes: Constants (Text, Number, Bool, Media), Webhook Start, Return Data, HTTP Request, File Download, Reroute.
Effects nodes: Color Grade, Transform, Media Pad.
Utility nodes: Sticky Note, Media Viewer, Raw Output, API Test.
Node Groups
Select multiple nodes and group them together with a labeled colored frame. Groups help organize complex workflows visually. Each group has a customizable label and color. Nodes within a group can still be individually selected and configured.
Running and Debugging
Click Run to execute the workflow. Nodes execute in dependency order - upstream nodes complete before downstream nodes start. Each node shows its status: idle, pending, running, completed, or error. Errors display inline with the failed node. The debug panel shows execution details and timing. A watchdog automatically detects stuck nodes and marks them as timed out.
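Dependency-ordered execution is a topological sort of the node graph. Python's standard graphlib illustrates the scheduling rule (this is a sketch of the concept, not Sequencer's actual runner):

```python
from graphlib import TopologicalSorter

def run_order(edges):
    """Compute an execution order in which every upstream node
    completes before its downstream consumers start. `edges` maps
    each node to the set of nodes it depends on."""
    return list(TopologicalSorter(edges).static_order())
```

For a graph where a composite node consumes a generate node and an upscale node, and the upscale node itself consumes the generate node, `run_order` always places generate before upscale before composite.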
Publishing as API
Click Publish to save your workflow as a reusable API endpoint. Enable the endpoint, copy the Endpoint ID, and call it via REST API from any application. Published workflows can be shared publicly in the gallery. Viewers can clone published workflows into their own account if the creator allows it.
Batch Execution
Nodes that support batch execution can process multiple inputs in parallel. The batch status shows total items, completed count, and individual results. Failed items within a batch do not block other items from completing.
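The batch semantics above, where a failed item records an error instead of blocking the rest, can be sketched with a thread pool. A hypothetical illustration, not the platform's executor:

```python
from concurrent.futures import ThreadPoolExecutor

def run_batch(items, worker):
    """Process items in parallel. Each item reports completed/error
    independently, so one failure never blocks the other items."""
    def safe(item):
        try:
            return {"item": item, "status": "completed", "result": worker(item)}
        except Exception as exc:
            return {"item": item, "status": "error", "error": str(exc)}
    with ThreadPoolExecutor() as pool:
        return list(pool.map(safe, items))
```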
Run Workflow (API)
POST /run/{endpointId}

{
  "prompt": "A cinematic shot...",
  "style": "photorealistic"
}

// Response
{
  "runId": "abc123",
  "status": "accepted",
  "pollUrl": "https://api.sequencer.media/status/{endpointId}/abc123"
}
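The run call can be wrapped in a small start-and-poll client. In this sketch the HTTP helpers are injected (e.g. thin wrappers around requests.post/requests.get) so the flow can be exercised without a live key; the Bearer Authorization header and the terminal status values are assumptions, since only the endpoint shape above is documented:

```python
import time

BASE_URL = "https://api.sequencer.media"

def run_workflow(endpoint_id, payload, api_key, post, get, poll_interval=2.0):
    """Start a published workflow run, then poll its pollUrl until the
    status leaves the in-flight states. `post(url, json, headers)` and
    `get(url, headers)` must return the decoded JSON body.
    NOTE: Bearer auth is an assumption, not confirmed by the docs."""
    headers = {"Authorization": f"Bearer {api_key}"}
    run = post(f"{BASE_URL}/run/{endpoint_id}", payload, headers)
    while True:
        status = get(run["pollUrl"], headers)
        if status["status"] not in ("accepted", "running"):
            return status
        time.sleep(poll_interval)
```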