Introduction
Sequencer is a creative platform for AI filmmaking. Generate images, video, audio, and 3D assets from 100+ AI models. Edit, compose, and export from one interface.
This documentation covers every tool in the platform and the complete REST API for building integrations.
Base URL
https://api.sequencer.media
Credits & Billing
Sequencer uses pay-per-use credits. Each operation costs a set amount based on the model and output type. Your balance is shown in the top-right of the dashboard.
Credits are consumed on successful completion only. Failed generations are not charged. Top up anytime from the Billing page.
Team sponsors can cover credits for members. If sponsor funds run out, the member's personal balance is used as fallback.
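The sponsor-then-personal fallback can be pictured as a two-bucket charge. This is an illustrative sketch only; the actual billing logic and field names are internal to Sequencer:

```python
def charge_credits(cost, sponsor_balance, personal_balance):
    """Charge `cost` credits, drawing from the sponsor first and
    falling back to the member's personal balance for the remainder.
    Raises if the combined balances cannot cover the cost (mirroring
    the rule that failed operations are never charged)."""
    if sponsor_balance + personal_balance < cost:
        raise ValueError("insufficient credits")
    from_sponsor = min(cost, sponsor_balance)
    from_personal = cost - from_sponsor
    return sponsor_balance - from_sponsor, personal_balance - from_personal
```

For example, a 10-credit generation against a sponsor balance of 6 and a personal balance of 8 drains the sponsor to 0 and leaves 4 personal credits.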
Generate Images
Generate images from text prompts using 20+ AI models. Select a model, set your aspect ratio, and click Generate.
Models: Gemini, Flux Pro/Dev/Schnell, DALL-E 3, GPT Image, SDXL, Ideogram, Recraft, Midjourney, and more. Each model shows supported features in the dropdown.
Reference images: Drag and drop onto the prompt area, paste from clipboard (Cmd+V), or click the attachment button. Up to 5 references supported.
Batch generation: Use the 1x/2x/4x/8x selector to generate multiple variations simultaneously.
Variations: Click the variation button on any result to generate alternatives that maintain the original composition.
Camera control: For supported models, specify shot types (close-up, wide, aerial, low angle, etc.) from the camera dropdown.
Prompt enhancement: Toggle the enhance button to let AI rewrite your prompt with detail, lighting, and model-specific optimizations.
Tips
Drag an image directly into the prompt area to use it as a style reference.
Use batch generation (4x or 8x) to explore different interpretations quickly.
Camera control works best with Flux and Ideogram models.
Paste images from clipboard with Cmd+V - no need to save files first.
Generation Modes
Text to image from any prompt
Image to image using a reference
Drag, paste, or upload reference images
Models and Quality
20+ models including Gemini, Flux, DALL-E, SDXL, and more
Aspect ratio selection (1:1, 16:9, 9:16, 4:3)
Resolution control for supported models
Camera style control for applicable models
Workflow
AI prompt enhancement to improve your descriptions
Batch generation of 1, 2, 4, or 8 images at once
Regenerate or create variations from any output
Reuse prompts across generations
Version history to track every iteration
Export and Integrations
Download images directly
Open in Photoshop or Photopea for further editing
Send directly to the Image Editor for AI refinement
How to Generate
Write a text prompt describing what you want to create. Select a model from the dropdown at the top of the prompt panel - each model displays its name, provider, and supported features. Choose an aspect ratio (1:1, 16:9, 9:16, 4:3, 3:4, 3:2, 2:3) and click Generate. Results appear in your feed.
Models
Over 20 models available: Gemini Imagen, Flux Pro, Flux Dev, Flux Schnell, DALL-E 3, GPT Image, Stable Diffusion XL, Ideogram, Recraft, Midjourney, and more. Each model has different strengths - photorealism, illustration, text rendering, speed, etc. The model dropdown shows which features each model supports.
Reference Images
Drag and drop an image directly onto the prompt area, paste from clipboard (Cmd+V / Ctrl+V), or click the attachment button to upload. You can attach up to 5 reference images to guide style, composition, or content. The AI uses your references as visual guides - matching colors, style, composition, or specific elements depending on the model.
Batch Generation
Use the batch selector (1x, 2x, 4x, 8x) to generate multiple variations from the same prompt simultaneously. Each variation uses the same settings but produces a unique interpretation. This is ideal for exploring different directions quickly before committing to a style.
Variations
Click the variation button on any generated image to create alternative versions. The AI uses your original image as a starting point and generates new interpretations while maintaining the core composition and style. This is different from regenerating - variations are intentionally similar to the source.
Camera Control
For supported models, the camera style dropdown lets you specify shot types: close-up, medium shot, wide shot, extreme wide, aerial/drone, low angle, high angle, over-the-shoulder, POV, Dutch angle, and more. This gives you cinematic framing without needing to describe it in your prompt text.
AI Prompt Enhancement
Toggle the enhance button (sparkle icon) to let AI rewrite your prompt before generation. Enhancement adds photographic detail, composition guidance, lighting descriptions, and model-specific optimizations automatically. The enhanced prompt is shown so you can learn what works well with each model.
Image Editor
AI-powered image editing with instruction editing, inpainting, object erasing, background removal, 3D camera control, cropping, and upscaling.
Instruction editing: Describe changes in natural language. The AI applies edits across the image without masking.
Inpainting: Paint a mask over the target area, then describe what should replace it. Adjust brush size with [ and ] keys.
Object erasing: Paint over any object to remove it. The AI fills the background naturally.
Background removal: One-click extraction to transparent background.
3D camera: Rotate and reposition the virtual camera to adjust perspective from a single image.
Multiple AI models available per editing mode. Version history preserves originals.
Shortcuts
B: Brush tool
E: Eraser
I: Inpaint
Cmd+Z: Undo
Cmd+Shift+Z: Redo
[ / ]: Decrease / Increase brush size
Cmd+S: Save
Space+Drag: Pan canvas
Cmd++ / Cmd+-: Zoom in / out
Editing Tools
Instruction-based editing: describe what you want changed
Inpainting with a brush mask and a prompt
Object erasing via painted mask selection
Background removal
3D camera movement and rotation control
Crop with preset ratios or custom dimensions
AI upscaling to enhance resolution
Brush and Mask
Adjustable brush size and softness
Add and subtract mask modes
One-click mask clear
Reference image support: drag, paste, or upload up to 5
Workflow
Multiple AI models available per editing mode
Resolution selection for supported models
Version history to preserve your originals
Photoshop and Photopea bridge integration
Instruction Editing
Describe what you want changed in natural language - no masking required. The AI interprets your instructions and applies edits across the entire image. Examples: "Make it nighttime", "Add snow on the mountains", "Change the color of the car to red".
Inpainting
Paint a mask over the area you want to change, then describe what should appear there. Adjust brush size with [ and ] keys. Use add/subtract mask modes for complex selections. The AI generates content that blends seamlessly with the surrounding area.
Object Erasing
Paint over any object to remove it. The AI fills in the background naturally, maintaining scene coherence. Works best with objects that have clear boundaries against their background.
Background Removal
One-click background removal extracts the subject cleanly onto a transparent background. Useful for compositing, product photography, and creating assets for other projects.
3D Camera Control
Rotate and move the virtual camera in 3D space to adjust perspective. This creates parallax and depth effects from a single 2D image. Control pitch, yaw, and distance to reframe your shot without regenerating.
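The pitch/yaw/distance controls can be understood as positioning a camera on a sphere around the subject. This spherical-coordinate sketch is an illustration of the concept, not the editor's actual implementation:

```python
import math

def camera_position(pitch_deg, yaw_deg, distance):
    """Position of a virtual camera orbiting the origin, given pitch
    and yaw in degrees plus a distance. Yaw swings the camera around
    the vertical axis; pitch tilts it above or below the horizon."""
    p, y = math.radians(pitch_deg), math.radians(yaw_deg)
    return (distance * math.cos(p) * math.sin(y),
            distance * math.sin(p),
            distance * math.cos(p) * math.cos(y))
```

At zero pitch and yaw the camera sits directly in front of the subject at the given distance; raising pitch to 90 degrees moves it overhead.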
Cropping and Resizing
Crop with preset aspect ratios or custom dimensions. The crop tool supports standard ratios (1:1, 16:9, 4:3, etc.) and freeform cropping.
Upscaling
Enhance resolution using AI upscaling models. Increase image dimensions while preserving and enhancing detail. Multiple upscale models are available.
Generate Video
Generate video clips with text-to-video, image-to-video, video-to-video, and video extension.
Models: Sora, Veo 3.1, Kling, Runway, Seedance, Hailuo, Minimax, Luma, Pika, and more. Each model shows supported modes and duration options.
Image-to-video: Use any image as the first frame. Drag onto the prompt area, paste, or click the start frame slot. Some models support end frame for shot transitions.
Duration: 5s or 10s depending on the model. After generation, adjust speed non-destructively.
Post-generation: Extend clips, upscale to 4K with Topaz, add lip sync, adjust speed, or send to the Story editor.
Tips
Image-to-video produces more consistent results than text-to-video for specific scenes.
Extend works best when the last frame has clear motion direction.
Use Topaz upscaling for final delivery - dramatic quality improvement.
Generation Modes
Text to video from a prompt
Image to video with start frame control
Video to video on supported models
End frame support for precise shot transitions
Audio input on supported models
Models and Control
Leading models: Sora, Veo 3.1, Kling, Runway, Seedance, Hailuo, and more
Duration selection per model
Aspect ratio control (16:9, 9:16, 1:1, 4:3)
AI prompt enhancement
Workflow
Batch generation support
Regenerate and track versions
Prefill from other tools like the Storyboard
Text to Video
Describe the scene you want in your prompt. The AI generates a video clip matching your description. Most models produce 5-10 second clips at 720p-1080p. Use AI prompt enhancement for better cinematic results.
Image to Video
Use any generated or uploaded image as the first frame. Drag an image onto the prompt area, paste from clipboard, or click the start frame slot. Some models also support an end frame for precise shot transitions - the AI interpolates between your two frames.
Models
Leading video models: Sora, Veo 3.1, Kling 2.0, Runway Gen-4, Seedance, Hailuo Minimax, Luma Dream Machine, Pika, and more. Each model has different strengths in motion quality, prompt adherence, and style.
Extending Clips
Extend any generated video to add more frames. The AI continues the motion and scene from the last frame. Works best when the final frame has clear motion direction. You can extend multiple times to build longer sequences.
Upscaling to 4K
Upscale any video to 4K resolution using Topaz AI. The upscaler dramatically enhances detail and sharpness, and supports the 10-bit HEVC (Main10) profile for HDR workflows. A 2K intermediate upscale is also available.
Duration and Speed
Choose clip length per model - typically 5s or 10s. After generation, adjust playback speed with slow-motion or speed-up controls. Speed changes are non-destructive and can be applied in the Story editor.
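Because speed changes are non-destructive, the rendered runtime is simply the source duration divided by the speed factor:

```python
def playback_duration(source_seconds, speed):
    """Runtime of a clip after a non-destructive speed change:
    2x speed halves the runtime, 0.5x slow motion doubles it."""
    return source_seconds / speed
```

A 10-second clip played at 0.5x slow motion, for instance, occupies 20 seconds on the timeline.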
Generate Audio
Generate speech, music, sound effects, and clone voices from one interface.
Speech: Natural-sounding TTS with dozens of preset voices. Control emotion, pacing, and style.
Music: Describe mood, genre, and instrumentation. Generate tracks in any length for film, ads, or social.
Sound effects: Create foley SFX from text descriptions. Ambient, impacts, transitions, environmental.
Voice cloning: Upload a short sample and generate new speech in that voice. Providers include ElevenLabs, Hume Octave, Fish Audio.
Processing: Mix tracks, separate into stems (vocals, drums, bass), transcribe with speaker diarization.
Audio Types
Speech synthesis with custom voice selection
Sound effects generation
Music generation
Voice cloning and changer
Models and Styles
Multiple providers including ElevenLabs, Hume Octave, and more
AI prompt enhancement for style and emotion
Waveform visualization for quick previewing
Workflow
Batch generation
Regenerate and version history
Download audio directly
Speech Synthesis
Generate natural-sounding speech with custom voice selection. Choose from dozens of preset voices or clone your own. Control emotion, pacing, and style per voice provider.
Music Generation
Describe the mood, genre, and instrumentation you want. Generate background tracks for films, ads, or social content in any duration. Supports multiple music generation models.
Sound Effects
Create foley SFX from text descriptions. Great for ambient sounds, impacts, transitions, and environmental audio for video projects. Can also generate SFX from video input on supported models.
Voice Cloning
Upload a short audio sample and the AI learns the voice. Then generate new speech in that voice with any text. Multiple providers available including ElevenLabs, Hume Octave, and Fish Audio.
Stem Separation
Split any audio track into individual stems: vocals, drums, bass, and other instruments. Useful for remixing, isolating dialogue, or creating karaoke versions.
Transcription
Convert speech to text with speaker diarization (identifying who said what). Supports multiple languages and outputs timestamped transcripts.
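A diarized, timestamped transcript is naturally a list of speaker-labelled segments. The segment fields below are illustrative, not Sequencer's output schema; the sketch shows one common post-processing step, collapsing consecutive segments into speaker turns:

```python
def to_turns(segments):
    """Collapse timestamped, speaker-labelled segments into turns:
    consecutive segments from the same speaker are merged, keeping
    the first start time and the last end time."""
    turns = []
    for seg in segments:
        if turns and turns[-1]["speaker"] == seg["speaker"]:
            turns[-1]["text"] += " " + seg["text"]
            turns[-1]["end"] = seg["end"]
        else:
            turns.append(dict(seg))
    return turns
```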
Canvas
Infinite zoomable workspace for brainstorming, moodboarding, and planning. Arrange images, videos, audio, sticky notes, shapes, and text labels on a freeform surface.
Elements: Sticky notes (8 colors), text labels, media embeds, shapes (rectangle, circle, diamond, star, arrow), and connector arrows.
AI generation: Generate images directly on canvas at your viewport position.
Collaboration: Real-time editing with live cursors. Group elements with frames.
Export: Save as PDF or image.
Shortcuts
T: Text tool
R: Shape tool (rectangle)
L: Connector (line) tool
N: Sticky note tool
Delete / Backspace: Delete selected elements
Escape: Deselect all / cancel active tool
Cmd+G: Group selected elements
Cmd+Shift+G: Ungroup selected elements
Right-click drag: Pan canvas
Scroll: Pan (trackpad) or Zoom (mouse wheel / Ctrl+scroll)
Elements
Sticky notes with 8 color presets
Text labels with customizable font and weight
Images, video, and audio embeds
Shapes including rectangle, circle, diamond, star, and arrow
Connector arrows between any elements
AI and Collaboration
Generate images directly on the canvas
Real-time collaboration with live cursors
Group elements together with frames
Canvas Tools
Infinite zoomable surface
Drag, resize, and rotate any element
Export as PDF or image
Elements
Place sticky notes (8 color presets: yellow, pink, blue, green, orange, purple, red, gray), text labels with customizable fonts and weights, images/video/audio embeds, and shapes (rectangle, circle, diamond, star, arrow). Each element can be repositioned, resized, and styled independently.
Tool Shortcuts
Press T for the text tool, R for shape (rectangle), L for connector (line) tool, and N for sticky note. Press Escape to cancel the active tool or deselect all elements. Click on the canvas to place the element. For shapes, drag to set the size.
Connectors
Draw arrows between any two elements by activating the connector tool (L key) and dragging from one element to another. Connectors follow the elements when they are moved, creating flow charts and relationship diagrams.
Drag and Drop
Drag images, videos, or audio files directly from your file system onto the canvas. You can also paste media from the clipboard (Cmd+V). Dropped files are uploaded and placed at the drop location. Drag media from the FileBrowser sidebar to place them.
Grouping
Select multiple elements, then press Cmd+G to group them into a frame. Grouped elements move together. Press Cmd+Shift+G to ungroup. Right-click for additional context menu options.
Export
Export your canvas as a PDF or image for presentations and sharing. The export captures all visible elements at high resolution.
Storyboard & Video Editor
Organize shots into scenes, tag characters for consistency, and generate storyboards with AI. Includes a full video timeline editor.
Story structure: Scenes contain shots. Each shot has its own prompt, model, and generated media. Character library with @mention tagging for visual consistency.
Timeline: Multi-shot editor with trim sliders, per-shot speed control, and real-time preview. Apply AI tools directly: upscale, lip sync, voice changer, SFX.
Audio: Up to 5 concurrent tracks. Generate, browse, or upload.
Export: MP4 with optional 4K Topaz upscaling, CapCut project files, or Premiere Pro XML.
Story Structure
Organize shots into ordered scenes
Character library with @mention tagging for consistency
Location tagging per scene
Scene reference images, auto-generated or uploaded
Generation
Per-shot image generation with AI models
Per-shot video generation with start frame control
Lip sync generation for speech-driven shots
Auto-generate toggle for batch creation
Timeline Editing
Multi-shot timeline with trim sliders
Per-shot playback speed control
Shot navigation with previous and next controls
Real-time preview with play and pause
AI Tools
Upscale video to higher resolution
Lip sync using an attached audio file
Generate audio or SFX directly from the editor
Voice changer powered by ElevenLabs, Hume, and Fish Audio
Run any saved workflow directly on your video
Audio and Export
Up to 5 concurrent audio tracks
Add to or replace existing audio
Export to MP4 with optional 4K upscaling
Export to CapCut or Premiere Pro
Scenes and Shots
Projects are organized into scenes, and scenes contain individual shots. Each shot has its own prompt, model selection, and generated media. Reorder shots by dragging, add new scenes to organize narrative beats, and collapse scenes to focus on specific parts of your project.
Character Consistency
Define characters in the character library with a name, description, and reference images. Use @mention tagging in shot prompts to reference characters - the AI maintains visual consistency across all shots that reference the same character. Add location tags for environmental consistency.
Per-Shot Generation
Generate images or videos for each shot individually. For video shots, use start frame control to ensure smooth transitions between shots. Toggle auto-generate to batch create media across all shots in a scene automatically.
Timeline Editor
The timeline shows all shots sequentially with trim sliders for each. Adjust per-shot speed, navigate between shots with previous/next controls, and preview the full sequence with play/pause. The timeline updates in real-time as you generate new media.
Audio Tracks
Add up to 5 concurrent audio tracks. Browse your audio library, generate music/SFX/speech inline, or upload audio files. Each track can be positioned independently on the timeline. Add to or replace existing audio per shot.
AI Tools
Apply AI enhancements directly from the editor: upscale video to higher resolution, add lip sync from an audio file, change voices with ElevenLabs/Hume/Fish Audio, generate SFX, or run any saved workflow on selected clips.
Export Formats
Export your completed project as high-quality MP4 with optional 4K Topaz upscaling, CapCut project files for further editing, or Premiere Pro XML for professional post-production workflows.
Node Editor
Visual pipeline builder. Connect nodes in a graph editor to chain 65+ node types and 100+ AI models into reusable workflows.
Building: Drag nodes from the palette or press Tab for Quick Add. Connect outputs to inputs. Configure parameters per node. Run to test.
Publishing: Save any workflow as an API endpoint. Enable it, copy the Endpoint ID, call via REST. This powers all Sequencer plugins.
Node categories: Image, Video, Audio, 3D, Text/LLM, Logic, I/O, Effects. Plus 19 video transition types and sticky notes for documentation.
Shortcuts
Tab: Open Quick Add node menu at cursor position
F: Fit all nodes in view (animated zoom to fit)
Delete / Backspace: Remove selected nodes
Cmd+C: Copy selected nodes
Cmd+V: Paste nodes at cursor position
Cmd+D: Duplicate selected nodes
Shift+Click: Multi-select nodes
Scroll: Pan (trackpad) or Zoom (mouse wheel / Ctrl+scroll)
Node Types
65+ node types across generation, processing, and logic categories
Generate image, video, audio, 3D models, and text via LLM
Video: trim, split, stitch, adjust speed, upscale, extend, and composite
Audio: extract, separate stems, mix tracks, and replace audio
Image: inpaint, erase, mask, color grade, composite, and convert
Models and Logic
100+ AI models available as nodes
Logic nodes for math, comparisons, array packing, and more
HTTP request node for calling external APIs
LLM text generation with structured output support
Automation and Sharing
Save any workflow as a reusable API endpoint
Publish and share workflows publicly
19 video transition types including morph, fade, and dissolve
Sticky notes to document your workflow
Adding Nodes
Press Tab anywhere on the canvas to open the Quick Add menu at your cursor position. The menu shows all available node types grouped by category (Image, Video, Audio, 3D, Text, Logic, I/O, Effects). Search by typing to filter. Press Enter to add the selected node. You can also right-click the canvas to open the full node palette.
Connecting Nodes
Drag from an output socket (right side of a node) to an input socket (left side of another node) to create a connection. Sockets are color-coded by type: string, number, boolean, image, video, audio, 3D, etc. Only compatible types can be connected. Connections are drawn as curved lines showing data flow direction.
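The type-checking rule can be sketched as a simple compatibility test. The socket types come from the docs above, but the exact-match rule is an assumption; real graph editors often also allow widening coercions such as number to string:

```python
# Socket types listed in the docs; exact-match compatibility is assumed.
SOCKET_TYPES = {"string", "number", "boolean", "image", "video", "audio", "3d"}

def can_connect(output_type, input_type):
    """A connection is valid only between sockets of compatible types."""
    if output_type not in SOCKET_TYPES or input_type not in SOCKET_TYPES:
        raise ValueError("unknown socket type")
    return output_type == input_type
```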
Quick Add from Output
Drag from an output socket into empty space to open the Quick Add menu, pre-filtered to show only nodes with compatible input sockets. Select a node and it will be created and automatically connected to the output you dragged from. This is the fastest way to build a chain of nodes.
Node Type Reference
Image nodes: Generate, Inpaint, Erase, Variations, Variation Grid, Depth Estimation, ControlNet, Composition, Image Filter, Convert Image, Paint Mask.
Video nodes: Generate, Extend, Edit, Stitch, Split, Speed, Merge, Upscale, Timeline Composite, Timeline Editor, Lip Sync, Media Pad, Convert Video.
Audio nodes: Generate SFX, Extract Audio, Separate Audio, Mix Audio, Clone Voice, Generate Voice, Transcribe, Split Timestamps.
3D nodes: Generate 3D, Render 3D.
Text nodes: LLM Text (GPT/Claude/Gemini), String Join, String Replace, Separate Text.
Logic nodes: Math, Compare, Select Index, Pack Array, Pluck Array, Map Entities.
I/O nodes: Constants (Text, Number, Bool, Media), Webhook Start, Return Data, HTTP Request, File Download, Reroute.
Effects nodes: Color Grade, Transform, Media Pad.
Utility nodes: Sticky Note, Media Viewer, Raw Output, API Test.
Node Groups
Select multiple nodes and group them together with a labeled colored frame. Groups help organize complex workflows visually. Each group has a customizable label and color. Nodes within a group can still be individually selected and configured.
Running and Debugging
Click Run to execute the workflow. Nodes execute in dependency order - upstream nodes complete before downstream nodes start. Each node shows its status: idle, pending, running, completed, or error. Errors display inline with the failed node. The debug panel shows execution details and timing. A watchdog automatically detects stuck nodes and marks them as timed out.
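Dependency-ordered execution is a topological sort of the node graph. Python's standard graphlib illustrates the scheduling rule (this is a sketch of the concept, not Sequencer's actual runner):

```python
from graphlib import TopologicalSorter

def run_order(edges):
    """Compute an execution order in which every upstream node
    completes before its downstream consumers start. `edges` maps
    each node to the set of nodes it depends on."""
    return list(TopologicalSorter(edges).static_order())
```

For a graph where a composite node consumes a generate node and an upscale node, and the upscale node itself consumes the generate node, `run_order` always places generate before upscale before composite.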
Publishing as API
Click Publish to save your workflow as a reusable API endpoint. Enable the endpoint, copy the Endpoint ID, and call it via REST API from any application. Published workflows can be shared publicly in the gallery. Viewers can clone published workflows into their own account if the creator allows it.
Batch Execution
Nodes that support batch execution can process multiple inputs in parallel. The batch status shows total items, completed count, and individual results. Failed items within a batch do not block other items from completing.
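The batch semantics above, where a failed item records an error instead of blocking the rest, can be sketched with a thread pool. A hypothetical illustration, not the platform's executor:

```python
from concurrent.futures import ThreadPoolExecutor

def run_batch(items, worker):
    """Process items in parallel. Each item reports completed/error
    independently, so one failure never blocks the other items."""
    def safe(item):
        try:
            return {"item": item, "status": "completed", "result": worker(item)}
        except Exception as exc:
            return {"item": item, "status": "error", "error": str(exc)}
    with ThreadPoolExecutor() as pool:
        return list(pool.map(safe, items))
```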
Run Workflow (API)
POST /run/{endpointId}

{
  "prompt": "A cinematic shot...",
  "style": "photorealistic"
}

// Response
{
  "runId": "abc123",
  "status": "accepted",
  "pollUrl": "https://api.sequencer.media/status/{endpointId}/abc123"
}
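The run call can be wrapped in a small start-and-poll client. In this sketch the HTTP helpers are injected (e.g. thin wrappers around requests.post/requests.get) so the flow can be exercised without a live key; the Bearer Authorization header and the terminal status values are assumptions, since only the endpoint shape above is documented:

```python
import time

BASE_URL = "https://api.sequencer.media"

def run_workflow(endpoint_id, payload, api_key, post, get, poll_interval=2.0):
    """Start a published workflow run, then poll its pollUrl until the
    status leaves the in-flight states. `post(url, json, headers)` and
    `get(url, headers)` must return the decoded JSON body.
    NOTE: Bearer auth is an assumption, not confirmed by the docs."""
    headers = {"Authorization": f"Bearer {api_key}"}
    run = post(f"{BASE_URL}/run/{endpoint_id}", payload, headers)
    while True:
        status = get(run["pollUrl"], headers)
        if status["status"] not in ("accepted", "running"):
            return status
        time.sleep(poll_interval)
```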