Reference Image to Video
Turn still images into cinematic AI-generated video. Using Veo 3.1's Reference Image mode, you can feed up to three reference images into a single generation. The AI understands the objects, style, and composition from your images and brings them to life.
Step 1: Add Your Reference Images
Start by dragging and dropping your reference images onto the workflow canvas. Each image you drop automatically creates an Image node. For this tutorial we are using three product images of leather goods: boots, a backpack, and a duffle bag.
You can use up to three reference images per generation. More images give the AI more context about the style, materials, and environment you want in the final video.
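If you are scripting this step rather than dragging images onto the canvas, the same limit applies. Below is a minimal sketch of loading and validating up to three references with Pillow; the helper name and the example file names are ours, not part of Sequencer.

```python
from pathlib import Path
from PIL import Image

MAX_REFERENCES = 3  # Veo 3.1 Reference Image mode accepts up to three images

def load_reference_images(paths):
    """Load and sanity-check the reference images for one generation."""
    if not 1 <= len(paths) <= MAX_REFERENCES:
        raise ValueError(f"Expected 1-{MAX_REFERENCES} reference images, got {len(paths)}")
    return [Image.open(Path(p)).convert("RGB") for p in paths]

# Example file names matching this tutorial's leather goods
references = load_reference_images(["boots.jpg", "backpack.jpg", "duffle_bag.jpg"])
```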
Step 2: Create a Generate Video Node
Add a Generate Video node to the canvas. Set the model to Veo 3.1 and change the mode to Reference Images. This unlocks three input slots: Reference Image 1, Reference Image 2, and Reference Image 3.
Connect each of your Image nodes to a Reference Image input on the Generate Video node. The order does not matter. The AI analyzes all three images together to understand your visual intent.
Three Image nodes connected to a Generate Video node in Reference Image mode.
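Conceptually, the node wiring boils down to one request: a model, a mode, and up to three reference slots. Here is a hypothetical sketch of that structure; the field names are ours and may not match the Generate Video node's internals.

```python
from dataclasses import dataclass, field

@dataclass
class GenerateVideoRequest:
    """Hypothetical mirror of the Generate Video node's inputs."""
    model: str = "veo-3.1"
    mode: str = "reference_images"  # unlocks the three Reference Image slots
    reference_images: list = field(default_factory=list)  # up to three; order does not matter
    prompt: str = ""
    duration_seconds: int = 8
    aspect_ratio: str = "16:9"

request = GenerateVideoRequest(
    reference_images=["boots.jpg", "backpack.jpg", "duffle_bag.jpg"],
)
```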
Step 3: Write a Prompt and Generate
Add a Text node with your prompt and connect it to the Prompt input. Describe the motion, camera, and mood you want. The AI will combine your reference images with the prompt to generate the video.
Set your desired Duration and Aspect Ratio, then hit run. Veo 3.1 will generate a video that incorporates the objects, textures, and environment from your reference images.
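Continuing the sketch from Step 2, this step just fills in the prompt and output settings and submits the request. The submit and poll helpers below are placeholders for whatever endpoint actually runs Veo 3.1 for you; they are not a real API.

```python
def submit_generation(req: GenerateVideoRequest) -> str:
    """Placeholder: send the request to your video backend, return a job id."""
    raise NotImplementedError("wire this up to your actual Veo 3.1 endpoint")

def poll_until_done(job_id: str) -> str:
    """Placeholder: block until the job finishes, return the output video path."""
    raise NotImplementedError

request.prompt = (
    "Cinematic product shot: leather boots, backpack, and duffle bag on a "
    "workshop table, slow orbiting camera, warm window light, shallow depth of field"
)
request.duration_seconds = 8
request.aspect_ratio = "16:9"

job_id = submit_generation(request)
video_path = poll_until_done(job_id)
```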
In our example, the three leather product images produce a cinematic product transformation video where the items morph and animate on a workshop table.
Pro Tip: Extend with Loopback
Infinite Extension
Want a longer video? Use an Extract Image node to pull the last frame from your generated video. Connect that frame to a new Generate Video node as a reference image. Generate again. Now stitch the two videos together for a seamless extended clip. Repeat as many times as you want.
This loopback technique works because the AI uses the extracted frame as visual context. It naturally continues the motion and style from where the previous generation ended.
You can also extract the first frame from the generated video and use it as a reference for a completely new generation with a different prompt. This gives you a whole family of videos that share the same visual DNA.
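In script form, the loopback is a simple loop: generate a clip, pull its last frame, feed that frame back in as a reference, and stitch the results. The sketch below uses OpenCV for frame extraction and ffmpeg's concat demuxer for stitching (both real, widely available tools); generate_segment is a placeholder for another Reference Image generation.

```python
import subprocess
import cv2

def generate_segment(reference_image: str, prompt: str) -> str:
    """Placeholder: run another Generate Video pass seeded with this frame."""
    raise NotImplementedError("wire this up to your Generate Video node / API")

def extract_last_frame(video_path: str, out_path: str) -> str:
    """Grab the final frame of a clip to seed the next generation."""
    cap = cv2.VideoCapture(video_path)
    frame_count = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    cap.set(cv2.CAP_PROP_POS_FRAMES, frame_count - 1)
    ok, frame = cap.read()
    cap.release()
    if not ok:
        raise RuntimeError(f"Could not read the last frame of {video_path}")
    cv2.imwrite(out_path, frame)
    return out_path

def stitch(segments, out_path):
    """Concatenate clips without re-encoding using ffmpeg's concat demuxer."""
    with open("segments.txt", "w") as f:
        f.writelines(f"file '{s}'\n" for s in segments)
    subprocess.run(
        ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
         "-i", "segments.txt", "-c", "copy", out_path],
        check=True,
    )

prompt = "The leather goods continue to animate on the workshop table, camera slowly pushing in"
segments = ["segment_0.mp4"]  # the clip generated in Step 3
for i in range(1, 4):  # three extensions; repeat as many times as you like
    ref = extract_last_frame(segments[-1], f"frame_{i}.png")
    segments.append(generate_segment(reference_image=ref, prompt=prompt))
stitch(segments, "extended.mp4")
```

Note that lossless concatenation with -c copy assumes every segment shares the same codec, resolution, and frame rate, which should hold when all segments come from the same model and settings; otherwise re-encode instead of stream-copying.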
Getting Better Results
Use High-Quality Images
The better your reference images, the better the output. Use sharp, well-lit images with clear subjects. Avoid blurry or low-resolution inputs.
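If you want an automatic gate on input quality, one common heuristic is to reject small files and estimate sharpness with the variance of the Laplacian. The thresholds below are assumptions to tune, not values from Sequencer or Veo.

```python
import cv2

MIN_SIDE = 768         # assumed minimum short-side resolution; tune for your pipeline
MIN_SHARPNESS = 100.0  # variance-of-Laplacian below this usually indicates blur

def check_reference(path: str) -> None:
    """Reject reference images that are too small or too blurry."""
    img = cv2.imread(path)
    if img is None:
        raise ValueError(f"Could not read {path}")
    h, w = img.shape[:2]
    if min(h, w) < MIN_SIDE:
        raise ValueError(f"{path} is only {w}x{h}px; use a larger image")
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    sharpness = cv2.Laplacian(gray, cv2.CV_64F).var()
    if sharpness < MIN_SHARPNESS:
        raise ValueError(f"{path} looks blurry (sharpness score {sharpness:.1f})")

for p in ["boots.jpg", "backpack.jpg", "duffle_bag.jpg"]:
    check_reference(p)
```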
Keep a Consistent Style
Reference images that share similar lighting, color palette, and visual style produce more cohesive results. In this tutorial, all three images share a warm leather workshop aesthetic.
Be Specific in Your Prompt
The prompt guides the motion and composition. Describe camera movement, lighting conditions, and the action you want, for example "a slow orbit around the workshop table as warm window light rakes across the leather." The reference images handle the visual style.
Try the Workflow
We built a public workflow that implements this exact pipeline. Clone it, swap in your own images, and start generating.