JSON prompting gives you far more accurate outputs than natural language prompts. This structured approach lets you control every detail of your video generation with precision.
Veo 3.1 Prompting Best Practices
Veo 3.1 is all about creative control. Here's the core formula for effective prompts:
Cinematography: Camera work and shot composition like dolly, crane, tracking, POV
Subject: Your main focal point with specific details
Action: What's happening and when
Context: Environment and background elements
Style & Ambiance: Mood, lighting, and overall aesthetic
What is JSON Prompting?
JSON (JavaScript Object Notation) is a structured data format that breaks down prompts into key-value pairs. Instead of writing a paragraph, you define each component separately: description, style, camera, lighting, audio, and sequences.
This gives you precise control over elements that would otherwise be loosely interpreted.
Why Use JSON Prompting?
Enhanced Consistency
Keep elements consistent across multiple shots. Perfect for long-form content that needs unified style, lighting, and color.
Granular Control
Control every visual, sound, and environmental detail. No more hoping the AI gets what you want.
Efficient Iteration
Adjust one parameter without rewriting your entire prompt. Quick tweaks, fast results.
Timestamp Sequencing
Define precise actions at specific timestamps. Perfect for orchestrating complex multi-phase shots.
Case Study: Nike Vaporfly Commercial
Here's a complete JSON prompt for a high-end product commercial. This creates an infinite macro zoom that flies inside the shoe materials. Normally this would require expensive probe lenses and CGI.
Copy
{
"description": "A single, unbroken continuous camera shot with NO cuts. The camera performs an infinite macro zoom, starting wide on the Nike Vaporfly 3 and flying physically INSIDE the materials, dodging fibers and mist before bursting out the other side.",
"style": "High-end CGI, hyper-realistic macro cinematography, 4K, kinetic energy, cold atmosphere",
"camera": "Continuous forward dolly, infinite macro zoom, simulated probe lens, obstacle-dodging motion, no cuts",
"lighting": "Clean studio lighting transitioning to internal atmospheric cold blue light, ending with warm spotlight",
"environment": "Minimalist studio -> Internal Microscopic World -> Concrete Wall",
"elements": [
"Nike Vaporfly 3 (White Flyknit upper, orange midsole)",
"Tan synthetic mesh fibers (giant scale)",
"Cold white vapor/mist",
"Carbon Fiber Flyplate (black rigid weave)",
"ZoomX Foam (compressed white pebbles/beads)",
"Orange Nike Swoosh Logo"
],
"motion": "Fast, fluid, obstacle-dodging, plunging downward, bursting forward",
"text": "none",
"music": "Whoosh of air, icy wind sound, mechanical crunch of carbon, squeak of foam, heavy bass impact",
"sequences": [
{
"sequence": 1,
"timestamp": "00:00-00:08",
"action": "In one continuous, unbroken take with no cuts, the camera starts on a completely static profile of the Nike Vaporfly 3 and accelerates rapidly forward, shrinking to microscopic scale to fly directly into the tan mesh upper. Inside the fabric, the camera weaves and dodges around giant tan fiber ropes while cold white vapor gusts swirl through the gaps. The camera then dives sharply downward, skimming fast over the rigid, black cross-hatch texture of the Carbon Fiber Flyplate, before pushing aggressively through the white ZoomX foam core, squeezing past tight clusters of inflated white foam beads. Finally, the camera bursts out the other side, pulling back to reveal the orange Nike Swoosh logo painted on a textured concrete wall.",
"audio": "Sonic boom buildup, rushing wind, icy gusts, metallic hum, squeaky rubber friction, final explosion boom"
}
]
}
Notice how everything is explicitly defined. The camera never cuts, lighting transitions match the journey, elements list exact colors, and audio evolves with the visuals.
JSON Prompting Best Practices
Be Descriptive: Write "Nike Vaporfly 3 (White Flyknit upper, orange midsole)" instead of "a shoe". Specificity drives accuracy.
Specify Audio: Include ambient sounds, music, and sound effects. Veo 3.1 generates rich synchronized audio when guided.
Use Cinematic Terms: Keywords like "dolly", "probe lens", "macro zoom", and "no cuts" trigger specific camera behaviors.
Define Transitions: Describe what the environment looks like at each point in the journey, especially for complex camera moves.
Workflow Setup
Connect a Text node with your JSON prompt to a Generate Video node. Add Image nodes for reference frames to guide the visual style.
Try This Workflow
Experiment with JSON prompting using the workflow below. Connect your own reference images and customize the prompt for any product.