
# Writing Effective Video Prompts

> Get better Video Agent results with prompt engineering. Covers structure, specificity, style direction, and common pitfalls, with before-and-after prompts.

Video Agent is prompt-driven. But "more detail" doesn't always mean "better video." We ran 14 experiments with different prompting strategies to find out what actually produces the best results. Here's what we learned.

## See the Difference

Same topic, different prompts. Watch both — the difference is the entire argument of this page.

<Tabs>
  <Tab title="Vague prompt">
    Prompt:

    ```text theme={null}
    Make a video about remote work benefits.
    ```

    <iframe width="560" height="315" src="https://app.heygen.com/embeds/a10374ad53274be1b1052eb00d897068" title="HeyGen video player" frameborder="0" allow="encrypted-media; fullscreen;" allowfullscreen />
  </Tab>

  <Tab title="Crafted prompt">
    Prompt:

    ```text theme={null}
    Two years ago, I could only hire people within 30 miles of our
    office. Today, my team spans 4 countries and 3 time zones. We
    found engineers we never would have found locally. Our office
    costs dropped to nearly zero. And here's the surprising part —
    people actually stayed longer. Remote isn't the future. It's
    already the default.

    Tone: Like a founder on a podcast — reflective, honest, sharing
    a personal experience. Not a pitch, not a lecture. Just someone
    who tried something and it worked.
    Background: Casual home office or coffee shop. Warm, natural.
    30 seconds. Landscape.
    ```

    <iframe width="560" height="315" src="https://app.heygen.com/embeds/8103828eacf444dd8565b345d3014a14" title="HeyGen video player" frameborder="0" allow="encrypted-media; fullscreen;" allowfullscreen />
  </Tab>
</Tabs>

Both are about remote work benefits. The second used a natural story script with a tone description — no timestamps, no scene structure, no prescribed overlays. Just a great script and a feeling.

## The #1 Rule: Write a Great Script

The single biggest factor in video quality is the script — the actual words the presenter will say. Everything else (visuals, overlays, pacing) is secondary. Video Agent makes good production decisions on its own. Your job is to give it great words to work with.

<Tabs>
  <Tab title="Weak script">
    ```text theme={null}
    Here are three science-backed ways to sleep better tonight.
    First: cut screens 30 minutes before bed — blue light
    suppresses melatonin. Second: cool your room to 65 degrees.
    Third: wake up at the same time every day.
    ```

    Informational, clinical, reads like a textbook. The video will be competent but forgettable.
  </Tab>

  <Tab title="Strong script">
    ```text theme={null}
    Six months ago I was averaging 5 hours of broken sleep. I
    tried everything — supplements, meditation apps, white noise
    machines. Nothing worked. Then I did three stupidly simple
    things: I put my phone charger in the kitchen. I turned the
    thermostat down to 65. And I set one alarm — same time, every
    single day. No more negotiating with the snooze button. Within
    two weeks I was sleeping 7 hours straight. No supplements. No
    apps. Just discipline and a cold room.
    ```

    Personal, narrative, has an arc. The viewer is hooked because someone is telling a real story — not listing facts.
  </Tab>
</Tabs>

In our experiments, the personal story consistently produced better videos than the informational version — better B-roll choices, better pacing, more engaging delivery.

## What Makes a Script Work

**Stories beat lists.** First-person narratives ("I tried X, then Y happened") give Video Agent richer material to work with than bullet points. The agent generates better visuals when the script has emotional texture.

**Bold beats safe.** Provocative framing ("Stop trying to sleep 8 hours. Seriously.") produced more engaging videos than neutral framing. The agent matched the script's energy with bolder visual choices.

**Flow beats structure.** Scripts that read naturally — like someone talking to a friend — deliver better than scripts chopped into rigid segments. If it sounds awkward to read aloud, it'll sound awkward in the video.

**Questions don't work well.** Scripts built around questions ("Do you check your phone before bed? What temperature is your bedroom?") felt unnatural with a single speaker. Save the Socratic method for [Live Avatar](/cookbook/live-avatar/ai-tutor) conversations.

## Add Tone, Not Timestamps

After writing your script, the most useful thing you can add is a **tone description** — how the video should *feel*, not how it should be structured.

<Tabs>
  <Tab title="Tone description (do this)">
    ```text theme={null}
    [your script here]

    Tone: Like a founder on a podcast — reflective, honest, no
    corporate speak. The presenter should feel like they're sharing
    a personal experience, not reading a script.
    Background: Casual home office or coffee shop. Warm, natural.
    Duration: 30 seconds.
    ```

    Guides the delivery and mood without constraining the production.
  </Tab>

  <Tab title="Timestamp structure (avoid this)">
    ```text theme={null}
    Scene 1 (0-5s): Hook — "..."
    Scene 2 (5-12s): Tip 1 — "..."
    Scene 3 (12-20s): Tip 2 — "..."
    Scene 4 (20-27s): Tip 3 — "..."
    Scene 5 (27-30s): Close — "..."
    ```

    Gives you precise control but makes the delivery feel robotic. The agent follows the timing exactly, and the result sounds choppy.
  </Tab>
</Tabs>

In our tests, adding tone improved delivery quality. Adding timestamps and scene structure gave more control but hurt the natural flow of speech.
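
If you assemble prompts in code, the script-plus-tone pattern above can be kept as plain string building. The following is a minimal sketch, not part of any HeyGen SDK: `build_prompt` and its parameters are hypothetical names chosen for illustration, and the field labels (`Tone:`, `Background:`, `Duration:`) simply mirror the examples on this page.

```python
def build_prompt(script, tone, background=None, duration_seconds=None):
    """Join a spoken script with tone-style direction (hypothetical helper)."""
    # Collapse hard line wraps so the script reads as one flowing paragraph.
    lines = [" ".join(script.split()), ""]
    lines.append(f"Tone: {tone}")
    # Optional direction is added only when the caller provides it.
    if background:
        lines.append(f"Background: {background}")
    if duration_seconds:
        lines.append(f"Duration: {duration_seconds} seconds.")
    return "\n".join(lines)


prompt = build_prompt(
    script="Remote isn't the future. It's already the default.",
    tone="Like a founder on a podcast. Reflective, honest, no corporate speak.",
    background="Casual home office or coffee shop. Warm, natural.",
    duration_seconds=30,
)
print(prompt)
```

The helper deliberately has no notion of scenes or timestamps, so the script stays a single block of natural speech and the agent decides the pacing.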

## Let Video Agent Handle Production

Video Agent makes surprisingly good decisions about:

* **B-roll selection** — relevant, well-timed visuals
* **Text overlays** — clean typography, good placement
* **Color palette** — matches the mood of the script
* **Music** — appropriate energy and tone
* **Pacing** — natural rhythm based on the script

You don't always need to specify these. In our experiments (tested on a health/wellness topic), the minimal prompt ("Make a 30-second video about 3 tips for better sleep") produced a video with solid B-roll, thoughtful overlays, and a calming color palette — all chosen by the agent. Results may vary by topic and content type.

**Only override production decisions when you have a specific need.** For example:

* `Orientation: portrait` — when targeting TikTok/Reels
* `Duration: 30 seconds` — when you have a length constraint
* A request to keep the presenter on screen, when the video will be translated (see Translation-Ready Videos below)
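
One way to keep overrides to a minimum is to generate them only from constraints you actually have. This is a rough sketch under stated assumptions: `production_overrides` and the `PORTRAIT_PLATFORMS` set are hypothetical and not part of Video Agent, and the output lines reuse the `Orientation:` and `Duration:` wording from the list above.

```python
# Assumption: this platform set is illustrative; extend it for your own targets.
PORTRAIT_PLATFORMS = {"tiktok", "reels"}


def production_overrides(platform=None, max_seconds=None):
    """Return override lines only for constraints that were actually given."""
    lines = []
    if platform and platform.lower() in PORTRAIT_PLATFORMS:
        lines.append("Orientation: portrait")
    if max_seconds is not None:
        lines.append(f"Duration: {max_seconds} seconds")
    return lines


print(production_overrides())              # no constraints, so no overrides
print(production_overrides("TikTok", 30))  # portrait plus a length limit
```

With no arguments the function returns an empty list, which leaves every production decision to the agent.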

## Reference Files for Context

When your video is about something visual — a product, a document, a website — attach files so the agent has context to work with.

```json theme={null}
{
  "prompt": "Create a product walkthrough based on the attached screenshots...",
  "files": [
    { "type": "url", "url": "https://example.com/screenshot.png" }
  ]
}
```

This works well for product demos, content summaries, and brand-consistent videos. See the [Video Agent docs](/docs/video-agent#file-input-formats) for supported file types.
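
The request body above can also be built programmatically. The sketch below only constructs the JSON. The `prompt` and `files` field names are copied from the example above, while `build_payload` is a hypothetical helper; the endpoint, authentication, and any other fields are not shown here, so check the Video Agent docs before sending it.

```python
import json


def build_payload(prompt, file_urls=()):
    """Build a request body shaped like the JSON example above (hypothetical helper)."""
    payload = {"prompt": prompt}
    # Attach reference files only when there are any.
    if file_urls:
        payload["files"] = [{"type": "url", "url": url} for url in file_urls]
    return payload


body = json.dumps(
    build_payload(
        "Create a product walkthrough based on the attached screenshots...",
        ["https://example.com/screenshot.png"],
    ),
    indent=2,
)
print(body)
```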

## Translation-Ready Videos

If you plan to translate your video into other languages using [Video Translation](/cookbook/video-agent/multilingual-content), the presenter's face needs to be visible throughout for lip-sync to work. Add this to your prompt:

```text theme={null}
This is a direct-to-camera message. Think of it like a FaceTime
call — one person, one camera, sincere eye contact throughout.
The presenter should be visible and speaking for the entire video.
```

<Warning>
  **Don't use restrictive language** like "No B-roll, no cutaway scenes, no stock footage." In our tests, this produced a flat, visually boring result. The positive framing above keeps the avatar on screen while still allowing the agent to add text overlays for visual interest.
</Warning>
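
If you generate many translation-ready prompts, the positive framing can be appended automatically. This is a minimal sketch: `make_translation_ready` is a hypothetical helper, and `DIRECT_TO_CAMERA` is a shortened version of the framing text above, not an official string.

```python
# Assumption: a shortened form of the direct-to-camera framing shown above.
DIRECT_TO_CAMERA = (
    "This is a direct-to-camera message. "
    "The presenter should be visible and speaking for the entire video."
)


def make_translation_ready(prompt):
    """Append the positive framing once; leave already-framed prompts alone."""
    if DIRECT_TO_CAMERA in prompt:
        return prompt
    return f"{prompt.rstrip()}\n\n{DIRECT_TO_CAMERA}"


print(make_translation_ready("Thank you for being part of our community."))
```

The helper adds a description of what you want rather than a list of prohibitions, which matches the warning above about restrictive language.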

## Prompt Templates

These templates use the patterns that worked best in our experiments: natural scripts, tone descriptions, and minimal production direction.

<Accordion title="Personal Story (30s)">
  ```text theme={null}
  [Write a first-person story about your topic. Include a problem,
  what you tried, what actually worked, and the result. Make it
  conversational — read it aloud to check if it flows naturally.]

  Tone: Honest, slightly amazed it worked. Like a podcast story.
  Not polished — real.
  Duration: 30 seconds.
  ```
</Accordion>

<Accordion title="Bold Take (30s)">
  ```text theme={null}
  [Open with a contrarian or surprising statement. Challenge a
  common assumption. Then deliver 2-3 rapid points that support
  your take. Close with a memorable line.]

  Tone: Confident, slightly provocative. Not angry — just done
  with bad advice. Like a friend who's tired of watching you
  struggle.
  Duration: 30 seconds.
  ```
</Accordion>

<Accordion title="Micro-Story (30s, portrait)">
  ```text theme={null}
  [Write one continuous thought — no bullet points, no lists, no
  sections. Just a person telling a 30-second story directly to
  camera. The simpler and more honest, the better.]

  Tone: Deadpan, honest, slightly amused. The humor is in the
  delivery, not the words.
  Orientation: portrait.
  ```
</Accordion>

<Accordion title="Translation-Ready Message (30-45s)">
  ```text theme={null}
  [Write a warm, universal message. Avoid idioms, slang, or
  culturally specific references — this will be translated into
  multiple languages. Keep sentences short and clear.]

  This is a direct-to-camera message — one person, one camera,
  sincere eye contact throughout. Like a FaceTime call from a
  friend.
  Tone: Warm, sincere, inclusive.
  Duration: 35 seconds. Landscape.
  ```
</Accordion>

## Common Mistakes

<Warning>
  **Don't over-structure.** Timestamps per scene (0-5s, 5-12s) make the delivery sound robotic. Write a flowing script and let the agent decide the pacing.
</Warning>

<Warning>
  **Don't prescribe visuals you don't need.** "Text overlay: Global Talent Pool" or "Show a visual of a thermostat" — the agent makes good visual choices on its own. Only specify visuals when they're critical to the message.
</Warning>

<Warning>
  **Don't use question-driven scripts.** "Do you check your phone before bed?" feels unnatural coming from a single presenter talking to camera. Questions work in conversations, not monologues.
</Warning>

<Warning>
  **Don't use restrictive instructions.** "Do NOT use stock footage. Do NOT include music." Telling the agent what NOT to do makes it play safe. Use positive framing: describe what you want, not what you don't.
</Warning>
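
A few of these mistakes can be caught with a rough check before you submit a prompt. The sketch below is a crude heuristic, not a Video Agent feature: `lint_prompt`, the regular expressions, and the two-question threshold are all assumptions made for illustration, and they will miss or over-flag some prompts.

```python
import re

# Matches per-scene timing such as "0-5s" or "5-12s".
TIMESTAMP = re.compile(r"\b\d+\s*-\s*\d+\s*s\b")
# Matches restrictive phrasing such as "Do NOT use stock footage".
RESTRICTIVE = re.compile(r"\bdo not\b", re.IGNORECASE)


def lint_prompt(prompt):
    """Return the names of the common mistakes this prompt appears to contain."""
    findings = []
    if TIMESTAMP.search(prompt):
        findings.append("timestamps")
    if RESTRICTIVE.search(prompt):
        findings.append("restrictive")
    # Assumption: two or more question marks suggests a question-driven script.
    if prompt.count("?") >= 2:
        findings.append("questions")
    return findings


print(lint_prompt("Scene 1 (0-5s): Hook. Do NOT include music."))
```

Treat a finding as a prompt to reread the script aloud, not as a hard failure.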

<Info>
  **How we know this:** We ran 14 experiments, generating videos on the same topic ("3 tips for better sleep") with different prompting strategies that varied detail level, script style, format instructions, and avatar visibility. The findings on this page are based on those rendered videos, not theory.
</Info>

***

## Next Steps

<CardGroup cols={2}>
  <Card title="Social Media Pipeline" icon="share-nodes" href="/cookbook/video-agent/social-media-pipeline">
    Apply these techniques to batch-generate social content.
  </Card>

  <Card title="Multilingual Content" icon="globe" href="/cookbook/video-agent/multilingual-content">
    Generate translation-ready videos using the positive framing technique.
  </Card>
</CardGroup>
