heygen-video - HeyGen Documentation

Skill: heygen-video · Invoke: /heygen-video [topic_or_script] [--avatar avatar_id]

heygen-video is a video producer, not a CLI wrapper. It takes an idea and guides it to a finished cut: Discovery → Script → Prompt Craft → Frame Check → Generate → Deliver. It runs entirely on the v3 Video Agent pipeline and handles the things that separate good output from bad — aspect-ratio correction, prompt engineering, avatar conflict detection, and voice matching. It accepts an avatar_id from heygen-avatar for identity-first videos, or uses a stock presenter. Returns a video share URL plus a session URL for iteration.

v3 only. Never call deprecated POST /v2/video/generate, GET /v2/avatars, or v1 endpoints directly. Raw HTTP skips Frame Check and prompt engineering and produces visibly worse videos. Route through MCP, the OpenClaw plugin, or the heygen CLI via this skill.

When to Use

Any HeyGen video generation — “generate a HeyGen video”, “make a talking-head video”.
Personalized video messages — outreach, team updates, announcements, pitches.
Presenter-led explainers, tutorials, product demos with a human face.
“Make a video of me saying…”, “send a video to my leads”, “summarize this article as a 60-second explainer from my avatar”.

Not for: avatar creation (use heygen-avatar first), cinematic b-roll without a presenter, translating videos (use heygen-translate), or streaming avatars.

Pipeline

First Look — avatar check

Scans the workspace for AVATAR-*.md files. If found, pre-loads the avatar as a default. If none, runs heygen-avatar first. Blocking gate: verifies the avatar is trained (preview_image_url non-null) before proceeding — unready avatars fail silently.
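The avatar check above can be sketched roughly as follows. This is a minimal illustration, not the skill's actual code: the function names are hypothetical, the scan is assumed to cover only the workspace root, and the avatar record is assumed to be a plain dict carrying the preview_image_url field mentioned above.

```python
from pathlib import Path


def find_avatar_files(workspace: str) -> list[Path]:
    """Return AVATAR-*.md files in the workspace root, sorted by name.

    Assumption: only the top level is scanned; the real skill may recurse.
    """
    return sorted(Path(workspace).glob("AVATAR-*.md"))


def avatar_is_ready(avatar: dict) -> bool:
    """Blocking gate: treat the avatar as trained only if preview_image_url is non-null."""
    return avatar.get("preview_image_url") is not None
```

A caller would stop the pipeline when avatar_is_ready returns False rather than submitting a job that may fail silently.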

Discovery

Conversationally gathers purpose, audience, duration, tone, distribution (landscape/portrait), assets, key message, visual style, avatar, and language. Asks about one or two items at a time, never as a 10-item form.

Script

Writes for the ear in the video’s language, structured by type (Product Demo, Explainer, Tutorial, Sales Pitch, Announcement). Extracts literal on-screen text and adds the script-framing directive so Video Agent expands naturally instead of padding with silence.

Prompt Craft

Transforms the script into an optimized Video Agent prompt: narrator framing, duration signal, asset anchoring, tone calibration, media-type guidance, and a style block stacked at the end.
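One way to picture the assembly order is the sketch below. It is an assumption-laden illustration: build_prompt and its parameters are hypothetical, the wording of each section is a placeholder supplied by the caller, and the position of the script within the prompt is a guess. The only thing taken from the text above is that the style block goes last.

```python
def build_prompt(
    narrator_framing: str,
    duration_seconds: int,
    script: str,
    tone: str,
    media_guidance: str,
    style_block: str,
    asset_notes: tuple = (),
) -> str:
    """Join the prompt sections in a fixed order, keeping the style block at the end."""
    sections = [
        narrator_framing,                                     # narrator framing
        f"Target duration: about {duration_seconds} seconds.",  # duration signal
        *asset_notes,                                         # asset anchoring
        f"Tone: {tone}.",                                     # tone calibration
        media_guidance,                                       # media-type guidance
        f"Script:\n{script}",
        style_block,                                          # always last
    ]
    # Drop empty sections and separate the rest with blank lines.
    return "\n\n".join(s.strip() for s in sections if s and s.strip())
```

Keeping the order in one function makes it harder to append something after the style block by accident.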

Frame Check

Resolves the look_id fresh from the group_id, inspects orientation and background, and appends framing/background correction notes so the avatar fits the target aspect ratio without black bars.
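The orientation part of Frame Check might look like the sketch below. The function names, the pixel-dimension inputs, and the wording of the correction note are all hypothetical; the real correction matrix lives in the full SKILL.md and is not reproduced here.

```python
from typing import Optional


def orientation(width: int, height: int) -> str:
    """Classify an image by comparing its width and height."""
    if width > height:
        return "landscape"
    if height > width:
        return "portrait"
    return "square"


def framing_note(look_width: int, look_height: int, target: str) -> Optional[str]:
    """Return a correction note when the look's orientation differs from the target.

    Returns None when no correction is needed. The note text is a placeholder.
    """
    source = orientation(look_width, look_height)
    if source == target:
        return None
    return (
        f"The selected presenter's source image is {source}; "
        f"reframe to fill the {target} frame with no black bars."
    )
```

The returned note would be appended to the prompt only when it is not None.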

Generate & Deliver

Submits to the v3 Video Agent, polls silently (20–45 min typical), then delivers the video with a one-line summary and a HeyGen dashboard link for editing.
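The silent polling step could be structured like this sketch. It does not call any HeyGen endpoint: get_status is a caller-supplied function, and the "status" key and its "completed"/"failed" values are assumptions made for illustration. The default timeout is set above the 45-minute upper bound quoted above.

```python
import time
from typing import Callable


def wait_for_video(
    get_status: Callable[[], dict],
    interval_s: float = 30.0,
    timeout_s: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll until the job reports a terminal status or the timeout elapses.

    Assumption: the status payload is a dict whose "status" key becomes
    "completed" or "failed" when the job is done.
    """
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status.get("status") in ("completed", "failed"):
            return status
        if clock() >= deadline:
            raise TimeoutError("video generation did not finish in time")
        sleep(interval_s)
```

Injecting sleep and clock keeps the loop testable without waiting in real time.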

Modes

The skill routes to a mode based on how much the user has already decided:

Signal	Mode	Starts at
Vague idea (“make a video about X”)	Full Producer	Discovery
Has a written prompt	Enhanced Prompt	Prompt Craft
“Just generate” / skip questions	Quick Shot	Generate
“Dry run” / “preview”	Dry-Run	Creative preview, no API call
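The routing in the table can be approximated with simple keyword matching, as in the sketch below. The Mode enum, pick_mode, the keyword list, and the precedence between signals are all assumptions; the skill itself infers the mode from the conversation rather than from fixed strings.

```python
from enum import Enum


class Mode(Enum):
    # Each value records where the pipeline starts for that mode.
    FULL_PRODUCER = "Discovery"
    ENHANCED_PROMPT = "Prompt Craft"
    QUICK_SHOT = "Generate"
    DRY_RUN = "Creative preview, no API call"


def pick_mode(request: str, has_written_prompt: bool = False) -> Mode:
    """Map a user request to a mode. Precedence here is an assumption."""
    text = request.lower()
    if "dry run" in text or "preview" in text:
        return Mode.DRY_RUN
    if "just generate" in text or "skip questions" in text:
        return Mode.QUICK_SHOT
    if has_written_prompt:
        return Mode.ENHANCED_PROMPT
    return Mode.FULL_PRODUCER
```

For example, pick_mode("make a video about onboarding") falls through to Full Producer, which starts at Discovery.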

Visual Styles

Two ways to control the look — combine them freely.

API Styles (style_id)

Curated templates. One parameter replaces all visual direction. Browse via list_video_agent_styles(tag, limit) — tags include cinematic, retro-tech, iconic-artist, pop-culture, handmade, print. Show thumbnails + previews before choosing.

Prompt Styles

Full manual control. Pick from 20 named styles (Soft Signal, Swiss Pulse, Deconstructed, Maximalist Type…) and paste the chosen style's ready-made STYLE block at the end of the prompt. Match mood first, content second.

Top performers across 40+ production videos: Deconstructed (most reliable across topics), Swiss Pulse (data-heavy), Digital Grid (tech), Geometric Bold (elegant, versatile), Maximalist Type (high energy — use sparingly).

Media Types

Video Agent supports three media types. State the one you want explicitly; otherwise it guesses, and often guesses wrong:

Use case	Best media type
Data, stats, brand elements, diagrams	Motion Graphics — animated text, charts, icons
Abstract concepts, custom scenarios	AI-Generated — for things stock can’t cover
Real environments, human emotions	Stock Media — authentic footage

Critical Rules

When avatar_id is set, never describe the avatar’s appearance in the prompt — say “the selected presenter.” Describing it is the #1 cause of avatar mismatch.
Always include the script-framing directive so Video Agent expands the concept instead of padding with dead air.
Always end the prompt with a style block — without one, visuals drift scene-to-scene.
Capture the session_id immediately — the session URL https://app.heygen.com/video-agent/{session_id} can’t be recovered later.
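Some of these rules lend themselves to a quick pre-flight check, sketched below. The helper names are hypothetical, and the framing directive and style block are passed in by the caller because their exact wording is not given here. The rule about not describing the avatar's appearance is not checked, since that needs human or model judgment.

```python
SESSION_URL_TEMPLATE = "https://app.heygen.com/video-agent/{session_id}"


def session_url(session_id: str) -> str:
    """Build the session URL; store the result as soon as the session_id is known."""
    if not session_id:
        raise ValueError("session_id is required")
    return SESSION_URL_TEMPLATE.format(session_id=session_id)


def check_prompt(prompt: str, framing_directive: str, style_block: str) -> list:
    """Return a list of rule violations found in the prompt (empty if none)."""
    problems = []
    if framing_directive not in prompt:
        problems.append("missing script-framing directive")
    if not prompt.rstrip().endswith(style_block.strip()):
        problems.append("prompt does not end with the style block")
    return problems
```

An empty list from check_prompt means the two mechanical rules pass; it says nothing about prompt quality.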

Example Prompts

Prompt	What happens
“Make a 30-second cinematic founder intro of me. Ask what you need.”	Full pipeline: avatar → style recommendation → video.
“Turn the key points from this PDF into a team update from my avatar: [file]”	PDF → script → presenter video.
“Summarize this article as a 60-second explainer: [URL]”	Fetches content, extracts points, scripts, generates.
“Make a 20-second outreach video to an investor — what should I include?”	Guides the message, you approve the script, the avatar delivers it.

View the full SKILL.md

Includes the complete pipeline, all 20 style blocks, prompt-craft anatomy, motion vocabulary, and Frame Check correction matrix.