Skill: heygen-video · Invoke: /heygen-video [topic_or_script] [--avatar avatar_id]
heygen-video is a video producer, not a CLI wrapper. It takes an idea and guides it to a finished cut: Discovery → Script → Prompt Craft → Frame Check → Generate → Deliver. It runs entirely on the v3 Video Agent pipeline and handles the things that separate good output from bad: aspect-ratio correction, prompt engineering, avatar conflict detection, and voice matching.
It accepts an avatar_id from heygen-avatar for identity-first videos, or uses a stock presenter. Returns a video share URL plus a session URL for iteration.
When to Use
- Any HeyGen video generation — “generate a HeyGen video”, “make a talking-head video”.
- Personalized video messages — outreach, team updates, announcements, pitches.
- Presenter-led explainers, tutorials, product demos with a human face.
- “Make a video of me saying…”, “send a video to my leads”, “summarize this article as a 60-second explainer from my avatar”.
Not for: avatar creation (use heygen-avatar first), cinematic b-roll without a presenter, translating videos (use heygen-translate), or streaming avatars.
Pipeline
First Look — avatar check
Scans the workspace for AVATAR-*.md files. If found, pre-loads the avatar as a default. If none, runs heygen-avatar first. Blocking gate: verifies the avatar is trained (preview_image_url non-null) before proceeding, because unready avatars fail silently.
Discovery
Conversationally gathers purpose, audience, duration, tone, distribution (landscape/portrait), assets, key message, visual style, avatar, and language. Asks one or two items at a time — never a 10-item form.
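The First Look gate described earlier (scan for AVATAR-*.md, then block until preview_image_url is non-null) could be sketched roughly as follows. This is an illustration under assumptions, not the skill's actual code: the function names and the fetch_avatar callable are hypothetical, and the sketch only assumes an avatar record is a dict carrying a preview_image_url field.

```python
from pathlib import Path
from typing import Optional


def find_avatar_files(workspace: str) -> list[Path]:
    """Return AVATAR-*.md files in the workspace root, sorted by name."""
    return sorted(Path(workspace).glob("AVATAR-*.md"))


def avatar_is_ready(avatar: dict) -> bool:
    """An avatar counts as trained only when preview_image_url is non-null."""
    return bool(avatar.get("preview_image_url"))


def first_look(workspace: str, fetch_avatar) -> Optional[dict]:
    """Return a ready avatar record, or None if heygen-avatar must run first.

    fetch_avatar is a hypothetical callable that maps an AVATAR-*.md path
    to an avatar record (a dict); the real lookup is not specified here.
    """
    files = find_avatar_files(workspace)
    if not files:
        return None  # no avatar on file: hand off to heygen-avatar
    avatar = fetch_avatar(files[0])
    if not avatar_is_ready(avatar):
        # Fail loudly here, since an unready avatar would otherwise fail silently.
        raise RuntimeError("avatar not trained yet: preview_image_url is null")
    return avatar
```

Raising at the gate turns the silent failure into an explicit one before any generation call is made.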
Script
Writes for the ear in the video’s language, structured by type (Product Demo, Explainer, Tutorial, Sales Pitch, Announcement). Extracts literal on-screen text and adds the script-framing directive so Video Agent expands naturally instead of padding with silence.
Prompt Craft
Transforms the script into an optimized Video Agent prompt: narrator framing, duration signal, asset anchoring, tone calibration, media-type guidance, and a style block stacked at the end.
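As a rough sketch of the assembly order, the snippet below joins a few of the listed parts and places the style block last. The function name, parameter names, and the wording of each line are assumptions made for illustration. The real script-framing directive and asset anchoring are defined in SKILL.md and are left out here.

```python
def build_prompt(script: str, duration_s: int, tone: str, style_block: str,
                 media_guidance: str = "", uses_avatar: bool = True) -> str:
    """Assemble a Video Agent prompt with the style block stacked at the end."""
    # With an avatar_id set, refer to the presenter generically, never by appearance.
    narrator = "the selected presenter" if uses_avatar else "a stock presenter"
    parts = [
        f"Narrator: {narrator} delivers the script to camera.",  # narrator framing
        f"Target duration: about {duration_s} seconds.",         # duration signal
        f"Tone: {tone}.",                                        # tone calibration
    ]
    if media_guidance:
        parts.append(f"Media: {media_guidance}")                 # media-type guidance
    parts.append(f"Script:\n{script.strip()}")
    parts.append(style_block.strip())                            # style block goes last
    return "\n\n".join(parts)
```

For example, build_prompt("Hello team.", 30, "warm", "STYLE: Swiss Pulse") returns a prompt whose final paragraph is the style block.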
Frame Check
Resolves the look_id fresh from the group_id, inspects orientation and background, and appends framing/background correction notes so the avatar fits the target aspect ratio without black bars.
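The orientation part of Frame Check might look something like this minimal sketch. It assumes the look's pixel width and height are already known. The note wording is illustrative only, since the actual correction matrix lives in SKILL.md, and both function names are hypothetical.

```python
def orientation(width: int, height: int) -> str:
    """Classify a frame as landscape, portrait, or square."""
    if width > height:
        return "landscape"
    if height > width:
        return "portrait"
    return "square"


def framing_note(look_w: int, look_h: int, target: str) -> str:
    """Return a correction note to append to the prompt, or '' if none is needed.

    target is "landscape" or "portrait". The wording is an assumption,
    not the skill's real correction text.
    """
    source = orientation(look_w, look_h)
    if source == target:
        return ""
    return (f"FRAMING NOTE: the presenter image is {source} but the video is "
            f"{target}; reframe to fill the {target} frame with no black bars.")
```

A portrait look used in a landscape video yields a note, while a matching orientation yields an empty string, so the note can be appended unconditionally.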
Modes
The skill routes to a mode based on how much the user has already decided:
| Signal | Mode | Starts at |
|---|---|---|
| Vague idea (“make a video about X”) | Full Producer | Discovery |
| Has a written prompt | Enhanced Prompt | Prompt Craft |
| “Just generate” / skip questions | Quick Shot | Generate |
| “Dry run” / “preview” | Dry-Run | Creative preview, no API call |
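The routing table can be read as a small decision function. The sketch below is one possible interpretation: the keyword matching and the has_written_prompt flag are assumptions, since the skill decides conversationally and not by string matching.

```python
def route_mode(request: str, has_written_prompt: bool = False) -> tuple[str, str]:
    """Return (mode, starting stage) for a user request."""
    text = request.lower()
    if "dry run" in text or "preview" in text:
        return ("Dry-Run", "Creative preview, no API call")
    if "just generate" in text or "skip questions" in text:
        return ("Quick Shot", "Generate")
    if has_written_prompt:
        return ("Enhanced Prompt", "Prompt Craft")
    # Anything vaguer falls through to the full guided pipeline.
    return ("Full Producer", "Discovery")
```

The explicit signals are checked first, so a request like "just generate" skips Discovery even when no written prompt exists.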
Visual Styles
Two ways to control the look, which can be combined freely.
API Styles (style_id)
Curated templates. One parameter replaces all visual direction. Browse via list_video_agent_styles(tag, limit); tags include cinematic, retro-tech, iconic-artist, pop-culture, handmade, print. Show thumbnails + previews before choosing.
Prompt Styles
Full manual control. Pick from 20 named styles (Soft Signal, Swiss Pulse, Deconstructed, Maximalist Type…) and paste the chosen style's ready-made STYLE block at the end of the prompt. Match mood first, content second.
Media Types
Video Agent supports three media types. Be explicit, or it guesses (often wrong):
| Use case | Best media type |
|---|---|
| Data, stats, brand elements, diagrams | Motion Graphics — animated text, charts, icons |
| Abstract concepts, custom scenarios | AI-Generated — for things stock can’t cover |
| Real environments, human emotions | Stock Media — authentic footage |
Critical Rules
- When avatar_id is set, never describe the avatar’s appearance in the prompt; say “the selected presenter.” Describing it is the #1 cause of avatar mismatch.
- Always include the script-framing directive so Video Agent expands the concept instead of padding with dead air.
- Always end the prompt with a style block — without one, visuals drift scene-to-scene.
- Capture the session_id immediately: the session URL https://app.heygen.com/video-agent/{session_id} can’t be recovered later.
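One way to honor the last rule is to write the session ID to disk the moment it comes back. The sketch below assumes only the URL pattern stated above. The helper names and the heygen-session.json filename are hypothetical.

```python
import json
from pathlib import Path

# URL pattern taken from the rule above; session_id is filled in per run.
SESSION_URL = "https://app.heygen.com/video-agent/{session_id}"


def session_url(session_id: str) -> str:
    """Build the session URL used for later iteration."""
    return SESSION_URL.format(session_id=session_id)


def save_session(session_id: str, path: str = "heygen-session.json") -> str:
    """Persist the session ID and URL as soon as they are known."""
    url = session_url(session_id)
    record = {"session_id": session_id, "session_url": url}
    Path(path).write_text(json.dumps(record))
    return url
```

Calling save_session right after the generate step returns means the URL survives even if the conversation or process ends before delivery.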
Example Prompts
| Prompt | What happens |
|---|---|
| “Make a 30-second cinematic founder intro of me. Ask what you need.” | Full pipeline: avatar → style recommendation → video. |
| “Turn the key points from this PDF into a team update from my avatar: [file]” | PDF → script → presenter video. |
| “Summarize this article as a 60-second explainer: [URL]” | Fetches content, extracts points, scripts, generates. |
| “Make a 20-second outreach video to an investor — what should I include?” | Guides the message, you approve the script, the avatar delivers it. |
View the full SKILL.md
Includes the complete pipeline, all 20 style blocks, prompt-craft anatomy, motion vocabulary, and Frame Check correction matrix.

