Creates a video from a HeyGen avatar or an arbitrary image. The subject can speak a text script or lip-sync to pre-recorded audio. Supports the Avatar IV and Avatar V engines; select one with the 'engine' field. Avatar III video generation requires the legacy API (v1 or v2).
HeyGen API key. Obtain from your HeyGen dashboard.
Create a video from a HeyGen avatar (video or photo avatar).
Provide an avatar_id to use a previously created avatar. Supports all
avatar types: studio_avatar, digital_twin, and photo_avatar. Optionally
set engine to select Avatar V for eligible avatars; when omitted, the
server defaults to Avatar IV.
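A minimal sketch of a request body for this mode, written as a plain Python dictionary builder. It makes no HTTP call because the endpoint URL is not part of this section. It assumes that the field whose value must be 'avatar' is named `type`, and that `engine` can be sent as an object containing only `type: "avatar_v"`. Both are assumptions. The other keys (`avatar_id`, `script`, `voice_id`) are the ones named on this page.

```python
from typing import Any, Dict, Optional


def build_avatar_payload(
    avatar_id: str,
    script: Optional[str] = None,
    voice_id: Optional[str] = None,
    use_avatar_v: bool = False,
) -> Dict[str, Any]:
    """Build a JSON-ready body for avatar-based video creation (sketch)."""
    # Assumption: the discriminator field is called "type"; the reference
    # only says its value must be 'avatar'.
    payload: Dict[str, Any] = {"type": "avatar", "avatar_id": avatar_id}
    if script is not None:
        payload["script"] = script
    if voice_id is not None:
        # Optional here: with avatar_id set, the avatar's default voice
        # is used when voice_id is omitted.
        payload["voice_id"] = voice_id
    if use_avatar_v:
        # Assumption: {"type": "avatar_v"} is a sufficient engine object.
        # Omitting "engine" lets the server default to Avatar IV.
        payload["engine"] = {"type": "avatar_v"}
    return payload


if __name__ == "__main__":
    import json

    # "my_avatar_id" is a placeholder, not a real avatar.
    body = build_avatar_payload("my_avatar_id", script="Hello!", use_avatar_v=True)
    print(json.dumps(body))
```

Leaving `use_avatar_v` false omits `engine` entirely. As described above, the server then falls back to Avatar IV.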
Must be 'avatar' for avatar-based video creation.
HeyGen avatar ID (a video avatar ID or a photo avatar look ID).
Display title for the video in the HeyGen dashboard.
Output video resolution. Allowed values: 4k, 1080p, 720p.
Output video aspect ratio. Allowed values: 16:9, 9:16.
How the subject is fitted to the output canvas. 'cover' scales the subject to fill the frame (may crop edges); 'contain' scales it to fit entirely within the frame (may show background). When omitted, the server picks the best option based on the source and canvas orientations. Allowed values: contain, cover.
Background settings for the video.
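The resolution, aspect-ratio, and fit values listed above can be checked client-side before a request is sent. The sketch below assumes the three fields are named `resolution`, `aspect_ratio`, and `fit`. The extracted reference lists the values but not the field names, so treat those keys as hypothetical.

```python
from typing import Any, Dict, List, Set

# Allowed values from the reference. The key names are hypothetical:
# the extracted page lists the values but not the field names.
ALLOWED_VALUES: Dict[str, Set[str]] = {
    "resolution": {"4k", "1080p", "720p"},
    "aspect_ratio": {"16:9", "9:16"},
    "fit": {"contain", "cover"},
}


def enum_errors(payload: Dict[str, Any]) -> List[str]:
    """Return one message per present field whose value is not allowed."""
    errors: List[str] = []
    for field, allowed in ALLOWED_VALUES.items():
        # Omitted fields are skipped; e.g. the server chooses the fit mode itself.
        if field in payload and payload[field] not in allowed:
            errors.append(f"{field}={payload[field]!r} is not one of {sorted(allowed)}")
    return errors
```

Absent fields are skipped rather than reported. This matches the note that the server chooses a fit mode when none is given.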
Remove the avatar background. Video avatars must be trained with matting enabled.
Webhook URL to receive a POST notification when the video is ready.
Caller-defined identifier echoed back in the webhook payload.
Custom watermark image to overlay on the video (PNG or JPEG). Available as a premium option for select Enterprise customers. To request access, please contact our support team.
Caption generation settings. A sidecar subtitle file is always returned via subtitle_url; set 'style' to additionally burn captions into the rendered video.
Output container. 'mp4' (default) returns a standard video; 'webm' returns a video with a transparent background (alpha channel) and requires an avatar that supports matting. When 'webm' is selected, any 'background' value is rejected and background removal is applied automatically, so the caller does not need to set 'remove_background'. Allowed values: mp4, webm.
Text script for the avatar to speak (minimum length: 1). Pair with voice_id. When using avatar_id, you can instead omit voice_id to use the avatar's default voice. Mutually exclusive with audio_url and audio_asset_id.
Voice ID for text-to-speech. Required when script is provided, unless avatar_id is set (the avatar's default voice is then used as a fallback).
Public URL of an audio file to lip-sync. Mutually exclusive with script.
HeyGen asset ID of an uploaded audio file. Mutually exclusive with script.
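Several parameters above constrain each other. A script cannot be combined with audio_url or audio_asset_id. A script needs voice_id unless avatar_id is set. The 'webm' container rejects any background value. A minimal client-side pre-flight check for these three rules might look like the following. `output_format` is a hypothetical name for the output-container field, and the server remains the authority on what it accepts.

```python
from typing import Any, Dict, List


def conflict_errors(payload: Dict[str, Any]) -> List[str]:
    """Return a message for each field combination this page says is rejected.

    Assumption: "output_format" stands in for the output-container field,
    whose real name is not shown in the reference.
    """
    errors: List[str] = []
    has_script = "script" in payload
    has_audio = "audio_url" in payload or "audio_asset_id" in payload

    # script is mutually exclusive with audio_url / audio_asset_id.
    if has_script and has_audio:
        errors.append("script cannot be combined with audio_url or audio_asset_id")

    # voice_id is required with script unless avatar_id supplies a default voice.
    if has_script and "voice_id" not in payload and "avatar_id" not in payload:
        errors.append("voice_id is required with script when avatar_id is not set")

    # 'webm' output rejects any background value (removal is automatic).
    if payload.get("output_format") == "webm" and "background" in payload:
        errors.append("background is rejected when the output container is 'webm'")

    return errors
```

For example, a body that sets both `script` and `audio_url` produces one message. A body that sets only `avatar_id` and `script` passes, because the avatar's default voice covers the missing `voice_id`.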
Voice tuning parameters (speed, pitch, locale).
Natural-language prompt controlling avatar body motion. Photo avatars only. Avatar IV only; not supported when engine.type is 'avatar_v'.
Avatar expressiveness level. Photo avatars only. Defaults to 'low' when omitted. Avatar IV only; not supported when engine.type is 'avatar_v'. Allowed values: high, medium, low.
Avatar V engine configuration with cross-reference-driven animation.
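The motion-prompt and expressiveness options are Avatar IV only. Below is a sketch of a guard that flags them when engine.type is 'avatar_v'. `motion_prompt` and `expressiveness` are hypothetical field names, because the reference shows only their descriptions. When `engine` is omitted, the server uses Avatar IV, so the guard reports nothing in that case.

```python
from typing import Any, Dict, List

# Hypothetical names for the Avatar IV-only motion-prompt and
# expressiveness fields described above.
AVATAR_IV_ONLY_FIELDS = ("motion_prompt", "expressiveness")


def engine_errors(payload: Dict[str, Any]) -> List[str]:
    """Flag Avatar IV-only fields sent with engine.type 'avatar_v'."""
    engine = payload.get("engine") or {}
    if engine.get("type") != "avatar_v":
        # No engine (server defaults to Avatar IV) or another engine type.
        return []
    return [
        f"{name} is not supported when engine.type is 'avatar_v'"
        for name in AVATAR_IV_ONLY_FIELDS
        if name in payload
    ]
```

The guard does not check the "photo avatars only" restriction, because the avatar type cannot be read from the request body alone.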
Successful response