Create Video - HeyGen Documentation

Authorizations

x-api-key

string

header

required

HeyGen API key. Obtain from your HeyGen dashboard.

Headers

Idempotency-Key

string

Optional client-supplied key for safely retrying mutations. Subsequent calls within 24 hours that share this key replay the original response — even if the request body differs slightly (a warning is logged). A retry that arrives while the original is still in flight gets a 409 request_in_progress. Keys must be 1–255 characters from [A-Za-z0-9_:.-]; a UUID is a safe default. Scope is per-endpoint and per-resource: the same key on a different route or path parameter is independent.

Required string length: 1 - 255

Pattern: ^[A-Za-z0-9_\-:.]{1,255}$
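
As a minimal sketch of the key rules above, the following Python generates a UUID-based key, validates a key against the documented pattern, and retries with the same key while the server answers 409. The `send` callable is a hypothetical transport hook (not part of the HeyGen API) that takes the key and returns a `(status_code, json_body)` tuple. The retry count and delay are illustrative assumptions.

```python
import re
import time
import uuid
from typing import Callable, Tuple

# Pattern copied from the Idempotency-Key header description.
IDEMPOTENCY_KEY_RE = re.compile(r"^[A-Za-z0-9_\-:.]{1,255}$")


def new_idempotency_key() -> str:
    """Return a UUID4 string; hex digits and hyphens satisfy the pattern."""
    return str(uuid.uuid4())


def is_valid_idempotency_key(key: str) -> bool:
    """Check a key against the documented length and character rules."""
    return IDEMPOTENCY_KEY_RE.fullmatch(key) is not None


def send_with_retry(
    send: Callable[[str], Tuple[int, dict]],  # hypothetical transport hook
    key: str,
    attempts: int = 3,
    delay_s: float = 1.0,
) -> Tuple[int, dict]:
    """Call send(key) until it stops returning 409, reusing the same key."""
    if not is_valid_idempotency_key(key):
        raise ValueError("invalid Idempotency-Key")
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    status, body = send(key)
    for _ in range(attempts - 1):
        if status != 409:  # 409 means the original request is still in flight
            break
        time.sleep(delay_s)
        status, body = send(key)
    return status, body
```

Reusing one key across retries is what lets the server replay the original response instead of creating a second video.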

Body

application/json

Schema variants: CreateVideoFromAvatar, CreateVideoFromImage. The fields below describe CreateVideoFromAvatar.

Create a video from a HeyGen avatar (video or photo avatar).

Provide an avatar_id to use a previously created avatar. Supports all avatar types: studio_avatar, digital_twin, and photo_avatar. Optionally set engine to select Avatar V for eligible avatars; when omitted, the server defaults to Avatar IV.

type

string

required

Must be 'avatar' for avatar-based video creation.

Allowed value: "avatar"

avatar_id

string

required

HeyGen avatar ID (video avatar or photo avatar look ID).
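
A minimal sketch of assembling the headers and JSON body from the required fields documented so far. The function name and the sample IDs are hypothetical. This page does not show the endpoint URL, so the sketch stops short of sending the request, and a real request presumably also needs a script or audio input.

```python
import json
from typing import Optional, Tuple


def build_avatar_video_request(
    api_key: str,
    avatar_id: str,
    idempotency_key: Optional[str] = None,
    title: Optional[str] = None,
) -> Tuple[dict, str]:
    """Return (headers, json_body) for an avatar-based video request."""
    headers = {"x-api-key": api_key, "Content-Type": "application/json"}
    if idempotency_key is not None:
        headers["Idempotency-Key"] = idempotency_key

    # 'type' must be the literal "avatar" for this schema variant.
    body = {"type": "avatar", "avatar_id": avatar_id}
    if title is not None:
        body["title"] = title
    return headers, json.dumps(body)


# Placeholder values only; substitute a real key and avatar ID.
headers, payload = build_avatar_video_request(
    "YOUR_API_KEY", "avatar_123", idempotency_key="demo-key-1", title="Demo"
)
```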

title

string | null

Display title for the video in the HeyGen dashboard.

resolution

enum<string> | null

Output video resolution.

Available options: 4k, 1080p, 720p

aspect_ratio

enum<string> | null

default: 16:9

Output video aspect ratio. Supported values: '16:9', '9:16', '4:5', '5:4', '1:1', 'auto'. Defaults to '16:9'. 'auto' preserves the source's aspect ratio (avatar source frames or uploaded image), short-edge anchored to the requested resolution and capped at the tier's long edge. Falls back to '16:9' when source dimensions can't be read.

Available options: 16:9, 9:16, 4:5, 5:4, 1:1, auto

fit

enum<string> | null

How the subject is fitted to the output canvas. 'cover' scales to fill the frame (may crop edges). 'contain' scales to fit entirely within the frame (may show background). When omitted, the server picks the best option based on the source and canvas orientations.

Available options: contain, cover
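
The difference between 'cover' and 'contain' can be illustrated with the conventional scaling geometry. This is a sketch of the usual definition of those terms, not the server's actual implementation; `fit_scale` and its parameters are hypothetical names.

```python
def fit_scale(src_w: int, src_h: int, canvas_w: int, canvas_h: int, fit: str) -> float:
    """Return the uniform scale factor applied to the source for a fit mode."""
    sx = canvas_w / src_w  # scale that makes the widths match
    sy = canvas_h / src_h  # scale that makes the heights match
    if fit == "cover":
        # Fill the whole canvas; the larger factor wins, so one axis may be cropped.
        return max(sx, sy)
    if fit == "contain":
        # Fit entirely inside the canvas; the smaller factor wins, leaving background.
        return min(sx, sy)
    raise ValueError("fit must be 'cover' or 'contain'")


# Portrait 1080x1920 source on a landscape 1920x1080 canvas.
cover = fit_scale(1080, 1920, 1920, 1080, "cover")
contain = fit_scale(1080, 1920, 1920, 1080, "contain")
```

With these inputs, 'contain' shrinks the source so its full height fits, while 'cover' enlarges it until the width is filled and the top and bottom overflow the canvas.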

background

BackgroundSetting · object

Background settings for the video.

remove_background

boolean | null

Remove the avatar background. Video avatars must be trained with matting enabled.

callback_url

string | null

Webhook URL to receive a POST notification when the video is ready.

callback_id

string | null

Caller-defined identifier echoed back in the webhook payload.

watermark

WatermarkInput · object

Custom watermark image to overlay on the video (PNG or JPEG). Available as a premium option for select Enterprise customers. To request access, please contact our support team.

caption

CaptionSetting · object

Caption generation settings. A sidecar subtitle file is always returned via subtitle_url; set 'style' to additionally burn captions into the rendered video.

output_format

enum<string>

default: mp4

Output container. 'webm' returns a video with a transparent background (alpha channel); 'mp4' (default) returns a standard video. 'webm' requires an avatar that supports matting. When 'webm' is selected, any 'background' value is rejected and background removal is applied automatically — the caller does not need to set 'remove_background'.

Available options: mp4, webm
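
A client can catch the 'webm' and 'background' conflict before sending. The sketch below is a hypothetical client-side pre-check built only from the rules stated above. It does not verify that the avatar supports matting, which only the server can confirm.

```python
from typing import List


def check_output_format(body: dict) -> List[str]:
    """Return a list of problems with output_format and background settings."""
    problems = []
    fmt = body.get("output_format", "mp4")  # documented default is mp4
    if fmt not in ("mp4", "webm"):
        problems.append("output_format must be 'mp4' or 'webm'")
    if fmt == "webm" and body.get("background") is not None:
        # The docs state any 'background' value is rejected with webm.
        problems.append("'background' is rejected when output_format is 'webm'")
    return problems
```

Note that 'remove_background' is not checked here because the docs say background removal is applied automatically for 'webm'.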

script

string | null

Text script for the avatar to speak. Pair with voice_id, or omit voice_id when using avatar_id to use the avatar's default voice. Mutually exclusive with audio_url/audio_asset_id.

Minimum string length: 1

voice_id

string | null

Voice ID for text-to-speech. Required when script is provided, unless avatar_id is set (the avatar's default voice is used as fallback).

audio_url

string | null

Public URL of an audio file to lip-sync. Mutually exclusive with script.

audio_asset_id

string | null

HeyGen asset ID of an uploaded audio file. Mutually exclusive with script.
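
The exclusivity rule between script and the two audio inputs can be sketched as a client-side pre-check. The function name is hypothetical. The docs do not say whether audio_url and audio_asset_id may be combined with each other, so that case is deliberately left unchecked.

```python
from typing import List


def check_audio_source(body: dict) -> List[str]:
    """Return a list of problems with the script and audio fields."""
    problems = []
    script = body.get("script")
    has_audio = (
        body.get("audio_url") is not None
        or body.get("audio_asset_id") is not None
    )
    if script is not None and has_audio:
        problems.append("script is mutually exclusive with audio_url/audio_asset_id")
    if script is not None and len(script) < 1:
        problems.append("script must be at least 1 character long")
    return problems
```

A missing voice_id is not flagged because avatar_id is required in this schema, so the avatar's default voice is available as a fallback.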

voice_settings

VoiceSettingsInput · object

Voice tuning parameters (speed, pitch, locale).

motion_prompt

string | null

Natural-language prompt controlling avatar body motion. Photo avatars only. Avatar IV only; rejected when engine.type is 'avatar_v'.

expressiveness

enum<string> | null

Avatar expressiveness level. Photo avatars only. Defaults to 'low' when omitted. Avatar IV only; rejected when engine.type is 'avatar_v'.

Available options: high, medium, low
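
Both motion_prompt and expressiveness are rejected when engine.type is 'avatar_v'. A hypothetical pre-check for that rule might look like the sketch below. It assumes the engine object carries a `type` key, as the field descriptions above imply, and does not check whether the avatar is a photo avatar.

```python
from typing import List

AVATAR_IV_ONLY_FIELDS = ("motion_prompt", "expressiveness")


def avatar_iv_only_conflicts(body: dict) -> List[str]:
    """Return the Avatar IV-only fields that are set alongside Avatar V."""
    engine = body.get("engine") or {}
    if engine.get("type") != "avatar_v":
        # Engine omitted or not Avatar V: these fields are allowed.
        return []
    return [name for name in AVATAR_IV_ONLY_FIELDS if body.get(name) is not None]
```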

engine

AvatarVEngineConfig · object

Avatar V engine configuration with cross-reference-driven animation.

Variants: AvatarVEngineConfig, AvatarIVEngineConfig

Response

Successful response

data

CreateAvatarVideoResponse · object
