Overview

POST https://api.heygen.com/v2/video/generate

Generates videos using the AI Studio backend, with support for avatars, voices, and dynamic backgrounds. You can create videos using either your photo avatar or your digital twin. This endpoint supports Avatar III and Avatar IV. Each video is composed of one or more scenes (up to 50), where each scene defines its own avatar, voice, background, and optional on-screen text.
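A minimal sketch of calling the endpoint with Python's standard library. It only builds the request so it can be inspected offline; passing it to `urllib.request.urlopen` would send it. The `X-Api-Key` header name is an assumption (authentication is not covered on this page), and `YOUR_API_KEY`, `YOUR_AVATAR_ID`, and `YOUR_VOICE_ID` are placeholders.

```python
import json
import urllib.request

# Endpoint taken from the overview above.
GENERATE_URL = "https://api.heygen.com/v2/video/generate"


def build_generate_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for the generate endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        GENERATE_URL,
        data=body,
        method="POST",
        headers={
            "X-Api-Key": api_key,  # assumed header name; check the auth docs
            "Content-Type": "application/json",
        },
    )


# Usage: a one-scene payload. Send it with urllib.request.urlopen(req).
req = build_generate_request(
    "YOUR_API_KEY",
    {
        "video_inputs": [
            {
                "character": {"type": "avatar", "avatar_id": "YOUR_AVATAR_ID"},
                "voice": {
                    "type": "text",
                    "voice_id": "YOUR_VOICE_ID",
                    "input_text": "Hello!",
                },
                "background": {"type": "color", "value": "#FFFFFF"},
            }
        ]
    },
)
```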

Request Body

Top-Level Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| video_inputs | array | Yes | Array of scene objects (1–50). Each scene defines an avatar, voice, background, and optional text. |
| caption | boolean | No | Enable captions in the video. Only supported for text-based voice input. Default: false. |
| caption_mode | string | No | file_only or burn_in. When set, takes precedence over caption. |
| title | string | No | Title of the video. |
| callback_id | string | No | Custom ID for callback/webhook tracking. |
| callback_url | string | No | URL to notify when video rendering is complete. |
| dimension | object | No | Custom output dimensions. Defaults to 1920×1080. Width and height must be even, between 128 and 4096. |
| dimension.width | integer | No | Width of the output video. Default: 1920. |
| dimension.height | integer | No | Height of the output video. Default: 1080. |
| fps | float | No | Output frame rate. Default: 25.0. |
| folder_id | string | No | Folder ID where the video is stored. |
| enable_watermark | boolean | No | Apply the HeyGen watermark to the output. Default: false. |
| subtitles | object | No | Burned-in subtitle overlay settings (see Subtitles below). |
| test | boolean | No | Render in test mode (lower quality, no quota deduction). Default: false. |
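Some of the top-level constraints can be checked client-side before a request is sent. This sketch covers only the scene count, the even 128 to 4096 dimension rule, and the two caption_mode values; `validate_top_level` is a hypothetical helper, and the server may enforce rules not checked here.

```python
def validate_top_level(payload: dict) -> list[str]:
    """Return a list of problems with the top-level fields (empty if none)."""
    errors = []

    # video_inputs: 1 to 50 scene objects.
    scenes = payload.get("video_inputs")
    if not isinstance(scenes, list) or not 1 <= len(scenes) <= 50:
        errors.append("video_inputs must be a list of 1 to 50 scenes")

    # dimension: each axis even and within 128..4096 (defaults 1920x1080).
    dim = payload.get("dimension", {})
    for axis, default in (("width", 1920), ("height", 1080)):
        value = dim.get(axis, default)
        if value % 2 != 0 or not 128 <= value <= 4096:
            errors.append(f"dimension.{axis} must be even and in 128..4096")

    # caption_mode: only two documented values.
    mode = payload.get("caption_mode")
    if mode is not None and mode not in ("file_only", "burn_in"):
        errors.append("caption_mode must be file_only or burn_in")
    return errors
```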

Scene Object (video_inputs[])

Each item in the video_inputs array represents a scene and can contain the following:

character

Defines the avatar or talking photo for the scene. The type field discriminates between avatar and talking_photo; some fields apply to one type only.
| Parameter | Type | Required | Description |
|---|---|---|---|
| type | string | Yes | avatar or talking_photo. |
| avatar_id | string | Yes* | Unique avatar identifier. Required when type is avatar. |
| talking_photo_id | string | Yes* | Unique talking photo identifier. Required when type is talking_photo. |
| avatar_style | string | No | normal, closeUp, or circle. Applies only to avatar type. Default: normal. |
| talking_photo_style | string | No | circle or square. Applies only to talking_photo type. |
| talking_style | string | No | stable or expressive. Applies only to talking_photo type. Default: stable. |
| expression | string | No | default or happy. Applies only to talking_photo type. Default: default. |
| scale | float | No | Avatar size. Range: 0.0 to 5.0. Default: 1.0. |
| offset | object | No | Position adjustment: { "x": 0.0, "y": 0.0 }. Each axis range: -1.0 to 1.0. |
| fit | string | No | How the character fits inside the scene: contain or cover. Default: contain. |
| use_avatar_iv_model | boolean | No | Whether to use Avatar IV. See Avatar engine default change. |
| model | string | No | Avatar IV model version (e.g., 4.3, 4.3_turbo, 4.3_turbo_edge). Applies when use_avatar_iv_model is true. |
| resolution | string | No | Avatar IV output resolution: 720p, 1080p, or 4k. Applies when use_avatar_iv_model is true. |
| prompt | string | No | Avatar IV motion prompt. Applies when use_avatar_iv_model is true. |
| keep_original_prompt | boolean | No | Preserve the motion prompt as-is (skip enhancement). Applies when use_avatar_iv_model is true. |
| alpha | float | No | Avatar IV expressiveness level. Range: -0.5 to 0.5. Lower values are more expressive. |
| matting | boolean | No | Remove the photo background. |
| super_resolution | boolean | No | Enhance image quality. Applies only to talking_photo type. |
| circle_background_color | string | No | Hex color for circle/square style background (e.g., #FFFFFF). Default: #F6F6FC. |
| use_legacy_photo_avatar_model | boolean | No | Force the deprecated Avatar 3 photo avatar model. Applies only to talking_photo type. Not recommended for new requests. |
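The avatar/talking_photo split is easy to get wrong, so a small builder can reject fields that apply only to the other type. `make_character` and the field groupings below are a hypothetical sketch derived from the "Applies only to" notes in the table; fields without such a note (for example matting or the Avatar IV options) are passed through unchecked.

```python
# Fields the table marks as applying to one character type only.
AVATAR_ONLY = {"avatar_style"}
PHOTO_ONLY = {
    "talking_photo_style", "talking_style", "expression",
    "super_resolution", "use_legacy_photo_avatar_model",
}


def make_character(kind: str, character_id: str, **options) -> dict:
    """Build a character object, rejecting fields meant for the other type."""
    if kind == "avatar":
        id_key, forbidden = "avatar_id", PHOTO_ONLY
    elif kind == "talking_photo":
        id_key, forbidden = "talking_photo_id", AVATAR_ONLY
    else:
        raise ValueError("type must be 'avatar' or 'talking_photo'")

    misplaced = forbidden.intersection(options)
    if misplaced:
        raise ValueError(f"{sorted(misplaced)} not valid for type {kind!r}")

    # Numeric bounds from the table: scale 0.0..5.0, offset axes -1.0..1.0.
    scale = options.get("scale", 1.0)
    if not 0.0 <= scale <= 5.0:
        raise ValueError("scale must be between 0.0 and 5.0")
    offset = options.get("offset", {"x": 0.0, "y": 0.0})
    if any(not -1.0 <= offset.get(axis, 0.0) <= 1.0 for axis in ("x", "y")):
        raise ValueError("offset axes must be between -1.0 and 1.0")

    return {"type": kind, id_key: character_id, **options}
```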

voice

Defines what the avatar says in this scene.
| Parameter | Type | Required | Description |
|---|---|---|---|
| type | string | Yes | text, audio, or silence. |
| voice_id | string | Yes* | Voice identifier. Required for text type. |
| input_text | string | Yes* | Text the avatar will speak. Required for text type. |
| speed | float | No | Voice speed. Range: 0.5 to 1.5. Default: 1.0. Applies to text type. |
| pitch | float | No | Voice pitch. Range: -50.0 to 50.0. Default: 0.0. Applies to text type. |
| volume | float | No | Voice audio volume. Range: 0.0 to 1.0 (0.0 silent, 1.0 full). Applies to text type. |
| emotion | string | No | Excited, Friendly, Serious, Soothing, Broadcaster, or Angry. Applies to text type. |
| locale | string | No | Voice accent/locale (e.g., en-US, pt-BR). Applies to text type. |
| audio_url | string | Yes* | URL of uploaded audio. Required for audio type (provide either this or audio_asset_id). |
| audio_asset_id | string | Yes* | Asset ID of uploaded audio. Required for audio type (provide either this or audio_url). |
| duration | float | No | Silence duration in seconds. Range: 1.0 to 100.0. Default: 1.0. Applies to silence type. |
| elevenlabs_settings | object | No | Advanced ElevenLabs voice settings (see below). Applies to text type. |
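A sketch of a voice builder for the three voice types, checking the required fields and numeric ranges listed in the table. `make_voice` is hypothetical, and it assumes that "provide either this or ..." for the audio type means exactly one source must be given.

```python
def make_voice(kind: str, **fields) -> dict:
    """Build a voice object for a scene, checking required fields and ranges."""
    if kind == "text":
        for required in ("voice_id", "input_text"):
            if required not in fields:
                raise ValueError(f"text voice requires {required}")
        ranges = {"speed": (0.5, 1.5), "pitch": (-50.0, 50.0), "volume": (0.0, 1.0)}
    elif kind == "audio":
        # Assumption: exactly one of audio_url / audio_asset_id.
        if ("audio_url" in fields) == ("audio_asset_id" in fields):
            raise ValueError("audio voice needs exactly one of audio_url, audio_asset_id")
        ranges = {}
    elif kind == "silence":
        ranges = {"duration": (1.0, 100.0)}
    else:
        raise ValueError("type must be text, audio, or silence")

    # Only fields that are present are range-checked; absent ones use API defaults.
    for name, (low, high) in ranges.items():
        if name in fields and not low <= fields[name] <= high:
            raise ValueError(f"{name} must be between {low} and {high}")
    return {"type": kind, **fields}
```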
ElevenLabs Settings

| Parameter | Type | Description |
|---|---|---|
| model | string | ElevenLabs model: eleven_monolingual_v1, eleven_multilingual_v1, eleven_multilingual_v2, eleven_turbo_v2, eleven_turbo_v2_5, or eleven_v3. |
| similarity_boost | float | Similarity to the original voice. Range: 0.0 to 1.0. |
| stability | float | Voice consistency. Range: 0.0 to 1.0. For eleven_v3, the default is 1.0 and the only allowed values are 0.0, 0.5, and 1.0. |
| style | float | Style intensity. Range: 0.0 to 1.0. |
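The ElevenLabs settings have one irregular rule: eleven_v3 accepts only three discrete stability values. A hypothetical validator for this table might look like the following; the model list is copied from the table and may lag behind what the API actually accepts.

```python
# Model names as listed in the ElevenLabs Settings table.
ELEVENLABS_MODELS = {
    "eleven_monolingual_v1", "eleven_multilingual_v1", "eleven_multilingual_v2",
    "eleven_turbo_v2", "eleven_turbo_v2_5", "eleven_v3",
}


def check_elevenlabs_settings(settings: dict) -> dict:
    """Validate an elevenlabs_settings object; return it unchanged if valid."""
    model = settings.get("model")
    if model is not None and model not in ELEVENLABS_MODELS:
        raise ValueError(f"unknown ElevenLabs model: {model}")

    # All three tuning knobs share the 0.0..1.0 range.
    for name in ("similarity_boost", "stability", "style"):
        value = settings.get(name)
        if value is not None and not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0")

    # eleven_v3 only allows three stability values (default 1.0).
    if model == "eleven_v3" and settings.get("stability", 1.0) not in (0.0, 0.5, 1.0):
        raise ValueError("eleven_v3 stability must be 0.0, 0.5, or 1.0")
    return settings
```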

background

Defines the scene background.
| Parameter | Type | Required | Description |
|---|---|---|---|
| type | string | Yes | color, image, or video. |
| value | string | Yes* | Hex color code (e.g., #FFFFFF). Required for color type. Default: #f6f6fc. |
| url | string | Yes* | URL of uploaded image/video. Required for image/video type (provide either this or the corresponding asset ID). |
| image_asset_id | string | Yes* | Asset ID for image background. Provide either this or url. |
| video_asset_id | string | Yes* | Asset ID for video background. Provide either this or url. |
| play_style | string | Yes* | Playback mode: freeze, loop, or fit_to_scene. Required for video type. |
| fit | string | No | How the background fits the screen: cover, contain, crop, or none. Default: cover. |
| volume | float | No | Volume for video backgrounds. Range: 0.0 to 1.0. Applies to video type. |
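A hypothetical `make_background` helper that applies the per-type rules above: a color background falls back to the documented default, image and video backgrounds take exactly one of url or the matching asset ID (an interpretation of "provide either this or ..."), and video backgrounds must name a play_style.

```python
PLAY_STYLES = ("freeze", "loop", "fit_to_scene")


def make_background(kind: str, **fields) -> dict:
    """Build a background object, enforcing the per-type requirements."""
    if kind == "color":
        fields.setdefault("value", "#f6f6fc")  # documented default color
    elif kind in ("image", "video"):
        asset_key = f"{kind}_asset_id"
        # Assumption: exactly one source, either url or the matching asset ID.
        if ("url" in fields) == (asset_key in fields):
            raise ValueError(f"{kind} background needs exactly one of url, {asset_key}")
        if kind == "video" and fields.get("play_style") not in PLAY_STYLES:
            raise ValueError("video background needs play_style: freeze, loop, or fit_to_scene")
    else:
        raise ValueError("type must be color, image, or video")
    return {"type": kind, **fields}
```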

text

Optional on-screen text overlay.
| Parameter | Type | Required | Description |
|---|---|---|---|
| type | string | Yes | Must be text. |
| text | string | Yes | Text content to display. |
| font_family | string | No | Font family. Default: Arial. |
| font_size | float | No | Font size in points. Default: 24.0. |
| font_weight | string | No | Font weight (e.g., bold). Default: bold. |
| color | string | No | Text color in hex (e.g., #FFFFFF). Default: #FFFFFF. |
| position | object | No | Position offset: { "x": 0.0, "y": 0.0 }. Each axis range: -1.0 to 1.0. |
| text_align | string | No | left, center, or right. Default: center. |
| line_height | float | Yes | Line height / spacing between lines. Must be > 0. |
| width | float | No | Text container width. |
| height | float | No | Text container height. |
| rotate | float | No | Rotation angle in degrees. Range: 0 to 360. |
| scale_x | float | No | Horizontal scale. Must be >= 0. |
| scale_y | float | No | Vertical scale. Must be >= 0. |
| transform_scale_x | float | No | Additional horizontal transform scale. Must be >= 0. |
| transform_scale_y | float | No | Additional vertical transform scale. Must be >= 0. |
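Because line_height is the only required styling field, a text overlay builder can take it positionally and treat everything else as optional keyword styling. This hypothetical `make_text_overlay` sketch checks only the bounds stated in the table (line height, rotation, non-negative scales, and position axes).

```python
SCALE_KEYS = ("scale_x", "scale_y", "transform_scale_x", "transform_scale_y")


def make_text_overlay(text: str, line_height: float, **style) -> dict:
    """Build an on-screen text object; line_height is required and must be > 0."""
    if line_height <= 0:
        raise ValueError("line_height must be > 0")
    if not 0 <= style.get("rotate", 0) <= 360:
        raise ValueError("rotate must be between 0 and 360 degrees")
    for key in SCALE_KEYS:
        if style.get(key, 0) < 0:
            raise ValueError(f"{key} must be >= 0")
    position = style.get("position", {"x": 0.0, "y": 0.0})
    if any(not -1.0 <= position.get(axis, 0.0) <= 1.0 for axis in ("x", "y")):
        raise ValueError("position axes must be between -1.0 and 1.0")
    return {"type": "text", "text": text, "line_height": line_height, **style}
```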

audio

Optional secondary audio track for this scene (in addition to voice). Useful for background music or sound effects.
| Parameter | Type | Required | Description |
|---|---|---|---|
| audio_url | string | Yes* | URL of the uploaded audio. Provide either this or audio_asset_id. |
| audio_asset_id | string | Yes* | Asset ID of the uploaded audio. Provide either this or audio_url. |
| volume | float | No | Audio volume. Range: 0.0 to 1.0. Default: 1.0. |
| duration | float | No | Audio duration in seconds. |
| timeline | object | No | Timeline placement: { "start": 0.0, "duration": 0.0 }. |
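A sketch of building the per-scene audio track, for example background music under the voice. `make_scene_audio` is hypothetical; treating http(s) strings as URLs and everything else as asset IDs is an illustrative convention rather than API behavior, and the timeline values are assumed to be seconds.

```python
from typing import Optional


def make_scene_audio(source: str, volume: float = 1.0,
                     start: Optional[float] = None,
                     duration: Optional[float] = None) -> dict:
    """Build the optional secondary audio object for one scene."""
    if not 0.0 <= volume <= 1.0:
        raise ValueError("volume must be between 0.0 and 1.0")

    # Hypothetical convention: URLs vs. asset IDs decided by prefix.
    if source.startswith(("http://", "https://")):
        key = "audio_url"
    else:
        key = "audio_asset_id"
    audio = {key: source, "volume": volume}

    # Only emit a timeline when both placement values are supplied.
    if start is not None and duration is not None:
        audio["timeline"] = {"start": start, "duration": duration}
    return audio
```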

Subtitles

| Parameter | Type | Required | Description |
|---|---|---|---|
| preset_name | string | Yes | Subtitle preset name. |
| alignment | integer | No | Subtitle alignment code. Default: 2. |
| disable_highlight | boolean | No | Override the preset’s highlight style. Default: false. |
| font_size | integer | No | Font size override. |
| position | object | No | Subtitle position: { "x": 0.0, "y": 0.0 }. |
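The subtitles object sits at the top level of the request body (it appears in the top-level parameter table), not inside a scene. This hypothetical `with_subtitles` helper attaches it to a payload without mutating the original and rejects fields that are not in the table; the preset name is a placeholder because valid presets are not listed on this page.

```python
ALLOWED_SUBTITLE_FIELDS = {"alignment", "disable_highlight", "font_size", "position"}


def with_subtitles(payload: dict, preset_name: str, **overrides) -> dict:
    """Return a copy of payload with a top-level subtitles object attached."""
    if not preset_name:
        raise ValueError("preset_name is required")
    unknown = set(overrides) - ALLOWED_SUBTITLE_FIELDS
    if unknown:
        raise ValueError(f"unsupported subtitle fields: {sorted(unknown)}")
    if "font_size" in overrides and not isinstance(overrides["font_size"], int):
        raise ValueError("font_size must be an integer")
    # Shallow copy so the caller's payload is left untouched.
    return {**payload, "subtitles": {"preset_name": preset_name, **overrides}}
```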

Example Request

{
  "title": "My Studio Video",
  "caption": false,
  "dimension": {
    "width": 1920,
    "height": 1080
  },
  "video_inputs": [
    {
      "character": {
        "type": "avatar",
        "avatar_id": "YOUR_AVATAR_ID",
        "avatar_style": "normal"
      },
      "voice": {
        "type": "text",
        "voice_id": "YOUR_VOICE_ID",
        "input_text": "Welcome to the first scene of this video.",
        "speed": 1.0
      },
      "background": {
        "type": "color",
        "value": "#1a1a2e"
      }
    },
    {
      "character": {
        "type": "avatar",
        "avatar_id": "YOUR_AVATAR_ID",
        "avatar_style": "closeUp"
      },
      "voice": {
        "type": "text",
        "voice_id": "YOUR_VOICE_ID",
        "input_text": "And here is the second scene with a different style."
      },
      "background": {
        "type": "color",
        "value": "#16213e"
      }
    }
  ]
}
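The example request can also be generated from a list of scenes, which helps once a video grows toward the 50-scene limit. `build_studio_payload` is a hypothetical helper; it reproduces the structure of the example, omitting the explicit "speed": 1.0 because 1.0 is already the default.

```python
import json


def build_studio_payload(title: str, avatar_id: str, voice_id: str,
                         scenes: list[tuple[str, str, str]]) -> dict:
    """Turn (avatar_style, script, background_hex) tuples into a request body."""
    if not 1 <= len(scenes) <= 50:
        raise ValueError("a video needs 1 to 50 scenes")
    video_inputs = [
        {
            "character": {"type": "avatar", "avatar_id": avatar_id, "avatar_style": style},
            "voice": {"type": "text", "voice_id": voice_id, "input_text": script},
            "background": {"type": "color", "value": color},
        }
        for style, script, color in scenes
    ]
    return {
        "title": title,
        "caption": False,
        "dimension": {"width": 1920, "height": 1080},
        "video_inputs": video_inputs,
    }


payload = build_studio_payload(
    "My Studio Video", "YOUR_AVATAR_ID", "YOUR_VOICE_ID",
    [
        ("normal", "Welcome to the first scene of this video.", "#1a1a2e"),
        ("closeUp", "And here is the second scene with a different style.", "#16213e"),
    ],
)
print(json.dumps(payload, indent=2))
```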

Response

200 — Success

{
  "error": null,
  "data": {
    "video_id": "af273759c9xa47369e05418c69drq174"
  }
}
| Field | Type | Description |
|---|---|---|
| error | string or null | Error message if the request fails; null on success. |
| data.video_id | string | Unique identifier of the generated video. |
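A sketch of reading the response: check error first, then take data.video_id. `extract_video_id` is hypothetical and follows the field table, which documents error as a string or null; keeping the returned ID lets you match it against later callback notifications.

```python
import json


def extract_video_id(response_body: str) -> str:
    """Return data.video_id from a generate response, raising if error is set."""
    parsed = json.loads(response_body)
    if parsed.get("error") is not None:
        raise RuntimeError(f"video generation request failed: {parsed['error']}")
    return parsed["data"]["video_id"]
```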

Full API Reference

For complete details, see the Create Avatar Video (V2) endpoint documentation.