Studio API

Overview

POST https://api.heygen.com/v2/video/generate

Generates videos using the AI Studio backend, with support for avatars, voices, and dynamic backgrounds. You can create videos with either a photo avatar or a digital twin. This endpoint supports Avatar III and Avatar IV. Each video is composed of 1 to 50 scenes, and each scene defines its own avatar, voice, and background, plus optional on-screen text.

Request Body

Top-Level Parameters

Parameter	Type	Required	Description
`video_inputs`	array	Yes	Array of scene objects (1–50). Each scene defines an avatar, voice, background, and optional text.
`caption`	boolean	No	Enable captions in the video. Only supported for text-based voice input. Default: `false`.
`caption_mode`	string	No	`file_only` or `burn_in`. When set, takes precedence over `caption`.
`title`	string	No	Title of the video.
`callback_id`	string	No	Custom ID for callback/webhook tracking.
`callback_url`	string	No	URL to notify when video rendering is complete.
`dimension`	object	No	Custom output dimensions. Defaults to `1920×1080`. Width and height must be even, between `128` and `4096`.
`dimension.width`	integer	No	Width of the output video. Default: `1920`.
`dimension.height`	integer	No	Height of the output video. Default: `1080`.
`fps`	float	No	Output frame rate. Default: `25.0`.
`folder_id`	string	No	Folder ID where the video is stored.
`enable_watermark`	boolean	No	Apply the HeyGen watermark to the output. Default: `false`.
`subtitles`	object	No	Burned-in subtitle overlay settings (see Subtitles below).
`test`	boolean	No	Render in test mode (lower quality, no quota deduction). Default: `false`.
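
The `dimension` rules in the table (even values between `128` and `4096`, defaulting to `1920` and `1080`) can be checked on the client before a request is sent. The following is a minimal sketch; `validate_dimension` is a hypothetical helper name, not part of the API, and the server may apply further checks that are not documented here.

```python
def validate_dimension(width: int = 1920, height: int = 1080) -> dict:
    """Return a `dimension` object, or raise ValueError if a documented rule is broken."""
    for name, value in (("width", width), ("height", height)):
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"{name} must be an integer, got {value!r}")
        if not 128 <= value <= 4096:
            raise ValueError(f"{name} must be between 128 and 4096, got {value}")
        if value % 2 != 0:
            raise ValueError(f"{name} must be even, got {value}")
    return {"width": width, "height": height}
```

Calling `validate_dimension(1080, 1920)` returns a portrait `dimension` object, while `validate_dimension(1281, 720)` raises `ValueError` because the width is odd.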

Scene Object (`video_inputs[]`)

Each item in the `video_inputs` array represents a scene and can contain the following objects:

`character`

Defines the avatar or talking photo for the scene. The `type` field discriminates between `avatar` and `talking_photo`; some fields apply to one type only.

Parameter	Type	Required	Description
`type`	string	Yes	`avatar` or `talking_photo`.
`avatar_id`	string	Yes*	Unique avatar identifier. Required when `type` is `avatar`.
`talking_photo_id`	string	Yes*	Unique talking photo identifier. Required when `type` is `talking_photo`.
`avatar_style`	string	No	`normal`, `closeUp`, or `circle`. Applies only to `avatar` type. Default: `normal`.
`talking_photo_style`	string	No	`circle` or `square`. Applies only to `talking_photo` type.
`talking_style`	string	No	`stable` or `expressive`. Applies only to `talking_photo` type. Default: `stable`.
`expression`	string	No	`default` or `happy`. Applies only to `talking_photo` type. Default: `default`.
`scale`	float	No	Avatar size. Range: `0.0`–`5.0`. Default: `1.0`.
`offset`	object	No	Position adjustment: `{ "x": 0.0, "y": 0.0 }`. Each axis range: `-1.0`–`1.0`.
`fit`	string	No	How the character fits inside the scene: `contain` or `cover`. Default: `contain`.
`use_avatar_iv_model`	boolean	No	Whether to use Avatar IV. See Avatar engine default change.
`model`	string	No	Avatar IV model version (e.g. `4.3`, `4.3_turbo`, `4.3_turbo_edge`). Applies when `use_avatar_iv_model` is `true`.
`resolution`	string	No	Avatar IV output resolution: `720p`, `1080p`, or `4k`. Applies when `use_avatar_iv_model` is `true`.
`prompt`	string	No	Avatar IV motion prompt. Applies when `use_avatar_iv_model` is `true`.
`keep_original_prompt`	boolean	No	Preserve the motion prompt as-is (skip enhancement). Applies when `use_avatar_iv_model` is `true`.
`alpha`	float	No	Avatar IV expressiveness level. Range: `-0.5`–`0.5`. Lower values are more expressive.
`matting`	boolean	No	Remove the photo background.
`super_resolution`	boolean	No	Enhance image quality. Applies only to `talking_photo` type.
`circle_background_color`	string	No	Hex color for circle/square style background (e.g., `#FFFFFF`). Default: `#F6F6FC`.
`use_legacy_photo_avatar_model`	boolean	No	Force the deprecated Avatar III photo avatar model. Applies only to `talking_photo` type. Not recommended for new requests.
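
Because `type` decides which ID field is required and which style fields apply, a small builder can catch mismatched fields early. This sketch assumes a hypothetical helper, `build_character`, and covers only the type-specific fields and the `scale` range listed in the table above; it is not an official client.

```python
def build_character(kind: str, identifier: str, **options) -> dict:
    """Build a `character` object, rejecting fields that belong to the other type."""
    id_field = {"avatar": "avatar_id", "talking_photo": "talking_photo_id"}
    if kind not in id_field:
        raise ValueError(f"type must be 'avatar' or 'talking_photo', got {kind!r}")

    # Fields the table marks as applying to one type only.
    avatar_only = {"avatar_style"}
    photo_only = {
        "talking_photo_style",
        "talking_style",
        "expression",
        "super_resolution",
        "use_legacy_photo_avatar_model",
    }
    disallowed = photo_only if kind == "avatar" else avatar_only
    misplaced = sorted(disallowed.intersection(options))
    if misplaced:
        raise ValueError(f"{', '.join(misplaced)} does not apply to type {kind!r}")

    scale = options.get("scale", 1.0)
    if not 0.0 <= scale <= 5.0:
        raise ValueError(f"scale must be between 0.0 and 5.0, got {scale}")

    return {"type": kind, id_field[kind]: identifier, **options}
```

For example, `build_character("avatar", "YOUR_AVATAR_ID", avatar_style="closeUp")` produces the `character` object used in the second scene of the example request, while passing `expression="happy"` with `"avatar"` raises `ValueError`.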

`voice`

Defines what the avatar says in this scene.

Parameter	Type	Required	Description
`type`	string	Yes	`text`, `audio`, or `silence`.
`voice_id`	string	Yes*	Voice identifier. Required for `text` type.
`input_text`	string	Yes*	Text the avatar will speak. Required for `text` type.
`speed`	float	No	Voice speed. Range: `0.5`–`1.5`. Default: `1.0`. Applies to `text` type.
`pitch`	float	No	Voice pitch. Range: `-50.0`–`50.0`. Default: `0.0`. Applies to `text` type.
`volume`	float	No	Voice audio volume. Range: `0.0`–`1.0` (`0.0` silent, `1.0` full). Applies to `text` type.
`emotion`	string	No	`Excited`, `Friendly`, `Serious`, `Soothing`, `Broadcaster`, or `Angry`. Applies to `text` type.
`locale`	string	No	Voice accent/locale (e.g., `en-US`, `pt-BR`). Applies to `text` type.
`audio_url`	string	Yes*	URL of uploaded audio. Required for `audio` type (provide either this or `audio_asset_id`).
`audio_asset_id`	string	Yes*	Asset ID of uploaded audio. Required for `audio` type (provide either this or `audio_url`).
`duration`	float	No	Silence duration in seconds. Range: `1.0`–`100.0`. Default: `1.0`. Applies to `silence` type.
`elevenlabs_settings`	object	No	Advanced ElevenLabs voice settings (see below). Applies to `text` type.

ElevenLabs Settings:

Parameter	Type	Description
`model`	string	ElevenLabs model: `eleven_monolingual_v1`, `eleven_multilingual_v1`, `eleven_multilingual_v2`, `eleven_turbo_v2`, `eleven_turbo_v2_5`, or `eleven_v3`.
`similarity_boost`	float	Similarity to original voice. Range: `0.0`–`1.0`.
`stability`	float	Voice consistency. Range: `0.0`–`1.0`. For `eleven_v3`, default is `1.0` and allowed values are `0`, `0.5`, `1.0`.
`style`	float	Style intensity. Range: `0.0`–`1.0`.
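
The three voice types carry different required fields and ranges, which can be summarized in code. The sketch below uses a hypothetical `build_voice` helper and assumes that "provide either this or" means at least one of `audio_url` and `audio_asset_id` must be present; the source does not say whether supplying both is accepted.

```python
def build_voice(voice_type: str, **fields) -> dict:
    """Build a `voice` object, enforcing the documented per-type requirements."""

    def check_range(name: str, low: float, high: float) -> None:
        value = fields.get(name)
        if value is not None and not low <= value <= high:
            raise ValueError(f"{name} must be between {low} and {high}, got {value}")

    if voice_type == "text":
        missing = [key for key in ("voice_id", "input_text") if not fields.get(key)]
        if missing:
            raise ValueError(f"text voice is missing: {', '.join(missing)}")
        check_range("speed", 0.5, 1.5)
        check_range("pitch", -50.0, 50.0)
        check_range("volume", 0.0, 1.0)
        settings = fields.get("elevenlabs_settings") or {}
        # eleven_v3 only allows three stability values; 1.0 is its default.
        if settings.get("model") == "eleven_v3":
            if settings.get("stability", 1.0) not in (0, 0.5, 1.0):
                raise ValueError("eleven_v3 stability must be 0, 0.5, or 1.0")
    elif voice_type == "audio":
        # Assumption: at least one of the two audio sources must be given.
        if not (fields.get("audio_url") or fields.get("audio_asset_id")):
            raise ValueError("audio voice needs audio_url or audio_asset_id")
    elif voice_type == "silence":
        check_range("duration", 1.0, 100.0)
    else:
        raise ValueError(f"type must be text, audio, or silence, got {voice_type!r}")

    return {"type": voice_type, **fields}
```

A call such as `build_voice("silence", duration=2.5)` yields `{"type": "silence", "duration": 2.5}`, and `build_voice("text", voice_id="YOUR_VOICE_ID")` raises `ValueError` because `input_text` is missing.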

`background`

Defines the scene background.

Parameter	Type	Required	Description
`type`	string	Yes	`color`, `image`, or `video`.
`value`	string	Yes*	Hex color code (e.g., `#FFFFFF`). Required for `color` type. Default: `#f6f6fc`.
`url`	string	Yes*	URL of uploaded image/video. Required for `image`/`video` type (provide either this or the corresponding asset ID).
`image_asset_id`	string	Yes*	Asset ID for image background. Provide either this or `url`.
`video_asset_id`	string	Yes*	Asset ID for video background. Provide either this or `url`.
`play_style`	string	Yes*	Playback mode: `freeze`, `loop`, or `fit_to_scene`. Required for `video` type.
`fit`	string	No	How the background fits the screen: `cover`, `contain`, `crop`, or `none`. Default: `cover`.
`volume`	float	No	Volume for video backgrounds. Range: `0.0`–`1.0`. Applies to `video` type.
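
The conditional requirements for backgrounds (a source for `image` and `video`, plus `play_style` for `video`) can be expressed as a short check. This is a sketch with a hypothetical `build_background` helper; it fills in the documented default color when `value` is omitted, which is a client-side choice rather than confirmed server behavior.

```python
def build_background(bg_type: str, **fields) -> dict:
    """Build a `background` object, enforcing the documented per-type requirements."""
    if bg_type == "color":
        # Client-side choice: fall back to the documented default color.
        fields.setdefault("value", "#f6f6fc")
    elif bg_type in ("image", "video"):
        asset_key = f"{bg_type}_asset_id"  # image_asset_id or video_asset_id
        if not (fields.get("url") or fields.get(asset_key)):
            raise ValueError(f"{bg_type} background needs url or {asset_key}")
        if bg_type == "video":
            if fields.get("play_style") not in ("freeze", "loop", "fit_to_scene"):
                raise ValueError("video background needs a valid play_style")
    else:
        raise ValueError(f"type must be color, image, or video, got {bg_type!r}")
    return {"type": bg_type, **fields}
```

For instance, `build_background("color", value="#1a1a2e")` matches the first scene of the example request, and a `video` background without `play_style` is rejected.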

`text`

Optional on-screen text overlay.

Parameter	Type	Required	Description
`type`	string	Yes	Must be `text`.
`text`	string	Yes	Text content to display.
`font_family`	string	No	Font family. Default: `Arial`.
`font_size`	float	No	Font size in points. Default: `24.0`.
`font_weight`	string	No	Font weight (e.g., `bold`). Default: `bold`.
`color`	string	No	Text color in hex (e.g., `#FFFFFF`). Default: `#FFFFFF`.
`position`	object	No	Position offset: `{ "x": 0.0, "y": 0.0 }`. Each axis range: `-1.0`–`1.0`.
`text_align`	string	No	`left`, `center`, or `right`. Default: `center`.
`line_height`	float	Yes	Line height / spacing between lines. Must be `> 0`.
`width`	float	No	Text container width.
`height`	float	No	Text container height.
`rotate`	float	No	Rotation angle in degrees. Range: `0`–`360`.
`scale_x`	float	No	Horizontal scale. Must be `>= 0`.
`scale_y`	float	No	Vertical scale. Must be `>= 0`.
`transform_scale_x`	float	No	Additional horizontal transform scale. Must be `>= 0`.
`transform_scale_y`	float	No	Additional vertical transform scale. Must be `>= 0`.

`audio`

Optional secondary audio track for this scene (in addition to voice). Useful for background music or sound effects.

Parameter	Type	Required	Description
`audio_url`	string	Yes*	URL of the uploaded audio. Provide either this or `audio_asset_id`.
`audio_asset_id`	string	Yes*	Asset ID of the uploaded audio. Provide either this or `audio_url`.
`volume`	float	No	Audio volume. Range: `0.0`–`1.0`. Default: `1.0`.
`duration`	float	No	Audio duration in seconds.
`timeline`	object	No	Timeline placement: `{ "start": 0.0, "duration": 0.0 }`.

Subtitles

Parameter	Type	Required	Description
`preset_name`	string	Yes	Subtitle preset name.
`alignment`	integer	No	Subtitle alignment code. Default: `2`.
`disable_highlight`	boolean	No	Override the preset’s highlight style. Default: `false`.
`font_size`	integer	No	Font size override.
`position`	object	No	Subtitle position: `{ "x": 0.0, "y": 0.0 }`.

Example Request

{
  "title": "My Studio Video",
  "caption": false,
  "dimension": {
    "width": 1920,
    "height": 1080
  },
  "video_inputs": [
    {
      "character": {
        "type": "avatar",
        "avatar_id": "YOUR_AVATAR_ID",
        "avatar_style": "normal"
      },
      "voice": {
        "type": "text",
        "voice_id": "YOUR_VOICE_ID",
        "input_text": "Welcome to the first scene of this video.",
        "speed": 1.0
      },
      "background": {
        "type": "color",
        "value": "#1a1a2e"
      }
    },
    {
      "character": {
        "type": "avatar",
        "avatar_id": "YOUR_AVATAR_ID",
        "avatar_style": "closeUp"
      },
      "voice": {
        "type": "text",
        "voice_id": "YOUR_VOICE_ID",
        "input_text": "And here is the second scene with a different style."
      },
      "background": {
        "type": "color",
        "value": "#16213e"
      }
    }
  ]
}
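
A request body like the one above could be sent from Python using only the standard library. The sketch below builds the request without sending it. The `X-Api-Key` header name is an assumption, since this page does not describe authentication (see the Auth documentation), and `build_generate_request` is a hypothetical helper.

```python
import json
import urllib.request

API_URL = "https://api.heygen.com/v2/video/generate"


def build_generate_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Prepare a POST request for the generate endpoint without sending it."""
    scenes = payload.get("video_inputs", [])
    if not 1 <= len(scenes) <= 50:
        raise ValueError(f"video_inputs must contain 1 to 50 scenes, got {len(scenes)}")
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-Api-Key": api_key,  # Assumed header name; confirm in the Auth docs.
        },
        method="POST",
    )
```

To submit the request, pass the returned object to `urllib.request.urlopen` and read the JSON body of the response.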

Response

200 — Success

{
  "error": null,
  "data": {
    "video_id": "af273759c9xa47369e05418c69drq174"
  }
}

Field	Type	Description
`error`	string | null	Error message if the request fails; `null` on success.
`data.video_id`	string	Unique identifier of the generated video.
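
Given the two fields above, a caller can treat a non-null `error` as a failure and otherwise read `data.video_id`. This is a minimal sketch with a hypothetical `extract_video_id` helper; the exact shape of error responses is not documented on this page, so the error value is passed through as-is.

```python
import json


def extract_video_id(body: str) -> str:
    """Return `data.video_id` from a response body, or raise if `error` is set."""
    parsed = json.loads(body)
    if parsed.get("error") is not None:
        raise RuntimeError(f"video generation failed: {parsed['error']}")
    return parsed["data"]["video_id"]
```

Applied to the success example above, the helper returns the string stored in `video_id`.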

Full API Reference

For complete details, see the Create Avatar Video (V2) endpoint documentation.
