# Studio API

> Embed the HeyGen Studio video editor in your application via the Studio API. Let your users record, edit, and render avatar videos inside your product.

## Overview

`POST https://api.heygen.com/v2/video/generate`

Generates videos using the AI Studio backend with support for avatars, voices, and dynamic backgrounds. You can create videos using either your photo avatar or digital twin. This endpoint supports Avatar III and Avatar IV.

Each video is composed of one or more **scenes** (up to 50), where each scene defines its own avatar, voice, background, and on-screen text.

## Request Body

### Top-Level Parameters

| Parameter          | Type    | Required | Description                                                                                                 |
| ------------------ | ------- | -------- | ----------------------------------------------------------------------------------------------------------- |
| `video_inputs`     | array   | Yes      | Array of scene objects (1–50). Each scene defines an avatar, voice, background, and optional text.          |
| `caption`          | boolean | No       | Enable captions in the video. Only supported for text-based voice input. Default: `false`.                  |
| `caption_mode`     | string  | No       | `file_only` or `burn_in`. When set, takes precedence over `caption`.                                        |
| `title`            | string  | No       | Title of the video.                                                                                         |
| `callback_id`      | string  | No       | Custom ID for callback/webhook tracking.                                                                    |
| `callback_url`     | string  | No       | URL to notify when video rendering is complete.                                                             |
| `dimension`        | object  | No       | Custom output dimensions. Defaults to `1920×1080`. Width and height must be even, between `128` and `4096`. |
| `dimension.width`  | integer | No       | Width of the output video. Default: `1920`.                                                                 |
| `dimension.height` | integer | No       | Height of the output video. Default: `1080`.                                                                |
| `fps`              | float   | No       | Output frame rate. Default: `25.0`.                                                                         |
| `folder_id`        | string  | No       | Folder ID where the video is stored.                                                                        |
| `enable_watermark` | boolean | No       | Apply the HeyGen watermark to the output. Default: `false`.                                                 |
| `subtitles`        | object  | No       | Burned-in subtitle overlay settings (see [Subtitles](#subtitles) below).                                    |
| `test`             | boolean | No       | Render in test mode (lower quality, no quota deduction). Default: `false`.                                  |
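
As a rough client-side check, the sketch below validates the documented top-level constraints before a request is sent: 1 to 50 scenes, even output dimensions between 128 and 4096, and the two `caption_mode` values. The function name `validate_top_level` is hypothetical, the bounds are assumed to be inclusive, and the server remains the authority on what it accepts.

```python
from typing import Any

# Limits taken from the top-level parameter table (bounds assumed inclusive).
MAX_SCENES = 50
MIN_DIM, MAX_DIM = 128, 4096
DEFAULT_DIM = {"width": 1920, "height": 1080}


def validate_top_level(payload: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a /v2/video/generate payload (empty if none)."""
    errors: list[str] = []

    scenes = payload.get("video_inputs")
    if not isinstance(scenes, list) or not 1 <= len(scenes) <= MAX_SCENES:
        errors.append("video_inputs must be a list of 1-50 scene objects")

    dim = payload.get("dimension")
    if dim is not None:
        for axis in ("width", "height"):
            # A missing axis falls back to the documented default.
            value = dim.get(axis, DEFAULT_DIM[axis])
            if not isinstance(value, int) or value % 2 or not MIN_DIM <= value <= MAX_DIM:
                errors.append(f"dimension.{axis} must be an even integer in [128, 4096]")

    mode = payload.get("caption_mode")
    if mode is not None and mode not in ("file_only", "burn_in"):
        errors.append("caption_mode must be 'file_only' or 'burn_in'")

    return errors
```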

### Scene Object (`video_inputs[]`)

Each item in the `video_inputs` array represents a scene and can contain the following:

#### `character`

Defines the avatar or talking photo for the scene. The `type` field discriminates between `avatar` and `talking_photo`; some fields apply to one type only.

| Parameter                       | Type    | Required | Description                                                                                                               |
| ------------------------------- | ------- | -------- | ------------------------------------------------------------------------------------------------------------------------- |
| `type`                          | string  | Yes      | `avatar` or `talking_photo`.                                                                                              |
| `avatar_id`                     | string  | Yes\*    | Unique avatar identifier. *Required when `type` is `avatar`.*                                                             |
| `talking_photo_id`              | string  | Yes\*    | Unique talking photo identifier. *Required when `type` is `talking_photo`.*                                               |
| `avatar_style`                  | string  | No       | `normal`, `closeUp`, or `circle`. Applies only to `avatar` type. Default: `normal`.                                       |
| `talking_photo_style`           | string  | No       | `circle` or `square`. Applies only to `talking_photo` type.                                                               |
| `talking_style`                 | string  | No       | `stable` or `expressive`. Applies only to `talking_photo` type. Default: `stable`.                                        |
| `expression`                    | string  | No       | `default` or `happy`. Applies only to `talking_photo` type. Default: `default`.                                           |
| `scale`                         | float   | No       | Avatar size. Range: `0.0`–`5.0`. Default: `1.0`.                                                                          |
| `offset`                        | object  | No       | Position adjustment: `{ "x": 0.0, "y": 0.0 }`. Each axis range: `-1.0`–`1.0`.                                             |
| `fit`                           | string  | No       | How the character fits inside the scene: `contain` or `cover`. Default: `contain`.                                        |
| `use_avatar_iv_model`           | boolean | No       | Whether to use Avatar IV. See [Avatar engine default change](#avatar-engine-default-change).                              |
| `model`                         | string  | No       | Avatar IV model version (e.g. `4.3`, `4.3_turbo`, `4.3_turbo_edge`). Applies when `use_avatar_iv_model` is `true`.        |
| `resolution`                    | string  | No       | Avatar IV output resolution: `720p`, `1080p`, or `4k`. Applies when `use_avatar_iv_model` is `true`.                      |
| `prompt`                        | string  | No       | Avatar IV motion prompt. Applies when `use_avatar_iv_model` is `true`.                                                    |
| `keep_original_prompt`          | boolean | No       | Preserve the motion prompt as-is (skip enhancement). Applies when `use_avatar_iv_model` is `true`.                        |
| `alpha`                         | float   | No       | Avatar IV expressiveness level. Range: `-0.5`–`0.5`. Lower values are more expressive.                                    |
| `matting`                       | boolean | No       | Remove the photo background.                                                                                              |
| `super_resolution`              | boolean | No       | Enhance image quality. Applies only to `talking_photo` type.                                                              |
| `circle_background_color`       | string  | No       | Hex color for circle/square style background (e.g., `#FFFFFF`). Default: `#F6F6FC`.                                       |
| `use_legacy_photo_avatar_model` | boolean | No       | Force the deprecated Avatar 3 photo avatar model. Applies only to `talking_photo` type. Not recommended for new requests. |
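
Because `type` decides which ID field is required and which styling fields apply, a small builder can catch mismatches early. This is a minimal sketch, assuming that rejecting a field documented for the other character type is the behavior you want (the API itself may simply ignore such fields) and that the numeric ranges for `scale`, `offset`, and `alpha` are inclusive. `make_character` is a hypothetical helper, not part of any HeyGen SDK.

```python
from typing import Any

# Fields the table marks as applying to only one character type.
AVATAR_ONLY = {"avatar_style"}
TALKING_PHOTO_ONLY = {
    "talking_photo_style", "talking_style", "expression",
    "super_resolution", "use_legacy_photo_avatar_model",
}


def make_character(kind: str, character_id: str, **options: Any) -> dict[str, Any]:
    """Build a `character` object, enforcing the type discriminator and numeric ranges."""
    if kind == "avatar":
        character: dict[str, Any] = {"type": "avatar", "avatar_id": character_id}
        wrong = set(options) & TALKING_PHOTO_ONLY
    elif kind == "talking_photo":
        character = {"type": "talking_photo", "talking_photo_id": character_id}
        wrong = set(options) & AVATAR_ONLY
    else:
        raise ValueError("type must be 'avatar' or 'talking_photo'")
    if wrong:
        raise ValueError(f"fields not valid for {kind}: {sorted(wrong)}")

    # Documented numeric ranges (assumed inclusive).
    if not 0.0 <= options.get("scale", 1.0) <= 5.0:
        raise ValueError("scale must be in [0.0, 5.0]")
    offset = options.get("offset", {})
    if any(not -1.0 <= offset.get(axis, 0.0) <= 1.0 for axis in ("x", "y")):
        raise ValueError("offset.x and offset.y must be in [-1.0, 1.0]")
    if not -0.5 <= options.get("alpha", 0.0) <= 0.5:
        raise ValueError("alpha must be in [-0.5, 0.5]")

    character.update(options)
    return character
```

Avatar IV options pass through the same way, for example `make_character("avatar", "YOUR_AVATAR_ID", use_avatar_iv_model=True, resolution="1080p")`.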

#### `voice`

Defines what the avatar says in this scene.

| Parameter             | Type   | Required | Description                                                                                      |
| --------------------- | ------ | -------- | ------------------------------------------------------------------------------------------------ |
| `type`                | string | Yes      | `text`, `audio`, or `silence`.                                                                   |
| `voice_id`            | string | Yes\*    | Voice identifier. *Required for `text` type.*                                                    |
| `input_text`          | string | Yes\*    | Text the avatar will speak. *Required for `text` type.*                                          |
| `speed`               | float  | No       | Voice speed. Range: `0.5`–`1.5`. Default: `1.0`. Applies to `text` type.                         |
| `pitch`               | float  | No       | Voice pitch. Range: `-50.0`–`50.0`. Default: `0.0`. Applies to `text` type.                      |
| `volume`              | float  | No       | Voice audio volume. Range: `0.0`–`1.0` (`0.0` silent, `1.0` full). Applies to `text` type.       |
| `emotion`             | string | No       | `Excited`, `Friendly`, `Serious`, `Soothing`, `Broadcaster`, or `Angry`. Applies to `text` type. |
| `locale`              | string | No       | Voice accent/locale (e.g., `en-US`, `pt-BR`). Applies to `text` type.                            |
| `audio_url`           | string | Yes\*    | URL of uploaded audio. *Required for `audio` type (provide either this or `audio_asset_id`).*    |
| `audio_asset_id`      | string | Yes\*    | Asset ID of uploaded audio. *Required for `audio` type (provide either this or `audio_url`).*    |
| `duration`            | float  | No       | Silence duration in seconds. Range: `1.0`–`100.0`. Default: `1.0`. Applies to `silence` type.    |
| `elevenlabs_settings` | object | No       | Advanced ElevenLabs voice settings (see below). Applies to `text` type.                          |
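
The three voice types need different fields, so one constructor per type keeps each payload minimal. The sketch assumes the ranges in the table are inclusive; the helper names `text_voice`, `audio_voice`, and `silence_voice` are illustrative only. `text_voice` always sends `speed` and `pitch` (at their documented defaults unless overridden) and passes other text-only options, such as `emotion` or `locale`, through unchanged.

```python
from typing import Any, Optional


def text_voice(voice_id: str, input_text: str, speed: float = 1.0,
               pitch: float = 0.0, **extra: Any) -> dict[str, Any]:
    """A `text` voice; speed and pitch ranges come from the voice table."""
    if not 0.5 <= speed <= 1.5:
        raise ValueError("speed must be in [0.5, 1.5]")
    if not -50.0 <= pitch <= 50.0:
        raise ValueError("pitch must be in [-50.0, 50.0]")
    return {"type": "text", "voice_id": voice_id, "input_text": input_text,
            "speed": speed, "pitch": pitch, **extra}


def audio_voice(audio_url: Optional[str] = None,
                audio_asset_id: Optional[str] = None) -> dict[str, Any]:
    """An `audio` voice; at least one of audio_url or audio_asset_id is required."""
    if audio_url is None and audio_asset_id is None:
        raise ValueError("provide audio_url or audio_asset_id")
    voice: dict[str, Any] = {"type": "audio"}
    if audio_url is not None:
        voice["audio_url"] = audio_url
    if audio_asset_id is not None:
        voice["audio_asset_id"] = audio_asset_id
    return voice


def silence_voice(duration: float = 1.0) -> dict[str, Any]:
    """A `silence` voice; duration is in seconds."""
    if not 1.0 <= duration <= 100.0:
        raise ValueError("duration must be in [1.0, 100.0]")
    return {"type": "silence", "duration": duration}
```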

**ElevenLabs Settings:**

| Parameter          | Type   | Description                                                                                                                                            |
| ------------------ | ------ | ------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `model`            | string | ElevenLabs model: `eleven_monolingual_v1`, `eleven_multilingual_v1`, `eleven_multilingual_v2`, `eleven_turbo_v2`, `eleven_turbo_v2_5`, or `eleven_v3`. |
| `similarity_boost` | float  | Similarity to original voice. Range: `0.0`–`1.0`.                                                                                                      |
| `stability`        | float  | Voice consistency. Range: `0.0`–`1.0`. For `eleven_v3`, the default is `1.0` and only `0.0`, `0.5`, and `1.0` are allowed.                             |
| `style`            | float  | Style intensity. Range: `0.0`–`1.0`.                                                                                                                   |
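
The `eleven_v3` restriction on `stability` is easy to miss, so a validator might special-case it as below. This is a sketch with a hypothetical function name; the model list is copied from the table and may go stale as models are added.

```python
from typing import Any

# Models listed in the ElevenLabs settings table.
ELEVENLABS_MODELS = {
    "eleven_monolingual_v1", "eleven_multilingual_v1", "eleven_multilingual_v2",
    "eleven_turbo_v2", "eleven_turbo_v2_5", "eleven_v3",
}


def check_elevenlabs_settings(settings: dict[str, Any]) -> list[str]:
    """Return problems with an `elevenlabs_settings` object (empty list if valid)."""
    errors: list[str] = []
    model = settings.get("model")
    if model is not None and model not in ELEVENLABS_MODELS:
        errors.append(f"unknown model: {model}")
    for key in ("similarity_boost", "stability", "style"):
        value = settings.get(key)
        if value is not None and not 0.0 <= value <= 1.0:
            errors.append(f"{key} must be in [0.0, 1.0]")
    # eleven_v3 accepts only three discrete stability values (default 1.0).
    if model == "eleven_v3" and settings.get("stability", 1.0) not in (0.0, 0.5, 1.0):
        errors.append("stability for eleven_v3 must be 0.0, 0.5, or 1.0")
    return errors
```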

#### `background`

Defines the scene background.

| Parameter        | Type   | Required | Description                                                                                                           |
| ---------------- | ------ | -------- | --------------------------------------------------------------------------------------------------------------------- |
| `type`           | string | Yes      | `color`, `image`, or `video`.                                                                                         |
| `value`          | string | Yes\*    | Hex color code (e.g., `#FFFFFF`). *Required for `color` type.* Default: `#f6f6fc`.                                    |
| `url`            | string | Yes\*    | URL of uploaded image/video. *Required for `image`/`video` type (provide either this or the corresponding asset ID).* |
| `image_asset_id` | string | Yes\*    | Asset ID for image background. *Provide either this or `url`.*                                                        |
| `video_asset_id` | string | Yes\*    | Asset ID for video background. *Provide either this or `url`.*                                                        |
| `play_style`     | string | Yes\*    | Playback mode: `freeze`, `loop`, or `fit_to_scene`. *Required for `video` type.*                                      |
| `fit`            | string | No       | How the background fits the screen: `cover`, `contain`, `crop`, or `none`. Default: `cover`.                          |
| `volume`         | float  | No       | Volume for video backgrounds. Range: `0.0`–`1.0`. Applies to `video` type.                                            |
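
The background rules combine a type switch, an either/or source, and a field (`play_style`) that is required only for video. A hypothetical `make_background` helper could encode them as follows. It assumes `asset_id` maps to `image_asset_id` or `video_asset_id` according to the type, fills in the documented default color when none is given, and omits `fit` for color backgrounds on the assumption that it has no effect there.

```python
from typing import Any, Optional

PLAY_STYLES = {"freeze", "loop", "fit_to_scene"}
FITS = {"cover", "contain", "crop", "none"}


def make_background(kind: str, *, value: Optional[str] = None, url: Optional[str] = None,
                    asset_id: Optional[str] = None, play_style: Optional[str] = None,
                    fit: str = "cover") -> dict[str, Any]:
    """Build a `background` object following the required-field rules in the table."""
    if kind == "color":
        # Fall back to the documented default color.
        return {"type": "color", "value": value or "#f6f6fc"}
    if kind not in ("image", "video"):
        raise ValueError("type must be 'color', 'image', or 'video'")
    if url is None and asset_id is None:
        raise ValueError("image/video backgrounds need url or an asset ID")
    if fit not in FITS:
        raise ValueError(f"fit must be one of {sorted(FITS)}")

    background: dict[str, Any] = {"type": kind, "fit": fit}
    if url is not None:
        background["url"] = url
    if asset_id is not None:
        # image_asset_id or video_asset_id, depending on the type.
        background[f"{kind}_asset_id"] = asset_id
    if kind == "video":
        if play_style not in PLAY_STYLES:
            raise ValueError("video backgrounds need play_style: freeze, loop, or fit_to_scene")
        background["play_style"] = play_style
    return background
```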

#### `text`

Optional on-screen text overlay.

| Parameter           | Type   | Required | Description                                                               |
| ------------------- | ------ | -------- | ------------------------------------------------------------------------- |
| `type`              | string | Yes      | Must be `text`.                                                           |
| `text`              | string | Yes      | Text content to display.                                                  |
| `font_family`       | string | No       | Font family. Default: `Arial`.                                            |
| `font_size`         | float  | No       | Font size in points. Default: `24.0`.                                     |
| `font_weight`       | string | No       | Font weight (e.g., `bold`). Default: `bold`.                              |
| `color`             | string | No       | Text color in hex (e.g., `#FFFFFF`). Default: `#FFFFFF`.                  |
| `position`          | object | No       | Position offset: `{ "x": 0.0, "y": 0.0 }`. Each axis range: `-1.0`–`1.0`. |
| `text_align`        | string | No       | `left`, `center`, or `right`. Default: `center`.                          |
| `line_height`       | float  | Yes      | Line height / spacing between lines. Must be `> 0`.                       |
| `width`             | float  | No       | Text container width.                                                     |
| `height`            | float  | No       | Text container height.                                                    |
| `rotate`            | float  | No       | Rotation angle in degrees. Range: `0`–`360`.                              |
| `scale_x`           | float  | No       | Horizontal scale. Must be `>= 0`.                                         |
| `scale_y`           | float  | No       | Vertical scale. Must be `>= 0`.                                           |
| `transform_scale_x` | float  | No       | Additional horizontal transform scale. Must be `>= 0`.                    |
| `transform_scale_y` | float  | No       | Additional vertical transform scale. Must be `>= 0`.                      |
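
Note that `line_height` is the only required styling field for a text overlay. The sketch below (a hypothetical `make_text_overlay`) makes it a mandatory argument, checks the documented numeric constraints, and passes the remaining styling keys through as-is.

```python
from typing import Any

SCALE_KEYS = ("scale_x", "scale_y", "transform_scale_x", "transform_scale_y")


def make_text_overlay(text: str, line_height: float, **style: Any) -> dict[str, Any]:
    """Build a `text` overlay; line_height is required and must be positive."""
    if line_height <= 0:
        raise ValueError("line_height must be > 0")
    if not 0.0 <= style.get("rotate", 0.0) <= 360.0:
        raise ValueError("rotate must be in [0, 360]")
    for key in SCALE_KEYS:
        if style.get(key, 0.0) < 0:
            raise ValueError(f"{key} must be >= 0")
    position = style.get("position", {})
    if any(not -1.0 <= position.get(axis, 0.0) <= 1.0 for axis in ("x", "y")):
        raise ValueError("position.x and position.y must be in [-1.0, 1.0]")
    if style.get("text_align", "center") not in ("left", "center", "right"):
        raise ValueError("text_align must be left, center, or right")
    return {"type": "text", "text": text, "line_height": line_height, **style}
```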

#### `audio`

Optional secondary audio track for this scene (in addition to `voice`). Useful for background music or sound effects.

| Parameter        | Type   | Required | Description                                                           |
| ---------------- | ------ | -------- | --------------------------------------------------------------------- |
| `audio_url`      | string | Yes\*    | URL of the uploaded audio. *Provide either this or `audio_asset_id`.* |
| `audio_asset_id` | string | Yes\*    | Asset ID of the uploaded audio. *Provide either this or `audio_url`.* |
| `volume`         | float  | No       | Audio volume. Range: `0.0`–`1.0`. Default: `1.0`.                     |
| `duration`       | float  | No       | Audio duration in seconds.                                            |
| `timeline`       | object | No       | Timeline placement: `{ "start": 0.0, "duration": 0.0 }`.              |
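
A secondary track such as background music could be built as follows. The helper name `make_scene_audio` is hypothetical, and the non-negative check on `timeline` is an assumption: the table does not state units or bounds for `start` and `duration`.

```python
from typing import Any, Optional


def make_scene_audio(*, audio_url: Optional[str] = None, audio_asset_id: Optional[str] = None,
                     volume: float = 1.0,
                     timeline: Optional[dict[str, float]] = None) -> dict[str, Any]:
    """Build a secondary `audio` track (e.g. background music) for one scene."""
    if audio_url is None and audio_asset_id is None:
        raise ValueError("provide audio_url or audio_asset_id")
    if not 0.0 <= volume <= 1.0:
        raise ValueError("volume must be in [0.0, 1.0]")
    track: dict[str, Any] = {"volume": volume}
    if audio_url is not None:
        track["audio_url"] = audio_url
    if audio_asset_id is not None:
        track["audio_asset_id"] = audio_asset_id
    if timeline is not None:
        # Assumption: start and duration are non-negative offsets within the scene.
        if timeline.get("start", 0.0) < 0 or timeline.get("duration", 0.0) < 0:
            raise ValueError("timeline start and duration must be >= 0")
        track["timeline"] = timeline
    return track
```

The returned dict goes under the scene's `audio` key, next to `character`, `voice`, and `background`.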

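### Subtitles

Settings for the top-level `subtitles` object, which adds a burned-in subtitle overlay to the video.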

| Parameter           | Type    | Required | Description                                              |
| ------------------- | ------- | -------- | -------------------------------------------------------- |
| `preset_name`       | string  | Yes      | Subtitle preset name.                                    |
| `alignment`         | integer | No       | Subtitle alignment code. Default: `2`.                   |
| `disable_highlight` | boolean | No       | Disable the preset's highlight style when `true`. Default: `false`. |
| `font_size`         | integer | No       | Font size override.                                      |
| `position`          | object  | No       | Subtitle position: `{ "x": 0.0, "y": 0.0 }`.             |
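
Only `preset_name` is required for `subtitles`. A small builder keeps the documented defaults explicit; it is a sketch with a hypothetical function name, and callers are assumed to supply a real preset name in place of a placeholder such as `YOUR_PRESET_NAME`.

```python
from typing import Any, Optional


def make_subtitles(preset_name: str, *, alignment: int = 2, disable_highlight: bool = False,
                   font_size: Optional[int] = None) -> dict[str, Any]:
    """Build the top-level `subtitles` object; only preset_name is required."""
    if not preset_name:
        raise ValueError("preset_name is required")
    subtitles: dict[str, Any] = {
        "preset_name": preset_name,
        "alignment": alignment,  # documented default code is 2
        "disable_highlight": disable_highlight,
    }
    if font_size is not None:
        subtitles["font_size"] = font_size
    return subtitles
```

The result goes under the top-level `subtitles` key of the request body, alongside `video_inputs`.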

## Example Request

```json
{
  "title": "My Studio Video",
  "caption": false,
  "dimension": {
    "width": 1920,
    "height": 1080
  },
  "video_inputs": [
    {
      "character": {
        "type": "avatar",
        "avatar_id": "YOUR_AVATAR_ID",
        "avatar_style": "normal"
      },
      "voice": {
        "type": "text",
        "voice_id": "YOUR_VOICE_ID",
        "input_text": "Welcome to the first scene of this video.",
        "speed": 1.0
      },
      "background": {
        "type": "color",
        "value": "#1a1a2e"
      }
    },
    {
      "character": {
        "type": "avatar",
        "avatar_id": "YOUR_AVATAR_ID",
        "avatar_style": "closeUp"
      },
      "voice": {
        "type": "text",
        "voice_id": "YOUR_VOICE_ID",
        "input_text": "And here is the second scene with a different style."
      },
      "background": {
        "type": "color",
        "value": "#16213e"
      }
    }
  ]
}
```
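
To send the example above from Python, a standard-library sketch follows. It assumes HeyGen authenticates with an `X-Api-Key` header carrying your API key (check the full API reference for the authoritative scheme); `build_generate_request`, `two_scene_payload`, and `send_generate_request` are hypothetical names. The optional `speed` from the first scene is omitted because `1.0` is already the default.

```python
import json
import urllib.request
from typing import Any

API_URL = "https://api.heygen.com/v2/video/generate"


def two_scene_payload(avatar_id: str, voice_id: str) -> dict[str, Any]:
    """The two-scene example request, with the avatar and voice IDs parameterised."""
    scenes = []
    for style, line, color in [
        ("normal", "Welcome to the first scene of this video.", "#1a1a2e"),
        ("closeUp", "And here is the second scene with a different style.", "#16213e"),
    ]:
        scenes.append({
            "character": {"type": "avatar", "avatar_id": avatar_id, "avatar_style": style},
            "voice": {"type": "text", "voice_id": voice_id, "input_text": line},
            "background": {"type": "color", "value": color},
        })
    return {"title": "My Studio Video", "caption": False,
            "dimension": {"width": 1920, "height": 1080}, "video_inputs": scenes}


def build_generate_request(payload: dict[str, Any], api_key: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST to the generate endpoint."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "X-Api-Key": api_key,  # assumption: header-based API key auth
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )


def send_generate_request(req: urllib.request.Request) -> dict[str, Any]:
    """Send the prepared request and return the decoded JSON envelope."""
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Building the request is kept separate from sending it so the payload can be inspected or logged without a network call; `send_generate_request(build_generate_request(two_scene_payload("YOUR_AVATAR_ID", "YOUR_VOICE_ID"), api_key))` performs the actual POST.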

## Response

### 200 — Success

```json
{
  "error": null,
  "data": {
    "video_id": "af273759c9xa47369e05418c69drq174"
  }
}
```

| Field           | Type           | Description                                            |
| --------------- | -------------- | ------------------------------------------------------ |
| `error`         | string \| null | Error message if the request fails; `null` on success. |
| `data.video_id` | string         | Unique identifier of the generated video.              |
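
Since `error` is `null` on success and `data.video_id` carries the result, a client can unwrap the envelope in one place. A minimal sketch, with a hypothetical `HeyGenAPIError` exception type and `extract_video_id` helper:

```python
import json
from typing import Any


class HeyGenAPIError(RuntimeError):
    """Raised when the response envelope carries a non-null `error`."""


def extract_video_id(body: str) -> str:
    """Return data.video_id from a generate response, or raise if `error` is set."""
    envelope: dict[str, Any] = json.loads(body)
    if envelope.get("error") is not None:
        raise HeyGenAPIError(str(envelope["error"]))
    return envelope["data"]["video_id"]
```

The returned ID identifies the video being rendered; if you set `callback_url`, the completion notification is sent there.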

## Full API Reference

For complete details, see the [Create Avatar Video (V2)](https://docs.heygen.com/reference/create-an-avatar-video-v2) endpoint documentation.