Legacy Endpoint: This endpoint supports scene-by-scene video generation, which the V3 APIs do not offer.
Overview
POST https://api.heygen.com/v2/video/generate
Generates videos using the AI Studio backend with support for avatars, voices, and dynamic backgrounds. You can create videos using either your photo avatar or digital twin. This endpoint supports Avatar III and Avatar IV.
Each video is composed of one or more scenes (up to 50), where each scene defines its own avatar, voice, background, and on-screen text.
Authentication
Include your API key in the request header:
| Header | Value |
|---|---|
| x-api-key | Your HeyGen API key |
| Content-Type | application/json |
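As a minimal sketch of the two headers above, the following builds (but does not send) the POST request using only the Python standard library. The names `API_URL` and `build_request` are illustrative assumptions, not part of the HeyGen API.

```python
import json
import urllib.request

# Endpoint URL as given in the Overview section.
API_URL = "https://api.heygen.com/v2/video/generate"


def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build the POST request with the documented headers (hypothetical helper)."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),  # JSON body as bytes
        headers={
            "x-api-key": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; the sketch stops short of that so it can be inspected without network access.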
Request Body
Top-Level Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| video_inputs | array | Yes | Array of scene objects (1–50). Each scene defines an avatar, voice, background, and optional text. |
| caption | boolean | No | Enable captions in the video. Only supported for text-based voice input. Default: false. |
| title | string | No | Title of the video. |
| callback_id | string | No | Custom ID for callback/webhook tracking. |
| dimension | object | No | Custom output dimensions. Defaults to 1920×1080. |
| dimension.width | integer | No | Width of the output video. Default: 1920. |
| dimension.height | integer | No | Height of the output video. Default: 1080. |
| folder_id | string | No | Folder ID where the video is stored. |
| callback_url | string | No | URL to notify when video rendering is complete. |
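The top-level parameters can be assembled with a small helper like the one below. This is a sketch under stated assumptions: `build_payload` is a hypothetical name, and the 1–50 scene check simply mirrors the limit in the table rather than reproducing server-side validation.

```python
def build_payload(scenes, title=None, caption=False, width=1920, height=1080,
                  callback_id=None, callback_url=None, folder_id=None):
    """Assemble the top-level request body (hypothetical helper)."""
    # The table above limits video_inputs to between 1 and 50 scenes.
    if not 1 <= len(scenes) <= 50:
        raise ValueError("video_inputs must contain between 1 and 50 scenes")

    payload = {
        "video_inputs": list(scenes),
        "caption": caption,
        "dimension": {"width": width, "height": height},
    }
    # Optional fields are only included when the caller supplies them.
    optional = {
        "title": title,
        "callback_id": callback_id,
        "callback_url": callback_url,
        "folder_id": folder_id,
    }
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload
```

Omitting unset optional fields keeps the body small and leaves the documented defaults to the server.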
Each item in the video_inputs array represents a scene and can contain the following:
character
Defines the avatar or talking photo for the scene.
| Parameter | Type | Required | Description |
|---|---|---|---|
| type | string | Yes | avatar or talking_photo. |
| avatar_id | string | Yes* | Unique avatar identifier. Required when type is avatar. |
| talking_photo_id | string | Yes* | Unique talking photo identifier. Required when type is talking_photo. |
| avatar_style | string | No | normal, closeUp, or circle. Applies only to avatar type. Default: normal. |
| talking_photo_style | string | No | circle. Applies only to talking_photo type. |
| talking_style | string | No | stable or expressive. Applies only to talking_photo type. Default: stable. |
| expression | string | No | default or happy. Applies only to talking_photo type. |
| scale | float | No | Avatar size. Range: 0.0–5.0. Default: 1. |
| offset | object | No | Position adjustment: { "x": 0.0, "y": 0.0 }. |
| use_avatar_iv_model | boolean | No | Whether to use Avatar IV. |
| prompt | string | No | Avatar IV motion prompt. Applies to talking_photo type when use_avatar_iv_model is true. |
| keep_original_prompt | boolean | No | Preserve motion prompt as-is (skip enhancement). Applies when use_avatar_iv_model is true. |
| matting | boolean | No | Remove photo background. |
| super_resolution | boolean | No | Enhance image quality. Applies only to talking_photo type. |
| circle_background_color | string | No | Hex color for circle style background (e.g., #FFFFFF). |
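The conditional ID requirement (marked Yes* above) can be captured client-side. The sketch below assumes a hypothetical `build_character` helper; it picks the ID field from the character type and checks `scale` against the documented range before any request is made.

```python
def build_character(kind, identifier, scale=1.0, **options):
    """Build a character object for one scene (hypothetical helper)."""
    # Each character type carries its identifier in a different field.
    id_fields = {"avatar": "avatar_id", "talking_photo": "talking_photo_id"}
    if kind not in id_fields:
        raise ValueError("type must be 'avatar' or 'talking_photo'")
    if not 0.0 <= scale <= 5.0:
        raise ValueError("scale must be within 0.0-5.0")

    character = {"type": kind, id_fields[kind]: identifier, "scale": scale}
    # Remaining table parameters (avatar_style, offset, ...) pass through as-is.
    character.update(options)
    return character
```

For example, `build_character("avatar", "YOUR_AVATAR_ID", avatar_style="closeUp")` yields an object with `avatar_id` set and no `talking_photo_id`.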
voice
Defines what the avatar says in this scene.
| Parameter | Type | Required | Description |
|---|---|---|---|
| type | string | Yes | text, audio, or silence. |
| voice_id | string | Yes* | Voice identifier. Required for text type. |
| input_text | string | Yes* | Text the avatar will speak. Required for text type. |
| speed | float | No | Voice speed. Range: 0.5–1.5. Default: 1. Applies to text type. |
| pitch | integer | No | Voice pitch. Range: -50 to 50. Default: 0. Applies to text type. |
| emotion | string | No | Excited, Friendly, Serious, Soothing, or Broadcaster. Applies to text type. |
| locale | string | No | Voice accent/locale (e.g., en-US, pt-BR). Applies to text type. |
| audio_url | string | Yes* | URL of uploaded audio. Required for audio type (provide either this or audio_asset_id). |
| audio_asset_id | string | Yes* | Asset ID of uploaded audio. Required for audio type (provide either this or audio_url). |
| duration | string | No | Silence duration in seconds. Range: 1.0–100.0. Default: 1. Applies to silence type. |
| elevenlabs_settings | object | No | Advanced ElevenLabs voice settings (see below). Applies to text type. |
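Because the required fields depend on the voice type, a small pre-flight check can catch mistakes early. The following is a sketch only: `build_voice` is a hypothetical helper, and its checks restate the ranges in the table rather than the server's actual validation.

```python
def build_voice(kind, **fields):
    """Build a voice object, checking per-type requirements (hypothetical helper)."""
    if kind == "text":
        missing = [k for k in ("voice_id", "input_text") if not fields.get(k)]
        if missing:
            raise ValueError(f"text voice is missing: {', '.join(missing)}")
        if not 0.5 <= fields.get("speed", 1.0) <= 1.5:
            raise ValueError("speed must be within 0.5-1.5")
        if not -50 <= fields.get("pitch", 0) <= 50:
            raise ValueError("pitch must be within -50 to 50")
    elif kind == "audio":
        # The table asks for either audio_url or audio_asset_id.
        if not (fields.get("audio_url") or fields.get("audio_asset_id")):
            raise ValueError("audio voice needs audio_url or audio_asset_id")
    elif kind == "silence":
        # duration is documented as a string, so convert before range-checking.
        if not 1.0 <= float(fields.get("duration", 1.0)) <= 100.0:
            raise ValueError("duration must be within 1.0-100.0 seconds")
    else:
        raise ValueError("type must be 'text', 'audio', or 'silence'")
    return {"type": kind, **fields}
```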
ElevenLabs Settings:
| Parameter | Type | Description |
|---|---|---|
| model | string | ElevenLabs model: eleven_monolingual_v1, eleven_multilingual_v1, eleven_multilingual_v2, eleven_turbo_v2, eleven_turbo_v2_5, or eleven_v3. |
| similarity_boost | float | Similarity to original voice. Range: 0.0–1.0. |
| stability | float | Voice consistency. Range: 0.0–1.0. For eleven_v3, default is 1.0 and allowed values are 0, 0.5, 1.0. |
| style | float | Style intensity. Range: 0.0–1.0. |
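The eleven_v3 restriction on `stability` is easy to miss, so a client-side check may help. This sketch assumes a hypothetical `validate_elevenlabs_settings` function whose rules are taken directly from the table above.

```python
ELEVENLABS_MODELS = {
    "eleven_monolingual_v1", "eleven_multilingual_v1", "eleven_multilingual_v2",
    "eleven_turbo_v2", "eleven_turbo_v2_5", "eleven_v3",
}


def validate_elevenlabs_settings(settings):
    """Check an elevenlabs_settings object against the table (hypothetical helper)."""
    model = settings.get("model")
    if model is not None and model not in ELEVENLABS_MODELS:
        raise ValueError(f"unknown ElevenLabs model: {model}")

    # All three numeric settings share the 0.0-1.0 range.
    for name in ("similarity_boost", "stability", "style"):
        value = settings.get(name)
        if value is not None and not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be within 0.0-1.0")

    # eleven_v3 only accepts three discrete stability values.
    if model == "eleven_v3" and settings.get("stability") not in (None, 0, 0.5, 1.0):
        raise ValueError("eleven_v3 stability must be 0, 0.5, or 1.0")
    return settings
```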
background
Defines the scene background.
| Parameter | Type | Required | Description |
|---|---|---|---|
| type | string | Yes | color, image, or video. |
| value | string | Yes* | Hex color code (e.g., #FFFFFF). Required for color type. |
| url | string | Yes* | URL of uploaded image/video. Required for image/video type (provide either this or the corresponding asset ID). |
| image_asset_id | string | Yes* | Asset ID for image background. Provide either this or url. |
| video_asset_id | string | Yes* | Asset ID for video background. Provide either this or url. |
| play_style | string | No | Playback mode: freeze, loop, or fit_to_scene. Applies to video type. |
| fit | string | No | How background fits the screen: crop, cover, contain, or none. Default: cover. |
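The background object takes a different source field for each type. The helper below is a sketch; `build_background` and its `asset_id` argument are hypothetical names, and preferring `url` when both a URL and an asset ID are given is a choice made here, not documented behavior.

```python
def build_background(kind, value=None, url=None, asset_id=None, **options):
    """Build a background object for one scene (hypothetical helper)."""
    if kind == "color":
        if not value:
            raise ValueError("color background needs a hex value such as #FFFFFF")
        background = {"type": "color", "value": value}
    elif kind in ("image", "video"):
        if not (url or asset_id):
            raise ValueError(f"{kind} background needs url or {kind}_asset_id")
        background = {"type": kind}
        if url:
            background["url"] = url
        else:
            # Maps to image_asset_id or video_asset_id depending on the type.
            background[f"{kind}_asset_id"] = asset_id
    else:
        raise ValueError("type must be 'color', 'image', or 'video'")
    # Optional fields such as play_style and fit pass through unchanged.
    background.update(options)
    return background
```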
text
Optional on-screen text overlay.
| Parameter | Type | Required | Description |
|---|---|---|---|
| type | string | Yes | Must be text. |
| text | string | Yes | Text content to display. |
| font_family | string | No | Font family (e.g., Arial). |
| font_size | float | No | Font size in points. |
| font_weight | string | No | bold. |
| color | string | No | Text color in hex (e.g., #FFFFFF). |
| position | object | No | Position: { "x": 0.0, "y": 0.0 }. |
| text_align | string | No | left, center, or right. |
| line_height | float | No | Line height / spacing between lines. |
| width | number | No | Text container width. |
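A text overlay needs only `type` and `text`; everything else is styling. As a brief sketch, the hypothetical `build_text` helper below fixes `type` to text and checks `text_align` against the values listed in the table.

```python
def build_text(text, **style):
    """Build an on-screen text overlay object (hypothetical helper)."""
    if not text:
        raise ValueError("text content is required")
    align = style.get("text_align")
    if align is not None and align not in ("left", "center", "right"):
        raise ValueError("text_align must be left, center, or right")
    # type is always "text"; styling fields pass through unchanged.
    return {"type": "text", "text": text, **style}
```

For example, `build_text("Chapter 1", font_family="Arial", color="#FFFFFF")` produces an object that can be placed under the scene's `text` key.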
Example Request
{
  "title": "My Legacy Video",
  "caption": false,
  "dimension": {
    "width": 1920,
    "height": 1080
  },
  "video_inputs": [
    {
      "character": {
        "type": "avatar",
        "avatar_id": "YOUR_AVATAR_ID",
        "avatar_style": "normal"
      },
      "voice": {
        "type": "text",
        "voice_id": "YOUR_VOICE_ID",
        "input_text": "Welcome to the first scene of this video.",
        "speed": 1.0
      },
      "background": {
        "type": "color",
        "value": "#1a1a2e"
      }
    },
    {
      "character": {
        "type": "avatar",
        "avatar_id": "YOUR_AVATAR_ID",
        "avatar_style": "closeUp"
      },
      "voice": {
        "type": "text",
        "voice_id": "YOUR_VOICE_ID",
        "input_text": "And here is the second scene with a different style."
      },
      "background": {
        "type": "color",
        "value": "#16213e"
      }
    }
  ]
}
Response
200 — Success
{
  "error": null,
  "data": {
    "video_id": "af273759c9xa47369e05418c69drq174"
  }
}
| Field | Type | Description |
|---|---|---|
| error | string or null | Error message if the request fails; null on success. |
| data.video_id | string | Unique identifier of the generated video. |
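A caller typically wants the `video_id` or a clear failure. The sketch below assumes the response body always has the shape shown above; `extract_video_id` is a hypothetical helper, and real error responses may carry more detail than this handles.

```python
import json


def extract_video_id(body: str) -> str:
    """Return data.video_id from a response body, or raise if error is set."""
    response = json.loads(body)
    # error is null on success and an error message otherwise.
    if response.get("error") is not None:
        raise RuntimeError(f"video generation request failed: {response['error']}")
    return response["data"]["video_id"]
```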
Full API Reference
For complete details, see the Create Avatar Video (V2) endpoint documentation.