Stream Avatar Realtime Word Timestamps

curl --request GET \
  --url https://api.heygen.com/v3/avatar-realtime/{stream_id}/words \
  --header 'x-api-key: <api-key>'

{
  "words": [
    {
      "word": "<string>",
      "start": 123,
      "end": 123
    }
  ]
}

Open a Server-Sent Events (text/event-stream) stream of per-word timestamps for a session. Each data: frame is a WordBatch ({"words": [{"word", "start", "end"}, ...]}); times are in seconds from the start of the streamed audio (matching OpenAI Whisper and ElevenLabs). Punctuation is emitted as its own word event. Batches are capped at ~1s of audio or 10 words, whichever comes first. Late subscribers receive the full session history; completed sessions are served from a durable snapshot. The stream ends with a single event: end frame carrying a WordsEndEvent.
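Because punctuation arrives as its own word event, a client that wants readable text has to reattach it to the preceding word. The sketch below shows one way to do that. It assumes that any token made up only of ASCII punctuation is a punctuation event; the helper names and the sample timestamps are illustrative and not part of the API.

```python
import string

# Assumption: a token consisting solely of ASCII punctuation is a punctuation event.
PUNCTUATION = set(string.punctuation)


def is_punctuation(token: str) -> bool:
    """Return True when every character of a non-empty token is punctuation."""
    return bool(token) and all(ch in PUNCTUATION for ch in token)


def join_words(words: list[dict]) -> str:
    """Rebuild readable text from WordEvent dicts ({"word", "start", "end"})."""
    text = ""
    for event in words:
        token = event["word"]
        # Put a space before ordinary words, but not before punctuation.
        if text and not is_punctuation(token):
            text += " "
        text += token
    return text


# Illustrative WordBatch; the timestamps are made up for this example.
batch = {
    "words": [
        {"word": "Hello", "start": 0.0, "end": 0.4},
        {"word": ",", "start": 0.4, "end": 0.4},
        {"word": "world", "start": 0.5, "end": 0.9},
        {"word": "!", "start": 0.9, "end": 0.9},
    ]
}
print(join_words(batch["words"]))  # Hello, world!
```

This simple rule attaches all punctuation to the left, so opening quotes or brackets would need extra handling.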

GET /v3/avatar-realtime/{stream_id}/words

Authorizations

x-api-key · string · header · required

HeyGen API key. Obtain from your HeyGen dashboard.

Path Parameters

stream_id · string · required

Streaming session identifier returned by POST /v3/avatar-realtime.

Response

Server-Sent Events stream (text/event-stream). Each data: frame carries one WordBatch JSON payload; the stream terminates with a single event: end frame whose data is a WordsEndEvent.
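The frame layout described here can be consumed with a small parser. The following is a minimal sketch, not an official client: it takes already-decoded text lines, groups them into SSE frames, parses each data: payload as JSON, and stops after the event: end frame. The function name is hypothetical, and because the fields of WordsEndEvent are not listed on this page, the end payload in the sample is an empty placeholder object.

```python
import json
from typing import Iterable, Iterator


def iter_word_frames(lines: Iterable[str]) -> Iterator[tuple[str, dict]]:
    """Yield (event_name, payload) pairs from decoded SSE lines.

    Frames without an `event:` field use the SSE default name "message".
    Iteration stops after the first `end` frame.
    """
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current frame.
            if data:
                yield event, json.loads("\n".join(data))
                if event == "end":
                    return
            event, data = "message", []
        elif line.startswith(":"):
            continue  # SSE comment, often used as a keep-alive
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # SSE strips one optional space after the colon.
            data.append(value[1:] if value.startswith(" ") else value)


# Illustrative stream; the end payload is a placeholder, not the real schema.
sample = [
    'data: {"words": [{"word": "Hi", "start": 0.0, "end": 0.3}]}',
    "",
    "event: end",
    "data: {}",
    "",
]
for name, payload in iter_word_frames(sample):
    print(name, payload)
```

In practice the lines would come from any HTTP client that exposes the response body as decoded text lines, with the x-api-key header set on the request.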

Each SSE data: payload is one WordBatch: a batch of words that share a ~1s window (or at most 10 words, whichever limit is reached first).

words · WordEvent · object[] · required

Words in this batch, ordered by start time.
