Stream Avatar Realtime Word Timestamps
Opens a Server-Sent Events (text/event-stream) stream of per-word timestamps for a session. Each data: frame is a WordBatch ({"words": [{"word", "start", "end"}, ...]}); start and end are in seconds from the start of the streamed audio (the same convention as OpenAI Whisper and ElevenLabs). Punctuation is emitted as its own word event. Each batch is capped at ~1s of audio or 10 words, whichever comes first. Late subscribers receive the full session history; completed sessions are served from a durable snapshot. The stream ends with a single event: end frame carrying a WordsEndEvent.
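The framing described above can be consumed with a small SSE line parser. The following is a minimal sketch, not an official client: the helper names parse_sse and collect_words are hypothetical, the sample frames are made up for illustration, and the WordsEndEvent body is shown as an empty object because its fields are not listed here. In practice the lines would come from the HTTP response body of the stream.

```python
import json
from typing import Iterable, Iterator, List, Optional, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, dict]]:
    """Yield (event_name, payload) per SSE frame; unnamed frames use 'message'."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current frame.
            if data:
                yield event, json.loads("\n".join(data))
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue  # SSE comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)


def collect_words(lines: Iterable[str]) -> Tuple[List[dict], Optional[dict]]:
    """Accumulate words from WordBatch frames until the 'end' event arrives."""
    words: List[dict] = []
    end_event: Optional[dict] = None
    for event, payload in parse_sse(lines):
        if event == "end":
            end_event = payload  # WordsEndEvent; fields not assumed here
            break
        words.extend(payload["words"])
    return words, end_event


# Illustrative frames only; real timestamps and the end payload will differ.
sample = [
    'data: {"words": [{"word": "Hello", "start": 0.0, "end": 0.4}, '
    '{"word": ",", "start": 0.4, "end": 0.45}]}',
    "",
    'data: {"words": [{"word": "world", "start": 0.5, "end": 0.9}]}',
    "",
    "event: end",
    "data: {}",
    "",
]
words, end_event = collect_words(sample)
```

Because late subscribers receive the full session history, a client written this way can connect at any point and still rebuild the complete word list from the first frame onward.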
Authorizations
HeyGen API key. Obtain from your HeyGen dashboard.
Path Parameters
Streaming session identifier returned by POST /v3/avatar-realtime.
Response
Server-Sent Events stream (text/event-stream). Each data: frame carries one WordBatch JSON payload; the stream terminates with a single event: end frame whose data is a WordsEndEvent.
One SSE data: payload: a batch of words that share a ~1s window (at most 10 words per batch).
Words in this batch, ordered by start time.
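Since punctuation arrives as its own word entry, a client that renders captions usually needs to re-attach it to the preceding word. The sketch below models a batch with a hypothetical Word dataclass and joins the words into display text. The names Word, parse_batch, and to_text, the punctuation set, and the sample values are all assumptions for illustration, not part of the API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    word: str
    start: float  # seconds from the start of the streamed audio
    end: float


def parse_batch(payload: dict) -> List[Word]:
    """Convert one WordBatch payload into Word objects."""
    words = [
        Word(w["word"], float(w["start"]), float(w["end"]))
        for w in payload["words"]
    ]
    # Batches are documented as ordered by start time; sorting is a
    # defensive no-op when that holds (sorted() is stable).
    return sorted(words, key=lambda w: w.start)


# Assumed punctuation set; extend it for the languages you stream.
PUNCTUATION = set(".,!?;:")


def to_text(words: List[Word]) -> str:
    """Join words with spaces, attaching punctuation to the previous word."""
    out = ""
    for w in words:
        is_punct = bool(w.word) and all(ch in PUNCTUATION for ch in w.word)
        if out and not is_punct:
            out += " "
        out += w.word
    return out


# Illustrative payload only.
batch = {
    "words": [
        {"word": "Hello", "start": 0.0, "end": 0.4},
        {"word": ",", "start": 0.4, "end": 0.45},
        {"word": "world", "start": 0.5, "end": 0.9},
        {"word": "!", "start": 0.9, "end": 0.95},
    ]
}
parsed = parse_batch(batch)
text = to_text(parsed)
```

The start and end fields are kept on each Word so the same list can drive caption highlighting or lip-sync alignment without re-parsing the payload.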

