"precision" Best for: high-quality final delivery, talking-head videos, and content where accurate lip-sync is critical. For faster turnaround at lower fidelity, see Speed mode. For many videos in one job, see Bulk Video Translation.
How Precision Mode Works
Precision mode uses avatar inference and multiple models to re-render the speaker's mouth movements to match the translated audio, producing significantly more realistic lip-sync than Speed mode. It requires longer processing time and is recommended for polished, client-facing, or broadcast-quality output.
Quick Start
1. List Supported Languages
Fetch available target language codes via GET /v3/video-translations/languages.
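As a minimal sketch of this call, the snippet below builds the request without sending it. The host in BASE_URL and the header name in AUTH_HEADER are placeholders assumed for illustration; neither is specified in this guide, so substitute the values from your account documentation.

```python
import urllib.request

# Assumptions (not taken from this guide): the host and the auth header name.
BASE_URL = "https://api.example.com"   # placeholder host
AUTH_HEADER = "X-Api-Key"              # hypothetical header name


def build_languages_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) the GET request for supported languages."""
    return urllib.request.Request(
        url=f"{BASE_URL}/v3/video-translations/languages",
        headers={AUTH_HEADER: api_key, "Accept": "application/json"},
        method="GET",
    )


req = build_languages_request("YOUR_API_KEY")
print(req.get_method(), req.full_url)
# To send it: urllib.request.urlopen(req), then JSON-decode the response body.
```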
2. Submit a Translation (Single Language)
Full schema: POST /v3/video-translations.
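A submission might look roughly like the sketch below, which builds the POST request without sending it. Only the endpoint path, the mode value, and the shape of the source object come from this guide. The body keys "video" and "output_languages", the host, and the auth header name are assumptions made for illustration, so check them against the full schema before use.

```python
import json
import urllib.request

# Assumptions (not taken from this guide): the host and the auth header name.
BASE_URL = "https://api.example.com"   # placeholder host
AUTH_HEADER = "X-Api-Key"              # hypothetical header name


def build_submit_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request that submits a translation."""
    return urllib.request.Request(
        url=f"{BASE_URL}/v3/video-translations",
        data=json.dumps(payload).encode("utf-8"),
        headers={AUTH_HEADER: api_key, "Content-Type": "application/json"},
        method="POST",
    )


payload = {
    # "video" and "output_languages" are assumed key names; verify in the schema.
    "video": {"type": "url", "url": "https://example.com/video.mp4"},
    "output_languages": ["<language_code>"],
    "mode": "precision",  # documented value that enables avatar inference
}
req = build_submit_request("YOUR_API_KEY", payload)
print(req.get_method(), req.full_url)
```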
Batch (Multiple Languages)
3. Poll for Status
Use GET /v3/video-translations/{video_translation_id}. To skip polling, pass callback_url; see Webhooks.
| Status | Meaning |
|---|---|
| pending | Queued |
| running | Avatar inference in progress |
| completed | Done; video_url is available |
| failed | Check failure_message |
Precision mode takes longer than Speed mode — plan polling intervals accordingly (e.g. every 30–60 seconds for longer videos).
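One way to structure the polling loop is sketched below. The HTTP call is injected as a function so the loop can be exercised without network access. It assumes the parsed response exposes status, video_url, and failure_message as top-level keys, which this guide implies but does not show.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"completed", "failed"}


def poll_until_done(
    fetch_status: Callable[[], dict],
    interval_seconds: float = 30.0,
    max_attempts: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_status until the status is completed or failed."""
    for _ in range(max_attempts):
        result = fetch_status()
        if result.get("status") in TERMINAL_STATUSES:
            return result
        sleep(interval_seconds)  # still pending or running: wait, then retry
    raise TimeoutError(f"no terminal status after {max_attempts} attempts")


# Demo with canned responses instead of real HTTP calls.
_responses = iter([
    {"status": "pending"},
    {"status": "running"},
    {"status": "completed", "video_url": "https://example.com/out.mp4"},
])
final = poll_until_done(lambda: next(_responses), sleep=lambda seconds: None)
print(final["status"])
```

In real use, fetch_status would wrap the GET call for a single video_translation_id, and the caller would inspect failure_message when the returned status is failed.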
Source Video Input
| Type | Example |
|---|---|
| URL | { "type": "url", "url": "https://example.com/video.mp4" } |
| Asset ID | { "type": "asset_id", "asset_id": "<asset_id>" } |
The URL must be publicly accessible (test by opening it in an incognito browser window). To use an asset_id, upload the file first via POST /v3/assets; see the Upload Assets guide.
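The two input shapes can be produced by a small helper such as the sketch below. The function name video_source is hypothetical. The scheme check is only a local sanity check and cannot confirm that the URL is publicly reachable.

```python
from typing import Optional


def video_source(url: Optional[str] = None, asset_id: Optional[str] = None) -> dict:
    """Return the source-video object for exactly one of url or asset_id."""
    if (url is None) == (asset_id is None):
        raise ValueError("pass exactly one of url or asset_id")
    if url is not None:
        if not url.startswith(("http://", "https://")):
            raise ValueError("url must be an http(s) URL")
        return {"type": "url", "url": url}
    return {"type": "asset_id", "asset_id": asset_id}


print(video_source(url="https://example.com/video.mp4"))
print(video_source(asset_id="<asset_id>"))
```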
Precision Mode Options
These parameters are particularly relevant for Precision mode:
| Parameter | Default | Description |
|---|---|---|
| mode | "speed" | Set to "precision" to enable avatar inference |
| speaker_num | auto | Number of speakers |
| translate_audio_only | false | When true, skips avatar inference and only dubs audio (negates the benefit of Precision mode) |
| enable_dynamic_duration | true | Allows output duration to vary to match natural speech pacing |
| disable_music_track | false | Strips background music from output |
| enable_speech_enhancement | false | Improves speech audio quality |
| enable_caption | false | Generates captions alongside the video |
| brand_voice_id | — | Apply a custom brand voice (requires setup) |
| srt | — | Custom subtitle file. Enterprise plan only |
| srt_role | — | "input" or "output": which video the SRT applies to. Enterprise plan only |
| callback_url | — | Webhook URL notified on completion or failure |
| callback_id | — | Your own ID, echoed back in the webhook payload |
Tip: Setting speaker_num is especially important in Precision mode — accurate speaker separation directly improves the quality of avatar inference per speaker.
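The sketch below shows one possible way to assemble these options on the client side. It uses the parameter names and defaults from the table above. The helper precision_options is hypothetical, and dropping values that equal the documented default is a design choice of this example, not an API requirement.

```python
import warnings

# Defaults as listed in the options table; "auto" and unset values are omitted.
DOCUMENTED_DEFAULTS = {
    "mode": "speed",
    "translate_audio_only": False,
    "enable_dynamic_duration": True,
    "disable_music_track": False,
    "enable_speech_enhancement": False,
    "enable_caption": False,
}


def precision_options(**overrides) -> dict:
    """Return Precision-mode options that differ from the documented defaults."""
    options = {"mode": "precision", **overrides}
    if options["mode"] == "precision" and options.get("translate_audio_only"):
        # Audio-only dubbing skips avatar inference, so Precision adds nothing.
        warnings.warn("translate_audio_only=True skips avatar inference")
    return {k: v for k, v in options.items() if DOCUMENTED_DEFAULTS.get(k) != v}


opts = precision_options(speaker_num=2, enable_caption=True)
print(opts)
```

The resulting dictionary would be merged into the request body alongside the source video and target language fields.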
Captions
To enable captions, set enable_caption: true in the translation request. Once the translation is completed, download the captions in one of the supported formats: srt, vtt.
Proofread Before Finalizing
Precision mode fully supports the proofread workflow: review and edit subtitles before committing to the full avatar inference render. This is especially valuable in Precision mode since generation takes longer and costs more. Reference: Create · Get · Download SRT · Upload SRT · Generate Final Video.
Step 1 — Create Proofread Session
Full schema: POST /v3/video-translations/proofreads.
The response returns proofread_ids, one per language.
Step 2 — Poll Until completed
GET /v3/video-translations/proofreads/{proofread_id}.
Step 3 — Download & Edit the SRT
Download via GET /v3/video-translations/proofreads/{proofread_id}/srt, edit the srt_url file locally, then upload the revised version via Upload Proofread SRT.
Step 4 — Generate Final Video
POST /v3/video-translations/proofreads/{proofread_id}/generate.
The response returns a video_translation_id to poll via GET /v3/video-translations/{video_translation_id}.
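The sequence of proofread calls can be summarized as data, as in the sketch below. It lists only the paths that appear in this guide. The SRT upload call is left out because its path is not given here, and the helper name proofread_steps is hypothetical.

```python
from typing import List, Tuple

PROOFREADS = "/v3/video-translations/proofreads"


def proofread_steps(proofread_id: str) -> List[Tuple[str, str, str]]:
    """Return (step, HTTP method, path) for the documented proofread calls."""
    session = f"{PROOFREADS}/{proofread_id}"
    return [
        ("create", "POST", PROOFREADS),
        ("poll", "GET", session),
        ("download_srt", "GET", f"{session}/srt"),
        # The SRT upload call is omitted: its path is not given in this guide.
        ("generate", "POST", f"{session}/generate"),
    ]


for step, method, path in proofread_steps("<proofread_id>"):
    print(f"{step:<13}{method:<5}{path}")
```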
Other Operations
List All Translations
GET /v3/video-translations.
Use has_more and next_token for pagination.
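A pagination loop over has_more and next_token could be sketched as follows. The page fetcher is injected, so how the token is sent back to the API is left to that function. The "data" key holding the items and the item fields in the demo are assumptions, since the response shape is not shown in this guide.

```python
from typing import Callable, Iterator, Optional


def iter_translations(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every item, following next_token while has_more is true."""
    token = None
    while True:
        page = fetch_page(token)
        yield from page.get("data", [])  # "data" is an assumed key name
        if not page.get("has_more"):
            return
        token = page.get("next_token")
        if token is None:
            return  # guard against looping forever on a malformed page


# Demo with canned pages keyed by the token that requests them.
_pages = {
    None: {"data": [{"id": "a"}, {"id": "b"}], "has_more": True, "next_token": "t1"},
    "t1": {"data": [{"id": "c"}], "has_more": False, "next_token": None},
}
ids = [item["id"] for item in iter_translations(lambda token: _pages[token])]
print(ids)
```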
Delete a Translation
DELETE /v3/video-translations/{video_translation_id}.
When to Use Speed vs. Precision
| | Speed | Precision |
|---|---|---|
| Processing Time | Faster | Slower |
| Translation | Adequate | Context- and Gender-Aware |
| Lip-Sync Quality | Standard | High |
| Best For | Faces with little movement, quick drafts | Faces with significant movement, side angles, or occlusions; final delivery videos |

