qwen3-livetranslate-flash translates audio and video through the OpenAI-compatible chat completions endpoint. All requests are streamed.
Note: The DashScope interface is not supported.
Supported models
- qwen3-livetranslate-flash
- qwen3-livetranslate-flash-2025-12-01
Prerequisites
Before you begin, complete the following:
- Obtain an API key and set it as the DASHSCOPE_API_KEY environment variable. The examples below read the key from this variable.
- Install the OpenAI SDK for Python or Node.js.
Endpoints
| Region | SDK base_url | HTTP endpoint |
|---|---|---|
| Singapore | https://dashscope-intl.aliyuncs.com/compatible-mode/v1 | POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions |
| Beijing | https://dashscope.aliyuncs.com/compatible-mode/v1 | POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions |
Quick start
The following examples translate an audio file and return both translated text and audio through streaming. Replace the base_url if you use the Beijing region.
Python
```python
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",  # Singapore
)

completion = client.chat.completions.create(
    model="qwen3-livetranslate-flash",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "input_audio",
                    "input_audio": {
                        "data": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250211/tixcef/cherry.wav",
                        "format": "wav",
                    },
                }
            ],
        }
    ],
    modalities=["text", "audio"],
    audio={"voice": "Cherry", "format": "wav"},
    stream=True,
    stream_options={"include_usage": True},
    # translation_options is a non-standard parameter, so it goes through extra_body.
    extra_body={"translation_options": {"source_lang": "zh", "target_lang": "en"}},
)

for chunk in completion:
    print(chunk)
```
Node.js
```javascript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.DASHSCOPE_API_KEY,
  baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1", // Singapore
});

async function main() {
  const completion = await client.chat.completions.create({
    model: "qwen3-livetranslate-flash",
    messages: [
      {
        role: "user",
        content: [
          {
            type: "input_audio",
            input_audio: {
              data: "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250211/tixcef/cherry.wav",
              format: "wav",
            },
          },
        ],
      },
    ],
    modalities: ["text", "audio"],
    audio: { voice: "Cherry", format: "wav" },
    stream: true,
    stream_options: { include_usage: true },
    // Non-standard parameter: passed at the top level in Node.js.
    translation_options: { source_lang: "zh", target_lang: "en" },
  });

  for await (const chunk of completion) {
    console.log(JSON.stringify(chunk));
  }
}

main();
```
curl
```bash
curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
  -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-livetranslate-flash",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "input_audio",
            "input_audio": {
              "data": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250211/tixcef/cherry.wav",
              "format": "wav"
            }
          }
        ]
      }
    ],
    "modalities": ["text", "audio"],
    "audio": {
      "voice": "Cherry",
      "format": "wav"
    },
    "stream": true,
    "stream_options": {
      "include_usage": true
    },
    "translation_options": {
      "source_lang": "zh",
      "target_lang": "en"
    }
  }'
```
Video input
To translate video instead of audio, set the content type to video_url:
```python
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "video_url",
                "video_url": {
                    "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241115/cqqkru/1.mp4"
                },
            }
        ],
    }
]
```
All other parameters remain the same.
Request body
Required parameters
| Parameter | Type | Description |
|---|---|---|
| model | string | Model name. Valid values: qwen3-livetranslate-flash, qwen3-livetranslate-flash-2025-12-01. |
| messages | array | An array of messages. Only one user message is supported. |
| stream | boolean | Must be set to true. The default is false, but only streaming output is supported. |
| translation_options | object | Translation configuration. See Translation options. This is a non-standard OpenAI parameter: in the Python SDK, pass it inside extra_body; in Node.js or HTTP, pass it at the top level. |
Optional parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| modalities | array | ["text"] | Output modality. Set to ["text", "audio"] to receive both text and audio output, or ["text"] for text only. |
| audio | object | - | Output audio configuration. Required when modalities includes "audio". See Audio output options. |
| stream_options | object | - | Streaming configuration. See Stream options. |
| max_tokens | integer | Model maximum | The maximum number of tokens to generate. Generation stops at this limit or when output is complete. |
| seed | integer | - | Random seed for reproducibility. The same seed produces identical output for identical requests. Range: [0, 2^31 - 1]. |
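Because audio is required only when modalities includes "audio", a text-only translation simply omits it. The following is a minimal sketch that reuses the client and messages from the quick start:

```python
# Text-only output: modalities=["text"], no `audio` parameter needed.
completion = client.chat.completions.create(
    model="qwen3-livetranslate-flash",
    messages=messages,  # same audio input message as in the quick start
    modalities=["text"],
    stream=True,
    stream_options={"include_usage": True},
    extra_body={"translation_options": {"source_lang": "zh", "target_lang": "en"}},
)

for chunk in completion:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```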
Sampling parameters
For translation accuracy, keep these parameters at their default values.
| Parameter | Type | Default | Range | Notes |
|---|---|---|---|---|
| temperature | float | 0.000001 | [0, 2) | Controls output diversity. |
| top_p | float | 0.8 | (0, 1.0] | Nucleus sampling threshold. |
| presence_penalty | float | 0 | [-2.0, 2.0] | Reduces repetition when positive. |
| top_k | integer | 1 | >= 0 | Candidate set size. If the value is None or greater than 100, top_k is disabled and only top_p takes effect. Non-standard OpenAI parameter; in the Python SDK, pass it inside extra_body. |
| repetition_penalty | float | 1.05 | > 0 | Penalizes repeated sequences. Non-standard OpenAI parameter; in the Python SDK, pass it inside extra_body. |
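The OpenAI Python SDK does not accept top_k or repetition_penalty as keyword arguments, so route them through extra_body alongside translation_options. A sketch with both left at their defaults:

```python
completion = client.chat.completions.create(
    model="qwen3-livetranslate-flash",
    messages=messages,  # as in the quick start
    modalities=["text", "audio"],
    audio={"voice": "Cherry", "format": "wav"},
    stream=True,
    extra_body={
        "translation_options": {"source_lang": "zh", "target_lang": "en"},
        "top_k": 1,                  # default; None or a value > 100 disables top_k
        "repetition_penalty": 1.05,  # default
    },
)
```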
Message object
The messages array must contain exactly one object with role set to user.
Properties of content array items:
| Field | Type | Required | Description |
|---|---|---|---|
| type | string | Yes | input_audio for audio input, video_url for video input. |
| input_audio | object | When type is input_audio | Audio input. See below. |
| video_url | object | When type is video_url | Video input. See below. |
input_audio object:
| Field | Type | Required | Description |
|---|---|---|---|
| data | string | Yes | URL of the audio file, or a Base64 data URL. For local files, see Input a Base64-encoded local file. |
| format | string | Yes | Audio format, such as mp3 or wav. |
video_url object:
| Field | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | Public URL of the video file, or a Base64 data URL. For local files, see Input a Base64-encoded local file. |
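Both data (for audio) and url (for video) accept a Base64 data URL in place of a public URL. The authoritative encoding steps are in Input a Base64-encoded local file; the sketch below only illustrates the idea, and both the local.wav file name and the data:audio/wav;base64, prefix are assumptions:

```python
import base64

# Hypothetical local file; see "Input a Base64-encoded local file" for the
# authoritative data URL format (the prefix below is an assumption).
with open("local.wav", "rb") as f:
    audio_b64 = base64.b64encode(f.read()).decode("utf-8")

content_item = {
    "type": "input_audio",
    "input_audio": {
        "data": f"data:audio/wav;base64,{audio_b64}",
        "format": "wav",
    },
}
```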
Translation options
| Field | Type | Required | Description |
|---|---|---|---|
| source_lang | string | No | Full English name of the source language. See Supported languages. If omitted, the source language is auto-detected. |
| target_lang | string | Yes | Full English name of the target language. See Supported languages. |
Note: translation_options is a non-standard OpenAI parameter. In the Python SDK, pass it inside extra_body; in Node.js or HTTP, pass it at the top level of the request body.

```python
extra_body={"translation_options": {"source_lang": "zh", "target_lang": "en"}}
```
Audio output options
Required when modalities is ["text", "audio"].
| Field | Type | Required | Description |
|---|---|---|---|
| voice | string | Yes | Voice for the output audio. See Supported voices. |
| format | string | Yes | Output audio format. Only wav is supported. |
Stream options
| Field | Type | Default | Description |
|---|---|---|---|
| include_usage | boolean | false | When true, the final chunk includes token usage details. |
Response
The API returns a series of streaming chunks, each as a chat.completion.chunk object. Chunks fall into three categories: text, audio, and token usage.
Text chunk
Contains incremental translated text in choices[0].delta.content:
```json
{
  "id": "chatcmpl-c22a54b8-40cc-4a1d-988b-f84cdf86868f",
  "choices": [
    {
      "delta": {
        "content": " of",
        "role": null,
        "audio": null
      },
      "finish_reason": null,
      "index": 0
    }
  ],
  "created": 1764755440,
  "model": "qwen3-livetranslate-flash",
  "object": "chat.completion.chunk"
}
```
Audio chunk
Contains incremental Base64-encoded audio in choices[0].delta.audio.data:
```json
{
  "id": "chatcmpl-c22a54b8-40cc-4a1d-988b-f84cdf86868f",
  "choices": [
    {
      "delta": {
        "content": null,
        "role": null,
        "audio": {
          "data": "///+//7////+////////////AAAAAAAAAAABA......",
          "expires_at": 1764755440,
          "id": "audio_c22a54b8-40cc-4a1d-988b-f84cdf86868f"
        }
      },
      "finish_reason": null,
      "index": 0
    }
  ],
  "created": 1764755440,
  "model": "qwen3-livetranslate-flash",
  "object": "chat.completion.chunk"
}
```
Token usage chunk
Returned as the final chunk when include_usage is true. The choices array is empty, and usage contains the token breakdown:
```json
{
  "id": "chatcmpl-c22a54b8-40cc-4a1d-988b-f84cdf86868f",
  "choices": [],
  "created": 1764755440,
  "model": "qwen3-livetranslate-flash",
  "object": "chat.completion.chunk",
  "usage": {
    "completion_tokens": 242,
    "prompt_tokens": 415,
    "total_tokens": 657,
    "completion_tokens_details": {
      "accepted_prediction_tokens": null,
      "audio_tokens": 191,
      "reasoning_tokens": null,
      "rejected_prediction_tokens": null,
      "text_tokens": 51
    },
    "prompt_tokens_details": {
      "audio_tokens": 415,
      "cached_tokens": null,
      "text_tokens": 0,
      "video_tokens": null
    }
  }
}
```
Note: For video input, prompt_tokens_details.audio_tokens includes the audio tokens extracted from the video. video_tokens reports the video-specific token count.
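Putting the three chunk types together, here is a minimal Python sketch that consumes the quick-start stream: it prints text deltas, collects Base64 audio fragments, and reads usage from the final chunk. It assumes each delta.audio.data fragment is a slice of one continuous audio stream; output.wav is a hypothetical file name, and depending on the stream the decoded bytes may be raw PCM rather than a complete wav container.

```python
import base64

audio_b64 = []
for chunk in completion:  # `completion` from the quick start
    if chunk.usage:  # final chunk when include_usage is true
        print(f"\ntotal tokens: {chunk.usage.total_tokens}")
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta
    if delta.content:  # text chunk
        print(delta.content, end="", flush=True)
    audio = getattr(delta, "audio", None)  # audio chunk (non-standard field)
    if audio and audio.get("data"):
        audio_b64.append(audio["data"])

# Concatenate the fragments and decode once at the end.
with open("output.wav", "wb") as f:
    f.write(base64.b64decode("".join(audio_b64)))
```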
Response fields
| Field | Type | Description |
|---|---|---|
| id | string | The request identifier. Identical across all chunks. |
| choices | array | Generated content. Empty in the final usage chunk. |
| choices[].delta.content | string | Incremental translated text. null in audio chunks. |
| choices[].delta.audio | object | Incremental audio data. null in text chunks. |
| choices[].delta.audio.data | string | Base64-encoded audio segment. |
| choices[].delta.audio.id | string | Unique identifier for the output audio. |
| choices[].delta.audio.expires_at | integer | Timestamp when the request was created. |
| choices[].delta.role | string | Message role. Present only in the first chunk. |
| choices[].finish_reason | string | stop when generation completes normally, length when truncated by max_tokens, null while in progress. |
| choices[].index | integer | Always 0. |
| created | integer | Unix timestamp for the request. Identical across all chunks. |
| model | string | The model name. |
| object | string | Always chat.completion.chunk. |
| usage | object | Token consumption. Present only in the final chunk when include_usage is true. |
| usage.prompt_tokens | integer | Total input tokens. |
| usage.completion_tokens | integer | Total output tokens. |
| usage.total_tokens | integer | Sum of prompt_tokens and completion_tokens. |
| usage.completion_tokens_details.audio_tokens | integer | Output audio tokens. |
| usage.completion_tokens_details.text_tokens | integer | Output text tokens. |
| usage.prompt_tokens_details.audio_tokens | integer | Input audio tokens. For video input, this includes audio extracted from the video. |
| usage.prompt_tokens_details.text_tokens | integer | Input text tokens. Always 0. |
| usage.prompt_tokens_details.video_tokens | integer | Input video tokens. Present only for video input. |
Fields fixed to null
The following fields are present in the response for OpenAI compatibility but always return null:
reasoning_content, function_call, refusal, tool_calls, logprobs, service_tier, system_fingerprint
Usage notes
- Streaming only. Set stream to true. Non-streaming calls are unsupported.
- Single message. The messages array accepts one user message only.
- Non-standard parameters. translation_options, top_k, and repetition_penalty are not in the standard OpenAI API. Python SDK: pass them in extra_body. Node.js/HTTP: include them at the top level.
- Sampling defaults. The defaults for temperature, top_p, top_k, presence_penalty, and repetition_penalty are optimized for translation accuracy. Changing them may degrade quality.
- Output audio format. Only wav is supported.
- Automatic language detection. If source_lang is omitted, the input language is auto-detected.