
Alibaba Cloud Model Studio: Audio and video translation - Qwen API reference

Last Updated: Mar 15, 2026

qwen3-livetranslate-flash translates audio and video through the OpenAI-compatible chat completions endpoint. All requests are streamed.

Note: The DashScope interface is not supported.

Supported models

  • qwen3-livetranslate-flash

  • qwen3-livetranslate-flash-2025-12-01

Prerequisites

Before you begin, complete the following:

  1. Create an API key

  2. Configure the API key as an environment variable

  3. Install the OpenAI SDK (for Python or Node.js)
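Step 2 can be done as follows on Linux or macOS (the key value is a placeholder for your actual API key; on Windows, use set or setx instead):

```shell
# Placeholder value: replace with your actual Model Studio API key.
export DASHSCOPE_API_KEY="your-api-key"
```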

Endpoints

Region SDK base_url HTTP endpoint
Singapore https://dashscope-intl.aliyuncs.com/compatible-mode/v1 POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions
Beijing https://dashscope.aliyuncs.com/compatible-mode/v1 POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions

Quick start

The following examples translate an audio file and return both translated text and audio through streaming. Replace the base_url if you use the Beijing region.

Python

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",  # Singapore
)

completion = client.chat.completions.create(
    model="qwen3-livetranslate-flash",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "input_audio",
                    "input_audio": {
                        "data": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250211/tixcef/cherry.wav",
                        "format": "wav",
                    },
                }
            ],
        }
    ],
    modalities=["text", "audio"],
    audio={"voice": "Cherry", "format": "wav"},
    stream=True,
    stream_options={"include_usage": True},
    extra_body={"translation_options": {"source_lang": "zh", "target_lang": "en"}},
)

for chunk in completion:
    print(chunk)

Node.js

import OpenAI from "openai";

const client = new OpenAI({
    apiKey: process.env.DASHSCOPE_API_KEY,
    baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1",  // Singapore
});

async function main() {
    const completion = await client.chat.completions.create({
        model: "qwen3-livetranslate-flash",
        messages: [
            {
                role: "user",
                content: [
                    {
                        type: "input_audio",
                        input_audio: {
                            data: "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250211/tixcef/cherry.wav",
                            format: "wav",
                        },
                    },
                ],
            },
        ],
        modalities: ["text", "audio"],
        audio: { voice: "Cherry", format: "wav" },
        stream: true,
        stream_options: { include_usage: true },
        translation_options: { source_lang: "zh", target_lang: "en" },
    });

    for await (const chunk of completion) {
        console.log(JSON.stringify(chunk));
    }
}

main();

curl

curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
  -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-livetranslate-flash",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "input_audio",
            "input_audio": {
              "data": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250211/tixcef/cherry.wav",
              "format": "wav"
            }
          }
        ]
      }
    ],
    "modalities": ["text", "audio"],
    "audio": {
      "voice": "Cherry",
      "format": "wav"
    },
    "stream": true,
    "stream_options": {
      "include_usage": true
    },
    "translation_options": {
      "source_lang": "zh",
      "target_lang": "en"
    }
  }'

Video input

To translate video instead of audio, set the content type to video_url:

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "video_url",
                "video_url": {
                    "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241115/cqqkru/1.mp4"
                },
            }
        ],
    },
]

All other parameters remain the same.

Request body

Required parameters

Parameter Type Description
model string Model name. Valid values: qwen3-livetranslate-flash, qwen3-livetranslate-flash-2025-12-01.
messages array An array of messages. Only one user message is supported.
stream boolean Must be set to true. The default is false, but the model supports only streaming output.
translation_options object Translation configuration. See Translation options. This is a non-standard OpenAI parameter. In the Python SDK, pass it inside extra_body. In Node.js or HTTP, pass it at the top level.

Optional parameters

Parameter Type Default Description
modalities array ["text"] Output modality. Set to ["text", "audio"] to receive both text and audio output, or ["text"] for text only.
audio object - Output audio configuration. Required when modalities includes "audio". See Audio output options.
stream_options object - Streaming configuration. See Stream options.
max_tokens integer Model maximum The maximum number of tokens to generate. Generation stops at this limit or when complete.
seed integer - Random seed for reproducibility. The same seed produces identical output for identical requests. Range: [0, 2^31-1].

Sampling parameters

For translation accuracy, keep these parameters at their default values.

Parameter Type Default Range Notes
temperature float 0.000001 [0, 2) Controls output diversity.
top_p float 0.8 (0, 1.0] Nucleus sampling threshold.
presence_penalty float 0 [-2.0, 2.0] Reduces repetition when positive.
top_k integer 1 >= 0 Candidate set size. If the value is None or greater than 100, top_k is disabled and only top_p takes effect. Non-standard OpenAI parameter. Python SDK: use extra_body.
repetition_penalty float 1.05 > 0 Penalizes repeated sequences. Non-standard OpenAI parameter. Python SDK: use extra_body.
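With the Python SDK, the two non-standard sampling parameters travel inside extra_body alongside translation_options. A minimal sketch (the values shown are the documented defaults, which you would normally omit):

```python
# Sketch: non-standard parameters go through extra_body in the Python SDK.
# The values shown are the documented defaults.
extra_body = {
    "translation_options": {"source_lang": "zh", "target_lang": "en"},
    "top_k": 1,                  # disabled when None or greater than 100
    "repetition_penalty": 1.05,  # > 0; higher values penalize repetition more
}
```

This dict is passed as the extra_body argument of client.chat.completions.create(...); in Node.js or HTTP requests, the same keys go at the top level of the request body.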

Message object

The messages array must contain exactly one object with role set to user.

Properties of content array items:

Field Type Required Description
type string Yes input_audio for audio input, video_url for video input.
input_audio object When type is input_audio Audio input. See below.
video_url object When type is video_url Video input. See below.

input_audio object:

Field Type Required Description
data string Yes URL of the audio file, or a Base64 data URL. For local files, see Input a Base64-encoded local file.
format string Yes Audio format, such as mp3 or wav.

video_url object:

Field Type Required Description
url string Yes Public URL of the video file, or a Base64 data URL. For local files, see Input a Base64-encoded local file.
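Both data (for audio) and url (for video) also accept Base64 data URLs. The following is a minimal sketch of building one from a local file, assuming the standard data: URL syntax; the exact format Model Studio expects is described in Input a Base64-encoded local file:

```python
import base64

def to_data_url(path, mime="audio/wav"):
    """Read a local file and return it as a data: URL (assumed format)."""
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime};base64,{b64}"
```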

Translation options

Field Type Required Description
source_lang string No Source language. Valid values are listed in Supported languages; the examples in this topic use codes such as zh. If omitted, the source language is auto-detected.
target_lang string Yes Target language. Valid values are listed in Supported languages; the examples in this topic use codes such as en.
Note: translation_options is a non-standard OpenAI parameter. In the Python SDK, pass it inside extra_body:
extra_body={"translation_options": {"source_lang": "zh", "target_lang": "en"}}
In Node.js or HTTP requests, pass it at the top level of the request body.

Audio output options

Required when modalities is ["text", "audio"].

Field Type Required Description
voice string Yes Voice for the output audio. See Supported voices.
format string Yes Output audio format. Only wav is supported.

Stream options

Field Type Default Description
include_usage boolean false When true, the final chunk includes token usage details.

Response

The API returns a series of streaming chunks, each as a chat.completion.chunk object. Chunks fall into three categories: text, audio, and token usage.

Text chunk

Contains incremental translated text in choices[0].delta.content:

{
  "id": "chatcmpl-c22a54b8-40cc-4a1d-988b-f84cdf86868f",
  "choices": [
    {
      "delta": {
        "content": " of",
        "role": null,
        "audio": null
      },
      "finish_reason": null,
      "index": 0
    }
  ],
  "created": 1764755440,
  "model": "qwen3-livetranslate-flash",
  "object": "chat.completion.chunk"
}

Audio chunk

Contains incremental Base64-encoded audio in choices[0].delta.audio.data:

{
  "id": "chatcmpl-c22a54b8-40cc-4a1d-988b-f84cdf86868f",
  "choices": [
    {
      "delta": {
        "content": null,
        "role": null,
        "audio": {
          "data": "///+//7////+////////////AAAAAAAAAAABA......",
          "expires_at": 1764755440,
          "id": "audio_c22a54b8-40cc-4a1d-988b-f84cdf86868f"
        }
      },
      "finish_reason": null,
      "index": 0
    }
  ],
  "created": 1764755440,
  "model": "qwen3-livetranslate-flash",
  "object": "chat.completion.chunk"
}

Token usage chunk

Returned as the final chunk when include_usage is true. The choices array is empty, and usage contains the token breakdown:

{
  "id": "chatcmpl-c22a54b8-40cc-4a1d-988b-f84cdf86868f",
  "choices": [],
  "created": 1764755440,
  "model": "qwen3-livetranslate-flash",
  "object": "chat.completion.chunk",
  "usage": {
    "completion_tokens": 242,
    "prompt_tokens": 415,
    "total_tokens": 657,
    "completion_tokens_details": {
      "accepted_prediction_tokens": null,
      "audio_tokens": 191,
      "reasoning_tokens": null,
      "rejected_prediction_tokens": null,
      "text_tokens": 51
    },
    "prompt_tokens_details": {
      "audio_tokens": 415,
      "cached_tokens": null,
      "text_tokens": 0,
      "video_tokens": null
    }
  }
}
Note: For video input, prompt_tokens_details.audio_tokens includes the audio tokens extracted from the video. video_tokens reports the video-specific token count.
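Putting the three chunk types together, a consumer loop might look like the following sketch. Chunks are shown as plain dicts shaped like the examples above; with the OpenAI Python SDK you would read the same fields from chunk.choices[0].delta and chunk.usage:

```python
import base64

def collect(chunks):
    """Assemble streamed chunks into (text, audio_bytes, usage)."""
    text_parts, audio_parts, usage = [], [], None
    for chunk in chunks:
        if not chunk["choices"]:               # final usage-only chunk
            usage = chunk.get("usage")
            continue
        delta = chunk["choices"][0]["delta"]
        if delta.get("content"):               # text chunk
            text_parts.append(delta["content"])
        audio = delta.get("audio")
        if audio and audio.get("data"):        # audio chunk: Base64 payload
            audio_parts.append(base64.b64decode(audio["data"]))
    return "".join(text_parts), b"".join(audio_parts), usage
```

The concatenated audio bytes can then be written to a file; depending on how the stream frames the fragments, you may need to prepend a WAV header before playback.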

Response fields

Field Type Description
id string The request identifier. Identical across all chunks.
choices array Generated content. Empty in the final usage chunk.
choices[].delta.content string Incremental translated text. null in audio chunks.
choices[].delta.audio object Incremental audio data. null in text chunks.
choices[].delta.audio.data string Base64-encoded audio segment.
choices[].delta.audio.id string Unique identifier for the output audio.
choices[].delta.audio.expires_at integer Timestamp when the request was created.
choices[].delta.role string Message role. Present only in the first chunk.
choices[].finish_reason string stop when generation completes normally, length when truncated by max_tokens, null while in progress.
choices[].index integer Always 0.
created integer Unix timestamp for the request. Identical across all chunks.
model string The model name.
object string Always chat.completion.chunk.
usage object Token consumption. Present only in the final chunk when include_usage is true.
usage.prompt_tokens integer Total input tokens.
usage.completion_tokens integer Total output tokens.
usage.total_tokens integer Sum of prompt_tokens and completion_tokens.
usage.completion_tokens_details.audio_tokens integer Output audio tokens.
usage.completion_tokens_details.text_tokens integer Output text tokens.
usage.prompt_tokens_details.audio_tokens integer Input audio tokens. For video input, this includes audio extracted from the video.
usage.prompt_tokens_details.text_tokens integer Input text tokens. Always 0.
usage.prompt_tokens_details.video_tokens integer Input video tokens. Present only for video input.

Fields fixed to null

The following fields are present in the response for OpenAI compatibility but always return null:

reasoning_content, function_call, refusal, tool_calls, logprobs, service_tier, system_fingerprint

Usage notes

  • Streaming only. Set stream to true. Non-streaming calls are unsupported.

  • Single message. The messages array accepts one user message only.

  • Non-standard parameters. translation_options, top_k, and repetition_penalty are not in the standard OpenAI API. Python SDK: pass in extra_body. Node.js/HTTP: include at top level.

  • Sampling defaults. Defaults for temperature, top_p, top_k, presence_penalty, and repetition_penalty are optimized for translation accuracy. Changing them may degrade quality.

  • Output audio format. Only wav is supported.

  • Automatic language detection. If source_lang is omitted, the input language is auto-detected.
