
Alibaba Cloud Model Studio: Audio and video translation - Qwen API reference

Last Updated: Dec 29, 2025

This topic describes the input and output parameters for qwen3-livetranslate-flash through an OpenAI-compatible interface.

Reference: Audio and video translation - Qwen
The DashScope interface is not supported.

OpenAI compatibility

Singapore region

The base_url for SDK: https://dashscope-intl.aliyuncs.com/compatible-mode/v1

The endpoint for HTTP: POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions

Beijing region

The base_url for SDK: https://dashscope.aliyuncs.com/compatible-mode/v1

The endpoint for HTTP: POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions

First, create an API key and export it as an environment variable. To call the model through the OpenAI SDK, also install the SDK.

Request body

Python

import os
from openai import OpenAI

client = OpenAI(
    # If you have not configured the environment variable, replace the following line with your Model Studio API key: api_key="sk-xxx",
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # The following is the base_url for the Singapore region. If you use a model in the Beijing region, replace the base_url with: https://dashscope.aliyuncs.com/compatible-mode/v1
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

# ---------------- Audio input ----------------
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "input_audio",
                "input_audio": {
                    "data": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250211/tixcef/cherry.wav",
                    "format": "wav",
                },
            }
        ],
    }
]

# ---------------- Video input (uncomment to use) ----------------
# messages = [
#     {
#         "role": "user",
#         "content": [
#             {
#                 "type": "video_url",
#                 "video_url": {
#                     "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241115/cqqkru/1.mp4"
#                 },
#             }
#         ],
#     },
# ]

completion = client.chat.completions.create(
    model="qwen3-livetranslate-flash",
    messages=messages,
    modalities=["text", "audio"],
    audio={"voice": "Cherry", "format": "wav"},
    stream=True,
    stream_options={"include_usage": True},
    extra_body={"translation_options": {"source_lang": "zh", "target_lang": "en"}},
)

for chunk in completion:
    print(chunk)
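Instead of printing raw chunks, the loop above can separate the text and audio deltas. The following is a minimal sketch; the field names mirror the chunk objects documented later in this topic, and because delta.audio is not a standard OpenAI SDK field it is read defensively. The mocked chunks at the end are stand-ins for illustration only.

```python
import base64
from types import SimpleNamespace

def collect_stream(chunks):
    """Separate text and Base64 audio deltas from a chunk stream (sketch)."""
    text, audio_b64, usage = [], [], None
    for chunk in chunks:
        if not chunk.choices:               # final chunk when include_usage=True
            usage = chunk.usage
            continue
        delta = chunk.choices[0].delta
        if getattr(delta, "content", None):
            text.append(delta.content)
        audio = getattr(delta, "audio", None)   # non-standard field; may be absent
        if audio and audio.get("data"):
            audio_b64.append(audio["data"])
    return "".join(text), audio_b64, usage

# Mocked chunks illustrating the shapes documented in this topic:
chunks = [
    SimpleNamespace(choices=[SimpleNamespace(
        delta=SimpleNamespace(content="Hel", audio=None))], usage=None),
    SimpleNamespace(choices=[SimpleNamespace(
        delta=SimpleNamespace(content=None,
                              audio={"data": base64.b64encode(b"\x00\x00").decode()}))], usage=None),
    SimpleNamespace(choices=[], usage={"total_tokens": 657}),
]
text, audio_parts, usage = collect_stream(chunks)
print(text, len(audio_parts), usage["total_tokens"])
```

In a real call, pass the `completion` iterator from the example above to `collect_stream` instead of the mocked list.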
Node.js

import OpenAI from "openai";

const client = new OpenAI({
    // If you have not configured the environment variable, replace the following line with your Model Studio API key: apiKey: "sk-xxx",
    apiKey: process.env.DASHSCOPE_API_KEY,
    // The following is the base_url for the Singapore region. If you use a model in the Beijing region, replace the base_url with: https://dashscope.aliyuncs.com/compatible-mode/v1
    baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
});

// ---------------- Audio input ----------------
const messages = [
    {
        role: "user",
        content: [
            {
                type: "input_audio",
                input_audio: {
                    data: "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250211/tixcef/cherry.wav",
                    format: "wav",
                },
            },
        ],
    },
];

// ---------------- Video input (uncomment to use) ----------------
// const messages = [
//     {
//         role: "user",
//         content: [
//             {
//                 type: "video_url",
//                 video_url: {
//                     url: "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241115/cqqkru/1.mp4",
//                 },
//             },
//         ],
//     },
// ];

async function main() {
    const completion = await client.chat.completions.create({
        model: "qwen3-livetranslate-flash",
        messages: messages,
        modalities: ["text", "audio"],
        audio: { voice: "Cherry", format: "wav" },
        stream: true,
        stream_options: { include_usage: true },
        translation_options: { source_lang: "zh", target_lang: "en" },
    });

    for await (const chunk of completion) {
        console.log(JSON.stringify(chunk));
    }
}

main();
curl

# ======= Important =======
# The following is an example for the Singapore region. If you use a model in the Beijing region, replace the request URL with: https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
# === Delete this comment before execution ===
curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen3-livetranslate-flash",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "input_audio",
                    "input_audio": {
                        "data": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250211/tixcef/cherry.wav",
                        "format": "wav"
                    }
                }
            ]
        }
    ],
    "modalities": ["text", "audio"],
    "audio": {
        "voice": "Cherry",
        "format": "wav"
    },
    "stream": true,
    "stream_options": {
        "include_usage": true
    },
    "translation_options": {
        "source_lang": "zh",
        "target_lang": "en"
    }
}'

model string (Required)

The model name. The supported models are qwen3-livetranslate-flash and qwen3-livetranslate-flash-2025-12-01.

messages array (Required)

An array of messages that provides context to the large language model. Only one user message is supported.

Message type

User message object (Required)

The user message.

Properties

content array (Required)

The message content.

Properties

type string (Required)

Valid values:

  • input_audio

    Set the value to input_audio for audio input.

  • video_url

    Set the value to video_url for video file input.

input_audio object

The input audio information. This parameter is required when type is input_audio.

Properties

data string (Required)

The URL of the audio file or a Base64 data URL. For more information about how to pass a local file, see Input a Base64-encoded local file.

format string (Required)

The format of the input audio, such as mp3 or wav.

video_url object

The input video file information. This parameter is required when type is video_url.

Properties

url string (Required)

The public URL of the video file or a Base64 data URL. For more information about how to input a local video file, see Input a Base64-encoded local file.
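For local files, both input_audio.data and video_url.url accept a Base64 data URL. A minimal Python helper (a sketch; the exact prefix format assumes standard data URLs, so verify it against Input a Base64-encoded local file):

```python
import base64
from pathlib import Path

def to_data_url(path: str, mime: str) -> str:
    """Encode a local file as a Base64 data URL (sketch).
    Assumption: the service accepts standard data URLs such as
    data:audio/wav;base64,... or data:video/mp4;base64,...
    """
    payload = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{payload}"

# Example with a stand-in file (replace with a real audio or video file):
Path("clip.wav").write_bytes(b"abc")
print(to_data_url("clip.wav", "audio/wav"))  # data:audio/wav;base64,YWJj
```

The returned string can be used directly as the value of input_audio.data (audio) or video_url.url (video) in the request body.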

role string (Required)

The role of the user message. The value is fixed to user.

stream boolean (Required) Defaults to: false

Specifies whether to return the output in a streaming manner. The model supports only calls with streaming output. You must set this parameter to true.

stream_options object (Optional)

The configuration items for streaming output. This parameter takes effect only when the stream parameter is set to true.

Properties

include_usage boolean (Optional) Defaults to: false

Specifies whether to include token usage information in the last data block.

Valid values:

  • true

  • false

modalities array (Optional) Defaults to: ["text"]

The modality of the output data. The valid values are:

  • ["text","audio"]: Outputs text and audio.

  • ["text"]: Outputs only text.

audio object (Optional)

The voice and format of the output audio. The modalities parameter must be set to ["text","audio"].

Properties

voice string (Required)

The voice of the output audio. For more information, see Supported voices.

format string (Required)

The format of the output audio. Only wav is supported.

max_tokens integer (Optional)

The maximum number of tokens to generate. If the generated content exceeds this value, the response is truncated.

The default and maximum values are both the maximum output length of the model. For more information, see Model selection.

seed integer (Optional)

The random number seed. This parameter ensures that results are reproducible when you use the same input and parameters. If you use the same seed and other parameters for a call, the model returns the same result as much as possible.

Value range: [0, 2³¹-1].

temperature float (Optional) Defaults to: 0.000001

The sampling temperature, which controls the diversity of the generated content. A higher temperature results in more diverse content, while a lower temperature results in more deterministic content.

Value range: [0, 2)

For translation accuracy, we recommend that you do not change this value.

top_p float (Optional) Defaults to: 0.8

The probability threshold for nucleus sampling, which controls the diversity of the generated content.

A higher top_p value results in more diverse content. A lower value results in more deterministic content.

Value range: (0, 1.0]

For translation accuracy, we recommend that you do not change this value.

presence_penalty float (Optional) Defaults to: 0

Controls the repetition of content in the generated text.

Value range: [-2.0, 2.0]. A positive value reduces repetition, and a negative value increases repetition. For translation accuracy, we recommend that you do not change this value.

top_k integer (Optional) Defaults to: 1

The size of the candidate set for sampling during generation. For example, if you set this parameter to 50, only the 50 tokens with the highest scores in a single generation are used to form the candidate set for random sampling. A larger value increases randomness, and a smaller value increases determinism. If the value is None or greater than 100, the top_k policy is not enabled, and only the top_p policy takes effect.

The value must be greater than or equal to 0. For translation accuracy, we recommend that you do not change this value.

This parameter is not a standard OpenAI parameter. When you use the Python SDK, place this parameter in the extra_body object. For example: extra_body={"top_k": xxx}. When you use the Node.js SDK or make an HTTP call, pass this parameter at the top level.

repetition_penalty float (Optional) Defaults to: 1.05

The degree of repetition in consecutive sequences during model generation. A higher repetition_penalty value reduces repetition. A value of 1.0 indicates no penalty. The value must be greater than 0. For translation accuracy, we recommend that you do not change this value.

This parameter is not a standard OpenAI parameter. When you use the Python SDK, place this parameter in the extra_body object. For example: extra_body={"repetition_penalty": xxx}. When you use the Node.js SDK or make an HTTP call, pass this parameter at the top level.

translation_options object (Required)

The translation parameters.

Properties

source_lang string (Optional)

The full English name of the source language. For the supported values, see Supported languages. If you do not set this parameter, the model automatically detects the input language.

target_lang string (Required)

The full English name of the target language. For the supported values, see Supported languages.

This parameter is not a standard OpenAI parameter. When you use the Python SDK, place this parameter in the extra_body object. For example: extra_body={"translation_options": xxx}. When you use the Node.js SDK or make an HTTP call, pass this parameter at the top level.

Chat response chunk object (streaming output)

Text output chunk

{
  "id": "chatcmpl-c22a54b8-40cc-4a1d-988b-f84cdf86868f",
  "choices": [
    {
      "delta": {
        "content": " of",
        "function_call": null,
        "refusal": null,
        "role": null,
        "tool_calls": null
      },
      "finish_reason": null,
      "index": 0,
      "logprobs": null
    }
  ],
  "created": 1764755440,
  "model": "qwen3-livetranslate-flash",
  "object": "chat.completion.chunk",
  "service_tier": null,
  "system_fingerprint": null,
  "usage": null
}

Audio output chunk

{
  "id": "chatcmpl-c22a54b8-40cc-4a1d-988b-f84cdf86868f",
  "choices": [
    {
      "delta": {
        "content": null,
        "function_call": null,
        "refusal": null,
        "role": null,
        "tool_calls": null,
        "audio": {
          "data": "///+//7////+////////////AAAAAAAAAAABA......",
          "expires_at": 1764755440,
          "id": "audio_c22a54b8-40cc-4a1d-988b-f84cdf86868f"
        }
      },
      "finish_reason": null,
      "index": 0,
      "logprobs": null
    }
  ],
  "created": 1764755440,
  "model": "qwen3-livetranslate-flash",
  "object": "chat.completion.chunk",
  "service_tier": null,
  "system_fingerprint": null,
  "usage": null
}

Token usage chunk

{
  "id": "chatcmpl-c22a54b8-40cc-4a1d-988b-f84cdf86868f",
  "choices": [],
  "created": 1764755440,
  "model": "qwen3-livetranslate-flash",
  "object": "chat.completion.chunk",
  "service_tier": null,
  "system_fingerprint": null,
  "usage": {
    "completion_tokens": 242,
    "prompt_tokens": 415,
    "total_tokens": 657,
    "completion_tokens_details": {
      "accepted_prediction_tokens": null,
      "audio_tokens": 191,
      "reasoning_tokens": null,
      "rejected_prediction_tokens": null,
      "text_tokens": 51
    },
    "prompt_tokens_details": {
      "audio_tokens": 415,
      "cached_tokens": null,
      "text_tokens": 0
    }
  }
}
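The data fields of successive audio chunks can be Base64-decoded and concatenated into a playable file. The following sketch assumes the decoded bytes are raw 16-bit mono PCM at 24 kHz; both the sample rate and the encoding are assumptions not stated in this topic, so verify them against your actual output:

```python
import base64
import wave

def chunks_to_wav(b64_chunks, path, sample_rate=24000):
    """Concatenate Base64 audio deltas and write a WAV file (sketch).
    Assumption: decoded bytes are raw 16-bit mono PCM at 24 kHz."""
    pcm = b"".join(base64.b64decode(c) for c in b64_chunks)
    with wave.open(path, "wb") as w:
        w.setnchannels(1)          # mono (assumption)
        w.setsampwidth(2)          # 16-bit samples (assumption)
        w.setframerate(sample_rate)
        w.writeframes(pcm)
    return len(pcm)

# Example with dummy data (two silent 16-bit frames per chunk):
silent = base64.b64encode(b"\x00\x00" * 2).decode()
n = chunks_to_wav([silent, silent], "out.wav")
print(n)  # 8 bytes of PCM, i.e. 4 frames
```

In practice, pass the collected delta.audio.data strings from the stream as b64_chunks.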

id string

The unique identifier for this call. Each chunk object has the same ID.

choices array

An array of content generated by the model. If you set the include_usage parameter to true, choices is an empty array in the last chunk.

Properties

delta object

The incremental message object for this chunk.

Properties

content string

The incremental message content.

reasoning_content string

This value is fixed to null.

function_call object

This value is fixed to null.

audio object

The output audio information.

Properties

data string

The incremental Base64-encoded audio data.

expires_at integer

The Unix timestamp at which the audio data expires.

id string

The unique identifier for the output audio.

refusal object

This value is fixed to null.

role string

The role of the incremental message object. This field has a value only in the first chunk.

tool_calls array

This value is fixed to null.

finish_reason string

The reason why the model stopped generating content. The possible values are:

  • If the output stops naturally, the value is stop.

  • If generation is not finished, the value is null.

  • If generation stops because the output is too long, the value is length.

index integer

The index of the current response in the choices array. The value is fixed to 0.

logprobs object

This value is fixed to null.

created integer

The timestamp when this request was created. Each chunk has the same timestamp.

model string

The model used for this request.

object string

The value is always chat.completion.chunk.

service_tier string

This value is fixed to null.

system_fingerprint string

This value is fixed to null.

usage object

The tokens consumed by this request. This field is displayed in the last chunk only when the include_usage parameter is set to true.

Properties

completion_tokens integer

The number of tokens in the model output.

prompt_tokens integer

The number of input tokens.

total_tokens integer

The total number of tokens, which is the sum of prompt_tokens and completion_tokens.

completion_tokens_details object

Detailed information about the output tokens.

Properties

audio_tokens integer

The number of output audio tokens.

reasoning_tokens integer

This value is fixed to null.

text_tokens integer

The number of output text tokens.

prompt_tokens_details object

A fine-grained classification of input tokens.

Properties

audio_tokens integer

The number of input audio tokens.

For video input, this value counts the audio tokens in the video file.

text_tokens integer

The number of input text tokens. This value is fixed to 0.

video_tokens integer

The number of input video tokens.