
Alibaba Cloud Model Studio: Qwen-Omni

Last Updated:Mar 27, 2026

The Qwen-Omni model accepts text combined with one other modality, such as an image, audio, or video, and generates responses in text or speech. It offers multiple human-like voices and supports multilingual and dialectal speech output. You can use it for applications such as text creation, visual recognition, and voice assistants.

Quick start

Prerequisites

You have obtained an API key and set it as the DASHSCOPE_API_KEY environment variable.

Call method: Qwen-Omni supports only streaming output. You must set the stream parameter to True; non-streaming calls fail.

The following example sends a text prompt to the Qwen-Omni API and returns a streaming response that contains both text and audio.

import os
import base64
import soundfile as sf
import numpy as np
from openai import OpenAI

# 1. Initialize the client
client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),  # Confirm the environment variable is set
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

# 2. Send the request
try:
    completion = client.chat.completions.create(
        model="qwen3-omni-flash",
        messages=[{"role": "user", "content": "Who are you?"}],
        modalities=["text", "audio"],  # Specify text and audio output
        audio={"voice": "Cherry", "format": "wav"},
        stream=True,  # Must be set to True
        stream_options={"include_usage": True},
    )

    # 3. Process the streaming response and decode the audio
    print("Model response:")
    audio_base64_string = ""
    for chunk in completion:
        # Process the text part
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="")

        # Collect the audio part
        if chunk.choices and hasattr(chunk.choices[0].delta, "audio") and chunk.choices[0].delta.audio:
            audio_base64_string += chunk.choices[0].delta.audio.get("data", "")

    # 4. Save the audio file
    if audio_base64_string:
        # The decoded bytes are raw 16-bit PCM (24 kHz, mono), not a WAV container
        pcm_bytes = base64.b64decode(audio_base64_string)
        audio_np = np.frombuffer(pcm_bytes, dtype=np.int16)
        sf.write("audio_assistant.wav", audio_np, samplerate=24000)
        print("\nAudio file saved to: audio_assistant.wav")

except Exception as e:
    print(f"Request failed: {e}")
// Setup instructions:
// Universal for Windows/Mac/Linux:
// 1. Ensure Node.js is installed (version >= 14 recommended)
// 2. Run the following command to install required dependencies:
//    npm install openai wav

import OpenAI from "openai";
import { createWriteStream } from 'node:fs';
import { Writer } from 'wav';

// Define an audio conversion function: convert a Base64 string and save it as a standard WAV audio file
async function convertAudio(audioString, audioPath) {
    try {
        // Decode the Base64 string into a Buffer
        const wavBuffer = Buffer.from(audioString, 'base64');
        // Create a WAV file write stream
        const writer = new Writer({
            sampleRate: 24000,  // Sample rate
            channels: 1,        // Mono
            bitDepth: 16        // 16-bit depth
        });
        // Create an output file stream and establish a pipeline connection
        const outputStream = createWriteStream(audioPath);
        writer.pipe(outputStream);

        // Write PCM data and end writing
        writer.write(wavBuffer);
        writer.end();

        // Use a Promise to wait for the file to finish writing
        await new Promise((resolve, reject) => {
            outputStream.on('finish', resolve);
            outputStream.on('error', reject);
        });

        // Add extra wait time to ensure audio integrity
        await new Promise(resolve => setTimeout(resolve, 800));

        console.log(`\nAudio file saved to: ${audioPath}`);
    } catch (error) {
        console.error('Error during audio processing:', error);
    }
}

// 1. Initialize the client
const openai = new OpenAI(
    {
        // The API keys for Singapore and Beijing regions differ. To get an API key, see https://www.alibabacloud.com/help/zh/model-studio/get-api-key
        apiKey: process.env.DASHSCOPE_API_KEY,
        // The following URL is for the Singapore region. If using a Beijing-region model, replace it with: https://dashscope.aliyuncs.com/compatible-mode/v1
        baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
    }
);
// 2. Send the request
const completion = await openai.chat.completions.create({
    model: "qwen3-omni-flash",  
    messages: [
        {
            "role": "user",
            "content": "Who are you?"
        }],
    stream: true,
    stream_options: {
        include_usage: true
    },
    modalities: ["text", "audio"],
    audio: { voice: "Cherry", format: "wav" }
});

let audioString = "";
console.log("Large language model response:")

// 3. Process the streaming response and decode the audio
for await (const chunk of completion) {
    if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
        // Process text content
        if (chunk.choices[0].delta.content) {
            process.stdout.write(chunk.choices[0].delta.content);
        }
        // Process audio content
        if (chunk.choices[0].delta.audio) {
            if (chunk.choices[0].delta.audio["data"]) {
                audioString += chunk.choices[0].delta.audio["data"];
            }
        }
    }
}
// 4. Save the audio file
await convertAudio(audioString, "audio_assistant.wav");
# ======= Important note =======
# API keys for Singapore and Beijing regions differ. To get an API key, see https://www.alibabacloud.com/help/zh/model-studio/get-api-key
# The following URL is for the Singapore region. If using a Beijing-region model, replace it with: https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
# === Delete this comment before execution ===

curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen3-omni-flash",
    "messages": [
        {
            "role": "user", 
            "content": "Who are you?"
        }
    ],
    "stream":true,
    "stream_options":{
        "include_usage":true
    },
    "modalities":["text","audio"],
    "audio":{"voice":"Cherry","format":"wav"}
}'

Response

After you run the Python or Node.js code, the model's text response appears in the console, and an audio file named audio_assistant.wav is created in the same directory as your code file.

Large language model response:
I am a large language model developed by Alibaba Cloud. My name is Qwen. How can I help you?

Running the HTTP code directly returns text and Base64-encoded audio data in the audio field.

data: {"choices":[{"delta":{"content":"I"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1757647879,"system_fingerprint":null,"model":"qwen3-omni-flash","id":"chatcmpl-a68eca3b-c67e-4666-a72f-73c0b4919860"}
data: {"choices":[{"delta":{"content":" am"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1757647879,"system_fingerprint":null,"model":"qwen3-omni-flash","id":"chatcmpl-a68eca3b-c67e-4666-a72f-73c0b4919860"}
......
data: {"choices":[{"delta":{"audio":{"data":"/v8AAAAAAAAAAAAAAA...","expires_at":1757647879,"id":"audio_a68eca3b-c67e-4666-a72f-73c0b4919860"}},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1757647879,"system_fingerprint":null,"model":"qwen3-omni-flash","id":"chatcmpl-a68eca3b-c67e-4666-a72f-73c0b4919860"}
data: {"choices":[{"finish_reason":"stop","delta":{"content":""},"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1764763585,"system_fingerprint":null,"model":"qwen3-omni-flash","id":"chatcmpl-e8c82e9e-073e-4289-a786-a20eb444ac9c"}
data: {"choices":[],"object":"chat.completion.chunk","usage":{"prompt_tokens":207,"completion_tokens":103,"total_tokens":310,"completion_tokens_details":{"audio_tokens":83,"text_tokens":20},"prompt_tokens_details":{"text_tokens":207}},"created":1757940330,"system_fingerprint":null,"model":"qwen3-omni-flash","id":"chatcmpl-9cdd5a26-f9e9-4eff-9dcc-93a878165afc"}
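When you call the HTTP endpoint directly, the audio arrives as Base64 fragments spread across the `data:` lines shown above, and you must reassemble it yourself. The following is a minimal sketch of that step (the helper name and in-memory parsing are illustrative; the chunk shape follows this sample response). The decoded bytes are raw 16-bit PCM at 24 kHz, which you can write to a WAV file as in the Python example above.

```python
import base64
import json

def extract_audio_from_sse(sse_text):
    """Collect the Base64 audio fragments from `data:` lines and decode
    them into raw PCM bytes (16-bit, 24 kHz, mono)."""
    audio_b64 = ""
    for line in sse_text.splitlines():
        line = line.strip()
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # end-of-stream sentinel used by SSE APIs
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            audio = choice.get("delta", {}).get("audio") or {}
            audio_b64 += audio.get("data", "")
    return base64.b64decode(audio_b64)
```

This mirrors the concatenate-then-decode approach used in the Python quick-start example above.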

Applicable scope

Supported regions

  • Singapore: Use the API key for this region.

  • Beijing: Use the API key for this region.

Supported models

Compared to the Qwen-VL models, the Qwen-Omni model can:

  • Understand both the visual content and the audio track of video files.

  • Accept multimodal input that combines text with images, audio, or video.

  • Output audio.

It also performs well at visual and audio understanding.

We recommend using qwen3-omni-flash. Compared to qwen-omni-turbo, which is no longer updated, its capabilities have improved significantly:

  • Supports thinking and non-thinking modes. You can switch between them using the enable_thinking parameter. The thinking mode is disabled by default.

  • In non-thinking mode, for audio output:

    • qwen3-omni-flash-2025-12-01 supports up to 49 voices. qwen3-omni-flash-2025-09-15 and qwen3-omni-flash support up to 17 voices. Qwen-Omni-Turbo supports only 4 voices.

    • The number of supported languages has increased to 10. Qwen-Omni-Turbo supports only 2.

International (Singapore)

Commercial models

Commercial models offer newer features and improvements over open-source versions.

All values are in tokens. In non-thinking-mode rows, blank cells share the merged value from the thinking-mode row above.

| Model name | Version | Mode | Context length | Maximum input | Maximum chain-of-thought | Maximum output |
| --- | --- | --- | --- | --- | --- | --- |
| qwen3-omni-flash (capabilities are the same as qwen3-omni-flash-2025-12-01) | Stable version | Thinking mode | 65,536 | 16,384 | 32,768 | 16,384 |
| | | Non-thinking mode | 49,152 | | - | |
| qwen3-omni-flash-2025-12-01 | Snapshot version | Thinking mode | 65,536 | 16,384 | 32,768 | 16,384 |
| | | Non-thinking mode | 49,152 | | - | |
| qwen3-omni-flash-2025-09-15 (also known as qwen3-omni-flash-0915) | Snapshot version | Thinking mode | 65,536 | 16,384 | 32,768 | 16,384 |
| | | Non-thinking mode | 49,152 | | - | |

Free quota: 1 million tokens per model, regardless of modality. Valid for 90 days after Model Studio activation.

More models

All values are in tokens.

| Model name | Version | Context length | Maximum input | Maximum output |
| --- | --- | --- | --- | --- |
| qwen-omni-turbo (matches the capabilities of qwen-omni-turbo-2025-03-26) | Stable version | 32,768 | 30,720 | 2,048 |
| qwen-omni-turbo-latest (always matches the latest snapshot version, with identical capabilities) | Latest version | 32,768 | 30,720 | 2,048 |
| qwen-omni-turbo-2025-03-26 (also known as qwen-omni-turbo-0326) | Snapshot version | 32,768 | 30,720 | 2,048 |

Free quota: 1 million tokens (not distinguished by modality). Valid for 90 days after Model Studio activation.

Open source models

All values are in tokens.

| Model | Context window | Max input | Max output |
| --- | --- | --- | --- |
| qwen2.5-omni-7b | 32,768 | 30,720 | 2,048 |

Free quota: 1 million tokens (no modality distinction). Valid for 90 days after activating Model Studio.

Mainland China (Beijing)

Commercial models

All values are in tokens. In non-thinking-mode rows, blank cells share the merged value from the thinking-mode row above.

| Model name | Version | Mode | Context length | Maximum input | Maximum chain-of-thought | Maximum output |
| --- | --- | --- | --- | --- | --- | --- |
| qwen3-omni-flash (currently matches qwen3-omni-flash-2025-12-01 in capability) | Stable version | Thinking mode | 65,536 | 16,384 | 32,768 | 16,384 |
| | | Non-thinking mode | 49,152 | | - | |
| qwen3-omni-flash-2025-12-01 | Snapshot version | Thinking mode | 65,536 | 16,384 | 32,768 | 16,384 |
| | | Non-thinking mode | 49,152 | | - | |
| qwen3-omni-flash-2025-09-15 (also known as qwen3-omni-flash-0915) | Snapshot version | Thinking mode | 65,536 | 16,384 | 32,768 | 16,384 |
| | | Non-thinking mode | 49,152 | | - | |

No free quota.

More models

All values are in tokens.

| Model name | Version | Context length | Maximum input | Maximum output |
| --- | --- | --- | --- | --- |
| qwen-omni-turbo (equivalent to qwen-omni-turbo-2025-03-26) | Stable version | 32,768 | 30,720 | 2,048 |
| qwen-omni-turbo-latest (always matches the latest snapshot version, with identical capabilities) | Latest version | 32,768 | 30,720 | 2,048 |
| qwen-omni-turbo-2025-03-26 (also known as qwen-omni-turbo-0326) | Snapshot version | 32,768 | 30,720 | 2,048 |
| qwen-omni-turbo-2025-01-19 (also known as qwen-omni-turbo-0119) | Snapshot version | 32,768 | 30,720 | 2,048 |

No free quota.

Open source models

All values are in tokens.

| Model name | Context window | Max input | Max output |
| --- | --- | --- | --- |
| qwen2.5-omni-7b | 32,768 | 30,720 | 2,048 |

No free quota.

Usage

Input

In a single user message, the content array can contain text and only one other modality, such as an image, audio, or video. Multiple non-text modalities are not supported.

Output

  • Supported output modalities: Audio output is Base64-encoded data. For more information about how to convert it to an audio file, see Work with audio output.

    | Output modality | modalities parameter value | Response style |
    | --- | --- | --- |
    | Text | ["text"] (default) | Formal, written style. |
    | Text + audio | ["text","audio"] | Conversational style with filler words and interactive prompts. Qwen3-Omni-Flash does not support audio output in thinking mode. |

    Qwen-Omni-Turbo does not support setting a system message when audio is included in the output modality.
  • Supported audio output languages:

    • Qwen-Omni-Turbo: Chinese (Mandarin) and English only.

    • Qwen3-Omni-Flash (non-thinking mode): Chinese (Mandarin and some dialects), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, and Korean.

  • Supported voices: You can configure the voice and file format using the audio parameter. For example, audio={"voice": "Cherry", "format": "wav"}:

    • File format (format): This parameter must be set to "wav".

    • Voice (voice): For a list of supported voices, see Voice list.

Limitations

  • Streaming output is mandatory: all requests to Qwen-Omni must set stream=True.

  • Only the qwen3-omni-flash models are hybrid thinking models. For instructions on how to call them, see Enable or disable thinking mode. In thinking mode, audio output is not supported.

Enable or disable thinking mode

Qwen3-Omni-Flash is a hybrid thinking model. You can control the thinking mode using the enable_thinking parameter:

  • true: Enable thinking mode

  • false (default): Disable thinking mode

Qwen-Omni-Turbo is not a thinking model.

OpenAI compatible

import os
from openai import OpenAI

client = OpenAI(
    # API keys for Singapore and Beijing regions differ. To get an API key, see https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # The following is the URL for the Singapore region. If using a Beijing-region model, replace it with: https://dashscope.aliyuncs.com/compatible-mode/v1
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

completion = client.chat.completions.create(
    model="qwen3-omni-flash",
    messages=[{"role": "user", "content": "Who are you?"}],

    # Enable or disable thinking mode. Audio output is not supported in thinking mode. qwen-omni-turbo does not support enable_thinking.
    extra_body={'enable_thinking': True},

    # Set the output modality. Currently supported in non-thinking mode: ["text","audio"] and ["text"]. Only ["text"] is supported in thinking mode.
    modalities=["text"],

    # Set the voice. The audio parameter is not supported in thinking mode.
    # audio={"voice": "Cherry", "format": "wav"},
    # stream must be set to True. Otherwise, an error occurs.
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in completion:
    if chunk.choices:
        print(chunk.choices[0].delta)
    else:
        print(chunk.usage)
import OpenAI from "openai";

const openai = new OpenAI(
    {
        // API keys for Singapore and Beijing regions differ. To get an API key, see https://www.alibabacloud.com/help/zh/model-studio/get-api-key
        apiKey: process.env.DASHSCOPE_API_KEY,
        // The following is the URL for the Singapore region. If using a Beijing-region model, replace it with: https://dashscope.aliyuncs.com/compatible-mode/v1
        baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
    }
);
const completion = await openai.chat.completions.create({
    model: "qwen3-omni-flash",
    messages: [
        { role: "user", content: "Who are you?" }
    ],

    // stream must be set to True. Otherwise, an error occurs.
    stream: true,
    stream_options: {
        include_usage: true
    },
    // Enable or disable thinking mode. Audio output is not supported in thinking mode. qwen-omni-turbo does not support enable_thinking.
    extra_body:{'enable_thinking': true},
    // Set the output modality. Currently supported in non-thinking mode: ["text","audio"] and ["text"]. Only ["text"] is supported in thinking mode.
    modalities: ["text"],
    // Set the voice. The audio parameter is not supported in thinking mode.
    //audio: { voice: "Cherry", format: "wav" }
});

for await (const chunk of completion) {
    if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
        console.log(chunk.choices[0].delta);
    } else {
        console.log(chunk.usage);
    }
}
# ======= Important note =======
# API keys for Singapore and Beijing regions differ. To get an API key, see https://www.alibabacloud.com/help/zh/model-studio/get-api-key
# The following is the URL for the Singapore region. If using a Beijing-region model, replace it with: https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
# === Delete this comment before execution ===

curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen3-omni-flash",
    "messages": [
        {
            "role": "user", 
            "content": "Who are you?"
        }
    ],
    "stream":true,
    "stream_options":{
        "include_usage":true
    },
    "modalities":["text"],
    "enable_thinking": true
}'

Response

data: {"choices":[{"delta":{"content":null,"role":"assistant","reasoning_content":""},"index":0,"logprobs":null,"finish_reason":null}],"object":"chat.completion.chunk","usage":null,"created":1757937336,"system_fingerprint":null,"model":"qwen3-omni-flash","id":"chatcmpl-ce3d6fe5-e717-4b7e-8b40-3aef12288d4c"}
data: {"choices":[{"finish_reason":null,"logprobs":null,"delta":{"content":null,"reasoning_content":"Hmm"},"index":0}],"object":"chat.completion.chunk","usage":null,"created":1757937336,"system_fingerprint":null,"model":"qwen3-omni-flash","id":"chatcmpl-ce3d6fe5-e717-4b7e-8b40-3aef12288d4c"}
data: {"choices":[{"delta":{"content":null,"reasoning_content":","},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1757937336,"system_fingerprint":null,"model":"qwen3-omni-flash","id":"chatcmpl-ce3d6fe5-e717-4b7e-8b40-3aef12288d4c"}
......
data: {"choices":[{"delta":{"content":"Tell me"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1757937336,"system_fingerprint":null,"model":"qwen3-omni-flash","id":"chatcmpl-ce3d6fe5-e717-4b7e-8b40-3aef12288d4c"}
data: {"choices":[{"delta":{"content":"!"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1757937336,"system_fingerprint":null,"model":"qwen3-omni-flash","id":"chatcmpl-ce3d6fe5-e717-4b7e-8b40-3aef12288d4c"}
data: {"choices":[{"finish_reason":"stop","delta":{"content":"","reasoning_content":null},"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1757937336,"system_fingerprint":null,"model":"qwen3-omni-flash","id":"chatcmpl-ce3d6fe5-e717-4b7e-8b40-3aef12288d4c"}
data: {"choices":[],"object":"chat.completion.chunk","usage":{"prompt_tokens":11,"completion_tokens":363,"total_tokens":374,"completion_tokens_details":{"reasoning_tokens":195,"text_tokens":168},"prompt_tokens_details":{"text_tokens":11}},"created":1757937336,"system_fingerprint":null,"model":"qwen3-omni-flash","id":"chatcmpl-ce3d6fe5-e717-4b7e-8b40-3aef12288d4c"}
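In thinking mode, the stream interleaves reasoning_content deltas (the chain of thought) with content deltas (the final reply), as the response above shows. A minimal sketch for separating the two, assuming chunks shaped like those dicts (the function name is illustrative):

```python
def split_thinking_stream(chunks):
    """Separate chain-of-thought deltas from the final reply, given
    streaming chunks as dicts in the shape shown above."""
    reasoning, answer = [], []
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            if delta.get("reasoning_content"):
                reasoning.append(delta["reasoning_content"])
            if delta.get("content"):
                answer.append(delta["content"])
    return "".join(reasoning), "".join(answer)
```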

Image + text input

Qwen-Omni supports multiple images per request. The image requirements are as follows:

  • Each image file must be no larger than 10 MB.

  • Number of images: Up to 2048 images when using public network URLs or local paths. Up to 250 images when using Base64 encoding.

    The total number of tokens for images and text must not exceed the model's maximum input limit.
  • Both the width and height must exceed 10 pixels. The aspect ratio must not exceed 200:1 or 1:200.

  • For a list of supported image types, see Visual and video understanding.
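For reference, sending a local image by Base64 typically means building a data URL for the image_url field. A rough sketch under that assumption (the helper name is hypothetical; see Send local files with Base64 encoding for the authoritative procedure):

```python
import base64
import mimetypes

def image_to_data_url(path):
    """Encode a local image file as a data URL of the form
    data:<MIME type>;base64,<data> for use in an image_url field."""
    mime, _ = mimetypes.guess_type(path)
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    return f"data:{mime};base64,{encoded}"
```

Keep in mind the 250-image cap above when sending Base64-encoded images.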

The following examples use a public network image URL. To use a local image instead, see Send local files with Base64 encoding. Streaming output is required for all calls.

OpenAI compatible

import os
from openai import OpenAI

client = OpenAI(
    # API keys for Singapore and Beijing regions differ. To get an API key, see https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # The following is the URL for the Singapore region. If using a Beijing-region model, replace it with: https://dashscope.aliyuncs.com/compatible-mode/v1
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

completion = client.chat.completions.create(
    model="qwen3-omni-flash", # When using qwen3-omni-flash, run in non-thinking mode.
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"
                    },
                },
                {"type": "text", "text": "What scene is depicted in the image?"},
            ],
        },
    ],
    # Set the output modality. Currently supported: ["text","audio"] and ["text"].
    modalities=["text", "audio"],
    audio={"voice": "Cherry", "format": "wav"},
    # stream must be set to True. Otherwise, an error occurs.
    stream=True,
    stream_options={
        "include_usage": True
    }
)

for chunk in completion:
    if chunk.choices:
        print(chunk.choices[0].delta)
    else:
        print(chunk.usage)
import OpenAI from "openai";

const openai = new OpenAI(
    {
        // API keys for Singapore and Beijing regions differ. To get an API key, see https://www.alibabacloud.com/help/zh/model-studio/get-api-key
        apiKey: process.env.DASHSCOPE_API_KEY,
        // The following is the URL for the Singapore region. If using a Beijing-region model, replace it with: https://dashscope.aliyuncs.com/compatible-mode/v1
        baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
    }
);
const completion = await openai.chat.completions.create({
    model: "qwen3-omni-flash", // When using qwen3-omni-flash, run in non-thinking mode.
    messages: [
        {
            "role": "user",
            "content": [{
                "type": "image_url",
                "image_url": { "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg" },
            },
            { "type": "text", "text": "What scene is depicted in the image?" }]
        }],
    stream: true,
    stream_options: {
        include_usage: true
    },
    modalities: ["text", "audio"],
    audio: { voice: "Cherry", format: "wav" }
});

for await (const chunk of completion) {
    if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
        console.log(chunk.choices[0].delta);
    } else {
        console.log(chunk.usage);
    }
}
# ======= Important note =======
# API keys for Singapore and Beijing regions differ. To get an API key, see https://www.alibabacloud.com/help/zh/model-studio/get-api-key
# The following is the URL for the Singapore region. If using a Beijing-region model, replace it with: https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
# === Delete this comment before execution ===


curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen3-omni-flash",
    "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "image_url",
          "image_url": {
            "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"
          }
        },
        {
          "type": "text",
          "text": "What scene is depicted in the image?"
        }
      ]
    }
  ],
    "stream":true,
    "stream_options":{
        "include_usage":true
    },
    "modalities":["text","audio"],
    "audio":{"voice":"Cherry","format":"wav"}
}'

Audio + text input

  • You can send only one audio file per request.

  • File size:

    • qwen3-omni-flash: Up to 100 MB. Maximum duration: 20 minutes.

    • qwen-omni-turbo: Up to 10 MB. Maximum duration: 3 minutes.

  • Supported formats: AMR, WAV, 3GP, 3GPP, AAC, and MP3.
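Before uploading, it can be worth checking a local file against these limits. A minimal sketch for WAV files using Python's standard library (the helper and limit constants are illustrative and reflect the qwen3-omni-flash limits listed above):

```python
import os
import wave

# Limits for qwen3-omni-flash from the constraints above
MAX_BYTES = 100 * 1024 * 1024  # 100 MB
MAX_SECONDS = 20 * 60          # 20 minutes

def check_wav(path):
    """Rough pre-flight check of a local WAV file; returns its duration
    in seconds or raises if a limit is exceeded."""
    if os.path.getsize(path) > MAX_BYTES:
        raise ValueError("audio file exceeds 100 MB")
    with wave.open(path, "rb") as w:
        duration = w.getnframes() / w.getframerate()
    if duration > MAX_SECONDS:
        raise ValueError("audio exceeds 20 minutes")
    return duration
```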

The following examples use a public network audio URL. To use a local audio file instead, see Send local files with Base64 encoding. Streaming output is required for all calls.

OpenAI compatible

import os
from openai import OpenAI

client = OpenAI(
    # API keys for Singapore and Beijing regions differ. To get an API key, see https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # The following is the URL for the Singapore region. If using a Beijing-region model, replace it with: https://dashscope.aliyuncs.com/compatible-mode/v1
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

completion = client.chat.completions.create(
    model="qwen3-omni-flash",# When using qwen3-omni-flash, run in non-thinking mode.
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "input_audio",
                    "input_audio": {
                        "data": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250211/tixcef/cherry.wav",
                        "format": "wav",
                    },
                },
                {"type": "text", "text": "What is this audio about?"},
            ],
        },
    ],
    # Set the output modality. Currently supported: ["text","audio"] and ["text"].
    modalities=["text", "audio"],
    audio={"voice": "Cherry", "format": "wav"},
    # stream must be set to True. Otherwise, an error occurs.
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in completion:
    if chunk.choices:
        print(chunk.choices[0].delta)
    else:
        print(chunk.usage)
import OpenAI from "openai";

const openai = new OpenAI(
    {
        // API keys for Singapore and Beijing regions differ. To get an API key, see https://www.alibabacloud.com/help/zh/model-studio/get-api-key
        apiKey: process.env.DASHSCOPE_API_KEY,
        // The following is the URL for the Singapore region. If using a Beijing-region model, replace it with: https://dashscope.aliyuncs.com/compatible-mode/v1
        baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
    }
);
const completion = await openai.chat.completions.create({
    model: "qwen3-omni-flash", // When using qwen3-omni-flash, run in non-thinking mode.
    messages: [
        {
            "role": "user",
            "content": [{
                "type": "input_audio",
                "input_audio": { "data": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250211/tixcef/cherry.wav", "format": "wav" },
            },
            { "type": "text", "text": "What is this audio about?" }]
        }],
    stream: true,
    stream_options: {
        include_usage: true
    },
    modalities: ["text", "audio"],
    audio: { voice: "Cherry", format: "wav" }
});

for await (const chunk of completion) {
    if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
        console.log(chunk.choices[0].delta);
    } else {
        console.log(chunk.usage);
    }
}
# ======= Important note =======
# API keys for Singapore and Beijing regions differ. To get an API key, see https://www.alibabacloud.com/help/zh/model-studio/get-api-key
# The following is the URL for the Singapore region. If using a Beijing-region model, replace it with: https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
# === Delete this comment before execution ===

curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen3-omni-flash",
    "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "input_audio",
          "input_audio": {
            "data": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250211/tixcef/cherry.wav",
            "format": "wav"
          }
        },
        {
          "type": "text",
          "text": "What is this audio about?"
        }
      ]
    }
  ],
    "stream":true,
    "stream_options":{
        "include_usage":true
    },
    "modalities":["text","audio"],
    "audio":{"voice":"Cherry","format":"wav"}
}'

Video + text input

You can provide video input either as a list of images or as a video file. When you use a video file, the model can also interpret the video's audio track.

The following examples use a public network video URL. To use a local video instead, see Send local files with Base64 encoding. Streaming output is required for all calls.

Image list format

Number of images

  • qwen3-omni-flash: Minimum: 2 images. Maximum: 128 images.

  • qwen-omni-turbo: Minimum: 4 images. Maximum: 80 images.
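If you extract frames from a video yourself (for example with OpenCV or ffmpeg, not shown here), the image list must stay within these caps. A small sketch for choosing up to the maximum number of evenly spaced frame indices (the helper name is illustrative):

```python
def sample_frame_indices(total_frames, limit=128):
    """Pick up to `limit` evenly spaced frame indices so the image list
    stays within the caps above (128 for qwen3-omni-flash)."""
    if total_frames <= limit:
        return list(range(total_frames))
    step = total_frames / limit
    return [int(i * step) for i in range(limit)]
```

Remember the minimums as well: at least 2 images for qwen3-omni-flash and 4 for qwen-omni-turbo.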

OpenAI compatible

import os
from openai import OpenAI

client = OpenAI(
    # API keys for Singapore and Beijing regions differ. To get an API key, see https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # The following is the URL for the Singapore region. If using a Beijing-region model, replace it with: https://dashscope.aliyuncs.com/compatible-mode/v1
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

completion = client.chat.completions.create(
    model="qwen3-omni-flash", # When using qwen3-omni-flash, run in non-thinking mode.
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "video",
                    "video": [
                        "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/xzsgiz/football1.jpg",
                        "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/tdescd/football2.jpg",
                        "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/zefdja/football3.jpg",
                        "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/aedbqh/football4.jpg",
                    ],
                },
                {"type": "text", "text": "Describe the process shown in this video."},
            ],
        }
    ],
    # Set the output modality. Currently supported: ["text","audio"] and ["text"].
    modalities=["text", "audio"],
    audio={"voice": "Cherry", "format": "wav"},
    # stream must be set to True. Otherwise, an error occurs.
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in completion:
    if chunk.choices:
        print(chunk.choices[0].delta)
    else:
        print(chunk.usage)
import OpenAI from "openai";

const openai = new OpenAI(
    {
        // API keys for Singapore and Beijing regions differ. To get an API key, see https://www.alibabacloud.com/help/zh/model-studio/get-api-key
        apiKey: process.env.DASHSCOPE_API_KEY,
        // The following is the URL for the Singapore region. If using a Beijing-region model, replace it with: https://dashscope.aliyuncs.com/compatible-mode/v1
        baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
    }
);
const completion = await openai.chat.completions.create({
    model: "qwen3-omni-flash", // When using qwen3-omni-flash, run in non-thinking mode.
    messages: [{
        role: "user",
        content: [
            {
                type: "video",
                video: [
                    "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/xzsgiz/football1.jpg",
                    "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/tdescd/football2.jpg",
                    "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/zefdja/football3.jpg",
                    "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/aedbqh/football4.jpg"
                ]
            },
            {
                type: "text",
                text: "Describe the process shown in this video."
            }
        ]
    }],
    stream: true,
    stream_options: {
        include_usage: true
    },
    modalities: ["text", "audio"],
    audio: { voice: "Cherry", format: "wav" }
});

for await (const chunk of completion) {
    if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
        console.log(chunk.choices[0].delta);
    } else {
        console.log(chunk.usage);
    }
}
# Note:
# API keys for the Singapore and Beijing regions differ. To get an API key, see https://www.alibabacloud.com/help/zh/model-studio/get-api-key
# The following is the URL for the Singapore region. If using a Beijing-region model, replace it with: https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions

curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen3-omni-flash",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "video",
                    "video": [
                        "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/xzsgiz/football1.jpg",
                        "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/tdescd/football2.jpg",
                        "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/zefdja/football3.jpg",
                        "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/aedbqh/football4.jpg"
                    ]
                },
                {
                    "type": "text",
                    "text": "Describe the process shown in this video."
                }
            ]
        }
    ],
    "stream": true,
    "stream_options": {
        "include_usage": true
    },
    "modalities": ["text", "audio"],
    "audio": {
        "voice": "Cherry",
        "format": "wav"
    }
}'

Video file (audio in the video is also processed)

  • You can send only one video file per request.

  • File size:

    • qwen3-omni-flash: Max 256 MB. Max duration: 150 seconds.

    • qwen-omni-turbo: Max 150 MB. Max duration: 40 seconds.

  • Supported formats: MP4, AVI, MKV, MOV, FLV, WMV, and more.

  • Visual and audio information in the video file are billed separately.
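A quick local check against these limits before encoding and uploading can save a failed request. The following is a minimal sketch: the byte limits mirror the list above, and because checking duration would additionally require a media tool such as ffprobe, only file size is validated here.

```python
import os

# Size limits from the constraints above. Duration limits (150 s / 40 s)
# must be checked separately with a media tool such as ffprobe.
MAX_BYTES = {
    "qwen3-omni-flash": 256 * 1024 * 1024,  # 256 MB
    "qwen-omni-turbo": 150 * 1024 * 1024,   # 150 MB
}

def check_video_size(path, model="qwen3-omni-flash"):
    """Return True if the local video file is within the size limit for the model."""
    return os.path.getsize(path) <= MAX_BYTES[model]
```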

OpenAI compatible

import os
from openai import OpenAI

client = OpenAI(
    # API keys for Singapore and Beijing regions differ. To get an API key, see https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # The following is the URL for the Singapore region. If using a Beijing-region model, replace it with: https://dashscope.aliyuncs.com/compatible-mode/v1
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

completion = client.chat.completions.create(
    model="qwen3-omni-flash", # When using qwen3-omni-flash, run in non-thinking mode.
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "video_url",
                    "video_url": {
                        "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241115/cqqkru/1.mp4"
                    },
                },
                {"type": "text", "text": "What is the content of the video?"},
            ],
        },
    ],
    # Set the output modality. Currently supported: ["text","audio"] and ["text"].
    modalities=["text", "audio"],
    audio={"voice": "Cherry", "format": "wav"},
    # stream must be set to True. Otherwise, an error occurs.
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in completion:
    if chunk.choices:
        print(chunk.choices[0].delta)
    else:
        print(chunk.usage)
import OpenAI from "openai";

const openai = new OpenAI(
    {
        // API keys for Singapore and Beijing regions differ. To get an API key, see https://www.alibabacloud.com/help/zh/model-studio/get-api-key
        apiKey: process.env.DASHSCOPE_API_KEY,
        // The following is the URL for the Singapore region. If using a Beijing-region model, replace it with: https://dashscope.aliyuncs.com/compatible-mode/v1
        baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
    }
);
const completion = await openai.chat.completions.create({
    model: "qwen3-omni-flash", // When using qwen3-omni-flash, run in non-thinking mode.
    messages: [
        {
            "role": "user",
            "content": [{
                "type": "video_url",
                "video_url": { "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241115/cqqkru/1.mp4" },
            },
            { "type": "text", "text": "What is the content of the video?" }]
        }],
    stream: true,
    stream_options: {
        include_usage: true
    },
    modalities: ["text", "audio"],
    audio: { voice: "Cherry", format: "wav" }
});


for await (const chunk of completion) {
    if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
        console.log(chunk.choices[0].delta);
    } else {
        console.log(chunk.usage);
    }
}
# Note:
# API keys for the Singapore and Beijing regions differ. To get an API key, see https://www.alibabacloud.com/help/zh/model-studio/get-api-key
# The following is the URL for the Singapore region. If using a Beijing-region model, replace it with: https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions

curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen3-omni-flash",
    "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "video_url",
          "video_url": {
            "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241115/cqqkru/1.mp4"
          }
        },
        {
          "type": "text",
          "text": "What is the content of the video?"
        }
      ]
    }
  ],
    "stream":true,
    "stream_options": {
        "include_usage": true
    },
    "modalities":["text","audio"],
    "audio":{"voice":"Cherry","format":"wav"}
}'

Multi-turn conversation

When you use Qwen-Omni for multi-turn conversations, follow these rules:

  • Assistant message

    An assistant message added to the messages array can contain only text.

  • User message

    A user message can contain text plus data from at most one other modality (image, audio, or video). In a multi-turn conversation, different user messages can use different modalities.
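These rules can be checked programmatically before a request is sent. The following is a minimal sketch; the helper name and error messages are illustrative, not part of the API.

```python
# Content-part types other than plain text, as used in this document's examples.
NON_TEXT_TYPES = {"image_url", "input_audio", "video", "video_url"}

def validate_messages(messages):
    """Raise ValueError if a messages array violates the multi-turn rules."""
    for msg in messages:
        types = {part["type"] for part in msg["content"]}
        non_text = types & NON_TEXT_TYPES
        if msg["role"] == "assistant" and non_text:
            raise ValueError("An assistant message can contain only text")
        if msg["role"] == "user" and len(non_text) > 1:
            raise ValueError("A user message can use at most one non-text modality")
```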

OpenAI compatible

import os
from openai import OpenAI

client = OpenAI(
    # API keys for Singapore and Beijing regions differ. To get an API key, see https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # The following is the URL for the Singapore region. If using a Beijing-region model, replace it with: https://dashscope.aliyuncs.com/compatible-mode/v1
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

completion = client.chat.completions.create(
    model="qwen3-omni-flash", # When using qwen3-omni-flash, run in non-thinking mode.
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "input_audio",
                    "input_audio": {
                        "data": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3",
                        "format": "mp3",
                    },
                },
                {"type": "text", "text": "What is this audio about?"},
            ],
        },
        {
            "role": "assistant",
            "content": [{"type": "text", "text": "This audio says: Welcome to Alibaba Cloud"}],
        },
        {
            "role": "user",
            "content": [{"type": "text", "text": "Can you tell me about this company?"}],
        },
    ],
    # Set the output modality. Currently supported: ["text","audio"] and ["text"].
    modalities=["text"],
    # stream must be set to True. Otherwise, an error occurs.
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in completion:
    if chunk.choices:
        print(chunk.choices[0].delta)
    else:
        print(chunk.usage)
import OpenAI from "openai";

const openai = new OpenAI(
    {
        // API keys for Singapore and Beijing regions differ. To get an API key, see https://www.alibabacloud.com/help/zh/model-studio/get-api-key
        apiKey: process.env.DASHSCOPE_API_KEY,
        // The following is the URL for the Singapore region. If using a Beijing-region model, replace it with: https://dashscope.aliyuncs.com/compatible-mode/v1
        baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
    }
);
const completion = await openai.chat.completions.create({
    model: "qwen3-omni-flash", // When using qwen3-omni-flash, run in non-thinking mode.
    messages: [
        {
            "role": "user",
            "content": [
                {
                    "type": "input_audio",
                    "input_audio": {
                        "data": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3",
                        "format": "mp3",
                    },
                },
                { "type": "text", "text": "What is this audio about?" }
            ],
        },
        {
            "role": "assistant",
            "content": [{ "type": "text", "text": "This audio says: Welcome to Alibaba Cloud" }],
        },
        {
            "role": "user",
            "content": [{ "type": "text", "text": "Can you tell me about this company?" }]
        }],
    stream: true,
    stream_options: {
        include_usage: true
    },
    modalities: ["text"]
});


for await (const chunk of completion) {
    if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
        console.log(chunk.choices[0].delta);
    } else {
        console.log(chunk.usage);
    }
}
# Note:
# API keys for the Singapore and Beijing regions differ. To get an API key, see https://www.alibabacloud.com/help/zh/model-studio/get-api-key
# The following is the URL for the Singapore region. If using a Beijing-region model, replace it with: https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions

curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
  "model": "qwen3-omni-flash",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "input_audio",
          "input_audio": {
            "data": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
          }
        },
        {
          "type": "text",
          "text": "What is this audio about?"
        }
      ]
    },
    {
      "role": "assistant",
      "content": [
        {
          "type": "text",
          "text": "This audio says: Welcome to Alibaba Cloud"
        }
      ]
    },
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Can you tell me about this company?"
        }
      ]
    }
  ],
  "stream": true,
  "stream_options": {
    "include_usage": true
  },
  "modalities": ["text"]
}'

Parse Base64-encoded audio output

Method 1: Decode after generation completes

The Qwen-Omni model streams audio as Base64-encoded data. You can maintain a string variable during generation, append the Base64 data from each returned chunk to it, and decode the full string after generation completes to obtain the audio file. Alternatively, you can decode and play each chunk's Base64 data in real time.

# Installation instructions for pyaudio:
# macOS
#   brew install portaudio
#   pip install pyaudio
# Debian/Ubuntu
#   sudo apt-get install python3-pyaudio
#   or
#   pip install pyaudio
# CentOS
#   sudo yum install -y portaudio portaudio-devel && pip install pyaudio
# Windows
#   python -m pip install pyaudio

import os
from openai import OpenAI
import base64
import numpy as np
import soundfile as sf

client = OpenAI(
    # API keys for Singapore and Beijing regions differ. To get an API key, see https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # The following is the URL for the Singapore region. If using a Beijing-region model, replace it with: https://dashscope.aliyuncs.com/compatible-mode/v1
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

completion = client.chat.completions.create(
    model="qwen3-omni-flash", # When using qwen3-omni-flash, run in non-thinking mode.
    messages=[{"role": "user", "content": "Who are you?"}],
    # Set the output modality. Currently supported: ["text","audio"] and ["text"].
    modalities=["text", "audio"],
    audio={"voice": "Cherry", "format": "wav"},
    # stream must be set to True. Otherwise, an error occurs.
    stream=True,
    stream_options={"include_usage": True},
)

# Method 1: Decode after generation completes
audio_string = ""
for chunk in completion:
    if chunk.choices:
        if hasattr(chunk.choices[0].delta, "audio"):
            try:
                audio_string += chunk.choices[0].delta.audio["data"]
            except Exception as e:
                print(chunk.choices[0].delta.content)
    else:
        print(chunk.usage)

wav_bytes = base64.b64decode(audio_string)
audio_np = np.frombuffer(wav_bytes, dtype=np.int16)
sf.write("audio_assistant_py.wav", audio_np, samplerate=24000)

# Method 2: Decode while generating (comment out Method 1 code to use Method 2)
# # Initialize PyAudio
# import pyaudio
# import time
# p = pyaudio.PyAudio()
# # Create an audio stream
# stream = p.open(format=pyaudio.paInt16,
#                 channels=1,
#                 rate=24000,
#                 output=True)

# for chunk in completion:
#     if chunk.choices:
#         if hasattr(chunk.choices[0].delta, "audio"):
#             try:
#                 audio_string = chunk.choices[0].delta.audio["data"]
#                 wav_bytes = base64.b64decode(audio_string)
#                 audio_np = np.frombuffer(wav_bytes, dtype=np.int16)
#                 # Play audio data directly
#                 stream.write(audio_np.tobytes())
#             except Exception as e:
#                 print(chunk.choices[0].delta.content)

# time.sleep(0.8)
# # Clean up resources
# stream.stop_stream()
# stream.close()
# p.terminate()
// Setup instructions:
// Universal for Windows/Mac/Linux:
// 1. Ensure Node.js is installed (version >= 14 recommended)
// 2. Run the following command to install required dependencies:
//    npm install openai wav
// 
// To use real-time playback (Method 2), also install:
// Windows:
//    npm install speaker
// Mac:
//    brew install portaudio
//    npm install speaker
// Linux (Ubuntu/Debian):
//    sudo apt-get install libasound2-dev
//    npm install speaker

import OpenAI from "openai";

const openai = new OpenAI(
    {
        // API keys for Singapore and Beijing regions differ. To get an API key, see https://www.alibabacloud.com/help/zh/model-studio/get-api-key
        apiKey: process.env.DASHSCOPE_API_KEY,
        // The following is the URL for the Singapore region. If using a Beijing-region model, replace it with: https://dashscope.aliyuncs.com/compatible-mode/v1
        baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
    }
);
const completion = await openai.chat.completions.create({
    model: "qwen3-omni-flash", // When using qwen3-omni-flash, run in non-thinking mode.
    messages: [
        {
            "role": "user",
            "content": "Who are you?"
        }],
    stream: true,
    stream_options: {
        include_usage: true
    },
    modalities: ["text", "audio"],
    audio: { voice: "Cherry", format: "wav" }
});

// Method 1: Decode after generation completes
// Requires installation: npm install wav
import { createWriteStream } from 'node:fs';  // node:fs is a built-in Node.js module, no installation required
import { Writer } from 'wav';

async function convertAudio(audioString, audioPath) {
    try {
        // Decode the Base64 string into a Buffer
        const wavBuffer = Buffer.from(audioString, 'base64');
        // Create a WAV file write stream
        const writer = new Writer({
            sampleRate: 24000,  // Sample rate
            channels: 1,        // Mono
            bitDepth: 16        // 16-bit depth
        });
        // Create an output file stream and establish a pipeline connection
        const outputStream = createWriteStream(audioPath);
        writer.pipe(outputStream);

        // Write PCM data and end writing
        writer.write(wavBuffer);
        writer.end();

        // Use a Promise to wait for the file to finish writing
        await new Promise((resolve, reject) => {
            outputStream.on('finish', resolve);
            outputStream.on('error', reject);
        });

        // Add extra wait time to ensure audio integrity
        await new Promise(resolve => setTimeout(resolve, 800));

        console.log(`Audio file saved to: ${audioPath}`);
    } catch (error) {
        console.error('Error during audio processing:', error);
    }
}

let audioString = "";
for await (const chunk of completion) {
    if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
        if (chunk.choices[0].delta.audio) {
            if (chunk.choices[0].delta.audio["data"]) {
                audioString += chunk.choices[0].delta.audio["data"];
            }
        }
    } else {
        console.log(chunk.usage);
    }
}
// Execute conversion
convertAudio(audioString, "audio_assistant_mjs.wav");


// Method 2: Generate and play in real time
// Install required components per system instructions above first.
// import Speaker from 'speaker'; // Import audio playback library

// // Create a speaker instance (configuration matches WAV file parameters)
// const speaker = new Speaker({
//     sampleRate: 24000,  // Sample rate
//     channels: 1,        // Number of sound channels
//     bitDepth: 16,       // Bit depth
//     signed: true        // Signed PCM
// });
// for await (const chunk of completion) {
//     if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
//         if (chunk.choices[0].delta.audio) {
//             if (chunk.choices[0].delta.audio["data"]) {
//                 const pcmBuffer = Buffer.from(chunk.choices[0].delta.audio.data, 'base64');
//                 // Write directly to speaker for playback
//                 speaker.write(pcmBuffer);
//             }
//         }
//     } else {
//         console.log(chunk.usage);
//     }
// }
// speaker.on('finish', () => console.log('Playback complete'));
// speaker.end(); // Call based on actual API stream end

Input a Base64-encoded local file

Images

This example uses the locally saved file eagle.png.

import os
from openai import OpenAI
import base64

client = OpenAI(
    # API keys for Singapore and Beijing regions differ. To get an API key, see https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # The following is the URL for the Singapore region. If using a Beijing-region model, replace it with: https://dashscope.aliyuncs.com/compatible-mode/v1
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)


# Base64 encoding format
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")


base64_image = encode_image("eagle.png")

completion = client.chat.completions.create(
    model="qwen3-omni-flash", # When using qwen3-omni-flash, run in non-thinking mode.
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{base64_image}"},
                },
                {"type": "text", "text": "What scene is depicted in the image?"},
            ],
        },
    ],
    # Set the output modality. Currently supported: ["text","audio"] and ["text"].
    modalities=["text", "audio"],
    audio={"voice": "Cherry", "format": "wav"},
    # stream must be set to True. Otherwise, an error occurs.
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in completion:
    if chunk.choices:
        print(chunk.choices[0].delta)
    else:
        print(chunk.usage)
import OpenAI from "openai";
import { readFileSync } from 'fs';

const openai = new OpenAI(
    {
        // API keys for Singapore and Beijing regions differ. To get an API key, see https://www.alibabacloud.com/help/zh/model-studio/get-api-key
        apiKey: process.env.DASHSCOPE_API_KEY,
        // The following is the URL for the Singapore region. If using a Beijing-region model, replace it with: https://dashscope.aliyuncs.com/compatible-mode/v1
        baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
    }
);

const encodeImage = (imagePath) => {
    const imageFile = readFileSync(imagePath);
    return imageFile.toString('base64');
};
const base64Image = encodeImage("eagle.png")

const completion = await openai.chat.completions.create({
    model: "qwen3-omni-flash",// When using qwen3-omni-flash, run in non-thinking mode.
    messages: [
        {
            "role": "user",
            "content": [{
                "type": "image_url",
                "image_url": { "url": `data:image/png;base64,${base64Image}` },
            },
            { "type": "text", "text": "What scene is depicted in the image?" }]
        }],
    stream: true,
    stream_options: {
        include_usage: true
    },
    modalities: ["text", "audio"],
    audio: { voice: "Cherry", format: "wav" }
});

for await (const chunk of completion) {
    if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
        console.log(chunk.choices[0].delta);
    } else {
        console.log(chunk.usage);
    }
}

Audio

This example uses the locally saved file welcome.mp3.

import os
from openai import OpenAI
import base64

client = OpenAI(
    # API keys for Singapore and Beijing regions differ. To get an API key, see https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # The following is the URL for the Singapore region. If using a Beijing-region model, replace it with: https://dashscope.aliyuncs.com/compatible-mode/v1
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)


def encode_audio(audio_path):
    with open(audio_path, "rb") as audio_file:
        return base64.b64encode(audio_file.read()).decode("utf-8")


base64_audio = encode_audio("welcome.mp3")

completion = client.chat.completions.create(
    model="qwen3-omni-flash", # When using qwen3-omni-flash, run in non-thinking mode.
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "input_audio",
                    "input_audio": {
                        "data": f"data:;base64,{base64_audio}",
                        "format": "mp3",
                    },
                },
                {"type": "text", "text": "What is this audio about?"},
            ],
        },
    ],
    # Set the output modality. Currently supported: ["text","audio"] and ["text"].
    modalities=["text", "audio"],
    audio={"voice": "Cherry", "format": "wav"},
    # stream must be set to True. Otherwise, an error occurs.
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in completion:
    if chunk.choices:
        print(chunk.choices[0].delta)
    else:
        print(chunk.usage)
import OpenAI from "openai";
import { readFileSync } from 'fs';

const openai = new OpenAI(
    {
        // API keys for Singapore and Beijing regions differ. To get an API key, see https://www.alibabacloud.com/help/zh/model-studio/get-api-key
        apiKey: process.env.DASHSCOPE_API_KEY,
        // The following is the URL for the Singapore region. If using a Beijing-region model, replace it with: https://dashscope.aliyuncs.com/compatible-mode/v1
        baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
    }
);

const encodeAudio = (audioPath) => {
    const audioFile = readFileSync(audioPath);
    return audioFile.toString('base64');
};
const base64Audio = encodeAudio("welcome.mp3")

const completion = await openai.chat.completions.create({
    model: "qwen3-omni-flash", // When using qwen3-omni-flash, run in non-thinking mode.
    messages: [
        {
            "role": "user",
            "content": [{
                "type": "input_audio",
                "input_audio": { "data": `data:;base64,${base64Audio}`, "format": "mp3" },
            },
            { "type": "text", "text": "What is this audio about?" }]
        }],
    stream: true,
    stream_options: {
        include_usage: true
    },
    modalities: ["text", "audio"],
    audio: { voice: "Cherry", format: "wav" }
});

for await (const chunk of completion) {
    if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
        console.log(chunk.choices[0].delta);
    } else {
        console.log(chunk.usage);
    }
}

Video

Video file

This example uses the locally saved file spring_mountain.mp4.

import os
from openai import OpenAI
import base64

client = OpenAI(
    # API keys for Singapore and Beijing regions differ. To get an API key, see https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # The following is the URL for the Singapore region. If using a Beijing-region model, replace it with: https://dashscope.aliyuncs.com/compatible-mode/v1
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

# Base64 encoding format
def encode_video(video_path):
    with open(video_path, "rb") as video_file:
        return base64.b64encode(video_file.read()).decode("utf-8")


base64_video = encode_video("spring_mountain.mp4")

completion = client.chat.completions.create(
    model="qwen3-omni-flash", # When using qwen3-omni-flash, run in non-thinking mode.
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "video_url",
                    "video_url": {"url": f"data:;base64,{base64_video}"},
                },
                {"type": "text", "text": "What is she singing?"},
            ],
        },
    ],
    # Set the output modality. Currently supported: ["text","audio"] and ["text"].
    modalities=["text", "audio"],
    audio={"voice": "Cherry", "format": "wav"},
    # stream must be set to True. Otherwise, an error occurs.
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in completion:
    if chunk.choices:
        print(chunk.choices[0].delta)
    else:
        print(chunk.usage)
import OpenAI from "openai";
import { readFileSync } from 'fs';

const openai = new OpenAI(
    {
        // API keys for Singapore and Beijing regions differ. To get an API key, see https://www.alibabacloud.com/help/zh/model-studio/get-api-key
        apiKey: process.env.DASHSCOPE_API_KEY,
        // The following is the URL for the Singapore region. If using a Beijing-region model, replace it with: https://dashscope.aliyuncs.com/compatible-mode/v1
        baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
    }
);

const encodeVideo = (videoPath) => {
    const videoFile = readFileSync(videoPath);
    return videoFile.toString('base64');
};
const base64Video = encodeVideo("spring_mountain.mp4")

const completion = await openai.chat.completions.create({
    model: "qwen3-omni-flash", // When using qwen3-omni-flash, run in non-thinking mode.
    messages: [
        {
            "role": "user",
            "content": [{
                "type": "video_url",
                "video_url": { "url": `data:;base64,${base64Video}` },
            },
            { "type": "text", "text": "What is she singing?" }]
        }],
    stream: true,
    stream_options: {
        include_usage: true
    },
    modalities: ["text", "audio"],
    audio: { voice: "Cherry", format: "wav" }
});

for await (const chunk of completion) {
    if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
        console.log(chunk.choices[0].delta);
    } else {
        console.log(chunk.usage);
    }
}

Image list

This example uses the locally saved files football1.jpg, football2.jpg, football3.jpg, and football4.jpg.

import os
from openai import OpenAI
import base64

client = OpenAI(
    # API keys for Singapore and Beijing regions differ. To get an API key, see https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # The following is the URL for the Singapore region. If using a Beijing-region model, replace it with: https://dashscope.aliyuncs.com/compatible-mode/v1
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)


# Base64 encoding format
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")


base64_image_1 = encode_image("football1.jpg")
base64_image_2 = encode_image("football2.jpg")
base64_image_3 = encode_image("football3.jpg")
base64_image_4 = encode_image("football4.jpg")

completion = client.chat.completions.create(
    model="qwen3-omni-flash", # When using qwen3-omni-flash, run in non-thinking mode.
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "video",
                    "video": [
                        f"data:image/jpeg;base64,{base64_image_1}",
                        f"data:image/jpeg;base64,{base64_image_2}",
                        f"data:image/jpeg;base64,{base64_image_3}",
                        f"data:image/jpeg;base64,{base64_image_4}",
                    ],
                },
                {"type": "text", "text": "Describe the procedure in this video."},
            ],
        }
    ],
    # Set the output modality. Currently supported: ["text","audio"] and ["text"].
    modalities=["text", "audio"],
    audio={"voice": "Cherry", "format": "wav"},
    # stream must be set to True. Otherwise, an error occurs.
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in completion:
    if chunk.choices:
        print(chunk.choices[0].delta)
    else:
        print(chunk.usage)
The equivalent Node.js example:

import OpenAI from "openai";
import { readFileSync } from 'fs';

const openai = new OpenAI(
    {
        // API keys for Singapore and Beijing regions differ. To get an API key, see https://www.alibabacloud.com/help/zh/model-studio/get-api-key
        apiKey: process.env.DASHSCOPE_API_KEY,
        // The following is the URL for the Singapore region. If using a Beijing-region model, replace it with: https://dashscope.aliyuncs.com/compatible-mode/v1
        baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
    }
);

const encodeImage = (imagePath) => {
    const imageFile = readFileSync(imagePath);
    return imageFile.toString('base64');
};
const base64Image1 = encodeImage("football1.jpg")
const base64Image2 = encodeImage("football2.jpg")
const base64Image3 = encodeImage("football3.jpg")
const base64Image4 = encodeImage("football4.jpg")

const completion = await openai.chat.completions.create({
    model: "qwen3-omni-flash", // When using qwen3-omni-flash, run in non-thinking mode.
    messages: [{
        role: "user",
        content: [
            {
                type: "video",
                video: [
                    `data:image/jpeg;base64,${base64Image1}`,
                    `data:image/jpeg;base64,${base64Image2}`,
                    `data:image/jpeg;base64,${base64Image3}`,
                    `data:image/jpeg;base64,${base64Image4}`
                ]
            },
            {
                type: "text",
                text: "Describe the procedure in this video."
            }
        ]
    }],
    stream: true,
    stream_options: {
        include_usage: true
    },
    modalities: ["text", "audio"],
    audio: { voice: "Cherry", format: "wav" }
});

for await (const chunk of completion) {
    if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
        console.log(chunk.choices[0].delta);
    } else {
        console.log(chunk.usage);
    }
}

API reference

For the complete specifications of the input and output parameters for the Qwen-Omni model, see Qwen.

Billing and rate limits

Billing rules

Qwen-Omni billing is based on the tokens consumed across different modalities, such as audio, image, and video. For more information about pricing, see Models.

Token conversion rules for audio, images, and video

Audio

  • qwen3-omni-flash: Total tokens = Audio duration (seconds) × 12.5

  • qwen-omni-turbo: Total tokens = Audio duration (seconds) × 25. Audio shorter than 1 second is billed as 1 second.
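These formulas can be sketched in a few lines. This is a hedged estimate only: whether fractional token counts round up, and whether the 1-second minimum also applies to qwen3-omni-flash, are assumptions here.

```python
import math

def audio_tokens(duration_s, model="qwen3-omni-flash"):
    """Estimate billed audio tokens from duration in seconds (sketch, not authoritative)."""
    rate = 12.5 if model.startswith("qwen3-omni-flash") else 25.0  # qwen-omni-turbo: 25
    duration_s = max(duration_s, 1.0)  # assumption: sub-second audio is billed as 1 second
    return math.ceil(duration_s * rate)

print(audio_tokens(8.2))                     # 8.2 s on qwen3-omni-flash -> 103
print(audio_tokens(0.4, "qwen-omni-turbo"))  # billed as 1 full second -> 25
```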

Images

  • qwen3-omni-flash models: One token per 32×32 pixels.

  • qwen-omni-turbo models: One token per 28×28 pixels.

A single image requires a minimum of 4 tokens and a maximum of 1280 tokens. You can use the following code to estimate the token count for a given image:

import math
# Install Pillow: pip install Pillow
from PIL import Image

# For qwen-omni-turbo, factor is 28.
# factor = 28
# For qwen3-omni-flash, factor is 32.
factor = 32

def token_calculate(image_path=''):
    """
    :param image_path: Path to the image.
    :return: Token count for a single image.
    """
    if not image_path:
        raise ValueError("Image path cannot be empty. Provide a valid image file path.")
    # Open the specified image file.
    image = Image.open(image_path)
    # Get the original dimensions.
    height = image.height
    width = image.width
    print(f"Original image dimensions: Height={height}, Width={width}")
    # Round the height and width to the nearest multiple of factor.
    h_bar = round(height / factor) * factor
    w_bar = round(width / factor) * factor
    # Minimum tokens for an image: 4 tokens.
    min_pixels = 4 * factor * factor
    # Maximum tokens for an image: 1280 tokens.
    max_pixels = 1280 * factor * factor
    # Scale the image so its pixel count falls within [min_pixels, max_pixels].
    if h_bar * w_bar > max_pixels:
        # Scaling factor beta so that total pixels do not exceed max_pixels.
        beta = math.sqrt((height * width) / max_pixels)
        # Recalculate the adjusted height and width as multiples of factor.
        h_bar = math.floor(height / beta / factor) * factor
        w_bar = math.floor(width / beta / factor) * factor
    elif h_bar * w_bar < min_pixels:
        # Scaling factor beta so that total pixels are not less than min_pixels.
        beta = math.sqrt(min_pixels / (height * width))
        # Recalculate the adjusted height and width as multiples of factor.
        h_bar = math.ceil(height * beta / factor) * factor
        w_bar = math.ceil(width * beta / factor) * factor
    print(f"Image dimensions after scaling: Height={h_bar}, Width={w_bar}")
    # Image tokens: total pixels / (factor * factor), plus 2 marker tokens.
    token = int((h_bar * w_bar) / (factor * factor)) + 2
    print(f"Token count after scaling: {token}")
    return token

if __name__ == "__main__":
    token = token_calculate(image_path="xxx/test.jpg")
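If you only know the image dimensions rather than having the file on disk, the same arithmetic can be restated without Pillow. This is a sketch mirroring the function above, with factor 32 for qwen3-omni-flash.

```python
import math

def image_tokens(width, height, factor=32):
    # Round each side to the nearest multiple of factor.
    h = round(height / factor) * factor
    w = round(width / factor) * factor
    min_px, max_px = 4 * factor * factor, 1280 * factor * factor
    if h * w > max_px:  # scale down so the pixel count stays within the 1280-token cap
        beta = math.sqrt((height * width) / max_px)
        h = math.floor(height / beta / factor) * factor
        w = math.floor(width / beta / factor) * factor
    elif h * w < min_px:  # scale up to the 4-token floor
        beta = math.sqrt(min_px / (height * width))
        h = math.ceil(height * beta / factor) * factor
        w = math.ceil(width * beta / factor) * factor
    return (h * w) // (factor * factor) + 2  # patches plus 2 marker tokens

print(image_tokens(1024, 1024))  # 1026
```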

Video

Video file tokens are divided into video_tokens (visual) and audio_tokens (audio).

  • video_tokens

    The calculation is complex. See the following code:

    # Before use, install: pip install opencv-python
    import math
    import os
    import logging
    import cv2
    
    # Fixed parameters
    FRAME_FACTOR = 2
    
    # For qwen3-omni-flash, IMAGE_FACTOR is 32
    IMAGE_FACTOR = 32
    
    # For qwen-omni-turbo, IMAGE_FACTOR is 28
    # IMAGE_FACTOR = 28
    
    # Maximum allowed aspect ratio of a video frame
    MAX_RATIO = 200
    
    # Minimum video frame pixels. For qwen3-omni-flash: 128 * 32 * 32
    VIDEO_MIN_PIXELS = 128 * 32 * 32
    # For qwen-omni-turbo
    # VIDEO_MIN_PIXELS = 128 * 28 * 28
    
    # Maximum video frame pixels. For qwen3-omni-flash: 768 * 32 * 32
    VIDEO_MAX_PIXELS = 768 * 32 * 32
    # For qwen-omni-turbo:
    # VIDEO_MAX_PIXELS = 768 * 28 * 28
    
    FPS = 2
    # Minimum extracted frames
    FPS_MIN_FRAMES = 4
    
    # Maximum extracted frames
    # Maximum extracted frames for qwen3-omni-flash: 128
    # Maximum extracted frames for qwen-omni-turbo: 80
    FPS_MAX_FRAMES = 128
    
    # Maximum pixel value for video input. For qwen3-omni-flash: 16384 * 32 * 32
    VIDEO_TOTAL_PIXELS = 16384 * 32 * 32
    # For qwen-omni-turbo:
    # VIDEO_TOTAL_PIXELS = 16384 * 28 * 28
    
    def round_by_factor(number, factor):
        return round(number / factor) * factor
    
    def ceil_by_factor(number, factor):
        return math.ceil(number / factor) * factor
    
    def floor_by_factor(number, factor):
        return math.floor(number / factor) * factor
    
    def get_video(video_path):
        cap = cv2.VideoCapture(video_path)
        frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
        frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
        total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
        video_fps = cap.get(cv2.CAP_PROP_FPS)
        cap.release()
        return frame_height, frame_width, total_frames, video_fps
    
    def smart_nframes(total_frames, video_fps):
        min_frames = ceil_by_factor(FPS_MIN_FRAMES, FRAME_FACTOR)
        max_frames = floor_by_factor(min(FPS_MAX_FRAMES, total_frames), FRAME_FACTOR)
        duration = total_frames / video_fps if video_fps != 0 else 0
        if duration - int(duration) > (1 / FPS):
            total_frames = math.ceil(duration * video_fps)
        else:
            total_frames = math.ceil(int(duration) * video_fps)
        nframes = total_frames / video_fps * FPS
        nframes = int(min(min(max(nframes, min_frames), max_frames), total_frames))
        if not (FRAME_FACTOR <= nframes <= total_frames):
            raise ValueError(f"nframes should be in interval [{FRAME_FACTOR}, {total_frames}], but got {nframes}.")
        return nframes
    
    def smart_resize(height, width, nframes, factor=IMAGE_FACTOR):
        min_pixels = VIDEO_MIN_PIXELS
        total_pixels = VIDEO_TOTAL_PIXELS
        max_pixels = max(min(VIDEO_MAX_PIXELS, total_pixels / nframes * FRAME_FACTOR), int(min_pixels * 1.05))
        if max(height, width) / min(height, width) > MAX_RATIO:
            raise ValueError(f"absolute aspect ratio must be smaller than {MAX_RATIO}, got {max(height, width) / min(height, width)}")
        h_bar = max(factor, round_by_factor(height, factor))
        w_bar = max(factor, round_by_factor(width, factor))
        if h_bar * w_bar > max_pixels:
            beta = math.sqrt((height * width) / max_pixels)
            h_bar = floor_by_factor(height / beta, factor)
            w_bar = floor_by_factor(width / beta, factor)
        elif h_bar * w_bar < min_pixels:
            beta = math.sqrt(min_pixels / (height * width))
            h_bar = ceil_by_factor(height * beta, factor)
            w_bar = ceil_by_factor(width * beta, factor)
        return h_bar, w_bar
    
    def video_token_calculate(video_path):
        height, width, total_frames, video_fps = get_video(video_path)
        nframes = smart_nframes(total_frames, video_fps)
        resized_height, resized_width = smart_resize(height, width, nframes)
        video_token = int(math.ceil(nframes / FPS) * resized_height / IMAGE_FACTOR * resized_width / IMAGE_FACTOR)
        video_token += 2  # Visual markers
        return video_token
    
    if __name__ == "__main__":
        video_path = "spring_mountain.mp4"  # Your video path
        video_token = video_token_calculate(video_path)
        print("video_tokens:", video_token)
  • audio_tokens

    • qwen3-omni-flash: Total tokens = Audio duration (seconds) × 12.5

    • qwen-omni-turbo: Total tokens = Audio duration (seconds) × 25

    Audio shorter than 1 second is billed as 1 second.

Free quota

For more information about how to claim, query, and use your free quota, see Free quota for new users.

Rate limits

For more information about model rate limit rules and frequently asked questions, see Rate limits.

Error codes

If the model call fails and returns an error message, see Error messages for resolution.

Voice list

To select a voice for speech output, set the voice request parameter to the corresponding value in the "voice parameter" column of the tables below:
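For example, switching to the Beijing-dialect voice Dylan changes only the audio argument. A minimal sketch; the other request parameters are taken unchanged from the streaming examples above.

```python
# Only the "voice" value changes; it comes from the "voice parameter" column.
request_kwargs = {
    "model": "qwen3-omni-flash",
    "modalities": ["text", "audio"],
    "audio": {"voice": "Dylan", "format": "wav"},
    "stream": True,
}
print(request_kwargs["audio"]["voice"])
```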

qwen3-omni-flash-2025-12-01 model

| Voice name | voice parameter | Description | Languages supported |
| --- | --- | --- | --- |
| Cherry | Cherry | A sunny, positive, friendly, and natural young woman | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean |
| Serena | Serena | A gentle young woman | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean |
| Ethan | Ethan | Standard Mandarin with a slight northern accent. Sunny, warm, energetic, and vibrant | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean |
| Chelsie | Chelsie | A two-dimensional virtual girlfriend | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean |
| Momo | Momo | Playful and mischievous, cheering you up | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean |
| Vivian | Vivian | Confident, cute, and slightly feisty | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean |
| Moon | Moon | Effortlessly cool Moon White | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean |
| Maia | Maia | A blend of intellect and gentleness | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean |
| Kai | Kai | A soothing audio spa for your ears | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean |
| Nofish | Nofish | A designer who cannot pronounce retroflex sounds | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean |
| Bella | Bella | A little girl who drinks but never throws punches when drunk | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean |
| Jennifer | Jennifer | A premium, cinematic-quality American English female voice | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean |
| Ryan | Ryan | Full of rhythm, bursting with dramatic flair, balancing authenticity and tension | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean |
| Katerina | Katerina | A mature-woman voice with rich, memorable rhythm | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean |
| Aiden | Aiden | An American English young man skilled in cooking | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean |
| Eldric Sage | Eldric Sage | A calm and wise elder—weathered like a pine tree, yet clear-minded as a mirror | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean |
| Mia | Mia | Gentle as spring water, obedient as fresh snow | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean |
| Mochi | Mochi | A clever, quick-witted young adult—childlike innocence remains, yet wisdom shines through | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean |
| Bellona | Bellona | A powerful, clear voice that brings characters to life—so stirring it makes your blood boil. With heroic grandeur and perfect diction, this voice captures the full spectrum of human expression. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean |
| Vincent | Vincent | A uniquely raspy, smoky voice—just one line evokes armies and heroic tales | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean |
| Bunny | Bunny | A little girl overflowing with "cuteness" | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean |
| Neil | Neil | A flat baseline intonation with precise, clear pronunciation—the most professional news anchor | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean |
| Elias | Elias | Maintains academic rigor while using storytelling techniques to turn complex knowledge into digestible learning modules | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean |
| Arthur | Arthur | A simple, earthy voice steeped in time and tobacco smoke—slowly unfolding village stories and curiosities | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean |
| Nini | Nini | A soft, clingy voice like sweet rice cakes—those drawn-out calls of “Big Brother” are so sweet they melt your bones | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean |
| Ebona | Ebona | Her whisper is like a rusty key slowly turning in the darkest corner of your mind—where childhood shadows and unknown fears hide | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean |
| Seren | Seren | A gentle, soothing voice to help you fall asleep faster. Good night, sweet dreams | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean |
| Pip | Pip | A playful, mischievous boy full of childlike wonder—is this your memory of Shin-chan? | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean |
| Stella | Stella | Normally a cloyingly sweet, dazed teenage-girl voice—but when shouting “I represent the moon to defeat you!”, she instantly radiates unwavering love and justice | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean |
| Bodega | Bodega | A passionate Spanish man | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean |
| Sonrisa | Sonrisa | A cheerful, outgoing Latin American woman | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean |
| Alek | Alek | Cold like the Russian spirit, yet warm like wool coat lining | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean |
| Dolce | Dolce | A laid-back Italian man | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean |
| Sohee | Sohee | A warm, cheerful, emotionally expressive Korean unnie | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean |
| Ono Anna | Ono Anna | A clever, spirited childhood friend | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean |
| Lenn | Lenn | Rational at heart, rebellious in detail—a German youth who wears suits and listens to post-punk | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean |
| Emilien | Emilien | A romantic French big brother | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean |
| Andre | Andre | A magnetic, natural, and steady male voice | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean |
| Radio Gol | Radio Gol | Football poet Radio Gol! Today I’ll commentate on football using my name. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean |
| Shanghai - Jada | Jada | A fast-paced, energetic Shanghai auntie | Shanghainese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean |
| Beijing - Dylan | Dylan | A young man raised in Beijing’s hutongs | Beijing dialect, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean |
| Nanjing - Li | Li | A patient yoga teacher | Nanjing dialect, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean |
| Shaanxi - Marcus | Marcus | Broad face, few words, sincere heart, deep voice—the authentic Shaanxi flavor | Shaanxi dialect, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean |
| Southern Min - Roy | Roy | A humorous, straightforward, lively Taiwanese guy | Southern Min, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean |
| Tianjin - Peter | Peter | Tianjin-style crosstalk, professional foil | Tianjin dialect, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean |
| Sichuan - Sunny | Sunny | A Sichuan girl sweet enough to melt your heart | Sichuan dialect, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean |
| Sichuan - Eric | Eric | A Sichuanese man from Chengdu who stands out in everyday life | Sichuan dialect, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean |
| Cantonese - Rocky | Rocky | A humorous, witty A Qiang providing live chat | Cantonese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean |
| Cantonese - Kiki | Kiki | A sweet Hong Kong girl best friend | Cantonese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean |

qwen3-omni-flash and qwen3-omni-flash-2025-09-15 models

| Voice name | voice parameter | Description | Languages supported |
| --- | --- | --- | --- |
| Cherry | Cherry | A sunny, positive, friendly, and natural young woman | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean |
| Ethan | Ethan | Standard Mandarin with a slight northern accent. Sunny, warm, energetic, and vibrant | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean |
| Nofish | Nofish | A designer who cannot pronounce retroflex sounds | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean |
| Jennifer | Jennifer | A premium, cinematic-quality American English female voice | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean |
| Ryan | Ryan | Full of rhythm, bursting with dramatic flair, balancing authenticity and tension | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean |
| Katerina | Katerina | A mature-woman voice with rich, memorable rhythm | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean |
| Elias | Elias | Maintains academic rigor while using storytelling techniques to turn complex knowledge into digestible learning modules | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean |
| Shanghai - Jada | Jada | A fast-paced, energetic Shanghai auntie | Shanghainese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean |
| Beijing - Dylan | Dylan | A young man raised in Beijing’s hutongs | Beijing dialect, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean |
| Sichuan - Sunny | Sunny | A Sichuan girl sweet enough to melt your heart | Sichuan dialect, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean |
| Nanjing - Li | Li | A patient yoga teacher | Nanjing dialect, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean |
| Shaanxi - Marcus | Marcus | Broad face, few words, sincere heart, deep voice—the authentic Shaanxi flavor | Shaanxi dialect, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean |
| Southern Min - Roy | Roy | A humorous, straightforward, lively Taiwanese guy | Southern Min, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean |
| Tianjin - Peter | Peter | Tianjin-style crosstalk, professional foil | Tianjin dialect, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean |
| Cantonese - Rocky | Rocky | A humorous, witty A Qiang providing live chat | Cantonese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean |
| Cantonese - Kiki | Kiki | A sweet Hong Kong girl best friend | Cantonese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean |
| Sichuan - Eric | Eric | A Sichuanese man from Chengdu who stands out in everyday life | Sichuan dialect, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean |

Qwen-Omni-Turbo model

| Voice name | voice parameter | Description | Languages supported |
| --- | --- | --- | --- |
| Cherry | Cherry | A sunny, positive, friendly, and natural young woman | Chinese, English |
| Serena | Serena | A gentle young woman | Chinese, English |
| Ethan | Ethan | Standard Mandarin with a slight northern accent. Sunny, warm, energetic, and vibrant | Chinese, English |
| Chelsie | Chelsie | A two-dimensional virtual girlfriend | Chinese, English |

Qwen-Omni open-source models

| Voice name | voice parameter | Description | Languages supported |
| --- | --- | --- | --- |
| Ethan | Ethan | Standard Mandarin with a slight northern accent. Sunny, warm, energetic, and vibrant | Chinese, English |
| Chelsie | Chelsie | A two-dimensional virtual girlfriend | Chinese, English |