All Products
Search
Document Center

Alibaba Cloud Model Studio:Omni-modal (Qwen-Omni)

Last Updated:Dec 15, 2025

Qwen-Omni accepts a combination of text and another modality, such as an image, audio, or video, as input. It generates responses in text or speech. The model offers a variety of human-like voices and supports speech output in multiple languages and dialects. You can use it for text creation, visual recognition, voice assistants, and other scenarios.

Getting started

Prerequisites

Invocation method: Qwen-Omni currently supports only streaming output. The stream parameter must be set to True to prevent errors.

The following example sends text to the Qwen-Omni API operation and receives a streaming response that contains text and audio.

import os
import base64
import soundfile as sf
import numpy as np
from openai import OpenAI

# 1. Initialize the client
client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),  # Make sure the environment variable is configured
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

# 2. Initiate the request
try:
    completion = client.chat.completions.create(
        model="qwen3-omni-flash",
        messages=[{"role": "user", "content": "Who are you"}],
        modalities=["text", "audio"],  # Specify text and audio output
        audio={"voice": "Cherry", "format": "wav"},
        stream=True,  # Must be set to True
        stream_options={"include_usage": True},
    )

    # 3. Process the streaming response and decode the audio
    print("Model response:")
    audio_base64_string = ""
    for chunk in completion:
        # Process the text part
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="")

        # Collect the audio part
        if chunk.choices and hasattr(chunk.choices[0].delta, "audio") and chunk.choices[0].delta.audio:
            audio_base64_string += chunk.choices[0].delta.audio.get("data", "")

    # 4. Save the audio file
    if audio_base64_string:
        wav_bytes = base64.b64decode(audio_base64_string)
        audio_np = np.frombuffer(wav_bytes, dtype=np.int16)
        sf.write("audio_assistant.wav", audio_np, samplerate=24000)
        print("\nAudio file saved to: audio_assistant.wav")

except Exception as e:
    print(f"Request failed: {e}")
// Preparations before running:
// Universal for Windows/Mac/Linux:
// 1. Make sure Node.js is installed (version >= 14 is recommended)
// 2. Run the following command to install the necessary dependencies:
//    npm install openai wav

import OpenAI from "openai";
import { createWriteStream } from 'node:fs';
import { Writer } from 'wav';

// Define an audio conversion function: convert a Base64 string and save it as a standard WAV audio file
async function convertAudio(audioString, audioPath) {
    try {
        // Decode the Base64 string into a Buffer
        const wavBuffer = Buffer.from(audioString, 'base64');
        // Create a WAV file write stream
        const writer = new Writer({
            sampleRate: 24000,  // Sample rate
            channels: 1,        // Single channel
            bitDepth: 16        // 16-bit depth
        });
        // Create an output file stream and establish a pipeline connection
        const outputStream = createWriteStream(audioPath);
        writer.pipe(outputStream);

        // Write PCM data and end writing
        writer.write(wavBuffer);
        writer.end();

        // Use a Promise to wait for the file to be written
        await new Promise((resolve, reject) => {
            outputStream.on('finish', resolve);
            outputStream.on('error', reject);
        });

        // Add extra wait time to ensure audio integrity
        await new Promise(resolve => setTimeout(resolve, 800));

        console.log(`\nAudio file successfully saved as ${audioPath}`);
    } catch (error) {
        console.error('An error occurred during processing:', error);
    }
}

//  1. Initialize the client
const openai = new OpenAI(
    {
        // API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
        apiKey: process.env.DASHSCOPE_API_KEY,
        // The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1
        baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
    }
);
// 2. Initiate the request
const completion = await openai.chat.completions.create({
    model: "qwen3-omni-flash",  
    messages: [
        {
            "role": "user",
            "content": "Who are you?"
        }],
    stream: true,
    stream_options: {
        include_usage: true
    },
    modalities: ["text", "audio"],
    audio: { voice: "Cherry", format: "wav" }
});

let audioString = "";
console.log("Model response:")

// 3. Process the streaming response and decode the audio
for await (const chunk of completion) {
    if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
        // Process the text content
        if (chunk.choices[0].delta.content) {
            process.stdout.write(chunk.choices[0].delta.content);
        }
        // Process the audio content
        if (chunk.choices[0].delta.audio) {
            if (chunk.choices[0].delta.audio["data"]) {
                audioString += chunk.choices[0].delta.audio["data"];
            }
        }
    }
}
// 4. Save the audio file
convertAudio(audioString, "audio_assistant.wav");
# ======= Important Note =======
# API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
# === Delete this comment before execution ===

curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen3-omni-flash",
    "messages": [
        {
            "role": "user", 
            "content": "Who are you?"
        }
    ],
    "stream":true,
    "stream_options":{
        "include_usage":true
    },
    "modalities":["text","audio"],
    "audio":{"voice":"Cherry","format":"wav"}
}'

Response

After you run the Python and Node.js code, the model's text response is displayed in the console. An audio file named audio_assistant.wav is generated in the same directory as your code file.

Model response:
I am a large language model developed by Alibaba Cloud. My name is Qwen. How can I help you?

Running the HTTP code returns text, with Base64-encoded audio data in the audio field.

data: {"choices":[{"delta":{"content":"I"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1757647879,"system_fingerprint":null,"model":"qwen3-omni-flash","id":"chatcmpl-a68eca3b-c67e-4666-a72f-73c0b4919860"}
data: {"choices":[{"delta":{"content":" am"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1757647879,"system_fingerprint":null,"model":"qwen3-omni-flash","id":"chatcmpl-a68eca3b-c67e-4666-a72f-73c0b4919860"}
......
data: {"choices":[{"delta":{"audio":{"data":"/v8AAAAAAAAAAAAAAA...","expires_at":1757647879,"id":"audio_a68eca3b-c67e-4666-a72f-73c0b4919860"}},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1757647879,"system_fingerprint":null,"model":"qwen3-omni-flash","id":"chatcmpl-a68eca3b-c67e-4666-a72f-73c0b4919860"}
data: {"choices":[{"finish_reason":"stop","delta":{"content":""},"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1764763585,"system_fingerprint":null,"model":"qwen3-omni-flash","id":"chatcmpl-e8c82e9e-073e-4289-a786-a20eb444ac9c"}
data: {"choices":[],"object":"chat.completion.chunk","usage":{"prompt_tokens":207,"completion_tokens":103,"total_tokens":310,"completion_tokens_details":{"audio_tokens":83,"text_tokens":20},"prompt_tokens_details":{"text_tokens":207}},"created":1757940330,"system_fingerprint":null,"model":"qwen3-omni-flash","id":"chatcmpl-9cdd5a26-f9e9-4eff-9dcc-93a878165afc"}

Model availability

Compared to Qwen-VL , Qwen-Omni can:

  • Understand visual and audio information in video files.

  • Understand data in multiple modalities.

  • Output audio.

It also performs well in visual and audio understanding.

Use Qwen3-Omni-Flash for the best performance. Compared to Qwen-Omni-Turbo, which is no longer updated, Qwen3-Omni-Flash offers significant improvements:

  • Supports both thinking and non-thinking modes. Switch between modes using the enable_thinking parameter. By default, thinking mode is disabled.

  • For audio output in non-thinking mode:

    • qwen3-omni-flash-2025-12-01 supports up to 49 voices. qwen3-omni-flash-2025-09-15 and qwen3-omni-flash support up to 17 voices. Qwen-Omni-Turbo supports only 4 voices.

    • The number of supported languages has increased to 10. Qwen-Omni-Turbo supports only 2.

International (Singapore)

Commercial models

Compared to open source versions, commercial models offer the latest features and improvements.

Model

Version

Mode

Context window

Max input

Max chain-of-thought

Max output

Free quota

(Note)

(Tokens)

qwen3-omni-flash

Currently has the same capabilities as qwen3-omni-flash-2025-09-15

Stable

Thinking mode

65,536

16,384

32,768

16,384

1 million tokens each (modality-agnostic)

Valid for 90 days after you activate Model Studio

Non-thinking mode

49,152

-

qwen3-omni-flash-2025-09-15

Also known as qwen3-omni-flash-0915

Snapshot

Thinking mode

65,536

16,384

32,768

16,384

Non-thinking mode

49,152

-

qwen3-omni-flash-2025-12-01

Snapshot

Thinking mode

65,536

16,384

32,768

16,384

Non-thinking mode

49,152

-

More models

Model

Version

Context window

Max input

Max output

Free quota

(Note)

(Tokens)

qwen-omni-turbo

This version has the same capabilities as qwen-omni-turbo-2025-03-26.

Stable

32,768

30,720

2,048

1 million tokens each (modality-agnostic)

Valid for 90 days after activating Model Studio.

qwen-omni-turbo-latest

Always points to the latest snapshot version.
Equivalent capabilities

Latest

qwen-omni-turbo-2025-03-26

Also known as qwen-omni-turbo-0326.

Snapshot

Open source models

Model

Context window

Max input

Max output

Free quota

(Note)

(Tokens)

qwen2.5-omni-7b

32,768

30,720

2,048

1 million tokens (regardless of modality)

Valid for 90 days after activating Alibaba Cloud Model Studio.

China (Beijing)

Commercial models

Model

Version

Mode

Context window

Max input

Max chain-of-thought

Max output

Free quota

(Note)

(Tokens)

qwen3-omni-flash

Currently has the same capabilities as qwen3-omni-flash-2025-09-15

Stable

Thinking mode

65,536

16,384

32,768

16,384

No free quota

Non-thinking mode

49,152

-

qwen3-omni-flash-2025-09-15

Also known as qwen3-omni-flash-0915

Snapshot

Thinking mode

65,536

16,384

32,768

16,384

Non-thinking mode

49,152

-

qwen3-omni-flash-2025-12-01

Snapshot

Thinking mode

65,536

16,384

32,768

16,384

Non-thinking mode

49,152

-

More models

Model

Version

Context window

Max input

Max output

Free quota

(Note)

(Tokens)

qwen-omni-turbo

This model currently has the same capabilities as qwen-omni-turbo-2025-03-26.

Stable

32,768

30,720

2,048

No free quota

qwen-omni-turbo-latest

Always aligned with the latest snapshot
Identical capabilities

Latest

qwen-omni-turbo-2025-03-26

Also known as qwen-omni-turbo-0326.

Snapshot

qwen-omni-turbo-2025-01-19

Also known as qwen-omni-turbo-0119.

Open source models

Model

Context window

Max input

Max output

Free quota

(Note)

(Tokens)

qwen2.5-omni-7b

32,768

30,720

2,048

No free quota

Usage notes

Input

In a single user message, the content array can contain text and one other modality (image, audio, or video). It cannot contain multiple other modalities.

Output

  • Supported output modalities: The audio output is Base64-encoded data. For more information about converting it to an audio file, see Parse Base64-encoded audio data output.

    Output modality

    modalities parameter value

    Response style

    Text

    ["text"] (default)

    More formal and written in style.

    Text and audio

    ["text","audio"]

    Qwen3-Omni-Flash does not support audio output in thinking mode.

    More conversational. The response includes filler words and encourages further interaction.

    Qwen-Omni-Turbo does not support setting a system message when the output modality includes audio.
  • Supported audio output languages:

    • Qwen-Omni-Turbo: Supports only Chinese (Mandarin) and English.

    • Qwen3-Omni-Flash (non-thinking mode): Supports Chinese (Mandarin and some dialects), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, and Korean.

  • Supported voices: Configure the voice and file format of the audio output using the audio parameter, such as audio={"voice": "Cherry", "format": "wav"}:

    • File format (format): Can only be set to "wav".

    • Audio voice (voice): For a list of voices supported by each model, see Voice list.

Limitations

  • Streaming output is mandatory: All requests to Qwen-Omni models must set stream=True.

  • Only Qwen3-Omni-Flash is a hybrid thinking model. For information about how to call it, see Enable or disable thinking mode. In thinking mode, audio output is not supported.

Enable or disable thinking mode

Qwen3-Omni-Flash is a hybrid thinking model. Use the enable_thinking parameter to enable or disable thinking mode:

  • true

  • false (default)

Qwen-Omni-Turbo is not a thinking model.

OpenAI compatible

import os
from openai import OpenAI

client = OpenAI(
    # API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

completion = client.chat.completions.create(
    model="qwen3-omni-flash",
    messages=[{"role": "user", "content": "Who are you"}],

    # Enable or disable thinking mode. Audio output is not supported in thinking mode. qwen-omni-turbo does not support setting enable_thinking.
    extra_body={'enable_thinking': True},

    # Set the output data modality. Two are currently supported in non-thinking mode: ["text","audio"] and ["text"]. Only ["text"] is supported in thinking mode.
    modalities=["text"],

    # Set the voice. The audio parameter is not supported in thinking mode.
    # audio={"voice": "Cherry", "format": "wav"},
    # stream must be set to True to avoid errors.
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in completion:
    if chunk.choices:
        print(chunk.choices[0].delta)
    else:
        print(chunk.usage)
import OpenAI from "openai";

const openai = new OpenAI(
    {
        // API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
        apiKey: process.env.DASHSCOPE_API_KEY,
        // The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1
        baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
    }
);
const completion = await openai.chat.completions.create({
    model: "qwen3-omni-flash",
    messages: [
        { role: "user", content: "Who are you?" }
    ],

    // stream must be set to True to avoid errors.
    stream: true,
    stream_options: {
        include_usage: true
    },
    // Enable or disable thinking mode. Audio output is not supported in thinking mode. qwen-omni-turbo does not support setting enable_thinking.
    extra_body:{'enable_thinking': true},
    //  Set the output data modality. Two are currently supported in non-thinking mode: ["text","audio"] and ["text"]. Only ["text"] is supported in thinking mode.
    modalities: ["text"],
    // Set the voice. The audio parameter is not supported in thinking mode.
    //audio: { voice: "Cherry", format: "wav" }
});

for await (const chunk of completion) {
    if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
        console.log(chunk.choices[0].delta);
    } else {
        console.log(chunk.usage);
    }
}
# ======= Important Note =======
# API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
# === Delete this comment before execution ===

curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen3-omni-flash",
    "messages": [
        {
            "role": "user",
            "content": "Who are you?"
        }
    ],
    "stream":true,
    "stream_options":{
        "include_usage":true
    },
    "modalities":["text"],
    "enable_thinking": true
}'

Return value

data: {"choices":[{"delta":{"content":null,"role":"assistant","reasoning_content":""},"index":0,"logprobs":null,"finish_reason":null}],"object":"chat.completion.chunk","usage":null,"created":1757937336,"system_fingerprint":null,"model":"qwen3-omni-flash","id":"chatcmpl-ce3d6fe5-e717-4b7e-8b40-3aef12288d4c"}
data: {"choices":[{"finish_reason":null,"logprobs":null,"delta":{"content":null,"reasoning_content":"Hmm"},"index":0}],"object":"chat.completion.chunk","usage":null,"reated":1757937336,"system_fingerprint":null,"model":"qwen3-omni-flash","id":"chatcmpl-ce3d6fe5-e717-4b7e-8b40-3aef12288d4c"}
data: {"choices":[{"delta":{"content":null,"reasoning_content":","},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"reated":1757937336,"system_fingerprint":null,"model":"qwen3-omni-flash","id":"chatcmpl-ce3d6fe5-e717-4b7e-8b40-3aef12288d4c"}
......
data: {"choices":[{"delta":{"content":"Tell me"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1757937336,"tem_fingerprint":null,"model":"qwen3-omni-flash","id":"chatcmpl-ce3d6fe5-e717-4b7e-8b40-3aef12288d4c"}
data: {"choices":[{"delta":{"content":"!"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1757937336,"systm_fingerprint":null,"model":"qwen3-omni-flash","id":"chatcmpl-ce3d6fe5-e717-4b7e-8b40-3aef12288d4c"}
data: {"choices":[{"finish_reason":"stop","delta":{"content":"","reasoning_content":null},"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1757937336,"system_fingerprint":null,"model":"qwen3-omni-flash","id":"chatcmpl-ce3d6fe5-e717-4b7e-8b40-3aef12288d4c"}
data: {"choices":[],"object":"chat.completion.chunk","usage":{"prompt_tokens":11,"completion_tokens":363,"total_tokens":374,"completion_tokens_details":{"reasoning_tokens":195,"text_tokens":168},"prompt_tokens_details":{"text_tokens":11}},"created":1757937336,"system_fingerprint":null,"model":"qwen3-omni-flash","id":"chatcmpl-ce3d6fe5-e717-4b7e-8b40-3aef12288d4c"}

Image and text input

Qwen-Omni models support multiple image inputs. The requirements for input images are as follows:

  • The size of a single image file cannot exceed 10 MB.

  • The number of images is limited by the model's token limit. The total number of tokens for all images and text must not exceed the model's maximum input token limit.

  • The width and height of the image must both be greater than 10 pixels. The aspect ratio must not exceed 200:1 or 1:200.

  • For supported image types, see Visual understanding.

The following sample code uses an image URL from the internet as an example. To input a local image, see Input Base64-encoded local file. Streaming output is required for all calls.

OpenAI compatible

import os
from openai import OpenAI

client = OpenAI(
    # API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

completion = client.chat.completions.create(
    model="qwen3-omni-flash", # When the model is Qwen3-Omni-Flash, run in non-thinking mode.
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"
                    },
                },
                {"type": "text", "text": "What scene is depicted in the image?"},
            ],
        },
    ],
    # Set the output data modality. Two are currently supported: ["text","audio"] and ["text"].
    modalities=["text", "audio"],
    audio={"voice": "Cherry", "format": "wav"},
    # stream must be set to True to avoid errors.
    stream=True,
    stream_options={
        "include_usage": True
    }
)

for chunk in completion:
    if chunk.choices:
        print(chunk.choices[0].delta)
    else:
        print(chunk.usage)
import OpenAI from "openai";

const openai = new OpenAI(
    {
        // API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
        apiKey: process.env.DASHSCOPE_API_KEY,
        // The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1
        baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
    }
);
const completion = await openai.chat.completions.create({
    model: "qwen3-omni-flash", // When the model is Qwen3-Omni-Flash, run in non-thinking mode.
    messages: [
        {
            "role": "user",
            "content": [{
                "type": "image_url",
                "image_url": { "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg" },
            },
            { "type": "text", "text": "What scene is depicted in the image?" }]
        }],
    stream: true,
    stream_options: {
        include_usage: true
    },
    modalities: ["text", "audio"],
    audio: { voice: "Cherry", format: "wav" }
});

for await (const chunk of completion) {
    if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
        console.log(chunk.choices[0].delta);
    } else {
        console.log(chunk.usage);
    }
}
# ======= Important Note =======
# API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
# === Delete this comment before execution ===


curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen3-omni-flash",
    "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "image_url",
          "image_url": {
            "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"
          }
        },
        {
          "type": "text",
          "text": "What scene is depicted in the image?"
        }
      ]
    }
  ],
    "stream":true,
    "stream_options":{
        "include_usage":true
    },
    "modalities":["text","audio"],
    "audio":{"voice":"Cherry","format":"wav"}
}'

Audio and text input

  • Only one audio file can be input.

  • File size

    • Qwen3-Omni-Flash: Cannot exceed 100 MB, with a maximum duration of 20 minutes.

    • Qwen-Omni-Turbo: Cannot exceed 10 MB, with a maximum duration of 3 minutes.

The following sample code uses an audio URL from the internet as an example. To input a local audio file, see Input Base64-encoded local files. Streaming output is required for all calls.

OpenAI compatible

import os
from openai import OpenAI

client = OpenAI(
    # API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

completion = client.chat.completions.create(
    model="qwen3-omni-flash",# When the model is Qwen3-Omni-Flash, run in non-thinking mode.
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "input_audio",
                    "input_audio": {
                        "data": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250211/tixcef/cherry.wav",
                        "format": "wav",
                    },
                },
                {"type": "text", "text": "What is this audio about"},
            ],
        },
    ],
    # Set the output data modality. Two are currently supported: ["text","audio"] and ["text"].
    modalities=["text", "audio"],
    audio={"voice": "Cherry", "format": "wav"},
    # stream must be set to True to avoid errors.
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in completion:
    if chunk.choices:
        print(chunk.choices[0].delta)
    else:
        print(chunk.usage)
import OpenAI from "openai";

const openai = new OpenAI(
    {
        // API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
        apiKey: process.env.DASHSCOPE_API_KEY,
        // The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1
        baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
    }
);
const completion = await openai.chat.completions.create({
    model: "qwen3-omni-flash", // When the model is Qwen3-Omni-Flash, run in non-thinking mode.
    messages: [
        {
            "role": "user",
            "content": [{
                "type": "input_audio",
                "input_audio": { "data": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250211/tixcef/cherry.wav", "format": "wav" },
            },
            { "type": "text", "text": "What is this audio about" }]
        }],
    stream: true,
    stream_options: {
        include_usage: true
    },
    modalities: ["text", "audio"],
    audio: { voice: "Cherry", format: "wav" }
});

for await (const chunk of completion) {
    if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
        console.log(chunk.choices[0].delta);
    } else {
        console.log(chunk.usage);
    }
}
# ======= Important Note =======
# API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
# === Delete this comment before execution ===

curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen3-omni-flash",
    "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "input_audio",
          "input_audio": {
            "data": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250211/tixcef/cherry.wav",
            "format": "wav"
          }
        },
        {
          "type": "text",
          "text": "What is this audio about"
        }
      ]
    }
  ],
    "stream":true,
    "stream_options":{
        "include_usage":true
    },
    "modalities":["text","audio"],
    "audio":{"voice":"Cherry","format":"wav"}
}'

Video and text input

You can input video as an image list or as a video file. If you input a video file, the model can also understand the audio in the video.

The following sample code uses a video URL from the internet as an example. To input a local video, see Input Base64-encoded local files. Streaming output is required for all calls.

Image list format

Number of images

  • Qwen3-Omni-Flash: A minimum of 2 images and a maximum of 128 images.

  • Qwen-Omni-Turbo: A minimum of 4 images and a maximum of 80 images.

OpenAI compatible

import os
from openai import OpenAI

client = OpenAI(
    # API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

completion = client.chat.completions.create(
    model="qwen3-omni-flash", # When the model is Qwen3-Omni-Flash, run in non-thinking mode.
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "video",
                    "video": [
                        "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/xzsgiz/football1.jpg",
                        "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/tdescd/football2.jpg",
                        "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/zefdja/football3.jpg",
                        "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/aedbqh/football4.jpg",
                    ],
                },
                {"type": "text", "text": "Describe the process shown in this video"},
            ],
        }
    ],
    # Set the output data modality. Two are currently supported: ["text","audio"] and ["text"].
    modalities=["text", "audio"],
    audio={"voice": "Cherry", "format": "wav"},
    # stream must be set to True to avoid errors.
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in completion:
    if chunk.choices:
        print(chunk.choices[0].delta)
    else:
        print(chunk.usage)
import OpenAI from "openai";

const openai = new OpenAI(
    {
        // API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
        apiKey: process.env.DASHSCOPE_API_KEY,
        // The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1
        baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
    }
);
const completion = await openai.chat.completions.create({
    model: "qwen3-omni-flash", //When the model is Qwen3-Omni-Flash, run in non-thinking mode.
    messages: [{
        role: "user",
        content: [
            {
                type: "video",
                video: [
                    "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/xzsgiz/football1.jpg",
                    "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/tdescd/football2.jpg",
                    "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/zefdja/football3.jpg",
                    "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/aedbqh/football4.jpg"
                ]
            },
            {
                type: "text",
                text: "Describe the process shown in this video"
            }
        ]
    }],
    stream: true,
    stream_options: {
        include_usage: true
    },
    modalities: ["text", "audio"],
    audio: { voice: "Cherry", format: "wav" }
});

for await (const chunk of completion) {
    if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
        console.log(chunk.choices[0].delta);
    } else {
        console.log(chunk.usage);
    }
}
# ======= Important Note =======
# API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
# === Delete this comment before execution ===

curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen3-omni-flash",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "video",
                    "video": [
                        "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/xzsgiz/football1.jpg",
                        "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/tdescd/football2.jpg",
                        "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/zefdja/football3.jpg",
                        "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/aedbqh/football4.jpg"
                    ]
                },
                {
                    "type": "text",
                    "text": "Describe the process shown in this video"
                }
            ]
        }
    ],
    "stream": true,
    "stream_options": {
        "include_usage": true
    },
    "modalities": ["text", "audio"],
    "audio": {
        "voice": "Cherry",
        "format": "wav"
    }
}'

Video file format (can understand audio in the video)

  • Only one video file can be input.

  • File size:

    • Qwen3-Omni-Flash: Limited to 256 MB, with a duration limit of 150 s.

    • Qwen-Omni-Turbo: Limited to 150 MB, with a duration limit of 40 s.

  • The visual and audio information in the video file are billed separately.

OpenAI compatible

import os
from openai import OpenAI

client = OpenAI(
    # API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

completion = client.chat.completions.create(
    model="qwen3-omni-flash", # When the model is Qwen3-Omni-Flash, run in non-thinking mode.
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "video_url",
                    "video_url": {
                        "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241115/cqqkru/1.mp4"
                    },
                },
                {"type": "text", "text": "What is the content of the video?"},
            ],
        },
    ],
    # Set the output data modality. Two are currently supported: ["text","audio"] and ["text"].
    modalities=["text", "audio"],
    audio={"voice": "Cherry", "format": "wav"},
    # stream must be set to True to avoid errors.
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in completion:
    if chunk.choices:
        print(chunk.choices[0].delta)
    else:
        print(chunk.usage)
import OpenAI from "openai";

const openai = new OpenAI(
    {
        // API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
        apiKey: process.env.DASHSCOPE_API_KEY,
        // The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1
        baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
    }
);
const completion = await openai.chat.completions.create({
    model: "qwen3-omni-flash", // When the model is Qwen3-Omni-Flash, run in non-thinking mode.
    messages: [
        {
            "role": "user",
            "content": [{
                "type": "video_url",
                "video_url": { "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241115/cqqkru/1.mp4" },
            },
            { "type": "text", "text": "What is the content of the video?" }]
        }],
    stream: true,
    stream_options: {
        include_usage: true
    },
    modalities: ["text", "audio"],
    audio: { voice: "Cherry", format: "wav" }
});


for await (const chunk of completion) {
    if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
        console.log(chunk.choices[0].delta);
    } else {
        console.log(chunk.usage);
    }
}
# ======= Important Note =======
# API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
# === Delete this comment before execution ===

curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen3-omni-flash",
    "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "video_url",
          "video_url": {
            "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241115/cqqkru/1.mp4"
          }
        },
        {
          "type": "text",
          "text": "What is the content of the video"
        }
      ]
    }
  ],
    "stream":true,
    "stream_options": {
        "include_usage": true
    },
    "modalities":["text","audio"],
    "audio":{"voice":"Cherry","format":"wav"}
}'

Multi-turn conversation

When you use the multi-turn conversation feature of Qwen-Omni models, note the following:

  • Assistant message

    Assistant messages in the messages array support only text data.

  • User message

    A user message can contain text and data from only one other modality. In a multi-turn conversation, you can use different modalities in separate user messages.

OpenAI compatible

import os
from openai import OpenAI

client = OpenAI(
    # API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

completion = client.chat.completions.create(
    model="qwen3-omni-flash", # When the model is Qwen3-Omni-Flash, run in non-thinking mode.
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "input_audio",
                    "input_audio": {
                        "data": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3",
                        "format": "mp3",
                    },
                },
                {"type": "text", "text": "What is this audio about"},
            ],
        },
        {
            "role": "assistant",
            "content": [{"type": "text", "text": "This audio says: Welcome to Alibaba Cloud"}],
        },
        {
            "role": "user",
            "content": [{"type": "text", "text": "Can you tell me about this company?"}],
        },
    ],
    # Set the output data modality. Two are currently supported: ["text","audio"] and ["text"].
    modalities=["text"],
    # stream must be set to True to avoid errors.
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in completion:
    if chunk.choices:
        print(chunk.choices[0].delta)
    else:
        print(chunk.usage)
import OpenAI from "openai";

const openai = new OpenAI(
    {
        // API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
        apiKey: process.env.DASHSCOPE_API_KEY,
        // The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1
        baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
    }
);
const completion = await openai.chat.completions.create({
    model: "qwen3-omni-flash", // When the model is Qwen3-Omni-Flash, run in non-thinking mode.
    messages: [
        {
            "role": "user",
            "content": [
                {
                    "type": "input_audio",
                    "input_audio": {
                        "data": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3",
                        "format": "mp3",
                    },
                },
                { "type": "text", "text": "What is this audio about" },
            ],
        },
        {
            "role": "assistant",
            "content": [{ "type": "text", "text": "This audio says: Welcome to Alibaba Cloud" }],
        },
        {
            "role": "user",
            "content": [{ "type": "text", "text": "Can you tell me about this company?" }]
        }],
    stream: true,
    stream_options: {
        include_usage: true
    },
    modalities: ["text"]
});


for await (const chunk of completion) {
    if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
        console.log(chunk.choices[0].delta);
    } else {
        console.log(chunk.usage);
    }
}
# ======= Important Note =======
# API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
# === Delete this comment before execution ===

curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
  "model": "qwen3-omni-flash",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "input_audio",
          "input_audio": {
            "data": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
          }
        },
        {
          "type": "text",
          "text": "What is this audio about"
        }
      ]
    },
    {
      "role": "assistant",
      "content": [
        {
          "type": "text",
          "text": "This audio says: Welcome to Alibaba Cloud"
        }
      ]
    },
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Can you tell me about this company?"
        }
      ]
    }
  ],
  "stream": true,
  "stream_options": {
    "include_usage": true
  },
  "modalities": ["text"]
}'

Parse Base64-encoded audio data output

The audio output from Qwen-Omni models is Base64-encoded data delivered in a stream. You can use a string variable to accumulate the Base64 data from each fragment as it arrives. After the stream is complete, decode the final string to create the audio file. Alternatively, decode and play each fragment in real time as it is received.

# Installation instructions for pyaudio:
# APPLE Mac OS X
#   brew install portaudio
#   pip install pyaudio
# Debian/Ubuntu
#   sudo apt-get install python-pyaudio python3-pyaudio
#   or
#   pip install pyaudio
# CentOS
#   sudo yum install -y portaudio portaudio-devel && pip install pyaudio
# Microsoft Windows
#   python -m pip install pyaudio

import os
from openai import OpenAI
import base64
import numpy as np
import soundfile as sf

client = OpenAI(
    # API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

completion = client.chat.completions.create(
    model="qwen3-omni-flash", # When the model is Qwen3-Omni-Flash, run in non-thinking mode.
    messages=[{"role": "user", "content": "Who are you"}],
    # Set the output data modality. Two are currently supported: ["text","audio"] and ["text"].
    modalities=["text", "audio"],
    audio={"voice": "Cherry", "format": "wav"},
    # stream must be set to True to avoid errors.
    stream=True,
    stream_options={"include_usage": True},
)

# Method 1: Decode after the generation is complete
audio_string = ""
for chunk in completion:
    if chunk.choices:
        if hasattr(chunk.choices[0].delta, "audio"):
            try:
                audio_string += chunk.choices[0].delta.audio["data"]
            except Exception as e:
                print(chunk.choices[0].delta.content)
    else:
        print(chunk.usage)

wav_bytes = base64.b64decode(audio_string)
audio_np = np.frombuffer(wav_bytes, dtype=np.int16)
sf.write("audio_assistant_py.wav", audio_np, samplerate=24000)

# Method 2: Decode while generating (comment out the code for Method 1 to use Method 2)
# # Initialize PyAudio
# import pyaudio
# import time
# p = pyaudio.PyAudio()
# # Create an audio stream
# stream = p.open(format=pyaudio.paInt16,
#                 channels=1,
#                 rate=24000,
#                 output=True)

# for chunk in completion:
#     if chunk.choices:
#         if hasattr(chunk.choices[0].delta, "audio"):
#             try:
#                 audio_string = chunk.choices[0].delta.audio["data"]
#                 wav_bytes = base64.b64decode(audio_string)
#                 audio_np = np.frombuffer(wav_bytes, dtype=np.int16)
#                 # Play the audio data directly
#                 stream.write(audio_np.tobytes())
#             except Exception as e:
#                 print(chunk.choices[0].delta.content)

# time.sleep(0.8)
# # Clean up resources
# stream.stop_stream()
# stream.close()
# p.terminate()
// Preparations before running:
// Universal for Windows/Mac/Linux:
// 1. Make sure Node.js is installed (version >= 14 is recommended)
// 2. Run the following command to install the necessary dependencies:
//    npm install openai wav
// 
// To use the real-time playback feature (Method 2), you also need:
// Windows:
//    npm install speaker
// Mac:
//    brew install portaudio
//    npm install speaker
// Linux (Ubuntu/Debian):
//    sudo apt-get install libasound2-dev
//    npm install speaker

import OpenAI from "openai";

const openai = new OpenAI(
    {
        // API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
        apiKey: process.env.DASHSCOPE_API_KEY,
        // The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1
        baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
    }
);
const completion = await openai.chat.completions.create({
    model: "qwen3-omni-flash", // When the model is Qwen3-Omni-Flash, run in non-thinking mode.
    messages: [
        {
            "role": "user",
            "content": "Who are you?"
        }],
    stream: true,
    stream_options: {
        include_usage: true
    },
    modalities: ["text", "audio"],
    audio: { voice: "Cherry", format: "wav" }
});

// Method 1: Decode after the generation is complete
// Requires installation: npm install wav
import { createWriteStream } from 'node:fs';  // node:fs is a built-in Node.js module, no installation required
import { Writer } from 'wav';

async function convertAudio(audioString, audioPath) {
    try {
        // Decode the Base64 string into a Buffer
        const wavBuffer = Buffer.from(audioString, 'base64');
        // Create a WAV file write stream
        const writer = new Writer({
            sampleRate: 24000,  // Sample rate
            channels: 1,        // Single channel
            bitDepth: 16        // 16-bit depth
        });
        // Create an output file stream and establish a pipeline connection
        const outputStream = createWriteStream(audioPath);
        writer.pipe(outputStream);

        // Write PCM data and end writing
        writer.write(wavBuffer);
        writer.end();

        // Use a Promise to wait for the file to be written
        await new Promise((resolve, reject) => {
            outputStream.on('finish', resolve);
            outputStream.on('error', reject);
        });

        // Add extra wait time to ensure audio integrity
        await new Promise(resolve => setTimeout(resolve, 800));

        console.log(`Audio file successfully saved as ${audioPath}`);
    } catch (error) {
        console.error('An error occurred during processing:', error);
    }
}

let audioString = "";
for await (const chunk of completion) {
    if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
        if (chunk.choices[0].delta.audio) {
            if (chunk.choices[0].delta.audio["data"]) {
                audioString += chunk.choices[0].delta.audio["data"];
            }
        }
    } else {
        console.log(chunk.usage);
    }
}
// Execute the conversion
convertAudio(audioString, "audio_assistant_mjs.wav");


// Method 2: Generate and play in real time
// You must first install the necessary components according to the instructions for your system above.
// import Speaker from 'speaker'; // Import the audio playback library

// // Create a speaker instance (configuration matches WAV file parameters)
// const speaker = new Speaker({
//     sampleRate: 24000,  // Sample rate
//     channels: 1,        // Number of sound channels
//     bitDepth: 16,       // Bit depth
//     signed: true        // Signed PCM
// });
// for await (const chunk of completion) {
//     if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
//         if (chunk.choices[0].delta.audio) {
//             if (chunk.choices[0].delta.audio["data"]) {
//                 const pcmBuffer = Buffer.from(chunk.choices[0].delta.audio.data, 'base64');
//                 // Write directly to the speaker for playback
//                 speaker.write(pcmBuffer);
//             }
//         }
//     } else {
//         console.log(chunk.usage);
//     }
// }
// speaker.on('finish', () => console.log('Playback complete'));
// speaker.end(); // Call based on the actual end of the API stream

Input Base64-encoded local files

Images

This example uses the locally saved file eagle.png.

import os
from openai import OpenAI
import base64

client = OpenAI(
    # API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)


#  Base64 encoding format
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")


base64_image = encode_image("eagle.png")

completion = client.chat.completions.create(
    model="qwen3-omni-flash", # When the model is Qwen3-Omni-Flash, run in non-thinking mode.
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{base64_image}"},
                },
                {"type": "text", "text": "What scene is depicted in the image?"},
            ],
        },
    ],
    # Set the output data modality. Two are currently supported: ["text","audio"] and ["text"].
    modalities=["text", "audio"],
    audio={"voice": "Cherry", "format": "wav"},
    # stream must be set to True to avoid errors.
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in completion:
    if chunk.choices:
        print(chunk.choices[0].delta)
    else:
        print(chunk.usage)
import OpenAI from "openai";
import { readFileSync } from 'fs';

const openai = new OpenAI(
    {
        // API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
        apiKey: process.env.DASHSCOPE_API_KEY,
        // The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1
        baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
    }
);

const encodeImage = (imagePath) => {
    const imageFile = readFileSync(imagePath);
    return imageFile.toString('base64');
};
const base64Image = encodeImage("eagle.png")

const completion = await openai.chat.completions.create({
    model: "qwen3-omni-flash",// When the model is Qwen3-Omni-Flash, run in non-thinking mode.
    messages: [
        {
            "role": "user",
            "content": [{
                "type": "image_url",
                "image_url": { "url": `data:image/png;base64,${base64Image}` },
            },
            { "type": "text", "text": "What scene is depicted in the image?" }]
        }],
    stream: true,
    stream_options: {
        include_usage: true
    },
    modalities: ["text", "audio"],
    audio: { voice: "Cherry", format: "wav" }
});

for await (const chunk of completion) {
    if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
        console.log(chunk.choices[0].delta);
    } else {
        console.log(chunk.usage);
    }
}

Audio

This example uses the locally saved file welcome.mp3.

import os
from openai import OpenAI
import base64
import numpy as np
import soundfile as sf
import requests

client = OpenAI(
    # API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)


def encode_audio(audio_path):
    with open(audio_path, "rb") as audio_file:
        return base64.b64encode(audio_file.read()).decode("utf-8")


base64_audio = encode_audio("welcome.mp3")

completion = client.chat.completions.create(
    model="qwen3-omni-flash", # When the model is Qwen3-Omni-Flash, run in non-thinking mode.
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "input_audio",
                    "input_audio": {
                        "data": f"data:;base64,{base64_audio}",
                        "format": "mp3",
                    },
                },
                {"type": "text", "text": "What is this audio about"},
            ],
        },
    ],
    # Set the output data modality. Two are currently supported: ["text","audio"] and ["text"].
    modalities=["text", "audio"],
    audio={"voice": "Cherry", "format": "wav"},
    # stream must be set to True to avoid errors.
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in completion:
    if chunk.choices:
        print(chunk.choices[0].delta)
    else:
        print(chunk.usage)
import OpenAI from "openai";
import { readFileSync } from 'fs';

const openai = new OpenAI(
    {
        // The API keys for the Singapore and Beijing regions are different. To obtain an API key, visit https://www.alibabacloud.com/help/en/model-studio/get-api-key
        apiKey: process.env.DASHSCOPE_API_KEY,
        // The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with https://dashscope.aliyuncs.com/compatible-mode/v1
        baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
    }
);

const encodeAudio = (audioPath) => {
    const audioFile = readFileSync(audioPath);
    return audioFile.toString('base64');
};
const base64Audio = encodeAudio("welcome.mp3")

const completion = await openai.chat.completions.create({
    model: "qwen3-omni-flash", // If you use Qwen3-Omni-Flash, run it in non-thinking mode.
    messages: [
        {
            "role": "user",
            "content": [{
                "type": "input_audio",
                "input_audio": { "data": `data:;base64,${base64Audio}`, "format": "mp3" },
            },
            { "type": "text", "text": "What is this audio about?" }]
        }],
    stream: true,
    stream_options: {
        include_usage: true
    },
    modalities: ["text", "audio"],
    audio: { voice: "Cherry", format: "wav" }
});

for await (const chunk of completion) {
    if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
        console.log(chunk.choices[0].delta);
    } else {
        console.log(chunk.usage);
    }
}

Video

Video file

This example uses the local file spring_mountain.mp4.

import os
from openai import OpenAI
import base64
import numpy as np
import soundfile as sf

client = OpenAI(
    # API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

#  Base64 encoding format
def encode_video(video_path):
    with open(video_path, "rb") as video_file:
        return base64.b64encode(video_file.read()).decode("utf-8")


base64_video = encode_video("spring_mountain.mp4")

completion = client.chat.completions.create(
    model="qwen3-omni-flash", # When using Qwen3-Omni-Flash, run in non-thinking mode.
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "video_url",
                    "video_url": {"url": f"data:;base64,{base64_video}"},
                },
                {"type": "text", "text": "What is she singing?"},
            ],
        },
    ],
    # Set the output data modality. Supported modalities are ["text", "audio"] and ["text"].
    modalities=["text", "audio"],
    audio={"voice": "Cherry", "format": "wav"},
    # stream must be set to True. Otherwise, an error occurs.
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in completion:
    if chunk.choices:
        print(chunk.choices[0].delta)
    else:
        print(chunk.usage)
import OpenAI from "openai";
import { readFileSync } from 'fs';

const openai = new OpenAI(
    {
        // API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
        apiKey: process.env.DASHSCOPE_API_KEY,
        // The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1
        baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
    }
);

const encodeVideo = (videoPath) => {
    const videoFile = readFileSync(videoPath);
    return videoFile.toString('base64');
};
const base64Video = encodeVideo("spring_mountain.mp4")

const completion = await openai.chat.completions.create({
    model: "qwen3-omni-flash", // When using Qwen3-Omni-Flash, run in non-thinking mode.
    messages: [
        {
            "role": "user",
            "content": [{
                "type": "video_url",
                "video_url": { "url": `data:;base64,${base64Video}` },
            },
            { "type": "text", "text": "What is she singing?" }]
        }],
    stream: true,
    stream_options: {
        include_usage: true
    },
    modalities: ["text", "audio"],
    audio: { voice: "Cherry", format: "wav" }
});

for await (const chunk of completion) {
    if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
        console.log(chunk.choices[0].delta);
    } else {
        console.log(chunk.usage);
    }
}

Image list

This example uses the local files football1.jpg, football2.jpg, football3.jpg, and football4.jpg.

import os
from openai import OpenAI
import base64
import numpy as np
import soundfile as sf

client = OpenAI(
    # API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)


#  Base64 encoding format
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")


base64_image_1 = encode_image("football1.jpg")
base64_image_2 = encode_image("football2.jpg")
base64_image_3 = encode_image("football3.jpg")
base64_image_4 = encode_image("football4.jpg")

completion = client.chat.completions.create(
    model="qwen3-omni-flash", # When using Qwen3-Omni-Flash, run in non-thinking mode.
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "video",
                    "video": [
                        f"data:image/jpeg;base64,{base64_image_1}",
                        f"data:image/jpeg;base64,{base64_image_2}",
                        f"data:image/jpeg;base64,{base64_image_3}",
                        f"data:image/jpeg;base64,{base64_image_4}",
                    ],
                },
                {"type": "text", "text": "Describe the procedure in this video."},
            ],
        }
    ],
    # Set the output data modality. Supported modalities are ["text", "audio"] and ["text"].
    modalities=["text", "audio"],
    audio={"voice": "Cherry", "format": "wav"},
    # stream must be set to True. Otherwise, an error occurs.
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in completion:
    if chunk.choices:
        print(chunk.choices[0].delta)
    else:
        print(chunk.usage)
import OpenAI from "openai";
import { readFileSync } from 'fs';

const openai = new OpenAI(
    {
        // API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
        apiKey: process.env.DASHSCOPE_API_KEY,
        // The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1
        baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
    }
);

const encodeImage = (imagePath) => {
    const imageFile = readFileSync(imagePath);
    return imageFile.toString('base64');
  };
const base64Image1 = encodeImage("football1.jpg")
const base64Image2 = encodeImage("football2.jpg")
const base64Image3 = encodeImage("football3.jpg")
const base64Image4 = encodeImage("football4.jpg")

const completion = await openai.chat.completions.create({
    model: "qwen3-omni-flash", // When using Qwen3-Omni-Flash, run in non-thinking mode.
    messages: [{
        role: "user",
        content: [
            {
                type: "video",
                video: [
                    `data:image/jpeg;base64,${base64Image1}`,
                    `data:image/jpeg;base64,${base64Image2}`,
                    `data:image/jpeg;base64,${base64Image3}`,
                    `data:image/jpeg;base64,${base64Image4}`
                ]
            },
            {
                type: "text",
                text: "Describe the procedure in this video."
            }
        ]
    }],
    stream: true,
    stream_options: {
        include_usage: true
    },
    modalities: ["text", "audio"],
    audio: { voice: "Cherry", format: "wav" }
});

for await (const chunk of completion) {
    if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
        console.log(chunk.choices[0].delta);
    } else {
        console.log(chunk.usage);
    }

API reference

For the input and output parameters of Qwen-Omni, see Qwen.

Billing and rate limiting

Billing rules

Qwen-Omni is billed based on the number of tokens for different modalities, such as audio, image, and video, see Models.

Rules for converting audio, images, and videos to tokens

Audio

  • Qwen3-Omni-Flash: Total tokens = Audio duration (in seconds) × 12.5

  • Qwen-Omni-Turbo: Total tokens = Audio duration (in seconds) × 25. If the audio duration is less than 1 second, it is calculated as 1 second.

Images

  • Qwen3-Omni-Flash: 1 token per 32 × 32 pixels.

  • Qwen-Omni-Turbo: 1 token per 28 × 28 pixels.

Each image requires a minimum of 4 tokens and supports a maximum of 1280 tokens. You can use the following code to estimate the total tokens for an image by providing its path:

import math
# Install the Pillow library: pip install Pillow
from PIL import Image

# For Qwen-Omni-Turbo, the factor is 28.
# factor = 28
# For Qwen3-Omni-Flash, the factor is 32.
factor = 32

def token_calculate(image_path=''):
    """
    param image_path: The path of the image.
    return: The number of tokens for a single image.
    """
    if len(image_path) > 0:
        # Open the specified image file.
        image = Image.open(image_path)
        # Get the original dimensions of the image.
        height = image.height
        width = image.width
    print(f"Image dimensions before scaling: Height={height}, Width={width}")
    # Adjust the height to be a multiple of the factor.
    h_bar = round(height / factor) * factor
    # Adjust the width to be a multiple of the factor.
    w_bar = round(width / factor) * factor
    # Lower limit for image tokens: 4 tokens.
    min_pixels = 4 * factor * factor
    # Upper limit for image tokens: 1280 tokens.
    max_pixels = 1280 * factor * factor
    # Scale the image to adjust the total number of pixels to be within the [min_pixels, max_pixels] range.
    if h_bar * w_bar > max_pixels:
        # Calculate the scaling factor beta so that the total pixels of the scaled image do not exceed max_pixels.
        beta = math.sqrt((height * width) / max_pixels)
        # Recalculate the adjusted height to ensure it is a multiple of the factor.
        h_bar = math.floor(height / beta / factor) * factor
        # Recalculate the adjusted width to ensure it is a multiple of the factor.
        w_bar = math.floor(width / beta / factor) * factor
    elif h_bar * w_bar < min_pixels:
        # Calculate the scaling factor beta so that the total pixels of the scaled image are not less than min_pixels.
        beta = math.sqrt(min_pixels / (height * width))
        # Recalculate the adjusted height to ensure it is a multiple of the factor.
        h_bar = math.ceil(height * beta / factor) * factor
        # Recalculate the adjusted width to ensure it is a multiple of the factor.
        w_bar = math.ceil(width * beta / factor) * factor
    print(f"Image dimensions after scaling: Height={h_bar}, Width={w_bar}")
    # Calculate the number of tokens for the image: total pixels / (factor * factor).
    token = int((h_bar * w_bar) / (factor * factor)) + 2
    print(f"Number of tokens after scaling: {token}")
    return token
    
if __name__ == "__main__":
    token = token_calculate(image_path="path/to/your/test.jpg")

Video

Video files generate two types of tokens: video_tokens (visual) and audio_tokens (audio).

  • video_tokens

    The calculation procedure is complex. For more information, see the following code:

    # Before use, install: pip install opencv-python
    import math
    import os
    import logging
    import cv2
    
    # Fixed parameters
    FRAME_FACTOR = 2
    
    # For Qwen3-Omni-Flash, IMAGE_FACTOR is 32
    IMAGE_FACTOR = 32
    
    # For Qwen-Omni-Turbo, IMAGE_FACTOR is 28
    # IMAGE_FACTOR = 28
    
    # Aspect ratio of video frames
    MAX_RATIO = 200
    
    # Lower limit for video frame pixels. For Qwen3-Omni-Flash: 128 * 32 * 32
    VIDEO_MIN_PIXELS = 128 * 32 * 32
    # For Qwen-Omni-Turbo
    # VIDEO_MIN_PIXELS = 128 * 28 * 28
    
    # Upper limit for video frame pixels. For Qwen3-Omni-Flash: 768 * 32 * 32
    VIDEO_MAX_PIXELS = 768 * 32 * 32
    # For Qwen-Omni-Turbo:
    # VIDEO_MAX_PIXELS = 768 * 28 * 28
    
    FPS = 2
    # Minimum number of extracted frames
    FPS_MIN_FRAMES = 4
    
    # Maximum number of extracted frames
    # Maximum number of extracted frames for Qwen3-Omni-Flash: 128
    # Maximum number of extracted frames for Qwen-Omni-Turbo: 80
    FPS_MAX_FRAMES = 128
    
    # Maximum pixel value for video input. For Qwen3-Omni-Flash: 16384 * 32 * 32
    VIDEO_TOTAL_PIXELS = 16384 * 32 * 32
    # For Qwen-Omni-Turbo:
    # VIDEO_TOTAL_PIXELS = 16384 * 28 * 28
    
    def round_by_factor(number, factor):
        return round(number / factor) * factor
    
    def ceil_by_factor(number, factor):
        return math.ceil(number / factor) * factor
    
    def floor_by_factor(number, factor):
        return math.floor(number / factor) * factor
    
    def get_video(video_path):
        cap = cv2.VideoCapture(video_path)
        frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
        frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
        total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
        video_fps = cap.get(cv2.CAP_PROP_FPS)
        cap.release()
        return frame_height, frame_width, total_frames, video_fps
    
    def smart_nframes(total_frames, video_fps):
        min_frames = ceil_by_factor(FPS_MIN_FRAMES, FRAME_FACTOR)
        max_frames = floor_by_factor(min(FPS_MAX_FRAMES, total_frames), FRAME_FACTOR)
        duration = total_frames / video_fps if video_fps != 0 else 0
        if duration - int(duration) > (1 / FPS):
            total_frames = math.ceil(duration * video_fps)
        else:
            total_frames = math.ceil(int(duration) * video_fps)
        nframes = total_frames / video_fps * FPS
        nframes = int(min(min(max(nframes, min_frames), max_frames), total_frames))
        if not (FRAME_FACTOR <= nframes <= total_frames):
            raise ValueError(f"nframes should in interval [{FRAME_FACTOR}, {total_frames}], but got {nframes}.")
        return nframes
    
    def smart_resize(height, width, nframes, factor=IMAGE_FACTOR):
        min_pixels = VIDEO_MIN_PIXELS
        total_pixels = VIDEO_TOTAL_PIXELS
        max_pixels = max(min(VIDEO_MAX_PIXELS, total_pixels / nframes * FRAME_FACTOR), int(min_pixels * 1.05))
        if max(height, width) / min(height, width) > MAX_RATIO:
            raise ValueError(f"absolute aspect ratio must be smaller than {MAX_RATIO}, got {max(height, width) / min(height, width)}")
        h_bar = max(factor, round_by_factor(height, factor))
        w_bar = max(factor, round_by_factor(width, factor))
        if h_bar * w_bar > max_pixels:
            beta = math.sqrt((height * width) / max_pixels)
            h_bar = floor_by_factor(height / beta, factor)
            w_bar = floor_by_factor(width / beta, factor)
        elif h_bar * w_bar < min_pixels:
            beta = math.sqrt(min_pixels / (height * width))
            h_bar = ceil_by_factor(height * beta, factor)
            w_bar = ceil_by_factor(width * beta, factor)
        return h_bar, w_bar
    
    def video_token_calculate(video_path):
        height, width, total_frames, video_fps = get_video(video_path)
        nframes = smart_nframes(total_frames, video_fps)
        resized_height, resized_width = smart_resize(height, width, nframes)
        video_token = int(math.ceil(nframes / FPS) * resized_height / 32 * resized_width / 32)
        video_token += 2  # Visual marks
        return video_token
    
    if __name__ == "__main__":
        video_path = "spring_mountain.mp4"  # Your video path
        video_token = video_token_calculate(video_path)
        print("video_tokens:", video_token)
  • audio_tokens

    • Qwen3-Omni-Flash: Total tokens = Audio duration (in seconds) × 12.5

    • Qwen-Omni-Turbo: Total tokens = Audio duration (in seconds) × 25

    If the audio duration is less than 1 second, it is calculated as 1 second.

Free quota

For more information about how to claim, query, and use your free quota, see Free quota for new users.

Rate limiting

For model rate limiting rules and FAQ, see Rate limits.

Error codes

If a call fails, see Error messages for troubleshooting.

Voice list

To use a voice, set the voice request parameter to the corresponding value in the voice parameter column of the tables below.

qwen3-omni-flash-2025-12-01

Name

voice parameter

Sample

Description

Supported languages

Cherry

Cherry

A cheerful, friendly, and natural young woman's voice.

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Serena

Serena

Gentle young woman

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Serena

Ethan

Standard Mandarin with a slight northern accent. A bright, warm, and energetic voice.

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Chelsie

Chelsie

An anime-style virtual girlfriend voice.

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Momo

Momo

A playful and cute voice to cheer you up.

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Vivian

Vivian

A cool, cute, and slightly grumpy voice.

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Moon

Moon

The dashing and carefree Yuebai

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Maia

Maia

A voice that blends intelligence and gentleness.

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Kai

Kai

A spa for your ears.

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Nofish

Nofish

A designer who does not use retroflex consonants.

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Bella

Bella

A loli who drinks but does not practice Drunken Fist.

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Jennifer

Jennifer

A premium, cinematic American English female voice.

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Ryan

Ryan

A rhythmic, dramatic voice with realism and tension.

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Katerina

Katerina

A mature and rhythmic female voice.

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Aiden

Aiden

An American young man who is a great cook.

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Eldric Sage

Eldric Sage

A calm, wise, and weathered old man's voice.

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Mia

Mia

As gentle as spring water, as quiet as the first snow

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Mochi

Mochi

A smart and precocious child's voice.

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Bellona

Bellona

A powerful and sonorous voice with clear articulation that brings characters to life and stirs passion in the listener.

The clash of swords and the thunder of hooves echo in your mind, as a world of a thousand voices unfolds in perfectly clear and resonant tones.

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Vincent

Vincent

A unique, raspy voice that evokes epic tales of heroism.

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Meng Xiaoji

Bunny

A little loli brimming with moe appeal.

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Neil

Neil

A professional news anchor's voice with a clear, steady tone.

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Elias

Elias

Explains complex topics with academic rigor and clear storytelling.

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Arthur

Arthur

A rustic, weathered voice of an old storyteller.

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Nini

Nini

A voice as soft and sweet as a mochi. Its drawn-out calls of 'older brother' are heart-meltingly sweet.

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Ebona

Ebona

A whispering voice that unlocks your deepest childhood fears.

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Seren

Seren

A gentle and soothing voice to help you fall asleep. Good night and sweet dreams.

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Pip

Pip

Mischievous yet full of childlike innocence. Is this the Shin-chan you remember?

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Stella

Stella

A sweet magical girl voice that is both ditsy and powerful.

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Bodega

Bodega

Enthusiastic Spanish uncle

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Sonisha

Sonrisa

A warm and cheerful Latin American lady

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Alek

Alek

A Russian voice that is both cool and warm.

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Dolce

Dolce

Lazy Italian uncle

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Sohee

Sohee

A gentle, cheerful, and expressive Korean older sister's voice.

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Ono Anna

Ono Anna

A quirky and clever childhood friend's voice.

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Lenn

Lenn

Rational at the core and rebellious in the details-a young German who wears a suit and listens to post-punk.

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Emilien

Emilien

A romantic French gentleman

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Andre

Andre

A magnetic, natural, and calm male voice.

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Radio Gol

Radio Gol

Football Poet Rádio Gol! Today, I will use names to call the football game for you.

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Shanghai-Jada

Jada

A lively woman from Shanghai.

Chinese (Shanghainese), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Beijing-Dylan

Dylan

A teenager who grew up in the hutongs of Beijing.

Chinese (Beijing dialect), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Nanjing-Li

Li

A patient yoga teacher.

Chinese (Nanjing dialect), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Shaanxi-Marcus

Marcus

A sincere and deep voice from Shaanxi.

Chinese (Shaanxi dialect), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Minnan-Roy

Roy

A humorous, straightforward, and lively young man from Taiwan.

Chinese (Min Nan), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Tianjin-Peter

Peter

A voice for the straight man in Tianjin crosstalk.

Chinese (Tianjin dialect), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Sichuan-Sunny

Sunny

A sweet Sichuan girl's voice that will melt your heart.

Chinese (Sichuanese), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Sichuan-Eric

Eric

A man from Chengdu, Sichuan, who has risen above the mundane.

Chinese (Sichuanese), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Cantonese-Rocky

Rocky

A witty and humorous male voice for online chats.

Chinese (Cantonese), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Cantonese-Kiki

Kiki

A sweet best friend from Hong Kong.

Chinese (Cantonese), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

qwen3-omni-flash and qwen3-omni-flash-2025-09-15

Name

voice parameter

Sample

Description

Supported languages

Cherry

Cherry

A cheerful, friendly, and natural young woman's voice.

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Ethan

Ethan

Standard Mandarin with a slight northern accent. A bright, warm, and energetic voice.

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Nofish

Nofish

A designer who does not use retroflex consonants.

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Jennifer

Jennifer

A premium, cinematic American English female voice.

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Ryan

Ryan

A rhythmic, dramatic voice with realism and tension.

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Katerina

Katerina

A mature and rhythmic female voice.

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Elias

Elias

Explains complex topics with academic rigor and clear storytelling.

Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Shanghai-Jada

Jada

A lively woman from Shanghai.

Chinese (Shanghainese), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Beijing-Dylan

Dylan

A teenager who grew up in the hutongs of Beijing.

Chinese (Beijing dialect), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Sichuan-Sunny

Sunny

A sweet Sichuan girl's voice that will melt your heart.

Chinese (Sichuanese), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Nanjing-Li

Li

A patient yoga teacher.

Chinese (Nanjing dialect), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Shaanxi-Marcus

Marcus

A sincere and deep voice from Shaanxi.

Chinese (Shaanxi dialect), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Minnan-Roy

Roy

A humorous, straightforward, and lively young man from Taiwan.

Chinese (Min Nan), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Tianjin-Peter

Peter

A voice for the straight man in Tianjin crosstalk.

Chinese (Tianjin dialect), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Cantonese-Rocky

Rocky

A witty and humorous male voice for online chats.

Chinese (Cantonese), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Cantonese-Kiki

Kiki

A sweet best friend from Hong Kong.

Chinese (Cantonese), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Sichuan - Eric

Eric

An extraordinary man from Chengdu, Sichuan

Chinese (Sichuanese), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

Qwen-Omni-Turbo

Name

voice parameter

Sample

Description

Supported languages

Cherry

Cherry

A cheerful, friendly, and natural young woman's voice.

Chinese, English

Serena

Serena

A gentle young woman's voice.

Chinese, English

Ethan

Ethan

Standard Mandarin with a slight northern accent. A bright, warm, and energetic voice.

Chinese, English

Chelsie

Chelsie

An anime-style virtual girlfriend voice.

Chinese, English

Open-source Qwen-Omni models

Name

voice parameter

Sample

Description

Supported languages

Ethan

Ethan

Standard Mandarin with a slight northern accent. Sounds bright, warm, and energetic.

Chinese, English

Chelsie

Chelsie

Anime-style virtual girlfriend.

Chinese, English