Qwen-Omni accepts a combination of text and a single other modality (image, audio, or video) as input and generates responses in text or speech. The model provides a variety of human-like voices and supports speech output in multiple languages and dialects. You can use it for scenarios such as text creation, visual recognition, and voice assistants.
Getting started
Prerequisites
Obtain your API key and set the API key as an environment variable.
Qwen-Omni can be called only through the OpenAI-compatible interface. Install the latest version of the SDK: the minimum required version is 1.52.0 for the OpenAI Python SDK and 4.68.0 for the Node.js SDK.
Invocation method: Qwen-Omni currently supports only streaming output. Set the stream parameter to True; otherwise, the request returns an error.
The following example sends a text prompt to Qwen-Omni and receives a streaming response that contains text and audio.
import os
import base64
import soundfile as sf
import numpy as np
from openai import OpenAI
# 1. Initialize the client
client = OpenAI(
api_key=os.getenv("DASHSCOPE_API_KEY"), # Make sure the environment variable is configured
base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
# 2. Initiate the request
try:
    completion = client.chat.completions.create(
        model="qwen3-omni-flash",
        messages=[{"role": "user", "content": "Who are you"}],
        modalities=["text", "audio"],  # Specify text and audio output
        audio={"voice": "Cherry", "format": "wav"},
        stream=True,  # Must be set to True
        stream_options={"include_usage": True},
    )
    # 3. Process the streaming response and decode the audio
    print("Model response:")
    audio_base64_string = ""
    for chunk in completion:
        # Process the text part
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="")
        # Collect the audio part
        if chunk.choices and hasattr(chunk.choices[0].delta, "audio") and chunk.choices[0].delta.audio:
            audio_base64_string += chunk.choices[0].delta.audio.get("data", "")
    # 4. Save the audio file
    if audio_base64_string:
        wav_bytes = base64.b64decode(audio_base64_string)
        audio_np = np.frombuffer(wav_bytes, dtype=np.int16)
        sf.write("audio_assistant.wav", audio_np, samplerate=24000)
        print("\nAudio file saved to: audio_assistant.wav")
except Exception as e:
    print(f"Request failed: {e}")
// Preparations before running:
// Universal for Windows/Mac/Linux:
// 1. Make sure Node.js is installed (version >= 14 is recommended)
// 2. Run the following command to install the necessary dependencies:
// npm install openai wav
import OpenAI from "openai";
import { createWriteStream } from 'node:fs';
import { Writer } from 'wav';
// Define an audio conversion function: convert a Base64 string and save it as a standard WAV audio file
async function convertAudio(audioString, audioPath) {
try {
// Decode the Base64 string into a Buffer
const wavBuffer = Buffer.from(audioString, 'base64');
// Create a WAV file write stream
const writer = new Writer({
sampleRate: 24000, // Sample rate
channels: 1, // Single channel
bitDepth: 16 // 16-bit depth
});
// Create an output file stream and establish a pipeline connection
const outputStream = createWriteStream(audioPath);
writer.pipe(outputStream);
// Write PCM data and end writing
writer.write(wavBuffer);
writer.end();
// Use a Promise to wait for the file to be written
await new Promise((resolve, reject) => {
outputStream.on('finish', resolve);
outputStream.on('error', reject);
});
// Add extra wait time to ensure audio integrity
await new Promise(resolve => setTimeout(resolve, 800));
console.log(`\nAudio file successfully saved as ${audioPath}`);
} catch (error) {
console.error('An error occurred during processing:', error);
}
}
// 1. Initialize the client
const openai = new OpenAI(
{
// API keys for the Singapore and Beijing regions are different. For more information, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
apiKey: process.env.DASHSCOPE_API_KEY,
// The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1
baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
}
);
// 2. Initiate the request
const completion = await openai.chat.completions.create({
model: "qwen3-omni-flash",
messages: [
{
"role": "user",
"content": "Who are you?"
}],
stream: true,
stream_options: {
include_usage: true
},
modalities: ["text", "audio"],
audio: { voice: "Cherry", format: "wav" }
});
let audioString = "";
console.log("Model response:")
// 3. Process the streaming response and decode the audio
for await (const chunk of completion) {
if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
// Process the text content
if (chunk.choices[0].delta.content) {
process.stdout.write(chunk.choices[0].delta.content);
}
// Process the audio content
if (chunk.choices[0].delta.audio) {
if (chunk.choices[0].delta.audio["data"]) {
audioString += chunk.choices[0].delta.audio["data"];
}
}
}
}
// 4. Save the audio file
convertAudio(audioString, "audio_assistant.wav");
# ======= Important Note =======
# API keys for the Singapore and Beijing regions are different. For more information, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
# === Delete this comment before execution ===
curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen3-omni-flash",
"messages": [
{
"role": "user",
"content": "Who are you?"
}
],
"stream":true,
"stream_options":{
"include_usage":true
},
"modalities":["text","audio"],
"audio":{"voice":"Cherry","format":"wav"}
}'
Model list
Compared to Qwen-VL, Qwen-Omni can:
Understand visual and audio information in video files.
Understand data in multiple modalities.
Output audio.
It also performs well in visual and audio understanding.
Use Qwen3-Omni-Flash for the best performance. Compared to Qwen-Omni-Turbo (which is no longer updated), Qwen3-Omni-Flash offers significant improvements:
Supports both thinking and non-thinking modes. You can switch between modes using the enable_thinking parameter. By default, thinking mode is disabled.
For audio output in non-thinking mode:
The number of supported voices has increased to 17. Qwen-Omni-Turbo supports only 4.
The number of supported languages has increased to 10. Qwen-Omni-Turbo supports only 2.
International (Singapore)
Commercial models
Compared to open source versions, commercial models offer the latest features and improvements.
| Model | Version | Mode | Context window (tokens) | Max input (tokens) | Max chain-of-thought (tokens) | Max output (tokens) | Free quota |
| qwen3-omni-flash (currently has the same capabilities as qwen3-omni-flash-2025-09-15) | Stable | Thinking mode | 65,536 | 16,384 | 32,768 | 16,384 | 1 million tokens each (modality-agnostic), valid for 90 days after you activate Model Studio |
| | | Non-thinking mode | 49,152 | | - | | |
| qwen3-omni-flash-2025-09-15 (also known as qwen3-omni-flash-0915) | Snapshot | Thinking mode | 65,536 | 16,384 | 32,768 | 16,384 | |
| | | Non-thinking mode | 49,152 | | - | | |
Open source models
| Model | Context window (tokens) | Max input (tokens) | Max output (tokens) | Free quota |
| qwen2.5-omni-7b | 32,768 | 30,720 | 2,048 | 1 million tokens (regardless of modality), valid for 90 days after you activate Alibaba Cloud Model Studio |
Mainland China (Beijing)
Commercial models
| Model | Version | Mode | Context window (tokens) | Max input (tokens) | Max chain-of-thought (tokens) | Max output (tokens) | Free quota |
| qwen3-omni-flash (currently has the same capabilities as qwen3-omni-flash-2025-09-15) | Stable | Thinking mode | 65,536 | 16,384 | 32,768 | 16,384 | No free quota |
| | | Non-thinking mode | 49,152 | | - | | |
| qwen3-omni-flash-2025-09-15 (also known as qwen3-omni-flash-0915) | Snapshot | Thinking mode | 65,536 | 16,384 | 32,768 | 16,384 | |
| | | Non-thinking mode | 49,152 | | - | | |
Open source models
| Model | Context window (tokens) | Max input (tokens) | Max output (tokens) | Free quota |
| qwen2.5-omni-7b | 32,768 | 30,720 | 2,048 | No free quota |
Usage notes
Input
Supported input modalities:
The content array of a user message can contain text and only one other modality, such as an image, audio, or video.
Methods for providing multimodal input:
An Internet URL
Base64 encoding. For more information, see Input Base64-encoded local files.
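As a rough illustration of the Base64 route, the sketch below turns a local image into a data URL that can stand in for an Internet URL inside an image_url item. The helper names `image_to_data_url` and `image_content_item` are hypothetical, and the `data:<mime>;base64,` layout is an assumption based on the usual OpenAI-compatible convention; the section referenced above remains the authority on the exact format, especially for audio and video.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(path: str) -> str:
    """Return a Base64 data URL for a local image (hypothetical helper)."""
    mime, _ = mimetypes.guess_type(path)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"Cannot infer an image MIME type from {path!r}")
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    # Assumed layout: data:<mime>;base64,<payload>
    return f"data:{mime};base64,{encoded}"


def image_content_item(path: str) -> dict:
    """Build an image_url item shaped like the URL-based examples on this page."""
    return {"type": "image_url", "image_url": {"url": image_to_data_url(path)}}
```

The returned item can be placed in the content array next to a text item, in the same position where the URL-based examples place their image_url entry.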
Output
Supported output modalities: The audio output is Base64-encoded data. For more information about how to convert it to an audio file, see Parse Base64-encoded audio data output.
| Output modality | modalities parameter value | Response style |
| Text | ["text"] (default) | More formal and written in style. |
| Text and audio | ["text","audio"] (Qwen3-Omni-Flash does not support audio output in thinking mode.) | More conversational. The response includes filler words and encourages further interaction. |
Qwen-Omni-Turbo does not support setting a System Message when the output modality includes audio.
Supported audio output languages:
Qwen-Omni-Turbo: Supports only Chinese (Mandarin) and English.
Qwen3-Omni-Flash (non-thinking mode): Supports Chinese (Mandarin and some dialects), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, and Korean.
Supported audio voices: You can configure the voice and file format of the audio output using the audio parameter. For example: `audio={"voice": "Cherry", "format": "wav"}`.
File format (format): Can be set only to "wav".
Audio voice (voice): For a list of voices that each model supports, see Voice list.
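To make the audio output handling concrete, here is a minimal sketch that joins the Base64 fragments collected from a stream and writes them to a WAV file. It swaps the soundfile and NumPy calls of the getting-started example for the standard-library `wave` module, and it assumes the same audio layout that example uses (24 kHz, mono, 16-bit samples). The function name `save_streamed_audio` is hypothetical.

```python
import base64
import wave


def save_streamed_audio(b64_chunks, path, sample_rate=24000):
    """Join Base64 fragments, decode them, and write a mono 16-bit WAV file."""
    # The fragments are concatenated before decoding, as in the quickstart.
    pcm = base64.b64decode("".join(b64_chunks))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono (assumption, see above)
        wav_file.setsampwidth(2)   # 2 bytes per sample = 16-bit
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)
    return len(pcm) // 2  # number of 16-bit samples written
```

In a streaming loop, each `delta.audio["data"]` value would be appended to a list, and the list would be passed to this function once the stream ends.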
Limitations
Streaming output is mandatory: All requests to the Qwen-Omni model must set stream=True.
Only the Qwen3-Omni-Flash model is a hybrid thinking model. For information about how to call it, see Enable or disable thinking mode. In thinking mode, audio output is not supported.
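These limitations can be checked on the client before a request is sent. The sketch below is only an illustration: `check_omni_request` is a hypothetical helper, and it assumes the request is held as a dict with the same keys the Python examples on this page pass to `client.chat.completions.create` (including `extra_body` for `enable_thinking`).

```python
def check_omni_request(params: dict) -> list:
    """Return a list of problems found in a request dict (hypothetical helper)."""
    problems = []
    if params.get("stream") is not True:
        problems.append("stream must be True")

    model = params.get("model", "")
    modalities = params.get("modalities", ["text"])
    thinking = params.get("extra_body", {}).get("enable_thinking", False)

    if thinking:
        if not model.startswith("qwen3-omni-flash"):
            problems.append("only Qwen3-Omni-Flash supports thinking mode")
        if "audio" in modalities:
            problems.append("audio output is not supported in thinking mode")
        if "audio" in params:
            problems.append("the audio parameter is not supported in thinking mode")
    return problems
```

An empty list means none of the rules listed above were violated; it does not guarantee that the service accepts the request.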
Enable or disable thinking mode
The Qwen3-Omni-Flash model is a hybrid thinking model. You can use the enable_thinking parameter to enable or disable thinking mode:
true: enables thinking mode
false (default): disables thinking mode
Qwen-Omni-Turbo is not a thinking model.
OpenAI compatible
import os
from openai import OpenAI
client = OpenAI(
# API keys for the Singapore and Beijing regions are different. For more information, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
api_key=os.getenv("DASHSCOPE_API_KEY"),
# The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1
base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
completion = client.chat.completions.create(
model="qwen3-omni-flash",
messages=[{"role": "user", "content": "Who are you"}],
# Enable or disable thinking mode. Audio output is not supported in thinking mode. qwen-omni-turbo does not support setting enable_thinking.
extra_body={'enable_thinking': True},
# Set the output data modality. Two are currently supported in non-thinking mode: ["text","audio"] and ["text"]. Only ["text"] is supported in thinking mode.
modalities=["text"],
# Set the voice. The audio parameter is not supported in thinking mode.
# audio={"voice": "Cherry", "format": "wav"},
# stream must be set to True to avoid errors.
stream=True,
stream_options={"include_usage": True},
)
for chunk in completion:
    if chunk.choices:
        print(chunk.choices[0].delta)
    else:
        print(chunk.usage)
import OpenAI from "openai";
const openai = new OpenAI(
{
// API keys for the Singapore and Beijing regions are different. For more information, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
apiKey: process.env.DASHSCOPE_API_KEY,
// The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1
baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
}
);
const completion = await openai.chat.completions.create({
model: "qwen3-omni-flash",
messages: [
{ role: "user", content: "Who are you?" }
],
// stream must be set to True to avoid errors.
stream: true,
stream_options: {
include_usage: true
},
// Enable or disable thinking mode. Audio output is not supported in thinking mode. qwen-omni-turbo does not support setting enable_thinking.
enable_thinking: true,
// Set the output data modality. Two are currently supported in non-thinking mode: ["text","audio"] and ["text"]. Only ["text"] is supported in thinking mode.
modalities: ["text"],
// Set the voice. The audio parameter is not supported in thinking mode.
//audio: { voice: "Cherry", format: "wav" }
});
for await (const chunk of completion) {
if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
console.log(chunk.choices[0].delta);
} else {
console.log(chunk.usage);
}
}
# ======= Important Note =======
# API keys for the Singapore and Beijing regions are different. For more information, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
# === Delete this comment before execution ===
curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen3-omni-flash",
"messages": [
{
"role": "user",
"content": "Who are you?"
}
],
"stream":true,
"stream_options":{
"include_usage":true
},
"modalities":["text"],
"enable_thinking": true
}'
Image and text input
Qwen-Omni supports multiple image inputs. The requirements for input images are as follows:
The size of a single image file cannot exceed 10 MB.
The number of images is limited by the model's maximum input token limit. The total number of tokens for all images and text must not exceed this limit.
The width and height of the image must both be greater than 10 pixels. The aspect ratio must not exceed 200:1 or 1:200.
For a list of supported image types, see Supported images.
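The size and dimension rules above can be checked locally before an image is uploaded. The following is a minimal sketch, not an official validator: `check_image` is a hypothetical helper, the caller is assumed to already know the file size and pixel dimensions, and 10 MB is interpreted as 10 × 1024 × 1024 bytes, which is an assumption.

```python
MAX_IMAGE_BYTES = 10 * 1024 * 1024  # "10 MB" read as MiB (assumption)
MIN_SIDE_PX = 10                    # both sides must be greater than this
MAX_ASPECT_RATIO = 200              # 200:1 or 1:200


def check_image(size_bytes: int, width: int, height: int) -> list:
    """Return the list of requirements the image violates (empty if none)."""
    problems = []
    if size_bytes > MAX_IMAGE_BYTES:
        problems.append("file is larger than 10 MB")
    if width <= MIN_SIDE_PX or height <= MIN_SIDE_PX:
        problems.append("width and height must both exceed 10 pixels")
    elif max(width, height) / min(width, height) > MAX_ASPECT_RATIO:
        problems.append("aspect ratio exceeds 200:1")
    return problems
```

The token limit is not covered here, because the number of tokens an image consumes is not specified in this section.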
The following sample code uses an image URL from the Internet as an example. For information about how to input a local image, see Input Base64-encoded local files. Note that only calls that use streaming output are currently supported.
OpenAI compatible
import os
from openai import OpenAI
client = OpenAI(
# API keys for the Singapore and Beijing regions are different. For more information, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
api_key=os.getenv("DASHSCOPE_API_KEY"),
# The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1
base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
completion = client.chat.completions.create(
model="qwen3-omni-flash", # When the model is Qwen3-Omni-Flash, run in non-thinking mode.
messages=[
{
"role": "user",
"content": [
{
"type": "image_url",
"image_url": {
"url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"
},
},
{"type": "text", "text": "What scene is depicted in the image?"},
],
},
],
# Set the output data modality. Two are currently supported: ["text","audio"] and ["text"].
modalities=["text", "audio"],
audio={"voice": "Cherry", "format": "wav"},
# stream must be set to True to avoid errors.
stream=True,
stream_options={
"include_usage": True
}
)
for chunk in completion:
    if chunk.choices:
        print(chunk.choices[0].delta)
    else:
        print(chunk.usage)
import OpenAI from "openai";
const openai = new OpenAI(
{
// API keys for the Singapore and Beijing regions are different. For more information, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
apiKey: process.env.DASHSCOPE_API_KEY,
// The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1
baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
}
);
const completion = await openai.chat.completions.create({
model: "qwen3-omni-flash", // When the model is Qwen3-Omni-Flash, run in non-thinking mode.
messages: [
{
"role": "user",
"content": [{
"type": "image_url",
"image_url": { "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg" },
},
{ "type": "text", "text": "What scene is depicted in the image?" }]
}],
stream: true,
stream_options: {
include_usage: true
},
modalities: ["text", "audio"],
audio: { voice: "Cherry", format: "wav" }
});
for await (const chunk of completion) {
if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
console.log(chunk.choices[0].delta);
} else {
console.log(chunk.usage);
}
}
# ======= Important Note =======
# API keys for the Singapore and Beijing regions are different. For more information, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
# === Delete this comment before execution ===
curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen2.5-omni-7b",
"messages": [
{
"role": "user",
"content": [
{
"type": "image_url",
"image_url": {
"url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"
}
},
{
"type": "text",
"text": "What scene is depicted in the image?"
}
]
}
],
"stream":true,
"stream_options":{
"include_usage":true
},
"modalities":["text","audio"],
"audio":{"voice":"Cherry","format":"wav"}
}'
Audio and text input
You can input only one audio file.
File size
Qwen3-Omni-Flash: Cannot exceed 100 MB, with a maximum duration of 20 minutes.
Qwen-Omni-Turbo: Cannot exceed 10 MB, with a maximum duration of 3 minutes.
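A small lookup table is one way to apply these per-model limits in client code. In the sketch below, the names `AUDIO_LIMITS` and `audio_within_limits` are hypothetical, and matching a model ID by its family prefix (so that snapshot names such as qwen3-omni-flash-2025-09-15 resolve to the Qwen3-Omni-Flash limits) is an assumption rather than documented behavior.

```python
# family prefix -> (max size in MB, max duration in seconds), from the list above
AUDIO_LIMITS = {
    "qwen3-omni-flash": (100, 20 * 60),
    "qwen-omni-turbo": (10, 3 * 60),
}


def audio_within_limits(model: str, size_mb: float, duration_s: float) -> bool:
    """Check an audio file against the limits of the model family it targets."""
    for family, (max_mb, max_s) in AUDIO_LIMITS.items():
        if model.startswith(family):
            return size_mb <= max_mb and duration_s <= max_s
    raise ValueError(f"No audio limits listed for model {model!r}")
```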
The following sample code uses an audio URL from the Internet as an example. For information about how to input a local audio file, see Input Base64-encoded local files. Note that only calls that use streaming output are currently supported.
OpenAI compatible
import os
from openai import OpenAI
client = OpenAI(
# API keys for the Singapore and Beijing regions are different. For more information, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
api_key=os.getenv("DASHSCOPE_API_KEY"),
# The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1
base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
completion = client.chat.completions.create(
model="qwen3-omni-flash",# When the model is Qwen3-Omni-Flash, run in non-thinking mode.
messages=[
{
"role": "user",
"content": [
{
"type": "input_audio",
"input_audio": {
"data": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250211/tixcef/cherry.wav",
"format": "wav",
},
},
{"type": "text", "text": "What is this audio about"},
],
},
],
# Set the output data modality. Two are currently supported: ["text","audio"] and ["text"].
modalities=["text", "audio"],
audio={"voice": "Cherry", "format": "wav"},
# stream must be set to True to avoid errors.
stream=True,
stream_options={"include_usage": True},
)
for chunk in completion:
    if chunk.choices:
        print(chunk.choices[0].delta)
    else:
        print(chunk.usage)
import OpenAI from "openai";
const openai = new OpenAI(
{
// API keys for the Singapore and Beijing regions are different. For more information, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
apiKey: process.env.DASHSCOPE_API_KEY,
// The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1
baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
}
);
const completion = await openai.chat.completions.create({
model: "qwen3-omni-flash", // When the model is Qwen3-Omni-Flash, run in non-thinking mode.
messages: [
{
"role": "user",
"content": [{
"type": "input_audio",
"input_audio": { "data": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250211/tixcef/cherry.wav", "format": "wav" },
},
{ "type": "text", "text": "What is this audio about" }]
}],
stream: true,
stream_options: {
include_usage: true
},
modalities: ["text", "audio"],
audio: { voice: "Cherry", format: "wav" }
});
for await (const chunk of completion) {
if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
console.log(chunk.choices[0].delta);
} else {
console.log(chunk.usage);
}
}
# ======= Important Note =======
# API keys for the Singapore and Beijing regions are different. For more information, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
# === Delete this comment before execution ===
curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen2.5-omni-7b",
"messages": [
{
"role": "user",
"content": [
{
"type": "input_audio",
"input_audio": {
"data": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250211/tixcef/cherry.wav",
"format": "wav"
}
},
{
"type": "text",
"text": "What is this audio about"
}
]
}
],
"stream":true,
"stream_options":{
"include_usage":true
},
"modalities":["text","audio"],
"audio":{"voice":"Cherry","format":"wav"}
}'
Video and text input
You can input a video as an image list or as a video file. If you input a video file, the model can also understand the audio in the video.
The following sample code uses a video URL from the Internet as an example. For information about how to input a local video, see Input Base64-encoded local files. Note that only calls that use streaming output are currently supported.
Image list format
Number of images
Qwen3-Omni-Flash: A minimum of 2 images and a maximum of 128 images.
Qwen-Omni-Turbo: A minimum of 4 images and a maximum of 80 images.
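When a video has been split into more frames than a model accepts, the list has to be thinned out before it is sent. The sketch below shows one possible approach, evenly spaced subsampling that always keeps the first and last frame. `FRAME_LIMITS` and `select_frames` are hypothetical names, and the sampling strategy is an illustration, not something this page prescribes.

```python
# family prefix -> (minimum images, maximum images), from the list above
FRAME_LIMITS = {
    "qwen3-omni-flash": (2, 128),
    "qwen-omni-turbo": (4, 80),
}


def select_frames(frames: list, model: str) -> list:
    """Return a frame list whose length fits the model's allowed range."""
    for family, (min_n, max_n) in FRAME_LIMITS.items():
        if model.startswith(family):
            break
    else:
        raise ValueError(f"No frame limits listed for model {model!r}")

    n = len(frames)
    if n < min_n:
        raise ValueError(f"{model} needs at least {min_n} images, got {n}")
    if n <= max_n:
        return list(frames)
    # Integer arithmetic spreads max_n indices evenly from 0 to n - 1.
    return [frames[i * (n - 1) // (max_n - 1)] for i in range(max_n)]
```

The result can be used directly as the value of the video field in an image-list request.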
OpenAI compatible
import os
from openai import OpenAI
client = OpenAI(
# API keys for the Singapore and Beijing regions are different. For more information, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
api_key=os.getenv("DASHSCOPE_API_KEY"),
# The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1
base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
completion = client.chat.completions.create(
model="qwen3-omni-flash", # When the model is Qwen3-Omni-Flash, run in non-thinking mode.
messages=[
{
"role": "user",
"content": [
{
"type": "video",
"video": [
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/xzsgiz/football1.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/tdescd/football2.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/zefdja/football3.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/aedbqh/football4.jpg",
],
},
{"type": "text", "text": "Describe the process shown in this video"},
],
}
],
# Set the output data modality. Two are currently supported: ["text","audio"] and ["text"].
modalities=["text", "audio"],
audio={"voice": "Cherry", "format": "wav"},
# stream must be set to True to avoid errors.
stream=True,
stream_options={"include_usage": True},
)
for chunk in completion:
    if chunk.choices:
        print(chunk.choices[0].delta)
    else:
        print(chunk.usage)
import OpenAI from "openai";
const openai = new OpenAI(
{
// API keys for the Singapore and Beijing regions are different. For more information, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
apiKey: process.env.DASHSCOPE_API_KEY,
// The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1
baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
}
);
const completion = await openai.chat.completions.create({
model: "qwen3-omni-flash", //When the model is Qwen3-Omni-Flash, run in non-thinking mode.
messages: [{
role: "user",
content: [
{
type: "video",
video: [
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/xzsgiz/football1.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/tdescd/football2.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/zefdja/football3.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/aedbqh/football4.jpg"
]
},
{
type: "text",
text: "Describe the process shown in this video"
}
]
}],
stream: true,
stream_options: {
include_usage: true
},
modalities: ["text", "audio"],
audio: { voice: "Cherry", format: "wav" }
});
for await (const chunk of completion) {
if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
console.log(chunk.choices[0].delta);
} else {
console.log(chunk.usage);
}
}
# ======= Important Note =======
# API keys for the Singapore and Beijing regions are different. For more information, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
# === Delete this comment before execution ===
curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen2.5-omni-7b",
"messages": [
{
"role": "user",
"content": [
{
"type": "video",
"video": [
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/xzsgiz/football1.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/tdescd/football2.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/zefdja/football3.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/aedbqh/football4.jpg"
]
},
{
"type": "text",
"text": "Describe the process shown in this video"
}
]
}
],
"stream": true,
"stream_options": {
"include_usage": true
},
"modalities": ["text", "audio"],
"audio": {
"voice": "Cherry",
"format": "wav"
}
}'
Video file format (can understand audio in the video)
You can input only one video file.
File size:
Qwen3-Omni-Flash: Limited to 256 MB, with a duration limit of 150 s.
Qwen-Omni-Turbo: Limited to 150 MB, with a duration limit of 40 s.
The visual and audio information in the video file are billed separately.
OpenAI compatible
import os
from openai import OpenAI
client = OpenAI(
# API keys for the Singapore and Beijing regions are different. For more information, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
api_key=os.getenv("DASHSCOPE_API_KEY"),
# The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1
base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
completion = client.chat.completions.create(
model="qwen3-omni-flash", # When the model is Qwen3-Omni-Flash, run in non-thinking mode.
messages=[
{
"role": "user",
"content": [
{
"type": "video_url",
"video_url": {
"url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241115/cqqkru/1.mp4"
},
},
{"type": "text", "text": "What is the content of the video?"},
],
},
],
# Set the output data modality. Two are currently supported: ["text","audio"] and ["text"].
modalities=["text", "audio"],
audio={"voice": "Cherry", "format": "wav"},
# stream must be set to True to avoid errors.
stream=True,
stream_options={"include_usage": True},
)
for chunk in completion:
    if chunk.choices:
        print(chunk.choices[0].delta)
    else:
        print(chunk.usage)
import OpenAI from "openai";
const openai = new OpenAI(
{
// API keys for the Singapore and Beijing regions are different. For more information, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
apiKey: process.env.DASHSCOPE_API_KEY,
// The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1
baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
}
);
const completion = await openai.chat.completions.create({
model: "qwen3-omni-flash", // When the model is Qwen3-Omni-Flash, run in non-thinking mode.
messages: [
{
"role": "user",
"content": [{
"type": "video_url",
"video_url": { "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241115/cqqkru/1.mp4" },
},
{ "type": "text", "text": "What is the content of the video?" }]
}],
stream: true,
stream_options: {
include_usage: true
},
modalities: ["text", "audio"],
audio: { voice: "Cherry", format: "wav" }
});
for await (const chunk of completion) {
if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
console.log(chunk.choices[0].delta);
} else {
console.log(chunk.usage);
}
}
# ======= Important Note =======
# API keys for the Singapore and Beijing regions are different. For more information, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
# === Delete this comment before execution ===
curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen2.5-omni-7b",
"messages": [
{
"role": "user",
"content": [
{
"type": "video_url",
"video_url": {
"url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241115/cqqkru/1.mp4"
}
},
{
"type": "text",
"text": "What is the content of the video"
}
]
}
],
"stream":true,
"stream_options": {
"include_usage": true
},
"modalities":["text","audio"],
"audio":{"voice":"Cherry","format":"wav"}
}'
Multi-turn conversation
When you use the multi-turn conversation feature of Qwen-Omni, note the following:
Assistant Message
Assistant messages in the messages array support only text data.
User Message
A user message can contain text and data from only one other modality. In a multi-turn conversation, you can use different modalities in separate user messages.
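The two rules above can be enforced while the conversation history is being built. The following sketch is illustrative only: `modalities_in` and `append_turn` are hypothetical helpers, and the mapping from content item types to modalities is inferred from the item types used in the examples on this page.

```python
# content item type -> modality, inferred from the examples on this page
TYPE_TO_MODALITY = {
    "image_url": "image",
    "input_audio": "audio",
    "video": "video",
    "video_url": "video",
}


def modalities_in(content) -> set:
    """Return the non-text modalities present in a message's content."""
    if isinstance(content, str):
        return set()
    return {TYPE_TO_MODALITY[c["type"]] for c in content if c["type"] in TYPE_TO_MODALITY}


def append_turn(messages: list, assistant_text: str, next_user_content: list) -> list:
    """Return a new history with a text-only assistant reply and the next user turn."""
    if len(modalities_in(next_user_content)) > 1:
        raise ValueError("a user message may carry only one non-text modality")
    assistant_msg = {
        "role": "assistant",
        "content": [{"type": "text", "text": assistant_text}],  # text only
    }
    user_msg = {"role": "user", "content": next_user_content}
    return messages + [assistant_msg, user_msg]
```

Here `assistant_text` would be the text accumulated from the previous streamed response; any audio the model returned is not written back into the history.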
OpenAI compatible
import os
from openai import OpenAI
client = OpenAI(
# API keys for the Singapore and Beijing regions are different. For more information, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
api_key=os.getenv("DASHSCOPE_API_KEY"),
# The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1
base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
completion = client.chat.completions.create(
model="qwen3-omni-flash", # When the model is Qwen3-Omni-Flash, run in non-thinking mode.
messages=[
{
"role": "user",
"content": [
{
"type": "input_audio",
"input_audio": {
"data": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3",
"format": "mp3",
},
},
{"type": "text", "text": "What is this audio about"},
],
},
{
"role": "assistant",
"content": [{"type": "text", "text": "This audio says: Welcome to Alibaba Cloud"}],
},
{
"role": "user",
"content": [{"type": "text", "text": "Can you tell me about this company?"}],
},
],
# Set the output data modality. Two are currently supported: ["text","audio"] and ["text"].
modalities=["text"],
# stream must be set to True to avoid errors.
stream=True,
stream_options={"include_usage": True},
)
for chunk in completion:
    if chunk.choices:
        print(chunk.choices[0].delta)
    else:
        print(chunk.usage)
import OpenAI from "openai";
const openai = new OpenAI(
{
// API keys for the Singapore and Beijing regions are different. For more information, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
apiKey: process.env.DASHSCOPE_API_KEY,
// The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1
baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
}
);
const completion = await openai.chat.completions.create({
model: "qwen3-omni-flash", // When the model is Qwen3-Omni-Flash, run in non-thinking mode.
messages: [
{
"role": "user",
"content": [
{
"type": "input_audio",
"input_audio": {
"data": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3",
"format": "mp3",
},
},
{ "type": "text", "text": "What is this audio about" },
],
},
{
"role": "assistant",
"content": [{ "type": "text", "text": "This audio says: Welcome to Alibaba Cloud" }],
},
{
"role": "user",
"content": [{ "type": "text", "text": "Can you tell me about this company?" }]
}],
stream: true,
stream_options: {
include_usage: true
},
modalities: ["text"]
});
for await (const chunk of completion) {
if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
console.log(chunk.choices[0].delta);
} else {
console.log(chunk.usage);
}
}
# ======= Important Note =======
# API keys for the Singapore and Beijing regions are different. For more information, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
# === Delete this comment before execution ===
curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen2.5-omni-7b",
"messages": [
{
"role": "user",
"content": [
{
"type": "input_audio",
"input_audio": {
"data": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
}
},
{
"type": "text",
"text": "What is this audio about"
}
]
},
{
"role": "assistant",
"content": [
{
"type": "text",
"text": "This audio says: Welcome to Alibaba Cloud"
}
]
},
{
"role": "user",
"content": [
{
"type": "text",
"text": "Can you tell me about this company?"
}
]
}
],
"stream": true,
"stream_options": {
"include_usage": true
},
"modalities": ["text"]
}'
Parse Base64-encoded audio data output
The audio output from Qwen-Omni is Base64-encoded data delivered in a stream. To reconstruct the audio file, you can use a string variable to accumulate the Base64 data from each fragment as you receive it. After the stream is complete, you can decode the final string to create the audio file. Alternatively, you can decode and play each fragment in real time as you receive it.
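As a dependency-free illustration of the accumulate-then-decode approach, the sketch below writes the fragments to a WAV file with the Python standard library. The helper name `save_pcm_fragments` is hypothetical, and the sketch assumes that each fragment decodes to raw 16-bit mono PCM at 24,000 Hz, which are the parameters the examples in this topic use. It decodes each fragment separately and joins the bytes, so it does not depend on how Base64 padding falls across fragment boundaries.

```python
import base64
import wave


def save_pcm_fragments(fragments, path, sample_rate=24000):
    """Decode Base64 audio fragments and write them as a mono 16-bit WAV file."""
    # Decode each fragment on its own, then join the raw PCM bytes in order.
    pcm = b"".join(base64.b64decode(fragment) for fragment in fragments if fragment)
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)  # mono (assumed)
        wav_file.setsampwidth(2)  # 16-bit samples (assumed)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)
    return len(pcm)
```

To use it, collect the `data` value of every audio delta into a list while you iterate over the stream, then call `save_pcm_fragments(fragments, "audio_assistant_py.wav")` after the stream ends.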
# Installation instructions for pyaudio:
# APPLE Mac OS X
# brew install portaudio
# pip install pyaudio
# Debian/Ubuntu
# sudo apt-get install python-pyaudio python3-pyaudio
# or
# pip install pyaudio
# CentOS
# sudo yum install -y portaudio portaudio-devel && pip install pyaudio
# Microsoft Windows
# python -m pip install pyaudio
import os
from openai import OpenAI
import base64
import numpy as np
import soundfile as sf
client = OpenAI(
# API keys for the Singapore and Beijing regions are different. For more information, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
api_key=os.getenv("DASHSCOPE_API_KEY"),
# The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1
base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
completion = client.chat.completions.create(
model="qwen3-omni-flash", # When the model is Qwen3-Omni-Flash, run in non-thinking mode.
messages=[{"role": "user", "content": "Who are you"}],
# Set the output data modality. Two are currently supported: ["text","audio"] and ["text"].
modalities=["text", "audio"],
audio={"voice": "Cherry", "format": "wav"},
# stream must be set to True to avoid errors.
stream=True,
stream_options={"include_usage": True},
)
# Method 1: Decode after the generation is complete
audio_string = ""
for chunk in completion:
    if chunk.choices:
        # An audio delta carries either a Base64 "data" fragment or a "transcript" fragment
        audio = getattr(chunk.choices[0].delta, "audio", None)
        if audio:
            if audio.get("data"):
                audio_string += audio["data"]
            elif audio.get("transcript"):
                print(audio["transcript"])
    else:
        print(chunk.usage)
wav_bytes = base64.b64decode(audio_string)
audio_np = np.frombuffer(wav_bytes, dtype=np.int16)
sf.write("audio_assistant_py.wav", audio_np, samplerate=24000)
# Method 2: Decode while generating (comment out the code for Method 1 to use Method 2)
# # Initialize PyAudio
# import pyaudio
# import time
# p = pyaudio.PyAudio()
# # Create an audio stream
# stream = p.open(format=pyaudio.paInt16,
#                 channels=1,
#                 rate=24000,
#                 output=True)
# for chunk in completion:
#     if chunk.choices:
#         if hasattr(chunk.choices[0].delta, "audio"):
#             try:
#                 audio_string = chunk.choices[0].delta.audio["data"]
#                 wav_bytes = base64.b64decode(audio_string)
#                 audio_np = np.frombuffer(wav_bytes, dtype=np.int16)
#                 # Play the audio data directly
#                 stream.write(audio_np.tobytes())
#             except Exception as e:
#                 print(chunk.choices[0].delta.audio["transcript"])
# time.sleep(0.8)
# # Clean up resources
# stream.stop_stream()
# stream.close()
# p.terminate()
// Preparations before running:
// Universal for Windows/Mac/Linux:
// 1. Make sure Node.js is installed (version >= 14 is recommended)
// 2. Run the following command to install the necessary dependencies:
// npm install openai wav
//
// To use the real-time playback feature (Method 2), you also need:
// Windows:
// npm install speaker
// Mac:
// brew install portaudio
// npm install speaker
// Linux (Ubuntu/Debian):
// sudo apt-get install libasound2-dev
// npm install speaker
import OpenAI from "openai";
const openai = new OpenAI(
{
// API keys for the Singapore and Beijing regions are different. For more information, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
apiKey: process.env.DASHSCOPE_API_KEY,
// The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1
baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
}
);
const completion = await openai.chat.completions.create({
model: "qwen3-omni-flash", // When the model is Qwen3-Omni-Flash, run in non-thinking mode.
messages: [
{
"role": "user",
"content": "Who are you?"
}],
stream: true,
stream_options: {
include_usage: true
},
modalities: ["text", "audio"],
audio: { voice: "Cherry", format: "wav" }
});
// Method 1: Decode after the generation is complete
// Requires installation: npm install wav
import { createWriteStream } from 'node:fs'; // node:fs is a built-in Node.js module, no installation required
import { Writer } from 'wav';
async function convertAudio(audioString, audioPath) {
try {
// Decode the Base64 string into a Buffer
const wavBuffer = Buffer.from(audioString, 'base64');
// Create a WAV file write stream
const writer = new Writer({
sampleRate: 24000, // Sample rate
channels: 1, // Single channel
bitDepth: 16 // 16-bit depth
});
// Create an output file stream and establish a pipeline connection
const outputStream = createWriteStream(audioPath);
writer.pipe(outputStream);
// Write PCM data and end writing
writer.write(wavBuffer);
writer.end();
// Use a Promise to wait for the file to be written
await new Promise((resolve, reject) => {
outputStream.on('finish', resolve);
outputStream.on('error', reject);
});
// Add extra wait time to ensure audio integrity
await new Promise(resolve => setTimeout(resolve, 800));
console.log(`Audio file successfully saved as ${audioPath}`);
} catch (error) {
console.error('An error occurred during processing:', error);
}
}
let audioString = "";
for await (const chunk of completion) {
if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
if (chunk.choices[0].delta.audio) {
if (chunk.choices[0].delta.audio["data"]) {
audioString += chunk.choices[0].delta.audio["data"];
}
}
} else {
console.log(chunk.usage);
}
}
// Execute the conversion
convertAudio(audioString, "audio_assistant_mjs.wav");
// Method 2: Generate and play in real time
// You must first install the necessary components according to the instructions for your system above.
// import Speaker from 'speaker'; // Import the audio playback library
// // Create a speaker instance (configuration matches WAV file parameters)
// const speaker = new Speaker({
// sampleRate: 24000, // Sample rate
// channels: 1, // Number of sound channels
// bitDepth: 16, // Bit depth
// signed: true // Signed PCM
// });
// for await (const chunk of completion) {
// if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
// if (chunk.choices[0].delta.audio) {
// if (chunk.choices[0].delta.audio["data"]) {
// const pcmBuffer = Buffer.from(chunk.choices[0].delta.audio.data, 'base64');
// // Write directly to the speaker for playback
// speaker.write(pcmBuffer);
// }
// }
// } else {
// console.log(chunk.usage);
// }
// }
// speaker.on('finish', () => console.log('Playback complete'));
// speaker.end(); // Call based on the actual end of the API stream
Input Base64-encoded local files
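The examples in this section all follow the same pattern: read the local file, Base64-encode it, and wrap the result in a data URL. The sketch below factors that pattern into one hypothetical helper, `to_data_url`. It mirrors the examples in this section by adding a MIME type only for images and using the bare `data:;base64,` prefix for audio and video. Whether other prefixes are accepted is not covered here, so treat that choice as an assumption.

```python
import base64
import mimetypes


def to_data_url(path):
    """Read a local file and return its content as a Base64 data URL."""
    mime_type, _ = mimetypes.guess_type(path)
    # Keep the MIME type for images only, matching the examples in this section.
    prefix = mime_type if mime_type and mime_type.startswith("image/") else ""
    with open(path, "rb") as handle:
        encoded = base64.b64encode(handle.read()).decode("utf-8")
    return f"data:{prefix};base64,{encoded}"
```

For example, `to_data_url("eagle.png")` produces a string that starts with `data:image/png;base64,`, and `to_data_url("welcome.mp3")` produces a string that starts with `data:;base64,`.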
Images
This example uses the local file eagle.png.
import os
from openai import OpenAI
import base64
client = OpenAI(
# API keys for the Singapore and Beijing regions are different. For more information, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
api_key=os.getenv("DASHSCOPE_API_KEY"),
# The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1
base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
# Base64 encoding format
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")
base64_image = encode_image("eagle.png")
completion = client.chat.completions.create(
model="qwen3-omni-flash", # When the model is Qwen3-Omni-Flash, run in non-thinking mode.
messages=[
{
"role": "user",
"content": [
{
"type": "image_url",
"image_url": {"url": f"data:image/png;base64,{base64_image}"},
},
{"type": "text", "text": "What scene is depicted in the image?"},
],
},
],
# Set the output data modality. Two are currently supported: ["text","audio"] and ["text"].
modalities=["text", "audio"],
audio={"voice": "Cherry", "format": "wav"},
# stream must be set to True to avoid errors.
stream=True,
stream_options={"include_usage": True},
)
for chunk in completion:
    if chunk.choices:
        print(chunk.choices[0].delta)
    else:
        print(chunk.usage)
import OpenAI from "openai";
import { readFileSync } from 'fs';
const openai = new OpenAI(
{
// API keys for the Singapore and Beijing regions are different. For more information, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
apiKey: process.env.DASHSCOPE_API_KEY,
// The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1
baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
}
);
const encodeImage = (imagePath) => {
const imageFile = readFileSync(imagePath);
return imageFile.toString('base64');
};
const base64Image = encodeImage("eagle.png")
const completion = await openai.chat.completions.create({
model: "qwen3-omni-flash",// When the model is Qwen3-Omni-Flash, run in non-thinking mode.
messages: [
{
"role": "user",
"content": [{
"type": "image_url",
"image_url": { "url": `data:image/png;base64,${base64Image}` },
},
{ "type": "text", "text": "What scene is depicted in the image?" }]
}],
stream: true,
stream_options: {
include_usage: true
},
modalities: ["text", "audio"],
audio: { voice: "Cherry", format: "wav" }
});
for await (const chunk of completion) {
if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
console.log(chunk.choices[0].delta);
} else {
console.log(chunk.usage);
}
}
Audio
This example uses the local file welcome.mp3.
import os
from openai import OpenAI
import base64
client = OpenAI(
# API keys for the Singapore and Beijing regions are different. For more information, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
api_key=os.getenv("DASHSCOPE_API_KEY"),
# The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1
base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
def encode_audio(audio_path):
    with open(audio_path, "rb") as audio_file:
        return base64.b64encode(audio_file.read()).decode("utf-8")
base64_audio = encode_audio("welcome.mp3")
completion = client.chat.completions.create(
model="qwen3-omni-flash", # When the model is Qwen3-Omni-Flash, run in non-thinking mode.
messages=[
{
"role": "user",
"content": [
{
"type": "input_audio",
"input_audio": {
"data": f"data:;base64,{base64_audio}",
"format": "mp3",
},
},
{"type": "text", "text": "What is this audio about"},
],
},
],
# Set the output data modality. Two are currently supported: ["text","audio"] and ["text"].
modalities=["text", "audio"],
audio={"voice": "Cherry", "format": "wav"},
# stream must be set to True to avoid errors.
stream=True,
stream_options={"include_usage": True},
)
for chunk in completion:
    if chunk.choices:
        print(chunk.choices[0].delta)
    else:
        print(chunk.usage)
import OpenAI from "openai";
import { readFileSync } from 'fs';
const openai = new OpenAI(
{
// API keys for the Singapore and Beijing regions are different. For more information, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
apiKey: process.env.DASHSCOPE_API_KEY,
// The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1
baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
}
);
const encodeAudio = (audioPath) => {
const audioFile = readFileSync(audioPath);
return audioFile.toString('base64');
};
const base64Audio = encodeAudio("welcome.mp3")
const completion = await openai.chat.completions.create({
model: "qwen3-omni-flash", // When the model is Qwen3-Omni-Flash, run in non-thinking mode.
messages: [
{
"role": "user",
"content": [{
"type": "input_audio",
"input_audio": { "data": `data:;base64,${base64Audio}`, "format": "mp3" },
},
{ "type": "text", "text": "What is this audio about" }]
}],
stream: true,
stream_options: {
include_usage: true
},
modalities: ["text", "audio"],
audio: { voice: "Cherry", format: "wav" }
});
for await (const chunk of completion) {
if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
console.log(chunk.choices[0].delta);
} else {
console.log(chunk.usage);
}
}
Video
Video file
This example uses the local file spring_mountain.mp4.
import os
from openai import OpenAI
import base64
client = OpenAI(
# API keys for the Singapore and Beijing regions are different. For more information, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
api_key=os.getenv("DASHSCOPE_API_KEY"),
# The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1
base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
# Base64 encoding format
def encode_video(video_path):
    with open(video_path, "rb") as video_file:
        return base64.b64encode(video_file.read()).decode("utf-8")
base64_video = encode_video("spring_mountain.mp4")
completion = client.chat.completions.create(
model="qwen3-omni-flash", # When using the qwen3-omni-flash model, run in non-thinking mode.
messages=[
{
"role": "user",
"content": [
{
"type": "video_url",
"video_url": {"url": f"data:;base64,{base64_video}"},
},
{"type": "text", "text": "What is she singing?"},
],
},
],
# Set the output data modality. Supported modalities are ["text", "audio"] and ["text"].
modalities=["text", "audio"],
audio={"voice": "Cherry", "format": "wav"},
# stream must be set to True. Otherwise, an error occurs.
stream=True,
stream_options={"include_usage": True},
)
for chunk in completion:
    if chunk.choices:
        print(chunk.choices[0].delta)
    else:
        print(chunk.usage)
import OpenAI from "openai";
import { readFileSync } from 'fs';
const openai = new OpenAI(
{
// API keys for the Singapore and Beijing regions are different. For more information, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
apiKey: process.env.DASHSCOPE_API_KEY,
// The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1
baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
}
);
const encodeVideo = (videoPath) => {
const videoFile = readFileSync(videoPath);
return videoFile.toString('base64');
};
const base64Video = encodeVideo("spring_mountain.mp4")
const completion = await openai.chat.completions.create({
model: "qwen3-omni-flash", // When using the qwen3-omni-flash model, run in non-thinking mode.
messages: [
{
"role": "user",
"content": [{
"type": "video_url",
"video_url": { "url": `data:;base64,${base64Video}` },
},
{ "type": "text", "text": "What is she singing?" }]
}],
stream: true,
stream_options: {
include_usage: true
},
modalities: ["text", "audio"],
audio: { voice: "Cherry", format: "wav" }
});
for await (const chunk of completion) {
if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
console.log(chunk.choices[0].delta);
} else {
console.log(chunk.usage);
}
}
Image list
This example uses the local files football1.jpg, football2.jpg, football3.jpg, and football4.jpg.
import os
from openai import OpenAI
import base64
client = OpenAI(
# API keys for the Singapore and Beijing regions are different. For more information, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
api_key=os.getenv("DASHSCOPE_API_KEY"),
# The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1
base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
# Base64 encoding format
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")
base64_image_1 = encode_image("football1.jpg")
base64_image_2 = encode_image("football2.jpg")
base64_image_3 = encode_image("football3.jpg")
base64_image_4 = encode_image("football4.jpg")
completion = client.chat.completions.create(
model="qwen3-omni-flash", # When using the qwen3-omni-flash model, run in non-thinking mode.
messages=[
{
"role": "user",
"content": [
{
"type": "video",
"video": [
f"data:image/jpeg;base64,{base64_image_1}",
f"data:image/jpeg;base64,{base64_image_2}",
f"data:image/jpeg;base64,{base64_image_3}",
f"data:image/jpeg;base64,{base64_image_4}",
],
},
{"type": "text", "text": "Describe the procedure in this video."},
],
}
],
# Set the output data modality. Supported modalities are ["text", "audio"] and ["text"].
modalities=["text", "audio"],
audio={"voice": "Cherry", "format": "wav"},
# stream must be set to True. Otherwise, an error occurs.
stream=True,
stream_options={"include_usage": True},
)
for chunk in completion:
    if chunk.choices:
        print(chunk.choices[0].delta)
    else:
        print(chunk.usage)
import OpenAI from "openai";
import { readFileSync } from 'fs';
const openai = new OpenAI(
{
// API keys for the Singapore and Beijing regions are different. For more information, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
apiKey: process.env.DASHSCOPE_API_KEY,
// The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1
baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
}
);
const encodeImage = (imagePath) => {
const imageFile = readFileSync(imagePath);
return imageFile.toString('base64');
};
const base64Image1 = encodeImage("football1.jpg")
const base64Image2 = encodeImage("football2.jpg")
const base64Image3 = encodeImage("football3.jpg")
const base64Image4 = encodeImage("football4.jpg")
const completion = await openai.chat.completions.create({
model: "qwen3-omni-flash", // When using the qwen3-omni-flash model, run in non-thinking mode.
messages: [{
role: "user",
content: [
{
type: "video",
video: [
`data:image/jpeg;base64,${base64Image1}`,
`data:image/jpeg;base64,${base64Image2}`,
`data:image/jpeg;base64,${base64Image3}`,
`data:image/jpeg;base64,${base64Image4}`
]
},
{
type: "text",
text: "Describe the procedure in this video."
}
]
}],
stream: true,
stream_options: {
include_usage: true
},
modalities: ["text", "audio"],
audio: { voice: "Cherry", format: "wav" }
});
for await (const chunk of completion) {
if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
console.log(chunk.choices[0].delta);
} else {
console.log(chunk.usage);
}
}
API reference
For more information about the input and output parameters of Qwen-Omni, see Qwen.
Billing and rate limiting
Billing rules
Qwen-Omni is billed based on the number of tokens for different modalities, such as audio, image, and video. For billing details, see Model List.
Free quota
For more information about how to claim, query, and use your free quota, see Free quota for new users.
Rate limiting
For model rate limiting rules and FAQ, see Rate limiting.
Error codes
If a call fails, see Error messages for troubleshooting.
Voice list
The models support different voices. To use a voice, you can set the voice request parameter to the corresponding value in the voice parameter column of the tables below.
Qwen3-Omni-Flash (non-thinking mode)
The Qwen3-Omni-Flash model lets you set the voice using the voice parameter only in non-thinking mode. In thinking mode, the model supports only text outputs.
Name | voice parameter | Description | Supported languages
Cherry | Cherry | A cheerful, friendly, and natural young woman's voice. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai
Ethan | Ethan | Standard Mandarin with a slight northern accent. A bright, warm, and energetic voice. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai
Nofish | Nofish | A designer who does not use retroflex consonants. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai
Jennifer | Jennifer | A premium, cinematic American English female voice. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai
Ryan | Ryan | A rhythmic, dramatic voice with realism and tension. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai
Katerina | Katerina | A mature and rhythmic female voice. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai
Elias | Elias | Explains complex topics with academic rigor and clear storytelling. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai
Shanghai-Jada | Jada | A lively woman from Shanghai. | Chinese (Shanghainese), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai
Beijing-Dylan | Dylan | A teenager who grew up in the hutongs of Beijing. | Chinese (Beijing dialect), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai
Sichuan-Sunny | Sunny | A sweet female voice from Sichuan. | Chinese (Sichuanese), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai
Nanjing-Li | Li | A patient yoga teacher. | Chinese (Nanjing dialect), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai
Shaanxi-Marcus | Marcus | A sincere and deep voice from Shaanxi. | Chinese (Shaanxi dialect), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai
Min Nan-Roy | Roy | A humorous and lively young male voice with a Minnan accent. | Chinese (Min Nan), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai
Tianjin-Peter | Peter | A voice for the straight man in Tianjin crosstalk. | Chinese (Tianjin dialect), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai
Cantonese-Rocky | Rocky | A witty and humorous male voice for online chats. | Chinese (Cantonese), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai
Cantonese-Kiki | Kiki | A sweet best friend from Hong Kong. | Chinese (Cantonese), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai
Sichuan-Eric | Eric | An unconventional and refined male voice from Chengdu, Sichuan. | Chinese (Sichuanese), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai
Qwen-Omni-Turbo
Name | voice parameter | Description | Supported languages
Cherry | Cherry | A sunny, friendly, and genuine young woman. | Chinese, English
Serena | Serena | Kind young woman. | Chinese, English
Ethan | Ethan | Standard Mandarin with a slight northern accent. A bright, warm, and energetic voice. | Chinese, English
Chelsie | Chelsie | An anime-style virtual girlfriend voice. | Chinese, English
Qwen-Omni open-source model
Name | voice parameter | Description | Supported languages
Ethan | Ethan | Standard Mandarin with a slight northern accent. A bright, warm, and energetic voice. | Chinese, English
Chelsie | Chelsie | An anime-style virtual girlfriend voice. | Chinese, English
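Because the available voices differ by model, it can help to validate the voice value before you build the audio request parameter. The following is a minimal sketch based only on the tables above. The `VOICES` dictionary and the `audio_options` helper are hypothetical, and the dictionary keys are the table headings from this topic, not API model IDs.

```python
# voice parameter values copied from the tables above, keyed by table heading.
VOICES = {
    "Qwen3-Omni-Flash": {
        "Cherry", "Ethan", "Nofish", "Jennifer", "Ryan", "Katerina", "Elias",
        "Jada", "Dylan", "Sunny", "Li", "Marcus", "Roy", "Peter", "Rocky",
        "Kiki", "Eric",
    },
    "Qwen-Omni-Turbo": {"Cherry", "Serena", "Ethan", "Chelsie"},
    "Qwen-Omni open-source": {"Ethan", "Chelsie"},
}


def audio_options(family, voice, audio_format="wav"):
    """Return the value for the audio request parameter, or raise if the voice is not listed."""
    if voice not in VOICES[family]:
        raise ValueError(f"{voice!r} is not listed for {family}")
    return {"voice": voice, "format": audio_format}
```

For example, `audio_options("Qwen3-Omni-Flash", "Jada")` returns a dictionary that can be passed as the `audio` argument together with `modalities=["text", "audio"]`. For dialect voices, pass the short value from the voice parameter column, such as `Jada`, not the display name `Shanghai-Jada`.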