Qwen-Omni accepts text combined with one additional modality (image, audio, or video) and generates text or speech responses. It delivers human-like voices and supports multilingual and dialect speech output. You can use it for content moderation, text creation, visual recognition, and audio-video interaction assistants.
Supported regions: Singapore, Beijing. Use the API key for your region.
Getting started
Prerequisites
You have obtained an API key and set it as an environment variable.
The Qwen-Omni model supports only OpenAI-compatible invocation. Install the latest SDK: OpenAI Python SDK 1.52.0 or later, Node.js SDK 4.68.0 or later.
This example sends a text prompt to the Qwen-Omni API and returns a streaming response with both text and audio.
import os
import base64
import soundfile as sf
import numpy as np
from openai import OpenAI

# 1. Initialize the client
client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),  # Confirm that the environment variable is set
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

# 2. Send the request
try:
    completion = client.chat.completions.create(
        model="qwen3.5-omni-plus",
        messages=[{"role": "user", "content": "Who are you?"}],
        modalities=["text", "audio"],  # Specify text and audio output
        audio={"voice": "Tina", "format": "wav"},
        stream=True,  # Must be set to True
        stream_options={"include_usage": True},
    )

    # 3. Process the streaming response and decode the audio
    print("Model response:")
    audio_base64_string = ""
    for chunk in completion:
        # Process the text part
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="")
        # Collect the audio part
        if chunk.choices and hasattr(chunk.choices[0].delta, "audio") and chunk.choices[0].delta.audio:
            audio_base64_string += chunk.choices[0].delta.audio.get("data", "")

    # 4. Save the audio file
    if audio_base64_string:
        wav_bytes = base64.b64decode(audio_base64_string)
        audio_np = np.frombuffer(wav_bytes, dtype=np.int16)
        sf.write("audio_assistant.wav", audio_np, samplerate=24000)
        print("\nAudio file saved to: audio_assistant.wav")
except Exception as e:
    print(f"Request failed: {e}")

// Before running:
// 1. Ensure Node.js version >= 14 is installed.
// 2. Run: npm install openai wav
import OpenAI from "openai";
import { createWriteStream } from 'node:fs';
import { Writer } from 'wav';
// Convert a Base64 string and save it as a WAV audio file
async function convertAudio(audioString, audioPath) {
try {
// Decode the Base64 string into a Buffer
const wavBuffer = Buffer.from(audioString, 'base64');
// Create a WAV file write stream
const writer = new Writer({
sampleRate: 24000, // Sample rate
channels: 1, // Mono
bitDepth: 16 // 16-bit depth
});
// Create an output file stream and establish a pipe connection
const outputStream = createWriteStream(audioPath);
writer.pipe(outputStream);
// Write PCM data and end writing
writer.write(wavBuffer);
writer.end();
// Wait for the file to finish writing
await new Promise((resolve, reject) => {
outputStream.on('finish', resolve);
outputStream.on('error', reject);
});
// Add extra wait time to ensure audio integrity
await new Promise(resolve => setTimeout(resolve, 800));
console.log(`\nAudio file saved to: ${audioPath}`);
} catch (error) {
console.error('Error during audio processing:', error);
}
}
// 1. Initialize the client
const openai = new OpenAI(
{
// If no environment variable is set, replace the next line with your Alibaba Cloud Model Studio API key: apiKey: "sk-xxx"
apiKey: process.env.DASHSCOPE_API_KEY,
// The following URL is for the Singapore region. If you use a model in the Beijing region, replace it with: https://dashscope.aliyuncs.com/compatible-mode/v1
baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
}
);
// 2. Send the request
const completion = await openai.chat.completions.create({
model: "qwen3.5-omni-plus",
messages: [
{
"role": "user",
"content": "Who are you?"
}],
stream: true,
stream_options: {
include_usage: true
},
modalities: ["text", "audio"],
audio: { voice: "Tina", format: "wav" }
});
let audioString = "";
console.log("Model response:")
// 3. Process the streaming response and decode the audio
for await (const chunk of completion) {
if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
// Process text content
if (chunk.choices[0].delta.content) {
process.stdout.write(chunk.choices[0].delta.content);
}
// Process audio content
if (chunk.choices[0].delta.audio) {
if (chunk.choices[0].delta.audio["data"]) {
audioString += chunk.choices[0].delta.audio["data"];
}
}
}
}
// 4. Save the audio file
convertAudio(audioString, "audio_assistant.wav");

# ======= Important note =======
# API keys differ between Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
# The following URL is for the Singapore region. If you use a model in the Beijing region, replace it with: https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
# === Delete this comment before execution ===
curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen3.5-omni-plus",
"messages": [
{
"role": "user",
"content": "Who are you?"
}
],
"stream":true,
"stream_options":{
"include_usage":true
},
"modalities":["text","audio"],
"audio":{"voice":"Tina","format":"wav"}
}'

Model selection
Qwen3.5-Omni is in preview. Model invocation is temporarily free, but tool calling still incurs fees; see Billing details.
Qwen3.5-Omni series: Best for long video analysis, meeting summaries, caption generation, content moderation, and audio-video interaction.
Input limits: Up to 3 hours of audio or 1 hour of video
Audio control: Supports adjusting volume, speaking rate, and emotion via instructions
Visual capability: Matches Qwen3.5's level. Understands images, speech, ambient sounds, and other multimodal information
Qwen3-Omni-Flash series: Best for short video analysis and cost-sensitive scenarios.
Input limits: Audio-video input under 150 seconds
Thinking mode: The only Qwen-Omni series that supports thinking mode
Qwen-Omni-Turbo series
This series is no longer updated and has limited features. Migrate to the Qwen3.5-Omni or Qwen3-Omni-Flash series.
Series | Audio-video description | Deep thinking | Web search | Input languages | Output audio languages | Supported voices
Qwen3.5-Omni (latest-generation omni-modal model) | Strong | Not supported | Supported | 113 | 36 | 55
Qwen3-Omni-Flash (hybrid thinking model) | Weaker | Supported | Not supported | 19 | 19 | 17 to 49 (varies by version)
Qwen-Omni-Turbo (no longer updated) | None | Not supported | Not supported | Chinese, English | Chinese, English | 4
For model names, context windows, pricing, and snapshot versions, see the Model list. For rate limits, see Rate limiting.
Model performance
Audio and video content analysis
Generate a comprehensive timestamped description of this video. | 00:00.000 – 00:02.500 A rain-soaked city street fills the widescreen frame. Long-exposure photography creates streaks of red and blue car lights on wet pavement. A solitary man in a dark knee-length trench coat and light shirt with a tie walks toward the camera along the right sidewalk. Raindrops cling to his shoulders and hair; each step makes a muffled plop on damp concrete. A low, ominous electronic hum underscores ambient sounds, while steady rain crackles all around. Colorful graffiti covers the brick wall beside him, and neon signs glow in the distance—one clearly displays orange-red cursive script reading "Nice," while another vertical sign shows the letters "AT" at its bottom. ...... 00:12.300 – 00:14.533 The camera tilts down. Inside a flashlight beam, a yellowed, aged single sheet lies in a shallow puddle, edges charred and corners curled. Inked paragraphs cover most of its surface, but moisture and scorch marks make the text unreadable. The man kneels, extending gloved fingers toward the document; ripples spread outward as his hand nears. Sirens peak then begin to fade slightly, while the ambient hum remains steady. The scene freezes just before the trembling paper is touched, ending the clip. Only partial timestamps shown. |
Multidialect understanding
Describe the vocal characteristics, emotional state, and full lyrics for singers in different Chinese dialects. | This audio features rap performances by singers from various regions of China, showcasing rich dialect diversity. Below are detailed descriptions of each singer's vocal traits, emotional state, and corresponding lyrics: Hangzhou dialect (Wu) Vocal traits: The male singer's voice is soft and gentle, with the distinctive smoothness and warmth of Wu dialect. His intonation rises and falls gently, and his rhythm is light and quick, creating a relaxed storytelling effect. Emotional state: Relaxed, comfortable, and full of everyday life. He describes Hangzhou's slow-paced lifestyle and urban changes, expressing affection for his hometown and reflection on modern development. Overall, he feels cozy and familiar. Lyrics: > Hangzhou dialect, listen up! Our culture includes local erhua pronunciation. > Like flowers blooming by West Lake, no cultural survey needed—we understand it best. > Outsiders don't get it. "Ge," "lao'er," "fen'er," "ya'er"—can outsiders tell them apart? > Let's go! Transport is fast—subways everywhere. We'll check it out. > Welcoming Southeast Asia and Thailand, watching "Thirteen Ta" shows. Proud of our city, direct in personality. > So you only came for the "bridge-crossing" experience, and because the flavor is different, you're not returning? Only partial results shown. |
Lyric caption generation
Transcribe the song lyrics and provide timestamps for each line in this format: [00:00:15,020 --> 00:00:28,085] When you walk through a storm, hold your head up high. [00:00:28,085 --> 00:00:40,200] And don't be afraid of the dark. ...... | [00:00:12,680 --> 00:00:16,960] Cat thread sways past moonlight on trees. [00:00:18,400 --> 00:00:22,800] Radiators hum 1998 chart hits. [00:00:24,160 --> 00:00:28,080] Time parts the mist-like heat waves. [00:00:28,920 --> 00:00:33,000] Neon from the screen shines on my nose bridge. ...... [00:03:16,720 --> 00:03:21,680] We nestle in the softest ring of the tree trunk. [00:03:22,400 --> 00:03:27,000] Breathing turns residual warmth into honey-sugar. [00:03:28,160 --> 00:03:33,200] The sofa sinks into cloud-fluff shape. [00:03:34,000 --> 00:03:38,800] Every pore soaks in sunshine. [00:04:09,000 --> 00:04:10,020] (End) Only partial results shown. |
Audio-video programming
Usage
Streaming output
All requests to Qwen-Omni must set stream=True.
Model configuration
Configure parameters, prompts, and audio-video lengths based on your use case to balance cost, speed, and quality.
Audio-video understanding
Use case | Recommended video length | Recommended prompt | Recommended max_pixels
Fast review, low cost | ≤60 minutes | Simple prompt within 50 words | 230,400
Content extraction (long video segmentation) | ≤60 minutes | Simple prompt within 50 words | 921,600–2,073,600
Standard analysis (short video tagging) | ≤4 minutes | Use the structured prompt below | 921,600–2,073,600
Fine-grained analysis (multiple speakers/complex scenes) | ≤2 minutes | Use the structured prompt below | 2,073,600
For fine-grained descriptions of long videos, segment them first.
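As a rough guide to the max_pixels budgets above, a frame can be scaled down proportionally until it fits the budget. The following is a minimal sketch; the helper name and rounding strategy are illustrative, not part of the API:

```python
import math

def fit_to_max_pixels(width: int, height: int, max_pixels: int) -> tuple[int, int]:
    """Scale (width, height) down proportionally so width * height <= max_pixels.

    Returns the original size if it is already within budget.
    Illustrative helper: name and rounding strategy are not part of the API.
    """
    if width * height <= max_pixels:
        return width, height
    scale = math.sqrt(max_pixels / (width * height))
    return max(1, int(width * scale)), max(1, int(height * scale))

# 1920x1080 = 2,073,600 px, exactly the "standard analysis" ceiling
print(fit_to_max_pixels(1920, 1080, 2_073_600))  # (1920, 1080)
# Downscale a 4K frame to the 230,400 px "fast review" budget
w, h = fit_to_max_pixels(3840, 2160, 230_400)
print(w * h <= 230_400)  # True
```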
Audio understanding
Balance cost and quality by controlling audio length and prompt complexity.
Use case | Recommended audio length | Recommended prompt
Fast review, low cost | ≤60 minutes | Simple prompt within 50 words
Content extraction (segment long audio) | ≤60 minutes | Simple prompt within 50 words
Standard analysis (audio tagging) | ≤2 minutes | Use a structured prompt
Fine-grained analysis (multiple speakers/complex scenes) | ≤1 minute | Use a structured prompt
For fine-grained descriptions of long audio, segment it first.
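The segmentation step can be sketched as computing fixed-length spans over the audio duration. Actual cutting (for example with ffmpeg) is up to your pipeline; this helper only computes the boundaries and is illustrative:

```python
def segment_spans(total_seconds: float, chunk_seconds: float) -> list[tuple[float, float]]:
    """Split a duration into consecutive (start, end) spans of at most chunk_seconds.

    Illustrative helper for pre-segmenting long audio before fine-grained analysis.
    """
    spans = []
    start = 0.0
    while start < total_seconds:
        end = min(start + chunk_seconds, total_seconds)
        spans.append((start, end))
        start = end
    return spans

# A 5-minute recording split into <=2-minute chunks for "standard analysis"
print(segment_spans(300, 120))  # [(0.0, 120.0), (120.0, 240.0), (240.0, 300.0)]
```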
Multimodal input
Video and text input
Provide video as an image list or a video file (with audio support).
Video file (supports audio in the video)
Number of files:
Qwen3.5-Omni series: Up to 512 files using public URLs; up to 250 files using Base64 encoding.
Qwen3-Omni-Flash and Qwen-Omni-Turbo series: Only one file allowed.
File size:
Qwen3.5-Omni: Up to 2 GB, up to 1 hour duration.
Qwen3-Omni-Flash: Up to 256 MB, up to 150 seconds duration.
Qwen-Omni-Turbo: Up to 150 MB, up to 40 seconds duration.
File formats: MP4, AVI, MKV, MOV, FLV, WMV, etc.
Visual and audio information in the video file are billed separately.
OpenAI compatible
import os
from openai import OpenAI

client = OpenAI(
    # API keys differ between Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # The following URL is for the Singapore region. If you use a model in the Beijing region, replace it with: https://dashscope.aliyuncs.com/compatible-mode/v1
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
completion = client.chat.completions.create(
    model="qwen3.5-omni-plus",  # For Qwen3-Omni-Flash, run in non-thinking mode.
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "video_url",
                    "video_url": {
                        "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241115/cqqkru/1.mp4"
                    },
                },
                {"type": "text", "text": "What is the video about?"},
            ],
        },
    ],
    # Set the output modality. Two options are currently supported: ["text","audio"] and ["text"]
    modalities=["text", "audio"],
    audio={"voice": "Tina", "format": "wav"},
    # stream must be set to True, otherwise an error occurs.
    stream=True,
    stream_options={"include_usage": True},
)
for chunk in completion:
    if chunk.choices:
        print(chunk.choices[0].delta)
    else:
        print(chunk.usage)

import OpenAI from "openai";
const openai = new OpenAI(
{
// API keys differ between Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
apiKey: process.env.DASHSCOPE_API_KEY,
// The following URL is for the Singapore region. If you use a model in the Beijing region, replace it with: https://dashscope.aliyuncs.com/compatible-mode/v1
baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
}
);
const completion = await openai.chat.completions.create({
model: "qwen3.5-omni-plus", // For Qwen3-Omni-Flash, run in non-thinking mode.
messages: [
{
"role": "user",
"content": [{
"type": "video_url",
"video_url": { "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241115/cqqkru/1.mp4" },
},
{ "type": "text", "text": "What is the video about?" }]
}],
stream: true,
stream_options: {
include_usage: true
},
modalities: ["text", "audio"],
audio: { voice: "Tina", format: "wav" }
});
for await (const chunk of completion) {
if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
console.log(chunk.choices[0].delta);
} else {
console.log(chunk.usage);
}
}

# ======= Important note =======
# API keys differ between Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
# The following URL is for the Singapore region. If you use a model in the Beijing region, replace it with: https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
# === Delete this comment before execution ===
curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen3.5-omni-plus",
"messages": [
{
"role": "user",
"content": [
{
"type": "video_url",
"video_url": {
"url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241115/cqqkru/1.mp4"
}
},
{
"type": "text",
"text": "What is the video about"
}
]
}
],
"stream":true,
"stream_options": {
"include_usage": true
},
"modalities":["text","audio"],
"audio":{"voice":"Tina","format":"wav"}
}'
Image list format
Number of images
Qwen3.5-Omni: Minimum 2 images, maximum 2048 images
Qwen3-Omni-Flash: Minimum 2 images, maximum 128 images
Qwen-Omni-Turbo: Minimum 4 images, maximum 80 images
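A small lookup can enforce these frame-count limits client-side before sending a request. The dictionary keys and helper name below are illustrative, not API identifiers:

```python
# Frame-count limits per series, as listed above (illustrative helper, not an API)
IMAGE_LIST_LIMITS = {
    "qwen3.5-omni": (2, 2048),
    "qwen3-omni-flash": (2, 128),
    "qwen-omni-turbo": (4, 80),
}

def check_frame_count(series: str, n_frames: int) -> bool:
    """Return True if n_frames is within the documented min/max for the series."""
    lo, hi = IMAGE_LIST_LIMITS[series]
    return lo <= n_frames <= hi

print(check_frame_count("qwen-omni-turbo", 4))     # True
print(check_frame_count("qwen3-omni-flash", 129))  # False
```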
OpenAI compatible
import os
from openai import OpenAI

client = OpenAI(
    # API keys differ between Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # The following URL is for the Singapore region. If you use a model in the Beijing region, replace it with: https://dashscope.aliyuncs.com/compatible-mode/v1
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
completion = client.chat.completions.create(
    model="qwen3.5-omni-plus",  # For Qwen3-Omni-Flash, run in non-thinking mode.
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "video",
                    "video": [
                        "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/xzsgiz/football1.jpg",
                        "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/tdescd/football2.jpg",
                        "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/zefdja/football3.jpg",
                        "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/aedbqh/football4.jpg",
                    ],
                },
                {"type": "text", "text": "Describe the process shown in this video"},
            ],
        }
    ],
    # Set the output modality. Two options are currently supported: ["text","audio"] and ["text"]
    modalities=["text", "audio"],
    audio={"voice": "Tina", "format": "wav"},
    # stream must be set to True, otherwise an error occurs.
    stream=True,
    stream_options={"include_usage": True},
)
for chunk in completion:
    if chunk.choices:
        print(chunk.choices[0].delta)
    else:
        print(chunk.usage)

import OpenAI from "openai";
const openai = new OpenAI(
{
// API keys differ between Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
apiKey: process.env.DASHSCOPE_API_KEY,
// The following URL is for the Singapore region. If you use a model in the Beijing region, replace it with: https://dashscope.aliyuncs.com/compatible-mode/v1
baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
}
);
const completion = await openai.chat.completions.create({
model: "qwen3.5-omni-plus", // For Qwen3-Omni-Flash, run in non-thinking mode.
messages: [{
role: "user",
content: [
{
type: "video",
video: [
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/xzsgiz/football1.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/tdescd/football2.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/zefdja/football3.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/aedbqh/football4.jpg"
]
},
{
type: "text",
text: "Describe the process shown in this video"
}
]
}],
stream: true,
stream_options: {
include_usage: true
},
modalities: ["text", "audio"],
audio: { voice: "Tina", format: "wav" }
});
for await (const chunk of completion) {
if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
console.log(chunk.choices[0].delta);
} else {
console.log(chunk.usage);
}
}

# ======= Important note =======
# API keys differ between Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
# The following URL is for the Singapore region. If you use a model in the Beijing region, replace it with: https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
# === Delete this comment before execution ===
curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen3.5-omni-plus",
"messages": [
{
"role": "user",
"content": [
{
"type": "video",
"video": [
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/xzsgiz/football1.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/tdescd/football2.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/zefdja/football3.jpg",
"https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/aedbqh/football4.jpg"
]
},
{
"type": "text",
"text": "Describe the process shown in this video"
}
]
}
],
"stream": true,
"stream_options": {
"include_usage": true
},
"modalities": ["text", "audio"],
"audio": {
"voice": "Tina",
"format": "wav"
}
}'

Audio and text input
Number of files:
Qwen3.5-Omni series: Up to 2048 files using public URLs; up to 250 files using Base64 encoding.
Qwen3-Omni-Flash and Qwen-Omni-Turbo series: Only one file allowed.
File size:
Qwen3.5-Omni: Up to 2 GB, up to 3 hours duration
Qwen3-Omni-Flash: Up to 100 MB, up to 20 minutes duration
Qwen-Omni-Turbo: Up to 10 MB, up to 3 minutes duration
File formats: AMR, WAV, 3GP, 3GPP, AAC, MP3, etc.
These examples use a public audio URL. To use a local file, see Send local files with Base64 encoding. Streaming output is required.
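The Base64 step for a local file can be sketched as follows. The exact data-URL scheme the API expects is described in Send local files with Base64 encoding; the helper name and the MIME default here are assumptions for illustration only:

```python
import base64

def to_data_url(path: str, mime: str = "audio/wav") -> str:
    """Read a local file and encode it as a Base64 data URL.

    Sketch only: see "Send local files with Base64 encoding" for the exact
    URL scheme the API expects; the mime default is an assumption.
    """
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("utf-8")
    return f"data:{mime};base64,{b64}"

# Example (hypothetical path):
# url = to_data_url("speech.wav")
# then pass url in place of the public audio URL
```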
OpenAI compatible
import os
from openai import OpenAI

client = OpenAI(
    # API keys differ between Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # The following URL is for the Singapore region. If you use a model in the Beijing region, replace it with: https://dashscope.aliyuncs.com/compatible-mode/v1
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
completion = client.chat.completions.create(
    model="qwen3.5-omni-plus",  # For Qwen3-Omni-Flash, run in non-thinking mode.
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "input_audio",
                    "input_audio": {
                        "data": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250211/tixcef/cherry.wav",
                        "format": "wav",
                    },
                },
                {"type": "text", "text": "What is this audio about"},
            ],
        },
    ],
    # Set the output modality. Two options are currently supported: ["text","audio"] and ["text"]
    modalities=["text", "audio"],
    audio={"voice": "Tina", "format": "wav"},
    # stream must be set to True, otherwise an error occurs.
    stream=True,
    stream_options={"include_usage": True},
)
for chunk in completion:
    if chunk.choices:
        print(chunk.choices[0].delta)
    else:
        print(chunk.usage)

import OpenAI from "openai";
const openai = new OpenAI(
{
// API keys differ between Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
apiKey: process.env.DASHSCOPE_API_KEY,
// The following URL is for the Singapore region. If you use a model in the Beijing region, replace it with: https://dashscope.aliyuncs.com/compatible-mode/v1
baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
}
);
const completion = await openai.chat.completions.create({
model: "qwen3.5-omni-plus", // For Qwen3-Omni-Flash, run in non-thinking mode.
messages: [
{
"role": "user",
"content": [{
"type": "input_audio",
"input_audio": { "data": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250211/tixcef/cherry.wav", "format": "wav" },
},
{ "type": "text", "text": "What is this audio about" }]
}],
stream: true,
stream_options: {
include_usage: true
},
modalities: ["text", "audio"],
audio: { voice: "Tina", format: "wav" }
});
for await (const chunk of completion) {
if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
console.log(chunk.choices[0].delta);
} else {
console.log(chunk.usage);
}
}

# ======= Important note =======
# API keys differ between Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
# The following URL is for the Singapore region. If you use a model in the Beijing region, replace it with: https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
# === Delete this comment before execution ===
curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen3.5-omni-plus",
"messages": [
{
"role": "user",
"content": [
{
"type": "input_audio",
"input_audio": {
"data": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250211/tixcef/cherry.wav",
"format": "wav"
}
},
{
"type": "text",
"text": "What is this audio about"
}
]
}
],
"stream":true,
"stream_options":{
"include_usage":true
},
"modalities":["text","audio"],
"audio":{"voice":"Tina","format":"wav"}
}'
Image and text input
Qwen-Omni supports multiple images per request. Image requirements:
Number of images:
Public URL: Up to 2048 images
Base64 encoding: Up to 250 images
Image size:
Qwen3.5 series: Each image file must be ≤20 MB
Qwen3-Omni-Flash and Qwen-Omni-Turbo series: Each image file must be ≤10 MB
Both width and height must exceed 10 pixels. Aspect ratio must not exceed 200:1 or 1:200.
Supported image types: See Image and video understanding.
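The dimension rules above can be checked client-side before upload. A minimal sketch (the helper name is illustrative):

```python
def image_dims_ok(width: int, height: int) -> bool:
    """Check the documented dimension rules: both sides must exceed 10 px,
    and the aspect ratio must not exceed 200:1 (or 1:200).
    Illustrative helper, not part of the API."""
    if width <= 10 or height <= 10:
        return False
    ratio = max(width, height) / min(width, height)
    return ratio <= 200

print(image_dims_ok(1024, 768))  # True
print(image_dims_ok(4000, 11))   # False (ratio is about 364:1)
```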
These examples use a public image URL. To use a local file, see Send local files with Base64 encoding. Streaming output is required.
OpenAI compatible
import os
from openai import OpenAI

client = OpenAI(
    # API keys differ between Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # The following URL is for the Singapore region. If you use a model in the Beijing region, replace it with: https://dashscope.aliyuncs.com/compatible-mode/v1
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
completion = client.chat.completions.create(
    model="qwen3.5-omni-plus",  # For Qwen3-Omni-Flash, run in non-thinking mode.
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"
                    },
                },
                {"type": "text", "text": "What scene is depicted in the image?"},
            ],
        },
    ],
    # Set the output modality. Two options are currently supported: ["text","audio"] and ["text"]
    modalities=["text", "audio"],
    audio={"voice": "Tina", "format": "wav"},
    # stream must be set to True, otherwise an error occurs.
    stream=True,
    stream_options={"include_usage": True},
)
for chunk in completion:
    if chunk.choices:
        print(chunk.choices[0].delta)
    else:
        print(chunk.usage)

import OpenAI from "openai";
const openai = new OpenAI(
{
// API keys differ between Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
apiKey: process.env.DASHSCOPE_API_KEY,
// The following URL is for the Singapore region. If you use a model in the Beijing region, replace it with: https://dashscope.aliyuncs.com/compatible-mode/v1
baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
}
);
const completion = await openai.chat.completions.create({
model: "qwen3.5-omni-plus", // For Qwen3-Omni-Flash, run in non-thinking mode.
messages: [
{
"role": "user",
"content": [{
"type": "image_url",
"image_url": { "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg" },
},
{ "type": "text", "text": "What scene is depicted in the image?" }]
}],
stream: true,
stream_options: {
include_usage: true
},
modalities: ["text", "audio"],
audio: { voice: "Tina", format: "wav" }
});
for await (const chunk of completion) {
if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
console.log(chunk.choices[0].delta);
} else {
console.log(chunk.usage);
}
}

# ======= Important note =======
# API keys differ between Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
# The following URL is for the Singapore region. If you use a model in the Beijing region, replace it with: https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
# === Delete this comment before execution ===
curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen3.5-omni-plus",
"messages": [
{
"role": "user",
"content": [
{
"type": "image_url",
"image_url": {
"url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"
}
},
{
"type": "text",
"text": "What scene is depicted in the image?"
}
]
}
],
"stream":true,
"stream_options":{
"include_usage":true
},
"modalities":["text","audio"],
"audio":{"voice":"Tina","format":"wav"}
}'
Web search
The Qwen3.5-Omni series supports web search to retrieve real-time information and perform reasoning.
Enable web search using the enable_search parameter and set search_strategy to agent.
OpenAI compatible
# Before you run this code:
# pip install openai
import os
from openai import OpenAI

# Initialize the client
client = OpenAI(
    # API keys differ between Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # The following URL is for the Singapore region. If you use a model in the Beijing region, replace it with: https://dashscope.aliyuncs.com/compatible-mode/v1
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

# Make the request (enable web search)
try:
    completion = client.chat.completions.create(
        model="qwen3.5-omni-plus",
        messages=[{
            "role": "user",
            "content": "Please look up today's date and day of the week, and tell me what major holidays fall on this date."
        }],
        stream=True,
        stream_options={"include_usage": True},
        # Enable web search
        extra_body={
            "enable_search": True,
            "search_options": {
                # Web search strategy. Only "agent" is supported.
                "search_strategy": "agent"
            }
        },
    )
    print("Model response (includes real-time information):")
    for chunk in completion:
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="")
    print()
except Exception as e:
    print(f"Request failed: {e}")
// Before you run this code:
// npm install openai
import OpenAI from "openai";
// Initialize the client
const openai = new OpenAI({
// API keys differ between Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
apiKey: process.env.DASHSCOPE_API_KEY,
// The following URL is for the Singapore region. If you use a model in the Beijing region, replace it with: https://dashscope.aliyuncs.com/compatible-mode/v1
baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
});
// Make the request (enable web search)
const completion = await openai.chat.completions.create({
model: "qwen3.5-omni-plus",
messages: [{
"role": "user",
"content": "Please look up today's date and day of the week, and tell me what major holidays fall on this date."
}],
stream: true,
stream_options: {
include_usage: true
},
// Enable web search
extra_body: {
enable_search: true,
search_options: {
// Web search strategy. Only "agent" is supported.
search_strategy: "agent"
}
}
});
console.log("Model response (includes real-time information):");
for await (const chunk of completion) {
if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
if (chunk.choices[0].delta.content) {
process.stdout.write(chunk.choices[0].delta.content);
}
}
}
console.log();
# ======= Important note =======
# API keys differ between Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
# The following URL is for the Singapore region. If you use a model in the Beijing region, replace it with: https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
# === Delete this comment before execution ===
curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen3.5-omni-plus",
"messages": [
{
"role": "user",
"content": "Please look up today's date and day of the week, and tell me what major holidays fall on this date."
}
],
"stream": true,
"stream_options": {
"include_usage": true
},
"enable_search": true,
"search_options": {
"search_strategy": "agent"
}
}'
Notes
Web search is supported only in the Qwen3.5-Omni series. The search_strategy parameter accepts only agent. See Billing details for billing rules related to the agent strategy.
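For comparison with the Node.js and curl calls above, the same web-search request can be sketched with the OpenAI Python SDK. This is a minimal sketch: build_search_params is an illustrative helper (not part of the API), and it assumes the SDK's extra_body passthrough used in the other Python examples in this topic.

```python
# Build the parameter dict for a web-search call; it can be passed to
# client.chat.completions.create(**params) with the OpenAI Python SDK.
def build_search_params(question: str) -> dict:
    return {
        "model": "qwen3.5-omni-plus",
        "messages": [{"role": "user", "content": question}],
        "stream": True,  # streaming is required for this model
        "stream_options": {"include_usage": True},
        # enable_search and search_options are DashScope-specific fields,
        # sent through the SDK's extra_body passthrough.
        "extra_body": {
            "enable_search": True,
            "search_options": {"search_strategy": "agent"},  # only "agent" is supported
        },
    }

params = build_search_params(
    "Please look up today's date and day of the week, "
    "and tell me what major holidays fall on this date."
)
```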
Enable/disable thinking mode
Only the Qwen3-Omni-Flash model supports hybrid thinking. Control thinking mode using the enable_thinking parameter:
Valid values: true and false (default).
OpenAI compatible
import os
from openai import OpenAI
client = OpenAI(
# API keys differ between Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
api_key=os.getenv("DASHSCOPE_API_KEY"),
# The following URL is for the Singapore region. If you use a model in the Beijing region, replace it with: https://dashscope.aliyuncs.com/compatible-mode/v1
base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
completion = client.chat.completions.create(
model="qwen3-omni-flash",
messages=[{"role": "user", "content": "Who are you?"}],
# Enable or disable thinking mode. Audio output is not supported in thinking mode. Qwen-Omni-Turbo does not support enable_thinking.
extra_body={'enable_thinking': True},
# Set the output modality. Two options are supported in non-thinking mode: ["text","audio"] and ["text"]. Only ["text"] is supported in thinking mode.
modalities=["text"],
# Set the voice. The audio parameter is not supported in thinking mode.
# audio={"voice": "Tina", "format": "wav"},
# stream must be set to True, otherwise an error occurs.
stream=True,
stream_options={"include_usage": True},
)
for chunk in completion:
if chunk.choices:
print(chunk.choices[0].delta)
else:
        print(chunk.usage)
import OpenAI from "openai";
const openai = new OpenAI(
{
// API keys differ between Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
apiKey: process.env.DASHSCOPE_API_KEY,
// The following URL is for the Singapore region. If you use a model in the Beijing region, replace it with: https://dashscope.aliyuncs.com/compatible-mode/v1
baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
}
);
const completion = await openai.chat.completions.create({
model: "qwen3-omni-flash",
messages: [
{ role: "user", content: "Who are you?" }
],
// stream must be set to true, otherwise an error occurs.
stream: true,
stream_options: {
include_usage: true
},
// Enable or disable thinking mode. Audio output is not supported in thinking mode. Qwen-Omni-Turbo does not support enable_thinking.
extra_body:{'enable_thinking': true},
// Set the output modality. Two options are supported in non-thinking mode: ["text","audio"] and ["text"]. Only ["text"] is supported in thinking mode.
modalities: ["text"],
// Set the voice. The audio parameter is not supported in thinking mode.
//audio: { voice: "Tina", format: "wav" }
});
for await (const chunk of completion) {
if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
console.log(chunk.choices[0].delta);
} else {
console.log(chunk.usage);
}
}
# ======= Important note =======
# API keys differ between Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
# The following URL is for the Singapore region. If you use a model in the Beijing region, replace it with: https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
# === Delete this comment before execution ===
curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen3-omni-flash",
"messages": [
{
"role": "user",
"content": "Who are you?"
}
],
"stream":true,
"stream_options":{
"include_usage":true
},
"modalities":["text"],
"enable_thinking": true
}'
Multi-turn conversation
When using Qwen-Omni for multi-turn conversations, note the following:
Assistant Message
Assistant messages added to the messages array can contain only text data.
User Message
A user message can contain text and one other modality. In multi-turn conversations, you can input different modalities in different user messages.
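The history rules above can be sketched as a small helper that appends one completed turn, keeping the assistant side text-only. This is an illustrative sketch (append_turn is not part of the API); it only shows how to shape the messages array.

```python
def append_turn(messages, user_content, assistant_text):
    """Append one completed turn to the conversation history.

    user_content: a list of content parts (text plus at most one other modality).
    assistant_text: the model's text reply; assistant messages in the history
    may contain only text, even when the model also returned audio.
    """
    messages.append({"role": "user", "content": user_content})
    messages.append({
        "role": "assistant",
        "content": [{"type": "text", "text": assistant_text}],
    })
    return messages

history = []
append_turn(
    history,
    [
        {"type": "input_audio",
         "input_audio": {
             "data": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3",
             "format": "mp3"}},
        {"type": "text", "text": "What is this audio about"},
    ],
    "This audio says: Welcome to Alibaba Cloud",
)
# A later user message may use a different modality; here it is text only.
history.append({"role": "user",
                "content": [{"type": "text", "text": "Tell me about this company."}]})
```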
OpenAI compatible
import os
from openai import OpenAI
client = OpenAI(
# API keys differ between Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
api_key=os.getenv("DASHSCOPE_API_KEY"),
# The following URL is for the Singapore region. If you use a model in the Beijing region, replace it with: https://dashscope.aliyuncs.com/compatible-mode/v1
base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
completion = client.chat.completions.create(
model="qwen3.5-omni-plus", # For Qwen3-Omni-Flash, run in non-thinking mode.
messages=[
{
"role": "user",
"content": [
{
"type": "input_audio",
"input_audio": {
"data": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3",
"format": "mp3",
},
},
{"type": "text", "text": "What is this audio about"},
],
},
{
"role": "assistant",
"content": [{"type": "text", "text": "This audio says: Welcome to Alibaba Cloud"}],
},
{
"role": "user",
"content": [{"type": "text", "text": "Tell me about this company."}],
},
],
# Set the output modality. Two options are currently supported: ["text","audio"] and ["text"]
modalities=["text"],
# stream must be set to True, otherwise an error occurs.
stream=True,
stream_options={"include_usage": True},
)
for chunk in completion:
if chunk.choices:
print(chunk.choices[0].delta)
else:
        print(chunk.usage)
import OpenAI from "openai";
const openai = new OpenAI(
{
// API keys differ between Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
apiKey: process.env.DASHSCOPE_API_KEY,
// The following URL is for the Singapore region. If you use a model in the Beijing region, replace it with: https://dashscope.aliyuncs.com/compatible-mode/v1
baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
}
);
const completion = await openai.chat.completions.create({
model: "qwen3.5-omni-plus", // For Qwen3-Omni-Flash, run in non-thinking mode.
messages: [
{
"role": "user",
"content": [
{
"type": "input_audio",
"input_audio": {
"data": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3",
"format": "mp3",
},
},
{ "type": "text", "text": "What is this audio about" },
],
},
{
"role": "assistant",
"content": [{ "type": "text", "text": "This audio says: Welcome to Alibaba Cloud" }],
},
{
"role": "user",
"content": [{ "type": "text", "text": "Tell me about this company." }]
}],
stream: true,
stream_options: {
include_usage: true
},
modalities: ["text"]
});
for await (const chunk of completion) {
if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
console.log(chunk.choices[0].delta);
} else {
console.log(chunk.usage);
}
}
# ======= Important note =======
# API keys differ between Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
# The following URL is for the Singapore region. If you use a model in the Beijing region, replace it with: https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
# === Delete this comment before execution ===
curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen3.5-omni-plus",
"messages": [
{
"role": "user",
"content": [
{
"type": "input_audio",
"input_audio": {
"data": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
}
},
{
"type": "text",
"text": "What is this audio about"
}
]
},
{
"role": "assistant",
"content": [
{
"type": "text",
"text": "This audio says: Welcome to Alibaba Cloud"
}
]
},
{
"role": "user",
"content": [
{
"type": "text",
"text": "Tell me about this company."
}
]
}
],
"stream": true,
"stream_options": {
"include_usage": true
},
"modalities": ["text"]
}'
Parsing output Base64-encoded audio data
Method 1: Decode after completion
Qwen-Omni outputs audio as streaming Base64-encoded data. Maintain a string variable during generation and append the Base64-encoded data from each chunk. After generation completes, Base64-decode the complete string to get the audio file. Alternatively, decode and play each chunk in real time.
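The collect-then-decode flow can be exercised without calling the API. In this sketch the PCM samples and the chunk split are illustrative stand-ins for model output; the point is that chunks are concatenated before Base64-decoding, since an individual chunk need not fall on a 4-character Base64 boundary.

```python
import base64
import struct

# Illustrative 16-bit little-endian PCM samples standing in for model audio.
samples = [0, 1000, -1000, 32767]
pcm_bytes = struct.pack("<4h", *samples)

# Pretend the Base64 string arrived split across two streaming chunks.
full_b64 = base64.b64encode(pcm_bytes).decode("ascii")
chunks = [full_b64[:5], full_b64[5:]]  # split point is deliberately unaligned

# Accumulate every chunk first, then decode the complete string once.
audio_string = "".join(chunks)
wav_bytes = base64.b64decode(audio_string)
recovered = list(struct.unpack("<4h", wav_bytes))  # back to the PCM samples
```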
# Installation instructions for pyaudio:
# APPLE Mac OS X
# brew install portaudio
# pip install pyaudio
# Debian/Ubuntu
# sudo apt-get install python-pyaudio python3-pyaudio
# or
# pip install pyaudio
# CentOS
# sudo yum install -y portaudio portaudio-devel && pip install pyaudio
# Microsoft Windows
# python -m pip install pyaudio
import os
from openai import OpenAI
import base64
import numpy as np
import soundfile as sf
client = OpenAI(
# API keys differ between Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
api_key=os.getenv("DASHSCOPE_API_KEY"),
# The following URL is for the Singapore region. If you use a model in the Beijing region, replace it with: https://dashscope.aliyuncs.com/compatible-mode/v1
base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
completion = client.chat.completions.create(
model="qwen3.5-omni-plus", # For Qwen3-Omni-Flash, run in non-thinking mode.
messages=[{"role": "user", "content": "Who are you?"}],
# Set the output modality. Two options are currently supported: ["text","audio"] and ["text"]
modalities=["text", "audio"],
audio={"voice": "Tina", "format": "wav"},
# stream must be set to True, otherwise an error occurs.
stream=True,
stream_options={"include_usage": True},
)
# Method 1: Decode after the generation is complete
audio_string = ""
for chunk in completion:
if chunk.choices:
if hasattr(chunk.choices[0].delta, "audio"):
try:
audio_string += chunk.choices[0].delta.audio["data"]
except Exception as e:
print(chunk.choices[0].delta.content)
else:
print(chunk.usage)
wav_bytes = base64.b64decode(audio_string)
audio_np = np.frombuffer(wav_bytes, dtype=np.int16)
sf.write("audio_assistant_py.wav", audio_np, samplerate=24000)
# Method 2: Decode while generating (comment out the code for Method 1 to use Method 2)
# # Initialize PyAudio
# import pyaudio
# import time
# p = pyaudio.PyAudio()
# # Create an audio stream
# stream = p.open(format=pyaudio.paInt16,
# channels=1,
# rate=24000,
# output=True)
# for chunk in completion:
# if chunk.choices:
# if hasattr(chunk.choices[0].delta, "audio"):
# try:
# audio_string = chunk.choices[0].delta.audio["data"]
# wav_bytes = base64.b64decode(audio_string)
# audio_np = np.frombuffer(wav_bytes, dtype=np.int16)
# # Play the audio data directly
# stream.write(audio_np.tobytes())
# except Exception as e:
# print(chunk.choices[0].delta.content)
# time.sleep(0.8)
# # Clean up resources
# stream.stop_stream()
# stream.close()
# p.terminate()
// Before running:
// 1. Ensure Node.js version >= 14 is installed.
// 2. Run: npm install openai wav
//
// To use the real-time playback feature (Method 2), you also need:
// Windows:
// npm install speaker
// Mac:
// brew install portaudio
// npm install speaker
// Linux (Ubuntu/Debian):
// sudo apt-get install libasound2-dev
// npm install speaker
import OpenAI from "openai";
const openai = new OpenAI(
{
// API keys differ between Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
apiKey: process.env.DASHSCOPE_API_KEY,
// The following URL is for the Singapore region. If you use a model in the Beijing region, replace it with: https://dashscope.aliyuncs.com/compatible-mode/v1
baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
}
);
const completion = await openai.chat.completions.create({
model: "qwen3.5-omni-plus", // For Qwen3-Omni-Flash, run in non-thinking mode.
messages: [
{
"role": "user",
"content": "Who are you?"
}],
stream: true,
stream_options: {
include_usage: true
},
modalities: ["text", "audio"],
audio: { voice: "Tina", format: "wav" }
});
// Method 1: Decode after the generation is complete
// Requires installation: npm install wav
import { createWriteStream } from 'node:fs'; // node:fs is a built-in Node.js module, no installation required
import { Writer } from 'wav';
async function convertAudio(audioString, audioPath) {
try {
// Decode the Base64 string into a Buffer
const wavBuffer = Buffer.from(audioString, 'base64');
// Create a WAV file write stream
const writer = new Writer({
sampleRate: 24000, // Sample rate
channels: 1, // Mono
bitDepth: 16 // 16-bit depth
});
// Create an output file stream and establish a pipe connection
const outputStream = createWriteStream(audioPath);
writer.pipe(outputStream);
// Write PCM data and end writing
writer.write(wavBuffer);
writer.end();
// Wait for the file to finish writing
await new Promise((resolve, reject) => {
outputStream.on('finish', resolve);
outputStream.on('error', reject);
});
// Add extra wait time to ensure audio integrity
await new Promise(resolve => setTimeout(resolve, 800));
console.log(`Audio file successfully saved as ${audioPath}`);
} catch (error) {
console.error('An error occurred during processing:', error);
}
}
let audioString = "";
for await (const chunk of completion) {
if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
if (chunk.choices[0].delta.audio) {
if (chunk.choices[0].delta.audio["data"]) {
audioString += chunk.choices[0].delta.audio["data"];
}
}
} else {
console.log(chunk.usage);
}
}
// Execute the conversion
convertAudio(audioString, "audio_assistant_mjs.wav");
// Method 2: Generate and play in real time
// Install necessary components according to your system's instructions above.
// import Speaker from 'speaker'; // Import the audio playback library
// // Create a speaker instance (configuration matches WAV file parameters)
// const speaker = new Speaker({
// sampleRate: 24000, // Sample rate
// channels: 1, // Number of sound channels
// bitDepth: 16, // Bit depth
// signed: true // Signed PCM
// });
// for await (const chunk of completion) {
// if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
// if (chunk.choices[0].delta.audio) {
// if (chunk.choices[0].delta.audio["data"]) {
// const pcmBuffer = Buffer.from(chunk.choices[0].delta.audio.data, 'base64');
// // Write directly to the speaker for playback
// speaker.write(pcmBuffer);
// }
// }
// } else {
// console.log(chunk.usage);
// }
// }
// speaker.on('finish', () => console.log('Playback complete'));
// speaker.end(); // Call based on the actual end of the API stream
Send local files with Base64 encoding
Images
This example uses the locally saved file eagle.png.
import os
from openai import OpenAI
import base64
client = OpenAI(
# API keys differ between Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
api_key=os.getenv("DASHSCOPE_API_KEY"),
# The following URL is for the Singapore region. If you use a model in the Beijing region, replace it with: https://dashscope.aliyuncs.com/compatible-mode/v1
base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
# Base64 encoding format
def encode_image(image_path):
with open(image_path, "rb") as image_file:
return base64.b64encode(image_file.read()).decode("utf-8")
base64_image = encode_image("eagle.png")
completion = client.chat.completions.create(
model="qwen3.5-omni-plus", # For Qwen3-Omni-Flash, run in non-thinking mode.
messages=[
{
"role": "user",
"content": [
{
"type": "image_url",
"image_url": {"url": f"data:image/png;base64,{base64_image}"},
},
{"type": "text", "text": "What scene is depicted in the image?"},
],
},
],
# Set the output modality. Two options are currently supported: ["text","audio"] and ["text"]
modalities=["text", "audio"],
audio={"voice": "Tina", "format": "wav"},
# stream must be set to True, otherwise an error occurs.
stream=True,
stream_options={"include_usage": True},
)
for chunk in completion:
if chunk.choices:
print(chunk.choices[0].delta)
else:
        print(chunk.usage)
import OpenAI from "openai";
import { readFileSync } from 'fs';
const openai = new OpenAI(
{
// API keys differ between Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
apiKey: process.env.DASHSCOPE_API_KEY,
// The following URL is for the Singapore region. If you use a model in the Beijing region, replace it with: https://dashscope.aliyuncs.com/compatible-mode/v1
baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
}
);
const encodeImage = (imagePath) => {
const imageFile = readFileSync(imagePath);
return imageFile.toString('base64');
};
const base64Image = encodeImage("eagle.png")
const completion = await openai.chat.completions.create({
model: "qwen3.5-omni-plus",// For Qwen3-Omni-Flash, run in non-thinking mode.
messages: [
{
"role": "user",
"content": [{
"type": "image_url",
"image_url": { "url": `data:image/png;base64,${base64Image}` },
},
{ "type": "text", "text": "What scene is depicted in the image?" }]
}],
stream: true,
stream_options: {
include_usage: true
},
modalities: ["text", "audio"],
audio: { voice: "Tina", format: "wav" }
});
for await (const chunk of completion) {
if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
console.log(chunk.choices[0].delta);
} else {
console.log(chunk.usage);
}
}
Audio
This example uses the locally saved file welcome.mp3.
import os
from openai import OpenAI
import base64
import numpy as np
import soundfile as sf
import requests
client = OpenAI(
# API keys differ between Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
api_key=os.getenv("DASHSCOPE_API_KEY"),
# The following URL is for the Singapore region. If you use a model in the Beijing region, replace it with: https://dashscope.aliyuncs.com/compatible-mode/v1
base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
def encode_audio(audio_path):
with open(audio_path, "rb") as audio_file:
return base64.b64encode(audio_file.read()).decode("utf-8")
base64_audio = encode_audio("welcome.mp3")
completion = client.chat.completions.create(
model="qwen3.5-omni-plus", # For Qwen3-Omni-Flash, run in non-thinking mode.
messages=[
{
"role": "user",
"content": [
{
"type": "input_audio",
"input_audio": {
"data": f"data:;base64,{base64_audio}",
"format": "mp3",
},
},
{"type": "text", "text": "What is this audio about"},
],
},
],
# Set the output modality. Two options are currently supported: ["text","audio"] and ["text"]
modalities=["text", "audio"],
audio={"voice": "Tina", "format": "wav"},
# stream must be set to True, otherwise an error occurs.
stream=True,
stream_options={"include_usage": True},
)
for chunk in completion:
if chunk.choices:
print(chunk.choices[0].delta)
else:
        print(chunk.usage)
import OpenAI from "openai";
import { readFileSync } from 'fs';
const openai = new OpenAI(
{
// API keys differ between Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
apiKey: process.env.DASHSCOPE_API_KEY,
// The following URL is for the Singapore region. If you use a model in the Beijing region, replace it with: https://dashscope.aliyuncs.com/compatible-mode/v1
baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
}
);
const encodeAudio = (audioPath) => {
const audioFile = readFileSync(audioPath);
return audioFile.toString('base64');
};
const base64Audio = encodeAudio("welcome.mp3")
const completion = await openai.chat.completions.create({
model: "qwen3.5-omni-plus", // For Qwen3-Omni-Flash, run in non-thinking mode.
messages: [
{
"role": "user",
"content": [{
"type": "input_audio",
"input_audio": { "data": `data:;base64,${base64Audio}`, "format": "mp3" },
},
{ "type": "text", "text": "What is this audio about" }]
}],
stream: true,
stream_options: {
include_usage: true
},
modalities: ["text", "audio"],
audio: { voice: "Tina", format: "wav" }
});
for await (const chunk of completion) {
if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
console.log(chunk.choices[0].delta);
} else {
console.log(chunk.usage);
}
}
Video
Video file
This example uses the locally saved file spring_mountain.mp4.
import os
from openai import OpenAI
import base64
import numpy as np
import soundfile as sf
client = OpenAI(
# API keys differ between Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
api_key=os.getenv("DASHSCOPE_API_KEY"),
# The following URL is for the Singapore region. If you use a model in the Beijing region, replace it with: https://dashscope.aliyuncs.com/compatible-mode/v1
base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
# Base64 encoding format
def encode_video(video_path):
with open(video_path, "rb") as video_file:
return base64.b64encode(video_file.read()).decode("utf-8")
base64_video = encode_video("spring_mountain.mp4")
completion = client.chat.completions.create(
model="qwen3.5-omni-plus", # For Qwen3-Omni-Flash, run in non-thinking mode.
messages=[
{
"role": "user",
"content": [
{
"type": "video_url",
"video_url": {"url": f"data:;base64,{base64_video}"},
},
{"type": "text", "text": "What is she singing?"},
],
},
],
# Set the output modality. Two options are currently supported: ["text","audio"] and ["text"]
modalities=["text", "audio"],
audio={"voice": "Tina", "format": "wav"},
# stream must be set to True, otherwise an error occurs.
stream=True,
stream_options={"include_usage": True},
)
for chunk in completion:
if chunk.choices:
print(chunk.choices[0].delta)
else:
        print(chunk.usage)
import OpenAI from "openai";
import { readFileSync } from 'fs';
const openai = new OpenAI(
{
// API keys differ between Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
apiKey: process.env.DASHSCOPE_API_KEY,
// The following URL is for the Singapore region. If you use a model in the Beijing region, replace it with: https://dashscope.aliyuncs.com/compatible-mode/v1
baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
}
);
const encodeVideo = (videoPath) => {
const videoFile = readFileSync(videoPath);
return videoFile.toString('base64');
};
const base64Video = encodeVideo("spring_mountain.mp4")
const completion = await openai.chat.completions.create({
model: "qwen3.5-omni-plus", // For Qwen3-Omni-Flash, run in non-thinking mode.
messages: [
{
"role": "user",
"content": [{
"type": "video_url",
"video_url": { "url": `data:;base64,${base64Video}` },
},
{ "type": "text", "text": "What is she singing?" }]
}],
stream: true,
stream_options: {
include_usage: true
},
modalities: ["text", "audio"],
audio: { voice: "Tina", format: "wav" }
});
for await (const chunk of completion) {
if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
console.log(chunk.choices[0].delta);
} else {
console.log(chunk.usage);
}
}
Image list
This example uses the locally saved files football1.jpg, football2.jpg, football3.jpg, and football4.jpg.
import os
from openai import OpenAI
import base64
import numpy as np
import soundfile as sf
client = OpenAI(
# API keys differ between Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
api_key=os.getenv("DASHSCOPE_API_KEY"),
# The following URL is for the Singapore region. If you use a model in the Beijing region, replace it with: https://dashscope.aliyuncs.com/compatible-mode/v1
base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
# Base64 encoding format
def encode_image(image_path):
with open(image_path, "rb") as image_file:
return base64.b64encode(image_file.read()).decode("utf-8")
base64_image_1 = encode_image("football1.jpg")
base64_image_2 = encode_image("football2.jpg")
base64_image_3 = encode_image("football3.jpg")
base64_image_4 = encode_image("football4.jpg")
completion = client.chat.completions.create(
model="qwen3.5-omni-plus", # For Qwen3-Omni-Flash, run in non-thinking mode.
messages=[
{
"role": "user",
"content": [
{
"type": "video",
"video": [
f"data:image/jpeg;base64,{base64_image_1}",
f"data:image/jpeg;base64,{base64_image_2}",
f"data:image/jpeg;base64,{base64_image_3}",
f"data:image/jpeg;base64,{base64_image_4}",
],
},
{"type": "text", "text": "Describe the process shown in this video"},
],
}
],
# Set the output modality. Two options are currently supported: ["text","audio"] and ["text"]
modalities=["text", "audio"],
audio={"voice": "Tina", "format": "wav"},
# stream must be set to True, otherwise an error occurs.
stream=True,
stream_options={"include_usage": True},
)
for chunk in completion:
if chunk.choices:
print(chunk.choices[0].delta)
else:
        print(chunk.usage)
import OpenAI from "openai";
import { readFileSync } from 'fs';
const openai = new OpenAI(
{
// API keys differ between Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
apiKey: process.env.DASHSCOPE_API_KEY,
// The following URL is for the Singapore region. If you use a model in the Beijing region, replace it with: https://dashscope.aliyuncs.com/compatible-mode/v1
baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
}
);
const encodeImage = (imagePath) => {
const imageFile = readFileSync(imagePath);
return imageFile.toString('base64');
};
const base64Image1 = encodeImage("football1.jpg")
const base64Image2 = encodeImage("football2.jpg")
const base64Image3 = encodeImage("football3.jpg")
const base64Image4 = encodeImage("football4.jpg")
const completion = await openai.chat.completions.create({
model: "qwen3.5-omni-plus", // When the model is Qwen3-Omni-Flash, run it in non-thinking mode.
messages: [{
role: "user",
content: [
{
type: "video",
video: [
`data:image/jpeg;base64,${base64Image1}`,
`data:image/jpeg;base64,${base64Image2}`,
`data:image/jpeg;base64,${base64Image3}`,
`data:image/jpeg;base64,${base64Image4}`
]
},
{
type: "text",
text: "Describe the specific procedure of this video."
}
]
}],
stream: true,
stream_options: {
include_usage: true
},
modalities: ["text", "audio"],
audio: { voice: "Tina", format: "wav" }
});
for await (const chunk of completion) {
if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
console.log(chunk.choices[0].delta);
} else {
console.log(chunk.usage);
    }
}
API reference
For input and output parameters, see the Qwen API reference.
Billing and rate limits
Billing rules
Qwen-Omni is billed based on tokens consumed across modalities (audio, image, and video). For pricing, see Models.
Free quota
To claim, query, and use your free quota, see Free quota for new users.
Rate limits
For rate limit rules and FAQ, see Rate limiting.
Error codes
If the model call fails and returns an error message, see Error messages for resolution.
Voice list
See Voice list.