
Alibaba Cloud Model Studio: Qwen-Omni

Updated: Mar 31, 2026

The Qwen-Omni models accept text combined with one other modality (image, audio, or video) as input, and generate replies as text or speech. They offer a range of lifelike voices and support speech output in multiple languages and dialects, making them suitable for content moderation, text creation, visual recognition, and audio/video interactive assistants.

Supported regions: Singapore and Beijing. Each region requires its own API key.

Quick start

Prerequisites

The following sample sends a piece of text to the Qwen-Omni API and streams back a text and audio reply.

import os
import base64
import soundfile as sf
import numpy as np
from openai import OpenAI

# 1. Initialize the client
client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),  # make sure the environment variable is set
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

# 2. Send the request
try:
    completion = client.chat.completions.create(
        model="qwen3.5-omni-plus",
        messages=[{"role": "user", "content": "Who are you?"}],
        modalities=["text", "audio"],  # request both text and audio output
        audio={"voice": "Tina", "format": "wav"},
        stream=True,  # must be set to True
        stream_options={"include_usage": True},
    )

    # 3. Process the streaming response and decode the audio
    print("Model reply:")
    audio_base64_string = ""
    for chunk in completion:
        # Handle the text part
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="")

        # Collect the audio part
        if chunk.choices and hasattr(chunk.choices[0].delta, "audio") and chunk.choices[0].delta.audio:
            audio_base64_string += chunk.choices[0].delta.audio.get("data", "")

    # 4. Save the audio file
    if audio_base64_string:
        wav_bytes = base64.b64decode(audio_base64_string)
        audio_np = np.frombuffer(wav_bytes, dtype=np.int16)
        sf.write("audio_assistant.wav", audio_np, samplerate=24000)
        print("\nAudio file saved to: audio_assistant.wav")

except Exception as e:
    print(f"Request failed: {e}")
// Preparation before running:
// Works on Windows/macOS/Linux:
// 1. Make sure Node.js is installed (version >= 14 recommended)
// 2. Install the required dependencies:
//    npm install openai wav

import OpenAI from "openai";
import { createWriteStream } from 'node:fs';
import { Writer } from 'wav';

// Audio conversion helper: decode a Base64 string and save it as a standard WAV file
async function convertAudio(audioString, audioPath) {
    try {
        // Decode the Base64 string into a Buffer
        const wavBuffer = Buffer.from(audioString, 'base64');
        // Create a WAV file writer stream
        const writer = new Writer({
            sampleRate: 24000,  // sample rate
            channels: 1,        // mono
            bitDepth: 16        // 16-bit depth
        });
        // Create the output file stream and pipe the writer into it
        const outputStream = createWriteStream(audioPath);
        writer.pipe(outputStream);

        // Write the PCM data and end the stream
        writer.write(wavBuffer);
        writer.end();

        // Use a Promise to wait for the file write to finish
        await new Promise((resolve, reject) => {
            outputStream.on('finish', resolve);
            outputStream.on('error', reject);
        });

        // Extra wait to make sure the audio is complete
        await new Promise(resolve => setTimeout(resolve, 800));

        console.log(`\nAudio file saved as ${audioPath}`);
    } catch (error) {
        console.error('Error during processing:', error);
    }
}

//  1. Initialize the client
const openai = new OpenAI(
    {
        // API keys differ between the Singapore and Beijing regions. Obtain an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
        apiKey: process.env.DASHSCOPE_API_KEY,
        // The URL below is for the Singapore region. For models in the Beijing region, replace it with: https://dashscope.aliyuncs.com/compatible-mode/v1
        baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
    }
);
// 2. Send the request
const completion = await openai.chat.completions.create({
    model: "qwen3.5-omni-plus",  
    messages: [
        {
            "role": "user",
            "content": "Who are you?"
        }],
    stream: true,
    stream_options: {
        include_usage: true
    },
    modalities: ["text", "audio"],
    audio: { voice: "Tina", format: "wav" }
});

let audioString = "";
console.log("Model reply:");

// 3. Process the streaming response and decode the audio
for await (const chunk of completion) {
    if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
        // Handle the text content
        if (chunk.choices[0].delta.content) {
            process.stdout.write(chunk.choices[0].delta.content);
        }
        // Handle the audio content
        if (chunk.choices[0].delta.audio) {
            if (chunk.choices[0].delta.audio["data"]) {
                audioString += chunk.choices[0].delta.audio["data"];
            }
        }
    }
}
// 4. Save the audio file
convertAudio(audioString, "audio_assistant.wav");
# ======= Important =======
# API keys differ between the Singapore and Beijing regions. Obtain an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
# The URL below is for the Singapore region. For models in the Beijing region, replace it with: https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
# === Remove these comments before running ===

curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen3.5-omni-plus",
    "messages": [
        {
            "role": "user", 
            "content": "Who are you?"
        }
    ],
    "stream":true,
    "stream_options":{
        "include_usage":true
    },
    "modalities":["text","audio"],
    "audio":{"voice":"Tina","format":"wav"}
}'

Sample response

After running the Python or Node.js code, the model's text reply is printed to the console, and an audio file named audio_assistant.wav is written to the code file's directory.

Model reply:
I am Qwen, a large-scale language model developed by Alibaba Cloud. How can I help you?

Running the HTTP code returns the text and the Base64-encoded audio data (the audio field) directly.

data: {"choices":[{"delta":{"content":"I"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1757647879,"system_fingerprint":null,"model":"qwen3.5-omni-plus","id":"chatcmpl-a68eca3b-c67e-4666-a72f-73c0b4919860"}
data: {"choices":[{"delta":{"content":" am"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1757647879,"system_fingerprint":null,"model":"qwen3.5-omni-plus","id":"chatcmpl-a68eca3b-c67e-4666-a72f-73c0b4919860"}
......
data: {"choices":[{"delta":{"audio":{"data":"/v8AAAAAAAAAAAAAAA...","expires_at":1757647879,"id":"audio_a68eca3b-c67e-4666-a72f-73c0b4919860"}},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1757647879,"system_fingerprint":null,"model":"qwen3.5-omni-plus","id":"chatcmpl-a68eca3b-c67e-4666-a72f-73c0b4919860"}
data: {"choices":[{"finish_reason":"stop","delta":{"content":""},"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1764763585,"system_fingerprint":null,"model":"qwen3.5-omni-plus","id":"chatcmpl-e8c82e9e-073e-4289-a786-a20eb444ac9c"}
data: {"choices":[],"object":"chat.completion.chunk","usage":{"prompt_tokens":207,"completion_tokens":103,"total_tokens":310,"completion_tokens_details":{"audio_tokens":83,"text_tokens":20},"prompt_tokens_details":{"text_tokens":207}},"created":1757940330,"system_fingerprint":null,"model":"qwen3.5-omni-plus","id":"chatcmpl-9cdd5a26-f9e9-4eff-9dcc-93a878165afc"}

Model selection

  • Qwen3.5-Omni series: suited for long-video analysis, meeting minutes, subtitle generation, content moderation, and audio/video interaction.

    • Input limits: up to 3 hours of audio or 1 hour of video

    • Audio control: volume, speaking rate, and emotion can be adjusted via instructions

    • Visual capability: on par with Qwen3.5; understands visuals, speech, sound effects, and other multimodal information

  • Qwen3-Omni-Flash series: suited for short-video analysis and cost-sensitive scenarios.

    • Input limit: audio/video up to 150 seconds

    • Thinking mode: the only series in the Qwen-Omni family that supports thinking mode

  • Qwen-Omni-Turbo series

    No longer updated and limited in functionality; migrate to the Qwen3.5-Omni or Qwen3-Omni-Flash series.

Model series

  • Qwen3.5-Omni (latest-generation omni-modal model)

    • Deep thinking: not supported

    • Input languages: 113 (92 languages and 21 dialects)

      Languages: Chinese, English, German, French, Italian, Czech, Indonesian, Thai, Korean, Polish, Japanese, Vietnamese, Finnish, Portuguese, Spanish, Dutch, Russian, Malay, Catalan, Swedish, Turkish, Ukrainian, Romanian, Slovak, Danish, Icelandic, Norwegian (Bokmål), Macedonian, Greek, Hungarian, Galician, Filipino, Croatian, Bosnian, Slovenian, Bulgarian, Kazakh, Belarusian, Latvian, Estonian, Azerbaijani, Uyghur, Swahili, Hindi, Esperanto, Kyrgyz, Tajik, Cebuano, Afrikaans, Arabic, Lithuanian, Javanese, Bengali, Persian, Hebrew, Punjabi, Gujarati, Mongolian, Asturian, Kannada, Marathi, Interlingua, Malayalam, Maltese, Norwegian Nynorsk, Telugu, Urdu, Georgian, Basque, Tamil, Odia, Serbian, Maori

      Dialects: Northeastern Mandarin, Guizhou, Cantonese, Henan, Hong Kong Cantonese, Shanghainese, Shaanxi, Tianjin, Taiwanese, Yunnan, Anhui, Fujian, Gansu, Guangdong, Hubei, Hunan, Jiangxi, Shandong, Shanxi, Sichuan, Guangxi, Hainan, Chongqing, Changsha, Hangzhou, Hefei, Yinchuan, Zhengzhou, Shenyang, Wenzhou, Wuhan, Kunming, Taiyuan, Nanchang, Jinan, Lanzhou, Nanjing, Hakka, Hokkien

    • Output audio languages: 36 (29 languages and 7 dialects)

      Languages: Chinese, English, German, Italian, Portuguese, Spanish, Japanese, Korean, French, Russian, Thai, Indonesian, Arabic, Vietnamese, Turkish, Finnish, Polish, Hindi, Dutch, Czech, Urdu, Tagalog, Swedish, Danish, Hebrew, Icelandic, Malay, Norwegian, Persian

      Dialects: Sichuan, Beijing, Tianjin, Nanjing, Shaanxi, Cantonese, Hokkien

    • Voices: 55

  • Qwen3-Omni-Flash (hybrid thinking model)

    • Audio/video description capability: weaker

    • Deep thinking: supported

    • Input languages: 19 (10 languages and 9 dialects)

      Languages: Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

      Dialects: Sichuan, Shanghainese, Cantonese, Hokkien, Shaanxi, Nanjing, Tianjin, Beijing

    • Output audio languages: 19 (10 languages and 9 dialects), same as the input list

    • Voices: 17–49, varies by version

  • Qwen-Omni-Turbo (no longer updated)

    • Deep thinking: not supported

    • Input languages: Chinese, English

    • Output audio languages: Chinese, English

    • Voices: 4

For model names, context windows, pricing, and snapshot versions, see the model list; for rate limits, see Rate limits.

Model performance

Audio/video content analysis

Generate a comprehensive description of this video with timestamps.

00:00.000 – 00:02.500

A rain-soaked city street fills the widescreen frame. Under long exposure, red and blue car lights streak past as trails of light, shimmering on the wet road. A lone man in a dark knee-length trench coat over a light shirt and tie walks toward the camera along the right-hand sidewalk. Raindrops bead on his shoulders and hair; each step lands with a dull slap on the wet concrete. A low, ominous electronic hum underpins the ambience while steady rain patters all around. The brick wall beside him is covered in colorful graffiti, and a few neon signs glow farther down the block: one clearly shows "Nice" in orange-red cursive, and the end of a vertical sign reveals the letters "AT".

......

00:12.300 – 00:14.533

The camera tilts down. Within the flashlight's beam, a yellowed, aged sheet of paper lies in a shallow puddle, its edges scorched and its corners curled. Paragraphs of ink cover most of the surface, but moisture and burn marks render the text illegible. The man kneels, and his gloved fingers reach for the document; ripples spread outward as his hand nears the water. The siren peaks and then begins to fade slightly while the ambient hum holds steady. The frame freezes just before the trembling page is touched, and the clip ends.

Only part of the timestamped content is shown.

Multi-dialect understanding

Give each dialect singer's vocal characteristics, their emotional state while singing, and the corresponding full lyrics.

This audio features rappers from different regions of China performing in their local dialects, showcasing the rich diversity of Chinese dialects. Below are the vocal characteristics, emotional state, and corresponding lyrics for each singer:

Hangzhou dialect (Wu Chinese)

Vocal characteristics:

This male singer's voice is relatively soft, with the gentle, mellow quality typical of Wu Chinese. His intonation rises and falls smoothly and his rhythm is light and brisk, giving a relaxed, narrative feel.

Emotional state:

Leisurely, contented, and full of everyday life. He depicts Hangzhou's slow pace of living and the city's transformation; the mood carries both attachment to his hometown and reflections on modern development, and feels warm and familiar overall.

Corresponding lyrics:

> 杭州話,聽聽看,我們格文化,包括家門口的兒化音

> 像花開在西湖邊,不搞文化調查,只有我們搞得懂

> 你們都不懂,三不搭門,格兒,老兒,粉兒,伢兒,這外人分得清

> 出發,變化快,地鐵交通發達,據說要去看看

> 包容新馬泰,看十三太,為城市驕傲,性格直給

> 感情你只為過橋,而且味道不一樣,你不來嘞

Only part of the output is shown.

Lyric subtitle generation

Identify the lyrics of this song and give a timestamp for each line, in the following format: [00:00:15,020 --> 00:00:28,085] When you walk through a storm, hold your head up high. [00:00:28,085 --> 00:00:40,200] And don't be afraid of the dark. ......

[00:00:12,680 --> 00:00:16,960] 貓線繞過樹搖晃的月光

[00:00:18,400 --> 00:00:22,800] 暖氣片哼著九八年排行

[00:00:24,160 --> 00:00:28,080] 時間撥開雲霧般的熱浪

[00:00:28,920 --> 00:00:33,000] 螢幕裡的霓虹曬在鼻樑

......

[00:03:16,720 --> 00:03:21,680] 我們窩在年輪最柔軟一牆

[00:03:22,400 --> 00:03:27,000] 呼吸被餘溫釀成蜂蜜糖

[00:03:28,160 --> 00:03:33,200] 沙發陷落成雲絮的形狀

[00:03:34,000 --> 00:03:38,800] 每個毛孔都曬著晴朗

[00:04:09,000 --> 00:04:10,020] (End)

Only part of the output is shown.

Audio/video coding

Usage notes

Streaming output

All requests to Qwen-Omni models must set stream=True.

Model configuration

Choosing appropriate parameters, prompts, and audio/video lengths for your scenario balances cost, speed, and quality.

Audio/video understanding

  • Quick moderation, low cost: video length ≤60 minutes; a simple prompt within 50 words; recommended max_pixels: 230,400

  • Content extraction (long videos, processed in segments): video length ≤60 minutes; recommended max_pixels: 921,600–2,073,600

  • Standard analysis (short-video tagging): video length ≤4 minutes; use the structured prompt below; recommended max_pixels: 921,600–2,073,600

  • Fine-grained analysis (multiple speakers or complex scenes): video length ≤2 minutes; recommended max_pixels: 2,073,600

Suggested prompt

Provide a detailed description of the video.
It should explicitly include three sections: 
1. A structured chronological storyline of **every noticeable audio and visual details**
2. A structured list of all visible text. For each text element, include start timestamp, end timestamp, the exact text content, the appearance characteristics. If no text appears, explicitly state so.
3. A structured speech-to-text transcription, include speaker(Corresponding to the character or voice‑over in Section 1, including their accent and tone), exact spoken content, start timestamp, end timestamp, and speaking state (prosody, emotion, and style). If no speech appears, explicitly state so.
Aside from these three required sections, you are free to organize any additional content in any way you find helpful. This additional content can include global information about the entire video or localized information about specific moments. You may choose the topic of this extra content freely.
Output Format:
```
## Storyline
<xx:xx.xxx> - <xx:xx.xxx>
<an unstructured long paragraph in natural language describing what happened during this period, blending both audio and video details.>
<xx:xx.xxx> - <xx:xx.xxx>
<an unstructured long paragraph in natural language describing what happened during this period, blending both audio and video details.>
<xx:xx.xxx> - <xx:xx.xxx>
<an unstructured long paragraph in natural language describing what happened during this period, blending both audio and video details.>
...
## Visible Text
<xx:xx.xxx> - <xx:xx.xxx>
“<element>”: <appearance>
“<element>”: <appearance>
<xx:xx.xxx> - <xx:xx.xxx>
“<element>”: <appearance>
“<element>”: <appearance>
“<element>”: <appearance>
<xx:xx.xxx> - <xx:xx.xxx>
“<element>”: <appearance>
...
## Speakers and Transcript
Speaker profiles:
<speaker> - <profile>
<speaker> - <profile>
<speaker> - <profile>
...
<xx:xx.xxx> - <xx:xx.xxx>
Speaker: <speaker>
State: <description>
Content: “<content>”
<xx:xx.xxx> - <xx:xx.xxx>
Speaker: <speaker>
State: <description>
Content: “<content>”
<xx:xx.xxx> - <xx:xx.xxx>
Speaker: <speaker>
State: <description>
Content: “<content>”
...
## <another section>
<paragraphs>
## <another section>
<paragraphs>
...
```

Note

For fine-grained descriptions of long videos, process them in segments.

Audio understanding

Balance cost and quality by controlling audio length and prompt complexity.

  • Quick moderation, low cost: audio length ≤60 minutes; a simple prompt within 50 words

  • Content extraction (long audio, processed in segments): audio length ≤60 minutes

  • Standard analysis (audio tagging): audio length ≤2 minutes; use the structured prompt below

  • Fine-grained analysis (multiple speakers or complex scenes): audio length ≤1 minute

Structured prompt

Provide a detailed description of the audio.

It should explicitly include two sections: 

1. A structured chronological storyline of **every noticeable audio details**
2. A structured speech-to-text transcription, include speaker(Corresponding to the character or voice‑over in Section 1, including their accent and tone), exact spoken content, start timestamp, end timestamp, and speaking state (prosody, emotion, and style). If no speech appears, explicitly state so.

Aside from these two required components, you are free to organize any additional content in any way you find helpful. This additional content can include global information about the entire audio or localized information about specific moments. You may choose the topic of this extra content freely.

Output Format:

```
## Storyline

<xx:xx.xxx> - <xx:xx.xxx>
<an unstructured long paragraph in natural language describing what happened during this period, blending both audio details.>

<xx:xx.xxx> - <xx:xx.xxx>
<an unstructured long paragraph in natural language describing what happened during this period, blending both audio details.>

<xx:xx.xxx> - <xx:xx.xxx>
<an unstructured long paragraph in natural language describing what happened during this period, blending both audio details.>

...

## Speakers and Transcript

Speaker profiles:
<speaker> - <profile>
<speaker> - <profile>
<speaker> - <profile>
...

<xx:xx.xxx> - <xx:xx.xxx>
Speaker: <speaker>
State: <description>
Content: “<content>”

<xx:xx.xxx> - <xx:xx.xxx>
Speaker: <speaker>
State: <description>
Content: “<content>”

<xx:xx.xxx> - <xx:xx.xxx>
Speaker: <speaker>
State: <description>
Content: “<content>”

...

## <another section>

<paragraphs>

## <another section>

<paragraphs>

...
```

Note

For fine-grained descriptions of long audio, process it in segments.

Multimodal input

Video + text input

A video can be passed in either as a list of images or as a video file (audio in the video file can be understood).

Video file (audio in the video can be understood)

  • Number of files:

    • Qwen3.5-Omni series: up to 512 via public URL, or up to 250 via Base64 encoding.

    • Qwen3-Omni-Flash and Qwen-Omni-Turbo series: only one file.

  • File size:

    • Qwen3.5-Omni: up to 2 GB, duration up to 1 hour.

    • Qwen3-Omni-Flash: up to 256 MB, duration up to 150 s.

    • Qwen-Omni-Turbo: up to 150 MB, duration up to 40 s.

  • File formats: MP4, AVI, MKV, MOV, FLV, WMV, etc.

  • Visual and audio information in a video file are billed separately.
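
For local files, Base64 encoding is used instead of a public URL. A minimal sketch of building such a payload, assuming the common data-URL form (data:<MIME>;base64,<data>); the helper name and MIME type are illustrative, so check the Base64-input reference for the exact field expected by the API:

```python
import base64

def video_data_url(path: str, mime: str = "video/mp4") -> str:
    # Read the local file and wrap its Base64 encoding in a data URL
    # (hypothetical helper; verify the exact format in the Base64-input docs).
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

The resulting string would then take the place of the public URL in the "video_url": {"url": ...} content part. Mind the Base64 size limits listed above.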

OpenAI compatible

import os
from openai import OpenAI

client = OpenAI(
    # API keys differ between the Singapore and Beijing regions. Obtain an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # The URL below is for the Singapore region. For models in the Beijing region, replace it with: https://dashscope.aliyuncs.com/compatible-mode/v1
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

completion = client.chat.completions.create(
    model="qwen3.5-omni-plus", # for Qwen3-Omni-Flash, run in non-thinking mode
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "video_url",
                    "video_url": {
                        "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241115/cqqkru/1.mp4"
                    },
                },
                {"type": "text", "text": "What is the content of this video?"},
            ],
        },
    ],
    # Set the output modalities; currently supports ["text","audio"] or ["text"]
    modalities=["text", "audio"],
    audio={"voice": "Tina", "format": "wav"},
    # stream must be set to True, otherwise an error is raised
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in completion:
    if chunk.choices:
        print(chunk.choices[0].delta)
    else:
        print(chunk.usage)
import OpenAI from "openai";

const openai = new OpenAI(
    {
        // API keys differ between the Singapore and Beijing regions. Obtain an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
        apiKey: process.env.DASHSCOPE_API_KEY,
        // The URL below is for the Singapore region. For models in the Beijing region, replace it with: https://dashscope.aliyuncs.com/compatible-mode/v1
        baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
    }
);
const completion = await openai.chat.completions.create({
    model: "qwen3.5-omni-plus", // for Qwen3-Omni-Flash, run in non-thinking mode
    messages: [
        {
            "role": "user",
            "content": [{
                "type": "video_url",
                "video_url": { "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241115/cqqkru/1.mp4" },
            },
            { "type": "text", "text": "What is the content of this video?" }]
        }],
    stream: true,
    stream_options: {
        include_usage: true
    },
    modalities: ["text", "audio"],
    audio: { voice: "Tina", format: "wav" }
});


for await (const chunk of completion) {
    if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
        console.log(chunk.choices[0].delta);
    } else {
        console.log(chunk.usage);
    }
}
# ======= Important =======
# API keys differ between the Singapore and Beijing regions. Obtain an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
# The URL below is for the Singapore region. For models in the Beijing region, replace it with: https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
# === Remove these comments before running ===

curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen3.5-omni-plus",
    "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "video_url",
          "video_url": {
            "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241115/cqqkru/1.mp4"
          }
        },
        {
          "type": "text",
          "text": "What is the content of this video?"
        }
      ]
    }
  ],
    "stream":true,
    "stream_options": {
        "include_usage": true
    },
    "modalities":["text","audio"],
    "audio":{"voice":"Tina","format":"wav"}
}'

As a list of images

Number of images

  • Qwen3.5-Omni: minimum 2 images, up to 2,048 images

  • Qwen3-Omni-Flash: minimum 2 images, up to 128 images

  • Qwen-Omni-Turbo: minimum 4 images, up to 80 images

OpenAI compatible

import os
from openai import OpenAI

client = OpenAI(
    # API keys differ between the Singapore and Beijing regions. Obtain an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # The URL below is for the Singapore region. For models in the Beijing region, replace it with: https://dashscope.aliyuncs.com/compatible-mode/v1
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

completion = client.chat.completions.create(
    model="qwen3.5-omni-plus", # for Qwen3-Omni-Flash, run in non-thinking mode
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "video",
                    "video": [
                        "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/xzsgiz/football1.jpg",
                        "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/tdescd/football2.jpg",
                        "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/zefdja/football3.jpg",
                        "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/aedbqh/football4.jpg",
                    ],
                },
                {"type": "text", "text": "Describe the specific process in this video"},
            ],
        }
    ],
    # Set the output modalities; currently supports ["text","audio"] or ["text"]
    modalities=["text", "audio"],
    audio={"voice": "Tina", "format": "wav"},
    # stream must be set to True, otherwise an error is raised
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in completion:
    if chunk.choices:
        print(chunk.choices[0].delta)
    else:
        print(chunk.usage)
import OpenAI from "openai";

const openai = new OpenAI(
    {
        // API keys differ between the Singapore and Beijing regions. Obtain an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
        apiKey: process.env.DASHSCOPE_API_KEY,
        // The URL below is for the Singapore region. For models in the Beijing region, replace it with: https://dashscope.aliyuncs.com/compatible-mode/v1
        baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
    }
);
const completion = await openai.chat.completions.create({
    model: "qwen3.5-omni-plus", // for Qwen3-Omni-Flash, run in non-thinking mode
    messages: [{
        role: "user",
        content: [
            {
                type: "video",
                video: [
                    "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/xzsgiz/football1.jpg",
                    "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/tdescd/football2.jpg",
                    "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/zefdja/football3.jpg",
                    "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/aedbqh/football4.jpg"
                ]
            },
            {
                type: "text",
                text: "Describe the specific process in this video"
            }
        ]
    }],
    stream: true,
    stream_options: {
        include_usage: true
    },
    modalities: ["text", "audio"],
    audio: { voice: "Tina", format: "wav" }
});

for await (const chunk of completion) {
    if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
        console.log(chunk.choices[0].delta);
    } else {
        console.log(chunk.usage);
    }
}
# ======= Important =======
# API keys differ between the Singapore and Beijing regions. Obtain an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
# The URL below is for the Singapore region. For models in the Beijing region, replace it with: https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
# === Remove these comments before running ===

curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen3.5-omni-plus",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "video",
                    "video": [
                        "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/xzsgiz/football1.jpg",
                        "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/tdescd/football2.jpg",
                        "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/zefdja/football3.jpg",
                        "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241108/aedbqh/football4.jpg"
                    ]
                },
                {
                    "type": "text",
                    "text": "Describe the specific process in this video"
                }
            ]
        }
    ],
    "stream": true,
    "stream_options": {
        "include_usage": true
    },
    "modalities": ["text", "audio"],
    "audio": {
        "voice": "Tina",
        "format": "wav"
    }
}'

Audio + text input

  • Number of files:

    • Qwen3.5-Omni series: up to 2,048 via public URL, or up to 250 via Base64 encoding.

    • Qwen3-Omni-Flash and Qwen-Omni-Turbo series: only one file.

  • File size:

    • Qwen3.5-Omni: up to 2 GB, duration up to 3 hours

    • Qwen3-Omni-Flash: up to 100 MB, duration up to 20 minutes

    • Qwen-Omni-Turbo: up to 10 MB, duration up to 3 minutes

  • File formats: mainstream formats such as AMR, WAV, 3GP, 3GPP, AAC, and MP3 are supported

The sample code below passes a public audio URL. To pass a local audio file, see: Input local files as Base64. Currently, calls are only supported with streaming output.
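
For a local audio file, the Base64 route replaces the URL in the data field. A minimal sketch of assembling the input_audio content part this way; it assumes the data-URL form (data:<MIME>;base64,<data>), and the helper name is illustrative, so consult the Base64-input reference for the exact format:

```python
import base64

def audio_content_part(path: str, fmt: str = "wav") -> dict:
    # Hypothetical helper: wrap a local audio file as an OpenAI-compatible
    # input_audio content part, with the payload Base64-encoded in a data URL.
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("ascii")
    return {
        "type": "input_audio",
        "input_audio": {"data": f"data:audio/{fmt};base64,{b64}", "format": fmt},
    }
```

The returned dict would take the place of the input_audio part in the messages below; the size limits above apply to the decoded file.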

OpenAI compatible

import os
from openai import OpenAI

client = OpenAI(
    # API keys differ between the Singapore and Beijing regions. Obtain an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # The URL below is for the Singapore region. For models in the Beijing region, replace it with: https://dashscope.aliyuncs.com/compatible-mode/v1
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

completion = client.chat.completions.create(
    model="qwen3.5-omni-plus", # for Qwen3-Omni-Flash, run in non-thinking mode
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "input_audio",
                    "input_audio": {
                        "data": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250211/tixcef/cherry.wav",
                        "format": "wav",
                    },
                },
                {"type": "text", "text": "What is this audio saying?"},
            ],
        },
    ],
    # Set the output modalities; currently supports ["text","audio"] or ["text"]
    modalities=["text", "audio"],
    audio={"voice": "Tina", "format": "wav"},
    # stream must be set to True, otherwise an error is raised
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in completion:
    if chunk.choices:
        print(chunk.choices[0].delta)
    else:
        print(chunk.usage)
import OpenAI from "openai";

const openai = new OpenAI(
    {
        // API keys differ between the Singapore and Beijing regions. Obtain an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
        apiKey: process.env.DASHSCOPE_API_KEY,
        // The URL below is for the Singapore region. For models in the Beijing region, replace it with: https://dashscope.aliyuncs.com/compatible-mode/v1
        baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
    }
);
const completion = await openai.chat.completions.create({
    model: "qwen3.5-omni-plus", // for Qwen3-Omni-Flash, run in non-thinking mode
    messages: [
        {
            "role": "user",
            "content": [{
                "type": "input_audio",
                "input_audio": { "data": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250211/tixcef/cherry.wav", "format": "wav" },
            },
            { "type": "text", "text": "What is this audio saying?" }]
        }],
    stream: true,
    stream_options: {
        include_usage: true
    },
    modalities: ["text", "audio"],
    audio: { voice: "Tina", format: "wav" }
});

for await (const chunk of completion) {
    if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
        console.log(chunk.choices[0].delta);
    } else {
        console.log(chunk.usage);
    }
}
# ======= Important =======
# API keys differ between the Singapore and Beijing regions. Obtain an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
# The URL below is for the Singapore region. For models in the Beijing region, replace it with: https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
# === Remove these comments before running ===

curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen3.5-omni-plus",
    "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "input_audio",
          "input_audio": {
            "data": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20250211/tixcef/cherry.wav",
            "format": "wav"
          }
        },
        {
          "type": "text",
          "text": "What is this audio saying?"
        }
      ]
    }
  ],
    "stream":true,
    "stream_options":{
        "include_usage":true
    },
    "modalities":["text","audio"],
    "audio":{"voice":"Tina","format":"wav"}
}'

Image + text input

Qwen-Omni models support passing multiple images. Requirements for input images:

  • Number of images:

    • Public URL: up to 2,048 images

    • Base64 encoding: up to 250 images

  • Image size:

    • Qwen3.5 series: each image file must not exceed 20 MB

    • Qwen3-Omni-Flash and Qwen-Omni-Turbo series: each image file must not exceed 10 MB

  • The width and height of each image must both exceed 10 pixels, and the aspect ratio must not exceed 200:1 or 1:200

  • For supported image types, see Image and video understanding

The sample code below passes a public image URL. To pass a local image, see: Input local files as Base64. Currently, calls are only supported with streaming output.
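
The size and aspect-ratio limits above can be checked client-side before uploading. A minimal sketch; the function name is a local helper (not part of the API), and the hard-coded numbers mirror the constraints listed above, with the size limit adjusted per model series:

```python
def image_meets_limits(width: int, height: int, size_bytes: int,
                       max_bytes: int = 20 * 1024 * 1024) -> bool:
    # Both dimensions must exceed 10 px (per the constraints above).
    if width <= 10 or height <= 10:
        return False
    # Aspect ratio must not exceed 200:1 in either direction.
    if width / height > 200 or height / width > 200:
        return False
    # File size: 20 MB for Qwen3.5, 10 MB for Flash/Turbo (pass max_bytes accordingly).
    return size_bytes <= max_bytes

print(image_meets_limits(1920, 1080, 5 * 1024 * 1024))  # True
print(image_meets_limits(4000, 15, 1024))               # False: ratio exceeds 200:1
```

Rejecting oversized or degenerate images locally avoids a round trip that the service would refuse anyway.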

OpenAI compatible

import os
from openai import OpenAI

client = OpenAI(
    # API keys differ between the Singapore and Beijing regions. Obtain an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # The URL below is for the Singapore region. For models in the Beijing region, replace it with: https://dashscope.aliyuncs.com/compatible-mode/v1
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

completion = client.chat.completions.create(
    model="qwen3.5-omni-plus", # for Qwen3-Omni-Flash, run in non-thinking mode
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"
                    },
                },
                {"type": "text", "text": "What scene does this image depict?"},
            ],
        },
    ],
    # Set the output modalities; currently supports ["text","audio"] or ["text"]
    modalities=["text", "audio"],
    audio={"voice": "Tina", "format": "wav"},
    # stream must be set to True, otherwise an error is raised
    stream=True,
    stream_options={
        "include_usage": True
    }
)

for chunk in completion:
    if chunk.choices:
        print(chunk.choices[0].delta)
    else:
        print(chunk.usage)
import OpenAI from "openai";

const openai = new OpenAI(
    {
        // API keys differ between the Singapore and Beijing regions. Obtain an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
        apiKey: process.env.DASHSCOPE_API_KEY,
        // The URL below is for the Singapore region. For models in the Beijing region, replace it with: https://dashscope.aliyuncs.com/compatible-mode/v1
        baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
    }
);
const completion = await openai.chat.completions.create({
    model: "qwen3.5-omni-plus", // for Qwen3-Omni-Flash, run in non-thinking mode
    messages: [
        {
            "role": "user",
            "content": [{
                "type": "image_url",
                "image_url": { "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg" },
            },
            { "type": "text", "text": "What scene does this image depict?" }]
        }],
    stream: true,
    stream_options: {
        include_usage: true
    },
    modalities: ["text", "audio"],
    audio: { voice: "Tina", format: "wav" }
});

for await (const chunk of completion) {
    if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
        console.log(chunk.choices[0].delta);
    } else {
        console.log(chunk.usage);
    }
}
# ======= Important =======
# API keys differ between the Singapore and Beijing regions. Obtain an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
# The URL below is for the Singapore region. For models in the Beijing region, replace it with: https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
# === Remove these comments before running ===


curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen3.5-omni-plus",
    "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "image_url",
          "image_url": {
            "url": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20241022/emyrja/dog_and_girl.jpeg"
          }
        },
        {
          "type": "text",
          "text": "What scene does this image depict?"
        }
      ]
    }
  ],
    "stream":true,
    "stream_options":{
        "include_usage":true
    },
    "modalities":["text","audio"],
    "audio":{"voice":"Tina","format":"wav"}
}'

Web search

Qwen3.5-Omni series models support web search, retrieving real-time information and reasoning over it.

Web search is enabled via the enable_search parameter, with search_strategy configured as agent. The following sample shows how to enable web search to look up real-time information:

OpenAI compatible

# Preparation before running:
# pip install openai

import os
from openai import OpenAI

# Initialize the client
client = OpenAI(
    # API keys differ between the Singapore and Beijing regions. Obtain an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # The URL below is for the Singapore region. For models in the Beijing region, replace it with: https://dashscope.aliyuncs.com/compatible-mode/v1
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

# Send the request (with web search enabled)
try:
    completion = client.chat.completions.create(
        model="qwen3.5-omni-plus",
        messages=[{
            "role": "user", 
            "content": "Look up today's date and day of the week, and tell me which important holidays fall today"
        }],
        stream=True,
        stream_options={"include_usage": True},
        # Enable web search
        extra_body={
            "enable_search": True,
            "search_options": {
                # Web search strategy; only "agent" is supported
                "search_strategy": "agent"
            }
        }
    )
    
    print("Model reply (with real-time information):")
    for chunk in completion:
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="")
    print()
    
except Exception as e:
    print(f"Request failed: {e}")
// Preparation before running:
// npm install openai

import OpenAI from "openai";

// 初始化用戶端
const openai = new OpenAI({
    // 新加坡和北京地區的API Key不同。擷取API Key:https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    apiKey: process.env.DASHSCOPE_API_KEY,
    // 以下為新加坡地區url,若使用北京地區的模型,需將url替換為:https://dashscope.aliyuncs.com/compatible-mode/v1
    baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
});

// 發起請求(開啟連網搜尋)
const completion = await openai.chat.completions.create({
    model: "qwen3.5-omni-plus",
    messages: [{
        "role": "user",
        "content": "請查詢今天的日期和星期,並告訴我今天有哪些重要節日"
    }],
    stream: true,
    stream_options: {
        include_usage: true
    },
    // 開啟連網搜尋
    extra_body: {
        enable_search: true,
        search_options: {
            // 連網搜尋策略,僅支援配置為 agent
            search_strategy: "agent"
        }
    }
});

console.log("模型回複(包含即時資訊):");

for await (const chunk of completion) {
    if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
        if (chunk.choices[0].delta.content) {
            process.stdout.write(chunk.choices[0].delta.content);
        }
    }
}
console.log();
# ======= IMPORTANT =======
# API keys differ between the Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
# The URL below is for the Singapore region. For the Beijing region, use: https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
# === Remove this comment before running ===

curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen3.5-omni-plus",
    "messages": [
        {
            "role": "user", 
            "content": "請查詢今天的日期和星期,並告訴我今天有哪些重要節日"
        }
    ],
    "stream": true,
    "stream_options": {
        "include_usage": true
    },
    "enable_search": true,
    "search_options": {
        "search_strategy": "agent"
    }
}'

Notes

  • Web search is supported only by the Qwen3.5-Omni series; search_strategy only supports agent.

  • For billing, see the agent strategy in the billing documentation.

Enable/disable thinking mode

Among the Qwen-Omni models, only Qwen3-Omni-Flash is a hybrid thinking model. The enable_thinking parameter controls whether thinking mode is on:

  • true: enable thinking mode

  • false (default): disable thinking mode

OpenAI compatible

import os
from openai import OpenAI

client = OpenAI(
    # API keys differ between the Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # The URL below is for the Singapore region. For the Beijing region, use: https://dashscope.aliyuncs.com/compatible-mode/v1
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

completion = client.chat.completions.create(
    model="qwen3-omni-flash",
    messages=[{"role": "user", "content": "Who are you?"}],

    # Enable/disable thinking mode. Audio output is not supported in thinking mode; qwen-omni-turbo does not support enable_thinking.
    extra_body={'enable_thinking': True},

    # Set output modalities. Non-thinking mode supports ["text","audio"] and ["text"]; thinking mode supports only ["text"].
    modalities=["text"],

    # Set the voice; the audio parameter is not supported in thinking mode
    # audio={"voice": "Tina", "format": "wav"},
    # stream must be set to True, otherwise an error is raised
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in completion:
    if chunk.choices:
        print(chunk.choices[0].delta)
    else:
        print(chunk.usage)
import OpenAI from "openai";

const openai = new OpenAI(
    {
        // API keys differ between the Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
        apiKey: process.env.DASHSCOPE_API_KEY,
        // The URL below is for the Singapore region. For the Beijing region, use: https://dashscope.aliyuncs.com/compatible-mode/v1
        baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
    }
);
const completion = await openai.chat.completions.create({
    model: "qwen3-omni-flash",
    messages: [
        { role: "user", content: "Who are you?" }
    ],

    // stream must be set to true, otherwise an error is raised
    stream: true,
    stream_options: {
        include_usage: true
    },
    // Enable/disable thinking mode. Audio output is not supported in thinking mode; qwen-omni-turbo does not support enable_thinking.
    extra_body: { 'enable_thinking': true },
    // Set output modalities. Non-thinking mode supports ["text","audio"] and ["text"]; thinking mode supports only ["text"].
    modalities: ["text"],
    // Set the voice; the audio parameter is not supported in thinking mode
    //audio: { voice: "Tina", format: "wav" }
});

for await (const chunk of completion) {
    if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
        console.log(chunk.choices[0].delta);
    } else {
        console.log(chunk.usage);
    }
}
# ======= IMPORTANT =======
# API keys differ between the Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
# The URL below is for the Singapore region. For the Beijing region, use: https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
# === Remove this comment before running ===

curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen3-omni-flash",
    "messages": [
        {
            "role": "user",
            "content": "你是誰?"
        }
    ],
    "stream":true,
    "stream_options":{
        "include_usage":true
    },
    "modalities":["text"],
    "enable_thinking": true
}'

Sample response

data: {"choices":[{"delta":{"content":null,"role":"assistant","reasoning_content":""},"index":0,"logprobs":null,"finish_reason":null}],"object":"chat.completion.chunk","usage":null,"created":1757937336,"system_fingerprint":null,"model":"qwen3-omni-flash","id":"chatcmpl-ce3d6fe5-e717-4b7e-8b40-3aef12288d4c"}
data: {"choices":[{"finish_reason":null,"logprobs":null,"delta":{"content":null,"reasoning_content":"嗯"},"index":0}],"object":"chat.completion.chunk","usage":null,"reated":1757937336,"system_fingerprint":null,"model":"qwen3-omni-flash","id":"chatcmpl-ce3d6fe5-e717-4b7e-8b40-3aef12288d4c"}
data: {"choices":[{"delta":{"content":null,"reasoning_content":","},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"reated":1757937336,"system_fingerprint":null,"model":"qwen3-omni-flash","id":"chatcmpl-ce3d6fe5-e717-4b7e-8b40-3aef12288d4c"}
......
data: {"choices":[{"delta":{"content":"告訴我"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1757937336,"tem_fingerprint":null,"model":"qwen3-omni-flash","id":"chatcmpl-ce3d6fe5-e717-4b7e-8b40-3aef12288d4c"}
data: {"choices":[{"delta":{"content":"!"},"finish_reason":null,"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1757937336,"systm_fingerprint":null,"model":"qwen3-omni-flash","id":"chatcmpl-ce3d6fe5-e717-4b7e-8b40-3aef12288d4c"}
data: {"choices":[{"finish_reason":"stop","delta":{"content":"","reasoning_content":null},"index":0,"logprobs":null}],"object":"chat.completion.chunk","usage":null,"created":1757937336,"system_fingerprint":null,"model":"qwen3-omni-flash","id":"chatcmpl-ce3d6fe5-e717-4b7e-8b40-3aef12288d4c"}
data: {"choices":[],"object":"chat.completion.chunk","usage":{"prompt_tokens":11,"completion_tokens":363,"total_tokens":374,"completion_tokens_details":{"reasoning_tokens":195,"text_tokens":168},"prompt_tokens_details":{"text_tokens":11}},"created":1757937336,"system_fingerprint":null,"model":"qwen3-omni-flash","id":"chatcmpl-ce3d6fe5-e717-4b7e-8b40-3aef12288d4c"}

Multi-turn conversation

When using the multi-turn conversation feature of the Qwen-Omni models, note the following:

  • Assistant Message

    An Assistant Message added to the messages array may contain only text.

  • User Message

    A single User Message may contain text plus at most one other modality. In a multi-turn conversation, different User Messages may carry different modalities.

OpenAI compatible

import os
from openai import OpenAI

client = OpenAI(
    # API keys differ between the Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # The URL below is for the Singapore region. For the Beijing region, use: https://dashscope.aliyuncs.com/compatible-mode/v1
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

completion = client.chat.completions.create(
    model="qwen3.5-omni-plus", # When using Qwen3-Omni-Flash, run in non-thinking mode
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "input_audio",
                    "input_audio": {
                        "data": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3",
                        "format": "mp3",
                    },
                },
                {"type": "text", "text": "What is this audio saying?"},
            ],
        },
        {
            "role": "assistant",
            "content": [{"type": "text", "text": "The audio says: Welcome to Alibaba Cloud"}],
        },
        {
            "role": "user",
            "content": [{"type": "text", "text": "Tell me about this company."}],
        },
    ],
    # Set output modalities; two are currently supported: ["text","audio"] and ["text"]
    modalities=["text"],
    # stream must be set to True, otherwise an error is raised
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in completion:
    if chunk.choices:
        print(chunk.choices[0].delta)
    else:
        print(chunk.usage)
import OpenAI from "openai";

const openai = new OpenAI(
    {
        // API keys differ between the Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
        apiKey: process.env.DASHSCOPE_API_KEY,
        // The URL below is for the Singapore region. For the Beijing region, use: https://dashscope.aliyuncs.com/compatible-mode/v1
        baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
    }
);
const completion = await openai.chat.completions.create({
    model: "qwen3.5-omni-plus", // When using Qwen3-Omni-Flash, run in non-thinking mode
    messages: [
        {
            "role": "user",
            "content": [
                {
                    "type": "input_audio",
                    "input_audio": {
                        "data": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3",
                        "format": "mp3",
                    },
                },
                { "type": "text", "text": "What is this audio saying?" },
            ],
        },
        {
            "role": "assistant",
            "content": [{ "type": "text", "text": "The audio says: Welcome to Alibaba Cloud" }],
        },
        {
            "role": "user",
            "content": [{ "type": "text", "text": "Tell me about this company." }]
        }],
    stream: true,
    stream_options: {
        include_usage: true
    },
    modalities: ["text"]
});


for await (const chunk of completion) {
    if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
        console.log(chunk.choices[0].delta);
    } else {
        console.log(chunk.usage);
    }
}
# ======= IMPORTANT =======
# API keys differ between the Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
# The URL below is for the Singapore region. For the Beijing region, use: https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
# === Remove this comment before running ===

curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
  "model": "qwen3.5-omni-plus",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "input_audio",
          "input_audio": {
            "data": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
          }
        },
        {
          "type": "text",
          "text": "這段音頻在說什麼"
        }
      ]
    },
    {
      "role": "assistant",
      "content": [
        {
          "type": "text",
          "text": "這段音頻在說:歡迎使用阿里雲"
        }
      ]
    },
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "介紹一下這家公司?"
        }
      ]
    }
  ],
  "stream": true,
  "stream_options": {
    "include_usage": true
  },
  "modalities": ["text"]
}'

Parse the Base64-encoded audio output

Qwen-Omni streams its audio output as Base64-encoded data. You can maintain a string variable during generation, append each returned fragment's Base64 data to it, and decode it into an audio file once generation finishes; or you can decode and play each returned fragment in real time.

# Installation instructions for pyaudio:
# APPLE Mac OS X
#   brew install portaudio
#   pip install pyaudio
# Debian/Ubuntu
#   sudo apt-get install python-pyaudio python3-pyaudio
#   or
#   pip install pyaudio
# CentOS
#   sudo yum install -y portaudio portaudio-devel && pip install pyaudio
# Microsoft Windows
#   python -m pip install pyaudio

import os
from openai import OpenAI
import base64
import numpy as np
import soundfile as sf

client = OpenAI(
    # API keys differ between the Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # The URL below is for the Singapore region. For the Beijing region, use: https://dashscope.aliyuncs.com/compatible-mode/v1
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

completion = client.chat.completions.create(
    model="qwen3.5-omni-plus", # When using Qwen3-Omni-Flash, run in non-thinking mode
    messages=[{"role": "user", "content": "Who are you?"}],
    # Set output modalities; two are currently supported: ["text","audio"] and ["text"]
    modalities=["text", "audio"],
    audio={"voice": "Tina", "format": "wav"},
    # stream must be set to True, otherwise an error is raised
    stream=True,
    stream_options={"include_usage": True},
)

# Option 1: decode after generation finishes
audio_string = ""
for chunk in completion:
    if chunk.choices:
        if hasattr(chunk.choices[0].delta, "audio"):
            try:
                audio_string += chunk.choices[0].delta.audio["data"]
            except Exception as e:
                print(chunk.choices[0].delta.content)
    else:
        print(chunk.usage)

wav_bytes = base64.b64decode(audio_string)
audio_np = np.frombuffer(wav_bytes, dtype=np.int16)
sf.write("audio_assistant_py.wav", audio_np, samplerate=24000)

# Option 2: decode while generating (comment out Option 1's code when using Option 2)
# # Initialize PyAudio
# import pyaudio
# import time
# p = pyaudio.PyAudio()
# # Create the audio stream
# stream = p.open(format=pyaudio.paInt16,
#                 channels=1,
#                 rate=24000,
#                 output=True)

# for chunk in completion:
#     if chunk.choices:
#         if hasattr(chunk.choices[0].delta, "audio"):
#             try:
#                 audio_string = chunk.choices[0].delta.audio["data"]
#                 wav_bytes = base64.b64decode(audio_string)
#                 audio_np = np.frombuffer(wav_bytes, dtype=np.int16)
#                 # Play the audio data directly
#                 stream.write(audio_np.tobytes())
#             except Exception as e:
#                 print(chunk.choices[0].delta.content)

# time.sleep(0.8)
# # Clean up resources
# stream.stop_stream()
# stream.close()
# p.terminate()
// Preparation before running:
// Windows/Mac/Linux:
// 1. Make sure Node.js is installed (version >= 14 recommended)
// 2. Install the required dependencies:
//    npm install openai wav
// 
// For real-time playback (Option 2), also install:
// Windows:
//    npm install speaker
// Mac:
//    brew install portaudio
//    npm install speaker
// Linux (Ubuntu/Debian):
//    sudo apt-get install libasound2-dev
//    npm install speaker

import OpenAI from "openai";

const openai = new OpenAI(
    {
        // API keys differ between the Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
        apiKey: process.env.DASHSCOPE_API_KEY,
        // The URL below is for the Singapore region. For the Beijing region, use: https://dashscope.aliyuncs.com/compatible-mode/v1
        baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
    }
);
const completion = await openai.chat.completions.create({
    model: "qwen3.5-omni-plus", // When using Qwen3-Omni-Flash, run in non-thinking mode
    messages: [
        {
            "role": "user",
            "content": "Who are you?"
        }],
    stream: true,
    stream_options: {
        include_usage: true
    },
    modalities: ["text", "audio"],
    audio: { voice: "Tina", format: "wav" }
});

// Option 1: decode after generation finishes
// Requires: npm install wav
import { createWriteStream } from 'node:fs';  // node:fs is a built-in Node.js module, no install needed
import { Writer } from 'wav';

async function convertAudio(audioString, audioPath) {
    try {
        // Decode the Base64 string into a Buffer
        const wavBuffer = Buffer.from(audioString, 'base64');
        // Create the WAV file writer
        const writer = new Writer({
            sampleRate: 24000,  // sample rate
            channels: 1,        // mono
            bitDepth: 16        // 16-bit depth
        });
        // Create the output file stream and pipe the writer into it
        const outputStream = createWriteStream(audioPath);
        writer.pipe(outputStream);

        // Write the PCM data and finish writing
        writer.write(wavBuffer);
        writer.end();

        // Wait for the file write to complete
        await new Promise((resolve, reject) => {
            outputStream.on('finish', resolve);
            outputStream.on('error', reject);
        });

        // Wait a little longer to make sure the audio is complete
        await new Promise(resolve => setTimeout(resolve, 800));

        console.log(`Audio file saved as ${audioPath}`);
    } catch (error) {
        console.error('An error occurred during processing:', error);
    }
}

let audioString = "";
for await (const chunk of completion) {
    if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
        if (chunk.choices[0].delta.audio) {
            if (chunk.choices[0].delta.audio["data"]) {
                audioString += chunk.choices[0].delta.audio["data"];
            }
        }
    } else {
        console.log(chunk.usage);
    }
}
// Run the conversion
convertAudio(audioString, "audio_assistant_mjs.wav");


// Option 2: decode and play while generating
// First install the components for your OS as described above
// import Speaker from 'speaker'; // audio playback library

// // Create the speaker instance (configuration matches the WAV parameters)
// const speaker = new Speaker({
//     sampleRate: 24000,  // sample rate
//     channels: 1,        // number of channels
//     bitDepth: 16,       // bit depth
//     signed: true        // signed PCM
// });
// for await (const chunk of completion) {
//     if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
//         if (chunk.choices[0].delta.audio) {
//             if (chunk.choices[0].delta.audio["data"]) {
//                 const pcmBuffer = Buffer.from(chunk.choices[0].delta.audio.data, 'base64');
//                 // Write directly to the speaker for playback
//                 speaker.write(pcmBuffer);
//             }
//         }
//     } else {
//         console.log(chunk.usage);
//     }
// }
// speaker.on('finish', () => console.log('Playback finished'));
// speaker.end(); // call when the API stream actually ends

Input Base64-encoded local files

Image

Using the local file eagle.png as an example.

import os
from openai import OpenAI
import base64

client = OpenAI(
    # API keys differ between the Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # The URL below is for the Singapore region. For the Beijing region, use: https://dashscope.aliyuncs.com/compatible-mode/v1
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)


#  Base64 encoding helper
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")


base64_image = encode_image("eagle.png")

completion = client.chat.completions.create(
    model="qwen3.5-omni-plus", # When using Qwen3-Omni-Flash, run in non-thinking mode
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{base64_image}"},
                },
                {"type": "text", "text": "What scene does the image depict?"},
            ],
        },
    ],
    # Set output modalities; two are currently supported: ["text","audio"] and ["text"]
    modalities=["text", "audio"],
    audio={"voice": "Tina", "format": "wav"},
    # stream must be set to True, otherwise an error is raised
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in completion:
    if chunk.choices:
        print(chunk.choices[0].delta)
    else:
        print(chunk.usage)
import OpenAI from "openai";
import { readFileSync } from 'fs';

const openai = new OpenAI(
    {
        // API keys differ between the Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
        apiKey: process.env.DASHSCOPE_API_KEY,
        // The URL below is for the Singapore region. For the Beijing region, use: https://dashscope.aliyuncs.com/compatible-mode/v1
        baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
    }
);

const encodeImage = (imagePath) => {
    const imageFile = readFileSync(imagePath);
    return imageFile.toString('base64');
};
const base64Image = encodeImage("eagle.png")

const completion = await openai.chat.completions.create({
    model: "qwen3.5-omni-plus",// When using Qwen3-Omni-Flash, run in non-thinking mode
    messages: [
        {
            "role": "user",
            "content": [{
                "type": "image_url",
                "image_url": { "url": `data:image/png;base64,${base64Image}` },
            },
            { "type": "text", "text": "What scene does the image depict?" }]
        }],
    stream: true,
    stream_options: {
        include_usage: true
    },
    modalities: ["text", "audio"],
    audio: { voice: "Tina", format: "wav" }
});

for await (const chunk of completion) {
    if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
        console.log(chunk.choices[0].delta);
    } else {
        console.log(chunk.usage);
    }
}

Audio

Using the local file welcome.mp3 as an example.

import os
from openai import OpenAI
import base64
import numpy as np
import soundfile as sf
import requests

client = OpenAI(
    # API keys differ between the Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # The URL below is for the Singapore region. For the Beijing region, use: https://dashscope.aliyuncs.com/compatible-mode/v1
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)


def encode_audio(audio_path):
    with open(audio_path, "rb") as audio_file:
        return base64.b64encode(audio_file.read()).decode("utf-8")


base64_audio = encode_audio("welcome.mp3")

completion = client.chat.completions.create(
    model="qwen3.5-omni-plus", # When using Qwen3-Omni-Flash, run in non-thinking mode
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "input_audio",
                    "input_audio": {
                        "data": f"data:;base64,{base64_audio}",
                        "format": "mp3",
                    },
                },
                {"type": "text", "text": "What is this audio saying?"},
            ],
        },
    ],
    # Set output modalities; two are currently supported: ["text","audio"] and ["text"]
    modalities=["text", "audio"],
    audio={"voice": "Tina", "format": "wav"},
    # stream must be set to True, otherwise an error is raised
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in completion:
    if chunk.choices:
        print(chunk.choices[0].delta)
    else:
        print(chunk.usage)
import OpenAI from "openai";
import { readFileSync } from 'fs';

const openai = new OpenAI(
    {
        // API keys differ between the Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
        apiKey: process.env.DASHSCOPE_API_KEY,
        // The URL below is for the Singapore region. For the Beijing region, use: https://dashscope.aliyuncs.com/compatible-mode/v1
        baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
    }
);

const encodeAudio = (audioPath) => {
    const audioFile = readFileSync(audioPath);
    return audioFile.toString('base64');
};
const base64Audio = encodeAudio("welcome.mp3")

const completion = await openai.chat.completions.create({
    model: "qwen3.5-omni-plus", // When using Qwen3-Omni-Flash, run in non-thinking mode
    messages: [
        {
            "role": "user",
            "content": [{
                "type": "input_audio",
                "input_audio": { "data": `data:;base64,${base64Audio}`, "format": "mp3" },
            },
            { "type": "text", "text": "What is this audio saying?" }]
        }],
    stream: true,
    stream_options: {
        include_usage: true
    },
    modalities: ["text", "audio"],
    audio: { voice: "Tina", format: "wav" }
});

for await (const chunk of completion) {
    if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
        console.log(chunk.choices[0].delta);
    } else {
        console.log(chunk.usage);
    }
}

Video

Video file

Using the local file spring_mountain.mp4 as an example.

import os
from openai import OpenAI
import base64
import numpy as np
import soundfile as sf

client = OpenAI(
    # API keys differ between the Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # The URL below is for the Singapore region. For the Beijing region, use: https://dashscope.aliyuncs.com/compatible-mode/v1
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

#  Base64 encoding helper
def encode_video(video_path):
    with open(video_path, "rb") as video_file:
        return base64.b64encode(video_file.read()).decode("utf-8")


base64_video = encode_video("spring_mountain.mp4")

completion = client.chat.completions.create(
    model="qwen3.5-omni-plus", # When using Qwen3-Omni-Flash, run in non-thinking mode
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "video_url",
                    "video_url": {"url": f"data:;base64,{base64_video}"},
                },
                {"type": "text", "text": "What is she singing?"},
            ],
        },
    ],
    # Set output modalities; two are currently supported: ["text","audio"] and ["text"]
    modalities=["text", "audio"],
    audio={"voice": "Tina", "format": "wav"},
    # stream must be set to True, otherwise an error is raised
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in completion:
    if chunk.choices:
        print(chunk.choices[0].delta)
    else:
        print(chunk.usage)
import OpenAI from "openai";
import { readFileSync } from 'fs';

const openai = new OpenAI(
    {
        // API keys differ between the Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
        apiKey: process.env.DASHSCOPE_API_KEY,
        // The URL below is for the Singapore region. For the Beijing region, use: https://dashscope.aliyuncs.com/compatible-mode/v1
        baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
    }
);

const encodeVideo = (videoPath) => {
    const videoFile = readFileSync(videoPath);
    return videoFile.toString('base64');
};
const base64Video = encodeVideo("spring_mountain.mp4")

const completion = await openai.chat.completions.create({
    model: "qwen3.5-omni-plus", // When using Qwen3-Omni-Flash, run in non-thinking mode
    messages: [
        {
            "role": "user",
            "content": [{
                "type": "video_url",
                "video_url": { "url": `data:;base64,${base64Video}` },
            },
            { "type": "text", "text": "What is she singing?" }]
        }],
    stream: true,
    stream_options: {
        include_usage: true
    },
    modalities: ["text", "audio"],
    audio: { voice: "Tina", format: "wav" }
});

for await (const chunk of completion) {
    if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
        console.log(chunk.choices[0].delta);
    } else {
        console.log(chunk.usage);
    }
}

Image list

Using the local files football1.jpg, football2.jpg, football3.jpg, and football4.jpg as an example.

import os
from openai import OpenAI
import base64
import numpy as np
import soundfile as sf

client = OpenAI(
    # API keys differ between the Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # The URL below is for the Singapore region. For the Beijing region, use: https://dashscope.aliyuncs.com/compatible-mode/v1
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)


#  Base64 encoding helper
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")


base64_image_1 = encode_image("football1.jpg")
base64_image_2 = encode_image("football2.jpg")
base64_image_3 = encode_image("football3.jpg")
base64_image_4 = encode_image("football4.jpg")

completion = client.chat.completions.create(
    model="qwen3.5-omni-plus", # When using Qwen3-Omni-Flash, run in non-thinking mode
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "video",
                    "video": [
                        f"data:image/jpeg;base64,{base64_image_1}",
                        f"data:image/jpeg;base64,{base64_image_2}",
                        f"data:image/jpeg;base64,{base64_image_3}",
                        f"data:image/jpeg;base64,{base64_image_4}",
                    ],
                },
                {"type": "text", "text": "Describe what happens in this video"},
            ],
        }
    ],
    # Set output modalities; two are currently supported: ["text","audio"] and ["text"]
    modalities=["text", "audio"],
    audio={"voice": "Tina", "format": "wav"},
    # stream must be set to True, otherwise an error is raised
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in completion:
    if chunk.choices:
        print(chunk.choices[0].delta)
    else:
        print(chunk.usage)
import OpenAI from "openai";
import { readFileSync } from 'fs';

const openai = new OpenAI(
    {
        // API keys differ between the Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
        apiKey: process.env.DASHSCOPE_API_KEY,
        // The URL below is for the Singapore region. For the Beijing region, use: https://dashscope.aliyuncs.com/compatible-mode/v1
        baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
    }
);

const encodeImage = (imagePath) => {
    const imageFile = readFileSync(imagePath);
    return imageFile.toString('base64');
  };
const base64Image1 = encodeImage("football1.jpg")
const base64Image2 = encodeImage("football2.jpg")
const base64Image3 = encodeImage("football3.jpg")
const base64Image4 = encodeImage("football4.jpg")

const completion = await openai.chat.completions.create({
    model: "qwen3.5-omni-plus", // When using Qwen3-Omni-Flash, run in non-thinking mode
    messages: [{
        role: "user",
        content: [
            {
                type: "video",
                video: [
                    `data:image/jpeg;base64,${base64Image1}`,
                    `data:image/jpeg;base64,${base64Image2}`,
                    `data:image/jpeg;base64,${base64Image3}`,
                    `data:image/jpeg;base64,${base64Image4}`
                ]
            },
            {
                type: "text",
                text: "Describe what happens in this video"
            }
        ]
    }],
    stream: true,
    stream_options: {
        include_usage: true
    },
    modalities: ["text", "audio"],
    audio: { voice: "Tina", format: "wav" }
});

for await (const chunk of completion) {
    if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
        console.log(chunk.choices[0].delta);
    } else {
        console.log(chunk.usage);
    }

API reference

For the input and output parameters of the Qwen-Omni models, see Qwen.

Billing and rate limits

Billing rules

Qwen-Omni models are billed by the number of tokens in each modality (audio, image, video). For details, see the model list.

Rules for converting audio, images, and video to tokens

Audio

  • Qwen3.5-Omni: total tokens = audio duration (seconds) * 7

  • Qwen3-Omni-Flash: total tokens = audio duration (seconds) * 12.5

  • Qwen-Omni-Turbo: total tokens = audio duration (seconds) * 25

Audio shorter than 1 second is billed as 1 second.
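
The per-model audio formulas above can be sketched as a small helper. The multipliers come from the list above; the audio_tokens helper name and the rounding up to a whole token are illustrative assumptions:

```python
import math

# Tokens per second of audio, per model family (multipliers from the list above)
AUDIO_TOKENS_PER_SECOND = {
    "qwen3.5-omni": 7,
    "qwen3-omni-flash": 12.5,
    "qwen-omni-turbo": 25,
}

def audio_tokens(duration_seconds, model):
    """Estimate billed audio tokens; durations under 1 second are billed as 1 second."""
    duration_seconds = max(duration_seconds, 1.0)
    return math.ceil(duration_seconds * AUDIO_TOKENS_PER_SECOND[model])

print(audio_tokens(0.4, "qwen-omni-turbo"))   # billed as 1 second -> 25
print(audio_tokens(10, "qwen3-omni-flash"))   # 10 * 12.5 -> 125
```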

Image

  • Qwen3.5-Omni and Qwen3-Omni-Flash: each 32x32 pixels corresponds to 1 token

  • Qwen-Omni-Turbo: each 28x28 pixels corresponds to 1 token

A single image consumes at least 4 tokens and at most 1280 tokens. You can estimate the token count of a single image by running the following code with the image path:

import math
# Install the Pillow library with: pip install Pillow
from PIL import Image

# For the Qwen-Omni-Turbo model, factor is 28
# factor = 28
# For the Qwen3-Omni-Flash model, factor is 32
factor = 32

def token_calculate(image_path=''):
    """
    param image_path: image path
    return: token count of a single image
    """
    if len(image_path) > 0:
        # Open the specified image file
        image = Image.open(image_path)
        # Get the original image size
        height = image.height
        width = image.width
        print(f"Image size before scaling: height {height}, width {width}")
        # Round the height to a multiple of factor
        h_bar = round(height / factor) * factor
        # Round the width to a multiple of factor
        w_bar = round(width / factor) * factor
        # Lower token bound for an image: 4 tokens
        min_pixels = 4 * factor * factor
        # Upper token bound for an image: 1280 tokens
        max_pixels = 1280 * factor * factor
        # Scale the image so the total pixel count falls within [min_pixels, max_pixels]
        if h_bar * w_bar > max_pixels:
            # Compute the scaling factor beta so the scaled image does not exceed max_pixels
            beta = math.sqrt((height * width) / max_pixels)
            # Recompute the height, keeping it a multiple of factor
            h_bar = math.floor(height / beta / factor) * factor
            # Recompute the width, keeping it a multiple of factor
            w_bar = math.floor(width / beta / factor) * factor
        elif h_bar * w_bar < min_pixels:
            # Compute the scaling factor beta so the scaled image is not below min_pixels
            beta = math.sqrt(min_pixels / (height * width))
            # Recompute the height, keeping it a multiple of factor
            h_bar = math.ceil(height * beta / factor) * factor
            # Recompute the width, keeping it a multiple of factor
            w_bar = math.ceil(width * beta / factor) * factor
        print(f"Image size after scaling: height {h_bar}, width {w_bar}")
        # Compute the image token count: total pixels divided by (factor * factor), plus 2
        token = int((h_bar * w_bar) / (factor * factor)) + 2
        print(f"Token count after scaling: {token}")
        return token
    else:
        raise ValueError("The image path cannot be empty; provide a valid image file path")

if __name__ == "__main__":
    token = token_calculate(image_path="xxx/test.jpg")

Video

The Tokens for a video file consist of video_tokens (visual) and audio_tokens (audio).

  • video_tokens

    The calculation is relatively involved. See the following code:

    # Install before use: pip install opencv-python
    import math
    import cv2
    
    # Fixed parameters
    FRAME_FACTOR = 2
    
    # For the Qwen3-Omni-Flash model, IMAGE_FACTOR is 32
    IMAGE_FACTOR = 32
    
    # For the Qwen-Omni-Turbo model, IMAGE_FACTOR is 28
    # IMAGE_FACTOR = 28
    
    # Maximum aspect ratio of a video frame
    MAX_RATIO = 200
    
    # Lower pixel bound per video frame; for Qwen3-Omni-Flash: 128 * 32 * 32
    VIDEO_MIN_PIXELS = 128 * 32 * 32
    # For Qwen-Omni-Turbo:
    # VIDEO_MIN_PIXELS = 128 * 28 * 28
    
    # Upper pixel bound per video frame; for Qwen3-Omni-Flash: 768 * 32 * 32
    VIDEO_MAX_PIXELS = 768 * 32 * 32
    # For Qwen-Omni-Turbo:
    # VIDEO_MAX_PIXELS = 768 * 28 * 28
    
    FPS = 2
    # Minimum number of sampled frames
    FPS_MIN_FRAMES = 4
    
    # Maximum number of sampled frames
    # For the Qwen3-Omni-Flash model: 128
    # For the Qwen-Omni-Turbo model: 80
    FPS_MAX_FRAMES = 128
    
    # Maximum total pixels for video input; for Qwen3-Omni-Flash: 16384 * 32 * 32
    VIDEO_TOTAL_PIXELS = 16384 * 32 * 32
    # For Qwen-Omni-Turbo:
    # VIDEO_TOTAL_PIXELS = 16384 * 28 * 28
    
    def round_by_factor(number, factor):
        return round(number / factor) * factor
    
    def ceil_by_factor(number, factor):
        return math.ceil(number / factor) * factor
    
    def floor_by_factor(number, factor):
        return math.floor(number / factor) * factor
    
    def get_video(video_path):
        cap = cv2.VideoCapture(video_path)
        frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
        frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
        total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
        video_fps = cap.get(cv2.CAP_PROP_FPS)
        cap.release()
        return frame_height, frame_width, total_frames, video_fps
    
    def smart_nframes(total_frames, video_fps):
        min_frames = ceil_by_factor(FPS_MIN_FRAMES, FRAME_FACTOR)
        max_frames = floor_by_factor(min(FPS_MAX_FRAMES, total_frames), FRAME_FACTOR)
        duration = total_frames / video_fps if video_fps != 0 else 0
        if duration - int(duration) > (1 / FPS):
            total_frames = math.ceil(duration * video_fps)
        else:
            total_frames = math.ceil(int(duration) * video_fps)
        nframes = total_frames / video_fps * FPS
        nframes = int(min(min(max(nframes, min_frames), max_frames), total_frames))
        if not (FRAME_FACTOR <= nframes <= total_frames):
            raise ValueError(f"nframes should in interval [{FRAME_FACTOR}, {total_frames}], but got {nframes}.")
        return nframes
    
    def smart_resize(height, width, nframes, factor=IMAGE_FACTOR):
        min_pixels = VIDEO_MIN_PIXELS
        total_pixels = VIDEO_TOTAL_PIXELS
        max_pixels = max(min(VIDEO_MAX_PIXELS, total_pixels / nframes * FRAME_FACTOR), int(min_pixels * 1.05))
        if max(height, width) / min(height, width) > MAX_RATIO:
            raise ValueError(f"absolute aspect ratio must be smaller than {MAX_RATIO}, got {max(height, width) / min(height, width)}")
        h_bar = max(factor, round_by_factor(height, factor))
        w_bar = max(factor, round_by_factor(width, factor))
        if h_bar * w_bar > max_pixels:
            beta = math.sqrt((height * width) / max_pixels)
            h_bar = floor_by_factor(height / beta, factor)
            w_bar = floor_by_factor(width / beta, factor)
        elif h_bar * w_bar < min_pixels:
            beta = math.sqrt(min_pixels / (height * width))
            h_bar = ceil_by_factor(height * beta, factor)
            w_bar = ceil_by_factor(width * beta, factor)
        return h_bar, w_bar
    
    def video_token_calculate(video_path):
        height, width, total_frames, video_fps = get_video(video_path)
        nframes = smart_nframes(total_frames, video_fps)
        resized_height, resized_width = smart_resize(height, width, nframes)
        video_token = int(math.ceil(nframes / FPS) * resized_height / IMAGE_FACTOR * resized_width / IMAGE_FACTOR)
        video_token += 2  # vision markers
        return video_token
    
    if __name__ == "__main__":
        video_path = "spring_mountain.mp4"  # path to your video
        video_token = video_token_calculate(video_path)
        print("video_tokens:", video_token)
  • audio_tokens

    • Qwen3.5-Omni: total Tokens = audio duration (in seconds) * 7

    • Qwen3-Omni-Flash: total Tokens = audio duration (in seconds) * 12.5

    • Qwen-Omni-Turbo: total Tokens = audio duration (in seconds) * 25

    If the audio is shorter than 1 second, it is counted as 1 second.
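
The per-second rates above can be sketched as a small helper. The model keys and the rounding up to a whole Token are illustrative assumptions, not part of the official formula:

```python
import math

# Tokens consumed per second of audio, per the rates listed above
AUDIO_TOKENS_PER_SECOND = {
    "qwen3.5-omni": 7,
    "qwen3-omni-flash": 12.5,
    "qwen-omni-turbo": 25,
}

def audio_token_calculate(duration_seconds, model="qwen3-omni-flash"):
    """Estimate audio_tokens for a clip; durations under 1 second count as 1 second."""
    duration = max(duration_seconds, 1)
    # Rounding up to a whole Token is an assumption for illustration
    return math.ceil(duration * AUDIO_TOKENS_PER_SECOND[model])

if __name__ == "__main__":
    print(audio_token_calculate(0.5, "qwen-omni-turbo"))   # sub-second audio counts as 1 s
    print(audio_token_calculate(10, "qwen3-omni-flash"))
```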

Free quota

For details on how to claim, check, and use the free quota, see Free quota for new users.

Rate limits

For model rate-limiting rules and frequently asked questions, see Rate limits.

Error codes

If a model call fails and returns an error message, see Error messages for troubleshooting.

Voice list

For the voices supported by the Qwen-Omni models, see Voice list.