Qwen3-Omni-Captioner is an open-source model built on Qwen3-Omni. It automatically generates accurate and comprehensive descriptions for complex audio, including speech, ambient sounds, music, and sound effects, without requiring any prompts. The model can identify speaker emotions, musical elements such as style and instruments, and sensitive information. It is ideal for audio content analysis, security audits, intent recognition, and video editing.
Supported models
International (Singapore)
Model | Context window (tokens) | Max input (tokens) | Max output (tokens) | Input cost (per 1 million tokens) | Output cost (per 1 million tokens) | Free quota
qwen3-omni-30b-a3b-captioner | 65,536 | 32,768 | 32,768 | $3.81 | $3.06 | 1 million tokens (validity: 90 days after you activate Alibaba Cloud Model Studio)
Mainland China (Beijing)
Model | Context window (tokens) | Max input (tokens) | Max output (tokens) | Input cost (per 1 million tokens) | Output cost (per 1 million tokens) | Free quota
qwen3-omni-30b-a3b-captioner | 65,536 | 32,768 | 32,768 | $2.265 | $1.821 | No free quota
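The audio token rule for this model (duration in seconds × 12.5, with anything under one second counted as one second) can be turned into a quick estimate. The sketch below is illustrative only: the function names are hypothetical, the result is left unrounded because rounding behavior is not documented here, and the price passed in the example is the Singapore input price from the table above.

```python
def estimate_audio_tokens(duration_seconds: float) -> float:
    """Apply the documented rule: tokens = seconds x 12.5, with a one-second minimum."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    # Audio shorter than one second is counted as one second.
    billable_seconds = max(duration_seconds, 1.0)
    return billable_seconds * 12.5


def estimate_input_cost(duration_seconds: float, price_per_million_tokens: float) -> float:
    """Estimate the input cost for one audio file at a given price per 1 million tokens."""
    tokens = estimate_audio_tokens(duration_seconds)
    return tokens / 1_000_000 * price_per_million_tokens


if __name__ == "__main__":
    print(estimate_audio_tokens(0.4))     # sub-second clip, billed as one second
    print(estimate_audio_tokens(60))      # one-minute clip
    print(estimate_input_cost(60, 3.81))  # Singapore input price per 1 million tokens
```

This estimate covers input tokens only. Output tokens depend on the length of the generated description and are billed at the output price.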
Token conversion rule for audio: Total tokens = Audio duration (in seconds) × 12.5. If the audio duration is less than one second, it is counted as one second.
Getting started
Prerequisites
If you use an SDK to make calls, install the latest version of the SDK.
Qwen3-Omni-Captioner only supports API calls. It is not available for online testing in the Alibaba Cloud Model Studio console.
The following code samples show how to analyze online audio specified by a URL, not a local file. To pass a local file, see Pass a local file (Base64 encoding or file path). For the limits on audio files, see Limitations.
OpenAI compatible
Python
import os
from openai import OpenAI
client = OpenAI(
# API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
api_key=os.getenv("DASHSCOPE_API_KEY"),
# The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1
base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
completion = client.chat.completions.create(
model="qwen3-omni-30b-a3b-captioner",
messages=[
{
"role": "user",
"content": [
{
"type": "input_audio",
"input_audio": {
"data": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/en-US/20240916/xvappi/Renovation_Noise.wav"
}
}
]
}
]
)
print(completion.choices[0].message.content)
Node.js
import OpenAI from "openai";
const openai = new OpenAI(
{
// API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
apiKey: process.env.DASHSCOPE_API_KEY,
// The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1
baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
}
);
const completion = await openai.chat.completions.create({
model: "qwen3-omni-30b-a3b-captioner",
messages: [
{
"role": "user",
"content": [{
"type": "input_audio",
"input_audio": {
"data": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/en-US/20240916/xvappi/Renovation_Noise.wav"
}
}]
}]
});
console.log(completion.choices[0].message.content);
curl
# ======= Important =======
# API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# The following is the base_url for the Singapore region. If you use a model in the Beijing region, replace the base_url with: https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
# === Delete this comment before execution ===
curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen3-omni-30b-a3b-captioner",
"messages": [
{
"role": "user",
"content": [
{
"type": "input_audio",
"input_audio": {
"data": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/en-US/20240916/xvappi/Renovation_Noise.wav"
}
}
]
}
]
}'
DashScope
Python
import dashscope
import os
# The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url="https://dashscope-intl.aliyuncs.com/api/v1"
messages = [
{
"role": "user",
"content": [
{"audio": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/en-US/20240916/xvappi/Renovation_Noise.wav"}
]
}
]
response = dashscope.MultiModalConversation.call(
# API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# If you have not configured the environment variable, replace the following line with your Model Studio API key: api_key="sk-xxx",
api_key=os.getenv('DASHSCOPE_API_KEY'),
model="qwen3-omni-30b-a3b-captioner",
messages=messages
)
print("Output:")
print(response["output"]["choices"][0]["message"].content[0]["text"])
Java
import java.util.Arrays;
import java.util.Collections;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.JsonUtils;
import com.alibaba.dashscope.utils.Constants;
public class Main {
static {
// The following is the base-url for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
Constants.baseHttpApiUrl="https://dashscope-intl.aliyuncs.com/api/v1";
}
public static void simpleMultiModalConversationCall()
throws ApiException, NoApiKeyException, UploadFileException {
MultiModalConversation conv = new MultiModalConversation();
MultiModalMessage userMessage = MultiModalMessage.builder()
.role(Role.USER.getValue())
.content(Arrays.asList(
Collections.singletonMap("audio", "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/en-US/20240916/xvappi/Renovation_Noise.wav")))
.build();
MultiModalConversationParam param = MultiModalConversationParam.builder()
.model("qwen3-omni-30b-a3b-captioner")
.message(userMessage)
.build();
MultiModalConversationResult result = conv.call(param);
System.out.println("Output:\n" + result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
}
public static void main(String[] args) {
try {
simpleMultiModalConversationCall();
} catch (ApiException | NoApiKeyException | UploadFileException e) {
System.out.println(e.getMessage());
}
System.exit(0);
}
}
curl
# ======= Important =======
# API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# The following is the base_url for the Singapore region. If you use a model in the Beijing region, replace the base_url with: https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation
# === Delete this comment before execution ===
curl -X POST 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation' \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
"model": "qwen3-omni-30b-a3b-captioner",
"input":{
"messages":[
{
"role": "user",
"content": [
{"audio": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/en-US/20240916/xvappi/Renovation_Noise.wav"}
]
}
]
}
}'
How it works
Single-turn interaction: The model does not support multi-turn conversation. Each request is an independent analysis task.
Fixed task: The model's core task is to generate audio descriptions in English only. Instructions, such as a system message, cannot change its behavior, for example to control the output format or content focus.
Audio input only: The model accepts only audio as input. You do not need to pass text prompts. The format of the messages parameter is fixed.
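Because the request shape is fixed, it can be wrapped in a small helper. The following is a minimal sketch and not part of any SDK: build_captioner_messages is a hypothetical name, and the payload simply mirrors the OpenAI compatible samples in this topic (one user message carrying one input_audio item, with no system message and no text).

```python
def build_captioner_messages(audio: str) -> list:
    """Return the single-turn, audio-only message list in the OpenAI compatible shape.

    `audio` is a public audio URL or a Base64 data URL, as used in the samples.
    """
    if not audio:
        raise ValueError("audio must be a non-empty URL or data URL")
    return [
        {
            "role": "user",
            "content": [
                {"type": "input_audio", "input_audio": {"data": audio}},
            ],
        }
    ]


if __name__ == "__main__":
    url = "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/en-US/20240916/xvappi/Renovation_Noise.wav"
    print(build_captioner_messages(url))
```

The returned list can be passed as the messages argument of client.chat.completions.create in the OpenAI compatible samples.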
Streaming output
After the model receives input, it generates intermediate results step-by-step. The final result is a combination of these intermediate results. This method of generating and outputting results simultaneously is called streaming output. With streaming output, you can read the response as it is generated, which reduces your waiting time.
OpenAI compatible
To enable streaming output with the OpenAI compatible method, set the stream parameter to true in your request.
Python
import os
from openai import OpenAI
client = OpenAI(
# API keys for the Singapore and Beijing regions are different. To obtain an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
api_key=os.getenv("DASHSCOPE_API_KEY"),
# The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1
base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
completion = client.chat.completions.create(
model="qwen3-omni-30b-a3b-captioner",
messages=[
{
"role": "user",
"content": [
{
"type": "input_audio",
"input_audio": {
"data": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/en-US/20240916/xvappi/Renovation_Noise.wav"
}
}
]
}
],
stream=True,
stream_options={"include_usage": True},
)
for chunk in completion:
    # If stream_options.include_usage is True, the choices field of the last chunk is an empty list and should be skipped. You can get the token usage from chunk.usage.
    # delta.content can be None or an empty string, so test its truthiness before printing.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
Node.js
import OpenAI from "openai";
const openai = new OpenAI(
{
// API keys for the Singapore and Beijing regions are different. To obtain an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
apiKey: process.env.DASHSCOPE_API_KEY,
// The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1
baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
}
);
const completion = await openai.chat.completions.create({
model: "qwen3-omni-30b-a3b-captioner",
messages: [
{
"role": "user",
"content": [{
"type": "input_audio",
"input_audio": {
"data": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/en-US/20240916/xvappi/Renovation_Noise.wav"
},
}]
}],
stream: true,
stream_options: {
include_usage: true
},
});
for await (const chunk of completion) {
if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
console.log(chunk.choices[0].delta.content);
} else {
console.log(chunk.usage);
}
}
curl
# ======= Important =======
# The following is the base_url for the Singapore region. If you use a model in the Beijing region, replace the base_url with: https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
# API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# === Delete this comment before execution ===
curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen3-omni-30b-a3b-captioner",
"messages": [
{
"role": "user",
"content": [
{
"type": "input_audio",
"input_audio": {
"data": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/en-US/20240916/xvappi/Renovation_Noise.wav"
}
}
]
}
],
"stream":true,
"stream_options":{
"include_usage":true
}
}'
DashScope
You can use streaming output through the DashScope SDK or over HTTP. Set the parameters as follows based on your call method:
Python SDK: Set the stream parameter to True.
Java SDK: Use the streamCall method.
HTTP: In the header, set X-DashScope-SSE to enable.
By default, streaming output is non-incremental. This means that each returned chunk contains all previously generated content. If you want incremental streaming output, set the incremental_output parameter (or incrementalOutput for Java) to true.
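The difference between the two modes matters when you reassemble the text. The sketch below is illustrative only: it uses made-up text chunks rather than real API responses, and the helper names are hypothetical. It shows that incremental chunks are concatenated, while in non-incremental mode the last chunk already holds the full text and the newly generated part has to be derived by comparing each chunk with the previous one.

```python
def text_from_incremental(chunks: list) -> str:
    """Incremental mode: each chunk holds only new text, so concatenate."""
    return "".join(chunks)


def text_from_cumulative(chunks: list) -> str:
    """Non-incremental mode: each chunk repeats everything so far, so keep the last one."""
    return chunks[-1] if chunks else ""


def deltas_from_cumulative(chunks: list) -> list:
    """Recover the newly added text of each cumulative chunk."""
    previous = ""
    deltas = []
    for current in chunks:
        # Everything beyond the previously seen prefix is new.
        deltas.append(current[len(previous):])
        previous = current
    return deltas


if __name__ == "__main__":
    # Hypothetical chunk texts, for illustration only.
    cumulative = ["A drill", "A drill starts", "A drill starts, then stops."]
    incremental = ["A drill", " starts", ", then stops."]
    print(text_from_cumulative(cumulative) == text_from_incremental(incremental))
    print(deltas_from_cumulative(cumulative))
```

Incremental mode avoids resending earlier text in every chunk, which is why the samples in this section enable it.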
Python
import os
import dashscope
# The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url="https://dashscope-intl.aliyuncs.com/api/v1"
messages = [
{
"role": "user",
"content": [
{"audio": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/en-US/20240916/xvappi/Renovation_Noise.wav"}
]
}
]
response = dashscope.MultiModalConversation.call(
# If you have not configured the environment variable, replace the following line with your Model Studio API key: api_key="sk-xxx",
# API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
api_key=os.getenv('DASHSCOPE_API_KEY'),
model="qwen3-omni-30b-a3b-captioner",
messages=messages,
stream=True,
incremental_output=True
)
full_content = ""
print("Streaming output:")
for chunk in response:
    content = chunk["output"]["choices"][0]["message"].content
    # Skip chunks that carry no content.
    if content:
        print(content[0]["text"])
        full_content += content[0]["text"]
print(f"Full content: {full_content}")
Java
import java.util.Arrays;
import java.util.HashMap;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import io.reactivex.Flowable;
import com.alibaba.dashscope.utils.Constants;
public class Main {
static {
// The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
Constants.baseHttpApiUrl="https://dashscope-intl.aliyuncs.com/api/v1";
}
public static void streamCall()
throws ApiException, NoApiKeyException, UploadFileException {
MultiModalConversation conv = new MultiModalConversation();
MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
// qwen3-omni-30b-a3b-captioner supports only one audio file as input.
.content(Arrays.asList(
new HashMap<String, Object>(){{put("audio", "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/en-US/20240916/xvappi/Renovation_Noise.wav");}}
)).build();
MultiModalConversationParam param = MultiModalConversationParam.builder()
// If you have not configured the environment variable, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
// API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.model("qwen3-omni-30b-a3b-captioner")
.message(userMessage)
.incrementalOutput(true)
.build();
Flowable<MultiModalConversationResult> result = conv.streamCall(param);
result.blockingForEach(item -> {
try {
var content = item.getOutput().getChoices().get(0).getMessage().getContent();
// Check if content exists and is not empty.
if (content != null && !content.isEmpty()) {
System.out.println(content.get(0).get("text"));
}
} catch (Exception e){
System.exit(0);
}
});
}
public static void main(String[] args) {
try {
streamCall();
} catch (ApiException | NoApiKeyException | UploadFileException e) {
System.out.println(e.getMessage());
}
System.exit(0);
}
}
curl
# ======= Important =======
# API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# The following is the base_url for the Singapore region. If you use a model in the Beijing region, replace the base_url with: https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation
# === Delete this comment before execution ===
curl -X POST 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation' \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-H 'X-DashScope-SSE: enable' \
-d '{
"model": "qwen3-omni-30b-a3b-captioner",
"input":{
"messages":[
{
"role": "user",
"content": [
{"audio": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/en-US/20240916/xvappi/Renovation_Noise.wav"}
]
}
]
},
"parameters": {
"incremental_output": true
}
}'
Pass a local file (Base64 encoding or file path)
The model provides two methods to upload a local file:
Base64 encoding
Direct file path (Recommended for more stable transmission)
Upload methods:
Pass by file path
Pass the file path directly to the model. This method is supported only by the DashScope Python and Java SDKs, not by HTTP. Refer to the following table to specify the file path based on your programming language and operating system.
Pass by Base64 encoding
Convert the file to a Base64-encoded string and then pass it to the model.
Limits:
We recommend passing the file path directly for greater stability. You can also use Base64 encoding for files smaller than 1 MB.
When passing a file path directly, the audio file must be smaller than 10 MB.
When passing a file using Base64 encoding, the encoded string must be smaller than 10 MB. Base64 encoding increases the data size.
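Because Base64 turns every 3 bytes into 4 characters, the encoded size can be checked before a request is sent. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, "10 MB" is read as 10 × 1024 × 1024 bytes, and the check covers only the encoded payload, not the data:;base64, prefix. The documentation does not specify either detail.

```python
import base64

# Assumption: the documented 10 MB limit is interpreted as 10 MiB.
MAX_ENCODED_BYTES = 10 * 1024 * 1024


def base64_length(raw_size: int) -> int:
    """Length of the padded Base64 encoding of raw_size bytes."""
    return 4 * ((raw_size + 2) // 3)


def to_data_url(audio_bytes: bytes) -> str:
    """Encode audio bytes as a data URL, rejecting payloads at or over the limit."""
    encoded_size = base64_length(len(audio_bytes))
    if encoded_size >= MAX_ENCODED_BYTES:
        raise ValueError(
            f"Encoded size {encoded_size} bytes is not smaller than {MAX_ENCODED_BYTES} bytes"
        )
    return "data:;base64," + base64.b64encode(audio_bytes).decode("utf-8")


if __name__ == "__main__":
    print(base64_length(3_000_000))  # encoded size of a 3,000,000-byte file
    print(to_data_url(b"ID3"))
```

Under these assumptions, a raw file of roughly 7.5 MB is already at the encoded limit, which is one reason to prefer passing a file path for larger files.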
Pass by file path
Passing a file path is supported only by the DashScope Python and Java SDKs, not by HTTP.
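The samples in this section expect the path in file:// form. One way to build that string is the standard library's pathlib, sketched below. The helper name to_file_uri is hypothetical, and it is an assumption that the SDK accepts the exact output of Path.as_uri(), which percent-encodes characters such as spaces. If the SDK rejects such a path, fall back to the literal form shown in the samples.

```python
from pathlib import Path


def to_file_uri(local_path: str) -> str:
    """Turn a local path into a file:// URI.

    Relative paths are made absolute first, because as_uri() requires an absolute path.
    """
    return Path(local_path).expanduser().absolute().as_uri()


if __name__ == "__main__":
    # On Linux or macOS this yields file:///home/images/test.mp3.
    print(to_file_uri("/home/images/test.mp3"))
```

On Windows, as_uri() produces the three-slash form followed by the drive letter, which matches the file:/// form noted in the Java sample comment.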
Python
import os
import dashscope
# The following is the base_url for the Singapore region. If you use a model in the Beijing region, replace the base_url with: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
# Replace ABSOLUTE_PATH/welcome.mp3 with the absolute path of your local audio file.
# The full path of the local file must be prefixed with file:// to ensure a valid path, for example: file:///home/images/test.mp3
audio_file_path = "file://ABSOLUTE_PATH/welcome.mp3"
messages = [
{
"role": "user",
# Pass the file path prefixed with file:// in the audio parameter.
"content": [{"audio": audio_file_path}],
}
]
response = dashscope.MultiModalConversation.call(
# API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# If you have not configured the environment variable, replace the following line with your Model Studio API key: api_key="sk-xxx"
api_key=os.getenv('DASHSCOPE_API_KEY'),
model="qwen3-omni-30b-a3b-captioner",
messages=messages)
print("Output:")
print(response["output"]["choices"][0]["message"].content[0]["text"])
Java
import java.util.Arrays;
import java.util.HashMap;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.JsonUtils;
import com.alibaba.dashscope.utils.Constants;
public class Main {
static {
// The following is the base_url for the Singapore region. If you use a model in the Beijing region, replace the base_url with: https://dashscope.aliyuncs.com/api/v1
Constants.baseHttpApiUrl="https://dashscope-intl.aliyuncs.com/api/v1";
}
public static void callWithLocalFile()
throws ApiException, NoApiKeyException, UploadFileException {
// Replace ABSOLUTE_PATH/welcome.mp3 with the absolute path of your local audio file.
// The full path of the local file must be prefixed with file:// to ensure a valid path, for example: file:///home/images/test.mp3
// The current test system is macOS. If you use Windows, use "file:///ABSOLUTE_PATH/welcome.mp3" instead.
String localFilePath = "file://ABSOLUTE_PATH/welcome.mp3";
MultiModalConversation conv = new MultiModalConversation();
MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
.content(Arrays.asList(
new HashMap<String, Object>(){{put("audio", localFilePath);}}
))
.build();
MultiModalConversationParam param = MultiModalConversationParam.builder()
// API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
// If you have not configured the environment variable, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.model("qwen3-omni-30b-a3b-captioner")
.message(userMessage)
.build();
MultiModalConversationResult result = conv.call(param);
System.out.println("Output:\n" + result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
}
public static void main(String[] args) {
try {
callWithLocalFile();
} catch (ApiException | NoApiKeyException | UploadFileException e) {
System.out.println(e.getMessage());
}
System.exit(0);
}
}
Pass by Base64 encoding
OpenAI compatible
Python
import os
from openai import OpenAI
import base64
client = OpenAI(
# API keys for the Singapore and Beijing regions are different. To obtain an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
api_key=os.getenv("DASHSCOPE_API_KEY"),
# The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1
base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
def encode_audio(audio_path):
    with open(audio_path, "rb") as audio_file:
        return base64.b64encode(audio_file.read()).decode("utf-8")
# Replace ABSOLUTE_PATH/welcome.mp3 with the absolute path of your local audio file.
audio_file_path = "xxx/ABSOLUTE_PATH/welcome.mp3"
base64_audio = encode_audio(audio_file_path)
completion = client.chat.completions.create(
model="qwen3-omni-30b-a3b-captioner",
messages=[
{
"role": "user",
"content": [
{
"type": "input_audio",
"input_audio": {
# When passing a local file with Base64 encoding, you must use the data: prefix to ensure a valid file URL.
# The "base64" keyword must be included before the Base64-encoded data (base64_audio), otherwise an error will occur.
"data": f"data:;base64,{base64_audio}"
},
}
],
},
]
)
print(completion.choices[0].message.content)
Node.js
import OpenAI from "openai";
import { readFileSync } from 'fs';
const openai = new OpenAI(
{
// API keys for the Singapore and Beijing regions are different. To obtain an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
apiKey: process.env.DASHSCOPE_API_KEY,
// The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1
baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
}
);
const encodeAudio = (audioPath) => {
const audioFile = readFileSync(audioPath);
return audioFile.toString('base64');
};
// Replace ABSOLUTE_PATH/welcome.mp3 with the absolute path of your local audio file.
const base64Audio = encodeAudio("xxx/ABSOLUTE_PATH/welcome.mp3")
const completion = await openai.chat.completions.create({
model: "qwen3-omni-30b-a3b-captioner",
messages: [
{
"role": "user",
"content": [{
"type": "input_audio",
"input_audio": { "data": `data:;base64,${base64Audio}`}
}]
}]
});
console.log(completion.choices[0].message.content);
curl
For information about how to convert a file to a Base64-encoded string, see the code sample.
For demonstration purposes, the Base64 string "data:;base64,SUQzBAAAAAAAI1RTU0UAAAAPAAADTGF2ZjU4LjI5...." is truncated. In practice, you must pass the complete encoded string.
# ======= Important =======
# API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# The following is the base_url for the Singapore region. If you use a model in the Beijing region, replace the base_url with: https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
# === Delete this comment before execution ===
curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen3-omni-30b-a3b-captioner",
"messages": [
{
"role": "user",
"content": [
{
"type": "input_audio",
"input_audio": {
"data": "data:;base64,SUQzBAAAAAAAI1RTU0UAAAAPAAADTGF2ZjU4LjI5...."
}
}
]
}
]
}'
DashScope
Python
import os
import base64
import dashscope
dashscope.base_http_api_url="https://dashscope-intl.aliyuncs.com/api/v1"
# Encoding function: Converts a local file to a Base64-encoded string
def encode_audio(audio_file_path):
    with open(audio_file_path, "rb") as audio_file:
        return base64.b64encode(audio_file.read()).decode("utf-8")
# Replace ABSOLUTE_PATH/welcome.mp3 with the absolute path of your local audio file.
audio_file_path = "xxx/ABSOLUTE_PATH/welcome.mp3"
base64_audio = encode_audio(audio_file_path)
print(base64_audio)
messages = [
{
"role": "user",
# When passing a local file with Base64 encoding, you must use the data: prefix to ensure a valid file URL.
# The "base64" keyword must be included before the Base64-encoded data (base64_audio), otherwise an error will occur.
"content": [{"audio":f"data:;base64,{base64_audio}"}],
}
]
response = dashscope.MultiModalConversation.call(
# If you have not configured the environment variable, replace the following line with your Model Studio API key: api_key="sk-xxx"
# API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
api_key=os.getenv("DASHSCOPE_API_KEY"),
model="qwen3-omni-30b-a3b-captioner",
messages=messages,
)
print(response.output.choices[0].message.content[0]["text"])
Java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.Base64;
import java.util.HashMap;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;
public class Main {
static {
// The following is the base_url for the Singapore region. If you use a model in the Beijing region, replace the base_url with: https://dashscope.aliyuncs.com/api/v1
Constants.baseHttpApiUrl="https://dashscope-intl.aliyuncs.com/api/v1";
}
private static String encodeAudioToBase64(String audioPath) throws IOException {
Path path = Paths.get(audioPath);
byte[] audioBytes = Files.readAllBytes(path);
return Base64.getEncoder().encodeToString(audioBytes);
}
public static void callWithLocalFile()
throws ApiException, NoApiKeyException, UploadFileException,IOException{
// Replace ABSOLUTE_PATH/welcome.mp3 with the actual path of your local file.
String localFilePath = "ABSOLUTE_PATH/welcome.mp3";
String base64Audio = encodeAudioToBase64(localFilePath);
MultiModalConversation conv = new MultiModalConversation();
MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
// When passing a local file with Base64 encoding, you must use the data: prefix to ensure a valid file URL.
// The "base64" keyword must be included before the Base64-encoded data (base64_audio), otherwise an error will occur.
.content(Arrays.asList(
new HashMap<String, Object>(){{put("audio", "data:;base64," + base64Audio);}}
))
.build();
MultiModalConversationParam param = MultiModalConversationParam.builder()
.model("qwen3-omni-30b-a3b-captioner")
// API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
// If you have not configured the environment variable, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.message(userMessage)
.build();
MultiModalConversationResult result = conv.call(param);
System.out.println("Output:\n" + result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
}
public static void main(String[] args) {
try {
callWithLocalFile();
} catch (ApiException | NoApiKeyException | UploadFileException | IOException e) {
System.out.println(e.getMessage());
}
System.exit(0);
}
}
curl
For information about how to convert a file to a Base64-encoded string, see the code sample.
For demonstration purposes, the Base64 string "data:;base64,SUQzBAAAAAAAI1RTU0UAAAAPAAADTGF2ZjU4LjI5...." is truncated. In practice, you must pass the complete encoded string.
# ======= Important =======
# API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# The following is the base_url for the Singapore region. If you use a model in the Beijing region, replace the base_url with: https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation
# === Delete this comment before execution ===
curl -X POST 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation' \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
"model": "qwen3-omni-30b-a3b-captioner",
"input":{
"messages":[
{
"role": "user",
"content": [
{"audio": "data:;base64,SUQzBAAAAAAAI1RTU0UAAAAPAAADTGF2ZjU4LjI5...."}
]
}
]
}
}'
API reference
For more information about the input and output parameters of Qwen3-Omni-Captioner, see Qwen.
Error codes
If a call fails, see Error messages for troubleshooting.
FAQ
Limitations
The model has the following limits for audio files:
Duration: Less than or equal to 40 minutes.
Number of files: Only one audio file is supported per request.
File formats: Supported formats include AMR, WAV (CodecID: GSM_MS), WAV (PCM), 3GP, 3GPP, AAC, and MP3.
File input methods: Publicly accessible audio URL, Base64 encoding, or local file path.
File size:
Public URL: No more than 1 GB.
File path: The audio file must be smaller than 10 MB.
Base64 encoding: The encoded Base64 string must be smaller than 10 MB. For more information, see Pass a local file (Base64 encoding or file path).
To compress a file, see How to compress an audio file to the required size?
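The duration and format limits above can be checked before sending a request. The following is a rough pre-flight sketch, not an official validator: check_audio is a hypothetical helper, it assumes the file extension reflects the real format (it cannot tell which codec a WAV file uses), and the caller must supply the duration. By the token conversion rule, a 40-minute file comes to 2,400 × 12.5 = 30,000 tokens, which fits within the 32,768-token maximum input.

```python
from pathlib import PurePath

MAX_DURATION_SECONDS = 40 * 60  # documented limit: 40 minutes or less
# Assumption: these extensions stand in for the documented formats.
SUPPORTED_EXTENSIONS = {".amr", ".wav", ".3gp", ".3gpp", ".aac", ".mp3"}


def check_audio(filename: str, duration_seconds: float) -> list:
    """Return a list of problems found; an empty list means no problem was detected."""
    problems = []
    extension = PurePath(filename).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"Unsupported file extension: {extension or '(none)'}")
    if duration_seconds > MAX_DURATION_SECONDS:
        problems.append(
            f"Duration {duration_seconds} s exceeds {MAX_DURATION_SECONDS} s"
        )
    return problems


if __name__ == "__main__":
    print(check_audio("Renovation_Noise.wav", 120))
    print(check_audio("lecture.flac", 3000))
```

File size is not checked here because the limit depends on the input method (public URL, file path, or Base64 encoding).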