Qwen3-Omni-Captioner is an open-source model built on Qwen3-Omni. It automatically generates accurate and comprehensive descriptions for complex audio—speech, ambient sounds, music, and sound effects—without prompts. The model identifies speaker emotions, musical elements (style, instruments), and sensitive information. Ideal for audio content analysis, security audits, intent recognition, and video editing.
Availability
Supported regions
Supported models
International
In international deployment mode, the endpoint and data storage are both located in the Singapore region. Model inference compute resources are dynamically scheduled worldwide, excluding Chinese Mainland.
| Model | Context window (tokens) | Max input (tokens) | Max output (tokens) | Input cost (per 1M tokens) | Output cost (per 1M tokens) | Free quota |
| --- | --- | --- | --- | --- | --- | --- |
| qwen3-omni-30b-a3b-captioner | 65,536 | 32,768 | 32,768 | $3.81 | $3.06 | 1 million tokens, valid for 90 days after activating Model Studio |
Chinese Mainland
In Chinese Mainland deployment mode, the endpoint and data storage are both located in the Beijing region. Model inference compute resources are limited to Chinese Mainland.
| Model | Context window (tokens) | Max input (tokens) | Max output (tokens) | Input cost (per 1M tokens) | Output cost (per 1M tokens) | Free quota |
| --- | --- | --- | --- | --- | --- | --- |
| qwen3-omni-30b-a3b-captioner | 65,536 | 32,768 | 32,768 | $2.265 | $1.821 | No free quota |
Token conversion rule for audio: Total tokens = Audio duration (in seconds) × 12.5. If the audio duration is less than one second, it is counted as one second.
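As a quick sketch of this rule in code (the `audio_tokens` helper is illustrative, not part of any SDK; rounding fractional token counts up is our assumption, since the billing rule does not specify how fractions are handled):

```python
import math

def audio_tokens(duration_seconds: float) -> int:
    # Audio shorter than one second is billed as one full second.
    seconds = max(1.0, duration_seconds)
    # Total tokens = duration (s) x 12.5; rounding up is an assumption.
    return math.ceil(seconds * 12.5)

print(audio_tokens(0.4))  # sub-second clip counts as one second -> 13
print(audio_tokens(60))   # one minute of audio -> 750
```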
Getting started
Prerequisites
- If you use an SDK to make calls, install the latest version of the SDK.
Qwen3-Omni-Captioner supports API calls only. Online testing in the Model Studio console is not available.
These code samples analyze online audio via a URL, not local files. To learn how to pass local files, see Pass local file; for audio file limits, see Limitations.
OpenAI compatible
Python
import os
from openai import OpenAI

client = OpenAI(
    # API keys differ by region. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # Singapore region URL. For Beijing, use: https://dashscope.aliyuncs.com/compatible-mode/v1
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
completion = client.chat.completions.create(
    model="qwen3-omni-30b-a3b-captioner",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "input_audio",
                    "input_audio": {
                        "data": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20240916/xvappi/%E8%A3%85%E4%BF%AE%E5%99%AA%E9%9F%B3.wav"
                    }
                }
            ]
        }
    ]
)
print(completion.choices[0].message.content)
Node.js
import OpenAI from "openai";

const openai = new OpenAI(
    {
        // API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
        apiKey: process.env.DASHSCOPE_API_KEY,
        // Singapore region URL. For Beijing, use: https://dashscope.aliyuncs.com/compatible-mode/v1
        baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
    }
);
const completion = await openai.chat.completions.create({
    model: "qwen3-omni-30b-a3b-captioner",
    messages: [
        {
            "role": "user",
            "content": [{
                "type": "input_audio",
                "input_audio": {
                    "data": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20240916/xvappi/%E8%A3%85%E4%BF%AE%E5%99%AA%E9%9F%B3.wav"
                }
            }]
        }]
});
console.log(completion.choices[0].message.content);
curl
# ======= Important =======
# API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# Singapore region base_url. For Beijing, use: https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
# === Delete this comment before execution ===
curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
  -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-omni-30b-a3b-captioner",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "input_audio",
            "input_audio": {
              "data": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20240916/xvappi/%E8%A3%85%E4%BF%AE%E5%99%AA%E9%9F%B3.wav"
            }
          }
        ]
      }
    ]
  }'
DashScope
Python
import os
import dashscope

# The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = "https://dashscope-intl.aliyuncs.com/api/v1"

messages = [
    {
        "role": "user",
        "content": [
            {"audio": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20240916/xvappi/%E8%A3%85%E4%BF%AE%E5%99%AA%E9%9F%B3.wav"}
        ]
    }
]
response = dashscope.MultiModalConversation.call(
    # API keys differ by region. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
    # If the environment variable is not configured, replace with your API key: api_key="sk-xxx"
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    model="qwen3-omni-30b-a3b-captioner",
    messages=messages
)
print("Output:")
print(response["output"]["choices"][0]["message"].content[0]["text"])
Java
import java.util.Arrays;
import java.util.Collections;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;

public class Main {
    static {
        // The following is the base_url for the Singapore region. If you use a model in the Beijing region, replace it with: https://dashscope.aliyuncs.com/api/v1
        Constants.baseHttpApiUrl = "https://dashscope-intl.aliyuncs.com/api/v1";
    }

    public static void simpleMultiModalConversationCall()
            throws ApiException, NoApiKeyException, UploadFileException {
        MultiModalConversation conv = new MultiModalConversation();
        MultiModalMessage userMessage = MultiModalMessage.builder()
                .role(Role.USER.getValue())
                .content(Arrays.asList(
                        Collections.singletonMap("audio", "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20240916/xvappi/%E8%A3%85%E4%BF%AE%E5%99%AA%E9%9F%B3.wav")))
                .build();
        MultiModalConversationParam param = MultiModalConversationParam.builder()
                .model("qwen3-omni-30b-a3b-captioner")
                .message(userMessage)
                .build();
        MultiModalConversationResult result = conv.call(param);
        System.out.println("Output:\n" + result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
    }

    public static void main(String[] args) {
        try {
            simpleMultiModalConversationCall();
        } catch (ApiException | NoApiKeyException | UploadFileException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}
curl
# ======= Important =======
# API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# Singapore region base_url. For Beijing, use: https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation
# === Delete this comment before execution ===
curl -X POST 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation' \
  -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "qwen3-omni-30b-a3b-captioner",
    "input": {
      "messages": [
        {
          "role": "user",
          "content": [
            {"audio": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20240916/xvappi/%E8%A3%85%E4%BF%AE%E5%99%AA%E9%9F%B3.wav"}
          ]
        }
      ]
    }
  }'
How it works
- Single-turn interaction: Each request is an independent analysis task. Multi-turn conversation is not supported.
- Fixed task: The model generates audio descriptions in English only. You cannot use instructions (e.g., system messages) to change its behavior, output format, or content focus.
- Audio input only: The model accepts audio only. Text prompts are not needed, and the message parameter format is fixed.
Streaming output
Streaming output returns intermediate results as they are generated, so you can start reading the response before generation completes. This reduces perceived wait time.
OpenAI compatible
Set stream to true to enable streaming output.
Python
import os
from openai import OpenAI

client = OpenAI(
    # API keys for the Singapore and Beijing regions are different. To obtain an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # Singapore region URL. For Beijing, use: https://dashscope.aliyuncs.com/compatible-mode/v1
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
completion = client.chat.completions.create(
    model="qwen3-omni-30b-a3b-captioner",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "input_audio",
                    "input_audio": {
                        "data": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20240916/xvappi/%E8%A3%85%E4%BF%AE%E5%99%AA%E9%9F%B3.wav"
                    }
                }
            ]
        }
    ],
    stream=True,
    stream_options={"include_usage": True},
)
for chunk in completion:
    # If stream_options.include_usage is True, the choices field of the last chunk is an empty list and should be skipped. You can get the token usage from chunk.usage.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
Node.js
import OpenAI from "openai";

const openai = new OpenAI(
    {
        // API keys for the Singapore and Beijing regions are different. To obtain an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
        apiKey: process.env.DASHSCOPE_API_KEY,
        // Singapore region URL. For Beijing, use: https://dashscope.aliyuncs.com/compatible-mode/v1
        baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
    }
);
const completion = await openai.chat.completions.create({
    model: "qwen3-omni-30b-a3b-captioner",
    messages: [
        {
            "role": "user",
            "content": [{
                "type": "input_audio",
                "input_audio": {
                    "data": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20240916/xvappi/%E8%A3%85%E4%BF%AE%E5%99%AA%E9%9F%B3.wav"
                },
            }]
        }],
    stream: true,
    stream_options: {
        include_usage: true
    },
});
for await (const chunk of completion) {
    if (Array.isArray(chunk.choices) && chunk.choices.length > 0) {
        console.log(chunk.choices[0].delta.content);
    } else {
        console.log(chunk.usage);
    }
}
curl
# ======= Important =======
# Singapore region base_url. For Beijing, use: https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
# API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# === Delete this comment before execution ===
curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
  -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-omni-30b-a3b-captioner",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "input_audio",
            "input_audio": {
              "data": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20240916/xvappi/%E8%A3%85%E4%BF%AE%E5%99%AA%E9%9F%B3.wav"
            }
          }
        ]
      }
    ],
    "stream": true,
    "stream_options": {
      "include_usage": true
    }
  }'
DashScope
Call via the DashScope SDK or HTTP. Set parameters based on your method:
- Python SDK: Set the stream parameter to True.
- Java SDK: Use the streamCall method.
- HTTP: In the request header, set X-DashScope-SSE to enable.

By default, streaming output is non-incremental: each returned chunk contains all previously generated content. For incremental output, set incremental_output (incrementalOutput in Java) to true.
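The difference between the two modes can be sketched as follows (the chunk strings below are invented for illustration only):

```python
# Non-incremental (default): every chunk repeats all content generated so far,
# so the last chunk alone already holds the complete description.
cumulative_chunks = ["The audio", "The audio captures", "The audio captures drilling noise."]
full_from_cumulative = cumulative_chunks[-1]

# Incremental (incremental_output=True): each chunk carries only the new text,
# so the client concatenates the chunks to rebuild the description.
incremental_chunks = ["The audio", " captures", " drilling noise."]
full_from_incremental = "".join(incremental_chunks)

assert full_from_cumulative == full_from_incremental
print(full_from_incremental)  # The audio captures drilling noise.
```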
Python
import os
import dashscope

# The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = "https://dashscope-intl.aliyuncs.com/api/v1"

messages = [
    {
        "role": "user",
        "content": [
            {"audio": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20240916/xvappi/%E8%A3%85%E4%BF%AE%E5%99%AA%E9%9F%B3.wav"}
        ]
    }
]
responses = dashscope.MultiModalConversation.call(
    # If the environment variable is not configured, replace with your API key: api_key="sk-xxx"
    # API keys differ by region. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    model="qwen3-omni-30b-a3b-captioner",
    messages=messages,
    stream=True,
    incremental_output=True
)
full_content = ""
print("Streaming output:")
for response in responses:
    if response["output"]["choices"][0]["message"].content:
        text = response["output"]["choices"][0]["message"].content[0]["text"]
        print(text)
        full_content += text
print(f"Full content: {full_content}")
Java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;
import io.reactivex.Flowable;

public class Main {
    static {
        // Singapore region URL. For Beijing, use: https://dashscope.aliyuncs.com/api/v1
        Constants.baseHttpApiUrl = "https://dashscope-intl.aliyuncs.com/api/v1";
    }

    public static void streamCall()
            throws ApiException, NoApiKeyException, UploadFileException {
        MultiModalConversation conv = new MultiModalConversation();
        // qwen3-omni-30b-a3b-captioner supports only one audio file as input.
        MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
                .content(Arrays.asList(
                        new HashMap<String, Object>() {{ put("audio", "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20240916/xvappi/%E8%A3%85%E4%BF%AE%E5%99%AA%E9%9F%B3.wav"); }}
                )).build();
        MultiModalConversationParam param = MultiModalConversationParam.builder()
                // If the environment variable is not configured, replace with your API key: .apiKey("sk-xxx")
                // API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .model("qwen3-omni-30b-a3b-captioner")
                .message(userMessage)
                .incrementalOutput(true)
                .build();
        Flowable<MultiModalConversationResult> result = conv.streamCall(param);
        result.blockingForEach(item -> {
            try {
                List<com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult.Output.Choice.Message.Content> content = item.getOutput().getChoices().get(0).getMessage().getContent();
                // Check that content exists and is not empty.
                if (content != null && !content.isEmpty()) {
                    System.out.println(content.get(0).get("text"));
                }
            } catch (Exception e) {
                System.exit(0);
            }
        });
    }

    public static void main(String[] args) {
        try {
            streamCall();
        } catch (ApiException | NoApiKeyException | UploadFileException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}
curl
# ======= Important =======
# API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# Singapore region base_url. For Beijing, use: https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation
# === Delete this comment before execution ===
curl -X POST 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation' \
  -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
  -H 'Content-Type: application/json' \
  -H 'X-DashScope-SSE: enable' \
  -d '{
    "model": "qwen3-omni-30b-a3b-captioner",
    "input": {
      "messages": [
        {
          "role": "user",
          "content": [
            {"audio": "https://help-static-aliyun-doc.aliyuncs.com/file-manage-files/zh-CN/20240916/xvappi/%E8%A3%85%E4%BF%AE%E5%99%AA%E9%9F%B3.wav"}
          ]
        }
      ]
    },
    "parameters": {
      "incremental_output": true
    }
  }'
Pass local file (Base64 encoding or file path)
Two methods are available to pass local files:
- Base64 encoding
- Direct file path (recommended for greater transmission stability)
Upload methods:
Pass by file path
Pass the file path directly to the model. Supported by DashScope Python and Java SDKs only, not HTTP. See the table below for path format by language and OS.
Pass by Base64 encoding
Convert the file to a Base64 string and pass it to the model.
Limits:
- Recommended: pass the file path directly for greater transmission stability. For files under 1 MB, Base64 encoding also works.
- When passing by file path, audio files must be under 10 MB.
- When using Base64 encoding, the encoded string must be under 10 MB. Note that Base64 encoding increases the payload size.
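To stay under the Base64 limit, you can check the size of the encoded string before building the request. This is a sketch using only the standard library; the helper name and the fail-fast behavior are our choices, not part of any SDK:

```python
import base64

MAX_ENCODED_BYTES = 10 * 1024 * 1024  # 10 MB cap on the Base64 string

def encode_audio_checked(path: str) -> str:
    """Base64-encode a local audio file, failing fast if the encoded
    string would exceed the 10 MB limit stated above."""
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    # Base64 inflates the payload by roughly one third, so a file well
    # under 10 MB on disk can still exceed the encoded-size limit.
    if len(encoded) > MAX_ENCODED_BYTES:
        raise ValueError("Encoded audio exceeds 10 MB; pass a file path or public URL instead.")
    return encoded
```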
Pass by file path
File path passing is supported by DashScope Python and Java SDKs only, not HTTP.
Python
import dashscope
import os
# Singapore region base_url. For Beijing, use: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
# Replace ABSOLUTE_PATH/welcome.mp3 with the absolute path of your local audio file.
# The full path of the local file must be prefixed with file:// to ensure a valid path, for example: file:///home/images/test.mp3
audio_file_path = "file://ABSOLUTE_PATH/welcome.mp3"
messages = [
{
"role": "user",
# Pass the file path prefixed with file:// in the audio parameter.
"content": [{"audio": audio_file_path}],
}
]
response = dashscope.MultiModalConversation.call(
# API keys differ by region. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# If you have not configured the environment variable, replace the following line with your Model Studio API key: api_key="sk-xxx"
api_key=os.getenv('DASHSCOPE_API_KEY'),
model="qwen3-omni-30b-a3b-captioner",
messages=messages)
print("Output:")
print(response["output"]["choices"][0]["message"].content[0]["text"])
Java
import java.util.Arrays;
import java.util.HashMap;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;

public class Main {
    static {
        // The following is the base_url for the Singapore region. If you use a model in the Beijing region, replace it with: https://dashscope.aliyuncs.com/api/v1
        Constants.baseHttpApiUrl = "https://dashscope-intl.aliyuncs.com/api/v1";
    }

    public static void callWithLocalFile()
            throws ApiException, NoApiKeyException, UploadFileException {
        // Replace ABSOLUTE_PATH/welcome.mp3 with the absolute path of your local audio file.
        // The full path of the local file must be prefixed with file:// to form a valid path, for example: file:///home/images/test.mp3
        // The path below was tested on macOS. On Windows, use "file:///ABSOLUTE_PATH/welcome.mp3" instead.
        String localFilePath = "file://ABSOLUTE_PATH/welcome.mp3";
        MultiModalConversation conv = new MultiModalConversation();
        MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
                .content(Arrays.asList(
                        new HashMap<String, Object>() {{ put("audio", localFilePath); }}
                ))
                .build();
        MultiModalConversationParam param = MultiModalConversationParam.builder()
                // API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
                // If the environment variable is not configured, replace with your API key: .apiKey("sk-xxx")
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .model("qwen3-omni-30b-a3b-captioner")
                .message(userMessage)
                .build();
        MultiModalConversationResult result = conv.call(param);
        System.out.println("Output:\n" + result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
    }

    public static void main(String[] args) {
        try {
            callWithLocalFile();
        } catch (ApiException | NoApiKeyException | UploadFileException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}
Pass by Base64 encoding
OpenAI compatible
Python
import os
import base64
from openai import OpenAI

client = OpenAI(
    # API keys for the Singapore and Beijing regions are different. To obtain an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # Singapore region URL. For Beijing, use: https://dashscope.aliyuncs.com/compatible-mode/v1
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

def encode_audio(audio_path):
    with open(audio_path, "rb") as audio_file:
        return base64.b64encode(audio_file.read()).decode("utf-8")

# Replace ABSOLUTE_PATH/welcome.mp3 with the absolute path of your local audio file.
audio_file_path = "ABSOLUTE_PATH/welcome.mp3"
base64_audio = encode_audio(audio_file_path)
completion = client.chat.completions.create(
    model="qwen3-omni-30b-a3b-captioner",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "input_audio",
                    "input_audio": {
                        # When passing a local file with Base64 encoding, you must use the data: prefix to form a valid file URL.
                        # The "base64" keyword must appear before the Base64-encoded data (base64_audio); otherwise, an error occurs.
                        "data": f"data:;base64,{base64_audio}"
                    },
                }
            ],
        },
    ]
)
print(completion.choices[0].message.content)
Node.js
import OpenAI from "openai";
import { readFileSync } from 'fs';

const openai = new OpenAI(
    {
        // API keys for the Singapore and Beijing regions are different. To obtain an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
        apiKey: process.env.DASHSCOPE_API_KEY,
        // Singapore region URL. For Beijing, use: https://dashscope.aliyuncs.com/compatible-mode/v1
        baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1"
    }
);
const encodeAudio = (audioPath) => {
    const audioFile = readFileSync(audioPath);
    return audioFile.toString('base64');
};
// Replace ABSOLUTE_PATH/welcome.mp3 with the absolute path of your local audio file.
const base64Audio = encodeAudio("ABSOLUTE_PATH/welcome.mp3");
const completion = await openai.chat.completions.create({
    model: "qwen3-omni-30b-a3b-captioner",
    messages: [
        {
            "role": "user",
            "content": [{
                "type": "input_audio",
                "input_audio": { "data": `data:;base64,${base64Audio}` }
            }]
        }]
});
console.log(completion.choices[0].message.content);
curl
- For information about how to convert a file to a Base64-encoded string, see the code samples above.
- For demonstration purposes, the Base64 string "data:;base64,SUQzBAAAAAAAI1RTU0UAAAAPAAADTGF2ZjU4LjI5...." is truncated. In practice, you must pass the complete encoded string.
# ======= Important =======
# API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# Singapore region base_url. For Beijing, use: https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
# === Delete this comment before execution ===
curl -X POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions \
  -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-omni-30b-a3b-captioner",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "input_audio",
            "input_audio": {
              "data": "data:;base64,SUQzBAAAAAAAI1RTU0UAAAAPAAADTGF2ZjU4LjI5...."
            }
          }
        ]
      }
    ]
  }'
DashScope
Python
import os
import base64
import dashscope

# The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = "https://dashscope-intl.aliyuncs.com/api/v1"

# Encoding function: converts a local file to a Base64-encoded string.
def encode_audio(audio_file_path):
    with open(audio_file_path, "rb") as audio_file:
        return base64.b64encode(audio_file.read()).decode("utf-8")

# Replace ABSOLUTE_PATH/welcome.mp3 with the absolute path of your local audio file.
audio_file_path = "ABSOLUTE_PATH/welcome.mp3"
base64_audio = encode_audio(audio_file_path)
messages = [
    {
        "role": "user",
        # When passing a local file with Base64 encoding, you must use the data: prefix to form a valid file URL.
        # The "base64" keyword must appear before the Base64-encoded data (base64_audio); otherwise, an error occurs.
        "content": [{"audio": f"data:;base64,{base64_audio}"}],
    }
]
response = dashscope.MultiModalConversation.call(
    # If you have not configured the environment variable, replace the following line with your Model Studio API key: api_key="sk-xxx"
    # API keys differ by region. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    model="qwen3-omni-30b-a3b-captioner",
    messages=messages,
)
print(response.output.choices[0].message.content[0]["text"])
Java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.Base64;
import java.util.HashMap;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;

public class Main {
    static {
        // The following is the base_url for the Singapore region. If you use a model in the Beijing region, replace it with: https://dashscope.aliyuncs.com/api/v1
        Constants.baseHttpApiUrl = "https://dashscope-intl.aliyuncs.com/api/v1";
    }

    private static String encodeAudioToBase64(String audioPath) throws IOException {
        Path path = Paths.get(audioPath);
        byte[] audioBytes = Files.readAllBytes(path);
        return Base64.getEncoder().encodeToString(audioBytes);
    }

    public static void callWithLocalFile()
            throws ApiException, NoApiKeyException, UploadFileException, IOException {
        // Replace ABSOLUTE_PATH/welcome.mp3 with the actual path of your local file.
        String localFilePath = "ABSOLUTE_PATH/welcome.mp3";
        String base64Audio = encodeAudioToBase64(localFilePath);
        MultiModalConversation conv = new MultiModalConversation();
        // When passing a local file with Base64 encoding, you must use the data: prefix to form a valid file URL.
        // The "base64" keyword must appear before the Base64-encoded data (base64Audio); otherwise, an error occurs.
        MultiModalMessage userMessage = MultiModalMessage.builder().role(Role.USER.getValue())
                .content(Arrays.asList(
                        new HashMap<String, Object>() {{ put("audio", "data:;base64," + base64Audio); }}
                ))
                .build();
        MultiModalConversationParam param = MultiModalConversationParam.builder()
                .model("qwen3-omni-30b-a3b-captioner")
                // API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
                // If you have not configured the environment variable, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .message(userMessage)
                .build();
        MultiModalConversationResult result = conv.call(param);
        System.out.println("Output:\n" + result.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
    }

    public static void main(String[] args) {
        try {
            callWithLocalFile();
        } catch (ApiException | NoApiKeyException | UploadFileException | IOException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}
curl
- For information about how to convert a file to a Base64-encoded string, see the code samples above.
- For demonstration purposes, the Base64 string "data:;base64,SUQzBAAAAAAAI1RTU0UAAAAPAAADTGF2ZjU4LjI5...." is truncated. In practice, you must pass the complete encoded string.
# ======= Important =======
# API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# Singapore region base_url. For Beijing, use: https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation
# === Delete this comment before execution ===
curl -X POST 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation' \
  -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "qwen3-omni-30b-a3b-captioner",
    "input": {
      "messages": [
        {
          "role": "user",
          "content": [
            {"audio": "data:;base64,SUQzBAAAAAAAI1RTU0UAAAAPAAADTGF2ZjU4LjI5...."}
          ]
        }
      ]
    }
  }'
API reference
For Qwen3-Omni-Captioner parameters, see Qwen.
Error codes
If the model call fails and returns an error message, see Error messages for resolution.
FAQ
Limitations
Audio file limits:
- Duration: Up to 40 minutes.
- Number of files: Only one audio file is supported per request.
- File formats: AMR, WAV (CodecID: GSM_MS), WAV (PCM), 3GP, 3GPP, AAC, and MP3.
- File input methods: Public URL, Base64 encoding, or local file path.
- File size:
  - Public URL: No more than 1 GB.
  - File path: The audio file must be smaller than 10 MB.
  - Base64 encoding: The encoded Base64 string must be smaller than 10 MB. For more information, see Pass local file.

To compress a file, see How to compress an audio file to the required size?