Request body

The following example shows how to recognize an audio file from a URL. For an example of how to recognize a local audio file, see Getting started.

cURL

# ======= Important =======
# The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation
# The API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key.
# === Delete this comment before execution ===
curl --location --request POST 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "qwen3-asr-flash",
    "input": {
        "messages": [
            {
                "content": [
                    {
                        "text": ""
                    }
                ],
                "role": "system"
            },
            {
                "content": [
                    {
                        "audio": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
                    }
                ],
                "role": "user"
            }
        ]
    },
    "parameters": {
        "asr_options": {
            "enable_itn": false
        }
    }
}'
Java

import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;
import com.alibaba.dashscope.utils.JsonUtils;
public class Main {
    public static void simpleMultiModalConversationCall()
            throws ApiException, NoApiKeyException, UploadFileException {
        MultiModalConversation conv = new MultiModalConversation();
        MultiModalMessage sysMessage = MultiModalMessage.builder()
                .role(Role.SYSTEM.getValue())
                // Configure the context for customized recognition here
                .content(Arrays.asList(Collections.singletonMap("text", "")))
                .build();
        MultiModalMessage userMessage = MultiModalMessage.builder()
                .role(Role.USER.getValue())
                .content(Arrays.asList(
                        Collections.singletonMap("audio", "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3")))
                .build();
        Map<String, Object> asrOptions = new HashMap<>();
        asrOptions.put("enable_itn", false);
        // asrOptions.put("language", "en"); // Optional. If the language of the audio is known, specify it with this parameter to improve recognition accuracy.
        MultiModalConversationParam param = MultiModalConversationParam.builder()
                // The API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key.
                // If the environment variable is not configured, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .model("qwen3-asr-flash")
                // A system message, if specified, must be the first message in the list.
                .message(sysMessage)
                .message(userMessage)
                .parameter("asr_options", asrOptions)
                .build();
        MultiModalConversationResult result = conv.call(param);
        System.out.println(JsonUtils.toJson(result));
    }

    public static void main(String[] args) {
        try {
            // The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
            Constants.baseHttpApiUrl = "https://dashscope-intl.aliyuncs.com/api/v1";
            simpleMultiModalConversationCall();
        } catch (ApiException | NoApiKeyException | UploadFileException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}
Python

import os
import dashscope
# The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
messages = [
    {"role": "system", "content": [{"text": ""}]},  # Configure the context for customized recognition here
    {"role": "user", "content": [{"audio": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"}]}
]
response = dashscope.MultiModalConversation.call(
    # The API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key.
    # If the environment variable is not configured, replace the following line with your Model Studio API key: api_key="sk-xxx"
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    model="qwen3-asr-flash",
    messages=messages,
    result_format="message",
    asr_options={
        # "language": "en",  # Optional. If the language of the audio is known, specify it with this parameter to improve recognition accuracy.
        "enable_itn": False
    }
)
print(response)
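The example above prints the full response object. A minimal sketch of extracting only the transcript, assuming the response follows the usual MultiModalConversation shape (output.choices[0].message.content[0]["text"]); verify the field path against the actual response for your SDK version:

from http import HTTPStatus

# `response` comes from the call above; the field path below is an assumption
# based on the common MultiModalConversation response format.
if response.status_code == HTTPStatus.OK:
    transcript = response.output.choices[0].message.content[0]["text"]
    print(transcript)
else:
    print(f"Request failed: code={response.code}, message={response.message}")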
model string (Required)
The name of the model. This parameter applies only to the Qwen3-ASR-Flash and Qwen-Audio ASR models.
messages array (Required)
The list of messages. When you make an HTTP call, place messages in the input object.

Message types

System Message object (Optional)
The goal or role of the model. If you specify a system message, it must be the first message in the list. This parameter is supported only by Qwen3-ASR-Flash; Qwen-Audio ASR does not support it.
Properties
content array (Required)
The message content.
Properties
text string
Specifies the context. Qwen3-ASR-Flash lets you provide reference information, such as background text and entity vocabularies, as context during speech recognition to obtain customized results. Length limit: 10,000 tokens. For more information, see Context enhancement.
role string (Required)
Set to system.

User Message object (Required)
The message sent from the user to the model.
Properties
content array (Required)
The content of the user message.
Properties
audio string (Required)
The audio to be recognized. For more information about how to use this parameter, see Getting started. The Qwen3-ASR-Flash model supports three input formats: a Base64-encoded file, the absolute path of a local file, or a URL of a file accessible over the public network. The Qwen-Audio ASR model supports two input formats: the absolute path of a local file or a URL of a file accessible over the public network. (A sketch of these message shapes follows this parameter description.) When you use an SDK, temporary URLs that start with oss:// are not supported for audio files stored in Object Storage Service (OSS). When you use the RESTful API, such oss:// temporary URLs are supported, with the following caveats:
Important: The temporary URL is valid for 48 hours and cannot be used after it expires. Do not use it in a production environment. The API for obtaining an upload credential is limited to 100 QPS and does not support scaling out. Do not use it in production environments, high-concurrency scenarios, or stress-testing scenarios. For production environments, use a stable storage service such as Alibaba Cloud OSS to ensure long-term file availability and avoid rate-limiting issues.
role string (Required)
The role of the user message. Set to user.
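The message shapes referenced above can be written out directly. A minimal sketch of a message list that supplies context through the system message and shows the three audio input variants for Qwen3-ASR-Flash; the file:// prefix and data: URI used for the local-file and Base64 variants are illustrative assumptions, so check Getting started for the authoritative formats:

import base64

# System message: context enhancement. Reference text and entity vocabularies
# placed here bias recognition toward those terms (Qwen3-ASR-Flash only).
system_msg = {
    "role": "system",
    "content": [{"text": "Glossary: DashScope, Qwen3-ASR-Flash, ITN"}],
}

# Variant 1: audio referenced by a public URL.
user_msg_url = {
    "role": "user",
    "content": [{"audio": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"}],
}

# Variant 2: absolute path of a local file (file:// prefix is an assumption).
user_msg_local = {
    "role": "user",
    "content": [{"audio": "file:///path/to/welcome.mp3"}],
}

# Variant 3: Base64-encoded file (data: URI format is an assumption).
with open("/path/to/welcome.mp3", "rb") as f:
    audio_b64 = base64.b64encode(f.read()).decode()
user_msg_b64 = {
    "role": "user",
    "content": [{"audio": f"data:audio/mp3;base64,{audio_b64}"}],
}

# A system message, when present, must be the first message in the list.
messages = [system_msg, user_msg_url]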
asr_options object (Optional)
Specifies whether to enable certain features. This parameter is supported only by Qwen3-ASR-Flash; Qwen-Audio ASR does not support it.
Properties
language string (Optional) No default value
If you know the language of the audio, specify it with this parameter to improve recognition accuracy. You can specify only one language. If the language of the audio is uncertain, or the audio mixes multiple languages such as Chinese, English, Japanese, and Korean, do not specify this parameter. Valid values include language codes such as zh (Chinese) and en (English).
enable_itn boolean (Optional) Defaults to false
Specifies whether to enable Inverse Text Normalization (ITN), which converts spoken forms such as numbers and dates into their written forms. This feature applies only to Chinese and English audio. Valid values: true (enable ITN) and false (disable ITN, the default).
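A minimal sketch of putting both options together, reusing the message shape from the Python example above; the language value and the ITN effect shown in the comment are illustrative:

import os
import dashscope

# The following URL is for the Singapore region; see the examples above for the Beijing URL.
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

messages = [
    {"role": "system", "content": [{"text": ""}]},
    {"role": "user", "content": [{"audio": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"}]},
]

response = dashscope.MultiModalConversation.call(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    model="qwen3-asr-flash",
    messages=messages,
    result_format="message",
    asr_options={
        "language": "en",    # only one language may be specified, and only when it is known
        "enable_itn": True,  # e.g. "twenty twenty-five" -> "2025" (illustrative)
    },
)
print(response)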