Qwen's audio file recognition models convert recorded audio into text. They support features such as multilingual recognition, singing voice recognition, and noise rejection.
Core features
Multilingual recognition: Supports speech recognition for multiple languages, including Mandarin and various dialects such as Cantonese and Sichuanese.
Adaptation to complex environments: Can handle complex acoustic environments. Supports automatic language detection and intelligent filtering of non-human sounds.
Singing voice recognition: Can transcribe an entire song, even with background music (BGM).
Emotion recognition: Supports recognition of multiple emotional states, including surprise, calm, happiness, sadness, disgust, anger, and fear.
Availability
Supported models:
The service provides two core models:
Qwen3-ASR-Flash-Filetrans: Designed for asynchronous recognition of long audio files up to 12 hours. It is suitable for scenarios such as transcribing meetings and interviews.
Qwen3-ASR-Flash: Designed for synchronous or streaming recognition of short audio files up to 5 minutes. It is suitable for scenarios such as voice messaging and real-time captions.
International
Under the international deployment mode, both the endpoint and data storage are located in the Singapore region. Model inference compute resources are dynamically scheduled globally (excluding the Chinese mainland).
When you call the following models, select an API key from the Singapore region:
Qwen3-ASR-Flash-Filetrans: qwen3-asr-flash-filetrans (stable version, currently equivalent to qwen3-asr-flash-filetrans-2025-11-17), qwen3-asr-flash-filetrans-2025-11-17 (snapshot version)
Qwen3-ASR-Flash: qwen3-asr-flash (stable version, currently equivalent to qwen3-asr-flash-2025-09-08), qwen3-asr-flash-2026-02-10 (latest snapshot version), qwen3-asr-flash-2025-09-08 (snapshot version)
US
Under the US deployment mode, both the endpoint and data storage are located in the US (Virginia) region. Model inference compute resources are restricted to the US.
When you call the following model, select an API key from the US region:
Qwen3-ASR-Flash: qwen3-asr-flash-us (stable version, currently equivalent to qwen3-asr-flash-2025-09-08-us), qwen3-asr-flash-2025-09-08-us (snapshot version)
Chinese Mainland
Under the Chinese mainland deployment mode, both the endpoint and data storage are located in the Beijing region. Model inference compute resources are restricted to the Chinese mainland.
When you call the following models, select an API key from the Beijing region:
Qwen3-ASR-Flash-Filetrans: qwen3-asr-flash-filetrans (stable version, currently equivalent to qwen3-asr-flash-filetrans-2025-11-17), qwen3-asr-flash-filetrans-2025-11-17 (snapshot version)
Qwen3-ASR-Flash: qwen3-asr-flash (stable version, currently equivalent to qwen3-asr-flash-2025-09-08), qwen3-asr-flash-2026-02-10 (latest snapshot version), qwen3-asr-flash-2025-09-08 (snapshot version)
For more information, see Model list.
Model selection
Scenario | Recommended | Reason | Notes |
Long audio recognition | qwen3-asr-flash-filetrans | Supports recordings up to 12 hours long. Provides emotion recognition and sentence/word-level timestamps, suitable for later indexing and analysis. | The audio file size cannot exceed 2 GB, and the duration cannot exceed 12 hours. |
Short audio recognition | qwen3-asr-flash or qwen3-asr-flash-us | Low-latency synchronous or streaming recognition of short audio. | The audio file size cannot exceed 10 MB, and the duration cannot exceed 5 minutes. |
Customer service quality inspection | qwen3-asr-flash-filetrans, qwen3-asr-flash, or qwen3-asr-flash-us | Can analyze customer emotions. | Does not support sensitive word filtering or speaker diarization. Select the appropriate model based on the audio duration. |
Caption generation for news or interviews | qwen3-asr-flash-filetrans | Long audio, punctuation prediction, and timestamps allow for direct generation of structured captions. | Requires post-processing to generate standard subtitle files. Select the appropriate model based on the audio duration. |
Multilingual video localization | qwen3-asr-flash-filetrans, qwen3-asr-flash, or qwen3-asr-flash-us | Covers multiple languages and dialects, suitable for cross-language caption production. | Select the appropriate model based on the audio duration. |
Singing audio analysis | qwen3-asr-flash-filetrans, qwen3-asr-flash, or qwen3-asr-flash-us | Recognizes lyrics and analyzes emotions, suitable for song indexing and recommendations. | Select the appropriate model based on the audio duration. |
For more information, see Compare models.
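The caption-generation scenario above notes that post-processing is required to produce standard subtitle files. The following minimal Python sketch converts sentence-level results into SRT format. It assumes a list of sentences with begin_time and end_time in milliseconds and a text field; this schema is an assumption for illustration, so adjust the field names to match the actual recognition result.

```python
def ms_to_srt_time(ms: int) -> str:
    """Convert milliseconds to the SRT timestamp format HH:MM:SS,mmm."""
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, milli = divmod(rem, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{milli:03d}"

def sentences_to_srt(sentences) -> str:
    """Build SRT text from [{"begin_time": ms, "end_time": ms, "text": ...}, ...]."""
    blocks = []
    for i, sent in enumerate(sentences, 1):
        blocks.append(
            f"{i}\n"
            f"{ms_to_srt_time(sent['begin_time'])} --> {ms_to_srt_time(sent['end_time'])}\n"
            f"{sent['text']}\n"
        )
    return "\n".join(blocks)
```

Write the returned string to a .srt file with UTF-8 encoding to obtain a standard subtitle file.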
Getting started
Before you begin, get an API key. To use an SDK, install the latest version of the SDK.
DashScope
Qwen3-ASR-Flash-Filetrans
Qwen3-ASR-Flash-Filetrans is designed for asynchronous transcription of audio files and supports recordings up to 12 hours long. This model requires a publicly accessible URL of an audio file as input and does not support direct uploads of local files. It is a non-streaming API that returns the complete recognition result after the task completes.
cURL
When you use cURL for speech recognition, first submit a task to get a task ID (task_id), and then use the ID to retrieve the task result.
Submit a task
# ======= Important =======
# The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/asr/transcription
# The API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# === Delete this comment before execution ===
curl -X POST 'https://dashscope-intl.aliyuncs.com/api/v1/services/audio/asr/transcription' \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-H "X-DashScope-Async: enable" \
-d '{
"model": "qwen3-asr-flash-filetrans",
"input": {
"file_url": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
},
"parameters": {
"channel_id":[
0
],
"enable_itn": false,
"enable_words": true
}
}'
Get the task result
# ======= Important =======
# The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/tasks/{task_id}. Replace {task_id} with your actual task ID.
# The API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# === Delete this comment before execution ===
curl -X GET 'https://dashscope-intl.aliyuncs.com/api/v1/tasks/{task_id}' \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "X-DashScope-Async: enable" \
-H "Content-Type: application/json"
Complete example
Java
import com.google.gson.Gson;
import com.google.gson.annotations.SerializedName;
import okhttp3.*;
import java.io.IOException;
import java.util.concurrent.TimeUnit;
public class Main {
// The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/asr/transcription
private static final String API_URL_SUBMIT = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/asr/transcription";
// The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/tasks/
private static final String API_URL_QUERY = "https://dashscope-intl.aliyuncs.com/api/v1/tasks/";
private static final Gson gson = new Gson();
public static void main(String[] args) {
// The API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
// If you have not configured environment variables, replace the following line with: String apiKey = "sk-xxx"
String apiKey = System.getenv("DASHSCOPE_API_KEY");
OkHttpClient client = new OkHttpClient();
// 1. Submit task
/*String payloadJson = """
{
"model": "qwen3-asr-flash-filetrans",
"input": {
"file_url": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
},
"parameters": {
"channel_id": [0],
"enable_itn": false,
"language": "zh"
}
}
""";*/
String payloadJson = """
{
"model": "qwen3-asr-flash-filetrans",
"input": {
"file_url": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
},
"parameters": {
"channel_id": [0],
"enable_itn": false,
"enable_words": true
}
}
""";
RequestBody body = RequestBody.create(payloadJson, MediaType.get("application/json; charset=utf-8"));
Request submitRequest = new Request.Builder()
.url(API_URL_SUBMIT)
.addHeader("Authorization", "Bearer " + apiKey)
.addHeader("Content-Type", "application/json")
.addHeader("X-DashScope-Async", "enable")
.post(body)
.build();
String taskId = null;
try (Response response = client.newCall(submitRequest).execute()) {
if (response.isSuccessful() && response.body() != null) {
String respBody = response.body().string();
ApiResponse apiResp = gson.fromJson(respBody, ApiResponse.class);
if (apiResp.output != null) {
taskId = apiResp.output.taskId;
System.out.println("Task submitted. task_id: " + taskId);
} else {
System.out.println("Submission response content: " + respBody);
return;
}
} else {
System.out.println("Task submission failed! HTTP code: " + response.code());
if (response.body() != null) {
System.out.println(response.body().string());
}
return;
}
} catch (IOException e) {
e.printStackTrace();
return;
}
// 2. Poll task status
boolean finished = false;
while (!finished) {
try {
TimeUnit.SECONDS.sleep(2); // Wait for 2 seconds before querying again
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
return;
}
String queryUrl = API_URL_QUERY + taskId;
Request queryRequest = new Request.Builder()
.url(queryUrl)
.addHeader("Authorization", "Bearer " + apiKey)
.addHeader("X-DashScope-Async", "enable")
.addHeader("Content-Type", "application/json")
.get()
.build();
try (Response response = client.newCall(queryRequest).execute()) {
if (response.body() != null) {
String queryResponse = response.body().string();
ApiResponse apiResp = gson.fromJson(queryResponse, ApiResponse.class);
if (apiResp.output != null && apiResp.output.taskStatus != null) {
String status = apiResp.output.taskStatus;
System.out.println("Current task status: " + status);
if ("SUCCEEDED".equalsIgnoreCase(status)
|| "FAILED".equalsIgnoreCase(status)
|| "UNKNOWN".equalsIgnoreCase(status)) {
finished = true;
System.out.println("Task completed. Final result: ");
System.out.println(queryResponse);
}
} else {
System.out.println("Query response content: " + queryResponse);
}
}
} catch (IOException e) {
e.printStackTrace();
return;
}
}
}
static class ApiResponse {
@SerializedName("request_id")
String requestId;
Output output;
}
static class Output {
@SerializedName("task_id")
String taskId;
@SerializedName("task_status")
String taskStatus;
}
}
Python
import os
import time
import requests
import json
# The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/asr/transcription
API_URL_SUBMIT = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/asr/transcription"
# The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/tasks/
API_URL_QUERY_BASE = "https://dashscope-intl.aliyuncs.com/api/v1/tasks/"
def main():
# The API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# If you have not configured environment variables, replace the following line with: api_key = "sk-xxx"
api_key = os.getenv("DASHSCOPE_API_KEY")
headers = {
"Authorization": f"Bearer {api_key}",
"Content-Type": "application/json",
"X-DashScope-Async": "enable"
}
# 1. Submit the task
payload = {
"model": "qwen3-asr-flash-filetrans",
"input": {
"file_url": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
},
"parameters": {
"channel_id": [0],
# "language": "zh",
"enable_itn": False,
"enable_words": True
}
}
print("Submitting ASR transcription task...")
try:
submit_resp = requests.post(API_URL_SUBMIT, headers=headers, data=json.dumps(payload))
except requests.RequestException as e:
print(f"Failed to submit task request: {e}")
return
if submit_resp.status_code != 200:
print(f"Task submission failed! HTTP code: {submit_resp.status_code}")
print(submit_resp.text)
return
resp_data = submit_resp.json()
output = resp_data.get("output")
if not output or "task_id" not in output:
print("Abnormal submission response content:", resp_data)
return
task_id = output["task_id"]
print(f"Task submitted. task_id: {task_id}")
# 2. Poll the task status
finished = False
while not finished:
time.sleep(2) # Wait for 2 seconds before querying again
query_url = API_URL_QUERY_BASE + task_id
try:
query_resp = requests.get(query_url, headers=headers)
except requests.RequestException as e:
print(f"Failed to query task: {e}")
return
if query_resp.status_code != 200:
print(f"Task query failed! HTTP code: {query_resp.status_code}")
print(query_resp.text)
return
query_data = query_resp.json()
output = query_data.get("output")
if output and "task_status" in output:
status = output["task_status"]
print(f"Current task status: {status}")
if status.upper() in ("SUCCEEDED", "FAILED", "UNKNOWN"):
finished = True
print("Task completed. Final result:")
print(json.dumps(query_data, indent=2, ensure_ascii=False))
else:
print("Query response content:", query_data)
if __name__ == "__main__":
main()
Java SDK
import com.alibaba.dashscope.audio.qwen_asr.*;
import com.alibaba.dashscope.utils.Constants;
import com.google.gson.Gson;
import com.google.gson.GsonBuilder;
import com.google.gson.JsonObject;
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.ArrayList;
import java.util.HashMap;
public class Main {
public static void main(String[] args) {
// The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
Constants.baseHttpApiUrl = "https://dashscope-intl.aliyuncs.com/api/v1";
QwenTranscriptionParam param =
QwenTranscriptionParam.builder()
// The API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
// If you have not configured environment variables, replace the following line with: .apiKey("sk-xxx")
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.model("qwen3-asr-flash-filetrans")
.fileUrl("https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/sensevoice/rich_text_example_1.wav")
//.parameter("language", "zh")
//.parameter("channel_id", new ArrayList<String>(){{add("0");add("1");}})
.parameter("enable_itn", false)
.parameter("enable_words", true)
.build();
try {
QwenTranscription transcription = new QwenTranscription();
// Submit the task
QwenTranscriptionResult result = transcription.asyncCall(param);
System.out.println("create task result: " + result);
// Query the task status
result = transcription.fetch(QwenTranscriptionQueryParam.FromTranscriptionParam(param, result.getTaskId()));
System.out.println("task status: " + result);
// Wait for the task to complete
result =
transcription.wait(
QwenTranscriptionQueryParam.FromTranscriptionParam(param, result.getTaskId()));
System.out.println("task result: " + result);
// Get the speech recognition result
QwenTranscriptionTaskResult taskResult = result.getResult();
if (taskResult != null) {
// Get the URL of the recognition result
String transcriptionUrl = taskResult.getTranscriptionUrl();
// Get the result from the URL
HttpURLConnection connection =
(HttpURLConnection) new URL(transcriptionUrl).openConnection();
connection.setRequestMethod("GET");
connection.connect();
BufferedReader reader =
new BufferedReader(new InputStreamReader(connection.getInputStream()));
// Format and print the JSON result
Gson gson = new GsonBuilder().setPrettyPrinting().create();
System.out.println(gson.toJson(gson.fromJson(reader, JsonObject.class)));
}
} catch (Exception e) {
System.out.println("error: " + e);
}
}
}
Python SDK
import json
import os
import sys
from http import HTTPStatus
import dashscope
from dashscope.audio.qwen_asr import QwenTranscription
from dashscope.api_entities.dashscope_response import TranscriptionResponse
# run the transcription script
if __name__ == '__main__':
# The API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# If you have not configured environment variables, replace the following line with: dashscope.api_key = "sk-xxx"
dashscope.api_key = os.getenv("DASHSCOPE_API_KEY")
# The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
task_response = QwenTranscription.async_call(
model='qwen3-asr-flash-filetrans',
file_url='https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/sensevoice/rich_text_example_1.wav',
#language="",
enable_itn=False,
enable_words=True
)
print(f'task_response: {task_response}')
print(task_response.output.task_id)
query_response = QwenTranscription.fetch(task=task_response.output.task_id)
print(f'query_response: {query_response}')
task_result = QwenTranscription.wait(task=task_response.output.task_id)
print(f'task_result: {task_result}')
Qwen3-ASR-Flash
Qwen3-ASR-Flash supports recordings up to 5 minutes long. This model accepts a publicly accessible audio file URL or a direct upload of a local file as input. It can also return recognition results as a stream.
Input: Audio file URL
Python SDK
import os
import dashscope
# The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1. If you use a model in the US region, replace the URL with: https://dashscope-us.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
messages = [
{"role": "user", "content": [{"audio": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"}]}
]
response = dashscope.MultiModalConversation.call(
# The API keys for the Singapore/US and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# If you have not configured environment variables, replace the following line with: api_key = "sk-xxx"
api_key=os.getenv("DASHSCOPE_API_KEY"),
# If you use a model in the US region, add the "-us" suffix to the model name, for example, qwen3-asr-flash-us
model="qwen3-asr-flash",
messages=messages,
result_format="message",
asr_options={
#"language": "zh", # Optional. If the audio language is known, you can specify it using this parameter to improve recognition accuracy.
"enable_itn":False
}
)
print(response)
Java SDK
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;
import com.alibaba.dashscope.utils.JsonUtils;
public class Main {
public static void simpleMultiModalConversationCall()
throws ApiException, NoApiKeyException, UploadFileException {
MultiModalConversation conv = new MultiModalConversation();
MultiModalMessage userMessage = MultiModalMessage.builder()
.role(Role.USER.getValue())
.content(Arrays.asList(
Collections.singletonMap("audio", "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3")))
.build();
MultiModalMessage sysMessage = MultiModalMessage.builder().role(Role.SYSTEM.getValue())
.build();
Map<String, Object> asrOptions = new HashMap<>();
asrOptions.put("enable_itn", false);
// asrOptions.put("language", "zh"); // Optional. If the audio language is known, you can specify it using this parameter to improve recognition accuracy.
MultiModalConversationParam param = MultiModalConversationParam.builder()
// The API keys for the Singapore/US and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
// If you have not configured environment variables, replace the following line with: .apiKey("sk-xxx")
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
// If you use a model in the US region, add the "-us" suffix to the model name, for example, qwen3-asr-flash-us
.model("qwen3-asr-flash")
.message(sysMessage)
.message(userMessage)
.parameter("asr_options", asrOptions)
.build();
MultiModalConversationResult result = conv.call(param);
System.out.println(JsonUtils.toJson(result));
}
public static void main(String[] args) {
try {
// The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1. If you use a model in the US region, replace the URL with: https://dashscope-us.aliyuncs.com/api/v1
Constants.baseHttpApiUrl = "https://dashscope-intl.aliyuncs.com/api/v1";
simpleMultiModalConversationCall();
} catch (ApiException | NoApiKeyException | UploadFileException e) {
System.out.println(e.getMessage());
}
System.exit(0);
}
}
cURL
# ======= Important =======
# The following URL is for the Singapore region. If you use a model in the US region, replace the URL with: https://dashscope-us.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation
# The API keys for the Singapore/US and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# If you use a model in the US region, add the "-us" suffix to the model name, for example, qwen3-asr-flash-us
# === Delete this comment before execution ===
curl -X POST "https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation" \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen3-asr-flash",
"input": {
"messages": [
{
"content": [
{
"text": ""
}
],
"role": "system"
},
{
"content": [
{
"audio": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
}
],
"role": "user"
}
]
},
"parameters": {
"asr_options": {
"enable_itn": false
}
}
}'
Input: Base64-encoded audio file
Input Base64-encoded data (a data URL) in the following format: data:<mediatype>;base64,<data>.
<mediatype>: the MIME type, which varies by audio format. For example:
WAV: audio/wav
MP3: audio/mpeg
<data>: the Base64-encoded string of the audio. Base64 encoding increases file size. Keep the original file small enough so that the encoded data stays within the 10 MB input limit.
Example:
data:audio/wav;base64,SUQzBAAAAAAAI1RTU0UAAAAPAAADTGF2ZjU4LjI5LjEwMAAAAAAAAAAAAAAA//PAxABQ/BXRbMPe4IQAhl9
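Because Base64 inflates the payload by roughly one third, it can help to validate the encoded size against the 10 MB input limit before calling the API. A minimal sketch follows; the limit value mirrors the constraint stated above, and the function name is illustrative:

```python
import base64
import pathlib

MAX_INPUT_BYTES = 10 * 1024 * 1024  # 10 MB input limit for qwen3-asr-flash

def build_data_url(path: str, mime_type: str) -> str:
    """Read an audio file and return a data URL, rejecting oversized payloads."""
    raw = pathlib.Path(path).read_bytes()
    data_url = f"data:{mime_type};base64,{base64.b64encode(raw).decode()}"
    if len(data_url) > MAX_INPUT_BYTES:
        raise ValueError(
            f"Encoded audio is {len(data_url)} bytes, which exceeds the "
            f"{MAX_INPUT_BYTES}-byte limit. Use a smaller file or a file URL."
        )
    return data_url
```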
Python SDK
The example uses the audio file: welcome.mp3.
import base64
import dashscope
import os
import pathlib
# The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1. If you use a model in the US region, replace the URL with: https://dashscope-us.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
# Replace with your actual audio file path
file_path = "welcome.mp3"
# Replace with your actual audio file MIME type
audio_mime_type = "audio/mpeg"
file_path_obj = pathlib.Path(file_path)
if not file_path_obj.exists():
raise FileNotFoundError(f"Audio file not found: {file_path}")
base64_str = base64.b64encode(file_path_obj.read_bytes()).decode()
data_uri = f"data:{audio_mime_type};base64,{base64_str}"
messages = [
{"role": "user", "content": [{"audio": data_uri}]}
]
response = dashscope.MultiModalConversation.call(
# The API keys for the Singapore/US and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# If you have not configured environment variables, replace the following line with: api_key = "sk-xxx",
api_key=os.getenv("DASHSCOPE_API_KEY"),
# If you use a model in the US region, add the "-us" suffix to the model name, for example, qwen3-asr-flash-us
model="qwen3-asr-flash",
messages=messages,
result_format="message",
asr_options={
# "language": "zh", # Optional. If the audio language is known, you can specify it using this parameter to improve recognition accuracy.
"enable_itn":False
}
)
print(response)
Java SDK
The example uses the audio file: welcome.mp3.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.*;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;
import com.alibaba.dashscope.utils.JsonUtils;
public class Main {
// Replace with your actual audio file path
private static final String AUDIO_FILE = "welcome.mp3";
// Replace with your actual audio file MIME type
private static final String AUDIO_MIME_TYPE = "audio/mpeg";
public static void simpleMultiModalConversationCall()
throws ApiException, NoApiKeyException, UploadFileException, IOException {
MultiModalConversation conv = new MultiModalConversation();
MultiModalMessage userMessage = MultiModalMessage.builder()
.role(Role.USER.getValue())
.content(Arrays.asList(
Collections.singletonMap("audio", toDataUrl())))
.build();
MultiModalMessage sysMessage = MultiModalMessage.builder().role(Role.SYSTEM.getValue())
.build();
Map<String, Object> asrOptions = new HashMap<>();
asrOptions.put("enable_itn", false);
// asrOptions.put("language", "zh"); // Optional. If the audio language is known, you can specify it using this parameter to improve recognition accuracy.
MultiModalConversationParam param = MultiModalConversationParam.builder()
// The API keys for the Singapore/US and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
// If you have not configured environment variables, replace the following line with: .apiKey("sk-xxx")
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
// If you use a model in the US region, add the "-us" suffix to the model name, for example, qwen3-asr-flash-us
.model("qwen3-asr-flash")
.message(sysMessage)
.message(userMessage)
.parameter("asr_options", asrOptions)
.build();
MultiModalConversationResult result = conv.call(param);
System.out.println(JsonUtils.toJson(result));
}
public static void main(String[] args) {
try {
// The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1. If you use a model in the US region, replace the URL with: https://dashscope-us.aliyuncs.com/api/v1
Constants.baseHttpApiUrl = "https://dashscope-intl.aliyuncs.com/api/v1";
simpleMultiModalConversationCall();
} catch (ApiException | NoApiKeyException | UploadFileException | IOException e) {
System.out.println(e.getMessage());
}
System.exit(0);
}
// Generate data URI
public static String toDataUrl() throws IOException {
byte[] bytes = Files.readAllBytes(Paths.get(AUDIO_FILE));
String encoded = Base64.getEncoder().encodeToString(bytes);
return "data:" + AUDIO_MIME_TYPE + ";base64," + encoded;
}
}
Input: Absolute path to local audio file
When using the DashScope SDK to process local audio files, you must provide the file path. The following table shows how to construct the file path for your operating system and SDK.
System | SDK | Input file path | Example |
Linux or macOS | Python SDK | file://{absolute file path} | file:///home/audio/welcome.mp3 |
Linux or macOS | Java SDK | file://{absolute file path} | file:///home/audio/welcome.mp3 |
Windows | Python SDK | file://{absolute file path} | file://D:/audio/welcome.mp3 |
Windows | Java SDK | file:///{absolute file path} | file:///D:/audio/welcome.mp3 |
When using local files, the API call limit is 100 QPS and cannot be scaled. Do not use this method in production environments, high-concurrency scenarios, or stress testing. For higher concurrency, upload files to OSS and call the API using the audio file URL.
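The path formats in the table above can also be produced programmatically. The helper below is a hypothetical utility, not part of the DashScope SDK; it simply applies the table's rules to an absolute path:

```python
from pathlib import Path

def to_dashscope_file_uri(path: str, java_sdk: bool = False) -> str:
    """Format an absolute local path per the table above (hypothetical helper).

    Linux/macOS paths get the file:// prefix for both SDKs; Windows drive
    paths get file:// for the Python SDK and file:/// for the Java SDK.
    """
    posix = Path(path).as_posix()
    if posix.startswith("/"):  # Linux or macOS absolute path
        return f"file://{posix}"
    prefix = "file:///" if java_sdk else "file://"  # Windows drive path
    return prefix + posix
```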
Python SDK
The example uses the audio file: welcome.mp3.
import os
import dashscope
# The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1. If you use a model in the US region, replace the URL with: https://dashscope-us.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
# Replace ABSOLUTE_PATH/welcome.mp3 with the absolute path to your local audio file
audio_file_path = "file://ABSOLUTE_PATH/welcome.mp3"
messages = [
{"role": "user", "content": [{"audio": audio_file_path}]}
]
response = dashscope.MultiModalConversation.call(
# The API keys for the Singapore/US and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# If you have not configured environment variables, replace the following line with: api_key = "sk-xxx"
api_key=os.getenv("DASHSCOPE_API_KEY"),
# If you use a model in the US region, add the "-us" suffix to the model name, for example, qwen3-asr-flash-us
model="qwen3-asr-flash",
messages=messages,
result_format="message",
asr_options={
# "language": "zh", # Optional. If the audio language is known, you can specify it using this parameter to improve recognition accuracy.
"enable_itn":False
}
)
print(response)
Java SDK
The example uses the audio file: welcome.mp3.
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;
import com.alibaba.dashscope.utils.JsonUtils;
public class Main {
public static void simpleMultiModalConversationCall()
throws ApiException, NoApiKeyException, UploadFileException {
// Replace ABSOLUTE_PATH/welcome.mp3 with the absolute path to your local file
String localFilePath = "file://ABSOLUTE_PATH/welcome.mp3";
MultiModalConversation conv = new MultiModalConversation();
MultiModalMessage userMessage = MultiModalMessage.builder()
.role(Role.USER.getValue())
.content(Arrays.asList(
Collections.singletonMap("audio", localFilePath)))
.build();
MultiModalMessage sysMessage = MultiModalMessage.builder().role(Role.SYSTEM.getValue())
.build();
Map<String, Object> asrOptions = new HashMap<>();
asrOptions.put("enable_itn", false);
// asrOptions.put("language", "zh"); // Optional. If the audio language is known, you can specify it using this parameter to improve recognition accuracy.
MultiModalConversationParam param = MultiModalConversationParam.builder()
// The API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
// If you have not configured environment variables, replace the following line with: .apiKey("sk-xxx")
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
// If you use a model in the US region, add the "-us" suffix to the model name, for example, qwen3-asr-flash-us
.model("qwen3-asr-flash")
.message(sysMessage)
.message(userMessage)
.parameter("asr_options", asrOptions)
.build();
MultiModalConversationResult result = conv.call(param);
System.out.println(JsonUtils.toJson(result));
}
public static void main(String[] args) {
try {
// The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1. If you use a model in the US region, replace the URL with: https://dashscope-us.aliyuncs.com/api/v1
Constants.baseHttpApiUrl = "https://dashscope-intl.aliyuncs.com/api/v1";
simpleMultiModalConversationCall();
} catch (ApiException | NoApiKeyException | UploadFileException e) {
System.out.println(e.getMessage());
}
System.exit(0);
}
}
Streaming output
The model generates results incrementally rather than all at once. Non-streaming output waits until the model finishes generating and then returns the complete result. Streaming output returns intermediate results in real time, letting you read results as they are generated and reducing wait time. Set parameters differently based on your calling method to enable streaming output:
DashScope Python SDK: Set the stream parameter to True.
DashScope Java SDK: Use the streamCall interface.
DashScope HTTP: Set the X-DashScope-SSE header to enable.
Python SDK
import os
import dashscope
# The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1. If you use a model in the US region, replace the URL with: https://dashscope-us.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
messages = [
{"role": "user", "content": [{"audio": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"}]}
]
response = dashscope.MultiModalConversation.call(
# The API keys for the Singapore/US and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# If you have not configured environment variables, replace the following line with: api_key = "sk-xxx"
api_key=os.getenv("DASHSCOPE_API_KEY"),
# If you use a model in the US region, add the "-us" suffix to the model name, for example, qwen3-asr-flash-us
model="qwen3-asr-flash",
messages=messages,
result_format="message",
asr_options={
# "language": "zh", # Optional. If the audio language is known, you can specify it using this parameter to improve recognition accuracy.
"enable_itn":False
},
stream=True
)
for chunk in response:
    try:
        print(chunk["output"]["choices"][0]["message"].content[0]["text"])
    except (KeyError, IndexError, TypeError):
        pass
Java SDK
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;
import io.reactivex.Flowable;
public class Main {
public static void simpleMultiModalConversationCall()
throws ApiException, NoApiKeyException, UploadFileException {
MultiModalConversation conv = new MultiModalConversation();
MultiModalMessage userMessage = MultiModalMessage.builder()
.role(Role.USER.getValue())
.content(Arrays.asList(
Collections.singletonMap("audio", "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3")))
.build();
MultiModalMessage sysMessage = MultiModalMessage.builder().role(Role.SYSTEM.getValue())
.build();
Map<String, Object> asrOptions = new HashMap<>();
asrOptions.put("enable_itn", false);
// asrOptions.put("language", "zh"); // Optional. If the audio language is known, you can specify it using this parameter to improve recognition accuracy.
MultiModalConversationParam param = MultiModalConversationParam.builder()
// The API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
// If you have not configured environment variables, replace the following line with: .apiKey("sk-xxx")
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
// If you use a model in the US region, add the "-us" suffix to the model name, for example, qwen3-asr-flash-us
.model("qwen3-asr-flash")
.message(sysMessage)
.message(userMessage)
.parameter("asr_options", asrOptions)
.build();
Flowable<MultiModalConversationResult> resultFlowable = conv.streamCall(param);
resultFlowable.blockingForEach(item -> {
try {
System.out.println(item.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
} catch (Exception e){
System.exit(0);
}
});
}
public static void main(String[] args) {
try {
// The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1. If you use a model in the US region, replace the URL with: https://dashscope-us.aliyuncs.com/api/v1
Constants.baseHttpApiUrl = "https://dashscope-intl.aliyuncs.com/api/v1";
simpleMultiModalConversationCall();
} catch (ApiException | NoApiKeyException | UploadFileException e) {
System.out.println(e.getMessage());
}
System.exit(0);
}
}
cURL
# ======= Important =======
# The following URL is for the Singapore region. If you use a model in the US region, replace the URL with: https://dashscope-us.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation
# The API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# If you use a model in the US region, add the "us" suffix
# === Delete this comment before execution ===
curl -X POST "https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation" \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-H "X-DashScope-SSE: enable" \
-d '{
"model": "qwen3-asr-flash",
"input": {
"messages": [
{
"content": [
{
"text": ""
}
],
"role": "system"
},
{
"content": [
{
"audio": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
}
],
"role": "user"
}
]
},
"parameters": {
"incremental_output": true,
"asr_options": {
"enable_itn": false
}
}
}'
OpenAI compatible
The US region does not support OpenAI-compatible mode.
Only the Qwen3-ASR-Flash series models support OpenAI-compatible calls. OpenAI-compatible mode only accepts publicly accessible audio file URLs and does not support local file paths.
Use OpenAI Python SDK version 1.52.0 or later, and Node.js SDK version 4.68.0 or later.
The asr_options parameter is not part of the OpenAI standard. When using the OpenAI SDK, pass it through extra_body.
Input: Audio file URL
Python SDK
from openai import OpenAI
import os
try:
client = OpenAI(
# The API keys for the Singapore/US and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# If you have not configured environment variables, replace the following line with: api_key = "sk-xxx",
api_key=os.getenv("DASHSCOPE_API_KEY"),
# The following URL is for the Singapore/US region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1
base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
stream_enabled = False # Set to True to enable streaming output
completion = client.chat.completions.create(
model="qwen3-asr-flash",
messages=[
{
"content": [
{
"type": "input_audio",
"input_audio": {
"data": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
}
}
],
"role": "user"
}
],
stream=stream_enabled,
# Do not set stream_options when stream is False
# stream_options={"include_usage": True},
extra_body={
"asr_options": {
# "language": "zh",
"enable_itn": False
}
}
)
if stream_enabled:
full_content = ""
print("Streaming output:")
for chunk in completion:
# If stream_options.include_usage is True, the last chunk's choices field is an empty list and should be skipped (token usage can be obtained via chunk.usage)
print(chunk)
if chunk.choices and chunk.choices[0].delta.content:
full_content += chunk.choices[0].delta.content
print(f"Full content: {full_content}")
else:
print(f"Non-streaming output: {completion.choices[0].message.content}")
except Exception as e:
print(f"Error: {e}")Node.js SDK
// Preparation before running:
// Works on Windows/Mac/Linux:
// 1. Ensure Node.js is installed (version >= 14 recommended)
// 2. Run this command to install dependencies: npm install openai
import OpenAI from "openai";
const client = new OpenAI({
// The API keys for the Singapore/US and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
// If you have not configured environment variables, replace the following line with: apiKey: "sk-xxx",
apiKey: process.env.DASHSCOPE_API_KEY,
// The following URL is for the Singapore/US region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1
baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
});
async function main() {
try {
const streamEnabled = false; // Set to true to enable streaming output
const completion = await client.chat.completions.create({
model: "qwen3-asr-flash",
messages: [
{
role: "user",
content: [
{
type: "input_audio",
input_audio: {
data: "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
}
}
]
}
],
stream: streamEnabled,
// Do not set stream_options when stream is False
// stream_options: {
// "include_usage": true
// },
extra_body: {
asr_options: {
// language: "zh",
enable_itn: false
}
}
});
if (streamEnabled) {
let fullContent = "";
console.log("Streaming output:");
for await (const chunk of completion) {
console.log(JSON.stringify(chunk));
if (chunk.choices && chunk.choices.length > 0) {
const delta = chunk.choices[0].delta;
if (delta && delta.content) {
fullContent += delta.content;
}
}
}
console.log(`Full content: ${fullContent}`);
} else {
console.log(`Non-streaming output: ${completion.choices[0].message.content}`);
}
} catch (err) {
console.error(`Error: ${err}`);
}
}
main();
cURL
# ======= Important =======
# The following URL is for the Singapore/US region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
# The API keys for the Singapore/US and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# === Delete this comment before execution ===
curl -X POST 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions' \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen3-asr-flash",
"messages": [
{
"content": [
{
"type": "input_audio",
"input_audio": {
"data": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
}
}
],
"role": "user"
}
],
"stream":false,
"asr_options": {
"enable_itn": false
}
}'
Input: Base64-encoded audio file
Provide Base64-encoded data as a Data URL in the format: data:<mediatype>;base64,<data>.
<mediatype>: the MIME type, which varies by audio format. For example:
WAV: audio/wav
MP3: audio/mpeg
<data>: the Base64-encoded string of the audio file.
Base64 encoding increases file size. Keep the original file small enough that the encoded data stays within the 10 MB input limit.
Example:
data:audio/wav;base64,SUQzBAAAAAAAI1RTU0UAAAAPAAADTGF2ZjU4LjI5LjEwMAAAAAAAAAAAAAAA//PAxABQ/BXRbMPe4IQAhl9
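Because Base64 inflates the payload by roughly one third, it is worth checking the encoded size before sending. A minimal sketch, assuming the 10 MB limit above applies to the encoded string (the helper name is illustrative, not part of the SDK):

```python
import base64
import pathlib

MAX_ENCODED_BYTES = 10 * 1024 * 1024  # 10 MB input limit from this topic

def build_data_uri(path: str, mime_type: str) -> str:
    """Encode an audio file as a Data URL, failing fast if it exceeds the limit."""
    raw = pathlib.Path(path).read_bytes()
    encoded = base64.b64encode(raw).decode()
    if len(encoded) > MAX_ENCODED_BYTES:
        raise ValueError(f"Encoded audio is {len(encoded)} bytes, over the 10 MB limit")
    return f"data:{mime_type};base64,{encoded}"
```

The returned string can be passed directly as the `data` field of an `input_audio` content part, as in the SDK examples below.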
Python SDK
The example uses the audio file: welcome.mp3.
import base64
from openai import OpenAI
import os
import pathlib
try:
# Replace with your actual audio file path
file_path = "welcome.mp3"
# Replace with your actual audio file MIME type
audio_mime_type = "audio/mpeg"
file_path_obj = pathlib.Path(file_path)
if not file_path_obj.exists():
raise FileNotFoundError(f"Audio file not found: {file_path}")
base64_str = base64.b64encode(file_path_obj.read_bytes()).decode()
data_uri = f"data:{audio_mime_type};base64,{base64_str}"
client = OpenAI(
# The API keys for the Singapore/US and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# If you have not configured environment variables, replace the following line with: api_key = "sk-xxx",
api_key=os.getenv("DASHSCOPE_API_KEY"),
# The following URL is for the Singapore/US region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1
base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
stream_enabled = False # Set to True to enable streaming output
completion = client.chat.completions.create(
model="qwen3-asr-flash",
messages=[
{
"content": [
{
"type": "input_audio",
"input_audio": {
"data": data_uri
}
}
],
"role": "user"
}
],
stream=stream_enabled,
# Do not set stream_options when stream is False
# stream_options={"include_usage": True},
extra_body={
"asr_options": {
# "language": "zh",
"enable_itn": False
}
}
)
if stream_enabled:
full_content = ""
print("Streaming output:")
for chunk in completion:
# If stream_options.include_usage is True, the last chunk's choices field is an empty list and should be skipped (token usage can be obtained via chunk.usage)
print(chunk)
if chunk.choices and chunk.choices[0].delta.content:
full_content += chunk.choices[0].delta.content
print(f"Full content: {full_content}")
else:
print(f"Non-streaming output: {completion.choices[0].message.content}")
except Exception as e:
print(f"Error: {e}")Node.js SDK
The example uses the audio file: welcome.mp3.
// Preparation before running:
// Works on Windows/Mac/Linux:
// 1. Ensure Node.js is installed (version >= 14 recommended)
// 2. Run this command to install dependencies: npm install openai
import OpenAI from "openai";
import { readFileSync } from 'fs';
const client = new OpenAI({
// The API keys for the Singapore/US and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
// If you have not configured environment variables, replace the following line with: apiKey: "sk-xxx",
apiKey: process.env.DASHSCOPE_API_KEY,
// The following URL is for the Singapore/US region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1
baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
});
const encodeAudioFile = (audioFilePath) => {
const audioFile = readFileSync(audioFilePath);
return audioFile.toString('base64');
};
// Replace with your actual audio file path
const dataUri = `data:audio/mpeg;base64,${encodeAudioFile("welcome.mp3")}`;
async function main() {
try {
const streamEnabled = false; // Set to true to enable streaming output
const completion = await client.chat.completions.create({
model: "qwen3-asr-flash",
messages: [
{
role: "user",
content: [
{
type: "input_audio",
input_audio: {
data: dataUri
}
}
]
}
],
stream: streamEnabled,
// Do not set stream_options when stream is False
// stream_options: {
// "include_usage": true
// },
extra_body: {
asr_options: {
// language: "zh",
enable_itn: false
}
}
});
if (streamEnabled) {
let fullContent = "";
console.log("Streaming output:");
for await (const chunk of completion) {
console.log(JSON.stringify(chunk));
if (chunk.choices && chunk.choices.length > 0) {
const delta = chunk.choices[0].delta;
if (delta && delta.content) {
fullContent += delta.content;
}
}
}
console.log(`Full content: ${fullContent}`);
} else {
console.log(`Non-streaming output: ${completion.choices[0].message.content}`);
}
} catch (err) {
console.error(`Error: ${err}`);
}
}
main();
API reference
Compare models
The feature set for qwen3-asr-flash and qwen3-asr-flash-2025-09-08 also applies to their US (Virginia) region counterparts: qwen3-asr-flash-us and qwen3-asr-flash-2025-09-08-us.
| Feature | qwen3-asr-flash-filetrans, qwen3-asr-flash-filetrans-2025-11-17 | qwen3-asr-flash, qwen3-asr-flash-2026-02-10, qwen3-asr-flash-2025-09-08 |
| --- | --- | --- |
| Supported languages | Chinese (Mandarin, Sichuanese, Minnan, Wu, Cantonese), English, Japanese, German, Korean, Russian, French, Portuguese, Arabic, Italian, Spanish, Hindi, Indonesian, Thai, Turkish, Ukrainian, Vietnamese, Czech, Danish, Filipino, Finnish, Icelandic, Malay, Norwegian, Polish, Swedish | Same |
| Audio formats | aac, amr, avi, flac, flv, m4a, mkv, mov, mp3, mp4, mpeg, ogg, opus, wav, webm, wma, wmv | aac, amr, avi, aiff, flac, flv, mkv, mp3, mpeg, ogg, opus, wav, webm, wma, wmv |
| Sample rate | Depends on the audio format | Same |
| Sound channels | Any; the models handle multi-channel audio differently | Same |
| Input method | Publicly accessible file URL | Base64-encoded file, absolute local file path, publicly accessible file URL |
| Audio size/duration | File size ≤ 2 GB, duration ≤ 12 hours | File size ≤ 10 MB, duration ≤ 5 minutes |
| Emotion recognition | Always enabled; view results via the response parameter | Same |
| Timestamps | Always enabled; control timestamp granularity via the request parameter. Character-level timestamps are only guaranteed for Chinese, English, Japanese, Korean, German, French, Spanish, Italian, Portuguese, and Russian; accuracy may vary for other languages. | Same |
| Punctuation prediction | Always enabled | Same |
| ITN | Disabled by default; can be enabled. Chinese and English only. | Same |
| Singing recognition | Always enabled | Same |
| Noise rejection | Always enabled | Same |
| Sensitive words filter | | |
| Speaker diarization | | |
| Filler word filtering | | |
| VAD | Always enabled | Same |
| Rate limit (RPM) | 100 | Same |
| Connection type | DashScope: Java/Python SDK, RESTful API | DashScope: Java/Python SDK, RESTful API; OpenAI: Python/Node.js SDK, RESTful API |
| Pricing | International: $0.000035/second; US: $0.000032/second; Chinese mainland: $0.000032/second | Same |
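The per-second rates in the pricing row above translate directly into a cost estimate. A minimal sketch, using the rates from the table (actual billing granularity and rounding may differ):

```python
# Per-second rates (USD) copied from the pricing row above
RATES = {
    "international": 0.000035,
    "us": 0.000032,
    "cn": 0.000032,
}

def estimate_cost(duration_seconds: float, region: str = "international") -> float:
    """Rough transcription cost for an audio clip of the given duration."""
    return duration_seconds * RATES[region]

# A one-hour recording in the international region:
print(f"${estimate_cost(3600):.4f}")  # $0.1260
```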
FAQ
Q: How do I provide a publicly accessible audio URL for the API?
We recommend using Object Storage Service (OSS), which provides highly available and reliable storage and easily generates public URLs.
Verify your URL is publicly accessible: Open the URL in a browser or use curl to ensure the audio file downloads or plays successfully (HTTP status code 200).
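The accessibility check described above can be scripted. A minimal sketch using only the Python standard library (the function name is illustrative; a 200 response means the file is reachable, though it does not prove the audio itself is valid):

```python
import urllib.request

def is_publicly_accessible(url: str, timeout: float = 10.0) -> bool:
    """Return True if the URL answers an HTTP GET with status 200."""
    req = urllib.request.Request(url, method="GET")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status == 200
    except Exception:
        # HTTPError (4xx/5xx), URLError (DNS/connection), timeout, etc.
        return False
```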
Q: How do I check if my audio format meets requirements?
Use the open-source tool ffprobe to quickly get detailed audio information:
# Check container format (format_name), codec (codec_name), sample rate (sample_rate), and number of channels (channels)
ffprobe -v error -show_entries format=format_name -show_entries stream=codec_name,sample_rate,channels -of default=noprint_wrappers=1 your_audio_file.mp3
Q: How do I process audio to meet model requirements?
Use the open-source tool FFmpeg to trim or convert audio formats:
Audio trimming: Extract a segment from a long audio file
# -i: Input file
# -ss 00:01:30: Start time (1 minute 30 seconds)
# -t 00:02:00: Duration (2 minutes)
# -c copy: Copy the audio stream without re-encoding (fast)
# output_clip.wav: Output file
ffmpeg -i long_audio.wav -ss 00:01:30 -t 00:02:00 -c copy output_clip.wav
Format conversion
For example, convert any audio to 16 kHz, 16-bit, mono WAV:
# -i: Input file
# -ac 1: Set to 1 channel (mono)
# -ar 16000: Set sample rate to 16000 Hz (16 kHz)
# -sample_fmt s16: Set sample format to 16-bit signed integer PCM
# output.wav: Output file
ffmpeg -i input.mp3 -ac 1 -ar 16000 -sample_fmt s16 output.wav