The Qwen audio file recognition models convert recorded audio into text. The models support features such as multi-language recognition, singing voice recognition, and noise suppression.
Core features
Multi-language recognition: Recognizes multiple languages, including Mandarin and various dialects such as Cantonese and Sichuanese.
Adaptation to complex environments: Handles complex acoustic environments. Supports automatic language detection and intelligent filtering of non-human sounds.
Singing voice recognition: Transcribes entire songs, even with background music (BGM).
Context biasing: Improves recognition accuracy by configuring context. For more information, see Context biasing.
Emotion recognition: Recognizes multiple emotional states, including surprise, calm, happiness, sadness, disgust, anger, and fear.
Availability
Supported regions: International (Singapore) and China (Beijing).
Supported models
This service offers two core models:
Qwen3-ASR-Flash-Filetrans: Designed for asynchronous recognition of long audio files up to 12 hours. Suitable for scenarios such as transcribing meeting records and interviews.
Qwen3-ASR-Flash: Designed for synchronous or streaming recognition of short audio files up to 5 minutes. Suitable for scenarios such as voice messaging and real-time captions.
International (Singapore)
Qwen3-ASR-Flash-Filetrans: qwen3-asr-flash-filetrans, qwen3-asr-flash-filetrans-2025-11-17 (snapshot)
Qwen3-ASR-Flash: qwen3-asr-flash (stable, currently equivalent to qwen3-asr-flash-2025-09-08), qwen3-asr-flash-2025-09-08 (snapshot)
China (Beijing)
Qwen3-ASR-Flash-Filetrans: qwen3-asr-flash-filetrans, qwen3-asr-flash-filetrans-2025-11-17 (snapshot)
Qwen3-ASR-Flash: qwen3-asr-flash (stable, currently equivalent to qwen3-asr-flash-2025-09-08), qwen3-asr-flash-2025-09-08 (snapshot)
Model selection
Scenario | Recommended model | Reason | Notes |
Long audio recognition | qwen3-asr-flash-filetrans | Supports up to 12 hours of recording. Provides emotion recognition and sentence-level timestamps, suitable for later indexing and analysis. | The audio file size cannot exceed 2 GB, and the duration cannot exceed 12 hours. |
Short audio recognition | qwen3-asr-flash | Low-latency recognition for short audio. | The audio file size cannot exceed 10 MB, and the duration cannot exceed 5 minutes. |
Customer service quality inspection | qwen3-asr-flash-filetrans, qwen3-asr-flash | Can analyze customer emotions. | Does not support sensitive word filtering or speaker diarization. Select the appropriate model based on the audio duration. |
News/interview program caption generation | qwen3-asr-flash-filetrans | Long audio, punctuation prediction, and timestamps to directly generate structured captions. | Requires post-processing to generate standard subtitle files. Select the appropriate model based on the audio duration. |
Multilingual video localization | qwen3-asr-flash-filetrans, qwen3-asr-flash | Covers multiple languages and dialects, suitable for cross-language caption creation. | Select the appropriate model based on the audio duration. |
Singing audio analysis | qwen3-asr-flash-filetrans, qwen3-asr-flash | Recognizes lyrics and analyzes emotions, suitable for song indexing and recommendation. | Select the appropriate model based on the audio duration. |
For more information, see Model feature comparison.
Getting started
This service does not currently support online trials. Call its API instead. The following code samples show how to call the API.
Before you begin, make sure you have created an API key. To use the SDK, install the latest version of the DashScope SDK.
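For reference, the Python SDK is typically installed or upgraded with pip; for the Java SDK, add the com.alibaba:dashscope-sdk-java Maven dependency at its latest version (check the installation guide for the exact coordinates and release number):
pip install -U dashscope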
qwen3-asr-flash-filetrans
qwen3-asr-flash-filetrans is designed for asynchronous recognition of audio files and supports recordings up to 12 hours long. This model requires the input to be a publicly accessible URL of an audio file and does not support direct uploads of local files. It is a non-streaming API that returns all recognition results at once after the task is complete.
cURL
When using cURL for speech recognition, first submit a task to get a task ID (task_id), and then use this ID to retrieve the task execution result.
Submit a task
# ======= Important =======
# The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/asr/transcription
# The API keys for the Singapore and Beijing regions are different. To get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
# === Delete this comment before execution ===
curl --location --request POST 'https://dashscope-intl.aliyuncs.com/api/v1/services/audio/asr/transcription' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header "Content-Type: application/json" \
--header "X-DashScope-Async: enable" \
--data '{
"model": "qwen3-asr-flash-filetrans",
"input": {
"file_url": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
},
"parameters": {
"channel_id":[
0
],
"enable_itn": false
}
}'
Get the task execution result
# ======= Important =======
# The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/tasks/{task_id}. Note: Replace {task_id} with the ID of the task to query.
# The API keys for the Singapore and Beijing regions are different. To get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
# === Delete this comment before execution ===
curl --location --request GET 'https://dashscope-intl.aliyuncs.com/api/v1/tasks/{task_id}' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header "X-DashScope-Async: enable" \
--header "Content-Type: application/json"Complete examples
Java
import com.google.gson.Gson;
import com.google.gson.annotations.SerializedName;
import okhttp3.*;
import java.io.IOException;
import java.util.concurrent.TimeUnit;
public class Main {
// The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/asr/transcription
private static final String API_URL_SUBMIT = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/asr/transcription";
// The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/tasks/
private static final String API_URL_QUERY = "https://dashscope-intl.aliyuncs.com/api/v1/tasks/";
private static final Gson gson = new Gson();
public static void main(String[] args) {
// The API keys for the Singapore and Beijing regions are different. To get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
// If you have not configured the environment variable, replace the following line with your Model Studio API key: String apiKey = "sk-xxx"
String apiKey = System.getenv("DASHSCOPE_API_KEY");
OkHttpClient client = new OkHttpClient();
// 1. Submit the task
/*String payloadJson = """
{
"model": "qwen3-asr-flash-filetrans",
"input": {
"file_url": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
},
"parameters": {
"channel_id": [0],
"enable_itn": false,
"language": "zh",
"corpus": {
"text": ""
}
}
}
""";*/
String payloadJson = """
{
"model": "qwen3-asr-flash-filetrans",
"input": {
"file_url": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
},
"parameters": {
"channel_id": [0],
"enable_itn": false
}
}
""";
RequestBody body = RequestBody.create(payloadJson, MediaType.get("application/json; charset=utf-8"));
Request submitRequest = new Request.Builder()
.url(API_URL_SUBMIT)
.addHeader("Authorization", "Bearer " + apiKey)
.addHeader("Content-Type", "application/json")
.addHeader("X-DashScope-Async", "enable")
.post(body)
.build();
String taskId = null;
try (Response response = client.newCall(submitRequest).execute()) {
if (response.isSuccessful() && response.body() != null) {
String respBody = response.body().string();
ApiResponse apiResp = gson.fromJson(respBody, ApiResponse.class);
if (apiResp.output != null) {
taskId = apiResp.output.taskId;
System.out.println("Task submitted. task_id: " + taskId);
} else {
System.out.println("Submission response content: " + respBody);
return;
}
} else {
System.out.println("Task submission failed! HTTP code: " + response.code());
if (response.body() != null) {
System.out.println(response.body().string());
}
return;
}
} catch (IOException e) {
e.printStackTrace();
return;
}
// 2. Poll the task status
boolean finished = false;
while (!finished) {
try {
TimeUnit.SECONDS.sleep(2); // Wait 2 seconds before querying again
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
return;
}
String queryUrl = API_URL_QUERY + taskId;
Request queryRequest = new Request.Builder()
.url(queryUrl)
.addHeader("Authorization", "Bearer " + apiKey)
.addHeader("X-DashScope-Async", "enable")
.addHeader("Content-Type", "application/json")
.get()
.build();
try (Response response = client.newCall(queryRequest).execute()) {
if (response.body() != null) {
String queryResponse = response.body().string();
ApiResponse apiResp = gson.fromJson(queryResponse, ApiResponse.class);
if (apiResp.output != null && apiResp.output.taskStatus != null) {
String status = apiResp.output.taskStatus;
System.out.println("Current task status: " + status);
if ("SUCCEEDED".equalsIgnoreCase(status)
|| "FAILED".equalsIgnoreCase(status)
|| "UNKNOWN".equalsIgnoreCase(status)) {
finished = true;
System.out.println("Task completed. Final result: ");
System.out.println(queryResponse);
}
} else {
System.out.println("Query response content: " + queryResponse);
}
}
} catch (IOException e) {
e.printStackTrace();
return;
}
}
}
static class ApiResponse {
@SerializedName("request_id")
String requestId;
Output output;
}
static class Output {
@SerializedName("task_id")
String taskId;
@SerializedName("task_status")
String taskStatus;
}
}
Python
import os
import time
import requests
import json
# The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/asr/transcription
API_URL_SUBMIT = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/asr/transcription"
# The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/tasks/
API_URL_QUERY_BASE = "https://dashscope-intl.aliyuncs.com/api/v1/tasks/"
def main():
# The API keys for the Singapore and Beijing regions are different. To get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
# If you have not configured the environment variable, replace the following line with your Model Studio API key: api_key = "sk-xxx"
api_key = os.getenv("DASHSCOPE_API_KEY")
headers = {
"Authorization": f"Bearer {api_key}",
"Content-Type": "application/json",
"X-DashScope-Async": "enable"
}
# 1. Submit the task
payload = {
"model": "qwen3-asr-flash-filetrans",
"input": {
"file_url": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
},
"parameters": {
"channel_id": [0],
# "language": "zh",
"enable_itn": False
# "corpus": {
# "text": ""
# }
}
}
print("Submitting ASR transcription task...")
try:
submit_resp = requests.post(API_URL_SUBMIT, headers=headers, data=json.dumps(payload))
except requests.RequestException as e:
print(f"Failed to submit the task request: {e}")
return
if submit_resp.status_code != 200:
print(f"Task submission failed! HTTP code: {submit_resp.status_code}")
print(submit_resp.text)
return
resp_data = submit_resp.json()
output = resp_data.get("output")
if not output or "task_id" not in output:
print("Abnormal submission response content:", resp_data)
return
task_id = output["task_id"]
print(f"Task submitted. task_id: {task_id}")
# 2. Poll the task status
finished = False
while not finished:
time.sleep(2) # Wait 2 seconds before querying again
query_url = API_URL_QUERY_BASE + task_id
try:
query_resp = requests.get(query_url, headers=headers)
except requests.RequestException as e:
print(f"Failed to query the task: {e}")
return
if query_resp.status_code != 200:
print(f"Task query failed! HTTP code: {query_resp.status_code}")
print(query_resp.text)
return
query_data = query_resp.json()
output = query_data.get("output")
if output and "task_status" in output:
status = output["task_status"]
print(f"Current task status: {status}")
if status.upper() in ("SUCCEEDED", "FAILED", "UNKNOWN"):
finished = True
print("Task completed. The final result is as follows:")
print(json.dumps(query_data, indent=2, ensure_ascii=False))
else:
print("Query response content:", query_data)
if __name__ == "__main__":
    main()
Java SDK
import com.alibaba.dashscope.audio.qwen_asr.*;
import com.alibaba.dashscope.utils.Constants;
import com.google.gson.Gson;
import com.google.gson.GsonBuilder;
import com.google.gson.JsonObject;
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.ArrayList;
import java.util.HashMap;
public class Main {
public static void main(String[] args) {
// The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
Constants.baseHttpApiUrl = "https://dashscope-intl.aliyuncs.com/api/v1";
QwenTranscriptionParam param =
QwenTranscriptionParam.builder()
// The API keys for the Singapore and Beijing regions are different. To get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
// If you have not configured the environment variable, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.model("qwen3-asr-flash-filetrans")
.fileUrl("https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/sensevoice/rich_text_example_1.wav")
//.parameter("language", "zh")
//.parameter("channel_id", new ArrayList<String>(){{add("0");add("1");}})
.parameter("enable_itn", false)
//.parameter("corpus", new HashMap<String, String>() {{put("text", "");}})
.build();
try {
QwenTranscription transcription = new QwenTranscription();
// Submit the task
QwenTranscriptionResult result = transcription.asyncCall(param);
System.out.println("create task result: " + result);
// Query the task status
result = transcription.fetch(QwenTranscriptionQueryParam.FromTranscriptionParam(param, result.getTaskId()));
System.out.println("task status: " + result);
// Wait for the task to complete
result =
transcription.wait(
QwenTranscriptionQueryParam.FromTranscriptionParam(param, result.getTaskId()));
System.out.println("task result: " + result);
// Get the speech recognition result
QwenTranscriptionTaskResult taskResult = result.getResult();
if (taskResult != null) {
// Get the URL of the recognition result
String transcriptionUrl = taskResult.getTranscriptionUrl();
// Get the result from the URL
HttpURLConnection connection =
(HttpURLConnection) new URL(transcriptionUrl).openConnection();
connection.setRequestMethod("GET");
connection.connect();
BufferedReader reader =
new BufferedReader(new InputStreamReader(connection.getInputStream()));
// Format and print the JSON result
Gson gson = new GsonBuilder().setPrettyPrinting().create();
System.out.println(gson.toJson(gson.fromJson(reader, JsonObject.class)));
}
} catch (Exception e) {
System.out.println("error: " + e);
}
}
}
Python SDK
import json
import os
import sys
from http import HTTPStatus
import dashscope
from dashscope.audio.qwen_asr import QwenTranscription
from dashscope.api_entities.dashscope_response import TranscriptionResponse
# run the transcription script
if __name__ == '__main__':
# The API keys for the Singapore and Beijing regions are different. To get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
# If you have not configured the environment variable, replace the following line with your Model Studio API key: dashscope.api_key = "sk-xxx"
dashscope.api_key = os.getenv("DASHSCOPE_API_KEY")
# The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
task_response = QwenTranscription.async_call(
model='qwen3-asr-flash-filetrans',
file_url='https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/sensevoice/rich_text_example_1.wav',
#language="",
enable_itn=False
#corpus= {
# "text": ""
#}
)
print(f'task_response: {task_response}')
print(task_response.output.task_id)
query_response = QwenTranscription.fetch(task=task_response.output.task_id)
print(f'query_response: {query_response}')
task_result = QwenTranscription.wait(task=task_response.output.task_id)
print(f'task_result: {task_result}')
qwen3-asr-flash
qwen3-asr-flash supports audio files up to 5 minutes long. This model accepts a publicly accessible URL of an audio file or a direct upload of a local file as input. It also supports streaming output for recognition results.
Input: Audio file URL
Python SDK
import os
import dashscope
# The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
messages = [
{"role": "system", "content": [{"text": ""}]}, # Configure the context for custom recognition
{"role": "user", "content": [{"audio": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"}]}
]
response = dashscope.MultiModalConversation.call(
# The API keys for the Singapore and Beijing regions are different. To get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
# If you have not configured the environment variable, replace the following line with your Model Studio API key: api_key = "sk-xxx"
api_key=os.getenv("DASHSCOPE_API_KEY"),
model="qwen3-asr-flash",
messages=messages,
result_format="message",
asr_options={
#"language": "zh", # Optional. If the language of the audio is known, specify it with this parameter to improve recognition accuracy.
"enable_itn":False
}
)
print(response)
Java SDK
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;
import com.alibaba.dashscope.utils.JsonUtils;
public class Main {
public static void simpleMultiModalConversationCall()
throws ApiException, NoApiKeyException, UploadFileException {
MultiModalConversation conv = new MultiModalConversation();
MultiModalMessage userMessage = MultiModalMessage.builder()
.role(Role.USER.getValue())
.content(Arrays.asList(
Collections.singletonMap("audio", "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3")))
.build();
MultiModalMessage sysMessage = MultiModalMessage.builder().role(Role.SYSTEM.getValue())
// Configure the context for custom recognition here
.content(Arrays.asList(Collections.singletonMap("text", "")))
.build();
Map<String, Object> asrOptions = new HashMap<>();
asrOptions.put("enable_itn", false);
// asrOptions.put("language", "zh"); // Optional. If the language of the audio is known, specify it with this parameter to improve recognition accuracy.
MultiModalConversationParam param = MultiModalConversationParam.builder()
// The API keys for the Singapore and Beijing regions are different. To get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
// If you have not configured the environment variable, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.model("qwen3-asr-flash")
.message(userMessage)
.message(sysMessage)
.parameter("asr_options", asrOptions)
.build();
MultiModalConversationResult result = conv.call(param);
System.out.println(JsonUtils.toJson(result));
}
public static void main(String[] args) {
try {
// The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
Constants.baseHttpApiUrl = "https://dashscope-intl.aliyuncs.com/api/v1";
simpleMultiModalConversationCall();
} catch (ApiException | NoApiKeyException | UploadFileException e) {
System.out.println(e.getMessage());
}
System.exit(0);
}
}
cURL
You can configure the context for custom recognition using the text parameter of the System Message.
# ======= Important =======
# The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation
# The API keys for the Singapore and Beijing regions are different. To get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
# === Delete this comment before execution ===
curl --location --request POST 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
"model": "qwen3-asr-flash",
"input": {
"messages": [
{
"content": [
{
"text": ""
}
],
"role": "system"
},
{
"content": [
{
"audio": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
}
],
"role": "user"
}
]
},
"parameters": {
"asr_options": {
"enable_itn": false
}
}
}'
Input: Base64-encoded audio file
You can input Base64-encoded data (Data URL), which uses the following format: data:<mediatype>;base64,<data>.
<mediatype>: The Multipurpose Internet Mail Extensions (MIME) type. This varies depending on the audio format. For example:
WAV: audio/wav
MP3: audio/mpeg
M4A: audio/mp4
<data>: The Base64-encoded string of the audio.
Base64 encoding increases the file size. You must control the original file size to ensure the encoded data does not exceed the input audio size limit of 10 MB.
Example:
data:audio/wav;base64,SUQzBAAAAAAAI1RTU0UAAAAPAAADTGF2ZjU4LjI5LjEwMAAAAAAAAAAAAAAA//PAxABQ/BXRbMPe4IQAhl9
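Because Base64 inflates the payload to roughly four thirds of the raw file size, it can help to check the encoded size before calling the model. The following is a minimal sketch (not part of the original samples; the file name is a placeholder):
import base64
import pathlib

file_path = pathlib.Path("welcome.mp3")  # placeholder: path to your local audio file
raw_bytes = file_path.read_bytes()
encoded = base64.b64encode(raw_bytes)  # Base64 output is roughly 4/3 of the raw size

print(f"raw size: {len(raw_bytes)} bytes, Base64-encoded size: {len(encoded)} bytes")
if len(encoded) > 10 * 1024 * 1024:
    raise ValueError("Encoded audio exceeds the 10 MB input limit for qwen3-asr-flash")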
Python SDK
import base64
import dashscope
import os
import pathlib
# The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
# Replace with the actual path to your audio file
file_path = "welcome.mp3"
# Replace with the actual MIME type of your audio file
audio_mime_type = "audio/mpeg"
file_path_obj = pathlib.Path(file_path)
if not file_path_obj.exists():
raise FileNotFoundError(f"Audio file not found: {file_path}")
base64_str = base64.b64encode(file_path_obj.read_bytes()).decode()
data_uri = f"data:{audio_mime_type};base64,{base64_str}"
messages = [
{"role": "system", "content": [{"text": ""}]}, # Configure the context for custom recognition
{"role": "user", "content": [{"audio": data_uri}]}
]
response = dashscope.MultiModalConversation.call(
# The API keys for the Singapore and Beijing regions are different. To get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
# If you have not configured the environment variable, replace the following line with your Model Studio API key: api_key = "sk-xxx",
api_key=os.getenv("DASHSCOPE_API_KEY"),
model="qwen3-asr-flash",
messages=messages,
result_format="message",
asr_options={
# "language": "zh", # Optional. If the language of the audio is known, specify it with this parameter to improve recognition accuracy.
"enable_itn":False
}
)
print(response)
Java SDK
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.*;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;
import com.alibaba.dashscope.utils.JsonUtils;
public class Main {
// Replace with the actual path to your audio file
private static final String AUDIO_FILE = "welcome.mp3";
// Replace with the actual MIME type of your audio file
private static final String AUDIO_MIME_TYPE = "audio/mpeg";
public static void simpleMultiModalConversationCall()
throws ApiException, NoApiKeyException, UploadFileException, IOException {
MultiModalConversation conv = new MultiModalConversation();
MultiModalMessage userMessage = MultiModalMessage.builder()
.role(Role.USER.getValue())
.content(Arrays.asList(
Collections.singletonMap("audio", toDataUrl())))
.build();
MultiModalMessage sysMessage = MultiModalMessage.builder().role(Role.SYSTEM.getValue())
// Configure the context for custom recognition here
.content(Arrays.asList(Collections.singletonMap("text", "")))
.build();
Map<String, Object> asrOptions = new HashMap<>();
asrOptions.put("enable_itn", false);
// asrOptions.put("language", "zh"); // Optional. If the language of the audio is known, specify it with this parameter to improve recognition accuracy.
MultiModalConversationParam param = MultiModalConversationParam.builder()
// The API keys for the Singapore and Beijing regions are different. To get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
// If you have not configured the environment variable, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.model("qwen3-asr-flash")
.message(userMessage)
.message(sysMessage)
.parameter("asr_options", asrOptions)
.build();
MultiModalConversationResult result = conv.call(param);
System.out.println(JsonUtils.toJson(result));
}
public static void main(String[] args) {
try {
// The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
Constants.baseHttpApiUrl = "https://dashscope-intl.aliyuncs.com/api/v1";
simpleMultiModalConversationCall();
} catch (ApiException | NoApiKeyException | UploadFileException | IOException e) {
System.out.println(e.getMessage());
}
System.exit(0);
}
// Generate data URI
public static String toDataUrl() throws IOException {
byte[] bytes = Files.readAllBytes(Paths.get(AUDIO_FILE));
String encoded = Base64.getEncoder().encodeToString(bytes);
return "data:" + AUDIO_MIME_TYPE + ";base64," + encoded;
}
}
Input: Absolute path of a local audio file
When using the DashScope SDK to process local audio files, you need to pass the file path. Refer to the following table to create the file path based on your usage and operating system.
System | SDK | File path to pass | Example |
Linux or macOS | Python SDK | file://{absolute_path_to_file} | file:///home/images/test.png |
Linux or macOS | Java SDK | file://{absolute_path_to_file} | file:///home/images/test.png |
Windows | Python SDK | file://{absolute_path_to_file} | file://D:/images/test.png |
Windows | Java SDK | file:///{absolute_path_to_file} | file:///D:/images/test.png |
When you use local files, the API call limit is 100 QPS, and this limit cannot be increased. This method is not recommended for production environments, high-concurrency situations, or stress testing. For higher concurrency, we recommend that you upload the file to OSS and call the API using the audio file URL.
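The following is a small helper sketch (not from the original samples) that builds the path string for the Python SDK rows of the table above; the Java SDK on Windows expects three slashes instead, and the file name used here is a placeholder:
import pathlib
import platform

def to_dashscope_file_uri(path: str) -> str:
    """Build the local file path string expected by the DashScope Python SDK."""
    p = pathlib.Path(path).resolve()
    if platform.system() == "Windows":
        # Python SDK on Windows: file://D:/images/test.png (two slashes)
        return "file://" + str(p).replace("\\", "/")
    # Linux or macOS: file:///home/images/test.png
    return "file://" + str(p)

audio_file_path = to_dashscope_file_uri("welcome.mp3")  # placeholder file name
print(audio_file_path)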
Python SDK
import os
import dashscope
# The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
# Replace ABSOLUTE_PATH/welcome.mp3 with the absolute path to your local audio file.
audio_file_path = "file://ABSOLUTE_PATH/welcome.mp3"
messages = [
{"role": "system", "content": [{"text": ""}]}, # Configure the context for custom recognition
{"role": "user", "content": [{"audio": audio_file_path}]}
]
response = dashscope.MultiModalConversation.call(
# The API keys for the Singapore and Beijing regions are different. To get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
# If you have not configured the environment variable, replace the following line with your Model Studio API key: api_key = "sk-xxx"
api_key=os.getenv("DASHSCOPE_API_KEY"),
model="qwen3-asr-flash",
messages=messages,
result_format="message",
asr_options={
# "language": "zh", # Optional. If the language of the audio is known, specify it with this parameter to improve recognition accuracy.
"enable_itn":False
}
)
print(response)
Java SDK
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;
import com.alibaba.dashscope.utils.JsonUtils;
public class Main {
public static void simpleMultiModalConversationCall()
throws ApiException, NoApiKeyException, UploadFileException {
// Replace ABSOLUTE_PATH/welcome.mp3 with the absolute path to your local file.
String localFilePath = "file://ABSOLUTE_PATH/welcome.mp3";
MultiModalConversation conv = new MultiModalConversation();
MultiModalMessage userMessage = MultiModalMessage.builder()
.role(Role.USER.getValue())
.content(Arrays.asList(
Collections.singletonMap("audio", localFilePath)))
.build();
MultiModalMessage sysMessage = MultiModalMessage.builder().role(Role.SYSTEM.getValue())
// Configure the context for custom recognition here
.content(Arrays.asList(Collections.singletonMap("text", "")))
.build();
Map<String, Object> asrOptions = new HashMap<>();
asrOptions.put("enable_itn", false);
// asrOptions.put("language", "zh"); // Optional. If the language of the audio is known, specify it with this parameter to improve recognition accuracy.
MultiModalConversationParam param = MultiModalConversationParam.builder()
// The API keys for the Singapore and Beijing regions are different. To get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
// If you have not configured the environment variable, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.model("qwen3-asr-flash")
.message(userMessage)
.message(sysMessage)
.parameter("asr_options", asrOptions)
.build();
MultiModalConversationResult result = conv.call(param);
System.out.println(JsonUtils.toJson(result));
}
public static void main(String[] args) {
try {
// The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
Constants.baseHttpApiUrl = "https://dashscope-intl.aliyuncs.com/api/v1";
simpleMultiModalConversationCall();
} catch (ApiException | NoApiKeyException | UploadFileException e) {
System.out.println(e.getMessage());
}
System.exit(0);
}
}
Streaming output
The model generates the final result incrementally, not all at once. Non-streaming output waits for the model to finish and then returns the complete result. Streaming output returns intermediate results as they are generated. This lets you see the output as it is generated, which reduces the waiting time. To enable streaming output, set the appropriate parameter based on your calling method:
DashScope Python SDK: Set the stream parameter to True.
DashScope Java SDK: Call the service using the streamCall interface.
HTTP: Set the X-DashScope-SSE request header to enable.
Python SDK
import os
import dashscope
# The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
messages = [
{"role": "system", "content": [{"text": ""}]}, # Configure the context for custom recognition
{"role": "user", "content": [{"audio": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"}]}
]
response = dashscope.MultiModalConversation.call(
# The API keys for the Singapore and Beijing regions are different. To get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
# If you have not configured the environment variable, replace the following line with your Model Studio API key: api_key = "sk-xxx"
api_key=os.getenv("DASHSCOPE_API_KEY"),
model="qwen3-asr-flash",
messages=messages,
result_format="message",
asr_options={
# "language": "zh", # Optional. If the language of the audio is known, specify it with this parameter to improve recognition accuracy.
"enable_itn":False
},
stream=True
)
for response in response:
try:
print(response["output"]["choices"][0]["message"].content[0]["text"])
except:
        pass
Java SDK
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;
import io.reactivex.Flowable;
public class Main {
public static void simpleMultiModalConversationCall()
throws ApiException, NoApiKeyException, UploadFileException {
MultiModalConversation conv = new MultiModalConversation();
MultiModalMessage userMessage = MultiModalMessage.builder()
.role(Role.USER.getValue())
.content(Arrays.asList(
Collections.singletonMap("audio", "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3")))
.build();
MultiModalMessage sysMessage = MultiModalMessage.builder().role(Role.SYSTEM.getValue())
// Configure the context for custom recognition here
.content(Arrays.asList(Collections.singletonMap("text", "")))
.build();
Map<String, Object> asrOptions = new HashMap<>();
asrOptions.put("enable_itn", false);
// asrOptions.put("language", "zh"); // Optional. If the language of the audio is known, specify it with this parameter to improve recognition accuracy.
MultiModalConversationParam param = MultiModalConversationParam.builder()
// The API keys for the Singapore and Beijing regions are different. To get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
// If you have not configured the environment variable, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.model("qwen3-asr-flash")
.message(userMessage)
.message(sysMessage)
.parameter("asr_options", asrOptions)
.build();
Flowable<MultiModalConversationResult> resultFlowable = conv.streamCall(param);
resultFlowable.blockingForEach(item -> {
try {
System.out.println(item.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
} catch (Exception e){
System.exit(0);
}
});
}
public static void main(String[] args) {
try {
// The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
Constants.baseHttpApiUrl = "https://dashscope-intl.aliyuncs.com/api/v1";
simpleMultiModalConversationCall();
} catch (ApiException | NoApiKeyException | UploadFileException e) {
System.out.println(e.getMessage());
}
System.exit(0);
}
}
cURL
You can configure the context for custom recognition using the text parameter of the System Message.
# ======= Important =======
# The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation
# The API keys for the Singapore and Beijing regions are different. To get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
# === Delete this comment before execution ===
curl --location --request POST 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--header 'X-DashScope-SSE: enable' \
--data '{
"model": "qwen3-asr-flash",
"input": {
"messages": [
{
"content": [
{
"text": ""
}
],
"role": "system"
},
{
"content": [
{
"audio": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
}
],
"role": "user"
}
]
},
"parameters": {
"incremental_output": true,
"asr_options": {
"enable_itn": false
}
}
}'
Core usage: Context biasing
Qwen3-ASR supports context biasing. By providing context, you can optimize the recognition of domain-specific vocabulary, such as names, places, and product terms, to significantly improve transcription accuracy. This feature is more flexible and powerful than traditional hotword solutions.
Length limit: The context cannot exceed 10,000 tokens.
Usage: When you call the API, pass the text in the text parameter of the system message.
Supported text types: Includes, but is not limited to:
Hotword lists (in various separator formats, such as hotword 1, hotword 2, hotword 3, hotword 4)
Text paragraphs or chapters of any format and length
Mixed content: Any combination of word lists and paragraphs
Irrelevant or meaningless text (including garbled characters). The model has high fault tolerance for irrelevant text and is almost never negatively affected.
Example:
In this example, the correct transcription for an audio segment is "What jargon from the investment banking circle do you know? First, the nine major foreign investment banks, the Bulge Bracket, BB...".
Without context biasing: The model incorrectly recognizes "Bulge Bracket" as "Bird Rock". Recognition result: "What jargon from the investment banking circle do you know? First, the nine major foreign investment banks, the Bird Rock, BB..."
With context biasing: The model correctly recognizes the investment bank names. Recognition result: "What jargon from the investment banking circle do you know? First, the nine major foreign investment banks, the Bulge Bracket, BB..."
To achieve this result, you can add any of the following content to the context:
Word lists:
Word list 1: Bulge Bracket, Boutique, Middle Market, domestic securities firms
Word list 2: Bulge Bracket Boutique Middle Market domestic securities firms
Word list 3: ['Bulge Bracket', 'Boutique', 'Middle Market', 'domestic securities firms']
Natural language:
The secrets of investment banking categories revealed! Recently, many friends from Australia have asked me, what exactly is an investment bank? Today, I'll explain it. For international students, investment banks can be mainly divided into four categories: Bulge Bracket, Boutique, Middle Market, and domestic securities firms. Bulge Bracket investment banks: These are what we often call the nine major investment banks, including Goldman Sachs, Morgan Stanley, etc. These large banks are enormous in both business scope and scale. Boutique investment banks: These banks are relatively small but highly specialized in their business areas. For example, Lazard, Evercore, etc., have deep expertise and experience in specific fields. Middle Market investment banks: This type of bank mainly serves medium-sized companies, providing services such as mergers and acquisitions, and IPOs. Although not as large as the major banks, they have a high influence in specific markets. Domestic securities firms: With the rise of the Chinese market, domestic securities firms are also playing an increasingly important role in the international market. In addition, there are some Position and business divisions, you can refer to the relevant charts. I hope this information helps you better understand investment banking and prepare for your future career!
Natural language with interference: The context can contain irrelevant text, such as the names in the following example.
The secrets of investment banking categories revealed! Recently, many friends from Australia have asked me, what exactly is an investment bank? Today, I'll explain it. For international students, investment banks can be mainly divided into four categories: Bulge Bracket, Boutique, Middle Market, and domestic securities firms. Bulge Bracket investment banks: These are what we often call the nine major investment banks, including Goldman Sachs, Morgan Stanley, etc. These large banks are enormous in both business scope and scale. Boutique investment banks: These banks are relatively small but highly specialized in their business areas. For example, Lazard, Evercore, etc., have deep expertise and experience in specific fields. Middle Market investment banks: This type of bank mainly serves medium-sized companies, providing services such as mergers and acquisitions, and IPOs. Although not as large as the major banks, they have a high influence in specific markets. Domestic securities firms: With the rise of the Chinese market, domestic securities firms are also playing an increasingly important role in the international market. In addition, there are some Position and business divisions, you can refer to the relevant charts. I hope this information helps you better understand investment banking and prepare for your future career! Wang Haoxuan, Li Zihan, Zhang Jingxing, Liu Xinyi, Chen Junjie, Yang Siyuan, Zhao Yutong, Huang Zhiqiang, Zhou Zimo, Wu Yajing, Xu Ruoxi, Sun Haoran, Hu Jinyu, Zhu Chenxi, Guo Wenbo, He Jingshu, Gao Yuhang, Lin Yifei, Zheng Xiaoyan, Liang Bowen, Luo Jiaqi, Song Mingzhe, Xie Wanting, Tang Ziqian, Han Mengyao, Feng Yiran, Cao Qinxue, Deng Zirui, Xiao Wangshu, Xu Jiashu, Cheng Yinuo, Yuan Zhiruo, Peng Haoyu, Dong Simiao, Fan Jingyu, Su Zijin, Lv Wenxuan, Jiang Shihan, Ding Muchen, Wei Shuyao, Ren Tianyou, Jiang Yichen, Hua Qingyu, Shen Xinghe, Fu Jinyu, Yao Xingchen, Zhong Lingyu, Yan Licheng, Jin Ruoshui, Taoranting, Qi Shaoshang, Xue Zhilan, Zou Yunfan, Xiong Ziang, Bai Wenfeng, Yi Qianfan
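For reference, the word list above can be passed directly in the system message text field, following the same call pattern as the qwen3-asr-flash Python SDK example earlier in this topic (only the context string and the placeholder audio URL differ from that sample):
import os
import dashscope

# Singapore region endpoint; use https://dashscope.aliyuncs.com/api/v1 for the Beijing region
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

# Context biasing: put the hotword list (or any relevant text) in the system message
context = "Bulge Bracket, Boutique, Middle Market, domestic securities firms"
messages = [
    {"role": "system", "content": [{"text": context}]},
    {"role": "user", "content": [{"audio": "https://example.com/investment-banking-clip.mp3"}]},  # placeholder URL
]

response = dashscope.MultiModalConversation.call(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    model="qwen3-asr-flash",
    messages=messages,
    result_format="message",
    asr_options={"enable_itn": False},
)
print(response)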
API reference
Model feature comparison
Feature/Attribute | qwen3-asr-flash-filetrans, qwen3-asr-flash-filetrans-2025-11-17 | qwen3-asr-flash, qwen3-asr-flash-2025-09-08 |
Supported languages | Chinese (Mandarin, Sichuanese, Minnan, Wu, Cantonese), English, Japanese, German, Korean, Russian, French, Portuguese, Arabic, Italian, Spanish, Hindi, Indonesian, Thai, Turkish, Ukrainian, Vietnamese |
Supported audio formats | aac, amr, avi, flac, flv, m4a, mkv, mov, mp3, mp4, mpeg, ogg, opus, wav, webm, wma, wmv | aac, amr, avi, aiff, flac, flv, m4a, mkv, mp3, mp4, mpeg, ogg, opus, wav, webm, wma, wmv |
Sample rate | Any | 16 kHz |
Sound channel | Any | Mono |
Input format | A publicly accessible URL of the file to be recognized | Base64-encoded file, absolute path of a local file, or a publicly accessible URL of the file to be recognized |
Audio size/duration | Audio file size cannot exceed 2 GB, and duration cannot exceed 12 hours | Audio file size cannot exceed 10 MB, and duration cannot exceed 5 minutes |
Emotion recognition | Supported. Always on |
Timestamp | Supported. Always on |
Punctuation prediction | Supported. Always on |
Context biasing | Supported. Configurable |
ITN | Supported. Disabled by default; can be enabled. Applies only to Chinese and English. |
Singing voice recognition | Supported. Always on |
Noise rejection | Supported. Always on |
Sensitive word filtering | Not supported |
Speaker diarization | Not supported |
Filler word filtering | Not supported |
VAD | Supported. Always on |
Throttling (RPM) | 100 |
Connection type | RESTful API | Java/Python SDK, RESTful API |
Price | International (Singapore): $0.000035/second; China (Beijing): $0.000032/second |
FAQ
Q: How do I provide a publicly accessible audio URL for the API?
You can use Object Storage Service (OSS). It is a highly available and reliable storage service that lets you easily generate public access URLs.
To verify that the generated URL is publicly accessible: Access the URL in a browser or with the curl command to ensure the audio file downloads or plays successfully (HTTP status code 200).
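As a rough sketch only (the oss2 package, endpoint, bucket name, and object key below are placeholders and assumptions, not part of this topic), uploading a file with the OSS Python SDK and generating a time-limited signed URL might look like the following; make sure the URL stays valid until the transcription task finishes:
import oss2      # Alibaba Cloud OSS Python SDK: pip install oss2
import requests

# Placeholders: replace with your own credentials, endpoint, bucket, and object key
auth = oss2.Auth("<AccessKeyId>", "<AccessKeySecret>")
bucket = oss2.Bucket(auth, "https://oss-ap-southeast-1.aliyuncs.com", "<your-bucket>")

bucket.put_object_from_file("audio/meeting.mp3", "meeting.mp3")  # upload the local file
url = bucket.sign_url("GET", "audio/meeting.mp3", 24 * 3600)     # signed URL valid for 24 hours

# Quick accessibility check: a 200 status code means the file can be fetched
resp = requests.get(url, stream=True)
print(resp.status_code, url)
resp.close()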
Q: How do I check if the audio format meets the requirements?
You can use the open-source tool ffprobe to quickly get detailed information about the audio:
# Query the audio container format (format_name), encoding (codec_name), sample rate (sample_rate), and number of channels (channels)
ffprobe -v error -show_entries format=format_name -show_entries stream=codec_name,sample_rate,channels -of default=noprint_wrappers=1 your_audio_file.mp3
Q: How do I process audio to meet the model's requirements?
You can use the open-source tool FFmpeg to crop or convert audio:
Crop audio: Extract a clip from a long audio file
# -i: input file
# -ss 00:01:30: Set the crop start time (starts at 1 minute and 30 seconds)
# -t 00:02:00: Set the crop duration (2 minutes)
# -c copy: Directly copy the audio stream without re-encoding for faster processing
# output_clip.wav: output file
ffmpeg -i long_audio.wav -ss 00:01:30 -t 00:02:00 -c copy output_clip.wav
Format conversion
For example, you can convert any audio file to a 16 kHz, 16-bit, mono WAV file.
# -i: input file
# -ac 1: Set the number of audio channels to 1 (mono)
# -ar 16000: Set the sample rate to 16000 Hz (16 kHz)
# -sample_fmt s16: Set the sample format to 16-bit signed integer PCM
# output.wav: output file
ffmpeg -i input.mp3 -ac 1 -ar 16000 -sample_fmt s16 output.wav