Qwen's audio file recognition converts recorded audio to text with multilingual recognition, singing voice transcription, and noise rejection.
Core features
- Multilingual recognition: Supports multiple languages, including Mandarin, and dialects such as Cantonese and Sichuanese.
- Adaptation to complex environments: Handles complex acoustic environments with automatic language detection and intelligent filtering of non-human sounds.
- Singing voice recognition: Transcribes entire songs, including those with background music (BGM).
- Emotion recognition: Recognizes multiple emotional states, including surprise, calm, happiness, sadness, disgust, anger, and fear.
Availability
Supported models:
Qwen provides two core models:
- Qwen3-ASR-Flash-Filetrans: Designed for asynchronous recognition of long audio files up to 12 hours. Ideal for transcribing meetings and interviews.
- Qwen3-ASR-Flash: Designed for synchronous or streaming recognition of short audio files up to 5 minutes. Ideal for voice messaging and real-time captions.
International
International deployment mode: Endpoint and data in the Singapore region. Compute resources are scheduled globally (excluding the Chinese mainland).
Select an API key from the Singapore region:
- Qwen3-ASR-Flash-Filetrans: qwen3-asr-flash-filetrans (stable, currently qwen3-asr-flash-filetrans-2025-11-17), qwen3-asr-flash-filetrans-2025-11-17 (snapshot)
- Qwen3-ASR-Flash: qwen3-asr-flash (stable, currently qwen3-asr-flash-2025-09-08), qwen3-asr-flash-2026-02-10 (latest snapshot), qwen3-asr-flash-2025-09-08 (snapshot)
US
US deployment mode: Endpoint and data in US (Virginia) region. Compute resources restricted to US.
Select an API key from the US region:
- Qwen3-ASR-Flash: qwen3-asr-flash-us (stable, currently qwen3-asr-flash-2025-09-08-us), qwen3-asr-flash-2025-09-08-us (snapshot)
Chinese Mainland
Chinese mainland deployment mode: Endpoint and data in Beijing region. Compute resources restricted to Chinese mainland.
Select an API key from the Beijing region:
- Qwen3-ASR-Flash-Filetrans: qwen3-asr-flash-filetrans (stable, currently qwen3-asr-flash-filetrans-2025-11-17), qwen3-asr-flash-filetrans-2025-11-17 (snapshot)
- Qwen3-ASR-Flash: qwen3-asr-flash (stable, currently qwen3-asr-flash-2025-09-08), qwen3-asr-flash-2026-02-10 (latest snapshot), qwen3-asr-flash-2025-09-08 (snapshot)
See Model list.
Model selection
| Scenario | Recommended | Reason | Notes |
| --- | --- | --- | --- |
| Long audio recognition | qwen3-asr-flash-filetrans | Supports recordings up to 12 hours. Provides emotion recognition and timestamps for indexing and analysis. | Audio file size cannot exceed 2 GB, and duration cannot exceed 12 hours. |
| Short audio recognition | qwen3-asr-flash or qwen3-asr-flash-us | Short audio recognition with low latency. | Audio file size cannot exceed 10 MB, and duration cannot exceed 5 minutes. |
| Customer service quality inspection | qwen3-asr-flash-filetrans, qwen3-asr-flash, or qwen3-asr-flash-us | Can analyze customer emotions. | No sensitive-word filtering or speaker diarization. Select a model by audio duration. |
| Caption generation for news or interviews | qwen3-asr-flash-filetrans | Long audio with punctuation and timestamps for structured captions. | Requires post-processing to produce standard subtitle files. Select a model by audio duration. |
| Multilingual video localization | qwen3-asr-flash-filetrans, qwen3-asr-flash, or qwen3-asr-flash-us | Covers multiple languages and dialects, suitable for cross-language caption production. | Select a model by audio duration. |
| Singing audio analysis | qwen3-asr-flash-filetrans, qwen3-asr-flash, or qwen3-asr-flash-us | Recognizes lyrics and analyzes emotions, suitable for song indexing and recommendations. | Select a model by audio duration. |
See Compare models.
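The duration and size limits in the table above can be captured in a small routing helper. The sketch below is hypothetical: `pick_model` is not part of any SDK, its thresholds are taken from this table, and it does not account for per-region model availability.

```python
# Hypothetical helper, not part of any SDK: choose a model family from the
# duration and size limits listed in the model selection table.
def pick_model(duration_s: float, size_bytes: int, us_region: bool = False) -> str:
    if duration_s <= 5 * 60 and size_bytes <= 10 * 1024 * 1024:
        # Short audio: synchronous or streaming recognition, low latency
        return "qwen3-asr-flash-us" if us_region else "qwen3-asr-flash"
    if duration_s <= 12 * 3600 and size_bytes <= 2 * 1024 ** 3:
        # Long audio: asynchronous file transcription
        return "qwen3-asr-flash-filetrans"
    raise ValueError("Audio exceeds the 12-hour / 2 GB filetrans limits")
```

Centralizing the decision this way keeps callers from hardcoding model names when the same pipeline handles both short clips and long recordings.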
Getting started
Get an API key and install the latest SDK.
DashScope
Qwen3-ASR-Flash-Filetrans
Asynchronous transcription of audio files up to 12 hours. Requires publicly accessible URL (no local file uploads). Non-streaming API returns complete result after task completion.
cURL
Submit a task to get task_id, then retrieve the result.
Submit a task
# IMPORTANT: Singapore region URL. For Beijing: https://dashscope.aliyuncs.com/api/v1/services/audio/asr/transcription
# API keys for Beijing and Singapore regions are different. Get keys: https://www.alibabacloud.com/help/en/model-studio/get-api-key
# Production: Use environment variables, not hardcoded keys
curl -X POST 'https://dashscope-intl.aliyuncs.com/api/v1/services/audio/asr/transcription' \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-H "X-DashScope-Async: enable" \
-d '{
"model": "qwen3-asr-flash-filetrans",
"input": {
"file_url": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
},
"parameters": {
"channel_id":[
0
],
"enable_itn": false,
"enable_words": true
}
}'
Get the task result
# IMPORTANT: Singapore region URL. For Beijing: https://dashscope.aliyuncs.com/api/v1/tasks/{task_id}
# API keys for Beijing and Singapore regions are different. Get keys: https://www.alibabacloud.com/help/en/model-studio/get-api-key
# Production: Use environment variables, not hardcoded keys
curl -X GET 'https://dashscope-intl.aliyuncs.com/api/v1/tasks/{task_id}' \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "X-DashScope-Async: enable" \
-H "Content-Type: application/json"
Complete example
Java
import com.google.gson.Gson;
import com.google.gson.annotations.SerializedName;
import okhttp3.*;
import java.io.IOException;
import java.util.concurrent.TimeUnit;
public class Main {
// Singapore region URL. For Beijing: https://dashscope.aliyuncs.com/api/v1/services/audio/asr/transcription
private static final String API_URL_SUBMIT = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/asr/transcription";
// Singapore region URL. For Beijing: https://dashscope.aliyuncs.com/api/v1/tasks/
private static final String API_URL_QUERY = "https://dashscope-intl.aliyuncs.com/api/v1/tasks/";
private static final Gson gson = new Gson();
public static void main(String[] args) {
// API keys for Beijing and Singapore regions are different. Get keys: https://www.alibabacloud.com/help/en/model-studio/get-api-key
// If not using environment variables, replace the line below with: String apiKey = "sk-xxx"
String apiKey = System.getenv("DASHSCOPE_API_KEY");
OkHttpClient client = new OkHttpClient();
// 1. Submit task
/*String payloadJson = """
{
"model": "qwen3-asr-flash-filetrans",
"input": {
"file_url": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
},
"parameters": {
"channel_id": [0],
"enable_itn": false,
"language": "zh"
}
}
""";*/
String payloadJson = """
{
"model": "qwen3-asr-flash-filetrans",
"input": {
"file_url": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
},
"parameters": {
"channel_id": [0],
"enable_itn": false,
"enable_words": true
}
}
""";
RequestBody body = RequestBody.create(payloadJson, MediaType.get("application/json; charset=utf-8"));
Request submitRequest = new Request.Builder()
.url(API_URL_SUBMIT)
.addHeader("Authorization", "Bearer " + apiKey)
.addHeader("Content-Type", "application/json")
.addHeader("X-DashScope-Async", "enable")
.post(body)
.build();
String taskId = null;
try (Response response = client.newCall(submitRequest).execute()) {
if (response.isSuccessful() && response.body() != null) {
String respBody = response.body().string();
ApiResponse apiResp = gson.fromJson(respBody, ApiResponse.class);
if (apiResp.output != null) {
taskId = apiResp.output.taskId;
System.out.println("Task submitted. task_id: " + taskId);
} else {
System.out.println("Submission response content: " + respBody);
return;
}
} else {
System.out.println("Task submission failed! HTTP code: " + response.code());
if (response.body() != null) {
System.out.println(response.body().string());
}
return;
}
} catch (IOException e) {
e.printStackTrace();
return;
}
// 2. Poll task status
boolean finished = false;
while (!finished) {
try {
TimeUnit.SECONDS.sleep(2); // Wait 2 seconds before querying again
} catch (InterruptedException e) {
Thread.currentThread().interrupt();
return;
}
String queryUrl = API_URL_QUERY + taskId;
Request queryRequest = new Request.Builder()
.url(queryUrl)
.addHeader("Authorization", "Bearer " + apiKey)
.addHeader("X-DashScope-Async", "enable")
.addHeader("Content-Type", "application/json")
.get()
.build();
try (Response response = client.newCall(queryRequest).execute()) {
if (response.body() != null) {
String queryResponse = response.body().string();
ApiResponse apiResp = gson.fromJson(queryResponse, ApiResponse.class);
if (apiResp.output != null && apiResp.output.taskStatus != null) {
String status = apiResp.output.taskStatus;
System.out.println("Current task status: " + status);
if ("SUCCEEDED".equalsIgnoreCase(status)
|| "FAILED".equalsIgnoreCase(status)
|| "UNKNOWN".equalsIgnoreCase(status)) {
finished = true;
System.out.println("Task completed. Final result: ");
System.out.println(queryResponse);
}
} else {
System.out.println("Query response content: " + queryResponse);
}
}
} catch (IOException e) {
e.printStackTrace();
return;
}
}
}
static class ApiResponse {
@SerializedName("request_id")
String requestId;
Output output;
}
static class Output {
@SerializedName("task_id")
String taskId;
@SerializedName("task_status")
String taskStatus;
}
}
Python
import os
import time
import requests
import json
# Singapore region URL. For Beijing: https://dashscope.aliyuncs.com/api/v1/services/audio/asr/transcription
API_URL_SUBMIT = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/asr/transcription"
# Singapore region URL. For Beijing: https://dashscope.aliyuncs.com/api/v1/tasks/
API_URL_QUERY_BASE = "https://dashscope-intl.aliyuncs.com/api/v1/tasks/"
def main():
# API keys for Beijing and Singapore regions are different. Get keys: https://www.alibabacloud.com/help/en/model-studio/get-api-key
# If not using environment variables, replace the line below with: api_key = "sk-xxx"
# Production: Use environment variables, not hardcoded keys
api_key = os.getenv("DASHSCOPE_API_KEY")
headers = {
"Authorization": f"Bearer {api_key}",
"Content-Type": "application/json",
"X-DashScope-Async": "enable"
}
# 1. Submit the task
payload = {
"model": "qwen3-asr-flash-filetrans",
"input": {
"file_url": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
},
"parameters": {
"channel_id": [0],
# "language": "zh",
"enable_itn": False,
"enable_words": True
}
}
print("Submitting ASR transcription task...")
try:
submit_resp = requests.post(API_URL_SUBMIT, headers=headers, data=json.dumps(payload))
except requests.RequestException as e:
print(f"Failed to submit task request: {e}")
return
if submit_resp.status_code != 200:
print(f"Task submission failed! HTTP code: {submit_resp.status_code}")
print(submit_resp.text)
return
resp_data = submit_resp.json()
output = resp_data.get("output")
if not output or "task_id" not in output:
print("Abnormal submission response content:", resp_data)
return
task_id = output["task_id"]
print(f"Task submitted. task_id: {task_id}")
# 2. Poll the task status
finished = False
while not finished:
time.sleep(2) # Wait 2 seconds before querying again
query_url = API_URL_QUERY_BASE + task_id
try:
query_resp = requests.get(query_url, headers=headers)
except requests.RequestException as e:
print(f"Failed to query task: {e}")
return
if query_resp.status_code != 200:
print(f"Task query failed! HTTP code: {query_resp.status_code}")
print(query_resp.text)
return
query_data = query_resp.json()
output = query_data.get("output")
if output and "task_status" in output:
status = output["task_status"]
print(f"Current task status: {status}")
if status.upper() in ("SUCCEEDED", "FAILED", "UNKNOWN"):
finished = True
print("Task completed. Final result:")
print(json.dumps(query_data, indent=2, ensure_ascii=False))
else:
print("Query response content:", query_data)
if __name__ == "__main__":
main()
Java SDK
import com.alibaba.dashscope.audio.qwen_asr.*;
import com.alibaba.dashscope.utils.Constants;
import com.google.gson.Gson;
import com.google.gson.GsonBuilder;
import com.google.gson.JsonObject;
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.ArrayList;
import java.util.HashMap;
public class Main {
public static void main(String[] args) {
// Singapore region URL. For Beijing: https://dashscope.aliyuncs.com/api/v1
Constants.baseHttpApiUrl = "https://dashscope-intl.aliyuncs.com/api/v1";
QwenTranscriptionParam param =
QwenTranscriptionParam.builder()
// API keys for Beijing and Singapore regions are different. Get keys: https://www.alibabacloud.com/help/en/model-studio/get-api-key
// If not using environment variables, replace the line below with: .apiKey("sk-xxx")
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.model("qwen3-asr-flash-filetrans")
.fileUrl("https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/sensevoice/rich_text_example_1.wav")
//.parameter("language", "zh")
//.parameter("channel_id", new ArrayList<String>(){{add("0");add("1");}})
.parameter("enable_itn", false)
.parameter("enable_words", true)
.build();
try {
QwenTranscription transcription = new QwenTranscription();
// Submit the task
QwenTranscriptionResult result = transcription.asyncCall(param);
System.out.println("create task result: " + result);
// Query the task status
result = transcription.fetch(QwenTranscriptionQueryParam.FromTranscriptionParam(param, result.getTaskId()));
System.out.println("task status: " + result);
// Wait for the task to complete
result =
transcription.wait(
QwenTranscriptionQueryParam.FromTranscriptionParam(param, result.getTaskId()));
System.out.println("task result: " + result);
// Get the speech recognition result
QwenTranscriptionTaskResult taskResult = result.getResult();
if (taskResult != null) {
// Get the URL of the recognition result
String transcriptionUrl = taskResult.getTranscriptionUrl();
// Get the result from the URL
HttpURLConnection connection =
(HttpURLConnection) new URL(transcriptionUrl).openConnection();
connection.setRequestMethod("GET");
connection.connect();
BufferedReader reader =
new BufferedReader(new InputStreamReader(connection.getInputStream()));
// Format and print the JSON result
Gson gson = new GsonBuilder().setPrettyPrinting().create();
System.out.println(gson.toJson(gson.fromJson(reader, JsonObject.class)));
}
} catch (Exception e) {
System.out.println("error: " + e);
}
}
}
Python SDK
import json
import os
import sys
from http import HTTPStatus
import dashscope
from dashscope.audio.qwen_asr import QwenTranscription
from dashscope.api_entities.dashscope_response import TranscriptionResponse
# run the transcription script
if __name__ == '__main__':
# API keys for Beijing and Singapore regions are different. Get keys: https://www.alibabacloud.com/help/en/model-studio/get-api-key
# If not using environment variables, replace the line below with: dashscope.api_key = "sk-xxx"
# Production: Use environment variables, not hardcoded keys
dashscope.api_key = os.getenv("DASHSCOPE_API_KEY")
# Singapore region URL. For Beijing: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
task_response = QwenTranscription.async_call(
model='qwen3-asr-flash-filetrans',
file_url='https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/sensevoice/rich_text_example_1.wav',
#language="",
enable_itn=False,
enable_words=True
)
print(f'task_response: {task_response}')
print(task_response.output.task_id)
query_response = QwenTranscription.fetch(task=task_response.output.task_id)
print(f'query_response: {query_response}')
task_result = QwenTranscription.wait(task=task_response.output.task_id)
print(f'task_result: {task_result}')
Qwen3-ASR-Flash
Qwen3-ASR-Flash supports recordings up to 5 minutes long. This model accepts a publicly accessible audio file URL or a direct upload of a local file as input. It can also return recognition results as a stream.
Input: Audio file URL
Python SDK
import os
import dashscope
# Singapore region URL. For Beijing: https://dashscope.aliyuncs.com/api/v1. If you use a model in the US region, replace the URL with: https://dashscope-us.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
messages = [
{"role": "user", "content": [{"audio": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"}]}
]
response = dashscope.MultiModalConversation.call(
# API keys for Beijing and Singapore regions are different. Get keys: https://www.alibabacloud.com/help/en/model-studio/get-api-key
# If not using environment variables, replace the line below with: api_key = "sk-xxx"
# Production: Use environment variables, not hardcoded keys
api_key=os.getenv("DASHSCOPE_API_KEY"),
# If you use a model in the US region, add the "-us" suffix to the model name, for example, qwen3-asr-flash-us
model="qwen3-asr-flash",
messages=messages,
result_format="message",
asr_options={
#"language": "zh", # Optional. Specify known audio language to improve accuracy.
"enable_itn":False
}
)
print(response)
Java SDK
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;
import com.alibaba.dashscope.utils.JsonUtils;
public class Main {
public static void simpleMultiModalConversationCall()
throws ApiException, NoApiKeyException, UploadFileException {
MultiModalConversation conv = new MultiModalConversation();
MultiModalMessage userMessage = MultiModalMessage.builder()
.role(Role.USER.getValue())
.content(Arrays.asList(
Collections.singletonMap("audio", "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3")))
.build();
MultiModalMessage sysMessage = MultiModalMessage.builder().role(Role.SYSTEM.getValue())
.build();
Map<String, Object> asrOptions = new HashMap<>();
asrOptions.put("enable_itn", false);
// asrOptions.put("language", "zh"); // Optional. Specify known audio language to improve accuracy.
MultiModalConversationParam param = MultiModalConversationParam.builder()
// API keys for Beijing and Singapore regions are different. Get keys: https://www.alibabacloud.com/help/en/model-studio/get-api-key
// If you have not configured environment variables, replace the following line with: .apiKey("sk-xxx")
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
// If you use a model in the US region, add the "-us" suffix to the model name, for example, qwen3-asr-flash-us
.model("qwen3-asr-flash")
.message(sysMessage)
.message(userMessage)
.parameter("asr_options", asrOptions)
.build();
MultiModalConversationResult result = conv.call(param);
System.out.println(JsonUtils.toJson(result));
}
public static void main(String[] args) {
try {
// Singapore region URL. For Beijing: https://dashscope.aliyuncs.com/api/v1. If you use a model in the US region, replace the URL with: https://dashscope-us.aliyuncs.com/api/v1
Constants.baseHttpApiUrl = "https://dashscope-intl.aliyuncs.com/api/v1";
simpleMultiModalConversationCall();
} catch (ApiException | NoApiKeyException | UploadFileException e) {
System.out.println(e.getMessage());
}
System.exit(0);
}
}
cURL
# ======= Important =======
# Singapore region URL. US: https://dashscope-us.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation | Beijing: https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation
# API keys for Beijing and Singapore regions are different. Get keys: https://www.alibabacloud.com/help/en/model-studio/get-api-key
# If you use a model in the US region, add the "us" suffix
# Production: Use environment variables, not hardcoded keys
curl -X POST "https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation" \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen3-asr-flash",
"input": {
"messages": [
{
"content": [
{
"text": ""
}
],
"role": "system"
},
{
"content": [
{
"audio": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
}
],
"role": "user"
}
]
},
"parameters": {
"asr_options": {
"enable_itn": false
}
}
}'
Input: Base64-encoded audio file
Input Base64-encoded data (a data URL) in the format: data:<mediatype>;base64,<data>.
- <mediatype>: The MIME type, which varies by audio format. For example:
  - WAV: audio/wav
  - MP3: audio/mpeg
- <data>: The Base64-encoded string of the audio. Base64 encoding increases file size, so keep the original file small enough that the encoded data stays within the 10 MB input limit.
- Example: data:audio/wav;base64,SUQzBAAAAAAAI1RTU0UAAAAPAAADTGF2ZjU4LjI5LjEwMAAAAAAAAAAAAAAA//PAxABQ/BXRbMPe4IQAhl9
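The size check described above can be done at encoding time. This is an illustrative sketch; the `to_data_url` helper and its strict length check are not part of the SDK.

```python
import base64
import pathlib

MAX_INPUT_BYTES = 10 * 1024 * 1024  # 10 MB input limit for Qwen3-ASR-Flash

def to_data_url(path: str, mime: str = "audio/mpeg") -> str:
    """Build a data URL and verify the encoded payload stays within the limit."""
    raw = pathlib.Path(path).read_bytes()
    data_url = f"data:{mime};base64,{base64.b64encode(raw).decode()}"
    # Base64 inflates size by roughly 4/3, so check the encoded length,
    # not the raw file size
    if len(data_url) > MAX_INPUT_BYTES:
        raise ValueError("Encoded audio exceeds the 10 MB input limit")
    return data_url
```

Checking the encoded length rather than the raw file size avoids submitting a file that fits on disk but overflows the limit once encoded.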
Python SDK
The example uses the audio file: welcome.mp3.
import base64
import dashscope
import os
import pathlib
# Singapore region URL. For Beijing: https://dashscope.aliyuncs.com/api/v1. If you use a model in the US region, replace the URL with: https://dashscope-us.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
# Replace with your actual audio file path
file_path = "welcome.mp3"
# Replace with your actual audio file MIME type
audio_mime_type = "audio/mpeg"
file_path_obj = pathlib.Path(file_path)
if not file_path_obj.exists():
raise FileNotFoundError(f"Audio file not found: {file_path}")
base64_str = base64.b64encode(file_path_obj.read_bytes()).decode()
data_uri = f"data:{audio_mime_type};base64,{base64_str}"
messages = [
{"role": "user", "content": [{"audio": data_uri}]}
]
response = dashscope.MultiModalConversation.call(
# API keys for Beijing and Singapore regions are different. Get keys: https://www.alibabacloud.com/help/en/model-studio/get-api-key
# If not using environment variables, replace the line below with: api_key = "sk-xxx",
api_key=os.getenv("DASHSCOPE_API_KEY"),
# If you use a model in the US region, add the "-us" suffix to the model name, for example, qwen3-asr-flash-us
model="qwen3-asr-flash",
messages=messages,
result_format="message",
asr_options={
# "language": "zh", # Optional. Specify known audio language to improve accuracy.
"enable_itn":False
}
)
print(response)
Java SDK
The example uses the audio file: welcome.mp3.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.*;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;
import com.alibaba.dashscope.utils.JsonUtils;
public class Main {
// Replace with actual file path
private static final String AUDIO_FILE = "welcome.mp3";
// Replace with your actual audio file MIME type
private static final String AUDIO_MIME_TYPE = "audio/mpeg";
public static void simpleMultiModalConversationCall()
throws ApiException, NoApiKeyException, UploadFileException, IOException {
MultiModalConversation conv = new MultiModalConversation();
MultiModalMessage userMessage = MultiModalMessage.builder()
.role(Role.USER.getValue())
.content(Arrays.asList(
Collections.singletonMap("audio", toDataUrl())))
.build();
MultiModalMessage sysMessage = MultiModalMessage.builder().role(Role.SYSTEM.getValue())
.build();
Map<String, Object> asrOptions = new HashMap<>();
asrOptions.put("enable_itn", false);
// asrOptions.put("language", "zh"); // Optional. Specify known audio language to improve accuracy.
MultiModalConversationParam param = MultiModalConversationParam.builder()
// API keys for Beijing and Singapore regions are different. Get keys: https://www.alibabacloud.com/help/en/model-studio/get-api-key
// If you have not configured environment variables, replace the following line with: .apiKey("sk-xxx")
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
// If you use a model in the US region, add the "-us" suffix to the model name, for example, qwen3-asr-flash-us
.model("qwen3-asr-flash")
.message(sysMessage)
.message(userMessage)
.parameter("asr_options", asrOptions)
.build();
MultiModalConversationResult result = conv.call(param);
System.out.println(JsonUtils.toJson(result));
}
public static void main(String[] args) {
try {
// Singapore region URL. For Beijing: https://dashscope.aliyuncs.com/api/v1. If you use a model in the US region, replace the URL with: https://dashscope-us.aliyuncs.com/api/v1
Constants.baseHttpApiUrl = "https://dashscope-intl.aliyuncs.com/api/v1";
simpleMultiModalConversationCall();
} catch (ApiException | NoApiKeyException | UploadFileException | IOException e) {
System.out.println(e.getMessage());
}
System.exit(0);
}
// Generate data URI
public static String toDataUrl() throws IOException {
byte[] bytes = Files.readAllBytes(Paths.get(AUDIO_FILE));
String encoded = Base64.getEncoder().encodeToString(bytes);
return "data:" + AUDIO_MIME_TYPE + ";base64," + encoded;
}
}
Input: Absolute path to local audio file
The DashScope SDK requires file URLs for local files. Construct the path according to your operating system and SDK:
| System | SDK | Input file path | Example |
| --- | --- | --- | --- |
| Linux or macOS | Python SDK | file://{absolute file path} | file:///home/images/test.png |
| Linux or macOS | Java SDK | file://{absolute file path} | file:///home/images/test.png |
| Windows | Python SDK | file://{absolute file path} | file://D:/images/test.png |
| Windows | Java SDK | file:///{absolute file path} | file:///D:/images/test.png |
When using local files, the API call limit is 100 QPS and cannot be scaled. Do not use this method in production environments, high-concurrency scenarios, or stress testing. For higher concurrency, upload files to OSS and call the API using the audio file URL.
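For the Python SDK format in the table above, the file URL can be built from a local path with a small helper. The `to_file_url` function is illustrative, not part of the SDK.

```python
import pathlib

def to_file_url(path: str) -> str:
    # Python SDK format from the table: "file://" + absolute path,
    # using forward slashes on all platforms
    # (e.g. file://D:/images/test.png on Windows)
    absolute = pathlib.Path(path).resolve()
    return "file://" + str(absolute).replace("\\", "/")
```

The resulting string can then be passed as the `audio` value in the message content, in place of a remote URL.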
Python SDK
The example uses the audio file: welcome.mp3.
import os
import dashscope
# Singapore region URL. For Beijing: https://dashscope.aliyuncs.com/api/v1. If you use a model in the US region, replace the URL with: https://dashscope-us.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
# Replace ABSOLUTE_PATH with your local file path
audio_file_path = "file://ABSOLUTE_PATH/welcome.mp3"
messages = [
{"role": "user", "content": [{"audio": audio_file_path}]}
]
response = dashscope.MultiModalConversation.call(
# API keys for Beijing and Singapore regions are different. Get keys: https://www.alibabacloud.com/help/en/model-studio/get-api-key
# If not using environment variables, replace the line below with: api_key = "sk-xxx"
# Production: Use environment variables, not hardcoded keys
api_key=os.getenv("DASHSCOPE_API_KEY"),
# If you use a model in the US region, add the "-us" suffix to the model name, for example, qwen3-asr-flash-us
model="qwen3-asr-flash",
messages=messages,
result_format="message",
asr_options={
# "language": "zh", # Optional. Specify known audio language to improve accuracy.
"enable_itn":False
}
)
print(response)
Java SDK
The example uses the audio file: welcome.mp3.
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;
import com.alibaba.dashscope.utils.JsonUtils;
public class Main {
public static void simpleMultiModalConversationCall()
throws ApiException, NoApiKeyException, UploadFileException {
// Replace ABSOLUTE_PATH with your local file path
String localFilePath = "file://ABSOLUTE_PATH/welcome.mp3";
MultiModalConversation conv = new MultiModalConversation();
MultiModalMessage userMessage = MultiModalMessage.builder()
.role(Role.USER.getValue())
.content(Arrays.asList(
Collections.singletonMap("audio", localFilePath)))
.build();
MultiModalMessage sysMessage = MultiModalMessage.builder().role(Role.SYSTEM.getValue())
.build();
Map<String, Object> asrOptions = new HashMap<>();
asrOptions.put("enable_itn", false);
// asrOptions.put("language", "zh"); // Optional. Specify known audio language to improve accuracy.
MultiModalConversationParam param = MultiModalConversationParam.builder()
// API keys for Beijing and Singapore regions are different. Get keys: https://www.alibabacloud.com/help/en/model-studio/get-api-key
// If you have not configured environment variables, replace the following line with: .apiKey("sk-xxx")
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
// If you use a model in the US region, add the "-us" suffix to the model name, for example, qwen3-asr-flash-us
.model("qwen3-asr-flash")
.message(sysMessage)
.message(userMessage)
.parameter("asr_options", asrOptions)
.build();
MultiModalConversationResult result = conv.call(param);
System.out.println(JsonUtils.toJson(result));
}
public static void main(String[] args) {
try {
// Singapore region URL. For Beijing: https://dashscope.aliyuncs.com/api/v1. If you use a model in the US region, replace the URL with: https://dashscope-us.aliyuncs.com/api/v1
Constants.baseHttpApiUrl = "https://dashscope-intl.aliyuncs.com/api/v1";
simpleMultiModalConversationCall();
} catch (ApiException | NoApiKeyException | UploadFileException e) {
System.out.println(e.getMessage());
}
System.exit(0);
}
}
Streaming output
The model generates results incrementally rather than all at once. Non-streaming output waits until the model finishes generating and then returns the complete result. Streaming output returns intermediate results in real time, letting you read results as they are generated and reducing wait time. Enable streaming output according to your calling method:
- DashScope Python SDK: Set the stream parameter to True.
- DashScope Java SDK: Use the streamCall interface.
- DashScope HTTP: Set the X-DashScope-SSE header to enable.
Python SDK
import os
import dashscope
# Singapore region URL. For Beijing: https://dashscope.aliyuncs.com/api/v1. If you use a model in the US region, replace the URL with: https://dashscope-us.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
messages = [
{"role": "user", "content": [{"audio": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"}]}
]
response = dashscope.MultiModalConversation.call(
# API keys for Beijing and Singapore regions are different. Get keys: https://www.alibabacloud.com/help/en/model-studio/get-api-key
# If not using environment variables, replace the line below with: api_key = "sk-xxx"
# Production: Use environment variables, not hardcoded keys
api_key=os.getenv("DASHSCOPE_API_KEY"),
# If you use a model in the US region, add the "-us" suffix to the model name, for example, qwen3-asr-flash-us
model="qwen3-asr-flash",
messages=messages,
result_format="message",
asr_options={
# "language": "zh", # Optional. Specify known audio language to improve accuracy.
"enable_itn":False
},
stream=True
)
for chunk in response:
    try:
        print(chunk["output"]["choices"][0]["message"].content[0]["text"])
    except (KeyError, IndexError, TypeError):
        # Some chunks carry no text content; skip them
        pass
Java SDK
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;
import io.reactivex.Flowable;
public class Main {
    public static void simpleMultiModalConversationCall()
            throws ApiException, NoApiKeyException, UploadFileException {
        MultiModalConversation conv = new MultiModalConversation();
        MultiModalMessage userMessage = MultiModalMessage.builder()
                .role(Role.USER.getValue())
                .content(Arrays.asList(
                        Collections.singletonMap("audio", "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3")))
                .build();
        MultiModalMessage sysMessage = MultiModalMessage.builder().role(Role.SYSTEM.getValue())
                .build();
        Map<String, Object> asrOptions = new HashMap<>();
        asrOptions.put("enable_itn", false);
        // asrOptions.put("language", "zh"); // Optional. Specify the audio language, if known, to improve accuracy.
        MultiModalConversationParam param = MultiModalConversationParam.builder()
                // API keys for the Beijing and Singapore regions are different. Get keys: https://www.alibabacloud.com/help/en/model-studio/get-api-key
                // If you have not configured environment variables, replace the following line with: .apiKey("sk-xxx")
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                // If you use a model in the US region, add the "-us" suffix to the model name, for example, qwen3-asr-flash-us
                .model("qwen3-asr-flash")
                .message(sysMessage)
                .message(userMessage)
                .parameter("asr_options", asrOptions)
                .build();
        Flowable<MultiModalConversationResult> resultFlowable = conv.streamCall(param);
        resultFlowable.blockingForEach(item -> {
            try {
                System.out.println(item.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
            } catch (Exception e) {
                System.exit(0);
            }
        });
    }

    public static void main(String[] args) {
        try {
            // Singapore region URL. For Beijing: https://dashscope.aliyuncs.com/api/v1. For the US region: https://dashscope-us.aliyuncs.com/api/v1
            Constants.baseHttpApiUrl = "https://dashscope-intl.aliyuncs.com/api/v1";
            simpleMultiModalConversationCall();
        } catch (ApiException | NoApiKeyException | UploadFileException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}
cURL
# ======= Important =======
# Singapore region URL. US: https://dashscope-us.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation | Beijing: https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation
# API keys for Beijing and Singapore regions are different. Get keys: https://www.alibabacloud.com/help/en/model-studio/get-api-key
# If you use a model in the US region, add the "-us" suffix to the model name, for example, qwen3-asr-flash-us
# Production: Use environment variables, not hardcoded keys
curl -X POST "https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation" \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-H "X-DashScope-SSE: enable" \
-d '{
"model": "qwen3-asr-flash",
"input": {
"messages": [
{
"content": [
{
"text": ""
}
],
"role": "system"
},
{
"content": [
{
"audio": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
}
],
"role": "user"
}
]
},
"parameters": {
"incremental_output": true,
"asr_options": {
"enable_itn": false
}
}
}'
OpenAI compatible
The US region does not support OpenAI-compatible mode.
OpenAI-compatible mode supports the Qwen3-ASR-Flash series only and accepts public URLs only (no local paths).
Use OpenAI Python SDK version 1.52.0 or later, or Node.js SDK version 4.68.0 or later.
The asr_options parameter is not part of the OpenAI standard. When using the OpenAI SDK, pass it through extra_body.
Input: Audio file URL
Python SDK
import os

from openai import OpenAI

try:
    client = OpenAI(
        # API keys for the Beijing and Singapore regions are different. Get keys: https://www.alibabacloud.com/help/en/model-studio/get-api-key
        # If you are not using environment variables, replace the line below with: api_key="sk-xxx",
        # Production: use environment variables, not hardcoded keys
        api_key=os.getenv("DASHSCOPE_API_KEY"),
        # Singapore/US region URL. For Beijing: https://dashscope.aliyuncs.com/compatible-mode/v1
        base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
    )
    stream_enabled = False  # Set to True to enable streaming output
    completion = client.chat.completions.create(
        model="qwen3-asr-flash",
        messages=[
            {
                "content": [
                    {
                        "type": "input_audio",
                        "input_audio": {
                            "data": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
                        }
                    }
                ],
                "role": "user"
            }
        ],
        stream=stream_enabled,
        # Do not set stream_options when stream is False
        # stream_options={"include_usage": True},
        extra_body={
            "asr_options": {
                # "language": "zh",
                "enable_itn": False
            }
        }
    )
    if stream_enabled:
        full_content = ""
        print("Streaming output:")
        for chunk in completion:
            # With stream_options.include_usage=True, the last chunk has empty choices (usage is in chunk.usage)
            print(chunk)
            if chunk.choices and chunk.choices[0].delta.content:
                full_content += chunk.choices[0].delta.content
        print(f"Full content: {full_content}")
    else:
        print(f"Non-streaming output: {completion.choices[0].message.content}")
except Exception as e:
    print(f"Error: {e}")
Node.js SDK
// Prerequisites (Windows/Mac/Linux):
// 1. Node.js (version ≥ 14)
// 2. npm install openai
import OpenAI from "openai";

const client = new OpenAI({
    // API keys for the Beijing and Singapore regions are different. Get keys: https://www.alibabacloud.com/help/en/model-studio/get-api-key
    // If you are not using environment variables, replace the line below with: apiKey: "sk-xxx",
    apiKey: process.env.DASHSCOPE_API_KEY,
    // Singapore/US region URL. For Beijing: https://dashscope.aliyuncs.com/compatible-mode/v1
    baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
});

async function main() {
    try {
        const streamEnabled = false; // Set to true to enable streaming output
        const completion = await client.chat.completions.create({
            model: "qwen3-asr-flash",
            messages: [
                {
                    role: "user",
                    content: [
                        {
                            type: "input_audio",
                            input_audio: {
                                data: "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
                            }
                        }
                    ]
                }
            ],
            stream: streamEnabled,
            // Do not set stream_options when stream is false
            // stream_options: {
            //     "include_usage": true
            // },
            extra_body: {
                asr_options: {
                    // language: "zh",
                    enable_itn: false
                }
            }
        });
        if (streamEnabled) {
            let fullContent = "";
            console.log("Streaming output:");
            for await (const chunk of completion) {
                console.log(JSON.stringify(chunk));
                if (chunk.choices && chunk.choices.length > 0) {
                    const delta = chunk.choices[0].delta;
                    if (delta && delta.content) {
                        fullContent += delta.content;
                    }
                }
            }
            console.log(`Full content: ${fullContent}`);
        } else {
            console.log(`Non-streaming output: ${completion.choices[0].message.content}`);
        }
    } catch (err) {
        console.error(`Error: ${err}`);
    }
}

main();
cURL
# ======= Important =======
# Singapore/US region URL. For Beijing: https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
# API keys for Beijing and Singapore regions are different. Get keys: https://www.alibabacloud.com/help/en/model-studio/get-api-key
# Production: Use environment variables, not hardcoded keys
curl -X POST 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions' \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen3-asr-flash",
"messages": [
{
"content": [
{
"type": "input_audio",
"input_audio": {
"data": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
}
}
],
"role": "user"
}
],
"stream":false,
"asr_options": {
"enable_itn": false
}
}'
Input: Base64-encoded audio file
Provide Base64-encoded data as a data URL in the format: data:<mediatype>;base64,<data>.
-
<mediatype>: the MIME type, which varies by audio format. For example:
-
WAV: audio/wav
-
MP3: audio/mpeg
-
<data>: the Base64-encoded string of the audio. Base64 encoding increases file size by about 33%, so keep the original file small enough that the encoded data stays within the 10 MB input limit.
-
Example: data:audio/wav;base64,SUQzBAAAAAAAI1RTU0UAAAAPAAADTGF2ZjU4LjI5LjEwMAAAAAAAAAAAAAAA//PAxABQ/BXRbMPe4IQAhl9
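A data URL in this format can be assembled with a small helper. The sketch below is illustrative, not part of any SDK: the function name audio_to_data_uri is an assumption, and the guard uses the documented 10 MB input limit.

```python
import base64
import mimetypes
import pathlib


def audio_to_data_uri(path: str, limit_bytes: int = 10 * 1024 * 1024) -> str:
    """Build a data URL for an audio file and guard the 10 MB input limit."""
    p = pathlib.Path(path)
    # Guess the MIME type from the file extension, e.g. .mp3 -> audio/mpeg
    mime, _ = mimetypes.guess_type(p.name)
    if mime is None or not mime.startswith("audio/"):
        raise ValueError(f"Cannot determine an audio MIME type for {path}")
    encoded = base64.b64encode(p.read_bytes()).decode("ascii")
    # Base64 inflates size by ~33% (4 output chars per 3 input bytes),
    # so the check must run on the encoded string, not the raw file.
    if len(encoded) > limit_bytes:
        raise ValueError(f"Encoded audio is {len(encoded)} bytes, over the {limit_bytes}-byte limit")
    return f"data:{mime};base64,{encoded}"
```

The returned string can be passed directly as the input_audio data field in the examples below.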
Python SDK
The example uses the audio file: welcome.mp3.
import base64
import os
import pathlib

from openai import OpenAI

try:
    # Replace with your actual audio file path
    file_path = "welcome.mp3"
    # Replace with your actual audio file MIME type
    audio_mime_type = "audio/mpeg"
    file_path_obj = pathlib.Path(file_path)
    if not file_path_obj.exists():
        raise FileNotFoundError(f"Audio file not found: {file_path}")
    base64_str = base64.b64encode(file_path_obj.read_bytes()).decode()
    data_uri = f"data:{audio_mime_type};base64,{base64_str}"
    client = OpenAI(
        # API keys for the Beijing and Singapore regions are different. Get keys: https://www.alibabacloud.com/help/en/model-studio/get-api-key
        # If you are not using environment variables, replace the line below with: api_key="sk-xxx",
        api_key=os.getenv("DASHSCOPE_API_KEY"),
        # Singapore/US region URL. For Beijing: https://dashscope.aliyuncs.com/compatible-mode/v1
        base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
    )
    stream_enabled = False  # Set to True to enable streaming output
    completion = client.chat.completions.create(
        model="qwen3-asr-flash",
        messages=[
            {
                "content": [
                    {
                        "type": "input_audio",
                        "input_audio": {
                            "data": data_uri
                        }
                    }
                ],
                "role": "user"
            }
        ],
        stream=stream_enabled,
        # Do not set stream_options when stream is False
        # stream_options={"include_usage": True},
        extra_body={
            "asr_options": {
                # "language": "zh",
                "enable_itn": False
            }
        }
    )
    if stream_enabled:
        full_content = ""
        print("Streaming output:")
        for chunk in completion:
            # With stream_options.include_usage=True, the last chunk has empty choices (usage is in chunk.usage)
            print(chunk)
            if chunk.choices and chunk.choices[0].delta.content:
                full_content += chunk.choices[0].delta.content
        print(f"Full content: {full_content}")
    else:
        print(f"Non-streaming output: {completion.choices[0].message.content}")
except Exception as e:
    print(f"Error: {e}")
Node.js SDK
The example uses the audio file: welcome.mp3.
// Prerequisites (Windows/Mac/Linux):
// 1. Node.js (version ≥ 14)
// 2. npm install openai
import OpenAI from "openai";
import { readFileSync } from 'fs';

const client = new OpenAI({
    // API keys for the Beijing and Singapore regions are different. Get keys: https://www.alibabacloud.com/help/en/model-studio/get-api-key
    // If you are not using environment variables, replace the line below with: apiKey: "sk-xxx",
    apiKey: process.env.DASHSCOPE_API_KEY,
    // Singapore/US region URL. For Beijing: https://dashscope.aliyuncs.com/compatible-mode/v1
    baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
});

const encodeAudioFile = (audioFilePath) => {
    const audioFile = readFileSync(audioFilePath);
    return audioFile.toString('base64');
};

// Replace with your actual file path
const dataUri = `data:audio/mpeg;base64,${encodeAudioFile("welcome.mp3")}`;

async function main() {
    try {
        const streamEnabled = false; // Set to true to enable streaming output
        const completion = await client.chat.completions.create({
            model: "qwen3-asr-flash",
            messages: [
                {
                    role: "user",
                    content: [
                        {
                            type: "input_audio",
                            input_audio: {
                                data: dataUri
                            }
                        }
                    ]
                }
            ],
            stream: streamEnabled,
            // Do not set stream_options when stream is false
            // stream_options: {
            //     "include_usage": true
            // },
            extra_body: {
                asr_options: {
                    // language: "zh",
                    enable_itn: false
                }
            }
        });
        if (streamEnabled) {
            let fullContent = "";
            console.log("Streaming output:");
            for await (const chunk of completion) {
                console.log(JSON.stringify(chunk));
                if (chunk.choices && chunk.choices.length > 0) {
                    const delta = chunk.choices[0].delta;
                    if (delta && delta.content) {
                        fullContent += delta.content;
                    }
                }
            }
            console.log(`Full content: ${fullContent}`);
        } else {
            console.log(`Non-streaming output: ${completion.choices[0].message.content}`);
        }
    } catch (err) {
        console.error(`Error: ${err}`);
    }
}

main();
API reference
Compare models
The feature set for qwen3-asr-flash and qwen3-asr-flash-2025-09-08 also applies to their US (Virginia) region counterparts: qwen3-asr-flash-us and qwen3-asr-flash-2025-09-08-us.
| Feature | qwen3-asr-flash-filetrans, qwen3-asr-flash-filetrans-2025-11-17 | qwen3-asr-flash, qwen3-asr-flash-2026-02-10, qwen3-asr-flash-2025-09-08 |
| --- | --- | --- |
| Supported languages | Chinese (Mandarin, Sichuanese, Minnan, Wu, Cantonese), English, Japanese, German, Korean, Russian, French, Portuguese, Arabic, Italian, Spanish, Hindi, Indonesian, Thai, Turkish, Ukrainian, Vietnamese, Czech, Danish, Filipino, Finnish, Icelandic, Malay, Norwegian, Polish, Swedish | |
| Audio formats | aac, amr, avi, flac, flv, m4a, mkv, mov, mp3, mp4, mpeg, ogg, opus, wav, webm, wma, wmv | aac, amr, avi, aiff, flac, flv, mkv, mp3, mpeg, ogg, opus, wav, webm, wma, wmv |
| Sample rate | Depends on audio format | |
| Sound channels | Any. Multi-channel handling: | |
| Input method | Public URL | Base64-encoded file, absolute local path, or public URL |
| Audio size/duration | File size cannot exceed 2 GB; duration cannot exceed 12 hours | File size cannot exceed 10 MB; duration cannot exceed 5 minutes |
| Emotion recognition | Always enabled. View results via the response parameter | |
| Timestamps | Always enabled. Control timestamp granularity via the request parameter. Character-level timestamps are only guaranteed for Chinese, English, Japanese, Korean, German, French, Spanish, Italian, Portuguese, and Russian; accuracy may vary for other languages | |
| Punctuation prediction | Always enabled | |
| ITN | Disabled by default; can be enabled. Chinese and English only | |
| Singing recognition | Always enabled | |
| Noise rejection | Always enabled | |
| Sensitive words filter | | |
| Speaker diarization | | |
| Filler word filtering | | |
| VAD | Always enabled | |
| Rate limit (RPM) | 100 | |
| Connection type | DashScope: Java/Python SDK, RESTful API | DashScope: Java/Python SDK, RESTful API. OpenAI: Python/Node.js SDK, RESTful API |
| Pricing | International: $0.000035/second; US: $0.000032/second; Chinese mainland: $0.000032/second | |
FAQ
Q: How do I provide a publicly accessible audio URL for the API?
Use Object Storage Service (OSS) for highly available storage and easy public URL generation.
Verify URL accessibility: Test with browser or curl (expect HTTP 200).
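The same check can be scripted. Below is a sketch in Python; url_is_accessible is an illustrative helper, not part of any SDK, and it uses a HEAD request so the audio itself is not downloaded:

```python
import urllib.error
import urllib.request


def url_is_accessible(url: str, timeout: float = 10.0) -> bool:
    """Return True if a HEAD request to the URL answers with HTTP 200."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, ValueError):
        # Covers DNS failures, timeouts, HTTP errors, and malformed URLs
        return False
```

Note that some servers reject HEAD requests even when GET works; if the check fails unexpectedly, retry with curl or a browser.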
Q: How do I check if my audio format meets requirements?
Use ffprobe to get audio information:
# Check container format (format_name), codec (codec_name), sample rate (sample_rate), and number of channels (channels)
ffprobe -v error -show_entries format=format_name -show_entries stream=codec_name,sample_rate,channels -of default=noprint_wrappers=1 your_audio_file.mp3
Q: How do I process audio to meet model requirements?
Use FFmpeg to trim or convert audio:
-
Audio trimming: Extract a segment from a long file
# -i: Input file
# -ss 00:01:30: Start time (1 minute 30 seconds)
# -t 00:02:00: Duration (2 minutes)
# -c copy: Copy the audio without re-encoding (fast)
# output_clip.wav: Output file
ffmpeg -i long_audio.wav -ss 00:01:30 -t 00:02:00 -c copy output_clip.wav
-
Format conversion
Convert any audio to 16 kHz, 16-bit, mono WAV:
# -i: Input file
# -ac 1: Set to 1 channel (mono)
# -ar 16000: Set sample rate to 16000 Hz (16 kHz)
# -sample_fmt s16: Set sample format to 16-bit signed integer PCM
# output.wav: Output file
ffmpeg -i input.mp3 -ac 1 -ar 16000 -sample_fmt s16 output.wav
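If you batch-convert files from a script, the same FFmpeg invocation can be assembled in Python. This is a sketch: build_convert_cmd and convert are illustrative helper names, and running convert assumes ffmpeg is on PATH.

```python
import subprocess


def build_convert_cmd(src: str, dst: str, rate: int = 16000, channels: int = 1) -> list[str]:
    """Assemble the FFmpeg command for a 16-bit mono WAV conversion."""
    return [
        "ffmpeg", "-i", src,
        "-ac", str(channels),   # channel count (1 = mono)
        "-ar", str(rate),       # sample rate in Hz
        "-sample_fmt", "s16",   # 16-bit signed integer PCM
        dst,
    ]


def convert(src: str, dst: str) -> None:
    # Raises CalledProcessError if FFmpeg fails; requires ffmpeg on PATH.
    subprocess.run(build_convert_cmd(src, dst), check=True)
```

Passing the argument list directly to subprocess.run (no shell=True) avoids quoting issues with paths that contain spaces.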
# -i: Input file # -ac 1: Set to 1 channel (mono) # -ar 16000: Set sample rate to 16000 Hz (16 kHz) # -sample_fmt s16: Set sample format to 16-bit signed integer PCM # output.wav: Output file ffmpeg -i input.mp3 -ac 1 -ar 16000 -sample_fmt s16 output.wav