Qwen-ASR transcribes audio files to text with support for 26 languages, emotion detection, and word-level timestamps. This page covers request and response parameters for each connection type.
Getting started: For model details and selection guidance, see Audio file recognition - Qwen.
Choose a connection type
Each model supports specific connection types for different use cases.
| Model | Supported connection types | Best for |
|---|---|---|
| Qwen3-ASR-Flash | OpenAI compatible, DashScope synchronous | Real-time recognition of short audio (up to 10 MB) |
| Qwen3-ASR-Flash-Filetrans | DashScope asynchronous only | Long audio files or batch processing |
US region: Use DashScope synchronous or asynchronous (OpenAI compatible mode not supported).
Recommended: Audio files under 10 MB → OpenAI compatible (Qwen3-ASR-Flash). Long audio files → DashScope asynchronous (Qwen3-ASR-Flash-Filetrans).
OpenAI compatible
Endpoints
All examples use the OpenAI compatible endpoint (Qwen3-ASR-Flash).
| Deployment mode | HTTP endpoint | SDK base_url |
|---|---|---|
| International (Singapore) | POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions | https://dashscope-intl.aliyuncs.com/compatible-mode/v1 |
| Chinese Mainland (Beijing) | POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions | https://dashscope.aliyuncs.com/compatible-mode/v1 |
- International: Endpoint and data storage in Singapore. Global inference (excluding Chinese mainland).
- Chinese Mainland: Endpoint and data storage in Beijing. Inference restricted to Chinese mainland.
- API keys differ by region. See Get API key.
Request body
| Parameter | Type | Required | Description |
|---|---|---|---|
| `model` | string | Yes | The model name. Set to `qwen3-asr-flash`. |
| `messages` | array | Yes | The message list. See Message structure (OpenAI compatible). |
| `asr_options` | object | No | Recognition options (see asr_options (OpenAI compatible)). Not a standard OpenAI parameter; pass it via `extra_body` in OpenAI SDKs. |
| `stream` | boolean | No | Default: false. Set to true for faster first-token response and reduced timeout risk (see Streaming output). |
| `stream_options` | object | No | Streaming configuration. Only effective when `stream` is true. Do not set it when `stream` is false. |
| `stream_options.include_usage` | boolean | No | Default: false. When true, token usage appears in the last chunk of the streaming response (see the sketch after this table). |
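As a minimal sketch of how `stream` and `stream_options` combine (reusing the sample audio URL and international endpoint from the examples below; when `include_usage` is true, the final chunk carries `usage` and an empty `choices` list):

```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

completion = client.chat.completions.create(
    model="qwen3-asr-flash",
    messages=[{"role": "user", "content": [{
        "type": "input_audio",
        "input_audio": {"data": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"},
    }]}],
    stream=True,
    stream_options={"include_usage": True},  # usage arrives in the final chunk
)

text = ""
for chunk in completion:
    if chunk.choices and chunk.choices[0].delta.content:
        text += chunk.choices[0].delta.content
    elif chunk.usage:  # final chunk: empty choices, populated usage
        print("Total tokens:", chunk.usage.total_tokens)
print("Result:", text)
```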
Message structure (OpenAI compatible)
System message (optional)
Provide background text for context biasing (entity vocabularies, domain terms, or reference information) to improve accuracy. Place it at the beginning of the messages array. A combined example follows the user message table below.
| Property | Type | Required | Description |
|---|---|---|---|
| `role` | string | Yes | Set to `system`. |
| `content` | array | Yes | One message element allowed. |
| `content[].text` | string | No | Context text. 10,000-token limit. See Context biasing. |
User message (required)
| Property | Type | Required | Description |
|---|---|---|---|
| `role` | string | Yes | Set to `user`. |
| `content` | array | Yes | One message element allowed. |
| `content[].type` | string | Yes | Set to `input_audio`. |
| `content[].input_audio.data` | string | Yes | Audio input. Accepts a publicly accessible URL or a Base64-encoded Data URL. See Audio input formats. |
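Putting the two message types together, a complete `messages` array with context biasing might look like the following sketch (the glossary text and audio URL are placeholders):

```python
messages = [
    {   # optional system message: context text for biasing
        "role": "system",
        "content": [{"text": "Product glossary: DashScope, Qwen3-ASR-Flash, ITN."}],
    },
    {   # required user message: the audio to transcribe
        "role": "user",
        "content": [{
            "type": "input_audio",
            "input_audio": {"data": "https://example.com/audio/meeting.mp3"},
        }],
    },
]
```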
asr_options (OpenAI compatible)
| Parameter | Type | Required | Description |
|---|---|---|---|
| `language` | string | No | No default. Specify the audio language to improve accuracy (one language per request). Omit for multilingual audio (e.g., mixed Chinese, English, Japanese, Korean). See Supported languages. |
| `enable_itn` | boolean | No | Default: false. Enable Inverse Text Normalization (ITN) to convert spoken forms to written forms (e.g., "one hundred" → "100"). Chinese and English only. |
Audio input formats
Qwen3-ASR-Flash in OpenAI compatible mode accepts two input formats:
- URL: A publicly accessible audio file URL.
- Base64-encoded Data URL: Format: `data:<mediatype>;base64,<data>`. Common MIME types: `audio/wav` (WAV), `audio/mpeg` (MP3). Base64 encoding increases file size, so keep the original file small enough to stay within the 10 MB limit.
OSS URL restrictions:
- SDK calls: Cannot use temporary `oss://` prefix URLs. Use standard HTTP URLs.
- RESTful API calls: Can use `oss://` prefix URLs, but note:
  - Temporary upload URLs expire after 48 hours; do not use them in production.
  - The upload credential API is limited to 100 QPS (no scale-out). Do not use it in production, high-concurrency scenarios, or stress testing.
- For production, store audio in Object Storage Service (OSS) for reliable, long-term availability, and pass a signed HTTP URL (see the sketch below).
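For example, if the audio lives in OSS, you can mint a time-limited signed HTTP URL and pass that as the audio input. A minimal sketch using the `oss2` Python SDK (the bucket name, endpoint, and object key are placeholders, and credentials are assumed to be in environment variables):

```python
import os
import oss2  # Alibaba Cloud OSS Python SDK

auth = oss2.Auth(os.getenv("OSS_ACCESS_KEY_ID"), os.getenv("OSS_ACCESS_KEY_SECRET"))
bucket = oss2.Bucket(auth, "https://oss-cn-beijing.aliyuncs.com", "my-audio-bucket")

# Signed GET URL, valid for 1 hour; pass it as the audio input in your request
audio_url = bucket.sign_url("GET", "recordings/meeting.mp3", 3600)
print(audio_url)
```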
Sample code
All examples send an audio file URL to Qwen3-ASR-Flash using the OpenAI compatible endpoint.
Replace the endpoint and API key for your region before running. Set the DASHSCOPE_API_KEY environment variable, or replace `os.getenv("DASHSCOPE_API_KEY")` with your key.
Input: audio file URL
Python
```python
from openai import OpenAI
import os
try:
client = OpenAI(
api_key=os.getenv("DASHSCOPE_API_KEY"),
# International (Singapore). Beijing: https://dashscope.aliyuncs.com/compatible-mode/v1
base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
stream_enabled = False
completion = client.chat.completions.create(
model="qwen3-asr-flash",
messages=[
{
"content": [
{
"type": "input_audio",
"input_audio": {
"data": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
}
}
],
"role": "user"
}
],
stream=stream_enabled,
extra_body={
"asr_options": {
# "language": "zh",
"enable_itn": False
}
}
)
if stream_enabled:
full_content = ""
for chunk in completion:
# When stream_options.include_usage is True, the last chunk has an empty choices list
if chunk.choices and chunk.choices[0].delta.content:
full_content += chunk.choices[0].delta.content
print(f"Result: {full_content}")
else:
print(f"Result: {completion.choices[0].message.content}")
except Exception as e:
    print(f"Error: {e}")
```
Node.js
```javascript
import OpenAI from "openai";
const client = new OpenAI({
apiKey: process.env.DASHSCOPE_API_KEY,
// International (Singapore). Beijing: https://dashscope.aliyuncs.com/compatible-mode/v1
baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
});
async function main() {
try {
const streamEnabled = false;
const completion = await client.chat.completions.create({
model: "qwen3-asr-flash",
messages: [
{
role: "user",
content: [
{
type: "input_audio",
input_audio: {
data: "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
}
}
]
}
],
stream: streamEnabled,
extra_body: {
asr_options: {
// language: "zh",
enable_itn: false
}
}
});
if (streamEnabled) {
let fullContent = "";
for await (const chunk of completion) {
if (chunk.choices && chunk.choices.length > 0) {
const delta = chunk.choices[0].delta;
if (delta && delta.content) {
fullContent += delta.content;
}
}
}
console.log(`Result: ${fullContent}`);
} else {
console.log(`Result: ${completion.choices[0].message.content}`);
}
} catch (err) {
console.error(`Error: ${err}`);
}
}
main();
```
cURL
To bias recognition toward specific terms, prepend a system message whose `text` field carries the context (not included in this example).
```bash
curl -X POST 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions' \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen3-asr-flash",
"messages": [
{
"content": [
{
"type": "input_audio",
"input_audio": {
"data": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
}
}
],
"role": "user"
}
],
"stream": false,
"asr_options": {
"enable_itn": false
}
}'
```
Input: Base64-encoded audio file
Encode the local audio file as a Base64 Data URL before sending: `data:<mediatype>;base64,<data>`.
Encoding examples:
Python:
```python
import base64, pathlib
file_path = pathlib.Path("input.mp3")
base64_str = base64.b64encode(file_path.read_bytes()).decode()
data_uri = f"data:audio/mpeg;base64,{base64_str}"
```
Java:
```java
import java.nio.file.*;
import java.util.Base64;
public class Main {
public static String toDataUrl(String filePath) throws Exception {
byte[] bytes = Files.readAllBytes(Paths.get(filePath));
String encoded = Base64.getEncoder().encodeToString(bytes);
return "data:audio/mpeg;base64," + encoded;
}
public static void main(String[] args) throws Exception {
System.out.println(toDataUrl("input.mp3"));
}
}
```
After encoding, pass the `data_uri` as the `input_audio.data` value. The following examples use the sample file welcome.mp3.
Python
```python
import base64
from openai import OpenAI
import os
import pathlib
try:
# Replace with the path and MIME type of your audio file
file_path = "welcome.mp3"
audio_mime_type = "audio/mpeg"
file_path_obj = pathlib.Path(file_path)
if not file_path_obj.exists():
raise FileNotFoundError(f"Audio file not found: {file_path}")
base64_str = base64.b64encode(file_path_obj.read_bytes()).decode()
data_uri = f"data:{audio_mime_type};base64,{base64_str}"
client = OpenAI(
api_key=os.getenv("DASHSCOPE_API_KEY"),
# International (Singapore). Beijing: https://dashscope.aliyuncs.com/compatible-mode/v1
base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)
stream_enabled = False
completion = client.chat.completions.create(
model="qwen3-asr-flash",
messages=[
{
"content": [
{
"type": "input_audio",
"input_audio": {
"data": data_uri
}
}
],
"role": "user"
}
],
stream=stream_enabled,
extra_body={
"asr_options": {
# "language": "zh",
"enable_itn": False
}
}
)
if stream_enabled:
full_content = ""
for chunk in completion:
if chunk.choices and chunk.choices[0].delta.content:
full_content += chunk.choices[0].delta.content
print(f"Result: {full_content}")
else:
print(f"Result: {completion.choices[0].message.content}")
except Exception as e:
    print(f"Error: {e}")
```
Node.js
```javascript
import OpenAI from "openai";
import { readFileSync } from 'fs';
const client = new OpenAI({
apiKey: process.env.DASHSCOPE_API_KEY,
// International (Singapore). Beijing: https://dashscope.aliyuncs.com/compatible-mode/v1
baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
});
const encodeAudioFile = (audioFilePath) => {
const audioFile = readFileSync(audioFilePath);
return audioFile.toString('base64');
};
// Replace with the path to your audio file
const dataUri = `data:audio/mpeg;base64,${encodeAudioFile("welcome.mp3")}`;
async function main() {
try {
const streamEnabled = false;
const completion = await client.chat.completions.create({
model: "qwen3-asr-flash",
messages: [
{
role: "user",
content: [
{
type: "input_audio",
input_audio: {
data: dataUri
}
}
]
}
],
stream: streamEnabled,
extra_body: {
asr_options: {
// language: "zh",
enable_itn: false
}
}
});
if (streamEnabled) {
let fullContent = "";
for await (const chunk of completion) {
if (chunk.choices && chunk.choices.length > 0) {
const delta = chunk.choices[0].delta;
if (delta && delta.content) {
fullContent += delta.content;
}
}
}
console.log(`Result: ${fullContent}`);
} else {
console.log(`Result: ${completion.choices[0].message.content}`);
}
} catch (err) {
console.error(`Error: ${err}`);
}
}
main();
```
Response body
| Field | Type | Description |
|---|---|---|
| `id` | string | Unique identifier for this request. |
| `model` | string | Model used for this request. |
| `object` | string | Always `chat.completion`. |
| `created` | integer | UNIX timestamp (seconds) when the request was created. |
| `choices` | array | Model output. See choices fields (OpenAI compatible). |
| `usage` | object | Token consumption. See usage fields (OpenAI compatible). |
choices fields (OpenAI compatible)
| Field | Type | Description |
|---|---|---|
| `index` | integer | Position in the choices array. |
| `finish_reason` | string | `null`: generation in progress. `stop`: finished naturally. `length`: output exceeded the maximum length. |
| `message.role` | string | Always `assistant`. |
| `message.content` | string | The transcribed text. |
| `message.annotations` | array | Metadata about the recognized audio. |
| `message.annotations[].type` | string | Always `audio_info`. |
| `message.annotations[].language` | string | Detected language code. If `language` was specified in the request, this matches that value. See Supported languages. |
| `message.annotations[].emotion` | string | Detected emotion: `surprised`, `neutral`, `happy`, `sad`, `disgusted`, `angry`, or `fearful`. |
usage fields (OpenAI compatible)
| Field | Type | Description |
|---|---|---|
| `prompt_tokens` | integer | Total input tokens. |
| `prompt_tokens_details.audio_tokens` | integer | Audio input tokens. Each second of audio counts as 25 tokens; audio shorter than 1 second counts as 1 second. |
| `prompt_tokens_details.text_tokens` | integer | Ignore this field. |
| `completion_tokens` | integer | Output tokens. |
| `completion_tokens_details.text_tokens` | integer | Output text tokens. |
| `seconds` | integer | Audio duration in seconds. |
| `total_tokens` | integer | `prompt_tokens` + `completion_tokens`. |
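A short sketch of reading these fields from a non-streaming response, where `completion` is the return value of `client.chat.completions.create` from the samples above. `annotations` is a DashScope extension to the OpenAI schema, so it is accessed defensively here:

```python
msg = completion.choices[0].message
print("Transcript:", msg.content)

# annotations may surface as dicts or typed objects depending on SDK version
for ann in getattr(msg, "annotations", None) or []:
    if isinstance(ann, dict):
        print("Language:", ann.get("language"), "| emotion:", ann.get("emotion"))
    else:
        print("Language:", getattr(ann, "language", None),
              "| emotion:", getattr(ann, "emotion", None))

print("Audio seconds:", getattr(completion.usage, "seconds", None))
print("Total tokens:", completion.usage.total_tokens)
```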
Response examples
Non-streaming output
```json
{
"choices": [
{
"finish_reason": "stop",
"index": 0,
"message": {
"annotations": [
{
"emotion": "neutral",
"language": "zh",
"type": "audio_info"
}
],
"content": "Welcome to Alibaba Cloud.",
"role": "assistant"
}
}
],
"created": 1767683986,
"id": "chatcmpl-487abe5f-d4f2-9363-a877-xxxxxxx",
"model": "qwen3-asr-flash",
"object": "chat.completion",
"usage": {
"completion_tokens": 12,
"completion_tokens_details": {
"text_tokens": 12
},
"prompt_tokens": 42,
"prompt_tokens_details": {
"audio_tokens": 42,
"text_tokens": 0
},
"seconds": 1,
"total_tokens": 54
}
}
```
Streaming output
data: {"model":"qwen3-asr-flash","id":"chatcmpl-3fb97803-d27f-9289-8889-xxxxx","created":1767685989,"object":"chat.completion.chunk","usage":null,"choices":[{"logprobs":null,"index":0,"delta":{"content":"","role":"assistant"}}]}
data: {"model":"qwen3-asr-flash","id":"chatcmpl-3fb97803-d27f-9289-8889-xxxxx","choices":[{"delta":{"annotations":[{"type":"audio_info","language":"zh","emotion":"neutral"}],"content":"Welcome","role":null},"index":0}],"created":1767685989,"object":"chat.completion.chunk","usage":null}
data: {"model":"qwen3-asr-flash","id":"chatcmpl-3fb97803-d27f-9289-8889-xxxxx","choices":[{"delta":{"annotations":[{"type":"audio_info","language":"zh","emotion":"neutral"}],"content":" to","role":null},"index":0}],"created":1767685989,"object":"chat.completion.chunk","usage":null}
data: {"model":"qwen3-asr-flash","id":"chatcmpl-3fb97803-d27f-9289-8889-xxxxx","choices":[{"delta":{"annotations":[{"type":"audio_info","language":"zh","emotion":"neutral"}],"content":" Alibaba","role":null},"index":0}],"created":1767685989,"object":"chat.completion.chunk","usage":null}
data: {"model":"qwen3-asr-flash","id":"chatcmpl-3fb97803-d27f-9289-8889-xxxxx","choices":[{"delta":{"annotations":[{"type":"audio_info","language":"zh","emotion":"neutral"}],"content":" Cloud","role":null},"index":0}],"created":1767685989,"object":"chat.completion.chunk","usage":null}
data: {"model":"qwen3-asr-flash","id":"chatcmpl-3fb97803-d27f-9289-8889-xxxxx","choices":[{"delta":{"annotations":[{"type":"audio_info","language":"zh","emotion":"neutral"}],"content":".","role":null},"index":0}],"created":1767685989,"object":"chat.completion.chunk","usage":null}
data: {"model":"qwen3-asr-flash","id":"chatcmpl-3fb97803-d27f-9289-8889-xxxxx","choices":[{"delta":{"role":null},"index":0,"finish_reason":"stop"}],"created":1767685989,"object":"chat.completion.chunk","usage":null}
data: [DONE]
```
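When calling the HTTP endpoint directly (no SDK), the stream arrives as the `data:` lines shown above. A minimal parsing sketch with the `requests` library, assuming the international endpoint and the sample audio URL:

```python
import json
import os
import requests

resp = requests.post(
    "https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.getenv('DASHSCOPE_API_KEY')}"},
    json={
        "model": "qwen3-asr-flash",
        "stream": True,
        "messages": [{"role": "user", "content": [{
            "type": "input_audio",
            "input_audio": {"data": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"},
        }]}],
    },
    stream=True,
)

text = ""
for line in resp.iter_lines(decode_unicode=True):
    if not line or not line.startswith("data:"):
        continue  # skip blank separator lines
    payload = line[len("data:"):].strip()
    if payload == "[DONE]":
        break
    chunk = json.loads(payload)
    if chunk["choices"]:
        text += chunk["choices"][0]["delta"].get("content") or ""
print("Result:", text)
```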
DashScope synchronous
Endpoints
| Deployment mode | HTTP endpoint | SDK base_url |
|---|---|---|
| International (Singapore) | POST https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation | https://dashscope-intl.aliyuncs.com/api/v1 |
| US (Virginia) | POST https://dashscope-us.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation | https://dashscope-us.aliyuncs.com/api/v1 |
| Chinese Mainland (Beijing) | POST https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation | https://dashscope.aliyuncs.com/api/v1 |
- International: Endpoint and data storage in Singapore. Global inference (excluding Chinese mainland).
- US: Endpoint and data storage in Virginia. Inference restricted to the United States.
- Chinese Mainland: Endpoint and data storage in Beijing. Inference restricted to Chinese mainland.
- US region models: Append `-us` to the model name (e.g., `qwen3-asr-flash-us`).
Request body
| Parameter | Type | Required | Description |
|---|---|---|---|
| `model` | string | Yes | The model name. Set to `qwen3-asr-flash`. For the US region, use `qwen3-asr-flash-us`. |
| `messages` | array | Yes | Message list. Place it inside the `input` object for HTTP calls (see Message structure (DashScope synchronous)). |
| `asr_options` | object | No | Recognition options. Place them inside `parameters` for HTTP calls. See asr_options (DashScope synchronous). Supported by Qwen3-ASR-Flash only. |
Message structure (DashScope synchronous)
System message (optional, Qwen3-ASR-Flash only)
Provide context for customized recognition. Place it at the beginning of the messages array.
| Property | Type | Required | Description |
|---|---|---|---|
| `role` | string | Yes | Set to `system`. |
| `content` | array | Yes | One message element allowed. |
| `content[].text` | string | No | Context text. 10,000-token limit. See Context biasing. |
User message (required)
| Property | Type | Required | Description |
|---|---|---|---|
| `role` | string | Yes | Set to `user`. |
| `content` | array | Yes | One message element allowed. |
| `content[].audio` | string | Yes | Audio to recognize. Accepts Base64-encoded files, absolute paths of local files, or publicly accessible URLs. See Audio input formats for OSS URL restrictions. For getting started examples, see QuickStart. |
asr_options (DashScope synchronous)
Same parameters as asr_options (OpenAI compatible): `language` and `enable_itn`.
Sample code
The following examples recognize audio from a URL. For local audio file examples, see QuickStart.
cURL
```bash
curl -X POST "https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation" \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen3-asr-flash",
"input": {
"messages": [
{
"content": [
{
"text": ""
}
],
"role": "system"
},
{
"content": [
{
"audio": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
}
],
"role": "user"
}
]
},
"parameters": {
"asr_options": {
"enable_itn": false
}
}
}'
```
Java
```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;
import com.alibaba.dashscope.utils.JsonUtils;
public class Main {
public static void simpleMultiModalConversationCall()
throws ApiException, NoApiKeyException, UploadFileException {
MultiModalConversation conv = new MultiModalConversation();
MultiModalMessage userMessage = MultiModalMessage.builder()
.role(Role.USER.getValue())
.content(Arrays.asList(
Collections.singletonMap("audio", "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3")))
.build();
MultiModalMessage sysMessage = MultiModalMessage.builder().role(Role.SYSTEM.getValue())
.content(Arrays.asList(Collections.singletonMap("text", "")))
.build();
Map<String, Object> asrOptions = new HashMap<>();
asrOptions.put("enable_itn", false);
// asrOptions.put("language", "zh");
MultiModalConversationParam param = MultiModalConversationParam.builder()
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
// US region: use "qwen3-asr-flash-us"
.model("qwen3-asr-flash")
.message(sysMessage)
.message(userMessage)
.parameter("asr_options", asrOptions)
.build();
MultiModalConversationResult result = conv.call(param);
System.out.println(JsonUtils.toJson(result));
}
public static void main(String[] args) {
try {
// International (Singapore). Beijing: https://dashscope.aliyuncs.com/api/v1
// US: https://dashscope-us.aliyuncs.com/api/v1
Constants.baseHttpApiUrl = "https://dashscope-intl.aliyuncs.com/api/v1";
simpleMultiModalConversationCall();
} catch (ApiException | NoApiKeyException | UploadFileException e) {
System.out.println(e.getMessage());
}
System.exit(0);
}
}
```
Python
```python
import os
import dashscope
# International (Singapore). Beijing: https://dashscope.aliyuncs.com/api/v1
# US: https://dashscope-us.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
messages = [
{"role": "system", "content": [{"text": ""}]},
{"role": "user", "content": [{"audio": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"}]}
]
response = dashscope.MultiModalConversation.call(
api_key=os.getenv("DASHSCOPE_API_KEY"),
# US region: use "qwen3-asr-flash-us"
model="qwen3-asr-flash",
messages=messages,
result_format="message",
asr_options={
#"language": "zh",
"enable_itn": False
}
)
print(response)
```
Response body
| Field | Type | Description |
|---|---|---|
| `request_id` | string | Unique identifier for this request. The Java SDK returns this as `requestId`. |
| `output.choices` | array | Model output. Returned when `result_format` is `message`. |
| `output.choices[].finish_reason` | string | `null`: generation in progress. `stop`: finished naturally. `length`: output exceeded the maximum length. |
| `output.choices[].message.role` | string | Always `assistant`. |
| `output.choices[].message.content[].text` | string | The transcribed text. |
| `output.choices[].message.annotations` | array | Audio metadata. Same structure as choices fields (OpenAI compatible): `type`, `language`, `emotion`. |
| `usage.input_tokens_details.text_tokens` | integer | Ignore this field. |
| `usage.output_tokens_details.text_tokens` | integer | Output text token count. |
| `usage.seconds` | integer | Audio duration in seconds. |
Response example
```json
{
"output": {
"choices": [
{
"finish_reason": "stop",
"message": {
"annotations": [
{
"language": "zh",
"type": "audio_info",
"emotion": "neutral"
}
],
"content": [
{
"text": "Welcome to Alibaba Cloud."
}
],
"role": "assistant"
}
}
]
},
"usage": {
"input_tokens_details": {
"text_tokens": 0
},
"output_tokens_details": {
"text_tokens": 6
},
"seconds": 1
},
"request_id": "568e2bf0-d6f2-97f8-9f15-a57b11dc6977"
}
```
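Given that structure, pulling the transcript and metadata out of the Python SDK's `response` (from the sample above) can look like this sketch; `output` and `usage` are treated as dict-like objects here, matching the printed JSON:

```python
# response is the return value of dashscope.MultiModalConversation.call(...)
message = response.output["choices"][0]["message"]

print("Transcript:", message["content"][0]["text"])
for ann in message.get("annotations", []):
    print("Language:", ann.get("language"), "| emotion:", ann.get("emotion"))
print("Audio seconds:", response.usage.get("seconds"))
```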
DashScope asynchronous
Use asynchronous mode (Qwen3-ASR-Flash-Filetrans) for long audio files or batch processing. It uses a submit-poll workflow to avoid request timeouts.
How it works
1. Submit task: Send the audio file URL. The server validates it and returns a `task_id`.
2. Poll for result: Query the result endpoint with the `task_id` until the status is `SUCCEEDED`.
SDK vs. RESTful API:
| Approach | Submit | Poll |
|---|---|---|
| SDK | Call `async_call()` (Python) or `asyncCall()` (Java). Returns a task object with `task_id`. | Call `fetch()` with the task object. The SDK handles polling automatically. |
| RESTful API | POST to the submit endpoint. Parse `task_id` from the response. | GET the result endpoint with `task_id`. Implement polling logic manually. |
For SDK examples, see Getting started.
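For the RESTful flow, a minimal end-to-end sketch in Python (international endpoint; the 5-second interval and 10-minute cap are arbitrary choices for illustration, not service requirements):

```python
import os
import time
import requests

BASE = "https://dashscope-intl.aliyuncs.com/api/v1"
AUTH = {"Authorization": f"Bearer {os.getenv('DASHSCOPE_API_KEY')}"}

# Step 1: submit the transcription task
submit = requests.post(
    f"{BASE}/services/audio/asr/transcription",
    headers={**AUTH, "Content-Type": "application/json", "X-DashScope-Async": "enable"},
    json={
        "model": "qwen3-asr-flash-filetrans",
        "input": {"file_url": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"},
        "parameters": {"enable_itn": False},
    },
)
task_id = submit.json()["output"]["task_id"]

# Step 2: poll the task endpoint until it leaves PENDING/RUNNING
for _ in range(120):  # up to ~10 minutes
    output = requests.get(f"{BASE}/tasks/{task_id}", headers=AUTH).json()["output"]
    status = output["task_status"]
    if status == "SUCCEEDED":
        print("transcription_url:", output["result"]["transcription_url"])
        break
    if status in ("FAILED", "UNKNOWN"):
        raise RuntimeError(output.get("message", status))
    time.sleep(5)
```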
Step 1: Submit a task
Endpoints
| Deployment mode | HTTP endpoint | SDK base_url |
|---|---|---|
| International (Singapore) | POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/asr/transcription | https://dashscope-intl.aliyuncs.com/api/v1 |
| Chinese Mainland (Beijing) | POST https://dashscope.aliyuncs.com/api/v1/services/audio/asr/transcription | https://dashscope.aliyuncs.com/api/v1 |
Request body
| Parameter | Type | Required | Description |
|---|---|---|---|
| `model` | string | Yes | Set to `qwen3-asr-flash-filetrans`. |
| `input.file_url` | string | Yes | Publicly accessible URL of the audio file. See Audio input formats for OSS URL restrictions. |
| `parameters.language` | string | No | No default. Audio language hint. See Supported languages. |
| `parameters.enable_itn` | boolean | No | Default: false. Enable Inverse Text Normalization (ITN). Chinese and English only. |
| `parameters.enable_words` | boolean | No | Default: false. Returns word-level timestamps. When true, sentence segmentation uses VAD and punctuation; when false, VAD only. Supported languages: Chinese, English, Japanese, Korean, German, French, Spanish, Italian, Portuguese, Russian. |
| `parameters.text` | string | No | Context text for biasing. 10,000-token limit. See Context biasing. |
| `parameters.channel_id` | array | No | Default: `[0]`. Audio track indexes to recognize in multi-channel audio (index starts from 0). Example: `[0, 1]` recognizes the first two tracks. |

Each audio track is billed separately. For example, `[0, 1]` incurs two charges.
Sample code
cURL
```bash
curl --location --request POST 'https://dashscope-intl.aliyuncs.com/api/v1/services/audio/asr/transcription' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header "Content-Type: application/json" \
--header "X-DashScope-Async: enable" \
--data '{
"model": "qwen3-asr-flash-filetrans",
"input": {
"file_url": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
},
"parameters": {
"channel_id": [0],
"enable_itn": false
}
}'
```
Java
For SDK examples, see Getting started.
```java
import com.google.gson.Gson;
import com.google.gson.annotations.SerializedName;
import okhttp3.*;
import java.io.IOException;
public class Main {
// International (Singapore). Beijing: https://dashscope.aliyuncs.com/api/v1/services/audio/asr/transcription
private static final String API_URL = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/asr/transcription";
public static void main(String[] args) {
String apiKey = System.getenv("DASHSCOPE_API_KEY");
OkHttpClient client = new OkHttpClient();
Gson gson = new Gson();
String payloadJson = """
{
"model": "qwen3-asr-flash-filetrans",
"input": {
"file_url": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
},
"parameters": {
"channel_id": [0],
"enable_itn": false
}
}
""";
RequestBody body = RequestBody.create(payloadJson, MediaType.get("application/json; charset=utf-8"));
Request request = new Request.Builder()
.url(API_URL)
.addHeader("Authorization", "Bearer " + apiKey)
.addHeader("Content-Type", "application/json")
.addHeader("X-DashScope-Async", "enable")
.post(body)
.build();
try (Response response = client.newCall(request).execute()) {
if (response.isSuccessful() && response.body() != null) {
String respBody = response.body().string();
ApiResponse apiResp = gson.fromJson(respBody, ApiResponse.class);
if (apiResp.output != null) {
System.out.println("task_id: " + apiResp.output.taskId);
} else {
System.out.println(respBody);
}
} else {
System.out.println("Task failed. HTTP code: " + response.code());
if (response.body() != null) {
System.out.println(response.body().string());
}
}
} catch (IOException e) {
e.printStackTrace();
}
}
static class ApiResponse {
@SerializedName("request_id")
String requestId;
Output output;
}
static class Output {
@SerializedName("task_id")
String taskId;
@SerializedName("task_status")
String taskStatus;
}
}
```
Python
For SDK examples, see Getting started.
```python
import requests
import json
import os
# International (Singapore). Beijing: https://dashscope.aliyuncs.com/api/v1/services/audio/asr/transcription
url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/asr/transcription"
DASHSCOPE_API_KEY = os.getenv("DASHSCOPE_API_KEY")
headers = {
"Authorization": f"Bearer {DASHSCOPE_API_KEY}",
"Content-Type": "application/json",
"X-DashScope-Async": "enable"
}
payload = {
"model": "qwen3-asr-flash-filetrans",
"input": {
"file_url": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
},
"parameters": {
"channel_id": [0],
# "language": "zh",
"enable_itn": False
}
}
response = requests.post(url, headers=headers, data=json.dumps(payload))
if response.status_code == 200:
print(f"task_id: {response.json()['output']['task_id']}")
else:
print("Task failed.")
    print(response.json())
```
Response body
```json
{
"request_id": "92e3decd-0c69-47a8-************",
"output": {
"task_id": "8fab76d0-0eed-4d20-************",
"task_status": "PENDING"
}
}
```
| Field | Type | Description |
|---|---|---|
| `request_id` | string | Unique identifier for this request. |
| `output.task_id` | string | Task ID. Use this to poll for results. |
| `output.task_status` | string | Task status: `PENDING`, `RUNNING`, `SUCCEEDED`, `FAILED`, or `UNKNOWN`. |
Step 2: Get the result
Endpoints
| Deployment mode | HTTP endpoint | SDK base_url |
|---|---|---|
| International (Singapore) | GET https://dashscope-intl.aliyuncs.com/api/v1/tasks/{task_id} | https://dashscope-intl.aliyuncs.com/api/v1 |
| Chinese Mainland (Beijing) | GET https://dashscope.aliyuncs.com/api/v1/tasks/{task_id} | https://dashscope.aliyuncs.com/api/v1 |
Replace `{task_id}` with the `task_id` returned in Step 1.
Sample code
cURL
```bash
curl --location --request GET 'https://dashscope-intl.aliyuncs.com/api/v1/tasks/{task_id}' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header "X-DashScope-Async: enable" \
--header "Content-Type: application/json"
Java
For SDK examples, see Getting started.
```java
import okhttp3.*;
import java.io.IOException;
public class Main {
public static void main(String[] args) {
String taskId = "<your-task-id>";
String apiKey = System.getenv("DASHSCOPE_API_KEY");
// International (Singapore). Beijing: https://dashscope.aliyuncs.com/api/v1/tasks/
String apiUrl = "https://dashscope-intl.aliyuncs.com/api/v1/tasks/" + taskId;
OkHttpClient client = new OkHttpClient();
Request request = new Request.Builder()
.url(apiUrl)
.addHeader("Authorization", "Bearer " + apiKey)
.addHeader("X-DashScope-Async", "enable")
.addHeader("Content-Type", "application/json")
.get()
.build();
try (Response response = client.newCall(request).execute()) {
if (response.body() != null) {
System.out.println(response.body().string());
}
} catch (IOException e) {
e.printStackTrace();
}
}
}
```
Python
For SDK examples, see Getting started.
```python
import os
import requests
DASHSCOPE_API_KEY = os.getenv("DASHSCOPE_API_KEY")
task_id = "<your-task-id>"
# International (Singapore). Beijing: https://dashscope.aliyuncs.com/api/v1/tasks/{task_id}
url = f"https://dashscope-intl.aliyuncs.com/api/v1/tasks/{task_id}"
headers = {
"Authorization": f"Bearer {DASHSCOPE_API_KEY}",
"X-DashScope-Async": "enable",
"Content-Type": "application/json"
}
response = requests.get(url, headers=headers)
print(response.json())
```
Response body
| Field | Type | Description |
|---|---|---|
| `request_id` | string | Unique identifier for this request. |
| `output.task_id` | string | Task ID. |
| `output.task_status` | string | `PENDING`, `RUNNING`, `SUCCEEDED`, `FAILED`, or `UNKNOWN` (task does not exist). |
| `output.result.transcription_url` | string | Download URL for the transcription result JSON file (valid for 24 hours). See Transcription result format. |
| `output.submit_time` | string | Task submission time. |
| `output.scheduled_time` | string | Task execution start time. |
| `output.end_time` | string | Task completion time. |
| `output.task_metrics.TOTAL` | integer | Total subtask count. |
| `output.task_metrics.SUCCEEDED` | integer | Successful subtask count. |
| `output.task_metrics.FAILED` | integer | Failed subtask count. |
| `output.code` | string | Error code. Returned only on failure. |
| `output.message` | string | Error message. Returned only on failure. |
| `usage.seconds` | integer | Audio duration in seconds. |
Response examples
RUNNING
```json
{
"request_id": "6769df07-2768-4fb0-ad59-************",
"output": {
"task_id": "9be1700a-0f8e-4778-be74-************",
"task_status": "RUNNING",
"submit_time": "2025-10-27 14:19:31.150",
"scheduled_time": "2025-10-27 14:19:31.233",
"task_metrics": {
"TOTAL": 1,
"SUCCEEDED": 0,
"FAILED": 0
}
}
}
```
SUCCEEDED
```json
{
"request_id": "1dca6c0a-0ed1-4662-aa39-************",
"output": {
"task_id": "8fab76d0-0eed-4d20-929f-************",
"task_status": "SUCCEEDED",
"submit_time": "2025-10-27 13:57:45.948",
"scheduled_time": "2025-10-27 13:57:46.018",
"end_time": "2025-10-27 13:57:47.079",
"result": {
"transcription_url": "http://dashscope-result-bj.oss-cn-beijing.aliyuncs.com/..."
}
},
"usage": {
"seconds": 3
}
}
```
FAILED
```json
{
"request_id": "3d141841-858a-466a-9ff9-************",
"output": {
"task_id": "c58c7951-7789-4557-9ea3-************",
"task_status": "FAILED",
"submit_time": "2025-10-27 15:06:06.915",
"scheduled_time": "2025-10-27 15:06:06.967",
"end_time": "2025-10-27 15:06:07.584",
"code": "FILE_403_FORBIDDEN",
"message": "FILE_403_FORBIDDEN"
}
}
```
Transcription result format
The `transcription_url` points to a JSON file containing the complete results. Download or read it via HTTP GET; the URL expires after 24 hours.
```json
{
"file_url": "https://***.wav",
"audio_info": {
"format": "wav",
"sample_rate": 16000
},
"transcripts": [
{
"channel_id": 0,
"text": "Senior staff, Principal Doris Jackson, Wakefield faculty, and of course my fellow classmates. I am honored to have been chosen to speak before my classmates along with the students across America today.",
"sentences": [
{
"sentence_id": 0,
"begin_time": 240,
"end_time": 6720,
"language": "en",
"emotion": "happy",
"text": "Senior staff, Principal Doris Jackson, Wakefield faculty, and of course my fellow classmates.",
"words": [
{
"begin_time": 240,
"end_time": 1120,
"text": "Senior ",
"punctuation": ""
},
{
"begin_time": 1120,
"end_time": 1200,
"text": "staff",
"punctuation": ","
}
]
}
]
}
]
}
```
| Field | Type | Description |
|---|---|---|
| `file_url` | string | URL of the recognized audio file. |
| `audio_info.format` | string | Audio format (for example, wav, mp3). |
| `audio_info.sample_rate` | integer | Audio sampling rate in Hz. |
| `transcripts` | array | Recognition results, one element per audio track. |
| `transcripts[].channel_id` | integer | Audio track index, starting from 0. |
| `transcripts[].text` | string | Full transcribed text for this track. |
| `transcripts[].sentences` | array | Sentence-level results. |
| `transcripts[].sentences[].sentence_id` | integer | Sentence index, starting from 0. |
| `transcripts[].sentences[].begin_time` | integer | Sentence start timestamp in milliseconds. |
| `transcripts[].sentences[].end_time` | integer | Sentence end timestamp in milliseconds. |
| `transcripts[].sentences[].text` | string | Transcribed text for this sentence. |
| `transcripts[].sentences[].language` | string | Detected language code. See Supported languages. |
| `transcripts[].sentences[].emotion` | string | Detected emotion: `surprised`, `neutral`, `happy`, `sad`, `disgusted`, `angry`, or `fearful`. |
| `transcripts[].sentences[].words` | array | Word-level results. Returned only when `enable_words` is true. |
| `transcripts[].sentences[].words[].begin_time` | integer | Word start timestamp in milliseconds. |
| `transcripts[].sentences[].words[].end_time` | integer | Word end timestamp in milliseconds. |
| `transcripts[].sentences[].words[].text` | string | Recognized word. |
| `transcripts[].sentences[].words[].punctuation` | string | Punctuation mark following this word. |
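Tying the fields together, this sketch downloads the file behind `transcription_url` and prints each track, sentence, and word (the URL is a placeholder; `words` is present only when `enable_words` was true):

```python
import requests

result = requests.get("<your-transcription-url>").json()

for track in result["transcripts"]:
    print(f"channel {track['channel_id']}: {track['text']}")
    for sent in track["sentences"]:
        print(f"  [{sent['begin_time']}-{sent['end_time']} ms] "
              f"({sent['language']}, {sent['emotion']}) {sent['text']}")
        for word in sent.get("words", []):  # only when enable_words is true
            print(f"    {word['begin_time']}-{word['end_time']} ms: "
                  f"{word['text']}{word['punctuation']}")
```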
Supported languages
Specify a language code in `language` to improve accuracy. Omit it for multilingual audio.
| Code | Language |
|---|---|
| `zh` | Chinese (Mandarin, Sichuanese, Minnan, and Wu) |
| `yue` | Cantonese |
| `en` | English |
| `ja` | Japanese |
| `de` | German |
| `ko` | Korean |
| `ru` | Russian |
| `fr` | French |
| `pt` | Portuguese |
| `ar` | Arabic |
| `it` | Italian |
| `es` | Spanish |
| `hi` | Hindi |
| `id` | Indonesian |
| `th` | Thai |
| `tr` | Turkish |
| `uk` | Ukrainian |
| `vi` | Vietnamese |
| `cs` | Czech |
| `da` | Danish |
| `fil` | Filipino |
| `fi` | Finnish |
| `is` | Icelandic |
| `ms` | Malay |
| `no` | Norwegian |
| `pl` | Polish |
| `sv` | Swedish |