
Alibaba Cloud Model Studio: Audio file recognition - Qwen

Last Updated: Jan 08, 2026

The Qwen audio file recognition models convert recorded audio into text. These models support features such as multi-language recognition, singing voice recognition, and noise rejection.

Core features

  • Multi-language recognition: Recognizes multiple languages, including Mandarin and various dialects, such as Cantonese and Sichuanese.

  • Adaptation to complex environments: Handles complex acoustic environments, with automatic language detection and intelligent filtering of non-human sounds.

  • Singing voice recognition: Transcribes entire songs, even with background music (BGM).

  • Context biasing: Improves recognition accuracy by configuring context. For more information, see Context biasing.

  • Emotion recognition: Recognizes multiple emotional states, including surprise, calm, happiness, sadness, disgust, anger, and fear.

Scope

Supported models:

This service offers two core models:

  • Qwen3-ASR-Flash-Filetrans: Designed for asynchronous recognition of long audio files up to 12 hours in length. This model is suitable for scenarios such as transcribing meeting records and interviews.

  • Qwen3-ASR-Flash: Designed for synchronous or streaming recognition of short audio files up to 5 minutes in length. This model is suitable for scenarios such as voice messaging and real-time captions.

International

In the International deployment mode, the endpoint and data storage are in the Singapore region. Model inference compute resources are dynamically scheduled globally, excluding the Chinese mainland.

When you call the following models, select an API key from the Singapore region:

  • Qwen3-ASR-Flash-Filetrans: qwen3-asr-flash-filetrans, qwen3-asr-flash-filetrans-2025-11-17 (snapshot)

  • Qwen3-ASR-Flash: qwen3-asr-flash (stable, currently equivalent to qwen3-asr-flash-2025-09-08), qwen3-asr-flash-2025-09-08 (snapshot)

US

In the US deployment mode, the endpoint and data storage are in the US (Virginia) region. Model inference compute resources are restricted to the United States.

When you call the following models, select an API key from the US (Virginia) region:

  • Qwen3-ASR-Flash: qwen3-asr-flash-us (stable, currently equivalent to qwen3-asr-flash-2025-09-08-us), qwen3-asr-flash-2025-09-08-us (snapshot)

Chinese mainland

In the Chinese mainland deployment mode, the endpoint and data storage are in the Beijing region. Model inference compute resources are restricted to the Chinese mainland.

When you call the following models, select an API key from the Beijing region:

  • Qwen3-ASR-Flash-Filetrans: qwen3-asr-flash-filetrans, qwen3-asr-flash-filetrans-2025-11-17 (snapshot)

  • Qwen3-ASR-Flash: qwen3-asr-flash (stable, currently equivalent to qwen3-asr-flash-2025-09-08), qwen3-asr-flash-2025-09-08 (snapshot)

For more information, see Model list.

Model selection

| Scenario | Recommended model | Reason | Notes |
| --- | --- | --- | --- |
| Long audio recognition | qwen3-asr-flash-filetrans | Supports recordings up to 12 hours long and provides emotion recognition and sentence-level timestamps, which are suitable for later indexing and analysis. | The audio file size cannot exceed 2 GB, and the duration cannot exceed 12 hours. |
| Short audio recognition | qwen3-asr-flash (or qwen3-asr-flash-us) | Provides low-latency recognition for short audio. | The audio file size cannot exceed 10 MB, and the duration cannot exceed 5 minutes. |
| Customer service quality inspection | qwen3-asr-flash-filetrans, qwen3-asr-flash (or qwen3-asr-flash-us) | Analyzes customer emotions. | These models do not support sensitive word filtering or speaker diarization. Select the appropriate model based on the audio duration. |
| News/interview program caption generation | qwen3-asr-flash-filetrans | Supports long audio, punctuation prediction, and timestamps to directly generate structured captions. | Post-processing is required to generate standard subtitle files. Select the appropriate model based on the audio duration. |
| Multilingual video localization | qwen3-asr-flash-filetrans, qwen3-asr-flash (or qwen3-asr-flash-us) | Covers multiple languages and dialects, which makes these models suitable for cross-language caption creation. | Select the appropriate model based on the audio duration. |
| Singing audio analysis | qwen3-asr-flash-filetrans, qwen3-asr-flash (or qwen3-asr-flash-us) | Recognizes lyrics and analyzes emotions, which makes these models suitable for song indexing and recommendation. | Select the appropriate model based on the audio duration. |

For more information, see Model feature comparison.

Getting started

Before you begin, obtain an API key. If you use an SDK, install the latest version of the SDK.
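
All the code samples in this topic read the key from the DASHSCOPE_API_KEY environment variable. The following is a minimal Python check you can run before the samples; the variable name matches the examples below, and how to set it is described in the API key documentation.

import os

# The code samples in this topic read the API key from this environment variable.
if not os.getenv("DASHSCOPE_API_KEY"):
    raise RuntimeError(
        "DASHSCOPE_API_KEY is not set. Export your Model Studio API key, "
        "or replace the api_key argument in the samples with your key."
    )
print("DASHSCOPE_API_KEY is set.")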

DashScope

qwen3-asr-flash-filetrans

The qwen3-asr-flash-filetrans model is designed for asynchronous transcription of audio files and supports recordings up to 12 hours long. This model requires a publicly accessible URL of an audio file as input and does not support direct uploads of local files. It is a non-streaming API that returns all recognition results at once after the task is complete.

cURL

When you use cURL for speech recognition, first submit a task to obtain a task ID (task_id), and then use this ID to retrieve the task execution result.

Submit a task

# ======= Important =======
# The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/asr/transcription
# The API keys for the Singapore and Beijing regions are different. To get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
# === Delete this comment before execution ===

curl -X POST 'https://dashscope-intl.aliyuncs.com/api/v1/services/audio/asr/transcription' \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-H "X-DashScope-Async: enable" \
-d '{
    "model": "qwen3-asr-flash-filetrans",
    "input": {
        "file_url": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
    },
    "parameters": {
        "channel_id":[
            0
        ], 
        "enable_itn": false
    }
}'

Get the task execution result

# ======= Important =======
# The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/tasks/{task_id}. Note: Replace {task_id} with the ID of the task to query.
# The API keys for the Singapore and Beijing regions are different. To get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
# === Delete this comment before execution ===

curl -X GET 'https://dashscope-intl.aliyuncs.com/api/v1/tasks/{task_id}' \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "X-DashScope-Async: enable" \
-H "Content-Type: application/json"

Complete examples

Java

import com.google.gson.Gson;
import com.google.gson.annotations.SerializedName;
import okhttp3.*;

import java.io.IOException;
import java.util.concurrent.TimeUnit;

public class Main {
    // The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/asr/transcription
    private static final String API_URL_SUBMIT = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/asr/transcription";
    // The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/tasks/
    private static final String API_URL_QUERY = "https://dashscope-intl.aliyuncs.com/api/v1/tasks/";
    private static final Gson gson = new Gson();

    public static void main(String[] args) {
        // The API keys for the Singapore and Beijing regions are different. To get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
        // If you have not configured the environment variable, replace the following line with your Model Studio API key: String apiKey = "sk-xxx"
        String apiKey = System.getenv("DASHSCOPE_API_KEY");

        OkHttpClient client = new OkHttpClient();

        // 1. Submit the task
        /*String payloadJson = """
                {
                    "model": "qwen3-asr-flash-filetrans",
                    "input": {
                        "file_url": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
                    },
                    "parameters": {
                        "channel_id": [0],
                        "enable_itn": false,
                        "language": "zh",
                        "corpus": {
                            "text": ""
                        }
                    }
                }
                """;*/
        String payloadJson = """
                {
                    "model": "qwen3-asr-flash-filetrans",
                    "input": {
                        "file_url": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
                    },
                    "parameters": {
                        "channel_id": [0],
                        "enable_itn": false
                    }
                }
                """;

        RequestBody body = RequestBody.create(payloadJson, MediaType.get("application/json; charset=utf-8"));
        Request submitRequest = new Request.Builder()
                .url(API_URL_SUBMIT)
                .addHeader("Authorization", "Bearer " + apiKey)
                .addHeader("Content-Type", "application/json")
                .addHeader("X-DashScope-Async", "enable")
                .post(body)
                .build();

        String taskId = null;

        try (Response response = client.newCall(submitRequest).execute()) {
            if (response.isSuccessful() && response.body() != null) {
                String respBody = response.body().string();
                ApiResponse apiResp = gson.fromJson(respBody, ApiResponse.class);
                if (apiResp.output != null) {
                    taskId = apiResp.output.taskId;
                    System.out.println("Task submitted. task_id: " + taskId);
                } else {
                    System.out.println("Submission response content: " + respBody);
                    return;
                }
            } else {
                System.out.println("Task submission failed! HTTP code: " + response.code());
                if (response.body() != null) {
                    System.out.println(response.body().string());
                }
                return;
            }
        } catch (IOException e) {
            e.printStackTrace();
            return;
        }

        // 2. Poll the task status
        boolean finished = false;
        while (!finished) {
            try {
                TimeUnit.SECONDS.sleep(2);  // Wait 2 seconds before querying again
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }

            String queryUrl = API_URL_QUERY + taskId;
            Request queryRequest = new Request.Builder()
                    .url(queryUrl)
                    .addHeader("Authorization", "Bearer " + apiKey)
                    .addHeader("X-DashScope-Async", "enable")
                    .addHeader("Content-Type", "application/json")
                    .get()
                    .build();

            try (Response response = client.newCall(queryRequest).execute()) {
                if (response.body() != null) {
                    String queryResponse = response.body().string();
                    ApiResponse apiResp = gson.fromJson(queryResponse, ApiResponse.class);

                    if (apiResp.output != null && apiResp.output.taskStatus != null) {
                        String status = apiResp.output.taskStatus;
                        System.out.println("Current task status: " + status);
                        if ("SUCCEEDED".equalsIgnoreCase(status)
                                || "FAILED".equalsIgnoreCase(status)
                                || "UNKNOWN".equalsIgnoreCase(status)) {
                            finished = true;
                            System.out.println("Task completed. Final result: ");
                            System.out.println(queryResponse);
                        }
                    } else {
                        System.out.println("Query response content: " + queryResponse);
                    }
                }
            } catch (IOException e) {
                e.printStackTrace();
                return;
            }
        }
    }

    static class ApiResponse {
        @SerializedName("request_id")
        String requestId;
        Output output;
    }

    static class Output {
        @SerializedName("task_id")
        String taskId;
        @SerializedName("task_status")
        String taskStatus;
    }
}

Python

import os
import time
import requests
import json

# The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/asr/transcription
API_URL_SUBMIT = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/asr/transcription"
# The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/tasks/
API_URL_QUERY_BASE = "https://dashscope-intl.aliyuncs.com/api/v1/tasks/"


def main():
    # The API keys for the Singapore and Beijing regions are different. To get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
    # If you have not configured the environment variable, replace the following line with your Model Studio API key: api_key = "sk-xxx"
    api_key = os.getenv("DASHSCOPE_API_KEY")

    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        "X-DashScope-Async": "enable"
    }

    # 1. Submit the task
    payload = {
        "model": "qwen3-asr-flash-filetrans",
        "input": {
            "file_url": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
        },
        "parameters": {
            "channel_id": [0],
            # "language": "zh",
            "enable_itn": False
            # "corpus": {
            #     "text": ""
            # }
        }
    }

    print("Submitting ASR transcription task...")
    try:
        submit_resp = requests.post(API_URL_SUBMIT, headers=headers, data=json.dumps(payload))
    except requests.RequestException as e:
        print(f"Failed to submit the task request: {e}")
        return

    if submit_resp.status_code != 200:
        print(f"Task submission failed! HTTP code: {submit_resp.status_code}")
        print(submit_resp.text)
        return

    resp_data = submit_resp.json()
    output = resp_data.get("output")
    if not output or "task_id" not in output:
        print("Abnormal submission response content:", resp_data)
        return

    task_id = output["task_id"]
    print(f"Task submitted. task_id: {task_id}")

    # 2. Poll the task status
    finished = False
    while not finished:
        time.sleep(2)  # Wait 2 seconds before querying again

        query_url = API_URL_QUERY_BASE + task_id
        try:
            query_resp = requests.get(query_url, headers=headers)
        except requests.RequestException as e:
            print(f"Failed to query the task: {e}")
            return

        if query_resp.status_code != 200:
            print(f"Task query failed! HTTP code: {query_resp.status_code}")
            print(query_resp.text)
            return

        query_data = query_resp.json()
        output = query_data.get("output")
        if output and "task_status" in output:
            status = output["task_status"]
            print(f"Current task status: {status}")

            if status.upper() in ("SUCCEEDED", "FAILED", "UNKNOWN"):
                finished = True
                print("Task completed. The final result is as follows:")
                print(json.dumps(query_data, indent=2, ensure_ascii=False))
        else:
            print("Query response content:", query_data)


if __name__ == "__main__":
    main()

Java SDK

import com.alibaba.dashscope.audio.qwen_asr.*;
import com.alibaba.dashscope.utils.Constants;
import com.google.gson.Gson;
import com.google.gson.GsonBuilder;
import com.google.gson.JsonObject;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.ArrayList;
import java.util.HashMap;

public class Main {
    public static void main(String[] args) {
        // The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
        Constants.baseHttpApiUrl = "https://dashscope-intl.aliyuncs.com/api/v1";
        QwenTranscriptionParam param =
                QwenTranscriptionParam.builder()
                        // The API keys for the Singapore and Beijing regions are different. To get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
                        // If you have not configured the environment variable, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
                        .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                        .model("qwen3-asr-flash-filetrans")
                        .fileUrl("https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/sensevoice/rich_text_example_1.wav")
                        //.parameter("language", "zh")
                        //.parameter("channel_id", new ArrayList<String>(){{add("0");add("1");}})
                        .parameter("enable_itn", false)
                        //.parameter("corpus", new HashMap<String, String>() {{put("text", "");}})
                        .build();
        try {
            QwenTranscription transcription = new QwenTranscription();
            // Submit the task
            QwenTranscriptionResult result = transcription.asyncCall(param);
            System.out.println("create task result: " + result);
            // Query the task status
            result = transcription.fetch(QwenTranscriptionQueryParam.FromTranscriptionParam(param, result.getTaskId()));
            System.out.println("task status: " + result);
            // Wait for the task to complete
            result =
                    transcription.wait(
                            QwenTranscriptionQueryParam.FromTranscriptionParam(param, result.getTaskId()));
            System.out.println("task result: " + result);
            // Get the speech recognition result
            QwenTranscriptionTaskResult taskResult = result.getResult();
            if (taskResult != null) {
                // Get the URL of the recognition result
                String transcriptionUrl = taskResult.getTranscriptionUrl();
                // Get the result from the URL
                HttpURLConnection connection =
                        (HttpURLConnection) new URL(transcriptionUrl).openConnection();
                connection.setRequestMethod("GET");
                connection.connect();
                BufferedReader reader =
                        new BufferedReader(new InputStreamReader(connection.getInputStream()));
                // Format and print the JSON result
                Gson gson = new GsonBuilder().setPrettyPrinting().create();
                System.out.println(gson.toJson(gson.fromJson(reader, JsonObject.class)));
            }
        } catch (Exception e) {
            System.out.println("error: " + e);
        }
    }
}

Python SDK

import os

import dashscope
from dashscope.audio.qwen_asr import QwenTranscription


# run the transcription script
if __name__ == '__main__':
    # The API keys for the Singapore and Beijing regions are different. To get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
    # If you have not configured the environment variable, replace the following line with your Model Studio API key: dashscope.api_key = "sk-xxx"
    dashscope.api_key = os.getenv("DASHSCOPE_API_KEY")

    # The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
    dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
    task_response = QwenTranscription.async_call(
        model='qwen3-asr-flash-filetrans',
        file_url='https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/sensevoice/rich_text_example_1.wav',
        #language="",
        enable_itn=False
        #corpus= {
        #    "text": ""
        #}
    )
    print(f'task_response: {task_response}')
    print(task_response.output.task_id)
    query_response = QwenTranscription.fetch(task=task_response.output.task_id)
    print(f'query_response: {query_response}')
    task_result = QwenTranscription.wait(task=task_response.output.task_id)
    print(f'task_result: {task_result}')
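
For subtitle scenarios (see the model selection table above), the completed filetrans task points to a transcription JSON that you can post-process into an SRT file. The following is a minimal sketch, assuming the JSON fetched from the transcription URL contains a transcripts list whose sentences carry begin_time, end_time (in milliseconds), and text fields; these field names are assumptions, so verify them against your actual task output.

import json
import urllib.request


def ms_to_srt_time(ms: int) -> str:
    # Convert milliseconds to the SRT timestamp format HH:MM:SS,mmm.
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def transcription_to_srt(transcription_url: str) -> str:
    # Download the transcription JSON and format each sentence as an SRT cue.
    # The field names (transcripts, sentences, begin_time, end_time, text) are
    # assumptions; check the actual JSON returned for your task.
    with urllib.request.urlopen(transcription_url) as resp:
        result = json.load(resp)
    cues = []
    index = 1
    for transcript in result.get("transcripts", []):
        for sentence in transcript.get("sentences", []):
            start = ms_to_srt_time(sentence["begin_time"])
            end = ms_to_srt_time(sentence["end_time"])
            cues.append(f"{index}\n{start} --> {end}\n{sentence['text']}\n")
            index += 1
    return "\n".join(cues)


# Example usage with a hypothetical transcription URL taken from a completed task:
# print(transcription_to_srt("https://example.com/your-transcription-result.json"))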

qwen3-asr-flash

The qwen3-asr-flash model supports audio files up to 5 minutes long. This model accepts a publicly accessible URL of an audio file or a direct upload of a local file as input. It also supports streaming output for recognition results.

Input: Audio file URL

Python SDK

import os
import dashscope

# The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1. If you use a model in the US region, replace the URL with: https://dashscope-us.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

messages = [
    {"role": "system", "content": [{"text": ""}]},  # Configure the context for custom recognition
    {"role": "user", "content": [{"audio": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"}]}
]

response = dashscope.MultiModalConversation.call(
    # The API keys for the Singapore/US and Beijing regions are different. To get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
    # If you have not configured the environment variable, replace the following line with your Model Studio API key: api_key = "sk-xxx"
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # If you use a model in the US region, append the "-us" suffix to the model name, for example, qwen3-asr-flash-us
    model="qwen3-asr-flash",
    messages=messages,
    result_format="message",
    asr_options={
        #"language": "zh", # Optional. If the language of the audio is known, specify it with this parameter to improve recognition accuracy.
        "enable_itn":False
    }
)
print(response)

Java SDK

import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;
import com.alibaba.dashscope.utils.JsonUtils;

public class Main {
    public static void simpleMultiModalConversationCall()
            throws ApiException, NoApiKeyException, UploadFileException {
        MultiModalConversation conv = new MultiModalConversation();
        MultiModalMessage userMessage = MultiModalMessage.builder()
                .role(Role.USER.getValue())
                .content(Arrays.asList(
                        Collections.singletonMap("audio", "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3")))
                .build();

        MultiModalMessage sysMessage = MultiModalMessage.builder().role(Role.SYSTEM.getValue())
                // Configure the context for custom recognition here
                .content(Arrays.asList(Collections.singletonMap("text", "")))
                .build();

        Map<String, Object> asrOptions = new HashMap<>();
        asrOptions.put("enable_itn", false);
        // asrOptions.put("language", "zh"); // Optional. If the language of the audio is known, specify it with this parameter to improve recognition accuracy.
        MultiModalConversationParam param = MultiModalConversationParam.builder()
                // The API keys for the Singapore/US and Beijing regions are different. To get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
                // If you have not configured the environment variable, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                // If you use a model in the US region, append the "-us" suffix to the model name, for example, qwen3-asr-flash-us
                .model("qwen3-asr-flash")
                .message(sysMessage)
                .message(userMessage)
                .parameter("asr_options", asrOptions)
                .build();
        MultiModalConversationResult result = conv.call(param);
        System.out.println(JsonUtils.toJson(result));
    }
    public static void main(String[] args) {
        try {
            // The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1. If you use a model in the US region, replace the URL with: https://dashscope-us.aliyuncs.com/api/v1
            Constants.baseHttpApiUrl = "https://dashscope-intl.aliyuncs.com/api/v1";
            simpleMultiModalConversationCall();
        } catch (ApiException | NoApiKeyException | UploadFileException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}

cURL

You can configure context for custom recognition using the text parameter in the System Message.

# ======= Important =======
# The following is the URL for the Singapore region. If you use a model in the US region, replace the URL with: https://dashscope-us.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation
# The API keys for the Singapore/US and Beijing regions are different. To get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
# If you use a model in the US region, you must append the -us suffix.
# === Delete this comment before execution ===

curl -X POST "https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation" \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen3-asr-flash",
    "input": {
        "messages": [
            {
                "content": [
                    {
                        "text": ""
                    }
                ],
                "role": "system"
            },
            {
                "content": [
                    {
                        "audio": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
                    }
                ],
                "role": "user"
            }
        ]
    },
    "parameters": {
        "asr_options": {
            "enable_itn": false
        }
    }
}'
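
For example, with the DashScope Python SDK you can bias recognition toward domain terms by filling the system message text. The context string below is a made-up example; the supported context formats are described in Context biasing.

import os
import dashscope

# The following is the URL for the Singapore region; adjust it for the Beijing or US region as in the examples above.
dashscope.base_http_api_url = "https://dashscope-intl.aliyuncs.com/api/v1"

# Hypothetical context: replace with terms, names, or phrases from your own domain.
context = "Product names: Model Studio, Qwen3-ASR-Flash. Speaker names: Zhang Wei, Li Na."

messages = [
    {"role": "system", "content": [{"text": context}]},  # Context biasing text.
    {"role": "user", "content": [{"audio": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"}]}
]

response = dashscope.MultiModalConversation.call(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    model="qwen3-asr-flash",  # Use qwen3-asr-flash-us for the US region.
    messages=messages,
    result_format="message",
)
print(response)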

Input: Base64-encoded audio file

You can input Base64-encoded data (Data URL) in the following format: data:<mediatype>;base64,<data>.

  • <mediatype>: The Multipurpose Internet Mail Extensions (MIME) type.

    This parameter varies depending on the audio format. For example:

    • WAV: audio/wav

    • MP3: audio/mpeg

    • M4A: audio/mp4

  • <data>: The Base64-encoded string of the audio.

    Base64 encoding increases the file size. Control the original file size to ensure that the encoded data does not exceed the input audio size limit of 10 MB.

  • Example: data:audio/wav;base64,SUQzBAAAAAAAI1RTU0UAAAAPAAADTGF2ZjU4LjI5LjEwMAAAAAAAAAAAAAAA//PAxABQ/BXRbMPe4IQAhl9

    Sample code (Python):

    import base64, pathlib

    # input.mp3 is the local audio file to recognize. Replace it with the path to your audio file and ensure it meets the audio requirements.
    file_path = pathlib.Path("input.mp3")
    base64_str = base64.b64encode(file_path.read_bytes()).decode()
    data_uri = f"data:audio/mpeg;base64,{base64_str}"

    Sample code (Java):

    import java.nio.file.*;
    import java.util.Base64;

    public class Main {
        /**
         * filePath is the local audio file to recognize. Replace it with the path to your audio file and ensure it meets the audio requirements.
         */
        public static String toDataUrl(String filePath) throws Exception {
            byte[] bytes = Files.readAllBytes(Paths.get(filePath));
            String encoded = Base64.getEncoder().encodeToString(bytes);
            return "data:audio/mpeg;base64," + encoded;
        }

        // Example usage
        public static void main(String[] args) throws Exception {
            System.out.println(toDataUrl("input.mp3"));
        }
    }

Python SDK

The audio file used in the example is: welcome.mp3.

import base64
import dashscope
import os
import pathlib

# The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1. If you use a model in the US region, replace the URL with: https://dashscope-us.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

# Replace with the actual path to your audio file.
file_path = "welcome.mp3"
# Replace with the actual MIME type of your audio file.
audio_mime_type = "audio/mpeg"

file_path_obj = pathlib.Path(file_path)
if not file_path_obj.exists():
    raise FileNotFoundError(f"Audio file not found: {file_path}")

base64_str = base64.b64encode(file_path_obj.read_bytes()).decode()
data_uri = f"data:{audio_mime_type};base64,{base64_str}"

messages = [
    {"role": "system", "content": [{"text": ""}]},  # Configure the context for custom recognition.
    {"role": "user", "content": [{"audio": data_uri}]}
]
response = dashscope.MultiModalConversation.call(
    # The API keys for the Singapore/US and Beijing regions are different. To get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
    # If you have not configured the environment variable, replace the following line with your Model Studio API key: api_key = "sk-xxx",
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # If you use a model in the US region, append the "-us" suffix to the model name, for example, qwen3-asr-flash-us
    model="qwen3-asr-flash",
    messages=messages,
    result_format="message",
    asr_options={
        # "language": "zh", # Optional. If the language of the audio is known, specify it with this parameter to improve recognition accuracy.
        "enable_itn":False
    }
)
print(response)

Java SDK

The audio file used in the example is: welcome.mp3.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.*;

import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;
import com.alibaba.dashscope.utils.JsonUtils;

public class Main {
    // Replace with the actual path to your audio file.
    private static final String AUDIO_FILE = "welcome.mp3";
    // Replace with the actual MIME type of your audio file.
    private static final String AUDIO_MIME_TYPE = "audio/mpeg";

    public static void simpleMultiModalConversationCall()
            throws ApiException, NoApiKeyException, UploadFileException, IOException {
        MultiModalConversation conv = new MultiModalConversation();
        MultiModalMessage userMessage = MultiModalMessage.builder()
                .role(Role.USER.getValue())
                .content(Arrays.asList(
                        Collections.singletonMap("audio", toDataUrl())))
                .build();

        MultiModalMessage sysMessage = MultiModalMessage.builder().role(Role.SYSTEM.getValue())
                // Configure the context for custom recognition here.
                .content(Arrays.asList(Collections.singletonMap("text", "")))
                .build();

        Map<String, Object> asrOptions = new HashMap<>();
        asrOptions.put("enable_itn", false);
        // asrOptions.put("language", "zh"); // Optional. If the language of the audio is known, specify it with this parameter to improve recognition accuracy.
        MultiModalConversationParam param = MultiModalConversationParam.builder()
                // The API keys for the Singapore/US and Beijing regions are different. To get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
                // If you have not configured the environment variable, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                // If you use a model in the US region, append the "-us" suffix to the model name, for example, qwen3-asr-flash-us
                .model("qwen3-asr-flash")
                .message(sysMessage)
                .message(userMessage)
                .parameter("asr_options", asrOptions)
                .build();
        MultiModalConversationResult result = conv.call(param);
        System.out.println(JsonUtils.toJson(result));
    }

    public static void main(String[] args) {
        try {
            // The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1. If you use a model in the US region, replace the URL with: https://dashscope-us.aliyuncs.com/api/v1
            Constants.baseHttpApiUrl = "https://dashscope-intl.aliyuncs.com/api/v1";
            simpleMultiModalConversationCall();
        } catch (ApiException | NoApiKeyException | UploadFileException | IOException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }

    // Generate data URI
    public static String toDataUrl() throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get(AUDIO_FILE));
        String encoded = Base64.getEncoder().encodeToString(bytes);
        return "data:" + AUDIO_MIME_TYPE + ";base64," + encoded;
    }
}

Input: Absolute path of a local audio file

When you use the DashScope SDK to process local audio files, you must provide a file path. For instructions on how to construct the file path based on your operating system and SDK, see the following table.

| System | SDK | File path to pass | Example |
| --- | --- | --- | --- |
| Linux or macOS | Python SDK, Java SDK | file://{absolute_path_of_the_file} | file:///home/audio/welcome.mp3 |
| Windows | Python SDK | file://{absolute_path_of_the_file} | file://D:/audio/welcome.mp3 |
| Windows | Java SDK | file:///{absolute_path_of_the_file} | file:///D:/audio/welcome.mp3 |

Important

When you use local files, the API call limit is 100 queries per second (QPS), and this limit cannot be increased. Do not use this method for production environments, high-concurrency situations, or stress testing scenarios. For higher concurrency, we recommend that you upload the file to OSS and call the API using the audio file URL.
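
If you need higher concurrency, a common pattern is to upload the file to OSS and then pass a signed URL as the audio input, as recommended above. The following is a minimal sketch that uses the OSS Python SDK (oss2); the endpoint, bucket name, object key, and credential variable names are placeholders, so adapt them to your own OSS setup.

import os
import oss2  # OSS Python SDK: pip install oss2

# Placeholders: replace with your own credentials, endpoint, and bucket name.
auth = oss2.Auth(os.getenv("OSS_ACCESS_KEY_ID"), os.getenv("OSS_ACCESS_KEY_SECRET"))
bucket = oss2.Bucket(auth, "https://oss-ap-southeast-1.aliyuncs.com", "your-bucket-name")

# Upload the local audio file, then generate a temporary signed URL (valid for 1 hour).
object_key = "audio/welcome.mp3"
bucket.put_object_from_file(object_key, "welcome.mp3")
audio_url = bucket.sign_url("GET", object_key, 3600)

# Pass audio_url as the audio input, as shown in the "Input: Audio file URL" examples above.
print(audio_url)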

Python SDK

The audio file used in the example is: welcome.mp3.

import os
import dashscope

# The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1. If you use a model in the US region, replace the URL with: https://dashscope-us.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

# Replace ABSOLUTE_PATH/welcome.mp3 with the absolute path to your local audio file.
audio_file_path = "file://ABSOLUTE_PATH/welcome.mp3"

messages = [
    {"role": "system", "content": [{"text": ""}]},  # Configure the context for custom recognition.
    {"role": "user", "content": [{"audio": audio_file_path}]}
]
response = dashscope.MultiModalConversation.call(
    # The API keys for the Singapore/US and Beijing regions are different. To get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
    # If you have not configured the environment variable, replace the following line with your Model Studio API key: api_key = "sk-xxx"
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # If you use a model in the US region, append the "-us" suffix to the model name, for example, qwen3-asr-flash-us
    model="qwen3-asr-flash",
    messages=messages,
    result_format="message",
    asr_options={
        # "language": "zh", # Optional. If the language of the audio is known, specify it with this parameter to improve recognition accuracy.
        "enable_itn":False
    }
)
print(response)

Java SDK

The audio file used in the example is: welcome.mp3.

import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;
import com.alibaba.dashscope.utils.JsonUtils;

public class Main {
    public static void simpleMultiModalConversationCall()
            throws ApiException, NoApiKeyException, UploadFileException {
        // Replace ABSOLUTE_PATH/welcome.mp3 with the absolute path to your local file.
        String localFilePath = "file://ABSOLUTE_PATH/welcome.mp3";
        MultiModalConversation conv = new MultiModalConversation();
        MultiModalMessage userMessage = MultiModalMessage.builder()
                .role(Role.USER.getValue())
                .content(Arrays.asList(
                        Collections.singletonMap("audio", localFilePath)))
                .build();

        MultiModalMessage sysMessage = MultiModalMessage.builder().role(Role.SYSTEM.getValue())
                // Configure the context for custom recognition here.
                .content(Arrays.asList(Collections.singletonMap("text", "")))
                .build();

        Map<String, Object> asrOptions = new HashMap<>();
        asrOptions.put("enable_itn", false);
        // asrOptions.put("language", "zh"); // Optional. If the language of the audio is known, specify it with this parameter to improve recognition accuracy.
        MultiModalConversationParam param = MultiModalConversationParam.builder()
                // The API keys for the Singapore and Beijing regions are different. To get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
                // If you have not configured the environment variable, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                // If you use a model in the US region, append the "-us" suffix to the model name, for example, qwen3-asr-flash-us
                .model("qwen3-asr-flash")
                .message(sysMessage)
                .message(userMessage)
                .parameter("asr_options", asrOptions)
                .build();
        MultiModalConversationResult result = conv.call(param);
        System.out.println(JsonUtils.toJson(result));
    }
    public static void main(String[] args) {
        try {
            // The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1. If you use a model in the US region, replace the URL with: https://dashscope-us.aliyuncs.com/api/v1
            Constants.baseHttpApiUrl = "https://dashscope-intl.aliyuncs.com/api/v1";
            simpleMultiModalConversationCall();
        } catch (ApiException | NoApiKeyException | UploadFileException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}

Streaming output

In non-streaming mode, the model returns the complete result after it finishes processing. In streaming mode, the model returns intermediate results as they are generated, which reduces waiting time because you can start reading the output sooner. To enable streaming output, set the appropriate parameter for your calling method:

  • DashScope Python SDK: Set the stream parameter to True.

  • DashScope Java SDK: Call the service through the streamCall interface.

  • DashScope HTTP: Set the X-DashScope-SSE header to enable.

Python SDK

import os
import dashscope

# The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1. If you use a model in the US region, replace the URL with: https://dashscope-us.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

messages = [
    {"role": "system", "content": [{"text": ""}]},  # Configure the context for custom recognition.
    {"role": "user", "content": [{"audio": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"}]}
]
response = dashscope.MultiModalConversation.call(
    # The API keys for the Singapore/US and Beijing regions are different. To get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
    # If you have not configured the environment variable, replace the following line with your Model Studio API key: api_key = "sk-xxx"
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # If you use a model in the US region, append the "-us" suffix to the model name, for example, qwen3-asr-flash-us
    model="qwen3-asr-flash",
    messages=messages,
    result_format="message",
    asr_options={
        # "language": "zh", # Optional. If the language of the audio is known, specify it with this parameter to improve recognition accuracy.
        "enable_itn":False
    },
    stream=True
)

for chunk in response:
    try:
        # Print the recognition text carried by each streamed chunk.
        print(chunk["output"]["choices"][0]["message"].content[0]["text"])
    except (KeyError, IndexError, TypeError):
        # Skip chunks that do not contain recognized text.
        pass

Java SDK

import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;
import io.reactivex.Flowable;

public class Main {
    public static void simpleMultiModalConversationCall()
            throws ApiException, NoApiKeyException, UploadFileException {
        MultiModalConversation conv = new MultiModalConversation();
        MultiModalMessage userMessage = MultiModalMessage.builder()
                .role(Role.USER.getValue())
                .content(Arrays.asList(
                        Collections.singletonMap("audio", "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3")))
                .build();

        MultiModalMessage sysMessage = MultiModalMessage.builder().role(Role.SYSTEM.getValue())
                // Configure the context for custom recognition here.
                .content(Arrays.asList(Collections.singletonMap("text", "")))
                .build();

        Map<String, Object> asrOptions = new HashMap<>();
        asrOptions.put("enable_itn", false);
        // asrOptions.put("language", "zh"); // Optional. If the language of the audio is known, specify it with this parameter to improve recognition accuracy.
        MultiModalConversationParam param = MultiModalConversationParam.builder()
                // The API keys for the Singapore and Beijing regions are different. To get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
                // If you have not configured the environment variable, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                // If you use a model in the US region, append the "-us" suffix to the model name, for example, qwen3-asr-flash-us
                .model("qwen3-asr-flash")
                .message(sysMessage)
                .message(userMessage)
                .parameter("asr_options", asrOptions)
                .build();
        Flowable<MultiModalConversationResult> resultFlowable = conv.streamCall(param);
        resultFlowable.blockingForEach(item -> {
            try {
                System.out.println(item.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
            } catch (Exception e){
                System.exit(0);
            }
        });
    }

    public static void main(String[] args) {
        try {
            // The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1. If you use a model in the US region, replace the URL with: https://dashscope-us.aliyuncs.com/api/v1
            Constants.baseHttpApiUrl = "https://dashscope-intl.aliyuncs.com/api/v1";
            simpleMultiModalConversationCall();
        } catch (ApiException | NoApiKeyException | UploadFileException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}

cURL

You can configure context for custom recognition using the text parameter in the System Message.

# ======= Important =======
# The following is the URL for the Singapore region. If you use a model in the US region, replace the URL with: https://dashscope-us.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation
# The API keys for the Singapore and Beijing regions are different. To get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
# If you use a model in the US region, you must append the -us suffix.
# === Delete this comment before execution ===

curl -X POST "https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation" \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-H "X-DashScope-SSE: enable" \
-d '{
    "model": "qwen3-asr-flash",
    "input": {
        "messages": [
            {
                "content": [
                    {
                        "text": ""
                    }
                ],
                "role": "system"
            },
            {
                "content": [
                    {
                        "audio": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
                    }
                ],
                "role": "user"
            }
        ]
    },
    "parameters": {
        "incremental_output": true,
        "asr_options": {
            "enable_itn": false
        }
    }
}'

OpenAI compatible

Important

The US (Virginia) region does not support OpenAI compatible mode.

Only the Qwen3-ASR-Flash series models support calls in OpenAI compatible mode. This mode accepts publicly accessible audio file URLs or Base64-encoded audio data as input, and does not support absolute paths of local audio files.

The OpenAI Python SDK version must be 1.52.0 or later, and the Node.js SDK version must be 4.68.0 or later.

The asr_options parameter is not a standard OpenAI parameter. If you use the OpenAI SDK, pass this parameter in extra_body.

Input: Audio file URL

Python SDK

from openai import OpenAI
import os

try:
    client = OpenAI(
        # The API keys for the Singapore/US and Beijing regions are different. To get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
        # If you have not configured the environment variable, replace the following line with your Model Studio API key: api_key = "sk-xxx",
        api_key=os.getenv("DASHSCOPE_API_KEY"),
        # The following is the URL for the Singapore/US region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1
        base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
    )
    

    stream_enabled = False  # Specifies whether to enable streaming output.
    completion = client.chat.completions.create(
        model="qwen3-asr-flash",
        messages=[
            {
                # Configure the context for custom recognition.
                "content": [{
                    "text": ""
                }],
                "role": "system"
            },
            {
                "content": [
                    {
                        "type": "input_audio",
                        "input_audio": {
                            "data": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
                        }
                    }
                ],
                "role": "user"
            }
        ],
        stream=stream_enabled,
        # When stream is set to false, do not set the stream_options parameter.
        # stream_options={"include_usage": True},
        extra_body={
            "asr_options": {
                # "language": "zh",
                "enable_itn": False
            }
        }
    )
    if stream_enabled:
        full_content = ""
        print("Streaming output:")
        for chunk in completion:
            # If stream_options.include_usage is true, the choices field of the last chunk is an empty list and must be skipped. You can get token usage from chunk.usage.
            print(chunk)
            if chunk.choices and chunk.choices[0].delta.content:
                full_content += chunk.choices[0].delta.content
        print(f"Full content: {full_content}")
    else:
        print(f"Non-streaming output: {completion.choices[0].message.content}")
except Exception as e:
    print(f"Error: {e}")

Node.js SDK

// Preparations:
// For Windows/Mac/Linux:
// 1. Make sure Node.js is installed (version >= 14 is recommended).
// 2. Run the following command to install the necessary dependencies: npm install openai

import OpenAI from "openai";

const client = new OpenAI({
  // The API keys for the Singapore/US and Beijing regions are different. To get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
  // If you have not configured the environment variable, replace the following line with your Model Studio API key: apiKey: "sk-xxx",
  apiKey: process.env.DASHSCOPE_API_KEY,
  // The following is the URL for the Singapore/US region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1
  baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1", 
});

async function main() {
  try {
    const streamEnabled = false; // Specifies whether to enable streaming output.
    const completion = await client.chat.completions.create({
      model: "qwen3-asr-flash",
      messages: [
        {
          role: "system",
          content: [
            { text: "" }
          ]
        },
        {
          role: "user",
          content: [
            {
              type: "input_audio",
              input_audio: {
                data: "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
              }
            }
          ]
        }
      ],
      stream: streamEnabled,
      // When stream is set to false, do not set the stream_options parameter.
      // stream_options: {
      //   "include_usage": true
      // },
      extra_body: {
        asr_options: {
          // language: "zh",
          enable_itn: false
        }
      }
    });

    if (streamEnabled) {
      let fullContent = "";
      console.log("Streaming output:");
      for await (const chunk of completion) {
        console.log(JSON.stringify(chunk));
        if (chunk.choices && chunk.choices.length > 0) {
          const delta = chunk.choices[0].delta;
          if (delta && delta.content) {
            fullContent += delta.content;
          }
        }
      }
      console.log(`Full content: ${fullContent}`);
    } else {
      console.log(`Non-streaming output: ${completion.choices[0].message.content}`);
    }
  } catch (err) {
    console.error(`Error: ${err}`);
  }
}

main();

cURL

You can configure context for custom recognition using the text parameter in the System Message.

# ======= Important =======
# The following is the URL for the Singapore/US region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
# The API keys for the Singapore/US and Beijing regions are different. To get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
# === Delete this comment before execution ===

curl -X POST 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions' \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen3-asr-flash",
    "messages": [
        {
            "content": [
                {
                    "text": ""
                }
            ],
            "role": "system"
        },
        {
            "content": [
                {
                    "type": "input_audio",
                    "input_audio": {
                        "data": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
                    }
                }
            ],
            "role": "user"
        }
    ],
    "stream":false,
    "asr_options": {
        "enable_itn": false
    }
}'

Input: Base64-encoded audio file

You can input Base64-encoded data (Data URL) in the following format: data:<mediatype>;base64,<data>.

  • <mediatype>: The Multipurpose Internet Mail Extensions (MIME) type.

    This parameter varies depending on the audio format. For example:

    • WAV: audio/wav

    • MP3: audio/mpeg

    • M4A: audio/mp4

  • <data>: The Base64-encoded string of the audio.

    Base64 encoding increases the file size. Control the original file size to ensure that the encoded data does not exceed the input audio size limit of 10 MB.

  • Example: data:audio/wav;base64,SUQzBAAAAAAAI1RTU0UAAAAPAAADTGF2ZjU4LjI5LjEwMAAAAAAAAAAAAAAA//PAxABQ/BXRbMPe4IQAhl9

    Sample code for building a Data URL:

    Python

    import base64, pathlib
    
    # input.mp3 is the local audio file to be recognized. Replace it with the path to your audio file and make sure it meets the audio requirements.
    file_path = pathlib.Path("input.mp3")
    base64_str = base64.b64encode(file_path.read_bytes()).decode()
    data_uri = f"data:audio/mpeg;base64,{base64_str}"

    Java

    import java.nio.file.*;
    import java.util.Base64;
    
    public class Main {
        /**
         * filePath is the local audio file to be recognized. Replace it with the path to your audio file and make sure it meets the audio requirements.
         */
        public static String toDataUrl(String filePath) throws Exception {
            byte[] bytes = Files.readAllBytes(Paths.get(filePath));
            String encoded = Base64.getEncoder().encodeToString(bytes);
            return "data:audio/mpeg;base64," + encoded;
        }
    
        // Example usage
        public static void main(String[] args) throws Exception {
            System.out.println(toDataUrl("input.mp3"));
        }
    }
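Because Base64 encoding inflates the payload by roughly one third, it can help to check the encoded size against the 10 MB limit before you send the request. The following is a minimal sketch; input.mp3 is a placeholder path.

import base64
import pathlib

# Encode the audio file and report the size of the resulting Base64 string.
file_path = pathlib.Path("input.mp3")  # Replace with your audio file.
encoded = base64.b64encode(file_path.read_bytes()).decode()
encoded_mb = len(encoded) / (1024 * 1024)
print(f"Encoded size: {encoded_mb:.2f} MB")
if encoded_mb > 10:
    print("The encoded audio exceeds the 10 MB limit. Trim or compress the file first.")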

Python SDK

The audio file used in the example is: welcome.mp3.

import base64
from openai import OpenAI
import os
import pathlib

try:
    # Replace with the actual path to your audio file.
    file_path = "welcome.mp3"
    # Replace with the actual MIME type of your audio file.
    audio_mime_type = "audio/mpeg"

    file_path_obj = pathlib.Path(file_path)
    if not file_path_obj.exists():
        raise FileNotFoundError(f"Audio file not found: {file_path}")

    base64_str = base64.b64encode(file_path_obj.read_bytes()).decode()
    data_uri = f"data:{audio_mime_type};base64,{base64_str}"

    client = OpenAI(
        # The API keys for the Singapore/US and Beijing regions are different. To get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
        # If you have not configured the environment variable, replace the following line with your Model Studio API key: api_key = "sk-xxx",
        api_key=os.getenv("DASHSCOPE_API_KEY"),
        # The following is the URL for the Singapore/US region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1
        base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
    )
    

    stream_enabled = False  # Specifies whether to enable streaming output.
    completion = client.chat.completions.create(
        model="qwen3-asr-flash",
        messages=[
            {
                # Configure the context for custom recognition.
                "content": [{
                    "text": ""
                }],
                "role": "system"
            },
            {
                "content": [
                    {
                        "type": "input_audio",
                        "input_audio": {
                            "data": data_uri
                        }
                    }
                ],
                "role": "user"
            }
        ],
        stream=stream_enabled,
        # When stream is set to false, do not set the stream_options parameter.
        # stream_options={"include_usage": True},
        extra_body={
            "asr_options": {
                # "language": "zh",
                "enable_itn": False
            }
        }
    )
    if stream_enabled:
        full_content = ""
        print("Streaming output:")
        for chunk in completion:
            # If stream_options.include_usage is true, the choices field of the last chunk is an empty list and must be skipped. You can get token usage from chunk.usage.
            print(chunk)
            if chunk.choices and chunk.choices[0].delta.content:
                full_content += chunk.choices[0].delta.content
        print(f"Full content: {full_content}")
    else:
        print(f"Non-streaming output: {completion.choices[0].message.content}")
except Exception as e:
    print(f"Error: {e}")

Node.js SDK

The audio file used in the example is: welcome.mp3.

// Preparations:
// For Windows/Mac/Linux:
// 1. Make sure Node.js is installed (version >= 14 is recommended).
// 2. Run the following command to install the necessary dependencies: npm install openai

import OpenAI from "openai";
import { readFileSync } from 'fs';

const client = new OpenAI({
  // The API keys for the Singapore/US and Beijing regions are different. To get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
  // If you have not configured the environment variable, replace the following line with your Model Studio API key: apiKey: "sk-xxx",
  apiKey: process.env.DASHSCOPE_API_KEY,
  // The following is the URL for the Singapore/US region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1
  baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1", 
});

const encodeAudioFile = (audioFilePath) => {
    const audioFile = readFileSync(audioFilePath);
    return audioFile.toString('base64');
};

// Replace with the actual path to your audio file.
const dataUri = `data:audio/mpeg;base64,${encodeAudioFile("welcome.mp3")}`;

async function main() {
  try {
    const streamEnabled = false; // Specifies whether to enable streaming output.
    const completion = await client.chat.completions.create({
      model: "qwen3-asr-flash",
      messages: [
        {
          role: "system",
          content: [
            { text: "" }
          ]
        },
        {
          role: "user",
          content: [
            {
              type: "input_audio",
              input_audio: {
                data: dataUri
              }
            }
          ]
        }
      ],
      stream: streamEnabled,
      // When stream is set to false, do not set the stream_options parameter.
      // stream_options: {
      //   "include_usage": true
      // },
      extra_body: {
        asr_options: {
          // language: "zh",
          enable_itn: false
        }
      }
    });

    if (streamEnabled) {
      let fullContent = "";
      console.log("Streaming output:");
      for await (const chunk of completion) {
        console.log(JSON.stringify(chunk));
        if (chunk.choices && chunk.choices.length > 0) {
          const delta = chunk.choices[0].delta;
          if (delta && delta.content) {
            fullContent += delta.content;
          }
        }
      }
      console.log(`Full content: ${fullContent}`);
    } else {
      console.log(`Non-streaming output: ${completion.choices[0].message.content}`);
    }
  } catch (err) {
    console.error(`Error: ${err}`);
  }
}

main();

Core usage: context biasing

Qwen3-ASR supports context biasing, which lets you provide context to improve the recognition of domain-specific vocabulary, such as names, places, and product terms. This feature significantly improves transcription accuracy and is more flexible and powerful than traditional hotword solutions.

Length limit: The context cannot exceed 10,000 tokens.

Usage: When you call the API, pass the context text in the text parameter of the System Message. A code sketch follows the example below.

Supported text types: Supported text types include, but are not limited to, the following:

  • Hotword lists that use various separators (for example, hotword 1, hotword 2, hotword 3, hotword 4)

  • Text paragraphs or chapters in any format

  • Mixed content: Any combination of word lists and paragraphs

  • Irrelevant or meaningless text (including garbled characters). The model has high fault tolerance and is unlikely to be negatively affected by irrelevant text.

Example:

In this example, the correct transcription of an audio segment is "What jargon from the investment banking industry do you know? First, the nine major foreign investment banks, the Bulge Bracket, BB...".

Without context biasing

Without context biasing, the model incorrectly recognizes some investment bank names. For example, "Bulge Bracket" is recognized as "Bird Rock".

Recognition result: "What jargon from the investment banking industry do you know? First, the nine major foreign investment banks, the Bird Rock, BB..."

With context biasing

With context biasing, the model correctly recognizes the investment bank names.

Recognition result: "What jargon from the investment banking industry do you know? First, the nine major foreign investment banks, the Bulge Bracket, BB..."

To achieve this result, you can add any of the following content to the context:

  • Word lists:

    • Word list 1:

      Bulge Bracket, Boutique, Middle Market, domestic securities firms
    • Word list 2:

      Bulge Bracket Boutique Middle Market domestic securities firms
    • Word list 3:

      ['Bulge Bracket', 'Boutique', 'Middle Market', 'domestic securities firms']
  • Natural language:

    The secrets of investment banking categories revealed!
    Recently, many friends from Australia have asked me, what exactly is an investment bank? Today, I'll explain it. For international students, investment banks can be mainly divided into four categories: Bulge Bracket, Boutique, Middle Market, and domestic securities firms.
    Bulge Bracket investment banks: These are what we often call the nine major investment banks, including Goldman Sachs, Morgan Stanley, etc. These large banks are enormous in both business scope and scale.
    Boutique investment banks: These banks are relatively small but highly specialized in their business areas. For example, Lazard, Evercore, etc., have deep expertise and experience in specific fields.
    Middle Market investment banks: This type of bank mainly serves medium-sized companies, providing services such as mergers and acquisitions, and IPOs. Although not as large as the major banks, they have a high influence in specific markets.
    Domestic securities firms: With the rise of the Chinese market, domestic securities firms are also playing an increasingly important role in the international market.
    In addition, there are some Position and business divisions, you can refer to the relevant charts. I hope this information helps you better understand investment banking and prepare for your future career!
  • Natural language with irrelevant text: The context can contain irrelevant text. The following example includes irrelevant names.

    The secrets of investment banking categories revealed!
    Recently, many friends from Australia have asked me, what exactly is an investment bank? Today, I'll explain it. For international students, investment banks can be mainly divided into four categories: Bulge Bracket, Boutique, Middle Market, and domestic securities firms.
    Bulge Bracket investment banks: These are what we often call the nine major investment banks, including Goldman Sachs, Morgan Stanley, etc. These large banks are enormous in both business scope and scale.
    Boutique investment banks: These banks are relatively small but highly specialized in their business areas. For example, Lazard, Evercore, etc., have deep expertise and experience in specific fields.
    Middle Market investment banks: This type of bank mainly serves medium-sized companies, providing services such as mergers and acquisitions, and IPOs. Although not as large as the major banks, they have a high influence in specific markets.
    Domestic securities firms: With the rise of the Chinese market, domestic securities firms are also playing an increasingly important role in the international market.
    In addition, there are some Position and business divisions, you can refer to the relevant charts. I hope this information helps you better understand investment banking and prepare for your future career!
    Wang Haoxuan, Li Zihan, Zhang Jingxing, Liu Xinyi, Chen Junjie, Yang Siyuan, Zhao Yutong, Huang Zhiqiang, Zhou Zimo, Wu Yajing, Xu Ruoxi, Sun Haoran, Hu Jinyu, Zhu Chenxi, Guo Wenbo, He Jingshu, Gao Yuhang, Lin Yifei, 
    Zheng Xiaoyan, Liang Bowen, Luo Jiaqi, Song Mingzhe, Xie Wanting, Tang Ziqian, Han Mengyao, Feng Yiran, Cao Qinxue, Deng Zirui, Xiao Wangshu, Xu Jiashu, 
    Cheng Yinuo, Yuan Zhiruo, Peng Haoyu, Dong Simiao, Fan Jingyu, Su Zijin, Lv Wenxuan, Jiang Shihan, Ding Muchen, 
    Wei Shuyao, Ren Tianyou, Jiang Yichen, Hua Qingyu, Shen Xinghe, Fu Jinyu, Yao Xingchen, Zhong Lingyu, Yan Licheng, Jin Ruoshui, Taoranting, Qi Shaoshang, Xue Zhilan, Zou Yunfan, Xiong Ziang, Bai Wenfeng, Yi Qianfan
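For reference, the following is a minimal sketch of passing such context through the system message. It reuses the OpenAI-compatible Python setup from the examples above; the context string is Word list 1 from this example, and the audio URL is the sample file used earlier rather than the investment-banking recording.

import os
from openai import OpenAI

client = OpenAI(
    # If you have not configured the environment variable, replace the following line with your Model Studio API key.
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # Singapore/US endpoint. For the Beijing region, use https://dashscope.aliyuncs.com/compatible-mode/v1
    base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
)

# Context biasing: place the word list (or any other context text) in the text field of the system message.
context = "Bulge Bracket, Boutique, Middle Market, domestic securities firms"

completion = client.chat.completions.create(
    model="qwen3-asr-flash",
    messages=[
        {"role": "system", "content": [{"text": context}]},
        {
            "role": "user",
            "content": [
                {
                    "type": "input_audio",
                    "input_audio": {
                        "data": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
                    },
                }
            ],
        },
    ],
)
print(completion.choices[0].message.content)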

API reference

Qwen API reference: Recognizing audio files

Model feature comparison

The features listed below for the qwen3-asr-flash and qwen3-asr-flash-2025-09-08 models also apply to their corresponding models in the US (Virginia) region: qwen3-asr-flash-us and qwen3-asr-flash-2025-09-08-us. In the following list, qwen3-asr-flash-filetrans also covers its snapshot qwen3-asr-flash-filetrans-2025-11-17, and qwen3-asr-flash also covers its snapshot qwen3-asr-flash-2025-09-08.

  • Supported languages: Chinese (Mandarin, Sichuanese, Minnan, Wu, and Cantonese), English, Japanese, German, Korean, Russian, French, Portuguese, Arabic, Italian, Spanish, Hindi, Indonesian, Thai, Turkish, Ukrainian, Vietnamese, Czech, Danish, Filipino, Finnish, Icelandic, Malay, Norwegian, Polish, and Swedish

  • Supported audio formats:

    • qwen3-asr-flash-filetrans: aac, amr, avi, flac, flv, m4a, mkv, mov, mp3, mp4, mpeg, ogg, opus, wav, webm, wma, wmv

    • qwen3-asr-flash: aac, amr, avi, aiff, flac, flv, m4a, mkv, mp3, mpeg, ogg, opus, wav, webm, wma, wmv

  • Sample rate: Any

  • Channels: Any. The models handle multi-channel audio differently:

    • qwen3-asr-flash-filetrans: You must specify the audio track index by using the channel_id parameter.

    • qwen3-asr-flash: No extra processing is needed. The model averages and merges the multi-channel audio before processing.

  • Input format:

    • qwen3-asr-flash-filetrans: Publicly accessible URL of the file to be recognized

    • qwen3-asr-flash: Base64-encoded file, absolute path of a local file, or publicly accessible URL of the file to be recognized

  • Audio size/duration:

    • qwen3-asr-flash-filetrans: File size up to 2 GB, duration up to 12 hours

    • qwen3-asr-flash: File size up to 10 MB, duration up to 5 minutes

  • Emotion recognition: Supported (always on)

  • Timestamp:

    • qwen3-asr-flash-filetrans: Supported (always on)

    • qwen3-asr-flash: Not supported

  • Punctuation prediction: Supported (always on)

  • Context biasing: Supported (configurable)

  • ITN: Supported (off by default; can be enabled with enable_itn). Applies only to Chinese and English.

  • Singing voice recognition: Supported (always on)

  • Noise rejection: Supported (always on)

  • Sensitive word filtering: Not supported

  • Speaker diarization: Not supported

  • Filler word filtering: Not supported

  • VAD (voice activity detection):

    • qwen3-asr-flash-filetrans: Supported (always on)

    • qwen3-asr-flash: Not supported

  • Rate limit (RPM): 100

  • Connection type:

    • qwen3-asr-flash-filetrans: DashScope (Java/Python SDK, RESTful API)

    • qwen3-asr-flash: DashScope (Java/Python SDK, RESTful API), OpenAI-compatible (Python/Node.js SDK, RESTful API)

  • Price:

    • International (Singapore): $0.000035/second

    • US (Virginia): $0.000032/second

    • Chinese mainland (Beijing): $0.000032/second

FAQ

Q: How do I provide a publicly accessible audio URL for the API?

You can upload the audio file to Alibaba Cloud Object Storage Service (OSS), a highly available and reliable storage service, and then generate a publicly accessible URL for the uploaded object.
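For example, the following is a minimal sketch that uses the oss2 Python SDK to upload a recording and generate a time-limited signed URL. The endpoint, bucket name, object key, and file name below are placeholders; a public-read bucket with a plain object URL works as well.

import os
import oss2

# Placeholders: replace the endpoint, bucket name, object key, and local file name with your own values.
auth = oss2.Auth(os.getenv("OSS_ACCESS_KEY_ID"), os.getenv("OSS_ACCESS_KEY_SECRET"))
bucket = oss2.Bucket(auth, "https://oss-ap-southeast-1.aliyuncs.com", "my-audio-bucket")

# Upload the local recording, then generate a signed URL that stays valid for one hour.
bucket.put_object_from_file("meeting.mp3", "local_meeting.mp3")
audio_url = bucket.sign_url("GET", "meeting.mp3", 3600)
print(audio_url)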

To verify that the generated URL is publicly accessible, open it in a browser or request it with curl and confirm that the audio file downloads or plays successfully. A successful request returns an HTTP 200 status code.
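For example, the following curl command downloads the file, discards the body, and prints only the HTTP status code; the URL is a placeholder.

# Expect 200 in the output. -L follows redirects, -o /dev/null discards the downloaded data.
curl -sSL -o /dev/null -w "%{http_code}\n" "https://my-audio-bucket.oss-ap-southeast-1.aliyuncs.com/meeting.mp3"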

Q: How do I check if the audio format meets the requirements?

You can use the open-source tool ffprobe to quickly obtain detailed information about the audio:

# Query the audio container format (format_name), encoding (codec_name), sample rate (sample_rate), and number of channels (channels).
ffprobe -v error -show_entries format=format_name -show_entries stream=codec_name,sample_rate,channels -of default=noprint_wrappers=1 your_audio_file.mp3

Q: How do I process audio to meet the model's requirements?

You can use the open-source tool FFmpeg to crop or convert audio:

  • Crop audio: Extract a clip from a long audio file

    # -i: input file
    # -ss 00:01:30: Set the crop start time (starts at 1 minute and 30 seconds).
    # -t 00:02:00: Set the crop duration (2 minutes).
    # -c copy: Directly copy the audio stream without re-encoding for faster processing.
    # output_clip.wav: output file
    ffmpeg -i long_audio.wav -ss 00:01:30 -t 00:02:00 -c copy output_clip.wav
  • Format conversion

    For example, you can convert any audio file to a 16 kHz, 16-bit, mono WAV file.

    # -i: input file
    # -ac 1: Set the number of audio channels to 1 (mono).
    # -ar 16000: Set the sample rate to 16000 Hz (16 kHz).
    # -sample_fmt s16: Set the sample format to 16-bit signed integer PCM.
    # output.wav: output file
    ffmpeg -i input.mp3 -ac 1 -ar 16000 -sample_fmt s16 output.wav