Alibaba Cloud Model Studio: Audio file recognition - Qwen

Last Updated: Mar 15, 2026

Qwen's audio file recognition converts recorded audio to text, with support for multiple languages, singing voice transcription, and noise rejection.

Core features

  • Multilingual recognition: Supports multiple languages including Mandarin and dialects such as Cantonese and Sichuanese.

  • Adaptation to complex environments: Handles complex acoustic environments, with automatic language detection and intelligent filtering of non-human sounds.

  • Singing voice recognition: Transcribes entire songs, even with background music (BGM).

  • Emotion recognition: Recognizes multiple emotional states including surprise, calm, happiness, sadness, disgust, anger, and fear.

Availability

Qwen provides two core models:

  • Qwen3-ASR-Flash-Filetrans: Designed for asynchronous recognition of long audio files up to 12 hours. Ideal for transcribing meetings and interviews.

  • Qwen3-ASR-Flash: Designed for synchronous or streaming recognition of short audio files up to 5 minutes. Ideal for voice messaging and real-time captions.

International

International deployment mode: Endpoint and data in Singapore region. Compute resources scheduled globally (excluding Chinese mainland).

Use an API key from the Singapore region. Available models:

  • Qwen3-ASR-Flash-Filetrans: qwen3-asr-flash-filetrans (stable, currently qwen3-asr-flash-filetrans-2025-11-17), qwen3-asr-flash-filetrans-2025-11-17 (snapshot)

  • Qwen3-ASR-Flash: qwen3-asr-flash (stable, currently qwen3-asr-flash-2025-09-08), qwen3-asr-flash-2026-02-10 (latest snapshot), qwen3-asr-flash-2025-09-08 (snapshot)

US

US deployment mode: Endpoint and data in US (Virginia) region. Compute resources restricted to US.

Use an API key from the US region. Available models:

Qwen3-ASR-Flash: qwen3-asr-flash-us (stable, currently qwen3-asr-flash-2025-09-08-us), qwen3-asr-flash-2025-09-08-us (snapshot)

Chinese Mainland

Chinese mainland deployment mode: Endpoint and data in Beijing region. Compute resources restricted to Chinese mainland.

Use an API key from the Beijing region. Available models:

  • Qwen3-ASR-Flash-Filetrans: qwen3-asr-flash-filetrans (stable, currently qwen3-asr-flash-filetrans-2025-11-17), qwen3-asr-flash-filetrans-2025-11-17 (snapshot)

  • Qwen3-ASR-Flash: qwen3-asr-flash (stable, currently qwen3-asr-flash-2025-09-08), qwen3-asr-flash-2026-02-10 (latest snapshot), qwen3-asr-flash-2025-09-08 (snapshot)

See Model list.

Model selection

| Scenario | Recommended model | Reason | Notes |
| --- | --- | --- | --- |
| Long audio recognition | qwen3-asr-flash-filetrans | Supports recordings up to 12 hours, with emotion recognition and timestamps for indexing and analysis. | File size cannot exceed 2 GB; duration cannot exceed 12 hours. |
| Short audio recognition | qwen3-asr-flash or qwen3-asr-flash-us | Low-latency recognition of short audio. | File size cannot exceed 10 MB; duration cannot exceed 5 minutes. |
| Customer service quality inspection | qwen3-asr-flash-filetrans, qwen3-asr-flash, or qwen3-asr-flash-us | Can analyze customer emotions. | No sensitive-word filtering or speaker diarization. Select a model by audio duration. |
| Caption generation for news or interviews | qwen3-asr-flash-filetrans | Long audio with punctuation and timestamps for structured captions. | Requires post-processing to produce standard subtitle files. Select a model by audio duration. |
| Multilingual video localization | qwen3-asr-flash-filetrans, qwen3-asr-flash, or qwen3-asr-flash-us | Covers multiple languages and dialects, suitable for cross-language caption production. | Select a model by audio duration. |
| Singing audio analysis | qwen3-asr-flash-filetrans, qwen3-asr-flash, or qwen3-asr-flash-us | Recognizes lyrics and analyzes emotions, suitable for song indexing and recommendations. | Select a model by audio duration. |

See Compare models.
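The caption-generation scenario above notes that the API's timestamped results need post-processing to become standard subtitle files. A minimal sketch that turns (start, end, text) segments in milliseconds into SRT text; the segment tuple shape is an assumption for illustration, so map your actual response fields into it:

```python
def ms_to_srt_time(ms: int) -> str:
    """Format milliseconds as an SRT timestamp, HH:MM:SS,mmm."""
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, milli = divmod(rem, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{milli:03d}"

def to_srt(segments) -> str:
    """Build SRT text from an iterable of (start_ms, end_ms, text) tuples."""
    blocks = []
    for i, (start, end, text) in enumerate(segments, 1):
        blocks.append(f"{i}\n{ms_to_srt_time(start)} --> {ms_to_srt_time(end)}\n{text}\n")
    return "\n".join(blocks)

print(to_srt([(0, 1500, "Welcome."), (1500, 4200, "Let's begin.")]))
```

Write the returned string to a `.srt` file to get a subtitle file most players accept.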

Getting started

Get an API key and install the latest SDK.

DashScope

Qwen3-ASR-Flash-Filetrans

Asynchronous transcription of audio files up to 12 hours long. Requires a publicly accessible URL; local file uploads are not supported. This non-streaming API returns the complete result after the task finishes.

cURL

Submit a task to get a task_id, then use it to retrieve the result.

Submit a task

# IMPORTANT: Singapore region URL. For Beijing: https://dashscope.aliyuncs.com/api/v1/services/audio/asr/transcription
# API keys for Beijing and Singapore regions are different. Get keys: https://www.alibabacloud.com/help/en/model-studio/get-api-key
# Production: Use environment variables, not hardcoded keys

curl -X POST 'https://dashscope-intl.aliyuncs.com/api/v1/services/audio/asr/transcription' \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-H "X-DashScope-Async: enable" \
-d '{
    "model": "qwen3-asr-flash-filetrans",
    "input": {
        "file_url": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
    },
    "parameters": {
        "channel_id":[
            0
        ], 
        "enable_itn": false,
        "enable_words": true
    }
}'

Get the task result

# IMPORTANT: Singapore region URL. For Beijing: https://dashscope.aliyuncs.com/api/v1/tasks/{task_id}
# API keys for Beijing and Singapore regions are different. Get keys: https://www.alibabacloud.com/help/en/model-studio/get-api-key
# Production: Use environment variables, not hardcoded keys

curl -X GET 'https://dashscope-intl.aliyuncs.com/api/v1/tasks/{task_id}' \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "X-DashScope-Async: enable" \
-H "Content-Type: application/json"

Complete example

Java

import com.google.gson.Gson;
import com.google.gson.annotations.SerializedName;
import okhttp3.*;

import java.io.IOException;
import java.util.concurrent.TimeUnit;

public class Main {
    // Singapore region URL. For Beijing: https://dashscope.aliyuncs.com/api/v1/services/audio/asr/transcription
    private static final String API_URL_SUBMIT = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/asr/transcription";
    // Singapore region URL. For Beijing: https://dashscope.aliyuncs.com/api/v1/tasks/
    private static final String API_URL_QUERY = "https://dashscope-intl.aliyuncs.com/api/v1/tasks/";
    private static final Gson gson = new Gson();

    public static void main(String[] args) {
        // API keys for Beijing and Singapore regions are different. Get keys: https://www.alibabacloud.com/help/en/model-studio/get-api-key
        // If not using environment variables, replace the line below with: String apiKey = "sk-xxx"
        String apiKey = System.getenv("DASHSCOPE_API_KEY");

        OkHttpClient client = new OkHttpClient();

        // 1. Submit task
        // To specify a known audio language, add e.g. "language": "zh" under "parameters".
        String payloadJson = """
                {
                    "model": "qwen3-asr-flash-filetrans",
                    "input": {
                        "file_url": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
                    },
                    "parameters": {
                        "channel_id": [0],
                        "enable_itn": false,
                        "enable_words": true
                    }
                }
                """;

        RequestBody body = RequestBody.create(payloadJson, MediaType.get("application/json; charset=utf-8"));
        Request submitRequest = new Request.Builder()
                .url(API_URL_SUBMIT)
                .addHeader("Authorization", "Bearer " + apiKey)
                .addHeader("Content-Type", "application/json")
                .addHeader("X-DashScope-Async", "enable")
                .post(body)
                .build();

        String taskId = null;

        try (Response response = client.newCall(submitRequest).execute()) {
            if (response.isSuccessful() && response.body() != null) {
                String respBody = response.body().string();
                ApiResponse apiResp = gson.fromJson(respBody, ApiResponse.class);
                if (apiResp.output != null) {
                    taskId = apiResp.output.taskId;
                    System.out.println("Task submitted. task_id: " + taskId);
                } else {
                    System.out.println("Submission response content: " + respBody);
                    return;
                }
            } else {
                System.out.println("Task submission failed! HTTP code: " + response.code());
                if (response.body() != null) {
                    System.out.println(response.body().string());
                }
                return;
            }
        } catch (IOException e) {
            e.printStackTrace();
            return;
        }

        // 2. Poll task status
        boolean finished = false;
        while (!finished) {
            try {
                TimeUnit.SECONDS.sleep(2);  // Wait 2 seconds before querying again
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }

            String queryUrl = API_URL_QUERY + taskId;
            Request queryRequest = new Request.Builder()
                    .url(queryUrl)
                    .addHeader("Authorization", "Bearer " + apiKey)
                    .addHeader("X-DashScope-Async", "enable")
                    .addHeader("Content-Type", "application/json")
                    .get()
                    .build();

            try (Response response = client.newCall(queryRequest).execute()) {
                if (response.body() != null) {
                    String queryResponse = response.body().string();
                    ApiResponse apiResp = gson.fromJson(queryResponse, ApiResponse.class);

                    if (apiResp.output != null && apiResp.output.taskStatus != null) {
                        String status = apiResp.output.taskStatus;
                        System.out.println("Current task status: " + status);
                        if ("SUCCEEDED".equalsIgnoreCase(status)
                                || "FAILED".equalsIgnoreCase(status)
                                || "UNKNOWN".equalsIgnoreCase(status)) {
                            finished = true;
                            System.out.println("Task completed. Final result: ");
                            System.out.println(queryResponse);
                        }
                    } else {
                        System.out.println("Query response content: " + queryResponse);
                    }
                }
            } catch (IOException e) {
                e.printStackTrace();
                return;
            }
        }
    }

    static class ApiResponse {
        @SerializedName("request_id")
        String requestId;
        Output output;
    }

    static class Output {
        @SerializedName("task_id")
        String taskId;
        @SerializedName("task_status")
        String taskStatus;
    }
}

Python

import os
import time
import requests
import json

# Singapore region URL. For Beijing: https://dashscope.aliyuncs.com/api/v1/services/audio/asr/transcription
API_URL_SUBMIT = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/asr/transcription"
# Singapore region URL. For Beijing: https://dashscope.aliyuncs.com/api/v1/tasks/
API_URL_QUERY_BASE = "https://dashscope-intl.aliyuncs.com/api/v1/tasks/"


def main():
    # API keys for Beijing and Singapore regions are different. Get keys: https://www.alibabacloud.com/help/en/model-studio/get-api-key
    # If not using environment variables, replace the line below with: api_key = "sk-xxx"
    # Production: Use environment variables, not hardcoded keys
    api_key = os.getenv("DASHSCOPE_API_KEY")

    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        "X-DashScope-Async": "enable"
    }

    # 1. Submit the task
    payload = {
        "model": "qwen3-asr-flash-filetrans",
        "input": {
            "file_url": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
        },
        "parameters": {
            "channel_id": [0],
            # "language": "zh",
            "enable_itn": False,
            "enable_words": True
        }
    }

    print("Submitting ASR transcription task...")
    try:
        submit_resp = requests.post(API_URL_SUBMIT, headers=headers, data=json.dumps(payload))
    except requests.RequestException as e:
        print(f"Failed to submit task request: {e}")
        return

    if submit_resp.status_code != 200:
        print(f"Task submission failed! HTTP code: {submit_resp.status_code}")
        print(submit_resp.text)
        return

    resp_data = submit_resp.json()
    output = resp_data.get("output")
    if not output or "task_id" not in output:
        print("Abnormal submission response content:", resp_data)
        return

    task_id = output["task_id"]
    print(f"Task submitted. task_id: {task_id}")

    # 2. Poll the task status
    finished = False
    while not finished:
        time.sleep(2)  # Wait 2 seconds before querying again

        query_url = API_URL_QUERY_BASE + task_id
        try:
            query_resp = requests.get(query_url, headers=headers)
        except requests.RequestException as e:
            print(f"Failed to query task: {e}")
            return

        if query_resp.status_code != 200:
            print(f"Task query failed! HTTP code: {query_resp.status_code}")
            print(query_resp.text)
            return

        query_data = query_resp.json()
        output = query_data.get("output")
        if output and "task_status" in output:
            status = output["task_status"]
            print(f"Current task status: {status}")

            if status.upper() in ("SUCCEEDED", "FAILED", "UNKNOWN"):
                finished = True
                print("Task completed. Final result:")
                print(json.dumps(query_data, indent=2, ensure_ascii=False))
        else:
            print("Query response content:", query_data)


if __name__ == "__main__":
    main()
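For qwen3-asr-flash-filetrans, the completed response typically points to a downloadable result file (a transcription URL) rather than inlining the text. Once that file is downloaded and parsed as JSON, the transcript can be collected; the `transcripts`/`text` field names below are assumptions about the result file's shape, so verify them against an actual result before relying on them. A minimal sketch:

```python
def collect_text(result_json: dict) -> str:
    """Join transcript texts from a downloaded transcription result.

    NOTE: the "transcripts" -> "text" layout is an assumption for
    illustration; check an actual downloaded result file.
    """
    return " ".join(t.get("text", "") for t in result_json.get("transcripts", []))

# Mocked result-file content, for illustration only:
mock = {"transcripts": [{"text": "Hello,"}, {"text": "welcome."}]}
print(collect_text(mock))  # Hello, welcome.
```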

Java SDK

import com.alibaba.dashscope.audio.qwen_asr.*;
import com.alibaba.dashscope.utils.Constants;
import com.google.gson.Gson;
import com.google.gson.GsonBuilder;
import com.google.gson.JsonObject;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.ArrayList;
import java.util.HashMap;

public class Main {
    public static void main(String[] args) {
        // Singapore region URL. For Beijing: https://dashscope.aliyuncs.com/api/v1
        Constants.baseHttpApiUrl = "https://dashscope-intl.aliyuncs.com/api/v1";
        QwenTranscriptionParam param =
                QwenTranscriptionParam.builder()
                        // API keys for Beijing and Singapore regions are different. Get keys: https://www.alibabacloud.com/help/en/model-studio/get-api-key
                        // If not using environment variables, replace the line below with: .apiKey("sk-xxx")
                        .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                        .model("qwen3-asr-flash-filetrans")
                        .fileUrl("https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/sensevoice/rich_text_example_1.wav")
                        //.parameter("language", "zh")
                        //.parameter("channel_id", new ArrayList<String>(){{add("0");add("1");}})
                        .parameter("enable_itn", false)
                        .parameter("enable_words", true)
                        .build();
        try {
            QwenTranscription transcription = new QwenTranscription();
            // Submit the task
            QwenTranscriptionResult result = transcription.asyncCall(param);
            System.out.println("create task result: " + result);
            // Query the task status
            result = transcription.fetch(QwenTranscriptionQueryParam.FromTranscriptionParam(param, result.getTaskId()));
            System.out.println("task status: " + result);
            // Wait for the task to complete
            result =
                    transcription.wait(
                            QwenTranscriptionQueryParam.FromTranscriptionParam(param, result.getTaskId()));
            System.out.println("task result: " + result);
            // Get the speech recognition result
            QwenTranscriptionTaskResult taskResult = result.getResult();
            if (taskResult != null) {
                // Get the URL of the recognition result
                String transcriptionUrl = taskResult.getTranscriptionUrl();
                // Get the result from the URL
                HttpURLConnection connection =
                        (HttpURLConnection) new URL(transcriptionUrl).openConnection();
                connection.setRequestMethod("GET");
                connection.connect();
                BufferedReader reader =
                        new BufferedReader(new InputStreamReader(connection.getInputStream()));
                // Format and print the JSON result
                Gson gson = new GsonBuilder().setPrettyPrinting().create();
                System.out.println(gson.toJson(gson.fromJson(reader, JsonObject.class)));
            }
        } catch (Exception e) {
            System.out.println("error: " + e);
        }
    }
}

Python SDK

import os

import dashscope
from dashscope.audio.qwen_asr import QwenTranscription


# run the transcription script
if __name__ == '__main__':
    # API keys for Beijing and Singapore regions are different. Get keys: https://www.alibabacloud.com/help/en/model-studio/get-api-key
    # If not using environment variables, replace the line below with: dashscope.api_key = "sk-xxx"
    # Production: Use environment variables, not hardcoded keys
    dashscope.api_key = os.getenv("DASHSCOPE_API_KEY")

    # Singapore region URL. For Beijing: https://dashscope.aliyuncs.com/api/v1
    dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
    task_response = QwenTranscription.async_call(
        model='qwen3-asr-flash-filetrans',
        file_url='https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/sensevoice/rich_text_example_1.wav',
        # language="zh",  # Optional. Specify a known audio language to improve accuracy.
        enable_itn=False,
        enable_words=True
    )
    print(f'task_response: {task_response}')
    print(task_response.output.task_id)
    query_response = QwenTranscription.fetch(task=task_response.output.task_id)
    print(f'query_response: {query_response}')
    task_result = QwenTranscription.wait(task=task_response.output.task_id)
    print(f'task_result: {task_result}')

Qwen3-ASR-Flash

Qwen3-ASR-Flash supports recordings up to 5 minutes long. This model accepts a publicly accessible audio file URL or a direct upload of a local file as input. It can also return recognition results as a stream.

Input: Audio file URL

Python SDK

import os
import dashscope

# Singapore region URL. For Beijing: https://dashscope.aliyuncs.com/api/v1. If you use a model in the US region, replace the URL with: https://dashscope-us.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

messages = [
    {"role": "user", "content": [{"audio": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"}]}
]

response = dashscope.MultiModalConversation.call(
    # API keys for Beijing and Singapore regions are different. Get keys: https://www.alibabacloud.com/help/en/model-studio/get-api-key
    # If not using environment variables, replace the line below with: api_key = "sk-xxx"
    # Production: Use environment variables, not hardcoded keys
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # If you use a model in the US region, add the "-us" suffix to the model name, for example, qwen3-asr-flash-us
    model="qwen3-asr-flash",
    messages=messages,
    result_format="message",
    asr_options={
        # "language": "zh", # Optional. Specify known audio language to improve accuracy.
        "enable_itn": False
    }
)
print(response)

Java SDK

import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;
import com.alibaba.dashscope.utils.JsonUtils;

public class Main {
    public static void simpleMultiModalConversationCall()
            throws ApiException, NoApiKeyException, UploadFileException {
        MultiModalConversation conv = new MultiModalConversation();
        MultiModalMessage userMessage = MultiModalMessage.builder()
                .role(Role.USER.getValue())
                .content(Arrays.asList(
                        Collections.singletonMap("audio", "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3")))
                .build();

        MultiModalMessage sysMessage = MultiModalMessage.builder().role(Role.SYSTEM.getValue())
                .build();

        Map<String, Object> asrOptions = new HashMap<>();
        asrOptions.put("enable_itn", false);
        // asrOptions.put("language", "zh"); // Optional. Specify known audio language to improve accuracy.
        MultiModalConversationParam param = MultiModalConversationParam.builder()
                // API keys for Beijing and Singapore regions are different. Get keys: https://www.alibabacloud.com/help/en/model-studio/get-api-key
                // If you have not configured environment variables, replace the following line with: .apiKey("sk-xxx")
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                // If you use a model in the US region, add the "-us" suffix to the model name, for example, qwen3-asr-flash-us
                .model("qwen3-asr-flash")
                .message(sysMessage)
                .message(userMessage)
                .parameter("asr_options", asrOptions)
                .build();
        MultiModalConversationResult result = conv.call(param);
        System.out.println(JsonUtils.toJson(result));
    }
    public static void main(String[] args) {
        try {
            // Singapore region URL. For Beijing: https://dashscope.aliyuncs.com/api/v1. If you use a model in the US region, replace the URL with: https://dashscope-us.aliyuncs.com/api/v1
            Constants.baseHttpApiUrl = "https://dashscope-intl.aliyuncs.com/api/v1";
            simpleMultiModalConversationCall();
        } catch (ApiException | NoApiKeyException | UploadFileException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}

cURL

# ======= Important =======
# Singapore region URL. US: https://dashscope-us.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation | Beijing: https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation
# API keys for Beijing and Singapore regions are different. Get keys: https://www.alibabacloud.com/help/en/model-studio/get-api-key
# If you use a model in the US region, add the "-us" suffix to the model name, for example, qwen3-asr-flash-us
# Production: Use environment variables, not hardcoded keys

curl -X POST "https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation" \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen3-asr-flash",
    "input": {
        "messages": [
            {
                "content": [
                    {
                        "text": ""
                    }
                ],
                "role": "system"
            },
            {
                "content": [
                    {
                        "audio": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
                    }
                ],
                "role": "user"
            }
        ]
    },
    "parameters": {
        "asr_options": {
            "enable_itn": false
        }
    }
}'

Input: Base64-encoded audio file

Provide Base64-encoded data as a data URL in the format: data:<mediatype>;base64,<data>.

  • <mediatype>: MIME type

    Varies by audio format, for example:

    • WAV: audio/wav

    • MP3: audio/mpeg

  • <data>: Base64-encoded string of the audio

    Base64 encoding increases file size. Keep the original file small enough so the encoded data stays within the 10 MB input limit.

  • Example: data:audio/wav;base64,SUQzBAAAAAAAI1RTU0UAAAAPAAADTGF2ZjU4LjI5LjEwMAAAAAAAAAAAAAAA//PAxABQ/BXRbMPe4IQAhl9

    See example code

    Python

    import base64, pathlib

    # input.mp3 is a local audio file. Replace it with your own audio file path and ensure it meets the audio requirements
    file_path = pathlib.Path("input.mp3")
    base64_str = base64.b64encode(file_path.read_bytes()).decode()
    data_uri = f"data:audio/mpeg;base64,{base64_str}"

    Java

    import java.nio.file.*;
    import java.util.Base64;

    public class Main {
        /**
         * filePath is a local audio file. Replace it with your own audio file path and ensure it meets the audio requirements
         */
        public static String toDataUrl(String filePath) throws Exception {
            byte[] bytes = Files.readAllBytes(Paths.get(filePath));
            String encoded = Base64.getEncoder().encodeToString(bytes);
            return "data:audio/mpeg;base64," + encoded;
        }

        // Usage example
        public static void main(String[] args) throws Exception {
            System.out.println(toDataUrl("input.mp3"));
        }
    }
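Because Base64 inflates a file to roughly 4/3 of its size, it helps to infer the MIME type and validate the encoded payload against the 10 MB input limit before calling the API. A convenience sketch using only the Python standard library (the 10 MB figure is the limit stated above; verify `mimetypes`' guess for uncommon formats):

```python
import base64
import mimetypes
import pathlib

MAX_INPUT_BYTES = 10 * 1024 * 1024  # 10 MB input limit for qwen3-asr-flash

def to_data_uri(path: str) -> str:
    """Build a data URI for an audio file, refusing oversized payloads."""
    p = pathlib.Path(path)
    mime, _ = mimetypes.guess_type(p.name)  # e.g. "audio/mpeg" for .mp3
    if mime is None:
        raise ValueError(f"Cannot infer MIME type for {path}")
    encoded = base64.b64encode(p.read_bytes()).decode()
    uri = f"data:{mime};base64,{encoded}"
    # The URI is pure ASCII, so character length equals byte length here.
    if len(uri) > MAX_INPUT_BYTES:
        raise ValueError(f"Encoded payload is {len(uri)} bytes, over the 10 MB limit")
    return uri
```

Pass the returned string as the `audio` content value, exactly as in the SDK examples above.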

Python SDK

The example uses the audio file: welcome.mp3.

import base64
import dashscope
import os
import pathlib

# Singapore region URL. For Beijing: https://dashscope.aliyuncs.com/api/v1. If you use a model in the US region, replace the URL with: https://dashscope-us.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

# Replace with your actual audio file path
file_path = "welcome.mp3"
# Replace with your actual audio file MIME type
audio_mime_type = "audio/mpeg"

file_path_obj = pathlib.Path(file_path)
if not file_path_obj.exists():
    raise FileNotFoundError(f"Audio file not found: {file_path}")

base64_str = base64.b64encode(file_path_obj.read_bytes()).decode()
data_uri = f"data:{audio_mime_type};base64,{base64_str}"

messages = [
    {"role": "user", "content": [{"audio": data_uri}]}
]
response = dashscope.MultiModalConversation.call(
    # API keys for Beijing and Singapore regions are different. Get keys: https://www.alibabacloud.com/help/en/model-studio/get-api-key
    # If not using environment variables, replace the line below with: api_key = "sk-xxx",
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # If you use a model in the US region, add the "-us" suffix to the model name, for example, qwen3-asr-flash-us
    model="qwen3-asr-flash",
    messages=messages,
    result_format="message",
    asr_options={
        # "language": "zh", # Optional. Specify known audio language to improve accuracy.
        "enable_itn": False
    }
)
print(response)

Java SDK

The example uses the audio file: welcome.mp3.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.*;

import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;
import com.alibaba.dashscope.utils.JsonUtils;

public class Main {
    // Replace with actual file path
    private static final String AUDIO_FILE = "welcome.mp3";
    // Replace with your actual audio file MIME type
    private static final String AUDIO_MIME_TYPE = "audio/mpeg";

    public static void simpleMultiModalConversationCall()
            throws ApiException, NoApiKeyException, UploadFileException, IOException {
        MultiModalConversation conv = new MultiModalConversation();
        MultiModalMessage userMessage = MultiModalMessage.builder()
                .role(Role.USER.getValue())
                .content(Arrays.asList(
                        Collections.singletonMap("audio", toDataUrl())))
                .build();

        MultiModalMessage sysMessage = MultiModalMessage.builder().role(Role.SYSTEM.getValue())
                .build();

        Map<String, Object> asrOptions = new HashMap<>();
        asrOptions.put("enable_itn", false);
        // asrOptions.put("language", "zh"); // Optional. Specify known audio language to improve accuracy.
        MultiModalConversationParam param = MultiModalConversationParam.builder()
                // API keys for Beijing and Singapore regions are different. Get keys: https://www.alibabacloud.com/help/en/model-studio/get-api-key
                // If you have not configured environment variables, replace the following line with: .apiKey("sk-xxx")
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                // If you use a model in the US region, add the "-us" suffix to the model name, for example, qwen3-asr-flash-us
                .model("qwen3-asr-flash")
                .message(sysMessage)
                .message(userMessage)
                .parameter("asr_options", asrOptions)
                .build();
        MultiModalConversationResult result = conv.call(param);
        System.out.println(JsonUtils.toJson(result));
    }

    public static void main(String[] args) {
        try {
            // Singapore region URL. For Beijing: https://dashscope.aliyuncs.com/api/v1. If you use a model in the US region, replace the URL with: https://dashscope-us.aliyuncs.com/api/v1
            Constants.baseHttpApiUrl = "https://dashscope-intl.aliyuncs.com/api/v1";
            simpleMultiModalConversationCall();
        } catch (ApiException | NoApiKeyException | UploadFileException | IOException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }

    // Generate data URI
    public static String toDataUrl() throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get(AUDIO_FILE));
        String encoded = Base64.getEncoder().encodeToString(bytes);
        return "data:" + AUDIO_MIME_TYPE + ";base64," + encoded;
    }
}

Input: Absolute path to local audio file

The DashScope SDK requires file-URI paths for local files. Construct the path according to your operating system and SDK:

| System | SDK | Input file path | Example |
| --- | --- | --- | --- |
| Linux or macOS | Python SDK / Java SDK | file://{absolute file path} | file:///home/images/test.png |
| Windows | Python SDK | file://{absolute file path} | file://D:/images/test.png |
| Windows | Java SDK | file:///{absolute file path} | file:///D:/images/test.png |
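The path formats in the table can be produced with a small helper that mirrors those conventions. This is a sketch following the table only, not an official SDK utility; the `system` and `sdk` parameters are illustrative names:

```python
def to_sdk_file_uri(path: str, system: str, sdk: str = "python") -> str:
    """Format an absolute local path as the table above describes.

    system: "windows", "linux", or "macos"; sdk: "python" or "java".
    On Windows the Python SDK expects two slashes (file://D:/...),
    while the Java SDK expects three (file:///D:/...).
    """
    posix = path.replace("\\", "/")  # forward slashes on every platform
    if system == "windows" and sdk == "java":
        return "file:///" + posix
    # Absolute POSIX paths start with "/", so this yields file:///home/...
    return "file://" + posix

print(to_sdk_file_uri("/home/images/test.png", "linux"))          # file:///home/images/test.png
print(to_sdk_file_uri("D:\\images\\test.png", "windows"))         # file://D:/images/test.png
print(to_sdk_file_uri("D:\\images\\test.png", "windows", "java")) # file:///D:/images/test.png
```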

Important

When using local files, the API call limit is 100 QPS and cannot be scaled. Do not use this method in production environments, high-concurrency scenarios, or stress testing. For higher concurrency, upload files to OSS and call the API using the audio file URL.

Python SDK

The example uses the audio file: welcome.mp3.

import os
import dashscope

# Singapore region URL. For Beijing: https://dashscope.aliyuncs.com/api/v1. If you use a model in the US region, replace the URL with: https://dashscope-us.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

# Replace ABSOLUTE_PATH with your local file path
audio_file_path = "file://ABSOLUTE_PATH/welcome.mp3"

messages = [
    {"role": "user", "content": [{"audio": audio_file_path}]}
]
response = dashscope.MultiModalConversation.call(
    # API keys for Beijing and Singapore regions are different. Get keys: https://www.alibabacloud.com/help/en/model-studio/get-api-key
    # If not using environment variables, replace the line below with: api_key = "sk-xxx"
    # Production: Use environment variables, not hardcoded keys
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # If you use a model in the US region, add the "-us" suffix to the model name, for example, qwen3-asr-flash-us
    model="qwen3-asr-flash",
    messages=messages,
    result_format="message",
    asr_options={
        # "language": "zh", # Optional. Specify known audio language to improve accuracy.
        "enable_itn": False
    }
)
print(response)

Java SDK

The example uses the audio file: welcome.mp3.

import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;
import com.alibaba.dashscope.utils.JsonUtils;

public class Main {
    public static void simpleMultiModalConversationCall()
            throws ApiException, NoApiKeyException, UploadFileException {
        // Replace ABSOLUTE_PATH with your local file path
        String localFilePath = "file://ABSOLUTE_PATH/welcome.mp3";
        MultiModalConversation conv = new MultiModalConversation();
        MultiModalMessage userMessage = MultiModalMessage.builder()
                .role(Role.USER.getValue())
                .content(Arrays.asList(
                        Collections.singletonMap("audio", localFilePath)))
                .build();

        MultiModalMessage sysMessage = MultiModalMessage.builder().role(Role.SYSTEM.getValue())
                .build();

        Map<String, Object> asrOptions = new HashMap<>();
        asrOptions.put("enable_itn", false);
        // asrOptions.put("language", "zh"); // Optional. Specify known audio language to improve accuracy.
        MultiModalConversationParam param = MultiModalConversationParam.builder()
                // API keys for Beijing and Singapore regions are different. Get keys: https://www.alibabacloud.com/help/en/model-studio/get-api-key
                // If you have not configured environment variables, replace the following line with: .apiKey("sk-xxx")
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                // If you use a model in the US region, add the "-us" suffix to the model name, for example, qwen3-asr-flash-us
                .model("qwen3-asr-flash")
                .message(sysMessage)
                .message(userMessage)
                .parameter("asr_options", asrOptions)
                .build();
        MultiModalConversationResult result = conv.call(param);
        System.out.println(JsonUtils.toJson(result));
    }
    public static void main(String[] args) {
        try {
            // Singapore region URL. For Beijing: https://dashscope.aliyuncs.com/api/v1. If you use a model in the US region, replace the URL with: https://dashscope-us.aliyuncs.com/api/v1
            Constants.baseHttpApiUrl = "https://dashscope-intl.aliyuncs.com/api/v1";
            simpleMultiModalConversationCall();
        } catch (ApiException | NoApiKeyException | UploadFileException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}

Streaming output

The model generates results incrementally rather than all at once. Non-streaming output waits until the model finishes generating and then returns the complete result. Streaming output returns intermediate results in real time, letting you read results as they are generated and reducing wait time. Set parameters differently based on your calling method to enable streaming output:

  • DashScope Python SDK: Set the stream parameter to True.

  • DashScope Java SDK: Use the streamCall interface.

  • DashScope HTTP: Set the request header X-DashScope-SSE to enable.

Python SDK

import os
import dashscope

# Singapore region URL. For Beijing: https://dashscope.aliyuncs.com/api/v1. If you use a model in the US region, replace the URL with: https://dashscope-us.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

messages = [
    {"role": "user", "content": [{"audio": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"}]}
]
response = dashscope.MultiModalConversation.call(
    # API keys for Beijing and Singapore regions are different. Get keys: https://www.alibabacloud.com/help/en/model-studio/get-api-key
    # If not using environment variables, replace the line below with: api_key = "sk-xxx"
    # Production: Use environment variables, not hardcoded keys
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # If you use a model in the US region, add the "-us" suffix to the model name, for example, qwen3-asr-flash-us
    model="qwen3-asr-flash",
    messages=messages,
    result_format="message",
    asr_options={
        # "language": "zh", # Optional. Specify known audio language to improve accuracy.
        "enable_itn": False
    },
    stream=True
)

for chunk in response:
    try:
        print(chunk["output"]["choices"][0]["message"].content[0]["text"])
    except (KeyError, IndexError, TypeError):
        # Skip chunks that carry no text content
        pass

Java SDK

import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;
import io.reactivex.Flowable;

public class Main {
    public static void simpleMultiModalConversationCall()
            throws ApiException, NoApiKeyException, UploadFileException {
        MultiModalConversation conv = new MultiModalConversation();
        MultiModalMessage userMessage = MultiModalMessage.builder()
                .role(Role.USER.getValue())
                .content(Arrays.asList(
                        Collections.singletonMap("audio", "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3")))
                .build();

        MultiModalMessage sysMessage = MultiModalMessage.builder().role(Role.SYSTEM.getValue())
                .build();

        Map<String, Object> asrOptions = new HashMap<>();
        asrOptions.put("enable_itn", false);
        // asrOptions.put("language", "zh"); // Optional. Specify known audio language to improve accuracy.
        MultiModalConversationParam param = MultiModalConversationParam.builder()
                // API keys for Beijing and Singapore regions are different. Get keys: https://www.alibabacloud.com/help/en/model-studio/get-api-key
                // If you have not configured environment variables, replace the following line with: .apiKey("sk-xxx")
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                // If you use a model in the US region, add the "-us" suffix to the model name, for example, qwen3-asr-flash-us
                .model("qwen3-asr-flash")
                .message(sysMessage)
                .message(userMessage)
                .parameter("asr_options", asrOptions)
                .build();
        Flowable<MultiModalConversationResult> resultFlowable = conv.streamCall(param);
        resultFlowable.blockingForEach(item -> {
            try {
                System.out.println(item.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
            } catch (Exception e){
                System.exit(0);
            }
        });
    }

    public static void main(String[] args) {
        try {
            // Singapore region URL. For Beijing: https://dashscope.aliyuncs.com/api/v1. If you use a model in the US region, replace the URL with: https://dashscope-us.aliyuncs.com/api/v1
            Constants.baseHttpApiUrl = "https://dashscope-intl.aliyuncs.com/api/v1";
            simpleMultiModalConversationCall();
        } catch (ApiException | NoApiKeyException | UploadFileException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}

cURL

# ======= Important =======
# Singapore region URL. US: https://dashscope-us.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation | Beijing: https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation
# API keys for Beijing and Singapore regions are different. Get keys: https://www.alibabacloud.com/help/en/model-studio/get-api-key
# If you use a model in the US region, add the "-us" suffix to the model name, for example, qwen3-asr-flash-us
# Production: Use environment variables, not hardcoded keys

curl -X POST "https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation" \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-H "X-DashScope-SSE: enable" \
-d '{
    "model": "qwen3-asr-flash",
    "input": {
        "messages": [
            {
                "content": [
                    {
                        "text": ""
                    }
                ],
                "role": "system"
            },
            {
                "content": [
                    {
                        "audio": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
                    }
                ],
                "role": "user"
            }
        ]
    },
    "parameters": {
        "incremental_output": true,
        "asr_options": {
            "enable_itn": false
        }
    }
}'

OpenAI compatible

Important

US region: No OpenAI-compatible mode support.

Supports the Qwen3-ASR-Flash series only. OpenAI-compatible mode accepts public URLs and Base64-encoded data URLs, not local file paths.

Use OpenAI Python SDK version 1.52.0 or later, and Node.js SDK version 4.68.0 or later.

The asr_options parameter is not part of the OpenAI standard. When using the OpenAI SDK, pass it through extra_body.

Input: Audio file URL

Python SDK

from openai import OpenAI
import os

try:
    client = OpenAI(
        # API keys for Beijing and Singapore regions are different. Get keys: https://www.alibabacloud.com/help/en/model-studio/get-api-key
        # If not using environment variables, replace the line below with: api_key = "sk-xxx",
        # Production: Use environment variables, not hardcoded keys
        api_key=os.getenv("DASHSCOPE_API_KEY"),
        # Singapore/US region URL. For Beijing: https://dashscope.aliyuncs.com/compatible-mode/v1
        base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
    )
    

    stream_enabled = False  # Set to True to enable streaming output
    completion = client.chat.completions.create(
        model="qwen3-asr-flash",
        messages=[
            {
                "content": [
                    {
                        "type": "input_audio",
                        "input_audio": {
                            "data": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
                        }
                    }
                ],
                "role": "user"
            }
        ],
        stream=stream_enabled,
        # Do not set stream_options when stream is False
        # stream_options={"include_usage": True},
        extra_body={
            "asr_options": {
                # "language": "zh",
                "enable_itn": False
            }
        }
    )
    if stream_enabled:
        full_content = ""
        print("Streaming output:")
        for chunk in completion:
            # With stream_options.include_usage=True, skip last chunk's empty choices (usage in chunk.usage)
            print(chunk)
            if chunk.choices and chunk.choices[0].delta.content:
                full_content += chunk.choices[0].delta.content
        print(f"Full content: {full_content}")
    else:
        print(f"Non-streaming output: {completion.choices[0].message.content}")
except Exception as e:
    print(f"Error: {e}")

Node.js SDK

// Prerequisites (Windows/Mac/Linux):
// 1. Node.js (version ≥ 14)
// 2. npm install openai

import OpenAI from "openai";

const client = new OpenAI({
  // API keys for Beijing and Singapore regions are different. Get keys: https://www.alibabacloud.com/help/en/model-studio/get-api-key
  // If not using environment variables, replace the line below with: apiKey: "sk-xxx",
  apiKey: process.env.DASHSCOPE_API_KEY,
  // Singapore/US region URL. For Beijing: https://dashscope.aliyuncs.com/compatible-mode/v1
  baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1", 
});

async function main() {
  try {
    const streamEnabled = false; // Set to true to enable streaming output
    const completion = await client.chat.completions.create({
      model: "qwen3-asr-flash",
      messages: [
        {
          role: "user",
          content: [
            {
              type: "input_audio",
              input_audio: {
                data: "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
              }
            }
          ]
        }
      ],
      stream: streamEnabled,
      // Do not set stream_options when stream is False
      // stream_options: {
      //   "include_usage": true
      // },
      extra_body: {
        asr_options: {
          // language: "zh",
          enable_itn: false
        }
      }
    });

    if (streamEnabled) {
      let fullContent = "";
      console.log("Streaming output:");
      for await (const chunk of completion) {
        console.log(JSON.stringify(chunk));
        if (chunk.choices && chunk.choices.length > 0) {
          const delta = chunk.choices[0].delta;
          if (delta && delta.content) {
            fullContent += delta.content;
          }
        }
      }
      console.log(`Full content: ${fullContent}`);
    } else {
      console.log(`Non-streaming output: ${completion.choices[0].message.content}`);
    }
  } catch (err) {
    console.error(`Error: ${err}`);
  }
}

main();

cURL

# ======= Important =======
# Singapore/US region URL. For Beijing: https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
# API keys for Beijing and Singapore regions are different. Get keys: https://www.alibabacloud.com/help/en/model-studio/get-api-key
# Production: Use environment variables, not hardcoded keys

curl -X POST 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions' \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen3-asr-flash",
    "messages": [
        {
            "content": [
                {
                    "type": "input_audio",
                    "input_audio": {
                        "data": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
                    }
                }
            ],
            "role": "user"
        }
    ],
    "stream":false,
    "asr_options": {
        "enable_itn": false
    }
}'

Input: Base64-encoded audio file

Provide Base64-encoded data as a data URL in the format data:<mediatype>;base64,<data>.

  • <mediatype>: MIME type

    Varies by audio format, for example:

    • WAV: audio/wav

    • MP3: audio/mpeg

  • <data>: Base64-encoded string of the audio

    Base64 encoding increases file size. Keep the original file small enough so the encoded data stays within the 10 MB input limit.

  • Example: data:audio/wav;base64,SUQzBAAAAAAAI1RTU0UAAAAPAAADTGF2ZjU4LjI5LjEwMAAAAAAAAAAAAAAA//PAxABQ/BXRbMPe4IQAhl9

    See example code

    Python:

    import base64, pathlib
    
    # input.mp3 is the local audio file to transcribe. Replace it with your own audio file path and ensure it meets the audio requirements
    file_path = pathlib.Path("input.mp3")
    base64_str = base64.b64encode(file_path.read_bytes()).decode()
    data_uri = f"data:audio/mpeg;base64,{base64_str}"

    Java:

    import java.nio.file.*;
    import java.util.Base64;
    
    public class Main {
        /**
         * filePath is the local audio file to transcribe. Replace it with your own audio file path and ensure it meets the audio requirements
         */
        public static String toDataUrl(String filePath) throws Exception {
            byte[] bytes = Files.readAllBytes(Paths.get(filePath));
            String encoded = Base64.getEncoder().encodeToString(bytes);
            return "data:audio/mpeg;base64," + encoded;
        }
    
        // Usage example
        public static void main(String[] args) throws Exception {
            System.out.println(toDataUrl("input.mp3"));
        }
    }
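Two quick checks are useful before building a data URL: whether the file still fits under the 10 MB request limit once Base64 inflates it by about a third, and which MIME type to use. A sketch using only the Python standard library (the 10 MB figure comes from the model comparison below):

```python
import math, mimetypes

MAX_REQUEST_BYTES = 10 * 1024 * 1024  # 10 MB input limit for Qwen3-ASR-Flash

def fits_after_base64(raw_size: int, limit: int = MAX_REQUEST_BYTES) -> bool:
    # Base64 turns every 3 input bytes into 4 output characters (rounded up),
    # so the encoded payload is roughly 33% larger than the original file.
    return 4 * math.ceil(raw_size / 3) <= limit

# The standard library can also guess the <mediatype> for the data URL:
mime, _ = mimetypes.guess_type("welcome.mp3")
print(mime)                                  # audio/mpeg
print(fits_after_base64(7 * 1024 * 1024))    # True: ~9.33 MB encoded
print(fits_after_base64(8 * 1024 * 1024))    # False: ~10.67 MB encoded
```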

Python SDK

The example uses the audio file: welcome.mp3.

import base64
from openai import OpenAI
import os
import pathlib

try:
    # Replace with your actual audio file path
    file_path = "welcome.mp3"
    # Replace with your actual audio file MIME type
    audio_mime_type = "audio/mpeg"

    file_path_obj = pathlib.Path(file_path)
    if not file_path_obj.exists():
        raise FileNotFoundError(f"Audio file not found: {file_path}")

    base64_str = base64.b64encode(file_path_obj.read_bytes()).decode()
    data_uri = f"data:{audio_mime_type};base64,{base64_str}"

    client = OpenAI(
        # API keys for Beijing and Singapore regions are different. Get keys: https://www.alibabacloud.com/help/en/model-studio/get-api-key
        # If not using environment variables, replace the line below with: api_key = "sk-xxx",
        api_key=os.getenv("DASHSCOPE_API_KEY"),
        # Singapore/US region URL. For Beijing: https://dashscope.aliyuncs.com/compatible-mode/v1
        base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
    )
    

    stream_enabled = False  # Set to True to enable streaming output
    completion = client.chat.completions.create(
        model="qwen3-asr-flash",
        messages=[
            {
                "content": [
                    {
                        "type": "input_audio",
                        "input_audio": {
                            "data": data_uri
                        }
                    }
                ],
                "role": "user"
            }
        ],
        stream=stream_enabled,
        # Do not set stream_options when stream is False
        # stream_options={"include_usage": True},
        extra_body={
            "asr_options": {
                # "language": "zh",
                "enable_itn": False
            }
        }
    )
    if stream_enabled:
        full_content = ""
        print("Streaming output:")
        for chunk in completion:
            # With stream_options.include_usage=True, skip last chunk's empty choices (usage in chunk.usage)
            print(chunk)
            if chunk.choices and chunk.choices[0].delta.content:
                full_content += chunk.choices[0].delta.content
        print(f"Full content: {full_content}")
    else:
        print(f"Non-streaming output: {completion.choices[0].message.content}")
except Exception as e:
    print(f"Error: {e}")

Node.js SDK

The example uses the audio file: welcome.mp3.

// Prerequisites (Windows/Mac/Linux):
// 1. Node.js (version ≥ 14)
// 2. npm install openai

import OpenAI from "openai";
import { readFileSync } from 'fs';

const client = new OpenAI({
  // API keys for Beijing and Singapore regions are different. Get keys: https://www.alibabacloud.com/help/en/model-studio/get-api-key
  // If not using environment variables, replace the line below with: apiKey: "sk-xxx",
  apiKey: process.env.DASHSCOPE_API_KEY,
  // Singapore/US region URL. For Beijing: https://dashscope.aliyuncs.com/compatible-mode/v1
  baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1", 
});

const encodeAudioFile = (audioFilePath) => {
    const audioFile = readFileSync(audioFilePath);
    return audioFile.toString('base64');
};

// Replace with actual file path
const dataUri = `data:audio/mpeg;base64,${encodeAudioFile("welcome.mp3")}`;

async function main() {
  try {
    const streamEnabled = false; // Set to true to enable streaming output
    const completion = await client.chat.completions.create({
      model: "qwen3-asr-flash",
      messages: [
        {
          role: "user",
          content: [
            {
              type: "input_audio",
              input_audio: {
                data: dataUri
              }
            }
          ]
        }
      ],
      stream: streamEnabled,
      // Do not set stream_options when stream is False
      // stream_options: {
      //   "include_usage": true
      // },
      extra_body: {
        asr_options: {
          // language: "zh",
          enable_itn: false
        }
      }
    });

    if (streamEnabled) {
      let fullContent = "";
      console.log("Streaming output:");
      for await (const chunk of completion) {
        console.log(JSON.stringify(chunk));
        if (chunk.choices && chunk.choices.length > 0) {
          const delta = chunk.choices[0].delta;
          if (delta && delta.content) {
            fullContent += delta.content;
          }
        }
      }
      console.log(`Full content: ${fullContent}`);
    } else {
      console.log(`Non-streaming output: ${completion.choices[0].message.content}`);
    }
  } catch (err) {
    console.error(`Error: ${err}`);
  }
}

main();

API reference

Audio file recognition - Qwen API reference

Compare models

The feature set for qwen3-asr-flash and qwen3-asr-flash-2025-09-08 also applies to their US (Virginia) region counterparts: qwen3-asr-flash-us and qwen3-asr-flash-2025-09-08-us.

| Feature | qwen3-asr-flash-filetrans, qwen3-asr-flash-filetrans-2025-11-17 | qwen3-asr-flash, qwen3-asr-flash-2026-02-10, qwen3-asr-flash-2025-09-08 |
| --- | --- | --- |
| Supported languages | Chinese (Mandarin, Sichuanese, Minnan, Wu, Cantonese), English, Japanese, German, Korean, Russian, French, Portuguese, Arabic, Italian, Spanish, Hindi, Indonesian, Thai, Turkish, Ukrainian, Vietnamese, Czech, Danish, Filipino, Finnish, Icelandic, Malay, Norwegian, Polish, Swedish | Same |
| Audio formats | aac, amr, avi, flac, flv, m4a, mkv, mov, mp3, mp4, mpeg, ogg, opus, wav, webm, wma, wmv | aac, amr, avi, aiff, flac, flv, mkv, mp3, mpeg, ogg, opus, wav, webm, wma, wmv |
| Sample rate | PCM audio: 16 kHz; other formats: any (resampled to 16 kHz) | Same |
| Sound channels | Any. Specify the track to transcribe via the channel_id parameter | Any. Multi-channel audio is merged before processing |
| Input method | Public URL | Base64-encoded file, absolute local path, or public URL |
| Audio size/duration | Up to 2 GB and 12 hours | Up to 10 MB and 5 minutes |
| Emotion recognition | Supported. Always enabled; read results from the emotion response parameter | Same |
| Timestamps | Supported. Always enabled; control granularity via the enable_words request parameter. Character-level timestamps are guaranteed only for Chinese, English, Japanese, Korean, German, French, Spanish, Italian, Portuguese, and Russian; accuracy may vary for other languages | Not supported |
| Punctuation prediction | Supported. Always enabled | Same |
| ITN | Supported. Disabled by default, can be enabled; Chinese and English only | Same |
| Singing recognition | Supported. Always enabled | Same |
| Noise rejection | Supported. Always enabled | Same |
| Sensitive word filtering | Not supported | Not supported |
| Speaker diarization | Not supported | Not supported |
| Filler word filtering | Not supported | Not supported |
| VAD | Supported. Always enabled | Not supported |
| Rate limit (RPM) | 100 | 100 |
| Connection type | DashScope: Java/Python SDK, RESTful API | DashScope: Java/Python SDK, RESTful API; OpenAI: Python/Node.js SDK, RESTful API |
| Pricing | International: $0.000035/second; US: $0.000032/second; Chinese mainland: $0.000032/second | Same |
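Cost scales linearly with audio duration. A back-of-envelope estimate using the per-second rates above (the region keys are illustrative, not API values):

```python
# Per-second rates in USD, from the pricing listed above.
PRICE_PER_SECOND = {
    "international": 0.000035,
    "us": 0.000032,
    "cn": 0.000032,
}

def estimate_cost(duration_seconds: float, region: str = "international") -> float:
    # Billing is proportional to the duration of the input audio.
    return duration_seconds * PRICE_PER_SECOND[region]

# A one-hour recording on the international (Singapore) endpoint:
print(f"${estimate_cost(3600):.3f}")  # $0.126
```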

FAQ

Q: How do I provide a publicly accessible audio URL for the API?

Use Object Storage Service (OSS) for highly available storage and easy public URL generation.

Verify URL accessibility: Test with browser or curl (expect HTTP 200).

Q: How do I check if my audio format meets requirements?

Use ffprobe to get audio information:

# Check container format (format_name), codec (codec_name), sample rate (sample_rate), and number of channels (channels)
ffprobe -v error -show_entries format=format_name -show_entries stream=codec_name,sample_rate,channels -of default=noprint_wrappers=1 your_audio_file.mp3

Q: How do I process audio to meet model requirements?

Use FFmpeg to trim or convert audio:

  • Audio trimming: Extract segment from long file

    # -i: Input file
    # -ss 00:01:30: Start time (1 minute 30 seconds)
    # -t 00:02:00: Duration (2 minutes)
    # -c copy: Copy audio without re-encoding (fast)
    # output_clip.wav: Output file
    ffmpeg -i long_audio.wav -ss 00:01:30 -t 00:02:00 -c copy output_clip.wav
  • Format conversion

    Convert any audio to 16 kHz, 16-bit, mono WAV:

    # -i: Input file
    # -ac 1: Set to 1 channel (mono)
    # -ar 16000: Set sample rate to 16000 Hz (16 kHz)
    # -sample_fmt s16: Set sample format to 16-bit signed integer PCM
    # output.wav: Output file
    ffmpeg -i input.mp3 -ac 1 -ar 16000 -sample_fmt s16 output.wav
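After conversion, 16 kHz 16-bit mono PCM has a fixed data rate of 32,000 bytes per second, which makes it easy to sanity-check a clip against the 5-minute and 10 MB limits. A quick sketch:

```python
SAMPLE_RATE = 16_000   # Hz, after conversion
BYTES_PER_SAMPLE = 2   # 16-bit signed PCM
CHANNELS = 1           # mono

def pcm_payload_bytes(seconds: float) -> int:
    # Raw PCM data size, excluding the ~44-byte WAV header.
    return int(seconds * SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS)

# Five minutes of 16 kHz 16-bit mono audio is 9.6 MB of PCM data,
# just under the 10 MB input limit for Qwen3-ASR-Flash.
print(pcm_payload_bytes(300))  # 9600000
```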