Non-real-time speech recognition - Alibaba Cloud Model Studio

Overview

Transcribe pre-recorded audio and video files in bulk by submitting asynchronous tasks.

Supports context enhancement, which lets you provide contextual hints to improve recognition accuracy (fun-asr-flash-2026-06-15 only).
Supports custom hotwords to boost recognition accuracy for domain-specific terms via a predefined word list.
Configurable features include speaker diarization, sensitive-word filtering, and sentence- and word-level timestamps.
Asynchronously transcribes a single audio file of up to 12 hours and 2 GB.
Accepts any sample rate and works with common audio and video formats, including AAC, WAV, and MP3.

For real-time scenarios such as live captioning, online meetings, or voice assistants, use Real-time speech recognition instead. For guidance on choosing the right model, see Speech-to-text.

Prerequisites

You have obtained an API key and stored it as an environment variable.
To call the API through the DashScope SDK, install the latest SDK.

Note

Obtaining your WorkspaceId: Log in to the Model Studio console, navigate to Workspace Management in the left sidebar, and find your workspace ID on the workspace list page. The workspace ID is a string identifier used in certain API endpoint URLs.

Alternative endpoint: For most API calls, you can use https://dashscope.aliyuncs.com as the base URL without requiring a WorkspaceId prefix. For example: https://dashscope.aliyuncs.com/api/v1/services/audio/asr/transcription. This simplifies the setup when you do not need region-specific routing.
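The two URL styles described in this note can be sketched as a small helper. This is a minimal illustration, not part of the SDK: the function names are hypothetical, and the `{WorkspaceId}.{region}.maas.aliyuncs.com` pattern is only confirmed here for the Singapore region (`ap-southeast-1`), so treat other `region` values as an assumption.

```python
# Hypothetical helpers: build the endpoint URLs used in this guide.
DEFAULT_BASE = "https://dashscope.aliyuncs.com/api/v1"
SUBMIT_PATH = "/services/audio/asr/transcription"


def build_base_url(workspace_id=None, region="ap-southeast-1"):
    """Return the workspace-scoped base URL, or the shared one if no workspace ID is given."""
    if workspace_id:
        # Pattern taken from the Singapore examples; other regions are an assumption.
        return f"https://{workspace_id}.{region}.maas.aliyuncs.com/api/v1"
    return DEFAULT_BASE


def submit_url(workspace_id=None, region="ap-southeast-1"):
    """URL for submitting an asynchronous transcription task."""
    return build_base_url(workspace_id, region) + SUBMIT_PATH


def task_url(task_id, workspace_id=None, region="ap-southeast-1"):
    """URL for querying the status of a submitted task."""
    return f"{build_base_url(workspace_id, region)}/tasks/{task_id}"
```

Calling `submit_url()` with no arguments yields the alternative endpoint shown in this note, while passing a workspace ID yields the region-specific form.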

Quick start

Fun-ASR

Audio and video files are typically large, so the file-transcription API is asynchronous: submit the task, poll the query endpoint for its status, and retrieve the recognition result after the task completes.

cURL

When you call the API with cURL, first submit the task to obtain a task_id, then use that ID to query the result.

Submit a task

The following configuration is for the Singapore region. Replace "{WorkspaceId}" with your actual workspace ID. Configurations vary by region.

# Alternative: use "https://dashscope.aliyuncs.com" without WorkspaceId prefix
curl -X POST 'https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/api/v1/services/audio/asr/transcription' \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-H "X-DashScope-Async: enable" \
-d '{
    "model": "fun-asr",
    "input": {
        "file_urls": [
            "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_female2.wav"
        ]
    },
    "parameters": {
        "channel_id": [0],
        "language_hints": ["zh", "en"]
    }
}'

Get the task result

This query endpoint defaults to 20 QPS and can be scaled up to 100 QPS. For higher throughput, or to avoid polling-induced throttling, configure asynchronous task callbacks (see Replace polling with callbacks for high-concurrency workloads).

The following configuration is for the Singapore region. Replace "{WorkspaceId}" with your actual workspace ID. Configurations vary by region.

# Alternative: use "https://dashscope.aliyuncs.com" without WorkspaceId prefix
curl -X GET 'https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/api/v1/tasks/{task_id}' \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json"
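Because the query endpoint is rate limited, a polling loop may benefit from a growing delay and an overall timeout. The following is a minimal sketch under stated assumptions: `wait_for_task` and `fetch_status` are hypothetical names, `fetch_status` is any zero-argument callable you supply that returns the current `task_status` string, and the terminal statuses are the ones checked in the complete examples later in this guide.

```python
import time

# Terminal statuses as checked by the complete examples in this guide.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "UNKNOWN"}


def wait_for_task(fetch_status, initial_delay=2.0, max_delay=30.0,
                  timeout=3600.0, sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() with exponential backoff until a terminal status.

    Raises TimeoutError if the next wait would exceed the overall timeout.
    """
    deadline = clock() + timeout
    delay = initial_delay
    while True:
        status = fetch_status()
        if status and status.upper() in TERMINAL_STATUSES:
            return status.upper()
        if clock() + delay > deadline:
            raise TimeoutError(f"task still {status!r} after {timeout} seconds")
        sleep(delay)
        # Double the delay after each attempt, capped at max_delay.
        delay = min(delay * 2, max_delay)
```

The `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting; in normal use the defaults are sufficient. For high-concurrency workloads, callbacks remain the approach this guide recommends over polling.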

Download the recognition result

After the task succeeds, the output.results[].transcription_url field returned by the query endpoint points to a publicly downloadable JSON file that contains the full recognition result. The URL is valid for 24 hours by default, so download and persist the file promptly.

# Replace {transcription_url} with the transcription_url value returned by the query endpoint
curl -sS '{transcription_url}' -o transcription.json
jq . transcription.json
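Since the result URL expires, it can be useful to download the file and persist it in one step. The sketch below uses only the Python standard library; `save_transcription` is a hypothetical helper name, and the 30-second timeout is an arbitrary choice rather than a documented value.

```python
import json
from pathlib import Path
from urllib import request


def save_transcription(transcription_url, dest_path, timeout=30):
    """Download the result JSON, check that it parses, and write it to dest_path.

    Returns the parsed result as a dict.
    """
    with request.urlopen(transcription_url, timeout=timeout) as resp:
        raw = resp.read()
    # Parse before writing so a truncated or non-JSON body fails early.
    result = json.loads(raw.decode("utf-8"))
    dest = Path(dest_path)
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_bytes(raw)
    return result
```

Writing the raw bytes rather than re-serializing the parsed object keeps the stored file identical to what the service returned.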

Python

from http import HTTPStatus
from dashscope.audio.asr import Transcription
from urllib import request
import dashscope
import os
import json

# The following configuration is for the Singapore region. Replace "{WorkspaceId}" with your actual workspace ID. Configurations vary by region.
# Alternative: use "https://dashscope.aliyuncs.com" without WorkspaceId prefix
dashscope.base_http_api_url = 'https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/api/v1'

# API Keys for the Singapore and Beijing regions are different. To obtain an API Key, see: https://www.alibabacloud.com/help/en/model-studio/get-api-key
# If the environment variable is not configured, replace the following line with your Model Studio API Key: dashscope.api_key = "sk-xxx"
dashscope.api_key = os.getenv("DASHSCOPE_API_KEY")

task_response = Transcription.async_call(
    model='fun-asr',
    file_urls=['https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_female2.wav'],
    language_hints=['zh', 'en']  # language_hints is an optional parameter used to specify the language code of the audio to be recognized. For the value range, see the API reference documentation.
)

transcription_response = Transcription.wait(task=task_response.output.task_id)

if transcription_response.status_code == HTTPStatus.OK:
    for transcription in transcription_response.output['results']:
        if transcription['subtask_status'] == 'SUCCEEDED':
            url = transcription['transcription_url']
            result = json.loads(request.urlopen(url).read().decode('utf8'))
            print(json.dumps(result, indent=4,
                            ensure_ascii=False))
        else:
            print('transcription failed!')
            print(transcription)
else:
    print('Error: ', transcription_response.output.message)

Java

import com.alibaba.dashscope.audio.asr.transcription.*;
import com.alibaba.dashscope.utils.Constants;
import com.google.gson.*;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Arrays;
import java.util.List;

public class Main {
    public static void main(String[] args) {
        // The following configuration is for the Singapore region. Replace "{WorkspaceId}" with your actual workspace ID. Configurations vary by region.
        // Alternative: use "https://dashscope.aliyuncs.com" without WorkspaceId prefix
        Constants.baseHttpApiUrl = "https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/api/v1";
        // Create the transcription request parameters.
        TranscriptionParam param =
                TranscriptionParam.builder()
                        // API Keys for the Singapore and Beijing regions are different. To obtain an API Key, see: https://www.alibabacloud.com/help/en/model-studio/get-api-key
                        // If the environment variable is not configured, replace the following line with your Model Studio API Key: .apiKey("sk-xxx")
                        .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                        .model("fun-asr")
                        // language_hints is an optional parameter used to specify the language code of the audio to be recognized. For the value range, see the API reference documentation.
                        .parameter("language_hints", new String[]{"zh", "en"})
                        .fileUrls(
                                Arrays.asList(
                                        "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_female2.wav"))
                        .build();
        try {
            Transcription transcription = new Transcription();
            // Submit the transcription request
            TranscriptionResult result = transcription.asyncCall(param);
            System.out.println("RequestId: " + result.getRequestId());
            // Block and wait for the task to complete, then get the result
            result = transcription.wait(
                    TranscriptionQueryParam.FromTranscriptionParam(param, result.getTaskId()));
            // Get the transcription result
            List<TranscriptionTaskResult> taskResultList = result.getResults();
            if (taskResultList != null && taskResultList.size() > 0) {
                for (TranscriptionTaskResult taskResult : taskResultList) {
                    String transcriptionUrl = taskResult.getTranscriptionUrl();
                    HttpURLConnection connection =
                            (HttpURLConnection) new URL(transcriptionUrl).openConnection();
                    connection.setRequestMethod("GET");
                    connection.connect();
                    Gson gson = new GsonBuilder().setPrettyPrinting().create();
                    // Read as UTF-8 and close the reader when done
                    try (BufferedReader reader =
                            new BufferedReader(new InputStreamReader(
                                    connection.getInputStream(), java.nio.charset.StandardCharsets.UTF_8))) {
                        JsonElement jsonResult = gson.fromJson(reader, JsonObject.class);
                        System.out.println(gson.toJson(jsonResult));
                    }
                }
            }
        } catch (Exception e) {
            System.out.println("error: " + e);
        }
        System.exit(0);
    }
}

The full recognition result is printed to the console as JSON. It contains the transcribed text along with the start and end time of each segment within the audio or video file, in milliseconds.

Recognition result

{
    "file_url": "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_female2.wav",
    "properties": {
        "audio_format": "pcm_s16le",
        "channels": [
            0
        ],
        "original_sampling_rate": 16000,
        "original_duration_in_milliseconds": 3834
    },
    "transcripts": [
        {
            "channel_id": 0,
            "content_duration_in_milliseconds": 2480,
            "text": "Hello World, this is Alibaba Speech Lab.",
            "sentences": [
                {
                    "begin_time": 760,
                    "end_time": 3240,
                    "text": "Hello World, this is Alibaba Speech Lab.",
                    "sentence_id": 1,
                    "words": [
                        {
                            "begin_time": 760,
                            "end_time": 1000,
                            "text": "Hello",
                            "punctuation": ""
                        },
                        {
                            "begin_time": 1000,
                            "end_time": 1120,
                            "text": " World",
                            "punctuation": ","
                        },
                        {
                            "begin_time": 1400,
                            "end_time": 1920,
                            "text": "this is",
                            "punctuation": ""
                        },
                        {
                            "begin_time": 1920,
                            "end_time": 2520,
                            "text": "Alibaba",
                            "punctuation": ""
                        },
                        {
                            "begin_time": 2520,
                            "end_time": 2840,
                            "text": "Speech",
                            "punctuation": ""
                        },
                        {
                            "begin_time": 2840,
                            "end_time": 3240,
                            "text": "Lab",
                            "punctuation": "."
                        }
                    ]
                }
            ]
        }
    ]
}
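As an illustration of how the sentence-level timestamps in this result might be consumed, the following sketch converts the `sentences` of one channel into SubRip (SRT) subtitle text. The function names are hypothetical, and the code relies only on the fields shown in the sample result above (`transcripts`, `channel_id`, `sentences`, `begin_time`, `end_time`, `text`).

```python
def format_timestamp(ms):
    """Convert milliseconds to an SRT timestamp (HH:MM:SS,mmm)."""
    hours, rem = divmod(int(ms), 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def to_srt(result, channel_id=0):
    """Build SRT subtitle text from the sentences of one channel."""
    cues = []
    for transcript in result.get("transcripts", []):
        if transcript.get("channel_id") != channel_id:
            continue
        for sentence in transcript.get("sentences", []):
            index = len(cues) + 1  # SRT cue numbers start at 1
            start = format_timestamp(sentence["begin_time"])
            end = format_timestamp(sentence["end_time"])
            cues.append(f"{index}\n{start} --> {end}\n{sentence['text']}\n")
    # Cues are separated by a blank line.
    return "\n".join(cues)
```

For the sample result above, `to_srt` would produce a single cue spanning `00:00:00,760 --> 00:00:03,240`.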

Fun-ASR-Flash

fun-asr-flash-2026-06-15 supports synchronous calls for audio files up to 5 minutes. Results can be returned in streaming or non-streaming mode.

The following configuration is for the Singapore region. Replace "{WorkspaceId}" with your actual workspace ID. Configurations vary by region. The API keys for the Singapore and Beijing regions are different.

# Alternative: use "https://dashscope.aliyuncs.com" without WorkspaceId prefix
curl --location --request POST 'https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation' \
     --header "Authorization: Bearer $DASHSCOPE_API_KEY" \
     --header "Content-Type: application/json" \
     --header "X-DashScope-SSE: disable" \
     --data '{
    "model": "fun-asr-flash-2026-06-15",
    "input": {
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "input_audio",
                        "input_audio": {
                            "data": "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_female2.wav"
                        }
                    }
                ]
            }
        ]
    },
    "parameters": {
        "format": "wav",
        "sample_rate": "16000"
    }
}'

Note

Response structure: fun-asr-flash-2026-06-15 returns a response that differs from the standard DashScope multimodal-generation format. The recognized text is available at output.output.sentence.text and at output.text, not at output.choices[].message.content, so parse the response accordingly.

Example response excerpt:

{
  "output": {
    "output": {
      "sentence": {
        "text": "Hello World, this is Alibaba Speech Lab."
      }
    },
    "text": "Hello World, this is Alibaba Speech Lab."
  },
  "request_id": "..."
}
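A parser for this response shape could look like the sketch below. `extract_flash_text` is a hypothetical helper; it checks the two paths named in the note above and does not assume any other fields exist.

```python
def extract_flash_text(response):
    """Return the recognized text from a fun-asr-flash response dict.

    Checks output.output.sentence.text first, then falls back to output.text.
    """
    output = response.get("output") or {}
    sentence = (output.get("output") or {}).get("sentence") or {}
    text = sentence.get("text")
    if text is None:
        text = output.get("text")
    if text is None:
        raise KeyError(
            "no recognized text at output.output.sentence.text or output.text"
        )
    return text
```

Raising on a missing field, rather than returning an empty string, makes it harder to mistake an unexpected response shape for silent audio.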

Qwen3-ASR-Flash-Filetrans

Qwen3-ASR-Flash-Filetrans is purpose-built for asynchronous transcription of audio files. It supports recordings of up to 12 hours, accepts only publicly accessible audio file URLs (local file upload is not supported), and returns the full recognition result in a single response after the task completes.

cURL

When you call the API with cURL, submit the task first to obtain a task_id, then use that ID to query the result.

Submit a task

The following configuration is for the Singapore region. Replace "{WorkspaceId}" with your actual workspace ID. Configurations vary by region.

# Alternative: use "https://dashscope.aliyuncs.com" without WorkspaceId prefix
curl -X POST 'https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/api/v1/services/audio/asr/transcription' \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-H "X-DashScope-Async: enable" \
-d '{
    "model": "qwen3-asr-flash-filetrans",
    "input": {
        "file_url": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
    },
    "parameters": {
        "channel_id": [0],
        "enable_itn": false,
        "enable_words": true
    }
}'

Get the task result

This query endpoint defaults to 20 QPS and can be scaled up to 100 QPS. For higher throughput, or to avoid polling-induced throttling, configure asynchronous task callbacks (see Replace polling with callbacks for high-concurrency workloads).

The following configuration is for the Singapore region. Replace "{WorkspaceId}" with your actual workspace ID. Configurations vary by region.

# Alternative: use "https://dashscope.aliyuncs.com" without WorkspaceId prefix
curl -X GET 'https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/api/v1/tasks/{task_id}' \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json"

Download the recognition result

After the task succeeds, the output.result.transcription_url field returned by the query endpoint points to a publicly downloadable JSON file that contains the full recognition result. The URL is valid for 24 hours by default, so download and persist the file promptly.

# Replace {transcription_url} with the transcription_url value returned by the query endpoint
curl -sS '{transcription_url}' -o transcription.json
jq . transcription.json
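Note that the result URL lives at output.result.transcription_url for Qwen3-ASR-Flash-Filetrans but at output.results[].transcription_url for Fun-ASR. If one code path handles both models, a small normalizing helper may help. The sketch below is hypothetical: it assumes the query response has already been parsed into a dict, and it treats a list entry without a `subtask_status` field as successful, which is an assumption rather than documented behavior.

```python
def transcription_urls(task_response):
    """Collect result URLs from either response shape described in this guide."""
    output = task_response.get("output") or {}
    urls = []

    # Qwen3-ASR-Flash-Filetrans: a single object at output.result
    single = output.get("result")
    if isinstance(single, dict) and single.get("transcription_url"):
        urls.append(single["transcription_url"])

    # Fun-ASR: one entry per input file at output.results[]
    for item in output.get("results") or []:
        succeeded = item.get("subtask_status", "SUCCEEDED") == "SUCCEEDED"
        if succeeded and item.get("transcription_url"):
            urls.append(item["transcription_url"])

    return urls
```

An empty list means no downloadable result was found, for example because the task failed or has not finished yet.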

Complete example

Java

import com.google.gson.Gson;
import com.google.gson.annotations.SerializedName;
import okhttp3.*;

import java.io.IOException;
import java.util.concurrent.TimeUnit;

public class Main {
    // The following configuration is for the Singapore region. Replace "{WorkspaceId}" with your actual workspace ID. Configurations vary by region.
    // Alternative: use "https://dashscope.aliyuncs.com" without WorkspaceId prefix
    private static final String API_URL_SUBMIT = "https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/api/v1/services/audio/asr/transcription";
    // The following configuration is for the Singapore region. Replace "{WorkspaceId}" with your actual workspace ID. Configurations vary by region.
    // Alternative: use "https://dashscope.aliyuncs.com" without WorkspaceId prefix
    private static final String API_URL_QUERY = "https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/api/v1/tasks/";
    private static final Gson gson = new Gson();

    public static void main(String[] args) {
        // API Keys for the Singapore and Beijing regions are different. To obtain an API Key, see: https://www.alibabacloud.com/help/en/model-studio/get-api-key
        // If the environment variable is not configured, replace the following line with your Model Studio API Key: String apiKey = "sk-xxx"
        String apiKey = System.getenv("DASHSCOPE_API_KEY");

        OkHttpClient client = new OkHttpClient();

        // 1. Submit the task
        // To specify the audio language, add "language": "zh" to "parameters".
        String payloadJson = """
                {
                    "model": "qwen3-asr-flash-filetrans",
                    "input": {
                        "file_url": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
                    },
                    "parameters": {
                        "channel_id": [0],
                        "enable_itn": false,
                        "enable_words": true
                    }
                }
                """;

        RequestBody body = RequestBody.create(payloadJson, MediaType.get("application/json; charset=utf-8"));
        Request submitRequest = new Request.Builder()
                .url(API_URL_SUBMIT)
                .addHeader("Authorization", "Bearer " + apiKey)
                .addHeader("Content-Type", "application/json")
                .addHeader("X-DashScope-Async", "enable")
                .post(body)
                .build();

        String taskId = null;

        try (Response response = client.newCall(submitRequest).execute()) {
            if (response.isSuccessful() && response.body() != null) {
                String respBody = response.body().string();
                ApiResponse apiResp = gson.fromJson(respBody, ApiResponse.class);
                if (apiResp.output != null) {
                    taskId = apiResp.output.taskId;
                    System.out.println("Task submitted, task_id: " + taskId);
                } else {
                    System.out.println("Submit response content: " + respBody);
                    return;
                }
            } else {
                System.out.println("Task submission failed! HTTP code: " + response.code());
                if (response.body() != null) {
                    System.out.println(response.body().string());
                }
                return;
            }
        } catch (IOException e) {
            e.printStackTrace();
            return;
        }

        // 2. Poll the task status
        boolean finished = false;
        while (!finished) {
            try {
                TimeUnit.SECONDS.sleep(2);  // Wait 2 seconds before querying again
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }

            String queryUrl = API_URL_QUERY + taskId;
            Request queryRequest = new Request.Builder()
                    .url(queryUrl)
                    .addHeader("Authorization", "Bearer " + apiKey)
                    .addHeader("X-DashScope-Async", "enable")
                    .addHeader("Content-Type", "application/json")
                    .get()
                    .build();

            try (Response response = client.newCall(queryRequest).execute()) {
                if (response.body() != null) {
                    String queryResponse = response.body().string();
                    ApiResponse apiResp = gson.fromJson(queryResponse, ApiResponse.class);

                    if (apiResp.output != null && apiResp.output.taskStatus != null) {
                        String status = apiResp.output.taskStatus;
                        System.out.println("Current task status: " + status);
                        if ("SUCCEEDED".equalsIgnoreCase(status)
                                || "FAILED".equalsIgnoreCase(status)
                                || "UNKNOWN".equalsIgnoreCase(status)) {
                            finished = true;
                            System.out.println("Task completed. Final result: ");
                            System.out.println(queryResponse);
                        }
                    } else {
                        System.out.println("Query response content: " + queryResponse);
                    }
                }
            } catch (IOException e) {
                e.printStackTrace();
                return;
            }
        }
    }

    static class ApiResponse {
        @SerializedName("request_id")
        String requestId;
        Output output;
    }

    static class Output {
        @SerializedName("task_id")
        String taskId;
        @SerializedName("task_status")
        String taskStatus;
    }
}

Python

import os
import time
import requests
import json

# The following configuration is for the Singapore region. Replace "{WorkspaceId}" with your actual workspace ID. Configurations vary by region.
# Alternative: use "https://dashscope.aliyuncs.com" without WorkspaceId prefix
API_URL_SUBMIT = "https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/api/v1/services/audio/asr/transcription"
# The following configuration is for the Singapore region. Replace "{WorkspaceId}" with your actual workspace ID. Configurations vary by region.
# Alternative: use "https://dashscope.aliyuncs.com" without WorkspaceId prefix
API_URL_QUERY_BASE = "https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/api/v1/tasks/"

def main():
    # API Keys for the Singapore and Beijing regions are different. To obtain an API Key, see: https://www.alibabacloud.com/help/en/model-studio/get-api-key
    # If the environment variable is not configured, replace the following line with your Model Studio API Key: api_key = "sk-xxx"
    api_key = os.getenv("DASHSCOPE_API_KEY")

    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        "X-DashScope-Async": "enable"
    }

    # 1. Submit the task
    payload = {
        "model": "qwen3-asr-flash-filetrans",
        "input": {
            "file_url": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
        },
        "parameters": {
            "channel_id": [0],
            # "language": "zh",
            "enable_itn": False,
            "enable_words": True
        }
    }

    print("Submitting ASR transcription task...")
    try:
        submit_resp = requests.post(API_URL_SUBMIT, headers=headers, data=json.dumps(payload))
    except requests.RequestException as e:
        print(f"Request to submit task failed: {e}")
        return

    if submit_resp.status_code != 200:
        print(f"Task submission failed! HTTP code: {submit_resp.status_code}")
        print(submit_resp.text)
        return

    resp_data = submit_resp.json()
    output = resp_data.get("output")
    if not output or "task_id" not in output:
        print("Unexpected submit response content:", resp_data)
        return

    task_id = output["task_id"]
    print(f"Task submitted, task_id: {task_id}")

    # 2. Poll the task status
    finished = False
    while not finished:
        time.sleep(2)  # Wait 2 seconds before querying again

        query_url = API_URL_QUERY_BASE + task_id
        try:
            query_resp = requests.get(query_url, headers=headers)
        except requests.RequestException as e:
            print(f"Request to query task failed: {e}")
            return

        if query_resp.status_code != 200:
            print(f"Task query failed! HTTP code: {query_resp.status_code}")
            print(query_resp.text)
            return

        query_data = query_resp.json()
        output = query_data.get("output")
        if output and "task_status" in output:
            status = output["task_status"]
            print(f"Current task status: {status}")

            if status.upper() in ("SUCCEEDED", "FAILED", "UNKNOWN"):
                finished = True
                print("Task completed. Final result:")
                print(json.dumps(query_data, indent=2, ensure_ascii=False))
        else:
            print("Query response content:", query_data)

if __name__ == "__main__":
    main()

Java SDK

import com.alibaba.dashscope.audio.qwen_asr.*;
import com.alibaba.dashscope.utils.Constants;
import com.google.gson.Gson;
import com.google.gson.GsonBuilder;
import com.google.gson.JsonObject;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.ArrayList;
import java.util.HashMap;

public class Main {
    public static void main(String[] args) {
        // The following configuration is for the Singapore region. Replace "{WorkspaceId}" with your actual workspace ID. Configurations vary by region.
        // Alternative: use "https://dashscope.aliyuncs.com" without WorkspaceId prefix
        Constants.baseHttpApiUrl = "https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/api/v1";
        QwenTranscriptionParam param =
                QwenTranscriptionParam.builder()
                        // API Keys for the Singapore and Beijing regions are different. To obtain an API Key, see: https://www.alibabacloud.com/help/en/model-studio/get-api-key
                        // If the environment variable is not configured, replace the following line with your Model Studio API Key: .apiKey("sk-xxx")
                        .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                        .model("qwen3-asr-flash-filetrans")
                        .fileUrl("https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/sensevoice/rich_text_example_1.wav")
                        //.parameter("language", "zh")
                        //.parameter("channel_id", new ArrayList<String>(){{add("0");add("1");}})
                        .parameter("enable_itn", false)
                        .parameter("enable_words", true)
                        .build();
        try {
            QwenTranscription transcription = new QwenTranscription();
            // Submit the task
            QwenTranscriptionResult result = transcription.asyncCall(param);
            System.out.println("create task result: " + result);
            // Query the task status
            result = transcription.fetch(QwenTranscriptionQueryParam.FromTranscriptionParam(param, result.getTaskId()));
            System.out.println("task status: " + result);
            // Wait for the task to complete
            result =
                    transcription.wait(
                            QwenTranscriptionQueryParam.FromTranscriptionParam(param, result.getTaskId()));
            System.out.println("task result: " + result);
            // Get the speech recognition result
            QwenTranscriptionTaskResult taskResult = result.getResult();
            if (taskResult != null) {
                // Get the URL of the recognition result
                String transcriptionUrl = taskResult.getTranscriptionUrl();
                // Fetch the content at the URL
                HttpURLConnection connection =
                        (HttpURLConnection) new URL(transcriptionUrl).openConnection();
                connection.setRequestMethod("GET");
                connection.connect();
                // Pretty-print the JSON result; read as UTF-8 and close the reader when done
                Gson gson = new GsonBuilder().setPrettyPrinting().create();
                try (BufferedReader reader =
                        new BufferedReader(new InputStreamReader(
                                connection.getInputStream(), java.nio.charset.StandardCharsets.UTF_8))) {
                    System.out.println(gson.toJson(gson.fromJson(reader, JsonObject.class)));
                }
            }
        } catch (Exception e) {
            System.out.println("error: " + e);
        }
    }
}

Python SDK

import os

import dashscope
from dashscope.audio.qwen_asr import QwenTranscription

# run the transcription script
if __name__ == '__main__':
    # API Keys for the Singapore and Beijing regions are different. To obtain an API Key, see: https://www.alibabacloud.com/help/en/model-studio/get-api-key
    # If the environment variable is not configured, replace the following line with your Model Studio API Key: dashscope.api_key = "sk-xxx"
    dashscope.api_key = os.getenv("DASHSCOPE_API_KEY")

    # The following configuration is for the Singapore region. Replace "{WorkspaceId}" with your actual workspace ID. Configurations vary by region.
    # Alternative: use "https://dashscope.aliyuncs.com" without WorkspaceId prefix
    dashscope.base_http_api_url = 'https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/api/v1'
    task_response = QwenTranscription.async_call(
        model='qwen3-asr-flash-filetrans',
        file_url='https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/sensevoice/rich_text_example_1.wav',
        #language="",
        enable_itn=False,
        enable_words=True
    )
    print(f'task_response: {task_response}')
    print(task_response.output.task_id)
    query_response = QwenTranscription.fetch(task=task_response.output.task_id)
    print(f'query_response: {query_response}')
    task_result = QwenTranscription.wait(task=task_response.output.task_id)
    print(f'task_result: {task_result}')

Qwen3-ASR-Flash

Qwen3-ASR-Flash supports recordings of up to 5 minutes. It accepts either a publicly accessible audio file URL or a local file upload, and can stream the recognition result back to you.

Input: audio file URL

Python SDK

import os
import dashscope

# The following configuration is for the Singapore region. Replace "{WorkspaceId}" with your actual workspace ID. Configurations vary by region.
# Alternative: use "https://dashscope.aliyuncs.com" without WorkspaceId prefix
dashscope.base_http_api_url = 'https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/api/v1'

messages = [
    {"role": "user", "content": [{"audio": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"}]}
]

response = dashscope.MultiModalConversation.call(
    # API Keys for the Singapore/US and Beijing regions are different. To obtain an API Key, see: https://www.alibabacloud.com/help/en/model-studio/get-api-key
    # If the environment variable is not configured, replace the following line with your Model Studio API Key: api_key = "sk-xxx"
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # To use a model in the US region, append the "-us" suffix to the model name, for example, qwen3-asr-flash-us
    model="qwen3-asr-flash",
    messages=messages,
    result_format="message",
    asr_options={
        #"language": "zh", # Optional. If the audio language is known, use this parameter to specify the language to improve recognition accuracy.
        "enable_itn":False
    }
)
print(response)

Java SDK

import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;
import com.alibaba.dashscope.utils.JsonUtils;

public class Main {
    public static void simpleMultiModalConversationCall()
            throws ApiException, NoApiKeyException, UploadFileException {
        MultiModalConversation conv = new MultiModalConversation();
        MultiModalMessage userMessage = MultiModalMessage.builder()
                .role(Role.USER.getValue())
                .content(Arrays.asList(
                        Collections.singletonMap("audio", "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3")))
                .build();

        Map<String, Object> asrOptions = new HashMap<>();
        asrOptions.put("enable_itn", false);
        // asrOptions.put("language", "zh"); // Optional. If the audio language is known, use this parameter to specify the language to improve recognition accuracy.
        MultiModalConversationParam param = MultiModalConversationParam.builder()
                // API Keys for the Singapore/US and Beijing regions are different. To obtain an API Key, see: https://www.alibabacloud.com/help/en/model-studio/get-api-key
                // If the environment variable is not configured, replace the following line with your Model Studio API Key: .apiKey("sk-xxx")
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                // To use a model in the US region, append the "-us" suffix to the model name, for example, qwen3-asr-flash-us
                .model("qwen3-asr-flash")
                .message(userMessage)
                .parameter("asr_options", asrOptions)
                .build();
        MultiModalConversationResult result = conv.call(param);
        System.out.println(JsonUtils.toJson(result));
    }
    public static void main(String[] args) {
        try {
            // The following configuration is for the Singapore region. Replace "{WorkspaceId}" with your actual workspace ID. Configurations vary by region.
            // Alternative: use "https://dashscope.aliyuncs.com" without WorkspaceId prefix
            Constants.baseHttpApiUrl = "https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/api/v1";
            simpleMultiModalConversationCall();
        } catch (ApiException | NoApiKeyException | UploadFileException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}

cURL

The following configuration is for the Singapore region. Replace "{WorkspaceId}" with your actual workspace ID. Configurations vary by region.

# Alternative: use "https://dashscope.aliyuncs.com" without WorkspaceId prefix
curl -X POST "https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation" \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen3-asr-flash",
    "input": {
        "messages": [
            {
                "content": [
                    {
                        "text": ""
                    }
                ],
                "role": "system"
            },
            {
                "content": [
                    {
                        "audio": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
                    }
                ],
                "role": "user"
            }
        ]
    },
    "parameters": {
        "asr_options": {
            "enable_itn": false
        }
    }
}'

Input: Base64-encoded audio file

Pass Base64-encoded audio as a data URL in the form data:<mediatype>;base64,<data>.

<mediatype>: the MIME type.

The value depends on the audio format. For example:
- WAV: audio/wav
- MP3: audio/mpeg
<data>: the audio data encoded as a Base64 string.

Base64 encoding increases the payload size, so keep the source file small enough that the encoded result stays within the 10 MB input limit.

Example: data:audio/wav;base64,SUQzBAAAAAAAI1RTU0UAAAAPAAADTGF2ZjU4LjI5LjEwMAAAAAAAAAAAAAAA//PAxABQ/BXRbMPe4IQAhl9
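
The size increase can be estimated before encoding. The sketch below computes the length of the resulting data URL from the raw file size. It assumes that "10 MB" means 10 * 1024 * 1024 bytes and that the limit applies to the whole data URL string; both are assumptions, and the function names are hypothetical.

```python
# Assumption: the "10 MB" limit is read as 10 * 1024 * 1024 bytes and applies
# to the complete data URL string, prefix included.
MAX_INPUT_BYTES = 10 * 1024 * 1024


def base64_length(raw_size: int) -> int:
    """Length of padded Base64 text for raw_size bytes (4 characters per 3-byte group)."""
    return 4 * ((raw_size + 2) // 3)


def data_url_length(raw_size: int, mime_type: str) -> int:
    """Length of data:<mediatype>;base64,<data> for a file of raw_size bytes."""
    prefix = f"data:{mime_type};base64,"
    return len(prefix) + base64_length(raw_size)


def fits_input_limit(raw_size: int, mime_type: str,
                     limit: int = MAX_INPUT_BYTES) -> bool:
    """Return True if the encoded data URL stays within the limit."""
    return data_url_length(raw_size, mime_type) <= limit
```

Base64 turns every 3 bytes into 4 characters, so under these assumptions the source file has to stay slightly below 7.5 MiB.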

Example code

Python

import base64, pathlib

# input.mp3 is the local audio file to transcribe. Replace it with the path to your own audio file and make sure it meets the audio requirements.
file_path = pathlib.Path("input.mp3")
base64_str = base64.b64encode(file_path.read_bytes()).decode()
data_uri = f"data:audio/mpeg;base64,{base64_str}"

Java

import java.nio.file.*;
import java.util.Base64;

public class Main {
    /**
     * filePath is the local audio file to transcribe. Replace it with the path to your own audio file and make sure it meets the audio requirements.
     */
    public static String toDataUrl(String filePath) throws Exception {
        byte[] bytes = Files.readAllBytes(Paths.get(filePath));
        String encoded = Base64.getEncoder().encodeToString(bytes);
        return "data:audio/mpeg;base64," + encoded;
    }

    // Usage example
    public static void main(String[] args) throws Exception {
        System.out.println(toDataUrl("input.mp3"));
    }
}

Python SDK

The example uses this audio file: welcome.mp3.

import base64
import dashscope
import os
import pathlib

# The following configuration is for the Singapore region. Replace "{WorkspaceId}" with your actual workspace ID. Configurations vary by region.
# Alternative: use "https://dashscope.aliyuncs.com" without WorkspaceId prefix
dashscope.base_http_api_url = 'https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/api/v1'

# Replace with the actual path to your audio file
file_path = "welcome.mp3"
# Replace with the actual MIME type of your audio file
audio_mime_type = "audio/mpeg"

file_path_obj = pathlib.Path(file_path)
if not file_path_obj.exists():
    raise FileNotFoundError(f"Audio file not found: {file_path}")

base64_str = base64.b64encode(file_path_obj.read_bytes()).decode()
data_uri = f"data:{audio_mime_type};base64,{base64_str}"

messages = [
    {"role": "user", "content": [{"audio": data_uri}]}
]
response = dashscope.MultiModalConversation.call(
    # API Keys for the Singapore/US and Beijing regions are different. To obtain an API Key, see: https://www.alibabacloud.com/help/en/model-studio/get-api-key
    # If the environment variable is not configured, replace the following line with your Model Studio API Key: api_key = "sk-xxx",
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # To use a model in the US region, append the "-us" suffix to the model name, for example, qwen3-asr-flash-us
    model="qwen3-asr-flash",
    messages=messages,
    result_format="message",
    asr_options={
        # "language": "zh", # Optional. If the audio language is known, use this parameter to specify the language to improve recognition accuracy.
        "enable_itn":False
    }
)
print(response)

Java SDK

The example uses this audio file: welcome.mp3.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.*;

import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;
import com.alibaba.dashscope.utils.JsonUtils;

public class Main {
    // Replace with the actual path to your audio file
    private static final String AUDIO_FILE = "welcome.mp3";
    // Replace with the actual MIME type of your audio file
    private static final String AUDIO_MIME_TYPE = "audio/mpeg";

    public static void simpleMultiModalConversationCall()
            throws ApiException, NoApiKeyException, UploadFileException, IOException {
        MultiModalConversation conv = new MultiModalConversation();
        MultiModalMessage userMessage = MultiModalMessage.builder()
                .role(Role.USER.getValue())
                .content(Arrays.asList(
                        Collections.singletonMap("audio", toDataUrl())))
                .build();

        Map<String, Object> asrOptions = new HashMap<>();
        asrOptions.put("enable_itn", false);
        // asrOptions.put("language", "zh"); // Optional. If the audio language is known, use this parameter to specify the language to improve recognition accuracy.
        MultiModalConversationParam param = MultiModalConversationParam.builder()
                // API Keys for the Singapore/US and Beijing regions are different. To obtain an API Key, see: https://www.alibabacloud.com/help/en/model-studio/get-api-key
                // If the environment variable is not configured, replace the following line with your Model Studio API Key: .apiKey("sk-xxx")
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                // To use a model in the US region, append the "-us" suffix to the model name, for example, qwen3-asr-flash-us
                .model("qwen3-asr-flash")
                .message(userMessage)
                .parameter("asr_options", asrOptions)
                .build();
        MultiModalConversationResult result = conv.call(param);
        System.out.println(JsonUtils.toJson(result));
    }

    public static void main(String[] args) {
        try {
            // The following configuration is for the Singapore region. Replace "{WorkspaceId}" with your actual workspace ID. Configurations vary by region.
            // Alternative: use "https://dashscope.aliyuncs.com" without WorkspaceId prefix
            Constants.baseHttpApiUrl = "https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/api/v1";
            simpleMultiModalConversationCall();
        } catch (ApiException | NoApiKeyException | UploadFileException | IOException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }

    // Generate a data URI
    public static String toDataUrl() throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get(AUDIO_FILE));
        String encoded = Base64.getEncoder().encodeToString(bytes);
        return "data:" + AUDIO_MIME_TYPE + ";base64," + encoded;
    }
}

Input: absolute path to a local audio file

When you process a local audio file with the DashScope SDK, pass the file path as input. Build the path according to your SDK and operating system, as shown in the following table.

Operating system	SDK	File path format	Example
Linux or macOS	Python SDK	file://{absolute_path_to_file}	file:///home/audio/test.wav
Linux or macOS	Java SDK	file://{absolute_path_to_file}	file:///home/audio/test.wav
Windows	Python SDK	file://{absolute_path_to_file}	file://D:/audio/test.wav
Windows	Java SDK	file:///{absolute_path_to_file}	file:///D:/audio/test.wav
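
The Python SDK rows of the table amount to prefixing the absolute path with file:// and using forward slashes. A minimal sketch of that rule follows; the helper name is hypothetical, and it covers only the Python SDK rows. The Java SDK on Windows uses the three-slash form shown in the table.

```python
from pathlib import PurePath, PurePosixPath, PureWindowsPath


def python_sdk_file_url(path: PurePath) -> str:
    """Prefix an absolute path with file://, as in the Python SDK rows of the table."""
    if not path.is_absolute():
        raise ValueError(f"Expected an absolute path, got: {path}")
    # as_posix() converts Windows backslashes to forward slashes.
    return "file://" + path.as_posix()


# The two Python SDK examples from the table.
linux_url = python_sdk_file_url(PurePosixPath("/home/audio/test.wav"))
windows_url = python_sdk_file_url(PureWindowsPath(r"D:\audio\test.wav"))
print(linux_url)    # file:///home/audio/test.wav
print(windows_url)  # file://D:/audio/test.wav
```

In application code you would typically pass pathlib.Path(your_file).resolve(), which yields an absolute path for the current operating system.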

Important

Local-file calls are capped at 100 QPS and the limit cannot be increased, so they are not suitable for production, high-concurrency, or load-testing workloads. For higher concurrency, upload the file to OSS and call the API with its URL.

Python SDK

The example uses this audio file: welcome.mp3.

import os
import dashscope

# The following configuration is for the Singapore region. Replace "{WorkspaceId}" with your actual workspace ID. Configurations vary by region.
# Alternative: use "https://dashscope.aliyuncs.com" without WorkspaceId prefix
dashscope.base_http_api_url = 'https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/api/v1'

# Replace ABSOLUTE_PATH/welcome.mp3 with the absolute path to your local audio file
audio_file_path = "file://ABSOLUTE_PATH/welcome.mp3"

messages = [
    {"role": "user", "content": [{"audio": audio_file_path}]}
]
response = dashscope.MultiModalConversation.call(
    # API Keys for the Singapore/US and Beijing regions are different. To obtain an API Key, see: https://www.alibabacloud.com/help/en/model-studio/get-api-key
    # If the environment variable is not configured, replace the following line with your Model Studio API Key: api_key = "sk-xxx"
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # To use a model in the US region, append the "-us" suffix to the model name, for example, qwen3-asr-flash-us
    model="qwen3-asr-flash",
    messages=messages,
    result_format="message",
    asr_options={
        # "language": "zh", # Optional. If the audio language is known, use this parameter to specify the language to improve recognition accuracy.
        "enable_itn":False
    }
)
print(response)

Java SDK

The example uses this audio file: welcome.mp3.

import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;
import com.alibaba.dashscope.utils.JsonUtils;

public class Main {
    public static void simpleMultiModalConversationCall()
            throws ApiException, NoApiKeyException, UploadFileException {
        // Replace ABSOLUTE_PATH/welcome.mp3 with the absolute path to your local file
        String localFilePath = "file://ABSOLUTE_PATH/welcome.mp3";
        MultiModalConversation conv = new MultiModalConversation();
        MultiModalMessage userMessage = MultiModalMessage.builder()
                .role(Role.USER.getValue())
                .content(Arrays.asList(
                        Collections.singletonMap("audio", localFilePath)))
                .build();

        Map<String, Object> asrOptions = new HashMap<>();
        asrOptions.put("enable_itn", false);
        // asrOptions.put("language", "zh"); // Optional. If the audio language is known, use this parameter to specify the language to improve recognition accuracy.
        MultiModalConversationParam param = MultiModalConversationParam.builder()
                // API Keys for the Singapore/US and Beijing regions are different. To obtain an API Key, see: https://www.alibabacloud.com/help/en/model-studio/get-api-key
                // If the environment variable is not configured, replace the following line with your Model Studio API Key: .apiKey("sk-xxx")
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                // To use a model in the US region, append the "-us" suffix to the model name, for example, qwen3-asr-flash-us
                .model("qwen3-asr-flash")
                .message(userMessage)
                .parameter("asr_options", asrOptions)
                .build();
        MultiModalConversationResult result = conv.call(param);
        System.out.println(JsonUtils.toJson(result));
    }
    public static void main(String[] args) {
        try {
            // The following configuration is for the Singapore region. Replace "{WorkspaceId}" with your actual workspace ID. Configurations vary by region.
            // Alternative: use "https://dashscope.aliyuncs.com" without WorkspaceId prefix
            Constants.baseHttpApiUrl = "https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/api/v1";
            simpleMultiModalConversationCall();
        } catch (ApiException | NoApiKeyException | UploadFileException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}

Streaming output

The model generates intermediate results step by step, and the final result is assembled from them. A non-streaming call waits for the full result and returns it in one response, while a streaming call returns results as they are generated, which significantly reduces time to first token. Choose the streaming parameter that matches your call method:

DashScope Python SDK: set stream to True.
DashScope Java SDK: call the streamCall method.
DashScope HTTP: set the X-DashScope-SSE header to enable.

Python SDK

import os
import dashscope

# The following configuration is for the Singapore region. Replace "{WorkspaceId}" with your actual workspace ID. Configurations vary by region.
# Alternative: use "https://dashscope.aliyuncs.com" without WorkspaceId prefix
dashscope.base_http_api_url = 'https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/api/v1'

messages = [
    {"role": "user", "content": [{"audio": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"}]}
]
response = dashscope.MultiModalConversation.call(
    # API Keys for the Singapore/US and Beijing regions are different. To obtain an API Key, see: https://www.alibabacloud.com/help/en/model-studio/get-api-key
    # If the environment variable is not configured, replace the following line with your Model Studio API Key: api_key = "sk-xxx"
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # To use a model in the US region, append the "-us" suffix to the model name, for example, qwen3-asr-flash-us
    model="qwen3-asr-flash",
    messages=messages,
    result_format="message",
    asr_options={
        # "language": "zh", # Optional. If the audio language is known, use this parameter to specify the language to improve recognition accuracy.
        "enable_itn":False
    },
    stream=True
)

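for chunk in response:
    # Skip chunks that carry no text (for example, an empty content list).
    try:
        print(chunk["output"]["choices"][0]["message"].content[0]["text"])
    except (KeyError, IndexError, TypeError, AttributeError):
        continue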

Java SDK

import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;
import io.reactivex.Flowable;

public class Main {
    public static void simpleMultiModalConversationCall()
            throws ApiException, NoApiKeyException, UploadFileException {
        MultiModalConversation conv = new MultiModalConversation();
        MultiModalMessage userMessage = MultiModalMessage.builder()
                .role(Role.USER.getValue())
                .content(Arrays.asList(
                        Collections.singletonMap("audio", "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3")))
                .build();

        Map<String, Object> asrOptions = new HashMap<>();
        asrOptions.put("enable_itn", false);
        // asrOptions.put("language", "zh"); // Optional. If the audio language is known, use this parameter to specify the language to improve recognition accuracy.
        MultiModalConversationParam param = MultiModalConversationParam.builder()
                // API Keys for the Singapore/US and Beijing regions are different. To obtain an API Key, see: https://www.alibabacloud.com/help/en/model-studio/get-api-key
                // If the environment variable is not configured, replace the following line with your Model Studio API Key: .apiKey("sk-xxx")
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                // To use a model in the US region, append the "-us" suffix to the model name, for example, qwen3-asr-flash-us
                .model("qwen3-asr-flash")
                .message(userMessage)
                .parameter("asr_options", asrOptions)
                .build();
        Flowable<MultiModalConversationResult> resultFlowable = conv.streamCall(param);
        resultFlowable.blockingForEach(item -> {
            try {
                System.out.println(item.getOutput().getChoices().get(0).getMessage().getContent().get(0).get("text"));
            } catch (Exception e){
                System.exit(0);
            }
        });
    }

    public static void main(String[] args) {
        try {
            // The following configuration is for the Singapore region. Replace "{WorkspaceId}" with your actual workspace ID. Configurations vary by region.
            // Alternative: use "https://dashscope.aliyuncs.com" without WorkspaceId prefix
            Constants.baseHttpApiUrl = "https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/api/v1";
            simpleMultiModalConversationCall();
        } catch (ApiException | NoApiKeyException | UploadFileException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}

cURL

The following configuration is for the Singapore region. Replace "{WorkspaceId}" with your actual workspace ID. Configurations vary by region.

# Alternative: use "https://dashscope.aliyuncs.com" without WorkspaceId prefix
curl -X POST "https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation" \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-H "X-DashScope-SSE: enable" \
-d '{
    "model": "qwen3-asr-flash",
    "input": {
        "messages": [
            {
                "content": [
                    {
                        "text": ""
                    }
                ],
                "role": "system"
            },
            {
                "content": [
                    {
                        "audio": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
                    }
                ],
                "role": "user"
            }
        ]
    },
    "parameters": {
        "incremental_output": true,
        "asr_options": {
            "enable_itn": false
        }
    }
}'
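
With the X-DashScope-SSE header set, the HTTP response arrives as a server-sent event stream, in which payload lines begin with data:. The sketch below shows one way to pull the text fragments out of such a stream and join them. The payload path (output.choices[0].message.content[0].text) is taken from the SDK streaming examples above. The sample lines are invented for illustration, and joining fragments assumes incremental_output is true so that each event carries only new text. The parser handles single-line data fields only.

```python
import json
from typing import Iterable, Iterator


def iter_sse_payloads(lines: Iterable[str]) -> Iterator[dict]:
    """Yield the JSON object carried by each single-line data: field."""
    for raw_line in lines:
        line = raw_line.strip()
        if not line.startswith("data:"):
            continue  # skip id:, event:, comment, and blank separator lines
        body = line[len("data:"):].strip()
        if body:
            yield json.loads(body)


def fragment_text(payload: dict) -> str:
    """Return the text fragment in one payload, or "" if it carries none."""
    try:
        return payload["output"]["choices"][0]["message"]["content"][0]["text"]
    except (KeyError, IndexError, TypeError):
        return ""


# Invented sample stream, for illustration only.
sample_lines = [
    "id:1",
    "event:result",
    'data:{"output":{"choices":[{"message":{"content":[{"text":"Welcome to "}]}}]}}',
    "",
    "id:2",
    "event:result",
    'data:{"output":{"choices":[{"message":{"content":[{"text":"Model Studio"}]}}]}}',
    "",
    "id:3",
    "event:result",
    'data:{"output":{"choices":[{"message":{"content":[]}}]}}',
]

transcript = "".join(fragment_text(p) for p in iter_sse_payloads(sample_lines))
print(transcript)
```

In a real client, the lines would come from the HTTP response body read line by line instead of the sample list.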

Paraformer

Paraformer follows the same asynchronous call pattern as Fun-ASR. Reuse the Fun-ASR example code and replace the model name with a Paraformer model name.

Advanced features

Use the OpenAI-compatible API

Important

The OpenAI-compatible mode is not available in the US region.

Only the Qwen3-ASR-Flash model series supports OpenAI-compatible calls. This mode accepts only publicly accessible audio file URLs; absolute paths to local audio files are not accepted.

The OpenAI Python SDK must be 1.52.0 or later, and the Node.js SDK must be 4.68.0 or later. To install or upgrade:

# Python
pip install -U "openai>=1.52.0"

# Node.js
npm install openai@^4.68.0
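
If you want to confirm the Python requirement at run time, a check along the following lines is one option. It is a minimal sketch using only the standard library; the helper names are hypothetical, and the version parsing is simplified: it compares only the leading numeric components, so a pre-release such as 1.52.0rc1 is treated like 1.52.0.

```python
import re
from importlib import metadata
from typing import Optional, Tuple

MIN_OPENAI_VERSION = "1.52.0"  # minimum stated above for the Python SDK


def release_tuple(version: str) -> Tuple[int, ...]:
    """Parse the leading numeric part of a version string, e.g. "1.52.0" -> (1, 52, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def _pad(parts: Tuple[int, ...], width: int) -> Tuple[int, ...]:
    """Right-pad with zeros so that 1.52 and 1.52.0 compare as equal."""
    return parts + (0,) * (width - len(parts))


def meets_minimum(installed: str, minimum: str = MIN_OPENAI_VERSION) -> bool:
    """Return True if the installed version is at least the minimum."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    return _pad(a, width) >= _pad(b, width)


def installed_openai_version() -> Optional[str]:
    """Return the installed openai package version, or None if it is not installed."""
    try:
        return metadata.version("openai")
    except metadata.PackageNotFoundError:
        return None
```

A start-up check could call installed_openai_version() and raise an error when the result is None or fails meets_minimum.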

asr_options is not a standard OpenAI parameter. With the OpenAI Python SDK, pass it through extra_body. With the OpenAI Node.js SDK (v4.x), put asr_options at the top level of the request object instead, because that SDK does not support the extra_body field.

Input: audio file URL

Python SDK

from openai import OpenAI
import os

try:
    client = OpenAI(
        # API Keys for the Singapore/US and Beijing regions are different. To obtain an API Key, see: https://www.alibabacloud.com/help/en/model-studio/get-api-key
        # If the environment variable is not configured, replace the following line with your Model Studio API Key: api_key = "sk-xxx",
        api_key=os.getenv("DASHSCOPE_API_KEY"),
        # The following configuration is for the Singapore region. Replace "{WorkspaceId}" with your actual workspace ID. Configurations vary by region.
        # Alternative: use "https://dashscope.aliyuncs.com" without WorkspaceId prefix
        base_url="https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/compatible-mode/v1",
    )

    stream_enabled = False  # Whether to enable streaming output
    completion = client.chat.completions.create(
        model="qwen3-asr-flash",
        messages=[
            {
                "content": [
                    {
                        "type": "input_audio",
                        "input_audio": {
                            "data": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
                        }
                    }
                ],
                "role": "user"
            }
        ],
        stream=stream_enabled,
        # When stream is False, stream_options cannot be set
        # stream_options={"include_usage": True},
        extra_body={
            "asr_options": {
                # "language": "zh",
                "enable_itn": False
            }
        }
    )
    if stream_enabled:
        full_content = ""
        print("Streaming output:")
        for chunk in completion:
            # When stream_options.include_usage is True, the choices field of the last chunk is an empty list and must be skipped (you can get token usage via chunk.usage)
            print(chunk)
            if chunk.choices and chunk.choices[0].delta.content:
                full_content += chunk.choices[0].delta.content
        print(f"Full content: {full_content}")
    else:
        print(f"Non-streaming output: {completion.choices[0].message.content}")
except Exception as e:
    print(f"Error: {e}")

Node.js SDK

// Preparations before running:
// Common to Windows/Mac/Linux:
// 1. Make sure Node.js is installed (version >= 18 recommended)
// 2. Install the required dependency: npm install openai@^4.68.0
// 3. This file uses ES module syntax (import), so save it as .mjs or set "type": "module" in package.json

import OpenAI from "openai";

const client = new OpenAI({
  // API Keys for the Singapore/US and Beijing regions are different. To obtain an API Key, see: https://www.alibabacloud.com/help/en/model-studio/get-api-key
  // If the environment variable is not configured, replace the following line with your Model Studio API Key: apiKey: "sk-xxx",
  apiKey: process.env.DASHSCOPE_API_KEY,
  // The following configuration is for the Singapore region. Replace "{WorkspaceId}" with your actual workspace ID. Configurations vary by region.
  // Alternative: use "https://dashscope.aliyuncs.com" without WorkspaceId prefix
  baseURL: "https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/compatible-mode/v1",
});

async function main() {
  try {
    const streamEnabled = false; // Whether to enable streaming output
    const completion = await client.chat.completions.create({
      model: "qwen3-asr-flash",
      messages: [
        {
          role: "user",
          content: [
            {
              type: "input_audio",
              input_audio: {
                data: "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
              }
            }
          ]
        }
      ],
      stream: streamEnabled,
      // When stream is false, stream_options cannot be set
      // stream_options: {
      //   "include_usage": true
      // },
      asr_options: {
        // language: "zh",
        enable_itn: false
      }
    });

    if (streamEnabled) {
      let fullContent = "";
      console.log("Streaming output:");
      for await (const chunk of completion) {
        console.log(JSON.stringify(chunk));
        if (chunk.choices && chunk.choices.length > 0) {
          const delta = chunk.choices[0].delta;
          if (delta && delta.content) {
            fullContent += delta.content;
          }
        }
      }
      console.log(`Full content: ${fullContent}`);
    } else {
      console.log(`Non-streaming output: ${completion.choices[0].message.content}`);
    }
  } catch (err) {
    console.error(`Error: ${err}`);
  }
}

main();

cURL

The following configuration is for the Singapore region. Replace "{WorkspaceId}" with your actual workspace ID. Configurations vary by region.

# Alternative: use "https://dashscope.aliyuncs.com" without WorkspaceId prefix
curl -X POST 'https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/compatible-mode/v1/chat/completions' \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen3-asr-flash",
    "messages": [
        {
            "content": [
                {
                    "type": "input_audio",
                    "input_audio": {
                        "data": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
                    }
                }
            ],
            "role": "user"
        }
    ],
    "stream":false,
    "asr_options": {
        "enable_itn": false
    }
}'

Input: Base64-encoded audio file

You can also pass Base64-encoded audio as a data URL, in the format data:<mediatype>;base64,<data>.

<mediatype>: The MIME type.

The value varies by audio format. For example:
- WAV: audio/wav
- MP3: audio/mpeg
<data>: The Base64-encoded string of the audio data.

Base64 encoding inflates the payload size. Keep the source file small enough that the encoded data still fits within the 10 MB input limit.

Example: data:audio/wav;base64,SUQzBAAAAAAAI1RTU0UAAAAPAAADTGF2ZjU4LjI5LjEwMAAAAAAAAAAAAAAA//PAxABQ/BXRbMPe4IQAhl9
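
Picking the wrong <mediatype> is an easy mistake when files of several formats are processed. The sketch below selects the MIME type from the file extension using only the two mappings listed above, then builds the data URL. The function names and the lookup table are hypothetical; an explicit table is used because Python's mimetypes module can report version- or platform-dependent values for some audio extensions.

```python
import base64
from pathlib import PurePath

# Only the two formats listed above; extend the table for other formats you use.
MIME_TYPES = {".wav": "audio/wav", ".mp3": "audio/mpeg"}


def mime_type_for(filename: str) -> str:
    """Look up the MIME type by file extension (case-insensitive)."""
    suffix = PurePath(filename).suffix.lower()
    try:
        return MIME_TYPES[suffix]
    except KeyError:
        raise ValueError(f"No MIME type configured for {suffix!r}") from None


def to_data_url(audio_bytes: bytes, mime_type: str) -> str:
    """Build data:<mediatype>;base64,<data> from raw audio bytes."""
    encoded = base64.b64encode(audio_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

For a local file, to_data_url(path.read_bytes(), mime_type_for(path.name)) with a pathlib.Path produces the string to place in input_audio.data.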

Sample code

Python

import base64, pathlib

# input.mp3 is the local audio file. Replace the path with your own file and verify that it meets the audio requirements.
file_path = pathlib.Path("input.mp3")
base64_str = base64.b64encode(file_path.read_bytes()).decode()
data_uri = f"data:audio/mpeg;base64,{base64_str}"

Java

import java.nio.file.*;
import java.util.Base64;

public class Main {
    /**
     * filePath is the local audio file. Replace it with the path to your own file and verify that it meets the audio requirements.
     */
    public static String toDataUrl(String filePath) throws Exception {
        byte[] bytes = Files.readAllBytes(Paths.get(filePath));
        String encoded = Base64.getEncoder().encodeToString(bytes);
        return "data:audio/mpeg;base64," + encoded;
    }

    // Example usage
    public static void main(String[] args) throws Exception {
        System.out.println(toDataUrl("input.mp3"));
    }
}

Python SDK

The example uses this audio file: welcome.mp3.

import base64
from openai import OpenAI
import os
import pathlib

try:
    # Replace with the actual path to your audio file
    file_path = "welcome.mp3"
    # Replace with the actual MIME type of your audio file
    audio_mime_type = "audio/mpeg"

    file_path_obj = pathlib.Path(file_path)
    if not file_path_obj.exists():
        raise FileNotFoundError(f"Audio file not found: {file_path}")

    base64_str = base64.b64encode(file_path_obj.read_bytes()).decode()
    data_uri = f"data:{audio_mime_type};base64,{base64_str}"

    client = OpenAI(
        # API Keys for the Singapore/US and Beijing regions are different. To obtain an API Key, see: https://www.alibabacloud.com/help/en/model-studio/get-api-key
        # If the environment variable is not configured, replace the following line with your Model Studio API Key: api_key = "sk-xxx",
        api_key=os.getenv("DASHSCOPE_API_KEY"),
        # The following configuration is for the Singapore region. Replace "{WorkspaceId}" with your actual workspace ID. Configurations vary by region.
        # Alternative: use "https://dashscope.aliyuncs.com" without WorkspaceId prefix
        base_url="https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/compatible-mode/v1",
    )

    stream_enabled = False  # Whether to enable streaming output
    completion = client.chat.completions.create(
        model="qwen3-asr-flash",
        messages=[
            {
                "content": [
                    {
                        "type": "input_audio",
                        "input_audio": {
                            "data": data_uri
                        }
                    }
                ],
                "role": "user"
            }
        ],
        stream=stream_enabled,
        # When stream is False, stream_options cannot be set
        # stream_options={"include_usage": True},
        extra_body={
            "asr_options": {
                # "language": "zh",
                "enable_itn": False
            }
        }
    )
    if stream_enabled:
        full_content = ""
        print("Streaming output:")
        for chunk in completion:
            # When stream_options.include_usage is True, the choices field of the last chunk is an empty list and must be skipped (you can get token usage via chunk.usage)
            print(chunk)
            if chunk.choices and chunk.choices[0].delta.content:
                full_content += chunk.choices[0].delta.content
        print(f"Full content: {full_content}")
    else:
        print(f"Non-streaming output: {completion.choices[0].message.content}")
except Exception as e:
    print(f"Error: {e}")

Node.js SDK

The example uses this audio file: welcome.mp3.

// Preparations before running:
// Common to Windows/Mac/Linux:
// 1. Make sure Node.js is installed (version >= 14 recommended)
// 2. Run the following command to install the required dependency: npm install openai

import OpenAI from "openai";
import { readFileSync } from 'fs';

const client = new OpenAI({
  // API Keys for the Singapore/US and Beijing regions are different. To obtain an API Key, see: https://www.alibabacloud.com/help/en/model-studio/get-api-key
  // If the environment variable is not configured, replace the following line with your Model Studio API Key: apiKey: "sk-xxx",
  apiKey: process.env.DASHSCOPE_API_KEY,
  // The following configuration is for the Singapore region. Replace "{WorkspaceId}" with your actual workspace ID. Configurations vary by region.
  // Alternative: use "https://dashscope.aliyuncs.com" without WorkspaceId prefix
  baseURL: "https://{WorkspaceId}.ap-southeast-1.maas.aliyuncs.com/compatible-mode/v1",
});

const encodeAudioFile = (audioFilePath) => {
    const audioFile = readFileSync(audioFilePath);
    return audioFile.toString('base64');
};

// Replace with the actual path to your audio file
const dataUri = `data:audio/mpeg;base64,${encodeAudioFile("welcome.mp3")}`;

async function main() {
  try {
    const streamEnabled = false; // Whether to enable streaming output
    const completion = await client.chat.completions.create({
      model: "qwen3-asr-flash",
      messages: [
        {
          role: "user",
          content: [
            {
              type: "input_audio",
              input_audio: {
                data: dataUri
              }
            }
          ]
        }
      ],
      stream: streamEnabled,
      // When stream is false, stream_options cannot be set
      // stream_options: {
      //   "include_usage": true
      // },
      asr_options: {
        // language: "zh",
        enable_itn: false
      }
    });

    if (streamEnabled) {
      let fullContent = "";
      console.log("Streaming output:");
      for await (const chunk of completion) {
        console.log(JSON.stringify(chunk));
        if (chunk.choices && chunk.choices.length > 0) {
          const delta = chunk.choices[0].delta;
          if (delta && delta.content) {
            fullContent += delta.content;
          }
        }
      }
      console.log(`Full content: ${fullContent}`);
    } else {
      console.log(`Non-streaming output: ${completion.choices[0].message.content}`);
    }
  } catch (err) {
    console.error(`Error: ${err}`);
  }
}

main();

Process long audio files

Non-real-time speech recognition transcribes long audio files asynchronously, which makes it well suited for producing meeting minutes and interview transcripts and for reviewing call recordings.

Limitations:

Qwen3-ASR-Flash-Filetrans, Fun-ASR, and Paraformer: Each audio file is capped at 2 GB in size and 12 hours in duration.
Qwen3-ASR-Flash: Each audio file is capped at 10 MB in size and 5 minutes in duration. For longer audio, use Qwen3-ASR-Flash-Filetrans or Fun-ASR.
When speaker diarization is enabled: Keep the audio duration under 2 hours to avoid recognition failures or timeouts. For details, see Speaker diarization.
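The limits above can be checked on the client before a file is submitted. The following is a minimal sketch, not part of any SDK: the function name plan_transcription and the route labels "sync" and "async" are hypothetical, and it assumes that "MB" and "GB" mean binary units, which the limits do not specify.

```python
from typing import List, Tuple

MB = 1024 * 1024  # assumption: binary units; the documented limits do not say which unit is meant
GB = 1024 * MB


def plan_transcription(size_bytes: int, duration_seconds: float,
                       diarization: bool = False) -> Tuple[str, List[str]]:
    """Return (route, warnings) for one audio file.

    route is "sync" (Qwen3-ASR-Flash) or "async" (Qwen3-ASR-Flash-Filetrans,
    Fun-ASR, or Paraformer). Raises ValueError above the 2 GB / 12 hour cap.
    """
    if size_bytes > 2 * GB or duration_seconds > 12 * 3600:
        raise ValueError("File exceeds the 2 GB / 12 hour cap; split it first.")

    warnings: List[str] = []
    fits_sync = size_bytes <= 10 * MB and duration_seconds <= 5 * 60

    if diarization:
        # Speaker diarization is only offered by the asynchronous models.
        route = "async"
        if duration_seconds >= 2 * 3600:
            warnings.append("Diarization enabled: keep audio under 2 hours.")
    else:
        route = "sync" if fits_sync else "async"
    return route, warnings


if __name__ == "__main__":
    print(plan_transcription(5 * MB, 90))
    print(plan_transcription(300 * MB, 3 * 3600, diarization=True))
```

The function only routes and warns. The file size and duration themselves have to be measured separately, for example with ffprobe.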

How it works: Long-audio transcription runs as an asynchronous task in three steps:

Submit a transcription task to receive a task_id.
Poll the task status, or call the SDK's wait method to block until the task completes.
After the task completes, download the result JSON from the returned URL.

For code samples, see the quick start of Qwen3-ASR-Flash-Filetrans.
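The polling step can be sketched independently of any SDK. In the sketch below, wait_for_task is a hypothetical helper, fetch_task stands for whatever function you use to query the task (for example, a wrapper around the task query endpoint), and the status strings "SUCCEEDED" and "FAILED" are assumptions about the task_status values that you should confirm against the API reference.

```python
import time
from typing import Callable, Dict

# Assumption: these are the terminal task_status values; confirm in the API reference.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED"}


def wait_for_task(fetch_task: Callable[[], Dict],
                  interval: float = 3.0,
                  timeout: float = 3600.0,
                  sleep: Callable[[float], None] = time.sleep) -> Dict:
    """Poll fetch_task() until the task reaches a terminal status.

    fetch_task must return a dict that contains a "task_status" key.
    Raises TimeoutError if no terminal status is seen within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        task = fetch_task()
        if task.get("task_status") in TERMINAL_STATUSES:
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError("Task did not finish before the timeout.")
        sleep(interval)


if __name__ == "__main__":
    # Simulated task that finishes on the third query.
    fake = iter([{"task_status": "PENDING"},
                 {"task_status": "RUNNING"},
                 {"task_status": "SUCCEEDED"}])
    print(wait_for_task(lambda: next(fake), interval=0.01))
```

Injecting fetch_task and sleep keeps the loop testable without network access. In real use, fetch_task would issue the query request and return the parsed task object.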

Streaming output

Qwen3-ASR-Flash supports streaming output: intermediate results are returned while the audio is being processed, which is well suited for use cases that require real-time progress feedback.

Fun-ASR, Paraformer, and Qwen3-ASR-Flash-Filetrans are asynchronous transcription models and do not support streaming output. Retrieve their final results through task polling (see Process long audio files).

To enable streaming output:

DashScope Python SDK: set stream to True.
DashScope Java SDK: call the API through the streamCall method.
DashScope HTTP: set the X-DashScope-SSE header to enable.
OpenAI-compatible SDK: set stream to True.

For a streaming code sample, see Streaming output in the Qwen3-ASR-Flash quick start.

Improve accuracy with hotwords

Fun-ASR and Paraformer improve recognition accuracy for domain-specific proper nouns (names, locations, product names) through hotwords. Create a hotword list in the Model Studio console, then pass its ID to the API through the vocabulary_id parameter.

For instructions on creating and using hotword lists, see Improve recognition accuracy.

SDK naming conventions for these parameters vary (dictionary keys, object attributes, or methods). For the full field mapping, see the API reference for each SDK.

Improve accuracy with context enhancement

The fun-asr-flash-2026-06-15 model improves recognition accuracy for proper nouns (names, locations, product terms) through context enhancement: you pass conversation history or domain text to the model.

For usage and examples, see Context enhancement.

Speaker diarization

Speaker diarization identifies the different speakers in an audio file and tags each sentence in the transcript with a speaker label. It is well suited for multi-party meetings and interview recordings.

Supported models: Fun-ASR and Paraformer support speaker diarization (off by default). The Qwen-ASR series does not yet support it.

To enable: Set diarization_enabled to true in the API request. Each sentence in the result then includes a speaker_id field that identifies the speaker.

Response structure (excerpt):

{
  "transcripts": [
    {
      "sentences": [
        { "begin_time": 100, "end_time": 3820, "text": "Hello, let's discuss the project progress today.", "speaker_id": 0 },
        { "begin_time": 3820, "end_time": 6500, "text": "Sure, I'll give the update first.", "speaker_id": 1 }
      ]
    }
  ]
}

SDK naming conventions for these fields vary (dictionary keys, object attributes, or methods). For the full field mapping, see the API reference for each SDK.
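As an illustration of how the speaker_id field might be consumed, the sketch below totals speaking time per speaker from a result shaped like the excerpt above. The function name speaking_time_ms is hypothetical, and the input is assumed to be the already-downloaded and parsed result JSON.

```python
from collections import defaultdict
from typing import Dict

# Sample data copied from the response excerpt above.
RESULT = {
    "transcripts": [
        {
            "sentences": [
                {"begin_time": 100, "end_time": 3820,
                 "text": "Hello, let's discuss the project progress today.",
                 "speaker_id": 0},
                {"begin_time": 3820, "end_time": 6500,
                 "text": "Sure, I'll give the update first.",
                 "speaker_id": 1},
            ]
        }
    ]
}


def speaking_time_ms(result: Dict) -> Dict[int, int]:
    """Sum (end_time - begin_time) per speaker_id, in milliseconds."""
    totals: Dict[int, int] = defaultdict(int)
    for transcript in result.get("transcripts", []):
        for sentence in transcript.get("sentences", []):
            if "speaker_id" not in sentence:
                continue  # diarization was not enabled for this sentence
            totals[sentence["speaker_id"]] += sentence["end_time"] - sentence["begin_time"]
    return dict(totals)


if __name__ == "__main__":
    print(speaking_time_ms(RESULT))  # {0: 3720, 1: 2680}
```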

Important

When speaker diarization is enabled, keep the audio duration under 2 hours to avoid recognition failures or timeouts. (For the audio duration limit when diarization is not enabled, see Process long audio files.) Diarization is supported only for mono audio.

For complete field definitions, see the API reference.

Sensitive word filter

The sensitive word filter replaces or removes sensitive words from recognition results. It is well suited for customer service quality assurance (QA), content compliance, and subtitle moderation.

Supported models: Fun-ASR and Paraformer support the sensitive word filter. The Qwen-ASR series (Qwen3-ASR-Flash and Qwen3-ASR-Flash-Filetrans) does not yet support it.

Default behavior: When the special_word_filter parameter is not specified, the system applies the built-in Alibaba Cloud Model Studio sensitive word list. Matched words are replaced with the same number of * characters.

Custom configuration: special_word_filter is a JSON object with three fields:

filter_with_signed.word_list: An array of strings whose matches are replaced with the same number of * characters. For example, with ["test"], "please help me test it" becomes "please help me **** it".
filter_with_empty.word_list: An array of strings whose matches are removed from the result. For example, with ["start"], "is the game about to start now" becomes "is the game about to now".
system_reserved_filter: A boolean. Defaults to true. Controls whether the built-in sensitive word list is applied in addition to the custom lists.

Example configuration:

{
  "special_word_filter": {
    "filter_with_signed": {
      "word_list": ["test"]
    },
    "filter_with_empty": {
      "word_list": ["start", "happen"]
    },
    "system_reserved_filter": true
  }
}

SDK naming conventions for these parameters vary (dictionary keys, object attributes, or methods). For the full field mapping, see the API reference.
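If the configuration is assembled in code, a small builder keeps the three fields consistent. This is a sketch under assumptions: build_special_word_filter is a hypothetical helper, omitting an empty word list is assumed to be acceptable to the service, and the overlap check is a client-side safeguard rather than a documented rule.

```python
import json
from typing import Dict, Iterable


def build_special_word_filter(mask_words: Iterable[str] = (),
                              remove_words: Iterable[str] = (),
                              use_system_list: bool = True) -> Dict:
    """Build the special_word_filter object described above.

    mask_words   -> filter_with_signed.word_list (replaced with * characters)
    remove_words -> filter_with_empty.word_list  (removed from the result)
    """
    mask, remove = list(mask_words), list(remove_words)
    overlap = set(mask) & set(remove)
    if overlap:
        # A word cannot sensibly be both masked and removed.
        raise ValueError(f"Words listed in both filters: {sorted(overlap)}")

    config: Dict = {}
    if mask:
        config["filter_with_signed"] = {"word_list": mask}
    if remove:
        config["filter_with_empty"] = {"word_list": remove}
    config["system_reserved_filter"] = use_system_list
    return {"special_word_filter": config}


if __name__ == "__main__":
    payload = build_special_word_filter(["test"], ["start", "happen"])
    print(json.dumps(payload, indent=2))
```

With the arguments shown, the builder produces the same object as the example configuration above.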

Emotion recognition

Qwen3-ASR-Flash-Filetrans and Qwen3-ASR-Flash have emotion recognition always on, with no additional configuration required. The result includes an emotion tag for the speaker, drawn from seven fine-grained categories: surprised, neutral, happy, sad, disgusted, angry, and fearful.

Field paths (vary by interface):

OpenAI-compatible interface (Qwen3-ASR-Flash real-time transcription): nested at choices[].delta.annotations[].emotion (streaming) or choices[].message.annotations[].emotion (non-streaming).
DashScope synchronous interface (Qwen3-ASR-Flash): nested at output.choices[].message.annotations[].emotion.
DashScope asynchronous task interface (Qwen3-ASR-Flash-Filetrans, audio file transcription): nested at transcripts[].sentences[].emotion, alongside the timestamp and speaker fields on each sentence object.

Response structure (excerpt from the DashScope asynchronous task interface):

{
  "transcripts": [{
    "sentences": [{
      "begin_time": 0,
      "end_time": 1440,
      "text": "Welcome to Alibaba Cloud.",
      "emotion": "neutral",
      "language": "en"
    }]
  }]
}

SDK naming conventions for these fields vary (dictionary keys, object attributes, or methods). For the full field mapping, see the API reference.
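Because the emotion field sits at a different path on each interface, a consumer typically needs one reader per interface. The sketch below operates on plain dictionaries (parsed JSON); both function names are hypothetical, and the chat-completion sample is a made-up illustration of the documented path, not a captured response.

```python
from typing import Dict, List


def emotions_from_task_result(result: Dict) -> List[str]:
    """Asynchronous task interface: transcripts[].sentences[].emotion."""
    return [
        sentence["emotion"]
        for transcript in result.get("transcripts", [])
        for sentence in transcript.get("sentences", [])
        if "emotion" in sentence
    ]


def emotions_from_chat_completion(response: Dict) -> List[str]:
    """OpenAI-compatible, non-streaming: choices[].message.annotations[].emotion."""
    return [
        annotation["emotion"]
        for choice in response.get("choices", [])
        for annotation in choice.get("message", {}).get("annotations", [])
        if "emotion" in annotation
    ]


if __name__ == "__main__":
    task_result = {"transcripts": [{"sentences": [
        {"begin_time": 0, "end_time": 1440, "text": "Welcome to Alibaba Cloud.",
         "emotion": "neutral", "language": "en"}]}]}
    # Hypothetical sample that only illustrates the field path.
    chat_response = {"choices": [{"message": {
        "content": "Welcome to Alibaba Cloud.",
        "annotations": [{"emotion": "neutral"}]}}]}
    print(emotions_from_task_result(task_result))
    print(emotions_from_chat_completion(chat_response))
```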

Important

The Fun-ASR and Paraformer non-real-time models do not yet support emotion recognition. To use emotion recognition with real-time recognition, see the corresponding section in Real-time speech recognition.

Get timestamps

Non-real-time speech recognition can return timestamps in the transcript, which supports subtitle generation, keyword highlighting, and audio or video editing. All three asynchronous transcription models (Fun-ASR, Paraformer, and Qwen3-ASR-Flash-Filetrans) support timestamps, but the default behavior and the control method differ by model:

Qwen3-ASR-Flash-Filetrans: Only the DashScope asynchronous interface supports timestamps; the feature is permanently on. The enable_words request parameter controls the granularity: false (default) returns sentence-level timestamps; true returns word-level timestamps. Word-level timestamps are supported only for Chinese, English, Japanese, Korean, German, French, Spanish, Italian, Portuguese, and Russian. Accuracy is not guaranteed for other languages.
Fun-ASR: Timestamps are permanently on and cannot be disabled.
Paraformer: Timestamps are off by default. To enable them, set the timestamp_alignment_enabled request parameter to true.

Important

When Qwen3-ASR-Flash is called through the OpenAI-compatible interface, the output is a chat.completion and does not include timestamp fields. For timestamps, use Qwen3-ASR-Flash-Filetrans (the asynchronous task interface).

Timestamps are returned in milliseconds at two levels:

Sentence level: sentences[].begin_time and sentences[].end_time mark the start and end of each sentence in the audio.
Word level: The sentences[].words[] array. Each element contains begin_time, end_time, and text (the word or character text).

Response structure (excerpt from the DashScope asynchronous task interface):

{
  "transcripts": [{
    "sentences": [{
      "begin_time": 100,
      "end_time": 3820,
      "text": "Hello, let's discuss the project progress today.",
      "words": [
        { "begin_time": 100, "end_time": 596, "text": "Hello" },
        { "begin_time": 596, "end_time": 844, "text": "let's" }
      ]
    }]
  }]
}

Important

The in-audio timestamps are integer milliseconds (for example, 100). They are not the same as the task-level end_time (the task completion time, a string such as "2024-09-12 15:11:40.903"). Do not confuse them.

SDK naming conventions for these fields vary (dictionary keys, object attributes, or methods). For the full field mapping, see the API reference.
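For the subtitle use case, sentence-level timestamps map directly onto the SubRip (SRT) format. The following sketch converts a parsed result into SRT text; ms_to_srt and to_srt are hypothetical helper names, and the input is assumed to have the structure shown in the excerpt above.

```python
from typing import Dict


def ms_to_srt(ms: int) -> str:
    """Format integer milliseconds as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def to_srt(result: Dict) -> str:
    """Turn transcripts[].sentences[] into numbered SRT cues."""
    cues = []
    for transcript in result.get("transcripts", []):
        for sentence in transcript.get("sentences", []):
            index = len(cues) + 1
            start = ms_to_srt(sentence["begin_time"])
            end = ms_to_srt(sentence["end_time"])
            cues.append(f"{index}\n{start} --> {end}\n{sentence['text']}")
    return "\n\n".join(cues) + "\n" if cues else ""


if __name__ == "__main__":
    result = {"transcripts": [{"sentences": [
        {"begin_time": 100, "end_time": 3820,
         "text": "Hello, let's discuss the project progress today."}]}]}
    print(to_srt(result), end="")
    # 1
    # 00:00:00,100 --> 00:00:03,820
    # Hello, let's discuss the project progress today.
```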

Apply in production

The best practices below improve recognition quality and system stability when you use non-real-time speech recognition in production.

Production best practices

File hosting: Upload audio files to Alibaba Cloud OSS and call the API by URL. Avoid uploading local files (the local-file API is capped at 100 QPS and the limit cannot be increased).
Asynchronous polling: Long-audio transcription uses an asynchronous flow. Set a reasonable polling interval (for example, 2–5 seconds) to avoid burning through your quota with frequent queries. If you need higher throughput than the 100 QPS query ceiling, switch to event callback notifications. For details, see Replace polling with callbacks for high-concurrency workloads.
Error handling: Implement a robust retry mechanism. Retry network timeouts and transient server errors (5xx) with exponential backoff.
Noise reduction: For noisy audio, preprocess the file with tools such as FFmpeg before submitting it for recognition.
Model selection: Choose a model based on audio duration. Use Qwen3-ASR-Flash for short audio up to 5 minutes. Use Fun-ASR or Qwen3-ASR-Flash-Filetrans for longer audio.
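The error-handling recommendation can be sketched as a generic retry wrapper. Everything here is illustrative: TransientError is a hypothetical exception that your own request code would raise for network timeouts and 5xx responses, and the default attempt count and delays are placeholder values rather than documented recommendations.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for retryable failures (timeouts, HTTP 5xx)."""


def call_with_retry(operation: Callable[[], T],
                    max_attempts: int = 5,
                    base_delay: float = 1.0,
                    max_delay: float = 30.0,
                    jitter: float = 0.1,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Run operation(), retrying TransientError with exponential backoff.

    The delay doubles after each failed attempt (base_delay, 2x, 4x, ...),
    is capped at max_delay, and gets up to `jitter` * delay of random extra wait.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay * jitter))
    raise RuntimeError("max_attempts must be at least 1")
```

Only errors that are explicitly classified as transient are retried. Client errors such as invalid parameters or authentication failures should fail immediately, because retrying them cannot succeed.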

Replace polling with callbacks for high-concurrency workloads

After you submit an asynchronous transcription task (Fun-ASR, Qwen3-ASR-Flash-Filetrans, or Paraformer) through POST /api/v1/services/audio/asr/transcription, the usual pattern is to poll GET /api/v1/tasks/{task_id} for the result. That query endpoint defaults to 20 QPS and scales up to 100 QPS, so high-concurrency batch workloads can easily trigger throttling.

Configure callback notifications through EventBridge instead: when a task finishes, Model Studio automatically pushes a dashscope:System:AsyncTaskFinish event to the target you configured (an HTTP/HTTPS endpoint or a RocketMQ topic). Your consumer reads the result directly from the event and no longer needs to call the query endpoint, eliminating the risk of polling-induced throttling. For setup details, see Configure EventBridge callback notifications.

Applicable models

Supported: Fun-ASR, Qwen3-ASR-Flash-Filetrans, and Paraformer (all asynchronous transcription tasks).
Not supported: Qwen3-ASR-Flash, which uses synchronous and streaming calls rather than asynchronous tasks.

Callback message body

For all three models, the callback message body has data.contain_result set to true, and data.output_result carries the transcription_url directly. Your consumer can fetch the recognition result as soon as it receives the callback, without calling GET /api/v1/tasks/{task_id} again. The result field paths and structure differ across the three models; see the table below.

Note

Pick the correct path based on the model you call; never hard-code a single path in your consumer. On failure, data.output_result.output no longer contains results or result. Instead, it contains code and message. Check data.task_status first, then read the result.

Model	Submit parameter	Result field path (in the callback body)	`usage` field
Fun-ASR	`input.file_urls` (an array; only 1 URL is supported per request).	`data.output_result.output.results[].transcription_url` (an array with one entry per file, including `subtask_status` and `task_metrics`).	`duration`
Paraformer	`input.file_urls` (an array; only 1 URL is supported per request).	Same as Fun-ASR: `data.output_result.output.results[].transcription_url`.	`duration`
Qwen3-ASR-Flash-Filetrans	`input.file_url` (a single object; only one URL per request).	`data.output_result.output.result.transcription_url` (a single object, with no `results[]` or `task_metrics`).	`seconds`
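A consumer that serves more than one model can branch on the shape of the callback body. The sketch below follows the field paths in the table; transcription_urls is a hypothetical helper, the success value "SUCCEEDED" for data.task_status is an assumption to verify against the callback documentation, and the sample URL is a placeholder.

```python
from typing import Dict, List

SUCCESS_STATUS = "SUCCEEDED"  # assumption: confirm the exact task_status value


def transcription_urls(event: Dict) -> List[str]:
    """Return the transcription_url values carried by one callback event.

    Handles both documented shapes:
      output.results[] (Fun-ASR, Paraformer) and output.result (Qwen3-ASR-Flash-Filetrans).
    Raises RuntimeError when the task did not succeed.
    """
    data = event.get("data", {})
    output = data.get("output_result", {}).get("output", {})

    # Check the status first: failed tasks carry code/message instead of results.
    if data.get("task_status") != SUCCESS_STATUS:
        raise RuntimeError(
            f"Task not successful: {output.get('code')}: {output.get('message')}")

    if "results" in output:
        # One entry per file; skip entries without a URL (for example, failed subtasks).
        return [r["transcription_url"] for r in output["results"]
                if "transcription_url" in r]
    if "result" in output:
        return [output["result"]["transcription_url"]]
    return []


if __name__ == "__main__":
    event = {"data": {"task_status": "SUCCEEDED", "output_result": {"output": {
        "result": {"transcription_url": "https://example.com/result.json"}}}}}
    print(transcription_urls(event))
```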

Considerations

Security (HTTP/HTTPS delivery): In production, validate the X-Eventbridge-Signature* headers on every callback request before processing the request. Without validation, any external IP can spoof AsyncTaskFinish events and inject fake recognition results. Allow your endpoint at least 5 seconds to respond to each callback request. RocketMQ delivery has no per-message signature; security is enforced by RocketMQ authentication.

Delivery latency: Expect roughly 1–90 seconds between task completion (end_time) and message arrival at your target (HTTP/HTTPS endpoint or RocketMQ topic). The exact latency depends on the current EventBridge load.

Idempotency: The same event may be delivered more than once because of retries. Implement idempotent processing on the consumer side; we recommend using data.id or data.task_id from the CloudEvents envelope as the deduplication key.
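The idempotency recommendation can be sketched with a deduplication key taken from the event. The class below is a hypothetical, in-memory illustration only: a production consumer would keep the seen keys in a shared store such as a database or cache so that deduplication survives restarts and works across instances.

```python
from typing import Callable, Dict, Set


class CallbackDeduplicator:
    """Process each callback event at most once, keyed by task_id or id."""

    def __init__(self) -> None:
        self._seen: Set[str] = set()

    def handle(self, event: Dict, process: Callable[[Dict], None]) -> bool:
        """Run process(event) unless its key was already handled.

        Returns True if the event was processed, False if it was a duplicate.
        """
        data = event.get("data", {})
        key = data.get("task_id") or data.get("id")
        if key is None:
            raise ValueError("Event has neither data.task_id nor data.id")
        if key in self._seen:
            return False
        process(event)
        # Record the key only after processing succeeds, so a failed
        # attempt can be retried when the event is redelivered.
        self._seen.add(key)
        return True


if __name__ == "__main__":
    dedup = CallbackDeduplicator()
    event = {"data": {"task_id": "task-123", "task_status": "SUCCEEDED"}}
    print(dedup.handle(event, print))  # prints the event, then True
    print(dedup.handle(event, print))  # False: duplicate delivery ignored
```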

Supported models and regions

Singapore

To call the models below, use an API key in the Singapore region:

Fun-ASR: fun-asr (stable, currently equivalent to fun-asr-2025-11-07), fun-asr-2025-11-07 (snapshot), fun-asr-2025-08-25 (snapshot), fun-asr-mtl (stable, currently equivalent to fun-asr-mtl-2025-08-25), fun-asr-mtl-2025-08-25 (snapshot)
Fun-ASR-Flash: fun-asr-flash-2026-06-15
Qwen3-ASR-Flash-Filetrans: qwen3-asr-flash-filetrans (stable, currently equivalent to qwen3-asr-flash-filetrans-2025-11-17), qwen3-asr-flash-filetrans-2025-11-17 (snapshot)
Qwen3-ASR-Flash: qwen3-asr-flash (stable, currently equivalent to qwen3-asr-flash-2025-09-08), qwen3-asr-flash-2026-02-10 (latest snapshot), qwen3-asr-flash-2025-09-08 (snapshot)

US (Virginia)

To call the models below, use an API key in the US region:

Qwen3-ASR-Flash: qwen3-asr-flash-us (stable, currently equivalent to qwen3-asr-flash-2025-09-08-us), qwen3-asr-flash-2025-09-08-us (snapshot)

China (Beijing)

To call the models below, use an API key in the Beijing region:

Fun-ASR: fun-asr (stable, currently equivalent to fun-asr-2025-11-07), fun-asr-2025-11-07 (snapshot), fun-asr-2025-08-25 (snapshot), fun-asr-mtl (stable, currently equivalent to fun-asr-mtl-2025-08-25), fun-asr-mtl-2025-08-25 (snapshot)
Fun-ASR-Flash: fun-asr-flash-2026-06-15
Qwen3-ASR-Flash-Filetrans: qwen3-asr-flash-filetrans (stable, currently equivalent to qwen3-asr-flash-filetrans-2025-11-17), qwen3-asr-flash-filetrans-2025-11-17 (snapshot)
Qwen3-ASR-Flash: qwen3-asr-flash (stable, currently equivalent to qwen3-asr-flash-2025-09-08), qwen3-asr-flash-2026-02-10 (latest snapshot), qwen3-asr-flash-2025-09-08 (snapshot)
Paraformer: paraformer-v2, paraformer-8k-v2

API reference

FAQ

Q: How do I provide a publicly accessible audio URL to the API?

Use Alibaba Cloud Object Storage Service (OSS). OSS provides highly available and durable storage and can generate publicly accessible URLs.

Verify that the URL is reachable from the public internet: Open the URL in a browser or run curl against it to confirm the audio file downloads or plays successfully (HTTP status code 200).

Q: How do I check whether the audio format meets the requirements?

Use the open-source tool ffprobe to quickly inspect audio details:

# Inspect the container format (format_name), codec (codec_name), sample rate (sample_rate), and channel count (channels)
ffprobe -v error -show_entries format=format_name -show_entries stream=codec_name,sample_rate,channels -of default=noprint_wrappers=1 your_audio_file.mp3
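To run the same inspection from a script, ffprobe can emit JSON that is straightforward to parse. The sketch below is one possible wrapper, not an official tool: probe_command, summarize_probe, and probe_audio are hypothetical names, and the command uses the -of json writer and -select_streams a:0 (first audio stream) instead of the plain-text output shown above. It requires ffprobe on the PATH.

```python
import json
import subprocess
from typing import Dict, List


def probe_command(path: str) -> List[str]:
    """Build an ffprobe command that reports format and first-audio-stream details as JSON."""
    return [
        "ffprobe", "-v", "error",
        "-select_streams", "a:0",
        "-show_entries", "stream=codec_name,sample_rate,channels:format=format_name",
        "-of", "json",
        path,
    ]


def summarize_probe(probe_json: str) -> Dict:
    """Extract container format, codec, sample rate, and channel count from ffprobe JSON."""
    info = json.loads(probe_json)
    stream = (info.get("streams") or [{}])[0]
    sample_rate = stream.get("sample_rate")
    return {
        "format": info.get("format", {}).get("format_name"),
        "codec": stream.get("codec_name"),
        # ffprobe reports sample_rate as a string, so convert it when present.
        "sample_rate": int(sample_rate) if sample_rate is not None else None,
        "channels": stream.get("channels"),
    }


def probe_audio(path: str) -> Dict:
    """Run ffprobe on a local file and return the summary."""
    completed = subprocess.run(probe_command(path), capture_output=True,
                               text=True, check=True)
    return summarize_probe(completed.stdout)
```

For example, probe_audio("your_audio_file.mp3") returns a dictionary with the keys format, codec, sample_rate, and channels, which can then be compared against the audio requirements of the model you call.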

Q: How do I process audio to meet the model requirements?

Use the open-source tool FFmpeg to clip audio or convert formats:

Audio clipping: extract a segment from a long audio file

# -i: Input file
# -ss 00:01:30: Set the clip start time (start at 1 minute 30 seconds)
# -t 00:02:00: Set the clip duration (clip 2 minutes)
# -c copy: Copy the audio stream directly without re-encoding; faster
# output_clip.wav: Output file
ffmpeg -i long_audio.wav -ss 00:01:30 -t 00:02:00 -c copy output_clip.wav

Format conversion

For example, convert any audio to a 16 kHz, 16-bit, mono WAV file:

# -i: Input file
# -ac 1: Set the channel count to 1 (mono)
# -ar 16000: Set the sample rate to 16000 Hz (16 kHz)
# -sample_fmt s16: Set the sample format to 16-bit signed integer PCM
# output.wav: Output file
ffmpeg -i input.mp3 -ac 1 -ar 16000 -sample_fmt s16 output.wav

Q: How do I improve recognition accuracy?

The following factors affect recognition accuracy. Check each one and tune accordingly.

Main factors:

Sound quality: Recording equipment, sample rate, and ambient noise directly affect audio clarity. High-quality input is the foundation of accurate recognition.
Speaker characteristics: Variations in pitch, speaking rate, accent, and dialect make recognition harder, especially with uncommon dialects and strong accents.
Language and vocabulary: Mixed languages, technical terms, and slang make recognition harder. Configure hotwords to improve accuracy for domain-specific terminology.

How to optimize:

Improve audio quality: Use a high-quality microphone, record at the recommended sample rate, and minimize ambient noise and echo.
Adapt to the speaker: For audio with strong accents or distinct dialects, choose a model that supports the relevant dialect.
Configure hotwords: Set hotwords for technical terms and proper nouns.