Alibaba Cloud Model Studio: Audio file recognition (Qwen-ASR) API reference

Last Updated: Jan 30, 2026

This topic describes the input and output parameters for Qwen-ASR. Call the API using the OpenAI compatible protocol or the DashScope protocol.

User guide: For model details and how to select them, see Audio file recognition - Qwen.

Model connection types

Different models support different connection types. Select the appropriate integration method from the following table.

Model                        Connection type
Qwen3-ASR-Flash-Filetrans    DashScope asynchronous only
Qwen3-ASR-Flash              OpenAI compatible and DashScope synchronous

OpenAI compatible

Important

The US region does not support the OpenAI compatible mode.

URL

International

In the International deployment mode, the endpoint and data storage are located in the Singapore region, and model inference compute resources are dynamically scheduled globally, excluding Mainland China.

HTTP endpoint: POST https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions

base_url for SDK: https://dashscope-intl.aliyuncs.com/compatible-mode/v1

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region, and model inference compute resources are restricted to Mainland China.

HTTP endpoint: POST https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions

base_url for SDK: https://dashscope.aliyuncs.com/compatible-mode/v1

Request body

Input: Audio file URL

Python SDK

from openai import OpenAI
import os

try:
    client = OpenAI(
        # The API keys for the Singapore/US and Beijing regions are different. To obtain an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
        # If you have not configured environment variables, replace the following line with: api_key = "sk-xxx",
        api_key=os.getenv("DASHSCOPE_API_KEY"),
        # The following URL is for the Singapore/US region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1
        base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
    )
    

    stream_enabled = False  # Whether to enable streaming output
    completion = client.chat.completions.create(
        model="qwen3-asr-flash",
        messages=[
            {
                "content": [
                    {
                        "type": "input_audio",
                        "input_audio": {
                            "data": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
                        }
                    }
                ],
                "role": "user"
            }
        ],
        stream=stream_enabled,
        # When stream is set to False, you cannot set the stream_options parameter.
        # stream_options={"include_usage": True},
        extra_body={
            "asr_options": {
                # "language": "zh",
                "enable_itn": False
            }
        }
    )
    if stream_enabled:
        full_content = ""
        print("Streaming output content:")
        for chunk in completion:
            # If stream_options.include_usage is True, the choices field of the last chunk is an empty list and should be skipped. You can get token usage from chunk.usage.
            print(chunk)
            if chunk.choices and chunk.choices[0].delta.content:
                full_content += chunk.choices[0].delta.content
        print(f"Full content: {full_content}")
    else:
        print(f"Non-streaming output content: {completion.choices[0].message.content}")
except Exception as e:
    print(f"Error message: {e}")

Node.js SDK

// Preparations:
// For Windows/Mac/Linux:
// 1. Make sure Node.js is installed (version >= 14 is recommended).
// 2. Run the following command to install the necessary dependencies: npm install openai

import OpenAI from "openai";

const client = new OpenAI({
  // The API keys for the Singapore/US and Beijing regions are different. To obtain an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
  // If you have not configured environment variables, replace the following line with: apiKey: "sk-xxx",
  apiKey: process.env.DASHSCOPE_API_KEY,
  // The following URL is for the Singapore/US region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1
  baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1", 
});

async function main() {
  try {
    const streamEnabled = false; // Whether to enable streaming output
    const completion = await client.chat.completions.create({
      model: "qwen3-asr-flash",
      messages: [
        {
          role: "user",
          content: [
            {
              type: "input_audio",
              input_audio: {
                data: "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
              }
            }
          ]
        }
      ],
      stream: streamEnabled,
      // When stream is set to False, you cannot set the stream_options parameter.
      // stream_options: {
      //   "include_usage": true
      // },
      extra_body: {
        asr_options: {
          // language: "zh",
          enable_itn: false
        }
      }
    });

    if (streamEnabled) {
      let fullContent = "";
      console.log("Streaming output content:");
      for await (const chunk of completion) {
        console.log(JSON.stringify(chunk));
        if (chunk.choices && chunk.choices.length > 0) {
          const delta = chunk.choices[0].delta;
          if (delta && delta.content) {
            fullContent += delta.content;
          }
        }
      }
      console.log(`Full content: ${fullContent}`);
    } else {
      console.log(`Non-streaming output content: ${completion.choices[0].message.content}`);
    }
  } catch (err) {
    console.error(`Error message: ${err}`);
  }
}

main();

cURL

You can configure the context for customized recognition using the text parameter of the System Message.

# ======= Important =======
# The following URL is for the Singapore/US region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
# The API keys for the Singapore/US and Beijing regions are different. To obtain an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# === Delete this comment before execution ===

curl -X POST 'https://dashscope-intl.aliyuncs.com/compatible-mode/v1/chat/completions' \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen3-asr-flash",
    "messages": [
        {
            "content": [
                {
                    "type": "input_audio",
                    "input_audio": {
                        "data": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
                    }
                }
            ],
            "role": "user"
        }
    ],
    "stream":false,
    "asr_options": {
        "enable_itn": false
    }
}'

Input: Base64-encoded audio file

You can input Base64-encoded data (Data URL) in the format: data:<mediatype>;base64,<data>.

  • <mediatype>: The Multipurpose Internet Mail Extensions (MIME) type.

    This varies by audio format. For example:

    • WAV: audio/wav

    • MP3: audio/mpeg

  • <data>: The Base64-encoded string of the audio.

    Base64 encoding increases the file size. Ensure the original file size is small enough that the encoded file does not exceed the 10 MB input audio size limit.

  • Example: data:audio/wav;base64,SUQzBAAAAAAAI1RTU0UAAAAPAAADTGF2ZjU4LjI5LjEwMAAAAAAAAAAAAAAA//PAxABQ/BXRbMPe4IQAhl9

    Sample code

    Python

    import base64, pathlib

    # input.mp3 is the local audio file to be recognized. Replace it with the path to your own audio file and ensure it meets the audio requirements.
    file_path = pathlib.Path("input.mp3")
    base64_str = base64.b64encode(file_path.read_bytes()).decode()
    data_uri = f"data:audio/mpeg;base64,{base64_str}"

    Java

    import java.nio.file.*;
    import java.util.Base64;

    public class Main {
        /**
         * filePath is the local audio file to be recognized. Replace it with the path to your own audio file and ensure it meets the audio requirements.
         */
        public static String toDataUrl(String filePath) throws Exception {
            byte[] bytes = Files.readAllBytes(Paths.get(filePath));
            String encoded = Base64.getEncoder().encodeToString(bytes);
            return "data:audio/mpeg;base64," + encoded;
        }

        // Example usage
        public static void main(String[] args) throws Exception {
            System.out.println(toDataUrl("input.mp3"));
        }
    }

Python SDK

The audio file used in the example is welcome.mp3.

import base64
from openai import OpenAI
import os
import pathlib

try:
    # Replace with the actual path to your audio file
    file_path = "welcome.mp3"
    # Replace with the actual MIME type of your audio file
    audio_mime_type = "audio/mpeg"

    file_path_obj = pathlib.Path(file_path)
    if not file_path_obj.exists():
        raise FileNotFoundError(f"Audio file not found: {file_path}")

    base64_str = base64.b64encode(file_path_obj.read_bytes()).decode()
    data_uri = f"data:{audio_mime_type};base64,{base64_str}"

    client = OpenAI(
        # The API keys for the Singapore/US and Beijing regions are different. To obtain an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
        # If you have not configured environment variables, replace the following line with: api_key = "sk-xxx",
        api_key=os.getenv("DASHSCOPE_API_KEY"),
        # The following URL is for the Singapore/US region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1
        base_url="https://dashscope-intl.aliyuncs.com/compatible-mode/v1",
    )
    

    stream_enabled = False  # Whether to enable streaming output
    completion = client.chat.completions.create(
        model="qwen3-asr-flash",
        messages=[
            {
                "content": [
                    {
                        "type": "input_audio",
                        "input_audio": {
                            "data": data_uri
                        }
                    }
                ],
                "role": "user"
            }
        ],
        stream=stream_enabled,
        # When stream is set to False, you cannot set the stream_options parameter.
        # stream_options={"include_usage": True},
        extra_body={
            "asr_options": {
                # "language": "zh",
                "enable_itn": False
            }
        }
    )
    if stream_enabled:
        full_content = ""
        print("Streaming output content:")
        for chunk in completion:
            # If stream_options.include_usage is True, the choices field of the last chunk is an empty list and should be skipped. You can get token usage from chunk.usage.
            print(chunk)
            if chunk.choices and chunk.choices[0].delta.content:
                full_content += chunk.choices[0].delta.content
        print(f"Full content: {full_content}")
    else:
        print(f"Non-streaming output content: {completion.choices[0].message.content}")
except Exception as e:
    print(f"Error message: {e}")

Node.js SDK

The audio file used in the example is welcome.mp3.

// Preparations:
// For Windows/Mac/Linux:
// 1. Make sure Node.js is installed (version >= 14 is recommended).
// 2. Run the following command to install the necessary dependencies: npm install openai

import OpenAI from "openai";
import { readFileSync } from 'fs';

const client = new OpenAI({
  // The API keys for the Singapore/US and Beijing regions are different. To obtain an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
  // If you have not configured environment variables, replace the following line with: apiKey: "sk-xxx",
  apiKey: process.env.DASHSCOPE_API_KEY,
  // The following URL is for the Singapore/US region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/compatible-mode/v1
  baseURL: "https://dashscope-intl.aliyuncs.com/compatible-mode/v1", 
});

const encodeAudioFile = (audioFilePath) => {
    const audioFile = readFileSync(audioFilePath);
    return audioFile.toString('base64');
};

// Replace with the actual path to your audio file
const dataUri = `data:audio/mpeg;base64,${encodeAudioFile("welcome.mp3")}`;

async function main() {
  try {
    const streamEnabled = false; // Whether to enable streaming output
    const completion = await client.chat.completions.create({
      model: "qwen3-asr-flash",
      messages: [
        {
          role: "user",
          content: [
            {
              type: "input_audio",
              input_audio: {
                data: dataUri
              }
            }
          ]
        }
      ],
      stream: streamEnabled,
      // When stream is set to False, you cannot set the stream_options parameter.
      // stream_options: {
      //   "include_usage": true
      // },
      extra_body: {
        asr_options: {
          // language: "zh",
          enable_itn: false
        }
      }
    });

    if (streamEnabled) {
      let fullContent = "";
      console.log("Streaming output content:");
      for await (const chunk of completion) {
        console.log(JSON.stringify(chunk));
        if (chunk.choices && chunk.choices.length > 0) {
          const delta = chunk.choices[0].delta;
          if (delta && delta.content) {
            fullContent += delta.content;
          }
        }
      }
      console.log(`Full content: ${fullContent}`);
    } else {
      console.log(`Non-streaming output content: ${completion.choices[0].message.content}`);
    }
  } catch (err) {
    console.error(`Error message: ${err}`);
  }
}

main();

model string (Required)

The model name. In OpenAI compatible mode, only Qwen3-ASR-Flash is supported.

messages array (Required)

The list of messages.

Message type

System Message object (Optional)

The goal or role of the model. If you set a system message, place it at the beginning of the messages list.

Properties

content array (Required)

The content of the message. Only one set of messages is allowed.

Properties

text string

Specifies the context. Qwen3-ASR-Flash lets you provide background text, entity vocabularies, and other reference information (context) during speech recognition to obtain customized recognition results.

Length limit: 10,000 tokens.

For more information, see Context biasing.

role string (Required)

Set to system.
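
For illustration, the following Python sketch shows a messages list that supplies context through a System Message, following the content and text properties described above. The context string and the exact layout of the system content item are illustrative only; see Context biasing for the authoritative format.

# Hypothetical sketch: supply context (biasing) text through a System Message.
# The context string below is a made-up example.
messages = [
    {
        "role": "system",
        "content": [{"text": "Names likely to appear: Doris Jackson, Wakefield"}],
    },
    {
        "role": "user",
        "content": [
            {
                "type": "input_audio",
                "input_audio": {
                    "data": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
                },
            }
        ],
    },
]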

User Message object (Required)

The message sent by the user to the model.

Properties

content array (Required)

The content of the user message. Only one set of messages is allowed.

Properties

type string (Required)

Set to input_audio, which indicates that the input is audio.

input_audio object (Required)

The audio to be recognized, passed in the data field as shown in the examples above. For more information, see Getting started.

In OpenAI compatible mode, Qwen3-ASR-Flash supports two input formats: Base64-encoded files and URLs of files that are accessible over the Internet.

When you use the SDK, if your recording files are stored in OSS, you cannot use temporary URLs that start with the oss:// prefix.

When you use a RESTful API, if the audio files are stored in OSS, you can use temporary URLs that start with the oss:// prefix. Note the following:

Important
  • The temporary URL is valid for 48 hours and cannot be used after it expires. Do not use it in a production environment.

  • The API for obtaining an upload credential is limited to 100 QPS and does not support scaling out. Do not use it in production environments, high-concurrency scenarios, or stress testing scenarios.

  • For production environments, use a stable storage service such as OSS to ensure long-term file availability and avoid rate limiting issues.

role string (Required)

The role of the user message. Set to user.

asr_options object (Optional)

Specifies whether to enable certain features.

asr_options is not a standard OpenAI parameter. If you use an OpenAI SDK, pass this parameter through extra_body.

Properties

language string (Optional) No default value

If you know the language of the audio, you can specify it using this parameter to improve recognition accuracy.

You can specify only one language.

If the language of the audio is uncertain or contains multiple languages, such as a mix of Chinese, English, Japanese, and Korean, do not specify this parameter.

Valid values

  • zh: Chinese (Mandarin, Sichuanese, Minnan, and Wu)

  • yue: Cantonese

  • en: English

  • ja: Japanese

  • de: German

  • ko: Korean

  • ru: Russian

  • fr: French

  • pt: Portuguese

  • ar: Arabic

  • it: Italian

  • es: Spanish

  • hi: Hindi

  • id: Indonesian

  • th: Thai

  • tr: Turkish

  • uk: Ukrainian

  • vi: Vietnamese

  • cs: Czech

  • da: Danish

  • fil: Filipino

  • fi: Finnish

  • is: Icelandic

  • ms: Malay

  • no: Norwegian

  • pl: Polish

  • sv: Swedish

enable_itn boolean (Optional) Defaults to: false

Specifies whether to enable Inverse Text Normalization (ITN). This feature is applicable only to Chinese and English audio.

Parameter values:

  • true

  • false

stream boolean (Optional) Defaults to: false

Specifies whether to use streaming output for the response. For more information, see Streaming output.

Valid values:

  • false: The model generates all content and returns it at once.

  • true: The model generates and outputs content simultaneously. A data block (chunk) is returned each time a part of the content is generated. You must read these blocks in real time to assemble the complete reply.

We recommend that you set this parameter to true to improve responsiveness and reduce the risk of timeouts.

stream_options object (Optional)

The configuration items for streaming output. This parameter takes effect only when stream is set to true.

Properties

include_usage boolean (Optional) Defaults to: false

Specifies whether to include token consumption information in the last data block of the response.

Valid values:

  • true

  • false

When streaming output is enabled, token consumption information can appear only in the last data block of the response.

Response body

Non-streaming output

{
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "message": {
                "annotations": [
                    {
                        "emotion": "neutral",
                        "language": "zh",
                        "type": "audio_info"
                    }
                ],
                "content": "Welcome to Alibaba Cloud.",
                "role": "assistant"
            }
        }
    ],
    "created": 1767683986,
    "id": "chatcmpl-487abe5f-d4f2-9363-a877-xxxxxxx",
    "model": "qwen3-asr-flash",
    "object": "chat.completion",
    "usage": {
        "completion_tokens": 12,
        "completion_tokens_details": {
            "text_tokens": 12
        },
        "prompt_tokens": 42,
        "prompt_tokens_details": {
            "audio_tokens": 42,
            "text_tokens": 0
        },
        "seconds": 1,
        "total_tokens": 54
    }
}

Streaming output

data: {"model":"qwen3-asr-flash","id":"chatcmpl-3fb97803-d27f-9289-8889-xxxxx","created":1767685989,"object":"chat.completion.chunk","usage":null,"choices":[{"logprobs":null,"index":0,"delta":{"content":"","role":"assistant"}}]}

data: {"model":"qwen3-asr-flash","id":"chatcmpl-3fb97803-d27f-9289-8889-xxxxx","choices":[{"delta":{"annotations":[{"type":"audio_info","language":"zh","emotion":"neutral"}],"content":"Welcome","role":null},"index":0}],"created":1767685989,"object":"chat.completion.chunk","usage":null}

data: {"model":"qwen3-asr-flash","id":"chatcmpl-3fb97803-d27f-9289-8889-xxxxx","choices":[{"delta":{"annotations":[{"type":"audio_info","language":"zh","emotion":"neutral"}],"content":" to","role":null},"index":0}],"created":1767685989,"object":"chat.completion.chunk","usage":null}

data: {"model":"qwen3-asr-flash","id":"chatcmpl-3fb97803-d27f-9289-8889-xxxxx","choices":[{"delta":{"annotations":[{"type":"audio_info","language":"zh","emotion":"neutral"}],"content":" Alibaba","role":null},"index":0}],"created":1767685989,"object":"chat.completion.chunk","usage":null}

data: {"model":"qwen3-asr-flash","id":"chatcmpl-3fb97803-d27f-9289-8889-xxxxx","choices":[{"delta":{"annotations":[{"type":"audio_info","language":"zh","emotion":"neutral"}],"content":" Cloud","role":null},"index":0}],"created":1767685989,"object":"chat.completion.chunk","usage":null}

data: {"model":"qwen3-asr-flash","id":"chatcmpl-3fb97803-d27f-9289-8889-xxxxx","choices":[{"delta":{"annotations":[{"type":"audio_info","language":"zh","emotion":"neutral"}],"content":".","role":null},"index":0}],"created":1767685989,"object":"chat.completion.chunk","usage":null}

data: {"model":"qwen3-asr-flash","id":"chatcmpl-3fb97803-d27f-9289-8889-xxxxx","choices":[{"delta":{"role":null},"index":0,"finish_reason":"stop"}],"created":1767685989,"object":"chat.completion.chunk","usage":null}

data: [DONE]

id string

The unique identifier for this call.

choices array

The output information of the model.

Properties

finish_reason string

The following three cases apply:

  • null: Generation is in progress.

  • stop: The model finished generating output naturally or was stopped by a stop condition in the input parameters.

  • length: Generation was stopped because the output exceeded the maximum length.

index integer

The index of the current object in the choices array.

message object

The message object output by the model.

Properties

role string

The role of the output message. The value is assistant.

content string

The speech recognition result.

annotations array

The output annotation information, such as the language.

Properties

language string

The language of the recognized audio. If the language request parameter is specified, this value is the same as the value of that parameter.

Valid values

  • zh: Chinese (Mandarin, Sichuanese, Minnan, and Wu)

  • yue: Cantonese

  • en: English

  • ja: Japanese

  • de: German

  • ko: Korean

  • ru: Russian

  • fr: French

  • pt: Portuguese

  • ar: Arabic

  • it: Italian

  • es: Spanish

  • hi: Hindi

  • id: Indonesian

  • th: Thai

  • tr: Turkish

  • uk: Ukrainian

  • vi: Vietnamese

  • cs: Czech

  • da: Danish

  • fil: Filipino

  • fi: Finnish

  • is: Icelandic

  • ms: Malay

  • no: Norwegian

  • pl: Polish

  • sv: Swedish

type string

Set to audio_info, which indicates audio information.

emotion string

The emotion of the recognized audio. The following emotions are supported:

  • surprised

  • neutral

  • happy

  • sad

  • disgusted

  • angry

  • fearful

created integer

The UNIX timestamp (in seconds) when the request was created.

model string

The model used for this request.

object string

Always chat.completion.

usage object

The token consumption information for this request.

Properties

completion_tokens integer

The number of tokens in the model's output.

completion_tokens_details object

The fine-grained details of the tokens in the model's output.

Properties

text_tokens integer

The number of tokens in the model's output text.

prompt_tokens integer

The number of tokens in the input.

prompt_tokens_details object

The fine-grained details of the tokens in the input.

Properties

audio_tokens integer

The length of the input audio in tokens. The conversion rule is that each second of audio is converted to 25 tokens. Audio shorter than 1 second is counted as 1 second.

text_tokens integer

Ignore this parameter.

seconds integer

The duration of the audio in seconds.

total_tokens integer

The total number of tokens in the input and output (total_tokens = completion_tokens + prompt_tokens).

DashScope synchronous

URL

International

In the International deployment mode, the endpoint and data storage are located in the Singapore region, and model inference compute resources are dynamically scheduled globally, excluding Mainland China.

HTTP endpoint: POST https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation

base_url for SDK: https://dashscope-intl.aliyuncs.com/api/v1

United States

In the US deployment mode, the endpoint and data storage are located in the US (Virginia) region, and model inference compute resources are restricted to the United States.

HTTP endpoint: POST https://dashscope-us.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation

base_url for SDK: https://dashscope-us.aliyuncs.com/api/v1

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region, and model inference compute resources are available only in Mainland China.

HTTP endpoint: POST https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation

base_url for SDK: https://dashscope.aliyuncs.com/api/v1

Request body

Qwen3-ASR-Flash

The following example shows how to recognize audio from a URL. For an example of how to recognize a local audio file, see QuickStart.

cURL

# ======= Important =======
# The following URL is for the Singapore region. If you use a model in the US region, replace the URL with: https://dashscope-us.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation
# The API keys for the Singapore/US and Beijing regions are different. To obtain an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
# If you use a model in the US region, add the "-us" suffix to the model name, for example, qwen3-asr-flash-us.
# === Delete this comment before execution ===

curl -X POST "https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation" \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen3-asr-flash",
    "input": {
        "messages": [
            {
                "content": [
                    {
                        "text": ""
                    }
                ],
                "role": "system"
            },
            {
                "content": [
                    {
                        "audio": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
                    }
                ],
                "role": "user"
            }
        ]
    },
    "parameters": {
        "asr_options": {
            "enable_itn": false
        }
    }
}'

Java

import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.common.MultiModalMessage;
import com.alibaba.dashscope.common.Role;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;
import com.alibaba.dashscope.utils.JsonUtils;

public class Main {
    public static void simpleMultiModalConversationCall()
            throws ApiException, NoApiKeyException, UploadFileException {
        MultiModalConversation conv = new MultiModalConversation();
        MultiModalMessage userMessage = MultiModalMessage.builder()
                .role(Role.USER.getValue())
                .content(Arrays.asList(
                        Collections.singletonMap("audio", "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3")))
                .build();

        MultiModalMessage sysMessage = MultiModalMessage.builder().role(Role.SYSTEM.getValue())
                // Configure the context for customized recognition here
                .content(Arrays.asList(Collections.singletonMap("text", "")))
                .build();

        Map<String, Object> asrOptions = new HashMap<>();
        asrOptions.put("enable_itn", false);
        // asrOptions.put("language", "zh"); // Optional. If the audio language is known, you can specify it using this parameter to improve recognition accuracy.
        MultiModalConversationParam param = MultiModalConversationParam.builder()
                // The API keys for the Singapore/US and Beijing regions are different. To obtain an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
                // If you have not configured environment variables, replace the following line with: .apiKey("sk-xxx")
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                // If you use a model in the US region, add the "-us" suffix to the model name, for example, qwen3-asr-flash-us
                .model("qwen3-asr-flash")
                .message(sysMessage)
                .message(userMessage)
                .parameter("asr_options", asrOptions)
                .build();
        MultiModalConversationResult result = conv.call(param);
        System.out.println(JsonUtils.toJson(result));
    }
    public static void main(String[] args) {
        try {
            // The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1. If you use a model in the US region, replace the URL with: https://dashscope-us.aliyuncs.com/api/v1
            Constants.baseHttpApiUrl = "https://dashscope-intl.aliyuncs.com/api/v1";
            simpleMultiModalConversationCall();
        } catch (ApiException | NoApiKeyException | UploadFileException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}

Python

import os
import dashscope

# The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1. If you use a model in the US region, replace the URL with: https://dashscope-us.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

messages = [
    {"role": "system", "content": [{"text": ""}]},  # Configure the context for customized recognition
    {"role": "user", "content": [{"audio": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"}]}
]

response = dashscope.MultiModalConversation.call(
    # The API keys for the Singapore/US and Beijing regions are different. To obtain an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
    # If you have not configured environment variables, replace the following line with: api_key = "sk-xxx"
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # If you use a model in the US region, add the "-us" suffix to the model name, for example, qwen3-asr-flash-us
    model="qwen3-asr-flash",
    messages=messages,
    result_format="message",
    asr_options={
        #"language": "zh", # Optional. If the audio language is known, you can specify it using this parameter to improve recognition accuracy.
        "enable_itn":False
    }
)
print(response)

model string (Required)

The name of the model. This parameter is applicable only to Qwen3-ASR-Flash.

messages array (Required)

The list of messages.

When you make an HTTP call, place messages in the input object.

Message type

System Message object (Optional)

The goal or role of the model. If you set a system message, place it at the beginning of the messages list.

This parameter is supported only by Qwen3-ASR-Flash.

Properties

content array (Required)

The content of the message. Only one set of messages is allowed.

Properties

text string

Specifies the context. Qwen3-ASR-Flash lets you provide background text, entity vocabularies, and other reference information (context) during speech recognition to obtain customized recognition results.

Length limit: 10,000 tokens.

For more information, see Context biasing.

role string (Required)

Set to system.

User Message object (Required)

The message sent by the user to the model.

Properties

content array (Required)

The content of the user message. Only one set of messages is allowed.

Properties

audio string (Required)

The audio to be recognized. For more information, see Getting started.

When called through DashScope, Qwen3-ASR-Flash supports three input formats: Base64-encoded files, absolute paths of local files, and URLs of files that are accessible over the Internet.

When you use the SDK, if your recording files are stored in OSS, you cannot use temporary URLs that start with the oss:// prefix.

When you use a RESTful API, if the audio files are stored in OSS, you can use temporary URLs that start with the oss:// prefix. Note the following:

Important
  • The temporary URL is valid for 48 hours and cannot be used after it expires. Do not use it in a production environment.

  • The API for obtaining an upload credential is limited to 100 QPS and does not support scaling out. Do not use it in production environments, high-concurrency scenarios, or stress testing scenarios.

  • For production environments, use a stable storage service such as OSS to ensure long-term file availability and avoid rate limiting issues.

role string (Required)

The role of the user message. Set to user.

asr_options object (Optional)

Specifies whether to enable certain features.

This parameter is supported only by Qwen3-ASR-Flash.

Properties

language string (Optional) No default value

If you know the language of the audio, you can specify it using this parameter to improve recognition accuracy.

You can specify only one language.

If the language of the audio is uncertain or contains multiple languages, such as a mix of Chinese, English, Japanese, and Korean, do not specify this parameter.

Valid values

  • zh: Chinese (Mandarin, Sichuanese, Minnan, and Wu)

  • yue: Cantonese

  • en: English

  • ja: Japanese

  • de: German

  • ko: Korean

  • ru: Russian

  • fr: French

  • pt: Portuguese

  • ar: Arabic

  • it: Italian

  • es: Spanish

  • hi: Hindi

  • id: Indonesian

  • th: Thai

  • tr: Turkish

  • uk: Ukrainian

  • vi: Vietnamese

  • cs: Czech

  • da: Danish

  • fil: Filipino

  • fi: Finnish

  • is: Icelandic

  • ms: Malay

  • no: Norwegian

  • pl: Polish

  • sv: Swedish

enable_itn boolean (Optional) Defaults to: false

Specifies whether to enable Inverse Text Normalization (ITN). This feature is applicable only to Chinese and English audio.

Parameter values:

  • true

  • false

Response body

Qwen3-ASR-Flash

{
    "output": {
        "choices": [
            {
                "finish_reason": "stop",
                "message": {
                    "annotations": [
                        {
                            "language": "zh",
                            "type": "audio_info",
                            "emotion": "neutral"
                        }
                    ],
                    "content": [
                        {
                            "text": "Welcome to Alibaba Cloud."
                        }
                    ],
                    "role": "assistant"
                }
            }
        ]
    },
    "usage": {
        "input_tokens_details": {
            "text_tokens": 0
        },
        "output_tokens_details": {
            "text_tokens": 6
        },
        "seconds": 1
    },
    "request_id": "568e2bf0-d6f2-97f8-9f15-a57b11dc6977"
}

request_id string

The unique identifier for this call.

The parameter returned by the Java SDK is requestId.

output object

The information about the call result.

Properties

choices array

The output of the model. The choices parameter is returned only when result_format is set to message.

Properties

finish_reason string

The following three cases apply:

  • null: Generation is in progress.

  • stop: The model finished generating output naturally or was stopped by a stop condition in the input parameters.

  • length: Generation was stopped because the output exceeded the maximum length.

message object

The message object output by the model.

Properties

role string

The role of the output message. The value is assistant.

content array

The content of the output message.

Properties

text string

The speech recognition result.

annotations array

The output annotation information, such as the language.

Properties

language string

The language of the recognized audio. If the language request parameter is specified, this value is the same as the value of that parameter.

Valid values

  • zh: Chinese (Mandarin, Sichuanese, Minnan, and Wu)

  • yue: Cantonese

  • en: English

  • ja: Japanese

  • de: German

  • ko: Korean

  • ru: Russian

  • fr: French

  • pt: Portuguese

  • ar: Arabic

  • it: Italian

  • es: Spanish

  • hi: Hindi

  • id: Indonesian

  • th: Thai

  • tr: Turkish

  • uk: Ukrainian

  • vi: Vietnamese

  • cs: Czech

  • da: Danish

  • fil: Filipino

  • fi: Finnish

  • is: Icelandic

  • ms: Malay

  • no: Norwegian

  • pl: Polish

  • sv: Swedish

type string

Set to audio_info, which indicates audio information.

emotion string

The emotion of the recognized audio. The following emotions are supported:

  • surprised

  • neutral

  • happy

  • sad

  • disgusted

  • angry

  • fearful

usage object

The token consumption information for this request.

Properties

input_tokens_details object

The length of the input content for Qwen3-ASR-Flash in tokens.

Properties

text_tokens integer

Ignore this parameter.

output_tokens_details object

The length of the output content from Qwen3-ASR-Flash in tokens.

Properties

text_tokens integer

The length of the recognized text output by Qwen3-ASR-Flash in tokens.

seconds integer

The duration of the audio for Qwen3-ASR-Flash in seconds.

DashScope asynchronous

Process description

Unlike the OpenAI compatible mode or DashScope synchronous call, which involve a single request and an immediate response, asynchronous invocation is designed for processing long audio files or other time-consuming tasks. This mode uses a two-step "submit-poll" process to prevent request timeouts caused by long waiting times:

  1. Step 1: Submit a task

    • The client initiates an asynchronous processing request.

    • After validating the request, the server does not execute the task immediately. Instead, it returns a unique task_id to indicate that the task has been successfully created.

  2. Step 2: Get the result

    • The client uses the obtained task_id to repeatedly call the result query API through polling.

    • After the task is complete, the result query API returns the final recognition result.

You can choose to use an SDK or directly call the RESTful API based on your integration environment.

  • Use an SDK (see Getting started for sample code, Submit a task's request body for request parameters, and Asynchronous call recognition result for returned results).

    The SDK encapsulates the underlying API call details, providing a more convenient programming experience.

    1. Submit a task: Call the async_call() (Python) or asyncCall() (Java) method to submit the task. This method returns a task object that contains a task_id.

    2. Get the result: Use the task object returned in the previous step or the task_id to call the fetch() method to retrieve the result. The SDK automatically handles the polling logic until the task is complete or times out.

  • Use a RESTful API

    Directly calling the HTTP API provides maximum flexibility.

    1. Submit a task. If the request is successful, the response body contains a task_id.

    2. Use the task_id from the previous step to retrieve the task execution result.
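
The following Python sketch chains the two RESTful calls described in this section: it submits a task to the transcription endpoint and then polls GET /api/v1/tasks/{task_id} until the task leaves the PENDING and RUNNING states. It is a minimal illustration of the submit-poll flow using the Singapore endpoints; the polling interval is arbitrary.

import os
import time

import requests

api_key = os.getenv("DASHSCOPE_API_KEY")

# Step 1: submit the transcription task (see "Submit a task" below for the full parameter reference).
submit = requests.post(
    "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/asr/transcription",
    headers={
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        "X-DashScope-Async": "enable",
    },
    json={
        "model": "qwen3-asr-flash-filetrans",
        "input": {"file_url": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"},
        "parameters": {"channel_id": [0], "enable_itn": False},
    },
)
submit.raise_for_status()
task_id = submit.json()["output"]["task_id"]

# Step 2: poll the task result until it is no longer PENDING or RUNNING.
while True:
    result = requests.get(
        f"https://dashscope-intl.aliyuncs.com/api/v1/tasks/{task_id}",
        headers={
            "Authorization": f"Bearer {api_key}",
            "X-DashScope-Async": "enable",
            "Content-Type": "application/json",
        },
    ).json()
    status = result["output"]["task_status"]
    if status not in ("PENDING", "RUNNING"):
        break
    time.sleep(3)  # arbitrary polling interval

if status == "SUCCEEDED":
    print(result["output"]["result"]["transcription_url"])
else:
    print(result)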

Submit a task

URL

International

In the International deployment mode, the endpoint and data storage are located in the Singapore region, and model inference compute resources are dynamically scheduled globally, excluding Mainland China.

HTTP endpoint: POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/asr/transcription

base_url for SDK: https://dashscope-intl.aliyuncs.com/api/v1

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region, and model inference compute resources are restricted to Mainland China.

HTTP endpoint: POST https://dashscope.aliyuncs.com/api/v1/services/audio/asr/transcription

base_url for SDK: https://dashscope.aliyuncs.com/api/v1

Request body

cURL

# ======= Important =======
# The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/asr/transcription
# The API keys for the Singapore and Beijing regions are different. For more information about how to obtain an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key.
# === Delete this comment before running the command. ===

curl --location --request POST 'https://dashscope-intl.aliyuncs.com/api/v1/services/audio/asr/transcription' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header "Content-Type: application/json" \
--header "X-DashScope-Async: enable" \
--data '{
    "model": "qwen3-asr-flash-filetrans",
    "input": {
        "file_url": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
    },
    "parameters": {
        "channel_id":[
            0
        ], 
        "enable_itn": false
    }
}'

Java

For SDK samples, see Getting Started.

import com.google.gson.Gson;
import com.google.gson.annotations.SerializedName;
import okhttp3.*;

import java.io.IOException;

public class Main {
    // The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/asr/transcription
    private static final String API_URL = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/asr/transcription";

    public static void main(String[] args) {
        // The API keys for the Singapore and Beijing regions are different. For more information about how to obtain an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key.
        // If you have not configured the environment variable, replace the following line with your Model Studio API key: String apiKey = "sk-xxx"
        String apiKey = System.getenv("DASHSCOPE_API_KEY");

        OkHttpClient client = new OkHttpClient();
        Gson gson = new Gson();

        /*String payloadJson = """
                {
                    "model": "qwen3-asr-flash-filetrans",
                    "input": {
                        "file_url": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
                    },
                    "parameters": {
                        "channel_id": [0],
                        "enable_itn": false,
                        "language": "zh",
                        "corpus": {
                            "text": ""
                        }
                    }
                }
                """;*/
        String payloadJson = """
                {
                    "model": "qwen3-asr-flash-filetrans",
                    "input": {
                        "file_url": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
                    },
                    "parameters": {
                        "channel_id": [0],
                        "enable_itn": false
                    }
                }
                """;

        RequestBody body = RequestBody.create(payloadJson, MediaType.get("application/json; charset=utf-8"));
        Request request = new Request.Builder()
                .url(API_URL)
                .addHeader("Authorization", "Bearer " + apiKey)
                .addHeader("Content-Type", "application/json")
                .addHeader("X-DashScope-Async", "enable")
                .post(body)
                .build();

        try (Response response = client.newCall(request).execute()) {
            if (response.isSuccessful() && response.body() != null) {
                String respBody = response.body().string();
                // Parse JSON with Gson.
                ApiResponse apiResp = gson.fromJson(respBody, ApiResponse.class);
                if (apiResp.output != null) {
                    System.out.println("task_id: " + apiResp.output.taskId);
                } else {
                    System.out.println(respBody);
                }
            } else {
                System.out.println("task failed! HTTP code: " + response.code());
                if (response.body() != null) {
                    System.out.println(response.body().string());
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    static class ApiResponse {
        @SerializedName("request_id")
        String requestId;

        Output output;
    }

    static class Output {
        @SerializedName("task_id")
        String taskId;

        @SerializedName("task_status")
        String taskStatus;
    }
}

Python

For SDK examples, see Getting Started.

import requests
import json
import os

# The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/asr/transcription
url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/asr/transcription"

# The API keys for the Singapore and Beijing regions are different. For more information about how to obtain an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key.
# If you have not configured the environment variable, replace the following line with your Model Studio API key: DASHSCOPE_API_KEY = "sk-xxx"
DASHSCOPE_API_KEY = os.getenv("DASHSCOPE_API_KEY")

headers = {
    "Authorization": f"Bearer {DASHSCOPE_API_KEY}",
    "Content-Type": "application/json",
    "X-DashScope-Async": "enable"
}

payload = {
    "model": "qwen3-asr-flash-filetrans",
    "input": {
        "file_url": "https://dashscope.oss-cn-beijing.aliyuncs.com/audios/welcome.mp3"
    },
    "parameters": {
        "channel_id": [0],
        # "language": "zh",
        "enable_itn": False
        # "corpus": {
        #     "text": ""
        # }
    }
}

response = requests.post(url, headers=headers, data=json.dumps(payload))
if response.status_code == 200:
    print(f"task_id: {response.json()["output"]["task_id"]}")
else:
    print("task failed!")
    print(response.json())

model string (Required)

The name of the model. This parameter is applicable only to Qwen3-ASR-Flash-Filetrans.

input object (Required)

Properties

file_url string (Required)

The URL of the audio file to be recognized. The URL must be accessible over the Internet.

When you use the SDK, if your recording files are stored in OSS, you cannot use temporary URLs that start with the oss:// prefix.

When you use a RESTful API, if the audio files are stored in OSS, you can use temporary URLs that start with the oss:// prefix. Note the following:

Important
  • The temporary URL is valid for 48 hours and cannot be used after it expires. Do not use it in a production environment.

  • The API for obtaining an upload credential is limited to 100 QPS and does not support scaling out. Do not use it in production environments, high-concurrency scenarios, or stress testing scenarios.

  • For production environments, use a stable storage service such as OSS to ensure long-term file availability and avoid rate limiting issues.

parameters object (Optional)

Properties

language string (Optional) No default value

If you know the language of the audio, you can specify it using this parameter to improve recognition accuracy.

You can specify only one language.

If the language of the audio is uncertain or contains multiple languages, such as a mix of Chinese, English, Japanese, and Korean, do not specify this parameter.

Valid values

  • zh: Chinese (Mandarin, Sichuanese, Minnan, and Wu)

  • yue: Cantonese

  • en: English

  • ja: Japanese

  • de: German

  • ko: Korean

  • ru: Russian

  • fr: French

  • pt: Portuguese

  • ar: Arabic

  • it: Italian

  • es: Spanish

  • hi: Hindi

  • id: Indonesian

  • th: Thai

  • tr: Turkish

  • uk: Ukrainian

  • vi: Vietnamese

  • cs: Czech

  • da: Danish

  • fil: Filipino

  • fi: Finnish

  • is: Icelandic

  • ms: Malay

  • no: Norwegian

  • pl: Polish

  • sv: Swedish

enable_itn boolean (Optional) Defaults to: false

Specifies whether to enable Inverse Text Normalization (ITN). This feature is applicable only to Chinese and English audio.

Parameter values:

  • true

  • false

enable_words boolean (Optional) Defaults to: false

Controls whether to return word-level timestamps:

  • false: Returns sentence-level timestamps.

  • true: Returns word-level timestamps.

    Word-level timestamps are supported only for the following languages: Chinese, English, Japanese, Korean, German, French, Spanish, Italian, Portuguese, and Russian. Accuracy may not be guaranteed for other languages.

This parameter also affects the sentence segmentation rules:

  • false: Segments sentences based on voice activity detection (VAD).

  • true: Segments sentences based on VAD and punctuation.

text string

Specifies the context. Qwen3-ASR-Flash lets you provide background text, entity vocabularies, and other reference information (context) during speech recognition to obtain customized recognition results.

Length limit: 10,000 tokens.

For more information, see Context biasing.
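
In the sample requests in this section, this context text is passed inside a corpus object under parameters. A minimal Python sketch of such a parameters payload follows; the context string is a made-up example.

parameters = {
    "channel_id": [0],
    "enable_itn": False,
    # Context (biasing) text; replace with background text or vocabulary relevant to your audio.
    "corpus": {
        "text": "Speaker names: Doris Jackson; school: Wakefield"
    },
}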

channel_id array (Optional) Defaults to: [0]

Specifies the indexes of the audio tracks to be recognized in a multi-track audio file. The index starts from 0. For example, [0] indicates that the first track is recognized, and [0, 1] indicates that the first and second tracks are recognized simultaneously. If this parameter is omitted, the first track is processed by default.

Important

Each specified audio track is billed separately. For example, requesting [0, 1] for a single file incurs two separate charges.

Response body

{
    "request_id": "92e3decd-0c69-47a8-************",
    "output": {
        "task_id": "8fab76d0-0eed-4d20-************",
        "task_status": "PENDING"
    }
}

request_id string

The unique identifier for this call.

output object

The information about the call result.

Properties

task_id string

The task ID. This ID is passed as a request parameter in the API for querying speech recognition tasks.

task_status string

The task status:

  • PENDING

  • RUNNING

  • SUCCEEDED

  • FAILED

  • UNKNOWN: The task does not exist or its status is unknown.

Get the task execution result

URL

International

HTTP endpoint: GET https://dashscope-intl.aliyuncs.com/api/v1/tasks/{task_id}

base_url for SDK: https://dashscope-intl.aliyuncs.com/api/v1

Mainland China

HTTP endpoint: GET https://dashscope.aliyuncs.com/api/v1/tasks/{task_id}

base_url for SDK: https://dashscope.aliyuncs.com/api/v1

Request body

cURL

# ======= Important =======
# The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with https://dashscope.aliyuncs.com/api/v1/tasks/{task_id}. Note that you must replace {task_id} with the ID of the task to be queried.
# The API keys for the Singapore and Beijing regions are different. For more information about how to obtain an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key.
# === Delete this comment before running the command. ===

curl --location --request GET 'https://dashscope-intl.aliyuncs.com/api/v1/tasks/{task_id}' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header "X-DashScope-Async: enable" \
--header "Content-Type: application/json"

Java

For SDK examples, see Getting Started.

import okhttp3.*;

import java.io.IOException;

public class Main {
    public static void main(String[] args) {
        // Replace with the actual task_id.
        String taskId = "xxx";
        // The API keys for the Singapore and Beijing regions are different. For more information about how to obtain an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key.
        // If you have not configured the environment variable, replace the following line with your Model Studio API key: String apiKey = "sk-xxx"
        String apiKey = System.getenv("DASHSCOPE_API_KEY");

        // The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with https://dashscope.aliyuncs.com/api/v1/tasks/{task_id}. Note that you must replace {task_id} with the ID of the task to be queried.
        String apiUrl = "https://dashscope-intl.aliyuncs.com/api/v1/tasks/" + taskId;

        OkHttpClient client = new OkHttpClient();

        Request request = new Request.Builder()
                .url(apiUrl)
                .addHeader("Authorization", "Bearer " + apiKey)
                .addHeader("X-DashScope-Async", "enable")
                .addHeader("Content-Type", "application/json")
                .get()
                .build();

        try (Response response = client.newCall(request).execute()) {
            if (response.body() != null) {
                System.out.println(response.body().string());
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Python

For SDK examples, see Getting Started.

import os
import requests


# The API keys for the Singapore and Beijing regions are different. For more information about how to obtain an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key.
# If you have not configured the environment variable, replace the following line with your Model Studio API key: DASHSCOPE_API_KEY = "sk-xxx"
DASHSCOPE_API_KEY = os.getenv("DASHSCOPE_API_KEY")

# Replace with the actual task_id.
task_id = "xxx"
# The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with https://dashscope.aliyuncs.com/api/v1/tasks/{task_id}. Note that you must replace {task_id} with the ID of the task to be queried.
url = f"https://dashscope-intl.aliyuncs.com/api/v1/tasks/{task_id}"

headers = {
    "Authorization": f"Bearer {DASHSCOPE_API_KEY}",
    "X-DashScope-Async": "enable",
    "Content-Type": "application/json"
}

response = requests.get(url, headers=headers)
print(response.json())

task_id string (Required)

The ID of the task. Pass the task_id returned by the Submit a task operation to query the speech recognition result.

Response body

RUNNING

{
    "request_id": "6769df07-2768-4fb0-ad59-************",
    "output": {
        "task_id": "9be1700a-0f8e-4778-be74-************",
        "task_status": "RUNNING",
        "submit_time": "2025-10-27 14:19:31.150",
        "scheduled_time": "2025-10-27 14:19:31.233",
        "task_metrics": {
            "TOTAL": 1,
            "SUCCEEDED": 0,
            "FAILED": 0
        }
    }
}

SUCCEEDED

{
    "request_id": "1dca6c0a-0ed1-4662-aa39-************",
    "output": {
        "task_id": "8fab76d0-0eed-4d20-929f-************",
        "task_status": "SUCCEEDED",
        "submit_time": "2025-10-27 13:57:45.948",
        "scheduled_time": "2025-10-27 13:57:46.018",
        "end_time": "2025-10-27 13:57:47.079",
        "result": {
            "transcription_url": "http://dashscope-result-bj.oss-cn-beijing.aliyuncs.com/pre/pre-funasr-mlt-v1/20251027/13%3A57/7a3a8236-ffd1-4099-a280-0299686ac7da.json?Expires=1761631066&OSSAccessKeyId=LTAI**************&Signature=1lKv4RgyWCarRuUdIiErOeOBnwM%3D&response-content-disposition=attachment%3Bfilename%3D7a3a8236-ffd1-4099-a280-0299686ac7da.json"
        }
    },
    "usage": {
        "seconds": 3
    }
}

FAILED

{
    "request_id": "3d141841-858a-466a-9ff9-************",
    "output": {
        "task_id": "c58c7951-7789-4557-9ea3-************",
        "task_status": "FAILED",
        "submit_time": "2025-10-27 15:06:06.915",
        "scheduled_time": "2025-10-27 15:06:06.967",
        "end_time": "2025-10-27 15:06:07.584",
        "code": "FILE_403_FORBIDDEN",
        "message": "FILE_403_FORBIDDEN"
    }
}

request_id string

The unique identifier for this call.

output object

The information about the call result.

Properties

task_id string

The task ID. Pass this ID as a request parameter when you query the speech recognition task.

task_status string

The task status (see the polling sketch after this list):

  • PENDING

  • RUNNING

  • SUCCEEDED

  • FAILED

  • UNKNOWN: The task does not exist or its status is unknown.
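
Because the task runs asynchronously, a typical client polls this query operation until task_status reaches a terminal value (SUCCEEDED, FAILED, or UNKNOWN). The following is a minimal polling sketch in Python; the wait_for_task helper and the poll_interval and timeout values are illustrative, not part of the API.

import os
import time
import requests

DASHSCOPE_API_KEY = os.getenv("DASHSCOPE_API_KEY")

def wait_for_task(task_id, poll_interval=5, timeout=600):
    """Poll the task query endpoint until the task reaches a terminal status."""
    # The following URL is for the Singapore region. If you use a model in the Beijing region,
    # replace the URL with https://dashscope.aliyuncs.com/api/v1/tasks/{task_id}.
    url = f"https://dashscope-intl.aliyuncs.com/api/v1/tasks/{task_id}"
    headers = {"Authorization": f"Bearer {DASHSCOPE_API_KEY}"}
    deadline = time.time() + timeout
    while time.time() < deadline:
        output = requests.get(url, headers=headers).json()["output"]
        if output["task_status"] in ("SUCCEEDED", "FAILED", "UNKNOWN"):
            return output
        # PENDING or RUNNING: wait before querying again.
        time.sleep(poll_interval)
    raise TimeoutError(f"Task {task_id} did not finish within {timeout} seconds")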

result object

The speech recognition result.

Properties

transcription_url string

The download URL for the recognition result file. The link is valid for 24 hours. After expiration, you cannot query the task or download the result using the previous URL.
The recognition result is saved as a JSON file. You can download the file from this link or directly read the file content using an HTTP request.

For more information, see Asynchronous invocation results.
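
Because the recognition result is stored as a JSON file at transcription_url, you can fetch and parse it directly over HTTP without saving it to disk. A minimal sketch, assuming transcription_url was taken from the result object of a SUCCEEDED query response (the URL below is a placeholder):

import requests

# Replace with the transcription_url from output.result in a SUCCEEDED query response.
transcription_url = "https://example.com/transcription.json"

response = requests.get(transcription_url)
response.raise_for_status()
transcription = response.json()

print(transcription["file_url"])                 # URL of the recognized audio file
for transcript in transcription["transcripts"]:  # one entry per audio track
    print(transcript["channel_id"], transcript["text"])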

submit_time string

The time when the task was submitted.

scheduled_time string

The time when the task was scheduled, which is the start time of execution.

end_time string

The time when the task ended.

task_metrics object

Task metrics, which include statistics on the status of subtasks.

Properties

TOTAL integer

The total number of subtasks.

SUCCEEDED integer

The number of successful subtasks.

FAILED integer

The number of failed subtasks.

code string

The error code. This is returned only when the task fails.

message string

The error message. This is returned only when the task fails.

usage object

The usage information for this request.

Properties

seconds integer

The duration of the audio processed by Qwen3-ASR-Flash, in seconds.

Asynchronous call recognition result description

{
    "file_url": "https://***.wav",
    "audio_info": {
        "format": "wav",
        "sample_rate": 16000
    },
    "transcripts": [
        {
            "channel_id": 0,
            "text": "Senior staff, Principal Doris Jackson, Wakefield faculty, and of course my fellow classmates.I am honored to have been chosen to speak before my classmates along with the students across America today.",
            "sentences": [
                {
                    "sentence_id": 0,
                    "begin_time": 240,
                    "end_time": 6720,
                    "language": "en",
                    "emotion": "happy",
                    "text": "Senior staff, Principal Doris Jackson, Wakefield faculty, and of course my fellow classmates.",
                    "words": [
                        {
                            "begin_time": 240,
                            "end_time": 1120,
                            "text": "Senior ",
                            "punctuation": ""
                        },
                        {
                            "begin_time": 1120,
                            "end_time": 1200,
                            "text": "staff",
                            "punctuation": ","
                        },
                        {
                            "begin_time": 1680,
                            "end_time": 1920,
                            "text": " Principal ",
                            "punctuation": ""
                        },
                        {
                            "begin_time": 2000,
                            "end_time": 2320,
                            "text": "Doris ",
                            "punctuation": ""
                        },
                        {
                            "begin_time": 2320,
                            "end_time": 2960,
                            "text": "Jackson",
                            "punctuation": ","
                        },
                        {
                            "begin_time": 3360,
                            "end_time": 3840,
                            "text": " Wakefield ",
                            "punctuation": ""
                        },
                        {
                            "begin_time": 3840,
                            "end_time": 4480,
                            "text": "faculty",
                            "punctuation": ","
                        },
                        {
                            "begin_time": 4800,
                            "end_time": 4960,
                            "text": " and ",
                            "punctuation": ""
                        },
                        {
                            "begin_time": 4960,
                            "end_time": 5040,
                            "text": "of ",
                            "punctuation": ""
                        },
                        {
                            "begin_time": 5040,
                            "end_time": 5520,
                            "text": "course ",
                            "punctuation": ""
                        },
                        {
                            "begin_time": 5520,
                            "end_time": 5680,
                            "text": "my ",
                            "punctuation": ""
                        },
                        {
                            "begin_time": 5760,
                            "end_time": 6000,
                            "text": "fellow ",
                            "punctuation": ""
                        },
                        {
                            "begin_time": 6000,
                            "end_time": 6720,
                            "text": "classmates",
                            "punctuation": "."
                        }
                    ]
                },
                {
                    "sentence_id": 1,
                    "begin_time": 12268,
                    "end_time": 17388,
                    "language": "en",
                    "emotion": "neutral",
                    "text": "I am honored to have been chosen to speak before my classmates along with the students across America today.",
                    "words": [
                        {
                            "begin_time": 12268,
                            "end_time": 12428,
                            "text": "I ",
                            "punctuation": ""
                        },
                        {
                            "begin_time": 12428,
                            "end_time": 12508,
                            "text": "am ",
                            "punctuation": ""
                        },
                        {
                            "begin_time": 12588,
                            "end_time": 12828,
                            "text": "honored ",
                            "punctuation": ""
                        },
                        {
                            "begin_time": 12908,
                            "end_time": 12908,
                            "text": "to ",
                            "punctuation": ""
                        },
                        {
                            "begin_time": 12908,
                            "end_time": 13068,
                            "text": "have ",
                            "punctuation": ""
                        },
                        {
                            "begin_time": 13068,
                            "end_time": 13228,
                            "text": "been ",
                            "punctuation": ""
                        },
                        {
                            "begin_time": 13228,
                            "end_time": 13628,
                            "text": "chosen ",
                            "punctuation": ""
                        },
                        {
                            "begin_time": 13628,
                            "end_time": 13708,
                            "text": "to ",
                            "punctuation": ""
                        },
                        {
                            "begin_time": 13708,
                            "end_time": 14028,
                            "text": "speak ",
                            "punctuation": ""
                        },
                        {
                            "begin_time": 14028,
                            "end_time": 14268,
                            "text": "before ",
                            "punctuation": ""
                        },
                        {
                            "begin_time": 14268,
                            "end_time": 14428,
                            "text": "my ",
                            "punctuation": ""
                        },
                        {
                            "begin_time": 14428,
                            "end_time": 15148,
                            "text": "classmates ",
                            "punctuation": ""
                        },
                        {
                            "begin_time": 15308,
                            "end_time": 15468,
                            "text": "as ",
                            "punctuation": ""
                        },
                        {
                            "begin_time": 15468,
                            "end_time": 15628,
                            "text": "well ",
                            "punctuation": ""
                        },
                        {
                            "begin_time": 15628,
                            "end_time": 15788,
                            "text": "as ",
                            "punctuation": ""
                        },
                        {
                            "begin_time": 15788,
                            "end_time": 15788,
                            "text": "the ",
                            "punctuation": ""
                        },
                        {
                            "begin_time": 15788,
                            "end_time": 16188,
                            "text": "students ",
                            "punctuation": ""
                        },
                        {
                            "begin_time": 16188,
                            "end_time": 16588,
                            "text": "across ",
                            "punctuation": ""
                        },
                        {
                            "begin_time": 16588,
                            "end_time": 16988,
                            "text": "America ",
                            "punctuation": ""
                        },
                        {
                            "begin_time": 16988,
                            "end_time": 17388,
                            "text": "today",
                            "punctuation": "."
                        }
                    ]
                }
            ]
        }
    ]
}

file_url string

The URL of the recognized audio file.

audio_info object

Information about the recognized audio file.

Properties

format string

The audio format.

sample_rate integer

The audio sampling rate.

transcripts array

A list of complete recognition results. Each element corresponds to the recognized content of an audio track.

Properties

channel_id integer

The audio track index, starting from 0.

text string

The recognized text.

sentences array

A list of sentence-level recognition results.

Properties

begin_time integer

The start timestamp of the sentence in milliseconds.

end_time integer

The end timestamp of the sentence in milliseconds.

text string

The recognized text.

sentence_id integer

The sentence index, starting from 0.

language string

The language of the recognized audio. If the language request parameter is specified, this value is the same as the value of that parameter.

Valid values

  • zh: Chinese (Mandarin, Sichuanese, Minnan, and Wu)

  • yue: Cantonese

  • en: English

  • ja: Japanese

  • de: German

  • ko: Korean

  • ru: Russian

  • fr: French

  • pt: Portuguese

  • ar: Arabic

  • it: Italian

  • es: Spanish

  • hi: Hindi

  • id: Indonesian

  • th: Thai

  • tr: Turkish

  • uk: Ukrainian

  • vi: Vietnamese

  • cs: Czech

  • da: Danish

  • fil: Filipino

  • fi: Finnish

  • is: Icelandic

  • ms: Malay

  • no: Norwegian

  • pl: Polish

  • sv: Swedish

emotion string

The emotion of the recognized audio. The following emotions are supported:

  • surprised: surprised

  • neutral: neutral

  • happy: happy

  • sad: sad

  • disgusted: disgusted

  • angry: angry

  • fearful: fearful

words array

A list of word-level recognition results. This parameter is returned only if the enable_words request parameter is set to true.

Properties

begin_time integer

The start timestamp in milliseconds.

end_time integer

The end timestamp in milliseconds.

text string

The recognized text.

punctuation string

The punctuation mark.
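
The sentence- and word-level fields described above can be combined into time-aligned output, for example for subtitling. The following sketch assumes a parsed result dictionary shaped like the sample JSON above; print_timings is an illustrative helper, not part of any SDK.

def print_timings(result):
    """Print sentence- and word-level timestamps from a parsed recognition result."""
    for transcript in result["transcripts"]:
        print(f"Track {transcript['channel_id']}")
        for sentence in transcript["sentences"]:
            print(f"  [{sentence['begin_time']}-{sentence['end_time']} ms] "
                  f"({sentence['language']}, {sentence['emotion']}) {sentence['text']}")
            # The words list is present only when enable_words was set to true in the request.
            for word in sentence.get("words", []):
                text = word["text"].strip() + word["punctuation"]
                print(f"    {word['begin_time']}-{word['end_time']} ms: {text}")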