
Alibaba Cloud Model Studio: Paraformer audio file recognition RESTful API

Last Updated: Nov 14, 2025

This topic describes the parameters and API details for the Paraformer audio file recognition RESTful API.

Important

This document applies only to the China (Beijing) region. To use the model, you must use an API key from the China (Beijing) region.

User guide: For an overview of the models and guidance on model selection, see Audio file recognition.

The service provides a task submission interface and a task query interface. Typically, you call the task submission interface to upload a recognition task and then repeatedly call the task query interface until the task is complete.

Prerequisites

You have activated the Model Studio and created an API key. To prevent security risks, export the API key as an environment variable instead of hard-coding it in your code.

Note

To grant temporary access permissions to third-party applications or users, or if you want to strictly control high-risk operations such as accessing or deleting sensitive data, we recommend that you use a temporary authentication token.

Compared with long-term API keys, temporary authentication tokens are more secure because they are short-lived (60 seconds). They are suitable for temporary call scenarios and can effectively reduce the risk of API key leakage.

To use a temporary token, replace the API key used for authentication in your code with the temporary authentication token.

Model availability

| Item | paraformer-v2 | paraformer-8k-v2 |
| --- | --- | --- |
| Scenarios | Multilingual recognition for scenarios such as live streaming and meetings | Chinese recognition for scenarios such as telephone customer service and voicemail |
| Sample rate | Any | 8 kHz |
| Languages | Chinese (including Mandarin and various dialects), English, Japanese, Korean, German, French, and Russian | Chinese |
| Punctuation prediction | ✅ Supported by default, no configuration required | ✅ Supported by default, no configuration required |
| Inverse text normalization (ITN) | ✅ Supported by default, no configuration required | ✅ Supported by default, no configuration required |
| Custom hotwords | ✅ For more information, see Custom vocabularies | ✅ For more information, see Custom vocabularies |
| Specify language for recognition | ✅ Specified by the language_hints parameter | Not supported |

Supported Chinese dialects: Shanghai dialect, Wu dialect, Min Nan dialect, Northeastern dialect, Gansu dialect, Guizhou dialect, Henan dialect, Hubei dialect, Hunan dialect, Jiangxi dialect, Ningxia dialect, Shanxi dialect, Shaanxi dialect, Shandong dialect, Sichuan dialect, Tianjin dialect, Yunnan dialect, and Cantonese

Limitations

The service does not support direct uploads of local audio or video files. It also does not support base64-encoded audio. The input source must be a file URL that is accessible over the Internet and supports the HTTP or HTTPS protocol, for example, https://your-domain.com/file.mp3.

You can specify the URL using the file_urls parameter. A single request supports up to 100 URLs.

  • Audio formats

    aac, amr, avi, flac, flv, m4a, mkv, mov, mp3, mp4, mpeg, ogg, opus, wav, webm, wma, and wmv

    Important

    The API cannot guarantee correct recognition for all audio and video formats and their variants because it is not feasible to test every possibility. We recommend testing your files to confirm that they produce the expected speech recognition results.

  • Audio sampling rate

    The sample rate varies by model:

    • paraformer-v2 supports any sample rate

    • paraformer-8k-v2 only supports an 8 kHz sample rate

  • Audio file size and duration

    The audio file cannot exceed 2 GB in size and 12 hours in duration.

    To process files that exceed these limits, you can pre-process them to reduce their size. For more information about pre-processing best practices, see Preprocess video files to improve file transcription efficiency (for audio file recognition scenarios).

  • Number of audio files for batch processing

    A single request supports up to 100 file URLs.

  • Recognizable languages

    Varies by model:

    • paraformer-v2:

      • Chinese, including Mandarin and various dialects: Shanghai dialect, Wu dialect, Min Nan dialect, Northeastern dialect, Gansu dialect, Guizhou dialect, Henan dialect, Hubei dialect, Hunan dialect, Jiangxi dialect, Ningxia dialect, Shanxi dialect, Shaanxi dialect, Shandong dialect, Sichuan dialect, Tianjin dialect, Yunnan dialect, and Cantonese

      • English

      • Japanese

      • Korean

    • paraformer-8k-v2 only supports Chinese

  • API call method limitations

    Direct API calls from the frontend are not supported. You must route calls through a backend server.
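Given the limit of 100 file URLs per request described above, a long list of files can be split into batches before submission. The following is a minimal sketch; `batch_urls` is an illustrative helper and the URLs are placeholders:

```python
MAX_URLS_PER_REQUEST = 100  # documented per-request limit


def batch_urls(urls, size=MAX_URLS_PER_REQUEST):
    # Yield successive chunks of at most `size` URLs each.
    for i in range(0, len(urls), size):
        yield urls[i:i + size]


# Example: 250 placeholder URLs split into batches of 100, 100, and 50.
batches = list(batch_urls([f"https://example.com/{i}.mp3" for i in range(250)]))
print([len(b) for b in batches])  # prints [100, 100, 50]
```

Each batch can then be submitted as a separate task.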

Task submission interface

Basic information

API endpoint description

Submits a speech recognition task.

URL

https://dashscope.aliyuncs.com/api/v1/services/audio/asr/transcription

Request method

POST

Request headers

Authorization: Bearer {api-key} // Replace {api-key} with your API key.
Content-Type: application/json
X-DashScope-Async: enable // Do not omit this request header. Otherwise, the task cannot be submitted.

Message body

The following code shows a message body that contains all request parameters. You can omit optional fields as needed.

{
    "model":"paraformer-v2", // The model name. This parameter is required.
    "input":{
        "file_urls":[
            "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_female2.wav",
            "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_male2.wav"
        ] // The file to be recognized. This parameter is required.
    },
    "resources": [ // This field is supported only by v1 series models. Do not use this field for v2 and later series models.
        {
            "resource_id": "xxxxxxxxxxxx", // The ID of the hotword in an earlier version. This parameter is optional.
            "resource_type": "asr_phrase"  // Must be set to "asr_phrase". This parameter is used with resource_id.
        }
    ],
    "parameters":{ // This field is supported only by v2 and later series models. Do not use this field for v1 series models.
        "vocabulary_id":"vocab-Xxxx", // The ID of the custom vocabulary. This parameter is optional.
        "channel_id":[
            0
        ], // The audio track index. This parameter is optional.
        "disfluency_removal_enabled":false, // The switch for filtering filler words. This parameter is optional.
        "timestamp_alignment_enabled": false, // Specifies whether to enable the timestamp calibration feature. This parameter is optional.
        "special_word_filter": "xxx", // The sensitive words. This parameter is optional.
        "language_hints":[ // This parameter is applicable only to the paraformer-v2 model. Do not use this field for other models.
            "zh",
            "en"
        ],
        "diarization_enabled":false, // Specifies whether to enable automatic speaker diarization. This parameter is optional.
        "speaker_count": 2 // The reference number of speakers. This parameter is optional.
    }
}

Request parameters


The following code provides a cURL example for calling the task submission interface:

curl --location 'https://dashscope.aliyuncs.com/api/v1/services/audio/asr/transcription' \
     --header "Authorization: Bearer $DASHSCOPE_API_KEY" \
     --header "Content-Type: application/json" \
     --header "X-DashScope-Async: enable" \
     --data '{"model":"paraformer-v2","input":{"file_urls":["https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_female2.wav",
              "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_male2.wav"]},"parameters":{"channel_id":[0]}}'

model (string; required)

The name of the Paraformer model used for audio and video file transcription. For more information, see Model availability.

file_urls (array[string]; required)

The list of URLs of the audio and video files to transcribe. HTTP and HTTPS protocols are supported. A single request supports a maximum of 100 URLs.

If your audio files are stored in Alibaba Cloud OSS, the RESTful API also accepts temporary URLs that start with the oss:// prefix.

Important
  • A temporary URL is valid for 48 hours and cannot be used after it expires. Do not use it in a production environment.

  • The API for obtaining an upload credential is limited to 100 QPS and does not support scaling out. Do not use it in production environments, high-concurrency scenarios, or stress testing scenarios.

  • For production environments, use a stable storage service such as Alibaba Cloud OSS to ensure long-term file availability and avoid rate limiting issues.

vocabulary_id (string; optional)

The ID of the custom vocabulary. This parameter, together with language configurations, is supported by the latest v2 series models. The hotwords associated with this vocabulary ID take effect for the current recognition task. This feature is disabled by default. For more information about how to use this feature, see Custom vocabularies.

resource_type (string; optional)

Must be set to "asr_phrase". This parameter must be used together with resource_id, and applies only to v1 series models.

channel_id (array[integer]; optional; default [0])

Specifies the indices of the audio tracks in a multi-track file to recognize, provided as a list. For example, [0] recognizes only the first audio track, and [0, 1] recognizes the first two audio tracks at the same time.

disfluency_removal_enabled (boolean; optional; default false)

Specifies whether to filter filler words. This feature is disabled by default.

timestamp_alignment_enabled (boolean; optional; default false)

Specifies whether to enable the timestamp alignment feature. This feature is disabled by default.

special_word_filter (string; optional)

Specifies the sensitive words to process during speech recognition, and supports a different processing method for each group of sensitive words.

If you do not pass this parameter, the system enables its built-in sensitive word filtering logic: any words in the detection results that match the Alibaba Cloud Model Studio sensitive word list (Chinese) are replaced with an equal number of * characters.

If you pass this parameter, the following sensitive word processing strategies are available:

  • Replace with *: Replaces the matched sensitive words with an equal number of asterisks (*).

  • Direct filtering: Completely removes the matched sensitive words from the recognition results.

The value of this parameter must be a JSON string with the following structure:

{
  "filter_with_signed": {
    "word_list": ["test"]
  },
  "filter_with_empty": {
    "word_list": ["start", "happen"]
  },
  "system_reserved_filter": true
}

JSON field description:

  • filter_with_signed (object; optional): Configures the list of sensitive words to replace with *. Matched words in the recognition results are replaced with an equal number of asterisks (*).

    • Example: With the preceding JSON, the recognition result for "Help me test this code" becomes "Help me **** this code".

    • Internal field: word_list, a string array that lists the sensitive words to replace.

  • filter_with_empty (object; optional): Configures the list of sensitive words to remove (filter) from the recognition results. Matched words are completely deleted.

    • Example: With the preceding JSON, the recognition result for "Is the match about to start now?" becomes "Is the match about to now?".

    • Internal field: word_list, a string array that lists the sensitive words to remove.

  • system_reserved_filter (Boolean; optional; default true): Specifies whether to also enable the system-predefined sensitive word rule. If set to true, the system's built-in filtering logic is enabled, and words in the detection results that match the Alibaba Cloud Model Studio sensitive word list (Chinese) are replaced with an equal number of * characters.
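Because special_word_filter must be a JSON string rather than a nested JSON object, a common approach is to serialize it with json.dumps before placing it in the parameters object. A minimal sketch, with hypothetical word lists for illustration:

```python
import json

# Hypothetical word lists for illustration only.
special_word_filter = json.dumps(
    {
        "filter_with_signed": {"word_list": ["test"]},
        "filter_with_empty": {"word_list": ["start", "happen"]},
        "system_reserved_filter": True,
    },
    ensure_ascii=False,
)

# The value is passed as a string inside "parameters", not as a nested object.
parameters = {"special_word_filter": special_word_filter}
```

Passing the object unserialized is a common cause of the filter silently not taking effect.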

language_hints

array[string]

["zh", "en"]

No

Specifies the language codes of the speech to be recognized.

This parameter is applicable only to the paraformer-v2 model.

Supported language codes:

  • zh: Chinese

  • en: English

  • ja: Japanese

  • yue: Cantonese

  • ko: Korean

  • de: German

  • fr: French

  • ru: Russian

diarization_enabled

boolean

false

No

Automatic speaker diarization. This feature is disabled by default.

This feature is applicable only to mono audio. Multi-channel audio does not support speaker diarization.

When this feature is enabled, the recognition results will display a speaker_id field to distinguish different speakers.

For an example of speaker_id, see Recognition result description.

speaker_count

integer

-

No

The reference value for the number of speakers. The value must be an integer from 2 to 100, including 2 and 100.

This parameter takes effect after speaker diarization is enabled (diarization_enabled is set to true).

By default, the number of speakers is automatically determined. If you configure this parameter, it can only assist the algorithm in trying to output the specified number of speakers, but cannot guarantee that this number will definitely be output.

Response parameters


{
  "output": {
    "task_status": "PENDING",
    "task_id": "c2e5d63b-96e1-4607-bb91-************"
  },
  "request_id": "77ae55ae-be17-97b8-9942--************"
}

  • task_status (string): The task status.

  • task_id (string): The task ID. Pass this ID as a request parameter to the task query interface.

Task query interface

Basic information

API endpoint description

Queries the status and result of a speech recognition task.

URL

https://dashscope.aliyuncs.com/api/v1/tasks/{task_id}

Request method

GET

Request headers

Authorization: Bearer {api-key} // Replace {api-key} with your API key.

Message body

None.

Request parameters


curl --location 'https://dashscope.aliyuncs.com/api/v1/tasks/{task_id}' --header "Authorization: Bearer $DASHSCOPE_API_KEY"

  • task_id (string; required): The ID of the task to query. This ID is returned by the task submission interface.

Response parameters


If a task contains multiple subtasks, the status of the entire task is marked as SUCCEEDED if any subtask succeeds. You must check the subtask_status field to determine the result of each subtask.

Normal example

{
  "request_id": "f9e1afad-94d3-997e-a83b-************",
  "output": {
    "task_id": "f86ec806-4d73-485f-a24f-************",
    "task_status": "SUCCEEDED",
    "submit_time": "2024-09-12 15:11:40.041",
    "scheduled_time": "2024-09-12 15:11:40.071",
    "end_time": "2024-09-12 15:11:40.903",
    "results": [
      {
        "file_url": "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_male2.wav",
        "transcription_url": "https://dashscope-result-bj.oss-cn-beijing.aliyuncs.com/pre/filetrans-16k/20240912/15%3A11/3bdf7689-b598-409d-806a-121cff5e4a31-1.json?Expires=1726211500&OSSAccessKeyId=yourOSSAccessKeyId&Signature=Fj%2BaF%2FH0Kayj3w3My2ECBeP****%3D",
        "subtask_status": "SUCCEEDED"
      },
      {
        "file_url": "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_female2.wav",
        "transcription_url": "https://dashscope-result-bj.oss-cn-beijing.aliyuncs.com/pre/filetrans-16k/20240912/15%3A11/409a4b92-445b-4dd8-8c1d-f110954d82d8-1.json?Expires=1726211500&OSSAccessKeyId=yourOSSAccessKeyId&Signature=v5Owy5qoAfT7mzGmQgH0g8C****%3D",
        "subtask_status": "SUCCEEDED"
      }
    ],
    "task_metrics": {
      "TOTAL": 2,
      "SUCCEEDED": 2,
      "FAILED": 0
    }
  },
  "usage": {
    "duration": 9
  }
}

Exception example

The code parameter indicates the error code, and the message parameter indicates the error message. These two fields appear only in exception cases. You can use these fields to troubleshoot problems by referring to the error codes.

{
    "task_id": "7bac899c-06ec-4a79-8875-xxxxxxxxxxxx",
    "task_status": "SUCCEEDED",
    "submit_time": "2024-12-16 16:30:59.170",
    "scheduled_time": "2024-12-16 16:30:59.204",
    "end_time": "2024-12-16 16:31:02.375",
    "results": [
        {
            "file_url": "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/sensevoice/long_audio_demo_cn.mp3",
            "transcription_url": "https://dashscope-result-bj.oss-cn-beijing.aliyuncs.com/prod/paraformer-v2/20241216/xxxx",
            "subtask_status": "SUCCEEDED"
        },
        {
            "file_url": "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/sensevoice/rich_text_exaple_1.wav",
            "code": "InvalidFile.DownloadFailed",
            "message": "The audio file cannot be downloaded.",
            "subtask_status": "FAILED"
        }
    ],
    "task_metrics": {
        "TOTAL": 2,
        "SUCCEEDED": 1,
        "FAILED": 1
    }
}
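Because the overall task_status can be SUCCEEDED even when some subtasks failed, it is worth splitting the results by subtask_status before downloading anything. A sketch, where the `output` dict is a trimmed stand-in for the parsed "output" object of a query response:

```python
# `output` is a trimmed stand-in for the parsed "output" object of a query response.
output = {
    "task_status": "SUCCEEDED",
    "results": [
        {"file_url": "a.wav", "subtask_status": "SUCCEEDED",
         "transcription_url": "https://example.com/a.json"},
        {"file_url": "b.wav", "subtask_status": "FAILED",
         "code": "InvalidFile.DownloadFailed"},
    ],
}

# Split subtasks by status instead of trusting the overall task_status.
succeeded = [r for r in output.get("results", []) if r.get("subtask_status") == "SUCCEEDED"]
failed = [r for r in output.get("results", []) if r.get("subtask_status") == "FAILED"]
print(len(succeeded), "succeeded;", len(failed), "failed")
```

Only the succeeded entries carry a transcription_url; the failed entries carry code and message fields for troubleshooting.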

  • task_id (string): The ID of the queried task.

  • task_status (string): The status of the queried task.

    Note: If a task contains multiple subtasks, the status of the entire task is marked as SUCCEEDED as long as any subtask succeeds. Check the subtask_status field to determine the result of each subtask.

  • subtask_status (string): The subtask status.

  • file_url (string): The URL of the file processed in the file transcription task.

  • transcription_url (string): The link for obtaining the recognition result. This link is valid for 24 hours. After it expires, you cannot query the task or download results using a previously obtained URL.

    The recognition result is saved as a JSON file. You can download the file from this link or read its content directly with an HTTP request. For more information about the fields in the JSON data, see Recognition result description.
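Because the result behind transcription_url is a plain JSON file, any HTTP client can fetch it within the 24-hour validity window. A minimal sketch using only the standard library; the function names are illustrative:

```python
import json
from urllib.request import urlopen


def parse_transcription(raw: bytes) -> dict:
    # The result file is plain UTF-8 JSON.
    return json.loads(raw.decode("utf-8"))


def fetch_transcription(url: str) -> dict:
    # Download the result within the 24-hour validity window of transcription_url.
    with urlopen(url) as resp:
        return parse_transcription(resp.read())
```

Download the results promptly after the task succeeds, since an expired URL cannot be refreshed by re-querying the task.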

Recognition result description

The recognition result is saved as a JSON file.


{
    "file_url":"https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_female2.wav",
    "properties":{
        "audio_format":"pcm_s16le",
        "channels":[
            0
        ],
        "original_sampling_rate":16000,
        "original_duration_in_milliseconds":3834
    },
    "transcripts":[
        {
            "channel_id":0,
            "content_duration_in_milliseconds":3720,
            "text":"Hello world, this is Alibaba Speech Lab.",
            "sentences":[
                {
                    "begin_time":100,
                    "end_time":3820,
                    "text":"Hello world, this is Alibaba Speech Lab.",
                    "sentence_id":1,
                    "speaker_id":0, //This field is only displayed when automatic speaker diarization is enabled.
                    "words":[
                        {
                            "begin_time":100,
                            "end_time":596,
                            "text":"Hello ",
                            "punctuation":""
                        },
                        {
                            "begin_time":596,
                            "end_time":844,
                            "text":"world",
                            "punctuation":", "
                        }
                        // Other content is omitted here.
                    ]
                }
            ]
        }
    ]
}

The following list describes the key parameters:

  • audio_format (string): The audio format of the source file.

  • channels (array[integer]): The audio track index information of the source file. Returns [0] for single-track audio, [0, 1] for dual-track audio, and so on.

  • original_sampling_rate (integer): The sample rate (Hz) of the audio in the source file.

  • original_duration_in_milliseconds (integer): The original audio duration (ms) of the source file.

  • channel_id (integer): The audio track index of the transcription result, starting from 0.

  • content_duration_in_milliseconds (integer): The duration (ms) of content determined to be speech in the audio track.

    Important: The Paraformer speech recognition model service transcribes and charges only for the duration of content determined to be speech in the audio track. Non-speech content is not measured or charged. Typically, the speech content duration is shorter than the original audio duration. Because an AI model determines whether speech content exists, discrepancies may occur.

  • transcripts (array): The paragraph-level speech transcription results, one entry per audio track; the text field of each entry holds the paragraph-level text.

  • sentences (array): The sentence-level speech transcription results.

  • words (array): The word-level speech transcription results.

  • begin_time (integer): The start timestamp (ms).

  • end_time (integer): The end timestamp (ms).

  • text (string): The speech transcription result.

  • speaker_id (integer): The index of the current speaker, starting from 0, used to distinguish different speakers. This field appears in the recognition result only when speaker diarization is enabled.

  • punctuation (string): The predicted punctuation after the word, if any.
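As a sketch of how these fields fit together, the following walks a parsed result and prints each sentence with its timestamps. The result dict is a trimmed stand-in for a downloaded result file:

```python
# Trimmed stand-in for a parsed transcription result file.
result = {
    "transcripts": [
        {
            "channel_id": 0,
            "text": "Hello world, this is Alibaba Speech Lab.",
            "sentences": [
                {
                    "begin_time": 100,
                    "end_time": 3820,
                    "text": "Hello world, this is Alibaba Speech Lab.",
                    "words": [
                        {"begin_time": 100, "end_time": 596, "text": "Hello ", "punctuation": ""},
                        {"begin_time": 596, "end_time": 844, "text": "world", "punctuation": ", "},
                    ],
                }
            ],
        }
    ]
}

# Collect each sentence with its start/end timestamps in milliseconds.
lines = []
for transcript in result["transcripts"]:
    for sentence in transcript["sentences"]:
        lines.append(f"[{sentence['begin_time']}-{sentence['end_time']} ms] {sentence['text']}")
print("\n".join(lines))
```

The same nested loop extends one level deeper to reach the word-level entries and their punctuation fields.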

Complete example

You can use the HTTP libraries that are built into programming languages to implement task submission and query requests. First, call the task submission interface to upload a recognition task, and then repeatedly call the task query interface until the task is complete.

The following code provides an example in Python:

import json
import os
import time

import requests

# Read the API key from an environment variable instead of hard-coding it.
api_key = os.getenv("DASHSCOPE_API_KEY")
file_urls = [
    "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_female2.wav",
    "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/paraformer/hello_world_male2.wav",
]
language_hints = ["zh", "en"]


# Submit a file transcription task with the list of file URLs to transcribe.
def submit_task(apikey, file_urls) -> str:
    headers = {
        "Authorization": f"Bearer {apikey}",
        "Content-Type": "application/json",
        "X-DashScope-Async": "enable",
    }
    data = {
        "model": "paraformer-v2",
        "input": {"file_urls": file_urls},
        "parameters": {
            "channel_id": [0],
            "language_hints": language_hints,
            # To use a custom vocabulary, add its ID here:
            # "vocabulary_id": "vocab-Xxxx",
        },
    }
    # The URL of the task submission interface.
    service_url = (
        "https://dashscope.aliyuncs.com/api/v1/services/audio/asr/transcription"
    )
    response = requests.post(service_url, headers=headers, data=json.dumps(data))

    if response.status_code == 200:
        return response.json()["output"]["task_id"]
    print("task failed!")
    print(response.json())
    return None


# Poll the task status until the task completes.
def wait_for_complete(task_id):
    headers = {"Authorization": f"Bearer {api_key}"}
    # The URL of the task query interface.
    service_url = f"https://dashscope.aliyuncs.com/api/v1/tasks/{task_id}"

    while True:
        response = requests.get(service_url, headers=headers)
        print(response.json())
        if response.status_code != 200:
            print("query failed!")
            return None
        status = response.json()["output"]["task_status"]
        if status == "SUCCEEDED":
            print("task succeeded!")
            return response.json()["output"]["results"]
        if status not in ("RUNNING", "PENDING"):
            print("task failed!")
            return None
        # Poll at a moderate interval to avoid rate limiting.
        time.sleep(1)


task_id = submit_task(apikey=api_key, file_urls=file_urls)
print("task_id: ", task_id)
result = wait_for_complete(task_id)
print("transcription result: ", result)

Error codes

If you encounter an error, see Error messages for troubleshooting.

If the problem persists, join the developer group to report the issue and provide the Request ID for further investigation.

For an example of a response that contains failed subtasks, see the exception example in the "Task query interface" section above.

More examples

For more examples, see our GitHub repository.

FAQ

Features

Q: Is Base64 encoded audio supported?

No, it is not. The service only supports recognition of audio from URLs that are accessible over the internet. It does not support binary streams or local files.

Q: How can I provide audio files as publicly accessible URLs?

Follow these general steps. The specific process may vary depending on the storage product you use. We recommend uploading audio to OSS.

1. Choose a storage and hosting method

You can use methods such as the following:

  • Object Storage Service (OSS) (recommended):

    • Use an Object Storage Service such as Alibaba Cloud OSS, upload audio files to a bucket, and set them for public access.

    • Advantages: High availability, supports content delivery network (CDN) acceleration, easy to manage.

  • Web server:

    • Place audio files on a web server that supports HTTP/HTTPS access, such as Nginx or Apache.

    • Advantages: Suitable for small projects or local testing.

  • Content delivery network (CDN):

    • Host audio files on a CDN and access them through URLs provided by the CDN.

    • Advantages: Accelerates file transfer, suitable for high concurrency scenarios.

2. Upload audio files

Upload the audio files based on your chosen storage method. For example:

  • Object Storage Service:

    • Log in to the cloud service provider's console and create a bucket.

    • Upload audio files and set file permissions to "public-read" or generate temporary access links.

  • Web server:

    • Place audio files in a specified directory on the server, such as /var/www/html/audio/.

    • Ensure files can be accessed via HTTP/HTTPS.

3. Generate publicly accessible URLs

For example:

  • Object Storage Service:

    • After file upload, the system automatically generates a public access URL, typically in the format https://<bucket-name>.<region>.aliyuncs.com/<file-name>.

    • If you need a more friendly domain name, you can bind a custom domain name and enable HTTPS.

  • Web server:

    • The file access URL is typically the server address plus the file path, such as https://your-domain.com/audio/file.mp3.

  • CDN:

    • After you configure CDN acceleration, use the URL provided by the CDN, such as https://cdn.your-domain.com/audio/file.mp3.

4. Verify URL availability

Verify that the generated URL is publicly accessible. For example:

  • In a browser, open the URL and check if the audio file can be played.

  • Use a tool, such as curl or Postman, to verify if the URL returns the correct HTTP response (status code 200).
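The check in step 4 can also be scripted. A minimal sketch using the standard library; is_publicly_accessible is an illustrative helper, and note that some servers reject HEAD requests:

```python
from urllib.error import URLError
from urllib.request import Request, urlopen


def is_publicly_accessible(url: str) -> bool:
    # A URL is usable if an unauthenticated request returns HTTP status 200.
    try:
        with urlopen(Request(url, method="HEAD"), timeout=10) as resp:
            return resp.status == 200
    except (ValueError, OSError):  # URLError and HTTPError are OSError subclasses
        return False
```

Run this check on every URL before submitting a batch, since a single unreachable file produces a FAILED subtask.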

Q: How long does it take to obtain the recognition results?

After a task is submitted, it enters the PENDING state. The queuing time depends on the queue length and file duration and cannot be precisely determined, but it is typically within a few minutes. Longer audio files require more processing time.

Troubleshooting

If you encounter an error, refer to the information in Error codes.

Q: What should I do if the recognition results are not synchronized with the audio playback?

Set the timestamp_alignment_enabled request parameter to true to enable the timestamp alignment feature. This feature synchronizes the recognition results with the audio playback.

Q: What do I do if the temporary public access URL of an OSS audio file is inaccessible?

Set the X-DashScope-OssResourceResolve request header to enable.

This method is not recommended, and the Java SDK and the Python SDK do not support configuring this header.

Q: Why can't I obtain a result after continuous polling?

This may be due to rate limiting. To request a quota increase, join the developer group.

Q: Why is the speech not recognized (no recognition result)?

  • Check whether the audio meets the format and sample rate requirements.

  • If you are using the paraformer-v2 model, check whether the language_hints parameter is set correctly.

  • If the previous checks do not resolve the issue, you can use custom hotwords to improve the recognition of specific words.

More questions

For more questions, see the FAQ on GitHub.