API reference

Last Updated: Jun 02, 2020

Features

The recording file recognition service recognizes speech in recording files. The service does not work in real time. To recognize a recording file, you must submit an HTTP or HTTPS URL from which the file can be downloaded. Local files cannot be submitted directly. The service has the following features:

  • Recognizes single-track and dual-track recording files in WAV and MP3 formats.
  • Supports two API call methods: polling and callback.
  • Supports custom models and hotwords.
  • Supports the audio sampling rates of 8 kHz and 16 kHz.
  • Recognizes multiple languages, such as Chinese Mandarin, Chinese dialects, and English.

Limits

  • The recording file to be recognized must be downloadable from a public network through its URL. The URL must contain a domain name, not an IP address.
  • The maximum file size is 512 MB.
  • After you send a recording file recognition request, the recording file recognition server recognizes the file within 24 hours and retains the recognition result for 72 hours.
  • You can use the trial edition to recognize recording files with a maximum duration of 2 hours per day.
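
The limits above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the service or its SDK. The function name check_file_link, the interpretation of 512 MB as 512 × 1024 × 1024 bytes, and the file-extension test are assumptions made for illustration. The server validates the actual file content, so passing this check does not guarantee that a request succeeds.

```python
import ipaddress
from urllib.parse import urlparse

# Assumption: 512 MB is interpreted as 512 * 1024 * 1024 bytes.
MAX_FILE_SIZE_BYTES = 512 * 1024 * 1024
SUPPORTED_EXTENSIONS = (".wav", ".mp3")


def check_file_link(file_link, size_bytes=None):
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    parsed = urlparse(file_link)
    if parsed.scheme not in ("http", "https"):
        problems.append("URL must use HTTP or HTTPS")
    host = parsed.hostname or ""
    if not host:
        problems.append("URL has no host")
    else:
        try:
            ipaddress.ip_address(host)
            problems.append("URL must use a domain name, not an IP address")
        except ValueError:
            pass  # Not an IP literal, so it is treated as a domain name.
    # Heuristic only: the extension is a hint, not proof of the audio format.
    if not parsed.path.lower().endswith(SUPPORTED_EXTENSIONS):
        problems.append("file extension is not .wav or .mp3")
    if size_bytes is not None and size_bytes > MAX_FILE_SIZE_BYTES:
        problems.append("file is larger than 512 MB")
    return problems
```

For example, check_file_link("http://192.168.0.1/a.wav") reports the IP address problem, whereas the sample OSS URL used later in this topic produces an empty list.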

Procedure

  1. Check the format and audio sampling rate of your recording file. Select a project that has an appropriate scenario and model in the console based on your business scenario.
  2. Store the recording file in Object Storage Service (OSS). If the access permission of the file is public, use the OSS URL of the file directly. If the access permission of the file is private, use the SDK to generate an OSS URL that has a validity period. Alternatively, you can store the recording file on your own file server. In this case, ensure that the length indicated by the Content-Length field in the HTTP response header is the same as the actual length of the data in the response body. Otherwise, the file may fail to be downloaded.
  3. The client sends a recording file recognition request. If the request is successful, the recording file recognition server returns the task ID, which can be used to query the recognition result.
  4. The client uses the task ID that is obtained in step 3 to query the recognition result. Currently, the recognition result is retained on the recording file recognition server for 72 hours.

The following figure shows the interaction process between the client and the server.

[Figure: Interaction process]

Note: The server adds the TaskId field to the response header for all responses to indicate the ID of the recognition task. You need to record the value of this field. If an error occurs, you can open a ticket to submit the task ID and error message.

API call methods

The recording file recognition service provides the pctowap open platform (POP) API that can be called in a remote procedure call (RPC) style. To call an API operation, the client encapsulates parameters in a request and uses an HTTP method to send the request. The recording file recognition server returns the result in a response. You must store recording files to be recognized on a server and ensure that each file can be accessed through a URL. We recommend that you store recording files in Alibaba Cloud OSS.

The recording file recognition POP API supports two operations: a POST request that submits a recording file recognition request, and a GET request that queries the recording file recognition result.

Recording file recognition request

  • If you use the polling method, you can send a recording file recognition request and obtain the task ID for subsequent recognition result polling.
  • If you use the callback method, you can send a recording file recognition request and the callback URL. If the request is successful, the server uses the POST method to send the recognition result to the callback URL. Ensure that the callback URL is able to receive the POST request.

Note: If you activated the recording file recognition service without setting the version to 4.0, the version is 2.0 by default, and you can continue to use it. In 2.0, the recognition result obtained by the callback method differs from that obtained by the polling method in both the naming style and the fields of the JSON-formatted string. For more information, see the Earlier versions section of this topic. In 4.0, the result obtained by the callback method is a camelCase JSON-formatted string that contains the same recognition result as that obtained by the polling method. If you are a new user, set the version of the recording file recognition service to 4.0.

Request parameters

When you send a recording file recognition request, add the request parameters to the request body as a JSON-formatted string. The following example shows request parameters in JSON format. The optional valid_times parameter specifies the time periods of an audio track that require speech recognition.

    {
        "appkey": "your-appkey",
        "file_link": "https://aliyun-nls.oss-cn-hangzhou.aliyuncs.com/asr/fileASR/examples/nls-sample-16k.wav",
        "auto_split": false,
        "version": "4.0",
        "enable_words": false,
        "valid_times": [
            {
                "begin_time": 200,
                "end_time": 2000,
                "channel_id": 0
            }
        ]
    }

Request parameter description

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| appkey | String | Yes | The appkey that uniquely identifies a project created in the console. |
| file_link | String | Yes | The URL of the recording file. Ensure that the scenario and model of the project created in the console are suitable for the recording file. |
| version | String | Yes | The version of the recording file recognition service. Default value: 2.0. Set this parameter to 4.0. |
| enable_words | Boolean | No | Specifies whether to return the recognition results of words. Default value: false. This parameter takes effect only when the version of the recording file recognition service is 4.0. |
| enable_callback | Boolean | No | Specifies whether to enable the callback method. Default value: false. |
| callback_url | String | No | The callback URL. You must specify this parameter if you set the enable_callback parameter to true. The callback URL can be an HTTP or HTTPS URL. It must contain a domain name, not an IP address. |
| auto_split | Boolean | No | Specifies whether to enable automatic track splitting. If you enable automatic track splitting, the recording file recognition server can identify the speaker of each sentence in a conversation between two parties based on the ChannelId parameter in the recognition result of the sentence. The value of the ChannelId parameter is 1 for the first speaker in the conversation. Only mono files sampled at 8 kHz are supported. |
| valid_times | List<ValidTime> | No | The valid time periods that require speech recognition in the total length of an audio track. |
| max_end_silence | Integer | No | The maximum end silence that is allowed. Default value: 450. Unit: milliseconds. |
| max_single_segment_time | Integer | No | The maximum silence time allowed for a single sentence. Default value: 20000. Unit: milliseconds. |
| customization_id | String | No | The ID of the custom model created by using the POP API. This parameter is empty by default. |
| class_vocabulary_id | String | No | The ID of the created categorized hotword list. This parameter is empty by default. |
| vocabulary_id | String | No | The ID of the created extended hotword list. This parameter is not specified by default. |

The following table describes the parameters in the ValidTime object.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| begin_time | Integer | Yes | The start time offset of the valid time period. Unit: milliseconds. |
| end_time | Integer | Yes | The end time offset of the valid time period. Unit: milliseconds. |
| channel_id | Integer | Yes | The sequence number of the audio track to which the setting of the valid time period applies. The value starts from 0. |
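
The request parameters described above can be assembled with a small helper. The following sketch is illustrative only. The function name build_task is hypothetical, and the check that begin_time is less than end_time is an assumption rather than a documented server rule. The helper only builds the JSON-formatted string for the request body. It does not sign or send the request.

```python
import json


def build_task(appkey, file_link, *, enable_words=False, auto_split=False,
               callback_url=None, valid_times=None):
    """Return the request parameters as a JSON-formatted string."""
    task = {
        "appkey": appkey,
        "file_link": file_link,
        "version": "4.0",  # New users are asked to set the version to 4.0.
        "enable_words": enable_words,
        "auto_split": auto_split,
    }
    if callback_url is not None:
        # callback_url is required whenever enable_callback is true.
        task["enable_callback"] = True
        task["callback_url"] = callback_url
    if valid_times:
        required = {"begin_time", "end_time", "channel_id"}
        for period in valid_times:
            missing = required - set(period)
            if missing:
                raise ValueError(f"valid_times entry is missing {sorted(missing)}")
            # Assumption: a period must end after it begins.
            if period["begin_time"] >= period["end_time"]:
                raise ValueError("begin_time must be less than end_time")
        task["valid_times"] = list(valid_times)
    return json.dumps(task)
```

Passing callback_url switches the task to the callback method. Omitting it leaves the default polling method in place.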

Response parameters

The server returns a response to the recording file recognition request. The response includes response parameters in a JSON-formatted string, as shown in the following example:

    {
        "TaskId": "4b56f0c4b7e611e88f34c33c2a60497b",
        "RequestId": "E4B183CC-6CFE-411E-A547-D877F7BD6C44",
        "StatusText": "SUCCESS",
        "StatusCode": 21050000
    }

Response parameter description

  • HTTP status code 200 indicates that the request is successful. For more information, see HTTP status codes.
  • The following table describes the response parameters.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| TaskId | String | Yes | The ID of the recognition task. |
| RequestId | String | Yes | The ID of the request, which is used for debugging. |
| StatusCode | Integer | Yes | The status code. |
| StatusText | String | Yes | The status message. |

Query request for the recording file recognition result

After sending a recording file recognition request, you can poll the recognition result based on the task ID.

Request parameters

After the server returns a response to the recording file recognition request, you can use the task ID in the response as a parameter to query the recognition result. When calling the API operation, you need to set the polling interval.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| TaskId | String | Yes | The ID of the recognition task. |
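
Polling with a fixed interval can be sketched as follows. This is a minimal illustration, not the SDK's implementation. The fetch argument is a hypothetical caller-supplied function that performs the GET query for a task ID and returns the parsed JSON response as a dict. The default interval of 10 seconds and the limit of 60 queries are assumptions. The status codes are those listed in the Service status codes section of this topic.

```python
import time

SUCCESS = 21050000
RUNNING = 21050001
QUEUEING = 21050002


def poll_result(fetch, task_id, interval_seconds=10.0, max_attempts=60,
                sleep=time.sleep):
    """Query the result until the task is no longer running or queued.

    fetch(task_id) is supplied by the caller (hypothetical) and must return
    the parsed JSON response of the query request as a dict.
    """
    for attempt in range(max_attempts):
        response = fetch(task_id)
        if response.get("StatusCode") not in (RUNNING, QUEUEING):
            # Either a success or an error; the caller inspects StatusCode.
            return response
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    raise TimeoutError(f"task {task_id} is still pending after {max_attempts} queries")
```

The sleep parameter exists so that the wait can be replaced in tests. In normal use the default time.sleep is sufficient.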

Response parameters

The server returns a response to the query request for the recording file recognition result. The response includes response parameters in a JSON-formatted string.

Sample success response

The following response shows the recognition result of the single-track recording file nls-sample-16k.wav:

    {
        "TaskId": "d429dd7dd75711e89305ab6170fe6cd1",
        "RequestId": "9240D669-6485-4DCC-896A-F8B31F946CF9",
        "StatusText": "SUCCESS",
        "BizDuration": 2956,
        "SolveTime": 1540363288472,
        "StatusCode": 21050000,
        "Result": {
            "Sentences": [{
                "EndTime": 2365,
                "SilenceDuration": 0,
                "BeginTime": 340,
                "Text": "Weather in Beijing",
                "ChannelId": 0,
                "SpeechRate": 177,
                "EmotionValue": 5.0
            }]
        }
    }

If you set the enable_callback parameter to true, specify the callback_url parameter, and set the version parameter to 4.0, the following response shows the recognition result obtained by the callback method:

    {
        "Result": {
            "Sentences": [{
                "EndTime": 2365,
                "SilenceDuration": 0,
                "BeginTime": 340,
                "Text": "Weather in Beijing",
                "ChannelId": 0,
                "SpeechRate": 177,
                "EmotionValue": 5.0
            }]
        },
        "TaskId": "36d01b244ad811e9952db7bb7ed2b0dd",
        "StatusCode": 21050000,
        "StatusText": "SUCCESS",
        "RequestTime": 1553062810452,
        "SolveTime": 1553062810831,
        "BizDuration": 2956
    }

Note:

  • The value of the RequestTime parameter is a timestamp that indicates the time when the recording file recognition request was sent, in milliseconds. For example, a value of 1553062810452 indicates 14:20:10 on March 20, 2019, UTC+8.
  • The value of the SolveTime parameter is a timestamp that indicates the time when the recording file recognition task was completed, in milliseconds.
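
The millisecond timestamps can be converted to readable times with the standard library. The following sketch is illustrative. The helper names ms_to_datetime and processing_time_ms are hypothetical, and UTC+8 is used as the default time zone only to match the example above.

```python
from datetime import datetime, timedelta, timezone

UTC_PLUS_8 = timezone(timedelta(hours=8))


def ms_to_datetime(timestamp_ms, tz=UTC_PLUS_8):
    """Convert a millisecond timestamp such as RequestTime or SolveTime
    to a timezone-aware datetime."""
    return datetime.fromtimestamp(timestamp_ms / 1000, tz=tz)


def processing_time_ms(response):
    """Milliseconds between RequestTime and SolveTime in a callback result."""
    return response["SolveTime"] - response["RequestTime"]


print(ms_to_datetime(1553062810452).strftime("%Y-%m-%d %H:%M:%S"))
# prints "2019-03-20 14:20:10"
```

For the callback result shown above, processing_time_ms returns 379, which is the difference between the SolveTime and RequestTime values.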

The following response shows that the task is in the queuing state:

    {
        "TaskId": "c7274235b7e611e88f34c33c2a60497b",
        "RequestId": "981AD922-0655-46B0-8C6A-5C836822F773",
        "StatusText": "QUEUEING",
        "StatusCode": 21050002
    }

The following response shows that the task is in the running state:

    {
        "TaskId": "c7274235b7e611e88f34c33c2a60497b",
        "RequestId": "8E908ED2-867F-457E-82BF-4756194A6C78",
        "StatusText": "RUNNING",
        "BizDuration": 0,
        "StatusCode": 21050001
    }

Sample error response

The following response shows that the file fails to be downloaded:

    {
        "TaskId": "4cf25b7eb7e711e88f34c33c2a60497b",
        "RequestId": "098BF27C-4CBA-45FF-BD11-3F532F261733",
        "StatusText": "FILE_DOWNLOAD_FAILED",
        "BizDuration": 0,
        "SolveTime": 1536906469146,
        "StatusCode": 41050002
    }

For more information, see the error codes and solutions in the Service status codes section of this topic.

Response parameter description

  • HTTP status code 200 indicates that the request is successful. For more information, see HTTP status codes.
  • The following table describes the response parameters.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| TaskId | String | Yes | The ID of the recognition task. |
| StatusCode | Integer | Yes | The status code. |
| StatusText | String | Yes | The status message. |
| RequestId | String | Yes | The ID of the request, which is used for debugging. |
| Result | Object | Yes | The recognition result object. |
| Sentences | List<SentenceResult> | Yes | The list of the recognition results of sentences. This parameter is returned only when the value of the StatusText parameter is SUCCESS. |
| Words | List<WordResult> | No | The recognition results of words. This parameter is returned only when the enable_words parameter is set to true and the version parameter is set to 4.0. |
| BizDuration | Long | Yes | The total duration of the recording file to be recognized. Unit: milliseconds. |
| SolveTime | Long | Yes | The timestamp that indicates the time when the recording file recognition task was completed. Unit: milliseconds. |

The following table describes the parameters in the recognition result of each sentence.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| ChannelId | Integer | Yes | The ID of the audio track to which the sentence belongs. |
| BeginTime | Integer | Yes | The start time offset of the sentence. Unit: milliseconds. |
| EndTime | Integer | Yes | The end time offset of the sentence. Unit: milliseconds. |
| Text | String | Yes | The recognition result of the sentence. |
| EmotionValue | Integer | Yes | The emotion value. Valid values: [1, 10]. A larger value indicates a stronger emotion. |
| SilenceDuration | Integer | Yes | The silence duration between the current and the previous sentences. Unit: seconds. |
| SpeechRate | Integer | Yes | The average speed of the sentence. Unit: words per minute. |
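
The sentence fields above are enough to rebuild a transcript for each audio track. The following sketch is one possible way to do so, assuming a parsed response dict in the 4.0 format. The function name transcript_by_channel is hypothetical.

```python
from collections import defaultdict


def transcript_by_channel(response):
    """Group sentence texts by ChannelId, ordered by BeginTime in each track."""
    sentences = response.get("Result", {}).get("Sentences", [])
    grouped = defaultdict(list)
    for sentence in sorted(sentences, key=lambda s: s["BeginTime"]):
        grouped[sentence["ChannelId"]].append(sentence["Text"])
    # Join the ordered sentences of each track into a single string.
    return {channel: " ".join(texts) for channel, texts in grouped.items()}
```

A response without a Result object, such as one for a task that is still queued, yields an empty dict.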

Recognition results of words: If the enable_words parameter is set to true and the version parameter is set to 4.0, the server returns the recognition results of words in the response. The recognition results of words obtained by the polling method are the same as those obtained by the callback method. The following response shows the recognition result obtained by the polling method:

    {
        "StatusCode": 21050000,
        "Result": {
            "Sentences": [{
                "SilenceDuration": 0,
                "EmotionValue": 5.0,
                "ChannelId": 0,
                "Text": "Weather in Beijing",
                "BeginTime": 340,
                "EndTime": 2365,
                "SpeechRate": 177
            }],
            "Words": [{
                "ChannelId": 0,
                "Word": "Weather",
                "BeginTime": 640,
                "EndTime": 940
            }, {
                "ChannelId": 0,
                "Word": "in",
                "BeginTime": 940,
                "EndTime": 1120
            }, {
                "ChannelId": 0,
                "Word": "Beijing",
                "BeginTime": 1120,
                "EndTime": 2020
            }]
        },
        "SolveTime": 1553236968873,
        "StatusText": "SUCCESS",
        "RequestId": "027B126B-4AC8-4C98-9FEC-A031158F3F5A",
        "TaskId": "b505e78c4c6d11e9a213e11db149f2ff",
        "BizDuration": 2956
    }

The following table describes the parameters in the recognition result of each word.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| BeginTime | Integer | Yes | The start time of the word. Unit: milliseconds. |
| EndTime | Integer | Yes | The end time of the word. Unit: milliseconds. |
| ChannelId | Integer | Yes | The ID of the audio track to which the word belongs. |
| Word | String | Yes | The recognition result of the word. |

Service status codes

Normal codes

| Status code | Status message | Description | Solution |
| --- | --- | --- | --- |
| 21050000 | SUCCESS | The request is successful after you use the POST method to send a recording file recognition request or the GET method to query the recording file recognition result. | No solution is required. |
| 21050001 | RUNNING | The recording file recognition task is running. | Use the GET method to query the recording file recognition result later. |
| 21050002 | QUEUEING | The recording file recognition task is waiting in a queue. | Use the GET method to query the recording file recognition result later. |
| 21050003 | SUCCESS_WITH_NO_VALID_FRAGMENT | The query request for the recording file recognition result is successful but the server does not detect any speech data. | Check whether the recording file contains speech data or the duration of speech data is too short. |

Error codes

Note: Status codes that start with 4 indicate client errors, whereas those that start with 5 indicate server errors.

| Status code | Status message | Description | Solution |
| --- | --- | --- | --- |
| 41050001 | USER_BIZDURATION_QUOTA_EXCEED | The number of requests exceeds the quota of the day. | If you have a large business volume and want to increase the upper limit, send an email to nls_support@service.aliyun.com. |
| 41050002 | FILE_DOWNLOAD_FAILED | The file fails to be downloaded. | Check whether the recording file path is correct or whether the recording file can be accessed and downloaded from a public network. |
| 41050003 | FILE_CHECK_FAILED | The file format is incorrect. | Check whether the recording file is a single-track or dual-track file in WAV or MP3 format. |
| 41050004 | FILE_TOO_LARGE | The file is too large. | Check whether the recording file is larger than 512 MB in size. |
| 41050005 | FILE_NORMALIZE_FAILED | The file fails to be normalized. | Check whether the recording file is damaged or cannot be played. |
| 41050006 | FILE_PARSE_FAILED | The file fails to be parsed. | Check whether the recording file is damaged or cannot be played. |
| 41050007 | MKV_PARSE_FAILED | The MKV parsing fails. | Check whether the recording file is damaged or cannot be played. |
| 41050008 | UNSUPPORTED_SAMPLE_RATE | The audio sampling rate is not supported. | Check whether the audio sampling rate of the recording file is 8 kHz or 16 kHz. |
| 41050009 | UNSUPPORTED_ASR_GROUP | The automatic speech recognition (ASR) group is not supported. | Check whether the appkey belongs to the same Alibaba Cloud account as the token. |
| 41050010 | FILE_TRANS_TASK_EXPIRED | The recording file recognition task expires. | Check whether the task ID exists or expires. |
| 41050011 | REQUEST_INVALID_FILE_URL_VALUE | The specified file_link parameter is invalid. | Check whether the file_link parameter is specified in a correct format. |
| 41050012 | REQUEST_INVALID_CALLBACK_VALUE | The specified callback_url parameter is invalid. | Check whether the callback_url parameter is specified in a correct format. |
| 41050013 | REQUEST_PARAMETER_INVALID | The request contains incorrect parameters. | Check whether the request body is a valid JSON-formatted string. |
| 41050014 | REQUEST_EMPTY_APPKEY_VALUE | The appkey parameter is not specified. | Check whether the appkey parameter is specified. |
| 41050015 | REQUEST_APPKEY_UNREGISTERED | The specified appkey parameter is invalid. | Check whether the appkey specified by the appkey parameter is valid or whether the appkey belongs to the same Alibaba Cloud account as the specified AccessKey ID. |
| 41050021 | RAM_CHECK_FAILED | The RAM user fails authentication. | Check whether the RAM user is granted the permission to use the Intelligent Speech Interaction API. For more information, see Activate Intelligent Speech Interaction: RAM user authentication in Quick Start. |
| 41050023 | CONTENT_LENGTH_CHECK_FAILED | The specified Content-Length field is invalid. | When you download the recording file, check whether the length indicated by the Content-Length field in the HTTP response header is the same as the actual length of data in the response body. |
| 41050024 | FILE_404_NOT_FOUND | The file to be downloaded does not exist. | Check whether the recording file to be downloaded exists. |
| 41050025 | FILE_403_FORBIDDEN | You are not authorized to download the file. | Check whether you have the permission to download the recording file. |
| 41050026 | FILE_SERVER_ERROR | A file server error has occurred. | Check whether the server where the recording file is stored works properly. |
| 51050000 | INTERNAL_ERROR | An internal error has occurred. | If the error code is occasionally returned, ignore it. If the error code is returned multiple times, open a ticket. |
| 51050001 | VAD_FAILED | The voice activity detection (VAD) fails. | If the error code is occasionally returned, ignore it. If the error code is returned multiple times, open a ticket. |
| 51050002 | RECOGNIZE_FAILED | The ASR fails. | If the error code is occasionally returned, ignore it. If the error code is returned multiple times, open a ticket. |
| 51050003 | RECOGNIZE_INTERRUPT | The ASR is interrupted. | If the error code is occasionally returned, ignore it. If the error code is returned multiple times, open a ticket. |
| 51050004 | OFFER_INTERRUPT | The recognition task is prevented from being written to the queue. | If the error code is occasionally returned, ignore it. If the error code is returned multiple times, open a ticket. |
| 51050005 | FILE_TRANS_TIMEOUT | The recognition task fails due to a timeout. | If the error code is occasionally returned, ignore it. If the error code is returned multiple times, open a ticket. |
| 51050006 | FRAGMENT_FAILED | The multi-channel audio data fails to be converted to mono audio data. | If the error code is occasionally returned, ignore it. If the error code is returned multiple times, open a ticket. |
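
A client can use the leading digit of a status code, together with the normal codes listed earlier, to decide what to do next. The following sketch is an illustration only. The function name classify_status and the category labels are hypothetical.

```python
SUCCESS_CODES = {21050000, 21050003}  # SUCCESS, SUCCESS_WITH_NO_VALID_FRAGMENT
PENDING_CODES = {21050001, 21050002}  # RUNNING, QUEUEING


def classify_status(status_code):
    """Map a service status code to a coarse category."""
    if status_code in SUCCESS_CODES:
        return "success"
    if status_code in PENDING_CODES:
        return "pending"
    leading_digit = str(status_code)[0]
    if leading_digit == "4":
        return "client_error"  # Fix the request or the file, then resubmit.
    if leading_digit == "5":
        return "server_error"  # Retry; open a ticket if the error persists.
    return "unknown"
```

For example, 41050002 (FILE_DOWNLOAD_FAILED) is classified as a client error, and 51050005 (FILE_TRANS_TIMEOUT) is classified as a server error.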

Earlier versions

If you have activated the recording file recognition service without setting the version to 4.0, its version is 2.0 by default. In 2.0, the recognition result obtained by the callback method differs from that obtained by the polling method. The differences lie in both the style and fields of the JSON-formatted string. If you set the enable_callback parameter to true and specify the callback_url parameter, the following response shows the recognition result obtained by the callback method:

    {
        "result": [{
            "begin_time": 340,
            "channel_id": 0,
            "emotion_value": 5.0,
            "end_time": 2365,
            "silence_duration": 0,
            "speech_rate": 177,
            "text": "Weather in Beijing"
        }],
        "task_id": "3f5d4c0c399511e98dc025f34473d12f",
        "status_code": 21050000,
        "status_text": "SUCCESS",
        "request_time": 1551164878830,
        "solve_time": 1551164879230,
        "biz_duration": 2956
    }
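
A callback receiver that must handle both versions can convert a 2.0 result into the 4.0 layout before further processing. The following sketch is based only on the two sample results in this topic. The function names are hypothetical, and fields that do not appear in the samples may need additional handling.

```python
def snake_to_upper_camel(name):
    """Convert a snake_case key such as begin_time to BeginTime."""
    return "".join(part.capitalize() for part in name.split("_"))


def normalize_v2_callback(payload):
    """Rewrite a 2.0 callback result so that it matches the 4.0 layout."""
    # In 2.0 the sentences are a top-level "result" list; in 4.0 they are
    # nested under Result.Sentences.
    sentences = [
        {snake_to_upper_camel(key): value for key, value in sentence.items()}
        for sentence in payload.get("result", [])
    ]
    normalized = {
        snake_to_upper_camel(key): value
        for key, value in payload.items()
        if key != "result"
    }
    normalized["Result"] = {"Sentences": sentences}
    return normalized
```

Applied to the 2.0 sample above, the function returns a dict with the keys TaskId, StatusCode, StatusText, RequestTime, SolveTime, BizDuration, and Result, in which Result.Sentences contains one sentence with the Text value "Weather in Beijing".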