Speech recognition

The AI Search Open Platform lets you call speech recognition services through APIs. The service converts speech in video or audio files into structured text, which can be used in scenarios such as meeting records, video retrieval, and online customer service.

Service list

Service name	Service ID (service_id)	Service description	API call QPS limit (including Alibaba Cloud account and RAM users)
Speech recognition service	ops-audio-asr-001	Extracts audio information to generate subtitle files.	5. Note: To apply for a higher QPS limit, submit a ticket.

Prerequisites

The authentication information is obtained. When you call an AI Search Open Platform service through an API, the caller's identity must be authenticated.
The service access address is obtained. You can call a service over the Internet or a virtual private cloud (VPC). For more information, see Obtain service registration address.

Create an asynchronous speech recognition task

Request method: POST

URL

POST {host}/v3/openapi/workspaces/{workspace_name}/audio-asr/{service_id}/async

host: The address for calling the service. You can call API services through the Internet or VPC. For more information, see Obtain service registration address.

workspace_name: The name of the workspace, such as default.
service_id: The built-in service ID in the system, such as ops-audio-asr-001.

Request parameters

Header parameters

API-KEY authentication

Parameter	Type	Required	Description	Example value
Content-Type	String	Yes	Request type: application/json	application/json
Authorization	String	Yes	API-Key	Bearer OS-d1**2a

Body parameters

Parameter	Type	Required	Description
input	Object(input)	Yes	Specifies the multimedia file to be processed.
parameters	Object	No	Specifies the parameters for the service.
output	Object(output)	Yes	Controls the output.

input

Parameter	Type	Required	Description
content	String	No	Base64-encoded data of the video or audio content. Supported audio formats: mp3, wav, aac, flac, ogg, m4a, alac, and wma. Supported video formats: mp4, avi, mkv, mov, flv, and webm. Note: The input.content and input.oss parameters are mutually exclusive; specify only one of them. Pass the Base64-encoded data in the format `data:<TYPE>/<FORMAT>;base64,<BASE64_DATA>`, where `<TYPE>/<FORMAT>` is the media type (for example, audio/mp3 for MP3 audio or video/mov for MOV video) and `<BASE64_DATA>` is the Base64-encoded audio or video data. Examples: Audio: data:audio/mp3;base64,AAAAIGZ0eXBtcDQyAAABAGlzbWZj... Video: data:video/mov;base64,AAAAIGZ0eXBtcDQyAAABAGlzbWZj...
oss	String	No	The OSS path of the input file, for example, oss://<BUCKET_NAME>/xxx/xxx.mp3.
file_name	String	No	The name of the video/audio file. If not set, it will be parsed from the file name in the content.
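
The data-URI format described for input.content can be sketched in Python as follows. This is a minimal illustration, not an official client: the helper name to_data_uri is hypothetical, and the media type string must be supplied by the caller in the `<TYPE>/<FORMAT>` form shown above.

```python
import base64


def to_data_uri(raw: bytes, media_type: str) -> str:
    """Wrap raw media bytes in the data:<TYPE>/<FORMAT>;base64,<BASE64_DATA> form.

    to_data_uri is a hypothetical helper name used only for illustration.
    """
    encoded = base64.b64encode(raw).decode("ascii")
    return f"data:{media_type};base64,{encoded}"


if __name__ == "__main__":
    # In practice the bytes would be read from a local audio or video file.
    print(to_data_uri(b"abc", "audio/mp3"))  # data:audio/mp3;base64,YWJj
```

The resulting string would be passed as input.content in place of input.oss.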

output

Parameter	Type	Required	Description
type	String		The output type. Valid values: text: returns the speech recognition results as text; supported only in synchronous calls. oss: stores the output file in OSS (default).
oss	String		The OSS path of the output file. Required when type is oss. Example: oss://<BUCKET_NAME>/result

Response parameters

Parameter	Type	Description	Example value
result.task_id	String	The unique ID of the speech recognition task.	asr-xxxx-abc-123

Curl request example

curl -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <Your API-KEY>" \
  "http://***-hangzhou.opensearch.aliyuncs.com/v3/openapi/workspaces/default/audio-asr/ops-audio-asr-001/async" \
  --data '{
    "input": {
      "oss": "oss://<BUCKET_NAME>/xxx/xxx.mp3",
      "file_name": "xxx"
    },
    "output": {
      "type": "oss",
      "oss": "oss://<BUCKET_NAME>/result"
    }
  }'

Sample response

{
  "request_id": "3eb8de02091b59431601f3bff******",
  "latency": 37,
  "usage": {},
  "result": {
    "task_id": "asr-20250610164552-1108418170738252-******",
    "status": "PENDING"
  }
}
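
The same request can be assembled in Python with only the standard library. The sketch below is an illustration under assumptions, not an official SDK: the function names build_async_request and extract_task_id are hypothetical, the host value must include the scheme, and the request is only built here, not sent.

```python
import json
import urllib.request


def build_async_request(host, api_key, input_oss, output_oss,
                        workspace="default", service_id="ops-audio-asr-001"):
    """Build (but do not send) the POST request that creates an async task."""
    url = f"{host}/v3/openapi/workspaces/{workspace}/audio-asr/{service_id}/async"
    body = {
        "input": {"oss": input_oss},
        "output": {"type": "oss", "oss": output_oss},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def extract_task_id(response_body: str) -> str:
    """Read result.task_id from the JSON body of the create-task response."""
    return json.loads(response_body)["result"]["task_id"]


# Sending the request (requires a real host and API key):
#   with urllib.request.urlopen(build_async_request(...)) as resp:
#       task_id = extract_task_id(resp.read().decode("utf-8"))
```

Keeping request construction separate from sending makes the payload easy to inspect before any network call is made.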

Get asynchronous speech recognition task status

Request method: GET

URL

{host}/v3/openapi/workspaces/{workspace_name}/audio-asr/{service_id}/async/task-status?task_id={task_id}

host: The address for calling the service. Call API services through the Internet or VPC. For more information, see Obtain service registration address.
workspace_name: The name of the workspace, such as default.
service_id: The built-in service ID in the system, such as ops-audio-asr-001.
task_id: The task ID returned when the asynchronous speech recognition task was created.

Request parameters

Parameter	Type	Required	Description	Example
Content-Type	String	Yes	Request type: application/json	application/json
Authorization	String	Yes	API-Key	Bearer OS-d1**2a

Response parameters

Parameter	Type	Description	Example
request_id	String	The request ID.	3C09570D-12DB-46B4-BF0F-A100D79B****
latency	Float/Int	Request latency in ms.	3.0
result.task_id	String	The asynchronous task ID.	a7e4c0f6-874c-47e3-b05b-02278a96e****
result.status	String	Task status: PENDING: Waiting to be processed. SUCCESS: Task successfully completed. FAIL: Task failed and terminated.	PENDING
result.error	String	Error message content when status=FAIL. Empty under normal conditions.
result.data	List(AsrResult)	The result of speech recognition. This field is empty unless the task status is SUCCESS.
usage.duration	Float	The duration of the audio file.

AsrResult

Parameter	Type	Description
text	String	Text data obtained from speech recognition.
start	Float	The start timestamp of the current text in the audio or video, in seconds.
end	Float	The end timestamp of the current text in the audio or video, in seconds.
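
Because each AsrResult entry carries a text plus start and end times in seconds, the list maps naturally onto a subtitle format. The sketch below converts such a list into SubRip (SRT) text on the client side. It is an illustration only: the document does not state which subtitle format the service itself produces, and the helper names are hypothetical.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp used by SRT files."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments) -> str:
    """Turn a list of AsrResult-shaped dicts (text, start, end) into SRT text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        time_range = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        blocks.append(f"{index}\n{time_range}\n{seg['text']}\n")
    # A blank line separates consecutive subtitle blocks.
    return "\n".join(blocks)


if __name__ == "__main__":
    print(to_srt([{"text": "hello", "start": 0.0, "end": 3.9}]))
```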

Curl request example

curl -X GET \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <Your API-KEY>" \
  "http://***-hangzhou.opensearch.aliyuncs.com/v3/openapi/workspaces/default/audio-asr/ops-audio-asr-001/async/task-status?task_id=asr-20250618112151-1108418170738252-******"

Sample response

{
  "request_id": "1a1a4ca4b7a91dd630a40c54af******",
  "latency": 9,
  "usage": {
    "duration": 9
  },
  "result": {
    "task_id": "asr-20250618112151-1108418170738252-******",
    "status": "SUCCESS",
    "data": [
      {
        "text": "Rong Jielvdou began to speak, his voice as warm as the spring sun,",
        "start": 0.0,
        "end": 3.9
      },
      {
        "text": "full of life and warming the hearts of everyone who listened.",
        "start": 4.24,
        "end": 9.06
      }
    ]
  }
}
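
Since an asynchronous task starts in PENDING and ends in SUCCESS or FAIL, a client typically polls the task-status endpoint until a terminal status appears. The following sketch shows one way to do that. The function name wait_for_task, the polling interval, and the timeout are assumptions for illustration; fetch_status stands for any caller-supplied function that performs the GET request and returns the parsed JSON body.

```python
import time


def wait_for_task(fetch_status, interval_s=2.0, timeout_s=300.0, sleep=time.sleep):
    """Poll until result.status is SUCCESS or FAIL, then return result.data.

    fetch_status: zero-argument callable returning the parsed task-status body.
    interval_s and timeout_s are illustrative defaults, not documented limits.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        result = fetch_status()["result"]
        status = result["status"]
        if status == "SUCCESS":
            return result.get("data", [])
        if status == "FAIL":
            raise RuntimeError(result.get("error") or "speech recognition task failed")
        if time.monotonic() >= deadline:
            raise TimeoutError("task did not finish before the timeout")
        sleep(interval_s)
```

Given the 5 QPS limit listed for the service, a polling interval of a second or more keeps a single client well below the limit.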

Create a synchronous speech recognition task

Request method: POST

URL

{host}/v3/openapi/workspaces/{workspace_name}/audio-asr/{service_id}/sync

host: The address for calling the service. You can call API services through the Internet or VPC. For more information, see Obtain service registration address.
workspace_name: The name of the workspace, such as default.
service_id: The built-in service ID in the system, such as ops-audio-asr-001.

Request parameters

Header parameters

API-KEY authentication

Parameter	Type	Required	Description	Example value
Content-Type	String	Yes	Request type: application/json	application/json
Authorization	String	Yes	API-Key	Bearer OS-d1**2a

Body parameters

Parameter	Type	Required	Description
input	Object(input)	Yes	Specifies the multimedia file to be processed.
parameters	Object	No	Specifies the parameters for the service.
output	Object(output)	Yes	Controls the output.

input

Parameter	Type	Required	Description
content	String	No	Base64-encoded data of the video or audio content. Supported audio formats: mp3, wav, aac, flac, ogg, m4a, alac, and wma. Supported video formats: mp4, avi, mkv, mov, flv, and webm. Note: The input.content and input.oss parameters are mutually exclusive; specify only one of them. Pass the Base64-encoded data in the format `data:<TYPE>/<FORMAT>;base64,<BASE64_DATA>`, where `<TYPE>/<FORMAT>` is the media type (for example, audio/mp3 for MP3 audio or video/mov for MOV video) and `<BASE64_DATA>` is the Base64-encoded audio or video data. Examples: Audio: data:audio/mp3;base64,AAAAIGZ0eXBtcDQyAAABAGlzbWZj... Video: data:video/mov;base64,AAAAIGZ0eXBtcDQyAAABAGlzbWZj...
oss	String	No	The OSS path of the input file, for example, oss://<BUCKET_NAME>/xxx/xxx.mp3.
file_name	String	No	The name of the video/audio file. If not set, it will be parsed from the file name in the content.
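
Because input.content and input.oss are mutually exclusive, it can help to validate the input object on the client before sending it. The sketch below is one possible check. The helper name build_input is hypothetical, and rejecting a request in which neither field is set is an assumption, since both fields are listed as optional.

```python
def build_input(content=None, oss=None, file_name=None):
    """Build the input object, enforcing that content and oss are not both set."""
    if content is not None and oss is not None:
        raise ValueError("input.content and input.oss are mutually exclusive")
    if content is None and oss is None:
        # Assumption: the service needs one media source, so fail early.
        raise ValueError("set one of input.content or input.oss")
    payload = {"content": content} if content is not None else {"oss": oss}
    if file_name is not None:
        payload["file_name"] = file_name
    return payload


if __name__ == "__main__":
    print(build_input(oss="oss://<BUCKET_NAME>/xxx/xxx.mp3", file_name="xxx"))
```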

output

Parameter	Type	Required	Description
type	String		The output type. Valid values: text: returns the speech recognition results as text; supported only in synchronous calls. oss: stores the output file in OSS (default).
oss	String		The OSS path of the output file. Required when type is oss. Example: oss://<BUCKET_NAME>/result

Response parameters

Parameter	Type	Description	Example value
result.task_id	String	The unique ID of the speech recognition task.	asr-xxxx-abc-123

Curl request example

curl -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <Your API-KEY>" \
  "http://***-hangzhou.opensearch.aliyuncs.com/v3/openapi/workspaces/default/audio-asr/ops-audio-asr-001/sync" \
  --data '{
    "input": {
      "oss": "oss://<BUCKET_NAME>/xxx/xxx.mp3",
      "file_name": "xxx"
    },
    "output": {
      "type": "oss",
      "oss": "oss://<BUCKET_NAME>/result"
    }
  }'

Sample response

{
  "request_id": "df96b5c444281e0e79561fe9f8******",
  "latency": 570,
  "usage": {
    "duration": 9
  },
  "result": {
    "task_id": "asr-20250618132401-1108418170738252-******",
    "status": "SUCCESS",
    "data": [
      {
        "text": "Rong Jielvdou began to speak, his voice as warm as the spring sun,",
        "start": 0.0,
        "end": 3.9
      },
      {
        "text": "full of life and warming the hearts of everyone who listened.",
        "start": 4.24,
        "end": 9.06
      }
    ]
  }
}
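
A synchronous response like the one above already contains the recognized segments, so a client can assemble a plain transcript directly. The sketch below builds a request body that sets output.type to text and joins the returned segments. The function names are hypothetical, and omitting output.oss when type is text is an assumption based on the parameter description.

```python
import json


def build_sync_body(input_oss: str) -> str:
    """JSON body for a synchronous call that asks for text output."""
    # Assumption: output.oss can be left out when output.type is "text".
    return json.dumps({"input": {"oss": input_oss}, "output": {"type": "text"}})


def join_transcript(response: dict) -> str:
    """Concatenate the text of every segment in result.data, in order."""
    segments = response.get("result", {}).get("data") or []
    return " ".join(seg["text"] for seg in segments)


if __name__ == "__main__":
    sample = {
        "result": {
            "status": "SUCCESS",
            "data": [
                {"text": "first part,", "start": 0.0, "end": 3.9},
                {"text": "second part.", "start": 4.24, "end": 9.06},
            ],
        }
    }
    print(join_transcript(sample))  # first part, second part.
```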

Status code description

If a request fails, the response indicates the cause of the error through the code and message fields.

{
    "request_id": "6F33AFB6-A35C-4DA7-AFD2-9EA16CCF****",
    "latency": 2.0,
    "code": "InvalidParameter",
    "http_code": 400,
    "message": "JSON parse error: Cannot deserialize value of type `ImageStorage` from String \"xxx\""
}

HTTP status code	Error code	Description
200	-	Request successful, including task failure scenarios. The actual task status needs to be determined from result.status.
404	BadRequest.TaskNotExist	The task does not exist.
400	InvalidParameter	Invalid request.
500	InternalServerError	Internal error.

For more information about status codes, see Status code description.
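
Because HTTP 200 also covers failed tasks, client code has to check both the HTTP status code and result.status. The sketch below shows one way to combine the two checks. The exception class AsrApiError, the function check_response, and the "TaskFailed" label are all hypothetical names chosen for illustration; only the code, message, and result fields come from the responses described above.

```python
class AsrApiError(Exception):
    """Raised for a non-200 response or for a task that ended in FAIL."""

    def __init__(self, http_code: int, code: str, message: str):
        super().__init__(f"{http_code} {code}: {message}")
        self.http_code = http_code
        self.code = code
        self.message = message


def check_response(http_code: int, body: dict) -> dict:
    """Return body['result'] for a healthy response, otherwise raise AsrApiError."""
    if http_code != 200:
        raise AsrApiError(http_code, body.get("code", "Unknown"), body.get("message", ""))
    result = body.get("result", {})
    if result.get("status") == "FAIL":
        # "TaskFailed" is a client-side label, not an error code from the service.
        raise AsrApiError(200, "TaskFailed", result.get("error", ""))
    return result
```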