The AI Search Open Platform lets you call speech recognition services through APIs. The service converts speech in video or audio files into structured text, which suits scenarios such as meeting records, video retrieval, and online customer service.
Service list
Service name | Service ID (service_id) | Service description | API call QPS limit (including Alibaba Cloud account and RAM users) |
Speech recognition service | ops-audio-asr-001 | Extracts audio information to generate subtitle files. | 5. Note: To apply for a higher QPS limit, submit a ticket. |
Prerequisites
Authentication information is obtained. When you call an AI Search Open Platform service through an API, the caller's identity must be authenticated.
The service access address is obtained. You can call a service over the Internet or through a virtual private cloud (VPC). For more information, see Get service registration address.
Create an asynchronous speech recognition task
Request method: POST
URL
{host}/v3/openapi/workspaces/{workspace_name}/audio-asr/{service_id}/async
host: The address for calling the service. You can call API services over the Internet or through a VPC. For more information, see Obtain service registration address.
workspace_name: The name of the workspace, such as default.
service_id: The built-in service ID in the system, such as ops-audio-asr-001.
Request parameters
Header parameters
API-KEY authentication
Parameter | Type | Required | Description | Example value |
Content-Type | String | Yes | Request type: application/json | application/json |
Authorization | String | Yes | API-Key | Bearer OS-d1**2a |
Body parameters
Parameter | Type | Required | Description |
input | Object(input) | Yes | Specifies the multimedia file to be processed. |
parameters | Object | No | Specifies the parameters for the service. |
output | Object(output) | Yes | Controls the output. |
input
Parameter | Type | Required | Description |
content | String | No | Base64-encoded data of the video or audio content. Supported audio formats: mp3, wav, aac, flac, ogg, m4a, alac, and wma. Supported video formats: mp4, avi, mkv, mov, flv, and webm. Note: The input.content and input.oss parameters are mutually exclusive; specify only one of them. To use Base64 data, pass the encoded data in input.content. |
oss | String | No | The OSS path of the input file, for example, oss://<BUCKET_NAME>/xxx/xxx.mp3. |
file_name | String | No | The name of the video/audio file. If not set, it will be parsed from the file name in the content. |
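The mutual exclusion between input.content and input.oss can be sketched as a small helper. This is a minimal illustration, not an official client: the function name build_input and its arguments are hypothetical, and it assumes that input.content takes plain Base64 text with no extra prefix, which this page does not confirm.

```python
import base64


def build_input(content_bytes=None, oss_path=None, file_name=None):
    """Build the `input` object for a speech recognition request.

    Hypothetical helper: exactly one of content_bytes / oss_path must be given,
    mirroring the note that input.content and input.oss are mutually exclusive.
    """
    if (content_bytes is None) == (oss_path is None):
        raise ValueError("Provide exactly one of content_bytes or oss_path")

    body = {}
    if content_bytes is not None:
        # Assumption: plain Base64 text, without any prefix, is accepted.
        body["content"] = base64.b64encode(content_bytes).decode("ascii")
    else:
        body["oss"] = oss_path

    if file_name is not None:
        body["file_name"] = file_name
    return body
```

For example, build_input(oss_path="oss://<BUCKET_NAME>/xxx/xxx.mp3", file_name="xxx") yields the same input object as the curl example in this section.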
output
Parameter | Type | Required | Description |
type | String | No | The output type. text: returns the speech recognition result as text; supported only for synchronous calls. oss (default): stores the output file in OSS. |
oss | String | No | The OSS path of the output file. Required when type is oss. Example: oss://<BUCKET_NAME>/result |
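The two rules in the output table (text is limited to synchronous calls, and an OSS path is required when type is oss) can be checked on the client before sending a request. The sketch below is illustrative only; build_output and its synchronous flag are hypothetical names, and the server may apply further validation that is not described here.

```python
def build_output(output_type="oss", oss_path=None, synchronous=False):
    """Build the `output` object and enforce the rules from the table above.

    Hypothetical helper; the server-side validation may differ.
    """
    if output_type not in ("text", "oss"):
        raise ValueError("output type must be 'text' or 'oss'")

    if output_type == "text":
        # The table states that text output is supported only for synchronous calls.
        if not synchronous:
            raise ValueError("type 'text' is supported only for synchronous calls")
        return {"type": "text"}

    # type == "oss": the OSS path of the output file must be provided.
    if not oss_path:
        raise ValueError("output.oss is required when type is 'oss'")
    return {"type": "oss", "oss": oss_path}
```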
Response parameters
Parameter | Type | Description | Example value |
result.task_id | String | The unique ID of the speech recognition task. | asr-xxxx-abc-123 |
Curl request example
curl -X POST \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <Your API-KEY>" \
"http://***-hangzhou.opensearch.aliyuncs.com/v3/openapi/workspaces/default/audio-asr/ops-audio-asr-001/async" \
--data '{
  "input": {
    "oss": "oss://<BUCKET_NAME>/xxx/xxx.mp3",
    "file_name": "xxx"
  },
  "output": {
    "type": "oss",
    "oss": "oss://<BUCKET_NAME>/result"
  }
}'
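The same request can be prepared in Python with only the standard library. This is a sketch under assumptions: build_async_request is a hypothetical helper, the host value is whatever service address you obtained, and the block only builds the request object so that it can be inspected before anything is sent.

```python
import json
import urllib.request


def build_async_request(host, api_key, body,
                        workspace_name="default",
                        service_id="ops-audio-asr-001"):
    """Prepare (but do not send) the POST request that creates an async task."""
    url = (f"{host}/v3/openapi/workspaces/{workspace_name}"
           f"/audio-asr/{service_id}/async")
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


# Same body as the curl example above.
example_body = {
    "input": {"oss": "oss://<BUCKET_NAME>/xxx/xxx.mp3", "file_name": "xxx"},
    "output": {"type": "oss", "oss": "oss://<BUCKET_NAME>/result"},
}
```

To send it, pass the request to urllib.request.urlopen and read result.task_id from the decoded JSON response.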
Sample response
{
  "request_id": "3eb8de02091b59431601f3bff******",
  "latency": 37,
  "usage": {},
  "result": {
    "task_id": "asr-20250610164552-1108418170738252-******",
    "status": "PENDING"
  }
}
Get asynchronous speech recognition task status
Request method: GET
URL
{host}/v3/openapi/workspaces/{workspace_name}/audio-asr/{service_id}/async/task-status?task_id={task_id}
host: The address for calling the service. You can call API services over the Internet or through a VPC. For more information, see Obtain service registration address.
workspace_name: The name of the workspace, such as default.
service_id: The built-in service ID in the system, such as ops-audio-asr-001.
task_id: The task ID returned when you create an asynchronous speech recognition task.
Request parameters
Parameter | Type | Required | Description | Example |
Content-Type | String | Yes | Request type: application/json | application/json |
Authorization | String | Yes | API-Key | Bearer OS-d1**2a |
Response parameters
Parameter | Type | Description | Example |
request_id | String | The request ID. | 3C09570D-12DB-46B4-BF0F-A100D79B**** |
latency | Float/Int | Request latency in ms. | 3.0 |
result.task_id | String | The asynchronous task ID. | a7e4c0f6-874c-47e3-b05b-02278a96e**** |
result.status | String | The task status. Values that appear in this document include PENDING, SUCCESS, and FAIL. | PENDING |
result.error | String | Error message content when status=FAIL. Empty under normal conditions. | |
result.data | List(AsrResult) | The speech recognition result. This field is empty unless the asynchronous task has completed successfully (status is SUCCESS). | |
usage.duration | Float | The duration of the audio file. | |
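Because the task runs asynchronously, a client typically queries this endpoint repeatedly until result.status reaches a final value. The sketch below assumes that SUCCESS and FAIL are the only final statuses and treats every other value as still in progress; the helper names, the polling interval, and the attempt limit are all hypothetical choices, and fetch_status stands for any callable that returns the decoded JSON response.

```python
import time
from urllib.parse import quote


def build_status_url(host, task_id,
                     workspace_name="default",
                     service_id="ops-audio-asr-001"):
    """Build the task-status URL for a given task_id."""
    return (f"{host}/v3/openapi/workspaces/{workspace_name}"
            f"/audio-asr/{service_id}/async/task-status"
            f"?task_id={quote(task_id, safe='')}")


def wait_for_task(fetch_status, interval_seconds=2.0, max_attempts=30,
                  sleep=time.sleep):
    """Poll until the task reports SUCCESS or FAIL, then return that response.

    Assumption: SUCCESS and FAIL are final; anything else means "keep waiting".
    """
    for attempt in range(max_attempts):
        response = fetch_status()
        if response["result"]["status"] in ("SUCCESS", "FAIL"):
            return response
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    raise TimeoutError("task did not finish within the allowed attempts")
```

Keeping the interval at a second or more also keeps polling well below the QPS limit listed in the service table.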
AsrResult
Parameter | Type | Description |
text | String | Text data obtained from speech recognition. |
start | Float | The start timestamp of the current text in the audio or video, in seconds. |
end | Float | The end timestamp of the current text in the audio or video, in seconds. |
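Since each AsrResult entry carries text plus start and end times in seconds, the list maps naturally onto a subtitle format. The sketch below renders entries as SubRip (SRT) text. SRT is chosen here purely for illustration; this page does not state which format the service itself writes, and the function names are hypothetical.

```python
def to_srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Render a list of AsrResult-shaped dicts (text, start, end) as SRT text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        time_range = (f"{to_srt_timestamp(seg['start'])} --> "
                      f"{to_srt_timestamp(seg['end'])}")
        blocks.append(f"{index}\n{time_range}\n{seg['text']}\n")
    # A blank line separates consecutive subtitle blocks.
    return "\n".join(blocks)
```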
Curl request example
curl -X GET \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <Your API-KEY>" \
"http://***-hangzhou.opensearch.aliyuncs.com/v3/openapi/workspaces/default/audio-asr/ops-audio-asr-001/async/task-status?task_id=asr-20250618112151-1108418170738252-******"
Sample response
{
  "request_id": "1a1a4ca4b7a91dd630a40c54af******",
  "latency": 9,
  "usage": {
    "duration": 9
  },
  "result": {
    "task_id": "asr-20250618112151-1108418170738252-******",
    "status": "SUCCESS",
    "data": [
      {
        "text": "Rong Jielvdou began to speak, his voice as warm as the spring sun,",
        "start": 0.0,
        "end": 3.9
      },
      {
        "text": "full of life and warming the hearts of everyone who listened.",
        "start": 4.24,
        "end": 9.06
      }
    ]
  }
}
Create a synchronous speech recognition task
Request method: POST
URL
{host}/v3/openapi/workspaces/{workspace_name}/audio-asr/{service_id}/sync
host: The address for calling the service. You can call API services over the Internet or through a VPC. For more information, see Obtain service registration address.
workspace_name: The name of the workspace, such as default.
service_id: The built-in service ID in the system, such as ops-audio-asr-001.
Request parameters
Header parameters
API-KEY authentication
Parameter | Type | Required | Description | Example value |
Content-Type | String | Yes | Request type: application/json | application/json |
Authorization | String | Yes | API-Key | Bearer OS-d1**2a |
Body parameters
Parameter | Type | Required | Description |
input | Object(input) | Yes | Specifies the multimedia file to be processed. |
parameters | Object | No | Specifies the parameters for the service. |
output | Object(output) | Yes | Controls the output. |
input
Parameter | Type | Required | Description |
content | String | No | Base64-encoded data of the video or audio content. Supported audio formats: mp3, wav, aac, flac, ogg, m4a, alac, and wma. Supported video formats: mp4, avi, mkv, mov, flv, and webm. Note: The input.content and input.oss parameters are mutually exclusive; specify only one of them. To use Base64 data, pass the encoded data in input.content. |
oss | String | No | The OSS path of the input file, for example, oss://<BUCKET_NAME>/xxx/xxx.mp3. |
file_name | String | No | The name of the video/audio file. If not set, it will be parsed from the file name in the content. |
output
Parameter | Type | Required | Description |
type | String | No | The output type. text: returns the speech recognition result as text; supported only for synchronous calls. oss (default): stores the output file in OSS. |
oss | String | No | The OSS path of the output file. Required when type is oss. Example: oss://<BUCKET_NAME>/result |
Response parameters
Parameter | Type | Description | Example value |
result.task_id | String | The unique ID of the speech recognition task. | asr-xxxx-abc-123 |
Curl request example
curl -X POST \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <Your API-KEY>" \
"http://***-hangzhou.opensearch.aliyuncs.com/v3/openapi/workspaces/default/audio-asr/ops-audio-asr-001/sync" \
--data '{
  "input": {
    "oss": "oss://<BUCKET_NAME>/xxx/xxx.mp3",
    "file_name": "xxx"
  },
  "output": {
    "type": "oss",
    "oss": "oss://<BUCKET_NAME>/result"
  }
}'
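A synchronous call returns the recognition result in the same response, so a client can read the transcript directly from result.data. The sketch below builds the sync URL and joins the recognized text from a response shaped like the sample response in this section. Both function names are hypothetical, and joining segments with a single space is an illustrative choice.

```python
def build_sync_url(host, workspace_name="default",
                   service_id="ops-audio-asr-001"):
    """Build the URL of the synchronous speech recognition endpoint."""
    return (f"{host}/v3/openapi/workspaces/{workspace_name}"
            f"/audio-asr/{service_id}/sync")


def transcript_from_response(response):
    """Join the text of every segment in result.data into one string.

    Raises RuntimeError when the task did not finish with status SUCCESS.
    """
    result = response.get("result", {})
    if result.get("status") != "SUCCESS":
        raise RuntimeError(f"task not successful: {result.get('status')}")
    return " ".join(segment["text"] for segment in result.get("data") or [])
```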
Sample response
{
  "request_id": "df96b5c444281e0e79561fe9f8******",
  "latency": 570,
  "usage": {
    "duration": 9
  },
  "result": {
    "task_id": "asr-20250618132401-1108418170738252-******",
    "status": "SUCCESS",
    "data": [
      {
        "text": "Rong Jielvdou began to speak, his voice as warm as the spring sun,",
        "start": 0.0,
        "end": 3.9
      },
      {
        "text": "full of life and warming the hearts of everyone who listened.",
        "start": 4.24,
        "end": 9.06
      }
    ]
  }
}
Status code description
If a request fails, the response indicates the cause of the error in the code and message fields.
{
  "request_id": "6F33AFB6-A35C-4DA7-AFD2-9EA16CCF****",
  "latency": 2.0,
  "code": "InvalidParameter",
  "http_code": 400,
  "message": "JSON parse error: Cannot deserialize value of type `ImageStorage` from String \"xxx\""
}
HTTP status code | Error code | Description |
200 | - | The request succeeded. This includes cases where the task itself failed, so check result.status for the actual task status. |
404 | BadRequest.TaskNotExist | The error message returned because the task does not exist. |
400 | InvalidParameter | Invalid Request. |
500 | InternalServerError | Internal error. |
For more information about status codes, see Status code description.
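Because HTTP 200 can still carry a failed task, a client needs two checks: first the HTTP status, then result.status. The sketch below separates these cases. It is an assumption-laden illustration: classify_response and its return labels are hypothetical, and it relies only on the fields shown on this page (code, message, result.status, result.error).

```python
def classify_response(http_status, body):
    """Return a (kind, detail) pair describing the outcome of a call.

    kind is one of "request_error", "task_failed", or "ok".
    """
    if http_status != 200:
        # Request-level errors report their cause through code and message.
        return ("request_error", f"{body.get('code')}: {body.get('message')}")

    result = body.get("result", {})
    if result.get("status") == "FAIL":
        # The request succeeded but the task failed; details are in result.error.
        return ("task_failed", result.get("error"))

    return ("ok", result.get("status"))
```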