
OpenSearch: Speech recognition

Last Updated: Aug 05, 2025

The AI Search Open Platform lets you call speech recognition services through APIs to convert the speech in video or audio files into structured text. Typical scenarios include meeting minutes, video retrieval, and online customer service.

Service list

| Service name | Service ID (service_id) | Service description | API call QPS limit (including Alibaba Cloud account and RAM users) |
| --- | --- | --- | --- |
| Speech recognition service | ops-audio-asr-001 | Extracts audio information to generate subtitle files. | 5 |

Note

To apply for a higher QPS limit, submit a ticket.

Prerequisites

  • You have obtained the authentication information.

    When you call an AI Search Open Platform service through an API, the caller's identity must be authenticated.

  • You have obtained the service access address.

    You can call a service over the Internet or through a virtual private cloud (VPC). For more information, see Get service registration address.

Create an asynchronous speech recognition task

Request method: POST

URL

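{host}/v3/openapi/workspaces/{workspace_name}/audio-asr/{service_id}/async
  • host: The address for calling the service. You can call API services over the Internet or through a VPC. For more information, see Obtain service registration address.

  • workspace_name: The name of the workspace, such as default.

  • service_id: The built-in service ID in the system, such as ops-audio-asr-001.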

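Request parameters

Header parameters (API-KEY authentication)

| Parameter | Type | Required | Description | Example value |
| --- | --- | --- | --- | --- |
| Content-Type | String | Yes | The request content type. Set it to application/json. | application/json |
| Authorization | String | Yes | The API key, passed in the format Bearer <API-KEY>. | Bearer OS-d1**2a |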

Body parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| input | Object(input) | Yes | The multimedia file to be processed. |
| parameters | Object | No | The parameters of the service. |
| output | Object(output) | Yes | The output settings. |

input

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| content | String | No | Base64-encoded data of the audio or video file, passed as a data URI (see the note below). Supported audio formats: mp3, wav, aac, flac, ogg, m4a, alac, and wma. Supported video formats: mp4, avi, mkv, mov, flv, and webm. |
| oss | String | No | The OSS path of the input file, for example, oss://<BUCKET_NAME>/xxx/xxx.mp3. |
| file_name | String | No | The name of the video or audio file. If this parameter is not set, the name is parsed from the file name in the input. |

Note

The input.content and input.oss parameters are mutually exclusive. Specify only one of them.

To pass Base64 data, set the content parameter to a string in the format data:<TYPE>/<FORMAT>;base64,<BASE64_DATA>, where:

  • <TYPE>/<FORMAT>: the media type and format of the file.

    • For audio (such as MP3), use audio/mp3.

    • For video (such as MOV), use video/mov.

  • <BASE64_DATA>: the Base64-encoded data of the audio or video file.

Examples:

  • Audio: data:audio/mp3;base64,AAAAIGZ0eXBtcDQyAAABAGlzbWZj...

  • Video: data:video/mov;base64,AAAAIGZ0eXBtcDQyAAABAGlzbWZj...
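The data URI format above can be produced with a few lines of Python. This is a minimal sketch using only the standard library; the helper names to_data_uri and file_to_data_uri are hypothetical, and it assumes that <FORMAT> is simply the lowercase file extension, following the audio/mp3 and video/mov examples.

```python
import base64
from pathlib import Path

# Format lists copied from the supported-format lists in the input table.
# Assumption: <FORMAT> is the lowercase extension, as in audio/mp3 and video/mov.
AUDIO_FORMATS = {"mp3", "wav", "aac", "flac", "ogg", "m4a", "alac", "wma"}
VIDEO_FORMATS = {"mp4", "avi", "mkv", "mov", "flv", "webm"}


def to_data_uri(data: bytes, fmt: str) -> str:
    """Encode raw media bytes as data:<TYPE>/<FORMAT>;base64,<BASE64_DATA>."""
    fmt = fmt.lower().lstrip(".")
    if fmt in AUDIO_FORMATS:
        media_type = "audio"
    elif fmt in VIDEO_FORMATS:
        media_type = "video"
    else:
        raise ValueError(f"unsupported format: {fmt}")
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{media_type}/{fmt};base64,{encoded}"


def file_to_data_uri(path: str) -> str:
    """Read a local file and take <FORMAT> from its extension."""
    p = Path(path)
    return to_data_uri(p.read_bytes(), p.suffix)
```

For example, {"content": file_to_data_uri("meeting.mp3"), "file_name": "meeting.mp3"} could serve as the input object in place of an OSS path. Base64 inflates the payload by about a third, so an OSS path may be preferable for large files.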

output

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| type | String | No | The output type. Valid values: text: returns the speech recognition result as text; supported only for synchronous calls. oss (default): writes the recognition result to OSS. |
| oss | String | No | The OSS path of the output file, for example, oss://<BUCKET_NAME>/result. Required when type is oss. |
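The constraints in the input and output tables (content and oss are mutually exclusive, output.oss is required when type is oss, and type text works only for synchronous calls) can be checked on the client before a request is sent. A minimal sketch; build_asr_body is a hypothetical helper, and it assumes that exactly one of content or oss must be supplied.

```python
from typing import Optional


def build_asr_body(
    *,
    content: Optional[str] = None,
    oss_input: Optional[str] = None,
    file_name: Optional[str] = None,
    output_type: str = "oss",
    oss_output: Optional[str] = None,
    synchronous: bool = False,
) -> dict:
    """Assemble the request body and enforce the documented constraints."""
    # input.content and input.oss are mutually exclusive (assumed: one is required).
    if (content is None) == (oss_input is None):
        raise ValueError("pass exactly one of content or oss_input")
    if output_type not in ("oss", "text"):
        raise ValueError("output.type must be 'oss' or 'text'")
    # type=text is only supported for synchronous calls.
    if output_type == "text" and not synchronous:
        raise ValueError("output.type 'text' requires a synchronous call")
    # output.oss must be set when type is oss.
    if output_type == "oss" and not oss_output:
        raise ValueError("output.oss is required when output.type is 'oss'")

    body_input = {"content": content} if content is not None else {"oss": oss_input}
    if file_name:
        body_input["file_name"] = file_name
    body_output = {"type": output_type}
    if oss_output:
        body_output["oss"] = oss_output
    return {"input": body_input, "output": body_output}
```

The returned dict can be serialized with json.dumps and sent as the request body.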

Response parameters

| Parameter | Type | Description | Example value |
| --- | --- | --- | --- |
| result.task_id | String | The unique ID of the speech recognition task. Use it to query the task status. | asr-xxxx-abc-123 |

Curl request example

curl -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <Your API-KEY>" \
  "http://***-hangzhou.opensearch.aliyuncs.com/v3/openapi/workspaces/default/audio-asr/ops-audio-asr-001/async" \
  --data '{
    "input": {
      "oss": "oss://<BUCKET_NAME>/xxx/xxx.mp3",
      "file_name": "xxx"
    },
    "output": {
      "type": "oss",
      "oss": "oss://<BUCKET_NAME>/result"
    }
  }'
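The same request can be sent from Python with the standard library's urllib.request. A minimal sketch, assuming host is the service address you obtained earlier and body has the shape shown in the curl example; build_create_task_request and send are hypothetical names. Passing mode="sync" targets the synchronous endpoint described later in this topic.

```python
import json
import urllib.request


def build_create_task_request(
    host: str,
    api_key: str,
    body: dict,
    workspace: str = "default",
    service_id: str = "ops-audio-asr-001",
    mode: str = "async",
) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates an ASR task."""
    if mode not in ("async", "sync"):
        raise ValueError("mode must be 'async' or 'sync'")
    url = (
        f"{host.rstrip('/')}/v3/openapi/workspaces/{workspace}"
        f"/audio-asr/{service_id}/{mode}"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and parse the JSON response body."""
    # urlopen raises urllib.error.HTTPError for 4xx and 5xx responses.
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

A typical call would be send(build_create_task_request("http://<host>", "<Your API-KEY>", body))["result"]["task_id"], which yields the ID to pass to the task-status endpoint.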

Sample response

{
  "request_id": "3eb8de02091b59431601f3bff******",
  "latency": 37,
  "usage": {},
  "result": {
    "task_id": "asr-20250610164552-1108418170738252-******",
    "status": "PENDING"
  }
}

Get asynchronous speech recognition task status

Request method: GET

URL

{host}/v3/openapi/workspaces/{workspace_name}/audio-asr/{service_id}/async/task-status?task_id={task_id}
  • host: The address for calling the service. You can call API services over the Internet or through a VPC. For more information, see Obtain service registration address.

  • workspace_name: The name of the workspace, such as default.

  • service_id: The built-in service ID in the system, such as ops-audio-asr-001.

  • task_id: The task ID (result.task_id) returned when you create an asynchronous speech recognition task.
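Rather than concatenating the task ID into the query string by hand, it can be URL-encoded. A minimal sketch; task_status_url is a hypothetical helper whose defaults mirror the example values default and ops-audio-asr-001.

```python
from urllib.parse import urlencode


def task_status_url(
    host: str,
    task_id: str,
    workspace: str = "default",
    service_id: str = "ops-audio-asr-001",
) -> str:
    """Build the GET URL that queries an asynchronous task's status."""
    base = (
        f"{host.rstrip('/')}/v3/openapi/workspaces/{workspace}"
        f"/audio-asr/{service_id}/async/task-status"
    )
    # urlencode percent-escapes reserved characters in the task ID.
    return f"{base}?{urlencode({'task_id': task_id})}"
```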

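Request parameters

| Parameter | Type | Required | Description | Example |
| --- | --- | --- | --- | --- |
| Content-Type | String | Yes | The request content type. Set it to application/json. | application/json |
| Authorization | String | Yes | The API key, passed in the format Bearer <API-KEY>. | Bearer OS-d1**2a |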

Response parameters

| Parameter | Type | Description | Example |
| --- | --- | --- | --- |
| request_id | String | The request ID. | 3C09570D-12DB-46B4-BF0F-A100D79B**** |
| latency | Float/Int | The request latency, in milliseconds. | 3.0 |
| result.task_id | String | The asynchronous task ID. | a7e4c0f6-874c-47e3-b05b-02278a96e**** |
| result.status | String | The task status. Valid values: PENDING (waiting to be processed), SUCCESS (completed successfully), FAIL (failed and terminated). | PENDING |
| result.error | String | The error message when status is FAIL. Empty otherwise. | |
| result.data | List(AsrResult) | The speech recognition results. Empty unless status is SUCCESS. | |
| usage.duration | Float | The duration of the audio file. | |
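Because the task runs asynchronously, a client typically polls this endpoint until result.status is no longer PENDING. A minimal polling sketch; wait_for_task is hypothetical, fetch_status stands for any function that sends the GET request shown below and returns the parsed JSON, and the 2-second interval and 10-minute timeout are arbitrary placeholders, not documented values.

```python
import time
from typing import Callable


def wait_for_task(
    fetch_status: Callable[[], dict],
    interval: float = 2.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until result.status is SUCCESS or FAIL; return the result object."""
    deadline = time.monotonic() + timeout
    while True:
        response = fetch_status()
        result = response.get("result") or {}
        status = result.get("status")
        if status == "SUCCESS":
            # result.data holds the AsrResult list only in this state.
            return result
        if status == "FAIL":
            raise RuntimeError(f"ASR task failed: {result.get('error')}")
        # Any other value (documented: PENDING) is treated as still running.
        if time.monotonic() + interval > deadline:
            raise TimeoutError(f"task not finished after {timeout} s (status={status})")
        sleep(interval)
```

The sleep parameter is injectable so the loop can be exercised without waiting; in production the default time.sleep is used.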

AsrResult

| Parameter | Type | Description |
| --- | --- | --- |
| text | String | The text recognized from the speech. |
| start | Float | The start time of the text segment in the audio or video, in seconds. |
| end | Float | The end time of the text segment in the audio or video, in seconds. |

Curl request example

curl -X GET \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <Your API-KEY>" \
  "http://***-hangzhou.opensearch.aliyuncs.com/v3/openapi/workspaces/default/audio-asr/ops-audio-asr-001/async/task-status?task_id=asr-20250618112151-1108418170738252-******"

Sample response

{
  "request_id": "1a1a4ca4b7a91dd630a40c54af******",
  "latency": 9,
  "usage": {
    "duration": 9
  },
  "result": {
    "task_id": "asr-20250618112151-1108418170738252-******",
    "status": "SUCCESS",
    "data": [
      {
        "text": "Rong Jielvdou began to speak, his voice as warm as the spring sun,",
        "start": 0.0,
        "end": 3.9
      },
      {
        "text": "full of life and warming the hearts of everyone who listened.",
        "start": 4.24,
        "end": 9.06
      }
    ]
  }
}

Create a synchronous speech recognition task

Request method: POST

URL

{host}/v3/openapi/workspaces/{workspace_name}/audio-asr/{service_id}/sync
  • host: The address for calling the service. You can call API services through the Internet or VPC. For more information, see Obtain service registration address.

  • workspace_name: The name of the workspace, such as default.

  • service_id: The built-in service ID in the system, such as ops-audio-asr-001.

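Request parameters

Header parameters (API-KEY authentication)

| Parameter | Type | Required | Description | Example value |
| --- | --- | --- | --- | --- |
| Content-Type | String | Yes | The request content type. Set it to application/json. | application/json |
| Authorization | String | Yes | The API key, passed in the format Bearer <API-KEY>. | Bearer OS-d1**2a |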

Body parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| input | Object(input) | Yes | The multimedia file to be processed. |
| parameters | Object | No | The parameters of the service. |
| output | Object(output) | Yes | The output settings. |

input

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| content | String | No | Base64-encoded data of the audio or video file, passed as a data URI (see the note below). Supported audio formats: mp3, wav, aac, flac, ogg, m4a, alac, and wma. Supported video formats: mp4, avi, mkv, mov, flv, and webm. |
| oss | String | No | The OSS path of the input file, for example, oss://<BUCKET_NAME>/xxx/xxx.mp3. |
| file_name | String | No | The name of the video or audio file. If this parameter is not set, the name is parsed from the file name in the input. |

Note

The input.content and input.oss parameters are mutually exclusive. Specify only one of them.

To pass Base64 data, set the content parameter to a string in the format data:<TYPE>/<FORMAT>;base64,<BASE64_DATA>, where:

  • <TYPE>/<FORMAT>: the media type and format of the file.

    • For audio (such as MP3), use audio/mp3.

    • For video (such as MOV), use video/mov.

  • <BASE64_DATA>: the Base64-encoded data of the audio or video file.

Examples:

  • Audio: data:audio/mp3;base64,AAAAIGZ0eXBtcDQyAAABAGlzbWZj...

  • Video: data:video/mov;base64,AAAAIGZ0eXBtcDQyAAABAGlzbWZj...

output

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| type | String | No | The output type. Valid values: text: returns the speech recognition result as text; supported only for synchronous calls. oss (default): writes the recognition result to OSS. |
| oss | String | No | The OSS path of the output file, for example, oss://<BUCKET_NAME>/result. Required when type is oss. |

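Response parameters

| Parameter | Type | Description | Example value |
| --- | --- | --- | --- |
| result.task_id | String | The unique ID of the speech recognition task. | asr-xxxx-abc-123 |
| result.status | String | The task status, for example, SUCCESS. | SUCCESS |
| result.data | List(AsrResult) | The speech recognition results. For the fields, see AsrResult. | |
| usage.duration | Float | The duration of the audio file. | 9 |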

Curl request example

curl -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <Your API-KEY>" \
  "http://***-hangzhou.opensearch.aliyuncs.com/v3/openapi/workspaces/default/audio-asr/ops-audio-asr-001/sync" \
  --data '{
    "input": {
      "oss": "oss://<BUCKET_NAME>/xxx/xxx.mp3",
      "file_name": "xxx"
    },
    "output": {
      "type": "oss",
      "oss": "oss://<BUCKET_NAME>/result"
    }
  }'

Sample response

{
  "request_id": "df96b5c444281e0e79561fe9f8******",
  "latency": 570,
  "usage": {
    "duration": 9
  },
  "result": {
    "task_id": "asr-20250618132401-1108418170738252-******",
    "status": "SUCCESS",
    "data": [
      {
        "text": "Rong Jielvdou began to speak, his voice as warm as the spring sun,",
        "start": 0.0,
        "end": 3.9
      },
      {
        "text": "full of life and warming the hearts of everyone who listened.",
        "start": 4.24,
        "end": 9.06
      }
    ]
  }
}

Status code description

If a request fails, the response indicates the cause of the error in the code and message fields.

{
  "request_id": "6F33AFB6-A35C-4DA7-AFD2-9EA16CCF****",
  "latency": 2.0,
  "code": "InvalidParameter",
  "http_code": 400,
  "message": "JSON parse error: Cannot deserialize value of type `ImageStorage` from String \"xxx\""
}

| HTTP status code | Error code | Description |
| --- | --- | --- |
| 200 | - | The request succeeded. This includes cases in which the task itself failed, so check result.status to determine the actual task status. |
| 400 | InvalidParameter | The request is invalid, for example, because the request body cannot be parsed. |
| 404 | BadRequest.TaskNotExist | The specified task does not exist. |
| 500 | InternalServerError | An internal error occurred. |
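Because a 200 response can still describe a failed task, a client has to check both the HTTP status and result.status. A minimal sketch of that two-level check; check_response, AsrApiError, and AsrTaskFailed are hypothetical names, and the function assumes the response body has already been parsed from JSON.

```python
class AsrApiError(Exception):
    """Non-200 response; keeps the code, message, and request_id fields."""

    def __init__(self, http_status: int, code, message, request_id):
        super().__init__(f"HTTP {http_status} {code}: {message} (request_id={request_id})")
        self.http_status = http_status
        self.code = code
        self.message = message
        self.request_id = request_id


class AsrTaskFailed(Exception):
    """HTTP 200, but result.status is FAIL."""


def check_response(http_status: int, body: dict) -> dict:
    """Return body["result"] or raise, following the status code table."""
    if http_status != 200:
        raise AsrApiError(
            http_status, body.get("code"), body.get("message"), body.get("request_id")
        )
    result = body.get("result") or {}
    # A successful request can still report a failed task.
    if result.get("status") == "FAIL":
        raise AsrTaskFailed(result.get("error") or "task failed")
    return result
```

With urllib.request, a non-2xx status raises urllib.error.HTTPError; its code attribute and the JSON parsed from its read() output can be passed to check_response.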

For more information about status codes, see Status code description.