
Intelligent Speech Interaction: Overview

Last Updated: Dec 12, 2023

The short sentence recognition service recognizes speech that lasts up to 60 seconds. It is suitable for scenarios such as chat conversations, voice command control, voice search in applications, and speech input.

Features

  • Supports 16-bit mono audio encoded in pulse-code modulation (PCM) format.

  • Supports audio sampling rates of 8,000 Hz and 16,000 Hz.

  • Recognizes speech that lasts up to 60 seconds.

  • Lets you configure the following options for the service response:

    • Whether to return intermediate results

    • Whether to add punctuation marks during post-processing

    • Whether to convert Chinese numerals to Arabic numerals

  • Lets you select linguistic models for recognizing speech in different languages when you manage projects in the Intelligent Speech Interaction console. For more information, see Manage projects.

    The currently supported language and dialect models include Vietnamese, Thai, Turkish, Russian, Portuguese, Malaysian, Italian, Kazakh, Indonesian, Hindi, German, French, Filipino, Arabic, Japanese, Korean, English, Chinese Mandarin, and Cantonese.
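The audio constraints above can be checked locally before a request is sent. The following Python sketch uses only the standard library `wave` module and assumes the audio is stored in a WAV file whose frames are raw PCM samples. The function names are hypothetical and are not part of the SDK. The sketch also shows how the duration of headerless PCM data follows from its size: 16-bit mono audio uses 2 bytes per sample, so 16,000 Hz audio consumes 32,000 bytes per second.

```python
import io
import wave

# Constraints from the feature list: 16-bit mono PCM, 8 or 16 kHz, up to 60 s.
SUPPORTED_SAMPLE_RATES = {8000, 16000}
MAX_DURATION_SECONDS = 60
BYTES_PER_SAMPLE = 2  # 16-bit samples
CHANNELS = 1          # mono


def pcm_duration_seconds(num_bytes: int, sample_rate: int) -> float:
    """Duration of headerless 16-bit mono PCM data of the given size."""
    return num_bytes / (sample_rate * BYTES_PER_SAMPLE * CHANNELS)


def check_wav(source) -> list:
    """Return a list of problems; an empty list means the audio fits the limits.

    `source` is a file path or a binary file-like object holding a PCM WAV file.
    """
    problems = []
    with wave.open(source, "rb") as wav:
        channels = wav.getnchannels()
        if channels != CHANNELS:
            problems.append(f"expected mono, got {channels} channels")
        width = wav.getsampwidth()
        if width != BYTES_PER_SAMPLE:
            problems.append(f"expected 16-bit samples, got {8 * width}-bit")
        rate = wav.getframerate()
        if rate not in SUPPORTED_SAMPLE_RATES:
            problems.append(f"unsupported sampling rate {rate} Hz")
        duration = wav.getnframes() / rate
        if duration > MAX_DURATION_SECONDS:
            problems.append(f"duration {duration:.1f} s exceeds {MAX_DURATION_SECONDS} s")
    return problems


def make_silent_wav(channels: int, sample_rate: int, seconds: float) -> io.BytesIO:
    """Build an in-memory 16-bit WAV file of silence for trying out check_wav()."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(BYTES_PER_SAMPLE)
        wav.setframerate(sample_rate)
        frames = int(sample_rate * seconds)
        wav.writeframes(b"\x00" * BYTES_PER_SAMPLE * channels * frames)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(check_wav(make_silent_wav(1, 16000, 1.0)))  # prints []
    print(pcm_duration_seconds(32000, 16000))  # prints 1.0
```

If a file passes the check, `wave.open(...).readframes(n)` returns the raw PCM bytes that would actually be streamed; the WAV header itself is not PCM audio.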

Endpoints

  • Access type: External access from the Internet

    Description: This endpoint lets you access the short sentence recognition service from any host over the Internet. The Internet access URL is built into the SDK by default.

    URL: wss://nls-gateway.cn-shanghai.aliyuncs.com/ws/v1

1. Authenticate the client

To establish a WebSocket connection with the server, the client must use a token for authentication. For more information about how to obtain the token, see Obtain a token.

2. Start and confirm recognition

The client sends a request to the server to start short sentence recognition, and the server verifies that the request is valid.

Before the client sends the request, you must set the request parameters. You can set them by using the set method of the SpeechRecognizer object in the SDK. The following request parameters are supported:

  • appkey (String, required): The appkey of your project, which you create in the Intelligent Speech Interaction console.

  • format (String, optional): The audio encoding format. Default value: PCM. The short sentence recognition service supports 16-bit mono audio in PCM format.

  • sample_rate (Integer, optional): The audio sampling rate, in Hz. Default value: 16000. If you set this parameter, you must also specify a model or scene that supports this sampling rate for your project in the Intelligent Speech Interaction console.

  • enable_intermediate_result (Boolean, optional): Specifies whether to return intermediate results. Default value: False.

  • enable_punctuation_prediction (Boolean, optional): Specifies whether to add punctuation marks during post-processing. Default value: False.

  • enable_inverse_text_normalization (Boolean, optional): Specifies whether to enable inverse text normalization (ITN) during post-processing. Valid values: True and False. Default value: False. If you set this parameter to True, Chinese numerals are converted to Arabic numerals.

  • customization_id (String, optional): The ID of the custom linguistic model.

  • vocabulary_id (String, optional): The ID of the custom hotword vocabulary.

  • enable_voice_detection (Boolean, optional): Specifies whether to enable voice detection. Default value: False.

  • max_start_silence (Integer, optional): The maximum duration of silence allowed before speech starts, in milliseconds. Valid values: (0, 60000]. This parameter takes effect only when enable_voice_detection is set to True. If the actual start silence exceeds this value, the server sends a TaskFailed event to end the recognition task.

  • max_end_silence (Integer, optional): The maximum duration of silence allowed after speech ends, in milliseconds. Valid values: 200 to 2000. This parameter takes effect only when enable_voice_detection is set to True. If the actual end silence exceeds this value, the server sends a RecognitionCompleted message to complete the recognition task, and any subsequent speech is not processed.
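The defaults and valid ranges above can be encoded in a small client-side check. The following Python sketch collects the parameters into a plain dictionary. `build_request_params` is a hypothetical helper, not a method of the SDK's SpeechRecognizer object. Rejecting max_start_silence and max_end_silence when voice detection is off is a design choice of this sketch; the table only says that these parameters have no effect in that case.

```python
# Defaults taken from the request parameter list.
DEFAULTS = {
    "format": "PCM",
    "sample_rate": 16000,
    "enable_intermediate_result": False,
    "enable_punctuation_prediction": False,
    "enable_inverse_text_normalization": False,
    "enable_voice_detection": False,
}
# Optional parameters that have no documented default value.
OPTIONAL_NO_DEFAULT = {"customization_id", "vocabulary_id", "max_start_silence", "max_end_silence"}
SUPPORTED_SAMPLE_RATES = {8000, 16000}


def build_request_params(appkey: str, **options) -> dict:
    """Merge caller options with the documented defaults and validate them.

    Hypothetical helper: it mirrors the parameter list but does not call the SDK.
    """
    if not appkey:
        raise ValueError("appkey is required")
    unknown = set(options) - set(DEFAULTS) - OPTIONAL_NO_DEFAULT
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")

    params = {"appkey": appkey, **DEFAULTS, **options}

    if params["sample_rate"] not in SUPPORTED_SAMPLE_RATES:
        raise ValueError(f"sample_rate must be one of {sorted(SUPPORTED_SAMPLE_RATES)}")

    # The silence limits only take effect when voice detection is enabled.
    vad = params["enable_voice_detection"]
    for name in ("max_start_silence", "max_end_silence"):
        if name in params and not vad:
            raise ValueError(f"{name} has no effect unless enable_voice_detection is True")

    start = params.get("max_start_silence")
    if start is not None and not 0 < start <= 60000:
        raise ValueError("max_start_silence must be in (0, 60000] ms")
    end = params.get("max_end_silence")
    if end is not None and not 200 <= end <= 2000:
        raise ValueError("max_end_silence must be between 200 and 2000 ms")
    return params
```

For example, `build_request_params("your-appkey", enable_voice_detection=True, max_end_silence=800)` returns the documented defaults plus the two voice detection settings, while passing `sample_rate=44100` raises a ValueError before any request is made.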

3. Send and recognize audio data

The client sends audio data to the server in consecutive chunks and receives recognition results from the server as they become available.
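A minimal Python sketch of this send loop follows. It assumes raw 16-bit mono PCM input, a chunk length of 100 ms, and optional real-time pacing; none of these choices are prescribed by this topic. The `send` argument is a hypothetical stand-in for whatever call your client uses to transmit one chunk. Pacing matters in the other direction too: the status codes in this topic include time-out errors (40000004 and 41010120) for a client that stops sending data for about 10 seconds.

```python
import time
from typing import Callable, Iterator

BYTES_PER_SAMPLE = 2  # 16-bit mono PCM


def iter_pcm_chunks(pcm: bytes, sample_rate: int = 16000, chunk_ms: int = 100) -> Iterator[bytes]:
    """Yield consecutive slices of raw PCM, each holding chunk_ms of audio.

    The last slice may be shorter if the audio does not divide evenly.
    """
    chunk_size = sample_rate * BYTES_PER_SAMPLE * chunk_ms // 1000
    if chunk_size <= 0:
        raise ValueError("chunk_ms is too small for this sample rate")
    for start in range(0, len(pcm), chunk_size):
        yield pcm[start:start + chunk_size]


def stream_pcm(pcm: bytes, send: Callable[[bytes], None], sample_rate: int = 16000,
               chunk_ms: int = 100, realtime: bool = True) -> int:
    """Pass each chunk to send() and return the number of chunks sent.

    With realtime=True the loop sleeps for one chunk duration after each send,
    which roughly matches the pace of a live microphone.
    """
    count = 0
    for chunk in iter_pcm_chunks(pcm, sample_rate, chunk_ms):
        send(chunk)
        count += 1
        if realtime:
            time.sleep(chunk_ms / 1000)
    return count
```

At 16,000 Hz a 100 ms chunk is 16000 × 2 × 0.1 = 3,200 bytes; at 8,000 Hz it is 1,600 bytes. Pre-recorded audio can be sent with `realtime=False`, as long as the gaps between chunks stay well below the time-out.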

  • If the enable_intermediate_result parameter is set to True, the server sends multiple RecognitionResultChanged messages that return intermediate results. For example, the server may return the following intermediate results in sequence:

    Weather
    Weather in Beijing

    The following RecognitionResultChanged message carries the latest intermediate result:

    {
    	"header": {
    		"namespace": "SpeechRecognizer",
    		"name": "RecognitionResultChanged",
    		"status": 20000000,
    		"message_id": "e06d2b5d50ca40d5a50d4215c7c8****",
    		"task_id": "4c3502c7a5ce4ac3bdc488749ce4****",
    		"status_text": "Gateway:SUCCESS:Success."
    	},
    	"payload": {
    		"result": "Weather in Beijing"
    	}
    }
    

    Parameters in the header object:

      • namespace (String): The namespace of the message.

      • name (String): The name of the message. The RecognitionResultChanged message indicates that an intermediate result is obtained.

      • status (Integer): The status code, which indicates whether the request is successful. For more information, see the "Service status codes" section of this topic.

      • status_text (String): The status message.

      • task_id (String): The globally unique identifier (GUID) of the recognition task. Record this value to facilitate troubleshooting.

      • message_id (String): The ID of the message.

    Parameter in the payload object:

      • result (String): The intermediate result of the recognition task.

    Note: The latest intermediate result may differ from the final result. Use the result in the RecognitionCompleted message as the final result.

  • If the enable_intermediate_result parameter is set to False, the server does not return any messages in this step.
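On the client side, a handler for intermediate results only needs the header name, the header status, and the payload result. The following Python sketch parses one raw JSON message with the standard `json` module. It treats 20000000 as the success code because that is the status value shown in the successful examples in this topic; the function name is hypothetical.

```python
import json
from typing import Optional

SUCCESS = 20000000  # status value shown in the successful examples in this topic


def intermediate_text(raw: str) -> Optional[str]:
    """Return the text of a successful RecognitionResultChanged message.

    Returns None for any other message type or for a non-success status.
    """
    message = json.loads(raw)
    header = message.get("header", {})
    if header.get("name") != "RecognitionResultChanged":
        return None
    if header.get("status") != SUCCESS:
        return None
    return message.get("payload", {}).get("result")


if __name__ == "__main__":
    sample = """{"header": {"namespace": "SpeechRecognizer",
                 "name": "RecognitionResultChanged", "status": 20000000,
                 "task_id": "4c3502c7a5ce4ac3bdc488749ce4****"},
                 "payload": {"result": "Weather in Beijing"}}"""
    # Overwrite the same console line as each new partial result arrives.
    print("\r" + intermediate_text(sample), end="", flush=True)
```

In the example above, each intermediate result ("Weather", then "Weather in Beijing") contains the full text recognized so far, so a display can overwrite the previous text instead of appending to it. The final text should still be taken from the RecognitionCompleted message.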

4. Stop and complete recognition

The client sends a request to the server to stop short sentence recognition, and the server returns the final recognition result. The following example shows a response that carries the final result:

{
	"header": {
		"namespace": "SpeechRecognizer",
		"name": "RecognitionCompleted",
		"status": 20000000,
		"message_id": "10490c992aef44eaa4246614838f****",
		"task_id": "4c3502c7a5ce4ac3bdc488749ce4****",
		"status_text": "Gateway:SUCCESS:Success."
	},
	"payload": {
		"result": "Weather in Beijing."
	}
}

Parameters in the header object:

  • namespace (String): The namespace of the message.

  • name (String): The name of the message. The RecognitionCompleted message indicates that the recognition task is completed.

  • status (Integer): The status code, which indicates whether the request is successful. For more information, see the "Service status codes" section of this topic.

  • status_text (String): The status message.

  • task_id (String): The GUID of the recognition task. Record this value to facilitate troubleshooting.

  • message_id (String): The ID of the message, which is automatically generated by the SDK.

Parameter in the payload object:

  • result (String): The final recognition result.
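Steps 3 and 4 together suggest treating the RecognitionCompleted message as the end of the task. The following Python sketch consumes a sequence of raw JSON messages in arrival order, returns the final result, and raises an error if a TaskFailed event or a non-success status arrives first. The message names come from this topic; the exception class and function name are hypothetical, and the sketch assumes that any other message types carry a success status and can be skipped.

```python
import json
from typing import Iterable

SUCCESS = 20000000  # status value shown in the successful examples in this topic


class RecognitionError(Exception):
    """Raised when the task fails or ends without a final result."""


def final_result(raw_messages: Iterable[str]) -> str:
    """Consume raw JSON messages in arrival order and return the final text.

    The last intermediate result is kept only for the error message; the
    returned text always comes from RecognitionCompleted.
    """
    last_partial = None
    for raw in raw_messages:
        message = json.loads(raw)
        header = message.get("header", {})
        name = header.get("name")
        status = header.get("status")
        if name == "TaskFailed" or status != SUCCESS:
            raise RecognitionError(
                f"task {header.get('task_id', 'unknown')} failed with status "
                f"{status}: {header.get('status_text', '')}"
            )
        if name == "RecognitionResultChanged":
            last_partial = message.get("payload", {}).get("result")
        elif name == "RecognitionCompleted":
            return message.get("payload", {}).get("result", "")
    raise RecognitionError(
        f"messages ended without RecognitionCompleted (last partial: {last_partial!r})"
    )
```

Including task_id in the error message follows the advice above to record it for troubleshooting.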

Service status codes

Each response message contains a status field that indicates the service status code. The status codes fall into three groups: common error codes, gateway error codes, and configuration error codes.

  • Common error codes

    • 40000001: The client failed authentication.
      Solution: Check whether the token used by the client is correct and valid.

    • 40000002: The request is invalid.
      Solution: Check whether the request sent by the client meets the requirements.

    • 403: The token has expired or the request contains invalid parameters.
      Solution: 1. Check whether the token used by the client is valid. 2. Check whether the parameter values are valid.

    • 40000004: The client has been idle for too long and timed out.
      Solution: Check whether the client has stopped sending data to the server for an extended period, for example, 10 seconds.

    • 40000005: The number of requests exceeds the upper limit.
      Solution: Check whether the number of concurrent connections or queries per second (QPS) exceeds the upper limit. If the number of concurrent connections exceeds the upper limit, we recommend that you upgrade Intelligent Speech Interaction from the trial edition to Commercial Edition. If you already use Commercial Edition, we recommend that you purchase more resources for higher concurrency.

    • 41050008: The specified audio sampling rate does not match that of the selected model.
      Solution: Check whether the audio sampling rate specified in the service call matches the sampling rate of the automatic speech recognition (ASR) model bound to the appkey of your project in the console.

    • 41010101: The specified audio sampling rate is not supported.
      Solution: Check whether the audio sampling rate specified in the service call matches the sampling rate of the ASR model bound to the appkey of your project in the console.

    • 41010120: A client time-out error occurred.
      Solution: Check whether the client has stopped sending data to the server for 10 or more consecutive seconds.

    • 40000000: A client error occurred. This is the default client error code.
      Solution: Resolve the error based on the error message, or submit a ticket.

    • 50000000: A server error occurred. This is the default server error code.
      Solution: If this error code is returned occasionally, ignore it. If it is returned repeatedly, submit a ticket.

    • 50000001: An internal call error occurred.
      Solution: If this error code is returned occasionally, ignore it. If it is returned repeatedly, submit a ticket.

    • 52010001: An internal call error occurred.
      Solution: If this error code is returned occasionally, ignore it. If it is returned repeatedly, submit a ticket.

  • Gateway error codes

    • 40010001: The method is not supported.
      Solution: If you use the SDK, submit a ticket.

    • 40010002: The instruction is not supported.
      Solution: If you use the SDK, submit a ticket.

    • 40010003: The instruction format is invalid.
      Solution: If you use the SDK, submit a ticket.

    • 40010004: The client was unexpectedly disconnected.
      Solution: Check whether the client disconnected before the server completed the requested task.

    • 40010005: The task is in an abnormal state.
      Solution: Check whether the instruction is supported in the current task state.

  • Configuration error codes

    • 40020105: The application does not exist.
      Solution: Check the route resolution to verify that the application exists.

    • 40020106: The specified appkey and token do not match.
      Solution: Check whether the appkey of the application is correct and belongs to the same Alibaba Cloud account as the token.

    • 40020503: RAM user authentication failed.
      Solution: Use your Alibaba Cloud account to authorize the RAM user to access the pctowap open platform (POP) API.
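The status codes above suggest a simple client-side policy: retry the server-side codes whose solution says they can be ignored when returned occasionally, and surface every other code with its documented fix. The following Python sketch encodes that policy. The retry threshold (`max_retries`) is an assumption, because this topic only distinguishes errors returned "occasionally" from errors returned "multiple times"; the function name and the condensed solution strings are also illustrative.

```python
# Success value shown in the examples in this topic.
SUCCESS = 20000000

# Codes whose documented solution is "ignore if occasional, submit a ticket if repeated".
TRANSIENT = {50000000, 50000001, 52010001}

# Condensed solutions from the status code lists above.
SOLUTIONS = {
    40000000: "resolve based on the error message or submit a ticket",
    40000001: "check that the token is correct and valid",
    40000002: "check that the request meets the requirements",
    403: "check that the token is valid and the parameter values are valid",
    40000004: "client was idle too long; keep sending data",
    40000005: "reduce concurrency or QPS, or upgrade or buy more resources",
    41050008: "match sample_rate to the ASR model bound to the appkey",
    41010101: "match sample_rate to the ASR model bound to the appkey",
    41010120: "avoid gaps of 10 seconds or more between audio chunks",
    40010001: "if you use the SDK, submit a ticket",
    40010002: "if you use the SDK, submit a ticket",
    40010003: "if you use the SDK, submit a ticket",
    40010004: "keep the connection open until the task completes",
    40010005: "send only instructions valid in the current task state",
    40020105: "check that the application exists",
    40020106: "use an appkey and token from the same Alibaba Cloud account",
    40020503: "authorize the RAM user to access the POP API",
}


def advise(status: int, failures_so_far: int = 0, max_retries: int = 2) -> str:
    """Return a short action for a status code.

    failures_so_far counts earlier failures of the same kind; max_retries is an
    assumed threshold for when an "occasional" server error becomes "repeated".
    """
    if status == SUCCESS:
        return "ok"
    if status in TRANSIENT:
        return "retry" if failures_so_far < max_retries else "submit a ticket"
    if status in SOLUTIONS:
        return SOLUTIONS[status]
    return "unknown status: check the status_text field"
```

Checking TRANSIENT before SOLUTIONS keeps the retry decision separate from the lookup table, so the threshold can be tuned without touching the per-code advice.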