
Overview

Last Updated: Sep 18, 2020

The short sentence recognition service provides the Natural User Interaction (NUI) SDK for mobile clients to recognize, in real time, speech that lasts no more than 60 seconds. The SDK applies to scenarios such as chat conversations, voice command control, in-app voice search, and speech input.

Description

Compared with common SDKs, the NUI SDK has a smaller footprint and more complete status management. It provides comprehensive speech processing capabilities, can also serve as an atomic SDK to meet diverse user requirements, and uses a unified API.

Features

  • Supports pulse-code modulation (PCM) encoded 16-bit mono audio files.

  • Supports audio sampling rates of 8,000 Hz and 16,000 Hz.

  • Supports recognition of speech that lasts no more than 60 seconds.

  • Allows you to specify whether to return intermediate results, whether to add punctuation marks during post-processing, and whether to convert Chinese numerals to Arabic numerals.

  • Allows you to select linguistic models to recognize speeches in different languages when you manage projects in the Intelligent Speech Interaction console. For more information, see Manage projects.

Endpoints

External access from the Internet

  • URL: wss://nls-gateway.cn-shanghai.aliyuncs.com/ws/v1
  • Description: This endpoint allows you to access the short sentence recognition service from any host over the Internet. By default, the Internet access URL is built into the SDK.

Internal access from an Elastic Compute Service (ECS) instance located in the China (Shanghai) region

  • URL: ws://nls-gateway.cn-shanghai-internal.aliyuncs.com:80/ws/v1
  • Description: This endpoint allows you to access the short sentence recognition service over an internal network from an ECS instance located in the China (Shanghai) region. You cannot access an AnyTunnel virtual IP address (VIP) from a classic network-connected ECS instance, so you cannot use such an instance to access the short sentence recognition service over an internal network. To access the service through an AnyTunnel VIP, you must create a virtual private cloud (VPC) and access the service from the VPC.

Note

  • Access from an ECS instance over the internal network does not consume Internet access traffic.
  • For more information about the network types of ECS instances, see Network types.

Interaction process

The following figure shows the interaction process between the SDK and an Android or iOS client.

(Figure: Interaction process of short sentence recognition)

Note

The server adds the task_id parameter to the response header for all responses to indicate the ID of the recognition task. You can record the value of this parameter. If an error occurs, you can submit a ticket to report the task ID and error message.

1. Authenticate the client and initialize the SDK

To establish a WebSocket connection with the server, the client must use a token for authentication. For more information about how to obtain the token, see Obtain a token.

The following table describes the parameters used for authentication and initialization.

Parameter | Type | Required | Description
workspace | String | Yes | The working directory from which the SDK reads the configuration file.
app_key | String | Yes | The appkey of the project that you created in the Intelligent Speech Interaction console.
token | String | Yes | The token that serves as the credential for using Intelligent Speech Interaction. Make sure that the token is valid. You can set the token when you initialize the SDK and update it when you set the request parameters.
device_id | String | Yes | The unique identifier of the device, such as the media access control (MAC) address, serial number, or pseudo unique ID of the device.
debug_path | String | No | The directory where audio files generated during debugging are stored. If the save_log parameter is set to true when you initialize the SDK, intermediate results are stored in this directory.
save_wav | String | No | Specifies whether to store audio files generated during debugging in the directory specified by the debug_path parameter. This parameter is valid only if the save_log parameter is set to true when you initialize the SDK. Make sure that the directory is writable.
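
For reference, the following sketch shows one way to assemble these initialization parameters into a JSON string on Android by using the org.json package. The placeholder paths and values are assumptions, and the SDK call that consumes the resulting string (for example, an initialize method on the NUI client object) may differ in your SDK version.

    import org.json.JSONException;
    import org.json.JSONObject;

    public class NuiInitParams {
        // Minimal sketch: assemble the initialization parameters listed above as JSON.
        // The directory paths are placeholders; verify the exact parameter envelope
        // against the SDK reference for your version.
        public static String build(String appKey, String token, String deviceId) throws JSONException {
            JSONObject params = new JSONObject();
            params.put("workspace", "/sdcard/nui");        // directory that holds the SDK configuration file
            params.put("app_key", appKey);                 // appkey of your Intelligent Speech Interaction project
            params.put("token", token);                    // valid token; see Obtain a token
            params.put("device_id", deviceId);             // MAC address, serial number, or pseudo unique ID
            params.put("debug_path", "/sdcard/nui/debug"); // optional: debug audio directory, used when save_log is true
            params.put("save_wav", "true");                // optional: store debug audio files in debug_path
            return params.toString();
        }
    }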

2. Send a request to use the short sentence recognition service

Before the client sends a service request, you must set the request parameters. You can set them in the JSON format by using the setParams method in the SDK. The parameter configuration applies to all service requests. The following table describes the request parameters, and a configuration sketch follows the parameter tables.

Parameter | Type | Required | Description
appkey | String | No | The appkey of the project that you created in the Intelligent Speech Interaction console. This parameter is generally set when you initialize the SDK.
token | String | No | The token that serves as the credential for using Intelligent Speech Interaction. You can update the token as required by setting this parameter.
service_type | Int | Yes | The type of speech service to request. Set this parameter to 0, which indicates the short sentence recognition service.
direct_ip | String | No | The IP address resolved from the Domain Name System (DNS) domain name. The client performs the resolution and uses the obtained IP address to access the service.
nls_config | JsonObject | No | The service parameters.

The following table describes the parameters that you can set in nls_config.

Parameter | Type | Required | Description
sr_format | String | No | The audio encoding format. The short sentence recognition service supports the Opus and PCM formats. Default value: OPUS. Note: This parameter must be set to PCM if the sample_rate parameter is set to 8000.
sample_rate | Integer | No | The audio sampling rate. Unit: Hz. Default value: 16000. After you set this parameter, you must specify a model or scene that is applicable to the audio sampling rate for your project in the Intelligent Speech Interaction console.
enable_intermediate_result | Boolean | No | Specifies whether to return intermediate results. Default value: false.
enable_punctuation_prediction | Boolean | No | Specifies whether to add punctuation marks during post-processing. Default value: false.
enable_inverse_text_normalization | Boolean | No | Specifies whether to enable inverse text normalization (ITN) during post-processing. Valid values: true and false. Default value: false. If you set this parameter to true, Chinese numerals are converted to Arabic numerals. Note: ITN is not applied to words.
customization_id | String | No | The ID of the custom speech training model.
vocabulary_id | String | No | The vocabulary ID of custom extensive hotwords.
enable_voice_detection | Boolean | No | Specifies whether to enable voice detection. Default value: false.
max_start_silence | Integer | No | The maximum duration of start silence. Unit: milliseconds. If the actual duration of start silence exceeds this value, the server sends a TaskFailed event to end the recognition task. This parameter takes effect only when the enable_voice_detection parameter is set to true.
max_end_silence | Integer | No | The maximum duration of end silence. Unit: milliseconds. Valid values: 200 to 2000. If the actual duration of end silence exceeds this value, the server sends a RecognitionCompleted message to complete the recognition task, and the subsequent speech is no longer processed. This parameter takes effect only when the enable_voice_detection parameter is set to true.
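
The following sketch shows a possible setParams payload built from the parameters in the preceding two tables. The option values are examples only, and how the JSON string is passed to the setParams method depends on your SDK version.

    import org.json.JSONException;
    import org.json.JSONObject;

    public class NuiRequestParams {
        // Minimal sketch: build the JSON string that is passed to the setParams method.
        // service_type 0 selects the short sentence recognition service; the nls_config
        // values are examples of the options described above.
        public static String build() throws JSONException {
            JSONObject nlsConfig = new JSONObject();
            nlsConfig.put("sr_format", "opus");                       // Opus or PCM; PCM is required for 8,000 Hz audio
            nlsConfig.put("sample_rate", 16000);                      // 8000 or 16000
            nlsConfig.put("enable_intermediate_result", true);        // return intermediate results
            nlsConfig.put("enable_punctuation_prediction", true);     // add punctuation marks during post-processing
            nlsConfig.put("enable_inverse_text_normalization", true); // convert Chinese numerals to Arabic numerals

            JSONObject params = new JSONObject();
            params.put("service_type", 0);       // 0 indicates the short sentence recognition service
            params.put("nls_config", nlsConfig);
            // Optionally refresh the token here instead of re-initializing the SDK:
            // params.put("token", "yourNewToken");
            return params.toString();
        }
    }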

3. Send audio data from the client

The client cyclically sends audio data to the server and continuously receives recognition results from the server.

  • If the enable_intermediate_result parameter is set to true, the SDK reports multiple EVENT_ASR_PARTIAL_RESULT events by calling the onNuiEventCallback method to return the intermediate results of a sentence (see the parsing sketch after this list). For example, the server returns the following response:

    {
        "header": {
            "namespace": "SpeechRecognizer",
            "name": "RecognitionResultChanged",
            "status": 20000000,
            "message_id": "e06d2b5d50ca40d5a50d4215c7c8****",
            "task_id": "4c3502c7a5ce4ac3bdc488749ce4****",
            "status_text": "Gateway:SUCCESS:Success."
        },
        "payload": {
            "result": "Weather in Beijing"
        }
    }

    The following table describes the parameters in the header object.

    Parameter | Type | Description
    namespace | String | The namespace of the message.
    name | String | The name of the message. The RecognitionResultChanged message indicates that an intermediate result is obtained.
    status | Integer | The HTTP status code, which indicates whether the request is successful. For more information, see the "Error codes" section of this topic.
    message_id | String | The ID of the message, which is automatically generated by the SDK.
    task_id | String | The GUID of the task. Record the value of this parameter to facilitate troubleshooting.
    status_text | String | The status message.

    The following table describes the parameters in the payload object.

    Parameter | Type | Description
    result | String | The intermediate result of the recognition task.

    Note

    The latest intermediate result may be different from the final result. Use the result returned in the EVENT_ASR_RESULT event as the final result.

  • If the enable_intermediate_result parameter is set to false, the server does not return any messages in this step.
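
As a sketch of how an intermediate result might be consumed, the following code parses the RecognitionResultChanged message shown above and extracts payload.result. How the raw JSON string is delivered to your code through the onNuiEventCallback method depends on your SDK version, so the method shape here is an assumption.

    import org.json.JSONException;
    import org.json.JSONObject;

    public class IntermediateResultHandler {
        // Minimal sketch: extract the intermediate result from a RecognitionResultChanged
        // message that arrives with an EVENT_ASR_PARTIAL_RESULT event. The field names
        // follow the response format documented above.
        public static String onPartialResult(String rawJson) throws JSONException {
            JSONObject message = new JSONObject(rawJson);
            String name = message.getJSONObject("header").getString("name");
            if (!"RecognitionResultChanged".equals(name)) {
                return null; // not an intermediate result message
            }
            // Intermediate text only; the final result is returned with the EVENT_ASR_RESULT event.
            return message.getJSONObject("payload").getString("result");
        }
    }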

4. Complete the recognition task

The client sends a request to the server to stop short sentence recognition, and the server returns the final recognition result. For example, the server returns the following response:

{
    "header": {
        "namespace": "SpeechRecognizer",
        "name": "RecognitionCompleted",
        "status": 20000000,
        "message_id": "10490c992aef44eaa4246614838f****",
        "task_id": "4c3502c7a5ce4ac3bdc488749ce4****",
        "status_text": "Gateway:SUCCESS:Success."
    },
    "payload": {
        "result": "Weather in Beijing. "
    }
}

The following table describes the parameters in the header object.

Parameter | Type | Description
namespace | String | The namespace of the message.
name | String | The name of the message. The RecognitionCompleted message indicates that the recognition task is completed.
status | Integer | The HTTP status code, which indicates whether the request is successful. For more information, see the "Error codes" section of this topic.
message_id | String | The ID of the message, which is automatically generated by the SDK.
task_id | String | The GUID of the task. Record the value of this parameter to facilitate troubleshooting.
status_text | String | The status message.

The following table describes the parameters in the payload object.

Parameter | Type | Description
result | String | The final recognition result.
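
The following sketch processes the RecognitionCompleted message shown above: it checks the status code, records the task ID for troubleshooting, and returns the final result. The field names come from the documented response format; how the message reaches your code (for example, with an EVENT_ASR_RESULT event) depends on your SDK version.

    import org.json.JSONException;
    import org.json.JSONObject;

    public class FinalResultHandler {
        // Minimal sketch: handle the final RecognitionCompleted message.
        public static String onRecognitionCompleted(String rawJson) throws JSONException {
            JSONObject message = new JSONObject(rawJson);
            JSONObject header = message.getJSONObject("header");

            String taskId = header.getString("task_id"); // record this ID; include it if you submit a ticket
            int status = header.getInt("status");        // 20000000 indicates success; see Error codes
            if (status != 20000000) {
                throw new IllegalStateException(
                    "Task " + taskId + " failed: " + header.getString("status_text"));
            }
            return message.getJSONObject("payload").getString("result"); // final recognition result
        }
    }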

Error codes

For more information about the error codes that the short sentence recognition service may return, see Error codes.