Intelligent Speech Interaction: NUI SDK for Android

Last Updated: Oct 12, 2023

The short sentence recognition service provides a Natural User Interaction (NUI) SDK for Android. This topic describes how to download the NUI SDK for Android, lists the key methods in the SDK, and provides sample code for you to use the SDK.

Prerequisites

  • You understand how the SDK works. For more information, see Overview.

  • The appkey of your project is obtained. For more information, see Create a project.

  • A token used to access the service is obtained. For more information, see Obtain an access token.

Download and install the SDK

  1. Download the NUI SDK for Android and sample code.

  2. Decompress the downloaded package to obtain the demo project. The SDK is an AAR package in the app/libs directory.

  3. Open the demo project in Android Studio.

    The sample code for the short sentence recognition service is stored in the SpeechRecognizerActivity.java file.

Key methods

  • initialize: initializes the SDK.

    /**
     * Initialize the SDK. The SDK uses a singleton pattern. To initialize the SDK again, you must first release the SDK. Do not call the SDK on the user interface (UI) thread. Otherwise, the process may be blocked.
     * @param callback: the event listener callback. For more information, see the following callback methods.
     * @param parameters: the parameters used in the initialization. For more information, see Overview.
     * @param level: the log level to use. The smaller the parameter value is, the more logs are recorded.
     * @param save_log: specifies whether to store logs in files. The debug_path parameter specifies the directory where log files are stored.
     * @return: the returned error code. For more information, see Error codes.
     */
    public synchronized int initialize(final INativeNuiCallback callback,
                                       String parameters,
                                       final Constants.LogLevel level,
                                       final boolean save_log)

    INativeNuiCallback supports the following callback methods.

    • onNuiAudioStateChanged: determines whether to enable recording based on the value of AudioState.

      /**
           * When the start, stop, or cancel method is called, the SDK uses this callback method to instruct the client to enable or disable recording.
           * @param state: specifies whether to enable recording.
           */
          void onNuiAudioStateChanged(AudioState state);
    • onNuiNeedAudioData: provides audio data.

      /**
           * After a recognition task starts, the SDK calls this method continuously to read audio data from the client.
           * @param buffer: the buffer that the client fills with audio data.
           * @param len: the required number of bytes of the audio data to be read from the client.
           * @return: the actual number of bytes of the audio data that is read from the client.
           */
          int onNuiNeedAudioData(byte[] buffer, int len);
    • onNuiEventCallback: notifies the client of events that occur in the SDK.

      /**
           * Notify the client of an event that occurred in the SDK.
           * @param event: the event that occurred. You can view possible events in the following table.
           * @param resultCode: the returned error code. This parameter is valid only for the EVENT_ASR_ERROR event.
           * @param arg2: reserved.
           * @param kwsResult: the wake-up word recognition result.
           * @param asrResult: the recognition result of the audio stream.
           */
          void onNuiEventCallback(NuiEvent event, final int resultCode, final int arg2, KwsResult kwsResult, AsrResult asrResult);

      The following table lists the possible events in the SDK.

      Name                        Description
      EVENT_VAD_START             The beginning of speech is detected.
      EVENT_VAD_END               The end of speech is detected.
      EVENT_ASR_PARTIAL_RESULT    An intermediate recognition result is generated.
      EVENT_ASR_RESULT            The final recognition result is generated.
      EVENT_ASR_ERROR             An error occurred. Determine the cause based on the returned error code.
      EVENT_MIC_ERROR             A recording error occurred.

  • setParams: sets SDK parameters in the JSON format.

    /**
         * Set parameters in the JSON format.
         * @param params: the request parameters. For more information, see Overview.
         * @return: the returned error code. For more information, see Error codes.
         */
        public synchronized int setParams(String params)
  • startDialog: starts the recognition task.

    /**
         * Start the recognition task.
         * @param vad_mode: the voice activity detection (VAD) mode of the task. Use the push-to-talk (P2T) mode for a recognition task.
         * @param dialog_params: the request parameters for this recognition task.
         * @return: the returned error code. For more information, see Error codes.
         */
        public synchronized int startDialog(VadMode vad_mode, String dialog_params)
  • stopDialog: completes the recognition task.

    /**
         * When this method is called, the server returns the final recognition result to the client and completes the recognition task.
         * @return: the returned error code. For more information, see Error codes.
         */
        public synchronized int stopDialog()
  • release: releases the SDK.

    /**
         * Release the SDK.
         * @return: the returned error code. For more information, see Error codes.
         */
        public synchronized int release()
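
The initialize comment above warns against calling the SDK on the UI thread. One way to follow that advice is to route every SDK call through a single background thread. The following is a minimal sketch that uses only java.util.concurrent. SdkWorker, submit, and runAndWait are hypothetical names for illustration and are not part of the NUI SDK.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical helper (not part of the NUI SDK): runs SDK calls on one background thread. */
final class SdkWorker {
    // A single daemon thread keeps SDK calls serialized and off the UI thread.
    private final ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "nui-worker");
        t.setDaemon(true);
        return t;
    });

    /** Queues a call that returns an SDK error code and returns immediately. */
    Future<Integer> submit(Callable<Integer> sdkCall) {
        return executor.submit(sdkCall);
    }

    /** Blocks until the call finishes. Do not use this from the UI thread. */
    int runAndWait(Callable<Integer> sdkCall) {
        try {
            return submit(sdkCall).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for SDK call", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("SDK call failed", e.getCause());
        }
    }

    /** Stops accepting new calls; queued calls still run. */
    void shutdown() {
        executor.shutdown();
    }
}
```

In an app, a call such as NativeNui.GetInstance().initialize(...) would be wrapped in a lambda and passed to submit, and the returned error code would be checked when the Future completes.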

Procedure

  1. Initialize the SDK and the recorder instance.

  2. Set request parameters based on your business requirements.

  3. Call the startDialog method to start the recognition task.

  4. Implement the onNuiAudioStateChanged callback to start or stop recording based on the value of AudioState.

  5. Implement the onNuiNeedAudioData callback to supply audio data to the SDK.

  6. Obtain intermediate recognition results from the EVENT_ASR_PARTIAL_RESULT event and the final recognition result from the EVENT_ASR_RESULT event.

  7. Call the stopDialog method to complete the recognition task.

  8. Call the release method to release the SDK.
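
The call order in the preceding steps can be tracked on the client side with a small state guard. The following is a minimal sketch, not part of the NUI SDK: RecognitionLifecycle and its method names are hypothetical, and the rules that a task cannot start while another is running and that release is allowed during a task are assumptions made for illustration.

```java
/** Hypothetical client-side guard (not part of the NUI SDK) that tracks the call order above. */
final class RecognitionLifecycle {
    enum State { RELEASED, INITIALIZED, RECOGNIZING }

    private State state = State.RELEASED;

    synchronized State state() {
        return state;
    }

    /** Step 1: allowed only while the SDK is released, because the SDK is a singleton. */
    synchronized void onInitialize() {
        require(state == State.RELEASED, "release the SDK before initializing it again");
        state = State.INITIALIZED;
    }

    /** Step 3: assumes a task starts only after initialization and when no task is running. */
    synchronized void onStartDialog() {
        require(state == State.INITIALIZED, "initialize the SDK before calling startDialog");
        state = State.RECOGNIZING;
    }

    /** Step 7: completes the running task. */
    synchronized void onStopDialog() {
        require(state == State.RECOGNIZING, "no recognition task is running");
        state = State.INITIALIZED;
    }

    /** Step 8: assumes release is allowed in any state except RELEASED. */
    synchronized void onRelease() {
        require(state != State.RELEASED, "the SDK is already released");
        state = State.RELEASED;
    }

    private static void require(boolean ok, String message) {
        if (!ok) {
            throw new IllegalStateException(message);
        }
    }
}
```

Each guard method would be called immediately before the matching SDK call, so that an out-of-order call fails with a clear message instead of an SDK error code.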

ProGuard configuration

If you obfuscate your code, add the following rule to the proguard-rules.pro file:

-keep class com.alibaba.idst.nui.* {*;}

Sample code

Initialize the NUI SDK

CommonUtils.copyAssetsData(this);
int ret = NativeNui.GetInstance().initialize(this, genInitParams(path,path2), Constants.LogLevel.LOG_LEVEL_VERBOSE, true);

The genInitParams method generates a JSON string that contains the information about the resource directory and user. The user information contains the following parameters:

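// Requires org.json.JSONObject and org.json.JSONException.
private String genInitParams(String workpath, String debugpath) {
    String str = "";
    try {
        JSONObject object = new JSONObject();
        object.put("app_key", "");   // Specify the appkey of your project.
        object.put("token", "");     // Specify your access token.
        object.put("device_id", Utils.getDeviceId());
        object.put("url", "wss://nls-gateway-ap-southeast-1.aliyuncs.com:443/ws/v1");
        object.put("workspace", workpath);     // Resource directory.
        object.put("debug_path", debugpath);   // Directory for log files when save_log is true.
        str = object.toString();
    } catch (JSONException e) {
        e.printStackTrace();
    }
    return str;
}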
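
The sample leaves app_key and token empty, so it can help to check these values before they are passed to initialize. The following is a minimal sketch of such a check that uses a plain Map instead of JSONObject so that it runs without Android. InitParamCheck and missingKeys are hypothetical names, not part of the NUI SDK, and the list of checked keys is an assumption based on the prerequisites of this topic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical pre-flight check (not part of the NUI SDK). */
final class InitParamCheck {
    // Assumption: only the values named in the prerequisites are checked. Extend as needed.
    static final String[] CHECKED_KEYS = {"app_key", "token"};

    /** Returns the checked keys whose values are missing or blank, in declaration order. */
    static List<String> missingKeys(Map<String, String> params) {
        List<String> missing = new ArrayList<>();
        for (String key : CHECKED_KEYS) {
            String value = params.get(key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }
}
```

If missingKeys returns a non-empty list, the app can report the missing values and skip initialization.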

Set the request parameters

Set the request parameters in the format of a JSON string, as shown in the following code:

private String genParams() {
  String params = "";
  try {
    JSONObject nls_config = new JSONObject();
    nls_config.put("enable_intermediate_result", true);
    nls_config.put("enable_voice_detection", true);
    JSONObject parameters = new JSONObject();
    parameters.put("nls_config", nls_config);
    // Select the short sentence recognition service.
    parameters.put("service_type", Constants.kServiceTypeASR);
    params = parameters.toString();
  } catch (JSONException e) {
    e.printStackTrace();
  }
  return params;
}
NativeNui.GetInstance().setParams(genParams());

Start the recognition task

Call the startDialog method to start the recognition task.

NativeNui.GetInstance().startDialog(Constants.VadMode.TYPE_P2T, genDialogParams());

Handle callbacks

  • Implement the onNuiAudioStateChanged method. The SDK calls it with an AudioState value, and the client starts, stops, or releases the recorder accordingly.

    public void onNuiAudioStateChanged(Constants.AudioState state) {
            Log.i(TAG, "onNuiAudioStateChanged");
            if (state == Constants.AudioState.STATE_OPEN) {
                Log.i(TAG, "audio recorder start");
                mAudioRecorder.startRecording();
            } else if (state == Constants.AudioState.STATE_CLOSE) {
                Log.i(TAG, "audio recorder close");
                mAudioRecorder.release();
            } else if (state == Constants.AudioState.STATE_PAUSE) {
                Log.i(TAG, "audio recorder pause");
                mAudioRecorder.stop();
            }
        }
  • Implement the onNuiNeedAudioData method to supply audio data to the SDK.

    public int onNuiNeedAudioData(byte[] buffer, int len) {
            int ret = 0;
            if (mAudioRecorder.getState() != AudioRecord.STATE_INITIALIZED) {
                Log.e(TAG, "audio recorder not init");
                return -1;
            }
            ret = mAudioRecorder.read(buffer, 0, len);
            return ret;
        }
  • Implement the onNuiEventCallback method to handle the events that the SDK reports. Do not call SDK methods inside callbacks. Otherwise, a deadlock may occur.

    public void onNuiEventCallback(Constants.NuiEvent event, final int resultCode, final int arg2, KwsResult kwsResult, AsrResult asrResult) {
            Log.i(TAG, "event=" + event);
            if (event == Constants.NuiEvent.EVENT_ASR_RESULT) {
                showText(asrView, asrResult.asrResult);
            } else if (event == Constants.NuiEvent.EVENT_ASR_PARTIAL_RESULT) {
                showText(asrView, asrResult.asrResult);
            } else if (event == Constants.NuiEvent.EVENT_ASR_ERROR) {
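                // resultCode carries the error code for this event.
                Log.e(TAG, "asr error, resultCode=" + resultCode);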
            }
        }
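
The onNuiNeedAudioData contract (fill the buffer with up to len bytes and return the number of bytes actually written) can also be met by a source other than AudioRecord, for example a recorded PCM stream used for testing. The following is a minimal sketch under that assumption. StreamAudioSource is a hypothetical class, not part of the NUI SDK, and returning -1 on a read error only mirrors the convention of the sample above.

```java
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical audio source (not part of the NUI SDK) that mirrors the onNuiNeedAudioData contract. */
final class StreamAudioSource {
    private final InputStream pcm;

    StreamAudioSource(InputStream pcm) {
        this.pcm = pcm;
    }

    /**
     * Copies up to len bytes into buffer and returns the number of bytes copied.
     * Returns 0 once the stream is exhausted and -1 if reading fails.
     */
    int read(byte[] buffer, int len) {
        int want = Math.min(len, buffer.length);
        int total = 0;
        try {
            // Keep reading until the request is satisfied or the stream ends.
            while (total < want) {
                int n = pcm.read(buffer, total, want - total);
                if (n < 0) {
                    break; // End of stream.
                }
                total += n;
            }
        } catch (IOException e) {
            return -1;
        }
        return total;
    }
}
```

Inside onNuiNeedAudioData, the body would then reduce to returning source.read(buffer, len), where source is a StreamAudioSource field. How the SDK treats a return value of 0 is not stated in this topic, so verify that behavior before you rely on it.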

Complete the recognition task

NativeNui.GetInstance().stopDialog();