Android SDK 2.0

Last Updated: Jun 02, 2020

Download and installation

  1. Download the Android SDK and sample code.
  2. Decompress the downloaded package and find the demo project in the nls-sdk-android directory. In the app/libs directory, you can find an .aar file, which is the SDK package (see the Gradle sketch after this list for how to reference it in your own project).
  3. Open the demo project in Android Studio to view the sample code. The sample code for real-time speech recognition contains the following two activities:
    • SpeechTranscriberActivity demonstrates how to import an audio file to the SDK for recognition.
    • SpeechTranscriberWithRecorderActivity demonstrates how to use the SDK to record and recognize audio data. We recommend that you use this method.
  4. Before you run the sample code, create a project in the Intelligent Speech Interaction console, obtain the appkey of the project, and obtain a service token. For more information, see the relevant topics in Quick Start.
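
The demo project already references this .aar file. If you integrate the SDK into your own Android Studio project instead, a minimal Gradle sketch follows. The file name nls-sdk-android.aar is an assumed placeholder; use the actual file name that ships in app/libs:

    // app/build.gradle (module level) -- a minimal sketch.
    // The .aar file name below is an assumed placeholder; replace it with
    // the actual file name found in app/libs.
    dependencies {
        implementation files('libs/nls-sdk-android.aar')
    }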

Key objects

  • NlsClient: the speech processing client, which is equivalent to a factory for all speech processing classes. You can globally create an NlsClient instance.
  • SpeechTranscriber: the real-time speech recognition object, representing a speech recognition request. You can record audio data or obtain audio data from an audio file, and send the audio data to the SDK.
  • SpeechTranscriberWithRecorder: the real-time speech recognition object, representing a speech recognition request. This object adds a recording function on top of the SpeechTranscriber object. We recommend this object because it is easy to use (see the sketch after this list).
  • SpeechTranscriberCallback: the object that defines the speech recognition callbacks. These callbacks are fired when recognition results are returned or errors occur. You can add your own logic to these callbacks, as shown in the demo.
  • SpeechTranscriberWithRecorderCallback: the object that defines the speech recognition callbacks for SpeechTranscriberWithRecorder. In addition to the SpeechTranscriberCallback callbacks, it provides callbacks for recorded audio data and recording volume.
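
Because SpeechTranscriberWithRecorder records audio for you, its flow reduces to create, configure, and start. The following is a minimal, unverified sketch: the factory method name createTranscriberWithRecorder is an assumption modeled on createTranscriberRequest(), and MyRecorderCallback stands for your own implementation of SpeechTranscriberWithRecorderCallback. Check the demo project for the exact signatures.

    // A minimal sketch. createTranscriberWithRecorder is an assumed factory
    // method name; MyRecorderCallback is a hypothetical implementation of
    // SpeechTranscriberWithRecorderCallback. Check the demo for exact names.
    NlsClient client = new NlsClient();
    SpeechTranscriberWithRecorder transcriber =
            client.createTranscriberWithRecorder(new MyRecorderCallback());
    transcriber.setToken("your token");
    transcriber.setAppkey("your appkey");
    // The SDK records and sends audio automatically; no sendAudio() call is needed.
    transcriber.start();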

Usage (taking SpeechTranscriber as an example)

  1. Create an NlsClient instance.
  2. Define callbacks of the SpeechTranscriberCallback object to process recognition results and errors as required.
  3. Call the NlsClient.createTranscriberRequest() method to create a SpeechTranscriber instance.
  4. Set SpeechTranscriber parameters, including accessToken and appkey.
  5. Call the SpeechTranscriber.start() method to connect to the server.
  6. Collect audio data and call the SpeechTranscriber.sendAudio() method to send the audio data to the server.
  7. Process the recognition result or error using a callback.
  8. Call the SpeechTranscriber.stop() method to stop recognition.
  9. To initiate a new request, repeat step 3 to step 8. The SpeechTranscriber object cannot be reused. You must create a new one.
  10. Call the NlsClient.release() method to release the client instance.

ProGuard configuration

If you obfuscate your code, add the following rule to the proguard-rules.pro file:

    -keep class com.alibaba.idst.util.*{*;}

Sample code

Create a recognition request.

    // Create a SpeechTranscriberCallback object.
    SpeechTranscriberCallback callback = new MyCallback();
    // Create a recognition request.
    speechTranscriber = client.createTranscriberRequest(callback);
    // Set the service URL.
    speechTranscriber.setUrl("wss://nls-gateway-ap-southeast-1.aliyuncs.com/ws/v1");
    // Obtain a dynamic token. For more information, see https://www.alibabacloud.com/help/doc-detail/72153.htm.
    speechTranscriber.setToken("your token");
    // Obtain an appkey in the Intelligent Speech Interaction console (https://nls-portal.console.aliyun.com/).
    speechTranscriber.setAppkey("your appkey");
    // Ask the server to return intermediate results. For more information about parameters, see the official documentation.
    speechTranscriber.enableIntermediateResult(true);
    // Start speech recognition.
    int code = speechTranscriber.start();

Collect and send audio data to the recognition server. You can collect the audio data from a file or other sources. If you use the SpeechTranscriberWithRecorder object, you can skip this step because the SDK automatically processes and sends the audio data to the server.

    ByteBuffer buf = ByteBuffer.allocateDirect(SAMPLES_PER_FRAME);
    while (sending) {
        buf.clear();
        // Collect audio data from the recorder.
        int readBytes = mAudioRecorder.read(buf, SAMPLES_PER_FRAME);
        byte[] bytes = new byte[SAMPLES_PER_FRAME];
        buf.get(bytes, 0, SAMPLES_PER_FRAME);
        if (readBytes > 0 && sending) {
            // Send the audio data to the recognition server.
            int code = speechTranscriber.sendAudio(bytes, bytes.length);
            if (code < 0) {
                Log.w(TAG, "Failed to send audio!");
                break;
            }
        }
        buf.position(readBytes);
        buf.flip();
    }

Handle the callbacks.

    // Detect the beginning of a sentence.
    @Override
    public void onSentenceBegin(String msg, int code) {
        Log.i(TAG, "Sentence begin");
    }

    // Detect the end of a sentence and return the complete recognition result for the sentence.
    @Override
    public void onSentenceEnd(final String msg, int code) {
        Log.d(TAG, "OnSentenceEnd " + msg + ": " + String.valueOf(code));
    }

    // Return intermediate results. This callback is fired only if enableIntermediateResult(true) is set.
    @Override
    public void onTranscriptionResultChanged(final String msg, int code) {
        Log.d(TAG, "OnTranscriptionResultChanged " + msg + ": " + String.valueOf(code));
    }
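
Stop recognition and release the client. The following is a minimal sketch of steps 8 to 10 in the usage list above, assuming client is the NlsClient instance created earlier; remember that a SpeechTranscriber object cannot be reused, so create a new request for each recognition session.

    // Stop the current recognition request (step 8). To start another request,
    // create a new SpeechTranscriber object; this one cannot be reused.
    speechTranscriber.stop();
    // Release the client instance (step 10) when no more requests will be made,
    // for example in the onDestroy() method of your activity.
    client.release();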