Android SDK 2.0

Last Updated: Jun 02, 2020

Download and installation

  1. Download the Android SDK and sample code.
  2. Decompress the downloaded package and find the demo project in the nls-sdk-android directory. In the app/libs directory, you can find an .aar file, which is the SDK package (see the Gradle sketch after this list for how to reference it in your own project).
  3. Open the demo project in Android Studio to view the sample code. The sample code for real-time speech recognition contains the following two activities:
    • SpeechTranscriberActivity demonstrates how to import an audio file to the SDK for recognition.
    • SpeechTranscriberWithRecorderActivity demonstrates how to use the SDK to record and recognize audio data. We recommend that you use this method.
  4. Before you run the sample code, create a project in the Intelligent Speech Interaction console, obtain the appkey of the project, and obtain a service token. For more information, see the relevant topics in Quick Start.
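
The demo project already references this .aar file. If you integrate the SDK into your own Android Studio project instead, a minimal Gradle sketch follows. The file name nls-sdk-android.aar is an assumed placeholder; use the actual file name that ships in app/libs:

    // app/build.gradle (module level) -- a minimal sketch.
    // The .aar file name below is an assumed placeholder; replace it with
    // the actual file name found in app/libs.
    dependencies {
        implementation files('libs/nls-sdk-android.aar')
    }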

Key objects

  • NlsClient: the speech processing client, which is equivalent to a factory for all speech processing classes. You can globally create an NlsClient instance.
  • SpeechTranscriber: the real-time speech recognition object, representing a speech recognition request. You can record audio data or obtain audio data from an audio file, and send the audio data to the SDK.
  • SpeechTranscriberWithRecorder: the real-time speech recognition object, representing a speech recognition request. This object adds a recording function on top of the SpeechTranscriber object. We recommend this object because it is easy to use (see the sketch after this list).
  • SpeechTranscriberCallback: the object that defines the speech recognition callbacks. These callbacks are fired when recognition results are returned or errors occur. You can add your own logic to these callbacks, as shown in the demo.
  • SpeechTranscriberWithRecorderCallback: the object that defines the speech recognition callbacks for SpeechTranscriberWithRecorder. In addition to the SpeechTranscriberCallback callbacks, it provides callbacks for recorded audio data and recording volume.
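
Because SpeechTranscriberWithRecorder records audio for you, its flow reduces to create, configure, and start. The following is a minimal, unverified sketch: the factory method name createTranscriberWithRecorder is an assumption modeled on createTranscriberRequest(), and MyRecorderCallback stands for your own implementation of SpeechTranscriberWithRecorderCallback. Check the demo project for the exact signatures.

    // A minimal sketch. createTranscriberWithRecorder is an assumed factory
    // method name; MyRecorderCallback is a hypothetical implementation of
    // SpeechTranscriberWithRecorderCallback. Check the demo for exact names.
    NlsClient client = new NlsClient();
    SpeechTranscriberWithRecorder transcriber =
            client.createTranscriberWithRecorder(new MyRecorderCallback());
    transcriber.setToken("your token");
    transcriber.setAppkey("your appkey");
    // The SDK records and sends audio automatically; no sendAudio() call is needed.
    transcriber.start();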

Usage (taking SpeechTranscriber as an example)

  1. Create an NlsClient instance.
  2. Define callbacks of the SpeechTranscriberCallback object to process recognition results and errors as required.
  3. Call the NlsClient.createTranscriberRequest() method to create a SpeechTranscriber instance.
  4. Set SpeechTranscriber parameters, including accessToken and appkey.
  5. Call the SpeechTranscriber.start() method to connect to the server.
  6. Collect audio data and call the SpeechTranscriber.sendAudio() method to send the audio data to the server.
  7. Process the recognition result or error using a callback.
  8. Call the SpeechTranscriber.stop() method to stop recognition.
  9. To initiate a new request, repeat step 3 to step 8. The SpeechTranscriber object cannot be reused. You must create a new one.
  10. Call the NlsClient.release() method to release the client instance.

ProGuard configuration

If you obfuscate your code, add the following rule to the proguard-rules.pro file:

    -keep class com.alibaba.idst.util.*{*;}

Sample code

Create a recognition request.

    // Create a SpeechTranscriberCallback object.
    SpeechTranscriberCallback callback = new MyCallback();
    // Create a recognition request.
    speechTranscriber = client.createTranscriberRequest(callback);
    // Set the service URL.
    speechTranscriber.setUrl("wss://nls-gateway-ap-southeast-1.aliyuncs.com/ws/v1");
    // Obtain a dynamic token. For more information, see https://www.alibabacloud.com/help/doc-detail/72153.htm.
    speechTranscriber.setToken("your token");
    // Obtain an appkey in the Intelligent Speech Interaction console (https://nls-portal.console.aliyun.com/).
    speechTranscriber.setAppkey("your appkey");
    // Ask the server to return intermediate results. For more information about parameters, see the official documentation.
    speechTranscriber.enableIntermediateResult(true);
    // Start speech recognition.
    int code = speechTranscriber.start();

Collect and send audio data to the recognition server. You can collect the audio data from a file or other sources. If you use the SpeechTranscriberWithRecorder object, you can skip this step because the SDK automatically processes and sends the audio data to the server.

    ByteBuffer buf = ByteBuffer.allocateDirect(SAMPLES_PER_FRAME);
    while (sending) {
        buf.clear();
        // Collect audio data from the recorder.
        int readBytes = mAudioRecorder.read(buf, SAMPLES_PER_FRAME);
        byte[] bytes = new byte[SAMPLES_PER_FRAME];
        buf.get(bytes, 0, SAMPLES_PER_FRAME);
        if (readBytes > 0 && sending) {
            // Send the audio data to the recognition server.
            int code = speechTranscriber.sendAudio(bytes, bytes.length);
            if (code < 0) {
                Log.w(TAG, "Failed to send audio!");
                break;
            }
        }
        buf.position(readBytes);
        buf.flip();
    }

Handle the callbacks.

    // Detect the beginning of a sentence.
    @Override
    public void onSentenceBegin(String msg, int code) {
        Log.i(TAG, "Sentence begin");
    }

    // Detect the end of a sentence and return the complete recognition result for the sentence.
    @Override
    public void onSentenceEnd(final String msg, int code) {
        Log.d(TAG, "OnSentenceEnd " + msg + ": " + String.valueOf(code));
    }

    // Return intermediate results. This callback is fired only if enableIntermediateResult(true) is set.
    @Override
    public void onTranscriptionResultChanged(final String msg, int code) {
        Log.d(TAG, "OnTranscriptionResultChanged " + msg + ": " + String.valueOf(code));
    }
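
Stop recognition and release the client. The following is a minimal sketch of steps 8 to 10 in the usage list above, assuming client is the NlsClient instance created earlier; remember that a SpeechTranscriber object cannot be reused, so create a new request for each recognition session.

    // Stop the current recognition request (step 8). To start another request,
    // create a new SpeechTranscriber object; this one cannot be reused.
    speechTranscriber.stop();
    // Release the client instance (step 10) when no more requests will be made,
    // for example in the onDestroy() method of your activity.
    client.release();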