Java SDK 2.0

Last Updated: Oct 25, 2019

Note:

  • Read the API reference before using the SDK.
  • As of Java SDK V2.1.0, nls-sdk-short-asr is renamed nls-sdk-recognizer. When you upgrade the SDK, delete the nls-sdk-short-asr dependency and add the callbacks as prompted.

Download and installation

You can download the latest version of the SDK from the Maven repository:

    <dependency>
        <groupId>com.alibaba.nls</groupId>
        <artifactId>nls-sdk-recognizer</artifactId>
        <version>2.1.0</version>
    </dependency>

For more information about how to use the Java SDK, see the sample code below. Download the Java SDK demo.

Decompress the demo package and run the mvn package command in the directory that contains the pom.xml file. An executable JAR package named nls-example-recognizer-2.0.0-jar-with-dependencies.jar is generated in the target directory. Copy this JAR package to the target server, where you can use it for quick service validation and stress testing.

Service validation

Run the java -cp nls-example-recognizer-2.0.0-jar-with-dependencies.jar com.alibaba.nls.client.SpeechRecognizerDemo command and set the parameters as required. The logs/nls.log file is then generated in the directory where the command is run.

Stress testing

Run the java -jar nls-example-recognizer-2.0.0-jar-with-dependencies.jar command and set the parameters as required. The Alibaba Cloud URL parameter is wss://nls-gateway-ap-southeast-1.aliyuncs.com/ws/v1. Provide .pcm audio files with a sampling rate of 16,000 Hz. Set the maximum number of concurrent calls based on the service that you purchased.

Note: Stress testing on more than two concurrent calls will generate fees.
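
The concurrency limit described above can be enforced in your own test harness with a fixed-size thread pool. The following is a minimal sketch, not the demo's implementation: the class name BoundedStressRun and the placeholder workload are assumptions for illustration, and a real run would perform one recognition task where the placeholder sleeps.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: runs tasks with a hard cap on the number of concurrent calls.
final class BoundedStressRun {

    /**
     * Runs totalTasks copies of task on a pool of at most maxConcurrent threads
     * and returns the highest number of tasks observed running at the same time.
     */
    static int run(int totalTasks, int maxConcurrent, Runnable task) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(maxConcurrent);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        List<Future<?>> futures = new ArrayList<>();
        try {
            for (int i = 0; i < totalTasks; i++) {
                futures.add(pool.submit(() -> {
                    int now = running.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    try {
                        task.run();
                    } finally {
                        running.decrementAndGet();
                    }
                }));
            }
            for (Future<?> future : futures) {
                future.get(); // rethrows any failure raised inside a task
            }
        } finally {
            pool.shutdown();
            pool.awaitTermination(10, TimeUnit.SECONDS);
        }
        return peak.get();
    }

    public static void main(String[] args) throws Exception {
        // Placeholder workload (assumption): stands in for one recognition task.
        Runnable placeholderCall = () -> {
            try {
                Thread.sleep(20);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        // Ten tasks in total, never more than two at a time.
        int peak = run(10, 2, placeholderCall);
        System.out.println("peak concurrent calls: " + peak);
    }
}
```

Because the pool never has more than maxConcurrent threads, the observed peak cannot exceed the cap, whatever the total number of tasks.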

Key objects

  • NlsClient: the speech processing client, which is equivalent to a factory for all speech processing classes. You can globally create an NlsClient instance. This object is thread-safe.
  • SpeechRecognizer: the short sentence recognition object. You can use this object to set request parameters, send a request, and send audio data. This object is not thread-safe.
  • SpeechRecognizerListener: the recognition result listener, which listens to recognition results. This object is not thread-safe.

For more information, see Java API Reference.

Notes on SDK calls

  1. Create one NlsClient object globally and reuse it. The client is built on Netty, so creating it consumes time and resources, but a created NlsClient object can serve any number of requests. We recommend that you create the NlsClient object when your application starts and shut it down when your application exits.
  2. The SpeechRecognizer object cannot be reused. You must create a SpeechRecognizer object for each recognition task. For example, to process N audio files, you must create N SpeechRecognizer objects to complete N recognition tasks.
  3. Each SpeechRecognizerListener object corresponds to exactly one SpeechRecognizer object. Do not share a SpeechRecognizerListener object among multiple SpeechRecognizer objects. Otherwise, you may be unable to tell the recognition tasks apart.
  4. The Java SDK depends on Netty 4.1.17.Final or later. If your application also depends on Netty, ensure that the Netty version it resolves to meets this requirement.
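
The object lifecycle described in these notes can be sketched as follows. To keep the sketch runnable without the SDK, it uses stand-in classes (FakeClient, FakeRecognizer, and FakeListener are hypothetical and are not SDK classes); only the ownership pattern is the point: one shared client, and a fresh recognizer and listener for every task.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PerTaskObjectsSketch {

    /** Stand-in for NlsClient (hypothetical): created once and shared by all tasks. */
    static final class FakeClient {
        int recognizersCreated = 0;
        boolean shutDown = false;

        void shutdown() {
            shutDown = true;
        }
    }

    /** Stand-in for SpeechRecognizerListener (hypothetical): carries one task's identity. */
    static final class FakeListener {
        final String taskName;
        String result;

        FakeListener(String taskName) {
            this.taskName = taskName;
        }

        void onCompleted(String text) {
            result = taskName + " -> " + text;
        }
    }

    /** Stand-in for SpeechRecognizer (hypothetical): single use, bound to one listener. */
    static final class FakeRecognizer implements AutoCloseable {
        private final FakeListener listener;
        private boolean used = false;

        FakeRecognizer(FakeClient client, FakeListener listener) {
            client.recognizersCreated++;
            this.listener = listener;
        }

        void recognize(String audioName) {
            if (used) {
                throw new IllegalStateException("a recognizer cannot be reused");
            }
            used = true;
            listener.onCompleted("text of " + audioName);
        }

        @Override
        public void close() {
            // A real recognizer would release its connection here.
        }
    }

    /** Processes N files with N recognizers and N listeners on one shared client. */
    static List<String> processAll(FakeClient client, List<String> files) {
        List<String> results = new ArrayList<>();
        for (String file : files) {
            FakeListener listener = new FakeListener(file); // one listener per task
            try (FakeRecognizer recognizer = new FakeRecognizer(client, listener)) {
                recognizer.recognize(file); // one recognizer per task
            }
            results.add(listener.result);
        }
        return results;
    }

    public static void main(String[] args) {
        FakeClient client = new FakeClient(); // created once for the application
        try {
            List<String> results = processAll(client, Arrays.asList("a.pcm", "b.pcm", "c.pcm"));
            results.forEach(System.out::println);
        } finally {
            client.shutdown(); // closed together with the application
        }
    }
}
```

Because each listener is constructed with the name of its own task, every result can be traced back to the file that produced it, which is what sharing a listener among recognizers would break.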

Sample code

Note 1: The demo uses an audio file with a sampling rate of 16,000 Hz. To obtain correct recognition results, select the universal model for the project to which the appkey is bound in the Intelligent Speech Interaction console. In actual use, select the model that matches the sampling rate of your audio. For more information about model settings, see Manage projects.

nls-sample-16k.wav

Note 2: The demo shows how to specify the gateway URL when you create the NlsClient object.

    client = new NlsClient("wss://nls-gateway-ap-southeast-1.aliyuncs.com/ws/v1", accessToken);
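
If the URL comes from configuration or the command line, it can help to normalize it before passing it to the NlsClient constructor. The following is a minimal sketch: the GatewayEndpoint class and its resolve method are hypothetical helpers, not part of the SDK, and accepting only ws:// and wss:// schemes is an assumption based on the gateway URL shown above.

```java
import java.net.URI;

// Hypothetical helper, not part of the SDK.
final class GatewayEndpoint {

    /** The gateway URL used throughout this document. */
    static final String DEFAULT_URL = "wss://nls-gateway-ap-southeast-1.aliyuncs.com/ws/v1";

    /**
     * Returns DEFAULT_URL when url is null or blank. Otherwise returns the trimmed URL
     * after checking that it uses a WebSocket scheme (assumption: ws or wss only).
     */
    static String resolve(String url) {
        if (url == null || url.trim().isEmpty()) {
            return DEFAULT_URL;
        }
        URI uri = URI.create(url.trim()); // throws IllegalArgumentException on bad syntax
        String scheme = uri.getScheme();
        if (!"wss".equalsIgnoreCase(scheme) && !"ws".equalsIgnoreCase(scheme)) {
            throw new IllegalArgumentException("expected a ws:// or wss:// URL, got: " + url);
        }
        return uri.toString();
    }

    public static void main(String[] args) {
        String configured = args.length > 0 ? args[0] : null;
        System.out.println("gateway URL: " + resolve(configured));
    }
}
```

The resolved string would then be passed as the first argument of the NlsClient constructor. Unlike a bare isEmpty() check, this also handles a null value.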

Example:

    package com.alibaba.nls.client;

    import java.io.File;
    import java.io.FileInputStream;
    import java.util.Arrays;

    import com.alibaba.nls.client.protocol.InputFormatEnum;
    import com.alibaba.nls.client.protocol.NlsClient;
    import com.alibaba.nls.client.protocol.SampleRateEnum;
    import com.alibaba.nls.client.protocol.asr.SpeechRecognizer;
    import com.alibaba.nls.client.protocol.asr.SpeechRecognizerListener;
    import com.alibaba.nls.client.protocol.asr.SpeechRecognizerResponse;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    /**
     * For demonstration only.
     */
    public class SpeechRecognizerDemo {
        private static final Logger logger = LoggerFactory.getLogger(SpeechRecognizerDemo.class);
        private String appKey;
        NlsClient client;

        public SpeechRecognizerDemo(String appKey, String token, String url) {
            this.appKey = appKey;
            // Create an NlsClient object. You can create it globally and specify the endpoint.
            if (url.isEmpty()) {
                client = new NlsClient("wss://nls-gateway-ap-southeast-1.aliyuncs.com/ws/v1", token);
            } else {
                client = new NlsClient(url, token);
            }
        }

        // myOrder and userParam are user-defined parameters that identify the task in the callbacks.
        private static SpeechRecognizerListener getRecognizerListener(int myOrder, String userParam) {
            SpeechRecognizerListener listener = new SpeechRecognizerListener() {
                // Intermediate results. The server returns this message each time it recognizes a word.
                // This message is returned only when setEnableIntermediateResult is set to true.
                @Override
                public void onRecognitionResultChanged(SpeechRecognizerResponse response) {
                    // getName(): the message name, RecognitionResultChanged.
                    // getStatus(): the status code. The code 20000000 indicates that the request is successful.
                    // getRecognizedText(): the recognized text.
                    System.out.println("name: " + response.getName() + ", status: " + response.getStatus() + ", result: " + response.getRecognizedText());
                }

                // The recognition is completed.
                @Override
                public void onRecognitionCompleted(SpeechRecognizerResponse response) {
                    System.out.println("name: " + response.getName() + ", status: " + response.getStatus() + ", result: " + response.getRecognizedText());
                }

                @Override
                public void onStarted(SpeechRecognizerResponse response) {
                    System.out.println("myOrder: " + myOrder + "; myParam: " + userParam + "; task_id: " + response.getTaskId());
                }

                @Override
                public void onFail(SpeechRecognizerResponse response) {
                    // getStatus(): the status code. getStatusText(): the error message.
                    // task_id: the unique ID of the task. Record it for troubleshooting.
                    System.out.println("task_id: " + response.getTaskId() + ", status: " + response.getStatus() + ", status_text: " + response.getStatusText());
                }
            };
            return listener;
        }

        // Calculate the playback duration, in milliseconds, of dataSize bytes of audio.
        // The formula assumes 16-bit samples and a single channel.
        public static int getSleepDelta(int dataSize, int sampleRate) {
            // 3,200 bytes correspond to 100 ms at 16,000 Hz and to 200 ms at 8,000 Hz.
            return (dataSize * 10 * 8000) / (160 * sampleRate);
        }

        public void process(String filepath, int sampleRate) {
            SpeechRecognizer recognizer = null;
            try {
                String myParam = "user-param";
                int myOrder = 1234;
                SpeechRecognizerListener listener = getRecognizerListener(myOrder, myParam);
                recognizer = new SpeechRecognizer(client, listener);
                recognizer.setAppKey(appKey);
                // Specify the audio coding format.
                recognizer.setFormat(InputFormatEnum.PCM);
                // Specify the audio sampling rate.
                if (sampleRate == 16000) {
                    recognizer.setSampleRate(SampleRateEnum.SAMPLE_RATE_16K);
                } else if (sampleRate == 8000) {
                    recognizer.setSampleRate(SampleRateEnum.SAMPLE_RATE_8K);
                }
                // Return intermediate results.
                recognizer.setEnableIntermediateResult(true);

                long now = System.currentTimeMillis();
                recognizer.start();
                logger.info("ASR start latency : " + (System.currentTimeMillis() - now) + " ms");

                File file = new File(filepath);
                // try-with-resources closes the file even if sending fails.
                try (FileInputStream fis = new FileInputStream(file)) {
                    byte[] b = new byte[3200];
                    int len;
                    while ((len = fis.read(b)) > 0) {
                        logger.info("send data pack length: " + len);
                        // Send only the bytes that were read. The last read may not fill the buffer.
                        recognizer.send(Arrays.copyOf(b, len));
                        // The sleep simulates real-time audio from a file. Do not sleep if the audio is captured in real time.
                        // For 3,200 bytes, the recommended sleep is 100 ms at 16,000 Hz and 200 ms at 8,000 Hz.
                        int deltaSleep = getSleepDelta(len, sampleRate);
                        Thread.sleep(deltaSleep);
                    }
                }

                now = System.currentTimeMillis();
                logger.info("ASR wait for complete");
                recognizer.stop();
                logger.info("ASR stop latency : " + (System.currentTimeMillis() - now) + " ms");
            } catch (Exception e) {
                logger.error("recognition failed", e);
            } finally {
                // Close the recognizer to release its connection.
                if (null != recognizer) {
                    recognizer.close();
                }
            }
        }

        public void shutdown() {
            client.shutdown();
        }

        public static void main(String[] args) throws Exception {
            String appKey = null;
            String token = null;
            String url = ""; // default: wss://nls-gateway-ap-southeast-1.aliyuncs.com/ws/v1
            if (args.length == 2) {
                appKey = args[0];
                token = args[1];
            } else if (args.length == 3) {
                appKey = args[0];
                token = args[1];
                url = args[2];
            } else {
                System.err.println("run error, need params(url is optional): " + "<app-key> <token> [url]");
                System.exit(-1);
            }
            SpeechRecognizerDemo demo = new SpeechRecognizerDemo(appKey, token, url);
            demo.process("./nls-sample-16k.wav", 16000);
            demo.shutdown();
        }
    }
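
The pacing arithmetic and the chunked read in the example can be checked in isolation, without the SDK. The sketch below is an illustration under stated assumptions: PcmPacing, chunkDurationMillis, and readChunks are hypothetical names, and the sample width and channel count are explicit parameters rather than the fixed 16-bit mono that getSleepDelta assumes. It uses long arithmetic, so large chunk sizes do not overflow, and it trims each chunk to the number of bytes actually read, because a short final read fills only part of the buffer.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the SDK.
final class PcmPacing {

    /** Playback duration in milliseconds (rounded down) of byteCount bytes of PCM audio. */
    static long chunkDurationMillis(int byteCount, int sampleRate, int bytesPerSample, int channels) {
        long bytesPerSecond = (long) sampleRate * bytesPerSample * channels;
        return byteCount * 1000L / bytesPerSecond;
    }

    /** Reads the stream in chunks of at most chunkSize bytes, trimming each chunk to its real length. */
    static List<byte[]> readChunks(InputStream in, int chunkSize) throws IOException {
        List<byte[]> chunks = new ArrayList<>();
        byte[] buffer = new byte[chunkSize];
        int len;
        while ((len = in.read(buffer)) > 0) {
            chunks.add(Arrays.copyOf(buffer, len)); // copy only the bytes that were read
        }
        return chunks;
    }

    public static void main(String[] args) throws IOException {
        // 7,000 bytes of silence stand in for an audio file (assumption for illustration).
        InputStream audio = new ByteArrayInputStream(new byte[7000]);
        for (byte[] chunk : readChunks(audio, 3200)) {
            long sleepMs = chunkDurationMillis(chunk.length, 16000, 2, 1);
            System.out.println(chunk.length + " bytes, sleep " + sleepMs + " ms");
        }
    }
}
```

For 16-bit mono audio, chunkDurationMillis gives the same values as the recommendation in the example: 3,200 bytes last 100 ms at 16,000 Hz and 200 ms at 8,000 Hz.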