Java SDK 2.0

Last Updated: Jun 02, 2020

Note:

  • Read the API reference before using the SDK.
  • Since Java SDK V2.1.0, nls-sdk-long-asr has been renamed nls-sdk-transcriber. When you upgrade the SDK, delete nls-sdk-long-asr and add callbacks as prompted.

Download and installation

You can download the latest version of the SDK from the Maven repository:

    <dependency>
        <groupId>com.alibaba.nls</groupId>
        <artifactId>nls-sdk-transcriber</artifactId>
        <version>2.1.0</version>
    </dependency>

For more information about how to use the Java SDK, see the sample code below. You can also download the Java SDK demo.

Decompress the demo package and run the mvn package command in the directory that contains the pom file. An executable JAR package named nls-example-transcriber-2.0.0-jar-with-dependencies.jar is generated in the target directory. Copy this JAR package to the target server, where you can use it for quick service validation and stress testing.

Service validation

Run the java -cp nls-example-transcriber-2.0.0-jar-with-dependencies.jar com.alibaba.nls.client.SpeechTranscriberDemo command and set the parameters as required. The logs/nls.log file is then generated in the directory where you run the command.

Stress testing

Run the java -jar nls-example-transcriber-2.0.0-jar-with-dependencies.jar command and set the parameters as required. For the Alibaba Cloud URL parameter, use wss://nls-gateway-ap-southeast-1.aliyuncs.com/ws/v1. Provide .pcm audio files with a sampling rate of 16,000 Hz. Set the maximum number of concurrent calls based on the service that you purchased.

Note: Stress testing on more than two concurrent calls will generate fees.
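The stress test expects raw .pcm input, while the bundled sample is a .wav file. As a minimal sketch, the helper below copies the samples of a WAV file into a headerless .pcm file by using the standard javax.sound.sampled API. The class name WavToPcm and its method are hypothetical and not part of the SDK, and the sketch assumes that the WAV file already contains signed PCM samples.

```java
import java.io.File;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;

/** Hypothetical helper (not part of the SDK): strips the WAV container and keeps the raw samples. */
final class WavToPcm {

    /**
     * Copies the audio samples of a WAV file into a headerless .pcm file.
     * Returns the number of sample bytes written.
     */
    static long convert(File wav, File pcm) throws IOException, UnsupportedAudioFileException {
        try (AudioInputStream in = AudioSystem.getAudioInputStream(wav);
             OutputStream out = Files.newOutputStream(pcm.toPath())) {
            AudioFormat format = in.getFormat();
            // Assumption: the input already holds signed PCM; anything else is rejected, not converted.
            if (!AudioFormat.Encoding.PCM_SIGNED.equals(format.getEncoding())) {
                throw new IOException("Expected signed PCM samples but found " + format.getEncoding());
            }
            byte[] buffer = new byte[3200];
            long written = 0;
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
                written += read;
            }
            return written;
        }
    }
}
```

For example, WavToPcm.convert(new File("nls-sample-16k.wav"), new File("nls-sample-16k.pcm")) would produce a file that contains only the samples. The sampling rate is not changed, so the source file must already be at 16,000 Hz.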

Key objects

  • NlsClient: the speech processing client, which is equivalent to a factory for all speech processing classes. You can globally create an NlsClient instance. This object is thread-safe.
  • SpeechTranscriber: the real-time speech recognition object. You can use this object to set request parameters, send a request, and send audio data. This object is not thread-safe.
  • SpeechTranscriberListener: the real-time speech recognition result listener, which listens to recognition results. This object is not thread-safe.

For more information, see Java API Reference.

Notes on SDK calls

  1. Create one NlsClient object globally and reuse it. NlsClient is built on Netty, so creating an instance consumes time and resources, but an instance can be reused after it is created. We recommend that you create and shut down the NlsClient object in line with the lifecycle of your application.
  2. The SpeechTranscriber object cannot be reused. You must create a SpeechTranscriber object for each recognition task. For example, to process N audio files, you must create N SpeechTranscriber objects to complete N recognition tasks.
  3. A SpeechTranscriberListener object corresponds to a SpeechTranscriber object. You cannot use a SpeechTranscriberListener object for multiple SpeechTranscriber objects. Otherwise, you may fail to distinguish recognition tasks.
  4. The Java SDK depends on Netty 4.1.17.Final or later. If your application also depends on Netty, make sure that the Netty version it uses meets this requirement.
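The first three notes amount to a simple pattern: share one client, but build a fresh task object for every audio file. The following sketch shows that pattern with standard java.util.concurrent classes only. PerFileTasks and runAll are hypothetical names and not part of the SDK, and the maxConcurrent parameter is an assumption that stands in for the concurrency limit of your purchased service.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Hypothetical helper (not part of the SDK): one fresh task per file, bounded concurrency. */
final class PerFileTasks {

    /**
     * Calls taskFactory once per file so that no per-task object is shared,
     * runs the tasks on at most maxConcurrent threads, and returns the
     * results in the same order as the input files.
     */
    static <T> List<T> runAll(List<String> files,
                              Function<String, Callable<T>> taskFactory,
                              int maxConcurrent) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(maxConcurrent);
        try {
            List<Future<T>> futures = new ArrayList<>();
            for (String file : files) {
                futures.add(pool.submit(taskFactory.apply(file)));
            }
            List<T> results = new ArrayList<>();
            for (Future<T> future : futures) {
                // A failed task surfaces here as an ExecutionException.
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

In terms of the SDK objects, the factory would create a new SpeechTranscriber with its own SpeechTranscriberListener from the shared NlsClient for each file, and the client would be shut down once after runAll returns.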

Sample code

Note 1: The demo uses an audio file with a sampling rate of 16,000 Hz. To obtain correct recognition results, set the model to the universal model for the project to which the appkey is bound in the Intelligent Speech Interaction console. In actual use, select the model that matches the sampling rate of your audio. For more information about model settings, see Manage projects.
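Because the model must match the sampling rate, it can help to check the rate of a file before you send it. The sketch below reads the rate from a WAV header by using the standard javax.sound.sampled API. The class WavSampleRate and its methods are hypothetical helpers, not SDK calls, and the check covers only the two rates that this document mentions.

```java
import java.io.File;
import java.io.IOException;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;

/** Hypothetical helper (not part of the SDK): inspects a WAV header before a recognition task. */
final class WavSampleRate {

    /** Returns the sample rate declared in the WAV header, in Hz. */
    static int readHz(File wav) throws IOException, UnsupportedAudioFileException {
        AudioFormat format = AudioSystem.getAudioFileFormat(wav).getFormat();
        return Math.round(format.getSampleRate());
    }

    /** True for the two rates that this document mentions: 8,000 Hz and 16,000 Hz. */
    static boolean isRateUsedInThisGuide(int hz) {
        return hz == 8000 || hz == 16000;
    }
}
```

A caller could compare the value returned by readHz with the rate that the project model expects and stop early on a mismatch instead of sending audio that is likely to be recognized poorly.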

nls-sample-16k.wav

    client = new NlsClient("wss://nls-gateway-ap-southeast-1.aliyuncs.com/ws/v1", accessToken);

Example:

    package com.alibaba.nls.client;

    import java.io.File;
    import java.io.FileInputStream;
    import java.util.Arrays;

    import com.alibaba.nls.client.protocol.InputFormatEnum;
    import com.alibaba.nls.client.protocol.NlsClient;
    import com.alibaba.nls.client.protocol.SampleRateEnum;
    import com.alibaba.nls.client.protocol.asr.SpeechTranscriber;
    import com.alibaba.nls.client.protocol.asr.SpeechTranscriberListener;
    import com.alibaba.nls.client.protocol.asr.SpeechTranscriberResponse;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    public class SpeechTranscriberDemo {
        private static final Logger logger = LoggerFactory.getLogger(SpeechTranscriberDemo.class);

        private String appKey;
        private NlsClient client;

        public SpeechTranscriberDemo(String appKey, String token, String url) {
            this.appKey = appKey;
            // Create an NlsClient object. You can create it globally and specify the endpoint.
            if (url.isEmpty()) {
                client = new NlsClient("wss://nls-gateway-ap-southeast-1.aliyuncs.com/ws/v1", token);
            } else {
                client = new NlsClient(url, token);
            }
        }

        private static SpeechTranscriberListener getTranscriberListener() {
            SpeechTranscriberListener listener = new SpeechTranscriberListener() {
                // Intermediate result. The server returns this message when it recognizes a word.
                // It is returned only when setEnableIntermediateResult is set to true.
                @Override
                public void onTranscriptionResultChange(SpeechTranscriberResponse response) {
                    System.out.println("task_id: " + response.getTaskId() +
                        ", name: " + response.getName() +
                        // The status code. The code 20000000 indicates that the request is successful.
                        ", status: " + response.getStatus() +
                        // The sequence number of the sentence, which starts from 1.
                        ", index: " + response.getTransSentenceIndex() +
                        // The recognition result of the sentence.
                        ", result: " + response.getTransSentenceText() +
                        // The duration of currently processed audio streams, in milliseconds.
                        ", time: " + response.getTransSentenceTime());
                }

                // The server is ready to receive audio data.
                @Override
                public void onTranscriberStart(SpeechTranscriberResponse response) {
                    System.out.println("task_id: " + response.getTaskId() + ", name: " + response.getName() + ", status: " + response.getStatus());
                }

                // The server detected the beginning of a sentence.
                @Override
                public void onSentenceBegin(SpeechTranscriberResponse response) {
                    System.out.println("task_id: " + response.getTaskId() + ", name: " + response.getName() + ", status: " + response.getStatus());
                }

                // A complete sentence was recognized. The server detects the beginning and end of each
                // sentence and returns this message when it detects the end of a sentence.
                @Override
                public void onSentenceEnd(SpeechTranscriberResponse response) {
                    System.out.println("task_id: " + response.getTaskId() +
                        ", name: " + response.getName() +
                        // The status code. The code 20000000 indicates that the request is successful.
                        ", status: " + response.getStatus() +
                        // The sequence number of the sentence, which starts from 1.
                        ", index: " + response.getTransSentenceIndex() +
                        // The recognition result of the sentence.
                        ", result: " + response.getTransSentenceText() +
                        // The confidence level.
                        ", confidence: " + response.getConfidence() +
                        // The time when the server detects the beginning of the sentence.
                        ", begin_time: " + response.getSentenceBeginTime() +
                        // The duration of currently processed audio streams, in milliseconds.
                        ", time: " + response.getTransSentenceTime());
                }

                // The recognition is complete.
                @Override
                public void onTranscriptionComplete(SpeechTranscriberResponse response) {
                    System.out.println("task_id: " + response.getTaskId() + ", name: " + response.getName() + ", status: " + response.getStatus());
                }

                // The recognition failed.
                @Override
                public void onFail(SpeechTranscriberResponse response) {
                    System.out.println("task_id: " + response.getTaskId() + ", status: " + response.getStatus() + ", status_text: " + response.getStatusText());
                }
            };
            return listener;
        }

        // Calculate the playback time, in milliseconds, that corresponds to a number of audio bytes.
        public static int getSleepDelta(int dataSize, int sampleRate) {
            // 16-bit samples
            int bitsPerSample = 16;
            // Only a single channel is supported
            int soundChannel = 1;
            return (dataSize * 8 * 1000) / (bitsPerSample * soundChannel * sampleRate);
        }

        public void process(String filepath) {
            SpeechTranscriber transcriber = null;
            try {
                // Create an object and establish a connection
                transcriber = new SpeechTranscriber(client, getTranscriberListener());
                transcriber.setAppKey(appKey);
                // Specify the audio coding format
                transcriber.setFormat(InputFormatEnum.PCM);
                // Specify the audio sampling rate
                transcriber.setSampleRate(SampleRateEnum.SAMPLE_RATE_16K);
                // Specify whether to return intermediate results
                transcriber.setEnableIntermediateResult(false);
                // Specify whether to add punctuation marks to the recognition result
                transcriber.setEnablePunctuation(true);
                // Specify whether to enable inverse text normalization (ITN).
                // A value of true indicates that Chinese numerals are converted to Arabic numerals.
                transcriber.setEnableITN(false);
                // Serialize the preceding parameters to JSON and send them to the server for confirmation
                transcriber.start();

                File file = new File(filepath);
                try (FileInputStream fis = new FileInputStream(file)) {
                    byte[] b = new byte[3200];
                    int len;
                    while ((len = fis.read(b)) > 0) {
                        logger.info("send data pack length: " + len);
                        // Send only the bytes that were read; the last chunk may be shorter than the buffer.
                        transcriber.send(Arrays.copyOf(b, len));
                        // For live audio, do not sleep. For a file, sleep for the playback time of the chunk
                        // to simulate real-time input: 3,200 bytes correspond to 100 ms at 16,000 Hz and
                        // 200 ms at 8,000 Hz. For 8 kHz audio, change the second argument to 8000.
                        int deltaSleep = getSleepDelta(len, 16000);
                        Thread.sleep(deltaSleep);
                    }
                }

                // Notify the server that all audio data has been sent and wait for the completion message.
                long now = System.currentTimeMillis();
                logger.info("ASR wait for complete");
                transcriber.stop();
                logger.info("ASR latency : " + (System.currentTimeMillis() - now) + " ms");
            } catch (Exception e) {
                System.err.println(e.getMessage());
            } finally {
                if (null != transcriber) {
                    transcriber.close();
                }
            }
        }

        public void shutdown() {
            client.shutdown();
        }

        public static void main(String[] args) throws Exception {
            String appKey = null;
            String token = null;
            String url = ""; // Default: wss://nls-gateway-ap-southeast-1.aliyuncs.com/ws/v1
            if (args.length == 2) {
                appKey = args[0];
                token = args[1];
            } else if (args.length == 3) {
                appKey = args[0];
                token = args[1];
                url = args[2];
            } else {
                System.err.println("run error, need params(url is optional): " + "<app-key> <token> [url]");
                System.exit(-1);
            }
            String filepath = "nls-sample-16k.wav";
            SpeechTranscriberDemo demo = new SpeechTranscriberDemo(appKey, token, url);
            demo.process(filepath);
            demo.shutdown();
        }
    }
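
The pacing arithmetic in the demo can be checked in isolation. The sketch below restates it under the same assumption the demo makes (16-bit, single-channel audio) and adds a chunked read that trims each chunk to the bytes actually read. PcmPacing, chunkMillis, and forEachChunk are hypothetical names, not SDK methods.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.function.Consumer;

/** Hypothetical helper (not part of the SDK) for pacing a PCM file like a live stream. */
final class PcmPacing {

    /** Playback time of a chunk in milliseconds, assuming 16-bit mono samples. */
    static int chunkMillis(int byteCount, int sampleRateHz) {
        int bytesPerSample = 2; // 16-bit samples, single channel
        return (int) (byteCount * 1000L / ((long) sampleRateHz * bytesPerSample));
    }

    /**
     * Reads the stream in chunks of at most chunkSize bytes and hands each chunk
     * to the sink, trimmed to the bytes actually read.
     * Returns the summed playback time of all chunks, in milliseconds.
     */
    static long forEachChunk(InputStream in, int chunkSize, int sampleRateHz,
                             Consumer<byte[]> sink) throws IOException {
        byte[] buffer = new byte[chunkSize];
        long totalMillis = 0;
        int read;
        while ((read = in.read(buffer)) > 0) {
            sink.accept(Arrays.copyOf(buffer, read));
            totalMillis += chunkMillis(read, sampleRateHz);
        }
        return totalMillis;
    }
}
```

With these definitions, chunkMillis(3200, 16000) evaluates to 100 and chunkMillis(3200, 8000) to 200, which matches the sleep times recommended in the demo comments. In the demo, the sink would pass each chunk to the transcriber, and the caller would sleep for the chunk duration only when it replays a file.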