Java SDK 2.0

Last updated: 2020-11-03

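Tips:

  • Before using the SDK, make sure you have read the API reference documentation.
  • Starting with version 2.1.0, nls-sdk-short-asr is renamed to nls-sdk-recognizer. When upgrading, remove the nls-sdk-short-asr dependency and add the callback methods that the compiler reports as missing.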

Download and installation

Download the latest SDK version from the Maven repository:

  <dependency>
    <groupId>com.alibaba.nls</groupId>
    <artifactId>nls-sdk-recognizer</artifactId>
    <version>2.1.0</version>
  </dependency>

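For usage, see the code example below. Demo source code download link

After unpacking the demo, run mvn package in the directory that contains pom.xml. This generates an executable JAR in the target directory: nls-example-recognizer-2.0.0-jar-with-dependencies.jar. Copy this JAR to the target server to quickly verify and stress-test the service.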

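Service verification

Run the following command and supply the parameters as prompted. After it runs, logs/nls.log is generated in the directory where the command was executed.

  java -cp nls-example-recognizer-2.0.0-jar-with-dependencies.jar com.alibaba.nls.client.SpeechRecognizerDemo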

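Stress testing

Run the following command and supply the parameters as prompted:

  java -jar nls-example-recognizer-2.0.0-jar-with-dependencies.jar

The Alibaba Cloud service URL parameter is wss://nls-gateway-ap-southeast-1.aliyuncs.com/ws/v1. Provide the audio as a PCM file with a 16 kHz sample rate, and choose the number of concurrent connections carefully, based on the concurrency you have purchased.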

Note: Running your own stress tests with more than 2 concurrent connections incurs charges.
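Because exceeding the purchased concurrency has a cost, a custom stress harness may want a hard cap on the number of tasks in flight. The following is a minimal sketch of one way to do that with a java.util.concurrent.Semaphore. The class ConcurrencyCap and the method runCapped are hypothetical names, not part of the SDK, and each Runnable stands in for one recognition task.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper (not an SDK class): runs tasks with a fixed cap on how many are in flight. */
final class ConcurrencyCap {

    /** Runs every task and returns the highest number of tasks observed running at once. */
    static int runCapped(List<Runnable> tasks, int limit) {
        if (limit < 1) {
            throw new IllegalArgumentException("limit must be at least 1");
        }
        Semaphore permits = new Semaphore(limit);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        // The pool is deliberately larger than the limit: the semaphore enforces the cap.
        ExecutorService pool = Executors.newFixedThreadPool(limit * 2);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (Runnable task : tasks) {
                futures.add(pool.submit(() -> {
                    permits.acquireUninterruptibly();
                    try {
                        int now = inFlight.incrementAndGet();
                        peak.accumulateAndGet(now, Math::max);
                        task.run(); // in a real harness: one recognition task
                    } finally {
                        inFlight.decrementAndGet();
                        permits.release();
                    }
                }));
            }
            for (Future<?> f : futures) {
                try {
                    f.get(); // wait for completion and surface failures
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("task failed", e.getCause());
                }
            }
        } finally {
            pool.shutdown();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        List<Runnable> tasks = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            tasks.add(() -> {
                try {
                    Thread.sleep(20); // placeholder for the duration of one task
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        System.out.println("peak concurrency: " + runCapped(tasks, 2));
    }
}
```

Setting the limit to the purchased concurrency keeps a test run within quota regardless of how many audio files are queued.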

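Key interfaces

  • NlsClient: the speech processing client. It acts as a factory for all speech-related processing classes; create one global instance. Thread-safe.
  • SpeechRecognizer: the short-sentence recognition class. It sets request parameters and sends the request and the audio data. Not thread-safe.
  • SpeechRecognizerListener: the recognition result listener. It receives recognition results. Not thread-safe.

For more details, see the API documentation: Java API reference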

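SDK usage notes

  1. Create the NlsClient object once and reuse it. NlsClient is built on the Netty framework, so creating it costs time and resources, but once created it can be reused. Tie the creation and shutdown of NlsClient to the lifecycle of your application.
  2. A SpeechRecognizer object cannot be reused. Each recognition task requires its own SpeechRecognizer object. For example, N audio files mean N recognition tasks and N SpeechRecognizer objects.
  3. SpeechRecognizerListener objects and SpeechRecognizer objects map one to one. Do not set the same SpeechRecognizerListener object on multiple SpeechRecognizer objects; otherwise you cannot tell which recognition task a callback belongs to.
  4. The Java SDK depends on the Netty network library, version 4.1.17.Final or later. If your application already depends on Netty, make sure its version meets this requirement.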
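The object lifecycle described in notes 1 to 3 can be sketched as follows. To keep the example runnable without the SDK, it uses stand-in types: FakeClient, FakeListener, and FakeTask are hypothetical placeholders for NlsClient, SpeechRecognizerListener, and SpeechRecognizer, and they do no real recognition. Only the shape of the code is the point: one client for the whole run, and a fresh task and listener for every file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Lifecycle sketch with hypothetical stand-in types; none of these are SDK classes. */
final class LifecycleSketch {

    /** Stand-in for the shared, thread-safe client (expensive to create). */
    static final class FakeClient implements AutoCloseable {
        int tasksCreated = 0;
        boolean closed = false;

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Stand-in for a listener; it holds the result of exactly one task. */
    static final class FakeListener {
        final String label;
        String result;

        FakeListener(String label) {
            this.label = label;
        }
    }

    /** Stand-in for a single-use recognition task. */
    static final class FakeTask implements AutoCloseable {
        private final FakeListener listener;

        FakeTask(FakeClient client, FakeListener listener) {
            client.tasksCreated++;
            this.listener = listener;
        }

        void run() {
            listener.result = "recognized:" + listener.label;
        }

        @Override
        public void close() {
            // A real task would release its connection here.
        }
    }

    /** Reuses one client; creates a new task and a new listener per file. */
    static List<String> recognizeAll(FakeClient client, List<String> files) {
        List<String> results = new ArrayList<>();
        for (String file : files) {
            FakeListener listener = new FakeListener(file); // never shared between tasks
            try (FakeTask task = new FakeTask(client, listener)) {
                task.run();
            }
            results.add(listener.result);
        }
        return results;
    }

    public static void main(String[] args) {
        try (FakeClient client = new FakeClient()) { // created once for the program
            System.out.println(recognizeAll(client, Arrays.asList("a.pcm", "b.pcm", "c.pcm")));
        }
    }
}
```

Because each listener belongs to one task, a result can always be traced back to the file that produced it.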

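Code example

Note 1: The audio file used in the demo has a 16000 Hz sample rate. In the console, set the model of the project that corresponds to your appKey to the general-purpose model to get correct recognition results. If you use other audio, set a model that supports that audio scenario. For model settings, see the "Manage projects" section.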

nls-sample-16k.wav
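Since the model and the sample rate setting must match the audio, it can help to check a file's sample rate before sending it. The sketch below reads the sample rate field from a WAV header. It assumes the canonical 44-byte RIFF/WAVE layout, in which the "fmt " chunk immediately follows the RIFF header and the sample rate is a little-endian 32-bit integer at byte offset 24; files with extra chunks before "fmt " are reported as unrecognized. WavHeaderCheck and its methods are hypothetical names, not part of the SDK.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper (not an SDK class): reads fields from a canonical 44-byte WAV header. */
final class WavHeaderCheck {

    /** Returns the sample rate in Hz, or -1 if the bytes are not a canonical WAV header. */
    static int sampleRate(byte[] header) {
        if (header == null || header.length < 44) {
            return -1;
        }
        String riff = new String(header, 0, 4, StandardCharsets.US_ASCII);
        String wave = new String(header, 8, 4, StandardCharsets.US_ASCII);
        String fmt = new String(header, 12, 4, StandardCharsets.US_ASCII);
        if (!riff.equals("RIFF") || !wave.equals("WAVE") || !fmt.equals("fmt ")) {
            return -1;
        }
        return ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN).getInt(24);
    }

    /** Builds a header for 16-bit mono PCM; used here only to exercise the parser. */
    static byte[] buildHeader(int sampleRate, int dataBytes) {
        ByteBuffer buf = ByteBuffer.allocate(44).order(ByteOrder.LITTLE_ENDIAN);
        buf.put("RIFF".getBytes(StandardCharsets.US_ASCII));
        buf.putInt(36 + dataBytes);              // size of the rest of the file
        buf.put("WAVE".getBytes(StandardCharsets.US_ASCII));
        buf.put("fmt ".getBytes(StandardCharsets.US_ASCII));
        buf.putInt(16);                          // fmt chunk size for PCM
        buf.putShort((short) 1);                 // audio format: 1 = PCM
        buf.putShort((short) 1);                 // channels: mono
        buf.putInt(sampleRate);                  // offset 24: sample rate
        buf.putInt(sampleRate * 2);              // byte rate = rate * channels * 2 bytes
        buf.putShort((short) 2);                 // block align
        buf.putShort((short) 16);                // bits per sample
        buf.put("data".getBytes(StandardCharsets.US_ASCII));
        buf.putInt(dataBytes);
        return buf.array();
    }

    public static void main(String[] args) {
        System.out.println(sampleRate(buildHeader(16000, 0))); // prints 16000
    }
}
```

In practice the first 44 bytes of the file would be read into the array passed to sampleRate, and the result compared against the value used for the recognizer's sample rate.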

Note 2: The demo shows how to set the URL when creating the NlsClient object:

  client = new NlsClient("wss://nls-gateway-ap-southeast-1.aliyuncs.com/ws/v1", accessToken);
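When the URL comes from a command-line argument or configuration, it may be missing. The following sketch shows one way to fall back to the default gateway URL and to reject values that are not WebSocket URLs before they reach the client constructor. EndpointResolver is a hypothetical helper, not part of the SDK, and the scheme check is an assumption based on the ws:// and wss:// URLs that appear in this document.

```java
/** Hypothetical helper (not an SDK class): picks the gateway URL to pass to the client. */
final class EndpointResolver {

    static final String DEFAULT_URL = "wss://nls-gateway-ap-southeast-1.aliyuncs.com/ws/v1";

    /** Returns the trimmed URL, or the default when the argument is null or blank. */
    static String resolve(String url) {
        if (url == null || url.trim().isEmpty()) {
            return DEFAULT_URL;
        }
        String trimmed = url.trim();
        // Assumption: only WebSocket schemes are valid for the gateway.
        if (!trimmed.startsWith("ws://") && !trimmed.startsWith("wss://")) {
            throw new IllegalArgumentException("expected a ws:// or wss:// URL: " + trimmed);
        }
        return trimmed;
    }

    public static void main(String[] args) {
        System.out.println(resolve(args.length > 0 ? args[0] : null));
    }
}
```

Unlike a bare url.isEmpty() check, this version also tolerates a null argument.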

Note 3: For more code details and a multithreaded calling example, see the sample program shipped with the SDK.

Example:

  package com.alibaba.nls.client;

  import java.io.File;
  import java.io.FileInputStream;
  import java.util.Arrays;

  import com.alibaba.nls.client.protocol.InputFormatEnum;
  import com.alibaba.nls.client.protocol.NlsClient;
  import com.alibaba.nls.client.protocol.SampleRateEnum;
  import com.alibaba.nls.client.protocol.asr.SpeechRecognizer;
  import com.alibaba.nls.client.protocol.asr.SpeechRecognizerListener;
  import com.alibaba.nls.client.protocol.asr.SpeechRecognizerResponse;
  import org.slf4j.Logger;
  import org.slf4j.LoggerFactory;

  /**
   * For demonstration purposes only.
   */
  public class SpeechRecognizerDemo {
      private static final Logger logger = LoggerFactory.getLogger(SpeechRecognizerDemo.class);
      private String appKey;
      NlsClient client;

      public SpeechRecognizerDemo(String appKey, String token, String url) {
          this.appKey = appKey;
          // Create the NlsClient object once and reuse it globally.
          // Fall back to the default endpoint when no URL is given.
          if (url.isEmpty()) {
              client = new NlsClient("wss://nls-gateway-ap-southeast-1.aliyuncs.com/ws/v1", token);
          } else {
              client = new NlsClient(url, token);
          }
      }

      // myOrder and userParam are user-defined values made available to the callbacks.
      private static SpeechRecognizerListener getRecognizerListener(int myOrder, String userParam) {
          SpeechRecognizerListener listener = new SpeechRecognizerListener() {
              // Intermediate result. The server sends this message each time it recognizes more text.
              // It is sent only when setEnableIntermediateResult is set to true.
              @Override
              public void onRecognitionResultChanged(SpeechRecognizerResponse response) {
                  // getName(): the message name, RecognitionResultChanged.
                  // getStatus(): the status code; 20000000 indicates success.
                  // getRecognizedText(): the text recognized so far.
                  System.out.println("name: " + response.getName() + ", status: " + response.getStatus() + ", result: " + response.getRecognizedText());
              }

              // Recognition is complete; the response carries the final result.
              @Override
              public void onRecognitionCompleted(SpeechRecognizerResponse response) {
                  System.out.println("name: " + response.getName() + ", status: " + response.getStatus() + ", result: " + response.getRecognizedText());
              }

              @Override
              public void onStarted(SpeechRecognizerResponse response) {
                  System.out.println("myOrder: " + myOrder + "; myParam: " + userParam + "; task_id: " + response.getTaskId());
              }

              @Override
              public void onFail(SpeechRecognizerResponse response) {
                  // getStatus(): the status code; getStatusText(): the error message.
                  // task_id: the unique ID of the task. Record it; it is essential for troubleshooting.
                  System.out.println("task_id: " + response.getTaskId() + ", status: " + response.getStatus() + ", status_text: " + response.getStatusText());
              }
          };
          return listener;
      }

      // Calculate the playback duration, in milliseconds, of dataSize bytes of audio.
      // Assumes 16-bit samples and a single channel; other layouts are not supported.
      public static int getSleepDelta(int dataSize, int sampleRate) {
          return (dataSize * 10 * 8000) / (160 * sampleRate);
      }

      public void process(String filepath, int sampleRate) {
          SpeechRecognizer recognizer = null;
          try {
              String myParam = "user-param";
              int myOrder = 1234;
              SpeechRecognizerListener listener = getRecognizerListener(myOrder, myParam);
              recognizer = new SpeechRecognizer(client, listener);
              recognizer.setAppKey(appKey);
              // Specify the audio format.
              recognizer.setFormat(InputFormatEnum.PCM);
              // Specify the audio sample rate.
              if (sampleRate == 16000) {
                  recognizer.setSampleRate(SampleRateEnum.SAMPLE_RATE_16K);
              } else if (sampleRate == 8000) {
                  recognizer.setSampleRate(SampleRateEnum.SAMPLE_RATE_8K);
              }
              // Enable intermediate results.
              recognizer.setEnableIntermediateResult(true);

              long now = System.currentTimeMillis();
              recognizer.start();
              logger.info("ASR start latency : " + (System.currentTimeMillis() - now) + " ms");

              File file = new File(filepath);
              // try-with-resources closes the stream even if sending fails.
              try (FileInputStream fis = new FileInputStream(file)) {
                  byte[] b = new byte[3200];
                  int len;
                  while ((len = fis.read(b)) > 0) {
                      logger.info("send data pack length: " + len);
                      // Send only the bytes actually read; the last chunk may be shorter than the buffer.
                      recognizer.send(Arrays.copyOf(b, len));
                      // For real-time speech, do not sleep. For file input, sleep to simulate real-time streaming:
                      // 3200 bytes correspond to 200 ms at 8000 Hz and to 100 ms at 16000 Hz.
                      int deltaSleep = getSleepDelta(len, sampleRate);
                      Thread.sleep(deltaSleep);
                  }
              }

              now = System.currentTimeMillis();
              logger.info("ASR wait for complete");
              recognizer.stop();
              logger.info("ASR stop latency : " + (System.currentTimeMillis() - now) + " ms");
          } catch (Exception e) {
              System.err.println(e.getMessage());
          } finally {
              // Close the recognizer to release its connection.
              if (null != recognizer) {
                  recognizer.close();
              }
          }
      }

      public void shutdown() {
          client.shutdown();
      }

      public static void main(String[] args) throws Exception {
          String appKey = null;
          String token = null;
          String url = ""; // default: wss://nls-gateway-ap-southeast-1.aliyuncs.com/ws/v1
          if (args.length == 2) {
              appKey = args[0];
              token = args[1];
          } else if (args.length == 3) {
              appKey = args[0];
              token = args[1];
              url = args[2];
          } else {
              System.err.println("run error, need params(url is optional): " + "<app-key> <token> [url]");
              System.exit(-1);
          }
          SpeechRecognizerDemo demo = new SpeechRecognizerDemo(appKey, token, url);
          demo.process("./nls-sample-16k.wav", 16000);
          demo.shutdown();
      }
  }
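The sleep interval in the demo is the playback duration of the chunk that was just sent, so a file is streamed at roughly real-time speed. For 16-bit, single-channel PCM, one second of audio occupies sampleRate × 2 bytes, which gives the 100 ms and 200 ms figures quoted in the demo's comments. A minimal standalone sketch of that arithmetic follows; PcmPacing is a hypothetical name, not part of the SDK, and the 16-bit mono layout is the same assumption the demo makes.

```java
/** Hypothetical helper (not an SDK class): playback duration of a chunk of 16-bit mono PCM. */
final class PcmPacing {

    static final int BYTES_PER_SAMPLE = 2; // 16-bit samples
    static final int CHANNELS = 1;         // single channel

    /** Returns how many milliseconds of audio byteCount bytes represent at the given sample rate. */
    static int chunkDurationMillis(int byteCount, int sampleRate) {
        if (sampleRate <= 0) {
            throw new IllegalArgumentException("sampleRate must be positive");
        }
        long bytesPerSecond = (long) sampleRate * BYTES_PER_SAMPLE * CHANNELS;
        // Use long arithmetic so large chunks cannot overflow an int.
        return (int) (byteCount * 1000L / bytesPerSecond);
    }

    public static void main(String[] args) {
        System.out.println(chunkDurationMillis(3200, 16000)); // prints 100
        System.out.println(chunkDurationMillis(3200, 8000));  // prints 200
    }
}
```

This yields the same values as the demo's getSleepDelta, since both reduce to byteCount × 500 / sampleRate.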