Java SDK 2.0

Last updated: 2020-11-03

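Note:

  • Before using the SDK, make sure you have read the API reference documentation.
  • Starting with version 2.1.0, nls-sdk-long-asr has been renamed to nls-sdk-transcriber. When upgrading, remove the nls-sdk-long-asr dependency and add the callback methods that the compiler reports as missing.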

Download and installation

Download the latest SDK version from the Maven repository:

    <dependency>
        <groupId>com.alibaba.nls</groupId>
        <artifactId>nls-sdk-transcriber</artifactId>
        <version>2.1.0</version>
    </dependency>

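See the code example below for usage. Demo source code download link

After extracting the demo, run mvn package in the directory that contains the pom file. This generates an executable JAR, nls-example-transcriber-2.0.0-jar-with-dependencies.jar, in the target directory. Copy this JAR to the target server to quickly verify and stress-test the service.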

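Service verification

Run the following command and supply the parameters as prompted. After it runs, the log file logs/nls.log is created in the directory where the command was executed.

    java -cp nls-example-transcriber-2.0.0-jar-with-dependencies.jar com.alibaba.nls.client.SpeechTranscriberDemo

Stress testing

Run the following command and supply the parameters as prompted.

    java -jar nls-example-transcriber-2.0.0-jar-with-dependencies.jar

The Alibaba Cloud service URL parameter is wss://nls-gateway-ap-southeast-1.aliyuncs.com/ws/v1. Provide audio files in PCM format with a 16 kHz sampling rate, and choose the concurrency carefully based on the concurrency quota you have purchased.

Note: Stress tests that you run yourself with more than 2 concurrent connections incur charges.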

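Key interfaces

  • NlsClient: the speech-processing client. It acts as a factory for all speech-processing classes. Create a single global instance. Thread-safe.
  • SpeechTranscriber: the real-time speech recognition class. Use it to set request parameters and to send the request and the audio data. Not thread-safe.
  • SpeechTranscriberListener: the listener class that receives real-time recognition results. Not thread-safe.

For more information, see the API documentation: Java API reference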

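SDK usage notes

  1. Create the NlsClient object once and reuse it. NlsClient is built on Netty, so creating it costs time and resources, but the instance can be reused afterwards. We recommend tying the creation and shutdown of NlsClient to the lifecycle of your application.
  2. A SpeechTranscriber object cannot be reused. Each recognition task needs its own SpeechTranscriber object. For example, N audio files require N recognition tasks and therefore N SpeechTranscriber objects.
  3. SpeechTranscriberListener objects and SpeechTranscriber objects map one-to-one. Do not set the same SpeechTranscriberListener object on multiple SpeechTranscriber objects; otherwise you cannot tell which recognition task a callback belongs to.
  4. The Java SDK depends on the Netty networking library, version 4.1.17.Final or later. If your application also depends on Netty, make sure its version meets this requirement.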

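Code example

Note: The audio file used in the demo has a 16000 Hz sampling rate. To get correct recognition results, set the model of the project that corresponds to your appKey to the general-purpose model in the console. If you use other audio, select a model that supports that audio scenario. For model settings, see the Manage projects section.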

nls-sample-16k.wav
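
Because the recognition result depends on the audio matching the configured sampling rate, it can help to check a WAV file before streaming it. The following is a minimal sketch, not part of the SDK: the class WavHeaderCheck and its methods are hypothetical names, and the parser assumes a canonical 44-byte header in which the "fmt " chunk directly follows "WAVE". Files with extra chunks before "fmt " would need a proper chunk walk.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not an SDK class): reads the format fields of a canonical WAV header.
final class WavHeaderCheck {
    static final int HEADER_BYTES = 44;

    private WavHeaderCheck() {}

    // Plain value holder for the three fields the demo settings depend on.
    static final class WavFormat {
        final int channels;
        final int sampleRateHz;
        final int bitsPerSample;

        WavFormat(int channels, int sampleRateHz, int bitsPerSample) {
            this.channels = channels;
            this.sampleRateHz = sampleRateHz;
            this.bitsPerSample = bitsPerSample;
        }
    }

    // Parses the first 44 bytes of a file. Assumes the canonical layout: RIFF, size, WAVE, "fmt ", ...
    static WavFormat parse(byte[] header) {
        if (header.length < HEADER_BYTES) {
            throw new IllegalArgumentException("need at least " + HEADER_BYTES + " header bytes");
        }
        String riff = new String(header, 0, 4, StandardCharsets.US_ASCII);
        String wave = new String(header, 8, 4, StandardCharsets.US_ASCII);
        String fmt = new String(header, 12, 4, StandardCharsets.US_ASCII);
        if (!riff.equals("RIFF") || !wave.equals("WAVE") || !fmt.equals("fmt ")) {
            throw new IllegalArgumentException("not a canonical RIFF/WAVE header");
        }
        // All multi-byte WAV header fields are little-endian.
        ByteBuffer buf = ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN);
        int channels = buf.getShort(22) & 0xFFFF;
        int sampleRateHz = buf.getInt(24);
        int bitsPerSample = buf.getShort(34) & 0xFFFF;
        return new WavFormat(channels, sampleRateHz, bitsPerSample);
    }

    // True when the audio is mono, 16 kHz, 16-bit, which is what the demo settings assume.
    static boolean matchesDemoSettings(WavFormat f) {
        return f.channels == 1 && f.sampleRateHz == 16000 && f.bitsPerSample == 16;
    }

    // Builds a canonical PCM header; used here only to exercise parse() without a real file.
    static byte[] buildHeader(int channels, int sampleRateHz, int bitsPerSample, int dataBytes) {
        int blockAlign = channels * bitsPerSample / 8;
        ByteBuffer buf = ByteBuffer.allocate(HEADER_BYTES).order(ByteOrder.LITTLE_ENDIAN);
        buf.put("RIFF".getBytes(StandardCharsets.US_ASCII));
        buf.putInt(36 + dataBytes);
        buf.put("WAVE".getBytes(StandardCharsets.US_ASCII));
        buf.put("fmt ".getBytes(StandardCharsets.US_ASCII));
        buf.putInt(16);                 // size of the fmt chunk for PCM
        buf.putShort((short) 1);        // audio format 1 = PCM
        buf.putShort((short) channels);
        buf.putInt(sampleRateHz);
        buf.putInt(sampleRateHz * blockAlign);
        buf.putShort((short) blockAlign);
        buf.putShort((short) bitsPerSample);
        buf.put("data".getBytes(StandardCharsets.US_ASCII));
        buf.putInt(dataBytes);
        return buf.array();
    }
}
```

To use it on a real file, read the first 44 bytes into an array and pass them to parse. If matchesDemoSettings returns false, either convert the audio or choose a sampling rate setting and model that match it.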

Example:

    package com.alibaba.nls.client;

    import java.io.File;
    import java.io.FileInputStream;
    import java.util.Arrays;

    import com.alibaba.nls.client.protocol.InputFormatEnum;
    import com.alibaba.nls.client.protocol.NlsClient;
    import com.alibaba.nls.client.protocol.SampleRateEnum;
    import com.alibaba.nls.client.protocol.asr.SpeechTranscriber;
    import com.alibaba.nls.client.protocol.asr.SpeechTranscriberListener;
    import com.alibaba.nls.client.protocol.asr.SpeechTranscriberResponse;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    /**
     * This example demonstrates:
     * - calling the real-time ASR API
     * - using a token passed on the command line
     * - simulating a real-time stream from a local file
     * - measuring recognition latency
     * (For demonstration only; adapt it to your own requirements.)
     */
    public class SpeechTranscriberDemo {
        private String appKey;
        private NlsClient client;
        private static final Logger logger = LoggerFactory.getLogger(SpeechTranscriberDemo.class);

        public SpeechTranscriberDemo(String appKey, String token, String url) {
            this.appKey = appKey;
            // Create an NlsClient object. You can create one NlsClient globally and specify the endpoint.
            if (url.isEmpty()) {
                client = new NlsClient("wss://nls-gateway-ap-southeast-1.aliyuncs.com/ws/v1", token);
            } else {
                client = new NlsClient(url, token);
            }
        }

        private static SpeechTranscriberListener getTranscriberListener() {
            SpeechTranscriberListener listener = new SpeechTranscriberListener() {
                // Intermediate results. The server returns this message when it recognizes a word.
                // This message is returned only when setEnableIntermediateResult is set to true.
                @Override
                public void onTranscriptionResultChange(SpeechTranscriberResponse response) {
                    System.out.println("task_id: " + response.getTaskId() +
                        ", name: " + response.getName() +
                        // The status code. The code 20000000 indicates that the request is successful.
                        ", status: " + response.getStatus() +
                        // The sequence number of the sentence, which starts from 1.
                        ", index: " + response.getTransSentenceIndex() +
                        // The recognition result of the sentence.
                        ", result: " + response.getTransSentenceText() +
                        // The duration of the audio processed so far, in milliseconds.
                        ", time: " + response.getTransSentenceTime());
                }

                // The server has accepted the request and the task has started.
                @Override
                public void onTranscriberStart(SpeechTranscriberResponse response) {
                    System.out.println("task_id: " + response.getTaskId() + ", name: " + response.getName() + ", status: " + response.getStatus());
                }

                // The server has detected the beginning of a sentence.
                @Override
                public void onSentenceBegin(SpeechTranscriberResponse response) {
                    System.out.println("task_id: " + response.getTaskId() + ", name: " + response.getName() + ", status: " + response.getStatus());
                }

                // A complete sentence has been recognized. The server detects the beginning and end of
                // each sentence and returns this message when it detects the end.
                @Override
                public void onSentenceEnd(SpeechTranscriberResponse response) {
                    System.out.println("task_id: " + response.getTaskId() +
                        ", name: " + response.getName() +
                        // The status code. The code 20000000 indicates that the request is successful.
                        ", status: " + response.getStatus() +
                        // The sequence number of the sentence, which starts from 1.
                        ", index: " + response.getTransSentenceIndex() +
                        // The recognition result of the sentence.
                        ", result: " + response.getTransSentenceText() +
                        // The confidence level.
                        ", confidence: " + response.getConfidence() +
                        // The time when the server detected the beginning of the sentence.
                        ", begin_time: " + response.getSentenceBeginTime() +
                        // The duration of the audio processed so far, in milliseconds.
                        ", time: " + response.getTransSentenceTime());
                }

                // The recognition task is complete.
                @Override
                public void onTranscriptionComplete(SpeechTranscriberResponse response) {
                    System.out.println("task_id: " + response.getTaskId() + ", name: " + response.getName() + ", status: " + response.getStatus());
                }

                // The recognition task failed.
                @Override
                public void onFail(SpeechTranscriberResponse response) {
                    System.out.println("task_id: " + response.getTaskId() + ", status: " + response.getStatus() + ", status_text: " + response.getStatusText());
                }
            };
            return listener;
        }

        // Calculates how long the given number of audio bytes takes to play, in milliseconds.
        public static int getSleepDelta(int dataSize, int sampleRate) {
            // 16-bit samples, that is, 2 bytes per sample
            int sampleBytes = 2;
            // Only a single channel is supported
            int soundChannel = 1;
            return (dataSize * 1000) / (sampleRate * sampleBytes * soundChannel);
        }

        public void process(String filepath) {
            SpeechTranscriber transcriber = null;
            try {
                // Create the transcriber object and establish a connection
                transcriber = new SpeechTranscriber(client, getTranscriberListener());
                transcriber.setAppKey(appKey);
                // Specify the audio coding format
                transcriber.setFormat(InputFormatEnum.PCM);
                // Specify the audio sampling rate
                transcriber.setSampleRate(SampleRateEnum.SAMPLE_RATE_16K);
                // Specify whether to return intermediate results
                transcriber.setEnableIntermediateResult(false);
                // Specify whether to add punctuation marks to the recognition result
                transcriber.setEnablePunctuation(true);
                // Specify whether to enable inverse text normalization (ITN).
                // A value of true converts Chinese numerals to Arabic numerals.
                transcriber.setEnableITN(false);
                // Serialize the preceding parameters to JSON and send them to the server for confirmation
                transcriber.start();

                File file = new File(filepath);
                // try-with-resources closes the stream even if sending fails
                try (FileInputStream fis = new FileInputStream(file)) {
                    byte[] b = new byte[3200];
                    int len;
                    while ((len = fis.read(b)) > 0) {
                        logger.info("send data pack length: " + len);
                        // Send only the bytes actually read; the last chunk may be shorter than the buffer.
                        transcriber.send(Arrays.copyOf(b, len));
                        // For live audio, do not sleep. For an 8 kHz file, change the second argument to 8000.
                        // 3200 bytes correspond to 200 ms at 8000 Hz and to 100 ms at 16000 Hz.
                        int deltaSleep = getSleepDelta(len, 16000);
                        Thread.sleep(deltaSleep);
                    }
                }

                // Notify the server that all audio data has been sent and wait for its completion message.
                long now = System.currentTimeMillis();
                logger.info("ASR wait for complete");
                transcriber.stop();
                logger.info("ASR latency : " + (System.currentTimeMillis() - now) + " ms");
            } catch (Exception e) {
                System.err.println(e.getMessage());
            } finally {
                if (null != transcriber) {
                    transcriber.close();
                }
            }
        }

        public void shutdown() {
            client.shutdown();
        }

        public static void main(String[] args) throws Exception {
            String appKey = null;
            String token = null;
            // Leave empty to use the default: wss://nls-gateway-ap-southeast-1.aliyuncs.com/ws/v1
            String url = "";
            if (args.length == 2) {
                appKey = args[0];
                token = args[1];
            } else if (args.length == 3) {
                appKey = args[0];
                token = args[1];
                url = args[2];
            } else {
                System.err.println("run error, need params(url is optional): " + "<app-key> <token> [url]");
                System.exit(-1);
            }
            String filepath = "nls-sample-16k.wav";
            SpeechTranscriberDemo demo = new SpeechTranscriberDemo(appKey, token, url);
            demo.process(filepath);
            demo.shutdown();
        }
    }
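
The pacing used by getSleepDelta in the example can be checked in isolation. The sketch below assumes mono, 16-bit PCM, where one second of audio occupies sampleRate × 2 bytes, so a chunk of n bytes lasts n × 1000 / (sampleRate × 2) milliseconds. The class name PcmPacing and the method chunkDurationMs are hypothetical and not part of the SDK.

```java
// Hypothetical helper (not an SDK class): converts a PCM chunk size to its playback time.
final class PcmPacing {
    private PcmPacing() {}

    // Returns the playback time in milliseconds of byteCount bytes of mono 16-bit PCM.
    static int chunkDurationMs(int byteCount, int sampleRateHz) {
        if (byteCount < 0 || sampleRateHz <= 0) {
            throw new IllegalArgumentException("byteCount must be >= 0 and sampleRateHz > 0");
        }
        final int bytesPerSample = 2; // 16-bit samples
        final int channels = 1;       // mono
        long bytesPerSecond = (long) sampleRateHz * bytesPerSample * channels;
        // Use long arithmetic so large chunks do not overflow int.
        return (int) (byteCount * 1000L / bytesPerSecond);
    }

    public static void main(String[] args) {
        System.out.println(chunkDurationMs(3200, 16000)); // prints 100
        System.out.println(chunkDurationMs(3200, 8000));  // prints 200
    }
}
```

The two printed values agree with the comment in the example: a 3200-byte chunk corresponds to 100 ms at 16000 Hz and 200 ms at 8000 Hz. Sleeping for that long after each chunk makes a file look like a live stream to the server; live audio needs no sleep because the capture device already paces the data.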