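Tips:
- Before you use the SDK, make sure that you have read the API reference.
- Starting from version 2.1.0, nls-sdk-short-asr is renamed to nls-sdk-recognizer. When you upgrade, remove the nls-sdk-short-asr dependency and add the required callback methods as prompted by the compiler.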
Download and installation
You can download the latest version of the SDK from the Maven repository:
<dependency>
    <groupId>com.alibaba.nls</groupId>
    <artifactId>nls-sdk-recognizer</artifactId>
    <version>2.1.0</version>
</dependency>
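For usage, see the code example below. Demo source code: download link.
After you decompress the demo, run mvn package in the directory that contains pom.xml. This generates an executable JAR file, nls-example-recognizer-2.0.0-jar-with-dependencies.jar, in the target directory. Copy this JAR file to the target server to quickly verify and stress test the service.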
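Service verification
Run the following command and provide the parameters as prompted. After the command runs, logs/nls.log is generated in the directory where you ran it.
java -cp nls-example-recognizer-2.0.0-jar-with-dependencies.jar com.alibaba.nls.client.SpeechRecognizerDemo
Stress testing
Run the following command and provide the parameters as prompted. Set the Alibaba Cloud service URL parameter to wss://nls-gateway-ap-southeast-1.aliyuncs.com/ws/v1, provide audio files in PCM format with a 16 kHz sample rate, and choose the number of concurrent requests carefully based on the concurrency that you have purchased.
java -jar nls-example-recognizer-2.0.0-jar-with-dependencies.jar
Note: Stress tests that you run yourself with more than 2 concurrent requests incur fees.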
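If you write your own stress test instead of using the packaged JAR file, you can enforce the concurrency cap in code. The following is a minimal sketch under stated assumptions, not part of the SDK: it uses a fixed thread pool sized to an assumed limit of 2 and reports the peak number of tasks that ran at the same time. The class name BoundedStressRun, the runAll helper, and the placeholder task are hypothetical. In a real test, each task would run one recognition request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper for illustration; not part of the NLS SDK.
class BoundedStressRun {

    // Runs taskCount copies of task on at most maxConcurrency threads and
    // returns the highest number of tasks observed running at the same time.
    static int runAll(int taskCount, int maxConcurrency, Runnable task) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(maxConcurrency);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        List<Future<?>> futures = new ArrayList<>();
        try {
            for (int i = 0; i < taskCount; i++) {
                futures.add(pool.submit(() -> {
                    int now = running.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    try {
                        task.run();
                    } finally {
                        running.decrementAndGet();
                    }
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // rethrows any task failure
            }
        } finally {
            pool.shutdown();
        }
        return peak.get();
    }

    public static void main(String[] args) throws Exception {
        // Placeholder workload; a real task would run one recognition request.
        Runnable placeholder = () -> {
            try {
                Thread.sleep(20);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        int peak = runAll(10, 2, placeholder);
        System.out.println("peak concurrency: " + peak);
    }
}
```

The pool size is the only place where the limit is set, so raising it to match a purchased concurrency level is a one-argument change.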
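Key interfaces
- NlsClient: the speech processing client. It acts as a factory for all speech processing classes. Create one instance globally. Thread-safe.
- SpeechRecognizer: the short sentence recognition class. Use it to set request parameters and to send the request and the audio data. Not thread-safe.
- SpeechRecognizerListener: the recognition result listener class. It listens for recognition results. Not thread-safe.
For more information, see the API documentation: Java API reference.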
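SDK usage notes
- Create the NlsClient object once and reuse it. NlsClient is built on the Netty framework, so creating it costs time and resources, but an existing instance can be reused. We recommend that you tie the creation and shutdown of NlsClient to the lifecycle of your application.
- A SpeechRecognizer object cannot be reused. Each recognition task requires its own SpeechRecognizer object. For example, N audio files require N recognition tasks and therefore N SpeechRecognizer objects.
- SpeechRecognizerListener objects and SpeechRecognizer objects map one to one. Do not set one SpeechRecognizerListener object on multiple SpeechRecognizer objects. Otherwise, you cannot tell which recognition task a result belongs to.
- The Java SDK depends on the Netty network library, version 4.1.17.Final or later. If your application also depends on Netty, make sure that its version meets this requirement.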
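The ownership rules above can be sketched without any SDK calls. In the following illustration, SharedClient, Task, and TaskListener are hypothetical stand-ins for NlsClient, SpeechRecognizer, and SpeechRecognizerListener. They only model who creates what and how long each object lives. They do not reproduce the real API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in types; they only model object lifetimes.
class LifecycleSketch {

    // Stand-in for NlsClient: created once and shared by all tasks.
    static class SharedClient implements AutoCloseable {
        @Override
        public void close() {
            // A real client would release its network resources here.
        }
    }

    // Stand-in for SpeechRecognizerListener: collects the results of exactly one task.
    static class TaskListener {
        final String file;
        final List<String> results = new ArrayList<>();

        TaskListener(String file) {
            this.file = file;
        }

        void onCompleted(String text) {
            results.add(text);
        }
    }

    // Stand-in for SpeechRecognizer: one instance per task, never reused.
    static class Task implements AutoCloseable {
        private final TaskListener listener;

        Task(SharedClient client, TaskListener listener) {
            this.listener = listener;
        }

        void run() {
            listener.onCompleted("result for " + listener.file);
        }

        @Override
        public void close() {
            // A real task would close its connection here.
        }
    }

    // N files lead to N listeners and N task objects, all on the same client.
    static List<TaskListener> recognizeAll(SharedClient client, List<String> files) {
        List<TaskListener> listeners = new ArrayList<>();
        for (String file : files) {
            TaskListener listener = new TaskListener(file);
            try (Task task = new Task(client, listener)) {
                task.run();
            }
            listeners.add(listener);
        }
        return listeners;
    }

    public static void main(String[] args) {
        // The client lives as long as the application does.
        try (SharedClient client = new SharedClient()) {
            for (TaskListener l : recognizeAll(client, Arrays.asList("a.wav", "b.wav"))) {
                System.out.println(l.file + " -> " + l.results);
            }
        }
    }
}
```

Because each listener belongs to one task, every result can be traced back to its file without any extra bookkeeping.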
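Code example
Note 1: The audio file used in the demo has a sample rate of 16,000 Hz. To get correct recognition results, set the model of the project that corresponds to your appKey to the universal model in the console. If you use other audio, set a model that supports that audio scenario. For model settings, see the Manage projects section.
nls-sample-16k.wav
Note 2: The demo shows how to set the URL when you create the NlsClient object:
client = new NlsClient("wss://nls-gateway-ap-southeast-1.aliyuncs.com/ws/v1", accessToken);
Note 3: For more code details and a multithreaded example, see the sample program in the SDK.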
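Because the sample rate is set by hand on the recognizer, it can help to confirm what a WAV file actually contains before you send it. The following is a minimal sketch that assumes a standard RIFF/WAVE layout and uses only the Java standard library. The WavHeader class and its pcmHeader helper are hypothetical and are not part of the SDK.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of the NLS SDK.
class WavHeader {
    final int channels;
    final int sampleRate;
    final int bitsPerSample;

    WavHeader(int channels, int sampleRate, int bitsPerSample) {
        this.channels = channels;
        this.sampleRate = sampleRate;
        this.bitsPerSample = bitsPerSample;
    }

    // Walks the RIFF chunks until the "fmt " chunk and decodes its first 16 bytes.
    static WavHeader read(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        byte[] riff = new byte[12];
        data.readFully(riff);
        if (!tag(riff, 0).equals("RIFF") || !tag(riff, 8).equals("WAVE")) {
            throw new IOException("not a RIFF/WAVE stream");
        }
        byte[] chunkHeader = new byte[8];
        while (true) {
            data.readFully(chunkHeader); // throws EOFException if there is no fmt chunk
            int size = ByteBuffer.wrap(chunkHeader, 4, 4).order(ByteOrder.LITTLE_ENDIAN).getInt();
            if (tag(chunkHeader, 0).equals("fmt ")) {
                if (size < 16) {
                    throw new IOException("fmt chunk too short");
                }
                byte[] fmt = new byte[16];
                data.readFully(fmt);
                ByteBuffer buf = ByteBuffer.wrap(fmt).order(ByteOrder.LITTLE_ENDIAN);
                buf.getShort();                // audio format tag (1 = PCM), not used here
                int channels = buf.getShort();
                int sampleRate = buf.getInt();
                buf.getInt();                  // byte rate
                buf.getShort();                // block align
                int bits = buf.getShort();
                return new WavHeader(channels, sampleRate, bits);
            }
            // Skip any other chunk; chunk bodies are padded to an even length.
            skipFully(data, (size & 0xFFFFFFFFL) + (size & 1));
        }
    }

    // Builds a canonical 44-byte PCM header, used here to exercise read().
    static byte[] pcmHeader(int channels, int sampleRate, int bits, int dataBytes) {
        ByteBuffer buf = ByteBuffer.allocate(44).order(ByteOrder.LITTLE_ENDIAN);
        buf.put(ascii("RIFF")).putInt(36 + dataBytes).put(ascii("WAVE"));
        buf.put(ascii("fmt ")).putInt(16).putShort((short) 1).putShort((short) channels);
        buf.putInt(sampleRate).putInt(sampleRate * channels * bits / 8);
        buf.putShort((short) (channels * bits / 8)).putShort((short) bits);
        buf.put(ascii("data")).putInt(dataBytes);
        return buf.array();
    }

    private static byte[] ascii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    private static String tag(byte[] b, int offset) {
        return new String(b, offset, 4, StandardCharsets.US_ASCII);
    }

    private static void skipFully(InputStream in, long n) throws IOException {
        while (n > 0) {
            long skipped = in.skip(n);
            if (skipped <= 0) {
                if (in.read() < 0) {
                    throw new EOFException("truncated WAV chunk");
                }
                skipped = 1;
            }
            n -= skipped;
        }
    }

    public static void main(String[] args) throws IOException {
        WavHeader h = read(new ByteArrayInputStream(pcmHeader(1, 16000, 16, 0)));
        System.out.println(h.sampleRate + " Hz, " + h.channels + " channel(s), " + h.bitsPerSample + " bit");
    }
}
```

To check a real file, pass a FileInputStream for that file to WavHeader.read and compare sampleRate with the value that you plan to set on the recognizer.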
Example:
package com.alibaba.nls.client;

import java.io.File;
import java.io.FileInputStream;

import com.alibaba.nls.client.protocol.InputFormatEnum;
import com.alibaba.nls.client.protocol.NlsClient;
import com.alibaba.nls.client.protocol.SampleRateEnum;
import com.alibaba.nls.client.protocol.asr.SpeechRecognizer;
import com.alibaba.nls.client.protocol.asr.SpeechRecognizerListener;
import com.alibaba.nls.client.protocol.asr.SpeechRecognizerResponse;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
 * For demonstration only.
 */
public class SpeechRecognizerDemo {
    private static final Logger logger = LoggerFactory.getLogger(SpeechRecognizerDemo.class);
    private String appKey;
    NlsClient client;

    public SpeechRecognizerDemo(String appKey, String token, String url) {
        this.appKey = appKey;
        // Create an NlsClient object. Create it once globally and reuse it for all tasks.
        // If no URL is given, fall back to the default endpoint.
        if (url == null || url.isEmpty()) {
            client = new NlsClient("wss://nls-gateway-ap-southeast-1.aliyuncs.com/ws/v1", token);
        } else {
            client = new NlsClient(url, token);
        }
    }

    // Create a listener. myOrder and userParam are user-defined values that are passed through to the callbacks.
    private static SpeechRecognizerListener getRecognizerListener(int myOrder, String userParam) {
        SpeechRecognizerListener listener = new SpeechRecognizerListener() {
            // Intermediate result. The server returns this message each time it recognizes more words.
            // This message is returned only when setEnableIntermediateResult is set to true.
            @Override
            public void onRecognitionResultChanged(SpeechRecognizerResponse response) {
                // getName(): the message name, RecognitionResultChanged.
                // getStatus(): the status code. The code 20000000 indicates that the request is successful.
                // getRecognizedText(): the recognized text.
                System.out.println("name: " + response.getName() + ", status: " + response.getStatus() + ", result: " + response.getRecognizedText());
            }

            // The recognition is completed.
            @Override
            public void onRecognitionCompleted(SpeechRecognizerResponse response) {
                System.out.println("name: " + response.getName() + ", status: " + response.getStatus() + ", result: " + response.getRecognizedText());
            }

            // Print the user-defined values together with the task ID.
            @Override
            public void onStarted(SpeechRecognizerResponse response) {
                System.out.println("myOrder: " + myOrder + "; myParam: " + userParam + "; task_id: " + response.getTaskId());
            }

            @Override
            public void onFail(SpeechRecognizerResponse response) {
                // getStatus(): the status code. getStatusText(): the error message.
                // task_id: the unique ID of the task. Record it, because it is important for troubleshooting.
                System.out.println("task_id: " + response.getTaskId() + ", status: " + response.getStatus() + ", status_text: " + response.getStatusText());
            }
        };
        return listener;
    }

    // Calculate the playback duration, in milliseconds, of the given number of audio bytes.
    public static int getSleepDelta(int dataSize, int sampleRate) {
        // 16-bit samples, that is, 2 bytes per sample
        int sampleBytes = 2;
        // only supports single channel
        int soundChannel = 1;
        return (dataSize * 1000) / (sampleRate * sampleBytes * soundChannel);
    }

    public void process(String filepath, int sampleRate) {
        SpeechRecognizer recognizer = null;
        try {
            String myParam = "user-param";
            int myOrder = 1234;
            SpeechRecognizerListener listener = getRecognizerListener(myOrder, myParam);
            recognizer = new SpeechRecognizer(client, listener);
            recognizer.setAppKey(appKey);
            // Specify the audio coding format.
            recognizer.setFormat(InputFormatEnum.PCM);
            // Specify the audio sample rate.
            if (sampleRate == 16000) {
                recognizer.setSampleRate(SampleRateEnum.SAMPLE_RATE_16K);
            } else if (sampleRate == 8000) {
                recognizer.setSampleRate(SampleRateEnum.SAMPLE_RATE_8K);
            }
            // Return intermediate results.
            recognizer.setEnableIntermediateResult(true);
            long now = System.currentTimeMillis();
            recognizer.start();
            logger.info("ASR start latency : " + (System.currentTimeMillis() - now) + " ms");
            File file = new File(filepath);
            // try-with-resources closes the stream even if sending fails.
            try (FileInputStream fis = new FileInputStream(file)) {
                byte[] b = new byte[3200];
                int len;
                while ((len = fis.read(b)) > 0) {
                    logger.info("send data pack length: " + len);
                    // Send only the bytes that were actually read. The last read can be shorter than the buffer.
                    recognizer.send(java.util.Arrays.copyOf(b, len));
                    // Sleep to simulate a real-time stream. Skip the sleep if the audio is captured in real time.
                    // 3,200 bytes correspond to 200 ms at a sample rate of 8,000 Hz and to 100 ms at 16,000 Hz.
                    int deltaSleep = getSleepDelta(len, sampleRate);
                    Thread.sleep(deltaSleep);
                }
            }
            now = System.currentTimeMillis();
            logger.info("ASR wait for complete");
            recognizer.stop();
            logger.info("ASR stop latency : " + (System.currentTimeMillis() - now) + " ms");
        } catch (Exception e) {
            System.err.println(e.getMessage());
        } finally {
            // Close the connection of this recognition task.
            if (null != recognizer) {
                recognizer.close();
            }
        }
    }

    public void shutdown() {
        client.shutdown();
    }

    public static void main(String[] args) throws Exception {
        String appKey = null;
        String token = null;
        String url = ""; // default: wss://nls-gateway-ap-southeast-1.aliyuncs.com/ws/v1
        if (args.length == 2) {
            appKey = args[0];
            token = args[1];
        } else if (args.length == 3) {
            appKey = args[0];
            token = args[1];
            url = args[2];
        } else {
            System.err.println("run error, need params(url is optional): " + "<app-key> <token> [url]");
            System.exit(-1);
        }
        SpeechRecognizerDemo demo = new SpeechRecognizerDemo(appKey, token, url);
        demo.process("./nls-sample-16k.wav", 16000);
        demo.shutdown();
    }
}
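The pacing that getSleepDelta applies in the demo amounts to dividing the chunk size by the byte rate of the audio. The following sketch spells that out under the assumption of 16-bit mono PCM, and also shows one way to split a buffer into chunks so that the last, shorter chunk is not padded. The ChunkPacing class and its method names are hypothetical and are not part of the SDK.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of the NLS SDK.
class ChunkPacing {
    static final int BYTES_PER_SAMPLE = 2; // assumption: 16-bit PCM
    static final int CHANNELS = 1;         // assumption: mono

    // Playback time, in milliseconds, of dataSize bytes of audio.
    static long durationMs(int dataSize, int sampleRate) {
        return (dataSize * 1000L) / ((long) sampleRate * BYTES_PER_SAMPLE * CHANNELS);
    }

    // Splits audio into chunks of at most chunkSize bytes.
    // The last chunk holds only the remaining bytes.
    static List<byte[]> split(byte[] audio, int chunkSize) {
        List<byte[]> chunks = new ArrayList<>();
        for (int off = 0; off < audio.length; off += chunkSize) {
            int end = Math.min(off + chunkSize, audio.length);
            chunks.add(Arrays.copyOfRange(audio, off, end));
        }
        return chunks;
    }

    public static void main(String[] args) {
        byte[] audio = new byte[8000]; // silent placeholder audio
        for (byte[] chunk : split(audio, 3200)) {
            System.out.println(chunk.length + " bytes -> sleep " + durationMs(chunk.length, 16000) + " ms");
        }
    }
}
```

With these assumptions, 3,200 bytes map to 100 ms at 16,000 Hz and to 200 ms at 8,000 Hz, which matches the values recommended in the demo comments.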