SDK for Java - Intelligent Speech Interaction - Alibaba Cloud Documentation Center

The short sentence recognition service provides an SDK for Java. This topic describes how to download and install the SDK and provides sample code that shows how to use it.

Precautions

Before you use the SDK, make sure that you understand how the SDK works. For more information, see Overview.
The nls-sdk-short-asr SDK was renamed nls-sdk-recognizer in version 2.1.0. If you use the nls-sdk-short-asr SDK and need to upgrade, you must remove the old SDK and add callbacks as prompted.

Download and installation

Download the latest version of the SDK from the Maven repository, and download the nls-sdk-java-demo package.

Add the following dependency:

<dependency>
    <groupId>com.alibaba.nls</groupId>
    <artifactId>nls-sdk-recognizer</artifactId>
    <version>2.1.6</version>
</dependency>

Decompress the .zip demo package. Run the mvn package command from the directory that contains the pom.xml file. An executable JAR package named nls-example-recognizer-2.0.0-jar-with-dependencies.jar is generated in the target directory. Copy the JAR package to the destination server. You can use the JAR package for quick service validation and stress testing.

Service validation:

Run the following command and set parameters as prompted.

java -cp nls-example-recognizer-2.0.0-jar-with-dependencies.jar com.alibaba.nls.client.SpeechRecognizerDemo

After the command runs, the logs/nls.log file is generated in the directory where the command is run.

Stress testing:

Run the following command and set parameters as prompted.

Set the service URL to wss://nls-gateway-ap-southeast-1.aliyuncs.com/ws/v1. Provide .pcm audio files with a sampling rate of 16,000 Hz. Set the maximum number of concurrent calls based on your purchased resources.

java -jar nls-example-recognizer-2.0.0-jar-with-dependencies.jar

Note

You are charged if you make more than two concurrent calls to perform stress testing.

Key objects

NlsClient: the speech processing client. You can use this client to process short sentence recognition, real-time speech recognition, and speech synthesis tasks. This object is thread-safe. You can create a single NlsClient object and share it globally.
SpeechRecognizer: the short sentence recognition object. You can use this object to set request parameters, send a request, and send audio data. This object is not thread-safe.
SpeechRecognizerListener: the recognition result listener, which listens to recognition results. This object is not thread-safe.

For more information, see Java API overview.

Important

Notes on SDK calls:

The NlsClient object is built on Netty, so creating one consumes time and resources. However, a created NlsClient object can be reused. We recommend that you create and shut down the NlsClient object based on the lifecycle of your project.
The SpeechRecognizer object cannot be reused. You must create a SpeechRecognizer object for each recognition task. For example, to process N audio files, you must create N SpeechRecognizer objects to complete N recognition tasks.
One SpeechRecognizerListener object corresponds to one SpeechRecognizer object. You cannot use one SpeechRecognizerListener object for multiple SpeechRecognizer objects. Otherwise, you may fail to distinguish recognition tasks.
The SDK for Java depends on Netty. If your application also depends on Netty, make sure that the version of Netty is 4.1.17.Final or later.
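The lifecycle rules above (one shared client, one single-use recognition object per audio file) can be sketched without the SDK. The following is a minimal illustration only: SharedClient and OneShotTask are hypothetical stand-ins for the roles that NlsClient and SpeechRecognizer play, and they are not SDK classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class LifecycleSketch {
    // Hypothetical stand-in for the shared, thread-safe client (the role of NlsClient).
    static final class SharedClient {
        final AtomicInteger tasksCreated = new AtomicInteger();
        volatile boolean open = true;
        void shutdown() { open = false; }
    }

    // Hypothetical stand-in for a single-use task object (the role of SpeechRecognizer).
    static final class OneShotTask implements AutoCloseable {
        private final String file;
        OneShotTask(SharedClient client, String file) {
            if (!client.open) {
                throw new IllegalStateException("client is shut down");
            }
            client.tasksCreated.incrementAndGet();
            this.file = file;
        }
        String run() { return "done: " + file; }
        @Override
        public void close() { /* a real task would release its connection here */ }
    }

    // Processes N files with N task objects while sharing one client.
    static List<String> processAll(SharedClient client, List<String> files, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String f : files) {
                futures.add(pool.submit(() -> {
                    // A new task object per file; it is never reused.
                    try (OneShotTask task = new OneShotTask(client, f)) {
                        return task.run();
                    }
                }));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        SharedClient client = new SharedClient(); // created once for the application
        try {
            System.out.println(processAll(client, Arrays.asList("a.pcm", "b.pcm", "c.pcm"), 2));
        } finally {
            client.shutdown(); // shut down once, when the application ends
        }
    }
}
```

In a real application, the per-task object would also get its own listener, so that each callback can be attributed to exactly one recognition task.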

Sample code

Note

Download the nls-sample-16k.wav file.
The demo uses an audio file with a sampling rate of 16,000 Hz. To obtain an accurate recognition result, set the model to the universal model for the project to which the appkey is bound in the Intelligent Speech Interaction console. In actual use, select a model based on the audio sampling rate. For more information about model settings, see Manage projects.
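Because the model must match the audio sampling rate, it can help to confirm the rate of a file before you send it. The following is a minimal sketch that uses only the standard javax.sound.sampled API, not the SDK. The class name WavInfo and the method sampleRateOf are illustrative names, and the default path assumes that the sample file is in the working directory.

```java
import java.io.File;
import java.io.IOException;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;

public class WavInfo {
    // Reads the header of an audio file and returns its sampling rate in Hz.
    public static int sampleRateOf(File file) throws IOException, UnsupportedAudioFileException {
        AudioFormat format = AudioSystem.getAudioFileFormat(file).getFormat();
        return Math.round(format.getSampleRate());
    }

    public static void main(String[] args) throws Exception {
        // Assumption: the sample file is in the working directory unless a path is passed.
        File wav = new File(args.length > 0 ? args[0] : "./nls-sample-16k.wav");
        AudioFormat format = AudioSystem.getAudioFileFormat(wav).getFormat();
        System.out.println("sample rate: " + sampleRateOf(wav) + " Hz");
        System.out.println("bits per sample: " + format.getSampleSizeInBits());
        System.out.println("channels: " + format.getChannels());
    }
}
```

This check works only for files that carry a header, such as .wav files. Raw .pcm files have no header, so their sampling rate must be known in advance. A file that reports 8,000 Hz calls for an 8 kHz model and a sampleRate argument of 8000 in the sample code that follows.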

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import com.alibaba.nls.client.AccessToken;
import com.alibaba.nls.client.protocol.InputFormatEnum;
import com.alibaba.nls.client.protocol.NlsClient;
import com.alibaba.nls.client.protocol.SampleRateEnum;
import com.alibaba.nls.client.protocol.asr.SpeechRecognizer;
import com.alibaba.nls.client.protocol.asr.SpeechRecognizerListener;
import com.alibaba.nls.client.protocol.asr.SpeechRecognizerResponse;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
/**
 * The sample code demonstrates how to perform the following operations:
 *      Call the API of the short sentence recognition service.
 *      Dynamically obtain a token.
 *      Use local files to simulate the sending of real-time streams.
 *      Calculate time consumed for recognition.
 */
public class SpeechRecognizerDemo {
    private static final Logger logger = LoggerFactory.getLogger(SpeechRecognizerDemo.class);
    private String appKey;
    NlsClient client;
    public SpeechRecognizerDemo(String appKey, String id, String secret, String url) {
        this.appKey = appKey;
        // Globally create an NlsClient object. The default endpoint is the Internet access URL of the short sentence recognition service.
        // Obtain a token. You must obtain another token before the current token expires. You can call the accessToken.getExpireTime() method to query the expiration time of a token.
        AccessToken accessToken = new AccessToken(id, secret);
        try {
            accessToken.apply();
            System.out.println("get token: " + accessToken.getToken() + ", expire time: " + accessToken.getExpireTime());
            if(url.isEmpty()) {
                client = new NlsClient(accessToken.getToken());
            }else {
                client = new NlsClient(url, accessToken.getToken());
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    private static SpeechRecognizerListener getRecognizerListener(int myOrder, String userParam) {
        SpeechRecognizerListener listener = new SpeechRecognizerListener() {
            // Return intermediate results. This callback is invoked only if setEnableIntermediateResult is set to true.
            @Override
            public void onRecognitionResultChanged(SpeechRecognizerResponse response) {
                // getName returns the name of the event. getStatus returns the status code. getRecognizedText returns the recognized text.
                System.out.println("name: " + response.getName() + ", status: " + response.getStatus() + ", result: " + response.getRecognizedText());
            }
            // Recognition is completed.
            @Override
            public void onRecognitionCompleted(SpeechRecognizerResponse response) {
                // getName returns the name of the event. getStatus returns the status code. getRecognizedText returns the recognized text.
                System.out.println("name: " + response.getName() + ", status: " + response.getStatus() + ", result: " + response.getRecognizedText());
            }
            @Override
            public void onStarted(SpeechRecognizerResponse response) {
                System.out.println("myOrder: " + myOrder + "; myParam: " + userParam + "; task_id: " + response.getTaskId());
            }
            @Override
            public void onFail(SpeechRecognizerResponse response) {
                // task_id is the unique identifier that indicates the interaction between the caller and the server. If an error occurs, you can submit a ticket and provide the task ID to Alibaba Cloud to facilitate troubleshooting.
                System.out.println("task_id: " + response.getTaskId() + ", status: " + response.getStatus() + ", status_text: " + response.getStatusText());
            }
        };
        return listener;
    }
    // Calculate the playback duration, in milliseconds, that corresponds to the given size of binary audio data.
    // The sampling rate must be 8,000 Hz or 16,000 Hz.
    public static int getSleepDelta(int dataSize, int sampleRate) {
        // The formula assumes 16-bit samples and a single sound channel:
        // 160 bytes correspond to 10 ms of audio at 8,000 Hz.
        return (dataSize * 10 * 8000) / (160 * sampleRate);
    }
    public void process(String filepath, int sampleRate) {
        SpeechRecognizer recognizer = null;
        try {
            // Pass the user-defined parameters.
            String myParam = "user-param";
            int myOrder = 1234;
            SpeechRecognizerListener listener = getRecognizerListener(myOrder, myParam);
            recognizer = new SpeechRecognizer(client, listener);
            recognizer.setAppKey(appKey);
            // Set the audio encoding format. For an .opus file, set the format to InputFormatEnum.OPUS.
            recognizer.setFormat(InputFormatEnum.PCM);
            // Set the audio sampling rate.
            if(sampleRate == 16000) {
                recognizer.setSampleRate(SampleRateEnum.SAMPLE_RATE_16K);
            } else if(sampleRate == 8000) {
                recognizer.setSampleRate(SampleRateEnum.SAMPLE_RATE_8K);
            }
            // Specify whether to return intermediate results.
            recognizer.setEnableIntermediateResult(true);
            // Serialize the preceding parameter settings to JSON. Then, send the JSON to the server for confirmation.
            long now = System.currentTimeMillis();
            recognizer.start();
            logger.info("ASR start latency : " + (System.currentTimeMillis() - now) + " ms");
            File file = new File(filepath);
            // Use try-with-resources so that the file stream is closed even if sending fails.
            try (FileInputStream fis = new FileInputStream(file)) {
                byte[] b = new byte[3200];
                int len;
                while ((len = fis.read(b)) > 0) {
                    logger.info("send data pack length: " + len);
                    recognizer.send(b, len);
                    // In this example, a local file is read to simulate a real-time speech data stream. Because the file is read much faster than real time, you must sleep between packets.
                    // To recognize real-time speech, you do not need to sleep. If the audio sampling rate is 8,000 Hz, you must set the second parameter to 8000.
                    int deltaSleep = getSleepDelta(len, sampleRate);
                    Thread.sleep(deltaSleep);
                }
            }
            // Notify the server that all audio data has been sent. Wait until the server completes processing.
            now = System.currentTimeMillis();
            // Measure the latency. The time between the call to the stop method and the return of the response is regarded as the latency of the final recognition result.
            logger.info("ASR wait for complete");
            recognizer.stop();
            logger.info("ASR stop latency : " + (System.currentTimeMillis() - now) + " ms");
        } catch (Exception e) {
            System.err.println(e.getMessage());
        } finally {
            // Close the connection.
            if (null != recognizer) {
                recognizer.close();
            }
        }
    }
    public void shutdown() {
        client.shutdown();
    }
    public static void main(String[] args) throws Exception {
        String appKey = null; // Enter the appkey.
        String id = null; // Enter the AccessKey ID.
        String secret = null; // Enter the AccessKey secret.
        String url = ""; // Default value: wss://nls-gateway-ap-southeast-1.aliyuncs.com/ws/v1
        if (args.length == 3) {
            appKey   = args[0];
            id       = args[1];
            secret   = args[2];
        } else if (args.length == 4) {
            appKey   = args[0];
            id       = args[1];
            secret   = args[2];
            url      = args[3];
        } else {
            System.err.println("run error, need params(url is optional): " + "<app-key> <AccessKeyId> <AccessKeySecret> [url]");
            System.exit(-1);
        }
        SpeechRecognizerDemo demo = new SpeechRecognizerDemo(appKey, id, secret, url);
        // In this example, local files are used to simulate the sending of real-time streams.
        demo.process("./nls-sample-16k.wav", 16000);
        //demo.process("./nls-sample.opus", 16000);
        demo.shutdown();
    }
}
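The constants in getSleepDelta encode one assumption: the audio is 16-bit, single-channel PCM, for which 160 bytes correspond to 10 ms at 8,000 Hz. The following minimal sketch spells out the same arithmetic with the sample size and channel count as explicit parameters. PacingSketch and playbackMillis are illustrative names and are not part of the SDK.

```java
public class PacingSketch {
    // Returns the playback duration, in milliseconds, of dataSize bytes of raw PCM audio.
    public static long playbackMillis(int dataSize, int sampleRate, int bitsPerSample, int channels) {
        // Bytes consumed per second of audio: samples per second * bytes per sample * channels.
        long bytesPerSecond = (long) sampleRate * (bitsPerSample / 8) * channels;
        return dataSize * 1000L / bytesPerSecond;
    }

    public static void main(String[] args) {
        // A 3,200-byte packet of 16-bit mono audio, as sent by the sample code.
        System.out.println(playbackMillis(3200, 16000, 16, 1) + " ms at 16,000 Hz");
        System.out.println(playbackMillis(3200, 8000, 16, 1) + " ms at 8,000 Hz");
    }
}
```

For 16-bit mono audio, this returns the same values as getSleepDelta: a 3,200-byte packet represents 100 ms at 16,000 Hz and 200 ms at 8,000 Hz. The sample code therefore sends roughly ten packets per second when it simulates a 16,000 Hz stream.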