Intelligent Speech Interaction:SDK for Java

Last Updated:Jan 15, 2024

The speech synthesis service provides an SDK for Java. This topic describes how to download and install the SDK and provides sample code that shows how to use it.

Prerequisites

You understand how the SDK works. For more information, see Overview.

Download and installation

Download the latest version of the SDK from the Maven repository by adding the following dependency. A demo package, nls-sdk-java-demo, is also available for download.

<dependency>
    <groupId>com.alibaba.nls</groupId>
    <artifactId>nls-sdk-tts</artifactId>
    <version>2.1.6</version>
</dependency>

Decompress the .zip demo package. Run the mvn package command from the pom directory. An executable JAR package named nls-example-tts-2.0.0-jar-with-dependencies.jar is generated in the target directory. Copy the JAR package to the destination server. You can use the JAR package for quick service validation and stress testing.

Service validation:

Run the following command and set the parameters as prompted.

The logs/nls.log file is then generated in the directory where you run the command.

java -cp nls-example-tts-2.0.0-jar-with-dependencies.jar com.alibaba.nls.client.SpeechSynthesizerDemo

Stress testing:

Run the following command and set parameters as prompted.

Set the service URL to wss://nls-gateway.cn-shanghai.aliyuncs.com/ws/v1. Set the maximum number of concurrent calls based on your purchased resources.

java -jar nls-example-tts-2.0.0-jar-with-dependencies.jar

Note

You are charged if you make more than two concurrent calls to perform stress testing.
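
The concurrency ceiling described above can also be enforced on the client side before a stress test starts. The following is a minimal sketch, assuming a hypothetical BoundedRunner helper that is not part of the SDK. It uses a java.util.concurrent.Semaphore so that no more than a chosen number of tasks run at the same time, and each Runnable stands in for one synthesis call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper (not an SDK class) that caps the number of tasks running at the same time. */
final class BoundedRunner {
    private final Semaphore permits;
    private final AtomicInteger running = new AtomicInteger();
    private final AtomicInteger peak = new AtomicInteger();

    BoundedRunner(int maxConcurrent) {
        if (maxConcurrent < 1) {
            throw new IllegalArgumentException("maxConcurrent must be >= 1");
        }
        this.permits = new Semaphore(maxConcurrent);
    }

    /** Runs all tasks on a thread pool and blocks until every task has finished. */
    void runAll(List<Runnable> tasks, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (Runnable task : tasks) {
                futures.add(pool.submit(() -> {
                    // Wait for a permit so that at most maxConcurrent tasks run at once.
                    permits.acquireUninterruptibly();
                    try {
                        int now = running.incrementAndGet();
                        peak.accumulateAndGet(now, Math::max);
                        task.run();
                    } finally {
                        running.decrementAndGet();
                        permits.release();
                    }
                }));
            }
            for (Future<?> f : futures) {
                try {
                    f.get();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException(e.getCause());
                }
            }
        } finally {
            pool.shutdown();
        }
    }

    /** Highest number of tasks observed running simultaneously. */
    int peakConcurrency() {
        return peak.get();
    }
}
```

With new BoundedRunner(2), the observed peak never exceeds two, regardless of the thread pool size.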

Key objects

  • NlsClient: the speech processing client. You can use the client to process short sentence recognition, real-time speech recognition, and speech synthesis tasks. This object is thread-safe. You can globally create one NlsClient object.

  • SpeechSynthesizer: the speech synthesis processor. You can use this object to set request parameters and send a request. This object is not thread-safe.

  • SpeechSynthesizerListener: the speech synthesis result listener, which listens for synthesis results. This object is not thread-safe. It declares the following two abstract methods, which you must implement:

      /**
       * Receive the synthesized audio data in binary format.
       */
      abstract public void onMessage(ByteBuffer message);
      /**
       * Notify the client that the synthesis task is completed.
       *
       * @param response
       */
      abstract public void onComplete(SpeechSynthesizerResponse response);

    For more information, see Java API overview.

Important

Notes on SDK calls:

  • You can globally create one NlsClient object and reuse it as required. Because NlsClient is built on Netty, creating an NlsClient object consumes time and resources, but the created object can be reused. We recommend that you create and shut down the NlsClient object based on the lifecycle of your project.

  • The SpeechSynthesizer object cannot be reused. You must create a SpeechSynthesizer object for each synthesis task. For example, to synthesize speech from N text files, you must create N SpeechSynthesizer objects to complete N synthesis tasks.

  • One SpeechSynthesizerListener object corresponds to one SpeechSynthesizer object. You cannot use one SpeechSynthesizerListener object for multiple SpeechSynthesizer objects. Otherwise, you may fail to distinguish synthesis tasks.

  • The SDK for Java depends on Netty. If your application depends on Netty, make sure that the version of Netty is 4.1.17.Final or later.
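
The reuse rules above come down to one long-lived client plus a fresh synthesizer and listener pair for each text. A minimal sketch of that shape follows. FakeClient, FakeSynthesizer, and FakeListener are stand-ins invented for illustration so that the example runs without the SDK. They only mirror the roles of NlsClient, SpeechSynthesizer, and SpeechSynthesizerListener and do not reproduce their real behavior.

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-in for the shared, thread-safe client (illustrative only, not an SDK type). */
final class FakeClient {
    private boolean open = true;

    boolean isOpen() { return open; }

    void shutdown() { open = false; }
}

/** Stand-in for a per-task result listener. */
final class FakeListener {
    final StringBuilder received = new StringBuilder();

    void onMessage(String chunk) { received.append(chunk); }
}

/** Stand-in for a single-use synthesizer that is bound to exactly one listener. */
final class FakeSynthesizer implements AutoCloseable {
    private final FakeClient client;
    private final FakeListener listener;
    private boolean used;

    FakeSynthesizer(FakeClient client, FakeListener listener) {
        this.client = client;
        this.listener = listener;
    }

    void synthesize(String text) {
        if (!client.isOpen()) {
            throw new IllegalStateException("client is shut down");
        }
        if (used) {
            throw new IllegalStateException("create a new synthesizer for each task");
        }
        used = true;
        listener.onMessage("audio(" + text + ")"); // pretend that audio data arrives
    }

    @Override
    public void close() {
        // A real synthesizer would release its per-task connection here.
    }
}

final class LifecycleSketch {
    /** Synthesizes N texts with one client and N synthesizer/listener pairs. */
    static List<String> synthesizeAll(List<String> texts) {
        FakeClient client = new FakeClient(); // created once for the whole run
        List<String> results = new ArrayList<>();
        try {
            for (String text : texts) {
                FakeListener listener = new FakeListener(); // one listener per task
                try (FakeSynthesizer synthesizer = new FakeSynthesizer(client, listener)) {
                    synthesizer.synthesize(text);
                }
                results.add(listener.received.toString());
            }
        } finally {
            client.shutdown(); // shut down together with the application
        }
        return results;
    }
}
```

Because each listener belongs to exactly one task, the results cannot be mixed up between tasks.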

Sample code

Note

  • The demo uses the default Internet access URL that is built in the SDK to access the speech synthesis service. To use an Elastic Compute Service (ECS) instance in the China (Shanghai) region to access the service over an internal network, you must set the URL for internal access when you create the NlsClient object.

    client = new NlsClient("ws://nls-gateway.cn-shanghai-internal.aliyuncs.com/ws/v1", accessToken);

  • In the demo, the synthesized audio is stored in a file. If you need to play the synthesized audio in real time, we recommend that you use stream playback. The stream playback mode allows you to play the synthesized audio while audio data is being received. You do not need to wait until the synthesis task is completed. This reduces latency.
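
The stream playback mode described above amounts to a producer/consumer handoff: the listener callback enqueues each audio chunk as it arrives, and a separate thread consumes the chunks without waiting for the task to finish. The following is a minimal sketch under that assumption. AudioChunkPipe is a hypothetical helper, not an SDK class, and the OutputStream stands in for whatever audio sink you use.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical helper (not an SDK class): hands audio chunks from a callback thread to a player thread. */
final class AudioChunkPipe {
    // Sentinel instance that marks the end of the stream; compared by identity.
    private static final byte[] END = new byte[0];
    private final BlockingQueue<byte[]> queue = new LinkedBlockingQueue<>();

    /** Call from onMessage. The data is copied because the caller may reuse the buffer. */
    void offer(ByteBuffer message) {
        byte[] chunk = new byte[message.remaining()];
        message.get(chunk);
        if (chunk.length > 0) {
            queue.add(chunk);
        }
    }

    /** Call from onComplete or onFail to stop the consumer. */
    void finish() {
        queue.add(END);
    }

    /** Run on the player thread. Writes chunks to the sink as they arrive and returns the total bytes written. */
    long drainTo(OutputStream sink) throws IOException, InterruptedException {
        long total = 0;
        while (true) {
            byte[] chunk = queue.take(); // blocks until the next chunk or the sentinel arrives
            if (chunk == END) {
                break;
            }
            sink.write(chunk);
            total += chunk.length;
        }
        sink.flush();
        return total;
    }
}
```

In a listener, onMessage would call offer(message) and onComplete would call finish(), while a player thread runs drainTo against the audio sink.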

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import com.alibaba.nls.client.AccessToken;
import com.alibaba.nls.client.protocol.NlsClient;
import com.alibaba.nls.client.protocol.OutputFormatEnum;
import com.alibaba.nls.client.protocol.SampleRateEnum;
import com.alibaba.nls.client.protocol.tts.SpeechSynthesizer;
import com.alibaba.nls.client.protocol.tts.SpeechSynthesizerListener;
import com.alibaba.nls.client.protocol.tts.SpeechSynthesizerResponse;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
/**
 * The sample code demonstrates how to perform the following operations:
 *      Call the API of the speech synthesis service.
 *      Dynamically obtain a token.
 *      Synthesize speech from text in stream mode.
 *      Calculate the first packet latency.
 */
public class SpeechSynthesizerDemo {
    private static final Logger logger = LoggerFactory.getLogger(SpeechSynthesizerDemo.class);
    private static long startTime;
    private String appKey;
    NlsClient client;
    public SpeechSynthesizerDemo(String appKey, String accessKeyId, String accessKeySecret) {
        this.appKey = appKey;
        // Globally create an NlsClient object. The default endpoint is the Internet access URL of the speech synthesis service.
        // Obtain a token. You must obtain another token before the current token expires. You can call the accessToken.getExpireTime() method to query the expiration time of a token.
        AccessToken accessToken = new AccessToken(accessKeyId, accessKeySecret);
        try {
            accessToken.apply();
            System.out.println("get token: " + accessToken.getToken() + ", expire time: " + accessToken.getExpireTime());
            client = new NlsClient(accessToken.getToken());
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    public SpeechSynthesizerDemo(String appKey, String accessKeyId, String accessKeySecret, String url) {
        this.appKey = appKey;
        AccessToken accessToken = new AccessToken(accessKeyId, accessKeySecret);
        try {
            accessToken.apply();
            System.out.println("get token: " + accessToken.getToken() + ", expire time: " + accessToken.getExpireTime());
            if(url.isEmpty()) {
                client = new NlsClient(accessToken.getToken());
            }else {
                client = new NlsClient(url, accessToken.getToken());
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    private static SpeechSynthesizerListener getSynthesizerListener() {
        SpeechSynthesizerListener listener = null;
        try {
            listener = new SpeechSynthesizerListener() {
                File f=new File("tts_test.wav");
                FileOutputStream fout = new FileOutputStream(f);
                private boolean firstRecvBinary = true;
                // Called when the synthesis task is completed.
                @Override
                public void onComplete(SpeechSynthesizerResponse response) {
                    // An onComplete event indicates that all synthesized audio data has been received. The latency measured at this point covers the entire synthesis task and may be too large for real-time playback.
                    System.out.println("name: " + response.getName() +
                        ", status: " + response.getStatus()+
                        ", output file :"+f.getAbsolutePath()
                    );
                    // Close the output file because no more audio data will arrive.
                    try {
                        fout.close();
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                }
                // Receive the synthesized binary audio data.
                @Override
                public void onMessage(ByteBuffer message) {
                    try {
                        if(firstRecvBinary) {
                            // Calculate the first packet latency when the client receives audio data for the first time. The client can start playback as soon as the first audio packet arrives, which improves the response speed, especially during real-time speech interaction.
                            firstRecvBinary = false;
                            long now = System.currentTimeMillis();
                            logger.info("tts first latency : " + (now - SpeechSynthesizerDemo.startTime) + " ms");
                        }
                        byte[] bytesArray = new byte[message.remaining()];
                        message.get(bytesArray, 0, bytesArray.length);
                        fout.write(bytesArray);
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                }
                @Override
                public void onFail(SpeechSynthesizerResponse response){
                    // The task ID is the unique identifier that indicates interaction between the caller and the server. You must record the task ID. If an error occurs, you can submit a ticket and provide the task ID to Alibaba Cloud to facilitate troubleshooting.
                    System.out.println(
                        "task_id: " + response.getTaskId() +
                            // The status code. The code 20000000 indicates that the request is successful.
                            ", status: " + response.getStatus() +
                            // The error message.
                            ", status_text: " + response.getStatusText());
                }
            };
        } catch (Exception e) {
            e.printStackTrace();
        }
        return listener;
    }
    public void process() {
        SpeechSynthesizer synthesizer = null;
        try {
            // Create an object and establish a connection.
            synthesizer = new SpeechSynthesizer(client, getSynthesizerListener());
            synthesizer.setAppKey(appKey);
            // Set the audio coding format of the returned audio file.
            synthesizer.setFormat(OutputFormatEnum.WAV);
            // Set the audio sampling rate of the returned audio file.
            synthesizer.setSampleRate(SampleRateEnum.SAMPLE_RATE_16K);
            // Set the speaker (voice) type.
            synthesizer.setVoice("siyue");
            // Optional. Set the pitch of the speaker. Valid values: -500 to 500. Default value: 0.
            synthesizer.setPitchRate(100);
            // Optional. Set the speech rate of the speaker. Valid values: -500 to 500. Default value: 0.
            synthesizer.setSpeechRate(100);
            // Set the text used for speech synthesis.
            synthesizer.setText("Welcome to use the Alibaba Cloud intelligent speech synthesis service. You can say what's the weather like in Beijing tomorrow.");
            // Specify whether to enable subtitling for the generated speech. By default, this feature is disabled. Note that not all the speaker types support subtitling.
            synthesizer.addCustomedParam("enable_subtitle", false);
            // Serialize the preceding parameter settings into JSON, send the JSON message to the server, and wait for confirmation.
            long start = System.currentTimeMillis();
            synthesizer.start();
            logger.info("tts start latency " + (System.currentTimeMillis() - start) + " ms");
            SpeechSynthesizerDemo.startTime = System.currentTimeMillis();
            // Wait until the synthesis task is completed.
            synthesizer.waitForComplete();
            logger.info("tts stop latency " + (System.currentTimeMillis() - start) + " ms");
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            // Disconnect the client from the server.
            if (null != synthesizer) {
                synthesizer.close();
            }
        }
    }
    public void shutdown() {
        client.shutdown();
    }
    public static void main(String[] args) throws Exception {
        String appKey = "Your appkey";
        String id = "Your AccessKey ID";
        String secret = "Your AccessKey secret";
        String url = ""; // Default value: wss://nls-gateway.cn-shanghai.aliyuncs.com/ws/v1.
        if (args.length == 3) {
            appKey   = args[0];
            id       = args[1];
            secret   = args[2];
        } else if (args.length == 4) {
            appKey   = args[0];
            id       = args[1];
            secret   = args[2];
            url      = args[3];
        } else {
            System.err.println("run error, need params(url is optional): " + "<app-key> <AccessKeyId> <AccessKeySecret> [url]");
            System.exit(-1);
        }
        SpeechSynthesizerDemo demo = new SpeechSynthesizerDemo(appKey, id, secret, url);
        demo.process();
        demo.shutdown();
    }
}
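
The constructor comment in the sample notes that a new token must be obtained before the current one expires. One way to handle this is to cache the token and request a new one shortly before expiry. The sketch below assumes that the expiration time is a Unix timestamp in seconds (verify the value returned by accessToken.getExpireTime() in your environment) and uses a hypothetical TokenFetcher interface in place of the SDK's AccessToken class so that the example runs standalone.

```java
import java.util.function.LongSupplier;

/** Hypothetical token cache (not an SDK class) that refreshes a token shortly before it expires. */
final class TokenCache {
    /** A fetched token and its expiry, assumed to be Unix time in seconds. */
    static final class Token {
        final String value;
        final long expireEpochSeconds;

        Token(String value, long expireEpochSeconds) {
            this.value = value;
            this.expireEpochSeconds = expireEpochSeconds;
        }
    }

    /** Stand-in for "apply for a token". Wrap the real token request here. */
    interface TokenFetcher {
        Token fetch();
    }

    private final TokenFetcher fetcher;
    private final LongSupplier nowEpochSeconds;
    private final long refreshMarginSeconds;
    private Token current;

    TokenCache(TokenFetcher fetcher, LongSupplier nowEpochSeconds, long refreshMarginSeconds) {
        this.fetcher = fetcher;
        this.nowEpochSeconds = nowEpochSeconds;
        this.refreshMarginSeconds = refreshMarginSeconds;
    }

    /** Returns the cached token, fetching a new one if none exists or expiry is within the margin. */
    synchronized String get() {
        long now = nowEpochSeconds.getAsLong();
        if (current == null || now >= current.expireEpochSeconds - refreshMarginSeconds) {
            current = fetcher.fetch();
        }
        return current.value;
    }
}
```

In an application, the clock could be () -> System.currentTimeMillis() / 1000 with a margin of a few minutes. Injecting the clock keeps the refresh logic testable without waiting for a real token to expire.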