SDK for Java

Last Updated: Sep 08, 2020

This topic describes how to use the SDK for Java provided by Alibaba Cloud Intelligent Speech Interaction. It covers SDK installation and provides sample code.


  • Before you use this SDK, make sure that you understand how this SDK works. For more information, see Overview.

  • To run the demo project, you must update the SDK to the latest version. To synthesize speech from long text, you must use SDK 2.1.1 or later.

Download and installation

You can download the latest version of the SDK from the Maven repository. To download the demo project, click here.

The dependency file is as follows:


Decompress the demo package and run the mvn package command in the directory that contains the pom.xml file. An executable JAR package named nls-example-long-tts-2.0.0-jar-with-dependencies.jar is generated in the target directory. Copy this JAR package to the target server, where you can use it for quick service validation and stress testing.

Service validation:

Run the following command and set parameters as prompted.

 java -cp nls-example-long-tts-2.0.0-jar-with-dependencies.jar

After the command is run, the logs/nls.log file is generated in the directory where you ran the command.

Stress testing:

Run the following command and set parameters as prompted.

Set the service URL to wss://. Set the maximum number of concurrent calls based on your purchased resources.

java -jar nls-example-long-tts-2.0.0-jar-with-dependencies.jar

Charges are incurred if you make more than two concurrent calls to perform stress testing.

Key objects

  • NlsClient: the speech processing client. You can use this client to process short sentence recognition, real-time speech recognition, and speech synthesis tasks. This object is thread-safe, so you can create one NlsClient object and share it globally.

  • SpeechSynthesizer: the speech synthesis object. You can use this object to set request parameters and send a request. This object is not thread-safe.

  • SpeechSynthesizerListener: the speech synthesis result listener, which listens to synthesis results. This object is not thread-safe. It declares the following two abstract methods, which you must implement:

        // Receive synthesized binary audio data.
        abstract public void onMessage(ByteBuffer message);
        // Notify the client that the synthesis task is completed.
        abstract public void onComplete(SpeechSynthesizerResponse response);
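
The relationship between these objects can be pictured with a minimal sketch. It does not use the SDK: SharedClient and TaskListener are hypothetical stand-ins for NlsClient and SpeechSynthesizerListener, and the "synthesis" only echoes the text bytes. It only illustrates the ownership rule: one long-lived client is shared, while each task gets its own listener and therefore its own result state.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-ins, not SDK classes: one shared client, one listener per task. */
final class PerTaskListenerSketch {
    private PerTaskListenerSketch() {}

    /** Stand-in for a result listener. It holds per-task state, so it is never shared. */
    static final class TaskListener {
        final String taskName;
        final ByteArrayOutputStream audio = new ByteArrayOutputStream();
        boolean completed;

        TaskListener(String taskName) {
            this.taskName = taskName;
        }

        void onMessage(byte[] chunk) {
            audio.write(chunk, 0, chunk.length);
        }

        void onComplete() {
            completed = true;
        }
    }

    /** Stand-in for the long-lived client that every task reuses. */
    static final class SharedClient {
        private int tasksServed;

        /** Pretends to synthesize by echoing the text bytes to the listener as one chunk. */
        synchronized void run(String text, TaskListener listener) {
            listener.onMessage(text.getBytes(StandardCharsets.UTF_8));
            listener.onComplete();
            tasksServed++;
        }

        synchronized int tasksServed() {
            return tasksServed;
        }
    }

    /** Runs one task per text: the client is reused, the listener is created fresh each time. */
    static List<TaskListener> synthesizeAll(SharedClient client, List<String> texts) {
        List<TaskListener> listeners = new ArrayList<>();
        for (int i = 0; i < texts.size(); i++) {
            TaskListener listener = new TaskListener("task-" + i);
            client.run(texts.get(i), listener);
            listeners.add(listener);
        }
        return listeners;
    }
}
```

Because each TaskListener owns its own buffer, the audio of one task can never be mixed into another task's output, which is the problem a shared listener would cause.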


Notes on SDK calls:

  • You can create one NlsClient object globally and reuse it as required. The NlsClient object is built on Netty, so creating it consumes time and resources, but the created object can be reused. We recommend that you create and shut down the NlsClient object in line with the lifecycle of your project.

  • The SpeechSynthesizer object cannot be reused. You must create a SpeechSynthesizer object for each synthesis task. For example, to synthesize speech from N text files, you must create N SpeechSynthesizer objects to complete N synthesis tasks.

  • Each SpeechSynthesizerListener object corresponds to exactly one SpeechSynthesizer object. You cannot use a SpeechSynthesizerListener object for multiple SpeechSynthesizer objects. Otherwise, you may be unable to tell synthesis tasks apart.

  • The SDK for Java depends on Netty. The version of Netty must be 4.1.17.Final or later. If your project also depends on Netty, make sure that the Netty version that your build resolves meets this requirement.
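
The Netty version requirement can be checked with a small comparison helper. The following is a minimal sketch, not part of the SDK: NettyVersionCheck and its methods are hypothetical names, and the version string to test is assumed to come from your own build (for example, from the output of mvn dependency:tree). Note that the comparison must be numeric, because a plain string comparison would rank 4.1.9 above 4.1.17.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of the SDK: compares version strings such as "4.1.17.Final". */
final class NettyVersionCheck {
    // Matches the leading "major.minor.patch" numbers; a qualifier such as ".Final" is ignored.
    private static final Pattern VERSION = Pattern.compile("^(\\d+)\\.(\\d+)\\.(\\d+)");

    private NettyVersionCheck() {}

    /** Parses the three leading numeric components of a version string. */
    static int[] parse(String version) {
        Matcher m = VERSION.matcher(version.trim());
        if (!m.find()) {
            throw new IllegalArgumentException("Unrecognized version: " + version);
        }
        return new int[] {
            Integer.parseInt(m.group(1)),
            Integer.parseInt(m.group(2)),
            Integer.parseInt(m.group(3))
        };
    }

    /** Returns true if actual is the same as or newer than required. */
    static boolean isAtLeast(String actual, String required) {
        int[] a = parse(actual);
        int[] r = parse(required);
        for (int i = 0; i < 3; i++) {
            if (a[i] != r[i]) {
                return a[i] > r[i];
            }
        }
        return true;
    }
}
```

For example, NettyVersionCheck.isAtLeast("4.1.9.Final", "4.1.17.Final") returns false, so a project that resolves Netty 4.1.9.Final would need to upgrade.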

Sample code


  • The demo uses the default Internet access URL built in the SDK to access the speech synthesis service. To use an Elastic Compute Service (ECS) instance located in the China (Shanghai) region to access this service over an internal network, you must set the URL for internal access when you create the NlsClient object.

    client = new NlsClient("ws://", accessToken);

  • In the demo, the synthesized audio is stored in a file. If you need to play the synthesized audio in real time, we recommend that you use stream playback. In stream playback mode, you play the synthesized audio while audio data is still being received, instead of waiting until the synthesis task is completed. This reduces the latency.
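
One way to structure stream playback is to hand each audio chunk from the listener callback to a separate consumer through a queue. The following is a minimal sketch under stated assumptions: StreamingAudioBuffer is a hypothetical class, not part of the SDK, and the sink is a ByteArrayOutputStream so that the example stays self-contained. In a real player, the consumer would write each chunk to an audio output such as javax.sound.sampled.SourceDataLine instead.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical sketch, not SDK code: passes audio chunks from a callback to a consumer. */
final class StreamingAudioBuffer {
    // Marker object that tells the consumer no more audio will arrive.
    private static final byte[] END = new byte[0];
    private final BlockingQueue<byte[]> chunks = new LinkedBlockingQueue<>();

    /** Call from onMessage: copy the bytes out so that the ByteBuffer is not retained. */
    void offerChunk(ByteBuffer message) {
        byte[] copy = new byte[message.remaining()];
        message.get(copy);
        if (copy.length > 0) {
            chunks.add(copy);
        }
    }

    /** Call when the task ends, for example from onComplete or onFail. */
    void finish() {
        chunks.add(END);
    }

    /** Consumer loop: blocks for each chunk and writes it to the sink until finish() is seen. */
    void drainTo(ByteArrayOutputStream sink) {
        try {
            while (true) {
                byte[] chunk = chunks.take();
                if (chunk == END) {
                    return;
                }
                sink.write(chunk, 0, chunk.length);
            }
        } catch (InterruptedException e) {
            // Stop draining and preserve the interrupt status for the caller.
            Thread.currentThread().interrupt();
        }
    }
}
```

Typically drainTo would run on its own playback thread, so that the callback thread only copies and enqueues data and is never blocked by audio output.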

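import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;

import com.alibaba.nls.client.protocol.NlsClient;
import com.alibaba.nls.client.protocol.OutputFormatEnum;
import com.alibaba.nls.client.protocol.tts.SpeechSynthesizer;
import com.alibaba.nls.client.protocol.tts.SpeechSynthesizerListener;
import com.alibaba.nls.client.protocol.tts.SpeechSynthesizerResponse;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;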

/**
 * The sample code demonstrates how to perform the following operations:
 *      Call the setLongText method of the long-text-to-speech synthesis service.
 *      Run a synthesis task in real time.
 *      Calculate the first packet latency.
 * The sample code is used only for demonstration. Write your own code based on your business requirements.
 * Note: This demo is different from SpeechSynthesizerLongTextDemo under nls-example-tts. Long-text-to-speech synthesis is an independent service, which sends the complete long text to the server for speech synthesis.
 * In contrast, the synthesis task demonstrated in SpeechSynthesizerLongTextDemo splits long text into segments upon service calls and sends the segments to the server for speech synthesis.
 */
public class SpeechLongSynthesizerDemo {
    private static final Logger logger = LoggerFactory.getLogger(SpeechLongSynthesizerDemo.class);
    private static long startTime;
    private String appKey;
    NlsClient client;

    public SpeechLongSynthesizerDemo(String appKey, String token, String url) {
        this.appKey = appKey;
        // Note: You can create one NlsClient object globally and shut it down based on the lifecycle of your project. The default endpoint is the online service URL on Alibaba Cloud.
        if(url.isEmpty()) {
            client = new NlsClient(token);
        } else {
            client = new NlsClient(url, token);
        }
    }

    private static SpeechSynthesizerListener getSynthesizerListener() {
        SpeechSynthesizerListener listener = null;
        try {
            listener = new SpeechSynthesizerListener() {
                File f=new File("ttsForLongText.wav");
                FileOutputStream fout = new FileOutputStream(f);
                private boolean firstRecvBinary = true;

                // Complete the synthesis task.
                public void onComplete(SpeechSynthesizerResponse response) {
                    // When onComplete is called, all synthesized audio data has been received, so latency measured here covers the entire synthesis task. This latency can be large and may not meet the requirements of real-time playback.
                    System.out.println("name: " + response.getName() + ", status: " + response.getStatus()+", output file :"+f.getAbsolutePath());
                }

                // Receive the synthesized binary audio data.
                public void onMessage(ByteBuffer message) {
                    try {
                        if(firstRecvBinary) {
                            // Calculate the first packet latency when the client receives the audio data for the first time. The client can start playback as soon as it receives the first audio packet. This improves the response speed, especially in real-time speech interaction scenarios.
                            firstRecvBinary = false;
                            long now = System.currentTimeMillis();
                            logger.info("tts first latency : " + (now - SpeechLongSynthesizerDemo.startTime) + " ms");
                        }
                        byte[] bytesArray = new byte[message.remaining()];
                        message.get(bytesArray, 0, bytesArray.length);
                        //System.out.println("write array:" + bytesArray.length);
                        fout.write(bytesArray);
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                }

                public void onFail(SpeechSynthesizerResponse response){
                    // Note: The task ID is the unique identifier that indicates the interaction between the caller and the server. You must record the task ID. If an error occurs, you can submit a ticket and provide the task ID to Alibaba Cloud to facilitate troubleshooting.
                    System.out.println(
                        "task_id: " + response.getTaskId() +
                            // The status code. The code 20000000 indicates that the request is successful.
                            ", status: " + response.getStatus() +
                            // The error message.
                            ", status_text: " + response.getStatusText());
                }
            };
        } catch (Exception e) {
            e.printStackTrace();
        }
        return listener;
    }

    public void process(String text) {
        SpeechSynthesizer synthesizer = null;
        try {
            // Create an object and establish a connection.
            synthesizer = new SpeechSynthesizer(client, getSynthesizerListener());
            synthesizer.setAppKey(appKey);
            // Specify the audio encoding format of the returned audio file.
            synthesizer.setFormat(OutputFormatEnum.WAV);
            // Specify the audio sampling rate of the returned audio file by calling setSampleRate.
            // Specify the speaker type by calling setVoice.
            // Optional. The intonation of the speaker. Valid values: -500 to 500. Default value: 0.
            synthesizer.setPitchRate(0);
            // The speed of the speaker. Valid values: -500 to 500. Default value: 0.
            synthesizer.setSpeechRate(0);

            // Set the text used for speech synthesis.
            // Note: The setLongText method is used for long-text-to-speech synthesis, whereas the setText method is used for standard speech synthesis.
            synthesizer.setLongText(text);

            // Serialize the preceding parameters in the JSON format and send them to the server for confirmation.
            long start = System.currentTimeMillis();
            synthesizer.start();
            logger.info("tts start latency " + (System.currentTimeMillis() - start) + " ms");

            SpeechLongSynthesizerDemo.startTime = System.currentTimeMillis();
            // Wait until the speech synthesis is completed.
            synthesizer.waitForComplete();
            logger.info("tts stop latency " + (System.currentTimeMillis() - start) + " ms");
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            // Disconnect the client from the server.
            if (null != synthesizer) {
                synthesizer.close();
            }
        }
    }

    public void shutdown() {
        client.shutdown();
    }

    public static void main(String[] args) throws Exception {
        String appKey = "";
        String token = "Your token";
        // Specify the default URL.
        String url = "wss://";

        if (args.length == 2) {
            appKey   = args[0];
            token    = args[1];
        } else if (args.length == 3) {
            appKey   = args[0];
            token    = args[1];
            url      = args[2];
        } else {
            System.err.println("run error, need params(url is optional): " + "<app-key> <token> [url]");
            System.exit(-1);
        }

        String ttsTextLong = "From Hundred-Plant Garden to Three-flavour Study by Lu Xun \n" +
            "Behind our house was a great garden known in our family as Hundred-Plant Garden. It has long since been sold, together with the house, to the descendants of Zhu Xi; and the last time I saw it, already seven or eight years ago, I am pretty sure there were only weeds growing there. But in my childhood it was my paradise. \n" +
            "I need not speak of the green vegetable plots, the slippery stone coping round the well, the tall honey-locust tree, or the purple mud berries. Nor need I speak of the long shrilling of the cicadas among the leaves, the fat wasps couched in the flowering rape, or the nimble skylarks who suddenly soared straight up from the grass to the sky. \n" +
            "Just the foot of the low mud wall around the garden was a source of unfailing interest. Here field crickets droned away while house crickets chirped merrily. Turning over a broken brick, you might find a centipede. There were stink-beetles as well, and if you pressed a finger on their backs\n" +
            "they emitted puffs of vapor from their rear orifices. Milkwort interwove with climbing fig which had fruit shaped like the calyx of a lotus, while the milkwort had swollen tubers. Folk said that some of these had human shapes and if you ate them you would become immortal, so I kept on pulling them up. By uprooting one I pulled out those next to it,\n" +
            "and in this way destroyed part of the mud wall, but I never found a tuber shaped like a man. If you were not afraid of thorns you could pick raspberries too, like clusters of little coral beads, sweet yet tart, with a much finer color and flavor than mulberries..." ;

        SpeechLongSynthesizerDemo demo = new SpeechLongSynthesizerDemo(appKey, token, url);
        demo.process(ttsTextLong);
        demo.shutdown();
    }
}
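
The sample stores the start time in a static field, which is shared by every instance of the demo class. If several synthesis tasks run concurrently, a per-task timer avoids one task overwriting another task's start time. The following is a minimal sketch of that idea. FirstPacketTimer is a hypothetical helper, not part of the SDK, and it assumes that you create one timer per task right after the request is sent and call onPacket from the listener's onMessage method.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical helper, not part of the SDK: records first-packet latency for one task. */
final class FirstPacketTimer {
    private final long startNanos;
    // Stays null until the first packet arrives, then holds that packet's timestamp.
    private final AtomicReference<Long> firstPacketNanos = new AtomicReference<>();

    FirstPacketTimer(long startNanos) {
        this.startNanos = startNanos;
    }

    /** Starts timing now; call right after the synthesis request is sent. */
    static FirstPacketTimer startNow() {
        return new FirstPacketTimer(System.nanoTime());
    }

    /** Call for every audio packet. Only the first call records a timestamp and returns true. */
    boolean onPacket(long nowNanos) {
        return firstPacketNanos.compareAndSet(null, nowNanos);
    }

    /** First-packet latency in milliseconds, or -1 if no packet has arrived yet. */
    long latencyMillis() {
        Long first = firstPacketNanos.get();
        return first == null ? -1L : (first - startNanos) / 1_000_000L;
    }
}
```

Inside a listener, a call such as timer.onPacket(System.nanoTime()) would replace the firstRecvBinary flag, and the compare-and-set makes the first-packet check safe even if callbacks arrive on different threads.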