Intelligent Speech Interaction: SDK for C++

Last Updated: Mar 03, 2023

This topic describes how to use the SDK for C++ provided by Alibaba Cloud Intelligent Speech Interaction, including how to install the SDK and the sample code.

Note

  • The latest version of the SDK for C++ is 3.0.8, which was released on January 09, 2020.

    This version applies only to the Linux operating system. The Windows operating system is not supported.

  • Before you use this SDK, make sure that you understand how this SDK works. For more information, see API reference.

  • The methods of this SDK version are different from those of the earlier version. If you are familiar with the earlier version, pay attention to the updated methods described in this topic.

Download and installation

Download the SDK:

The compressed package of the SDK for C++ contains the following files or folders:

  • CMakeLists.txt: the CMakeList file of the demo project.

  • readme.txt: the SDK description.

  • release.log: the release notes.

  • version: the version number.

  • build.sh: the demo compilation script.

  • lib: the SDK libraries.

  • build: the compilation directory.

  • demo: the folder that contains the demo source files (.cpp) for the various Intelligent Speech Interaction services and the audio files for testing. The following table describes the files contained in the folder.

    | File name | Description |
    | --- | --- |
    | speechRecognizerDemo.cpp | The demo of short sentence recognition. |
    | speechSynthesizerDemo.cpp | The demo of speech synthesis. |
    | speechTranscriberDemo.cpp | The demo of real-time speech recognition. |
    | speechLongSynthesizerDemo.cpp | The demo of long-text-to-speech synthesis. |
    | test0.wav/test1.wav | The 16-bit audio files with a sampling rate of 16,000 Hz for testing. |

  • include: the folder that contains SDK header files. The following table describes the files contained in the folder.

    | File name | Description |
    | --- | --- |
    | nlsClient.h | The header file of the NlsClient object. |
    | nlsEvent.h | The header file of callback events. |
    | speechRecognizerRequest.h | The header file of short sentence recognition. |
    | speechSynthesizerRequest.h | The header file of speech synthesis and long-text-to-speech synthesis. |
    | speechTranscriberRequest.h | The header file of real-time speech recognition. |

Compile and run the demo project:

  1. Check the local operating system to ensure that required tools are installed based on the following minimum requirements:

    • CMake 3.1

    • glibc 2.5

    • GCC 4.1.2

  2. Run the following commands in a Linux terminal.

    mkdir build
    cd build && cmake .. && make
    cd ../demo  # The following executable demo programs are generated: srDemo for short sentence recognition, stDemo for real-time speech recognition, syDemo for speech synthesis, and syLongDemo for long-text-to-speech synthesis.
    ./stDemo <yourAppkey> <yourAccessKeyId> <yourAccessKeySecret> # The data is used for testing.
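
The minimum versions listed in step 1 can be compared against the versions installed on your system. The following is a minimal sketch and is not part of the SDK. The helper names parseVersion and versionAtLeast are hypothetical, and the sketch assumes that versions are plain dotted numbers such as 4.1.2.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: split a dotted version string such as "4.1.2"
// into its numeric components. Non-numeric components are read as 0.
std::vector<int> parseVersion(const std::string& text) {
    std::vector<int> parts;
    std::string::size_type begin = 0;
    while (begin <= text.size()) {
        std::string::size_type end = text.find('.', begin);
        if (end == std::string::npos) {
            end = text.size();
        }
        parts.push_back(std::atoi(text.substr(begin, end - begin).c_str()));
        begin = end + 1;
    }
    return parts;
}

// Hypothetical helper: return true if the installed version is equal to
// or later than the required version. Missing components count as 0.
bool versionAtLeast(const std::string& installed, const std::string& required) {
    std::vector<int> a = parseVersion(installed);
    std::vector<int> b = parseVersion(required);
    std::size_t n = a.size() > b.size() ? a.size() : b.size();
    for (std::size_t i = 0; i < n; ++i) {
        int x = i < a.size() ? a[i] : 0;
        int y = i < b.size() ? b[i] : 0;
        if (x != y) {
            return x > y;
        }
    }
    return true;
}
```

For example, pass the version reported by your compiler as the first argument and 4.1.2 as the second argument to check the GCC requirement.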

Key objects

  • Basic objects

    • NlsClient: the speech processing client, which is equivalent to a factory for all speech processing classes. You can globally create an NlsClient object.

    • NlsEvent: the event object. You can use this object to obtain the request status code, response from the server, and error message.

  • Synthesis object

    SpeechSynthesizerRequest: the request object of the speech synthesis service. It applies to both long and short text.

Custom error codes of the SDK for C++

| Error code | Error message | Description and solution |
| --- | --- | --- |
| 10000001 | SSL: couldn't create a ......! | The error message returned because an internal error has occurred. Try again later. |
| 10000002 | An official OpenSSL error message. | The error message returned because an internal error has occurred. Resolve the error based on the error message and try again later. |
| 10000003 | A system error message. | The error message returned because a system error has occurred. Resolve the error based on the error message. |
| 10000004 | URL: The url is empty. | The error message returned because no endpoint is specified. Check whether an endpoint is specified. |
| 10000005 | URL: Could not parse WebSocket url. | The error message returned because the specified endpoint is invalid. Check whether the specified endpoint is correct. |
| 10000006 | MODE: unsupport mode. | The error message returned because the specified Intelligent Speech Interaction service is not supported. Check whether the Intelligent Speech Interaction service is correctly configured. |
| 10000007 | JSON: Json parse failed. | The error message returned because the server returns an invalid response. Submit a ticket and provide the task ID to Alibaba Cloud. |
| 10000008 | WEBSOCKET: unkown head type. | The error message returned because the server returns an invalid WebSocket type. Submit a ticket and provide the task ID to Alibaba Cloud. |
| 10000009 | HTTP: connect failed. | The error message returned because the client fails to connect to the server. Check the network and try again later. |
| Official HTTP status code | HTTP: Got bad status. | The error message returned because an internal error has occurred. Resolve the error based on the error message. |
| System error code | IP: ip address is not valid. | The error message returned because the IP address is invalid. Resolve the error based on the error message. |
| System error code | ENCODE: convert to utf8 error. | The error message returned because the file fails to be converted to the UTF-8 format. Resolve the error based on the error message. |
| 10000010 | please check if the memory is enough. | The error message returned because the memory is insufficient. Check the memory of the local device. |
| 10000011 | Please check the order of execution. | The error message returned because the client calls methods in an invalid order. For example, if the client receives a failed or complete message, the SDK disconnects the client from the server. If the client calls the relevant method to send data, this error message is returned. |
| 10000012 | StartCommand/StopCommand Send failed. | The error message returned because the request contains invalid parameters. Check the settings of request parameters. |
| 10000013 | The sent data is null or dataSize <= 0. | The error message returned because the client sends invalid data. Check the settings of request parameters. |
| 10000014 | Start invoke failed. | The error message returned because the start method times out. Call the stop method to release resources, and then start the synthesis process again. |
| 10000015 | connect failed. | The error message returned because the connection between the client and the server fails. Release resources and start the synthesis process again. |
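
The custom error codes above can be turned into a small lookup that your TaskFailed handling consults. The following is a minimal sketch and is not part of the SDK. The function names isCustomSdkError and suggestedAction are hypothetical, and the returned strings are short summaries of the solutions described above.

```cpp
#include <string>

// Hypothetical helper, not part of the SDK: return true if the code falls in
// the custom error code range described above (10000001 to 10000015).
bool isCustomSdkError(int code) {
    return code >= 10000001 && code <= 10000015;
}

// Hypothetical helper: map a custom SDK error code to a short summary of the
// solution described above.
std::string suggestedAction(int code) {
    switch (code) {
        case 10000001:
        case 10000002:
            return "retry later";
        case 10000003:
            return "inspect the system error message";
        case 10000004:
        case 10000005:
            return "check the endpoint";
        case 10000006:
            return "check the service configuration";
        case 10000007:
        case 10000008:
            return "submit a ticket with the task ID";
        case 10000009:
            return "check the network and retry later";
        case 10000010:
            return "check available memory";
        case 10000011:
            return "check the order of method calls";
        case 10000012:
        case 10000013:
            return "check the request parameters";
        case 10000014:
        case 10000015:
            return "release resources and restart the synthesis";
        default:
            // Not a custom SDK code, for example an HTTP status code,
            // a system error code, or a service status code.
            return "see the service status codes or the error message";
    }
}
```

In a TaskFailed callback, you could pass the value of cbEvent->getStatusCode() to suggestedAction and log the result together with the task ID.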

Service status codes

For more information about the service status codes, see the "Service status codes" section of the API reference.

Sample code

Note

  • In the demo, the synthesized audio is saved to a file. To play the audio in real time, we recommend that you use stream playback, which plays the audio data as it is received. This reduces the latency because you do not need to wait until the entire audio is synthesized.

  • You can obtain the complete sample code from the speechSynthesizerDemo.cpp file in the demo folder of the SDK package.
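
The stream playback recommended in the preceding note can be built around a queue that the data callback fills and a playback thread drains. The following is a minimal sketch and is not part of the SDK or the demo. The class name AudioChunkQueue is hypothetical, and the sketch assumes a C++11 compiler, which is newer than the minimum GCC version required by the SDK.

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>
#include <vector>

// Hypothetical helper, not part of the SDK: a thread-safe queue of audio
// chunks. A data callback pushes chunks; a playback thread pops them.
class AudioChunkQueue {
public:
    AudioChunkQueue() : finished_(false) {}

    // Producer side: store a copy of a received chunk. Empty chunks are ignored.
    void push(const std::vector<unsigned char>& chunk) {
        if (chunk.empty()) {
            return;
        }
        {
            std::lock_guard<std::mutex> lock(mutex_);
            chunks_.push_back(chunk);
        }
        ready_.notify_one();
    }

    // Producer side: signal that no more audio data will arrive.
    void finish() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            finished_ = true;
        }
        ready_.notify_all();
    }

    // Consumer side: block until a chunk is available and move it into out.
    // Returns false after finish() was called and all chunks were consumed.
    bool pop(std::vector<unsigned char>& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return finished_ || !chunks_.empty(); });
        if (chunks_.empty()) {
            return false;
        }
        out.swap(chunks_.front());
        chunks_.pop_front();
        return true;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<std::vector<unsigned char> > chunks_;
    bool finished_;
};
```

With such a queue, the callback that receives binary audio data would call push for each chunk, the callback that reports the closed channel would call finish, and a separate playback thread would loop on pop and hand each chunk to the audio device.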

#include <pthread.h>
#include <unistd.h>
#include <sys/time.h>
#include <stdint.h>
#include <cstdio>
#include <ctime>
#include <string>
#include <vector>
#include <fstream>
#include "nlsClient.h"
#include "nlsEvent.h"
#include "speechSynthesizerRequest.h"
#include "nlsCommonSdk/Token.h"
using namespace AlibabaNlsCommon;
using AlibabaNls::NlsClient;
using AlibabaNls::NlsEvent;
using AlibabaNls::LogDebug;
using AlibabaNls::LogInfo;
using AlibabaNls::SpeechSynthesizerRequest;

// Customize the thread parameters.
struct ParamStruct {
        std::string text;
        std::string token;
        std::string appkey;
        std::string audioFile;
};

// Customize the callback parameters.
struct ParamCallBack {
        std::string binAudioFile;
        std::ofstream audioFile;
        uint64_t startMs; // The start time of the request. The value is in microseconds, as returned by getNow().
};

// Specify a token for service authentication and the timestamp that indicates the validity period of the token. The token and timestamp can be used throughout the project.
// Before each call to the service, check whether the token has expired.
// If the token has expired, use the AccessKey ID and AccessKey secret of your Alibaba Cloud account to obtain a new token. Then, reset the g_token and g_expireTime parameters.
// Note: Do not obtain a new token each time you call the speech synthesis service. A token can be used for service authentication when it is valid. In addition, you can use the same token for all Intelligent Speech Interaction services.
std::string g_akId = "";
std::string g_akSecret = "";
std::string g_token = "";
long g_expireTime = -1;

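// Obtain the current time in microseconds.
uint64_t getNow() {
        struct timeval now;
        gettimeofday(&now, NULL);
        // Cast before multiplying to prevent overflow on platforms where time_t is 32 bits.
        return (uint64_t)now.tv_sec * 1000 * 1000 + now.tv_usec;
}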

// Obtain a new token by using the AccessKey ID and AccessKey secret and obtain a timestamp for the validity period of the token.
// A token can be used when it is valid. You can use the same token for multiple processes, multiple threads, or multiple applications. We recommend that you apply for a new token when the current token is about to expire.
int generateToken(std::string akId, std::string akSecret, std::string* token, long* expireTime) {
    NlsToken nlsTokenRequest;
    nlsTokenRequest.setAccessKeyId(akId);
    nlsTokenRequest.setKeySecret(akSecret);

    if (-1 == nlsTokenRequest.applyNlsToken()) {
        // Receive the error message.
        printf("generateToken Failed: %s\n", nlsTokenRequest.getErrorMsg());
        return -1;
    }

    *token = nlsTokenRequest.getToken();
    *expireTime = nlsTokenRequest.getExpireTime();
    return 0;
}

// @brief: When the SDK receives a message from the server indicating that the synthesis task is completed, the SDK reports a Completed event in an internal thread.
// @note: After a Completed event is reported, the SDK disconnects the client from the server in an internal thread.
// @param cbEvent: the event object passed to the callback. For more information, see the nlsEvent.h file.
// @param cbParam: the custom parameter in a callback. The default value is null. You can set this parameter based on your business requirements.
void OnSynthesisCompleted(NlsEvent* cbEvent, void* cbParam) {
    ParamCallBack* tmpParam = (ParamCallBack*)cbParam;
    // The following code demonstrates how to obtain details of the Completed event and customize callback parameters.
    printf("OnSynthesisCompleted: %s\n", tmpParam->binAudioFile.c_str());
    // Obtain the status code returned for the call. If the call is successful, the code 0 or 20000000 is returned. If the call fails, the corresponding error code is returned.
    // The ID of the current synthesis task. The task ID is the unique identifier that indicates the interaction between the caller and the server. You must record the task ID. If an error occurs, you can submit a ticket and provide the task ID to Alibaba Cloud to facilitate troubleshooting.
    printf("OnSynthesisCompleted: status code=%d, task id=%s\n", cbEvent->getStatusCode(), cbEvent->getTaskId());
    // Obtain the complete information returned by the server.
    //printf("OnSynthesisCompleted: all response=%s\n", cbEvent->getAllResponse());
}

// @brief: When an error occurs during the synthesis process, the SDK reports a TaskFailed event in an internal thread.
// @note: After a TaskFailed event is reported, the SDK disconnects the client from the server in an internal thread.
// @param cbEvent: the event object passed to the callback. For more information, see the nlsEvent.h file.
// @param cbParam: the custom parameter in a callback. The default value is null. You can set this parameter based on your business requirements.
void OnSynthesisTaskFailed(NlsEvent* cbEvent, void* cbParam) {
        ParamCallBack* tmpParam = (ParamCallBack*)cbParam;
    // The following code demonstrates how to obtain details of the TaskFailed event and customize callback parameters.
    printf("OnSynthesisTaskFailed: %s\n", tmpParam->binAudioFile.c_str());
    // The ID of the current synthesis task. The task ID is the unique identifier that indicates the interaction between the caller and the server. You must record the task ID. If an error occurs, you can submit a ticket and provide the task ID to Alibaba Cloud to facilitate troubleshooting.
    printf("OnSynthesisTaskFailed: status code=%d, task id=%s, error message=%s\n", cbEvent->getStatusCode(), cbEvent->getTaskId(), cbEvent->getErrorMessage());
}

// @brief: When the synthesis ends or an error occurs during the synthesis process, the SDK disconnects the client from the server and reports a ChannelClosed event in an internal thread.
// @param cbEvent: the event object passed to the callback. For more information, see the nlsEvent.h file.
// @param cbParam: the custom parameter in a callback. The default value is null. You can set this parameter based on your business requirements.
void OnSynthesisChannelClosed(NlsEvent* cbEvent, void* cbParam) {
    ParamCallBack* tmpParam = (ParamCallBack*)cbParam;
    // The following code demonstrates how to obtain details of the ChannelClosed event and customize callback parameters.
    printf("OnSynthesisChannelClosed: %s\n", tmpParam->binAudioFile.c_str());
    printf("OnSynthesisChannelClosed: %s\n", cbEvent->getAllResponse());
    tmpParam->audioFile.close();
    delete tmpParam; // The synthesis process ends and the callback parameter is released.
}

// @brief: After the text is sent and the server returns synthesized binary audio data, the SDK reports a BinaryDataRecved event to the client in an internal thread.
// @param cbEvent: the event object passed to the callback. For more information, see the nlsEvent.h file.
// @param cbParam: the custom parameter in a callback. The default value is null. You can set this parameter based on your business requirements.
void OnBinaryDataRecved(NlsEvent* cbEvent, void* cbParam) {
        ParamCallBack* tmpParam = (ParamCallBack*)cbParam;
        if(tmpParam->startMs > 0 ) {
                // Note: After the client detects an audio stream, for example, when the client receives the synthesized audio stream from the server for the first time, the client starts to process the audio data. The audio data may be used for playback. The sample code demonstrates how to save the audio data in a local file.
                // Calculate the first packet latency of the speech synthesis service when the client receives the audio data for the first time. The calculated latency contains the time used to call the start method, namely the time used for the client to connect to the server. This time may vary greatly with the network conditions.
                uint64_t now = getNow();
                printf("first latency = %llu ms, task id = %s\n", (unsigned long long)((now - tmpParam->startMs) / 1000), cbEvent->getTaskId());
                tmpParam->startMs = 0;
        }

        // The following code demonstrates how to obtain details of the BinaryDataRecved event and customize callback parameters.
    printf("OnBinaryDataRecved: %s\n", tmpParam->binAudioFile.c_str());
        const std::vector<unsigned char>& data = cbEvent->getBinaryData(); // Call the getBinaryData method to obtain the synthesized binary audio data.
    printf("OnBinaryDataRecved: status code=%d, task id=%s, data size=%lu\n", cbEvent->getStatusCode(), cbEvent->getTaskId(), (unsigned long)data.size());
    // Append the binary audio data to the local file.
        if (data.size() > 0) {
                tmpParam->audioFile.write((char*)&data[0], data.size());
        }
}

// @brief: Return the logs corresponding to the processed text as well as the incremental subtitle data.
// @param cbEvent: the event object passed to the callback. For more information, see the nlsEvent.h file.
// @param cbParam: the custom parameter in a callback. The default value is null. You can set this parameter based on your business requirements.
void OnMetaInfo(NlsEvent* cbEvent, void* cbParam) {
        ParamCallBack* tmpParam = (ParamCallBack*)cbParam;
    // The following code demonstrates how to obtain details of the MetaInfo event and customize callback parameters.
    printf("OnMetaInfo: %s\n", tmpParam->binAudioFile.c_str());
    printf("OnMetaInfo: task id=%s, response=%s\n", cbEvent->getTaskId(), cbEvent->getAllResponse());
}

// The worker thread.
void* pthreadFunc(void* arg) {
        // 0. Obtain parameters such as the token and configuration files from custom thread parameters.
        ParamStruct* tst = (ParamStruct*)arg;
        if (tst == NULL) {
                printf("arg is not valid\n");
                return NULL;
        }

        // 1. Initialize the custom callback parameters.
        ParamCallBack* cbParam = new ParamCallBack;
        cbParam->binAudioFile = tst->audioFile;
        cbParam->audioFile.open(cbParam->binAudioFile.c_str(), std::ios::binary | std::ios::out);

        // 2. Create a SpeechSynthesizerRequest object.
        SpeechSynthesizerRequest* request = NlsClient::getInstance()->createSynthesizerRequest();
        if (request == NULL) {
                printf("createSynthesizerRequest failed.\n");
                cbParam->audioFile.close();
                delete cbParam; // No callback is registered yet, so the callback parameter is released here.
                return NULL;
        }

        request->setOnSynthesisCompleted(OnSynthesisCompleted, cbParam); // Set a callback to be fired when the speech synthesis task is completed.
        request->setOnChannelClosed(OnSynthesisChannelClosed, cbParam); // Set a callback to be fired when the speech synthesis channel is closed.
        request->setOnTaskFailed(OnSynthesisTaskFailed, cbParam); // Set a callback to be fired when an unexpected failure occurs.
        request->setOnBinaryDataReceived(OnBinaryDataRecved, cbParam); // Set a callback to be fired when synthesized binary audio data is received.
        request->setOnMetaInfo(OnMetaInfo, cbParam); // Set a callback to be fired when subtitle information about the generated speech is received.

        request->setAppKey(tst->appkey.c_str());
        request->setText(tst->text.c_str()); // Specify the text to be processed. This parameter is required. The content of the text must be UTF-8 encoded.
        request->setVoice("siqi");                          // Specify the speaker type. This parameter is optional. Valid values include xiaoyun, ruoxi, and xiaogang. Default value: xiaoyun. For more information about the available speaker types, see the relevant topic.
    request->setVolume(50);                          // Specify the volume. This parameter is optional. Valid values: 0 to 100. Default value: 50.
    request->setFormat("wav");                         // Specify the audio encoding format. This parameter is optional. Default value: wav. Valid values: pcm, wav, and mp3.
    request->setSampleRate(8000);                  // Specify the audio sampling rate. This parameter is optional. Valid values: 16000 and 8000. Default value: 16000.
    request->setSpeechRate(0);                          // Specify the speed. This parameter is optional. Valid values: -500 to 500. Default value: 0.
    request->setPitchRate(0);                          // Specify the intonation. This parameter is optional. Valid values: -500 to 500. Default value: 0.
        //request->setEnableSubtitle(true);          // Specify whether to enable subtitling for the generated speech. This parameter is optional. Note that not all the speaker types support subtitling.
        request->setToken(tst->token.c_str()); // Specify the token used for account authentication. This parameter is required.

        cbParam->startMs = getNow();
        // 3. Call the start method in asynchronous callback mode. If the call succeeds, BinaryDataRecved events are reported as audio data is received. If the call fails, a TaskFailed event is reported.
        if (request->start() < 0) {
                printf("start() failed. may be can not connect server. please check network or firewalld\n");
                NlsClient::getInstance()->releaseSynthesizerRequest(request); // The start method fails. The SpeechSynthesizerRequest object is released.
                cbParam->audioFile.close();
                return NULL;
        }

        // 4. Notify the server that the data is sent.
        // Call the stop method in asynchronous callback mode. If the method fails, a TaskFailed event is returned.
        request->stop();

        // 5. Release the SpeechSynthesizerRequest object after the synthesis is completed.
        NlsClient::getInstance()->releaseSynthesizerRequest(request);
        return NULL;
}

// Synthesize speech from a single text file.
int speechSynthesizerFile(const char* appkey) {
        // Obtain the timestamp of the current system time to check whether the token expires.
    std::time_t curTime = std::time(0);
    if (g_expireTime - curTime < 10) {
                printf("the token will be expired, please generate new token by AccessKey-ID and AccessKey-Secret.\n");
        if (-1 == generateToken(g_akId, g_akSecret, &g_token, &g_expireTime)) {
            return -1;
        }
    }

        ParamStruct pa;
        pa.token = g_token;
    pa.appkey = appkey;

        // Note: In the Windows operating system, if the text to be processed contains Chinese characters, encode this CPP file in UTF-8 with signature (BOM) or in GB2312.
        pa.text = "It is a nice day for an outdoor trip." ;
    pa.audioFile = "syAudio.wav";

        pthread_t pthreadId;
        // Start a worker thread to perform speech synthesis.
        pthread_create(&pthreadId, NULL, &pthreadFunc, (void *)&pa);
        pthread_join(pthreadId, NULL);
        return 0;
}

// Synthesize speech from multiple text files.
// If the SDK uses multiple concurrent threads at a time, the SDK processes each text file in a thread. The SDK does not process the same text file in different threads.
// In the sample code, two threads are used to synthesize speech from two text files.
// If you are a free-trial user, you can make only a maximum of two concurrent calls.
#define AUDIO_TEXT_NUMS 2
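// The length must be greater than the longest string in texts[], including the terminating null character.
#define AUDIO_TEXT_LENGTH 128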
#define AUDIO_FILE_NAME_LENGTH 32
int speechSynthesizerMultFile(const char* appkey) {
        // Obtain the timestamp of the current system time to check whether the token expires.
    std::time_t curTime = std::time(0);
    if (g_expireTime - curTime < 10) {
                printf("the token will be expired, please generate new token by AccessKey-ID and AccessKey-Secret.\n");
        if (-1 == generateToken(g_akId, g_akSecret, &g_token, &g_expireTime)) {
            return -1;
        }
    }

    const char syAudioFiles[AUDIO_TEXT_NUMS][AUDIO_FILE_NAME_LENGTH] = {"syAudio0.wav", "syAudio1.wav"};
        const char texts[AUDIO_TEXT_NUMS][AUDIO_TEXT_LENGTH] = {"It is a nice day. I want to play football on the playground.", "There will be a heavy rain tomorrow, so we'd better stay in home and watch a movie."} ;
        ParamStruct pa[AUDIO_TEXT_NUMS];

        for (int i = 0; i < AUDIO_TEXT_NUMS; i ++) {
                pa[i].token = g_token;
        pa[i].appkey = appkey;
                pa[i].text = texts[i];
        pa[i].audioFile = syAudioFiles[i];
        }

        std::vector<pthread_t> pthreadId(AUDIO_TEXT_NUMS);
        // Start two worker threads and synthesize speech from two text files at a time.
        for (int j = 0; j < AUDIO_TEXT_NUMS; j++) {
                pthread_create(&pthreadId[j], NULL, &pthreadFunc, (void *)&(pa[j]));
        }
        for (int j = 0; j < AUDIO_TEXT_NUMS; j++) {
                pthread_join(pthreadId[j], NULL);
        }
        return 0;
}

int main(int argc, char* argv[]) {
    if (argc < 4) {
        printf("params is not valid. Usage: ./demo <your appkey> <your AccessKey ID> <your AccessKey Secret>\n");
        return -1;
    }

    std::string appkey = argv[1];
    g_akId = argv[2];
    g_akSecret = argv[3];

        // Configure output logs of the SDK. The configuration is optional. As configured in the following code, the SDK logs are generated in the log-synthesizer.txt file. LogDebug specifies that logs at all levels are generated.
        int ret = NlsClient::getInstance()->setLogConfig("log-synthesizer", LogDebug);
        if (-1 == ret) {
                printf("set log failed\n");
                return -1;
        }

        // Start the worker thread.
        NlsClient::getInstance()->startWorkThread(4);

        // Synthesize speech from a single text file.
        speechSynthesizerFile(appkey.c_str());

        // Synthesize speech from multiple text files.
        // speechSynthesizerMultFile(appkey.c_str());

        // All the tasks are completed. Release the NlsClient object before the process exits. Note that the releaseInstance method is not thread-safe.
        NlsClient::releaseInstance();
        return 0;
}
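
The comments in the sample code recommend that you reuse a token while it is valid and obtain a new token only when the current token is about to expire. The following is a minimal sketch of that rule and is not part of the SDK. The TokenCache class, the TokenFetcher type, and the fakeFetcher function are hypothetical. In a real project, the fetcher would wrap a function such as generateToken from the sample code. The sketch is not thread-safe. If multiple threads share the cache, protect the get method with a lock.

```cpp
#include <ctime>
#include <string>

// Hypothetical fetcher signature: fills *token and *expireTime (a UNIX
// timestamp in seconds) and returns 0 on success or -1 on failure.
typedef int (*TokenFetcher)(std::string* token, long* expireTime);

// Hypothetical helper, not part of the SDK: caches a token and refreshes it
// only when it expires within marginSeconds.
class TokenCache {
public:
    TokenCache(TokenFetcher fetcher, long marginSeconds)
        : fetcher_(fetcher), marginSeconds_(marginSeconds),
          expireTime_(-1), refreshCount_(0) {}

    // Copy a valid token into *token. Returns 0 on success, or -1 if a
    // refresh was required and the fetcher failed.
    int get(std::time_t now, std::string* token) {
        if (expireTime_ - static_cast<long>(now) < marginSeconds_) {
            std::string fresh;
            long freshExpire = -1;
            if (fetcher_(&fresh, &freshExpire) != 0) {
                return -1;
            }
            token_ = fresh;
            expireTime_ = freshExpire;
            ++refreshCount_;
        }
        *token = token_;
        return 0;
    }

    // The number of times a new token was obtained.
    int refreshCount() const { return refreshCount_; }

private:
    TokenFetcher fetcher_;
    long marginSeconds_;
    std::string token_;
    long expireTime_;
    int refreshCount_;
};

// Stand-in fetcher for illustration only. It returns made-up tokens with
// made-up expiration times and does not contact any server.
int fakeFetcher(std::string* token, long* expireTime) {
    static int calls = 0;
    ++calls;
    *token = "token-" + std::string(1, static_cast<char>('0' + calls));
    *expireTime = 2000L * calls;
    return 0;
}
```

A caller would create one TokenCache for the whole process and call get with the value of std::time(0) before each synthesis request, in the same way that the sample code checks g_expireTime before it starts a worker thread.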