Intelligent Speech Interaction: SDK for C++

Last Updated: Mar 03, 2023

This topic describes how to use the SDK for C++ provided by Alibaba Cloud Intelligent Speech Interaction, including how to install the SDK and how to run the SDK sample code.

Note

  • The latest version of the SDK for C++ is 3.0.8, which was released on January 09, 2020.

    This version applies only to the Linux operating system. The Windows operating system is not supported.

  • Before you use this SDK, make sure that you understand how this SDK works. For more information, see Overview.

  • The methods of this SDK version are different from those of the earlier version. If you are familiar with the earlier version, pay attention to the updated methods described in this topic.

Download and installation

Download the SDK:

Download the compressed package of the SDK for C++. The package contains the following files and folders:

  • CMakeLists.txt: the CMakeList file of the demo project.

  • readme.txt: the SDK description.

  • release.log: the release notes.

  • version: the version number.

  • build.sh: the demo compilation script.

  • lib: the SDK libraries.

  • build: the compilation directory.

  • demo: the folder that contains the demo .cpp files, which are sample programs for the various Intelligent Speech Interaction services. The following table describes the files contained in the folder.

    File name                       Description
    speechRecognizerDemo.cpp        The demo of short sentence recognition.
    speechSynthesizerDemo.cpp       The demo of speech synthesis.
    speechTranscriberDemo.cpp       The demo of real-time speech recognition.
    speechLongSynthesizerDemo.cpp   The demo of long-text-to-speech synthesis.
    test0.wav/test1.wav             The 16-bit audio files with a sampling rate of 16,000 Hz for testing.

  • include: the folder that contains SDK header files. The following table describes the files contained in the folder.

    File name                    Description
    nlsClient.h                  The header file of the NlsClient object.
    nlsEvent.h                   The header file of callback events.
    speechRecognizerRequest.h    The header file of short sentence recognition.
    speechSynthesizerRequest.h   The header file of speech synthesis and long-text-to-speech synthesis.
    speechTranscriberRequest.h   The header file of real-time speech recognition.
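Before you run the demos on your own recordings, it can help to confirm that a WAV file matches the format of the bundled test files (16-bit, 16,000 Hz; the sample code later in this topic also states that only mono audio is supported). The following is a minimal sketch, not part of the SDK. The WavInfo type and the parseWavHeader and isDemoCompatible functions are hypothetical names introduced for illustration. The sketch assumes a canonical 44-byte PCM WAV header and a C++11 compiler; files that contain extra chunks before the format chunk are rejected rather than parsed.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Fields read from a canonical 44-byte PCM WAV header (hypothetical helper type).
struct WavInfo {
    uint16_t audioFormat;    // 1 means uncompressed PCM
    uint16_t channels;
    uint32_t sampleRate;
    uint16_t bitsPerSample;
};

// Read an unsigned little-endian integer of n bytes (n <= 4).
static uint32_t readLe(const unsigned char* p, std::size_t n) {
    uint32_t v = 0;
    for (std::size_t i = 0; i < n; ++i) {
        v |= static_cast<uint32_t>(p[i]) << (8 * i);
    }
    return v;
}

// Parse the header. Returns false if the buffer is shorter than 44 bytes or
// the "RIFF", "WAVE", and "fmt " markers are not at the canonical offsets.
bool parseWavHeader(const unsigned char* buf, std::size_t len, WavInfo* out) {
    if (buf == NULL || out == NULL || len < 44) return false;
    if (std::memcmp(buf, "RIFF", 4) != 0) return false;
    if (std::memcmp(buf + 8, "WAVE", 4) != 0) return false;
    if (std::memcmp(buf + 12, "fmt ", 4) != 0) return false;
    out->audioFormat = static_cast<uint16_t>(readLe(buf + 20, 2));
    out->channels = static_cast<uint16_t>(readLe(buf + 22, 2));
    out->sampleRate = readLe(buf + 24, 4);
    out->bitsPerSample = static_cast<uint16_t>(readLe(buf + 34, 2));
    return true;
}

// True if the audio matches the format of the bundled test files:
// uncompressed PCM, mono, 16,000 Hz, 16 bits per sample.
bool isDemoCompatible(const WavInfo& info) {
    return info.audioFormat == 1 && info.channels == 1 &&
           info.sampleRate == 16000 && info.bitsPerSample == 16;
}
```

To use the sketch, read the first 44 bytes of the file into a buffer, call parseWavHeader, and then pass the result to isDemoCompatible. For audio sampled at 8,000 Hz, adjust the expected sampling rate and select a matching model for your project.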

Compile and run the demo project:

  1. Check the local operating system to ensure that required tools are installed based on the following minimum requirements:

    1. CMake 3.1

    2. glibc 2.5

    3. GCC 4.1.2

  2. Run the following script on the Linux terminal.

    mkdir -p build
    cd build && cmake .. && make
    # The following executable demo programs are generated: srDemo for short sentence recognition, stDemo for real-time speech recognition, syDemo for speech synthesis, and syLongDemo for long-text-to-speech synthesis.
    cd ../demo
    # Run the real-time speech recognition demo for testing.
    ./stDemo <yourAppkey> <yourAccessKey ID> <yourAccessKey Secret>
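The minimum GCC and glibc versions listed above can also be checked from a small C++ program. The following is a rough sketch rather than an official tool: versionAtLeast and checkToolchain are hypothetical helper names, and the check relies on the compiler's predefined macros (__GNUC__, __GNUC_MINOR__, __GNUC_PATCHLEVEL__) and the glibc macros (__GLIBC__, __GLIBC_MINOR__). It reports the versions seen at compile time, which may differ from the glibc version present at run time. The CMake version cannot be checked this way; run cmake --version instead.

```cpp
#include <cstdio>

// Returns true if version major.minor.patch is at least reqMajor.reqMinor.reqPatch.
bool versionAtLeast(int major, int minor, int patch,
                    int reqMajor, int reqMinor, int reqPatch) {
    if (major != reqMajor) return major > reqMajor;
    if (minor != reqMinor) return minor > reqMinor;
    return patch >= reqPatch;
}

// Prints the compiler and C library versions visible at compile time and
// returns true only if both meet the minimum requirements (GCC 4.1.2, glibc 2.5).
bool checkToolchain() {
    bool ok = true;
#if defined(__GNUC__)
    // Note: Clang also defines these macros for GCC compatibility.
    bool gccOk = versionAtLeast(__GNUC__, __GNUC_MINOR__, __GNUC_PATCHLEVEL__, 4, 1, 2);
    std::printf("GCC-compatible compiler %d.%d.%d: %s\n",
                __GNUC__, __GNUC_MINOR__, __GNUC_PATCHLEVEL__,
                gccOk ? "ok" : "too old");
    ok = ok && gccOk;
#else
    std::printf("GCC not detected\n");
    ok = false;
#endif
#if defined(__GLIBC__)
    bool glibcOk = versionAtLeast(__GLIBC__, __GLIBC_MINOR__, 0, 2, 5, 0);
    std::printf("glibc %d.%d: %s\n", __GLIBC__, __GLIBC_MINOR__,
                glibcOk ? "ok" : "too old");
    ok = ok && glibcOk;
#else
    std::printf("glibc not detected\n");
    ok = false;
#endif
    return ok;
}
```

Call checkToolchain from a main function and treat a false return value as a hint to inspect the environment before you build the demo project.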

Key objects

  • Basic objects

    • NlsClient: the speech processing client, which is equivalent to a factory for all speech processing classes. You can globally create an NlsClient object.

    • NlsEvent: the event object. You can use this object to obtain the request status code, response from the server, and error message.

  • Recognition object

    SpeechTranscriberRequest: the request object for real-time speech recognition.

Error codes of the SDK for C++

Error code                  Error message                             Description and solution
10000001                    SSL: couldn't create a ......!            The error message returned because an internal error has occurred. Try again later.
10000002                    An official OpenSSL error message.        The error message returned because an internal error has occurred. Resolve the error based on the error message and try again later.
10000003                    A system error message.                   The error message returned because a system error has occurred. Resolve the error based on the error message.
10000004                    URL: The url is empty.                    The error message returned because no endpoint is specified. Check whether an endpoint is specified.
10000005                    URL: Could not parse WebSocket url.       The error message returned because the specified endpoint is invalid. Check whether the specified endpoint is correct.
10000006                    MODE: unsupport mode.                     The error message returned because the specified Intelligent Speech Interaction service is not supported. Check whether the Intelligent Speech Interaction service is correctly configured.
10000007                    JSON: Json parse failed.                  The error message returned because the server returns an invalid response. Submit a ticket and provide the task ID to Alibaba Cloud.
10000008                    WEBSOCKET: unkown head type.              The error message returned because the server returns an invalid WebSocket type. Submit a ticket and provide the task ID to Alibaba Cloud.
10000009                    HTTP: connect failed.                     The error message returned because the client fails to connect to the server. Check the network and try again later.
Official HTTP status code   HTTP: Got bad status.                     The error message returned because an internal error has occurred. Resolve the error based on the error message.
System error code           IP: ip address is not valid.              The error message returned because the IP address is invalid. Resolve the error based on the error message.
System error code           ENCODE: convert to utf8 error.            The error message returned because the file fails to be converted to the UTF-8 format. Resolve the error based on the error message.
10000010                    please check if the memory is enough.     The error message returned because the memory is insufficient. Check the memory of the local device.
10000011                    Please check the order of execution.      The error message returned because the client calls methods in an invalid order. For example, if the client receives a failed or complete message, the SDK disconnects the client from the server. If the client calls the relevant method to send data, this error message is returned.
10000012                    StartCommand/StopCommand Send failed.     The error message returned because the request contains invalid parameters. Check the settings of request parameters.
10000013                    The sent data is null or dataSize <= 0.   The error message returned because the client sends invalid data. Check the settings of request parameters.
10000014                    Start invoke failed.                      The error message returned because the start method times out. Call the stop method to release resources, and then start the recognition process again.
10000015                    connect failed.                           The error message returned because the connection between the client and the server fails. Release resources and start the recognition process again.
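An application may want to turn the numeric error codes in the preceding table into a suggested next step, for example when it writes logs. The following is a minimal sketch under stated assumptions: the SdkErrorAction enumeration and the actionForSdkError and describeAction functions are hypothetical names that are not part of the SDK, and the grouping of codes into actions is one possible reading of the table. The function takes a plain integer, so it does not depend on how your application obtains the code.

```cpp
// Hypothetical categories for the solutions suggested in the error code table.
enum SdkErrorAction {
    ACTION_RETRY_LATER,
    ACTION_CHECK_MESSAGE,
    ACTION_CHECK_ENDPOINT,
    ACTION_CHECK_SERVICE,
    ACTION_SUBMIT_TICKET,
    ACTION_CHECK_NETWORK,
    ACTION_CHECK_MEMORY,
    ACTION_CHECK_CALL_ORDER,
    ACTION_CHECK_PARAMETERS,
    ACTION_RESTART_TASK,
    ACTION_UNKNOWN
};

// Map an SDK error code from the table to a suggested action.
SdkErrorAction actionForSdkError(int code) {
    switch (code) {
        case 10000001: return ACTION_RETRY_LATER;
        case 10000002:
        case 10000003: return ACTION_CHECK_MESSAGE;
        case 10000004:
        case 10000005: return ACTION_CHECK_ENDPOINT;
        case 10000006: return ACTION_CHECK_SERVICE;
        case 10000007:
        case 10000008: return ACTION_SUBMIT_TICKET;
        case 10000009: return ACTION_CHECK_NETWORK;
        case 10000010: return ACTION_CHECK_MEMORY;
        case 10000011: return ACTION_CHECK_CALL_ORDER;
        case 10000012:
        case 10000013: return ACTION_CHECK_PARAMETERS;
        case 10000014:
        case 10000015: return ACTION_RESTART_TASK;
        default:       return ACTION_UNKNOWN;
    }
}

// Short text for each action, suitable for log output.
const char* describeAction(SdkErrorAction action) {
    switch (action) {
        case ACTION_RETRY_LATER:      return "Internal error. Try again later.";
        case ACTION_CHECK_MESSAGE:    return "Resolve the error based on the error message.";
        case ACTION_CHECK_ENDPOINT:   return "Check whether the endpoint is specified and correct.";
        case ACTION_CHECK_SERVICE:    return "Check the service configuration.";
        case ACTION_SUBMIT_TICKET:    return "Submit a ticket and provide the task ID.";
        case ACTION_CHECK_NETWORK:    return "Check the network and try again later.";
        case ACTION_CHECK_MEMORY:     return "Check the memory of the local device.";
        case ACTION_CHECK_CALL_ORDER: return "Check the order in which methods are called.";
        case ACTION_CHECK_PARAMETERS: return "Check the request parameters and the data sent.";
        case ACTION_RESTART_TASK:     return "Release resources and start the recognition again.";
        default:                      return "Unknown code. See the service status codes.";
    }
}
```

Codes outside the table, such as HTTP status codes, system error codes, and service status codes, fall through to ACTION_UNKNOWN so that the caller can handle them separately.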

Service status codes

For more information about the service status codes, see the "Service status codes" section of the API reference.

Sample code

Note

  • The demo uses an audio file with a sampling rate of 16,000 Hz. To obtain correct recognition results, set the model to the universal model for the project to which the appkey is bound in the Intelligent Speech Interaction console. In your own application, select a model that matches the audio sampling rate of your business scenario. For more information about model settings, see Manage projects.

  • You can obtain the complete sample code from the speechTranscriberDemo.cpp file in the demo folder of the SDK package.

#include <pthread.h>
#include <unistd.h>
#include <stdint.h>   // uint8_t
#include <stdio.h>    // printf
#include <stdlib.h>
#include <string.h>
#include <ctime>
#include <string>
#include <vector>
#include <fstream>
#include "nlsClient.h"
#include "nlsEvent.h"
#include "speechTranscriberRequest.h"
#include "nlsCommonSdk/Token.h"

#define FRAME_SIZE 3200
#define SAMPLE_RATE 16000
using namespace AlibabaNlsCommon;
using AlibabaNls::NlsClient;
using AlibabaNls::NlsEvent;
using AlibabaNls::LogDebug;
using AlibabaNls::LogInfo;
using AlibabaNls::SpeechTranscriberRequest;

// Customize the thread parameters.
struct ParamStruct {
    std::string fileName;
    std::string token;
    std::string appkey;
};

// Customize the callback parameters.
struct ParamCallBack {
    int userId;
    char userInfo[10];
};

// Specify a token for service authentication and the timestamp that indicates the validity period of the token. The token and timestamp can be used throughout the project.
// Each time before you call the service, you must check whether the specified token expires.
// If the token expires, you can use the AccessKey ID and AccessKey secret of your Alibaba Cloud account to obtain a new token. Then, reset the g_token and g_expireTime parameters.
// Note: Do not obtain a new token each time you call the real-time speech recognition service. A token can be used for service authentication when it is valid. In addition, you can use the same token for all Intelligent Speech Interaction services.
std::string g_akId = "";
std::string g_akSecret = "";
std::string g_token = "";
long g_expireTime = -1;

// Obtain a new token by using the AccessKey ID and AccessKey secret and obtain a timestamp for the validity period of the token.
// A token can be used when it is valid. You can use the same token for multiple processes, multiple threads, or multiple applications. We recommend that you apply for a new token when the current token is about to expire.
int generateToken(std::string akId, std::string akSecret, std::string* token, long* expireTime) {
    NlsToken nlsTokenRequest;
    nlsTokenRequest.setAccessKeyId(akId);
    nlsTokenRequest.setKeySecret(akSecret);

    if (-1 == nlsTokenRequest.applyNlsToken()) {
        // Receive the error message.
        printf("generateToken Failed: %s\n", nlsTokenRequest.getErrorMsg());
        return -1;
    }

    *token = nlsTokenRequest.getToken();
    *expireTime = nlsTokenRequest.getExpireTime();
    return 0;
}

// @brief: Calculate how long to sleep after each sendAudio call so that audio data is sent at approximately real-time speed.
// @param dataSize: the size of the audio data to be sent, in bytes.
// @param sampleRate: the audio sampling rate. Supported sampling rates include 8,000 Hz and 16,000 Hz.
// @param compressRate: the data compression rate. Set this parameter to 10 for Opus-encoded audio data with a sampling rate of 16,000 Hz and a data compression rate of 10:1. Set this parameter to 1 if the data is not compressed.
// @return: the sleep duration after the audio data is sent, in milliseconds.
// @note: For 16-bit pulse-code modulation (PCM)-encoded audio data with a sampling rate of 8,000 Hz, we recommend that you set the sleep duration to 100 ms for every 1,600 bytes sent.
// For 16-bit PCM-encoded audio data with a sampling rate of 16,000 Hz, we recommend that you set the sleep duration to 100 ms for every 3,200 bytes sent.
// For audio data in other formats, calculate the sleep duration based on the compression rate. For example, if the compression rate is 10:1 for Opus-encoded audio data with a sampling rate of 16,000 Hz,
// 100 ms of audio occupies 3200/10 = 320 bytes, so set the sleep duration to 100 ms for every 320 bytes sent.
unsigned int getSendAudioSleepTime(int dataSize, int sampleRate, int compressRate) {
    // Only 16-bit audio data is supported.
    const int sampleBits = 16;
    // Only mono audio data is supported.
    const int soundChannel = 1;

    // The number of bytes of uncompressed audio per second at the specified sampling rate.
    int bytes = (sampleRate * sampleBits * soundChannel) / 8;
    // The number of bytes of uncompressed audio per millisecond.
    int bytesMs = bytes / 1000;
    // The sleep duration is the uncompressed size of the data to be sent divided by the number of bytes per millisecond.
    int sleepMs = (dataSize * compressRate) / bytesMs;
    return sleepMs;
}

// @brief: After the start method connects the client to the server, the SDK reports a started event in an internal thread.
// @param cbEvent: the event object passed to the callback. For more information, see the nlsEvent.h file.
// @param cbParam: the custom parameter in a callback. The default value is null. You can set this parameter based on your business requirements.
void onTranscriptionStarted(NlsEvent* cbEvent, void* cbParam) {
    ParamCallBack* tmpParam = (ParamCallBack*)cbParam;
    // The following code demonstrates how to obtain details of the started event and customize callback parameters.
    printf("onTranscriptionStarted: %d\n", tmpParam->userId);
    // The ID of the current recognition task. The task ID is the unique identifier that indicates the interaction between the caller and the server. You must record the task ID. If an error occurs, you can submit a ticket and provide the task ID to Alibaba Cloud to facilitate troubleshooting.
    printf("onTranscriptionStarted: status code=%d, task id=%s\n", cbEvent->getStatusCode(), cbEvent->getTaskId());
    // Obtain the complete information returned by the server.
    //printf("onTranscriptionStarted: all response=%s\n", cbEvent->getAllResponse());
}

// @brief: The server detects the beginning of a sentence. Then, the SDK reports a SentenceBegin event in an internal thread.
// @param cbEvent: the event object passed to the callback. For more information, see the nlsEvent.h file.
// @param cbParam: the custom parameter in a callback. The default value is null. You can set this parameter based on your business requirements.
void onSentenceBegin(NlsEvent* cbEvent, void* cbParam) {
    ParamCallBack* tmpParam = (ParamCallBack*)cbParam;
    // The following code demonstrates how to obtain details of the SentenceBegin event and customize callback parameters.
    printf("onSentenceBegin: %d\n", tmpParam->userId);
    // The ID of the current recognition task. The task ID is the unique identifier that indicates the interaction between the caller and the server. You must record the task ID. If an error occurs, you can submit a ticket and provide the task ID to Alibaba Cloud to facilitate troubleshooting.
    printf("onSentenceBegin: status code=%d, task id=%s, index=%d, time=%d\n", cbEvent->getStatusCode(), cbEvent->getTaskId(),
                cbEvent->getSentenceIndex(), // The sequence number of the sentence, which starts from 1.
                cbEvent->getSentenceTime() // The duration of the audio stream that has been processed, in milliseconds.
                );
    // Obtain the complete information returned by the server.
    //printf("onSentenceBegin: all response=%s\n", cbEvent->getAllResponse());
}

// @brief: The server detects the end of a sentence. Then, the SDK reports a SentenceEnd event in an internal thread.
// @param cbEvent: the event object passed to the callback. For more information, see the nlsEvent.h file.
// @param cbParam: the custom parameter in a callback. The default value is null. You can set this parameter based on your business requirements.
void onSentenceEnd(NlsEvent* cbEvent, void* cbParam) {
    ParamCallBack* tmpParam = (ParamCallBack*)cbParam;
    // The following code demonstrates how to obtain details of the SentenceEnd event and customize callback parameters.
    printf("onSentenceEnd: %d\n", tmpParam->userId);
    // The ID of the current recognition task. The task ID is the unique identifier that indicates the interaction between the caller and the server. You must record the task ID. If an error occurs, you can submit a ticket and provide the task ID to Alibaba Cloud to facilitate troubleshooting.
    printf("onSentenceEnd: status code=%d, task id=%s, index=%d, time=%d, begin_time=%d, result=%s\n", cbEvent->getStatusCode(), cbEvent->getTaskId(),
                cbEvent->getSentenceIndex(), // The sequence number of the sentence, which starts from 1.
                cbEvent->getSentenceTime(), // The duration of the audio stream that has been processed, in milliseconds.
                cbEvent->getSentenceBeginTime(), // The time when the SentenceBegin event occurred.
                cbEvent->getResult()    // The recognition result of the current sentence.
                );
        //  << ", confidence: " << cbEvent->getSentenceConfidence()    // The confidence level of the recognition result. Valid values: 0.0 to 1.0. A larger value indicates a higher confidence level.
        //  << ", stashResult begin_time: " << cbEvent->getStashResultBeginTime() // The time when the next sentence begins.
        //  << ", stashResult current_time: " << cbEvent->getStashResultCurrentTime() // The current time when the next sentence is being processed.
        //  << ", stashResult Sentence_id: " << cbEvent->getStashResultSentenceId() //The ID of the sentence.
        //  << ", stashResult Text: " << cbEvent->getStashResultText() // The beginning words of the next sentence.
    // Obtain the complete information returned by the server.
    //printf("onSentenceEnd: all response=%s\n", cbEvent->getAllResponse());
}

// @brief: The recognition result is updated. When the SDK receives the updated result, the SDK reports a ResultChanged event in an internal thread.
// @param cbEvent: the event object passed to the callback. For more information, see the nlsEvent.h file.
// @param cbParam: the custom parameter in a callback. The default value is null. You can set this parameter based on your business requirements.
void onTranscriptionResultChanged(NlsEvent* cbEvent, void* cbParam) {
    ParamCallBack* tmpParam = (ParamCallBack*)cbParam;
    // The following code demonstrates how to obtain details of the ResultChanged event and customize callback parameters.
    printf("onTranscriptionResultChanged: %d\n", tmpParam->userId);
    // The ID of the current recognition task. The task ID is the unique identifier that indicates the interaction between the caller and the server. You must record the task ID. If an error occurs, you can submit a ticket and provide the task ID to Alibaba Cloud to facilitate troubleshooting.
    printf("onTranscriptionResultChanged: status code=%d, task id=%s, index=%d, time=%d, result=%s\n", cbEvent->getStatusCode(), cbEvent->getTaskId(),
                cbEvent->getSentenceIndex(), // The sequence number of the sentence, which starts from 1.
                cbEvent->getSentenceTime(), // The duration of the audio stream that has been processed, in milliseconds.
                cbEvent->getResult()    // The recognition result of the current sentence.
                );
    // Obtain the complete information returned by the server.
    //printf("onTranscriptionResultChanged: all response=%s\n", cbEvent->getAllResponse());
}

// @brief: When the server stops the real-time recognition of the audio stream, the SDK reports a Completed event in an internal thread.
// @note: After a Completed event is reported, the SDK disconnects the client from the server in an internal thread. At this time, if you call the sendAudio method, -1 is returned. Stop sending audio data in this case.
// @param cbEvent: the event object passed to the callback. For more information, see the nlsEvent.h file.
// @param cbParam: the custom parameter in a callback. The default value is null. You can set this parameter based on your business requirements.
void onTranscriptionCompleted(NlsEvent* cbEvent, void* cbParam) {
    ParamCallBack* tmpParam = (ParamCallBack*)cbParam;
    // The following code demonstrates how to obtain details of the Completed event and customize callback parameters.
    printf("onTranscriptionCompleted: %d\n", tmpParam->userId);
    printf("onTranscriptionCompleted: status code=%d, task id=%s\n", cbEvent->getStatusCode(), cbEvent->getTaskId());
}

// @brief: When an error occurs during the recognition process that covers calls of the start, send, and stop methods, the SDK reports a TaskFailed event in an internal thread.
// @note: After a TaskFailed event is reported, the SDK disconnects the client from the server in an internal thread. At this time, if you call the sendAudio method, -1 is returned. Stop sending audio data in this case.
// @param cbEvent: the event object passed to the callback. For more information, see the nlsEvent.h file.
// @param cbParam: the custom parameter in a callback. The default value is null. You can set this parameter based on your business requirements.
void onTaskFailed(NlsEvent* cbEvent, void* cbParam) {
    ParamCallBack* tmpParam = (ParamCallBack*)cbParam;
    // The following code demonstrates how to obtain details of the TaskFailed event and customize callback parameters.
    printf("onTaskFailed: %d\n", tmpParam->userId);
    printf("onTaskFailed: status code=%d, task id=%s, error message=%s\n", cbEvent->getStatusCode(), cbEvent->getTaskId(), cbEvent->getErrorMessage());
    // Obtain the complete information returned by the server.
    //printf("onTaskFailed: all response=%s\n", cbEvent->getAllResponse());
}

// @brief: The SDK reports the final recognition result in an internal thread.
// @param cbEvent: the event object passed to the callback. For more information, see the nlsEvent.h file.
// @param cbParam: the custom parameter in a callback. The default value is null. You can set this parameter based on your business requirements.
void onSentenceSemantics(NlsEvent* cbEvent, void* cbParam) {
    ParamCallBack* tmpParam = (ParamCallBack*)cbParam;
    // The following code demonstrates how to obtain details of the SentenceSemantics event and customize callback parameters.
    printf("onSentenceSemantics: %d\n", tmpParam->userId);
    // Obtain the complete information returned by the server.
    printf("onSentenceSemantics: all response=%s\n", cbEvent->getAllResponse());
}

// @brief: When the recognition ends or an error occurs during the recognition process, the SDK disconnects the client from the server and reports a ChannelClosed event in an internal thread.
// @param cbEvent: the event object passed to the callback. For more information, see the nlsEvent.h file.
// @param cbParam: the custom parameter in a callback. The default value is null. You can set this parameter based on your business requirements.
void onChannelClosed(NlsEvent* cbEvent, void* cbParam) {
    ParamCallBack* tmpParam = (ParamCallBack*)cbParam;
    delete tmpParam; // The recognition process ends and the callback parameter is released.
}

// The worker thread.
void* pthreadFunc(void* arg) {
    int sleepMs = 0;
    ParamCallBack *cbParam = NULL;
    // Initialize custom callback parameters. The following settings are used as an example to demonstrate how to pass parameters. The settings have no effect on the demo.
    // The callback parameters are allocated on the heap. In this demo, they are released in the onChannelClosed callback, so do not release them again after the request is started.
    cbParam = new ParamCallBack;
    cbParam->userId = 1234;
    strcpy(cbParam->userInfo, "User.");

    // 1. Obtain parameters such as the token and configuration files from custom thread parameters.
    ParamStruct* tst = (ParamStruct*)arg;
    if (tst == NULL) {
        printf("arg is not valid\n");
        delete cbParam; // No callback is registered yet. Release the parameters here.
        return NULL;
    }

    /* Open the audio file and obtain audio data.*/
    std::ifstream fs;
    fs.open(tst->fileName.c_str(), std::ios::binary | std::ios::in);
    if (!fs) {
        printf("%s does not exist.\n", tst->fileName.c_str());
        delete cbParam; // No callback is registered yet. Release the parameters here.
        return NULL;
    }

    // 2. Create the SpeechTranscriberRequest object of real-time speech recognition.
    SpeechTranscriberRequest* request = NlsClient::getInstance()->createTranscriberRequest();
    if (request == NULL) {
        printf("createTranscriberRequest failed.\n");
        delete cbParam; // No callback is registered yet. Release the parameters here.
        return NULL;
    }

    request->setOnTranscriptionStarted(onTranscriptionStarted, cbParam);                // Set a callback to be fired when the speech recognition starts.
    request->setOnTranscriptionResultChanged(onTranscriptionResultChanged, cbParam);    // Set a callback to be fired when a recognition result is returned.
    request->setOnTranscriptionCompleted(onTranscriptionCompleted, cbParam);            // Set a callback to be fired when the speech recognition is completed.
    request->setOnSentenceBegin(onSentenceBegin, cbParam);                              // Set a callback to be fired when the beginning of a sentence is detected.
    request->setOnSentenceEnd(onSentenceEnd, cbParam);                                  // Set a callback to be fired when the end of a sentence is detected.
    request->setOnTaskFailed(onTaskFailed, cbParam);                                    // Set a callback to be fired when an error occurs.
    request->setOnChannelClosed(onChannelClosed, cbParam);                              // Set a callback to be fired when the TCP connection set up for the recognition task is closed.
    request->setOnSentenceSemantics(onSentenceSemantics, cbParam);                      // Set a callback to be fired when an updated recognition result is returned. The result is returned when the enable_nlp parameter is used.

    request->setAppKey(tst->appkey.c_str());            // Specify the appkey. This parameter is required. If you do not have an appkey, obtain it as instructed on the Alibaba Cloud international site (alibabacloud.com).
    request->setFormat("pcm");                          // Specify the audio encoding format. Default value: pcm.
    request->setSampleRate(SAMPLE_RATE);                // Specify the audio sampling rate. This parameter is optional. Valid values: 16000 and 8000. Default value: 16000.
    request->setIntermediateResult(true);               // Specify whether to return intermediate recognition results. This parameter is optional. Default value: false.
    request->setPunctuationPrediction(true);            // Specify whether to add punctuation marks during post-processing. This parameter is optional. Default value: false.
    request->setInverseTextNormalization(true);         // Specify whether to convert Chinese numerals to Arabic numerals during post-processing. This parameter is optional. Default value: false.

    // Specify the threshold for detecting the end of a sentence. If the silence duration exceeds the specified threshold, the system determines the end of a sentence. Unit: milliseconds. Valid values: 200 to 2000. Default value: 800.
    //request->setMaxSentenceSilence(800);
    //request->setCustomizationId("TestId_123"); // Specify the ID of the custom model. This parameter is optional.
    //request->setVocabularyId("TestId_456"); // Specify the vocabulary ID of custom extensive hotwords. This parameter is optional.
    // Pass custom or advanced parameters in the JSON format of {"key": "value"}.
    //request->setPayloadParam("{\"vad_model\": \"farfield\"}");
    // Specify whether to return the recognition results of words.
    request->setPayloadParam("{\"enable_words\": true}");

    // Specify whether to enable semantic sentence detection. Default value: false. We recommend that you do not enable it unless otherwise required.
    //request->setPayloadParam("{\"enable_semantic_sentence_detection\": false}");
    // Specify whether to enable disfluency detection. Default value: false. We recommend that you do not enable disfluency detection unless otherwise required.
    //request->setPayloadParam("{\"disfluency\": true}");

    // Specify the ID of the VAD mode. By default, this parameter is left empty. We recommend that you do not set this parameter unless otherwise required.
    //request->setPayloadParam("{\"vad_model\": \"farfield\"}");
    // Specify whether to ignore the recognition timeout issue of a single sentence.
    //request->setPayloadParam("{\"enable_ignore_sentence_timeout\": false}");
    // Specify whether to enable post-processing for VAD. Default value: false. We recommend that you do not enable post-processing unless otherwise required.
    //request->setPayloadParam("{\"enable_vad_unify_post\": true}");

    request->setToken(tst->token.c_str());

    // 3. Call the start method in asynchronous callback mode. If the call succeeds, a started event is reported. If the call fails, a TaskFailed event is reported.
    if (request->start() < 0) {
        printf("start() failed. The server may be unreachable. Check the network and firewall settings.\n");
        NlsClient::getInstance()->releaseTranscriberRequest(request); // The start method fails. The SpeechTranscriberRequest object is released.
        return NULL;
    }

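    while (!fs.eof()) {
        uint8_t data[FRAME_SIZE] = {0};

        fs.read((char *)data, sizeof(uint8_t) * FRAME_SIZE);
        size_t nlen = fs.gcount();
        if (nlen == 0) {
            // No data was read: the end of the file was reached or a read error occurred. Stop reading to avoid looping forever.
            break;
        }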

        // 4. Send audio data. If the sendAudio method returns -1, indicating that data fails to be sent, the client stops sending data.
        int ret = request->sendAudio(data, nlen);
        if (ret < 0) {
            // Indicate that data fails to be sent. The client stops sending data cyclically.
            printf("send data fail.\n");
            break;
        }

        // Set the transmission speed of data sending:
        // If you recognize a real-time recording, you do not need to specify the transmission speed by using the sleep method.
        // If you recognize an audio file, you must specify the transmission speed. Ensure that the data size sent per unit interval approaches to the data size of a unit interval in the audio file.
        sleepMs = getSendAudioSleepTime(nlen, SAMPLE_RATE, 1); // Obtain the sleep duration based on the size of sent data, audio sampling rate, and data compression rate.

        // 5. Set the latency for audio data sending.
        usleep(sleepMs * 1000);
    }

    // Close the audio file.
    fs.close();

    // 6. Notify the server that all audio data has been sent.
    // Call the stop method in asynchronous callback mode. If the method fails, a TaskFailed event is returned.
    request->stop();
    // 7. Release the SpeechTranscriberRequest object after the recognition is completed.
    NlsClient::getInstance()->releaseTranscriberRequest(request);
    return NULL;
}

// Recognize a single audio file.
int speechTranscriberFile(const char* appkey) {
    // Obtain the timestamp of the current system time to check whether the token expires.
    std::time_t curTime = std::time(0);
    if (g_expireTime - curTime < 10) {
        printf("The token has expired or is about to expire. Generating a new token by using the AccessKey ID and AccessKey secret.\n");
        if (-1 == generateToken(g_akId, g_akSecret, &g_token, &g_expireTime)) {
            return -1;
        }
    }

    ParamStruct pa;
    pa.token = g_token;
    pa.appkey = appkey;
    pa.fileName = "test0.wav";

    pthread_t pthreadId;
    // Start a worker thread to perform speech recognition.
    pthread_create(&pthreadId, NULL, &pthreadFunc, (void *)&pa);
    pthread_join(pthreadId, NULL);
    return 0;
}

// Recognize multiple audio files.
// If the SDK uses multiple concurrent threads at a time, the SDK recognizes each audio file in a thread. The SDK does not recognize the same audio file in different threads.
// In the sample code, two threads are used to recognize two audio files.
// If you are a free-trial user, you can make only a maximum of two concurrent calls.
#define AUDIO_FILE_NUMS 2
#define AUDIO_FILE_NAME_LENGTH 32
int speechTranscriberMultFile(const char* appkey) {
    // Obtain the timestamp of the current system time to check whether the token expires.
    std::time_t curTime = std::time(0);
    if (g_expireTime - curTime < 10) {
        printf("The token has expired or is about to expire. Generating a new token by using the AccessKey ID and AccessKey secret.\n");
        if (-1 == generateToken(g_akId, g_akSecret, &g_token, &g_expireTime)) {
            return -1;
        }
    }

    char audioFileNames[AUDIO_FILE_NUMS][AUDIO_FILE_NAME_LENGTH] = {"test0.wav", "test1.wav"};
    ParamStruct pa[AUDIO_FILE_NUMS];
    for (int i = 0; i < AUDIO_FILE_NUMS; i ++) {
        pa[i].token = g_token;
        pa[i].appkey = appkey;
        pa[i].fileName = audioFileNames[i];
    }

    std::vector<pthread_t> pthreadId(AUDIO_FILE_NUMS);
    // Start two worker threads and recognize two audio files at a time.
    for (int j = 0; j < AUDIO_FILE_NUMS; j++) {
        pthread_create(&pthreadId[j], NULL, &pthreadFunc, (void *)&(pa[j]));
    }
    for (int j = 0; j < AUDIO_FILE_NUMS; j++) {
        pthread_join(pthreadId[j], NULL);
    }
    return 0;
}

int main(int argc, char* argv[]) {
    if (argc < 4) {
        printf("Invalid parameters. Usage: ./stDemo <your appkey> <your AccessKey ID> <your AccessKey Secret>\n");
        return -1;
    }

    std::string appkey = argv[1];
    g_akId = argv[2];
    g_akSecret = argv[3];

    // Configure output logs of the SDK. The configuration is optional. As configured in the following code, the SDK logs are written to a log file named after log-transcriber. LogDebug specifies that logs at all levels are generated.
    int ret = NlsClient::getInstance()->setLogConfig("log-transcriber", LogDebug);
    if (-1 == ret) {
        printf("set log failed\n");
        return -1;
    }

    // Start the worker threads of the SDK. In this example, the value 4 is passed to the startWorkThread method.
    NlsClient::getInstance()->startWorkThread(4);

    // Recognize a single audio file.
    speechTranscriberFile(appkey.c_str());

    // Recognize multiple audio files.
    // speechTranscriberMultFile(appkey.c_str());

    // All the tasks are completed. Release the NlsClient object before the process exits. Note that the releaseInstance method is not thread-safe.
    NlsClient::releaseInstance();
    return 0;
}
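
The sample code repeats the same token expiration check in speechTranscriberFile and speechTranscriberMultFile. One way to share that logic across threads is a small cache that requests a new token only when the current one is about to expire. The following is a minimal sketch, not part of the SDK: the TokenCache class and the TokenFetcher type are hypothetical names, the sketch assumes a C++11 compiler (the sample code itself does not require C++11), and the current time is passed in as a parameter so that the logic can be tested without a real clock.

```cpp
#include <functional>
#include <mutex>
#include <string>

// Hypothetical fetch callback: fills *token and *expireTime (seconds since the
// epoch) and returns 0 on success or -1 on failure.
typedef std::function<int(std::string*, long*)> TokenFetcher;

// Caches a token and refreshes it when fewer than refreshMarginSec seconds of
// validity remain. Safe to call from multiple threads.
class TokenCache {
public:
    TokenCache(TokenFetcher fetcher, long refreshMarginSec)
        : fetcher_(fetcher), marginSec_(refreshMarginSec), expireTime_(-1) {}

    // Copies a usable token into *out and returns 0, or returns -1 if a
    // refresh was needed and the fetch callback failed.
    int get(long now, std::string* out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (expireTime_ - now < marginSec_) {
            std::string token;
            long expire = -1;
            if (fetcher_(&token, &expire) != 0) {
                return -1;
            }
            token_ = token;
            expireTime_ = expire;
        }
        *out = token_;
        return 0;
    }

private:
    TokenFetcher fetcher_;
    long marginSec_;
    std::string token_;
    long expireTime_;
    std::mutex mutex_;
};
```

In the demo, the fetch callback could wrap the generateToken function with g_akId and g_akSecret, the margin could be 10 seconds to match the existing check, and the value of std::time(0) could be passed as the current time.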