
Intelligent Speech Interaction: SDK calls

Last Updated: Oct 25, 2022

This topic provides answers to some frequently asked questions about calling Intelligent Speech Interaction SDKs and API operations.

What can I do if my Intelligent Speech Interaction service returns recognition results with low accuracy and sometimes returns only a few words for a sentence?

Check whether the audio sample rate of your speech data matches the sample rate of the model that you specify for your project in the Intelligent Speech Interaction console. Also check whether the audio is recorded in mono.

Note

Among Intelligent Speech Interaction services, only the recording file recognition service can recognize binaural (two-channel) audio.
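As a quick sanity check, the following sketch uses Python's standard `wave` module to inspect a WAV file's sample rate and channel count before you upload it. The `check_audio` helper and the expected rate are illustrative and are not part of the SDK:

```python
import wave

def check_audio(path, expected_rate):
    """Report whether a WAV file matches the expected sample rate and is mono."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
    return {
        "rate_ok": rate == expected_rate,  # must match the model's sample rate
        "mono": channels == 1,             # most services require mono audio
        "rate": rate,
        "channels": channels,
    }
```

If `rate_ok` is False, resample the audio or select a model with the matching sample rate in the console instead of uploading as-is.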

What can I do if my Intelligent Speech Interaction service still returns inaccurate recognition results when I use the appropriate call method and audio sample rate?

You can use the following two methods to improve recognition accuracy:

  • Use the custom hotword feature to improve recognition accuracy in real time. For more information, see Introduction.

  • Enable the custom model training feature to train custom models on large amounts of your own text data. These custom models can improve the recognition rate for domain-specific content. For more information, see Introduction.

Do I need to send speech data in a continuous manner?

Yes, you must send speech data in a continuous manner.

If the server does not receive speech data for 10 seconds, the connection times out. The server then disconnects from the client and returns the error code 40000004. The client must initiate a new request to send data again.
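To keep the connection alive, send audio at roughly real-time pace. The following is a minimal sketch that assumes raw 16-bit mono PCM and uses a generic `send` callback in place of the SDK's send call (both the function and the frame size are illustrative):

```python
import time

def stream_pcm(pcm: bytes, send, sample_rate=16000, frame_ms=100):
    """Split raw 16-bit mono PCM into fixed-duration frames and pace them
    so the gap between sends stays well under the 10-second idle timeout."""
    bytes_per_frame = sample_rate * 2 * frame_ms // 1000  # 2 bytes per sample
    for offset in range(0, len(pcm), bytes_per_frame):
        send(pcm[offset:offset + bytes_per_frame])
        time.sleep(frame_ms / 1000)  # send at real-time pace
```

Pacing the frames also prevents the opposite problem of flooding the server faster than real time, which some services reject.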

Why do I still receive data from the server after I stop sending speech data?

If you stop sending speech data, the client is disconnected from the server after the timeout. The server continues to process the data that it received before the disconnection and returns a recognition result. However, the recognition result for the whole sentence may be incorrect in this case.

What does "endtime = -1" mean in the recognition result that is returned in the JSON format?

This indicates that the current sentence has not ended yet. The server returns such intermediate results only when you use the speech recognition service in streaming mode.
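A client can branch on the endtime field to distinguish intermediate results from final ones. The exact JSON nesting varies by service, so the flat structure in this sketch is an assumption:

```python
import json

def is_intermediate(result_json: str) -> bool:
    """Return True if the payload marks an unfinished sentence (endtime == -1)."""
    payload = json.loads(result_json)
    # Field name taken from the FAQ above; its exact nesting is an assumption.
    return payload.get("endtime") == -1
```

Intermediate results are typically used only to update a live transcript; wait for a result with a non-negative endtime before treating a sentence as final.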

What error message is returned if I use the speech synthesis SDK for C++ but the uploaded text is not encoded in UTF-8?

If the uploaded text is not encoded in UTF-8 and contains Chinese characters, the speech synthesis SDK for C++ fails to call the start function and returns the following error message: Socket recv failed, errorCode: 0. The error code 0 indicates that the server has disconnected from the client. If you receive this error message, check whether the uploaded text is encoded in UTF-8.
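To catch this before calling the start function, you can validate the encoding up front. The SDK in question is C++; the Python sketch below only illustrates the check:

```python
def is_valid_utf8(data: bytes) -> bool:
    """Check text bytes before handing them to the synthesizer."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```

Text that comes from a GBK- or GB2312-encoded source must be converted to UTF-8 first; passing it through unconverted is a common cause of this error.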

What are the status codes that can be returned by the server?

  • HTTP status code 200 indicates that the request is successful.

  • HTTP status code 4XX indicates a client error.

  • HTTP status code 5XX indicates a server error.

For more information about status codes, see the API reference for each Intelligent Speech Interaction service.
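A client can branch on these ranges before parsing the response body. The helper below is a minimal illustrative sketch, not part of any SDK:

```python
def classify_status(code: int) -> str:
    """Map an HTTP status code to the categories listed above."""
    if 200 <= code < 300:
        return "success"
    if 400 <= code < 500:
        return "client error"   # fix the request before retrying
    if 500 <= code < 600:
        return "server error"   # usually safe to retry with backoff
    return "other"
```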

What can I do if the server returns two identical results after I send a recording file recognition request?

If the audio file that you submit is recorded in binaural mode and the speech content of the two sound channels is identical, the server returns one result for each channel, and the two results are identical. This is expected behavior.