Last Updated: Sep 30, 2019

1. Why does the Intelligent Speech Interaction server return incorrect recognition results and sometimes return only a few words for a sentence?

Check whether the audio sampling rate of your speech data matches that of the model configured for the selected project in the Intelligent Speech Interaction console. Also check whether the audio is recorded in mono. Only the recording file recognition service can recognize binaural (two-channel) audio.
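As a rough local check before you upload audio, the following Python sketch reads the header of a WAV file and reports mismatches. It assumes that your audio is an uncompressed WAV file and that you already know the sampling rate of the configured model. The function names (check_wav, make_test_wav) are illustrative and are not part of any Intelligent Speech Interaction SDK.

```python
import io
import wave


def check_wav(source, expected_rate_hz):
    """Return a list of mismatches between a WAV file and the assumed model settings.

    source may be a file path or a binary file object.
    """
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()

    problems = []
    if rate != expected_rate_hz:
        problems.append(f"sampling rate is {rate} Hz, expected {expected_rate_hz} Hz")
    if channels != 1:
        problems.append(f"audio has {channels} channels, expected mono")
    return problems


def make_test_wav(rate_hz, channels):
    """Build a short silent 16-bit WAV in memory, for demonstration only."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(rate_hz)
        wav.writeframes(b"\x00\x00" * channels * 100)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    # A 16 kHz value is used here only as an example of a model setting.
    print(check_wav(make_test_wav(8000, 2), expected_rate_hz=16000))
```

An empty list means that the file header matches the expected sampling rate and is mono. The check does not inspect the audio content itself.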

2. What can I do if the server still returns incorrect recognition results when I use the appropriate call method and audio sampling rate?

If you have checked the audio sampling rate and recording format of your speech data and confirmed that both are appropriate, you can use the hotword feature to customize common words. You can customize up to 128 hotwords to improve recognition accuracy in real time. You can also use the custom model training feature to customize and train models. Custom models can help the service correctly recognize a large amount of text.

3. Do I need to send speech data continuously?

Yes. You must send speech data continuously. If the server does not receive speech data within the timeout period (10 seconds for short sentence recognition and 20 seconds for real-time speech recognition), it closes the connection with the client and returns the error code 40000004. In this case, the client must initiate a new request and send the data again.
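On the client side, you can track how long it has been since the last audio chunk was sent so that you know when a new request is needed. The following Python sketch uses the timeout values stated above. The IdleWatch class and the service keys are hypothetical helpers, not SDK features, and the sketch does not send any data itself.

```python
import time

# Idle timeouts stated in this FAQ, in seconds.
IDLE_TIMEOUT_S = {
    "short_sentence": 10.0,
    "real_time": 20.0,
}


class IdleWatch:
    """Track the time elapsed since the last audio chunk was sent."""

    def __init__(self, service, clock=time.monotonic):
        self.limit = IDLE_TIMEOUT_S[service]
        self.clock = clock
        self.last_sent = clock()

    def mark_sent(self):
        """Call this right after each chunk is handed to the SDK."""
        self.last_sent = self.clock()

    def idle_seconds(self):
        return self.clock() - self.last_sent

    def expired(self):
        """Return True if the idle time has reached the stated timeout."""
        return self.idle_seconds() >= self.limit
```

If expired() returns True, the server has most likely closed the connection already, so the client should start a new request instead of sending more data on the old one.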

4. Why do I still receive data from the server after I stop sending speech data?

If you stop sending speech data, the client is disconnected from the server after a timeout. The server still processes the data that it has already received and returns the recognition result for that data. However, in this case the returned result is not correct for the whole sentence.

5. What does endtime = -1 mean in the recognition result returned in JSON format?

It indicates that the current sentence has not ended. The server returns intermediate results only in stream mode.
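A client can use this value to tell intermediate results from final ones. The following Python sketch assumes that the result is a JSON object with a top-level endtime field. The exact payload layout and the text field in the sample are assumptions for illustration, so check the API reference of the service that you use.

```python
import json


def sentence_ended(result_json):
    """Return True if the result marks the end of a sentence.

    Assumes a top-level "endtime" field; -1 means the sentence has not ended.
    """
    result = json.loads(result_json)
    return result["endtime"] != -1


if __name__ == "__main__":
    # Hypothetical payloads, not captured from the real service.
    intermediate = '{"endtime": -1, "text": "hello"}'
    final = '{"endtime": 1520, "text": "hello world"}'
    print(sentence_ended(intermediate), sentence_ended(final))
```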

6. What error message is returned if I use the C++ SDK to call the speech synthesis service but the uploaded text is not encoded in UTF-8?

If the uploaded text is not encoded in UTF-8 and contains Chinese characters, the C++ SDK may fail to call the start function and return the error message “Socket recv failed, errorCode: 0”. The error code 0 indicates that the server has closed the connection with the client. If you receive this error message, check whether the uploaded text is encoded in UTF-8.
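One way to rule out an encoding problem is to validate the raw bytes of the text before you pass them to the SDK. The following Python sketch shows the idea. The is_valid_utf8 function is an illustrative helper, and the same check would need to be ported to C++ if you want it inside your client.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    text = "你好"
    print(is_valid_utf8(text.encode("utf-8")))  # UTF-8 bytes pass the check
    print(is_valid_utf8(text.encode("gbk")))    # GBK bytes for the same text fail
```

Text that contains only ASCII characters passes this check in either encoding, which is consistent with the problem appearing only when the text contains Chinese characters.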

7. What HTTP status codes are returned by the server?

  • HTTP status code 200 indicates that the request is successful.
  • HTTP status code 4XX indicates a client error.
  • HTTP status code 5XX indicates a server error.

For more information, see the API reference of each Intelligent Speech Interaction service.
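The three classes above can be handled in client code with a simple range check. The following Python sketch is a minimal illustration. The classify_status function and its return strings are hypothetical, and service-specific error codes in the response body are not covered.

```python
def classify_status(code: int) -> str:
    """Map an HTTP status code to the categories described in this FAQ."""
    if code == 200:
        return "success"
    if 400 <= code <= 499:
        return "client error"
    if 500 <= code <= 599:
        return "server error"
    return "other"


if __name__ == "__main__":
    for status in (200, 403, 503):
        print(status, classify_status(status))
```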

8. What can I do if the server returns two identical results after I send a request for the recording file recognition service?

If your recording file contains binaural (two-channel) audio in which both channels carry the same speech content, the server returns two identical results for your request. This is expected behavior.
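To confirm that this is the cause, you can compare the two channels of the file locally. The following Python sketch assumes an uncompressed two-channel WAV file and checks whether every sample is byte-for-byte identical in both channels. The function names are illustrative and are not part of any SDK. Channels that carry the same speech but were recorded through different microphones will not match exactly, so a False result does not rule out this case.

```python
import io
import struct
import wave


def channels_identical(source):
    """Return True if a two-channel WAV has the same samples in both channels."""
    with wave.open(source, "rb") as wav:
        if wav.getnchannels() != 2:
            return False
        width = wav.getsampwidth()
        frames = wav.readframes(wav.getnframes())

    step = 2 * width  # one frame holds a left sample followed by a right sample
    for i in range(0, len(frames), step):
        if frames[i:i + width] != frames[i + width:i + step]:
            return False
    return True


def make_stereo_wav(left, right):
    """Build an in-memory 16-bit stereo WAV from two equal-length sample lists."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(2)
        wav.setsampwidth(2)
        wav.setframerate(16000)
        wav.writeframes(b"".join(struct.pack("<hh", l, r) for l, r in zip(left, right)))
    buffer.seek(0)
    return buffer
```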