Service use

Last Updated: Sep 30, 2019

1. What is an AppKey?

You may use Intelligent Speech Interaction in multiple business scenarios, such as customer service and court scenarios. Required service capabilities may vary with the scenario. An AppKey is used to uniquely identify a business scenario in a project. You can obtain optimal results only when the configuration of a project matches the corresponding business scenario.

2. What is the difference between the stream mode and non-stream mode of speech recognition?

The non-stream mode is the common mode. In this mode, the server returns only the final result after it determines that you have finished a whole sentence. In stream mode, the server also returns intermediate results while you are speaking, before it returns the final result of the sentence.

3. What audio coding formats does Intelligent Speech Interaction support?

Each Intelligent Speech Interaction service supports different audio coding formats. For more information, see the API reference of each service. You can use common audio editing software such as Audacity to view the audio coding format of audio files.

4. What audio sampling rates does Intelligent Speech Interaction support?

Currently, Intelligent Speech Interaction supports only the audio sampling rates of 16 kHz and 8 kHz. If your speech data is sampled at another rate, such as 48 kHz, we recommend that you resample it to 16 kHz before calling an Intelligent Speech Interaction service. Note that you must select the AppKey of a project that matches the audio sampling rate of your audio file.
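As one possible way to do the resampling, the sketch below builds an FFmpeg command that converts a file to 16 kHz. It assumes the ffmpeg binary is installed and on PATH; the function names and the file names are hypothetical and are not part of Intelligent Speech Interaction.

```python
import subprocess

# Sampling rates named in the answer above.
SUPPORTED_RATES_HZ = (8000, 16000)


def build_resample_command(src, dst, rate_hz=16000):
    """Return an FFmpeg argument list that resamples src to rate_hz."""
    if rate_hz not in SUPPORTED_RATES_HZ:
        raise ValueError(f"unsupported sampling rate: {rate_hz} Hz")
    # -i names the input file; -ar sets the output audio sampling rate.
    return ["ffmpeg", "-i", src, "-ar", str(rate_hz), dst]


def resample(src, dst, rate_hz=16000):
    """Run FFmpeg and raise CalledProcessError if the conversion fails."""
    subprocess.run(build_resample_command(src, dst, rate_hz), check=True)
```

For example, resample("meeting_48k.wav", "meeting_16k.wav") would write a 16 kHz copy of a hypothetical 48 kHz recording.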

5. How can I view the audio sampling rate of an audio file?

You can use common audio editing software such as Audacity or the open-source command line tool FFmpeg to view the audio sampling rate of an audio file.
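If you prefer to check the rate programmatically, the following minimal sketch reads it from the header of a WAV file with the wave module of the Python standard library. It applies only to uncompressed WAV files; the function name is hypothetical.

```python
import wave


def wav_sampling_rate(path):
    """Return the sampling rate, in Hz, stored in a WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()
```

A file that reports 16000 or 8000 matches one of the supported rates. Any other value suggests that the file needs to be resampled first.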

6. Can I use Intelligent Speech Interaction offline?

Currently, Intelligent Speech Interaction is available only online. You must send speech data to the server to complete a recognition task.

7. What is the endpoint of Intelligent Speech Interaction?

The endpoint of Intelligent Speech Interaction is wss://nls-gateway.cn-shanghai.aliyuncs.com/ws/v1.

8. Does Intelligent Speech Interaction block sensitive words for recognition results?

Currently, Intelligent Speech Interaction does not provide this feature. You can process recognition results as needed.
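Because filtering is left to the caller, one simple approach is to mask a list of words in the recognized text on the client side. The sketch below is only an illustration: the function name and the word list are hypothetical, and matching is done on substrings without regard to case, which may be too broad for some languages.

```python
import re


def mask_words(text, blocked_words, mask_char="*"):
    """Replace each blocked word in text with mask characters of equal length."""
    # Drop empty entries and try longer entries first so they win over prefixes.
    ordered = sorted((word for word in blocked_words if word), key=len, reverse=True)
    if not ordered:
        return text
    pattern = re.compile("|".join(re.escape(word) for word in ordered), re.IGNORECASE)
    return pattern.sub(lambda match: mask_char * len(match.group(0)), text)
```

For example, mask_words("call the darn number", ["darn"]) returns "call the **** number".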

9. Does Intelligent Speech Interaction recognize English?

Yes. You can select the English language recognition model when you configure a project in the Intelligent Speech Interaction console. This model supports an audio sampling rate of 16 kHz. It recognizes only English as spoken by Europeans and Americans, not English spoken with a Chinese accent.

10. Does Intelligent Speech Interaction recognize dialects?

Yes. Currently, you can select a model that recognizes dialects when you configure a project in the Intelligent Speech Interaction console. The 8 kHz telephone customer service and quality inspection model for Chinese dialects can recognize the dialects of six regions: Sichuan, Northeast China, Henan, Hunan, Shandong, and Hubei. The 8 kHz telephone customer service and quality inspection model for Cantonese can recognize Cantonese.

11. Can Intelligent Speech Interaction automatically split speech into sentences?

The real-time speech recognition service can split the speech in a request into multiple sentences. Each request of the short sentence recognition service can process only one sentence.

12. What are the limits in the trial edition of Intelligent Speech Interaction?

With the short sentence recognition or real-time speech recognition service, you can send a maximum of two concurrent requests for speech recognition. With the recording file recognition service, you can submit recording files with a maximum duration of 2 hours per calendar day.
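On the client side, one way to stay within the limit of two concurrent requests is to run recognition calls through a small worker pool. The sketch below uses only the Python standard library; run_limited and the task callables are hypothetical placeholders for your own request code.

```python
from concurrent.futures import ThreadPoolExecutor

# Concurrency limit of the trial edition, as stated in the answer above.
MAX_CONCURRENT_REQUESTS = 2


def run_limited(tasks, limit=MAX_CONCURRENT_REQUESTS):
    """Call each zero-argument task, running at most `limit` at the same time.

    Results are returned in the same order as the tasks.
    """
    with ThreadPoolExecutor(max_workers=limit) as pool:
        return list(pool.map(lambda task: task(), tasks))
```

Each task would wrap one recognition request. The pool queues the remaining tasks until a worker becomes free.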

13. What is the limit on the duration of speech in a request for Intelligent Speech Interaction?

The short sentence recognition service supports speech that lasts no longer than 60 seconds per request. The real-time speech recognition service does not limit the duration of speech in a request.
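To check in advance whether a clip fits the 60-second limit, you can estimate its duration from its size. The sketch below assumes raw, uncompressed PCM audio with 16-bit samples and one channel by default; these defaults and the function names are assumptions, so adjust them to your actual audio format.

```python
# Limit of the short sentence recognition service, as stated in the answer above.
SHORT_SENTENCE_LIMIT_S = 60


def pcm_duration_seconds(num_bytes, sample_rate_hz, bytes_per_sample=2, channels=1):
    """Duration of raw PCM audio: bytes / (rate * sample width * channels)."""
    return num_bytes / (sample_rate_hz * bytes_per_sample * channels)


def fits_short_sentence(num_bytes, sample_rate_hz):
    """True if the audio is no longer than the short sentence limit."""
    return pcm_duration_seconds(num_bytes, sample_rate_hz) <= SHORT_SENTENCE_LIMIT_S
```

For example, 32,000 bytes of 16-bit mono audio at 16 kHz correspond to 32000 / (16000 × 2 × 1) = 1 second.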

14. Does the existing token become invalid if I obtain another token?

No. The validity of a token depends only on the timestamp that indicates the validity period of the token. This validity period is not affected by another token that you obtain.
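Because validity depends only on the timestamp, a client can decide locally whether a cached token is still usable. The following sketch assumes that the expiry timestamp is expressed in Unix seconds; the function and parameter names are hypothetical.

```python
import time


def token_is_valid(expire_time, now=None, safety_margin_s=0):
    """True while the expiry timestamp (Unix seconds) is still in the future.

    safety_margin_s treats the token as expired slightly early, so that a
    request does not start with a token that is about to expire.
    """
    if now is None:
        now = time.time()
    return now + safety_margin_s < expire_time
```

The check takes no other token into account, which mirrors the answer above: obtaining a second token does not change the result for the first one.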

15. Can I obtain a whitelist of IP addresses that I can use to access Intelligent Speech Interaction?

An IP address whitelist for Intelligent Speech Interaction is unavailable because the Intelligent Speech Interaction server has a wide range of IP addresses. You can use the endpoint wss://nls-gateway.cn-shanghai.aliyuncs.com/ws/v1 to access Intelligent Speech Interaction.

16. How can I resolve the slow recognition and timeout issues when I use the real-time speech recognition service?

You can use any of the following troubleshooting methods:

   1. Run the demo provided by Alibaba Cloud and compare the results in the demo logs with those in your service logs to check whether the demo can run properly. Record the comparison results and provide your log information.
   2. Record the task ID of the request for which the server returns an error response to facilitate troubleshooting.
   3. Capture packets on the client to check the network condition.