You may use Intelligent Speech Interaction in multiple business scenarios, such as customer service and court scenarios. Required service capabilities may vary with the scenario. An AppKey is used to uniquely identify a business scenario in a project. You can obtain optimal results only when the configuration of a project matches the corresponding business scenario.
The non-stream mode is the common mode. In this mode, the server returns only the final result after it determines that you have finished a whole sentence. In stream mode, the server also returns intermediate results while you are speaking, before it returns the final result of the sentence.
Each Intelligent Speech Interaction service supports different audio coding formats. For more information, see the API reference of each service. You can use common audio editing software such as Audacity to view the audio coding format of audio files.
Currently, Intelligent Speech Interaction supports only audio sampling rates of 16 kHz and 8 kHz. If your speech data uses another sampling rate, such as 48 kHz, we recommend that you resample it to 16 kHz before you call an Intelligent Speech Interaction service. Note that you must select the AppKey of a project that matches the audio sampling rate of your audio file.
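The resampling step can be sketched as follows, assuming FFmpeg is installed and available on your PATH. The helper names build_resample_command and resample are illustrative and are not part of any Intelligent Speech Interaction SDK.

```python
import subprocess

# Sampling rates that the service is stated to support.
SUPPORTED_RATES = (8000, 16000)


def build_resample_command(src, dst, rate=16000):
    """Build an FFmpeg command line that resamples src to the given rate."""
    if rate not in SUPPORTED_RATES:
        raise ValueError(f"unsupported sampling rate: {rate}")
    # -y overwrites dst if it exists; -ar sets the output audio sampling rate.
    return ["ffmpeg", "-y", "-i", src, "-ar", str(rate), dst]


def resample(src, dst, rate=16000):
    """Run FFmpeg; raises CalledProcessError if the conversion fails."""
    subprocess.run(build_resample_command(src, dst, rate), check=True)
```

For example, resample("meeting_48k.wav", "meeting_16k.wav") would write a 16 kHz copy of a hypothetical 48 kHz recording.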
You can use common audio editing software such as Audacity or the open-source command line tool FFmpeg to view the audio sampling rate of an audio file.
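If you prefer to check the rate from a script, the following sketch uses Python's standard wave module instead of the tools named above. It reads only uncompressed PCM WAV files, and the function name wav_sample_rate is illustrative.

```python
import wave


def wav_sample_rate(path):
    """Return the sampling rate in Hz stored in a PCM WAV file header."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate()
```

A result other than 8000 or 16000 suggests that the file should be resampled before it is sent for recognition.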
Currently, Intelligent Speech Interaction is available only online. You must send speech data to the server to complete a recognition task.
The endpoint of Intelligent Speech Interaction is wss://nls-gateway.cn-shanghai.aliyuncs.com/ws/v1.
Currently, Intelligent Speech Interaction does not provide this feature. You can process recognition results as needed.
Yes. You can select the English language recognition model when you configure a project in the Intelligent Speech Interaction console. This model supports an audio sampling rate of 16 kHz. It recognizes only English as spoken by Europeans and Americans, and does not recognize English spoken with a Chinese accent.
Currently, you can select a model that can recognize dialects when you configure a project in the Intelligent Speech Interaction console. The 8 kHz telephone customer service and quality inspection model for Chinese dialects can recognize six dialects: those of Sichuan, Northeast China, Henan, Hunan, Shandong, and Hubei. The 8 kHz telephone customer service and quality inspection model for Cantonese can recognize Cantonese.
The real-time speech recognition service can split the speech in a single request into multiple sentences. Each request of the short sentence recognition service can process only one sentence.
With the short sentence recognition or real-time speech recognition service, you can send a maximum of two concurrent speech recognition requests. With the recording file recognition service, you can submit requests for recording files that total up to 2 hours per calendar day.
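One way to stay within the limit of two concurrent requests on the client side is to guard each call with a semaphore. This is a minimal sketch: task stands for whatever function performs one recognition request in your code, and run_limited and recognize_all are hypothetical helper names.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT = 2  # concurrency limit stated above
_slots = threading.BoundedSemaphore(MAX_CONCURRENT)


def run_limited(task, *args):
    """Run task(*args) while holding one of the two request slots."""
    with _slots:
        return task(*args)


def recognize_all(task, items, workers=8):
    """Apply task to every item; at most MAX_CONCURRENT calls run at once."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the order of items in the returned list.
        return list(pool.map(lambda item: run_limited(task, item), items))
```

The thread pool may be larger than two workers. The semaphore, not the pool size, is what caps the number of requests in flight.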
The short sentence recognition service supports speech of up to 60 seconds in a request. The real-time speech recognition service does not limit the duration of speech in a request.
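To estimate whether a clip fits the 60-second limit before you send it, you can compute its duration from the size of the raw audio. The sketch below assumes uncompressed 16-bit mono PCM by default and treats exactly 60 seconds as acceptable. Both are assumptions, and the function names are illustrative.

```python
SHORT_SENTENCE_LIMIT_S = 60  # limit stated above


def pcm_duration_seconds(num_bytes, sample_rate, sample_width=2, channels=1):
    """Duration of raw PCM audio: bytes / (rate * bytes per sample * channels)."""
    return num_bytes / (sample_rate * sample_width * channels)


def fits_short_sentence(num_bytes, sample_rate, sample_width=2, channels=1):
    """True if the clip is no longer than the short sentence limit."""
    duration = pcm_duration_seconds(num_bytes, sample_rate, sample_width, channels)
    return duration <= SHORT_SENTENCE_LIMIT_S
```

At 16 kHz, 16-bit mono audio takes 32,000 bytes per second, so 60 seconds corresponds to 1,920,000 bytes. Longer audio is a candidate for the real-time speech recognition service.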
No. The validity of a token depends only on the timestamp that indicates the validity period of the token. This validity period is not affected by another token that you obtain.
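Because validity depends only on the expiry timestamp, a client-side check can be sketched as a plain time comparison. This assumes the expiry time is a Unix timestamp in seconds. The function name and the margin parameter are illustrative.

```python
import time


def token_is_valid(expire_time, now=None, margin=0):
    """True while the current time, plus a safety margin, is before expiry."""
    if now is None:
        now = time.time()
    return now + margin < expire_time
```

Each token is checked against its own timestamp, so obtaining a second token does not change the result for the first one.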
15. Can I obtain a whitelist of IP addresses that I can use to access Intelligent Speech Interaction?
An IP address whitelist for Intelligent Speech Interaction is unavailable because the Intelligent Speech Interaction server has a wide range of IP addresses. You can use the endpoint wss://nls-gateway.cn-shanghai.aliyuncs.com/ws/v1 to access Intelligent Speech Interaction.
16. How can I resolve the slow recognition and timeout issues when I use the real-time speech recognition service?
You can use any of the following troubleshooting methods:
1. Run the demo provided by Alibaba Cloud and compare the results in demo logs with those in your service logs to check whether the demo can run properly. Record the comparison results and provide your log information.
2. Record the task ID of the request for which the server returns an error response to facilitate troubleshooting.
3. Capture packets on the client to check the network condition.