This topic provides answers to frequently asked questions about using Intelligent Speech Interaction services.
What is an appkey?
An appkey is used to identify a business scenario that is specified in a project. You can use Intelligent Speech Interaction services in multiple business scenarios, such as customer service and judicial scenarios. Each scenario requires different service capabilities. To obtain optimal recognition results for a project, you must set an appropriate scenario in the project that is associated with the appkey.
What are the differences between the stream and non-stream modes of speech recognition?
The non-stream mode is also known as the common mode. In common mode, the server does not return the final recognition result until it determines that the speaker has finished a complete sentence. In stream mode, the server returns multiple intermediate results while the speaker is speaking before it returns the final recognition result at the end of the sentence.
What audio coding formats do Intelligent Speech Interaction services support?
Each Intelligent Speech Interaction service supports different audio coding formats. For more information, see the API reference for each Intelligent Speech Interaction service. You can use audio editing software such as Audacity to view the coding format of audio files.
What audio sample rates do Intelligent Speech Interaction services support?
Intelligent Speech Interaction services support only audio sample rates of 16 kHz and 8 kHz. If your speech data is sampled at a different sample rate such as 48 kHz, we recommend that you resample your speech data at 16 kHz before you call an Intelligent Speech Interaction service. You must select the appkey of a project that defines the same audio sample rate as your audio file.
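As an illustration, 48 kHz mono 16-bit PCM can be downsampled to 16 kHz by an integer factor of 3. The following is a minimal sketch that uses only the Python standard library; the naive decimation applies no anti-aliasing filter, so for production audio use a tool such as FFmpeg or a DSP library instead.

```python
import wave

def downsample_wav(src_path: str, dst_path: str, factor: int = 3) -> None:
    """Naively downsample a mono 16-bit PCM WAV file by an integer factor."""
    with wave.open(src_path, "rb") as src:
        assert src.getnchannels() == 1 and src.getsampwidth() == 2
        params = src.getparams()
        frames = src.readframes(src.getnframes())
    # Keep every `factor`-th 16-bit sample (2 bytes per sample).
    samples = [frames[i:i + 2] for i in range(0, len(frames), 2)]
    kept = b"".join(samples[::factor])
    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(params.framerate // factor)
        dst.writeframes(kept)
```

With FFmpeg, the equivalent operation is `ffmpeg -i input.wav -ar 16000 output.wav`, which also applies proper low-pass filtering.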
How do I view the sample rate of an audio file?
You can use audio editing software such as Audacity or the open source command line tool FFmpeg to view the audio sample rate of audio files.
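If the file is an uncompressed WAV file, you can also read the sample rate programmatically. A minimal sketch using the Python standard library:

```python
import wave

def get_sample_rate(path: str) -> int:
    """Return the sample rate (in Hz) of a WAV file."""
    with wave.open(path, "rb") as w:
        return w.getframerate()
```

For compressed formats such as MP3, `ffprobe` (bundled with FFmpeg) prints the sample rate, for example: `ffprobe -show_streams audio.mp3`.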
Can I use Intelligent Speech Interaction offline?
No. Intelligent Speech Interaction services do not support local offline speech recognition. You must send speech data to the server for recognition.
What is the endpoint of Intelligent Speech Interaction?
The endpoint of Intelligent Speech Interaction is wss://nls-gateway.ap-southeast-1.aliyuncs.com/ws/v1.
Do Intelligent Speech Interaction services block sensitive words in recognition results?
No, Intelligent Speech Interaction services do not provide this feature. You can process the recognition results to suit your business needs after they are obtained.
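For example, you can mask specific words yourself after you receive the recognition result. The word list and asterisk masking below are hypothetical placeholders for whatever post-processing your business requires:

```python
def mask_sensitive(text: str, blocked: set) -> str:
    """Replace each blocked word in the recognition result with asterisks."""
    for word in blocked:
        text = text.replace(word, "*" * len(word))
    return text
```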
Do Intelligent Speech Interaction services support English speech recognition?
Yes, Intelligent Speech Interaction services support English speech recognition. Select the English language recognition model when you configure a project in the Intelligent Speech Interaction console. This model requires an audio sample rate of 16 kHz.
Various English accents such as British, American, and Chinese accents can be recognized.
Do Intelligent Speech Interaction services support dialect recognition?
Yes, Intelligent Speech Interaction services support dialect recognition. Set the dialect model in the console. For more information, see Manage projects.
Can Intelligent Speech Interaction services automatically detect sentence breaks?
Yes, the real-time speech recognition service can add breaks between sentences within a request. However, the short sentence recognition service can process only one sentence in each request and cannot add sentence breaks.
What are the limits of Intelligent Speech Interaction Trial Edition for each user?
The short sentence recognition or real-time speech recognition service supports a maximum of two concurrent call requests for speech recognition.
The recording file recognition service can recognize a total of up to 2 hours of recording files per calendar day.
What are the speech length limits in a request for Intelligent Speech Interaction services?
The short sentence recognition service supports real-time speech that is less than 60 seconds in length.
The real-time speech recognition service does not limit the length of speech in a request.
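Before you call the short sentence recognition service, you can verify on the client that the audio stays within the 60-second limit. A minimal sketch that computes the duration of a WAV file from its frame count and sample rate:

```python
import wave

def wav_duration_seconds(path: str) -> float:
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()
```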
Does the existing access token become invalid if I obtain another access token?
No, existing access tokens are not affected by newly obtained access tokens. The validity of an access token depends on the timestamp that indicates the validity period of the access token. This validity period is not affected by other access tokens.
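For example, when the token response carries an expiration timestamp, you can decide locally whether a cached token is still usable. The `ExpireTime` field name and the 60-second safety margin below are assumptions for illustration; check the token service documentation for the actual response format:

```python
import time

def token_is_valid(token: dict, margin_seconds: int = 60) -> bool:
    """Check whether a cached access token is still usable.

    Assumes the token dict carries a Unix-timestamp "ExpireTime" field
    (an assumption; verify against the actual token response format).
    """
    return token.get("ExpireTime", 0) - margin_seconds > time.time()
```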
Can I configure an IP whitelist to access Intelligent Speech Interaction services?
No. Intelligent Speech Interaction does not support the IP whitelist feature because the speech recognition servers use a wide range of IP addresses.
How can I resolve slow recognition and timeout issues of the real-time speech recognition service?
To troubleshoot these issues, perform the following steps:
Run the demo that is provided by Alibaba Cloud and compare its running status with that of your application. Record the comparison results and collect your log information.
Record the task ID of the request to facilitate troubleshooting.
Use a packet capture tool such as tcpdump for Linux or Wireshark for Windows on your client to check the network condition.