Alibaba Cloud Product Commercial Release: Intelligent Speech Interaction
Target customers: Customers who need speech-to-text or text-to-speech capabilities.

Features released: Intelligent Speech Interaction is built on speech recognition, speech synthesis, and natural language understanding technologies. Enterprises can integrate it into their products so that the products can listen to, understand, and converse with users, providing an immersive human-computer interaction experience. Intelligent Speech Interaction is currently available in Chinese, Cantonese, English, Japanese, Spanish, and Russian. Stay tuned for other languages.

Short sentence recognition: Converts short audio (less than 1 minute) to text. Applicable to real-time scenarios such as voice search, voice command control, and short voice messages. Short sentence recognition can be integrated into mobile applications, smart home appliances, and smart assistants.

Real-time speech recognition: Converts audio streams into text in real time. Intelligent segmentation identifies where sentences start and end. Applicable to scenarios that require real-time response, such as live transcription of video broadcasts, meetings, and court trials.

Recording file recognition: Converts audio files uploaded by users into text within 24 hours. Applicable to scenarios that are not time-sensitive, such as call center quality assurance, transcription of court trials from recordings, summarization of meeting minutes, and medical record filing.

Speech synthesis: Converts text to natural-sounding speech. Speech synthesis provides a variety of voices and allows you to adjust the speed, intonation, and volume. It is suited to scenarios such as intelligent customer service, speech interaction, audiobooks, and broadcasting.

Self-learning platform: Allows you to upload business-related data to improve recognition accuracy for specific use cases. Currently, you can upload only text to customize language models. In the future, the self-learning platform will allow you to upload audio data to customize acoustic models.
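
The three recognition features differ mainly in input type and latency, so the choice between them can be sketched as a small routing rule. The snippet below is only an illustration of the criteria described above (audio streams, audio under 1 minute, uploaded files). The names `RecognitionMode` and `choose_mode` are hypothetical and are not part of any Alibaba Cloud SDK, and the snippet makes no calls to the service.

```python
from enum import Enum
from typing import Optional


class RecognitionMode(Enum):
    """Hypothetical labels for the three recognition features in this release."""
    SHORT_SENTENCE = "short sentence recognition"
    REAL_TIME = "real-time speech recognition"
    RECORDING_FILE = "recording file recognition"


def choose_mode(is_live_stream: bool,
                duration_seconds: Optional[float] = None) -> RecognitionMode:
    """Pick a recognition feature from the criteria in the release note.

    Assumption: a continuous stream always maps to real-time recognition,
    audio shorter than 1 minute maps to short sentence recognition, and
    everything else is treated as an uploaded recording.
    """
    if is_live_stream:
        return RecognitionMode.REAL_TIME
    if duration_seconds is not None and duration_seconds < 60:
        return RecognitionMode.SHORT_SENTENCE
    return RecognitionMode.RECORDING_FILE


if __name__ == "__main__":
    print(choose_mode(True).value)          # live meeting transcription
    print(choose_mode(False, 12.5).value)   # a voice command
    print(choose_mode(False, 3600).value)   # a one-hour call recording
```

In practice the decision may also depend on how quickly the result is needed: a short clip that is not time-sensitive could equally be submitted as a recording file.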