Intelligent Speech Interaction

Intelligent Speech Interaction is developed based on state-of-the-art technologies such as speech recognition, speech synthesis, and natural language understanding. Enterprises can integrate Intelligent Speech Interaction into their products so that the products can listen to, understand, and converse with users, providing an immersive human-computer interaction experience. Intelligent Speech Interaction is currently available in Chinese and English. Support for other languages is expected to follow.

Intelligent Speech Interaction is suitable for various scenarios, including intelligent Q&A, intelligent quality inspection, real-time subtitling for speeches, and transcription of audio recordings. It has been successfully applied in many industries, such as finance, insurance, e-commerce, and smart home. It provides a self-learning platform that you can use to improve speech recognition accuracy, a comprehensive management console, and easy-to-use SDKs. You are welcome to activate Intelligent Speech Interaction.

Benefits

High Recognition Accuracy
Alibaba Cloud is the first cloud service provider in China to use word-level LC-BLSTM and DFSMN-CTC models. Compared with the traditional CTC method in the industry, these models reduce the error rate by 20%, greatly improving the accuracy of speech recognition.
Ultra-high Decoding Speed
Alibaba Cloud is the first cloud service provider in China to use low frame rate (LFR) decoding technology. This technology makes decoding more than three times faster without compromising recognition accuracy, which greatly shortens response time and improves user experience.
Novel Self-learning Platform
Intelligent Speech Interaction is the first system in the industry that provides a self-learning platform. The platform allows you to specify hotwords and upload business-related data to build scenario-specific models for better recognition accuracy.
Extensive Industry Coverage
Currently, Intelligent Speech Interaction has customers in a wide variety of industries, such as finance, insurance, e-commerce and smart home. It is ideal for various scenarios, including intelligent Q&A, intelligent quality inspection, real-time subtitling for speeches, and voice assistants.

Products and Services

Recording File Recognition

Converts audio from files uploaded by users into text within 24 hours. Applicable to scenarios that are not time-sensitive, such as call center quality assurance, transcription of court trials from recordings, summarization of meeting minutes, and medical record filing.

Short Sentence Recognition

Converts short audio (less than 1 minute) to text. Applicable to real-time scenarios, such as voice search, voice command control, and short voice messages. Short Sentence Recognition can be integrated into various mobile applications, smart home appliances, and smart assistants.

Self-learning Platform

Allows you to upload business-related data to improve recognition accuracy for specific use cases. Currently, you can upload only text to customize language models. In the future, Self-learning Platform will allow you to upload audio data to customize acoustic models.

Real-time Speech Recognition

Converts audio streams into text in real time. Intelligent segmentation is used to identify when sentences start and end. Applicable to scenarios with high requirements for real-time response, such as real-time transcription for live videos, meetings and court trials.

Speech Synthesis

Converts text to natural speech. Speech Synthesis provides a variety of voices and allows you to adjust the speed, intonation, and volume. It is ideal for scenarios such as intelligent customer service, speech interaction, audiobooks, and broadcasting.

Scenarios

Intelligent Quality Inspection

Traditional quality inspection generally involves listening to customer service call recordings, which is inefficient and labor-intensive. Intelligent quality inspection inspects all service processes in real time, freeing enterprises from labor constraints and giving them full control over service quality.

Procedure and Benefits

  • Procedure

    After the voice recordings are converted to text, the quality inspection engine generates quality inspection results and statistics. Quality inspectors can verify the reported violations through the management console.

  • Benefits

    1. Full automation - The quality of all customer service calls can be automatically inspected.
    2. Real-time processing - Quality inspection can be completed immediately after a phone call ends and the results can be displayed in real time.
    3. Flexibility in rule configuration - Rules can be flexibly configured in various complex business scenarios.
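
The procedure above (transcribed text in, violations and statistics out) can be illustrated with a minimal sketch. This is not the product's quality inspection engine or its API. The `Rule` and `Violation` types, the speaker labels, and the example patterns are all hypothetical and exist only to show how configurable rules might be applied to a transcript.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A hypothetical inspection rule (not part of any real SDK)."""
    name: str
    speaker: str   # assumed speaker labels: "agent" or "customer"
    pattern: str   # regular expression, matched case-insensitively


@dataclass(frozen=True)
class Violation:
    rule: str
    utterance_index: int
    text: str


def inspect_call(transcript, rules):
    """Return every rule match in a transcript of (speaker, text) pairs."""
    violations = []
    for index, (speaker, text) in enumerate(transcript):
        for rule in rules:
            if speaker == rule.speaker and re.search(rule.pattern, text, re.IGNORECASE):
                violations.append(Violation(rule.name, index, text))
    return violations


def summarize(violations):
    """Count violations per rule, a statistic an inspector might review."""
    return Counter(v.rule for v in violations)


# Example rules and transcript, invented for illustration.
RULES = [
    Rule("rude_language", "agent", r"\b(shut up|stupid)\b"),
    Rule("unauthorized_promise", "agent", r"\bguaranteed? (returns?|profit)\b"),
]

TRANSCRIPT = [
    ("agent", "Thank you for calling, how can I help?"),
    ("customer", "Is this fund safe?"),
    ("agent", "Yes, it has guaranteed returns."),
]

violations = inspect_call(TRANSCRIPT, RULES)
stats = summarize(violations)
```

Keeping the rules as plain data separate from the matching logic is one way to get the configurability described above: a new business scenario adds entries to `RULES` without changing `inspect_call`.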

Real-time Subtitling and Monitoring

Converts audio into subtitles in real time for live speeches and videos. In live video scenarios, Intelligent Speech Interaction can also monitor video content.

Business Pain Points and Benefits

  • Business Pain Points

    1. When you attend a conference or watch a live stream, you may not be able to hear the speech clearly because of your distance from the speaker or background noise.
    2. A huge volume of video needs subtitles and monitoring: a live streaming application generates over 100,000 hours of video every day. Live streams of formal events require subtitles, and live streams of entertainment require monitoring.

  • Benefits

    1. High accuracy: Transcribes speeches delivered at the Apsara Conference and beats the runner-up of the international stenography competition in terms of accuracy. Intelligent Speech Interaction has become a standard product of the Apsara Conference.
    2. Low latency: Provides real-time transcription of live streaming with low latency.

Service Call Monitoring

In traditional intermediary businesses, agents tend to be bypassed once the customers they connect establish contact with each other. For example, a landlord convinces tenants to make direct payments, resulting in financial loss to the agency. Such behavior can often be discovered in the phone calls between the two parties. The Alibaba Cloud speech recognition service can help agents discover this issue promptly and avoid financial loss.

Procedure and Benefits

  • Procedure

    When the service call monitoring system receives a phone call recording from a customer, it processes the recording and returns the results in real time. Customers can use the quality inspection system or their own systems to analyze the returned text and identify problems in a timely manner.

  • Benefits

    1. Requires no manual intervention, saving labor costs.
    2. Provides excellent real-time performance to identify problems in a timely manner.
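
For customers who analyze the returned text with their own systems, the following is a minimal sketch of what such a system might do. It assumes the recognized text arrives as a stream of `(call_id, text)` segments. The segment format, the `monitor` function, and the phrase patterns are hypothetical and do not reflect the service's actual response format.

```python
import re

# Hypothetical phrases hinting that two parties plan to bypass the agency.
BYPASS_PATTERNS = [
    re.compile(r"\bpay me directly\b", re.IGNORECASE),
    re.compile(r"\b(skip|without) the agen(t|cy)\b", re.IGNORECASE),
]


def monitor(segments, on_alert):
    """Scan recognized text segments in arrival order and report suspicious ones.

    `segments` is any iterable of (call_id, text) pairs. `on_alert` is a
    callback invoked with (call_id, text) once per matching segment.
    Returns the number of alerts raised.
    """
    alerts = 0
    for call_id, text in segments:
        if any(p.search(text) for p in BYPASS_PATTERNS):
            on_alert(call_id, text)
            alerts += 1
    return alerts


def recognized_segments():
    """Stand-in for text returned by a speech recognition service."""
    yield ("call-001", "The apartment is available from next month.")
    yield ("call-001", "You can pay me directly and we skip the agency fee.")
    yield ("call-002", "Please send the contract through the platform.")


flagged = []
alert_count = monitor(recognized_segments(), lambda cid, text: flagged.append(cid))
```

Because `monitor` consumes an iterable, each segment is checked as soon as it is produced, so an alert can be raised while later segments are still arriving.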
