This topic describes the Real-time Conversational AI solution, which provides UI components for building conversational AI into your app.
Overview
This solution is based on the AICallKit SDK and provides UI components for audio and video applications. You can flexibly reuse the functional modules of AUI Kits based on your business requirements to quickly bring real-time, interactive AI to your app. This solution is designed for enterprises and developers who want to build Real-time Conversational AI scenarios in an efficient manner. The functional modules of AUI Kits significantly reduce development time and costs and ensure app quality and stability. For more information about how to integrate AUI Kits for Real-time Conversational AI, see the integration topics for each platform.
For more information about server-side development, see Server-side integration and API references.
Features
| Feature | Description |
| --- | --- |
| Real-time workflow | You can orchestrate a workflow in the console. A workflow can contain multiple nodes. |
| Custom agent profile | Upload an image for the AI agent. The image is displayed during voice calls. |
| Emotion recognition | Recognize users' emotions and generate empathetic responses. |
| Welcome message | Configure the welcome message in the IMS console. When a user starts a conversation, the agent plays the welcome message first. |
| Proactive messages | Configure your business server to let the agent proactively push audio and video content to the user by using OpenAPI. |
| Live subtitles | The conversation content is displayed in real time on the user interface. |
| Intelligent noise reduction | Automatically filter out noise from the user's side during a conversation. If multiple users speak at the same time, the loudest voice is captured preferentially. |
| Intelligent interruption | Recognize when a user intends to interrupt the agent. |
| Intelligent sentence segmentation | Automatically identify and segment long or complex sentences to improve text readability and user experience. |
| Audio sentence callback | You can configure this callback in the console to store audio data in Object Storage Service (OSS). |
| Push-to-talk mode | The user can switch to push-to-talk mode at the beginning of or during a call and interact with the agent by pressing a button. |
| ASR hotwords | You can define business-related hotwords to improve the speech recognition accuracy of the agent. |
| Voiceprint-based noise suppression | In a multi-speaker scenario, the agent identifies the voiceprint characteristics of the main speaker to accurately capture their speech and minimize interference from background noise. |
| Human takeover | When the agent encounters situations beyond its capabilities or that require critical decisions, human agents can take over the conversation. |
| Graceful shutdown | When the business server stops the agent, the agent completes the current sentence before exiting. This prevents abrupt interruptions of conversations. |
| Data archiving | Conversations between the agent and users are converted into text for storage, and you can call API operations to consume the data. You can also store the audio and video data of calls in OSS or ApsaraVideo VOD. |
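The graceful shutdown behavior in the table above can be sketched as a small state machine in which a stop request takes effect only at a sentence boundary. This is a conceptual illustration only: the `AgentSession` class and its methods are hypothetical and do not correspond to the actual AICallKit SDK or server-side API.

```python
# Conceptual sketch of graceful shutdown: a stop request is honored
# only after the sentence currently being spoken is completed.
# All names here are hypothetical, not the real AICallKit API.

class AgentSession:
    def __init__(self, sentences):
        self.queue = list(sentences)   # sentences the agent plans to speak
        self.spoken = []               # sentences actually delivered
        self.stop_requested = False

    def request_stop(self):
        # Called by the business server to stop the agent. The agent
        # finishes the sentence it is speaking, then exits.
        self.stop_requested = True

    def run(self):
        while self.queue:
            sentence = self.queue.pop(0)
            self.spoken.append(sentence)   # complete the current sentence
            if self.stop_requested:
                break                      # exit only at a sentence boundary
```

For example, if a stop is requested while the agent is speaking its first sentence, that sentence is still delivered in full, and the remaining sentences are discarded rather than cut off mid-utterance.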