Intelligent Media Services: Developer guide

Last Updated: Mar 03, 2025

This topic describes the integration solutions available for AI real-time interaction and explains the benefits and suitable scenarios of each solution.

Background overview

With the rise of AIGC, large language models (LLMs) are gaining prominence as AI agents. These models, particularly those that can integrate with internal knowledge bases, serve a broad range of industry requirements in intelligent interaction scenarios, including intelligent customer service and personal assistants. However, AI agents that rely solely on real-time text communication fall short of the need for efficient communication. This opens the way for AI agents based on real-time audio and video communication (RTC), which offer a more comprehensive and intuitive interactive experience.

Integration solutions

For scenarios involving audio and video calls and message dialogue, Alibaba Cloud offers two AICallKit SDK implementation solutions: one with UI and one without.

  • With UI integration: Alibaba Cloud provides audio and video application UI widgets. By running the demo with simple configuration and integrating the entire UI widget into your project, you can swiftly enable AI real-time interactive capabilities.

  • Without UI integration: You are free to build your own UI. AICallKit SDK hides the underlying complexity of AI real-time interaction, so you can quickly implement AI real-time interactive capabilities.

Note

When integrating with AICallKit SDK, you can also use the relevant interfaces of ARTC SDK. AICallKit SDK is a reliable, scenario-based interface that encapsulates parts of ARTC SDK, which keeps it easy to use while remaining flexible.

Server-side features

  • AI Agent Call Records: Alibaba Cloud's integrated automatic speech recognition technology can transcribe call content automatically, aiding in the review of call records, model training, and other processes.

  • AI Agent Callback: The AI agent callback feature enables your application to automatically initiate predefined operations or responses when specific events occur.

  • Speech Recognition Hotwords: To enhance recognition accuracy for specific vocabulary, you can use the hotword feature.

  • Digital Human Integration: Integrating digital humans allows you to transform speech input within the workflow into digital human interactions, offering a more engaging and lifelike interactive experience.
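
To illustrate the AI Agent Callback feature listed above, the following is a minimal sketch of how an application server might map incoming callback events to predefined operations. The payload shape (a JSON object with `event` and `data` fields), the event names `agent_started` and `agent_stopped`, and the `agent_id` field are all hypothetical and chosen only for illustration; refer to the callback documentation for the actual event names and fields.

```python
import json
from typing import Any, Callable, Dict

# NOTE: the event names and payload layout below are assumptions made for
# this sketch. They are not taken from the service's callback specification.
Handler = Callable[[Dict[str, Any]], str]

HANDLERS: Dict[str, Handler] = {}


def on_event(name: str) -> Callable[[Handler], Handler]:
    """Register a handler for one (hypothetical) callback event name."""
    def register(func: Handler) -> Handler:
        HANDLERS[name] = func
        return func
    return register


@on_event("agent_started")
def handle_agent_started(data: Dict[str, Any]) -> str:
    # Predefined operation for the hypothetical "agent_started" event.
    return f"agent {data.get('agent_id', 'unknown')} started"


@on_event("agent_stopped")
def handle_agent_stopped(data: Dict[str, Any]) -> str:
    # Predefined operation for the hypothetical "agent_stopped" event.
    return f"agent {data.get('agent_id', 'unknown')} stopped"


def dispatch(raw_body: str) -> str:
    """Parse a JSON callback body and run the matching registered handler."""
    try:
        payload = json.loads(raw_body)
    except json.JSONDecodeError:
        return "ignored: body is not valid JSON"
    if not isinstance(payload, dict):
        return "ignored: payload is not a JSON object"
    event = payload.get("event")
    handler = HANDLERS.get(event) if isinstance(event, str) else None
    if handler is None:
        return "ignored: no handler for this event"
    data = payload.get("data")
    # Fall back to an empty dict when "data" is missing or not an object.
    return handler(data if isinstance(data, dict) else {})
```

In a real deployment, `dispatch` would be called from the HTTP endpoint that receives the callback request, after the request has been authenticated. Unknown events are ignored rather than rejected, so that new event types do not break the receiver.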