Traditional live streaming focuses on one-way content delivery, resulting in low audience engagement and conversion rates. ApsaraVideo Real-time Communication (ARTC) transforms passive viewers into active participants through interactive features, such as voice chat, video co-streaming, and game interaction.
Architecture
Based on advanced technical architecture and algorithms, ARTC provides developers and enterprises with efficient, stable, and easy-to-use SDKs and APIs. It supports seamless integration across platforms, including iOS, Android, Web, and Windows. Additionally, you can combine ARTC with other Alibaba Cloud services to build solutions for a wider range of use cases.
Use cases
Voice chat
ARTC supports up to 50 simultaneous speakers with an end-to-end latency between 150 and 400 ms. Various audio features are available, including voice changer, reverberation, and voice enhancement. For content compliance, ARTC provides content moderation services and supports third-party integration. For scenario-based solutions, see Voice chat room, Online karaoke, and One-on-one audio and video calls.
Co-streaming
ARTC supports collaborative live streaming: viewers can chat with streamers alongside live content, and streamers from different rooms can engage in popularity-based battles. The end-to-end latency is between 150 and 400 ms, so viewers can join and leave the live stream seamlessly. ARTC also supports both standard streaming and Real-Time Streaming (RTS), enabling more than 100,000 concurrent viewers. For more information, see Multi-host streaming.
(Figures: Co-streaming, Streamer battle)
Real-time Conversational AI
Real-time Conversational AI helps enterprises quickly build applications for audio and video interactions between AI agents and end users. You can build a dedicated agent within 10 minutes by following the instructions on the GUI. The agent can interact with end users in real time over the Global Realtime Transport Network (GRTN). For more information, see Overview.
(Figures: Voice call, Avatar call, Vision call)
Features
Feature | Description | Scenarios | Billing |
Video call | Supports one-on-one or group video calls with high-definition quality from 480P to 1080P. | Personal calls, conferences, video customer service | |
Voice call | Supports one-on-one or group audio calls. | Personal calls, group chat, voice chat | |
Video interaction | Supports multi-person video interaction with resolutions from 480P to 1080P and an end-to-end latency of less than 300 ms. | RTS, cross-channel streamer battle | |
Voice interaction | Supports high-fidelity voice interaction at a 48 kHz sampling rate. | Voice chat room, online karaoke, multi-host streaming | |
Recording | Records audio and video streams and stores them in Object Storage Service (OSS) or ApsaraVideo VOD. | Archiving, compliance review | |
Transcoding | Transcodes streams to ensure audio and video content can be smoothly transmitted and played across various platforms without compromising quality. | Recording format conversion | |
Stream mixing and relay | Mixes multiple streams into a single one based on specific rules. The mixed stream can then be relayed to ApsaraVideo Live or a third-party platform. | Multi-view live streaming, large-scale multi-party conferences, multi-teacher collaborative teaching | |
Audio moderation | Reviews audio content by accessing the audio moderation capability provided by Alibaba Cloud or a third party. | Business security checks, content compliance | |
Video moderation | Reviews video content by accessing the video moderation capability provided by Alibaba Cloud or a third party. | Business security checks, content compliance | |
Face retouching | Provides multiple retouching effects. | Video calls, interactive streaming, online classes | |
Reverberation | Supports various reverberation effects such as hallway, church, studio, basement, and concert hall. | Voice calls, video calls, voice chat rooms, online karaoke | Free |
Voice changer | Supports various effects such as electric sound, old man voice, husky male voice, and lively female voice. | Online karaoke rooms, voice chat rooms | |
Smart noise reduction | Eliminates ambient noise, suppresses sudden loud noises, and cancels feedback from multiple devices while preserving high-fidelity voice quality. | Voice calls, multi-person conferences | |
Low-latency in-ear monitoring | During audio capture, processing, and playback, a user's voice is fed back to them through headphones (or other audio output devices) with minimal delay. | Interactive streaming, online karaoke, recording room | |
Audio 3A processing | Supports Acoustic Echo Cancellation (AEC), Automatic Noise Suppression (ANS), and Automatic Gain Control (AGC). | Voice-related scenarios | |
Screen sharing | Shares desktop, window, or specific screen areas with other users, and supports simultaneous display with camera feed. | Online classes, remote assistance | |
Spatial audio | Simulates sound propagation in three-dimensional space through advanced audio technology, creating an immersive audio experience with a sense of direction and position. | Online karaoke rooms, voice chat rooms | |
Custom audio/video input | Supports input of external audio and video stream data. | Custom beauty effects, custom sound effects | |
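The stream mixing feature combines several streams "based on specific rules". As an illustration of what such a rule might look like, the following minimal sketch computes an equal-cell grid layout for a mixed canvas. It is not the ARTC layout algorithm or API: the names `Pane` and `grid_layout`, and the 1280x720 default canvas, are assumptions made for this example only.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Pane:
    """Position and size of one stream on the mixed canvas (hypothetical type)."""
    user_id: str
    x: int
    y: int
    width: int
    height: int


def grid_layout(user_ids, canvas_width=1280, canvas_height=720):
    """Place each stream in an equal cell of a near-square grid.

    Illustrative rule only: cells are filled left to right, top to bottom.
    """
    n = len(user_ids)
    if n == 0:
        return []
    # Integer form of ceil(sqrt(n)), avoiding floating-point rounding.
    cols = math.isqrt(n - 1) + 1
    # Integer form of ceil(n / cols).
    rows = (n + cols - 1) // cols
    cell_w = canvas_width // cols
    cell_h = canvas_height // rows
    return [
        Pane(uid, (i % cols) * cell_w, (i // cols) * cell_h, cell_w, cell_h)
        for i, uid in enumerate(user_ids)
    ]
```

With three streams on the default canvas, this rule yields a 2 x 2 grid of 640 x 360 cells with the last cell left empty. A production layout would typically also handle aspect ratio, z-order, and a highlighted main pane.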
Benefits
High-quality service worldwide
ApsaraVideo Live boasts an extensive global presence, with:
9 live centers: China (Beijing), China (Shenzhen), China (Shanghai), China (Qingdao), Singapore, Germany (Frankfurt), Japan (Tokyo), Indonesia (Jakarta), and Saudi Arabia (Riyadh) regions
3 stream relay hubs: China (Shanghai), Singapore, and Saudi Arabia (Riyadh) regions
Over 3,200 nodes worldwide
This ensures reliable and high-availability services around the globe.
Security compliance
ARTC maintains full compliance with global regulations regarding calling and adheres to stringent privacy protection standards.
Diverse product combinations
ARTC provides a one-stop solution that leverages diverse Alibaba Cloud products and services, including ECS, OSS, security services, live streaming, video-on-demand, avatars, and AI.
Easy to use
Scenario-based API integration: Encapsulates underlying API operations based on business scenarios to simplify development. For more information, see Client-side API.
Multi-scenario practices: Covers various scenarios, such as one-on-one calls, co-streaming, voice chat rooms, and online karaoke. For more information, see Scenario-specific solutions.
Limitations
User capacity per channel:
Interactive mode: By default, a channel supports a maximum of 17 streamers (on-stage) and 1,000 viewers (off-stage).
Note: To support an unlimited number of viewers in the interactive mode, relay the streams to ApsaraVideo Live.
Communication mode: By default, a channel supports a maximum of 50 users.
Each user can publish only one main stream (audio-video, audio-only, or video-only) and one screen-sharing stream simultaneously.
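The default limits above can be expressed as a small pre-join check in application code. The following is a minimal sketch, assuming the default quotas listed in this section. The names `ChannelState`, `can_join`, and `can_publish` are hypothetical and are not part of any ARTC SDK; the service itself enforces the actual limits.

```python
from dataclasses import dataclass

# Default limits quoted in the Limitations section.
MAX_STREAMERS_INTERACTIVE = 17
MAX_VIEWERS_INTERACTIVE = 1000
MAX_USERS_COMMUNICATION = 50


@dataclass
class ChannelState:
    """Current occupancy of one channel (hypothetical bookkeeping type)."""
    mode: str  # "interactive" or "communication"
    streamers: int = 0
    viewers: int = 0


def can_join(state, role="streamer"):
    """Return True if one more user with the given role fits the default limits."""
    if state.mode == "communication":
        # Communication mode has a single cap on the total number of users.
        return state.streamers + state.viewers < MAX_USERS_COMMUNICATION
    if state.mode == "interactive":
        if role == "streamer":
            return state.streamers < MAX_STREAMERS_INTERACTIVE
        if role == "viewer":
            return state.viewers < MAX_VIEWERS_INTERACTIVE
        raise ValueError(f"unknown role: {role!r}")
    raise ValueError(f"unknown mode: {state.mode!r}")


def can_publish(active_streams, new_stream):
    """Allow at most one main stream and one screen-sharing stream per user.

    active_streams is the set of stream kinds the user already publishes.
    """
    if new_stream not in ("main", "screen"):
        raise ValueError(f"unknown stream kind: {new_stream!r}")
    return new_stream not in active_streams
```

For example, `can_join(ChannelState("interactive", streamers=17), "streamer")` returns `False`, while the same channel still accepts viewers until the viewer limit is reached.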
Concepts
The following table lists concepts related to ARTC.
Concept | Description |
SDKAppID | To manage customer services, ARTC uses SDKAppID as a unique identifier for applications. You need to create an independent SDKAppID for each of your services to isolate their configurations and data. |
ChannelID | A channel, identified by a ChannelID, is an audio and video space defined by ARTC. Users in the same channel can interact with each other. In certain scenarios, ARTC also allows audio and video interaction between users across different channels. |
UserID | In ARTC, a UserID uniquely identifies a user in an application. |
Token | A token is a security signature designed by Alibaba Cloud to prevent malicious parties from accessing your cloud service resources. You need to provide information including the SDKAppID, UserID, ChannelID, timestamp, and token in the login function of the corresponding SDK. |
Stream | A stream is a continuous flow of audio and video data that has been compressed and encoded for transmission over a network and can be played instantly. |
Publish | Publish refers to the operation of uploading local audio and video data to Alibaba Cloud servers. This operation is equivalent to stream ingest. |
Subscribe | Subscribe refers to the operation of pulling audio and video data from Alibaba Cloud servers to local devices. This operation is equivalent to stream pulling. |
Role | In ARTC, there are two kinds of roles: streamer and viewer. A streamer can publish or subscribe to audio and video streams. A viewer can only subscribe to audio and video streams. Users can switch between the roles during a session. |
Stream mixing and relay | The feature allows you to mix multiple audio and video streams, configure layout and encoding parameters, and then relay the processed streams to ApsaraVideo Live or a third-party live streaming platform. After relaying the stream to ApsaraVideo Live, you can use its features for transcoding, recording, and live viewing. |
Supplemental Enhancement Information (SEI) | SEI is a mechanism within video encoding standards like H.264/AVC and H.265/HEVC. SEI embeds metadata and other ancillary data directly into video streams. |
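To make the Token concept more concrete, the following minimal sketch shows a generic keyed signature that binds an SDKAppID, ChannelID, UserID, and expiry timestamp together. It is not the ARTC token algorithm or format, which is defined in the ARTC authentication documentation. The secret `app_key`, the functions `make_token` and `verify_token`, and the field order are all assumptions made for illustration.

```python
import hashlib
import hmac
import time


def make_token(app_key, sdk_app_id, channel_id, user_id, expire_at):
    """Sign the identifying fields with a server-side secret (illustrative only)."""
    # Newline-separated fields keep ("ab", "c") distinct from ("a", "bc").
    message = "\n".join([sdk_app_id, channel_id, user_id, str(expire_at)])
    return hmac.new(app_key.encode(), message.encode(), hashlib.sha256).hexdigest()


def verify_token(token, app_key, sdk_app_id, channel_id, user_id, expire_at, now=None):
    """Accept the token only if it has not expired and the signature matches."""
    now = time.time() if now is None else now
    if now >= expire_at:
        return False
    expected = make_token(app_key, sdk_app_id, channel_id, user_id, expire_at)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(token, expected)
```

In a scheme like this, the secret stays on the application server, which issues a token per user and channel. A token issued for one UserID or ChannelID fails verification for any other, which is the property that keeps unauthorized parties out of a channel.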




