ApsaraVideo Live: ARTC overview

Last Updated: Dec 15, 2025

Traditional live streaming focuses on one-way content delivery, resulting in low audience engagement and conversion rates. ApsaraVideo Real-time Communication (ARTC) transforms passive viewers into active participants through interactive features, such as voice chat, video co-streaming, and game interaction.

Architecture

Based on advanced technical architecture and algorithms, ARTC provides developers and enterprises with efficient, stable, and easy-to-use SDKs and APIs. It supports seamless integration across platforms, including iOS, Android, Web, and Windows. Additionally, you can combine ARTC with other Alibaba Cloud services to build solutions for a wider range of use cases.

Use cases

Voice chat

ARTC supports up to 50 simultaneous speakers with an end-to-end latency between 150 and 400 ms. Audio features include voice changer, reverberation, and voice enhancement. For content compliance, ARTC provides content moderation services and supports third-party integration. For scenario-based solutions, see Voice chat room, Online karaoke, and One-on-one audio and video calls.

Co-streaming

ARTC supports collaborative live streaming: viewers can chat with streamers alongside live content, and streamers from different rooms can compete in popularity-based battles. The end-to-end latency is between 150 and 400 ms, so viewers can join and leave the live stream seamlessly. ARTC also supports both standard streaming and Real-Time Streaming (RTS), enabling more than 100,000 concurrent viewers. For more information, see Multi-host streaming.

[Image: Co-streaming]

[Image: Streamer battle]

Real-time Conversational AI

Real-time Conversational AI helps enterprises quickly build applications for audio and video interactions between AI agents and end users. You can build a dedicated agent within 10 minutes by following the instructions on the GUI. The agent can interact with end users in real time over the Global Realtime Transport Network (GRTN). For more information, see Overview.

[Image: Voice call]

[Image: Avatar call]

[Image: Vision call]

Features

| Feature | Description | Scenarios | Billing |
| --- | --- | --- | --- |
| Video call | Supports one-on-one or group video calls with high-definition quality from 480P to 1080P. | Personal calls, conferences, video customer service | Billing for audio/video communication |
| Voice call | Supports one-on-one or group audio calls. | Personal calls, group chat, voice chat | Billing for audio/video communication |
| Video interaction | Supports multi-person video interaction with resolutions from 480P to 1080P and end-to-end latency of less than 300 ms. | RTS, cross-channel streamer battle | Billing for audio/video communication |
| Voice interaction | Supports high-fidelity voice interaction at a 48 kHz sampling rate. | Voice chat room, online karaoke, multi-host streaming | Billing for audio/video communication |
| Recording | Records audio and video streams and stores them in Object Storage Service (OSS) or ApsaraVideo VOD. | Archiving, compliance review | Billing of live recording |
| Transcoding | Transcodes streams to ensure audio and video content can be smoothly transmitted and played across various platforms without compromising quality. | Recording format conversion | Billing of live transcoding |
| Stream mixing and relay | Mixes multiple streams into a single one based on specific rules. The mixed stream can then be relayed to ApsaraVideo Live or a third-party platform. | Multi-view live streaming, large-scale multi-party conferences, multi-teacher collaborative teaching | Billing of stream relay |
| Audio moderation | Reviews audio content by accessing the audio moderation capability provided by Alibaba Cloud or a third party. | Business security checks, content compliance | Billing of automated review |
| Video moderation | Reviews video content by accessing the video moderation capability provided by Alibaba Cloud or a third party. | Business security checks, content compliance | Billing of automated review |
| Face retouching | Provides multiple retouching effects. | Video calls, interactive streaming, online classes | Billing of Queen SDK |
| Reverberation | Supports various reverberation effects such as hallway, church, studio, basement, and concert hall. | Voice calls, video calls, voice chat rooms, online karaoke | Free |
| Voice changer | Supports various effects such as electric sound, old man voice, husky male voice, and lively female voice. | Online karaoke rooms, voice chat rooms | Free |
| Smart noise reduction | Eliminates ambient noise, suppresses sudden loud noises, and cancels feedback from multiple devices while preserving high-fidelity voice quality. | Voice calls, multi-person conferences | Free |
| Low-latency in-ear monitoring | During audio capture, processing, and playback, a user's voice is fed back to them through headphones (or other audio output devices) with minimal delay. | Interactive streaming, online karaoke, recording room | Free |
| Audio 3A processing | Supports Acoustic Echo Cancellation (AEC), Automatic Noise Suppression (ANS), and Automatic Gain Control (AGC). | Voice-related scenarios | Free |
| Screen sharing | Shares the desktop, a window, or a specific screen area with other users, and supports simultaneous display with the camera feed. | Online classes, remote assistance | Free |
| Spatial audio | Simulates sound propagation in three-dimensional space through advanced audio technology, creating an immersive audio experience with a sense of direction and position. | Online karaoke rooms, voice chat rooms | Free |
| Custom audio/video input | Supports input of external audio and video stream data. | Custom beauty effects, custom sound effects | Free |

Benefits

High-quality service worldwide

ApsaraVideo Live boasts an extensive global presence, with:

  • 9 live centers: China (Beijing), China (Shenzhen), China (Shanghai), China (Qingdao), Singapore, Germany (Frankfurt), Japan (Tokyo), Indonesia (Jakarta), and Saudi Arabia (Riyadh) regions

  • 3 stream relay hubs: China (Shanghai), Singapore, and Saudi Arabia (Riyadh) regions

  • Over 3200 nodes worldwide

This ensures reliable and high-availability services around the globe.

Security compliance

ARTC fully complies with global regulations that govern calling services and adheres to stringent privacy protection standards.

Diverse product combinations

ARTC provides a one-stop solution that leverages diverse Alibaba Cloud products and services, including ECS, OSS, security services, live streaming, video-on-demand, avatars, and AI.

Easy to use

  • Scenario-based API integration: Encapsulates underlying API operations based on business scenarios to simplify development. For more information, see Client-side API.

  • Multi-scenario practices: Covers various scenarios, such as one-on-one calls, co-streaming, voice chat rooms, and online karaoke. For more information, see Scenario-specific solutions.

Limitations

  • User capacity per channel:

    • Interactive mode: By default, a channel supports a maximum of 17 streamers (on-stage) and 1,000 viewers (off-stage).

      Note

      To support an unlimited number of viewers in the interactive mode, relay the streams to ApsaraVideo Live.

    • Communication mode: By default, a channel supports a maximum of 50 users.

  • Each user can publish only one main stream (audio-video, audio-only, or video-only) and one screen-sharing stream simultaneously.
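The default per-channel limits above can be expressed as a simple admission check on the application side. The following is a minimal sketch, not part of the ARTC SDK: the `ChannelState` record and the `can_join` helper are hypothetical names introduced for illustration, and the constants are the default values quoted in the list above.

```python
from dataclasses import dataclass

# Default limits quoted in the Limitations list above.
MAX_STREAMERS_INTERACTIVE = 17
MAX_VIEWERS_INTERACTIVE = 1000
MAX_USERS_COMMUNICATION = 50


@dataclass
class ChannelState:
    """Hypothetical bookkeeping record; not an ARTC SDK type."""
    mode: str            # "interactive" or "communication"
    streamers: int = 0   # on-stage users (interactive mode)
    viewers: int = 0     # off-stage users (interactive mode)
    users: int = 0       # all users (communication mode)


def can_join(state: ChannelState, role: str = "viewer") -> bool:
    """Return True if one more user fits under the default limits."""
    if state.mode == "communication":
        # Communication mode has a single cap on all users.
        return state.users < MAX_USERS_COMMUNICATION
    if state.mode == "interactive":
        if role == "streamer":
            return state.streamers < MAX_STREAMERS_INTERACTIVE
        if role == "viewer":
            return state.viewers < MAX_VIEWERS_INTERACTIVE
        raise ValueError(f"unknown role: {role!r}")
    raise ValueError(f"unknown mode: {state.mode!r}")
```

For example, with `ChannelState(mode="interactive", streamers=17, viewers=999)`, `can_join(state, "viewer")` returns `True` and `can_join(state, "streamer")` returns `False`. When the viewer cap is reached, the note above suggests relaying the streams to ApsaraVideo Live rather than admitting more users to the channel.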

Concepts

The following table lists concepts related to ARTC.

| Concept | Description |
| --- | --- |
| SDKAppID | To manage customer services, ARTC uses SDKAppID as a unique identifier for applications. You need to create an independent SDKAppID for each of your services to isolate their configurations and data. |
| ChannelID | A channel, identified by a ChannelID, is an audio and video space defined by ARTC. Users in the same channel can interact with each other. In certain scenarios, ARTC also allows audio and video interaction between users across different channels. |
| UserID | In ARTC, a UserID uniquely identifies a user in an application. |
| Token | A token is a security signature designed by Alibaba Cloud to prevent malicious parties from accessing your cloud service resources. You need to provide information including the SDKAppID, UserID, ChannelID, timestamp, and token in the login function of the corresponding SDK. |
| Stream | A stream is a continuous flow of audio and video data that has been compressed and encoded for transmission over a network and can be played instantly. |
| Publish | Publish refers to the operation of uploading local audio and video data to Alibaba Cloud servers. This operation is equivalent to stream ingest. |
| Subscribe | Subscribe refers to the operation of pulling audio and video data from Alibaba Cloud servers to local devices. This operation is equivalent to stream pulling. |
| Role | In ARTC, there are two kinds of roles: streamer and viewer. A streamer can publish or subscribe to audio and video streams. A viewer can only subscribe to audio and video streams. Users can switch between the roles during a session. |
| Stream mixing and relay | The feature allows you to mix multiple audio and video streams, configure layout and encoding parameters, and then relay the processed streams to ApsaraVideo Live or a third-party live streaming platform. After relaying the stream to ApsaraVideo Live, you can use its features for transcoding, recording, and live viewing. |
| Supplemental Enhancement Information (SEI) | SEI is a mechanism within video encoding standards like H.264/AVC and H.265/HEVC. SEI embeds metadata and other ancillary data directly into video streams. |
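The Token entry in the table above names the fields a client passes at login (SDKAppID, UserID, ChannelID, timestamp, and token) but does not describe how the token is computed. The sketch below shows one common way an application server could assemble these fields. It rests on stated assumptions: the signature is taken to be a SHA-256 digest over the concatenated fields plus a server-side secret (`app_key`, a hypothetical name), and the timestamp is taken to be an expiry time. The helper names `make_token` and `build_join_params` are also hypothetical. Consult the ARTC token authentication documentation for the actual algorithm.

```python
import hashlib
import time
from typing import Optional


def make_token(app_id: str, app_key: str, channel_id: str,
               user_id: str, timestamp: int) -> str:
    """Illustrative signature: SHA-256 over the concatenated fields.

    Assumption: the real ARTC algorithm may differ from this.
    """
    material = f"{app_id}{app_key}{channel_id}{user_id}{timestamp}"
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


def build_join_params(app_id: str, app_key: str, channel_id: str,
                      user_id: str, ttl_seconds: int = 3600,
                      now: Optional[int] = None) -> dict:
    """Assemble the login fields listed in the Token entry."""
    issued = int(time.time()) if now is None else now
    # Assumption: the timestamp marks when the token expires.
    timestamp = issued + ttl_seconds
    return {
        "SDKAppID": app_id,
        "UserID": user_id,
        "ChannelID": channel_id,
        "timestamp": timestamp,
        "token": make_token(app_id, app_key, channel_id, user_id, timestamp),
    }
```

In this sketch the secret stays on the server: only the derived token and the public fields are returned to the client, so a token issued for one UserID and ChannelID cannot be reused for another.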