
Intelligent Media Services:Integration overview

Last Updated:Dec 25, 2025

This topic describes how to integrate an audio/video agent into your Android application using the AICallKit software development kit (SDK).

Prerequisites

  • Android Gradle plugin 4.1.3

  • Gradle 7.0.2

  • JDK 11, which comes with Android Studio

Flowchart

(Flowchart: the app requests an RTC token from the AppServer, then calls call(config) to start a session with the agent.)

Your app obtains an RTC token from your AppServer and then calls the call(config) method to start a call. During the call, you can call AICallKit APIs to implement interactive features for the AI agent, such as live subtitles and interruption handling. AICallKit depends on real-time audio and video capabilities, so the ApsaraVideo Real-Time Communication (ARTC) SDK is bundled in AICallKit SDK. If your business scenario also requires live streaming and VOD capabilities, consider ApsaraVideo MediaBox SDK. For more information, see Select and download SDKs.
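The token exchange with your AppServer can be sketched as follows. This is a minimal illustration, not part of the SDK: the endpoint path (/api/rtc/token), the query parameter, and the response field name ("token") are assumptions that you must replace with your AppServer's actual contract, and a production app should use a proper JSON library instead of the string parsing shown here.

```java
// A minimal sketch of fetching an RTC token from your AppServer.
// Endpoint path, query parameter, and response field name are assumptions.
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RtcTokenFetcher {

    // Extract the "token" string field from a flat JSON response body.
    // A real app should use a JSON library such as Gson or org.json.
    public static String extractToken(String json) {
        String key = "\"token\"";
        int keyIndex = json.indexOf(key);
        if (keyIndex < 0) {
            throw new IllegalArgumentException("No token field in response: " + json);
        }
        int start = json.indexOf('"', json.indexOf(':', keyIndex) + 1) + 1;
        int end = json.indexOf('"', start);
        return json.substring(start, end);
    }

    // Request a token for the given user from the AppServer.
    // This is a blocking call; run it off the main thread on Android.
    public static String fetchToken(String appServerBaseUrl, String userId)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(appServerBaseUrl + "/api/rtc/token?userId=" + userId))
                .GET()
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return extractToken(response.body());
    }
}
```

The returned token string is what you later pass to engine.call(token) in Step 5.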

Integrate the SDK

  1. Add the Alibaba Cloud Maven repository to the project-level build.gradle file.

    allprojects {
        repositories {
            google()
            jcenter()
            maven { url 'https://maven.aliyun.com/repository/google' }
            maven { url 'https://maven.aliyun.com/repository/public' }
        }
    }
  2. In the module-level (app) build.gradle file, add the ARTCAICallKit dependency.

    dependencies {
        implementation 'com.aliyun.aio:AliVCSDK_ARTC:x.x.x'                  // Replace x.x.x with the version that is compatible with your project.
        implementation 'com.aliyun.auikits.android:ARTCAICallKit:x.x.x'
        implementation 'com.alivc.live.component:PluginAEC:2.0.0'
    }
    Note
    • Latest ARTC SDK version: 7.9.1
    • Latest AICallKit SDK version: 2.9.1

SDK development guide

Step 1: Request audio and video permissions for the app

You can check for microphone and camera permissions. If the permissions are not granted, you can prompt the user for authorization. You must implement this feature in your application. For sample code, see PermissionUtils.java.

PermissionX.init(this)
        .permissions(PermissionUtils.getPermissions())
        .request((allGranted, grantedList, deniedList) -> {
            // Handle the authorization result, for example by prompting again if permissions were denied.
        });
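The PermissionUtils helper referenced above lives in the sample project (PermissionUtils.java); the sketch below shows what its permission list plausibly contains. The real helper may request additional permissions, so treat this as an assumption and consult the sample code.

```java
// A minimal sketch of the permissions an audio/video call needs.
// The real PermissionUtils.java in the sample project may request more.
import java.util.Arrays;
import java.util.List;

public class PermissionUtils {
    public static List<String> getPermissions() {
        return Arrays.asList(
                "android.permission.RECORD_AUDIO",  // Microphone, required for all agent types.
                "android.permission.CAMERA"         // Camera, required for vision and video agents.
        );
    }
}
```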

Step 2: Create and initialize the engine

You can create and initialize the ARTCAICallEngine. The following code provides an example:

String userId = "123";  // Use the user ID from your app's logon system.
ARTCAICallEngineImpl engine = new ARTCAICallEngineImpl(this, userId);

// If the agent is a digital human, configure the view container for the digital human.
if (aiAgentType == AvatarAgent) {
    ViewGroup avatarLayer = findViewById(R.id.avatar_layer);  // Replace with your own container ID.
    engine.setAgentView(
        avatarLayer,
        new ViewGroup.LayoutParams(ViewGroup.LayoutParams.MATCH_PARENT,
                                   ViewGroup.LayoutParams.MATCH_PARENT)
    );
}
// If the agent is a visual understanding agent, configure the view container for the local video preview.
else if (aiAgentType == VisionAgent) {
    ViewGroup previewLayer = findViewById(R.id.preview_layer);  // Replace with your own container ID.
    engine.setLocalView(
        previewLayer,
        new FrameLayout.LayoutParams(ViewGroup.LayoutParams.MATCH_PARENT,
                                     ViewGroup.LayoutParams.MATCH_PARENT)
    );
}
// If the agent is a video call agent, configure containers for both the agent's video and the local preview.
else if (aiAgentType == VideoAgent) {
    ARTCAICallEngine.ARTCAICallVideoCanvas remoteCanvas = new ARTCAICallEngine.ARTCAICallVideoCanvas();
    remoteCanvas.zOrderOnTop = false;
    remoteCanvas.zOrderMediaOverlay = false;

    ViewGroup avatarLayer = findViewById(R.id.avatar_layer);  // Replace with your own container ID.
    engine.setAgentView(
        avatarLayer,
        new ViewGroup.LayoutParams(ViewGroup.LayoutParams.MATCH_PARENT,
                                   ViewGroup.LayoutParams.MATCH_PARENT),
        remoteCanvas
    );

    ViewGroup previewLayer = findViewById(R.id.preview_layer);  // Replace with your own container ID.
    engine.setLocalView(
        previewLayer,
        new FrameLayout.LayoutParams(ViewGroup.LayoutParams.MATCH_PARENT,
                                     ViewGroup.LayoutParams.MATCH_PARENT)
    );
}

Step 3: Implement callback methods

You can implement engine callbacks as needed. For more information about the engine callback API operations, see API operation details.

protected ARTCAICallEngine.IARTCAICallEngineCallback mCallEngineCallback = new ARTCAICallEngine.IARTCAICallEngineCallback() {
    @Override
    public void onErrorOccurs(ARTCAICallEngine.AICallErrorCode errorCode) {
        // An error occurred. End the call.
        engine.handup();
    }

    @Override
    public void onCallBegin() {
        // The call starts (user joins the session).
    }

    @Override
    public void onCallEnd() {
        // The call ends (user leaves the session).
    }

    @Override
    public void onAICallEngineRobotStateChanged(ARTCAICallEngine.ARTCAICallRobotState oldRobotState, ARTCAICallEngine.ARTCAICallRobotState newRobotState) {
        // Agent state synchronization.
    }

    @Override
    public void onUserSpeaking(boolean isSpeaking) {
        // Callback for when the user is speaking.
    }

    @Override
    public void onUserAsrSubtitleNotify(String text, boolean isSentenceEnd, int sentenceId, VoicePrintStatusCode voicePrintStatusCode) {
        // Sync the subtitle produced by speech recognition (ASR) of the user's speech.
    }

    @Override
    public void onAIAgentSubtitleNotify(String text, boolean end, int userAsrSentenceId) {
        // Sync the agent's response.
    }

    @Override
    public void onNetworkStatusChanged(String uid, ARTCAICallEngine.ARTCAICallNetworkQuality quality) {
        // Callback for network status.
    }

    @Override
    public void onVoiceVolumeChanged(String uid, int volume) {
        // Volume change.
    }

    @Override
    public void onVoiceIdChanged(String voiceId) {
        // The voice timbre for the current call has changed.
    }

    @Override
    public void onVoiceInterrupted(boolean enable) {
        // The voice interruption setting for the current call has changed.
    }

    @Override
    public void onAgentVideoAvailable(boolean available) {
        // Whether the agent's video is available (stream ingest).
    }

    @Override
    public void onAgentAudioAvailable(boolean available) {
        // Whether the agent's audio is available (stream ingest).
    }

    @Override
    public void onAgentAvatarFirstFrameDrawn() {
        // The first video frame of the digital human is rendered.
    }

    @Override
    public void onUserOnLine(String uid) {
        // Callback for when a user comes online.
    }

};

engine.setEngineCallback(mCallEngineCallback);

Step 4: Create and initialize ARTCAICallConfig

For more information about ARTCAICallConfig, see ARTCAICallConfig.

ARTCAICallEngine.ARTCAICallConfig artcaiCallConfig = new ARTCAICallEngine.ARTCAICallConfig();
artcaiCallConfig.agentId = "XXX";           // The agent ID. This parameter is required.
artcaiCallConfig.region = "cn-shanghai";    // The agent region. This parameter is required.
artcaiCallConfig.agentType = VoiceAgent;    // The agent type: voice-only, digital human, visual understanding, or video call.
engine.init(artcaiCallConfig);

Region name      | Region ID
China (Hangzhou) | cn-hangzhou
China (Shanghai) | cn-shanghai
China (Beijing)  | cn-beijing
China (Shenzhen) | cn-shenzhen
Singapore        | ap-southeast-1
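If region selection is driven by user-facing settings, the table above can be mirrored in code. The following helper is purely illustrative (the class and method names are not part of the SDK); it maps a region display name to the ID you assign to ARTCAICallConfig.region.

```java
// An illustrative helper mirroring the region table above.
// Maps a region display name to the region ID for ARTCAICallConfig.region.
import java.util.HashMap;
import java.util.Map;

public class RegionIds {
    private static final Map<String, String> REGIONS = new HashMap<>();
    static {
        REGIONS.put("China (Hangzhou)", "cn-hangzhou");
        REGIONS.put("China (Shanghai)", "cn-shanghai");
        REGIONS.put("China (Beijing)", "cn-beijing");
        REGIONS.put("China (Shenzhen)", "cn-shenzhen");
        REGIONS.put("Singapore", "ap-southeast-1");
    }

    public static String idFor(String regionName) {
        String id = REGIONS.get(regionName);
        if (id == null) {
            throw new IllegalArgumentException("Unsupported region: " + regionName);
        }
        return id;
    }
}
```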

Step 5: Initiate a call to the agent

You can invoke the call() API operation to initiate a call to the agent. To obtain an authentication token, see Generate an ARTC authentication token.

engine.call(token);

// After the call is connected, the following callback is triggered.
public void onCallBegin() {
    // The call starts (user joins the session).
}

Step 6: Implement in-call features

After the call starts, you can process captions, interrupt the agent, and perform other actions as needed. For more information, see Implement features.

Step 7: End the call

You can invoke the handup() API operation to end the call with the agent.

engine.handup();