
Integration overview

Last Updated: Sep 12, 2025

This topic describes how to integrate an audio and video agent into your Android application using the AICallKit software development kit (SDK).

Environment requirements

  • Android Studio plug-in version 4.1.3

  • Gradle 7.0.2

  • JDK 11, which is included with Android Studio

Flowchart

[Flowchart: the app obtains an RTC token from the AppServer, starts the call, and then interacts with the agent through AICallKit.]

Your app obtains an RTC token from your AppServer and then calls the call(config) method to start a call. During the call, you can call AICallKit APIs to implement interactive features for the AI agent, such as live subtitles and interruptions. AICallKit depends on real-time audio and video capabilities, so the features of ApsaraVideo Real-Time Communication (ARTC) are integrated into the AICallKit SDK. If your business scenario also requires live streaming and video-on-demand (VOD) capabilities, consider using the ApsaraVideo MediaBox SDK. For more information, see Select and download SDKs.
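The first step in this flow, obtaining the RTC token from your AppServer, can be sketched in plain Java. The endpoint path, query parameter, and class names below are illustrative assumptions, not part of the AICallKit API; substitute your AppServer's actual token endpoint.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the first step in the flow: asking your own AppServer
// for an RTC token before starting the call.
public class TokenRequest {

    // Build the request URI for the AppServer's token endpoint (path is illustrative).
    static URI buildTokenUri(String appServerBase, String userId) {
        String query = "userId=" + URLEncoder.encode(userId, StandardCharsets.UTF_8);
        return URI.create(appServerBase + "/api/token?" + query);
    }

    public static void main(String[] args) {
        URI uri = buildTokenUri("https://appserver.example.com", "123");
        System.out.println(uri);
        // In the app, issue an HTTPS GET to this URI off the main thread, read the
        // token from the response body, and pass it to engine.call(token) in Step 5.
    }
}
```

The request must run off the Android main thread; how the token is encoded in the response is defined by your own AppServer.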

Integrate the SDK

  1. Add the Alibaba Cloud Maven repository to the project-level build.gradle file.

    allprojects {
        repositories {
            google()
            mavenCentral()  // jcenter() is deprecated and no longer serves new artifacts.
            maven { url 'https://maven.aliyun.com/repository/google' }
            maven { url 'https://maven.aliyun.com/repository/public' }
        }
    }
  2. Add the ARTCAICallKit dependency to the corresponding build.gradle file.

    dependencies {
        implementation 'com.aliyun.aio:AliVCSDK_ARTC:7.5.0'                  // Replace 7.5.0 with the version compatible with your project.
        implementation 'com.aliyun.auikits.android:ARTCAICallKit:2.8.0'
    }
    Note

    For the latest compatible version number of the Alibaba Real-Time Communication (ARTC) SDK, see Download and integrate the SDK.

SDK development guide

Step 1: Request audio and video permissions for the app

Your application must check for microphone and camera permissions. If the permissions are not granted, a dialog box must appear to prompt the user for authorization. For sample code, see PermissionUtils.java.

PermissionX.init(this)
        .permissions(PermissionUtils.getPermissions())
        .request((allGranted, grantedList, deniedList) -> {
            // Handle the authorization result, for example by blocking the call if allGranted is false.
        });
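A runtime request only succeeds for permissions that are also declared in the app manifest. A minimal AndroidManifest.xml declaration for an audio and video call might look as follows; the exact set that AICallKit requires may be larger (see PermissionUtils.java):

```
<uses-permission android:name="android.permission.INTERNET" />
<uses-permission android:name="android.permission.RECORD_AUDIO" />
<uses-permission android:name="android.permission.CAMERA" />
```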

Step 2: Create and initialize the engine

Create and initialize the ARTCAICallEngine instance. The following code provides an example:

String userId = "123";  // Use the ID of the user who has logged on to your app as the userId.
ARTCAICallEngineImpl engine = new ARTCAICallEngineImpl(this, userId);

// If the agent is a digital human, configure the view container for displaying the digital human.
if (aiAgentType == AvatarAgent) {
    ViewGroup avatarLayer = findViewById(R.id.avatar_layer);  // Obtain the container from your layout. The view ID is an example.
    engine.setAgentView(
        avatarLayer,
        new ViewGroup.LayoutParams(ViewGroup.LayoutParams.MATCH_PARENT,
                                   ViewGroup.LayoutParams.MATCH_PARENT)
    );
}
// If the agent is for visual understanding, configure the view container for displaying the local video preview.
else if (aiAgentType == VisionAgent) {
    ViewGroup previewLayer = findViewById(R.id.preview_layer);  // Example view ID.
    engine.setLocalView(previewLayer,
        new FrameLayout.LayoutParams(ViewGroup.LayoutParams.MATCH_PARENT,
                                     ViewGroup.LayoutParams.MATCH_PARENT)
    );
}
// If the agent is for video calls, configure both the agent view and the local video preview.
else if (aiAgentType == VideoAgent) {
    ARTCAICallEngine.ARTCAICallVideoCanvas remoteCanvas = new ARTCAICallEngine.ARTCAICallVideoCanvas();
    remoteCanvas.zOrderOnTop = false;
    remoteCanvas.zOrderMediaOverlay = false;

    ViewGroup avatarLayer = findViewById(R.id.avatar_layer);  // Example view ID.
    engine.setAgentView(
        avatarLayer,
        new ViewGroup.LayoutParams(ViewGroup.LayoutParams.MATCH_PARENT,
                                   ViewGroup.LayoutParams.MATCH_PARENT),
        remoteCanvas
    );

    ViewGroup previewLayer = findViewById(R.id.preview_layer);  // Example view ID.
    engine.setLocalView(previewLayer,
        new FrameLayout.LayoutParams(ViewGroup.LayoutParams.MATCH_PARENT,
                                     ViewGroup.LayoutParams.MATCH_PARENT)
    );
}

Step 3: Implement callback methods

Implement the necessary engine callback methods. For more information about the engine callback API operations, see API reference.

protected ARTCAICallEngine.IARTCAICallEngineCallback mCallEngineCallback = new ARTCAICallEngine.IARTCAICallEngineCallback() {
    @Override
    public void onErrorOccurs(ARTCAICallEngine.AICallErrorCode errorCode) {
        // An error occurred. End the call.
        engine.hangup();
    }

    @Override
    public void onCallBegin() {
        // The call starts (the user joins the session).
    }

    @Override
    public void onCallEnd() {
        // The call ends (the user leaves the session).
    }

    @Override
    public void onAICallEngineRobotStateChanged(ARTCAICallEngine.ARTCAICallRobotState oldRobotState, ARTCAICallEngine.ARTCAICallRobotState newRobotState) {
        // Robot status synchronization.
    }

    @Override
    public void onUserSpeaking(boolean isSpeaking) {
        // Callback for when the user is speaking.
    }

    @Override
    public void onUserAsrSubtitleNotify(String text, boolean isSentenceEnd, int sentenceId, VoicePrintStatusCode voicePrintStatusCode) {
        // Synchronize the real-time subtitles recognized from the user's speech (ASR).
    }

    @Override
    public void onAIAgentSubtitleNotify(String text, boolean end, int userAsrSentenceId) {
        // Synchronize the agent's response.
    }

    @Override
    public void onNetworkStatusChanged(String uid, ARTCAICallEngine.ARTCAICallNetworkQuality quality) {
        // Callback for network status.
    }

    @Override
    public void onVoiceVolumeChanged(String uid, int volume) {
        // Volume change.
    }

    @Override
    public void onVoiceIdChanged(String voiceId) {
        // The timbre of the current call has changed.
    }

    @Override
    public void onVoiceInterrupted(boolean enable) {
        // The voice interruption setting for the current call has changed.
    }

    @Override
    public void onAgentVideoAvailable(boolean available) {
        // Whether the agent's video is available (stream ingest).
    }

    @Override
    public void onAgentAudioAvailable(boolean available) {
        // Whether the agent's audio is available (stream ingest).
    }

    @Override
    public void onAgentAvatarFirstFrameDrawn() {
        // Rendering of the first video frame of the digital human.
    }

    @Override
    public void onUserOnLine(String uid) {
        // Callback for when the user is online.
    }

};

engine.setEngineCallback(mCallEngineCallback);

Step 4: Create and initialize ARTCAICallConfig

For more information about ARTCAICallConfig, see ARTCAICallConfig.

ARTCAICallEngine.ARTCAICallConfig artcaiCallConfig = new ARTCAICallEngine.ARTCAICallConfig();
artcaiCallConfig.agentId = "XXX";            // The agent ID. This parameter is required.
artcaiCallConfig.region = "cn-shanghai";     // The agent region. This parameter is required.
artcaiCallConfig.agentType = VoiceAgent;     // The agent type: voice-only, digital human, visual understanding, or video call.
engine.init(artcaiCallConfig);

The following table lists the supported regions:

Region name         Region ID
China (Hangzhou)    cn-hangzhou
China (Shanghai)    cn-shanghai
China (Beijing)     cn-beijing
China (Shenzhen)    cn-shenzhen
Singapore           ap-southeast-1
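The value assigned to artcaiCallConfig.region must be a region ID from the table above, not a region name. As an illustration (this helper is not part of the SDK), the mapping can be made explicit in code:

```java
import java.util.Map;

// Illustrative helper (not part of AICallKit): map region names from the table
// above to the region IDs expected by artcaiCallConfig.region.
public class Regions {
    static final Map<String, String> REGION_IDS = Map.of(
        "China (Hangzhou)", "cn-hangzhou",
        "China (Shanghai)", "cn-shanghai",
        "China (Beijing)", "cn-beijing",
        "China (Shenzhen)", "cn-shenzhen",
        "Singapore", "ap-southeast-1"
    );

    public static void main(String[] args) {
        System.out.println(REGION_IDS.get("China (Shanghai)"));  // cn-shanghai
    }
}
```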

Step 5: Initiate a call to the agent

Call the call() method to initiate a call to the agent. To obtain an authentication token, see Generate an ARTC authentication token.

engine.call(token);

// After the call is connected, the following callback is triggered.
@Override
public void onCallBegin() {
    // The call starts (the user joins the session).
}

Step 6: Implement in-call features

After the call starts, you can handle captions, interrupt the agent's speech, and perform other actions as needed. For more information, see Implement features.

Step 7: End the call

Call the hangup() method to end the call with the agent.

engine.hangup();