This topic describes how to integrate an audio and video agent into your iOS application using the AICallKit software development kit (SDK).
Prerequisites
Xcode 16.0 or later. The latest official version is recommended.
CocoaPods 1.9.3 or later
A physical device that runs iOS 11.0 or later
Flowchart
Your app obtains an RTC token from your AppServer and then calls the call(config) method to start a call. During the call, you can call AICallKit APIs to implement interactive features for the AI agent, such as live subtitles and interruptions. AICallKit depends on real-time audio and video capabilities, so the features of ApsaraVideo Real-Time Communication (ARTC) are integrated into the AICallKit SDK. If your business scenario also requires live streaming and video-on-demand (VOD) capabilities, consider using ApsaraVideo MediaBox SDK. For more information, see Select and download SDKs.
Integrate the SDK
target 'YourTarget' do
# Import AliVCSDK_ARTC, AliVCSDK_Standard, or AliVCSDK_InteractiveLive for ApsaraVideo Real-time Communication capabilities.
pod 'AliVCSDK_ARTC', '~> x.x.x'
# Import the AICallKit SDK.
pod 'ARTCAICallKit', '~> x.x.x'
...
end
For the latest compatible version number of the ARTC SDK, see Download and integrate the SDK.
Configure the project
Add permissions for the microphone and camera. Open the Info.plist file of your project and add the NSMicrophoneUsageDescription and NSCameraUsageDescription keys.
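For example, the Info.plist entries might look like the following. The description strings below are placeholders; replace them with wording that explains to your users why the app needs each permission:

```xml
<key>NSMicrophoneUsageDescription</key>
<string>The app needs microphone access for audio calls with the AI agent.</string>
<key>NSCameraUsageDescription</key>
<string>The app needs camera access for video calls with the AI agent.</string>
```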
Open the project settings. On the Signing & Capabilities tab, enable Background Modes. We strongly recommend this step: if Background Modes is not enabled, the call cannot continue after the application enters the background, and your app must end the call.
SDK development guide
Step 1: Check microphone and camera permissions for the app
Check the microphone and camera permissions for the app. If the permissions are not granted, display a dialog box to prompt the user for authorization. You need to implement this feature in your app. For sample code, see AVDeviceAuth.h.
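The permission check itself uses the standard AVFoundation authorization APIs. The following is a minimal sketch with simplified completion handling; for a complete implementation, see AVDeviceAuth.h in the sample project:

```swift
import AVFoundation

// Check and, if necessary, request microphone and camera permissions.
func checkDevicePermissions(completion: @escaping (_ granted: Bool) -> Void) {
    AVCaptureDevice.requestAccess(for: .audio) { micGranted in
        AVCaptureDevice.requestAccess(for: .video) { camGranted in
            DispatchQueue.main.async {
                // If either permission is denied, prompt the user
                // to grant it in the Settings app before starting a call.
                completion(micGranted && camGranted)
            }
        }
    }
}
```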
Step 2: Create and initialize the engine
Create and initialize an ARTCAICallEngine instance. The following code provides an example:
// Create an engine instance.
let engine = ARTCAICallEngineFactory.createEngine()
let agentType: ARTCAICallAgentType

// Initialize the engine instance.
public func setup() {
    // Set the callback.
    self.engine.delegate = self

    // If the agent is a digital human, configure the view for the digital human.
    if self.agentType == .AvatarAgent {
        let agentViewConfig = ARTCAICallViewConfig(view: self.avatarAgentView)
        self.engine.setAgentViewConfig(viewConfig: agentViewConfig)
    }
    // If the agent is for visual understanding, configure the local camera preview.
    else if self.agentType == .VisionAgent {
        let cameraViewConfig = ARTCAICallViewConfig(view: self.cameraView)
        self.engine.setLocalViewConfig(viewConfig: cameraViewConfig)
    }
    // If the agent is for a video call, configure both the agent view and the local camera preview.
    else if self.agentType == .VideoAgent {
        let agentViewConfig = ARTCAICallViewConfig(view: self.avatarAgentView)
        self.engine.setAgentViewConfig(viewConfig: agentViewConfig)
        let cameraViewConfig = ARTCAICallViewConfig(view: self.cameraView)
        self.engine.setLocalViewConfig(viewConfig: cameraViewConfig)
    }
}
Step 3: Implement callback methods
Implement the required engine callback methods. For more information about engine callback operations, see API reference.
// Callback handling (only some of the callbacks are shown in this example).
public func onErrorOccurs(code: ARTCAICallErrorCode) {
    // An error occurred. End the call.
    self.engine.handup()
}

public func onCallBegin() {
    // The call started.
}

public func onCallEnd() {
    // The call ended.
}

public func onAgentStateChanged(state: ARTCAICallAgentState) {
    // The agent state changed.
}

public func onUserSubtitleNotify(text: String, isSentenceEnd: Bool, sentenceId: Int) {
    // The result of the agent's speech recognition of the user's question.
}

public func onVoiceAgentSubtitleNotify(text: String, isSentenceEnd: Bool, userAsrSentenceId: Int) {
    // The agent's answer.
}

public func onVoiceIdChanged(voiceId: String) {
    // The voice (timbre) of the current call changed.
}

public func onVoiceInterrupted(enable: Bool) {
    // Whether voice interruption is enabled for the current call.
}
Step 4: Create and initialize ARTCAICallConfig
For more information about ARTCAICallConfig, see ARTCAICallConfig.
let callConfig = ARTCAICallConfig()
callConfig.agentId = "xxx"              // The agent ID.
callConfig.agentType = self.agentType   // The agent type.
callConfig.userId = "xxx"               // The ID of the user who is logged on to your app.
callConfig.region = "xx-xxx"            // The region in which the agent service is located.
callConfig.userJoinToken = "xxxxxxxxx"  // The RTC token.

// For visual understanding and video calls, you must set the video configuration.
// In this example, frameRate is set to 5. Adjust this value based on the agent's frame sampling rate,
// which is configured in the console and is typically 2. Do not set the frame rate to a value greater than 15 fps.
// bitrate: If frameRate is greater than 10, you can set bitrate to 512.
if self.config.agentType == .VisionAgent {
    callConfig.videoConfig = ARTCAICallVideoConfig(frameRate: 5, bitrate: 340, useFrontCameraDefault: false)
}
if self.config.agentType == .VideoAgent {
    callConfig.videoConfig = ARTCAICallVideoConfig(frameRate: 5, bitrate: 340, useFrontCameraDefault: true)
}
| Region name | Region ID |
| --- | --- |
| China (Hangzhou) | cn-hangzhou |
| China (Shanghai) | cn-shanghai |
| China (Beijing) | cn-beijing |
| China (Shenzhen) | cn-shenzhen |
| Singapore | ap-southeast-1 |
For the userJoinToken parameter, you must obtain an RTC token. For more information, see Generate an ARTC authentication token.
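How the token is issued depends on your AppServer. The following sketch shows one way to fetch it over HTTPS; the URL and the rtc_auth_token response field are placeholders for whatever interface your own backend defines:

```swift
import Foundation

// Fetch an RTC token from your own AppServer.
// The endpoint URL and JSON field names below are placeholders.
func fetchRTCToken(userId: String, completion: @escaping (String?) -> Void) {
    let url = URL(string: "https://your-app-server.example.com/api/getRtcToken?userId=\(userId)")!
    URLSession.shared.dataTask(with: url) { data, _, error in
        guard let data = data, error == nil,
              let json = try? JSONSerialization.jsonObject(with: data) as? [String: Any],
              let token = json["rtc_auth_token"] as? String else {
            completion(nil)
            return
        }
        completion(token)
    }.resume()
}
```

Assign the fetched token to callConfig.userJoinToken before you initiate the call.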
Step 5: Initiate a call to the agent
Call the call(config) method to initiate a call to the agent.
// After the agent starts, start the call.
public func start() {
    let callConfig = ... // Create a callConfig object as shown in the preceding code.
    if self.engine.call(config: callConfig) {
        // The API operation is successfully called.
    }
}

// After the call is connected, the following callback is triggered.
public func onCallBegin() {
    // The call started.
}
Step 6: Implement in-call features
After the call starts, you can process captions, interrupt the agent, and perform other operations as needed. For more information, see Implement features.
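For example, manually interrupting the agent while it is speaking might look like the following. The method name interruptSpeaking() is an assumption here; verify it against the API reference for your SDK version:

```swift
// Manually interrupt the agent while it is speaking.
// Note: interruptSpeaking() is assumed; check the API reference for the exact name.
public func interrupt() {
    self.engine.interruptSpeaking()
}
```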
Step 7: End the call
Call the handup() method to end the call with the agent.
public func handup() {
    self.engine.handup()
}