Intelligent Media Services: Integrate into iOS apps

Last Updated: Mar 05, 2025

This topic describes how to integrate AICallKit SDK into your iOS app.

Environment requirements

  • Xcode 16.0 or later. We recommend that you use the latest official version.

  • CocoaPods 1.9.3 or later.

  • A physical device that runs iOS 10.0 or later.

Integrate the SDK

target 'Your target' do

  # Integrate ApsaraVideo MediaBox SDK for Alibaba Real-Time Communication (ARTC). You can integrate AliVCSDK_ARTC, AliVCSDK_Standard, or AliVCSDK_InteractiveLive.
  pod 'AliVCSDK_ARTC', '~> x.x.x'
  # Integrate AICallKit SDK.
  pod 'ARTCAICallKit', '~> 1.6.0'

  ...

end
Note

Replace x.x.x with an actual ARTC SDK version. You can download the latest version of ARTC SDK from the official website.

Configure the project

  • Open the Info.plist file of your project and add the NSMicrophoneUsageDescription and NSCameraUsageDescription keys. Their values are the texts that the system shows when it requests microphone and camera permissions.

  • In the project settings, enable Background Modes on the Signing & Capabilities tab. This setting is recommended. If Background Modes is disabled, a call cannot continue after the app is switched to the background, and you must call the handup() method to end the call.
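A missing usage description is easy to overlook until the app is tested on a device. The following minimal sketch checks at startup that both keys are present. `missingUsageKeys` is a hypothetical helper, not part of AICallKit SDK; `Bundle.main.infoDictionary` is standard Foundation API.

```swift
import Foundation

// Usage-description keys that this topic asks you to add to Info.plist.
let requiredUsageKeys = ["NSMicrophoneUsageDescription", "NSCameraUsageDescription"]

/// Returns the required keys that are absent or empty in the given Info.plist dictionary.
/// Hypothetical helper for illustration; not an SDK method.
func missingUsageKeys(in info: [String: Any]) -> [String] {
    return requiredUsageKeys.filter { key in
        guard let text = info[key] as? String else { return true }
        return text.trimmingCharacters(in: .whitespacesAndNewlines).isEmpty
    }
}

// In an app, pass the contents of the bundle's real Info.plist.
let missing = missingUsageKeys(in: Bundle.main.infoDictionary ?? [:])
if !missing.isEmpty {
    print("Info.plist is missing: \(missing.joined(separator: ", "))")
}
```

Taking the dictionary as a parameter keeps the check independent of the running bundle, so it can also be exercised in a unit test.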

Sample code for using the SDK

// Integrate the SDK.
import ARTCAICallKit

// Create an engine instance.
var engine: ARTCAICallEngineInterface = {
    return ARTCAICallEngineFactory.createEngine()
}()

// Configure callback events.
self.engine.delegate = self

// Start a call after an intelligent agent is started.
let agentInfo = ARTCAICallAgentInfo(agentType: workflow_type, channelId: channel_id, uid: ai_agent_user_id, instanceId: agent_instance_id)
self.engine.call(userId: self.userId, token: rtc_auth_token, agentInfo: agentInfo) { [weak self] error in
    if let error = error {
        // Handle the error.
    }
    else {
        // The call is successful.
    }
}

// End the call. handup takes a Bool that specifies whether to also stop the intelligent agent.
self.engine.handup(true)

// For more information about other methods, see the "API reference" section of this topic.

// Callback events. The following examples cover only callbacks that do not require dedicated handling code.
public func onErrorOccurs(code: ARTCAICallErrorCode) {
    // An error occurred. End the call.
    self.engine.handup(true)
}

public func onCallBegin() {
    // The call starts.
}

public func onCallEnd() {
    // The call ends.
}

public func onAgentStateChanged(state: ARTCAICallAgentState) {
    // The status of the intelligent agent changes.
}

public func onUserSubtitleNotify(text: String, isSentenceEnd: Bool, sentenceId: Int) {
    // The question of the user is recognized by the intelligent agent.
}

public func onVoiceAgentSubtitleNotify(text: String, isSentenceEnd: Bool, userAsrSentenceId: Int) {
    // The intelligent agent returns an answer.
}

public func onVoiceIdChanged(voiceId: String) {
    // The voice of the current call changes.
}

public func onVoiceInterrupted(enable: Bool) {
    // Indicate whether voice interruption is enabled for the current call.
}

API reference

API overview

| Class or protocol | API | Description |
| --- | --- | --- |
| ARTCAICallEngineInterface (an engine instance) | userId | Queries the ID of the user on the current call. |
| | isOnCall | Indicates whether a call is in progress. |
| | agentInfo | Queries the information about the current intelligent agent. |
| | agentState | Queries the status of the current intelligent agent. |
| | delegate | Configures and queries callback events. |
| | call | Starts a call. |
| | handup | Ends a call. |
| | setAgentViewConfig | Configures the rendering view for the intelligent agent. This setting is valid only for the digital human agent. |
| | interruptSpeaking | Interrupts the speech of the intelligent agent. |
| | enableVoiceInterrupt | Enables or disables intelligent interruption. |
| | switchVoiceId | Changes the voice. |
| | enableSpeaker | Enables or disables the speaker. |
| | enablePushToTalk | Enables or disables intercom mode. |
| | startPushToTalk | Starts speaking in intercom mode. |
| | finishPushToTalk | Finishes speaking in intercom mode. |
| | cancelPushToTalk | Cancels speaking in intercom mode. |
| | muteMicrophone | Mutes or unmutes the microphone. |
| | visionConfig | The parameter configuration for visual understanding calls. |
| | muteLocalCamera | Turns the camera on or off. |
| | switchCamera | Switches between the front and rear cameras. |
| | parseShareAgentCall | Parses the information about a shared intelligent agent. |
| | generateShareAgentCall | Starts a call with a shared intelligent agent. |
| | getRTCInstance | Queries the information about a Real-Time Communication (RTC) engine instance. |
| | destroy | Releases resources. |
| ARTCAICallEngineDelegate (the callback events of an engine instance) | onErrorOccurs | An error occurred. |
| | onCallBegin | A call starts. |
| | onCallEnd | A call ends. |
| | onAgentVideoAvailable | Indicates whether the ingested video stream of the intelligent agent is available. |
| | onAgentAudioAvailable | Indicates whether the ingested audio stream of the intelligent agent is available. |
| | onPushToTalk | Indicates whether intercom mode is enabled for the current call. |
| | onAgentWillLeave | The intelligent agent is about to end the current call. |
| | onReceivedAgentCustomMessage | A custom message is received from the current intelligent agent. |
| | onAgentStateChanged | The status of the intelligent agent changes. |
| | onNetworkStatusChanged | The network status changes. |
| | onVoiceVolumeChanged | The volume changes. |
| | onUserSubtitleNotify | The intelligent agent recognizes the question of the user. |
| | onVoiceAgentSubtitleNotify | The intelligent agent returns an answer. |
| | onVoiceIdChanged | The voice of the intelligent agent changes. |
| | onVoiceInterrupted | Indicates whether voice interruption is enabled for the current call. |
| | onAgentAvatarFirstFrameDrawn | The first video frame of the digital human agent is rendered. |
| | onHumanTakeoverWillStart | A human agent is stepping in to take over from the current intelligent agent. |
| | onHumanTakeoverConnected | A human agent has taken over from the current intelligent agent. |
| | onAgentEmotionNotify | The intelligent agent detects an emotion. |
| ARTCAICallEngineFactory (an engine factory) | createEngine | Creates a default engine instance. |

ARTCAICallEngineInterface

userId

Queries the ID of the user on the current call.

var userId: String? { get }

isOnCall

Indicates whether a call is in progress. Valid values: true and false.

var isOnCall: Bool { get }

agentInfo

Queries the information about the current intelligent agent. The information includes the type, channel ID, UID of the intelligent agent in the channel, and instance ID of the intelligent agent.

var agentInfo: ARTCAICallAgentInfo? { get }

agentState

The status of the intelligent agent. The agent can be listening, thinking, or speaking.

var agentState: ARTCAICallAgentState { get }

delegate

Configures and queries callback events.

weak var delegate: ARTCAICallEngineDelegate? { get set }

call

Starts a call.

func call(userId: String, token: String, agentInfo: ARTCAICallAgentInfo, completed:((_ error: NSError?) -> Void)?)

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| userId | String | The UID of the current user. |
| token | String | The token that is used to join a meeting. |
| agentInfo | ARTCAICallAgentInfo | The information about the intelligent agent. |
| completed | ((_ error: NSError?) -> Void)? | The callback that is invoked once the process is finished. |

handup

Ends a call.

func handup(_ stopAIAgent: Bool)

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| stopAIAgent | Bool | Specifies whether to stop the intelligent agent. |

setAgentViewConfig

Configures the rendering view for the intelligent agent. This setting is valid only for the digital human agent.

func setAgentViewConfig(viewConfig: ARTCAICallViewConfig?)

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| viewConfig | ARTCAICallViewConfig? | The rendering view settings, including the rendering view, rendering mode, mirror mode, and rotation mode. |

interruptSpeaking

Interrupts the speech of the intelligent agent.

func interruptSpeaking() -> Bool

enableVoiceInterrupt

Enables or disables intelligent interruption.

func enableVoiceInterrupt(enable: Bool) -> Bool

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| enable | Bool | Specifies whether to enable intelligent interruption. |

switchVoiceId

Changes the voice.

func switchVoiceId(voiceId: String) -> Bool

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| voiceId | String | The voice ID. |

enableSpeaker

Enables or disables the speaker.

func enableSpeaker(enable: Bool) -> Bool

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| enable | Bool | Specifies whether to enable the speaker. |

enablePushToTalk

Enables or disables intercom mode. In intercom mode, the intelligent agent returns a result only after the finishPushToTalk operation is called.

func enablePushToTalk(enable: Bool) -> Bool

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| enable | Bool | Specifies whether to enable intercom mode. |

startPushToTalk

Starts speaking in intercom mode.

func startPushToTalk() -> Bool

finishPushToTalk

Finishes speaking in intercom mode.

func finishPushToTalk() -> Bool

cancelPushToTalk

Cancels speaking in intercom mode.

func cancelPushToTalk() -> Bool
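The four intercom methods are typically driven by a hold-to-talk button: enable the mode once, start when the button is pressed, then finish or cancel when it is released. The following is a minimal sketch of that sequence under stated assumptions. `PushToTalkControlling`, `TalkButtonController`, and `RecordingEngine` are hypothetical names introduced so that the example compiles without the SDK; only the four method signatures are taken from this topic.

```swift
// Hypothetical stand-in that mirrors the four documented push-to-talk signatures.
protocol PushToTalkControlling {
    func enablePushToTalk(enable: Bool) -> Bool
    func startPushToTalk() -> Bool
    func finishPushToTalk() -> Bool
    func cancelPushToTalk() -> Bool
}

/// Drives a hold-to-talk button. Illustrative only; not an SDK type.
final class TalkButtonController {
    private let engine: PushToTalkControlling
    private(set) var isTalking = false

    init(engine: PushToTalkControlling) {
        self.engine = engine
    }

    /// Switches the call to intercom mode before the button is used.
    func activate() -> Bool {
        return engine.enablePushToTalk(enable: true)
    }

    /// Button pressed: start speaking, unless speaking is already in progress.
    func press() {
        guard !isTalking else { return }
        isTalking = engine.startPushToTalk()
    }

    /// Button released: finish speaking (send == true) or cancel it (send == false).
    func release(send: Bool) {
        guard isTalking else { return }
        _ = send ? engine.finishPushToTalk() : engine.cancelPushToTalk()
        isTalking = false
    }
}

/// Test double that records the calls it receives.
final class RecordingEngine: PushToTalkControlling {
    var calls: [String] = []
    func enablePushToTalk(enable: Bool) -> Bool { calls.append("enable(\(enable))"); return true }
    func startPushToTalk() -> Bool { calls.append("start"); return true }
    func finishPushToTalk() -> Bool { calls.append("finish"); return true }
    func cancelPushToTalk() -> Bool { calls.append("cancel"); return true }
}

let recorder = RecordingEngine()
let button = TalkButtonController(engine: recorder)
_ = button.activate()
button.press()
button.release(send: true)
print(recorder.calls)
```

In an app, the real engine instance would take the place of `RecordingEngine`, for example through an extension that declares conformance to the stand-in protocol.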

muteMicrophone

Mutes or unmutes the microphone.

func muteMicrophone(mute: Bool) -> Bool

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| mute | Bool | Specifies whether to mute the microphone. |

visionConfig

The visual configuration, including the resolution and frame rate. For more information, see ARTCAICallVisionConfig. The configuration is required if you use a visual intelligent agent. The configuration takes effect only if you set it before the call.

var visionConfig: ARTCAICallVisionConfig { get set }

muteLocalCamera

Turns on or off a camera.

func muteLocalCamera(mute: Bool) -> Bool

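Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| mute | Bool | Specifies whether to turn off the camera. Set this parameter to true to turn off the camera, or to false to turn it on. |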

switchCamera

Switches between the front and rear cameras.

func switchCamera() -> Bool

parseShareAgentCall

Parses the information about a shared intelligent agent. If the parsing is successful, an instance of the ARTCAICallAgentShareConfig type is returned, which can be used to start a call with the intelligent agent.

func parseShareAgentCall(shareInfo: String) -> ARTCAICallAgentShareConfig?

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| shareInfo | String | The shared intelligent agent information, which can be generated in the console. |

generateShareAgentCall

Starts a call with a shared intelligent agent.

func generateShareAgentCall(shareConfig: ARTCAICallAgentShareConfig, userId: String, completed: ((_ rsp: ARTCAICallAgentInfo?, _ token: String?, _ error: NSError?, _ reqId: String) -> Void)?)

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| shareConfig | ARTCAICallAgentShareConfig | The share configurations, such as the share ID, intelligent agent type, expiration time, template settings, and region. You can use the SDK to view their definitions. |
| userId | String | The ID of the logon user. |
| completed | ((_ rsp: ARTCAICallAgentInfo?, _ token: String?, _ error: NSError?, _ reqId: String) -> Void)? | The callback that is invoked once the process is finished. |

getRTCInstance

Queries the information about an RTC engine instance.

func getRTCInstance() -> AnyObject?

destroy

Releases resources.

func destroy()

ARTCAICallEngineDelegate

onErrorOccurs

An error occurred during the current call.

@objc optional func onErrorOccurs(code: ARTCAICallErrorCode)

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| code | ARTCAICallErrorCode | The error code. |

onCallBegin

A call starts.

@objc optional func onCallBegin()

onCallEnd

A call ends.

@objc optional func onCallEnd()

onAgentVideoAvailable

Indicates whether the ingested video stream of the intelligent agent is available.

@objc optional func onAgentVideoAvailable(available: Bool)

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| available | Bool | Indicates whether the ingested video stream of the intelligent agent is available. |

onAgentAudioAvailable

Indicates whether the ingested audio stream of the intelligent agent is available.

@objc optional func onAgentAudioAvailable(available: Bool)

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| available | Bool | Indicates whether the ingested audio stream of the intelligent agent is available. |

onPushToTalk

Indicates whether intercom mode is enabled for the current call.

@objc optional func onPushToTalk(enable: Bool)

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| enable | Bool | Indicates whether intercom mode is enabled for the current call. |

onAgentWillLeave

The intelligent agent is about to end the current call.

@objc optional func onAgentWillLeave(reason: Int32, message: String)

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| reason | Int32 | The reason why the agent is leaving. A value of 2001 indicates idle timeout. A value of 0 indicates other reasons. |
| message | String | The description of the reason. |
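The two documented reason codes can be turned into user-facing text before the call ends. The following minimal sketch uses `leaveDescription`, a hypothetical helper that is not part of the SDK. Codes other than 2001 and 0 are not described in this topic, so the sketch passes them through unchanged.

```swift
/// Maps the reason reported by onAgentWillLeave to a message for the user.
/// Hypothetical helper for illustration; not an SDK method.
func leaveDescription(reason: Int32, message: String) -> String {
    switch reason {
    case 2001:
        // Documented value: idle timeout.
        return "The agent is leaving because the call was idle for too long."
    case 0:
        // Documented value: other reasons. Prefer the description supplied with the callback.
        return message.isEmpty ? "The agent is leaving the call." : message
    default:
        // Not documented in this topic; show the raw code and description.
        return "The agent is leaving (reason \(reason)): \(message)"
    }
}

print(leaveDescription(reason: 2001, message: ""))
```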

onReceivedAgentCustomMessage

A custom message is received from the current intelligent agent.

@objc optional func onReceivedAgentCustomMessage(data: [String: Any]?)

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| data | [String: Any]? | The message content. |

onAgentStateChanged

The status of the intelligent agent changes.

@objc optional func onAgentStateChanged(state: ARTCAICallAgentState)

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| state | ARTCAICallAgentState | The status of the intelligent agent. The agent can be listening, thinking, or speaking. |

onNetworkStatusChanged

The network status changes.

@objc optional func onNetworkStatusChanged(uid: String, quality: ARTCAICallNetworkQuality)

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| uid | String | The ID of the current speaker. |
| quality | ARTCAICallNetworkQuality | The network quality. Valid values: Excellent, Good, Moderate, Poor, Extremely Poor, Interrupted, and Unknown. |
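An app usually reacts only to the lower quality levels, for example by showing a banner. The following sketch illustrates one such mapping. `NetworkQualityLevel` is a hypothetical local enum that mirrors the seven values listed above; the actual case names of ARTCAICallNetworkQuality are not given in this topic and may differ, and the thresholds and banner texts are assumptions.

```swift
/// Local mirror of the seven documented quality values. Not an SDK type.
enum NetworkQualityLevel {
    case excellent, good, moderate, poor, extremelyPoor, interrupted, unknown
}

/// Returns banner text for qualities that are likely to affect the call, or nil otherwise.
/// Which levels trigger a banner is an assumption made for this sketch.
func networkBanner(for quality: NetworkQualityLevel) -> String? {
    switch quality {
    case .excellent, .good, .moderate, .unknown:
        return nil
    case .poor, .extremelyPoor:
        return "Your network is unstable."
    case .interrupted:
        return "Network connection lost."
    }
}

if let banner = networkBanner(for: .poor) {
    print(banner)
}
```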

onVoiceVolumeChanged

The volume changes.

@objc optional func onVoiceVolumeChanged(uid: String, volume: Int32)

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| uid | String | The UID of the current speaker. |
| volume | Int32 | The volume. Valid values: 0 to 255. |
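A level meter usually expects a value between 0 and 1 instead of the 0 to 255 range reported here. The following minimal sketch shows the conversion. `meterLevel` is a hypothetical helper, and clamping out-of-range input is a defensive choice of this sketch, not documented SDK behavior.

```swift
/// Converts a volume in the documented 0...255 range to a 0.0...1.0 meter level.
/// Hypothetical helper for illustration; not an SDK method.
func meterLevel(fromVolume volume: Int32) -> Double {
    // Clamp defensively in case a value outside the documented range arrives.
    let clamped = min(max(volume, 0), 255)
    return Double(clamped) / 255.0
}

print(meterLevel(fromVolume: 128))
```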

onUserSubtitleNotify

The intelligent agent recognizes the question of the user.

@objc optional func onUserSubtitleNotify(text: String, isSentenceEnd: Bool, sentenceId: Int)

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| text | String | The text that is recognized by the intelligent agent. |
| isSentenceEnd | Bool | Indicates whether the current text is the end of the sentence. |
| sentenceId | Int | The ID of the sentence to which the current text belongs. |

onVoiceAgentSubtitleNotify

The intelligent agent returns an answer.

@objc optional func onVoiceAgentSubtitleNotify(text: String, isSentenceEnd: Bool, userAsrSentenceId: Int)

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| text | String | The text of the answer provided by the intelligent agent. |
| isSentenceEnd | Bool | Indicates whether the current text is the last sentence of the answer. |
| userAsrSentenceId | Int | The ID of the sentence to which the question of the user recognized by the intelligent agent belongs. |
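Because userAsrSentenceId refers back to the sentenceId of the user's question, the two subtitle callbacks can be combined into question and answer turns. The following sketch shows one way to do so. `Turn` and `Transcript` are hypothetical types, not SDK types. The sketch makes two assumptions that this topic does not confirm: each user callback carries the full text recognized so far for the sentence, and each agent callback carries the next fragment of the answer. Swap the assignment and the append if the SDK behaves differently.

```swift
/// One question and answer turn, keyed by the ID of the user's sentence.
struct Turn: Equatable {
    var question = ""
    var isQuestionFinal = false
    var answer = ""
}

/// Illustrative transcript store; not an SDK type.
struct Transcript {
    private(set) var turns: [Int: Turn] = [:]

    /// Call from onUserSubtitleNotify.
    /// Assumption: text is the full recognition result so far, so it replaces the stored text.
    mutating func userSaid(text: String, isSentenceEnd: Bool, sentenceId: Int) {
        turns[sentenceId, default: Turn()].question = text
        turns[sentenceId, default: Turn()].isQuestionFinal = isSentenceEnd
    }

    /// Call from onVoiceAgentSubtitleNotify.
    /// Assumption: text is the next fragment of the answer, so it is appended.
    mutating func agentSaid(text: String, userAsrSentenceId: Int) {
        turns[userAsrSentenceId, default: Turn()].answer += text
    }
}

var transcript = Transcript()
transcript.userSaid(text: "What is", isSentenceEnd: false, sentenceId: 1)
transcript.userSaid(text: "What is the weather?", isSentenceEnd: true, sentenceId: 1)
transcript.agentSaid(text: "It is ", userAsrSentenceId: 1)
transcript.agentSaid(text: "sunny.", userAsrSentenceId: 1)
print(transcript.turns[1]?.answer ?? "")
```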

onVoiceIdChanged

The voice of the intelligent agent changes.

@objc optional func onVoiceIdChanged(voiceId: String)

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| voiceId | String | The voice ID. |

onVoiceInterrupted

Indicates whether voice interruption is enabled for the current call.

@objc optional func onVoiceInterrupted(enable: Bool)

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| enable | Bool | Indicates whether intelligent interruption is enabled for the current call. |

onAgentAvatarFirstFrameDrawn

The first video frame of the digital human agent is rendered.

@objc optional func onAgentAvatarFirstFrameDrawn()

onHumanTakeoverWillStart

A human agent is stepping in to take over from the current intelligent agent.

@objc optional func onHumanTakeoverWillStart(takeoverUid: String, takeoverMode: Int)

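Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| takeoverUid | String | The UID of the human agent. |
| takeoverMode | Int | The takeover mode. A value of 1 indicates that the human voice is used. A value of 0 indicates that the voice of the intelligent agent is used. |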

onHumanTakeoverConnected

A human agent has taken over from the current intelligent agent.

@objc optional func onHumanTakeoverConnected(takeoverUid: String)

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| takeoverUid | String | The UID of the human agent. |

onAgentEmotionNotify

The intelligent agent detects an emotion.

@objc optional func onAgentEmotionNotify(emotion: String, userAsrSentenceId: Int)

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| emotion | String | The emotion tag. Valid values: neutral, happy, angry, and sad. |
| userAsrSentenceId | Int | The ID of the sentence to which the question of the user recognized by the intelligent agent belongs. |
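Because the emotion arrives as a string, it can help to convert it to a typed value before it reaches UI code. The following minimal sketch covers the four documented tags. `AgentEmotion` is a hypothetical local enum, not an SDK type, and falling back to neutral for unrecognized tags is an assumption made for this example.

```swift
/// Local enum for the four documented emotion tags. Not an SDK type.
enum AgentEmotion: String {
    case neutral, happy, angry, sad

    /// Parses a tag received in onAgentEmotionNotify.
    /// Unrecognized tags fall back to neutral (an assumption of this sketch).
    init(tag: String) {
        self = AgentEmotion(rawValue: tag.lowercased()) ?? .neutral
    }
}

let emotion = AgentEmotion(tag: "happy")
print(emotion.rawValue)
```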

ARTCAICallEngineFactory

createEngine

Creates a default engine instance.

public static func createEngine() -> ARTCAICallEngineInterface