iOS SDK Key Data Structures for IMS Integration

This topic describes the data types used in the AICallKit iOS SDK.

Data Structure Overview

Note

Deprecated parameters and methods exist in older SDK versions. Upgrade to the latest SDK version. For more information, see iOS User Guide.

Structure Type	Data Type	Description
*Enum*	ARTCAICallAgentType	Agent type
	ARTCAICallAgentState	Agent state
	ARTCAICallAudioProfile	Audio encoding configuration
	ARTCAICallAudioScenario	Audio scenario configuration
	ARTCAICallAgentViewMode	Agent view rendering mode
	ARTCAICallAgentViewMirrorMode	Agent view mirror mode
	ARTCAICallAgentViewRotationMode	Agent view rotation mode
	ARTCAICallNetworkQuality	Network quality
	ARTCAICallSpeakingInterruptedReason	Reason the agent’s speech was interrupted
	ARTCAICallVoiceprintResult	Voiceprint VAD result
	ARTCAICallErrorCode	Error code
	ARTCAICallConnectionStatus	Network connection status during a call
	ARTCAICallTurnDetectionMode	Method to detect when user speech ends
*Class*	ARTCAICallAgentInfo	Agent runtime information
	ARTCAICallAudioConfig	Call audio configuration
	ARTCAICallViewConfig	Agent view configuration for agents that require rendering, such as digital humans.
	ARTCAICallVisionConfig	Runtime configuration for vision understanding agents
	ARTCAICallVisionCustomCaptureRequest	Request model to enable custom frame capture for vision understanding agents
	ARTCAICallSendTextToAgentRequest	Send a text message to an agent to query the model.
	ARTCAICallConfig	Configuration to start an agent call
	ARTCAICallTemplateConfig (deprecated)	TemplateConfig parameter used to start a call
	ARTCAICallChatSyncConfig	Chat agent session configuration parameters
	ARTCAICallAgentShareConfig	Agent sharing configuration information
	ARTCAICallVideoConfig	Local video configuration for calls
	ARTCAICallAgentConfig	Agent startup and runtime configuration for calls
	ARTCAICallAgentAsrConfig	Speech recognition configuration
	ARTCAICallAgentTtsConfig	Speech synthesis configuration
	ARTCAICallAgentLlmConfig	Large Language Model (LLM) configuration
	ARTCAICallAgentAvatarConfig	Digital human configuration
	ARTCAICallAgentInterruptConfig	Interrupt configuration
	ARTCAICallAgentVoiceprintConfig	Voiceprint denoising configuration
	ARTCAICallAgentTurnDetectionConfig	Turn detection configuration
	ARTCAICallAgentVcrResult	VCR detection result
	ARTCAICallAgentVcrConfig	VCR configuration
	ARTCAICallAgentVcrBaseConfig	Base VCR detection configuration
	ARTCAICallAgentVcrFrameMotionConfig	VCR video frame detection configuration
	ARTCAICallExperimentalConfig	Experimental parameters that control specific logic policies
	ARTCAICallAgentAmbientConfig	Call environment parameters
	ARTCAICallAgentAutoSpeechContent	Agent speech content for auto-speech scenarios, such as acknowledgments and proactive questions
	ARTCAICallAgentAutoSpeechLlmPending	Auto-speech configuration for cases where the LLM response is delayed
	ARTCAICallAgentAutoSpeechUserIdle	Configuration for agent questions when the user is silent
	ARTCAICallAgentBackChanneling	Back-channeling configuration. When enabled, the agent randomly plays short acknowledgments at specific trigger points.

Data Structure Details

Enum

ARTCAICallAgentType

Agent type

Enumeration Value	Value	Description
VoiceAgent	0	Voice-only interaction with no visual representation
AvatarAgent	1	Visual representation with support for voice and visual interaction
VisionAgent	2	Focuses on visual information understanding and analysis
VideoAgent	3	Bidirectional video call between the user and the agent

ARTCAICallAgentState

Agent state

Enumeration Value	Value	Description
Listening	1	Listening
Thinking	2	Thinking
Speaking	3	Speaking

ARTCAICallAudioProfile

Audio encoding configuration

Enumeration Value	Value	Description
LowQualityMode	0x0000	Low-quality audio mode. Default sample rate: 8000 Hz. Mono channel. Maximum encoding bitrate: 12 kbps
BasicQualityMode	0x0001	Standard-quality audio mode. Default sample rate: 16000 Hz. Mono channel. Maximum encoding bitrate: 24 kbps
HighQualityMode	0x0010	(Default) High-quality audio mode. Default sample rate: 48000 Hz. Mono channel. Maximum encoding bitrate: 64 kbps
StereoHighQualityMode	0x0011	Stereo high-quality audio mode. Default sample rate: 48000 Hz. Stereo channel. Maximum encoding bitrate: 80 kbps
SuperHighQualityMode	0x0012	Super-high-quality audio mode. Default sample rate: 48000 Hz. Mono channel. Maximum encoding bitrate: 96 kbps
StereoSuperHighQualityMode	0x0013	Stereo super-high-quality audio mode. Default sample rate: 48000 Hz. Stereo channel. Maximum encoding bitrate: 128 kbps
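
The table above can be mirrored in a small lookup, for example for logging. The following is a minimal sketch: `AudioProfileSketch` is a hypothetical stand-in defined here so the snippet runs without the SDK, not the SDK's own declaration. Only the raw values and figures come from the table.

```swift
// Hypothetical stand-in for ARTCAICallAudioProfile; raw values follow the table above.
enum AudioProfileSketch: Int {
    case lowQualityMode = 0x0000
    case basicQualityMode = 0x0001
    case highQualityMode = 0x0010
    case stereoHighQualityMode = 0x0011
    case superHighQualityMode = 0x0012
    case stereoSuperHighQualityMode = 0x0013

    /// Default sample rate (Hz), channel count, and maximum encoding bitrate (kbps).
    var spec: (sampleRateHz: Int, channels: Int, maxBitrateKbps: Int) {
        switch self {
        case .lowQualityMode: return (8000, 1, 12)
        case .basicQualityMode: return (16000, 1, 24)
        case .highQualityMode: return (48000, 1, 64)
        case .stereoHighQualityMode: return (48000, 2, 80)
        case .superHighQualityMode: return (48000, 1, 96)
        case .stereoSuperHighQualityMode: return (48000, 2, 128)
        }
    }
}

// Look up a profile by its raw value and describe it.
if let profile = AudioProfileSketch(rawValue: 0x0011) {
    let s = profile.spec
    print("\(profile): \(s.sampleRateHz) Hz, \(s.channels) ch, up to \(s.maxBitrateKbps) kbps")
}
```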

ARTCAICallAudioScenario

Audio scenario configuration

Enumeration Value	Value	Description
DefaultMode	0x0000	Recommended for general real-time communication scenarios
MusicMode	0x0300	High-fidelity music mode. Recommended for music instruction or other scenarios requiring high-quality music reproduction

ARTCAICallAgentViewMode

Agent view rendering mode

Enumeration Value	Value	Description
Auto	0	Auto mode
Stretch	1	Stretch mode
Fill	2	Fill mode
Crop	3	Crop mode

ARTCAICallAgentViewMirrorMode

Agent view mirror mode

Enumeration Value	Value	Description
OnlyFrontCameraPreviewEnabled	0	Mirror only the front camera preview. Do not mirror other views.
AllEnabled	1	Enable mirroring for all views
AllDisabled	2	Disable mirroring for all views

ARTCAICallAgentViewRotationMode

Agent view rotation mode

Enumeration Value	Value	Description
Rotation_0	0	Video view rotation angle: 0 degrees
Rotation_90	1	Video view rotation angle: 90 degrees
Rotation_180	2	Video view rotation angle: 180 degrees
Rotation_270	3	Video view rotation angle: 270 degrees

ARTCAICallNetworkQuality

Network quality

Enumeration Value	Value	Description
Excellent	0	Excellent network quality. Video and audio are smooth and clear
Good	1	Good network quality. Smoothness and clarity are nearly identical to excellent
Poor	2	Poor network quality. Minor issues with smoothness and clarity. Communication remains unaffected
Bad	3	Bad network quality. Severe video stuttering. Audio remains usable for communication
VeryBad	4	Very poor network quality. Communication is nearly impossible
Disconnect	5	Network disconnected
Unknow	6	Unknown

ARTCAICallSpeakingInterruptedReason

Reason the agent’s speech was interrupted

Enumeration Value	Value	Description
unknown	0	Unknown reason
byWords	1	Specific words were detected
byVoice	2	Voice interruption
byInterruptSpeaking	3	The interruptSpeaking API was called
bySpeechBroadCast	4	Interrupted by a voice broadcast
byLlmQuery	5	Interrupted by an active LLM query

ARTCAICallVoiceprintResult

Voiceprint VAD result

Enumeration Value	Value	Description
Off	0	Voiceprint denoising VAD is disabled. AIVAD is also disabled
Unregister	1	Voiceprint denoising VAD is enabled but voiceprint registration is incomplete
DetectedSpeaker	2	Voiceprint denoising VAD is enabled and the main speaker is identified
UndetectedSpeaker	3	Voiceprint denoising VAD is enabled but the main speaker is not identified
DetectedSpeakerWithAIVad	4	AIVAD is enabled and the main speaker is identified
UndetectedSpeakerWithAIVad	5	AIVAD is enabled but the main speaker is not identified
Unknown	100	Unknown

ARTCAICallErrorCode

Error code

Enumeration Value	Value	Description
None	0	Success
InvalidAction	-1	Invalid action
InvalidParames	-2	Invalid parameter
NetworkError	-3	Network error
InternalError	-4	Internal error
BeginCallFailed	-10000	Failed to start the call
ConnectionFailed	-10001	Connection issue
PublishFailed	-10002	Failed to ingest the stream
SubscribeFailed	-10003	Failed to pull the stream
TokenExpired	-10004	Call authentication expired
KickedByUserReplace	-10005	Call failed due to same-name login
KickedBySystem	-10006	Call failed because the system kicked the user out
KickedByChannelTerminated	-10007	Call failed because the channel was destroyed
LocalDeviceException	-10008	Call failed due to local device issues
AgentLeaveChannel	-10101	The agent left the channel (call ended)
AgentPullFailed	-10102	Failed to pull the stream for the agent
AgentASRFailed	-10103	Agent ASR failed
AvatarServiceFailed	-10201	Failed to start the digital agent service
AvatarRoutesExhausted	-10202	Exceeded the maximum number of concurrent digital agent routes
AgentSubscriptionRequired	-10203	Call initiation exceeded the daily free trial quota
AgentNotFound	-10204	Agent not found (agent ID does not exist)
ChatTextMessageSendFailed	-10301	Failed to send the text message
ChatTextMessageReceiveFailed	-10302	Failed to receive the text message
ChatVoiceRecordFailed	-10310	Failed to record the voice message
ChatVoiceMessageSendFailed	-10311	Failed to send the voice message
ChatVoiceMessageReceiveFailed	-10312	Failed to receive the voice message
ChatPlayMessageReceiveFailed	-10321	Failed to receive the playback message
ChatLogNotFound	-10331	Chat log not found
ChatAttachmentUploading	-10332	The attachment is still uploading. Wait until upload completes before sending the message
UnknowError	-40000	Unknown error
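
When an error code arrives as a raw integer, it can be mapped back to a named case. The following is a minimal sketch: `CallErrorSketch` is a hypothetical stand-in that lists only a few codes from the table. Falling back to `unknowError` for any code the sketch does not list is an assumption of this example, not documented SDK behavior.

```swift
// Hypothetical stand-in listing a few ARTCAICallErrorCode values from the table above.
enum CallErrorSketch: Int {
    case success = 0                  // "None" in the table
    case networkError = -3
    case beginCallFailed = -10000
    case tokenExpired = -10004
    case agentNotFound = -10204
    case chatAttachmentUploading = -10332
    case unknowError = -40000

    /// Maps a raw code to a case; codes not listed in this sketch become unknowError.
    static func from(code: Int) -> CallErrorSketch {
        return CallErrorSketch(rawValue: code) ?? .unknowError
    }

    /// Short text taken from the Description column.
    var summary: String {
        switch self {
        case .success: return "Success"
        case .networkError: return "Network error"
        case .beginCallFailed: return "Failed to start the call"
        case .tokenExpired: return "Call authentication expired"
        case .agentNotFound: return "Agent not found (agent ID does not exist)"
        case .chatAttachmentUploading: return "The attachment is still uploading"
        case .unknowError: return "Unknown error"
        }
    }
}

print(CallErrorSketch.from(code: -10004).summary)
```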

ARTCAICallTurnDetectionMode

Method to detect when user speech ends

Enumeration Value	Value	Description
Normal	0	Normal mode. Does not use AI for semantic analysis. Uses ASR silence duration to detect speech end
Semantic		Semantic mode. Uses AI to analyze context and semantics to detect speech end

ARTCAICallConnectionStatus

Network connection status during a call

Enumeration Value	Value	Description
Init	0	Initialization complete
Disconnected	1	Network connection disconnected
Connecting	2	Establishing network connection
Connected	3	Network connected
Reconnecting	4	Reconnecting to the network
Failed	5	Network connection failed

Class

ARTCAICallAgentInfo

Agent runtime information

Property Name	Type	Description
agentId	String	Current agent ID
agentType	ARTCAICallAgentType	Agent type
channelId	String	RTC channel ID where the agent resides
uid	String	Unique identifier for the agent joining the RTC channel
instanceId	String	Instance ID for the current agent runtime
requestId	String	Request ID for starting the current agent
region	String?	Region where the agent resides

ARTCAICallAudioConfig

Audio configuration for a call.

Property Name	Type	Description
audioProfile	ARTCAICallAudioProfile	Audio encoding configuration. Default: HighQualityMode
audioScenario	ARTCAICallAudioScenario	Audio scenario configuration. Default: ARTCAICallAudioSceneMusicMode

ARTCAICallViewConfig

Agent view configuration for agents that require rendering, such as digital humans.

Property Name	Type	Description
view	UIView	Rendering view
viewMode	ARTCAICallAgentViewMode	Image rendering mode
viewMirrorMode	ARTCAICallAgentViewMirrorMode	Image mirror mode
viewRotationMode	ARTCAICallAgentViewRotationMode	Image rotation mode

ARTCAICallVisionConfig

Runtime configuration for vision understanding agents.

Property Name	Type	Description
preview	UIView?	Preview view. If empty, no preview is shown and only stream ingestion is performed
viewMode	ARTCAICallAgentViewMode	Preview image rendering mode
viewMirrorMode	ARTCAICallAgentViewMirrorMode	Preview image mirror mode
viewRotationMode	ARTCAICallAgentViewRotationMode	Preview image rotation mode
dimensions	CGSize	Stream ingestion resolution
frameRate	Int	Stream ingestion frame rate
bitrate	Int	Stream ingestion bitrate
keyFrameInterval	Int	Stream ingestion keyframe interval (milliseconds)
useHighQualityPreview	Bool	Use high-definition preview. Otherwise, the SDK adjusts automatically
cameraCaptureFrameRate	Int	Camera capture frame rate (default: 15 fps)

ARTCAICallVisionCustomCaptureRequest

Custom frame capture request for vision understanding agents

Property Name	Type	Description
text	String	Text parameter for multimodal large model requests
enableASR	Bool	Pass ASR results as input to the large model
isSingle	Bool	Single-frame capture
eachDuration	UInt	Frame capture interval (seconds)
num	UInt	Number of images per frame capture
duration	UInt	Duration of continuous frame capture (seconds). Applies only for continuous capture.
userData	String?	JSON string containing custom business information

ARTCAICallSendTextToAgentRequest

Request for sending text messages to an agent.

Property Name	Type	Description
text	String	Text message to ask the agent, for example: "What is this?"

ARTCAICallConfig

Configuration for starting an agent call.

Property Name	Type	Description
agentId	String	Agent ID
agentType	ARTCAICallAgentType	Agent type. Must match the agent ID’s type. Otherwise, agent startup fails
agentUserId	String?	Agent UID. If empty, the service assigns one
region	String	Region where the agent service resides. Must match the region of the agent ID. Otherwise, agent startup fails
userId	String	Current user ID
userJoinToken	String	Current user’s join token
userData	[String: Any]?	User-defined information passed to the agent
agentConfig	ARTCAICallAgentConfig?	agentConfig parameter used to start the call
audioConfig	ARTCAICallAudioConfig?	Local audio configuration
videoConfig	ARTCAICallVideoConfig?	Local video configuration. Applies only for VisionAgent or VideoAgent
chatSyncConfig	ARTCAICallChatSyncConfig?	Associated chat agent configuration
templateConfig	ARTCAICallTemplateConfig?	Deprecated. Use `agentConfig` instead
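
Some constraints in the table can be checked before a call is started. The following is a minimal sketch under stated assumptions: `AgentKind`, `CallConfigDraft`, and `preflightProblems` are hypothetical names defined here so the snippet runs without the SDK. Treating empty strings as problems is an assumption of this example, and the SDK may report these conditions differently. The string values are placeholders.

```swift
// Hypothetical stand-ins; raw values follow the ARTCAICallAgentType table.
enum AgentKind: Int {
    case voiceAgent = 0, avatarAgent = 1, visionAgent = 2, videoAgent = 3
}

struct CallConfigDraft {
    var agentId: String
    var agentType: AgentKind
    var region: String
    var userId: String
    var userJoinToken: String
    var hasVideoConfig: Bool = false
}

/// Lists problems implied by the table; an empty result means none were found.
func preflightProblems(_ config: CallConfigDraft) -> [String] {
    var problems: [String] = []
    let required = [("agentId", config.agentId), ("region", config.region),
                    ("userId", config.userId), ("userJoinToken", config.userJoinToken)]
    for (name, value) in required where value.isEmpty {
        problems.append("\(name) is empty")
    }
    // videoConfig applies only for VisionAgent or VideoAgent.
    let usesVideo = config.agentType == .visionAgent || config.agentType == .videoAgent
    if config.hasVideoConfig && !usesVideo {
        problems.append("videoConfig has no effect for this agent type")
    }
    return problems
}

let draft = CallConfigDraft(agentId: "agent-id", agentType: .voiceAgent, region: "region-id",
                            userId: "user-1", userJoinToken: "token", hasVideoConfig: true)
print(preflightProblems(draft))
```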

ARTCAICallTemplateConfig (deprecated)

TemplateConfig parameter for starting a call.

Important

This class is deprecated in versions 2.5 and later. Use ARTCAICallAgentConfig instead.

Property Name	Type	Description
agentGreeting	String?	Agent greeting. Empty uses the agent’s default value. Maximum length: 100 characters
userOnlineTimeout	Int32	Time for the agent to wait before ending the task if the user does not join. Negative values use the server default: 60 seconds
userOfflineTimeout	Int32	Time for the agent to wait before ending the task after the user leaves. Negative values use the server default: 5 seconds
workflowOverrideParams	[String: Any]?	Workflow override parameters
bailianAppParams	[String: Any]?	Alibaba Cloud Model Studio application center parameters
asrMaxSilence	Int32	Voice segmentation threshold. Range: 200–1200 ms. Negative values use the server default: 400 ms
volume	Int32	Agent speech volume. Range: 0–400. Output volume = workflow speech output volume × volume ÷ 100. Negative values use the server default: 100
vadLevel	Int32	VAD sensitivity setting. Default: 11. Valid range: [0, 11]. 0 disables VAD. For 1–10, higher numbers make interruptions harder. 11 differs significantly from the other values: it causes less pre-processing distortion and offers better noise resistance.
enableVoiceInterrupt	Bool	Enable intelligent interruption
agentVoiceId	String?	Agent voice ID. Empty uses the agent’s default value
enableIntelligentSegment	Bool	Enable intelligent sentence segmentation and merging
useVoiceprint	Bool	Whether to apply voiceprint recognition with denoising to the current utterance.
voiceprintId	String?	Voiceprint ID. Non-empty enables voiceprint denoising for this call
agentMaxIdleTime	Int32	Maximum idle time for the agent (seconds). Negative values use the server default: 600 seconds
llmHistoryLimit	Int32	Maximum history turns retained for LLM/multimodal LLM conversations. Negative values use the server default: 10
enablePushToTalk	Bool	Enable push-to-talk mode
agentGracefulShutdown	Bool	Enable graceful shutdown: finish speaking the current sentence before stopping
agentAvatarId	String?	Digital human model ID. Empty uses the agent’s default value
asrLanguageId	String?	ASR language ID. Empty uses the agent’s default value. Options: Mandarin Chinese; en: English; zh_en: Chinese-English mixed; es: Spanish; jp: Japanese
wakeUpQuery	String?	User command before call start. Used for immediate agent response after call starts
llmSystemPrompt	String?	LLM system prompt, for example: “You are a friendly and helpful assistant…” Note: Not supported for LLM nodes using Alibaba Cloud Model Studio workflows
asrHotWords	[String]?	ASR hotword list. Limit: up to 500 words. Each word: up to 10 characters
interruptWords	[String]?	Specific words or phrases that trigger interruption, for example: “Hold on” or “I know”
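
Several fields above treat a negative value as "use the server default", and `volume` scales the workflow's speech output volume. The following is a minimal sketch of both rules. `resolve` and `outputVolume` are hypothetical helpers, and clamping `volume` to 0–400 is an assumption of this example (the service may instead reject out-of-range values).

```swift
/// Returns the requested value, or the server default when the request is negative.
func resolve(_ requested: Int32, serverDefault: Int32) -> Int32 {
    return requested < 0 ? serverDefault : requested
}

/// Output volume = workflow speech output volume × volume ÷ 100.
func outputVolume(workflowVolume: Double, volume: Int32) -> Double {
    let effective = resolve(volume, serverDefault: 100)   // negative: server default 100
    let clamped = min(max(effective, 0), 400)             // assumed clamp to the 0–400 range
    return workflowVolume * Double(clamped) / 100.0
}

print(resolve(-1, serverDefault: 60))                  // 60 (userOnlineTimeout default)
print(outputVolume(workflowVolume: 80, volume: 150))   // 120.0
```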

ARTCAICallChatSyncConfig

Chat agent session configuration.

Property Name	Type	Description
sessionId	String	Associated chat agent session ID
agentId	String	Associated chat agent ID (must be in the same account and region)
receiverId	String	User ID for the associated chat agent session

ARTCAICallAgentShareConfig

Agent sharing configuration

Property Name	Type	Description
shareId	String?	Agent share ID
agentType	ARTCAICallAgentType	Agent workload type
expireTime	Date?	Time-to-live (TTL)
region	String?	Region where the agent resides
templateConfig	String?	Template configuration (JSON string)
userData	[String: Any]?	User-defined information passed to the agent

ARTCAICallVideoConfig

Configuration for the local video in a call

Property Name	Type	Description
dimensions	CGSize	Stream ingestion resolution
frameRate	Int	Stream ingestion frame rate
bitrate	Int	Stream ingestion bitrate
keyFrameInterval	Int	Stream ingestion keyframe interval (milliseconds)
useHighQualityPreview	Bool	Use high-definition preview. Otherwise, the SDK adjusts automatically based on stream ingestion resolution
cameraCaptureFrameRate	Int	Camera capture frame rate
useFrontCameraDefault	Bool	Start with the front camera by default

ARTCAICallAgentConfig

Configuration for starting and running the call agent.

Property Name	Type	Description
agentGreeting	String?	Agent greeting. Empty uses the agent’s default value
wakeUpQuery	String?	User command before call start. Used for immediate agent response after call starts
agentMaxIdleTime	Int32	Maximum idle time for the agent (seconds). The agent shuts down automatically after timeout. Default: 600 seconds
userOnlineTimeout	Int32	Time for the agent to wait before ending the task if the user does not join. Default: 60 seconds
userOfflineTimeout	Int32	Time for the agent to wait before ending the task after the user leaves. Default: 5 seconds
enablePushToTalk	Bool	Enable push-to-talk mode
agentGracefulShutdown	Bool	Enable graceful shutdown
volume	Int32	Agent speech volume. Range: 0–400. Default: 100
workflowOverrideParams	[String: Any]?	Workflow override parameters
enableIntelligentSegment	Bool	Smart sentence segmentation switch
asrConfig	ARTCAICallAgentAsrConfig	Speech recognition configuration
ttsConfig	ARTCAICallAgentTtsConfig	Speech synthesis configuration
llmConfig	ARTCAICallAgentLlmConfig	Large Language Model (LLM) configuration
avatarConfig	ARTCAICallAgentAvatarConfig	Digital human configuration
interruptConfig	ARTCAICallAgentInterruptConfig	Interrupt configuration
voiceprintConfig	ARTCAICallAgentVoiceprintConfig	Voiceprint denoising configuration
turnDetectionConfig	ARTCAICallAgentTurnDetectionConfig	Turn detection configuration
experimentalConfig	ARTCAICallExperimentalConfig	Customized, non-production configuration
vcrConfig	ARTCAICallAgentVcrConfig	VCR configuration
preConnectAudioUrl	String?	Sound effect to play after connection and before the greeting. Supports URL input. The greeting plays after the sound effect.
ambientConfig	ARTCAICallAgentAmbientConfig	Environment configuration
backChannelingConfig	ARTCAICallAgentBackChanneling	Back-channeling configuration. When configured, the system randomly plays short acknowledgments at specific trigger points.
autoSpeechForLlmPendingConfig	ARTCAICallAgentAutoSpeechLlmPending	Auto-speech configuration for cases where the LLM response is delayed.
autoSpeechForUserIdleConfig	ARTCAICallAgentAutoSpeechUserIdle	Configuration for agent questions when the user is silent.

ARTCAICallAgentAmbientConfig

Call environment parameters

Property Name	Type	Description
volume	Int32	Background sound volume. Default: 100
resourceId	String?	Resource ID of the background sound registered in the console. An empty string disables it.

ARTCAICallAgentAsrConfig

Speech recognition configuration

Property Name	Type	Description
asrLanguageId	String?	ASR language ID. Empty uses the agent's default value.
asrMaxSilence	Int32	Voice segmentation threshold. Silence exceeding this duration is considered a sentence break. Range: 200–1200 ms. Default: -1. Negative values use the agent's default configuration (console value); the server default is 400 ms.
asrHotWords	[String]?	ASR hotword list. Limit: up to 500 words. Each word: up to 10 characters.
vadLevel	Int32	VAD sensitivity setting. Default: 11. Valid range: [0, 11]. 0 disables VAD. For 1–10, higher numbers make interruptions harder. 11 differs significantly from the other values: it causes less pre-processing distortion and offers better noise resistance.
customParams	String?	Runtime parameters for custom ASR. Use URL parameter format, for example: "mode=fast&sample=16000&format=wav"
vadDuration	Int32	Minimum duration threshold for voice activity detection, used to adjust interruption sensitivity. Default: 0 (disabled). Valid range: 200–2000 ms. Common range: [200, 500] ms, corresponding to 1 to 4 words. Negative values are not sent to the server (server default is disabled).
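
The `customParams` format and the documented ranges can be handled with two small helpers. The following is a minimal sketch: `makeCustomParams` and `asrRangeProblems` are hypothetical names, not SDK functions. No percent-encoding is applied, so keys and values are assumed to be URL-safe already.

```swift
/// Joins key/value pairs in the "k=v&k=v" form shown for customParams.
func makeCustomParams(_ pairs: [(String, String)]) -> String {
    return pairs.map { "\($0.0)=\($0.1)" }.joined(separator: "&")
}

/// Returns a description of each value that falls outside the documented ranges.
func asrRangeProblems(asrMaxSilence: Int32, vadLevel: Int32, hotWords: [String]) -> [String] {
    var problems: [String] = []
    // Negative asrMaxSilence means "use the agent's default", so only check non-negative values.
    if asrMaxSilence >= 0 && !(200...1200).contains(asrMaxSilence) {
        problems.append("asrMaxSilence must be within 200–1200 ms")
    }
    if !(0...11).contains(vadLevel) {
        problems.append("vadLevel must be within [0, 11]")
    }
    if hotWords.count > 500 {
        problems.append("asrHotWords allows up to 500 words")
    }
    if hotWords.contains(where: { $0.count > 10 }) {
        problems.append("each hotword allows up to 10 characters")
    }
    return problems
}

print(makeCustomParams([("mode", "fast"), ("sample", "16000"), ("format", "wav")]))
// mode=fast&sample=16000&format=wav
```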

ARTCAICallAgentTtsConfig

Speech synthesis configuration

Property Name	Type	Description
agentVoiceId	String?	Agent voice ID. Empty uses the agent's default value.
pronunciationRules	[[String: Any]]?	Array of pronunciation rules. Up to 20 rules are supported. If nil or empty, no rules are used. Each rule has a Word (the target word), a Pronunciation (the replacement pronunciation), and a Type (replacement denotes a polyphone rule). Example: `[ { "Word": "overlap", "Pronunciation": "chongdie", "Type": "replacement" }, { "Word": "action", "Pronunciation": "hangdong", "Type": "replacement" } ]`
speechRate	Double	TTS playback speed. Supports all TTS types. Range: [0.5, 2.0]. Default: -1. Negative values are not sent to the server, so the agent's default configuration (console value) applies.
languageId	String?	TTS playback language code. Valid when TTS type is MiniMax.
emotion	String?	TTS playback emotion type. Valid when TTS type is MiniMax.
modelId	String?	TTS model ID. Currently only supports MiniMax. Options: speech-01-turbo, speech-02-turbo.
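
Pronunciation rules are plain dictionaries, so they can be assembled with a helper. The following is a minimal sketch: `pronunciationRule`, `limitedRules`, and `speechRateToSend` are hypothetical helpers. Dropping rules beyond the 20th and clamping the speech rate to [0.5, 2.0] are assumptions of this example, not documented SDK behavior.

```swift
/// Builds one rule dictionary using the keys shown in the table's example.
func pronunciationRule(word: String, pronunciation: String) -> [String: Any] {
    return ["Word": word, "Pronunciation": pronunciation, "Type": "replacement"]
}

/// Returns nil for an empty list (no rules), otherwise at most the first 20 rules.
func limitedRules(_ rules: [[String: Any]]) -> [[String: Any]]? {
    if rules.isEmpty { return nil }
    return Array(rules.prefix(20))
}

/// Returns nil when the rate is negative (the console value applies), else a rate within range.
func speechRateToSend(_ rate: Double) -> Double? {
    if rate < 0 { return nil }
    return min(max(rate, 0.5), 2.0)
}

let rules = [
    pronunciationRule(word: "overlap", pronunciation: "chongdie"),
    pronunciationRule(word: "action", pronunciation: "hangdong"),
]
print(limitedRules(rules)?.count ?? 0)
```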

ARTCAICallAgentLlmConfig

Large Language Model configuration

Property Name	Type	Description
llmHistoryLimit	Int32	Maximum history turns retained for LLM/multimodal LLM conversations. Default: -1. Negative values use the agent's default configuration (console value).
llmSystemPrompt	String?	LLM system prompt.
bailianAppParams	[String: Any]?	Parameters for the Model Studio Application Center.
llmCompleteReply	Bool	Whether to send the complete LLM result. Note: When enabled, the complete LLM result is returned through the onLLMReplyCompleted event callback after generation finishes.
openAIExtraQuery	String?	Additional query parameters for OpenAI protocol LLMs. Note: Parameters must be in key=value format, with multiple parameters joined by '&'. All values must be strings.
outputMinLength	Int32	Minimum text output length (characters). Text shorter than this is cached for concatenation. Range: [0, 100]. A value of 0 or less means no limit. Default: no limit.
outputMaxDelay	Int32	Maximum text output delay (milliseconds). Cached text is forcibly output after this time. Range: [1000, 10000]. A value of 0 or less means no limit. Default: no limit.
historySyncWithTTS	Bool	Whether to sync the LLM message history with the TTS playback content. Default: false. When enabled, the saved LLM messages match what TTS actually played, with minor discrepancies allowed. Note: When a user interrupts the agent, the `<ims_agent_interrupted>` tag is inserted at the interruption point in the next message sent to the LLM. For example: `[ {"role": "user", "content": "Tell me a story."}, {"role": "assistant", "content": "Okay, I'll tell you a story from the Romance of the Three Kingdoms. Do you<ims_agent_interrupted> want to hear it?"}, {"role": "user", "content": "Tell me a different one."} ]`
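
A service that receives the message history can locate the interruption point by searching for the tag. The following is a minimal sketch: `splitAtInterruption` is a hypothetical helper. Reading the text before the tag as "played" and the text after it as "not played" is this example's interpretation of the note above.

```swift
import Foundation

let interruptedTag = "<ims_agent_interrupted>"

/// Splits assistant content at the tag. Returns nil when the tag is absent.
func splitAtInterruption(_ content: String) -> (spoken: String, unspoken: String)? {
    guard let range = content.range(of: interruptedTag) else { return nil }
    let spoken = String(content[..<range.lowerBound])     // text before the tag
    let unspoken = String(content[range.upperBound...])   // text after the tag
    return (spoken, unspoken)
}

if let parts = splitAtInterruption("Do you<ims_agent_interrupted> want to hear it?") {
    print("before tag: \(parts.spoken)")
    print("after tag: \(parts.unspoken)")
}
```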

ARTCAICallAgentAvatarConfig

Digital human configuration

Property Name	Type	Description
agentAvatarId	String?	Digital human model ID. Empty uses the agent's default value.

ARTCAICallAgentInterruptConfig

Interrupt configuration

Property Name	Type	Description
enableVoiceInterrupt	Bool	Enable intelligent interruption
interruptWords	String?	Specific words or phrases that trigger interruption
noInterruptMode	String?	Controls how ASR text from user speech is handled while the agent is speaking and intelligent interruption is disabled. Valid values: cache: caches the ASR text and processes it in the next turn, after the current turn ends; discard: discards the ASR text immediately. Other values (including empty) use the server default configuration.

ARTCAICallAgentVoiceprintConfig

Voiceprint denoising configuration

Property Name	Type	Description
useVoiceprint	Bool	Whether the current sentence segmentation uses voiceprint denoising detection
voiceprintId	String?	Voiceprint ID. Non-empty enables voiceprint denoising for this call.

ARTCAICallAgentTurnDetectionConfig

Turn detection configuration

Property Name	Type	Description
turnEndWords	[String]?	Specific words to end a turn, for example: "Done" or "I'm finished"
mode	ARTCAICallTurnDetectionMode	Method to detect when user speech ends. Default: Semantic, which uses AI for semantic analysis.
semanticWaitDuration	Int32	Custom wait time for semantic segmentation (milliseconds). Range: [0, 10000]. Negative values are not sent to the server, which then uses its default of -1 (the AI determines the appropriate wait time automatically). Note: semanticWaitDuration has no effect in ARTCAICallTurnDetectionMode.Normal mode.
eagerness	[String]?	Takes effect only when `mode` is Semantic, and has a higher priority than `semanticWaitDuration`. Controls how quickly the AI responds after detecting a user pause. Valid values: Low: waits patiently, up to 6 seconds, to reduce the risk of false interruptions; Medium: balanced mode, up to 4 seconds, suitable for most scenarios; High: responds quickly, up to 2 seconds, for faster interaction but with a higher risk of cutting off the user. Other values (including empty) use the server default configuration.
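
The precedence between `mode`, `eagerness`, and `semanticWaitDuration` can be written out as a decision function. The following is a minimal sketch: `TurnWait`, `maxWaitMs`, and `effectiveWait` are hypothetical names, and `eagerness` is modeled as a single optional string, which is an assumption of this example. The actual decision is made by the service.

```swift
/// Which wait setting would be honored for a turn.
enum TurnWait: Equatable {
    case eagernessCap(Int32)   // upper bound in milliseconds from the eagerness level
    case fixed(Int32)          // semanticWaitDuration in milliseconds
    case serverDefault         // nothing sent; the service decides
    case notApplicable         // Normal mode: the semantic settings have no effect
}

/// Upper bound on the wait (ms) for each eagerness level; nil for other values.
func maxWaitMs(eagerness: String?) -> Int32? {
    switch eagerness ?? "" {
    case "Low": return 6000
    case "Medium": return 4000
    case "High": return 2000
    default: return nil
    }
}

/// Applies the documented priority: eagerness first, then semanticWaitDuration.
func effectiveWait(isSemanticMode: Bool, eagerness: String?, semanticWaitDuration: Int32) -> TurnWait {
    guard isSemanticMode else { return .notApplicable }
    if let cap = maxWaitMs(eagerness: eagerness) { return .eagernessCap(cap) }
    if semanticWaitDuration >= 0 { return .fixed(semanticWaitDuration) }
    return .serverDefault
}

print(effectiveWait(isSemanticMode: true, eagerness: "High", semanticWaitDuration: 3000))
```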

ARTCAICallAgentVcrResult

VCR detection result

Property Name	Type	Description
resultData	[String]?	All VCR detection results returned by the agent
stillFrameMotionResult	FrameMotionResult?	VCR still frame detection result
invalidFrameMotionResult	FrameMotionResult?	VCR invalid frame detection result
peopleCountResult	PeopleCountResult?	VCR real-time people count detection result
equipmentResult	EquipmentResult?	VCR electronic device detection result
headMotionResult	HeadMotionResult?	VCR head motion detection result
lookAwayResult	LookAwayResult?	VCR gaze aversion detection result

LookAwayResult

VCR gaze aversion detection result

Property Name	Type	Description
count	Int32	Total number of gaze aversions up to the current frame
duration	Int32	Total duration of gaze aversions up to the current frame (milliseconds)

ARTCAICallAgentVcrConfig

VCR configuration

Property Name	Type	Description
data	[String]?	Caches the JSON object passed by the user. This object is used later to generate a JSON string, allowing for custom extensions.
stillFrameMotion	ARTCAICallAgentVcrFrameMotionConfig?	VCR still frame detection configuration
invalidFrameMotion	ARTCAICallAgentVcrFrameMotionConfig?	VCR invalid frame detection configuration
peopleCount	ARTCAICallAgentVcrBaseConfig?	VCR real-time people count detection configuration
equipment	ARTCAICallAgentVcrBaseConfig?	VCR electronic device detection configuration
headMotion	ARTCAICallAgentVcrBaseConfig?	VCR head motion detection configuration
lookAway	ARTCAICallAgentVcrBaseConfig?	VCR gaze aversion detection configuration

ARTCAICallAgentVcrBaseConfig

Base VCR detection configuration

Property Name	Type	Description
enable	Bool	Whether to enable this feature. Enabled by default.

ARTCAICallAgentVcrFrameMotionConfig

VCR video frame detection configuration

Property Name	Type	Description
callbackDelay	Int32	Callback trigger delay in milliseconds. Default: 3000 ms

ARTCAICallExperimentalConfig

Experimental parameters that control specific logic policies

Property Name	Type	Description
rtcSdkParams	[String: Any]?	RTC SDK parameters
commonParams	[String: Any]?	Common parameters

ARTCAICallAgentAutoSpeechContent

Agent speech content for auto-speech scenarios (including acknowledgments, proactive questions, etc.)

Property Name	Type	Description
probability	Double	Trigger probability. Range: 0.0–1.0
text	String	Prompt text, UTF-8 encoded. Example: "Are you still there?". Maximum length: 20 characters for acknowledgments, 100 characters for auto-replies.

ARTCAICallAgentAutoSpeechLlmPending

Auto-speech configuration for cases where the LLM response is delayed

Property Name	Type	Description
waitTime	Int32	Wait time threshold in milliseconds. A prompt is triggered after this duration. Range: 500–10000 ms. Cannot be empty.
messages	[ARTCAICallAgentAutoSpeechContent]	Collection of waiting prompts. Maximum 10 items. Each item ≤ 100 characters. Total probability must be 1.0.

ARTCAICallAgentAutoSpeechUserIdle

Configuration for agent questions when the user is silent

Property Name	Type	Description
waitTime	Int32	Silence duration threshold in milliseconds. A question is triggered after this duration. Range: 5000–600000 ms. Recommended: 10000.
maxRepeats	Int32	Maximum number of questions. Range: 0–10. Recommended: 5. After this limit is exceeded, no more questions are triggered and the call ends.
messages	[ARTCAICallAgentAutoSpeechContent]	Collection of waiting prompts. Maximum 10 items. Each item ≤ 100 characters. Total probability must be 1.0.
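
The limits on `messages` can be checked locally before the configuration is sent. The following is a minimal sketch: `AutoSpeechItem` and `autoSpeechProblems` are hypothetical stand-ins defined for this example. Swift's `count` counts user-perceived characters, which may differ from how the service counts them.

```swift
// Hypothetical stand-in for ARTCAICallAgentAutoSpeechContent.
struct AutoSpeechItem {
    let probability: Double
    let text: String
}

/// Checks the documented limits: at most 10 items, a text length cap,
/// each probability within 0.0–1.0, and probabilities that sum to 1.0.
func autoSpeechProblems(_ items: [AutoSpeechItem], maxTextLength: Int) -> [String] {
    var problems: [String] = []
    if items.count > 10 {
        problems.append("more than 10 items")
    }
    if items.contains(where: { $0.text.count > maxTextLength }) {
        problems.append("text longer than \(maxTextLength) characters")
    }
    if items.contains(where: { !(0.0...1.0).contains($0.probability) }) {
        problems.append("probability outside 0.0–1.0")
    }
    let total = items.reduce(0.0) { $0 + $1.probability }
    if abs(total - 1.0) > 1e-6 {   // small tolerance for floating-point rounding
        problems.append("probabilities sum to \(total), expected 1.0")
    }
    return problems
}

let idlePrompts = [
    AutoSpeechItem(probability: 0.5, text: "Are you still there?"),
    AutoSpeechItem(probability: 0.5, text: "Can you hear me?"),
]
print(autoSpeechProblems(idlePrompts, maxTextLength: 100))
```

Pass 100 as `maxTextLength` for auto-replies and 20 for acknowledgments, following the limits stated for `text`.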

ARTCAICallAgentBackChanneling

Back-channeling configuration module

Property Name	Type	Description
enable	Bool	Whether back-channeling is enabled
triggerStage	String	Back-channeling trigger timing
probability	Double	Trigger probability. Range: 0.0–1.0
words	[ARTCAICallAgentAutoSpeechContent]	Collection of acknowledgment phrases. Maximum 10 items. Each item ≤ 20 characters. Total probability must be 1.0.
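
The requirement that the probabilities total 1.0 matches a weighted choice among the phrases. The following is a minimal sketch of such a choice, only to illustrate the idea: the actual selection happens on the service side and may work differently. `BackChannelWord` and `pickWord` are hypothetical names, and the phrases are placeholders.

```swift
// Hypothetical stand-in for one acknowledgment phrase and its probability.
struct BackChannelWord {
    let probability: Double
    let text: String
}

/// Picks a phrase by cumulative probability. `roll` is a number in 0..<1;
/// passing it in keeps the function deterministic.
func pickWord(_ words: [BackChannelWord], roll: Double) -> String? {
    var cumulative = 0.0
    for word in words {
        cumulative += word.probability
        if roll < cumulative { return word.text }
    }
    // Reached only when rounding leaves the total slightly below roll, or the list is empty.
    return words.last?.text
}

let phrases = [
    BackChannelWord(probability: 0.7, text: "Mm-hm"),
    BackChannelWord(probability: 0.3, text: "I see"),
]
print(pickWord(phrases, roll: Double.random(in: 0..<1)) ?? "")
```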