Intelligent Media Services: Data Structures

Last Updated: Mar 20, 2026

Read this topic to learn about the data types used in the iOS SDK.

Data Structure Overview

Note

Deprecated parameters and methods exist in older SDK versions. Upgrade to the latest SDK version. For more information, see iOS User Guide.

Structure Type

Data Type

Description

Enum

ARTCAICallAgentType

Agent type

ARTCAICallAgentState

Agent state

ARTCAICallAudioProfile

Audio encoding configuration

ARTCAICallAudioScenario

Audio scenario configuration

ARTCAICallAgentViewMode

Agent view rendering mode

ARTCAICallAgentViewMirrorMode

Agent view mirror mode

ARTCAICallAgentViewRotationMode

Agent view rotation mode

ARTCAICallNetworkQuality

Network quality

ARTCAICallSpeakingInterruptedReason

Reason the agent’s speech was interrupted

ARTCAICallVoiceprintResult

Voiceprint VAD detection result

ARTCAICallErrorCode

Error code

ARTCAICallConnectionStatus

Network connection status during a call

ARTCAICallTurnDetectionMode

Method to detect when user speech ends

Class

ARTCAICallAgentInfo

Agent runtime information

ARTCAICallAudioConfig

Call audio configuration

ARTCAICallViewConfig

Agent view configuration. Use this class to configure rendering for agents that require it, such as digital humans.

ARTCAICallVisionConfig

Runtime configuration for vision understanding agents

ARTCAICallVisionCustomCaptureRequest

Request model to enable custom frame capture for vision understanding agents

ARTCAICallSendTextToAgentRequest

Request model for sending a text message to an agent

ARTCAICallConfig

Configuration to start an agent call

ARTCAICallTemplateConfig (deprecated)

TemplateConfig parameter used to start a call

ARTCAICallChatSyncConfig

Chat agent session configuration parameters

ARTCAICallAgentShareConfig

Agent sharing configuration information

ARTCAICallVideoConfig

Local video configuration for calls

ARTCAICallAgentConfig

Agent startup and runtime configuration for calls

ARTCAICallAgentAsrConfig

Speech recognition configuration

ARTCAICallAgentTtsConfig

Speech synthesis configuration

ARTCAICallAgentLlmConfig

Large Language Model (LLM) configuration

ARTCAICallAgentAvatarConfig

Digital human configuration

ARTCAICallAgentInterruptConfig

Interrupt configuration

ARTCAICallAgentVoiceprintConfig

Voiceprint denoising configuration

ARTCAICallAgentTurnDetectionConfig

Turn detection configuration

ARTCAICallAgentVcrResult

VCR detection result

ARTCAICallAgentVcrConfig

VCR configuration

ARTCAICallAgentVcrBaseConfig

Base VCR detection configuration

ARTCAICallAgentVcrFrameMotionConfig

VCR video frame detection configuration

ARTCAICallExperimentalConfig

Experimental parameters used to control specific logic policies

ARTCAICallAgentAmbientConfig

Call environment parameters

ARTCAICallAgentAutoSpeechContent

Agent speech content for auto-speech scenarios, such as acknowledgments and proactive questions

ARTCAICallAgentAutoSpeechLlmPending

Auto-speech configuration for cases where the LLM response is delayed

ARTCAICallAgentAutoSpeechUserIdle

Configuration for agent questions when the user is silent

ARTCAICallAgentBackChanneling

Configuration module for back-channeling. When enabled, the agent randomly plays short acknowledgments at specific trigger points.

Data Structure Details

Enum

ARTCAICallAgentType

Agent type

Enumeration Value

Value

Description

VoiceAgent

0

Voice-only interaction with no visual representation

AvatarAgent

1

Visual representation with support for voice and visual interaction

VisionAgent

2

Focuses on visual information understanding and analysis

VideoAgent

3

Bidirectional video call between the user and the agent

ARTCAICallAgentState

Agent state

Enumeration Value

Value

Description

Listening

1

Listening

Thinking

2

Thinking

Speaking

3

Speaking

ARTCAICallAudioProfile

Audio encoding configuration

Enumeration Value

Value

Description

LowQualityMode

0x0000

Low-quality audio mode. Default sample rate: 8000 Hz. Mono channel. Maximum encoding bitrate: 12 kbps

BasicQualityMode

0x0001

Standard-quality audio mode. Default sample rate: 16000 Hz. Mono channel. Maximum encoding bitrate: 24 kbps

HighQualityMode

0x0010

(Default) High-quality audio mode. Default sample rate: 48000 Hz. Mono channel. Maximum encoding bitrate: 64 kbps

StereoHighQualityMode

0x0011

Stereo high-quality audio mode. Default sample rate: 48000 Hz. Stereo channel. Maximum encoding bitrate: 80 kbps

SuperHighQualityMode

0x0012

Super-high-quality audio mode. Default sample rate: 48000 Hz. Mono channel. Maximum encoding bitrate: 96 kbps

StereoSuperHighQualityMode

0x0013

Stereo super-high-quality audio mode. Default sample rate: 48000 Hz. Stereo channel. Maximum encoding bitrate: 128 kbps
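The figures in this table can be mirrored in client code when you need to reason about bandwidth. The following is a minimal Swift sketch: `AudioProfileSketch`, `AudioProfileSpec`, and `bestMonoProfile` are hypothetical local names that copy the raw values and limits from the table, not the SDK's own declarations.

```swift
// Hypothetical local mirror of the ARTCAICallAudioProfile table; not the SDK type.
enum AudioProfileSketch: Int, CaseIterable {
    case lowQualityMode = 0x0000
    case basicQualityMode = 0x0001
    case highQualityMode = 0x0010
    case stereoHighQualityMode = 0x0011
    case superHighQualityMode = 0x0012
    case stereoSuperHighQualityMode = 0x0013
}

// Sample rate, channel count, and maximum encoding bitrate as listed in the table.
struct AudioProfileSpec: Equatable {
    let sampleRateHz: Int
    let channels: Int
    let maxBitrateKbps: Int
}

func audioProfileSpec(for profile: AudioProfileSketch) -> AudioProfileSpec {
    switch profile {
    case .lowQualityMode:
        return AudioProfileSpec(sampleRateHz: 8000, channels: 1, maxBitrateKbps: 12)
    case .basicQualityMode:
        return AudioProfileSpec(sampleRateHz: 16000, channels: 1, maxBitrateKbps: 24)
    case .highQualityMode:
        return AudioProfileSpec(sampleRateHz: 48000, channels: 1, maxBitrateKbps: 64)
    case .stereoHighQualityMode:
        return AudioProfileSpec(sampleRateHz: 48000, channels: 2, maxBitrateKbps: 80)
    case .superHighQualityMode:
        return AudioProfileSpec(sampleRateHz: 48000, channels: 1, maxBitrateKbps: 96)
    case .stereoSuperHighQualityMode:
        return AudioProfileSpec(sampleRateHz: 48000, channels: 2, maxBitrateKbps: 128)
    }
}

// Picks the highest-bitrate mono profile whose maximum bitrate fits the budget.
func bestMonoProfile(maxKbps: Int) -> AudioProfileSketch? {
    return AudioProfileSketch.allCases
        .filter { audioProfileSpec(for: $0).channels == 1 }
        .filter { audioProfileSpec(for: $0).maxBitrateKbps <= maxKbps }
        .max { audioProfileSpec(for: $0).maxBitrateKbps < audioProfileSpec(for: $1).maxBitrateKbps }
}
```

With a 70 kbps budget, `bestMonoProfile(maxKbps: 70)` selects the high-quality mono profile, because its 64 kbps ceiling is the largest that fits.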

ARTCAICallAudioScenario

Audio scenario configuration

Enumeration Value

Value

Description

DefaultMode

0x0000

Recommended for general real-time communication scenarios

MusicMode

0x0300

High-fidelity music mode. Recommended for music instruction or other scenarios requiring high-quality music reproduction

ARTCAICallAgentViewMode

Agent view rendering mode

Enumeration Value

Value

Description

Auto

0

Auto mode

Stretch

1

Stretch mode

Fill

2

Fill mode

Crop

3

Crop mode

ARTCAICallAgentViewMirrorMode

Agent view mirror mode

Enumeration Value

Value

Description

OnlyFrontCameraPreviewEnabled

0

Mirror only the front camera preview. Do not mirror other views.

AllEnabled

1

Enable mirroring for all views

AllDisabled

2

Disable mirroring for all views

ARTCAICallAgentViewRotationMode

Agent view rotation mode

Enumeration Value

Value

Description

Rotation_0

0

Video view rotation angle: 0 degrees

Rotation_90

1

Video view rotation angle: 90 degrees

Rotation_180

2

Video view rotation angle: 180 degrees

Rotation_270

3

Video view rotation angle: 270 degrees
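Because the four values map to multiples of 90 degrees, converting between an angle and a rotation value is simple arithmetic. The sketch below assumes a hypothetical local enum `ViewRotationSketch` that mirrors the raw values in the table. It is not the SDK type.

```swift
// Hypothetical local mirror of ARTCAICallAgentViewRotationMode; not the SDK type.
enum ViewRotationSketch: Int {
    case rotation0 = 0
    case rotation90 = 1
    case rotation180 = 2
    case rotation270 = 3

    // Each raw value step corresponds to 90 degrees.
    var degrees: Int { rawValue * 90 }

    // Returns the supported rotation closest to an arbitrary angle in degrees.
    static func nearest(toDegrees angle: Int) -> ViewRotationSketch {
        let normalized = ((angle % 360) + 360) % 360   // bring into 0..<360
        let step = ((normalized + 45) / 90) % 4        // round to the nearest quarter turn
        return ViewRotationSketch(rawValue: step)!     // step is always in 0...3
    }
}
```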

ARTCAICallNetworkQuality

Network quality

Enumeration Value

Value

Description

Excellent

0

Excellent network quality. Video and audio are smooth and clear

Good

1

Good network quality. Smoothness and clarity are nearly identical to excellent

Poor

2

Poor network quality. Minor issues with smoothness and clarity. Communication remains unaffected

Bad

3

Bad network quality. Severe video stuttering. Audio remains usable for communication

VeryBad

4

Very poor network quality. Communication is nearly impossible

Disconnect

5

Network disconnected

Unknow

6

Unknown

ARTCAICallSpeakingInterruptedReason

Reason the agent’s speech was interrupted

Enumeration Value

Value

Description

unknown

0

Unknown reason

byWords

1

Specific words were detected

byVoice

2

Voice interruption

byInterruptSpeaking

3

The interruptSpeaking API was called

bySpeechBroadCast

4

The voice broadcast was interrupted.

byLlmQuery

5

An active LLM query was interrupted.

ARTCAICallVoiceprintResult

Voiceprint VAD detection result

Enumeration Value

Value

Description

Off

0

Voiceprint denoising VAD is disabled. AIVAD is also disabled

Unregister

1

Voiceprint denoising VAD is enabled but voiceprint registration is incomplete

DetectedSpeaker

2

Voiceprint denoising VAD is enabled and the main speaker is identified

UndetectedSpeaker

3

Voiceprint denoising VAD is enabled but the main speaker is not identified

DetectedSpeakerWithAIVad

4

AIVAD is enabled and the main speaker is identified

UndetectedSpeakerWithAIVad

5

AIVAD is enabled but the main speaker is not identified

Unknown

100

Unknown
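Application code often only needs to know whether the main speaker was identified and whether AIVAD was involved. The sketch below shows one way to derive both from the raw value. `VoiceprintResultSketch` is a hypothetical local enum that copies the values in the table, and falling back to `unknown` for unlisted values is an assumption of this example.

```swift
// Hypothetical local mirror of ARTCAICallVoiceprintResult; not the SDK type.
enum VoiceprintResultSketch: Int {
    case off = 0
    case unregister = 1
    case detectedSpeaker = 2
    case undetectedSpeaker = 3
    case detectedSpeakerWithAIVad = 4
    case undetectedSpeakerWithAIVad = 5
    case unknown = 100

    // Assumption: raw values not listed in the table are treated as unknown.
    init(raw: Int) {
        self = VoiceprintResultSketch(rawValue: raw) ?? .unknown
    }

    // True for the two states in which the main speaker is identified.
    var mainSpeakerDetected: Bool {
        return self == .detectedSpeaker || self == .detectedSpeakerWithAIVad
    }

    // True for the two states reported while AIVAD is enabled.
    var usesAIVad: Bool {
        return self == .detectedSpeakerWithAIVad || self == .undetectedSpeakerWithAIVad
    }
}
```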

ARTCAICallErrorCode

Error code

Enumeration Value

Value

Description

None

0

Success

InvalidAction

-1

Invalid action

InvalidParames

-2

Invalid parameter

NetworkError

-3

Network error

InternalError

-4

Internal error

BeginCallFailed

-10000

Failed to start the call

ConnectionFailed

-10001

Connection issue

PublishFailed

-10002

Failed to ingest the stream

SubscribeFailed

-10003

Failed to pull the stream

TokenExpired

-10004

Call authentication expired

KickedByUserReplace

-10005

Call failed because another login with the same user ID replaced this one

KickedBySystem

-10006

Call failed because the system kicked the user out

KickedByChannelTerminated

-10007

Call failed because the channel was destroyed

LocalDeviceException

-10008

Call failed due to local device issues

AgentLeaveChannel

-10101

The agent left the channel (call ended)

AgentPullFailed

-10102

Failed to pull the stream for the agent

AgentASRFailed

-10103

Agent ASR failed

AvatarServiceFailed

-10201

Failed to start the digital human service

AvatarRoutesExhausted

-10202

Exceeded the maximum number of concurrent digital human routes

AgentSubscriptionRequired

-10203

Call initiation exceeded the daily free trial quota

AgentNotFound

-10204

Agent not found (agent ID does not exist)

ChatTextMessageSendFailed

-10301

Failed to send the text message

ChatTextMessageReceiveFailed

-10302

Failed to receive the text message

ChatVoiceRecordFailed

-10310

Failed to record the voice message

ChatVoiceMessageSendFailed

-10311

Failed to send the voice message

ChatVoiceMessageReceiveFailed

-10312

Failed to receive the voice message

ChatPlayMessageReceiveFailed

-10321

Failed to receive the playback message

ChatLogNotFound

-10331

Chat log not found

ChatAttachmentUploading

-10332

The attachment is still uploading. Wait until upload completes before sending the message

UnknowError

-40000

Unknown error
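The numeric values fall into visible bands, which is convenient for logging or choosing a user-facing message. The sketch below groups codes by band. The category names and the grouping itself are read off the numbering in the table and are assumptions of this example, not something the SDK defines.

```swift
// Hypothetical categories inferred from the numeric bands in the table.
enum ErrorCategorySketch: String {
    case success
    case general        // -1 ... -4
    case call           // -10000 ... -10008
    case agent          // -10101 ... -10103
    case agentService   // -10201 ... -10204
    case chat           // -10301 ... -10332
    case unknown        // -40000 and anything unlisted
}

func errorCategory(forCode code: Int) -> ErrorCategorySketch {
    switch code {
    case 0:
        return .success
    case -4 ... -1:
        return .general
    case -10008 ... -10000:
        return .call
    case -10103 ... -10101:
        return .agent
    case -10204 ... -10201:
        return .agentService
    case -10332 ... -10301:
        return .chat
    default:
        return .unknown
    }
}
```

For instance, `TokenExpired` (-10004) lands in the call band, so an application might respond by fetching a new join token before retrying.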

ARTCAICallTurnDetectionMode

Method to detect when user speech ends

Enumeration Value

Value

Description

Normal

0

Normal mode. Does not use AI for semantic analysis. Uses ASR silence duration to detect speech end

Semantic

Semantic mode. Uses AI to analyze context and semantics to detect speech end

ARTCAICallConnectionStatus

Network connection status during a call

Enumeration Value

Value

Description

Init

0

Initialization complete

Disconnected

1

Network connection disconnected

Connecting

2

Establishing network connection

Connected

3

Network connected

Reconnecting

4

Reconnecting to the network

Failed

5

Network connection failed

Class

ARTCAICallAgentInfo

Agent runtime information

Property Name

Type

Description

agentId

String

Current agent ID

agentType

ARTCAICallAgentType

Agent type

channelId

String

RTC channel ID where the agent resides

uid

String

Unique identifier for the agent joining the RTC channel

instanceId

String

Instance ID for the current agent runtime

requestId

String

Request ID for starting the current agent

region

String?

Region where the agent resides

ARTCAICallAudioConfig

Specifies the audio configuration for a call.

Property Name

Type

Description

audioProfile

ARTCAICallAudioProfile

Audio encoding configuration. Default: HighQualityMode

audioScenario

ARTCAICallAudioScenario

Audio scenario configuration. Default: ARTCAICallAudioSceneMusicMode

ARTCAICallViewConfig

This class provides agent view configuration, allowing you to configure rendering for agents that require it, such as digital humans.

Property Name

Type

Description

view

UIView

Rendering view

viewMode

ARTCAICallAgentViewMode

Image rendering mode

viewMirrorMode

ARTCAICallAgentViewMirrorMode

Image mirror mode

viewRotationMode

ARTCAICallAgentViewRotationMode

Image rotation mode

ARTCAICallVisionConfig

Specifies the runtime configuration for visual understanding agents.

Property Name

Type

Description

preview

UIView?

Preview. Empty means no preview—only stream ingestion

viewMode

ARTCAICallAgentViewMode

Preview image rendering mode

viewMirrorMode

ARTCAICallAgentViewMirrorMode

Preview image mirror mode

viewRotationMode

ARTCAICallAgentViewRotationMode

Preview image rotation mode

dimensions

CGSize

Stream ingestion resolution

frameRate

Int

Stream ingestion frame rate

bitrate

Int

Stream ingestion bitrate

keyFrameInterval

Int

Stream ingestion keyframe interval (milliseconds)

useHighQualityPreview

Bool

Use high-definition preview. Otherwise, the SDK adjusts automatically

cameraCaptureFrameRate

Int

Camera capture frame rate. Default: 15 fps

ARTCAICallVisionCustomCaptureRequest

A request model that enables custom frame capture for vision understanding agents

Property Name

Type

Description

text

String

Text parameter for multimodal large model requests

enableASR

Bool

Pass ASR results as input to the large model

isSingle

Bool

Single-frame capture

eachDuration

UInt

Frame capture interval (seconds)

num

UInt

Number of images per frame capture

duration

UInt

Duration of continuous frame capture (seconds). Applies only for continuous capture.

userData

String?

JSON string containing custom business information
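To see how the capture fields relate, the sketch below estimates how many images a request would produce and builds a `userData` JSON string. `CaptureRequestSketch` is a hypothetical local struct with the same field names as the table. The formula (one capture every `eachDuration` seconds for `duration` seconds, `num` images each) is an interpretation of the field descriptions, not documented SDK behavior.

```swift
import Foundation

// Hypothetical local stand-in with the same fields as the table; not the SDK class.
struct CaptureRequestSketch {
    var text: String
    var enableASR: Bool
    var isSingle: Bool
    var eachDuration: UInt   // seconds between captures
    var num: UInt            // images per capture
    var duration: UInt       // seconds of continuous capture
    var userData: String?
}

// Assumed interpretation: single capture yields `num` images;
// continuous capture yields (duration / eachDuration) captures of `num` images.
func estimatedImageCount(_ request: CaptureRequestSketch) -> UInt? {
    if request.isSingle { return request.num }
    guard request.eachDuration > 0 else { return nil }  // avoid division by zero
    return (request.duration / request.eachDuration) * request.num
}

// Serializes custom business information into the JSON string that userData expects.
func jsonString(from object: [String: Any]) -> String? {
    guard JSONSerialization.isValidJSONObject(object),
          let data = try? JSONSerialization.data(withJSONObject: object) else {
        return nil
    }
    return String(data: data, encoding: .utf8)
}
```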

ARTCAICallSendTextToAgentRequest

A request model for sending text messages to an agent.

Property Name

Type

Description

text

String

Text message to ask the agent, for example: "What is this?"

ARTCAICallConfig

Specifies the configuration for starting an agent call.

Property Name

Type

Description

agentId

String

Agent ID

agentType

ARTCAICallAgentType

Agent type. Must match the agent ID’s type. Otherwise, agent startup fails

agentUserId

String?

Agent UID. If empty, the service assigns one

region

String

Region where the agent service resides. Must match the region of the agent ID. Otherwise, agent startup fails

userId

String

Current user ID

userJoinToken

String

Current user’s join token

userData

[String: Any]?

User-defined information passed to the agent

agentConfig

ARTCAICallAgentConfig?

agentConfig parameter used to start the call

audioConfig

ARTCAICallAudioConfig?

Local audio configuration

videoConfig

ARTCAICallVideoConfig?

Local video configuration. Applies only for VisionAgent or VideoAgent

chatSyncConfig

ARTCAICallChatSyncConfig?

Associated chat agent configuration

templateConfig

ARTCAICallTemplateConfig?

Deprecated. Use agentConfig

ARTCAICallTemplateConfig (deprecated)

The TemplateConfig parameter is used to start a call.

Important

This class is deprecated in versions 2.5 and later. Use ARTCAICallAgentConfig instead.

Property Name

Type

Description

agentGreeting

String?

Agent greeting. Empty uses the agent’s default value. Maximum length: 100 characters

userOnlineTimeout

Int32

Time for the agent to wait before ending the task if the user does not join. Negative values use the server default: 60 seconds

userOfflineTimeout

Int32

Time for the agent to wait before ending the task after the user leaves. Negative values use the server default: 5 seconds

workflowOverrideParams

[String: Any]?

Workflow override parameters

bailianAppParams

[String: Any]?

Alibaba Cloud Model Studio application center parameters

asrMaxSilence

Int32

Voice segmentation threshold. Range: 200–1200 ms. Negative values use the server default: 400 ms

volume

Int32

Agent speech volume. Range: 0–400. Output volume = workflow speech output volume × volume ÷ 100. Negative values use the server default: 100

vadLevel

Int32

VAD sensitivity setting. Default: 11. Valid range: [0, 11]

  • 0 disables VAD.

  • 1–10: Higher numbers make interruptions harder.

  • 11 behaves differently from the lower levels: it introduces less pre-processing distortion and offers better noise resistance.

enableVoiceInterrupt

Bool

Enable intelligent interruption

agentVoiceId

String?

Agent voice ID. Empty uses the agent’s default value

enableIntelligentSegment

Bool

Enable intelligent sentence segmentation and merging

useVoiceprint

Bool

Whether to apply voiceprint recognition with denoising to the current utterance.

voiceprintId

String?

Voiceprint ID. Non-empty enables voiceprint denoising for this call

agentMaxIdleTime

Int32

Maximum idle time for the agent (seconds). Negative values use the server default: 600 seconds

llmHistoryLimit

Int32

Maximum history turns retained for LLM/multimodal LLM conversations. Negative values use the server default: 10

enablePushToTalk

Bool

Enable push-to-talk mode

agentGracefulShutdown

Bool

Enable graceful shutdown: finish speaking the current sentence before stopping

agentAvatarId

String?

Digital human model ID. Empty uses the agent’s default value

asrLanguageId

String?

ASR language ID. Empty uses the agent’s default value. Options:

  • Mandarin Chinese

  • en: English

  • zh_en: Chinese-English mixed

  • es: Spanish

  • jp: Japanese

wakeUpQuery

String?

Query that the user issued before the call started. The agent responds to it immediately after the call starts

llmSystemPrompt

String?

LLM system prompt, for example: “You are a friendly and helpful assistant…” Note: Not supported for LLM nodes that use Alibaba Cloud Model Studio workflows

asrHotWords

[String]?

ASR hotword list. Limit: up to 500 words. Each word: up to 10 characters

interruptWords

[String]?

Specific words or phrases that trigger interruption, for example: “Hold on” or “I know”
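Several numeric properties in this table share one convention: a negative value means "use the server default". The volume row also gives a formula for the resulting output volume. The sketch below works through both on the client side for illustration only. The service applies the defaults itself, and `serverDefaults`, `effectiveValue`, and `outputVolume` are hypothetical names introduced here.

```swift
// Server defaults quoted from the property descriptions in the table.
let serverDefaults: [String: Int32] = [
    "userOnlineTimeout": 60,    // seconds
    "userOfflineTimeout": 5,    // seconds
    "asrMaxSilence": 400,       // milliseconds
    "volume": 100,
    "agentMaxIdleTime": 600,    // seconds
    "llmHistoryLimit": 10       // turns
]

// Negative values fall back to the server default; other values are kept as given.
func effectiveValue(_ value: Int32, forKey key: String) -> Int32? {
    if value >= 0 { return value }
    return serverDefaults[key]
}

// Output volume = workflow speech output volume × volume ÷ 100.
func outputVolume(workflowVolume: Double, volume: Int32) -> Double {
    let resolved = effectiveValue(volume, forKey: "volume") ?? 100
    return workflowVolume * Double(resolved) / 100.0
}
```

For example, a workflow output volume of 50 with `volume` set to 200 gives 50 × 200 ÷ 100 = 100, while a negative `volume` leaves the workflow volume unchanged because the default of 100 applies.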

ARTCAICallChatSyncConfig

Configuration parameters for the associated chat agent session.

Property Name

Type

Description

sessionId

String

Associated chat agent session ID

agentId

String

Associated chat agent ID (must be in the same account and region)

receiverId

String

User ID for the associated chat agent session

ARTCAICallAgentShareConfig

Configuration information for agent sharing

Property Name

Type

Description

shareId

String?

Agent share ID

agentType

ARTCAICallAgentType

Agent type

expireTime

Date?

Expiration time of the share

region

String?

Region where the agent resides

templateConfig

String?

Template configuration (JSON string)

userData

[String: Any]?

User-defined information passed to the agent

ARTCAICallVideoConfig

Configuration for the local video in a call

Property Name

Type

Description

dimensions

CGSize

Stream ingestion resolution

frameRate

Int

Stream ingestion frame rate

bitrate

Int

Stream ingestion bitrate

keyFrameInterval

Int

Stream ingestion keyframe interval (milliseconds)

useHighQualityPreview

Bool

Use high-definition preview. Otherwise, the SDK adjusts automatically based on stream ingestion resolution

cameraCaptureFrameRate

Int

Camera capture frame rate

useFrontCameraDefault

Bool

Start with the front camera by default

ARTCAICallAgentConfig

Configuration for starting and running the call agent.

Property Name

Type

Description

agentGreeting

String?

Agent greeting. Empty uses the agent’s default value

wakeUpQuery

String?

Query that the user issued before the call started. The agent responds to it immediately after the call starts

agentMaxIdleTime

Int32

Maximum idle time for the agent (seconds). The agent shuts down automatically after timeout. Default: 600 seconds

userOnlineTimeout

Int32

Time for the agent to wait before ending the task if the user does not join. Default: 60 seconds

userOfflineTimeout

Int32

Time for the agent to wait before ending the task after the user leaves. Default: 5 seconds

enablePushToTalk

Bool

Enable push-to-talk mode

agentGracefulShutdown

Bool

Enable graceful shutdown

volume

Int32

Agent speech volume. Range: 0–400. Default: 100

workflowOverrideParams

[String: Any]?

Workflow override parameters

enableIntelligentSegment

Bool

Enable intelligent sentence segmentation and merging

asrConfig

ARTCAICallAgentAsrConfig

Speech recognition configuration

ttsConfig

ARTCAICallAgentTtsConfig

Speech synthesis configuration

llmConfig

ARTCAICallAgentLlmConfig

Large Language Model (LLM) configuration

avatarConfig

ARTCAICallAgentAvatarConfig

Digital human configuration

interruptConfig

ARTCAICallAgentInterruptConfig

Interrupt configuration

voiceprintConfig

ARTCAICallAgentVoiceprintConfig

Voiceprint denoising configuration

turnDetectionConfig

ARTCAICallAgentTurnDetectionConfig

Turn detection configuration

experimentalConfig

ARTCAICallExperimentalConfig

Customized, non-production configuration

vcrConfig

ARTCAICallAgentVcrConfig

VCR configuration

preConnectAudioUrl

String?

Sound effect to play after connection and before the greeting. Supports URL input. The greeting plays after the sound effect.

ambientConfig

ARTCAICallAgentAmbientConfig

Environment configuration

backChannelingConfig

ARTCAICallAgentBackChanneling

Configuration module for back-channeling. When configured, the system randomly plays short acknowledgments at specific trigger points.

autoSpeechForLlmPendingConfig

ARTCAICallAgentAutoSpeechLlmPending

Auto-speech configuration for cases where the LLM response is delayed.

autoSpeechForUserIdleConfig

ARTCAICallAgentAutoSpeechUserIdle

Configuration for agent questions when the user is silent.

ARTCAICallAgentAmbientConfig

Call environment parameters

Property Name

Type

Description

volume

Int32

Background sound volume. Default: 100

resourceId

String?

Resource ID of the background sound registered in the console. An empty string disables it.

ARTCAICallAgentAsrConfig

Speech recognition configuration

Property Name

Type

Description

asrLanguageId

String?

ASR language ID. Empty uses the agent's default value.

asrMaxSilence

Int32

Voice segmentation threshold. Silence exceeding this duration is considered a sentence break. Range: 200–1200 ms. Default: -1. Negative values use the agent's default configuration (console value). The server-side default is 400 ms.

asrHotWords

[String]?

ASR hotword list. Limit: up to 500 words. Each word: up to 10 characters.

vadLevel

Int32

VAD sensitivity setting. Default: 11. Valid range: [0, 11]

  • 0 disables VAD.

  • 1–10: Higher numbers make interruptions harder.

  • 11 behaves differently from the lower levels: it introduces less pre-processing distortion and offers better noise resistance.

customParams

String?

Runtime parameters for custom ASR. Use URL parameter format, for example: "mode=fast&sample=16000&format=wav"

vadDuration

Int32

Minimum duration threshold for voice activity detection, used to adjust interruption sensitivity. Default: 0 (disabled). Valid range: 200–2000 ms. Common range: [200, 500], corresponding to 1 to 4 words. Negative values are not sent to the server (server default is disabled).
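The `customParams` string and the hotword limits are easy to get wrong by hand. The sketch below builds the URL-parameter string with Foundation's `URLComponents` and checks a hotword list against the documented limits. The helper names are hypothetical, and counting "characters" with Swift's `String.count` is an assumption about how the service measures length.

```swift
import Foundation

// Builds a customParams string such as "mode=fast&sample=16000&format=wav".
// Pairs are kept in the order given; values are percent-encoded where needed.
func makeCustomParams(_ pairs: [(String, String)]) -> String? {
    var components = URLComponents()
    components.queryItems = pairs.map { URLQueryItem(name: $0.0, value: $0.1) }
    return components.percentEncodedQuery
}

// Documented limits: up to 500 words, each up to 10 characters.
func hotWordsViolations(_ words: [String]) -> [String] {
    var problems: [String] = []
    if words.count > 500 {
        problems.append("too many words: \(words.count)")
    }
    for word in words where word.count > 10 {
        problems.append("too long: \(word)")
    }
    return problems
}

// Negative values defer to the agent's configuration; otherwise 200–1200 ms.
func isValidAsrMaxSilence(_ milliseconds: Int32) -> Bool {
    return milliseconds < 0 || (200...1200).contains(milliseconds)
}
```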

ARTCAICallAgentTtsConfig

Speech synthesis configuration

Property Name

Type

Description

agentVoiceId

String?

Agent voice ID. Empty uses the agent's default value.

pronunciationRules

[[String: Any]]?

Array of pronunciation rules. Up to 20 rules are supported. If nil or empty, no rules are used. Example:

[
  {
    "Word": "overlap",            // Target word
    "Pronunciation": "chongdie",  // Replacement pronunciation
    "Type": "replacement"         // Polyphone rule
  },
  {
    "Word": "action",
    "Pronunciation": "hangdong",
    "Type": "replacement"
  }
]

speechRate

Double

TTS playback speed. Supports all TTS types. Range: [0.5, 2.0]. Default: -1. Negative values are not sent to the server, so the agent's default configuration (console value) applies. The server-side default is 1.0.

languageId

String?

TTS playback language code. Valid when TTS type is MiniMax.

emotion

String?

TTS playback emotion type. Valid when TTS type is MiniMax.

modelId

String?

TTS model ID. Currently only supports MiniMax. Options: speech-01-turbo, speech-02-turbo.
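Since `pronunciationRules` is an array of dictionaries with a 20-rule limit, it can help to build it from typed pairs instead of writing dictionaries by hand. The sketch below does that and also checks `speechRate` against the documented range. The helper names are hypothetical, and rejecting more than 20 rules on the client is a choice made for this example.

```swift
import Foundation

// Builds replacement rules in the shape of the documented example.
// Returns nil when the list exceeds the 20-rule limit.
func makePronunciationRules(_ replacements: [(String, String)]) -> [[String: Any]]? {
    guard replacements.count <= 20 else { return nil }
    return replacements.map { pair -> [String: Any] in
        ["Word": pair.0, "Pronunciation": pair.1, "Type": "replacement"]
    }
}

// Negative values defer to the console configuration; otherwise 0.5 to 2.0.
func isValidSpeechRate(_ rate: Double) -> Bool {
    return rate < 0 || (0.5...2.0).contains(rate)
}
```

A call such as `makePronunciationRules([("overlap", "chongdie"), ("action", "hangdong")])` reproduces the two rules shown in the table's example.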

ARTCAICallAgentLlmConfig

Large Language Model configuration

Property Name

Type

Description

llmHistoryLimit

Int32

Maximum history turns retained for LLM/multimodal LLM conversations. Default: -1. Negative values use the agent's default configuration (console value).

llmSystemPrompt

String?

LLM system prompt.

bailianAppParams

[String: Any]?

Parameters for the Model Studio Application Center.

llmCompleteReply

Bool

Send the complete LLM result.

Note

When enabled, the complete LLM result is returned via the onLLMReplyCompleted event callback after generation.

openAIExtraQuery

String?

Additional query parameters for OpenAI protocol LLMs.

Note

Parameters must be in key=value format, with multiple parameters joined by '&'. All values must be strings.

outputMinLength

Int32

Minimum text output length (characters). Text shorter than this is cached for concatenation. Range: [0, 100]. A value of 0 or less means no limit. Default: no limit.

outputMaxDelay

Int32

Maximum text output delay (milliseconds). Cached text is forcibly output after this time. Range: [1000, 10000]. A value of 0 or less means no limit. Default: no limit.

historySyncWithTTS

Bool

Sync LLM message history with TTS playback content. Default: false. When enabled, the saved LLM message and TTS playback content are consistent, with minor discrepancies allowed.

Note

When a user interrupts the agent, the <ims_agent_interrupted> tag is inserted at the interruption point in the next message sent to the LLM. For example:

[
  {"role": "user", "content": "Tell me a story."},
  {"role": "assistant", "content": "Okay, I'll tell you a story from the Romance of the Three Kingdoms. Do you<ims_agent_interrupted> want to hear it?"},
  {"role": "user", "content": "Tell me a different one."}
]
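If your own LLM service receives this history, it may need to separate the text before the tag from the text after it. The sketch below splits an assistant message at the tag using Foundation string search. `splitAtInterruption` is a hypothetical helper, and reading the part before the tag as "already played" is an interpretation of the example above.

```swift
import Foundation

// Tag inserted at the interruption point, as documented above.
let interruptedTag = "<ims_agent_interrupted>"

// Splits assistant content at the first tag.
// Returns nil when the message was not interrupted.
func splitAtInterruption(_ content: String) -> (spoken: String, unspoken: String)? {
    guard let range = content.range(of: interruptedTag) else { return nil }
    let before = String(content[..<range.lowerBound])  // assumed: played before the interruption
    let after = String(content[range.upperBound...])   // assumed: generated but not played
    return (spoken: before, unspoken: after)
}
```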

ARTCAICallAgentAvatarConfig

Digital human configuration

Property Name

Type

Description

agentAvatarId

String?

Digital human model ID. Empty uses the agent's default value.

ARTCAICallAgentInterruptConfig

Interrupt configuration

Property Name

Type

Description

enableVoiceInterrupt

Bool

Enable intelligent interruption

interruptWords

String?

Specific words or phrases that trigger interruption

noInterruptMode

String?

Controls the ASR text processing policy for user speech when the agent is speaking and intelligent interruption is disabled. Valid values:

  • cache: Caches ASR text and processes it in the next turn after the current turn ends.

  • discard: Discards ASR text immediately.

  • Other values (including empty): Use the server default configuration.

ARTCAICallAgentVoiceprintConfig

Voiceprint denoising configuration

Property Name

Type

Description

useVoiceprint

Bool

Whether sentence segmentation uses voiceprint denoising detection.

voiceprintId

String?

Voiceprint ID. Non-empty enables voiceprint denoising for this call.

ARTCAICallAgentTurnDetectionConfig

Turn detection configuration

Property Name

Type

Description

turnEndWords

[String]?

Specific words to end a turn, for example: "Done" or "I'm finished"

mode

ARTCAICallTurnDetectionMode

Method to detect when user speech ends. Default: Semantic, which uses AI for semantic analysis.

semanticWaitDuration

Int32

Custom wait time for semantic segmentation (milliseconds). Range: [0, 10000]. Negative values are not sent to the server (uses server default of -1, where AI automatically determines the appropriate wait time).

Note

The semanticWaitDuration field has no effect in ARTCAICallTurnDetectionMode.Normal mode.

eagerness

[String]?

This parameter takes effect only when mode is Semantic, and it takes priority over semanticWaitDuration. It controls how quickly the AI responds after detecting a user pause:

  • Low: Waits patiently, up to 6 seconds, to reduce the risk of false interruptions.

  • Medium: Balanced mode, up to 4 seconds, suitable for most scenarios.

  • High: Responds quickly, up to 2 seconds, for faster interaction but with a higher risk of cutting off the user.

  • Other values (including empty): Use the server default configuration.
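The precedence between `mode`, `eagerness`, and `semanticWaitDuration` can be summarized as a small decision function. The sketch below is a client-side illustration only. The enum and function names are hypothetical, `eagerness` is taken as a single string even though the table lists the type as [String]?, and treating out-of-range wait values as "server decides" is an assumption.

```swift
// Hypothetical local types for illustration; not the SDK declarations.
enum TurnDetectionModeSketch {
    case normal
    case semantic
}

enum SemanticWaitSketch: Equatable {
    case notApplicable       // Normal mode: semantic settings have no effect
    case upTo(ms: Int32)     // eagerness sets an upper bound on the wait
    case custom(ms: Int32)   // semanticWaitDuration is used
    case serverDecides       // nothing usable was set
}

func resolveSemanticWait(mode: TurnDetectionModeSketch,
                         eagerness: String?,
                         semanticWaitDuration: Int32) -> SemanticWaitSketch {
    guard mode == .semantic else { return .notApplicable }
    // eagerness has priority over semanticWaitDuration.
    switch eagerness ?? "" {
    case "Low": return .upTo(ms: 6000)
    case "Medium": return .upTo(ms: 4000)
    case "High": return .upTo(ms: 2000)
    default: break   // other values, including empty, use the server default
    }
    // Documented range for the custom wait is [0, 10000] ms; negative means unset.
    if (0...10000).contains(semanticWaitDuration) {
        return .custom(ms: semanticWaitDuration)
    }
    return .serverDecides
}
```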

ARTCAICallAgentVcrResult

VCR detection result

Property Name

Type

Description

resultData

[String]?

All VCR detection results returned by the agent

stillFrameMotionResult

FrameMotionResult?

VCR still frame detection result

invalidFrameMotionResult

FrameMotionResult?

VCR invalid frame detection result

peopleCountResult

PeopleCountResult?

VCR real-time people count detection result

equipmentResult

EquipmentResult?

VCR electronic device detection result

headMotionResult

HeadMotionResult?

VCR head motion detection result

lookAwayResult

LookAwayResult?

VCR gaze aversion detection result

LookAwayResult

VCR gaze aversion detection result

Property Name

Type

Description

count

Int32

Total number of gaze aversions up to the current frame

duration

Int32

Total duration of gaze aversions up to the current frame (milliseconds)

ARTCAICallAgentVcrConfig

VCR configuration

Property Name

Type

Description

data

[String]?

Caches the JSON object passed by the user. This object is used later to generate a JSON string, allowing for custom extensions.

stillFrameMotion

ARTCAICallAgentVcrFrameMotionConfig?

VCR still frame detection configuration

invalidFrameMotion

ARTCAICallAgentVcrFrameMotionConfig?

VCR invalid frame detection configuration

peopleCount

ARTCAICallAgentVcrBaseConfig?

VCR real-time people count detection configuration

equipment

ARTCAICallAgentVcrBaseConfig?

VCR electronic device detection configuration

headMotion

ARTCAICallAgentVcrBaseConfig?

VCR head motion detection configuration

lookAway

ARTCAICallAgentVcrBaseConfig?

VCR gaze aversion detection configuration

ARTCAICallAgentVcrBaseConfig

Base VCR detection configuration

Property Name

Type

Description

enable

Bool

Enable this feature. Enabled by default.

ARTCAICallAgentVcrFrameMotionConfig

VCR video frame detection configuration

Property Name

Type

Description

callbackDelay

Int32

Callback trigger delay in milliseconds. Default: 3000 ms

ARTCAICallExperimentalConfig

Experimental parameters for controlling specific logic policies

Property Name

Type

Description

rtcSdkParams

[String: Any]?

RTC SDK parameters

commonParams

[String: Any]?

Common parameters

ARTCAICallAgentAutoSpeechContent

Agent speech content for auto-speech scenarios (including acknowledgments, proactive questions, etc.)

Property Name

Type

Description

probability

Double

Trigger probability. Range: 0.0–1.0

text

String

Prompt text, UTF-8 encoded. Example: "Are you still there?". Maximum length: 20 characters for acknowledgments, 100 characters for auto-replies.

ARTCAICallAgentAutoSpeechLlmPending

Auto-speech configuration for cases where the LLM response is delayed

Property Name

Type

Description

waitTime

Int32

Wait time threshold in milliseconds. A prompt is triggered after this duration. Range: 500–10000 ms. Cannot be empty.

messages

[ARTCAICallAgentAutoSpeechContent]

Collection of waiting prompts. Maximum 10 items. Each item ≤ 100 characters. Total probability must be 1.0.
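The constraints on `waitTime` and `messages` can be checked before a configuration is sent. The sketch below lists violations for a set of prompts. `AutoSpeechContentSketch` is a hypothetical stand-in for ARTCAICallAgentAutoSpeechContent, and the floating-point tolerance used for the probability sum is an assumption of this example.

```swift
// Hypothetical stand-in for ARTCAICallAgentAutoSpeechContent; not the SDK class.
struct AutoSpeechContentSketch {
    var probability: Double
    var text: String
}

// Returns human-readable violations of the documented limits:
// at most 10 items, each text within maxTextLength, probabilities summing to 1.0.
func autoSpeechProblems(_ messages: [AutoSpeechContentSketch],
                        maxTextLength: Int = 100) -> [String] {
    var problems: [String] = []
    if messages.count > 10 {
        problems.append("too many items: \(messages.count)")
    }
    for (index, message) in messages.enumerated() {
        if message.text.count > maxTextLength {
            problems.append("item \(index) exceeds \(maxTextLength) characters")
        }
        if !(0.0...1.0).contains(message.probability) {
            problems.append("item \(index) has probability outside 0.0–1.0")
        }
    }
    let total = messages.reduce(0.0) { $0 + $1.probability }
    if abs(total - 1.0) > 1e-9 {   // tolerance chosen for this sketch
        problems.append("probabilities sum to \(total), expected 1.0")
    }
    return problems
}

// waitTime for the LLM-pending case must be within 500–10000 ms.
func isValidLlmPendingWaitTime(_ milliseconds: Int32) -> Bool {
    return (500...10000).contains(milliseconds)
}
```

The same check applies to acknowledgment phrases by passing `maxTextLength: 20`.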

ARTCAICallAgentAutoSpeechUserIdle

Configuration for agent questions when the user is silent

Property Name

Type

Description

waitTime

Int32

Silence duration threshold in milliseconds. A question is triggered after this duration. Range: 5000–600000 ms. Recommended: 10000.

maxRepeats

Int32

Maximum number of questions. Range: 0–10. Recommended: 5. Once this limit is exceeded, no further questions are triggered and the call ends.

messages

[ARTCAICallAgentAutoSpeechContent]

Collection of waiting prompts. Maximum 10 items. Each item ≤ 100 characters. Total probability must be 1.0.

ARTCAICallAgentBackChanneling

Back-channeling configuration module

Property Name

Type

Description

enable

Bool

Whether back-channeling is enabled

triggerStage

String

Back-channeling trigger timing

probability

Double

Trigger probability. Range: 0.0–1.0

words

[ARTCAICallAgentAutoSpeechContent]

Collection of acknowledgment phrases. Maximum 10 items. Each item ≤ 20 characters. Total probability must be 1.0.
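To build intuition for how the two probability fields might combine, the sketch below models a two-stage choice: the module-level `probability` decides whether an acknowledgment plays at all, then the per-phrase probabilities select which one. The actual selection happens on the service side and is not documented here, so this is an assumed model. `AckPhraseSketch`, `pickPhrase`, and `backChannel` are hypothetical names.

```swift
// Hypothetical stand-in for one entry of `words`; not the SDK class.
struct AckPhraseSketch {
    var probability: Double
    var text: String
}

// Weighted choice given a uniform draw u in [0, 1):
// walk the cumulative distribution and return the first phrase whose bound exceeds u.
func pickPhrase(_ words: [AckPhraseSketch], draw u: Double) -> String? {
    var cumulative = 0.0
    for word in words {
        cumulative += word.probability
        if u < cumulative { return word.text }
    }
    // Covers rounding when the probabilities sum to slightly less than 1.0.
    return words.last?.text
}

// Assumed two-stage model: trigger first, then choose a phrase.
func backChannel(triggerProbability: Double,
                 words: [AckPhraseSketch],
                 triggerDraw: Double,
                 phraseDraw: Double) -> String? {
    guard triggerDraw < triggerProbability else { return nil }
    return pickPhrase(words, draw: phraseDraw)
}
```

In real use the two draws would come from `Double.random(in: 0..<1)`. They are parameters here so the behavior can be checked deterministically.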