This topic describes the data types used by the iOS software development kit (SDK).
Overview
Earlier SDK versions contain deprecated parameters and methods. You should upgrade the SDK to the latest version. For more information, see iOS guide.
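Structure type | Data type | Description |
Enum | ARTCAICallAgentType | The agent type. |
Enum | ARTCAICallAgentState | The agent status. |
Enum | ARTCAICallAudioProfile | The audio encoding configuration. |
Enum | ARTCAICallAudioScenario | The audio scenario configuration. |
Enum | ARTCAICallAgentViewMode | The rendering mode of the agent view. |
Enum | ARTCAICallAgentViewMirrorMode | The mirror mode of the agent view. |
Enum | ARTCAICallAgentViewRotationMode | The rotation mode of the agent view. |
Enum | ARTCAICallNetworkQuality | The network quality. |
Enum | ARTCAICallSpeakingInterruptedReason | The reason why the agent's current speech was interrupted. |
Enum | ARTCAICallVoiceprintResult | The voice activity detection (VAD) feedback result. |
Enum | ARTCAICallErrorCode | The error code. |
Enum | ARTCAICallTurnDetectionMode | The mode for determining whether the user has finished speaking. |
Class | ARTCAICallAgentInfo | The agent runtime information. |
Class | ARTCAICallAudioConfig | The call audio configuration. |
Class | ARTCAICallViewConfig | The agent view configuration. Set this class when the agent needs rendering, such as for a digital human. |
Class | ARTCAICallVisionConfig | The visual understanding agent runtime configuration. |
Class | ARTCAICallVisionCustomCaptureRequest | The request model for the visual understanding agent to enable custom frame capture. |
Class | ARTCAICallSendTextToAgentRequest | The request model for sending a text message to the agent. |
Class | ARTCAICallConfig | The configuration for starting an agent call. |
Class | ARTCAICallTemplateConfig (deprecated) | The TemplateConfig parameter for starting a call. |
Class | ARTCAICallChatSyncConfig | The configuration parameters for the associated chat agent session. |
Class | ARTCAICallAgentShareConfig | The agent sharing configuration information. |
Class | ARTCAICallVideoConfig | The local video configuration for the call. |
Class | ARTCAICallAgentConfig | The startup and runtime configuration for the call agent. |
Class | ARTCAICallAgentAsrConfig | The speech recognition configuration. |
Class | ARTCAICallAgentTtsConfig | The speech synthesis configuration. |
Class | ARTCAICallAgentLlmConfig | The large language model configuration. |
Class | ARTCAICallAgentAvatarConfig | The digital human configuration. |
Class | ARTCAICallAgentInterruptConfig | The interruption configuration. |
Class | ARTCAICallAgentVoiceprintConfig | The voiceprint-based noise suppression configuration. |
Class | ARTCAICallAgentTurnDetectionConfig | The turn detection configuration. |
Class | ARTCAICallAgentVcrResult | The VCR detection result. |
Class | ARTCAICallAgentVcrConfig | The VCR configuration. |
Class | ARTCAICallAgentVcrBaseConfig | The basic VCR detection configuration. |
Class | ARTCAICallAgentVcrFrameMotionConfig | The VCR video frame detection configuration. |
Class | ARTCAICallExperimentalConfig | Experimental parameters for controlling specific logic policies. |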
Data structure details
Enum
ARTCAICallAgentType
The agent type.
Enumeration value | Value | Description |
VoiceAgent | 0 | A voice-only agent that does not have a visual representation. |
AvatarAgent | 1 | A virtual avatar that supports both voice and visual interactions. |
VisionAgent | 2 | An agent that understands and analyzes visual information. |
VideoAgent | 3 | A video call agent that supports bidirectional video calls between the user and the agent. |
ARTCAICallAgentState
The agent status.
Enumeration value | Value | Description |
Listening | 1 | The agent is listening. |
Thinking | 2 | The agent is thinking. |
Speaking | 3 | The agent is speaking. |
ARTCAICallAudioProfile
The audio encoding configuration.
Enumeration value | Value | Description |
LowQualityMode | 0x0000 | Low-quality audio mode. Default settings: 8000 Hz sample rate, mono, and a maximum encoding bitrate of 12 kbps. |
BasicQualityMode | 0x0001 | Standard-quality audio mode. Default settings: 16000 Hz sample rate, mono, and a maximum encoding bitrate of 24 kbps. |
HighQualityMode | 0x0010 | (Default) High-quality audio mode. Default settings: 48000 Hz sample rate, mono, and a maximum encoding bitrate of 64 kbps. |
StereoHighQualityMode | 0x0011 | Stereo high-quality audio mode. Default settings: 48000 Hz sample rate, stereo, and a maximum encoding bitrate of 80 kbps. |
SuperHighQualityMode | 0x0012 | Super high-quality audio mode. Default settings: 48000 Hz sample rate, mono, and a maximum encoding bitrate of 96 kbps. |
StereoSuperHighQualityMode | 0x0013 | Stereo super high-quality audio mode. Default settings: 48000 Hz sample rate, stereo, and a maximum encoding bitrate of 128 kbps. |
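The defaults in the table above can be compared programmatically. The following is a minimal sketch, assuming a hypothetical local enum `AudioProfileSketch` that mirrors the documented values; it is not the SDK's own declaration, and the helper `cheapestProfile` is invented for illustration.

```swift
// Hypothetical local mirror of ARTCAICallAudioProfile, for illustration only.
// Raw values and defaults are copied from the table above.
enum AudioProfileSketch: Int, CaseIterable {
    case lowQualityMode = 0x0000
    case basicQualityMode = 0x0001
    case highQualityMode = 0x0010
    case stereoHighQualityMode = 0x0011
    case superHighQualityMode = 0x0012
    case stereoSuperHighQualityMode = 0x0013

    /// Documented defaults: sample rate (Hz), channel count, maximum bitrate (kbps).
    var defaults: (sampleRateHz: Int, channels: Int, maxBitrateKbps: Int) {
        switch self {
        case .lowQualityMode: return (8000, 1, 12)
        case .basicQualityMode: return (16000, 1, 24)
        case .highQualityMode: return (48000, 1, 64)
        case .stereoHighQualityMode: return (48000, 2, 80)
        case .superHighQualityMode: return (48000, 1, 96)
        case .stereoSuperHighQualityMode: return (48000, 2, 128)
        }
    }
}

/// Picks the lowest-bitrate profile that meets the requested sample rate
/// and channel layout, or nil if no documented profile qualifies.
func cheapestProfile(minSampleRateHz: Int, stereo: Bool) -> AudioProfileSketch? {
    return AudioProfileSketch.allCases
        .filter { $0.defaults.sampleRateHz >= minSampleRateHz && ($0.defaults.channels == 2) == stereo }
        .min { $0.defaults.maxBitrateKbps < $1.defaults.maxBitrateKbps }
}
```

For example, `cheapestProfile(minSampleRateHz: 48000, stereo: false)` selects the high-quality mono profile, which is also the documented default.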
ARTCAICallAudioScenario
The audio scenario configuration.
Enumeration value | Value | Description |
DefaultMode | 0x0000 | Recommended for general audio and video communication scenarios. |
MusicMode | 0x0300 | The default scenario. Recommended for music scenarios that require high-fidelity music quality, such as musical instrument instruction. |
ARTCAICallAgentViewMode
The rendering mode of the agent view.
Enumeration value | Value | Description |
Auto | 0 | Automatic mode |
Stretch | 1 | Stretch mode |
Fill | 2 | Fill mode |
Crop | 3 | Crop mode |
ARTCAICallAgentViewMirrorMode
The mirror mode of the agent view.
Enumeration value | Value | Description |
OnlyFrontCameraPreviewEnabled | 0 | Only the preview from the front camera is mirrored. Other views are not mirrored. |
AllEnabled | 1 | All views are mirrored. |
AllDisabled | 2 | No views are mirrored. |
ARTCAICallAgentViewRotationMode
The rotation mode of the agent view.
Enumeration value | Value | Description |
Rotation_0 | 0 | The video view is rotated by 0 degrees. |
Rotation_90 | 1 | The video view is rotated by 90 degrees. |
Rotation_180 | 2 | The video view is rotated by 180 degrees. |
Rotation_270 | 3 | The video view is rotated by 270 degrees. |
ARTCAICallNetworkQuality
The network quality.
Enumeration value | Value | Description |
Excellent | 0 | Excellent network. The video stream is smooth and has high definition. |
Good | 1 | Good network. The smoothness and definition are almost the same as Excellent. |
Poor | 2 | Poor network. Audio and video quality is slightly degraded but does not affect communication. |
Bad | 3 | Bad network. The video stutters severely, but audio communication is not affected. |
VeryBad | 4 | Extremely poor network. Communication is nearly impossible. |
Disconnect | 5 | The network is disconnected. |
Unknown | 6 | Unknown. |
ARTCAICallSpeakingInterruptedReason
The reason why the agent's current speech was interrupted.
Enumeration value | Value | Description |
unknown | 0 | Unknown reason. |
byWords | 1 | A specific word was recognized. |
byVoice | 2 | Interrupted by voice. |
byInterruptSpeaking | 3 | The interruptSpeaking API was called. |
bySpeechBroadCast | 4 | Interrupted by an active voice broadcast. |
byLlmQuery | 5 | Interrupted by an active large language model (LLM) query. |
ARTCAICallVoiceprintResult
The Voice Activity Detection (VAD) feedback result.
Enumeration value | Value | Description |
Off | 0 | VAD-based noise suppression and AIVad are disabled. |
Unregister | 1 | VAD-based noise suppression is enabled, but voiceprint registration is not complete. |
DetectedSpeaker | 2 | VAD-based noise suppression is enabled, and the main speaker is detected. |
UndetectedSpeaker | 3 | VAD-based noise suppression is enabled, but the main speaker is not detected. |
DetectedSpeakerWithAIVad | 4 | AIVad is enabled, and the main speaker is detected. |
UndetectedSpeakerWithAIVad | 5 | AIVad is enabled, but the main speaker is not detected. |
Unknown | 100 | Unknown |
ARTCAICallErrorCode
The error code.
Enumeration value | Value | Description |
None | 0 | Success |
InvalidAction | -1 | Invalid operation. |
InvalidParames | -2 | Invalid parameters. |
NetworkError | -3 | Network error. |
InternalError | -4 | Internal error. |
BeginCallFailed | -10000 | Failed to start the call. |
ConnectionFailed | -10001 | A connection problem occurred. |
PublishFailed | -10002 | Failed to ingest the stream. |
SubscribeFailed | -10003 | Failed to pull the stream. |
TokenExpired | -10004 | The call authentication has expired. |
KickedByUserReplace | -10005 | The call cannot proceed because another user logged on with the same name. |
KickedBySystem | -10006 | The call cannot proceed because the user was kicked out by the system. |
KickedByChannelTerminated | -10007 | The call cannot proceed because the channel was destroyed. |
LocalDeviceException | -10008 | The call cannot proceed due to local device issues. |
AgentLeaveChannel | -10101 | The agent left the channel (the agent ended the call). |
AgentPullFailed | -10102 | The agent failed to pull the stream. |
AgentASRFailed | -10103 | Speech recognition failed on the agent side. |
AvatarServiceFailed | -10201 | Failed to start the digital agent service. |
AvatarRoutesExhausted | -10202 | The number of concurrent ingest endpoints for the digital agent has reached its limit. |
AgentSubscriptionRequired | -10203 | Failed to start the call because the daily free trial quota has been exceeded. |
AgentNotFound | -10204 | The agent was not found (the agent ID does not exist). |
ChatTextMessageSendFailed | -10301 | Failed to send the text message. |
ChatTextMessageReceiveFailed | -10302 | Failed to receive the text message. |
ChatVoiceRecordFailed | -10310 | Failed to record the voice message. |
ChatVoiceMessageSendFailed | -10311 | Failed to send the voice message. |
ChatVoiceMessageReceiveFailed | -10312 | Failed to receive the voice message. |
ChatPlayMessageReceiveFailed | -10321 | Failed to receive the playback message. |
ChatLogNotFound | -10331 | The chat record was not found. |
ChatAttachmentUploading | -10332 | The attachment is still uploading. The message can be sent only after the upload is complete. |
UnknowError | -40000 | Unknown error. |
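The error codes above fall into numeric ranges, which makes coarse-grained handling straightforward. The following is a minimal sketch: the `ErrorCategory` names and the `category(forErrorCode:)` helper are hypothetical groupings read off the table, not part of the SDK.

```swift
// Hypothetical grouping of the documented error codes by numeric range.
// The category names are an informal reading of the table, not SDK types.
enum ErrorCategory {
    case success       // 0
    case general       // -1 ... -4: invalid action/parameters, network, internal
    case call          // -10000 ... -10008: call setup, streaming, kicked out
    case agentRuntime  // -10101 ... -10103: agent left, pull or ASR failure
    case agentStartup  // -10201 ... -10204: avatar service, quota, agent not found
    case chat          // -10301 ... -10332: text, voice, and attachment messages
    case unknown       // -40000 and anything not listed
}

func category(forErrorCode code: Int) -> ErrorCategory {
    switch code {
    case 0: return .success
    case -4 ... -1: return .general
    case -10008 ... -10000: return .call
    case -10103 ... -10101: return .agentRuntime
    case -10204 ... -10201: return .agentStartup
    case -10332 ... -10301: return .chat
    default: return .unknown
    }
}
```

An app might, for instance, prompt the user to sign in again for the `call` category and show a retry option for `general` network errors; those policies are application choices rather than SDK behavior.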
ARTCAICallTurnDetectionMode
The mode for determining whether the user has finished speaking.
Enumeration value | Value | Description |
Normal | 0 | Normal mode. Uses the ASR silence duration to determine if the user has finished speaking. This mode does not use AI for semantic analysis. |
Semantic | | Semantic mode. Uses AI to determine if the user has finished speaking based on contextual semantics. |
Class
ARTCAICallAgentInfo
The agent runtime information.
Property | Type | Description |
agentId | String | The current agent ID. |
agentType | ARTCAICallAgentType | The agent type. |
channelId | String | The ID of the Real-Time Communication (RTC) channel where the agent resides. |
uid | String | The unique identifier for the agent to enter the RTC channel. |
instanceId | String | The instance ID of the current agent. |
requestId | String | The request ID for starting the current agent. |
region | String? | The region where the agent resides. |
ARTCAICallAudioConfig
The call audio configuration.
Property | Type | Description |
audioProfile | ARTCAICallAudioProfile | The audio encoding configuration. Default value: HighQualityMode. |
audioScenario | ARTCAICallAudioScenario | The audio scenario configuration. Default value: MusicMode. |
ARTCAICallViewConfig
The agent view configuration. You can set this class when the agent requires rendering, such as for a digital human.
Property | Type | Description |
view | UIView | The rendering view. |
viewMode | ARTCAICallAgentViewMode | The image rendering mode. |
viewMirrorMode | ARTCAICallAgentViewMirrorMode | The image mirror mode. |
viewRotationMode | ARTCAICallAgentViewRotationMode | The image rotation mode. |
ARTCAICallVisionConfig
The visual understanding agent runtime configuration.
Property | Type | Description |
preview | UIView? | The preview view. If this is empty, the stream is ingested without a preview. |
viewMode | ARTCAICallAgentViewMode | The preview image rendering mode. |
viewMirrorMode | ARTCAICallAgentViewMirrorMode | The preview image mirror mode. |
viewRotationMode | ARTCAICallAgentViewRotationMode | The preview image rotation mode. |
dimensions | CGSize | The stream ingest resolution. |
frameRate | Int | The stream ingest frame rate. |
bitrate | Int | The stream ingest bitrate. |
keyFrameInterval | Int | The stream ingest keyframe interval. Unit: milliseconds. |
useHighQualityPreview | Bool | Specifies whether to use high-definition preview. If not, the SDK automatically adjusts the resolution. |
cameraCaptureFrameRate | Int | The camera capture frame rate. Default value: 15 fps. |
ARTCAICallVisionCustomCaptureRequest
The request model for the visual understanding agent to enable custom frame capture.
Property | Type | Description |
text | String | The text parameter for requesting the multi-modal large language model. |
enableASR | Bool | Specifies whether to pass the ASR result to the large language model as input. |
isSingle | Bool | Specifies whether to perform a single frame capture. |
eachDuration | UInt | The frame capture interval. Unit: seconds. |
num | UInt | The number of images captured each time. |
duration | UInt | The duration of continuous frame capture. Unit: seconds. This parameter is valid only for continuous frame capture. |
userData | String? | A JSON string for custom business information. |
ARTCAICallSendTextToAgentRequest
The request model for sending a text message to the agent.
Property | Type | Description |
text | String | The text message to send to the agent, such as "What is this?" |
ARTCAICallConfig
The configuration for starting an agent call.
Property | Type | Description |
agentId | String | The agent ID. |
agentType | ARTCAICallAgentType | The agent type. This must match the type of the agent identified by agentId. Otherwise, the agent fails to start. |
agentUserId | String? | The agent's UID. If this is empty, the service assigns one. |
region | String | The region where the agent service is located. This must match the region of the agentId. Otherwise, the agent will fail to start. |
userId | String | The current user ID. |
userJoinToken | String | The token for the current user to join the call. |
userData | [String: Any]? | Custom user information that is passed to the agent. |
agentConfig | ARTCAICallAgentConfig | The agent startup and runtime configuration for the call. |
audioConfig | ARTCAICallAudioConfig | The local audio configuration. |
videoConfig | ARTCAICallVideoConfig | The local video configuration. This takes effect only for VisionAgent or VideoAgent. |
chatSyncConfig | ARTCAICallChatSyncConfig | The associated chat agent configuration. |
templateConfig | ARTCAICallTemplateConfig | Deprecated. Use agentConfig instead. |
ARTCAICallTemplateConfig (deprecated)
The template configuration for starting a call.
This class is deprecated in v2.5 and later. Use ARTCAICallAgentConfig instead.
Property | Type | Description |
agentGreeting | String? | The agent's welcome message. If this is empty, the agent's configured value is used. The maximum length is 100 characters. |
userOnlineTimeout | Int32 | The timeout period for the agent to close the task if the user does not join the call. If the value is less than 0, the server's default value of 60s is used. |
userOfflineTimeout | Int32 | The timeout period for the agent to close the task after the user leaves the call. If the value is less than 0, the server's default value of 5s is used. |
workflowOverrideParams | [String: Any]? | The workflow override parameters. |
bailianAppParams | [String: Any]? | The parameters for the Model Studio Application Center. |
asrMaxSilence | Int32 | The threshold for voice sentence segmentation detection. Valid values: 200 ms to 1200 ms. If the value is less than 0, the server's default value of 400 ms is used. |
volume | Int32 | The agent's speaking volume. Valid values: 0 to 400. Output volume = Voice output volume in the workflow × volume / 100. If the value is less than 0, the server's default value of 100 is used. |
vadLevel | Int32 | The sensitivity parameter for AIVad. Valid values: 0 to 10. Default value: 3. |
enableVoiceInterrupt | Bool | Specifies whether to enable intelligent interruption. |
agentVoiceId | String? | The agent's voice ID. If this is empty, the agent's configured value is used. |
enableIntelligentSegment | Bool | Specifies whether to enable intelligent sentence segmentation and merging. |
useVoiceprint | Bool | Specifies whether to use voiceprint-based noise suppression for the current sentence segmentation. |
voiceprintId | String? | The voiceprint ID. If this is not empty, voiceprint-based noise suppression is enabled for the current call. |
agentMaxIdleTime | Int32 | The maximum idle time for the agent. Unit: seconds. If the value is less than 0, the server's default value of 600s is used. |
llmHistoryLimit | Int32 | The maximum number of historical conversation rounds to retain for the LLM or multi-modal LLM context. If the value is less than 0, the server's default value of 10 is used. |
enablePushToTalk | Bool | Specifies whether to enable push-to-talk mode. |
agentGracefulShutdown | Bool | Specifies whether to enable graceful shutdown. The agent finishes broadcasting the current sentence before stopping. |
agentAvatarId | String? | The digital human model ID. If this is empty, the agent's configured value is used. |
asrLanguageId | String? | The ASR language ID. If this is empty, the agent's configured value is used. |
wakeUpQuery | String? | A user command given before the call starts. This allows the agent to respond immediately after the call begins. |
llmSystemPrompt | String? | The LLM system prompt, such as "You are a friendly and helpful assistant..." Note: LLM nodes of the Alibaba Cloud Model Studio pipeline type are not supported. |
asrHotWords | [String]? | The ASR hotword list. It can contain up to 500 words. Each word can have up to 10 characters. |
interruptWords | [String]? | Specific words or phrases that trigger conversation interruption, such as "Excuse me" or "I know." |
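The volume rule in the table above (output volume = voice output volume in the workflow × volume / 100, with the server default of 100 applied to negative values) can be sketched as follows. The function names are hypothetical, and clamping values above 400 to the documented maximum is an assumption: the table states only the valid range, not how out-of-range values are handled.

```swift
/// Resolves the volume setting that takes effect.
/// Negative values fall back to the documented server default of 100.
/// Assumption: values above the documented maximum of 400 are clamped.
func effectiveVolume(_ volume: Int32, serverDefault: Int32 = 100) -> Int32 {
    if volume < 0 {
        return serverDefault
    }
    return min(volume, 400)
}

/// Applies the documented rule:
/// output volume = workflow voice output volume × volume / 100.
func outputVolume(workflowVolume: Double, volume: Int32) -> Double {
    return workflowVolume * Double(effectiveVolume(volume)) / 100.0
}
```

With a workflow voice output volume of 50, a `volume` of 200 doubles the output to 100, and a `volume` of -1 leaves it at 50 because the server default of 100 applies.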
ARTCAICallChatSyncConfig
The configuration parameters for the associated chat agent session.
Property | Type | Description |
sessionId | String | The ID of the associated chat agent session. |
agentId | String | The ID of the associated chat agent. It must be in the same account and region. |
receiverId | String | The user ID of the associated chat agent session. |
ARTCAICallAgentShareConfig
The agent sharing configuration.
Property | Type | Description |
shareId | String? | The agent share ID. |
agentType | ARTCAICallAgentType | The agent type. |
expireTime | Date? | The expiration time. |
region | String? | The region where the agent resides. |
templateConfig | String? | The template configuration as a JSON string. |
userData | [String: Any]? | Custom user information that is passed to the agent. |
ARTCAICallVideoConfig
The local video configuration for the call.
Property | Type | Description |
dimensions | CGSize | The stream ingest resolution. |
frameRate | Int | The stream ingest frame rate. |
bitrate | Int | The stream ingest bitrate. |
keyFrameInterval | Int | The stream ingest GOP. Unit: milliseconds. |
useHighQualityPreview | Bool | Specifies whether to use high-definition preview. If not, the SDK automatically adjusts the preview resolution based on the stream ingest resolution. |
cameraCaptureFrameRate | Int | The camera capture frame rate. |
useFrontCameraDefault | Bool | Specifies whether to start the front camera by default. |
ARTCAICallAgentConfig
The startup and runtime configuration for the call agent.
Property | Type | Description |
agentGreeting | String? | The agent's welcome message. If this is empty, the agent's configured value is used. |
wakeUpQuery | String? | A user command given before the call starts. This allows the agent to respond immediately after the call begins. |
agentMaxIdleTime | Int32 | The maximum idle time for the agent before it automatically goes offline. Unit: seconds. Default value: 600s. |
userOnlineTimeout | Int32 | The timeout period for the agent to close the task if the user does not join the call. Default value: 60s. |
userOfflineTimeout | Int32 | The timeout period for the agent to close the task after the user leaves the call. Default value: 5s. |
enablePushToTalk | Bool | Specifies whether to enable push-to-talk mode. |
agentGracefulShutdown | Bool | Specifies whether to enable graceful shutdown. |
volume | Int32 | The agent's speaking volume. Valid values: 0 to 400. Default value: 100. |
workflowOverrideParams | [String: Any]? | The workflow override parameters. |
enableIntelligentSegment | Bool | Specifies whether to enable intelligent sentence segmentation. |
asrConfig | ARTCAICallAgentAsrConfig | The speech recognition configuration. |
ttsConfig | ARTCAICallAgentTtsConfig | The speech synthesis configuration. |
llmConfig | ARTCAICallAgentLlmConfig | The large language model configuration. |
avatarConfig | ARTCAICallAgentAvatarConfig | The digital human configuration. |
interruptConfig | ARTCAICallAgentInterruptConfig | The interruption configuration. |
voiceprintConfig | ARTCAICallAgentVoiceprintConfig | The voiceprint-based noise suppression configuration. |
turnDetectionConfig | ARTCAICallAgentTurnDetectionConfig | The turn detection configuration. |
experimentalConfig | ARTCAICallExperimentalConfig | The non-production customized configuration. |
vcrConfig | ARTCAICallAgentVcrConfig | The VCR configuration. |
ARTCAICallAgentAsrConfig
The speech recognition configuration.
Property | Type | Description |
asrLanguageId | String? | The ASR language ID. If this is empty, the agent's configured value is used. |
asrMaxSilence | Int32 | The threshold for voice sentence segmentation detection. A silence duration exceeding this threshold is considered a sentence break. Default value: 400 ms. Valid values: 200 ms to 1200 ms. |
asrHotWords | [String]? | The ASR hotword list. It can contain up to 500 words. Each word can have up to 10 characters. |
vadLevel | Int32 | The sensitivity parameter for AIVad. Valid values: 0 to 10. Default value: 3. |
customParams | String? | When you use a custom ASR, pass runtime parameters in a URL parameter format. Example: "mode=fast&sample=16000&format=wav". |
vadDuration | Int32 | The minimum duration for VAD to adjust interruption sensitivity. A value of 0 disables this feature. Valid values: 200 to 2000 ms. A common range is 200 to 500 ms, corresponding to one to four characters. If the value is less than 0, it is not sent to the server, where the feature is disabled by default. |
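The `customParams` value uses a URL parameter format. The following is a minimal sketch of assembling such a string from ordered key-value pairs; the helper name `makeCustomParams` is hypothetical, and the sketch does no percent-encoding, so it assumes keys and values contain no `&` or `=` characters.

```swift
/// Joins ordered key-value pairs into a "key=value&key=value" string.
/// Assumption: keys and values need no percent-encoding.
func makeCustomParams(_ pairs: [(key: String, value: String)]) -> String {
    return pairs
        .map { "\($0.key)=\($0.value)" }
        .joined(separator: "&")
}

// Reproduces the example string from the table above.
let customParams = makeCustomParams([
    (key: "mode", value: "fast"),
    (key: "sample", value: "16000"),
    (key: "format", value: "wav"),
])
```

An array of pairs is used rather than a dictionary so that the parameter order in the output is deterministic.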
ARTCAICallAgentTtsConfig
The speech synthesis configuration.
Property | Type | Description |
agentVoiceId | String? | The voice ID of the agent. If this parameter is empty, the value configured for the agent is used. |
pronunciationRules | [[String: Any]]? | An array of pronunciation rules. A maximum of 20 rules are supported. If this parameter is nil or an empty array, no rules are used. |
speechRate | Double | The Text-to-Speech (TTS) speech rate. This parameter is supported for all TTS types. Valid values range from 0.5 to 2.0, and the default value is 1.0. If you specify a value of less than 0, the value is not sent to the server. Instead, the value configured in the console is used. |
languageId | String? | The TTS language code. This parameter is valid only when the TTS type is MiniMax. |
emotion | String? | The TTS emotion type. This parameter is valid only when the TTS type is MiniMax. |
modelId | String? | The TTS model ID. This parameter currently supports only MiniMax. Valid values are `speech-01-turbo` and `speech-02-turbo`. |
ARTCAICallAgentLlmConfig
The large language model configuration.
Property | Type | Description |
llmHistoryLimit | Int32 | The maximum number of historical conversation rounds to retain. Default value: 10. |
llmSystemPrompt | String? | The LLM system prompt. |
bailianAppParams | [String: Any]? | Parameters for the Model Studio Application Center |
llmCompleteReply | Bool | Specifies whether to send the complete LLM result. Note: If enabled, the complete LLM result is returned through the onLLMReplyCompleted event callback after the LLM generates the result. |
openAIExtraQuery | String? | Additional query parameters for LLMs that use the OpenAI protocol. Note: Parameters must be in the `key=value` format. Use ampersands (&) to separate multiple parameters. All values must be strings. |
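Because `openAIExtraQuery` must follow the `key=value` format with ampersand separators, it can be worth checking a string before passing it to the SDK. The following is a minimal sketch of such a check; `parseExtraQuery` is a hypothetical helper, and the parameter names used in the usage note are placeholders rather than names the service is known to accept.

```swift
/// Parses a "key=value&key=value" string into a dictionary.
/// Returns nil if any segment lacks "=" or has an empty key.
/// Only the first "=" in a segment separates key from value.
func parseExtraQuery(_ query: String) -> [String: String]? {
    var result: [String: String] = [:]
    for pair in query.split(separator: "&", omittingEmptySubsequences: false) {
        let parts = pair.split(separator: "=", maxSplits: 1, omittingEmptySubsequences: false)
        guard parts.count == 2, !parts[0].isEmpty else {
            return nil
        }
        result[String(parts[0])] = String(parts[1])
    }
    return result
}
```

For example, `parseExtraQuery("temperature=0.7&top_p=0.9")` yields two string-valued entries, while `parseExtraQuery("temperature")` returns nil because the segment has no `=`.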
ARTCAICallAgentAvatarConfig
The digital human configuration.
Property | Type | Description |
agentAvatarId | String? | The digital human model ID. If this is empty, the agent's configured value is used. |
ARTCAICallAgentInterruptConfig
The interruption configuration.
Property | Type | Description |
enableVoiceInterrupt | Bool | Specifies whether to enable intelligent interruption. |
interruptWords | [String]? | Specific words or phrases that trigger conversation interruption. |
ARTCAICallAgentVoiceprintConfig
The voiceprint-based noise suppression configuration.
Property | Type | Description |
useVoiceprint | Bool | Specifies whether to use voiceprint-based noise suppression for the current sentence segmentation. |
voiceprintId | String? | The voiceprint ID. If this is not empty, voiceprint-based noise suppression is enabled for the current call. |
ARTCAICallAgentTurnDetectionConfig
The turn detection configuration.
Property | Type | Description |
turnEndWords | [String]? | Specific words for sentence segmentation, such as "Over" or "I'm done." |
mode | ARTCAICallTurnDetectionMode | The mode for determining whether the user has finished speaking. Default value: Semantic. This mode uses AI for semantic analysis to determine if the user has finished speaking. |
semanticWaitDuration | Int32 | The custom wait time for semantic sentence segmentation. Unit: milliseconds. Valid values: 0 to 10,000. If the value is less than 0, it is not sent to the server. The server then uses its default value of -1, which allows the AI to automatically determine the wait time. Note: The `semanticWaitDuration` parameter is invalid in `ARTCAICallTurnDetectionMode.Normal` mode. |
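The rules for `semanticWaitDuration` can be summarized in a small helper. This is a minimal sketch under stated assumptions: `TurnDetectionModeSketch` and `semanticWaitToSend` are hypothetical names, not SDK API, and clamping values above 10,000 is an assumption because the table gives only the valid range.

```swift
// Hypothetical stand-in for ARTCAICallTurnDetectionMode.
enum TurnDetectionModeSketch {
    case normal
    case semantic
}

/// Returns the wait time (ms) that would be sent to the server, or nil when
/// nothing is sent: either the value is negative (the server then decides
/// automatically) or the mode is Normal, where the parameter is invalid.
/// Assumption: values above 10,000 are clamped to the documented maximum.
func semanticWaitToSend(_ value: Int32, mode: TurnDetectionModeSketch) -> Int32? {
    guard mode == .semantic, value >= 0 else {
        return nil
    }
    return min(value, 10_000)
}
```

A nil result here corresponds to the case where the AI determines the wait time automatically or the parameter does not apply.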
ARTCAICallAgentVcrResult
The Video Content Recognition (VCR) detection result.
Property | Type | Description |
resultData | [String]? | All VCR detection results returned by the agent. |
stillFrameMotionResult | FrameMotionResult? | The VCR still frame detection result. |
invalidFrameMotionResult | FrameMotionResult? | The VCR invalid frame detection result. |
peopleCountResult | PeopleCountResult? | The VCR real-time people count detection result. |
equipmentResult | EquipmentResult? | The VCR electronic device detection result. |
headMotionResult | HeadMotionResult? | The VCR head motion detection result. |
ARTCAICallAgentVcrConfig
The VCR configuration.
Property | Type | Description |
data | [String]? | Caches the JSON object that you pass. This object is then used to generate a JSON string, which allows for custom extensions. |
stillFrameMotion | | The VCR still frame detection configuration. |
invalidFrameMotion | | The VCR invalid frame detection configuration. |
peopleCount | | The VCR real-time people count detection configuration. |
equipment | | The VCR electronic device detection configuration. |
headMotion | | The VCR head motion detection configuration. |
ARTCAICallAgentVcrBaseConfig
The basic VCR detection configuration.
Property | Type | Description |
enable | Bool | Specifies whether to enable this feature. Default value: true. |
ARTCAICallAgentVcrFrameMotionConfig
The VCR video frame detection configuration.
Property | Type | Description |
callbackDelay | Int32 | The delay in milliseconds before a callback is triggered. Default value: 3000. |
ARTCAICallExperimentalConfig
Experimental parameters for controlling specific logic policies.
Property | Type | Description |
rtcSdkParams | [String: Any]? | RTC SDK parameters. |
commonParams | [String: Any]? | Common parameters. |