This topic describes the data types that the Web SDK uses.
Data structure overview
Earlier SDK versions contain deprecated parameters and methods. Upgrade the SDK to the latest version. For more information, see Web SDK User Guide.
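Structure type | Data type | Description |
Enum | AICallAgentType | Agent type |
Enum | AICallAgentState | Agent status |
Enum | AICallSpeakingInterruptedReason | Reason why the agent was interrupted while speaking |
Enum | AICallVoiceprintResult | VAD feedback result |
Enum | AICallErrorCode | Error code |
Class | AICallAgentInfo | Agent runtime information |
Class | AICallVisionCustomCaptureRequest | Request to configure custom frame capture for a visual understanding agent |
Class | AICallSendTextToAgentRequest | Request to send a text message to an agent |
Class | AICallConfig | Configuration for starting an agent call |
Class | AICallTemplateConfig | Template configuration parameters for starting a call (deprecated) |
Class | AICallChatSyncConfig | Configuration for an associated chat agent session |
Class | AICallAgentShareConfig | Agent sharing configuration |
Class | AICallAgentConfig | Configuration for starting and running a call agent |
Class | AICallAgentAsrConfig | Speech recognition configuration |
Class | AICallAgentTtsConfig | Speech synthesis configuration |
Class | AICallAgentLlmConfig | Large language model configuration |
Class | AICallAgentAvatarConfig | Digital human configuration |
Class | AICallAgentInterruptConfig | Interruption configuration |
Class | AICallAgentVoiceprintConfig | Voiceprint noise reduction configuration |
Class | AICallAgentTurnDetectionConfig | Turn detection configuration |
Class | AICallAgentVcrResult | VCR detection result |
Class | AICallAgentVcrConfig | VCR configuration |
Class | AICallAgentVcrBaseConfig | Basic VCR detection configuration |
Class | AICallAgentVcrFrameMotionConfig | VCR frame motion detection configuration |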
Data structure details
Enum
AICallAgentType
Agent type.
Enumeration value | Value | Description |
VoiceAgent | 0 | Supports only interactive voice response and has no visual avatar. |
AvatarAgent | 1 | Has a virtual avatar and supports voice and visual interaction. |
VisionAgent | 2 | Primarily responsible for understanding and analyzing visual information. |
VideoAgent | 3 | Video call. Supports bidirectional video calls between the user and the agent. |
AICallAgentState
Agent status.
Enumeration value | Value | Description |
Listening | 1 | Listening |
Thinking | 2 | Thinking |
Speaking | 3 | Speaking |
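As a minimal sketch of how these state values might drive a status line in a UI: the enum below is a local mirror of the table, not an import from the SDK, and `stateLabel` is a hypothetical helper name.

```typescript
// Local mirror of the AICallAgentState table above (not imported from the SDK).
enum AICallAgentState {
  Listening = 1,
  Thinking = 2,
  Speaking = 3,
}

// Hypothetical helper: turn a state value into a short status line for the UI.
function stateLabel(state: AICallAgentState): string {
  switch (state) {
    case AICallAgentState.Listening:
      return "Agent is listening";
    case AICallAgentState.Thinking:
      return "Agent is thinking";
    case AICallAgentState.Speaking:
      return "Agent is speaking";
    default:
      // Defensive fallback for values not listed in the table.
      return "Unknown agent state";
  }
}
```

The default branch is there because a newer service version could, in principle, report a value that this local mirror does not list.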
AICallSpeakingInterruptedReason
The reason an agent was interrupted while speaking.
Enumeration value | Value | Description |
unknown | 0 | Unknown reason |
byWords | 1 | Interrupted because a specific word was detected. |
byVoice | 2 | Interrupted by voice. |
byInterruptSpeaking | 3 | Interrupted by a call to the interruptSpeaking API. |
bySpeechBroadCast | 4 | Interrupted by an active voice broadcast. |
byLlmQuery | 5 | Interrupted by an active LLM query. |
AICallVoiceprintResult
Voice Activity Detection (VAD) feedback result.
Enumeration value | Value | Description |
Off | 0 | Voiceprint noise reduction VAD is disabled, and AI VAD is disabled. |
Unregister | 1 | Voiceprint noise reduction VAD is enabled, but voiceprint registration is not complete. |
DetectedSpeaker | 2 | Voiceprint noise reduction VAD is enabled, and the primary speaker is detected. |
UndetectedSpeaker | 3 | Voiceprint noise reduction VAD is enabled, but the primary speaker is not detected. |
DetectedSpeakerWithAIVad | 4 | AI VAD is enabled, and the primary speaker is detected. |
UndetectedSpeakerWithAIVad | 5 | AI VAD is enabled, but the primary speaker is not detected. |
Unknown | 100 | Unknown |
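The values above fall into two families (voiceprint noise reduction VAD and AI VAD). The following sketch, which assumes a local mirror of the enum and two hypothetical helper names, shows one way to collapse them into simpler answers for application code.

```typescript
// Local mirror of the AICallVoiceprintResult table above (not the SDK's own export).
enum AICallVoiceprintResult {
  Off = 0,
  Unregister = 1,
  DetectedSpeaker = 2,
  UndetectedSpeaker = 3,
  DetectedSpeakerWithAIVad = 4,
  UndetectedSpeakerWithAIVad = 5,
  Unknown = 100,
}

// Hypothetical helper: true when either mode reports the primary speaker.
function isPrimarySpeakerDetected(result: AICallVoiceprintResult): boolean {
  return (
    result === AICallVoiceprintResult.DetectedSpeaker ||
    result === AICallVoiceprintResult.DetectedSpeakerWithAIVad
  );
}

// Hypothetical helper: which detection mode produced the result, if any.
function vadMode(result: AICallVoiceprintResult): "voiceprint" | "ai-vad" | "none" {
  switch (result) {
    case AICallVoiceprintResult.Unregister:
    case AICallVoiceprintResult.DetectedSpeaker:
    case AICallVoiceprintResult.UndetectedSpeaker:
      return "voiceprint";
    case AICallVoiceprintResult.DetectedSpeakerWithAIVad:
    case AICallVoiceprintResult.UndetectedSpeakerWithAIVad:
      return "ai-vad";
    default:
      // Off and Unknown carry no detection mode.
      return "none";
  }
}
```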
AICallErrorCode
Error code.
Enumeration value | Value | Description |
None | 0 | Success |
InvalidAction | -1 | Invalid operation |
InvalidParames | -2 | Invalid parameters |
NetworkError | -3 | Network error |
InternalError | -4 | Internal error |
BeginCallFailed | -10000 | Failed to start the call. |
ConnectionFailed | -10001 | A connection problem occurred. |
PublishFailed | -10002 | Stream ingest failed. |
SubscribeFailed | -10003 | Stream pulling failed. |
TokenExpired | -10004 | Call authentication expired. |
KickedByUserReplace | -10005 | The call cannot proceed because another user logged on with the same name. |
KickedBySystem | -10006 | The call cannot proceed because the user was kicked by the system. |
KickedByChannelTerminated | -10007 | The call cannot proceed because the channel was destroyed. |
LocalDeviceException | -10008 | The call cannot proceed due to a local device issue. |
AgentLeaveChannel | -10101 | The agent left the channel (the agent ended the call). |
AgentPullFailed | -10102 | The agent failed to pull the stream. |
AgentASRFailed | -10103 | The agent's ASR failed. |
AvatarServiceFailed | -10201 | Failed to start the digital human service. |
AvatarRoutesExhausted | -10202 | The number of concurrent digital human ingest endpoints was exceeded. |
AgentSubscriptionRequired | -10203 | The call could not be initiated because the daily free trial quota was exceeded. |
AgentNotFound | -10204 | The agent was not found (the agent ID does not exist). |
ChatTextMessageSendFailed | -10301 | Failed to send the text message. |
ChatTextMessageReceiveFailed | -10302 | Failed to receive the text message. |
ChatVoiceRecordFailed | -10310 | Failed to record the voice message. |
ChatVoiceMessageSendFailed | -10311 | Failed to send the voice message. |
ChatVoiceMessageReceiveFailed | -10312 | Failed to receive the voice message. |
ChatPlayMessageReceiveFailed | -10321 | Failed to receive the playback message. |
ChatLogNotFound | -10331 | The chat history was not found. |
ChatAttachmentUploading | -10332 | The attachment is still uploading. The message can be sent only after the upload is complete. |
UnknowError | -40000 | Unknown error |
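The numeric values in this table appear to cluster by area (general, call, agent, service, chat). The sketch below buckets a raw code by those ranges. The grouping and the names `errorGroup` and `needsTokenRefresh` are assumptions made for illustration; the SDK does not document such a grouping.

```typescript
type ErrorGroup = "success" | "general" | "call" | "agent" | "service" | "chat" | "unknown";

// Hypothetical helper: bucket a raw AICallErrorCode value by the numeric
// ranges visible in the table above.
function errorGroup(code: number): ErrorGroup {
  if (code === 0) return "success";
  if (code <= -1 && code >= -4) return "general";
  if (code <= -10000 && code >= -10008) return "call";
  if (code <= -10101 && code >= -10103) return "agent";
  if (code <= -10201 && code >= -10204) return "service";
  if (code <= -10301 && code >= -10332) return "chat";
  // Includes UnknowError (-40000) and any value not listed in the table.
  return "unknown";
}

// Hypothetical helper: TokenExpired (-10004) is the one code in the table
// that points at call authentication.
function needsTokenRefresh(code: number): boolean {
  return code === -10004;
}
```

Bucketing by range keeps UI messaging short (for example, one message per group) while the raw code can still be logged for diagnostics.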
Class
AICallAgentInfo
Agent runtime information.
Property | Type | Description |
agentType | AICallAgentType | The agent type. |
channelId | string | The ID of the RTC channel where the agent is located. |
userId | string | The unique identifier for the agent to enter the RTC channel. |
instanceId | string | The ID of the instance where the agent is running. |
reqId | string | The request ID for starting the current agent. |
AICallVisionCustomCaptureRequest
The request model for configuring custom frame capture for a visual understanding agent.
Property | Type | Description |
text | string | The text parameter for requesting the multimodal large model. |
isSingle | boolean | Specifies whether to capture a single frame. |
eachDuration | number | The interval between frame captures, in seconds. |
num | number | The number of images to capture each time. |
duration | number | The duration for continuous frame capture, in seconds. This parameter is effective only for continuous frame capture. |
userData (Optional) | string | A JSON string that contains custom business information. |
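To make the difference between single and continuous capture concrete, here is a sketch that builds both kinds of request. The interface is a local mirror of the table; `singleCapture` and `continuousCapture` are hypothetical helpers, and the zero values used for the timing fields in the single-frame case are placeholders, since the table does not say what the SDK expects there.

```typescript
// Local mirror of the AICallVisionCustomCaptureRequest table above.
interface VisionCaptureRequestSketch {
  text: string;
  isSingle: boolean;
  eachDuration: number; // seconds between frame captures
  num: number;          // images per capture
  duration: number;     // seconds; effective only for continuous capture
  userData?: string;    // JSON string with custom business information
}

// Hypothetical builder for a one-off capture. The timing fields are set to 0
// as placeholders (an assumption of this sketch).
function singleCapture(text: string, num: number = 1): VisionCaptureRequestSketch {
  return { text, isSingle: true, eachDuration: 0, num, duration: 0 };
}

// Hypothetical builder for continuous capture; userData is serialized to JSON.
function continuousCapture(
  text: string,
  eachDuration: number,
  duration: number,
  userData?: Record<string, unknown>
): VisionCaptureRequestSketch {
  const request: VisionCaptureRequestSketch = {
    text,
    isSingle: false,
    eachDuration,
    num: 1,
    duration,
  };
  if (userData !== undefined) {
    request.userData = JSON.stringify(userData);
  }
  return request;
}
```

For example, `continuousCapture("Describe the scene", 2, 10)` describes a request that captures one image every 2 seconds for 10 seconds.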
AICallSendTextToAgentRequest
The request model for sending a text message to an agent.
Property | Type | Description |
text | string | The text message to query the agent, for example, "What is this?". |
AICallConfig
Configuration for starting an agent call.
Property | Type | Description |
agentId | string | The agent ID. |
agentType | AICallAgentType | The agent type. |
agentUserId (Optional) | string | The UID of the agent. If this is left empty, the service assigns a UID. |
region | string | The region where the agent service is located. |
userId | string | The current user ID. |
userJoinToken | string | The token for the current user to join the meeting. |
userData (Optional) | string | Custom user information that is passed to the agent. We recommend that you use a JSON string. |
chatSyncConfig (Optional) | AICallChatSyncConfig | Configuration for the associated chat agent. |
agentConfig (Optional) | AICallAgentConfig | The agentConfig parameter used to start the call. |
templateConfig (Optional) | AICallTemplateConfig | Deprecated. Use agentConfig instead. |
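The following sketch assembles a call configuration from the fields in this table. `CallConfigSketch` is a local stand-in for AICallConfig (only a subset of fields, with a simplified `agentConfig`), `buildCallConfig` is a hypothetical helper, and every value in the example is a placeholder.

```typescript
// Local sketch of the AICallConfig fields listed above; not the SDK's own type.
interface CallConfigSketch {
  agentId: string;
  agentType: number; // an AICallAgentType value, for example 0 for VoiceAgent
  agentUserId?: string;
  region: string;
  userId: string;
  userJoinToken: string;
  userData?: string;
  agentConfig?: { agentGreeting?: string; volume?: number };
}

// Hypothetical builder: serializes userData to a JSON string, as the table recommends.
function buildCallConfig(
  base: Omit<CallConfigSketch, "userData">,
  userData?: Record<string, unknown>
): CallConfigSketch {
  const config: CallConfigSketch = { ...base };
  if (userData !== undefined) {
    config.userData = JSON.stringify(userData);
  }
  return config;
}

// All values below are placeholders.
const exampleConfig = buildCallConfig(
  {
    agentId: "your-agent-id",
    agentType: 0,
    region: "your-region",
    userId: "user-123",
    userJoinToken: "token-from-your-server",
    agentConfig: { agentGreeting: "Hello, how can I help?", volume: 100 },
  },
  { orderId: "A-1001" }
);
```

Leaving `agentUserId` unset matches the table's statement that the service then assigns a UID.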
AICallTemplateConfig (Deprecated)
The templateConfig parameter for starting a call.
This class is deprecated in versions 2.5 and later. In the latest version, use AICallAgentConfig instead.
Property | Type | Description |
agentGreeting (Optional) | string | The agent's welcome message. If this is left empty, the value configured for the agent is used. |
userOnlineTimeout | number | The timeout period for the agent to close the task if the user does not join the meeting. If the value is less than 0, the server-side default value of 60s is used. |
userOfflineTimeout | number | The timeout period for the agent to close the task after the user leaves the meeting. If the value is less than 0, the server-side default value of 5s is used. |
workflowOverrideParams (Optional) | object | Workflow override parameters. |
bailianAppParams (Optional) | object | Parameters of the Model Studio Application Center. |
asrMaxSilence | number | The threshold for voice activity detection. A value of less than 0 indicates that the server-side default value of 400 ms is used. Valid values: 200 ms to 1200 ms. |
volume | number | The agent's speaking volume. Valid values: 0 to 400. Output volume = Voice output volume in the workflow × volume / 100. A value of less than 0 indicates that the server-side default value of 100 is used. |
vadLevel | number | The sensitivity parameter for AI VAD. Default value: 3. Valid values: [0, 10]. |
enableVoiceInterrupt | boolean | Specifies whether to enable intelligent interruption. |
agentVoiceId (Optional) | string | The ID of the agent's voice timbre. If this is left empty, the value configured for the agent is used. |
enableIntelligentSegment | boolean | Specifies whether to enable intelligent sentence merging. |
useVoiceprint | boolean | Specifies whether to use voiceprint noise reduction for the current sentence. |
voiceprintId (Optional) | string | The voiceprint ID. If this is not empty, voiceprint noise reduction is enabled for the current call. |
agentMaxIdleTime | number | The maximum idle time for the agent, in seconds. A value of less than 0 indicates that the server-side default value of 600s is used. |
llmHistoryLimit | number | The maximum number of historical conversation rounds to retain for the LLM or Multimodal LLM. A value of less than 0 indicates that the server-side default value of 10 is used. |
enablePushToTalk | boolean | Specifies whether to enable push-to-talk mode. |
agentGracefulShutdown | boolean | Specifies whether to enable graceful shutdown. If enabled, the agent stops after broadcasting the current sentence. |
agentAvatarId (Optional) | string | The ID of the digital human model. If this is left empty, the value configured for the agent is used. |
asrLanguageId (Optional) | string | The ASR language ID. If this is left empty, the value configured for the agent is used. |
wakeUpQuery (Optional) | string | The user's instruction before the call starts. The agent responds immediately after the call starts. |
llmSystemPrompt (Optional) | string | The system prompt for the LLM, for example, "You are a friendly and helpful assistant...". Note: This is not supported if the LLM node is a Model Studio workflow. |
asrHotWords (Optional) | Array<string> | A list of ASR hotwords. |
interruptWords (Optional) | Array<string> | Specific words or phrases that trigger conversation interruption, such as "Excuse me" or "I see". |
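Because this class is deprecated in favor of AICallAgentConfig, existing code may need to move flat fields into the nested sub-configurations (asrConfig, ttsConfig, llmConfig, interruptConfig). The sketch below shows one possible mapping for a subset of fields, based only on matching property names across the two tables. `LegacyTemplateConfig`, `AgentConfigSketch`, and `migrateTemplateConfig` are hypothetical names, not SDK exports.

```typescript
// Subset of the deprecated flat AICallTemplateConfig fields (from the table above).
interface LegacyTemplateConfig {
  agentGreeting?: string;
  volume?: number;
  asrMaxSilence?: number;
  vadLevel?: number;
  asrLanguageId?: string;
  asrHotWords?: string[];
  agentVoiceId?: string;
  llmHistoryLimit?: number;
  llmSystemPrompt?: string;
  enableVoiceInterrupt?: boolean;
  interruptWords?: string[];
}

// Nested shape that follows the AICallAgentConfig property names in this topic.
interface AgentConfigSketch {
  agentGreeting?: string;
  volume?: number;
  asrConfig: { asrMaxSilence?: number; vadLevel?: number; asrLanguageId?: string; asrHotWords?: string[] };
  ttsConfig: { agentVoiceId?: string };
  llmConfig: { llmHistoryLimit?: number; llmSystemPrompt?: string };
  interruptConfig: { enableVoiceInterrupt?: boolean; interruptWords?: string[] };
}

// Hypothetical migration: copy each flat field into the sub-config that
// declares a property with the same name.
function migrateTemplateConfig(old: LegacyTemplateConfig): AgentConfigSketch {
  return {
    agentGreeting: old.agentGreeting,
    volume: old.volume,
    asrConfig: {
      asrMaxSilence: old.asrMaxSilence,
      vadLevel: old.vadLevel,
      asrLanguageId: old.asrLanguageId,
      asrHotWords: old.asrHotWords,
    },
    ttsConfig: { agentVoiceId: old.agentVoiceId },
    llmConfig: { llmHistoryLimit: old.llmHistoryLimit, llmSystemPrompt: old.llmSystemPrompt },
    interruptConfig: {
      enableVoiceInterrupt: old.enableVoiceInterrupt,
      interruptWords: old.interruptWords,
    },
  };
}
```

Note that the deprecated table treats negative numbers as "use the server-side default", so a real migration would likely drop such values rather than copy them across.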
AICallAgentConfig
Configuration for starting and running a call agent.
Property | Type | Description |
agentGreeting (Optional) | string | The agent's welcome message. If this is left empty, the value configured for the agent is used. The message can be up to 100 characters long. |
wakeUpQuery (Optional) | string | The user's instruction before the call starts. The agent responds immediately after the call starts. |
agentMaxIdleTime | number | The maximum idle time for the agent, in seconds. The agent automatically goes offline if the time is exceeded. Default value: 600s. |
userOnlineTimeout | number | The timeout period for the agent to close the task if the user does not join the meeting. Default value: 60s. |
userOfflineTimeout | number | The timeout period for the agent to close the task after the user leaves the meeting. Default value: 5s. |
enablePushToTalk | boolean | Specifies whether to enable push-to-talk mode. |
agentGracefulShutdown | boolean | Specifies whether to enable graceful shutdown. If enabled, the agent stops after broadcasting the current sentence. |
volume | number | The agent's speaking volume. Valid values: 0 to 400. Default value: 100. |
workflowOverrideParams | JSONObject | Workflow override parameters. |
enableIntelligentSegment | boolean | Specifies whether to enable intelligent sentence segmentation. |
asrConfig | AICallAgentAsrConfig | Speech recognition configuration. |
ttsConfig | AICallAgentTtsConfig | Speech synthesis configuration. |
llmConfig | AICallAgentLlmConfig | Large language model configuration. |
avatarConfig | AICallAgentAvatarConfig | Digital human configuration. |
interruptConfig | AICallAgentInterruptConfig | Interruption configuration. |
voiceprintConfig | AICallAgentVoiceprintConfig | Voiceprint noise reduction configuration. |
turnDetectionConfig | AICallAgentTurnDetectionConfig | Turn detection configuration. |
experimentalConfig | JSONObject | Custom configuration for experimental features that are not yet productized. |
vcrConfig | AICallAgentVcrConfig | VCR configuration. |
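Two limits in this table are easy to check before a call starts: the 100-character greeting and the 0 to 400 volume range. The sketch below checks both and also works through the output-volume formula given for the deprecated AICallTemplateConfig.volume (output volume = workflow voice volume × volume / 100), on the assumption that the same formula applies here. Both function names are hypothetical.

```typescript
// Hypothetical check of two limits stated in the table above.
function checkAgentConfig(config: { agentGreeting?: string; volume?: number }): string[] {
  const problems: string[] = [];
  // Note: .length counts UTF-16 code units, which may differ from how the
  // service counts "characters".
  if (config.agentGreeting !== undefined && config.agentGreeting.length > 100) {
    problems.push("agentGreeting exceeds 100 characters");
  }
  if (config.volume !== undefined && (config.volume < 0 || config.volume > 400)) {
    problems.push("volume must be between 0 and 400");
  }
  return problems;
}

// Worked formula: workflow voice volume x volume / 100.
// Assumed to apply to AICallAgentConfig.volume as well.
function outputVolume(workflowVolume: number, volume: number): number {
  return (workflowVolume * volume) / 100;
}
```

With a workflow voice volume of 80 and `volume` set to 150, `outputVolume(80, 150)` evaluates to 120.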
AICallChatSyncConfig
Configuration parameters for an associated chat agent session.
Property | Type | Description |
sessionId | string | The ID of the associated chat agent session. |
agentId | string | The ID of the associated chat agent (must be in the same account and region). |
receiverId | string | The user ID for the associated chat agent session. |
AICallAgentShareConfig
Agent sharing configuration.
Property | Type | Description |
shareId (Optional) | string | The agent sharing ID. |
agentType | AICallAgentType | The agent workload type. |
expireTime (Optional) | Date | The expiration time. |
region (Optional) | string | The region where the agent is located. |
templateConfig (Optional) | string | The template configuration (JSON string). |
userData (Optional) | string | Custom user information that is passed to the agent. |
AICallAgentAsrConfig
Automatic Speech Recognition (ASR) configuration.
Property | Type | Description |
asrLanguageId (Optional) | string | The ASR language ID. If this is left empty, the value configured for the agent is used. |
asrMaxSilence | number | The threshold for voice activity detection. If the duration of silence exceeds this threshold, a sentence break is detected. Default value: 400 ms. Valid values: 200 ms to 1200 ms. |
asrHotWords (Optional) | string[] | A list of ASR hotwords. Limits: up to 500 words, with each word containing no more than 10 characters. |
vadLevel | number | The sensitivity parameter for AI VAD. Default value: 3. Valid values: [0, 10]. |
customParams | string | When using a self-managed ASR, pass runtime parameters in the URL parameter format, for example, "mode=fast&sample=16000&format=wav". |
vadDuration | number | The minimum duration threshold for voice activity detection, used to control interruption sensitivity. A value of 0 (default) disables this feature. Valid values: 200 to 2000 milliseconds. A common range is [200, 500], which corresponds to 1 to 4 characters. If you set this to a value less than 0, the value is not sent to the server (the server disables this feature by default). |
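The ranges in this table can be checked on the client before the configuration is sent. The following is a minimal sketch under the assumption that out-of-range values are worth flagging early; `AsrConfigSketch` and `checkAsrConfig` are hypothetical names, and the SDK or server may handle such values differently.

```typescript
// Local sketch of the numeric and list fields from the table above.
interface AsrConfigSketch {
  asrMaxSilence?: number; // milliseconds
  asrHotWords?: string[];
  vadLevel?: number;
  vadDuration?: number;   // milliseconds; 0 disables the feature
}

// Hypothetical validator: returns a list of human-readable problems.
function checkAsrConfig(config: AsrConfigSketch): string[] {
  const problems: string[] = [];
  const outside = (value: number, low: number, high: number): boolean =>
    value < low || value > high;

  if (config.asrMaxSilence !== undefined && outside(config.asrMaxSilence, 200, 1200)) {
    problems.push("asrMaxSilence must be 200 to 1200 ms");
  }
  if (config.vadLevel !== undefined && outside(config.vadLevel, 0, 10)) {
    problems.push("vadLevel must be 0 to 10");
  }
  // 0 disables vadDuration and negative values are not sent, so only
  // positive values are range-checked here.
  if (config.vadDuration !== undefined && config.vadDuration > 0 && outside(config.vadDuration, 200, 2000)) {
    problems.push("vadDuration must be 200 to 2000 ms");
  }
  const words = config.asrHotWords || [];
  if (words.length > 500) {
    problems.push("asrHotWords allows up to 500 words");
  }
  if (words.some((word) => word.length > 10)) {
    problems.push("each hotword must be 10 characters or fewer");
  }
  return problems;
}
```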
AICallAgentTtsConfig
Text-to-Speech (TTS) configuration.
Property | Type | Description |
agentVoiceId (Optional) | string | The ID of the agent's voice timbre. If this is left empty, the value configured for the agent is used. |
pronunciationRules | JSONObject[] | An array of pronunciation rules. Up to 20 rules are supported. If this is undefined or an empty array, no rules are used. |
speechRate | number | The TTS playback speed. All TTS types are supported. Valid values: [0.5, 2.0]. Default value: 1.0. If you set this to a value less than 0, the value is not sent to the server (the value configured in the console is used). |
languageId | string | The TTS language code. This is valid only when the TTS type is MiniMax. |
emotion | string | The TTS emotion type. This is valid only when the TTS type is MiniMax. |
modelId | string | The TTS model ID. Currently, only MiniMax is supported. Valid values: speech-01-turbo, speech-02-turbo. |
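As an illustration of the speechRate rules above, the sketch below returns `undefined` for negative input (the table says such values are not sent, so the console setting applies) and otherwise clamps the value into [0.5, 2.0]. Clamping is a design choice of this sketch rather than documented SDK behavior, and both helper names are hypothetical.

```typescript
// Hypothetical helper: undefined means "omit the field and use the console value".
function normalizeSpeechRate(rate: number): number | undefined {
  if (rate < 0) {
    return undefined;
  }
  // Clamp into the documented valid range [0.5, 2.0].
  return Math.min(2.0, Math.max(0.5, rate));
}

// Hypothetical check for the 20-rule limit; rule contents are left opaque
// because their shape is not described in this topic.
function tooManyPronunciationRules(rules?: object[]): boolean {
  return rules !== undefined && rules.length > 20;
}
```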
AICallAgentLlmConfig
Large Language Model (LLM) configuration.
Property | Type | Description |
llmHistoryLimit | number | The maximum number of historical conversation rounds to retain. Default value: 10. |
llmSystemPrompt (Optional) | string | The system prompt for the LLM. |
bailianAppParams | JSONObject | Model Studio application parameters. |
llmCompleteReply | boolean | Specifies whether to send the complete LLM result. Note: If this is enabled, the complete LLM result is returned through the llmReplyCompleted event callback after the result is generated. |
openAIExtraQuery (Optional) | string | Additional query parameters for the OpenAI protocol-based LLM. Note: Parameters must be in the key=value format. Use ampersands (&) to separate multiple parameters. All values must be strings. |
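The openAIExtraQuery format (key=value pairs joined by ampersands, all values strings) can be produced from a plain object. The sketch below is one way to do that; `buildOpenAIExtraQuery` is a hypothetical helper, the parameter names in the usage note are illustrative only, and rejecting `&` or `=` inside keys and values is a precaution of this sketch, since the table does not describe any escaping.

```typescript
// Hypothetical helper: join string parameters as key=value pairs with "&".
function buildOpenAIExtraQuery(params: Record<string, string>): string {
  return Object.keys(params)
    .map((key) => {
      const value = String(params[key]);
      // Guard against characters that would make the pair ambiguous.
      if (/[&=]/.test(key) || /[&=]/.test(value)) {
        throw new Error("Keys and values must not contain '&' or '=': " + key);
      }
      return key + "=" + value;
    })
    .join("&");
}
```

For example, `buildOpenAIExtraQuery({ temperature: "0.7", top_p: "0.9" })` yields `temperature=0.7&top_p=0.9`. Whether a given parameter is honored depends on the LLM behind the agent.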
AICallAgentAvatarConfig
Digital human configuration.
Property | Type | Description |
agentAvatarId (Optional) | string | The ID of the digital human model. If this is left empty, the value configured for the agent is used. |
AICallAgentInterruptConfig
Interruption configuration.
Property | Type | Description |
enableVoiceInterrupt | boolean | Specifies whether to enable intelligent interruption. |
interruptWords (Optional) | string[] | Specific words or phrases that trigger conversation interruption. |
AICallAgentVoiceprintConfig
Voiceprint noise reduction configuration.
Property | Type | Description |
useVoiceprint | boolean | Specifies whether to use voiceprint noise reduction for the current sentence. |
voiceprintId (Optional) | string | The voiceprint ID. If this is not empty, voiceprint noise reduction is enabled for the current call. |
AICallAgentTurnDetectionConfig
Turn detection configuration.
Property | Type | Description |
turnEndWords (Optional) | string[] | Specific words for sentence breaking, for example, "Over" or "I'm done." |
mode | AICallTurnDetectionMode | The mode for determining whether the user has finished speaking. The default is Semantic, which uses AI to determine whether the user has finished speaking based on semantics. |
semanticWaitDuration | number | The custom wait time for semantic sentence breaking, in milliseconds. Valid values: [0, 10000]. If you set this to a value less than 0, the value is not sent to the server (the server-side default value of -1 is used, and the AI automatically determines the appropriate wait time). Note: If mode is set to Normal, semanticWaitDuration has no effect. |
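The interaction between mode and semanticWaitDuration can be summarized as a small decision function. In the sketch below, the string union `"Semantic" | "Normal"` stands in for AICallTurnDetectionMode, whose actual members and values are not listed in this topic. `effectiveSemanticWait` is a hypothetical helper, and clamping to 10000 is a choice of this sketch.

```typescript
// Stand-in for AICallTurnDetectionMode (an assumption; the real enum is not shown here).
type TurnDetectionModeSketch = "Semantic" | "Normal";

interface TurnDetectionConfigSketch {
  turnEndWords?: string[];
  mode: TurnDetectionModeSketch;
  semanticWaitDuration: number; // milliseconds
}

// Hypothetical helper: describe what wait time would take effect.
function effectiveSemanticWait(config: TurnDetectionConfigSketch): number | "auto" | "ignored" {
  if (config.mode === "Normal") {
    // In Normal mode the field has no effect.
    return "ignored";
  }
  if (config.semanticWaitDuration < 0) {
    // Negative values are not sent; the AI chooses the wait time.
    return "auto";
  }
  // Keep the value inside the documented range [0, 10000].
  return Math.min(config.semanticWaitDuration, 10000);
}
```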
AICallAgentVcrResult
VCR detection result.
Property | Type | Description |
data | JSONObject | All VCR detection results returned by the agent. |
stillFrameMotion | FrameMotionResult | Still frame detection result. |
invalidFrameMotion | FrameMotionResult | Invalid frame detection result. |
peopleCount | PeopleCountResult | People count detection result. |
equipment | EquipmentResult | Electronic device detection result. |
headMotion | HeadMotionResult | Head motion detection result. |
AICallAgentVcrConfig
VCR configuration.
Property | Type | Description |
data | JSONObject | When a user passes a JSON object, it is cached. The object is then used to generate a JSON string, which allows for custom extensions. |
stillFrameMotion | AICallAgentVcrFrameMotionConfig | VCR still frame detection configuration. |
invalidFrameMotion | AICallAgentVcrFrameMotionConfig | VCR invalid frame detection configuration. |
peopleCount | AICallAgentVcrBaseConfig | VCR real-time people count detection configuration. |
equipment | AICallAgentVcrBaseConfig | VCR electronic device detection configuration. |
headMotion | AICallAgentVcrBaseConfig | VCR head motion detection configuration. |
AICallAgentVcrBaseConfig
Basic VCR detection configuration.
Property | Type | Description |
enable | boolean | Specifies whether to enable the feature. |
AICallAgentVcrFrameMotionConfig
VCR frame motion detection configuration.
Property | Type | Description |
callbackDelay | number | The delay before the callback is triggered, in milliseconds. Default value: 3000 ms. |