Data structures

Last Updated: Aug 12, 2025

This topic describes the data types that the Web SDK uses.

Data structure overview

Note

Earlier SDK versions contain deprecated parameters and methods. Upgrade the SDK to the latest version. For more information, see Web SDK User Guide.

Structure type

Data type

Description

Enum

AICallAgentType

Agent type

AICallAgentState

Agent status

AICallSpeakingInterruptedReason

Reason why the agent was interrupted while speaking

AICallVoiceprintResult

VAD feedback result

AICallErrorCode

Error code

Class

AICallAgentInfo

Agent runtime information

AICallVisionCustomCaptureRequest

Request to configure custom frame capture for a visual understanding agent

AICallSendTextToAgentRequest

Request to send a text message to an agent

AICallConfig

Configuration for starting an agent call

AICallTemplateConfig (Deprecated)

Template configuration parameters for starting a call

AICallChatSyncConfig

Configuration for an associated chat agent session

AICallAgentShareConfig

Agent sharing configuration

AICallAgentConfig

Configuration for starting and running a call agent

AICallAgentAsrConfig

Speech recognition configuration

AICallAgentTtsConfig

Speech synthesis configuration

AICallAgentLlmConfig

Large language model configuration

AICallAgentAvatarConfig

Digital human configuration

AICallAgentInterruptConfig

Interruption configuration

AICallAgentVoiceprintConfig

Voiceprint noise reduction configuration

AICallAgentTurnDetectionConfig

Turn detection configuration

AICallAgentVcrResult

VCR detection result

AICallAgentVcrConfig

VCR configuration

AICallAgentVcrBaseConfig

Basic VCR detection configuration

AICallAgentVcrFrameMotionConfig

VCR frame motion detection configuration

Data structure details

Enum

AICallAgentType

Agent type.

Enumeration value

Value

Description

VoiceAgent

0

Supports only interactive voice response and has no visual avatar.

AvatarAgent

1

Has a virtual avatar and supports voice and visual interaction.

VisionAgent

2

Primarily responsible for understanding and analyzing visual information.

VideoAgent

3

Video call. Supports bidirectional video calls between the user and the agent.

AICallAgentState

Agent status.

Enumeration value

Value

Description

Listening

1

Listening

Thinking

2

Thinking

Speaking

3

Speaking

AICallSpeakingInterruptedReason

The reason an agent was interrupted while speaking.

Enumeration value

Value

Description

unknown

0

Unknown reason

byWords

1

Interrupted because a specific word was detected.

byVoice

2

Interrupted by voice.

byInterruptSpeaking

3

Interrupted by a call to the interruptSpeaking API.

bySpeechBroadCast

4

Interrupted by an active voice broadcast.

byLlmQuery

5

Interrupted by an active LLM query.

AICallVoiceprintResult

Voice Activity Detection (VAD) feedback result.

Enumeration value

Value

Description

Off

0

Voiceprint noise reduction VAD is disabled, and AI VAD is disabled.

Unregister

1

Voiceprint noise reduction VAD is enabled, but voiceprint registration is not complete.

DetectedSpeaker

2

Voiceprint noise reduction VAD is enabled, and the primary speaker is detected.

UndetectedSpeaker

3

Voiceprint noise reduction VAD is enabled, but the primary speaker is not detected.

DetectedSpeakerWithAIVad

4

AI VAD is enabled, and the primary speaker is detected.

UndetectedSpeakerWithAIVad

5

AI VAD is enabled, but the primary speaker is not detected.

Unknown

100

Unknown
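The values above combine two facts: which VAD mode is active and whether the primary speaker was detected. A minimal sketch of separating the two is shown below. The enum mirrors the numeric values in the table, while `VoiceprintResultSketch`, `DecodedVoiceprintResult`, and `decodeVoiceprintResult` are hypothetical names introduced for illustration and are not part of the SDK.

```typescript
// Numeric values are copied from the table above. The enum and the decoded
// shape are local, hypothetical stand-ins, not the SDK's own types.
enum VoiceprintResultSketch {
  Off = 0,
  Unregister = 1,
  DetectedSpeaker = 2,
  UndetectedSpeaker = 3,
  DetectedSpeakerWithAIVad = 4,
  UndetectedSpeakerWithAIVad = 5,
  Unknown = 100,
}

interface DecodedVoiceprintResult {
  vad: "off" | "voiceprint" | "aiVad" | "unknown";
  speakerDetected: boolean | null; // null when detection does not apply
}

function decodeVoiceprintResult(value: number): DecodedVoiceprintResult {
  switch (value) {
    case VoiceprintResultSketch.Off:
      return { vad: "off", speakerDetected: null };
    case VoiceprintResultSketch.Unregister:
      // Enabled, but voiceprint registration is not complete yet.
      return { vad: "voiceprint", speakerDetected: null };
    case VoiceprintResultSketch.DetectedSpeaker:
      return { vad: "voiceprint", speakerDetected: true };
    case VoiceprintResultSketch.UndetectedSpeaker:
      return { vad: "voiceprint", speakerDetected: false };
    case VoiceprintResultSketch.DetectedSpeakerWithAIVad:
      return { vad: "aiVad", speakerDetected: true };
    case VoiceprintResultSketch.UndetectedSpeakerWithAIVad:
      return { vad: "aiVad", speakerDetected: false };
    default:
      // Covers Unknown (100) and any value not listed in the table.
      return { vad: "unknown", speakerDetected: null };
  }
}
```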

AICallErrorCode

Error code.

Enumeration value

Value

Description

None

0

Success

InvalidAction

-1

Invalid operation

InvalidParames

-2

Invalid parameters

NetworkError

-3

Network error

InternalError

-4

Internal error

BeginCallFailed

-10000

Failed to start the call.

ConnectionFailed

-10001

A connection problem occurred.

PublishFailed

-10002

Stream ingest failed.

SubscribeFailed

-10003

Stream pulling failed.

TokenExpired

-10004

Call authentication expired.

KickedByUserReplace

-10005

The call cannot proceed because another user logged on with the same name.

KickedBySystem

-10006

The call cannot proceed because the user was kicked by the system.

KickedByChannelTerminated

-10007

The call cannot proceed because the channel was destroyed.

LocalDeviceException

-10008

The call cannot proceed due to a local device issue.

AgentLeaveChannel

-10101

The agent left the channel (the agent ended the call).

AgentPullFailed

-10102

The agent failed to pull the stream.

AgentASRFailed

-10103

The agent's ASR failed.

AvatarServiceFailed

-10201

Failed to start the digital human service.

AvatarRoutesExhausted

-10202

The number of concurrent digital human ingest endpoints was exceeded.

AgentSubscriptionRequired

-10203

The call could not be initiated because the daily free trial quota was exceeded.

AgentNotFound

-10204

The agent was not found (the agent ID does not exist).

ChatTextMessageSendFailed

-10301

Failed to send the text message.

ChatTextMessageReceiveFailed

-10302

Failed to receive the text message.

ChatVoiceRecordFailed

-10310

Failed to record the voice message.

ChatVoiceMessageSendFailed

-10311

Failed to send the voice message.

ChatVoiceMessageReceiveFailed

-10312

Failed to receive the voice message.

ChatPlayMessageReceiveFailed

-10321

Failed to receive the playback message.

ChatLogNotFound

-10331

The chat history was not found.

ChatAttachmentUploading

-10332

The attachment is still uploading. The message can be sent only after the upload is complete.

UnknowError

-40000

Unknown error
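The codes listed above appear to fall into numeric bands, which can be handy when deciding how to react to an error (retry, re-authenticate, show a message). The sketch below groups codes by band. The banding and the category names are inferred from the table and are assumptions, not something the SDK defines. `categorizeAICallError` is a hypothetical helper.

```typescript
// Hypothetical helper: the category names and the banding are inferred from
// the table above and are not part of the SDK.
type AICallErrorCategory =
  | "none"
  | "general"
  | "call"
  | "agentRuntime"
  | "agentStartup"
  | "chat"
  | "unknown";

function categorizeAICallError(code: number): AICallErrorCategory {
  if (code === 0) return "none"; // None
  if (code <= -1 && code >= -4) return "general"; // InvalidAction .. InternalError
  if (code <= -10000 && code >= -10008) return "call"; // BeginCallFailed .. LocalDeviceException
  if (code <= -10101 && code >= -10103) return "agentRuntime"; // AgentLeaveChannel .. AgentASRFailed
  if (code <= -10201 && code >= -10204) return "agentStartup"; // AvatarServiceFailed .. AgentNotFound
  if (code <= -10301 && code >= -10332) return "chat"; // Chat* codes
  return "unknown"; // UnknowError (-40000) and anything not listed
}
```

For example, a caller might refresh the join token when the code is -10004 (TokenExpired) and show a generic failure message for everything in the "unknown" category.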

Class

AICallAgentInfo

Agent runtime information.

Property

Type

Description

agentType

AICallAgentType

The agent type.

channelId

string

The ID of the RTC channel where the agent is located.

userId

string

The unique identifier for the agent to enter the RTC channel.

instanceId

string

The ID of the instance where the agent is running.

reqId

string

The request ID for starting the current agent.

AICallVisionCustomCaptureRequest

The request model for configuring custom frame capture for a visual understanding agent.

Property

Type

Description

text

string

The text parameter for requesting the multimodal large model.

isSingle

boolean

Specifies whether to capture a single frame.

eachDuration

number

The interval between frame captures, in seconds.

num

number

The number of images to capture each time.

duration

number

The duration for continuous frame capture, in seconds. This parameter is effective only for continuous frame capture.

userData (Optional)

string

A JSON string that contains custom business information.
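As an illustration of how the properties above fit together, the following sketch builds a continuous-capture request. The interface is a local mirror of the table and the helper `continuousCapture` is hypothetical. The SDK ships its own AICallVisionCustomCaptureRequest type, and the field values shown are placeholders.

```typescript
// Local mirror of the property table above, for illustration only.
interface VisionCustomCaptureRequestSketch {
  text: string;
  isSingle: boolean;
  eachDuration: number; // seconds between frame captures
  num: number;          // images per capture
  duration: number;     // seconds, only effective for continuous capture
  userData?: string;    // JSON string with custom business information
}

// Hypothetical helper that fills in a continuous-capture request and
// serializes optional business data to the JSON string the table asks for.
function continuousCapture(
  text: string,
  eachDuration: number,
  num: number,
  duration: number,
  business?: Record<string, unknown>
): VisionCustomCaptureRequestSketch {
  const request: VisionCustomCaptureRequestSketch = {
    text,
    isSingle: false,
    eachDuration,
    num,
    duration,
  };
  if (business) {
    request.userData = JSON.stringify(business);
  }
  return request;
}

// Capture 2 images every 3 seconds for 30 seconds.
const captureRequest = continuousCapture("What is this?", 3, 2, 30, { orderId: "A-1" });
```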

AICallSendTextToAgentRequest

The request model for sending a text message to an agent.

Property

Type

Description

text

string

The text message to query the agent, for example, "What is this?".

AICallConfig

Configuration for starting an agent call.

Property

Type

Description

agentId

string

The agent ID.

agentType

AICallAgentType

The agent type.

agentUserId (Optional)

string

The UID of the agent. If this is left empty, the service assigns a UID.

region

string

The region where the agent service is located.

userId

string

The current user ID.

userJoinToken

string

The token for the current user to join the meeting.

userData (Optional)

string

Custom user information that is passed to the agent. We recommend that you use a JSON string.

chatSyncConfig (Optional)

AICallChatSyncConfig

Configuration for the associated chat agent.

agentConfig (Optional)

AICallAgentConfig

The agentConfig parameter used to start the call.

templateConfig (Optional)

AICallTemplateConfig (Deprecated)

Deprecated. Use agentConfig instead.
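A minimal sketch of assembling the required properties is shown below. The interface is a local mirror of the table, `makeCallConfig` is a hypothetical helper, and every value (agent ID, region, user ID, token) is a placeholder that your own backend would supply.

```typescript
// Local mirror of the AICallConfig table; the SDK ships its own type.
interface CallConfigSketch {
  agentId: string;
  agentType: number; // an AICallAgentType value, for example 0 for VoiceAgent
  agentUserId?: string;
  region: string;
  userId: string;
  userJoinToken: string;
  userData?: string;
}

// Hypothetical helper that serializes userData as a JSON string, which is
// the format the table recommends.
function makeCallConfig(
  base: Omit<CallConfigSketch, "userData">,
  userData?: Record<string, unknown>
): CallConfigSketch {
  return userData ? { ...base, userData: JSON.stringify(userData) } : { ...base };
}

const callConfig = makeCallConfig(
  {
    agentId: "your-agent-id",            // placeholder
    agentType: 0,                        // VoiceAgent
    region: "your-region",               // placeholder
    userId: "user-123",                  // placeholder
    userJoinToken: "token-from-server",  // placeholder
  },
  { locale: "en" }
);
```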

AICallTemplateConfig (Deprecated)

The TemplateConfig parameter for starting a call.

Important

This class is deprecated in version 2.5 and later. Use AICallAgentConfig instead.

Property

Type

Description

agentGreeting (Optional)

string

The agent's welcome message. If this is left empty, the value configured for the agent is used.

userOnlineTimeout

number

The timeout period for the agent to close the task if the user does not join the meeting. If the value is less than 0, the server-side default value of 60s is used.

userOfflineTimeout

number

The timeout period for the agent to close the task after the user leaves the meeting. If the value is less than 0, the server-side default value of 5s is used.

workflowOverrideParams (Optional)

object

Workflow override parameters.

bailianAppParams (Optional)

object

Model Studio application parameters.

asrMaxSilence

number

The threshold for voice activity detection. A value of less than 0 indicates that the server-side default value of 400 ms is used. Valid values: 200 ms to 1200 ms.

volume

number

The agent's speaking volume. Valid values: 0 to 400. Output volume = Voice output volume in the workflow × volume / 100. A value of less than 0 indicates that the server-side default value of 100 is used.

vadLevel

number

The sensitivity parameter for AI VAD. Default value: 3. Valid values: [0, 10].

enableVoiceInterrupt

boolean

Specifies whether to enable intelligent interruption.

agentVoiceId (Optional)

string

The ID of the agent's voice timbre. If this is left empty, the value configured for the agent is used.

enableIntelligentSegment

boolean

Specifies whether to enable intelligent sentence merging.

useVoiceprint

boolean

Specifies whether to use voiceprint noise reduction for the current sentence.

voiceprintId (Optional)

string

The voiceprint ID. If this is not empty, voiceprint noise reduction is enabled for the current call.

agentMaxIdleTime

number

The maximum idle time for the agent, in seconds. A value of less than 0 indicates that the server-side default value of 600s is used.

llmHistoryLimit

number

The maximum number of historical conversation rounds to retain for the LLM or Multimodal LLM. A value of less than 0 indicates that the server-side default value of 10 is used.

enablePushToTalk

boolean

Specifies whether to enable push-to-talk mode.

agentGracefulShutdown

boolean

Specifies whether to enable graceful shutdown. If enabled, the agent stops after broadcasting the current sentence.

agentAvatarId (Optional)

string

The ID of the digital human model. If this is left empty, the value configured for the agent is used.

asrLanguageId (Optional)

string

The ASR language ID. If this is left empty, the value configured for the agent is used.

wakeUpQuery (Optional)

string

The user's instruction before the call starts. The agent responds immediately after the call starts.

llmSystemPrompt (Optional)

string

The system prompt for the LLM, for example, "You are a friendly and helpful assistant...". Note: This is not supported if the LLM node is a Model Studio workflow.

asrHotWords (Optional)

Array<string>

A list of ASR hotwords.

interruptWords (Optional)

Array<string>

Specific words or phrases that trigger conversation interruption, such as "Excuse me" or "I see".
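Several numeric properties in this deprecated class treat a negative value as "use the server-side default", and the volume property scales the workflow's voice output volume. The sketch below works through both rules using the defaults quoted in the table. `effectiveValue` and `outputVolume` are hypothetical helpers that only model the documented behavior. The actual substitution happens on the server.

```typescript
// Server-side defaults quoted in the table above, applied when a value is < 0.
const TEMPLATE_SERVER_DEFAULTS = {
  userOnlineTimeout: 60,  // seconds
  userOfflineTimeout: 5,  // seconds
  asrMaxSilence: 400,     // milliseconds
  volume: 100,
  agentMaxIdleTime: 600,  // seconds
  llmHistoryLimit: 10,    // conversation rounds
};

type DefaultedKey = keyof typeof TEMPLATE_SERVER_DEFAULTS;

// Hypothetical helper: the value that effectively applies for a property.
function effectiveValue(key: DefaultedKey, value: number): number {
  return value < 0 ? TEMPLATE_SERVER_DEFAULTS[key] : value;
}

// Output volume = voice output volume in the workflow × volume / 100.
function outputVolume(workflowVolume: number, volume: number): number {
  return (workflowVolume * effectiveValue("volume", volume)) / 100;
}
```

With a workflow voice output volume of 80, a volume of 150 gives 80 × 150 / 100 = 120, and a volume of -1 falls back to the default of 100, leaving the output at 80.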

AICallAgentConfig

Configuration for starting and running a call agent.

Property

Type

Description

agentGreeting (Optional)

string

The agent's welcome message. If this is left empty, the value configured for the agent is used. The message can be up to 100 characters long.

wakeUpQuery (Optional)

string

The user's instruction before the call starts. The agent responds immediately after the call starts.

agentMaxIdleTime

number

The maximum idle time for the agent, in seconds. The agent automatically goes offline if the time is exceeded. Default value: 600s.

userOnlineTimeout

number

The timeout period for the agent to close the task if the user does not join the meeting. Default value: 60s.

userOfflineTimeout

number

The timeout period for the agent to close the task after the user leaves the meeting. Default value: 5s.

enablePushToTalk

boolean

Specifies whether to enable push-to-talk mode.

agentGracefulShutdown

boolean

Specifies whether to enable graceful shutdown. If enabled, the agent stops after broadcasting the current sentence.

volume

number

The agent's speaking volume. Valid values: 0 to 400. Default value: 100.

workflowOverrideParams

JSONObject

Workflow override parameters.

enableIntelligentSegment

boolean

Specifies whether to enable intelligent sentence segmentation.

asrConfig

AICallAgentAsrConfig

Speech recognition configuration.

ttsConfig

AICallAgentTtsConfig

Speech synthesis configuration.

llmConfig

AICallAgentLlmConfig

Large language model configuration.

avatarConfig

AICallAgentAvatarConfig

Digital human configuration.

interruptConfig

AICallAgentInterruptConfig

Interruption configuration.

voiceprintConfig

AICallAgentVoiceprintConfig

Voiceprint noise reduction configuration.

turnDetectionConfig

AICallAgentTurnDetectionConfig

Turn detection configuration.

experimentalConfig

JSONObject

Non-productized custom configuration.

vcrConfig

AICallAgentVcrConfig

VCR configuration.
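Many flat properties of the deprecated AICallTemplateConfig reappear here under nested configuration objects. The sketch below shows one way to move a flat configuration into the nested layout. The mapping is inferred from matching property names in the two tables and is an assumption, not an official migration path. The interfaces are local, partial mirrors, and `toAgentConfig` is a hypothetical helper.

```typescript
// Partial, flat shape of the deprecated template configuration.
interface LegacyTemplateConfigSketch {
  agentGreeting?: string;
  volume?: number;
  agentVoiceId?: string;
  agentAvatarId?: string;
  asrLanguageId?: string;
  asrMaxSilence?: number;
  asrHotWords?: string[];
  vadLevel?: number;
  llmHistoryLimit?: number;
  llmSystemPrompt?: string;
  enableVoiceInterrupt?: boolean;
  interruptWords?: string[];
  useVoiceprint?: boolean;
  voiceprintId?: string;
}

// Partial, nested shape following the AICallAgentConfig table.
interface AgentConfigSketch {
  agentGreeting?: string;
  volume?: number;
  asrConfig: { asrLanguageId?: string; asrMaxSilence?: number; asrHotWords?: string[]; vadLevel?: number };
  ttsConfig: { agentVoiceId?: string };
  llmConfig: { llmHistoryLimit?: number; llmSystemPrompt?: string };
  avatarConfig: { agentAvatarId?: string };
  interruptConfig: { enableVoiceInterrupt?: boolean; interruptWords?: string[] };
  voiceprintConfig: { useVoiceprint?: boolean; voiceprintId?: string };
}

// Hypothetical helper: regroups flat properties under the nested objects
// whose tables list a property with the same name.
function toAgentConfig(t: LegacyTemplateConfigSketch): AgentConfigSketch {
  return {
    agentGreeting: t.agentGreeting,
    volume: t.volume,
    asrConfig: {
      asrLanguageId: t.asrLanguageId,
      asrMaxSilence: t.asrMaxSilence,
      asrHotWords: t.asrHotWords,
      vadLevel: t.vadLevel,
    },
    ttsConfig: { agentVoiceId: t.agentVoiceId },
    llmConfig: { llmHistoryLimit: t.llmHistoryLimit, llmSystemPrompt: t.llmSystemPrompt },
    avatarConfig: { agentAvatarId: t.agentAvatarId },
    interruptConfig: { enableVoiceInterrupt: t.enableVoiceInterrupt, interruptWords: t.interruptWords },
    voiceprintConfig: { useVoiceprint: t.useVoiceprint, voiceprintId: t.voiceprintId },
  };
}
```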

AICallChatSyncConfig

Configuration parameters for an associated chat agent session.

Property

Type

Description

sessionId

string

The ID of the associated chat agent session.

agentId

string

The ID of the associated chat agent (must be in the same account and region).

receiverId

string

The user ID for the associated chat agent session.

AICallAgentShareConfig

Agent sharing configuration.

Property

Type

Description

shareId (Optional)

string

The agent sharing ID.

agentType

AICallAgentType

The agent workload type.

expireTime (Optional)

Date

The expiration time.

region (Optional)

string

The region where the agent is located.

templateConfig (Optional)

string

The template configuration (JSON string).

userData (Optional)

string

Custom user information that is passed to the agent.

AICallAgentAsrConfig

Automatic Speech Recognition (ASR) configuration.

Property

Type

Description

asrLanguageId (Optional)

string

The ASR language ID. If this is left empty, the value configured for the agent is used. Valid values:

  • zh_mandarin: Chinese

  • en: English

  • zh_en: Chinese-English mixed

  • es: Spanish

  • jp: Japanese

asrMaxSilence

number

The threshold for voice activity detection. If the duration of silence exceeds this threshold, a sentence break is detected. Default value: 400 ms. Valid values: 200 ms to 1200 ms.

asrHotWords (Optional)

string[]

A list of ASR hotwords. Limits: up to 500 words, with each word containing no more than 10 characters.

vadLevel

number

The sensitivity parameter for AI VAD. Default value: 3. Valid values: [0, 10].

customParams

string

When using a self-managed ASR, pass runtime parameters in the URL parameter format, for example, "mode=fast&sample=16000&format=wav".

vadDuration

number

The minimum duration threshold for voice activity detection, used to control interruption sensitivity. A value of 0 (default) disables this feature. Valid values: 200 to 2000 milliseconds. A common range is [200, 500], which corresponds to 1 to 4 characters. If you set this to a value less than 0, the value is not sent to the server (the server disables this feature by default).
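The limits stated above can be checked on the client before a call is started. The following is a minimal sketch of such a check. `AsrConfigSketch` is a local, partial mirror of the table and `validateAsrConfig` is a hypothetical helper. Whether the SDK or the server performs its own validation is not stated in this topic.

```typescript
// Local, partial mirror of the AICallAgentAsrConfig table.
interface AsrConfigSketch {
  asrMaxSilence?: number; // milliseconds
  asrHotWords?: string[];
  vadLevel?: number;
  vadDuration?: number;   // milliseconds
}

// Hypothetical helper: returns a list of human-readable problems,
// or an empty list when every supplied value is within the documented limits.
function validateAsrConfig(c: AsrConfigSketch): string[] {
  const problems: string[] = [];
  if (c.asrMaxSilence !== undefined && (c.asrMaxSilence < 200 || c.asrMaxSilence > 1200)) {
    problems.push("asrMaxSilence must be between 200 and 1200 ms");
  }
  if (c.asrHotWords !== undefined) {
    if (c.asrHotWords.length > 500) {
      problems.push("asrHotWords allows at most 500 words");
    }
    for (const word of c.asrHotWords) {
      // String.length counts UTF-16 code units, a rough proxy for characters.
      if (word.length > 10) problems.push("hot word too long: " + word);
    }
  }
  if (c.vadLevel !== undefined && (c.vadLevel < 0 || c.vadLevel > 10)) {
    problems.push("vadLevel must be within [0, 10]");
  }
  // vadDuration: 0 disables the feature and negative values are not sent,
  // so only positive values are range-checked.
  if (c.vadDuration !== undefined && c.vadDuration > 0 && (c.vadDuration < 200 || c.vadDuration > 2000)) {
    problems.push("vadDuration must be 0 or between 200 and 2000 ms");
  }
  return problems;
}
```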

AICallAgentTtsConfig

Text-to-Speech (TTS) configuration.

Property

Type

Description

agentVoiceId (Optional)

string

The ID of the agent's voice timbre. If this is left empty, the value configured for the agent is used.

pronunciationRules

JSONObject[]

An array of pronunciation rules. Up to 20 rules are supported. If this is undefined or an empty array, no rules are used. Example:

[
  {
    "Word": "overlap",             // Target word
    "Pronunciation": "chong die",  // Replacement pronunciation
    "Type": "replacement"          // Polyphonic character rule
  },
  {
    "Word": "action",
    "Pronunciation": "hang dong",
    "Type": "replacement"
  }
]

speechRate

number

The TTS playback speed. All TTS types are supported. Valid values: [0.5, 2.0]. Default value: 1.0. If you set this to a value less than 0, the value is not sent to the server (the value configured in the console is used).

languageId

string

The TTS language code. This is valid only when the TTS type is MiniMax.

emotion

string

The TTS emotion type. This is valid only when the TTS type is MiniMax.

modelId

string

The TTS model ID. Currently, only MiniMax is supported. Valid values: speech-01-turbo, speech-02-turbo.
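The speechRate description says that a negative value is not sent to the server, and the pronunciationRules description says that an undefined or empty array means no rules are used. The sketch below models those two rules when preparing a payload. It is only an illustration of the documented behavior: the SDK's real serialization is internal, `TtsConfigSketch` is a local, partial mirror of the table, and `ttsPayload` is a hypothetical helper.

```typescript
interface PronunciationRuleSketch {
  Word: string;
  Pronunciation: string;
  Type: string;
}

// Local, partial mirror of the AICallAgentTtsConfig table.
interface TtsConfigSketch {
  agentVoiceId?: string;
  pronunciationRules?: PronunciationRuleSketch[];
  speechRate?: number;
}

// Hypothetical helper: keeps only the values that would take effect.
function ttsPayload(c: TtsConfigSketch): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  if (c.agentVoiceId) {
    out["agentVoiceId"] = c.agentVoiceId;
  }
  if (c.pronunciationRules && c.pronunciationRules.length > 0) {
    if (c.pronunciationRules.length > 20) {
      throw new Error("pronunciationRules supports up to 20 rules");
    }
    out["pronunciationRules"] = c.pronunciationRules;
  }
  // A negative speechRate is dropped so that the console value applies.
  if (c.speechRate !== undefined && c.speechRate >= 0) {
    if (c.speechRate < 0.5 || c.speechRate > 2.0) {
      throw new Error("speechRate must be within [0.5, 2.0]");
    }
    out["speechRate"] = c.speechRate;
  }
  return out;
}
```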

AICallAgentLlmConfig

Large Language Model (LLM) configuration.

Property

Type

Description

llmHistoryLimit

number

The maximum number of historical conversation rounds to retain. Default value: 10.

llmSystemPrompt (Optional)

string

The system prompt for the LLM.

bailianAppParams

JSONObject

Model Studio application parameters.

llmCompleteReply

boolean

Specifies whether to send the complete LLM result.

Note

If this is enabled, the complete LLM result is returned through the llmReplyCompleted event callback after the result is generated.

openAIExtraQuery (Optional)

string

Additional query parameters for the OpenAI protocol-based LLM.

Note

Parameters must be in the key=value format. Use ampersands (&) to separate multiple parameters. All values must be strings.
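The key=value format described in the note can be produced from a plain object. The sketch below is one way to do that. `toKeyValueString` is a hypothetical helper, the parameter names in the usage line are placeholders, and rejecting raw separator characters is a design choice of this sketch, because this topic does not say whether values should be URL-encoded.

```typescript
// Hypothetical helper; not part of the SDK.
function toKeyValueString(params: Record<string, string>): string {
  const parts: string[] = [];
  for (const key of Object.keys(params)) {
    const value = params[key];
    // "&" and "=" are the separators, so raw occurrences would be ambiguous.
    if (key.indexOf("&") >= 0 || key.indexOf("=") >= 0 || value.indexOf("&") >= 0) {
      throw new Error("unsupported character in parameter: " + key);
    }
    parts.push(key + "=" + value);
  }
  return parts.join("&");
}

// Placeholder parameter names; all values are strings, as the note requires.
const extraQuery = toKeyValueString({ temperature: "0.7", top_p: "0.9" });
```

The same shape matches the customParams example in AICallAgentAsrConfig ("mode=fast&sample=16000&format=wav").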

AICallAgentAvatarConfig

Digital human configuration.

Property

Type

Description

agentAvatarId (Optional)

string

The ID of the digital human model. If this is left empty, the value configured for the agent is used.

AICallAgentInterruptConfig

Interruption configuration.

Property

Type

Description

enableVoiceInterrupt

boolean

Specifies whether to enable intelligent interruption.

interruptWords (Optional)

string[]

Specific words or phrases that trigger conversation interruption.

AICallAgentVoiceprintConfig

Voiceprint noise reduction configuration.

Property

Type

Description

useVoiceprint

boolean

Specifies whether to use voiceprint noise reduction for the current sentence.

voiceprintId (Optional)

string

The voiceprint ID. If this is not empty, voiceprint noise reduction is enabled for the current call.

AICallAgentTurnDetectionConfig

Turn detection configuration.

Property

Type

Description

turnEndWords (Optional)

string[]

Specific words for sentence breaking, for example, "Over" or "I'm done."

mode

AICallTurnDetectionMode

The mode for determining whether the user has finished speaking. The default is Semantic, which uses AI to determine whether the user has finished speaking based on semantics.

semanticWaitDuration

number

The custom wait time for semantic sentence breaking, in milliseconds. Valid values: [0, 10000]. If you set this to a value less than 0, the value is not sent to the server (the server-side default value of -1 is used, and the AI automatically determines the appropriate wait time).

Note

If mode is set to Normal, semanticWaitDuration has no effect.
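The interaction between mode and semanticWaitDuration can be summarized in a few lines. In the sketch below, the string union stands in for AICallTurnDetectionMode, whose actual enumeration values are not listed in this topic, and `effectiveSemanticWait` is a hypothetical helper that only models the documented rules.

```typescript
// Stand-in for AICallTurnDetectionMode; the SDK's real values are not
// listed in this topic, so string names are an assumption.
type TurnDetectionModeSketch = "Semantic" | "Normal";

interface TurnDetectionSketch {
  turnEndWords?: string[];
  mode: TurnDetectionModeSketch;
  semanticWaitDuration: number; // milliseconds
}

// Hypothetical helper: describes which wait time applies.
function effectiveSemanticWait(c: TurnDetectionSketch): number | "auto" | "ignored" {
  if (c.mode === "Normal") {
    return "ignored"; // the field has no effect in Normal mode
  }
  if (c.semanticWaitDuration < 0) {
    return "auto"; // not sent, so the AI picks the wait time
  }
  if (c.semanticWaitDuration > 10000) {
    throw new RangeError("semanticWaitDuration must be within [0, 10000] ms");
  }
  return c.semanticWaitDuration;
}
```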

AICallAgentVcrResult

VCR detection result.

Property

Type

Description

data

JSONObject

All VCR detection results returned by the agent.

stillFrameMotion

FrameMotionResult

Still frame detection result.

invalidFrameMotion

FrameMotionResult

Invalid frame detection result.

peopleCount

PeopleCountResult

People count detection result.

equipment

EquipmentResult

Electronic device detection result.

headMotion

HeadMotionResult

Head motion detection result.

AICallAgentVcrConfig

VCR configuration.

Property

Type

Description

data

JSONObject

A custom JSON object. The object is cached and used to generate the configuration JSON string, which allows for custom extensions.

stillFrameMotion

AICallAgentVcrFrameMotionConfig

VCR still frame detection configuration.

invalidFrameMotion

AICallAgentVcrFrameMotionConfig

VCR invalid frame detection configuration.

peopleCount

AICallAgentVcrBaseConfig

VCR real-time people count detection configuration.

equipment

AICallAgentVcrBaseConfig

VCR electronic device detection configuration.

headMotion

AICallAgentVcrBaseConfig

VCR head motion detection configuration.

AICallAgentVcrBaseConfig

Basic VCR detection configuration.

Property

Type

Description

enable

boolean

Specifies whether to enable the feature.

AICallAgentVcrFrameMotionConfig

VCR frame motion detection configuration.

Property

Type

Description

callbackDelay

number

The delay before the callback is triggered, in milliseconds. Default value: 3000 ms.
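To show how the three VCR configuration classes nest, the sketch below builds a small configuration object. The interfaces are local mirrors of the tables above. In particular, modeling the frame motion configuration with only callbackDelay is an assumption based on its table, and `enabledBaseDetections` is a hypothetical helper.

```typescript
interface VcrBaseSketch {
  enable: boolean;
}

interface VcrFrameMotionSketch {
  callbackDelay: number; // milliseconds
}

// Local mirror of the AICallAgentVcrConfig table.
interface VcrConfigSketch {
  data?: Record<string, unknown>;
  stillFrameMotion?: VcrFrameMotionSketch;
  invalidFrameMotion?: VcrFrameMotionSketch;
  peopleCount?: VcrBaseSketch;
  equipment?: VcrBaseSketch;
  headMotion?: VcrBaseSketch;
}

const vcrConfig: VcrConfigSketch = {
  stillFrameMotion: { callbackDelay: 3000 }, // the documented default
  peopleCount: { enable: true },
  equipment: { enable: false },
};

// Hypothetical helper: lists the base detections that are switched on.
function enabledBaseDetections(c: VcrConfigSketch): string[] {
  const names: string[] = [];
  if (c.peopleCount && c.peopleCount.enable) names.push("peopleCount");
  if (c.equipment && c.equipment.enable) names.push("equipment");
  if (c.headMotion && c.headMotion.enable) names.push("headMotion");
  return names;
}
```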