| Parameter | Type | Description | Example |
|---|---|---|---|
| | object | The template configuration of the AI agent. | |
| Greeting | string | The welcome message that the agent says upon joining. Changes take effect in the next session. Default value: None. | |
| WakeUpQuery | string | A command given to the agent before the call starts. The agent will respond to this query immediately after the call begins. | |
| MaxIdleTime | integer | The maximum time the agent will wait for interaction before it hangs up. Unit: seconds. Default value: 600. | 600 |
| UserOnlineTimeout | integer | The timeout period for the agent to close the task if no user joins the channel. Unit: seconds. Default value: 60. | 60 |
| UserOfflineTimeout | integer | The timeout period for the agent to close the task after the user has left the channel. Unit: seconds. Default value: 5. | 5 |
| EnablePushToTalk | boolean | Specifies whether to enable the push-to-talk mode. Default value: false. | false |
| GracefulShutdown | boolean | Specifies whether to enable graceful shutdown. If enabled, the agent finishes its current sentence (for up to 10 seconds) before disconnecting when it is stopped. Default value: false. | false |
| Volume | long | The agent's speaking volume. | 100 |
| WorkflowOverrideParams | string | The parameters to override the workflow configuration. Default value: None. | {} |
| AvatarUrl | string | The URL for the agent's profile image in audio-only calls. Default value: None. | http://example.com/a.jpg |
| AvatarUrlType | string | The type of the avatar URL. Default value: None. | USER |
| EnableIntelligentSegment | boolean | If enabled, the system intelligently merges short, interim segments into a single sentence. Default value: true. | true |
| AsrConfig | object | The configuration for Automatic Speech Recognition (ASR). | |
| AsrMaxSilence | integer | The silence threshold for sentence segmentation. A pause longer than this value is considered a sentence break. Unit: milliseconds. Default value: 400. Valid values: 200 to 1200. | 400 |
| AsrLanguageId | string | The language ID for ASR. Valid values: | zh_mandarin |
| CustomParams | string | Passthrough parameters for ASR. | mode=fast&sample=16000&format=wav |
| VadDuration | integer | The minimum duration for voice activity detection (VAD). Unit: milliseconds. This parameter controls interruption sensitivity and prevents the agent from cutting off user speech too early during short pauses. Valid values: 200 to 2000. A value of 0 disables this feature. Recommended: 200 to 500, which typically corresponds to 1 to 4 words. By default, this parameter is empty, which means the feature is disabled. | 300 |
| AsrHotWords | array | Hotwords for ASR to improve recognition accuracy. Maximum of 128 hotwords. | |
| | string | The hotword. Length: 1 to 10 characters. | |
| VadLevel | integer | The voice activity detection (VAD) threshold for interruption. A higher value makes interruptions harder to trigger. Valid values: 0 to 10. A value of 0 disables VAD. Default value: 1. | 1 |
| TtsConfig | object | The configuration for Text-to-Speech (TTS). | |
| PronunciationRules | array<object> | The pronunciation rules, executed in order. Maximum of 20 rules. | |
| | object | The pronunciation rule. | |
| Type | string | The type of rule. Valid value: | replacement |
| Word | string | The word to be replaced. The value supports up to 10 Chinese characters. Other characters, including spaces, are not supported. | |
| Pronunciation | string | The target pronunciation. The value supports up to 10 Chinese characters. Other characters, including spaces, are not supported. | |
| VoiceIdList | array | Available voices. | |
| | string | The voice. | zhixiaoxia |
| VoiceId | string | The voice ID. Changes take effect on the next sentence. If not set, the system uses the default voice ID specified in the agent template. This parameter takes effect only for the preset TTS model. Max length: 64 characters. Refer to Intelligent voice samples for options. | longcheng_v2 |
| Emotion | string | Applies only to MiniMax models. Seven types of emotions are supported: | happy |
| ModelId | string | Applies only to MiniMax models. Valid values: speech-01-turbo and speech-02-turbo. | speech-01-turbo |
| LanguageId | string | Applies only to MiniMax models. By default, this parameter is left empty. This enhances speech recognition accuracy for specific languages and dialects. If the language type is unknown, set it to auto to have the model automatically detect it. Valid values: Supported languages | Chinese |
| SpeechRate | double | The speech rate. Supported on all platforms. For both CosyVoice and MiniMax, the default value is 1.0 and the valid values are 0.5 to 2.0. | 1.0 |
| LlmConfig | object | The configuration for the large language model (LLM). | |
| FunctionMap | array<object> | Maps agent capabilities to LLM functions. Only supports function calling with custom LLMs that adhere to the OpenAI protocol. | |
| | object | A single mapping rule. | |
| Function | string | The name of the built-in agent capability. Only hangup is supported. | hangup |
| MatchFunction | string | | hangup |
| LlmHistoryLimit | integer | The maximum number of conversational turns to retain in the history. Default value: 10. | 10 |
| LlmCompleteReply | boolean | If true, the service sends the complete result from the LLM to the client in a single response after the generation process is finished. | true |
| LlmHistory | array<object> | The LLM/MLLM conversation history context. | |
| | object | A single session. | |
| Role | string | The role of the participant in the conversation. Valid values: | user |
| Content | string | The actual text content of the message for that role. | |
| LlmSystemPrompt | string | The system prompt for the LLM. | |
| OpenAIExtraQuery | string | Additional query parameters to be sent to the OpenAI-protocol LLM, formatted as a URL query string (key=value pairs separated by &). All values must be strings. | api-version=2024-02-01&api-key=sk-xxx |
| OutputMaxDelay | integer | The maximum time (in milliseconds) to buffer text before it is forcibly sent to the client. Valid values: [1000, 10000]. A value of 0 or an empty string (default) disables this limit. | 2000 |
| BailianAppParams | string | Alibaba Cloud Model Studio Application Center parameters in a JSON format. Reference: Model Studio Application Center Parameter | |
| OutputMinLength | integer | The minimum number of characters that must be buffered before a text chunk is sent. Valid values: [0, 100]. A value of 0 or an empty string (default) disables this limit. | 5 |
| AvatarConfig | object | The avatar configuration. Only effective if the workflow includes an avatar node. | |
| AvatarId | string | The model ID of the avatar. | 5257 |
| InterruptConfig | object | The configuration for the speech interruption strategy. | |
| InterruptWords | array | Words or phrases that will trigger an interruption. | |
| | string | A word or phrase that will trigger an interruption. | |
| EnableVoiceInterrupt | boolean | Specifies whether to allow the user to interrupt the agent by speaking. Default value: true. | true |
| VoiceprintConfig | object | The configuration for voiceprint recognition. | |
| VoiceprintId | string | The voiceprint ID. | zhixiaoxia |
| UseVoiceprint | boolean | Specifies whether to enable voiceprint recognition. Default value: false. You must specify a valid voiceprint ID when you enable voiceprint recognition. | false |
| TurnDetectionConfig | object | The configuration for detecting the end of a user's conversational turn. | |
| SemanticWaitDuration | integer | Specifies how long to wait after a user stops speaking for the agent to decide if the turn is over. Unit: milliseconds. Default value: -1. Note: In Normal mode, this field is ignored. | -1 |
| TurnEndWords | array | Keywords that signify the end of the user's turn. | |
| | string | A keyword that signifies the end of the user's turn. | |
| Mode | string | The mode of turn detection. | Semantic |
| ExperimentalConfig | string | The parameters for experimental features. Contact support for details. | "" |
| VcrConfig | object | Configuration for video content recognition. When enabled, the system sends callbacks to the client with details about the identified content. | |
| PeopleCount | object | Configuration for the people counting feature. | |
| Enabled | boolean | Enables or disables the feature. Default value: false. | false |
| StillFrameMotion | object | Configuration for detecting still frames. | |
| Enabled | boolean | Enables or disables still frame detection. Default value: false. | false |
| CallbackDelay | integer | The delay in milliseconds before a still frame detection event is triggered. The callback is sent only after the video has been static for this duration. If not set, the value from the console configuration is used. Valid values: [200, 5000]. | 3000 |
| Equipment | object | Configuration for device identification. | |
| Enabled | boolean | Enables or disables device identification. Default value: false. | false |
| HeadMotion | object | Configuration for head motion detection. | |
| Enabled | boolean | Enables or disables head motion detection. Default value: false. | false |
| LookAway | object | Configuration for detecting if the user is looking away from the screen. | |
| Enabled | boolean | Enables or disables this feature. Default value: false. | true |
| InvalidFrameMotion | object | Configuration for detecting invalid frames. | |
| Enabled | boolean | Enables or disables invalid frame detection. | false |
| CallbackDelay | integer | The delay in milliseconds before an invalid frame detection event is triggered. The callback is sent only after the frame has been considered invalid for this duration. If not set, the value from the console configuration is used. Valid values: [200, 5000]. | 3000 |
| AmbientSoundConfig | object | Configuration for the ambient sound played during the call. | |
| ResourceId | string | The ID of the ambient sound. This ID can be obtained from the advanced settings section of the agent configuration in the console. | f67901c595834************ |
| Volume | integer | The volume of the ambient sound. Valid values: [0, 100]. A value of 0 disables the ambient sound. | 50 |
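
The value ranges listed in the table can be checked on the client before a configuration is submitted. The following is a minimal sketch, not an official SDK sample. It assumes that the flattened rows nest under their preceding object row (for example, `AsrMaxSilence` under `AsrConfig`), which the table order suggests but does not state. The names `RANGES`, `validate_config`, and `example_config` are hypothetical helpers introduced for illustration only.

```python
import json

# Documented ranges from the table: (section, field) -> (min, max, zero_disables).
# The section/field nesting is an assumption inferred from the row order.
RANGES = {
    ("AsrConfig", "AsrMaxSilence"): (200, 1200, False),
    ("AsrConfig", "VadDuration"): (200, 2000, True),
    ("AsrConfig", "VadLevel"): (0, 10, False),
    ("TtsConfig", "SpeechRate"): (0.5, 2.0, False),
    ("LlmConfig", "OutputMaxDelay"): (1000, 10000, True),
    ("LlmConfig", "OutputMinLength"): (0, 100, False),
    ("AmbientSoundConfig", "Volume"): (0, 100, False),
}


def validate_config(config):
    """Return a list of problems; an empty list means no documented limit is violated."""
    problems = []
    for (section, field), (low, high, zero_disables) in RANGES.items():
        value = config.get(section, {}).get(field)
        if value is None:
            continue  # unset fields are left to the service defaults
        if zero_disables and value == 0:
            continue  # 0 is documented as "feature disabled"
        if not low <= value <= high:
            problems.append(f"{section}.{field}={value} is outside [{low}, {high}]")

    hot_words = config.get("AsrConfig", {}).get("AsrHotWords", [])
    if len(hot_words) > 128:
        problems.append("AsrConfig.AsrHotWords has more than 128 entries")
    for word in hot_words:
        if not 1 <= len(word) <= 10:
            problems.append(f"hotword {word!r} is not 1 to 10 characters long")

    rules = config.get("TtsConfig", {}).get("PronunciationRules", [])
    if len(rules) > 20:
        problems.append("TtsConfig.PronunciationRules has more than 20 rules")
    return problems


# Illustrative values only, taken from the Example column where one exists.
example_config = {
    "MaxIdleTime": 600,
    "EnablePushToTalk": False,
    "AsrConfig": {"AsrMaxSilence": 400, "VadDuration": 300, "VadLevel": 1},
    "TtsConfig": {"VoiceId": "longcheng_v2", "SpeechRate": 1.0},
    "LlmConfig": {"LlmHistoryLimit": 10, "OutputMaxDelay": 2000, "OutputMinLength": 5},
    "AmbientSoundConfig": {"Volume": 50},
}

if __name__ == "__main__":
    print(validate_config(example_config))
    print(json.dumps(example_config, indent=2))
```

The check returns a list of messages instead of raising, so a caller can report every violation at once. How the serialized JSON is passed to the service depends on the API operation being called and is not covered by this sketch.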