| Parameter | Type | Description | Example |
|---|---|---|---|
| | object | Specifies the configuration for the AI Agent. | |
| Greeting | string | The greeting the AI Agent delivers at the start of a session. Changes to this value take effect in the next session. By default, no greeting is used. | `你好` (Hello) |
| WakeUpQuery | string | A user-defined query that the AI Agent responds to immediately when the session starts. | `今天天气怎么样?` (What's the weather like today?) |
| MaxIdleTime | integer | The maximum idle time, in seconds. If the session remains idle for this period, the AI Agent automatically ends the session. Default: 600. | 600 |
| UserOnlineTimeout | integer | The time, in seconds, that the AI Agent waits for a user to join. If no user joins within this period, the AI Agent terminates the session. Default: 60. | 60 |
| UserOfflineTimeout | integer | The time, in seconds, that the AI Agent waits after the user leaves before it terminates the session. Default: 5. | 5 |
| EnablePushToTalk | boolean | Specifies whether to enable push-to-talk mode. | false |
| GracefulShutdown | boolean | Specifies whether to enable graceful shutdown. When enabled and the session is terminated, the AI Agent finishes its current utterance before disconnecting. The AI Agent continues speaking for at most 10 seconds. | false |
| Volume | integer | The speaking volume of the AI Agent. | 100 |
| WorkflowOverrideParams | string | Parameters that override the workflow configuration. By default, this parameter is not set. | {} |
| AvatarUrl | string | The URL of the AI Agent's profile image in audio-only calls. By default, no image is specified. | http://example.com/a.jpg |
| AvatarUrlType | string | The type of the profile image URL. By default, this parameter is not set. | USER |
| EnableIntelligentSegment | boolean | Specifies whether to enable intelligent sentence segmentation. If enabled, the system merges short, consecutive user utterances into a single sentence. | true |
| AsrConfig | object | The Automatic Speech Recognition (ASR) configuration. | |
| AsrLanguageId | string | The language ID for ASR. Valid values include `zh_mandarin`. | zh_mandarin |
| AsrMaxSilence | integer | The silence threshold for sentence segmentation, in milliseconds. A pause longer than this value triggers a sentence break. Valid range: 200 to 1200. Default: 400. | 400 |
| AsrHotWords | array | A list of hotwords that improve ASR accuracy. You can specify up to 128 hotwords. | |
| | string | A hotword, 1 to 10 characters long. | `检查` (check) |
| VadLevel | integer | The sensitivity of voice activity detection (VAD) for interruptions. A higher value makes the AI Agent harder to interrupt. Valid range: 0 to 11. Default: 11. | 11 |
| CustomParams | string | Pass-through parameters for custom ASR integrations. | mode=fast&sample=16000&format=wav |
| VadDuration | integer | The minimum duration of voice activity, in milliseconds, required to trigger an interruption. Use it to tune interruption sensitivity. Set it to 0 to disable the feature; otherwise, the valid range is 200 to 2000. Typical values are 200 to 500, which correspond to about 1 to 4 words. By default, this parameter is not set and the feature is inactive. | 300 |
| TtsConfig | object | The Text-to-Speech (TTS) configuration. | |
| VoiceId | string | The ID of the voice used for synthesis. Changes take effect on the next utterance. If not specified, the AI Agent uses the default voice from its template. Applies only to preset TTS voices. Maximum length: 64 characters. For available values, see Voice Demos. | longcheng_v2 |
| VoiceIdList | array | A list of available voices. | |
| | string | A voice ID. | zhixiaoxia |
| PronunciationRules | array | A list of TTS pronunciation rules, applied in order. You can specify up to 20 rules. | |
| | object | A TTS pronunciation rule. | |
| Word | string | The word to replace. It must consist of Chinese characters, contain at most 10 characters, and contain no spaces. | `一一零` ("110", read digit by digit) |
| Pronunciation | string | The target pronunciation of the word. It must consist of Chinese characters, contain at most 10 characters, and contain no spaces. | `幺幺零` ("110", with 幺 as the spoken form of "1") |
| Type | string | The type of pronunciation rule. Valid value: `replacement`. | replacement |
| ModelId | string | The model ID. Currently, only minimax models are supported. Valid values include `speech-01-turbo`. | speech-01-turbo |
| LanguageId | string | The language ID. Currently, only minimax models are supported. By default, this parameter is empty. Setting it improves performance for the specified language or dialect. If you are unsure of the language, set it to `auto` to enable automatic detection. Supported values include `Chinese`. | Chinese |
| Emotion | string | The emotion of the synthesized speech. Currently, only minimax models support this parameter. Valid values include `happy`. | happy |
| SpeechRate | number | The speech rate. Supported on all platforms. | 1.0 |
| LlmConfig | object | The Large Language Model (LLM) configuration. | |
| LlmHistory | array | The conversation history passed to the LLM or MLLM as context. | |
| | object | A single turn in the conversation. | |
| Role | string | The role of the speaker in this turn. Valid values include `user`. | user |
| Content | string | The text content of the message for the specified role. | `你好` (Hello) |
| LlmHistoryLimit | integer | The maximum number of conversation turns kept in the LLM or MLLM history. Default: 10. | 10 |
| LlmSystemPrompt | string | The system prompt that the LLM receives at the start of the call. | You are a friendly and helpful assistant focused on providing users with accurate information and advice. |
| BailianAppParams | string | Parameters for Alibaba Cloud Model Studio (Bailian) applications, as a JSON string. For the parameter format, see Alibaba Cloud Model Studio (Bailian) application parameters. | `"{\"biz_params\":{\"user_defined_params\":{\"your_plugin_id\":{\"article_index\":2}}},\"memory_id\":\"your_memory_id\",\"image_list\":[\"https://your_image_url\"],\"rag_options\":{\"pipeline_ids\":[\"your_id\"],\"file_ids\":[\"doc_id_1\",\"doc_id_2\"],\"metadata_filter\":{\"name\":\"Zhang San\"},\"structured_filter\":{\"key1\":\"value1\",\"key2\":\"value2\"},\"tags\":[\"tag1\",\"tag2\"]}}"` |
| OpenAIExtraQuery | string | Additional query parameters for an OpenAI-compatible LLM, written as a URL query string (`key=value` pairs joined by `&`). | api-version=2024-02-01&api-key=sk-xxx |
| LlmCompleteReply | boolean | If enabled, the AI Agent sends the complete LLM result to the client once the full response has been generated. This setting does not affect subtitle streaming. | true |
| FunctionMap | array | Function mappings that associate AI Agent capabilities with LLM functions. Currently supported only for function calling with user-defined, OpenAI-compatible LLMs. | |
| | object | A single mapping rule. | |
| Function | string | The name of a built-in function provided by the AI Agent system. Currently, only `hangup` is supported. | hangup |
| MatchFunction | string | The name of the user-defined LLM function that corresponds to the built-in function. For details on the custom LLM protocol, see LLM standard interface. | hangup |
| OutputMinLength | integer | The minimum length, in characters, of a text output chunk. Shorter text is buffered. Valid range: 0 to 100. A value of 0 or an empty value (default) disables this limit. | 5 |
| OutputMaxDelay | integer | The maximum time, in milliseconds, that text can stay buffered before it is forcibly sent. Set it to 0 or leave it empty (default) to disable this limit; otherwise, the valid range is 1000 to 10000. | 2000 |
| HistorySyncWithTTS | boolean | Specifies whether the LLM message history is synchronized with the content played by TTS. Note: When a user interrupts the AI Agent, the system inserts an … | false |
| AvatarConfig | object | The avatar configuration. Takes effect only if the workflow includes an avatar node. | |
| AvatarId | string | The model ID of the avatar. | 5257 |
| InterruptConfig | object | The speech interruption strategy configuration. | |
| EnableVoiceInterrupt | boolean | Specifies whether the user can interrupt the AI Agent by speaking. | true |
| InterruptWords | array | A list of words or phrases that trigger a conversation interruption. | |
| | string | A word or phrase that triggers a conversation interruption. | `打断一下` (Let me interrupt) |
| NoInterruptMode | string | How ASR text is handled when interruptions are disabled. Valid values include `cache`. By default, ASR text is cached. | cache |
| KeepInterruptWordsForLLM | boolean | Specifies whether the interruption words are kept in the text sent to the LLM. | |
| VoiceprintConfig | object | The voiceprint recognition configuration. | |
| UseVoiceprint | boolean | Specifies whether to enable voiceprint recognition. | false |
| VoiceprintId | string | The unique ID for voiceprint recognition. By default, this parameter is not set. The ID must belong to a registered voiceprint. For more information, see Register a voiceprint. | zhixiaoxia |
| RegistrationMode | string | | |
| TurnDetectionConfig | object | The conversational turn detection configuration. | |
| TurnEndWords | array | A list of keywords that indicate the end of the user's turn. | |
| | string | A keyword that indicates the end of the user's turn. | `我说完了` (I'm done) |
| Mode | string | The turn detection mode. Valid values include `Semantic`. | Semantic |
| SemanticWaitDuration | integer | The pause detection time in AI mode, in milliseconds. Default: -1. Note: This parameter takes effect only in … | -1 |
| Eagerness | string | How quickly the AI Agent responds after it detects a pause. Valid values include `High`. This parameter takes effect only in … By default, this parameter is not set. | High |
| ExperimentalConfig | string | Parameters for experimental features. Contact technical support if you need to use them. | "" |
| VcrConfig | object | The video content recognition configuration. The system sends callbacks to the client about content identified in the video stream. | |
| StillFrameMotion | object | The still frame detection configuration. | |
| Enabled | boolean | Specifies whether to enable still frame detection. | false |
| CallbackDelay | integer | The delay, in milliseconds, before a still frame detection event is triggered. A notification is sent only after the frame has remained static for this duration. If not set, the value from the console configuration is used. Valid range: 200 to 5000. | 3000 |
| InvalidFrameMotion | object | The invalid frame detection configuration. | |
| Enabled | boolean | Specifies whether to enable invalid frame detection. | false |
| CallbackDelay | integer | The delay, in milliseconds, before an invalid frame detection event is triggered. A notification is sent only after the frame has remained invalid for this duration. If not set, the value from the console configuration is used. Valid range: 200 to 5000. | 3000 |
| PeopleCount | object | The people counting configuration. | |
| Enabled | boolean | Specifies whether to enable people counting. | false |
| Equipment | object | The device identification configuration. | |
| Enabled | boolean | Specifies whether to check for prohibited devices. | false |
| HeadMotion | object | The head motion detection configuration. | |
| Enabled | boolean | Specifies whether to enable head motion detection. | false |
| LookAway | object | The gaze deviation detection configuration. | |
| Enabled | boolean | Specifies whether to enable gaze deviation detection. | true |
| AmbientSoundConfig | object | The ambient sound configuration. | |
| ResourceId | string | The ID of the ambient sound. You can find this ID in the advanced configuration section of the agent settings in the console. | `f67901c595834************` |
| Volume | integer | The volume of the ambient sound. Valid range: 0 to 100. A value of 0 disables the sound. | 50 |
| AutoSpeechConfig | object | Controls the AI Agent's proactive speech, such as prompts played during LLM delays or when the user is silent. | |
| UserIdle | object | Prompts played when the user stays idle for an extended period. | |
| WaitTime | integer | The idle time, in milliseconds, that triggers a prompt. Required. Valid range: 5000 to 600000. | 5000 |
| MaxRepeats | integer | The maximum number of times the user is prompted. After this limit is reached, the call is terminated. Required. Valid range: 0 to 10. | 5 |
| Messages | array | Up to 10 prompts, each at most 100 characters. The probabilities of all prompts must sum to 1.0. | |
| | object | A prompt and its probability. | |
| Text | string | The text of the prompt. Maximum length: 100 characters. | `您还在吗?` (Are you still there?) |
| Probability | number | The probability that this prompt is selected. Valid range: 0.0 to 1.0. | 0.5 |
| LlmPending | object | Prompts played when the LLM is slow to respond. | |
| WaitTime | integer | The LLM response time threshold, in milliseconds. If the response takes longer, a prompt is played. Required. Valid range: 500 to 10000. Set this value based on the actual latency of your LLM. | 3000 |
| Messages | array | Up to 10 prompts, each at most 100 characters. The probabilities of all prompts must sum to 1.0. | |
| | object | A prompt and its probability. | |
| Text | string | The text of the prompt. Maximum length: 100 characters. | `稍等一下` (One moment, please) |
| Probability | number | The probability that this prompt is selected. Valid range: 0.0 to 1.0. | 0.5 |
| BackChannelingConfigs | array | The backchanneling configuration. Backchanneling plays short affirming phrases at specific trigger points to acknowledge the user's speech. | |
| | object | A single backchanneling rule. | |
| Enabled | boolean | Specifies whether to enable this backchanneling rule. Required. | true |
| TriggerStage | string | When the backchanneling phrase is triggered. Valid values include `pause_detected`. | pause_detected |
| Probability | number | The probability that this rule is triggered. Required. Valid range: 0.0 to 1.0. | 0.5 |
| Words | array | Up to 10 backchanneling phrases, each at most 20 characters. The probabilities of all phrases must sum to 1.0. | |
| | object | A backchanneling phrase and its probability. | |
| Text | string | The text of the phrase. Required. Maximum length: 20 characters. Multiple languages are supported. | `嗯嗯` (Mm-hmm) |
| Probability | number | The probability that this phrase is selected. Required. Valid range: 0.0 to 1.0. | 0.3 |
| BackChannelingConfig | array | Important: This parameter is deprecated. Use BackChannelingConfigs instead. | |
| | object | A single backchanneling rule. | |
| Enabled | boolean | Specifies whether to enable this backchanneling rule. Required. | true |
| TriggerStage | string | When the backchanneling phrase is triggered. Valid values include `pause_detected`. | pause_detected |
| Probability | number | The probability that this rule is triggered. Required. Valid range: 0.0 to 1.0. | 0.5 |
| Words | array | Up to 10 backchanneling phrases, each at most 20 characters. The probabilities of all phrases must sum to 1.0. | |
| | object | A backchanneling phrase and its probability. | |
| Text | string | The text of the phrase. Required. Maximum length: 20 characters. Multiple languages are supported. | `嗯嗯` (Mm-hmm) |
| Probability | number | The probability that this phrase is selected. Required. Valid range: 0.0 to 1.0. | 0.3 |
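Several limits in the table above (numeric ranges, list sizes, and the rule that prompt or phrase probabilities sum to 1.0) can be checked on the client before a configuration is submitted. The following Python sketch builds a configuration dictionary from the documented field names and example values, validates a subset of those limits, and serializes the result to JSON. It is an illustration under stated assumptions, not an official validator: the nesting of fields under `AsrConfig`, `AutoSpeechConfig`, and `BackChannelingConfigs` is inferred from the descriptions, and the helpers `check_range`, `check_weighted`, `validate_agent_config`, and `is_valid` are hypothetical names that are not part of any Alibaba Cloud SDK. How the JSON is sent to the service is not covered here.

```python
import json
import math

# Hypothetical helpers: they encode limits quoted in the parameter table,
# not an official SDK validator. Field nesting below is an assumption.


def check_range(name, value, low, high):
    """Raise ValueError if value lies outside the inclusive range [low, high]."""
    if not low <= value <= high:
        raise ValueError(f"{name}={value} is outside {low}..{high}")


def check_weighted(name, items, max_items, max_text_len):
    """Check a list of {"Text", "Probability"} entries.

    Mirrors the documented rules: at most `max_items` entries, each text at
    most `max_text_len` characters, each probability in 0.0..1.0, and all
    probabilities summing to 1.0 (compared with a small tolerance).
    """
    if len(items) > max_items:
        raise ValueError(f"{name}: at most {max_items} entries allowed")
    for item in items:
        if len(item["Text"]) > max_text_len:
            raise ValueError(f"{name}: text too long: {item['Text']!r}")
        check_range(f"{name}.Probability", item["Probability"], 0.0, 1.0)
    total = sum(item["Probability"] for item in items)
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        raise ValueError(f"{name}: probabilities sum to {total}, expected 1.0")


def validate_agent_config(cfg):
    """Validate a subset of documented limits and return cfg unchanged."""
    asr = cfg.get("AsrConfig", {})
    if "AsrMaxSilence" in asr:
        check_range("AsrMaxSilence", asr["AsrMaxSilence"], 200, 1200)
    if "VadLevel" in asr:
        check_range("VadLevel", asr["VadLevel"], 0, 11)
    if asr.get("VadDuration", 0) != 0:  # 0 (or unset) disables the feature
        check_range("VadDuration", asr["VadDuration"], 200, 2000)
    hotwords = asr.get("AsrHotWords", [])
    if len(hotwords) > 128:
        raise ValueError("AsrHotWords: at most 128 hotwords allowed")
    for word in hotwords:
        check_range(f"len(hotword {word!r})", len(word), 1, 10)

    idle = cfg.get("AutoSpeechConfig", {}).get("UserIdle")
    if idle:
        check_range("UserIdle.WaitTime", idle["WaitTime"], 5000, 600000)
        check_range("UserIdle.MaxRepeats", idle["MaxRepeats"], 0, 10)
        check_weighted("UserIdle.Messages", idle["Messages"], 10, 100)

    for i, rule in enumerate(cfg.get("BackChannelingConfigs", [])):
        check_range(f"BackChannelingConfigs[{i}].Probability",
                    rule["Probability"], 0.0, 1.0)
        check_weighted(f"BackChannelingConfigs[{i}].Words", rule["Words"], 10, 20)
    return cfg


def is_valid(cfg):
    """Return True if validate_agent_config accepts cfg, False otherwise."""
    try:
        validate_agent_config(cfg)
        return True
    except ValueError:
        return False


config = {
    "Greeting": "你好",
    "MaxIdleTime": 600,
    "AsrConfig": {"AsrLanguageId": "zh_mandarin", "AsrMaxSilence": 400,
                  "AsrHotWords": ["检查"], "VadLevel": 11, "VadDuration": 300},
    "AutoSpeechConfig": {"UserIdle": {
        "WaitTime": 5000, "MaxRepeats": 5,
        "Messages": [{"Text": "您还在吗?", "Probability": 0.5},
                     {"Text": "Are you still there?", "Probability": 0.5}]}},
    "BackChannelingConfigs": [{
        "Enabled": True, "TriggerStage": "pause_detected", "Probability": 0.5,
        "Words": [{"Text": "嗯嗯", "Probability": 0.3},
                  {"Text": "I see", "Probability": 0.7}]}],
}

# ensure_ascii=False keeps the Chinese sample strings readable in the output.
payload = json.dumps(validate_agent_config(config), ensure_ascii=False)
print(payload)
```

The probability sum is compared with `math.isclose` rather than `==` because floating-point sums can miss 1.0 by a rounding error; for example, ten prompts with probability 0.1 each sum to 0.9999999999999999 in Python, which an exact comparison would wrongly reject.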