|
Parameter |
Type |
Description |
Example |
|---|---|---|---|
|
object |
Parameters for the agent template. |
||
| Greeting |
string |
The greeting message. Changes take effect the next time the agent joins a session. By default, no greeting is set. |
你好 |
| WakeUpQuery |
string |
An instruction from the user before the call starts. The agent responds to this instruction immediately after the call begins. |
今天天气怎么样? |
| MaxIdleTime |
integer |
The maximum time to wait for user interaction before the agent goes offline. Unit: seconds. Default: 600. |
600 |
| UserOnlineTimeout |
integer |
The timeout period for the agent to shut down the task if the user does not join the session. Unit: seconds. Default: 60. |
60 |
| UserOfflineTimeout |
integer |
The timeout period for the agent to shut down the task after the user leaves the session. Unit: seconds. Default: 5. |
5 |
| EnablePushToTalk |
boolean |
Specifies whether to enable push-to-talk mode. Default: false. |
false |
| GracefulShutdown |
boolean |
Specifies whether to enable graceful shutdown. With graceful shutdown, an agent that is stopped finishes speaking the current sentence before it stops. This final playback lasts for a maximum of 10 seconds. Default: false. |
false |
| Volume |
integer |
The speaking volume of the agent.
|
100 |
| WorkflowOverrideParams |
string |
The parameters to overwrite the workflow. By default, this is not set. |
{} |
| AvatarUrl |
string |
The URL of the agent's profile picture for voice calls. By default, this is not set. |
http://example.com/a.jpg |
| AvatarUrlType |
string |
The type of the agent's profile picture URL. By default, this is not set. |
USER |
| EnableIntelligentSegment |
boolean |
Specifies whether to enable intelligent sentence segmentation. When enabled, speech fragments separated by pauses in the user's speech are merged into a single sentence. Default: true. |
true |
| AsrConfig |
object |
Speech recognition configuration. |
|
| AsrLanguageId |
string |
The language ID for Automatic Speech Recognition (ASR). Valid values:
|
zh_mandarin |
| AsrMaxSilence |
integer |
The threshold for speech segmentation. A silence duration exceeding this threshold is considered a sentence break. The valid range is 200 ms to 1200 ms. Default: 400 ms. |
400 |
| AsrHotWords |
array |
A list of hotwords for ASR. The list can contain up to 128 words. |
|
|
string |
The hotword string. The string must be 1 to 10 characters in length. |
检查 |
|
| VadLevel |
integer |
The threshold parameter for interruptions. Valid range: [0, 11]. Default: 11.
|
11 |
| CustomParams |
string |
The pass-through parameters for custom ASR integration. |
mode=fast&sample=16000&format=wav |
| VadDuration |
integer |
The minimum duration threshold for VAD to control interruption sensitivity. A value of 0 disables this feature. Valid range: 200 to 2000 milliseconds. A common range is [200, 500], which corresponds to 1 to 4 words. By default, this parameter is empty and does not take effect. |
300 |
| TtsConfig |
object |
Speech synthesis configuration. |
|
| VoiceId |
string |
The voice ID. Changes take effect on the next sentence. If you do not set this parameter, the voice ID configured in the agent template is used. This parameter is valid only for preset Text-to-Speech (TTS) voices. The value can be up to 64 characters long. For more information about valid values, see Examples of intelligent speech effects. |
longcheng_v2 |
| VoiceIdList |
array |
A list of available voices. |
|
|
string |
A voice ID. |
zhixiaoxia |
|
| PronunciationRules |
array |
The pronunciation rules for TTS. The array can contain up to 20 rules. The rules are executed in order. |
|
|
object |
A TTS pronunciation rule. |
||
| Word |
string |
The word to be replaced. It must be less than 10 characters long, consist of Chinese characters, and not contain spaces. |
一一零 |
| Pronunciation |
string |
The target pronunciation. It must be less than 10 characters long, consist of Chinese characters, and not contain spaces. |
幺幺零 |
| Type |
string |
The type of the pronunciation rule. Valid value:
|
replacement |
| ModelId |
string |
The TTS model ID. Currently, only minimax is supported. Valid values: speech-01-turbo and speech-02-turbo. |
speech-01-turbo |
| LanguageId |
string |
Currently, only minimax is supported. The default value is empty. This parameter enhances the recognition of specified minority languages and dialects, improving speech performance in those scenarios. If the language type is unclear, you can set this to "auto", and the model will automatically determine the language. The following values are supported: |
Chinese |
| Emotion |
string |
Currently, only minimax is supported. Minimax supports the following seven emotions:
|
happy |
| SpeechRate |
number |
The speech rate. Supported on all platforms. For both cosyvoice and minimax, the default is 1.0 and the valid range is 0.5 to 2.0. |
1.0 |
| LlmConfig |
object |
Large language model configuration. |
|
| LlmHistory |
array |
The historical conversation context for the Large Language Model (LLM)/MLLM. |
|
|
object |
A single message in the conversation history. |
||
| Role |
string |
The role of the participant in the conversation. Valid values:
|
user |
| Content |
string |
The text content of the message, that is, what the role said in the conversation. |
你好 |
| LlmHistoryLimit |
integer |
The maximum number of conversation rounds to retain in the LLM/MLLM history. Default: 10. |
10 |
| LlmSystemPrompt |
string |
The system prompt for the LLM after the call starts. |
你是一位友好且乐于助人的助手,专注于为用户提供准确的信息和建议。 |
| BailianAppParams |
string |
The parameters for Alibaba Cloud Model Studio Application Center, as a JSON string. For more information about the parameter format, see Alibaba Cloud Model Studio Application Center parameters. |
"{\"biz_params\":{\"user_defined_params\":{\"your_plugin_id\":{\"article_index\":2}}},\"memory_id\":\"your_memory_id\",\"image_list\":[\"https://your_image_url\"],\"rag_options\":{\"pipeline_ids\":[\"your_id\"],\"file_ids\":[\"文档ID1\",\"文档ID2\"],\"metadata_filter\":{\"name\":\"张三\"},\"structured_filter\":{\"key1\":\"value1\",\"key2\":\"value2\"},\"tags\":[\"标签1\",\"标签2\"]}}" |
| OpenAIExtraQuery |
string |
Extra query parameters for an OpenAI protocol-based LLM. Parameters must be in key=value format, with multiple parameters connected by ampersands (&). All values must be strings. |
api-version=2024-02-01&api-key=sk-xxx |
| LlmCompleteReply |
boolean |
If enabled, the agent sends the complete LLM result to the client after the LLM provides a full response. This switch does not affect the streaming generation of captions. |
true |
| FunctionMap |
array |
A list of function mappings used to associate agent capabilities with LLM functions. Currently, this only supports function invocation for user-defined, OpenAI protocol-based LLMs. |
|
|
object |
A single mapping rule. |
||
| Function |
string |
The name of a built-in function provided by the Alibaba agent system. Currently, only hangup is supported. |
hangup |
| MatchFunction |
string |
The name of the LLM function to map to this feature. The name is user-defined and is used to invoke the corresponding feature in the LLM. For more information about the user-defined LLM protocol, see Standard LLM interfaces. |
hangup |
| OutputMinLength |
integer |
The minimum length of the text output in characters. Text shorter than this length is cached and waits to be concatenated. Valid range: [0, 100]. A value of 0 or empty means no limit. Default: empty. |
5 |
| OutputMaxDelay |
integer |
The maximum delay for text output in milliseconds. If this time is exceeded, the cached text is forcibly output. Valid range: [1000, 10000]. A value of 0 or empty means no limit. Default: empty. |
2000 |
| HistorySyncWithTTS |
boolean |
Specifies whether to keep the saved LLM message history consistent with the content that TTS actually played. Default: false. |
false |
| AvatarConfig |
object |
The digital human configuration. This takes effect only if the workflow contains a digital human node. |
|
| AvatarId |
string |
The model ID of the digital human. |
5257 |
| InterruptConfig |
object |
The configuration for the voice interruption policy. |
|
| EnableVoiceInterrupt |
boolean |
Specifies whether to support voice interruption. Default: true. |
true |
| InterruptWords |
array |
A list of specific words or phrases that trigger a conversation interruption. |
|
|
string |
A specific word or phrase that triggers a conversation interruption. |
打断一下 |
|
| NoInterruptMode |
string |
The ASR processing policy in no-interrupt mode. The default behavior is to cache the ASR text. |
cache |
| VoiceprintConfig |
object |
Voiceprint configuration. |
|
| UseVoiceprint |
boolean |
Specifies whether to use voiceprint recognition. Default: false. A valid voiceprint ID must be provided when you enable this feature. |
false |
| VoiceprintId |
string |
The unique ID for voiceprint recognition. By default, this is not set. The provided voiceprint ID must be registered through the voiceprint registration interface. For more information, see Register a human voiceprint. |
zhixiaoxia |
| RegistrationMode |
string |
||
| TurnDetectionConfig |
object |
Configuration for conversation turn detection. |
|
| TurnEndWords |
array |
A list of keywords used to determine the end of a user's turn. |
|
|
string |
A keyword used to determine the end of a user's turn. |
我说完了 |
|
| Mode |
string |
The mode for turn detection.
|
Semantic |
| SemanticWaitDuration |
integer |
The pause detection time in AI mode. Unit: milliseconds. Default: -1. Note: This parameter is not valid in Normal mode. |
-1 |
| Eagerness |
string |
This parameter is valid only in Semantic mode. It controls how quickly the agent responds after the AI detects a pause:
By default, this field is empty. |
High |
| ExperimentalConfig |
string |
Parameters for experimental features. If you have any requirements, contact technical support. |
"" |
| VcrConfig |
object |
Configuration for the video content recognition feature. This supports sending a callback to the client with the content that the algorithm recognizes in the video. |
|
| StillFrameMotion |
object |
Configuration for still frame detection. |
|
| Enabled |
boolean |
Specifies whether to enable still frame detection. Default: false. |
false |
| CallbackDelay |
integer |
The delay for sending a notification after a still frame is detected. After you set this, a notification is triggered only after a still frame persists for the specified duration. Unit: milliseconds. By default, this is empty, and the configuration in the console is used for the call. Valid range: [200, 5000]. |
3000 |
| InvalidFrameMotion |
object |
Parameter configuration for invalid frame detection. |
|
| Enabled |
boolean |
Specifies whether to enable invalid frame detection. |
false |
| CallbackDelay |
integer |
The delay for sending a notification after an invalid frame is detected. After you set this, a notification is triggered only after an invalid frame persists for the specified duration. Unit: milliseconds. By default, this is empty, and the configuration in the console is used for the call. Valid range: [200, 5000]. |
3000 |
| PeopleCount |
object |
Configuration for the people counting feature. |
|
| Enabled |
boolean |
Specifies whether to enable people counting. Default: false. |
false |
| Equipment |
object |
Configuration for device recognition. |
|
| Enabled |
boolean |
Specifies whether to check for prohibited devices. Default: false. |
false |
| HeadMotion |
object |
Configuration for head motion recognition. |
|
| Enabled |
boolean |
Specifies whether to enable head motion recognition. Default: false. |
false |
| LookAway |
object |
Configuration for gaze aversion recognition. |
|
| Enabled |
boolean |
Specifies whether to enable gaze aversion recognition. Default: false. |
true |
| AmbientSoundConfig |
object |
Configuration for ambient sound during a call. |
|
| ResourceId |
string |
The ID of the ambient sound. You can obtain this ID from the advanced configuration of the agent in the console. |
f67901c595834************ |
| Volume |
integer |
The volume of the background sound for the call. Valid range: [0, 100]. A value of 0 means the sound is off. |
50 |
| AutoSpeechConfig |
object |
The configuration module for the agent's automatic speech, including prompts while waiting for the LLM and inquiries during long periods of user silence. |
|
| UserIdle |
object |
Configuration for inquiry playback when the user is silent for a long time. |
|
| WaitTime |
integer |
The silence duration threshold. Unit: milliseconds. Required. An inquiry is triggered if the silence exceeds this duration. Valid range: 5000–600000 ms. |
5000 |
| MaxRepeats |
integer |
The maximum number of inquiries. Valid range: 0–10. Required. If this number is exceeded, no more inquiries are triggered, and the call is terminated. |
5 |
| Messages |
array |
A collection of inquiry prompts. You can have up to 10 prompts. Each prompt can be up to 100 characters long. The sum of probabilities must be 100%. |
|
|
object |
The structure of an inquiry prompt. |
||
| Text |
string |
The text of the inquiry prompt. It can be up to 100 characters long. |
您还在吗? |
| Probability |
number |
The probability of selecting this prompt. Valid range: 0–1, which corresponds to 0%–100%. |
0.5 |
| LlmPending |
object |
Configuration for playback when the LLM response is delayed. |
|
| WaitTime |
integer |
The threshold for waiting for an LLM response. A prompt is played if the wait time exceeds this threshold. Required. Unit: milliseconds. Valid range: 500–10000 ms. Set this based on the actual performance of your LLM. |
3000 |
| Messages |
array |
A collection of waiting prompts. You can have up to 10 prompts. Each prompt can be up to 100 characters long. The sum of probabilities must be 100%. |
|
|
object |
The structure of a waiting prompt. |
||
| Text |
string |
The text of the waiting prompt. It can be up to 100 characters long. |
稍等一下 |
| Probability |
number |
The probability of selecting this prompt. Valid range: 0–1, which corresponds to 0%–100%. |
0.5 |
| BackChannelingConfigs |
array |
||
|
object |
|||
| Enabled |
boolean |
||
| TriggerStage |
string |
||
| Probability |
number |
||
| Words |
array |
||
|
object |
|||
| Text |
string |
||
| Probability |
number |
||
| BackChannelingConfig |
array |
The configuration module for the backchanneling feature. When enabled, the system randomly plays short backchanneling phrases at specific trigger points. |
|
|
object |
A single backchanneling configuration. |
||
| Enabled |
boolean |
Specifies whether to enable the backchanneling feature. Required. Valid values: true, false. |
true |
| TriggerStage |
string |
The trigger for backchanneling. Valid value:
|
pause_detected |
| Probability |
number |
The probability of triggering the feature. Valid range: 0.0–1.0. Required. |
0.5 |
| Words |
array |
A collection of backchanneling phrases. You can have up to 10 phrases. Each phrase can be up to 20 characters long. The sum of probabilities must be 1.0. |
|
|
object |
Configuration for a backchanneling phrase. |
||
| Text |
string |
The text of the phrase. It can be up to 20 characters long and supports multiple languages. Required. |
嗯嗯 |
| Probability |
number |
The probability of triggering this phrase. Valid range: 0.0–1.0. Required. |
0.3 |
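
The parameters in the table nest into a single JSON object. The following is a minimal sketch, not an official SDK call, that assembles a partial configuration as a Python dict, checks a few of the documented limits, and serializes it. The helper names `check_range`, `check_probabilities`, and `validate_config` are hypothetical. The nesting shown (for example, `AsrMaxSilence` under `AsrConfig`, `OpenAIExtraQuery` under `LlmConfig`, `UserIdle` under `AutoSpeechConfig`) is inferred from the row order of the table and should be treated as an assumption. The second prompt and phrase texts (`Hello?`, `OK`) and the `memory_id` value are illustrative.

```python
import json
import math
from urllib.parse import urlencode


def check_range(name, value, low, high):
    """Raise ValueError unless low <= value <= high."""
    if not low <= value <= high:
        raise ValueError(f"{name}={value} is outside [{low}, {high}]")


def check_probabilities(name, items):
    """Each Probability must be in [0, 1]; together they must sum to 1.0."""
    for item in items:
        check_range(f"{name}.Probability", item["Probability"], 0.0, 1.0)
    total = sum(item["Probability"] for item in items)
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        raise ValueError(f"{name} probabilities sum to {total}, expected 1.0")


def validate_config(config):
    """Check a subset of the documented limits; absent fields are skipped."""
    asr = config.get("AsrConfig", {})
    if "AsrMaxSilence" in asr:
        check_range("AsrMaxSilence", asr["AsrMaxSilence"], 200, 1200)
    if "VadLevel" in asr:
        check_range("VadLevel", asr["VadLevel"], 0, 11)
    hot_words = asr.get("AsrHotWords", [])
    if len(hot_words) > 128:
        raise ValueError("AsrHotWords can contain up to 128 words")
    for word in hot_words:
        check_range("AsrHotWords item length", len(word), 1, 10)

    user_idle = config.get("AutoSpeechConfig", {}).get("UserIdle")
    if user_idle is not None:
        check_range("UserIdle.WaitTime", user_idle["WaitTime"], 5000, 600000)
        check_range("UserIdle.MaxRepeats", user_idle["MaxRepeats"], 0, 10)
        check_probabilities("UserIdle.Messages", user_idle["Messages"])

    for entry in config.get("BackChannelingConfig", []):
        check_range("BackChannelingConfig.Probability", entry["Probability"], 0.0, 1.0)
        check_probabilities("BackChannelingConfig.Words", entry["Words"])
    return config


example_config = {
    "Greeting": "你好",
    "MaxIdleTime": 600,
    "AsrConfig": {
        "AsrLanguageId": "zh_mandarin",
        "AsrMaxSilence": 400,
        "AsrHotWords": ["检查"],
        "VadLevel": 11,
    },
    "LlmConfig": {
        # String-typed parameters carry encoded content, not nested objects.
        "OpenAIExtraQuery": urlencode({"api-version": "2024-02-01", "api-key": "sk-xxx"}),
        "BailianAppParams": json.dumps({"memory_id": "your_memory_id"}),
    },
    "AutoSpeechConfig": {
        "UserIdle": {
            "WaitTime": 5000,
            "MaxRepeats": 5,
            "Messages": [
                {"Text": "您还在吗?", "Probability": 0.5},
                {"Text": "Hello?", "Probability": 0.5},
            ],
        }
    },
    "BackChannelingConfig": [
        {
            "Enabled": True,
            "TriggerStage": "pause_detected",
            "Probability": 0.5,
            "Words": [
                {"Text": "嗯嗯", "Probability": 0.3},
                {"Text": "OK", "Probability": 0.7},
            ],
        }
    ],
}

# ensure_ascii=False keeps the Chinese example values readable in the payload.
payload = json.dumps(validate_config(example_config), ensure_ascii=False)
```

Validating on the client before sending catches out-of-range values early, such as an `AsrMaxSilence` below 200 or phrase probabilities that do not sum to 1.0. The service remains the authority on what it accepts.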