Intelligent Media Services: AIAgentConfig

Last updated: Dec 08, 2025
Each entry lists the parameter name and type, followed by its description and, where available, an example value.

AIAgentConfig (object)

The template configuration of the AI agent.

Greeting (string)

The welcome message that the agent says upon joining. Changes take effect in the next session. Default value: None.

WakeUpQuery (string)

A command given to the agent before the call starts. The agent will respond to this query immediately after the call begins.

MaxIdleTime (integer)

The maximum time the agent waits for interaction before it hangs up. Unit: seconds. Default value: 600.

Example: 600
UserOnlineTimeout (integer)

The timeout period for the agent to close the task if no user joins the channel. Unit: seconds. Default value: 60.

Example: 60
UserOfflineTimeout (integer)

The timeout period for the agent to close the task after the user has left the channel. Unit: seconds. Default value: 5.

Example: 5
EnablePushToTalk (boolean)

Specifies whether to enable the push-to-talk mode. Default value: false.

Example: false
GracefulShutdown (boolean)

Specifies whether to enable graceful shutdown. Default value: false.

If enabled, the agent finishes its current sentence (for up to 10 seconds) before disconnecting when it is stopped.

Example: false
Volume (long)

The agent's speaking volume.

  • If this parameter is not specified, the adaptive volume mode is used by default.
  • To specify this parameter, enter a value between 0 and 400. Output volume = Workflow output volume × Volume / 100. Examples:
  1. If Volume is set to 0, the output is muted.
  2. If Volume is set to 100, the output volume is the original volume.
  3. If Volume is set to 200, the output volume is 2 times the original volume.

Example: 100
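The scaling rule for Volume can be sketched in a few lines. This is only an illustration of the documented formula, not the service's implementation; the function name `scale_output_volume` and the use of a plain numeric level for the workflow output are assumptions.

```python
def scale_output_volume(workflow_volume: float, volume: int) -> float:
    """Apply Output volume = Workflow output volume x Volume / 100."""
    if not 0 <= volume <= 400:
        raise ValueError("Volume must be between 0 and 400")
    return workflow_volume * volume / 100


print(scale_output_volume(0.5, 0))    # 0.0 (muted)
print(scale_output_volume(0.5, 100))  # 0.5 (original volume)
print(scale_output_volume(0.5, 200))  # 1.0 (twice the original volume)
```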
WorkflowOverrideParams (string)

The parameters to override the workflow configuration. Default value: None.

Example: {}
AvatarUrl (string)

The URL for the agent's profile image in audio-only calls. Default value: None.

Example: http://example.com/a.jpg
AvatarUrlType (string)

The type of the avatar URL. Default value: None.

Example: USER
EnableIntelligentSegment (boolean)

Specifies whether to enable intelligent segmentation. If enabled, the system merges short, interim segments into a single sentence. Default value: true.

Example: true
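As a rough illustration, the top-level fields described so far could be assembled and serialized as JSON like this. The greeting text is a hypothetical value, and how the resulting JSON is submitted to the service is not covered by this section, so treat this as a sketch only.

```python
import json

# Top-level fields only; values are the documented defaults or examples
# unless marked otherwise.
agent_config = {
    "Greeting": "Hello, how can I help you?",  # hypothetical text
    "MaxIdleTime": 600,        # seconds
    "UserOnlineTimeout": 60,   # seconds
    "UserOfflineTimeout": 5,   # seconds
    "EnablePushToTalk": False,
    "GracefulShutdown": True,  # finish the current sentence (up to 10 s)
    "Volume": 100,             # omit this key to use adaptive volume
    "EnableIntelligentSegment": True,
}

payload = json.dumps(agent_config)
print(payload)
```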
AsrConfig (object)

The configuration for Automatic Speech Recognition (ASR).

AsrMaxSilence (integer)

The silence threshold for sentence segmentation. A pause longer than this value is considered a sentence break. Unit: milliseconds. Default value: 400. Valid values: 200 to 1200.

Example: 400
AsrLanguageId (string)

The language ID for ASR. Valid values:

  • zh_mandarin: Chinese
  • en: English
  • zh_en: Chinese and English
  • es: Spanish
  • jp: Japanese

Example: zh_mandarin
CustomParams (string)

Passthrough parameters for ASR.

Example: mode=fast&sample=16000&format=wav
VadDuration (integer)

The minimum duration for voice activity detection (VAD). Unit: milliseconds. This parameter controls the sensitivity of interruptions and prevents the agent from cutting off user speech too early during short pauses.

  • 0: disables this feature.
  • 200 to 2000: the minimum duration. Recommended: 200 to 500, which typically corresponds to the length of 1 to 4 words.

By default, this parameter is left empty, which means the feature is disabled.

Example: 300
AsrHotWords (array<string>)

Hotwords for ASR to improve recognition accuracy. Maximum of 128 hotwords. Each hotword must be 1 to 10 characters long.

VadLevel (integer)

The voice activity detection (VAD) threshold for interruption. A higher value makes interruptions harder to trigger. Valid values: 0 to 10. Default value: 1. A value of 0 disables VAD.

Example: 1
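The documented ASR limits can be checked on the client side before a configuration is sent. The sketch below is an assumption-laden example: the helper name `validate_asr_config` is hypothetical, it assumes all of the fields above are children of AsrConfig, and it counts hotword length with Python's `len`, which may not match how the service counts characters.

```python
def validate_asr_config(cfg: dict) -> list:
    """Return a list of problems found in an AsrConfig-like dict."""
    problems = []
    silence = cfg.get("AsrMaxSilence", 400)
    if not 200 <= silence <= 1200:
        problems.append("AsrMaxSilence must be 200 to 1200 ms")
    vad_duration = cfg.get("VadDuration")  # None or 0 means disabled
    if vad_duration not in (None, 0) and not 200 <= vad_duration <= 2000:
        problems.append("VadDuration must be 0 or 200 to 2000 ms")
    vad_level = cfg.get("VadLevel", 1)
    if not 0 <= vad_level <= 10:
        problems.append("VadLevel must be 0 to 10")
    hot_words = cfg.get("AsrHotWords", [])
    if len(hot_words) > 128:
        problems.append("AsrHotWords allows at most 128 entries")
    if any(not 1 <= len(word) <= 10 for word in hot_words):
        problems.append("each hotword must be 1 to 10 characters")
    return problems


ok = {"AsrMaxSilence": 400, "AsrLanguageId": "zh_mandarin", "VadDuration": 300, "VadLevel": 1}
bad = {"AsrMaxSilence": 100, "VadDuration": 50, "AsrHotWords": ["a" * 11]}
print(validate_asr_config(ok))   # []
print(validate_asr_config(bad))  # three problems
```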
TtsConfig (object)

The configuration for Text-to-Speech (TTS).

PronunciationRules (array<object>)

The pronunciation rules, executed in order. Maximum of 20 rules. Each item is a single pronunciation rule.

Type (string)

The type of rule. Valid value:

  • replacement: replaces every occurrence of the Word value with the Pronunciation value.

Example: replacement
Word (string)

The word to be replaced. The value supports up to 10 Chinese characters. Other characters, including spaces, are not supported.

Pronunciation (string)

The target pronunciation. The value supports up to 10 Chinese characters. Other characters, including spaces, are not supported.
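Because the rules run in order, a later rule also sees the output of an earlier one. The sketch below imitates that behavior with plain string replacement; it is not the service's implementation, and the function name and the sample rules are hypothetical.

```python
def apply_pronunciation_rules(text: str, rules: list) -> str:
    """Apply replacement rules one after another, in list order."""
    if len(rules) > 20:
        raise ValueError("at most 20 pronunciation rules are allowed")
    for rule in rules:
        if rule["Type"] != "replacement":
            raise ValueError(f"unsupported rule type: {rule['Type']}")
        text = text.replace(rule["Word"], rule["Pronunciation"])
    return text


# Hypothetical rules chosen to show that order matters.
rules = [
    {"Type": "replacement", "Word": "一", "Pronunciation": "二"},
    {"Type": "replacement", "Word": "二", "Pronunciation": "三"},
]
print(apply_pronunciation_rules("一二", rules))  # 三三
```

With the two rules swapped, the same input would yield 二三, so the order of the list is part of the configuration.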

VoiceIdList (array<string>)

The available voices. Each item is a voice.

Example: zhixiaoxia
VoiceId (string)

The voice ID. Changes take effect on the next sentence. If not set, the system uses the default voice ID specified in the agent template. This parameter takes effect only for the preset TTS model. Maximum length: 64 characters. For the available voices, see Intelligent voice samples.

Example: longcheng_v2
Emotion (string)

The emotion of the synthesized speech. Applies only to MiniMax models. Seven emotions are supported:

  • happy
  • sad
  • angry
  • fearful
  • disgusted
  • surprised
  • calm

Example: happy
ModelId (string)

The ID of the TTS model. Applies only to MiniMax models. Valid values: speech-01-turbo and speech-02-turbo.

Example: speech-01-turbo
LanguageId (string)

Applies only to MiniMax models. Improves recognition of the specified language or dialect. If the language is unknown, set this parameter to auto so that the model detects it automatically. By default, this parameter is left empty. Valid values:

  • Chinese
  • Chinese,Yue
  • English
  • Arabic
  • Russian
  • Spanish
  • French
  • Portuguese
  • German
  • Turkish
  • Dutch
  • Ukrainian
  • Vietnamese
  • Indonesian
  • Japanese
  • Italian
  • Korean
  • Thai
  • Polish
  • Romanian
  • Greek
  • Czech
  • Finnish
  • Hindi
  • auto

Example: Chinese
SpeechRate (double)

The speech rate. Supported on all platforms. For both CosyVoice and MiniMax, the default value is 1.0 and the valid values are 0.5 to 2.0.

Example: 1.0
LlmConfig (object)

The configuration for the large language model (LLM).

FunctionMap (array<object>)

Maps agent capabilities to LLM functions. Function calling is supported only with custom LLMs that follow the OpenAI protocol. Each item is a single mapping rule.

Function (string)

The name of the built-in agent capability. Only hangup is supported.

Example: hangup
MatchFunction (string)

The corresponding user-defined function name in your LLM. When the LLM calls this function, the mapped agent capability is triggered.

Example: hangup
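The mapping can be pictured as a lookup from the function name the LLM calls to the built-in capability. The snippet below is a client-side illustration only; `resolve_capability` and the LLM function name `end_call` are hypothetical.

```python
from typing import Optional

# One mapping rule: when the LLM calls "end_call", trigger the hangup capability.
function_map = [
    {"Function": "hangup", "MatchFunction": "end_call"},  # "end_call" is hypothetical
]


def resolve_capability(llm_function_name: str, mapping: list) -> Optional[str]:
    """Return the built-in capability mapped to an LLM function name, if any."""
    for rule in mapping:
        if rule["MatchFunction"] == llm_function_name:
            return rule["Function"]
    return None


print(resolve_capability("end_call", function_map))     # hangup
print(resolve_capability("get_weather", function_map))  # None
```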
LlmHistoryLimit (integer)

The maximum number of conversational turns to retain in the history. Default value: 10.

Example: 10
LlmCompleteReply (boolean)

If true, the service sends the complete LLM result to the client in a single response after generation finishes.

Example: true
LlmHistory (array<object>)

The LLM/MLLM conversation history context. Each item is a single history entry.

Role (string)

The role of the participant in the conversation. Valid values:

  • user
  • assistant
  • system
  • function
  • plugin
  • tool

Example: user
Content (string)

The actual text content of the message for that role.

LlmSystemPrompt (string)

The system prompt for the LLM.

OpenAIExtraQuery (string)

Additional query parameters to be sent to the OpenAI-protocol LLM, formatted as a URL query string (key=value pairs separated by &). All values must be strings.

Example: api-version=2024-02-01&api-key=sk-xxx
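One way to produce such a query string is with Python's standard `urllib.parse.urlencode`. Whether the service expects percent-encoding of special characters is not stated here, so that part is an assumption; the variable names are illustrative.

```python
from urllib.parse import parse_qsl, urlencode

extra = {"api-version": "2024-02-01", "api-key": "sk-xxx"}

# The parameter requires every value to be a string.
if not all(isinstance(value, str) for value in extra.values()):
    raise TypeError("all OpenAIExtraQuery values must be strings")

open_ai_extra_query = urlencode(extra)
print(open_ai_extra_query)  # api-version=2024-02-01&api-key=sk-xxx

# Round trip back to key/value pairs to confirm nothing was lost.
print(dict(parse_qsl(open_ai_extra_query)) == extra)  # True
```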
OutputMaxDelay (integer)

The maximum time, in milliseconds, to buffer text before it is forcibly sent to the client. Valid values: [1000, 10000]. A value of 0 or an empty string (default) disables this limit.

Example: 2000
BailianAppParams (string)

The Alibaba Cloud Model Studio Application Center parameters, in JSON format. For details, see Model Studio Application Center Parameter.

OutputMinLength (integer)

The minimum number of characters that must be buffered before a text chunk is sent. Valid values: [0, 100]. A value of 0 or an empty string (default) disables this limit.

Example: 5
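To make the interplay of OutputMinLength and OutputMaxDelay more concrete, here is one possible reading of the two limits as a small text buffer. This is a guess at the behavior for illustration, not the service's actual algorithm: the class `TextChunker` is hypothetical, and it only checks the delay when new text arrives.

```python
from typing import Optional


class TextChunker:
    """Buffers streamed LLM text and decides when to release a chunk."""

    def __init__(self, min_length: int = 0, max_delay_ms: int = 0):
        self.min_length = min_length      # OutputMinLength; 0 disables the limit
        self.max_delay_ms = max_delay_ms  # OutputMaxDelay; 0 disables the limit
        self.buffer = ""
        self.oldest_ms = 0                # arrival time of the oldest buffered text

    def feed(self, text: str, now_ms: int) -> Optional[str]:
        """Add text; return a chunk to send, or None to keep buffering."""
        if not self.buffer:
            self.oldest_ms = now_ms
        self.buffer += text
        long_enough = len(self.buffer) >= self.min_length
        waited_too_long = (
            self.max_delay_ms > 0 and now_ms - self.oldest_ms >= self.max_delay_ms
        )
        if long_enough or waited_too_long:
            chunk, self.buffer = self.buffer, ""
            return chunk
        return None


chunker = TextChunker(min_length=5, max_delay_ms=2000)
print(chunker.feed("Hi", now_ms=0))              # None: too short, not yet overdue
print(chunker.feed("!", now_ms=2500))            # Hi! (released by the delay limit)
print(chunker.feed("Hello there", now_ms=2600))  # Hello there (released by length)
```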
AvatarConfig (object)

The avatar configuration. Only effective if the workflow includes an avatar node.

AvatarId (string)

The model ID of the avatar.

Example: 5257
InterruptConfig (object)

The configuration for the speech interruption strategy.

InterruptWords (array<string>)

Words or phrases that trigger an interruption. Each item is a single word or phrase.

EnableVoiceInterrupt (boolean)

Specifies whether to allow the user to interrupt the agent by speaking. Default value: true.

Example: true
VoiceprintConfig (object)

The configuration for voiceprint recognition.

VoiceprintId (string)

The unique ID of the voiceprint. Default value: None.

Example: zhixiaoxia
UseVoiceprint (boolean)

Specifies whether to enable voiceprint recognition. Default value: false. If you enable voiceprint recognition, you must also specify a valid VoiceprintId.

Example: false
TurnDetectionConfig (object)

The configuration for detecting the end of a user's conversational turn.

SemanticWaitDuration (integer)

Specifies how long the agent waits after a user stops speaking before it decides whether the turn is over. Unit: milliseconds. Default value: -1.

  • -1: AI decides an appropriate wait time automatically.
  • 0 to 10000: A custom wait time. Recommended: 0 to 1500 ms.

Note: In Normal mode, this field is ignored.

Example: -1
TurnEndWords (array<string>)

Keywords that signify the end of the user's turn. Each item is a single keyword.

Mode (string)

The mode of turn detection. Valid values:

  • Normal: uses simple pause detection.
  • Semantic: uses AI to analyze context.

Example: Semantic
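A small builder can enforce the documented values for the turn detection fields before the configuration is sent. The helper `make_turn_detection_config` and the keyword "over" are hypothetical; omitting SemanticWaitDuration in Normal mode is a choice made for this sketch, since the field is ignored there.

```python
def make_turn_detection_config(mode="Semantic", semantic_wait_ms=-1, turn_end_words=()):
    """Build a TurnDetectionConfig-like dict after checking documented ranges."""
    if mode not in ("Normal", "Semantic"):
        raise ValueError("Mode must be Normal or Semantic")
    if semantic_wait_ms != -1 and not 0 <= semantic_wait_ms <= 10000:
        raise ValueError("SemanticWaitDuration must be -1 or 0 to 10000 ms")
    config = {"Mode": mode, "TurnEndWords": list(turn_end_words)}
    if mode == "Semantic":
        # The field is ignored in Normal mode, so this sketch only sets it here.
        config["SemanticWaitDuration"] = semantic_wait_ms
    return config


print(make_turn_detection_config("Semantic", 800, ["over"]))
print(make_turn_detection_config("Normal"))
```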
ExperimentalConfig (string)

The parameters for experimental features. Contact support for details.

Example: ""
VcrConfig (object)

Configuration for video content recognition. When enabled, the system sends callbacks to the client with details about the identified content.

PeopleCount (object)

Configuration for the people counting feature.

Enabled (boolean)

Enables or disables the people counting feature. Default value: false.

Example: false
StillFrameMotion (object)

Configuration for detecting still frames.

Enabled (boolean)

Enables or disables still frame detection. Default value: false.

Example: false
CallbackDelay (integer)

The delay in milliseconds before a still frame detection event is triggered. The callback is sent only after the video has been static for this duration. If not set, the value from the console configuration is used. Valid values: [200, 5000].

Example: 3000
Equipment (object)

Configuration for device identification.

Enabled (boolean)

Enables or disables device identification. Default value: false.

Example: false
HeadMotion (object)

Configuration for head motion detection.

Enabled (boolean)

Enables or disables head motion detection. Default value: false.

Example: false
LookAway (object)

Configuration for detecting if the user is looking away from the screen.

Enabled (boolean)

Enables or disables look-away detection. Default value: false.

Example: true
InvalidFrameMotion (object)

Configuration for detecting invalid frames.

Enabled (boolean)

Enables or disables invalid frame detection.

Example: false
CallbackDelay (integer)

The delay in milliseconds before an invalid frame detection event is triggered. The callback is sent only after the frame has been considered invalid for this duration. If not set, the value from the console configuration is used. Valid values: [200, 5000].

Example: 3000
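Since several detection features share the same Enabled field, the nesting is easier to see in a concrete structure. The sketch below assumes each feature object sits directly under VcrConfig, as the order of the entries above suggests; the helper `frame_motion` is hypothetical.

```python
import json
from typing import Optional


def frame_motion(enabled: bool, callback_delay_ms: Optional[int] = None) -> dict:
    """Build a StillFrameMotion- or InvalidFrameMotion-like object."""
    config = {"Enabled": enabled}
    if callback_delay_ms is not None:
        if not 200 <= callback_delay_ms <= 5000:
            raise ValueError("CallbackDelay must be within [200, 5000] ms")
        config["CallbackDelay"] = callback_delay_ms
    # When CallbackDelay is omitted, the console configuration value applies.
    return config


vcr_config = {
    "PeopleCount": {"Enabled": True},
    "StillFrameMotion": frame_motion(True, 3000),
    "LookAway": {"Enabled": True},
    "InvalidFrameMotion": frame_motion(False),
}
print(json.dumps({"VcrConfig": vcr_config}, indent=2))
```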
AmbientSoundConfig (object)

Configuration for the ambient sound played during the call.

ResourceId (string)

The ID of the ambient sound. You can obtain this ID from the advanced settings section of the agent configuration in the console.

Example: f67901c595834************
Volume (integer)

The volume of the ambient sound. Valid values: [0, 100]. A value of 0 disables the ambient sound.

Example: 50