AIAgentConfig reference for Real-time AI agent sessions - Intelligent Media Services - Alibaba Cloud

object

Agent template parameters.

Greeting

string

Greeting message. After modification, takes effect the next time a session is joined. Default: none.

你好

WakeUpQuery

string

Instruction given by the user before the call starts. The agent responds to this sentence immediately after the call starts.

今天天气怎么样？

MaxIdleTime

integer

Maximum idle time without interaction with the agent. The agent goes offline when this limit is exceeded. Unit: seconds. Default: 600 seconds.

600

UserOnlineTimeout

integer

Timeout after which the agent task is closed if the user has not joined the session. Unit: seconds. Default: 60 seconds.

60

UserOfflineTimeout

integer

Timeout after which the agent task is closed once the user has left the session. Unit: seconds. Default: 5 seconds.

5

EnablePushToTalk

boolean

Whether to enable push-to-talk mode. Default: false.

false

GracefulShutdown

boolean

Whether to perform a graceful shutdown. Default: false.

Graceful shutdown: when the agent is stopped, it finishes speaking the current sentence before stopping, with a maximum playback time of 10 seconds.

false

Volume

integer

Volume at which the agent speaks.

If left empty: the adaptive volume mode recommended by Alibaba Cloud is used by default.
If specified: the valid range is 0 to 400. Output volume = voice output volume in the workflow * volume / 100. Examples:

If volume = 0, the output volume is 0.
If volume = 100, the output volume equals the original volume.
If volume = 200, the output volume is twice the original volume.

100
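The scaling rule above can be checked with a few lines of Python. This is only a minimal sketch: the function name output_volume and the workflow_volume argument are hypothetical names introduced for illustration, and the adaptive mode used when Volume is left empty is not modeled.

```python
def output_volume(workflow_volume: float, volume: int) -> float:
    """Apply the documented rule: output = workflow volume * Volume / 100."""
    if not 0 <= volume <= 400:
        raise ValueError("Volume must be in the range 0 to 400")
    return workflow_volume * volume / 100


# workflow_volume=80 is an arbitrary illustrative value
print(output_volume(80, 0))    # 0.0
print(output_volume(80, 100))  # 80.0
print(output_volume(80, 200))  # 160.0
```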

WorkflowOverrideParams

string

Workflow override parameters. Default: none.

{}

AvatarUrl

string

URL of the agent's avatar image for voice calls. Default: none.

http://example.com/a.jpg

AvatarUrlType

string

Type of the agent avatar URL. Default: none.

USER

EnableIntelligentSegment

boolean

Intelligent sentence segmentation switch. When enabled, segments produced while the user is speaking are intelligently merged into a single sentence. Default: true.

true
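As an illustration, the top-level fields described so far could be assembled into a JSON document like this. It is a sketch under assumptions: the values are the example values from this page, and how the resulting string is passed to the service is not covered here.

```python
import json

# Field names come from this reference; values are the documented examples/defaults.
agent_config = {
    "Greeting": "你好",
    "MaxIdleTime": 600,         # seconds
    "UserOnlineTimeout": 60,    # seconds
    "UserOfflineTimeout": 5,    # seconds
    "EnablePushToTalk": False,
    "GracefulShutdown": False,
    "Volume": 100,              # omit to use the adaptive volume mode
    "EnableIntelligentSegment": True,
}

# ensure_ascii=False keeps the Chinese greeting readable in the output
config_json = json.dumps(agent_config, ensure_ascii=False)
print(config_json)
```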

AsrConfig

object

Speech recognition configuration.

AsrLanguageId

string

ASR language ID. Available options:

zh_mandarin: Chinese
en: English
zh_en: Mixed Chinese and English
es: Spanish
jp: Japanese

zh_mandarin

AsrMaxSilence

integer

Threshold for voice sentence segmentation detection. A segmentation is triggered when the silence duration exceeds this threshold. Valid range: 200ms to 1200ms. Default: 400ms.

400

AsrHotWords

array

ASR hot word list. The hot word list supports up to 128 words.

string

Hot word string. Character length: [1, 10] characters.

检查

VadLevel

integer

Interruption threshold parameter. Valid range: [0, 11]. Default: 11.

0 means the VAD function is disabled.
1-10: a higher value makes interruption more difficult.
11 behaves differently from levels 1-10: its pre-processing causes less damage to conversation audio and provides stronger noise resistance.

11

CustomParams

string

Pass-through parameters for the in-house ASR integration.

mode=fast&sample=16000&format=wav

VadDuration

integer

Minimum duration threshold for voice activity detection, used to control interruption sensitivity. 0 disables this feature. Valid range: 200 to 2000 milliseconds. Commonly used values [200, 500] correspond to 1-4 characters. Default: empty, not applied.

300
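The ranges stated for the AsrConfig fields can be checked on the client before a request is sent. The helper below is a hypothetical sketch (validate_asr_config is not part of any SDK); it only encodes the limits listed above.

```python
def validate_asr_config(cfg: dict) -> list[str]:
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []

    silence = cfg.get("AsrMaxSilence")
    if silence is not None and not 200 <= silence <= 1200:
        problems.append("AsrMaxSilence must be between 200 and 1200 ms")

    hot_words = cfg.get("AsrHotWords", [])
    if len(hot_words) > 128:
        problems.append("AsrHotWords supports at most 128 words")
    for word in hot_words:
        if not 1 <= len(word) <= 10:
            problems.append(f"hot word {word!r} must be 1 to 10 characters long")

    vad_level = cfg.get("VadLevel")
    if vad_level is not None and not 0 <= vad_level <= 11:
        problems.append("VadLevel must be between 0 and 11")

    # 0 disables the feature; otherwise the documented range is 200 to 2000 ms
    vad_duration = cfg.get("VadDuration")
    if vad_duration not in (None, 0) and not 200 <= vad_duration <= 2000:
        problems.append("VadDuration must be 0 or between 200 and 2000 ms")

    return problems


example = {
    "AsrLanguageId": "zh_mandarin",
    "AsrMaxSilence": 400,
    "AsrHotWords": ["检查"],
    "VadLevel": 11,
    "VadDuration": 300,
}
print(validate_asr_config(example))  # []
```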

TtsConfig

object

Speech synthesis configuration.

VoiceId

string

Voice ID. Takes effect on the next sentence after modification. If not specified, the voice ID configured in the agent template is used. Only applies to built-in TTS. Input length must not exceed 64 characters. For available values, refer to Intelligent Voice Effect Examples.

longcheng_v2

VoiceIdList

array

List of selectable voices.

string

Voice.

zhixiaoxia

PronunciationRules

array

TTS pronunciation rules. The array length must not exceed 20, and rules are executed in order.

object

TTS pronunciation rule.

Word

string

一一零

Pronunciation

string

幺幺零

Type

string

replacement
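To make the ordering constraint concrete, here is a rough sketch of how a list of replacement rules might be applied in sequence. It assumes that a rule with Type replacement substitutes every occurrence of Word with Pronunciation; the service's actual matching behavior is not described on this page, and apply_pronunciation_rules is a hypothetical helper.

```python
def apply_pronunciation_rules(text: str, rules: list[dict]) -> str:
    """Apply replacement rules one after another, in list order."""
    if len(rules) > 20:
        raise ValueError("PronunciationRules must not contain more than 20 rules")
    for rule in rules:
        if rule.get("Type") == "replacement":
            text = text.replace(rule["Word"], rule["Pronunciation"])
    return text


rules = [{"Word": "一一零", "Pronunciation": "幺幺零", "Type": "replacement"}]
print(apply_pronunciation_rules("请拨打一一零", rules))  # 请拨打幺幺零
```

Because rules run in order, a later rule sees the output of earlier ones, so overlapping rules should be listed from most to least specific.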

ModelId

string

Currently only minimax is supported. Available values: speech-01-turbo / speech-02-turbo

speech-01-turbo

LanguageId

string

Currently only minimax is supported. Default: empty. Enhances recognition of specified minor languages and dialects. Once configured, it can improve voice performance in scenarios involving the specified minor language/dialect. If the minor language type is not clear, you can select "auto" to let the model determine the minor language type. The following values are supported:

Supported languages

Chinese
Chinese,Yue: Cantonese
English
Arabic
Russian
Spanish
French
Portuguese
German
Turkish
Dutch
Ukrainian
Vietnamese
Indonesian
Japanese
Italian
Korean
Thai
Polish
Romanian
Greek
Czech
Finnish
Hindi
auto: automatic detection

Chinese

Emotion

string

Currently only minimax is supported. Minimax currently supports 7 emotions:

happy
sad
angry
fearful
disgusted
surprised
calm: neutral

happy

SpeechRate

number

Supported on all platforms. For cosyvoice, the default is 1.0 with a valid range of 0.5-2.0. For minimax, the default is 1.0 with a valid range of 0.5-2.0.

1.0

LlmConfig

object

Large language model configuration.

LlmHistory

array

LLM/MLLM conversation history context.

object

A single conversation turn.

Role

string

user

Content

string

你好

LlmHistoryLimit

integer

Maximum number of LLM/MLLM conversation history rounds to retain. Default: 10.

10
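The effect of a history limit can be pictured with the sketch below. It assumes that a round begins with each user message and includes the replies that follow it; the page does not define a round precisely, so treat this as one plausible reading. trim_history is a hypothetical helper.

```python
def trim_history(history: list[dict], limit: int = 10) -> list[dict]:
    """Keep only the most recent `limit` rounds; a round starts at each user message."""
    if limit <= 0:
        return []
    user_indexes = [i for i, msg in enumerate(history) if msg["Role"] == "user"]
    if len(user_indexes) <= limit:
        return list(history)
    return history[user_indexes[-limit]:]


history = [
    {"Role": "user", "Content": "q1"}, {"Role": "assistant", "Content": "a1"},
    {"Role": "user", "Content": "q2"}, {"Role": "assistant", "Content": "a2"},
    {"Role": "user", "Content": "q3"}, {"Role": "assistant", "Content": "a3"},
]
print([m["Content"] for m in trim_history(history, limit=2)])  # ['q2', 'a2', 'q3', 'a3']
```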

LlmSystemPrompt

string

The LLM system prompt used after the call is started.

你是一位友好且乐于助人的助手，专注于为用户提供准确的信息和建议。

BailianAppParams

string

Alibaba Cloud Bailian application center parameters, as a JSON string. For the parameter format, refer to: Alibaba Cloud Bailian Application Center Parameters

"{\"biz_params\":{\"user_defined_params\":{\"your_plugin_id\":{\"article_index\":2}}},\"memory_id\":\"your_memory_id\",\"image_list\":[\"https://your_image_url\"],\"rag_options\":{\"pipeline_ids\":[\"your_id\"],\"file_ids\":[\"文档ID1\",\"文档ID2\"],\"metadata_filter\":{\"name\":\"张三\"},\"structured_filter\":{\"key1\":\"value1\",\"key2\":\"value2\"},\"tags\":[\"标签1\",\"标签2\"]}}"
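Because BailianAppParams is a JSON string rather than a nested object, it is usually easier to build the value from a dictionary and serialize it. The sketch below uses a subset of the placeholder values from the example above; the llm_config variable is an illustrative name only.

```python
import json

# Placeholder values copied from the example on this page
bailian_params = {
    "biz_params": {"user_defined_params": {"your_plugin_id": {"article_index": 2}}},
    "memory_id": "your_memory_id",
    "image_list": ["https://your_image_url"],
}

# The field value is the serialized string, not the dictionary itself
llm_config = {"BailianAppParams": json.dumps(bailian_params, ensure_ascii=False)}
print(llm_config["BailianAppParams"])
```

Serializing llm_config again as part of a larger JSON document produces the escaped quotes seen in the example value.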

OpenAIExtraQuery

string

Additional query parameters for the OpenAI-protocol LLM. Parameters must use the key=value format. Multiple parameters are joined with &. All values must be strings.

api-version=2024-02-01&api-key=sk-xxx
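The key=value&key=value format can be produced from a dictionary of strings with the Python standard library. This is a sketch: urlencode percent-encodes special characters, and whether the service expects percent-encoded values is an assumption not confirmed by this page.

```python
from urllib.parse import parse_qsl, urlencode

# All values are strings, as the parameter description requires
extra_query = urlencode({"api-version": "2024-02-01", "api-key": "sk-xxx"})
print(extra_query)  # api-version=2024-02-01&api-key=sk-xxx

# Parsing the string back gives the original key/value pairs
print(parse_qsl(extra_query))
```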

LlmCompleteReply

boolean

When enabled, after the LLM produces a complete reply, the agent sends the complete LLM result to the client. This switch does not affect the streaming generation of subtitles.

true

FunctionMap

array

Function mapping list, used to map agent capabilities to LLM functions. Currently only supports function calling with user-defined OpenAI-protocol LLMs.

object

A single mapping rule.

Function

string

hangup

MatchFunction

string

hangup

OutputMinLength

integer

Minimum length (in characters) of text output. Text shorter than this length is buffered for concatenation. Range: [0, 100]. 0 or empty means no limit. Default: empty.

5

OutputMaxDelay

integer

Maximum delay time (in milliseconds) for text output. When this time is exceeded, the buffered text is forcibly output. Range: [1000, 10000]. 0 or empty means no limit. Default: empty.

2000
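The interaction between OutputMinLength and OutputMaxDelay can be pictured as a small buffer: fragments accumulate until they reach the minimum length, or until the oldest buffered fragment has waited longer than the maximum delay. The class below is an illustrative model only, not the service's implementation; OutputBuffer and its method names are hypothetical, and timestamps are passed in explicitly to keep the example deterministic.

```python
from typing import Optional


class OutputBuffer:
    """Buffers short text fragments until they are long enough or too old."""

    def __init__(self, min_length: int = 0, max_delay_ms: int = 0):
        self.min_length = min_length      # 0 means no minimum length
        self.max_delay_ms = max_delay_ms  # 0 means no delay limit
        self.pending = ""
        self.first_ms: Optional[int] = None  # arrival time of the oldest buffered text

    def push(self, text: str, now_ms: int) -> Optional[str]:
        """Add a fragment; return text to output now, or None to keep buffering."""
        if not self.pending:
            self.first_ms = now_ms
        self.pending += text
        long_enough = len(self.pending) >= self.min_length
        too_old = self.max_delay_ms > 0 and now_ms - self.first_ms >= self.max_delay_ms
        if long_enough or too_old:
            out, self.pending, self.first_ms = self.pending, "", None
            return out
        return None


buf = OutputBuffer(min_length=5, max_delay_ms=2000)
print(buf.push("Hi", 0))          # None (2 characters, still buffering)
print(buf.push(", there", 100))   # Hi, there
print(buf.push("OK", 200))        # None
print(buf.push("", 2300))         # OK (forced out after more than 2000 ms)
```

In this model the delay is only checked when push is called; a real pipeline would presumably also need a timer to flush the buffer when no new text arrives.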

HistorySyncWithTTS

boolean

Whether the large model message history should stay consistent with the TTS playback content. Default: false. When enabled, the saved large model messages stay consistent with the TTS playback content.

Note

When the user interrupts the agent, the affected assistant message in the history sent to the large model on the next request carries an <ims_agent_interrupted> tag at the interruption point. For example:

[
  {"role": "user", "content": "Tell me a story."},
  {"role": "assistant", "content": "Sure, let me tell you a story from Romance of the Three Kingdoms. Would you<ims_agent_interrupted> like to hear it?"},
  {"role": "user", "content": "Switch to another one."}
]

false
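If your own LLM service receives such a history, it may want to separate the two halves of an interrupted reply. The sketch below assumes that the text before the tag is what was played to the user and the text after it was not; split_interrupted is a hypothetical helper.

```python
TAG = "<ims_agent_interrupted>"


def split_interrupted(content: str) -> tuple[str, str]:
    """Return (text before the tag, text after the tag); the second part is empty if no tag."""
    before, _, after = content.partition(TAG)
    return before, after


history = [
    {"role": "user", "content": "Tell me a story."},
    {"role": "assistant", "content": "Sure, let me tell you a story. Would you<ims_agent_interrupted> like to hear it?"},
    {"role": "user", "content": "Switch to another one."},
]

for message in history:
    if message["role"] == "assistant" and TAG in message["content"]:
        played, not_played = split_interrupted(message["content"])
        print("played:", played)
        print("not played:", not_played)
```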

AvatarConfig

object

Digital human configuration. Only takes effect when the workflow includes a digital human node.

AvatarId

string

Model ID of the digital human.

5257

InterruptConfig

object

Voice interruption policy configuration.

EnableVoiceInterrupt

boolean

Whether voice interruption is supported. Default: true.

true

InterruptWords

array

List of specific words or phrases that trigger conversation interruption.

string

A specific word or phrase that triggers conversation interruption.

打断一下

NoInterruptMode

string

ASR processing strategy in no-interrupt mode:

cache: caches the ASR text. After the current turn ends, the cached ASR text is processed together in the next turn.
discard: directly discards the ASR text.

The default behavior is to cache the ASR text.

cache

KeepInterruptWordsForLLM

boolean

Whether to keep interruption words and pass them to the LLM. By default, they are discarded.

Note

Example: the interruption word is "wait". With the default behavior, the user input "wait how is the weather today" is sent to the LLM as "how is the weather today", because the interruption word is discarded first.

true
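The effect of this switch can be modeled in a few lines. The sketch assumes the interruption word appears at the start of the recognized text, as in the example above; how the service handles other positions is not stated. strip_interrupt_word is a hypothetical helper.

```python
def strip_interrupt_word(text: str, interrupt_words: list[str], keep: bool = False) -> str:
    """Return the text that would reach the LLM for the given setting."""
    if keep:  # KeepInterruptWordsForLLM = true: pass the text through unchanged
        return text
    for word in interrupt_words:
        if text.startswith(word):
            return text[len(word):].lstrip()
    return text


utterance = "wait how is the weather today"
print(strip_interrupt_word(utterance, ["wait"]))             # how is the weather today
print(strip_interrupt_word(utterance, ["wait"], keep=True))  # wait how is the weather today
```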

VoiceprintConfig

object

Voiceprint configuration.

UseVoiceprint

boolean

Switch for whether to use voiceprint recognition. Default: false. When voiceprint recognition is enabled, a valid voiceprint ID must be passed in.

false

VoiceprintId

string

The unique identity ID for voiceprint recognition. Default: empty. The passed-in voiceprint ID must have been registered via the voiceprint registration interface. For the interface documentation, refer to: Register Human Voiceprint

zhixiaoxia

RegistrationMode

string

Voiceprint registration mode. Default: Explicit

Value	Description
Explicit	Explicit registration mode. The user must upload audio in advance via the voiceprint registration interface to complete registration.
Implicit	Implicit registration mode. The user's voice is automatically collected during the conversation to generate voiceprint features.

Explicit

TurnDetectionConfig

object

Conversation turn detection configuration.

TurnEndWords

array

List of keywords used to determine the end of the user's turn.

string

Keyword used to determine the end of the user's turn.

我说完了

Mode

string

The mode for turn detection.

Normal (default): standard mode. AI is not used to judge semantics.
Semantic: uses AI to judge, based on contextual semantics, whether the speaker has finished speaking.

Semantic

SemanticWaitDuration

integer

Wait time used to judge a pause in Semantic (AI) mode. Unit: milliseconds. Default: -1.

-1: AI automatically determines an appropriate wait time.
0-10000: custom wait time. It is recommended to set this between 0-1500 ms.

Note

This setting has no effect under Normal mode.

-1

Eagerness

string

Only effective under Semantic mode. Controls how quickly the AI starts responding after a pause is detected:

"Low": waits patiently, with a maximum wait time of 6 seconds, reducing the risk of being interrupted.
"Medium": balanced wait (maximum wait time of 4 seconds), suitable for most scenarios.
"High": fast response (maximum wait time of 2 seconds), improves speed but may increase the risk of incorrect cut-offs.

This field is empty by default.

High
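The constraints on the turn detection fields interact, so a small client-side check can catch settings that would be ignored. This is a hypothetical sketch (check_turn_detection is not an SDK function); the maximum wait times are those listed for Eagerness above.

```python
# Maximum wait per Eagerness level, in milliseconds, as listed above
EAGERNESS_MAX_WAIT_MS = {"Low": 6000, "Medium": 4000, "High": 2000}


def check_turn_detection(cfg: dict) -> list[str]:
    """Return notes about invalid or ineffective settings; empty means none found."""
    notes = []
    mode = cfg.get("Mode", "Normal")
    wait = cfg.get("SemanticWaitDuration", -1)
    eagerness = cfg.get("Eagerness")  # empty by default

    if wait != -1 and not 0 <= wait <= 10000:
        notes.append("SemanticWaitDuration must be -1 or between 0 and 10000 ms")
    if eagerness and eagerness not in EAGERNESS_MAX_WAIT_MS:
        notes.append("Eagerness must be Low, Medium, or High")
    if mode != "Semantic" and (wait != -1 or eagerness):
        notes.append("SemanticWaitDuration and Eagerness only take effect in Semantic mode")
    return notes


print(check_turn_detection({"Mode": "Semantic", "SemanticWaitDuration": -1, "Eagerness": "High"}))  # []
print(check_turn_detection({"Mode": "Normal", "Eagerness": "High"}))
```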

ExperimentalConfig

string

Experimental feature parameters. Contact support if you need them.

""

VcrConfig

object

Video content recognition feature configuration. Supports callbacks to the client for content that the algorithms recognize in the video.

StillFrameMotion

object

Still frame detection configuration.

Enabled

boolean

false

CallbackDelay

integer

3000

InvalidFrameMotion

object

Invalid frame detection parameter configuration.

Enabled

boolean

false

CallbackDelay

integer

3000

PeopleCount

object

People counting feature configuration.

Enabled

boolean

false

Equipment

object

Equipment recognition configuration.

Enabled

boolean

false

HeadMotion

object

Head motion recognition configuration.

Enabled

boolean

false

LookAway

object

Gaze deviation recognition configuration.

Enabled

boolean

true

AmbientSoundConfig

object

Call ambient sound configuration.

ResourceId

string

Call ambient sound ID. Can be obtained from the advanced configuration of the agent settings in the console.

f67901c595834************

Volume

integer

Volume of the call background sound. Valid values: [0, 100]. 0 means off.

50

AutoSpeechConfig

object

Agent auto-speech configuration module, including LLM wait prompts and prompts for prolonged user silence.

UserIdle

object

Prompt playback configuration when the user remains silent for a long time.

WaitTime

integer

5000

MaxRepeats

integer

5

Messages

array

object

Text

string

您还在吗？

Probability

number

0.5

HangupEndWord

string

LlmPending

object

Playback configuration for when the LLM response is delayed.

WaitTime

integer

3000

Mode

string

Messages

array

object

Text

string

稍等一下

Probability

number

0.5

BackChannelingConfigs

array

Backchanneling feature configuration module. When enabled, the system randomly plays short backchanneling phrases at specific trigger moments.

object

A single backchanneling configuration.

Enabled

boolean

Whether to enable the backchanneling feature. Required. Values: true/false.

true

TriggerStage

string

The trigger moment for backchanneling. Available values:

pause_detected (a brief speaking pause is detected)

pause_detected

Probability

number

Feature trigger probability. Range: 0.0–1.0. Required.

0.5

Words

array

Set of backchanneling phrases. Maximum of 10 entries. Each phrase must be ≤ 20 characters in length, and probabilities must sum to 1.0.

object

Backchanneling phrase configuration.

Text

string

嗯嗯

Probability

number

0.3
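The limits on Words (at most 10 entries, each phrase at most 20 characters, probabilities summing to 1.0) can be checked before submitting a configuration, and the weighted selection can be modeled with random.choices. This is a sketch of the idea, not the service's selection logic; validate_words and pick_phrase are hypothetical helpers, and the phrases other than 嗯嗯 are made up for illustration.

```python
import math
import random


def validate_words(words: list[dict]) -> None:
    """Raise ValueError if the Words list violates the documented limits."""
    if len(words) > 10:
        raise ValueError("Words supports at most 10 entries")
    for entry in words:
        if len(entry["Text"]) > 20:
            raise ValueError(f"phrase {entry['Text']!r} is longer than 20 characters")
    total = sum(entry["Probability"] for entry in words)
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        raise ValueError(f"probabilities sum to {total}, expected 1.0")


def pick_phrase(words: list[dict], rng: random.Random) -> str:
    """Pick one phrase, weighted by its Probability."""
    texts = [entry["Text"] for entry in words]
    weights = [entry["Probability"] for entry in words]
    return rng.choices(texts, weights=weights, k=1)[0]


words = [
    {"Text": "嗯嗯", "Probability": 0.3},
    {"Text": "好的", "Probability": 0.3},    # illustrative phrase
    {"Text": "我在听", "Probability": 0.4},  # illustrative phrase
]
validate_words(words)
print(pick_phrase(words, random.Random()))
```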

BackChannelingConfig

array

Important: Deprecated. Use BackChannelingConfigs.

object

A single backchanneling configuration.

Enabled

boolean

Whether to enable the backchanneling feature. Required. Values: true/false.

true

TriggerStage

string

The trigger moment for backchanneling. Available values:

pause_detected (a brief speaking pause is detected)

pause_detected

Probability

number

Feature trigger probability. Range: 0.0–1.0. Required.

0.5

Words

array

Set of backchanneling phrases. Maximum of 10 entries. Each phrase must be ≤ 20 characters in length, and probabilities must sum to 1.0.

object

Backchanneling phrase configuration.

Text

string

嗯嗯

Probability

number

0.3