Alibaba Cloud Model Studio: Client events

Last Updated: Jun 16, 2026

Client events for the Qwen-Omni-Realtime API.

See Qwen-Omni-Realtime.

session.update

After establishing a WebSocket connection, send this event first to update the default session configuration. The server validates the parameters when it receives the session.update event. If any parameter is invalid, the server returns an error. Otherwise, the server applies the configuration and returns the complete updated configuration.

type string (required)

The event type. The value must be session.update.

{
    "event_id": "event_ToPZqeobitzUJnt3QqtWg",
    "type": "session.update",
    "session": {
        "modalities": [
            "text",
            "audio"
        ],
        "voice": "Chelsie",
        "input_audio_format": "pcm",
        "output_audio_format": "pcm",
        "instructions": "You are an AI customer service specialist for a five-star hotel. Accurately and friendly answer customer inquiries about room types, facilities, prices, and booking policies. Always respond with a professional and helpful attitude. Do not provide unverified information or information outside the scope of hotel services.",
        "turn_detection": {
            "type": "server_vad",
            "threshold": 0.5,
            "silence_duration_ms": 800
        },
        "enable_search": true,
        "search_options": {
            "enable_source": true
        },
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_current_weather",
                    "description": "Useful for querying the weather in a specific city.",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "location": {
                                "type": "string",
                                "description": "The city or district, such as Beijing, Hangzhou, or Yuhang District."
                            }
                        },
                        "required": ["location"]
                    }
                }
            }
        ],
        "seed": 1314,
        "max_tokens": 16384,
        "repetition_penalty": 1.05,
        "presence_penalty": 0.0,
        "top_k": 50,
        "top_p": 1.0,
        "temperature": 0.9
    }
}
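As a rough illustration of how a client might assemble this event, the following Python sketch builds the JSON payload with the standard library only. The helper name make_session_update and the uuid-based event_id format are assumptions made for this example; the examples on this page only show opaque event IDs, and the WebSocket send call itself is left out.

```python
import json
import uuid


def make_session_update(session: dict) -> str:
    """Serialize a session.update client event as a JSON string."""
    event = {
        # Assumption: any unique string works as event_id.
        "event_id": f"event_{uuid.uuid4().hex}",
        "type": "session.update",
        "session": session,
    }
    return json.dumps(event)


payload = make_session_update({
    "modalities": ["text", "audio"],
    "voice": "Chelsie",
    "input_audio_format": "pcm",
    "output_audio_format": "pcm",
    "turn_detection": {
        "type": "server_vad",
        "threshold": 0.5,
        "silence_duration_ms": 800,
    },
})
# `payload` would then be sent as a text frame over the open WebSocket connection.
```

Only the fields that differ from the defaults need to appear in the session object, since every property is optional.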

session object (optional)

The session configuration.

Properties

modalities array (optional)

The model output modalities. Valid values:

  • ["text"]

    Text only.

  • ["text","audio"] (default)

    Text and audio.

voice string (optional)

The voice for the model's audio output. For supported voices, see Voice list.

Default voices:

  • Qwen3.5-Omni-Realtime series: Tina

  • Qwen3-Omni-Flash-Realtime: Cherry

  • Qwen-Omni-Turbo-Realtime: Chelsie

input_audio_format string (optional)

The format of the user's input audio. Only pcm is currently supported. The input audio must be a PCM audio stream at a 16 kHz sample rate.

output_audio_format string (optional)

The format of the model's output audio. Only pcm is currently supported. The output audio is a PCM audio stream at a 24 kHz sample rate.

smooth_output boolean | null (optional)

Applies to Qwen3-Omni-Flash-Realtime series models only.

Whether to enable a colloquial response style. Valid values:

  • true (default): Colloquial responses.

  • false: Formal, written responses.

    Content that is difficult to read aloud may not perform well.

  • null: The model automatically selects between colloquial and formal style.

instructions string (optional)

The system message that sets the model's role or objective.

turn_detection object (optional)

The Voice Activity Detection (VAD) configuration. Set to null to disable VAD and trigger model responses manually. If omitted, VAD is enabled with default parameters.

Properties

type string (optional)

The VAD type. Valid values:

  • server_vad (default): Detects end of speech based on acoustic features.

  • semantic_vad: Detects end of speech based on semantic validity, filtering out backchannels and background noise. Supported only by the qwen3.5-omni-realtime model.

threshold float (optional)

The VAD sensitivity. A lower value makes VAD more sensitive, so it is more likely to detect faint sounds, including background noise, as speech. A higher value requires clearer, louder speech to trigger.

Valid range: [-1.0, 1.0]. Default: 0.5.

silence_duration_ms integer (optional)

The minimum silence duration after speech ends, in milliseconds. When exceeded, the model generates a response. A lower value speeds up responses but may cause false triggers during brief pauses.

Default: 800. Valid range: [200, 6000].

idle_timeout_ms integer (optional)

Applies only to qwen3.5-omni-plus-realtime and qwen3.5-omni-flash-realtime models in server_vad mode.

The idle timeout, in milliseconds. The timer starts when the audio of the last model response finishes playing. If the user stays silent for longer than this duration (no speech.started is triggered), the model proactively generates a response, based on the current context, that prompts the user to continue the conversation.

Valid range: [5000, 30000].
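The ranges listed for turn_detection can be checked on the client before the event is sent. The sketch below is one possible way to do that. The function name validate_turn_detection is hypothetical, and the server remains the authority on what it accepts.

```python
def validate_turn_detection(cfg):
    """Return a list of problems found in a turn_detection config (empty if none)."""
    if cfg is None:
        return []  # null disables VAD; responses are then triggered manually
    problems = []
    vad_type = cfg.get("type", "server_vad")
    if vad_type not in ("server_vad", "semantic_vad"):
        problems.append(f"unknown type: {vad_type!r}")
    threshold = cfg.get("threshold", 0.5)
    if not -1.0 <= threshold <= 1.0:
        problems.append("threshold must be in [-1.0, 1.0]")
    silence = cfg.get("silence_duration_ms", 800)
    if not 200 <= silence <= 6000:
        problems.append("silence_duration_ms must be in [200, 6000]")
    idle = cfg.get("idle_timeout_ms")
    if idle is not None:
        if not 5000 <= idle <= 30000:
            problems.append("idle_timeout_ms must be in [5000, 30000]")
        if vad_type != "server_vad":
            problems.append("idle_timeout_ms applies only in server_vad mode")
    return problems
```

The check does not cover model-specific support, such as semantic_vad or idle_timeout_ms being limited to certain models.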

enable_search boolean (optional)

Applies to the Qwen3.5-Omni-Realtime model only.

Whether to enable web search. Default: false. When enabled, the model decides autonomously whether to search to answer real-time questions.

tools and enable_search are incompatible. Do not enable both at the same time.
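A client can guard against this combination before sending session.update. The following is a minimal sketch. The function name check_search_and_tools is an assumption for illustration.

```python
def check_search_and_tools(session: dict) -> None:
    """Raise ValueError if a session config enables web search and tools together."""
    if session.get("enable_search") and session.get("tools"):
        raise ValueError("enable_search and tools cannot be enabled together")
```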

search_options object (optional)

Web search configuration. Takes effect only when enable_search is enabled.

Properties

enable_source boolean (optional)

Whether to return the list of search result sources. Set to true to enable.

tools array (optional)

A list of tool definitions. When provided, the model decides whether to call a tool based on user input.

Properties

type string (required)

The value must be function.

function.name string (required)

The tool function name, matching the actual function name, such as get_current_weather or get_current_time.

function.description string (optional)

A description of what the tool does. The model uses this to decide whether to call it.

function.parameters object (optional)

A description of the tool's input parameters. The model uses this to extract parameter values from user input. Omit this field if the tool takes no input.

Properties

type string (required)

The value must be object.

properties object (optional)

Describes each input parameter by name, data type, and description. The key is the parameter name; the value is an object with type and description.

required array (optional)

Lists which input parameters are required.
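To show how the tool fields fit together, here is a small Python sketch that builds one entry of the tools array. The helper make_function_tool is hypothetical. It follows the structure described above and omits parameters when the tool takes no input.

```python
def make_function_tool(name, description=None, properties=None, required=None):
    """Build one entry for the session's tools array."""
    function = {"name": name}
    if description is not None:
        function["description"] = description
    if properties:  # omit parameters entirely when the tool takes no input
        parameters = {"type": "object", "properties": properties}
        if required:
            parameters["required"] = list(required)
        function["parameters"] = parameters
    return {"type": "function", "function": function}


weather_tool = make_function_tool(
    "get_current_weather",
    description="Useful for querying the weather in a specific city.",
    properties={
        "location": {
            "type": "string",
            "description": "The city or district, such as Beijing.",
        }
    },
    required=["location"],
)
time_tool = make_function_tool("get_current_time")  # no input, so no parameters field
```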

temperature float (optional)

The sampling temperature, controlling output diversity. A higher value produces more varied content; a lower value produces more deterministic output.

Valid range: [0, 2).

Because temperature and top_p both control diversity, set only one.

Default values:

  • qwen3.5-omni-realtime series: 0.7

  • qwen3-omni-flash-realtime series: 0.9

  • qwen-omni-turbo-realtime series: 1.0

The qwen-omni-turbo models do not support changing this parameter.

top_p float (optional)

The nucleus sampling probability threshold, controlling output diversity. A higher value produces more varied content; a lower value produces more deterministic output.

Valid range: (0, 1.0].

Because temperature and top_p both control diversity, set only one.

Default values:

  • qwen3.5-omni-realtime series: 0.8

  • qwen3-omni-flash-realtime series: 1.0

  • qwen-omni-turbo-realtime series: 0.01

The qwen-omni-turbo models do not support changing this parameter.

top_k integer (optional)

The candidate set size for sampling. For example, a value of 50 limits each generation step to the 50 highest-scoring tokens. A larger value increases randomness; a smaller value increases determinism. Set to null or a value greater than 100 to disable top_k, in which case only top_p applies.

Minimum value: 0.

Default values:

  • qwen3.5-omni-realtime series: 20

  • qwen3-omni-flash-realtime series: 50

  • qwen-omni-turbo-realtime series: 20

The qwen-omni-turbo models do not support changing this parameter.

max_tokens integer (optional)

The maximum number of tokens to return.

The max_tokens setting does not affect the model's generation process. If the generated output exceeds max_tokens, the response is truncated.

The default and maximum values equal the model's maximum output length. See Model list.

Use max_tokens to limit output length, for example in summarization or keyword extraction, or to control cost and reduce latency.

The qwen-omni-turbo models do not support changing this parameter.

repetition_penalty float (optional)

Controls repetition in the model's output. A higher value reduces repetition; 1.0 means no penalty. Must be greater than 0.

Default values:

  • qwen3.5-omni-realtime series: 1.0

  • qwen3-omni-flash-realtime series: 1.05

  • qwen-omni-turbo-realtime series: 1.05

The qwen-omni-turbo models do not support changing this parameter.

presence_penalty float (optional)

Controls content repetition in the model's output.

Valid range: [-2.0, 2.0]. A positive value reduces repetition; a negative value increases it.

Default values:

  • qwen3.5-omni-realtime series: 1.5

  • qwen3-omni-flash-realtime series: 0.0

  • qwen-omni-turbo-realtime series: 0.0

When to use:

Higher values suit creative tasks like storytelling or brainstorming, where output diversity and interest matter.

Lower values suit technical or formal content where consistency and precise terminology are required.

The qwen-omni-turbo models do not support changing this parameter.

seed integer (optional)

Makes generation more deterministic. Pass the same seed value with unchanged parameters to get consistent results across runs.

Valid range: 0 to 2^31−1 (2147483647). Default: -1.

The qwen-omni-turbo models do not support changing this parameter.
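The documented ranges for the sampling parameters can be checked locally before the session is updated. The sketch below assumes a hypothetical helper, validate_sampling, that inspects only the fields present in the session object. It does not model per-model defaults or the restriction on qwen-omni-turbo models.

```python
def validate_sampling(session: dict) -> list:
    """Return range violations for the sampling fields present in a session config."""
    checks = {
        "temperature": (lambda v: 0 <= v < 2, "must be in [0, 2)"),
        "top_p": (lambda v: 0 < v <= 1.0, "must be in (0, 1.0]"),
        "top_k": (lambda v: v >= 0, "must be at least 0"),
        "repetition_penalty": (lambda v: v > 0, "must be greater than 0"),
        "presence_penalty": (lambda v: -2.0 <= v <= 2.0, "must be in [-2.0, 2.0]"),
        # The documented default of -1 lies outside this range, so leave seed
        # unset rather than passing -1 explicitly.
        "seed": (lambda v: 0 <= v <= 2**31 - 1, "must be in [0, 2^31 - 1]"),
    }
    problems = []
    for name, (is_valid, message) in checks.items():
        value = session.get(name)
        # None means the field is not set (or, for top_k, disabled).
        if value is not None and not is_valid(value):
            problems.append(f"{name} {message}")
    return problems
```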

response.create

The response.create event instructs the server to generate a model response. In VAD mode, the server generates responses automatically, so this event is not needed. In tool calling scenarios, send this event after returning the tool result via conversation.item.create to trigger the final model response.

The server replies with a response.created event, followed by one or more item and content events (such as conversation.item.created and response.content_part.added), and finally a response.done event.

type string (required)

The event type. The value must be response.create.

{
    "type": "response.create",
    "event_id": "event_1718624400000"
}

response.cancel

Send this event to cancel an ongoing response. If no response is in progress, the server returns an error.

type string (required)

The event type. The value must be response.cancel.

{
    "event_id": "event_B4o9RHSTWobB5OQdEHLTo",
    "type": "response.cancel"
}

input_audio_buffer.append

Appends audio bytes to the input audio buffer.

type string (required)

The event type. The value must be input_audio_buffer.append.

{
    "event_id": "event_B4o9RHSTWobB5OQdEHLTo",
    "type": "input_audio_buffer.append",
    "audio": "UklGR..."
}

audio string (required)

The Base64-encoded audio data.
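As an illustration, the following Python sketch splits raw PCM audio into chunks and wraps each chunk in an input_audio_buffer.append event. The 16-bit mono sample layout and the 100 ms chunk duration are assumptions, since this page specifies only the 16 kHz sample rate. The function name and the uuid-based event_id are likewise hypothetical.

```python
import base64
import json
import uuid

SAMPLE_RATE_HZ = 16_000  # required input sample rate
BYTES_PER_SAMPLE = 2     # assumption: 16-bit mono samples
CHUNK_MS = 100           # assumption: chunk duration chosen for illustration


def audio_append_events(pcm: bytes, chunk_ms: int = CHUNK_MS):
    """Yield one input_audio_buffer.append JSON string per chunk of PCM audio."""
    chunk_bytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * chunk_ms // 1000
    for start in range(0, len(pcm), chunk_bytes):
        chunk = pcm[start:start + chunk_bytes]
        yield json.dumps({
            "event_id": f"event_{uuid.uuid4().hex}",
            "type": "input_audio_buffer.append",
            "audio": base64.b64encode(chunk).decode("ascii"),
        })


one_second_of_silence = bytes(SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)
events = list(audio_append_events(one_second_of_silence))
```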

input_audio_buffer.commit

Commits the input audio buffer, creating a new user message item in the conversation. If the buffer is empty, the server returns an error.

  • VAD mode: The client does not need to send this event. The server commits the audio buffer automatically.

  • Manual mode: The client must commit the audio buffer to create a user message item.

Committing the buffer does not trigger a model response. The server replies with an input_audio_buffer.committed event.

If the client sent an input_image_buffer.append event, the input_audio_buffer.commit event also commits the image buffer.

type string (required)

The event type. The value must be input_audio_buffer.commit.

{
    "event_id": "event_B4o9RHSTWobB5OQdEHLTo",
    "type": "input_audio_buffer.commit"
}
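In manual mode, one user turn therefore consists of appending audio, committing the buffer, and then requesting a response with response.create. The sketch below shows that order. The send parameter stands in for whatever function writes a text frame to the WebSocket, and all helper names are assumptions for this example.

```python
import json
import uuid


def make_event(event_type: str, **fields) -> str:
    """Serialize a client event with a generated event_id (format is an assumption)."""
    return json.dumps({"event_id": f"event_{uuid.uuid4().hex}", "type": event_type, **fields})


def send_manual_turn(audio_chunks_b64: list, send) -> None:
    """Send one user turn in manual mode: append audio, commit, request a response."""
    if not audio_chunks_b64:
        # Committing an empty buffer makes the server return an error.
        raise ValueError("no audio to commit")
    for chunk in audio_chunks_b64:
        send(make_event("input_audio_buffer.append", audio=chunk))
    send(make_event("input_audio_buffer.commit"))
    # The commit alone does not trigger a model response.
    send(make_event("response.create"))


sent = []
send_manual_turn(["AAAA", "AAAA"], sent.append)  # a list collects the frames here
```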

input_audio_buffer.clear

Clears audio bytes from the buffer. The server replies with an input_audio_buffer.cleared event.

type string (required)

The event type. The value must be input_audio_buffer.clear.

{
    "event_id": "event_xxx",
    "type": "input_audio_buffer.clear"
}

input_image_buffer.append

Adds image data to the image buffer. Images can come from local files or a live video stream.

The following limits apply to image inputs:

  • Format: JPG or JPEG. Recommended resolution: 480p or 720p; maximum: 1080p.

  • A single image after Base64 encoding must not exceed 256 KB. We recommend keeping the raw image size below 190 KB before encoding.

  • Image data must be Base64-encoded.

  • Recommended send frequency: 1 image per second.

  • Send at least one input_audio_buffer.append event before sending an input_image_buffer.append event.

The image buffer is committed with the audio buffer using the input_audio_buffer.commit event.

type string (required)

The event type. The value must be input_image_buffer.append.

{
    "event_id": "event_xxx",
    "type": "input_image_buffer.append",
    "image": "xxx"
}

image string (required)

The Base64-encoded image data.
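Base64 encoding turns every 3 raw bytes into 4 characters, which is why a raw image of about 190 KB stays under the 256 KB limit after encoding. The sketch below checks the encoded size before building the event. It assumes that 256 KB means 256 × 1024 bytes, and the helper names are hypothetical.

```python
import base64
import json
import uuid

MAX_ENCODED_BYTES = 256 * 1024  # assumption: "256 KB" means 256 * 1024 bytes


def encoded_size(raw_size: int) -> int:
    """Length of the padded Base64 text produced for raw_size bytes."""
    return 4 * ((raw_size + 2) // 3)


def image_append_event(jpeg_bytes: bytes) -> str:
    """Build an input_image_buffer.append event, rejecting oversized images."""
    encoded = base64.b64encode(jpeg_bytes).decode("ascii")
    if len(encoded) > MAX_ENCODED_BYTES:
        raise ValueError(f"encoded image is {len(encoded)} bytes, limit is {MAX_ENCODED_BYTES}")
    return json.dumps({
        "event_id": f"event_{uuid.uuid4().hex}",
        "type": "input_image_buffer.append",
        "image": encoded,
    })


# A 190 KB raw image encodes to 259416 bytes, below the 262144-byte limit.
assert encoded_size(190 * 1024) == 259416
```

The function does not verify that the bytes are a valid JPEG or that an audio append event was sent first. A real client would need to handle both.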

conversation.item.create

Send this event to return a tool function's execution result to the server. After the model triggers a tool call, run the tool locally, then send the result back using this event. Follow up with a response.create event to trigger the final model response.

Note

Only items of the function_call_output type are currently supported.

type string (required)

The event type. The value must be conversation.item.create.

{
    "event_id": "event_55099cddb51b4f208cb95d1a994eef80",
    "type": "conversation.item.create",
    "item": {
        "id": "item_2a80d7682b4e473c9c2154da135041e9",
        "type": "function_call_output",
        "call_id": "call_62c24725afdb4c2680ac54",
        "output": "The weather in Beijing today is changing from haze to clear, with a temperature of 4/-4°C and a light breeze."
    }
}

item object (required)

The conversation item to create. Cannot be empty.

Properties

id string (optional)

The conversation item ID. Specify one to align with local state; otherwise the server generates one.

type string (required)

The conversation item type. Currently, only function_call_output is supported.

call_id string (required)

The call_id returned in the response.function_call_arguments.done event.

output string (required)

The tool function's execution result.
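Putting the tool-calling steps together, a client returns the result with conversation.item.create and then sends response.create. The following Python sketch builds both events in order. The function name tool_result_events and the uuid-based event_id are assumptions, and the item ID is omitted so that the server generates one.

```python
import json
import uuid


def tool_result_events(call_id: str, output: str) -> list:
    """Build the two events that return a tool result and trigger the final response."""
    item_event = {
        "event_id": f"event_{uuid.uuid4().hex}",
        "type": "conversation.item.create",
        "item": {
            "type": "function_call_output",
            # call_id comes from the response.function_call_arguments.done event.
            "call_id": call_id,
            "output": output,
        },
    }
    response_event = {
        "event_id": f"event_{uuid.uuid4().hex}",
        "type": "response.create",
    }
    return [json.dumps(item_event), json.dumps(response_event)]


events = tool_result_events("call_62c24725afdb4c2680ac54", "Sunny, 4/-4°C, light breeze.")
```

Send the two strings in list order, because the model needs the tool output in the conversation before it generates the final response.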