Client events for the Qwen-Omni-Realtime API.
See Qwen-Omni-Realtime.
session.update
After establishing a WebSocket connection, send this event first to update the default session configurations. The server validates the parameters upon receiving the session.update event. If the parameters are invalid, the server returns an error. If valid, the server applies the configuration and returns the complete updated configuration.
| Parameter | Description |
| --- | --- |
| type | The event type. The value must be session.update. |
| session | The session configuration. |
| temperature | The sampling temperature, which controls output diversity. A higher value produces more varied content; a lower value produces more deterministic output. Valid range: [0, 2). Because temperature and top_p both control diversity, set only one of them. Default values: |
| top_p | The nucleus sampling probability threshold, which controls output diversity. A higher value produces more varied content; a lower value produces more deterministic output. Valid range: (0, 1.0]. Because temperature and top_p both control diversity, set only one of them. Default values: |
| top_k | The candidate set size for sampling. For example, a value of 50 limits each generation step to the 50 highest-scoring tokens. A larger value increases randomness; a smaller value increases determinism. Minimum value: 0. Default values: |
| max_tokens | The maximum number of tokens to return. The default and maximum values equal the model's maximum output length. See Model list. Use max_tokens to limit output length for tasks like summarization and keyword extraction, or to control cost and reduce latency. |
| repetition_penalty | Controls repetition in the model's output. A higher value reduces repetition; 1.0 means no penalty. Must be greater than 0. Default values: |
| presence_penalty | Controls content repetition in the model's output. Valid range: [-2.0, 2.0]. A positive value reduces repetition; a negative value increases it. When to use: higher values suit creative tasks like storytelling or brainstorming, where output diversity and interest matter. Lower values suit technical or formal content where consistency and precise terminology are required. Default values: |
| seed | Makes generation more deterministic. Pass the same seed value with unchanged parameters to get consistent results across runs. Valid range: 0 to 2^31 − 1 (2147483647). Default: -1. |
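As a rough illustration of the ranges listed for session.update, the sketch below builds the event as JSON and rejects out-of-range values before anything is sent. The helper name `build_session_update` is hypothetical, and placing the sampling parameters inside the `session` object is an assumption about the payload layout, not something this page states. Treating "set only one of temperature and top_p" as a hard error is also a client-side choice; the server may be more lenient.

```python
import json

# Client-side checks mirroring the documented valid ranges.
RANGE_CHECKS = {
    "temperature": lambda v: 0 <= v < 2,            # [0, 2)
    "top_p": lambda v: 0 < v <= 1.0,                # (0, 1.0]
    "top_k": lambda v: v >= 0,                      # minimum 0
    "repetition_penalty": lambda v: v > 0,          # must be > 0
    "presence_penalty": lambda v: -2.0 <= v <= 2.0, # [-2.0, 2.0]
}

def build_session_update(**params):
    """Return a session.update message as a JSON string (hypothetical helper)."""
    for name, in_range in RANGE_CHECKS.items():
        if name in params and not in_range(params[name]):
            raise ValueError(f"{name}={params[name]!r} is outside the documented range")
    if "temperature" in params and "top_p" in params:
        raise ValueError("set only one of temperature and top_p")
    # Assumption: the sampling parameters are nested under "session".
    return json.dumps({"type": "session.update", "session": params})

message = build_session_update(temperature=0.7, max_tokens=512, seed=42)
```

The resulting string would be sent as the first text frame after the WebSocket connection opens; the server still performs its own validation and replies with either an error or the full updated configuration.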
response.create
The response.create event instructs the server to generate a model response. In VAD mode, the server generates responses automatically, so this event is not needed. In tool calling scenarios, send this event after returning the tool result via conversation.item.create to trigger the final model response.
The server replies with a response.created event, followed by one or more item and content events—such as conversation.item.created and response.content_part.added—and finally a response.done event.
| Parameter | Description |
| --- | --- |
| type | The event type. The value must be response.create. |
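The reply sequence described for response.create (response.created first, response.done last) suggests a simple client loop: read server events until response.done arrives. The sketch below works on an iterable of raw JSON strings so it can run without a live connection; the function name `collect_response` and the simulated messages are illustrative only.

```python
import json

def collect_response(raw_messages):
    """Collect event types for one response, stopping at response.done."""
    seen = []
    for raw in raw_messages:
        event = json.loads(raw)
        seen.append(event["type"])
        if event["type"] == "response.done":
            break  # the response is complete; later events belong elsewhere
    return seen

# Simulated server traffic, using event names mentioned in this section.
simulated = [
    json.dumps({"type": "response.created"}),
    json.dumps({"type": "conversation.item.created"}),
    json.dumps({"type": "response.content_part.added"}),
    json.dumps({"type": "response.done"}),
    json.dumps({"type": "input_audio_buffer.committed"}),
]
event_types = collect_response(simulated)
```

In a real client the iterable would be the WebSocket's incoming message stream rather than a list.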
response.cancel
Send this event to cancel an ongoing response. If no response is in progress, the server returns an error.
| Parameter | Description |
| --- | --- |
| type | The event type. The value must be response.cancel. |
input_audio_buffer.append
Appends audio bytes to the input audio buffer.
| Parameter | Description |
| --- | --- |
| type | The event type. The value must be input_audio_buffer.append. |
| audio | The Base64-encoded audio data. |
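A minimal sketch of building this event from raw audio bytes, using only the two fields listed above. The helper name `audio_append_event` is hypothetical, and the sample bytes stand in for a real audio chunk; the required audio format is not covered here.

```python
import base64
import json

def audio_append_event(chunk: bytes) -> str:
    """Wrap one chunk of raw audio in an input_audio_buffer.append message."""
    return json.dumps({
        "type": "input_audio_buffer.append",
        # Base64 output is ASCII, so it is safe to embed in a JSON string.
        "audio": base64.b64encode(chunk).decode("ascii"),
    })

sample_chunk = b"\x00\x01" * 160  # placeholder bytes, not real audio
audio_message = audio_append_event(sample_chunk)
```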
input_audio_buffer.commit
Commits the input audio buffer, creating a new user message item in the conversation. If the buffer is empty, the server returns an error.
- VAD mode: The client does not need to send this event. The server commits the audio buffer automatically.
- Manual mode: The client must commit the audio buffer to create a user message item.
Committing the buffer does not trigger a model response. The server replies with an input_audio_buffer.committed event.
If the client sent an input_image_buffer.append event, the input_audio_buffer.commit event also commits the image buffer.
| Parameter | Description |
| --- | --- |
| type | The event type. The value must be input_audio_buffer.commit. |
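Because committing the buffer does not by itself trigger a model response, a manual-mode turn plausibly consists of three steps: append the audio, commit, then request a response. The sketch below assembles those messages in send order. The helper name `manual_turn` is hypothetical, and following the commit with response.create is inferred from the descriptions in this section rather than stated as a required sequence.

```python
import base64
import json

def manual_turn(chunks):
    """Return the JSON messages for one manual-mode user turn, in send order."""
    messages = [
        json.dumps({
            "type": "input_audio_buffer.append",
            "audio": base64.b64encode(chunk).decode("ascii"),
        })
        for chunk in chunks
    ]
    if not messages:
        # The server returns an error when an empty buffer is committed.
        raise ValueError("no audio to commit")
    messages.append(json.dumps({"type": "input_audio_buffer.commit"}))
    # Committing creates the user message item; this asks for the reply.
    messages.append(json.dumps({"type": "response.create"}))
    return messages

turn = manual_turn([b"\x00\x01" * 160, b"\x02\x03" * 160])
turn_types = [json.loads(m)["type"] for m in turn]
```

In VAD mode neither the commit nor the response.create message would be needed, since the server handles both automatically.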
input_audio_buffer.clear
Clears audio bytes from the buffer. The server replies with an input_audio_buffer.cleared event.
| Parameter | Description |
| --- | --- |
| type | The event type. The value must be input_audio_buffer.clear. |
input_image_buffer.append
Adds image data to the image buffer. Images can come from local files or a live video stream.
The following limits apply to image inputs:
- Format: JPG or JPEG. Recommended resolution: 480p or 720p; maximum: 1080p.
- A single image after Base64 encoding must not exceed 256 KB. We recommend keeping the raw image size below 190 KB before encoding.
- Image data must be Base64-encoded.
- Recommended send frequency: 1 image per second.
- Send at least one input_audio_buffer.append event before sending an input_image_buffer.append event.
The image buffer is committed with the audio buffer using the input_audio_buffer.commit event.
| Parameter | Description |
| --- | --- |
| type | The event type. The value must be input_image_buffer.append. |
| image | The Base64-encoded image data. |
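The size limit above applies to the encoded image, so it can be checked on the client before sending. The following sketch encodes JPEG bytes and rejects anything over the limit. The helper name `image_append_event` is hypothetical, and interpreting 256 KB as 256 × 1024 bytes is an assumption. Base64 grows data by roughly a third, which is why a raw size under 190 KB stays within the limit.

```python
import base64
import json

MAX_ENCODED_BYTES = 256 * 1024  # assumption: "KB" means 1024 bytes

def image_append_event(jpeg_bytes: bytes) -> str:
    """Wrap one JPEG frame in an input_image_buffer.append message."""
    encoded = base64.b64encode(jpeg_bytes).decode("ascii")
    if len(encoded) > MAX_ENCODED_BYTES:
        raise ValueError(
            f"encoded image is {len(encoded)} bytes; limit is {MAX_ENCODED_BYTES}"
        )
    return json.dumps({"type": "input_image_buffer.append", "image": encoded})

# Placeholder payloads: sized like images, but not real JPEG data.
small_frame = b"\xff\xd8" + b"\x00" * (100 * 1024)
image_message = image_append_event(small_frame)
```

A client streaming video would call this about once per second, after at least one input_audio_buffer.append event has been sent.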
conversation.item.create
Send this event to return a tool function's execution result to the server. After the model triggers a tool call, run the tool locally, then send the result back using this event. Follow up with a response.create event to trigger the final model response.
Only items of the function_call_output type are currently supported.
| Parameter | Description |
| --- | --- |
| type | The event type. The value must be conversation.item.create. |
| item | The conversation item to create. Cannot be empty. |
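The tool-calling round trip described above can be sketched as two messages: the tool result, then a request for the final response. The helper name `tool_result_events` is hypothetical. The item fields `call_id` and `output` are assumptions modelled on common realtime API conventions; this section only confirms that the item type is function_call_output, so check the item schema before relying on these names.

```python
import json

def tool_result_events(call_id: str, result) -> list:
    """Return the messages that hand a tool result back to the model."""
    item = {
        "type": "function_call_output",
        "call_id": call_id,            # assumed field: links to the tool call
        "output": json.dumps(result),  # assumed field: result as a JSON string
    }
    return [
        json.dumps({"type": "conversation.item.create", "item": item}),
        # The follow-up event that triggers the final model response.
        json.dumps({"type": "response.create"}),
    ]

tool_messages = tool_result_events("call_123", {"temperature_c": 21})
```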