This topic describes the client events for the Qwen-Omni-Realtime API.
Reference: Real-time (Qwen-Omni-Realtime).
session.update
Send this event after establishing a WebSocket connection to update the session configuration. The service validates the parameters. If they are valid, it updates the session with the complete configuration; otherwise, it returns an error.
| Parameter | Description |
| --- | --- |
| `type` | The event type. Always |
| `session` | The session configuration. |
| `temperature` | The sampling temperature, which controls content diversity. Higher values create more diverse content; lower values create more deterministic content. Value range: [0, 2). Both `temperature` and `top_p` control diversity, so set only one. Default value: |
| `top_p` | The probability threshold for nucleus sampling, which controls content diversity. Higher `top_p` values create more diverse content; lower values create more deterministic content. Value range: (0, 1.0]. Both `temperature` and `top_p` control diversity, so set only one. Default value: |
| `top_k` | The candidate set size for sampling during generation. For example, setting this to 50 means only the top 50 tokens form the candidate set. Larger values increase randomness; smaller values increase determinism. The value must be greater than or equal to 0. Default value: |
| `max_tokens` | The maximum number of tokens to return for the request. The default and maximum values are the model's maximum output length. For the maximum output length of each model, see Model list. Use `max_tokens` to limit output length (for example, for summaries or keywords), control costs, or reduce response times. |
| `repetition_penalty` | The penalty for repetition in consecutive sequences during generation. Higher `repetition_penalty` values reduce repetition. A value of 1.0 means no penalty. The value must be greater than 0; there is no strict upper limit. The default value is 1.05. |
| `presence_penalty` | Controls the repetition of the generated content. The value must be in the range [-2.0, 2.0]. A positive value reduces repetition, and a negative value increases repetition. The default value is 0.0. A higher `presence_penalty` suits scenarios that require diversity, creativity, or fun, such as creative writing or brainstorming. A lower `presence_penalty` suits scenarios that require consistency or professional terminology, such as technical documents or other formal documents. |
| `seed` | Setting `seed` makes generation deterministic. Passing the same `seed` with unchanged parameters returns identical results across runs. The value must be in the range of 0 to 2^31 - 1. The default value is -1. |
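
The value ranges above can be checked on the client before the event is sent. The following is a minimal sketch, not the service's own validation: it assumes the `type` string equals the event name `session.update` and that the sampling parameters are nested under `session`. The helper names `validate_sampling` and `build_session_update` are hypothetical.

```python
import json


def validate_sampling(params: dict) -> list:
    """Return a list of problems; an empty list means the values pass."""
    problems = []
    t = params.get("temperature")
    p = params.get("top_p")
    if t is not None and not (0 <= t < 2):
        problems.append("temperature must be in [0, 2)")
    if p is not None and not (0 < p <= 1.0):
        problems.append("top_p must be in (0, 1.0]")
    if t is not None and p is not None:
        problems.append("set only one of temperature and top_p")
    k = params.get("top_k")
    if k is not None and k < 0:
        problems.append("top_k must be >= 0")
    rp = params.get("repetition_penalty")
    if rp is not None and rp <= 0:
        problems.append("repetition_penalty must be > 0")
    pp = params.get("presence_penalty")
    if pp is not None and not (-2.0 <= pp <= 2.0):
        problems.append("presence_penalty must be in [-2.0, 2.0]")
    seed = params.get("seed")
    # Omit seed entirely to keep the documented default.
    if seed is not None and not (0 <= seed <= 2**31 - 1):
        problems.append("seed must be in 0 to 2^31 - 1")
    return problems


def build_session_update(**session_params) -> str:
    """Serialize a session.update event, rejecting out-of-range values."""
    problems = validate_sampling(session_params)
    if problems:
        raise ValueError("; ".join(problems))
    # Assumption: type is the event name; parameters nest under "session".
    return json.dumps({"type": "session.update", "session": session_params})
```

For example, `build_session_update(temperature=0.7, top_k=50)` returns a JSON string that can be sent as a text frame, while `build_session_update(temperature=0.7, top_p=0.9)` raises `ValueError` because both diversity controls are set.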
response.create
The response.create event instructs the service to create a model response. In VAD mode, responses are created automatically, so you don't need to send this event.
The service responds with response.created, followed by one or more item/content events (e.g., conversation.item.created, response.content_part.added), and finally response.done to indicate completion.
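
The event sequence described above can be followed on the client with a small collector. This sketch assumes each server event arrives as a JSON string with a `type` field and that the `type` of the client event is the event name `response.create`. The function names are hypothetical.

```python
import json


def build_response_create() -> str:
    # Assumption: the type string equals the event name.
    return json.dumps({"type": "response.create"})


def collect_response(events) -> list:
    """Return the event types from response.created through response.done."""
    started = False
    seen = []
    for raw in events:
        etype = json.loads(raw)["type"]
        if etype == "response.created":
            started = True
        elif not started:
            continue  # ignore events that precede this response
        seen.append(etype)
        if etype == "response.done":
            return seen
    raise RuntimeError("stream ended before response.done")
```

`events` can be any iterable of received messages, such as a list captured during a test or a generator that wraps a WebSocket receive loop.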
|
type The event type. Always |
|
response.cancel
Send this event to cancel an ongoing response. If no response exists, the service returns an error.
| Parameter | Description |
| --- | --- |
| `type` | The event type. Always |
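
Because the service returns an error when there is no response to cancel, a client may want to track whether a response is in progress before sending this event. The following is one possible sketch. It assumes the `type` string is `response.cancel` and uses the `response.created` and `response.done` server events as the start and end markers. The `ResponseState` class is hypothetical.

```python
import json


class ResponseState:
    """Tracks whether a response is currently being generated."""

    def __init__(self):
        self.in_progress = False

    def on_server_event(self, event: dict) -> None:
        if event["type"] == "response.created":
            self.in_progress = True
        elif event["type"] == "response.done":
            self.in_progress = False

    def cancel_message(self):
        """Return a response.cancel payload, or None if nothing is running."""
        if not self.in_progress:
            return None
        # Assumption: the type string equals the event name.
        return json.dumps({"type": "response.cancel"})
```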
input_audio_buffer.append
Append audio bytes to the input audio buffer.
| Parameter | Description |
| --- | --- |
| `type` | The event type. Always |
| `audio` | The Base64-encoded audio data. |
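
As an illustration, raw audio bytes can be split into chunks and wrapped in append events as follows. The chunk size of 3200 bytes is an arbitrary assumption, as is the `type` string `input_audio_buffer.append`; this section does not specify the audio format or a chunk size.

```python
import base64
import json


def audio_append_events(audio: bytes, chunk_size: int = 3200):
    """Yield one input_audio_buffer.append payload per chunk of audio."""
    for start in range(0, len(audio), chunk_size):
        chunk = audio[start:start + chunk_size]
        yield json.dumps({
            # Assumption: the type string equals the event name.
            "type": "input_audio_buffer.append",
            "audio": base64.b64encode(chunk).decode("ascii"),
        })
```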
input_audio_buffer.commit
Submit the user input audio buffer to create a new user message item. If the buffer is empty, the service returns an error.
- VAD mode: The client does not need to send this event. The service automatically submits the audio buffer.
- Manual mode: The client must submit the audio buffer to create a user message item.
Submitting the input audio buffer does not create a model response. The service responds with input_audio_buffer.committed.
If the client has sent an input_image_buffer.append event, the input_audio_buffer.commit event submits the image buffer along with the audio buffer.
| Parameter | Description |
| --- | --- |
| `type` | The event type. Always |
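
In manual mode, a full user turn is therefore a series of append events, one commit, and a separate response.create, because the commit alone does not trigger a model response. The sketch below builds that sequence. The `type` strings are assumed to equal the event names, and `manual_turn` is a hypothetical helper. Whether the client should wait for `input_audio_buffer.committed` before sending `response.create` is not stated here, so the sketch leaves pacing to the caller.

```python
def manual_turn(audio_chunks_b64: list) -> list:
    """Return the ordered client events for one manual-mode user turn."""
    if not audio_chunks_b64:
        # Committing an empty buffer makes the service return an error.
        raise ValueError("no audio to commit")
    events = [
        {"type": "input_audio_buffer.append", "audio": chunk}
        for chunk in audio_chunks_b64
    ]
    events.append({"type": "input_audio_buffer.commit"})
    # The commit does not create a model response; request one explicitly.
    events.append({"type": "response.create"})
    return events
```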
input_audio_buffer.clear
Clear audio bytes from the buffer. The service responds with input_audio_buffer.cleared.
| Parameter | Description |
| --- | --- |
| `type` | The event type. Always |
input_image_buffer.append
Add image data to the image buffer. Images can be from local files or real-time video streams.
The following limits apply to image inputs:
- The format must be JPG or JPEG. Recommended resolution: 480p or 720p. Maximum resolution: 1080p.
- The maximum size is 500 KB (before Base64 encoding).
- Image data must be Base64-encoded.
- Send images at 1 image per second.
- Before sending `input_image_buffer.append`, you must first send at least one `input_audio_buffer.append` event.
The image buffer is submitted along with the audio buffer through the input_audio_buffer.commit event.
| Parameter | Description |
| --- | --- |
| `type` | The event type. Always |
| `image` | The Base64-encoded image data. |
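
Several of the image limits can be enforced on the client before sending. The sketch below checks the JPEG start-of-image marker, the size before Base64 encoding, the audio-first requirement, and a pacing of roughly one image per second. It assumes 1 KB means 1024 bytes and that the `type` string is `input_image_buffer.append`. The `ImageSender` class is hypothetical, and the sketch does not check resolution, which would require decoding the image.

```python
import base64
import json

MAX_IMAGE_BYTES = 500 * 1024  # assumption: 1 KB = 1024 bytes


class ImageSender:
    """Builds input_image_buffer.append payloads within the stated limits."""

    def __init__(self, min_interval: float = 1.0):
        self.audio_sent = False
        self.last_image_at = None
        self.min_interval = min_interval

    def note_audio_sent(self) -> None:
        """Call after the first input_audio_buffer.append has been sent."""
        self.audio_sent = True

    def image_append(self, jpeg: bytes, now: float):
        """Return a payload, or None if the frame should be skipped for pacing."""
        if not self.audio_sent:
            raise RuntimeError("send input_audio_buffer.append first")
        if not jpeg.startswith(b"\xff\xd8"):  # JPEG start-of-image marker
            raise ValueError("image must be JPG/JPEG")
        if len(jpeg) > MAX_IMAGE_BYTES:
            raise ValueError("image exceeds 500 KB before Base64 encoding")
        if (self.last_image_at is not None
                and now - self.last_image_at < self.min_interval):
            return None
        self.last_image_at = now
        return json.dumps({
            # Assumption: the type string equals the event name.
            "type": "input_image_buffer.append",
            "image": base64.b64encode(jpeg).decode("ascii"),
        })
```

Passing the timestamp in as `now` (for example, from `time.monotonic()`) keeps the pacing logic easy to test with a video stream or a fixed list of frames.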