This topic describes client events for the Qwen-Omni-Realtime API.
For more information, see Real-time (Qwen-Omni-Realtime).
Session.update
After establishing a WebSocket connection, send this event first to update the session's default configuration. The server validates the parameters after receiving the session.update event. If any parameter is invalid, an error is returned; if all parameters are valid, the server updates the configuration and returns the complete configuration.
| Parameter | Description |
| --- | --- |
| type | Event type. Set to `session.update`. |
| session | Session configuration. |
| temperature | Sampling temperature, which controls the diversity of model-generated content. Higher values produce more diverse content; lower values produce more deterministic content. Range: [0, 2). Because temperature and top_p both control diversity, set only one of them. Default temperature: |
| top_p | Nucleus sampling probability threshold, which controls the diversity of model-generated content. Higher values produce more diverse content; lower values produce more deterministic content. Range: (0, 1.0]. Because temperature and top_p both control diversity, set only one of them. Default top_p: |
| top_k | Size of the candidate set used during generation. For example, with a value of 50, only the 50 highest-scoring tokens at each generation step enter the candidate set for random sampling. Larger values increase randomness; smaller values make output more deterministic. The value must be greater than or equal to 0. Default top_k: |
| max_tokens | Maximum number of tokens returned for this request. The default and maximum values both equal the model's maximum output length; for more information, see Model List. Use max_tokens for scenarios that require word-count limits (such as generating summaries or keywords), cost control, or reduced response time. |
| repetition_penalty | Controls repetition in consecutive sequences during generation. Higher values reduce repetition; a value of 1.0 means no penalty. There is no strict upper bound, but the value must be greater than 0. Default value: 1.05. |
| presence_penalty | Controls repetition in model-generated content. Range: [-2.0, 2.0]. Positive values reduce repetition; negative values increase it. Use higher values for scenarios that need diversity, interest, or creativity, such as creative writing or brainstorming; use lower values for scenarios that need consistency or specialized terminology, such as technical or other formal documents. Default value: 0.0. |
| seed | Random seed that makes Large Language Model (LLM) generation more deterministic. To get consistent results across runs, pass the same seed value on each call and keep the other parameters unchanged; the model then returns the same results as much as possible. Range: 0 to 2³¹ − 1. Default value: -1. |
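The parameters above can be assembled into a JSON event and sent over the WebSocket connection. The following is a minimal sketch: the specific field values (temperature 0.8, max_tokens 1024, seed 42) and the `ws` client object are assumptions for illustration, not values from this topic.

```python
import json

# Sketch of a session.update payload. The session fields shown here
# follow the parameter table above; the values are illustrative only.
session_update = {
    "type": "session.update",
    "session": {
        "temperature": 0.8,   # set either temperature or top_p, not both
        "max_tokens": 1024,
        "seed": 42,
    },
}

message = json.dumps(session_update)
# ws.send(message)  # `ws` is an already-connected WebSocket client (assumption)
```

The server validates the payload and either returns an error or echoes back the complete updated configuration.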
Response.create
The response.create event instructs the server to create a model response. In VAD mode, the server creates model responses automatically, so do not send this event in VAD mode.
The server responds with a response.created event, one or more item and content events such as conversation.item.created and response.content_part.added, and finally a response.done event to indicate response completion.
| Parameter | Description |
| --- | --- |
| type | Event type. Set to `response.create`. |
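In manual mode, the event is a single JSON object with only the type field. A minimal sketch (the `ws` client object is an assumption):

```python
import json

# Manual mode only: ask the server to generate a response for the
# committed conversation input. In VAD mode, do not send this event.
response_create = {"type": "response.create"}
# ws.send(json.dumps(response_create))  # `ws`: connected WebSocket client (assumption)
```

After sending this, the client should expect the server event sequence described above: response.created, item and content events, and finally response.done.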
Response.cancel
The client sends this event to cancel an ongoing response. If no response is available for cancellation, the server responds with an error event.
| Parameter | Description |
| --- | --- |
| type | Event type. Set to `response.cancel`. |
input_audio_buffer.append
Append audio bytes to the input audio buffer.
| Parameter | Description |
| --- | --- |
| type | Event type. Set to `input_audio_buffer.append`. |
| audio | Base64-encoded audio data. |
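Audio is typically streamed in small chunks, each Base64-encoded and wrapped in its own append event. The sketch below assumes raw audio bytes are already available; the chunk size of 3200 bytes is an assumption for illustration, not a value mandated by this topic.

```python
import base64
import json

def audio_append_events(pcm_bytes: bytes, chunk_size: int = 3200):
    """Split raw audio into chunks and wrap each chunk in an
    input_audio_buffer.append event. Chunk size is illustrative."""
    for i in range(0, len(pcm_bytes), chunk_size):
        chunk = pcm_bytes[i:i + chunk_size]
        yield json.dumps({
            "type": "input_audio_buffer.append",
            "audio": base64.b64encode(chunk).decode("ascii"),
        })

# for event in audio_append_events(pcm_bytes):
#     ws.send(event)  # `ws`: connected WebSocket client (assumption)
```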
input_audio_buffer.commit
Submit the user input audio buffer. This action creates a new user message item in the conversation. If the input audio buffer is empty, the server returns an error event.
VAD mode: The client does not need to send this event. The server automatically commits the audio buffer.
Manual mode: The client must commit the audio buffer to create a user message item.
Committing the input audio buffer does not create a response from the model. The server responds with an input_audio_buffer.committed event.
If the client sent an input_image_buffer.append event, the input_audio_buffer.commit event commits the image buffer along with it.
| Parameter | Description |
| --- | --- |
| type | Event type. Set to `input_audio_buffer.commit`. |
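A manual-mode turn therefore chains three client events: append the audio, commit the buffer (which also commits any pending image buffer), and then request a response. A sketch, where `ws` and `audio_b64` are assumptions standing in for a connected client and already-encoded audio:

```python
import json

def send_manual_turn(ws, audio_b64: str):
    """One manual-mode turn: append audio, commit, then request a response.
    `ws` is any object with a send(str) method (assumption)."""
    ws.send(json.dumps({"type": "input_audio_buffer.append", "audio": audio_b64}))
    ws.send(json.dumps({"type": "input_audio_buffer.commit"}))
    ws.send(json.dumps({"type": "response.create"}))
```

In VAD mode, only the append step is needed; the server handles the commit and response creation itself.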
input_audio_buffer.clear
Clear audio bytes from the buffer. The server responds with an input_audio_buffer.cleared event.
| Parameter | Description |
| --- | --- |
| type | Event type. Set to `input_audio_buffer.clear`. |
input_image_buffer.append
Add image data to the image buffer. Images can come from local files or be captured in real-time from a video stream.
Image input has the following limitations:
Image format must be JPG or JPEG. For optimal performance, a resolution of 480p or 720p is recommended, with a maximum of 1080p.
Single image size must not exceed 500 KB before Base64 encoding.
Image data must be Base64-encoded.
Send images to the server at a recommended frequency of 1 image per second.
Before sending an input_image_buffer.append event, send at least one input_audio_buffer.append event.
The image buffer is committed along with the audio buffer via the input_audio_buffer.commit event.
| Parameter | Description |
| --- | --- |
| type | Event type. Set to `input_image_buffer.append`. |
| image | Base64-encoded image data. |
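The limitations above can be enforced client-side before sending. The sketch below checks the documented 500 KB pre-encoding limit and wraps one JPG/JPEG image in an append event; the helper name is an assumption for illustration.

```python
import base64
import json

MAX_IMAGE_BYTES = 500 * 1024  # 500 KB limit before Base64 encoding (per the list above)

def image_append_event(jpeg_bytes: bytes) -> str:
    """Wrap one JPG/JPEG image in an input_image_buffer.append event.
    Raises ValueError if the raw image exceeds the documented 500 KB limit."""
    if len(jpeg_bytes) > MAX_IMAGE_BYTES:
        raise ValueError("image exceeds 500 KB before Base64 encoding")
    return json.dumps({
        "type": "input_image_buffer.append",
        "image": base64.b64encode(jpeg_bytes).decode("ascii"),
    })

# Per the limitations above: send at least one input_audio_buffer.append
# first, and send at most about one image per second.
```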