This topic describes the client events for the Qwen-Omni-Realtime API.
For more information, see Real-time multimodal.
session.update
After establishing a WebSocket connection, send this event to update the default session configuration. When the service receives the session.update event, it validates the parameters. If the parameters are valid, the service updates the session and returns the complete configuration. Otherwise, the service returns an error.
type The event type. Always session.update.
session The session configuration.
temperature The sampling temperature, which controls the diversity of the generated content. A higher temperature value creates more diverse content. A lower value creates more deterministic content. Value range: [0, 2). Because both `temperature` and `top_p` control content diversity, set only one of them. Default value:
top_p The probability threshold for nucleus sampling, which controls the diversity of the generated content. A higher `top_p` value creates more diverse content. A lower value creates more deterministic content. Value range: (0, 1.0]. Because both `temperature` and `top_p` control content diversity, set only one of them. Default value:
top_k The size of the candidate set for sampling during generation. For example, if you set this parameter to 50, only the 50 highest-scoring tokens in a single generation step form the candidate set for random sampling. A larger value increases randomness. A smaller value increases determinism. The value must be greater than or equal to 0. Default value:
max_tokens The maximum number of tokens to return for the request. The default and maximum values are the model's maximum output length. For more information about the maximum output length of each model, see Model list. Use `max_tokens` when you need to limit the output length (for example, when generating summaries or keywords), control costs, or reduce response times.
repetition_penalty The penalty for repetition in consecutive sequences during model generation. Increasing the `repetition_penalty` value reduces the repetition of the generated content. A value of 1.0 means no penalty. There is no strict value range, but the value must be greater than 0. The default value is 1.05.
presence_penalty Controls how much the generated content repeats itself. The value must be in the range of [-2.0, 2.0]. The default value is 0.0. A positive value reduces repetition, and a negative value increases it. A higher `presence_penalty` suits scenarios that require diversity, creativity, or fun, such as creative writing or brainstorming. A lower `presence_penalty` suits scenarios that require consistency or professional terminology, such as technical documents or other formal documents.
seed Setting the `seed` parameter makes the model's generation process more deterministic. It is typically used to ensure that the model produces consistent results for each run. If you pass the same `seed` value in each model call and keep other parameters unchanged, the model returns the same result. The value must be in the range of 0 to 2^31−1. The default value is -1.
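As an illustration of the parameters above, the following Python sketch builds a session.update payload and checks the documented value ranges on the client before sending. The helper name `build_session_update` and the decision to reject a payload that sets both `temperature` and `top_p` are assumptions made for this example and are not part of the API. The service still performs its own validation.

```python
import json

# Range checks transcribed from the parameter descriptions above.
RANGE_CHECKS = {
    "temperature": lambda v: 0 <= v < 2,             # [0, 2)
    "top_p": lambda v: 0 < v <= 1.0,                 # (0, 1.0]
    "top_k": lambda v: v >= 0,                       # >= 0
    "repetition_penalty": lambda v: v > 0,           # > 0
    "presence_penalty": lambda v: -2.0 <= v <= 2.0,  # [-2.0, 2.0]
}


def build_session_update(**session):
    """Return a session.update event as a JSON string (hypothetical helper)."""
    # Client-side choice for this sketch: the text recommends setting only one.
    if "temperature" in session and "top_p" in session:
        raise ValueError("set only one of temperature and top_p")
    for name, in_range in RANGE_CHECKS.items():
        if name in session and not in_range(session[name]):
            raise ValueError(f"{name}={session[name]!r} is out of range")
    return json.dumps({"type": "session.update", "session": session})


# Example: cap the output length and fix the seed for repeatable runs.
event = build_session_update(temperature=0.7, max_tokens=256, seed=42)
print(event)
```

The resulting string is what the client would send as a text message over the already established WebSocket connection.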
response.create
The response.create event instructs the service to create a model response. In VAD mode, the service automatically creates model responses, so you do not need to send this event.
The service responds with a response.created event, one or more item and content events (such as conversation.item.created and response.content_part.added), and finally a response.done event to indicate that the response is complete.
type The event type. Always response.create.
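The event sequence described above can be sketched as follows. This is a minimal example that assumes the server messages arrive as JSON strings with a `type` field; `collect_until_done` is a hypothetical helper, and the iterable of messages stands in for whatever WebSocket client is in use.

```python
import json


def create_response_event():
    """Return the response.create event as a JSON string."""
    return json.dumps({"type": "response.create"})


def collect_until_done(messages):
    """Read server messages and return the event types seen,
    stopping after response.done (hypothetical helper)."""
    seen = []
    for raw in messages:
        event = json.loads(raw)
        seen.append(event["type"])
        if event["type"] == "response.done":
            break
    return seen


# Simulated server messages, in the order the text describes.
fake_stream = [
    json.dumps({"type": "response.created"}),
    json.dumps({"type": "conversation.item.created"}),
    json.dumps({"type": "response.content_part.added"}),
    json.dumps({"type": "response.done"}),
]
print(create_response_event())
print(collect_until_done(fake_stream))
```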
response.cancel
The client sends this event to cancel an ongoing response. If there is no response to cancel, the service responds with an error event.
type The event type. Always response.cancel.
input_audio_buffer.append
Appends audio bytes to the input audio buffer.
type The event type. Always input_audio_buffer.append.
audio The Base64-encoded audio data.
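A minimal sketch of preparing input_audio_buffer.append events from raw audio bytes is shown below. The helper name `audio_append_events` and the chunk size of 3200 bytes are illustrative assumptions. This section does not specify an audio format or a required chunk size.

```python
import base64
import json


def audio_append_events(audio_bytes, chunk_size=3200):
    """Yield one input_audio_buffer.append event (JSON string) per chunk.

    chunk_size is an arbitrary value chosen for this example.
    """
    for start in range(0, len(audio_bytes), chunk_size):
        chunk = audio_bytes[start:start + chunk_size]
        yield json.dumps({
            "type": "input_audio_buffer.append",
            # The audio field carries the Base64-encoded bytes as text.
            "audio": base64.b64encode(chunk).decode("ascii"),
        })


# Example with placeholder bytes standing in for captured audio.
for message in audio_append_events(b"\x00" * 7000):
    print(len(message))
```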
input_audio_buffer.commit
Submits the user input audio buffer to create a new user message item in the conversation. If the input audio buffer is empty, the service returns an error event.
VAD mode: The client does not need to send this event. The service automatically submits the audio buffer.
Manual mode: The client must submit the audio buffer to create a user message item.
Submitting the input audio buffer does not create a response from the model. The service responds with an input_audio_buffer.committed event.
If the client has sent an input_image_buffer.append event, the input_audio_buffer.commit event submits the image buffer along with the audio buffer.
type The event type. Always input_audio_buffer.commit.
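In manual mode, one conversational turn could look like the sketch below: append the audio, commit the buffer, and then request a response explicitly, because committing alone does not create one. The function `manual_turn` and its `send` callback are hypothetical. `send` stands in for the method that writes a text message to the WebSocket connection.

```python
import base64
import json


def manual_turn(send, audio_chunks):
    """Send one manual-mode turn: append audio, commit, request a response."""
    if not audio_chunks:
        # Committing an empty buffer makes the service return an error event.
        raise ValueError("no audio to commit")
    for chunk in audio_chunks:
        send(json.dumps({
            "type": "input_audio_buffer.append",
            "audio": base64.b64encode(chunk).decode("ascii"),
        }))
    # Creates the user message item; the service replies with
    # input_audio_buffer.committed.
    send(json.dumps({"type": "input_audio_buffer.commit"}))
    # The commit does not trigger a model response, so ask for one.
    send(json.dumps({"type": "response.create"}))


# Example: collect the outgoing messages in a list instead of a socket.
outgoing = []
manual_turn(outgoing.append, [b"chunk-1", b"chunk-2"])
print([json.loads(m)["type"] for m in outgoing])
```

In VAD mode neither the commit nor the response.create event is needed, because the service submits the buffer and creates the response automatically.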
input_audio_buffer.clear
Clears the audio bytes from the buffer. The service responds with an input_audio_buffer.cleared event.
type The event type. Always input_audio_buffer.clear.
input_image_buffer.append
Adds image data to the image buffer. The image can be from a local file or captured in real-time from a video stream.
The following limits apply to image inputs:
The image format must be JPG or JPEG. Recommended: 480p or 720p. Maximum: 1080p.
The size of a single image cannot exceed 500 KB before Base64 encoding.
The image data must be Base64-encoded.
Send images to the service at a maximum frequency of 2 images per second.
Before you send an input_image_buffer.append event, you must send at least one input_audio_buffer.append event.
The image buffer is submitted along with the audio buffer through the input_audio_buffer.commit event.
type The event type. Always input_image_buffer.append.
image The Base64-encoded image data.
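The image limits listed above can be enforced on the client before sending. The following sketch is one possible approach, not a prescribed implementation: the class `ImageSender`, the interpretation of 500 KB as 500 × 1024 bytes, and the choice to skip (rather than queue) frames that arrive too quickly are all assumptions. The JPEG check only looks at the standard start-of-image marker (bytes FF D8) and does not verify resolution.

```python
import base64
import json

MAX_IMAGE_BYTES = 500 * 1024  # assumes 1 KB = 1024 bytes
MIN_INTERVAL_S = 0.5          # at most 2 images per second


class ImageSender:
    """Builds input_image_buffer.append events within the documented limits."""

    def __init__(self):
        self.audio_sent = False
        self.last_image_time = None

    def note_audio_sent(self):
        """Call after the first input_audio_buffer.append event is sent."""
        self.audio_sent = True

    def build_image_event(self, jpeg_bytes, now):
        """Return the event as a JSON string, or None if the frame is skipped.

        now is a timestamp in seconds, for example from time.monotonic().
        """
        if not self.audio_sent:
            raise RuntimeError("send input_audio_buffer.append first")
        if not jpeg_bytes.startswith(b"\xff\xd8"):
            raise ValueError("image must be JPG or JPEG")
        if len(jpeg_bytes) > MAX_IMAGE_BYTES:
            raise ValueError("image exceeds 500 KB before Base64 encoding")
        if (self.last_image_time is not None
                and now - self.last_image_time < MIN_INTERVAL_S):
            return None  # too soon after the previous image; skip this frame
        self.last_image_time = now
        return json.dumps({
            "type": "input_image_buffer.append",
            "image": base64.b64encode(jpeg_bytes).decode("ascii"),
        })


# Example with placeholder bytes that only mimic a JPEG header.
sender = ImageSender()
sender.note_audio_sent()
frame = b"\xff\xd8\xff" + b"\x00" * 100
print(sender.build_image_event(frame, now=0.0) is not None)
```

The buffered images are then submitted together with the audio by the next input_audio_buffer.commit event.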