This topic describes the server-side events for the Qwen-Omni-Realtime API.
References: Real-time (Qwen-Omni-Realtime).
error
The server-side error message.
event_id string A unique identifier for this event. | {
"event_id": "event_RoUu4T8yExPMI37GKwaOC",
"type": "error",
"error": {
"type": "invalid_request_error",
"code": "invalid_value",
"message": "Invalid modalities: ['audio']. Supported combinations are: ['text'] and ['audio', 'text'].",
"param": "session.modalities"
}
}
|
type string The event type. This value is always error. |
error object Detailed error information. Properties:
  type string The error type.
  code string The error code.
  message string The error message.
  param string The parameter related to the error, such as session.modalities. |
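A client typically wants to log an error event in a compact form. The following is a minimal sketch, assuming the event arrives as JSON text shaped like the example above; the function name `describe_error` is hypothetical and not part of the API.

```python
import json


def describe_error(raw: str) -> str:
    """Turn a raw `error` event (JSON text) into a one-line log message."""
    event = json.loads(raw)
    if event.get("type") != "error":
        raise ValueError(f"not an error event: {event.get('type')!r}")
    err = event.get("error", {})
    # Mention `param` only when the server reports one.
    param = err.get("param")
    where = f" (param: {param})" if param else ""
    return f"[{err.get('type')}/{err.get('code')}] {err.get('message')}{where}"


# Sample payload copied from the example event above.
sample_error = json.dumps({
    "event_id": "event_RoUu4T8yExPMI37GKwaOC",
    "type": "error",
    "error": {
        "type": "invalid_request_error",
        "code": "invalid_value",
        "message": "Invalid modalities: ['audio']. Supported combinations "
                   "are: ['text'] and ['audio', 'text'].",
        "param": "session.modalities",
    },
})
print(describe_error(sample_error))
```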
session.created
The server returns this event after a client connects. It contains the default configuration for the session.
event_id string A unique identifier for this event. | {
"event_id": "event_RdvlSpbBb2ssyBjYrDHjt",
"type": "session.created",
"session": {
"object": "realtime.session",
"model": "qwen3-omni-flash-realtime",
"modalities": [
"text",
"audio"
],
"voice": "Cherry",
"input_audio_format": "pcm",
"output_audio_format": "pcm",
"input_audio_transcription": {
"model": "gummy-realtime-v1"
},
"turn_detection": {
"type": "server_vad",
"threshold": 0.5,
"prefix_padding_ms": 300,
"silence_duration_ms": 800,
"create_response": true,
"interrupt_response": true
},
"enable_search": false,
"search_options": {},
"tools": [],
"tool_choice": "auto",
"temperature": 0.8,
"id": "sess_Ov7GOXoNXhNjlxXtOGKQS"
}
}
|
type string The event type. This value is always session.created. |
session object The session configuration. Properties:
  object string This value is always realtime.session.
  model string The model used.
  modalities array The output modalities for the model.
  voice string The voice of the audio generated by the model.
  input_audio_format string The input audio format. This value is always pcm.
  output_audio_format string The output audio format. This value is always pcm.
  input_audio_transcription object The transcription configuration. Properties:
    model string The transcription model. This value is always gummy-realtime-v1.
  turn_detection object The voice activity detection (VAD) configuration. Properties:
    type string The server-side VAD type. This value is always server_vad.
    threshold float The VAD detection threshold.
    silence_duration_ms integer The duration of silence, in milliseconds, after which the server treats speech as stopped.
  enable_search boolean Whether to enable web search. This parameter is supported only by the Qwen3.5-Omni-Realtime model.
  search_options object The options for the web search.
  temperature float The temperature parameter for the model. |
session.updated
You receive this event after you send a session.update request and the request succeeds. If the request fails, the server returns an error event.
event_id string A unique identifier for this event. | {
"event_id": "event_X1HsXS4b4uptp6yo1LgKd",
"type": "session.updated",
"session": {
"id": "sess_Aih6vAcY5Ddt6jwFx1tCa",
"object": "realtime.session",
"model": "qwen3-omni-flash-realtime",
"modalities": [
"text",
"audio"
],
"instructions": "You are Xiao Yun, a personal assistant. Answer user questions accurately and politely. Always respond with a helpful attitude.",
"voice": "Cherry",
"input_audio_format": "pcm",
"output_audio_format": "pcm",
"input_audio_transcription": {
"model": "gummy-realtime-v1"
},
"turn_detection": {
"type": "server_vad",
"threshold": 0.1,
"prefix_padding_ms": 500,
"silence_duration_ms": 900,
"create_response": true,
"interrupt_response": true
},
"enable_search": true,
"search_options": {
"enable_source": true
},
"temperature": 0.8,
"max_response_output_token": "inf",
"max_tokens": 16384,
"repetition_penalty": 1.05,
"presence_penalty": 0.0,
"top_k": 50,
"top_p": 1.0,
"seed":-1
}
}
|
type string The event type. This value is always session.updated. |
session object The session configuration. Properties:
  temperature float The temperature parameter for the model.
  modalities array The output modalities for the model.
  voice string The voice of the audio generated by the model.
  instructions string The model's goal and role.
  input_audio_format string The input audio format. Only pcm is supported.
  output_audio_format string The output audio format. Only pcm is supported.
  input_audio_transcription object The transcription configuration. Properties:
    model string The transcription model. This value is always gummy-realtime-v1.
  turn_detection object The voice activity detection (VAD) configuration. Properties:
    type string The server-side VAD type. This value is always server_vad.
    threshold float The VAD detection threshold.
    silence_duration_ms integer The duration of silence, in milliseconds, that triggers speech stop detection.
  enable_search boolean (optional) Whether to enable web search. This parameter is supported only by the Qwen3.5-Omni-Realtime model.
  search_options object (optional) The web search options.
  top_p float The probability threshold for nucleus sampling.
  top_k integer The size of the candidate set used during model generation.
  max_tokens integer The maximum number of tokens the model returns in this request.
  repetition_penalty float Controls repetition in consecutive sequences during generation.
  presence_penalty float Controls repetition in generated content.
  seed integer The random number seed, which controls how consistent results are across requests. |
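To confirm that a session.update request took effect, a client can compare the session object from session.created with the one from session.updated. The sketch below assumes both events have already been parsed into dictionaries; `diff_session` is a hypothetical helper, and the sample payloads are trimmed versions of the examples above.

```python
def diff_session(created: dict, updated: dict) -> dict:
    """Return {field: (old, new)} for every session field that differs."""
    before = created.get("session", {})
    after = updated.get("session", {})
    changes = {}
    for key in sorted(set(before) | set(after)):
        old, new = before.get(key), after.get(key)
        if old != new:
            # A field missing on one side shows up as None.
            changes[key] = (old, new)
    return changes


# Trimmed sample payloads based on the examples above.
created_event = {
    "type": "session.created",
    "session": {
        "voice": "Cherry",
        "temperature": 0.8,
        "turn_detection": {"threshold": 0.5, "silence_duration_ms": 800},
        "enable_search": False,
    },
}
updated_event = {
    "type": "session.updated",
    "session": {
        "voice": "Cherry",
        "temperature": 0.8,
        "instructions": "You are Xiao Yun, a personal assistant.",
        "turn_detection": {"threshold": 0.1, "silence_duration_ms": 900},
        "enable_search": True,
    },
}

for field, (old, new) in diff_session(created_event, updated_event).items():
    print(f"{field}: {old!r} -> {new!r}")
```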
input_audio_buffer.speech_started
In VAD mode, the server returns this event when it detects the start of speech in the audio buffer.
If the server has not yet detected speech, this event may trigger each time you add audio to the buffer.
event_id string A unique identifier for this event. | {
"event_id": "event_Pvp8nEhsQuGCQbFJ9x58n",
"type": "input_audio_buffer.speech_started",
"audio_start_ms": 3647,
"item_id": "item_YbAiGvK2H7YaS34o4R6Ba"
}
|
type string The event type. This value is always input_audio_buffer.speech_started. |
audio_start_ms integer The number of milliseconds from when audio is first written to the buffer until speech is first detected. |
item_id string The ID of the user message item that will be created when speech stops. User message items append user input to the conversation history for later model inference and generation. |
input_audio_buffer.speech_stopped
In VAD mode, the server returns this event when it detects the end of speech in the audio buffer.
The server also returns a conversation.item.created event that creates the corresponding user message item.
event_id string A unique identifier for this event. | {
"event_id": "event_UhQiqNVRsgUiq4KUS5Xb5",
"type": "input_audio_buffer.speech_stopped",
"audio_end_ms": 4453,
"item_id": "item_YbAiGvK2H7YaS34o4R6Ba"
}
|
type string The event type. This value is always input_audio_buffer.speech_stopped. |
audio_end_ms integer The number of milliseconds from session start until speech stops. |
item_id string The ID of the user message item that will be created. |
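Because speech_started and speech_stopped share an item_id, a client can pair them to estimate how long a speech segment lasted. The following sketch makes one assumption to note: it treats audio_start_ms and audio_end_ms as offsets on the same timeline, which the field descriptions above do not state explicitly. `SpeechTimer` is a hypothetical helper.

```python
from typing import Optional


class SpeechTimer:
    """Pair speech_started/speech_stopped events by item_id."""

    def __init__(self) -> None:
        self._starts = {}  # item_id -> audio_start_ms

    def handle(self, event: dict) -> Optional[int]:
        """Return the segment length in ms on speech_stopped, else None."""
        kind = event.get("type")
        if kind == "input_audio_buffer.speech_started":
            self._starts[event["item_id"]] = event["audio_start_ms"]
        elif kind == "input_audio_buffer.speech_stopped":
            start = self._starts.pop(event["item_id"], None)
            if start is not None:
                # Assumes both offsets share one timeline (see lead-in).
                return event["audio_end_ms"] - start
        return None


# Sample events copied from the examples above.
started = {
    "type": "input_audio_buffer.speech_started",
    "audio_start_ms": 3647,
    "item_id": "item_YbAiGvK2H7YaS34o4R6Ba",
}
stopped = {
    "type": "input_audio_buffer.speech_stopped",
    "audio_end_ms": 4453,
    "item_id": "item_YbAiGvK2H7YaS34o4R6Ba",
}

timer = SpeechTimer()
timer.handle(started)
duration_ms = timer.handle(stopped)
print(duration_ms)
```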
input_audio_buffer.committed
The server returns this event when the input audio buffer is committed.
In VAD mode, the server automatically commits the audio buffer and returns this event when it detects speech end.
In Manual mode, the server returns this event after the client sends an input_audio_buffer.commit event.
event_id string A unique identifier for this event. | {
"event_id": "event_Iy6sUzL1nmdFgshFYxJEz",
"type": "input_audio_buffer.committed",
"item_id": "item_YbAiGvK2H7YaS34o4R6Ba"
}
|
type string The event type. This value is always input_audio_buffer.committed. |
item_id string The ID of the user message item that will be created. |
input_audio_buffer.cleared
After the client sends an input_audio_buffer.clear event, the server returns this event.
event_id string A unique identifier for this event. | {
"event_id": "event_RoUu4T8yExPMI37GKwaOC",
"type": "input_audio_buffer.cleared"
}
|
type string The event type. This value is always input_audio_buffer.cleared. |
conversation.item.created
The server returns this event when a conversation item is created.
event_id string A unique identifier for this event. | {
"event_id": "event_JEfkrr9gO3Ny7Xcv9bGVd",
"type": "conversation.item.created",
"item": {
"id": "item_YbAiGvK2H7YaS34o4R6Ba",
"object": "realtime.item",
"type": "message",
"status": "in_progress",
"role": "assistant",
"content": [
{
"type": "input_audio"
}
]
}
}
|
type string The event type. This value is always conversation.item.created. |
item object The conversation item to add. Properties:
  id string The unique ID of the conversation item.
  object string This value is always realtime.item.
  status string The status of the conversation item.
  role string The role of the message.
  content array The message content. |
conversation.item.input_audio_transcription.completed
This event indicates that the user’s audio has been transcribed after being written to the buffer. The transcription is performed by a dedicated speech recognition model (currently fixed to gummy-realtime-v1).
The transcribed text from the speech recognition model may differ from the interpretation generated by the Qwen-Omni-Realtime model. Use the transcription for reference only.
event_id string A unique identifier for this event. | {
"event_id": "event_FrrZcxiDfTB9LD9p4pVng",
"type": "conversation.item.input_audio_transcription.completed",
"item_id": "item_YbAiGvK2H7YaS34o4R6Ba",
"content_index": 0,
"transcript": "Hello."
}
|
type string The event type. This value is always conversation.item.input_audio_transcription.completed. |
item_id string The ID of the user message item. |
content_index integer This value is always 0. |
transcript string The transcribed text. |
conversation.item.input_audio_transcription.failed
The server returns this event when input audio transcription is enabled and fails. This event is independent of the error event to help clients identify transcription failures.
event_id string A unique identifier for this event. | {
"type": "conversation.item.input_audio_transcription.failed",
"item_id": "<item_id>",
"content_index": 0,
"error": {
"code": "<code>",
"message": "<message>",
"param": "<param>"
}
}
|
type string The event type. This value is always conversation.item.input_audio_transcription.failed. |
item_id string The ID of the user message item. |
content_index integer This value is always 0. |
error object The error information. Properties code string The error code. message string The error message. param string The parameter related to the error. |
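Since transcription success and failure arrive as separate events, a client can route both into per-item records. This is a minimal sketch with a hypothetical helper, `record_transcription`; the error code and message in the sample failure are made-up placeholder values, because the example above does not show real ones.

```python
PREFIX = "conversation.item.input_audio_transcription."


def record_transcription(event: dict, transcripts: dict, failures: dict) -> None:
    """Store the transcript or the failure reason, keyed by item_id."""
    kind = event.get("type")
    if kind == PREFIX + "completed":
        transcripts[event["item_id"]] = event["transcript"]
    elif kind == PREFIX + "failed":
        err = event.get("error", {})
        failures[event["item_id"]] = f"{err.get('code')}: {err.get('message')}"
    # Any other event type is ignored here.


completed = {
    "type": PREFIX + "completed",
    "item_id": "item_YbAiGvK2H7YaS34o4R6Ba",
    "content_index": 0,
    "transcript": "Hello.",
}
failed = {
    "type": PREFIX + "failed",
    "item_id": "item_example",           # hypothetical ID
    "content_index": 0,
    "error": {
        "code": "example_code",          # placeholder, not a real code
        "message": "example message",    # placeholder
        "param": "example_param",        # placeholder
    },
}

transcripts, failures = {}, {}
for ev in (completed, failed):
    record_transcription(ev, transcripts, failures)
print(transcripts, failures)
```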
response.created
The server returns this event when it generates a new model response.
event_id string A unique identifier for this event. | {
"event_id": "event_XuDavMzQN3KKepqGu3KRh",
"type": "response.created",
"response": {
"id": "resp_HaVOPdbmX6vifiV5pAfJY",
"object": "realtime.response",
"conversation_id": "conv_FjJaccpnvwHNo9cPVuzGc",
"status": "in_progress",
"modalities": [
"text",
"audio"
],
"voice": "Cherry",
"output_audio_format": "pcm",
"output": []
}
}
|
type string The event type. This value is always response.created. |
response object The response object. Properties:
  id string The unique ID of the response.
  conversation_id string The unique ID of the current session.
  object string The object type. For this event, this value is always realtime.response.
  status string The response status. Valid values are completed, failed, in_progress, or incomplete.
  modalities array The response modalities.
  voice string The voice of the audio generated by the model.
  output array The response output. This value is always empty for this event. |
response.done
The server returns this event after the response finishes generating. The response object includes all output items except raw audio data.
event_id string A unique identifier for this event. | {
"event_id": "event_CSaxRRYLvbrfexDXAEuDG",
"type": "response.done",
"response": {
"id": "resp_HaVOPdbmX6vifiV5pAfJY",
"object": "realtime.response",
"conversation_id": "conv_FjJaccpnvwHNo9cPVuzGc",
"status": "completed",
"modalities": [
"text",
"audio"
],
"voice": "Cherry",
"output_audio_format": "pcm",
"output": [
{
"id": "item_Ls6MtCUWO7LM4E59QziNv",
"object": "realtime.item",
"type": "message",
"status": "completed",
"role": "assistant",
"content": [
{
"type": "audio",
"transcript": "Hello! How can I help you?"
}
]
}
],
"usage": {
"total_tokens": 377,
"input_tokens": 336,
"output_tokens": 41,
"input_tokens_details": {
"text_tokens": 228,
"audio_tokens": 108
},
"output_tokens_details": {
"text_tokens": 9,
"audio_tokens": 32
},
"plugins": {
"search": {
"count": 1,
"strategy": "agent"
}
}
}
}
}
|
type string The event type. This value is always response.done. |
response object The response object. Properties:
  id string The unique ID of the response.
  conversation_id string The unique ID of the current session.
  object string The object type. For this event, this value is always realtime.response.
  status string The response status.
  modalities array The response modalities.
  voice string The voice of the audio generated by the model.
  output array The response output. Properties of each output item:
    id string The ID of the output item.
    type string The output item type. This value is always message.
    object string The output item object type. This value is always realtime.item.
    status string The output item status.
    role string The output item role.
    content array The output item content. Properties of each content part:
      type string The content type: text for plain text output, or audio for audio output.
      text string The text output.
      transcript string The text transcript of the audio.
  usage object Token usage details for this response. Properties:
    total_tokens integer The total number of tokens used in this response.
    input_tokens integer The number of input tokens.
    output_tokens integer The number of output tokens.
    input_tokens_details object Details about input token usage, including text_tokens (text tokens) and audio_tokens (audio tokens).
    output_tokens_details object Details about output token usage, including text_tokens (text tokens) and audio_tokens (audio tokens).
    plugins object (optional) Plugin usage metrics. Returned when web search (enable_search) is enabled. Properties:
      search object Metering information for web search. Properties:
        count integer The number of searches.
        strategy string The search strategy. |
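The response.done event is a convenient place to collect the final text and token usage. The sketch below uses a hypothetical helper, `summarize_response`, on a trimmed copy of the example above. The check that input_tokens plus output_tokens equals total_tokens matches the example (336 + 41 = 377) but is an assumption, not a documented guarantee.

```python
def summarize_response(event: dict) -> dict:
    """Collect status, output text, and usage from a response.done event."""
    response = event["response"]
    usage = response.get("usage", {})
    texts = []
    for item in response.get("output", []):
        for part in item.get("content", []):
            # Audio parts carry `transcript`; text parts carry `text`.
            texts.append(part.get("transcript") or part.get("text") or "")
    search = usage.get("plugins", {}).get("search", {})
    total = usage.get("total_tokens")
    return {
        "status": response.get("status"),
        "text": "".join(texts),
        "total_tokens": total,
        # Assumption: the total is the sum of input and output tokens.
        "tokens_add_up": usage.get("input_tokens", 0)
        + usage.get("output_tokens", 0) == total,
        "search_count": search.get("count", 0),
    }


# Trimmed copy of the example event above.
done_event = {
    "type": "response.done",
    "response": {
        "id": "resp_HaVOPdbmX6vifiV5pAfJY",
        "status": "completed",
        "output": [{
            "id": "item_Ls6MtCUWO7LM4E59QziNv",
            "type": "message",
            "role": "assistant",
            "content": [
                {"type": "audio", "transcript": "Hello! How can I help you?"}
            ],
        }],
        "usage": {
            "total_tokens": 377,
            "input_tokens": 336,
            "output_tokens": 41,
            "plugins": {"search": {"count": 1, "strategy": "agent"}},
        },
    },
}
summary = summarize_response(done_event)
print(summary)
```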
response.text.delta
The server returns this event when the output modality is text only and the model generates new text incrementally.
event_id string A unique identifier for this event. | {
"delta": "Hello",
"event_id": "event_TH49MauuPmRo1RGaMSlP7",
"type": "response.text.delta",
"response_id": "resp_PrRSvPVpnCExdUOGHHLuP",
"item_id": "item_L8IRm9kRXFpxoOjDqDC96",
"output_index": 0,
"content_index": 0
}
|
type string The event type. This value is always response.text.delta. |
delta string The incremental text returned. |
response_id string The response ID. |
item_id string The message item ID. You can use this to reference the same message item. |
output_index integer The index of the output item in the response. This value is always 0. |
content_index integer The index of the internal part within the output item. This value is always 0. |
response.text.done
When the output modality is text only and the model finishes generating text, the server returns this event.
The server also returns this event if the response is interrupted, incomplete, or canceled.
event_id string A unique identifier for this event. | {
"event_id": "event_B1lIeE2Nac33zn5V7h2mm",
"type": "response.text.done",
"response_id": "resp_B1lIdtjF4Noqpn5NOjznj",
"item_id": "item_B1lIdJsAJlJiFs8ztWpJt",
"output_index": 0,
"content_index": 0,
"text": "How can I assist you today?"
}
|
type string The event type. This value is always response.text.done. |
response_id string The response ID. |
item_id string The message item ID. |
output_index integer The index of the output item in the response. |
content_index integer The index of the content item in the response. |
text string The full text output by the model. |
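In text-only mode, a client can render response.text.delta chunks as they arrive and then settle on the full text from response.text.done. This is a minimal sketch with a hypothetical `TextAssembler` class; the two delta strings in the usage are made up so that they join into the example text above.

```python
from collections import defaultdict
from typing import Optional


class TextAssembler:
    """Accumulate text deltas per (response_id, item_id)."""

    def __init__(self) -> None:
        self._chunks = defaultdict(list)

    def handle(self, event: dict) -> Optional[str]:
        """Return the final text on response.text.done, else None."""
        kind = event.get("type")
        key = (event.get("response_id"), event.get("item_id"))
        if kind == "response.text.delta":
            self._chunks[key].append(event["delta"])
        elif kind == "response.text.done":
            streamed = "".join(self._chunks.pop(key, []))
            # Prefer the server's full text; fall back to streamed chunks.
            return event.get("text", streamed)
        return None

    def partial(self, response_id: str, item_id: str) -> str:
        """Text received so far for an item that is still streaming."""
        return "".join(self._chunks.get((response_id, item_id), []))


ids = {"response_id": "resp_B1lIdtjF4Noqpn5NOjznj",
       "item_id": "item_B1lIdJsAJlJiFs8ztWpJt"}
assembler = TextAssembler()
# Hypothetical deltas chosen to match the example's final text.
assembler.handle({"type": "response.text.delta", "delta": "How can I", **ids})
assembler.handle({"type": "response.text.delta",
                  "delta": " assist you today?", **ids})
so_far = assembler.partial(ids["response_id"], ids["item_id"])
final_text = assembler.handle({"type": "response.text.done",
                               "text": "How can I assist you today?", **ids})
print(final_text)
```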
response.audio.delta
When the output modality includes audio and the model generates new audio data incrementally, the server returns this event.
event_id string A unique identifier for this event. | {
"event_id": "event_B1osWMZBtrEQbiIwW0qHQ",
"type": "response.audio.delta",
"response_id": "resp_P79OOMs8LnrXVpiIHUCKR",
"item_id": "item_OFaPGtzfWCPyGzxnuEX9i",
"output_index": 0,
"content_index": 0,
"delta": "{base64 audio}"
}
|
type string The event type. This value is always response.audio.delta. |
response_id string The response ID. |
item_id string The message item ID. |
output_index integer The index of the output item in the response. |
content_index integer The index of the content part within the output item. |
delta string The audio data output by the model, encoded in Base64. |
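Each response.audio.delta carries one Base64-encoded chunk, so a client needs to decode it before playback or storage. The sketch below appends decoded chunks to a buffer; `append_audio` is a hypothetical helper, and the sample bytes are arbitrary test data rather than real audio. Sample rate and bit depth are not covered in this topic, so the sketch makes no assumption about them.

```python
import base64


def append_audio(event: dict, pcm: bytearray) -> int:
    """Decode one audio delta into `pcm`; return the bytes appended."""
    if event.get("type") != "response.audio.delta":
        return 0
    chunk = base64.b64decode(event["delta"])
    pcm.extend(chunk)
    return len(chunk)


# Arbitrary bytes standing in for a PCM chunk.
fake_chunk = b"\x00\x01\x02\x03"
audio_event = {
    "type": "response.audio.delta",
    "response_id": "resp_P79OOMs8LnrXVpiIHUCKR",
    "item_id": "item_OFaPGtzfWCPyGzxnuEX9i",
    "output_index": 0,
    "content_index": 0,
    "delta": base64.b64encode(fake_chunk).decode("ascii"),
}

buffer = bytearray()
appended = append_audio(audio_event, buffer)
print(appended, len(buffer))
```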
response.audio.done
When the output modality includes audio and the model finishes generating audio data, the server returns this event.
The server also returns this event if the response is interrupted, incomplete, or canceled.
event_id string A unique identifier for this event. | {
"event_id": "event_Le1TDl7VfyHQxl47DtGxI",
"type": "response.audio.done",
"response_id": "resp_HaVOPdbmX6vifiV5pAfJY",
"item_id": "item_Ls6MtCUWO7LM4E59QziNv",
"output_index": 0,
"content_index": 0
}
|
type string The event type. This value is always response.audio.done. |
response_id string The response ID. |
item_id string The message item ID. |
output_index integer The index of the output item in the response. |
content_index integer The index of the content part within the output item. |
response.audio_transcript.delta
The server returns a response.audio_transcript.delta event when the output modality includes audio and the model generates new text for the audio incrementally.
event_id string A unique identifier for this event. | {
"event_id": "event_BksW7fOwnyavZdDxIzZYM",
"type": "response.audio_transcript.delta",
"response_id": "resp_HaVOPdbmX6vifiV5pAfJY",
"item_id": "item_Ls6MtCUWO7LM4E59QziNv",
"output_index": 0,
"content_index": 0,
"delta": "What"
}
|
type string The event type. This value is always response.audio_transcript.delta. |
response_id string The response ID. |
item_id string The message item ID. |
output_index integer The index of the output item in the response. |
content_index integer The index of the content item in the response. |
delta string The incremental text. |
response.audio_transcript.done
The server returns a response.audio_transcript.done event when the output modality includes audio and the model completes transcribing the audio.
event_id string A unique identifier for this event. | {
"event_id": "event_X49tL2WerT4WjxcmH16lS",
"type": "response.audio_transcript.done",
"response_id": "resp_HaVOPdbmX6vifiV5pAfJY",
"item_id": "item_Ls6MtCUWO7LM4E59QziNv",
"output_index": 0,
"content_index": 0,
"transcript": "Hello! How can I help you?"
}
|
type string The event type. This value is always response.audio_transcript.done. |
response_id string The response ID. |
item_id string The message item ID. |
output_index integer The index of the output item in the response. |
content_index integer The index of the content item in the response. |
transcript string The full transcription text. |
response.output_item.added
The server returns this event when it creates a new output item during response generation.
event_id string A unique identifier for this event. | {
"event_id": "event_DsCO341DEVtiATtCB6BUY",
"type": "response.output_item.added",
"response_id": "resp_HaVOPdbmX6vifiV5pAfJY",
"output_index": 0,
"item": {
"id": "item_Ls6MtCUWO7LM4E59QziNv",
"object": "realtime.item",
"type": "message",
"status": "in_progress",
"role": "assistant",
"content": []
}
}
|
type string The event type. This value is always response.output_item.added. |
response_id string The ID of the response. |
output_index integer The index of the output item in the response. |
item object Information about the output item. Properties:
  id string The unique ID of the output item.
  object string This value is always realtime.item.
  status string The status of the output item.
  role string The role of the sender.
  content array The message content. |
response.output_item.done
The server returns this event when a new output item is complete.
event_id string A unique identifier for this event. | {
"event_id": "event_MEu5nlLw1LsOguHiehIP8",
"type": "response.output_item.done",
"response_id": "resp_HaVOPdbmX6vifiV5pAfJY",
"output_index": 0,
"item": {
"id": "item_Ls6MtCUWO7LM4E59QziNv",
"object": "realtime.item",
"type": "message",
"status": "completed",
"role": "assistant",
"content": [
{
"type": "audio",
"text": "Hello! How can I help you?"
}
]
}
}
|
type string The event type. This value is always response.output_item.done. |
response_id string The response ID. |
output_index integer The index of the output item in the response. |
item object The output item information. Properties:
  id string The unique ID of the output item.
  object string This value is always realtime.item.
  status string The status of the output item.
  role string The role of the sender.
  content array The message content. |
response.content_part.added
The server returns this event when it adds a new content part to an assistant message item during response generation.
event_id string A unique identifier for this event. | {
"event_id": "event_AVBOmrgY3C8bjlRajfSUT",
"type": "response.content_part.added",
"response_id": "resp_HaVOPdbmX6vifiV5pAfJY",
"item_id": "item_Ls6MtCUWO7LM4E59QziNv",
"output_index": 0,
"content_index": 0,
"part": {
"type": "audio",
"text": ""
}
}
|
type string The event type. This value is always response.content_part.added. |
response_id string The response ID. |
item_id string The message item ID. |
output_index integer The index of the output item in the response. This value is always 0. |
content_index integer The index of the internal part within the output item. This value is always 0. |
part object The content part information. Properties:
  type string The type of the content part.
  text string The text of the content part. |
response.content_part.done
The server returns this event when the streaming of a content part within an assistant message item finishes.
event_id string A unique identifier for this event. | {
"event_id": "event_Il8HD19v58Qr5IBkw7LtN",
"type": "response.content_part.done",
"response_id": "resp_HaVOPdbmX6vifiV5pAfJY",
"item_id": "item_Ls6MtCUWO7LM4E59QziNv",
"output_index": 0,
"content_index": 0,
"part": {
"type": "audio",
"text": "Hello! How can I help you?"
}
}
|
type string The event type. This value is always response.content_part.done. |
response_id string The response ID. |
item_id string The message item ID. |
output_index integer The index of the output item in the response. This value is always 0. |
content_index integer The index of the content part in the content array. This value is always 0. |
part object The content part information. Properties:
  type string The type of the content part.
  text string The text of the content part. |
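Every server-side event in this topic carries a type field, so a client can route incoming messages with a simple lookup table. The following is a minimal sketch, assuming each message arrives as JSON text; `dispatch` and the handler table are hypothetical names, and transport details such as the WebSocket connection are left out.

```python
import json
from typing import Callable, Dict

Handler = Callable[[dict], None]


def dispatch(raw: str, handlers: Dict[str, Handler]) -> bool:
    """Route one raw event to its handler; return False if unhandled."""
    event = json.loads(raw)
    handler = handlers.get(event.get("type"))
    if handler is None:
        return False
    handler(event)
    return True


seen = []
handlers: Dict[str, Handler] = {
    "response.content_part.done":
        lambda e: seen.append(("part", e["part"]["text"])),
    "error":
        lambda e: seen.append(("error", e["error"]["message"])),
}

# Sample payload based on the response.content_part.done example above.
raw_event = json.dumps({
    "event_id": "event_Il8HD19v58Qr5IBkw7LtN",
    "type": "response.content_part.done",
    "response_id": "resp_HaVOPdbmX6vifiV5pAfJY",
    "item_id": "item_Ls6MtCUWO7LM4E59QziNv",
    "output_index": 0,
    "content_index": 0,
    "part": {"type": "audio", "text": "Hello! How can I help you?"},
})
handled = dispatch(raw_event, handlers)
unhandled = dispatch(json.dumps({"type": "input_audio_buffer.cleared"}),
                     handlers)
print(handled, unhandled, seen)
```

Returning a boolean instead of raising on unknown types keeps the client tolerant of events it does not yet handle.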