Server events for the Qwen-Omni-Realtime API, including function calling events.
See Qwen-Omni-Realtime.
error
The server returns this event when an error occurs.
| Field | Description |
| --- | --- |
| event_id | A unique identifier for this event. |
| type | The event type. This value is always `error`. |
| error | Detailed error information. |
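Every server event is a JSON object whose type field identifies it, so a client typically routes incoming messages on that field. The following is a minimal sketch, assuming each WebSocket message has already been received as a JSON string. The `EventRouter` class is hypothetical, the event IDs and payloads are made up for illustration, and the `error` payload is kept as an opaque dict because its inner fields are not described here.

```python
import json
from typing import Any, Callable, Dict, List

Handler = Callable[[Dict[str, Any]], None]


class EventRouter:
    """Hypothetical helper that routes parsed server events by their `type` field."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}
        self.unhandled: List[str] = []

    def on(self, event_type: str, handler: Handler) -> None:
        self._handlers[event_type] = handler

    def dispatch(self, raw_message: str) -> None:
        event = json.loads(raw_message)
        event_type = event.get("type", "<missing>")
        handler = self._handlers.get(event_type)
        if handler is None:
            # Remember unrecognized types instead of raising, so new event types don't break the client.
            self.unhandled.append(event_type)
            return
        handler(event)


# Illustrative usage with made-up event IDs and payloads.
errors: List[Dict[str, Any]] = []
router = EventRouter()
# The `error` payload is stored as-is; its inner fields are not assumed here.
router.on("error", lambda event: errors.append(event["error"]))

router.dispatch(json.dumps({"event_id": "event_1", "type": "error", "error": {"message": "example"}}))
router.dispatch(json.dumps({"event_id": "event_2", "type": "session.created", "session": {}}))
```

Recording unhandled types rather than raising keeps the client tolerant of events it does not yet handle.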
session.created
The server returns this event after a client connects. It contains the default configuration for the session.
| Field | Description |
| --- | --- |
| event_id | A unique identifier for this event. |
| type | The event type. This value is always `session.created`. |
| session | The session configuration. |
session.updated
The server returns this event after a session.update request succeeds. If the request fails, the server returns an error event.
| Field | Description |
| --- | --- |
| event_id | A unique identifier for this event. |
| type | The event type. This value is always `session.updated`. |
| session | The session configuration. |
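Because both session.created and session.updated carry the full session configuration, a client can simply keep the most recent one. This is a minimal sketch; the `SessionState` class is hypothetical, and the `setting` key in the sample payloads is a placeholder, not a documented session field.

```python
from typing import Any, Dict, Optional


class SessionState:
    """Hypothetical holder for the latest session configuration reported by the server."""

    def __init__(self) -> None:
        self.config: Optional[Dict[str, Any]] = None
        self.confirmed_updates = 0

    def handle(self, event: Dict[str, Any]) -> None:
        if event["type"] == "session.created":
            # Default configuration sent right after the client connects.
            self.config = event["session"]
        elif event["type"] == "session.updated":
            # Sent only when a session.update request succeeds; failures arrive as `error` events.
            self.config = event["session"]
            self.confirmed_updates += 1


state = SessionState()
state.handle({"event_id": "e1", "type": "session.created", "session": {"setting": "default"}})
state.handle({"event_id": "e2", "type": "session.updated", "session": {"setting": "updated"}})
```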
input_audio_buffer.speech_started
In VAD mode, the server returns this event when it detects the start of speech in the audio buffer.
This event may be triggered each time you add audio to the buffer before speech is detected.
| Field | Description |
| --- | --- |
| event_id | A unique identifier for this event. |
| type | The event type. This value is always `input_audio_buffer.speech_started`. |
| audio_start_ms | The time, in milliseconds, from when audio starts being written to the buffer until speech is first detected. |
| item_id | The ID of the user message item that will be created when the end of speech is detected. User message items add the user's input to the conversation history for subsequent model inference and generation. |
input_audio_buffer.speech_stopped
In VAD mode, the server returns this event when it detects the end of speech in the audio buffer.
The server also returns a conversation.item.created event for the corresponding user message item.
| Field | Description |
| --- | --- |
| event_id | A unique identifier for this event. |
| type | The event type. This value is always `input_audio_buffer.speech_stopped`. |
| audio_end_ms | The time, in milliseconds, from the start of the session until the end of speech is detected. |
| item_id | The ID of the user message item that will be created. |
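In VAD mode, a client can use these two events to track whether the user is currently speaking, for example to stop local playback of model audio when the user starts talking. This is a minimal sketch: the `VadTracker` class and the `stop_playback` callback are hypothetical, and it assumes that speech_started and speech_stopped report the same item_id for one utterance, as the field descriptions suggest.

```python
from typing import Any, Callable, Dict, List, Optional


class VadTracker:
    """Hypothetical tracker for VAD speech boundaries reported by the server."""

    def __init__(self, stop_playback: Callable[[], None]) -> None:
        self.stop_playback = stop_playback  # e.g. halts local playback of model audio
        self.speaking = False
        self.current_item_id: Optional[str] = None
        self.finished_items: List[str] = []

    def handle(self, event: Dict[str, Any]) -> None:
        if event["type"] == "input_audio_buffer.speech_started":
            # speech_started may arrive more than once, so react only on the
            # transition from "not speaking" to "speaking".
            if not self.speaking:
                self.speaking = True
                self.stop_playback()
            self.current_item_id = event["item_id"]
        elif event["type"] == "input_audio_buffer.speech_stopped":
            self.speaking = False
            self.current_item_id = None
            # ID of the user message item the server creates for this utterance.
            self.finished_items.append(event["item_id"])


stop_calls: List[str] = []
tracker = VadTracker(stop_playback=lambda: stop_calls.append("stop"))
for event in [
    {"event_id": "e1", "type": "input_audio_buffer.speech_started", "audio_start_ms": 120, "item_id": "item_1"},
    {"event_id": "e2", "type": "input_audio_buffer.speech_started", "audio_start_ms": 120, "item_id": "item_1"},
    {"event_id": "e3", "type": "input_audio_buffer.speech_stopped", "audio_end_ms": 900, "item_id": "item_1"},
]:
    tracker.handle(event)
```

The guard on `self.speaking` means repeated speech_started events trigger the playback stop only once.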
input_audio_buffer.committed
The server returns this event when the input audio buffer is committed:
- In VAD mode, the server automatically commits the audio buffer and returns this event when it detects the end of speech.
- In Manual mode, the server returns this event after the client sends an input_audio_buffer.commit event.
| Field | Description |
| --- | --- |
| event_id | A unique identifier for this event. |
| type | The event type. This value is always `input_audio_buffer.committed`. |
| item_id | The ID of the user message item that will be created. |
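In Manual mode, the client sends input_audio_buffer.commit and then waits for this acknowledgment. The sketch below assumes the client event needs only a client-chosen event_id and a type field; check the client events reference for the exact shape. `make_commit_event` and `CommitTracker` are hypothetical names.

```python
import json
import uuid
from typing import Any, Dict, List


def make_commit_event() -> str:
    """Build a Manual-mode commit request (assumed shape: event_id + type)."""
    return json.dumps({"event_id": f"event_{uuid.uuid4().hex}", "type": "input_audio_buffer.commit"})


class CommitTracker:
    """Hypothetical counter of commits that are still waiting for the server's acknowledgment."""

    def __init__(self) -> None:
        self.pending_commits = 0
        self.committed_item_ids: List[str] = []

    def sent_commit(self) -> None:
        self.pending_commits += 1

    def handle(self, event: Dict[str, Any]) -> None:
        if event["type"] == "input_audio_buffer.committed":
            # In VAD mode this also arrives without a client commit, so never go below zero.
            self.pending_commits = max(0, self.pending_commits - 1)
            # item_id names the user message item that will be created from the committed audio.
            self.committed_item_ids.append(event["item_id"])


msg = json.loads(make_commit_event())
tracker = CommitTracker()
tracker.sent_commit()
tracker.handle({"event_id": "e1", "type": "input_audio_buffer.committed", "item_id": "item_2"})
```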
input_audio_buffer.cleared
The server returns this event after the client sends an input_audio_buffer.clear event.
| Field | Description |
| --- | --- |
| event_id | A unique identifier for this event. |
| type | The event type. This value is always `input_audio_buffer.cleared`. |
conversation.item.created
The server returns this event when a conversation item is created.
| Field | Description |
| --- | --- |
| event_id | A unique identifier for this event. |
| type | The event type. This value is always `conversation.item.created`. |
| item | The conversation item that was created. |
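A client that mirrors the conversation locally can record each created item in arrival order. This is a minimal sketch; `ConversationLog` is hypothetical, and it assumes the item object carries its own ID under an id key, which this section does not document.

```python
from typing import Any, Dict, List


class ConversationLog:
    """Hypothetical local mirror of conversation items, in creation order."""

    def __init__(self) -> None:
        self.items: Dict[str, Dict[str, Any]] = {}
        self.order: List[str] = []

    def handle(self, event: Dict[str, Any]) -> None:
        if event["type"] == "conversation.item.created":
            item = event["item"]
            # Assumption: the item exposes its ID under "id".
            item_id = item["id"]
            if item_id not in self.items:
                self.order.append(item_id)
            self.items[item_id] = item


log = ConversationLog()
log.handle({"event_id": "e1", "type": "conversation.item.created", "item": {"id": "item_1", "type": "message"}})
```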
conversation.item.input_audio_transcription.delta
When input audio transcription is enabled, the server sends this event repeatedly while the user is speaking to deliver intermediate, real-time transcription results. To get the most complete preview of the current sentence at any moment, concatenate the two text fields: real-time preview = text + stash.
| Field | Description |
| --- | --- |
| event_id | A unique identifier for this event. |
| type | The event type. This value is always `conversation.item.input_audio_transcription.delta`. |
| item_id | The ID of the associated conversation item. |
| content_index | The index of the content part that contains the audio. |
| text | The confirmed text prefix: the portion of the current sentence that the model has confirmed and will not change. |
| stash | The preliminary text suffix: a temporary draft that follows the confirmed portion, which the model is still processing and may revise. |
| language | The detected language of the recognized audio. |
| emotion | The detected emotion of the recognized audio. Valid values: |
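The preview rule above (text + stash) can be applied per item as events arrive. This is a minimal sketch, assuming each event carries the full confirmed prefix of the current sentence, so the preview is replaced rather than appended. `TranscriptPreview` is hypothetical and the sample text is made up.

```python
from typing import Any, Dict


class TranscriptPreview:
    """Hypothetical store of the latest live transcription preview per item."""

    def __init__(self) -> None:
        self.previews: Dict[str, str] = {}

    def handle(self, event: Dict[str, Any]) -> str:
        """Return the latest preview for the event's item ('' for other event types)."""
        if event["type"] != "conversation.item.input_audio_transcription.delta":
            return ""
        # text = confirmed prefix (stable); stash = draft suffix (may still change).
        preview = event.get("text", "") + event.get("stash", "")
        self.previews[event["item_id"]] = preview
        return preview


preview = TranscriptPreview()
delta_type = "conversation.item.input_audio_transcription.delta"
preview.handle({"event_id": "e1", "type": delta_type, "item_id": "item_1", "content_index": 0,
                "text": "What is the", "stash": " weath"})
latest = preview.handle({"event_id": "e2", "type": delta_type, "item_id": "item_1", "content_index": 0,
                         "text": "What is the weather", "stash": " today"})
```

Rendering the stash portion differently (for example, dimmed) is a common way to signal that it may still change.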
conversation.item.input_audio_transcription.completed
Indicates that the user's audio has been transcribed. The transcription is performed by a built-in speech recognition model (qwen3-asr-flash-realtime), which is not configurable.
The text produced by the speech recognition model may differ from how the Qwen-Omni-Realtime model interprets the same audio. Use the transcription for reference only.
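| Field | Description |
| --- | --- |
| event_id | A unique identifier for this event. |
| type | The event type. This value is always `conversation.item.input_audio_transcription.completed`. |
| item_id | The ID of the user message item. |
| content_index | The index of the content part. This value is always 0. |
| transcript | The transcribed text. |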
conversation.item.input_audio_transcription.failed
The server returns this event when input audio transcription is enabled and the transcription fails. This event is independent of the error event and helps clients identify transcription failures.
| Field | Description |
| --- | --- |
| event_id | A unique identifier for this event. |
| type | The event type. This value is always `conversation.item.input_audio_transcription.failed`. |
| item_id | The ID of the user message item. |
| content_index | This value is always 0. |
| error | The error information. |
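Because transcription results and failures arrive as separate events, a client can store both per user message item. This is a minimal sketch; `InputTranscripts` is hypothetical, the sample values are made up, and the error payload is kept opaque.

```python
from typing import Any, Dict


class InputTranscripts:
    """Hypothetical store for final input transcripts and transcription failures."""

    def __init__(self) -> None:
        self.final: Dict[str, str] = {}
        self.failed: Dict[str, Any] = {}

    def handle(self, event: Dict[str, Any]) -> None:
        event_type = event["type"]
        if event_type == "conversation.item.input_audio_transcription.completed":
            # Reference-only text from the built-in ASR model.
            self.final[event["item_id"]] = event["transcript"]
        elif event_type == "conversation.item.input_audio_transcription.failed":
            # Reported separately from the generic `error` event; payload kept opaque.
            self.failed[event["item_id"]] = event["error"]


store = InputTranscripts()
store.handle({"event_id": "e1", "type": "conversation.item.input_audio_transcription.completed",
              "item_id": "item_1", "content_index": 0, "transcript": "Hello there"})
store.handle({"event_id": "e2", "type": "conversation.item.input_audio_transcription.failed",
              "item_id": "item_2", "content_index": 0, "error": {"message": "example"}})
```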
response.created
The server returns this event when it generates a new model response.
| Field | Description |
| --- | --- |
| event_id | A unique identifier for this event. |
| type | The event type. This value is always `response.created`. |
| response | The response object. |
response.done
The server returns this event after the response is completely generated. The response object includes all output items except for the raw audio data.
| Field | Description |
| --- | --- |
| event_id | A unique identifier for this event. |
| type | The event type. This value is always `response.done`. |
| response | The response object. |
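Pairing response.created with response.done lets a client know whether a response is still in progress. This is a minimal sketch; `ResponseTracker` is hypothetical, and it assumes the response object carries its ID under an id key, which this section does not document.

```python
from typing import Any, Dict, List


class ResponseTracker:
    """Hypothetical tracker for responses that are currently being generated."""

    def __init__(self) -> None:
        self.active: Dict[str, Dict[str, Any]] = {}
        self.completed: List[Dict[str, Any]] = []

    def handle(self, event: Dict[str, Any]) -> None:
        # Assumption: the response object carries its ID under "id".
        if event["type"] == "response.created":
            response = event["response"]
            self.active[response["id"]] = response
        elif event["type"] == "response.done":
            response = event["response"]
            self.active.pop(response["id"], None)
            # The final object omits raw audio, so audio must be collected from delta events.
            self.completed.append(response)

    def busy(self) -> bool:
        return bool(self.active)


tracker = ResponseTracker()
tracker.handle({"event_id": "e1", "type": "response.created", "response": {"id": "resp_1"}})
was_busy = tracker.busy()
tracker.handle({"event_id": "e2", "type": "response.done", "response": {"id": "resp_1"}})
```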
response.text.delta
The server returns this event when the output modality is text only and the model generates new text incrementally.
| Field | Description |
| --- | --- |
| event_id | A unique identifier for this event. |
| type | The event type. This value is always `response.text.delta`. |
| delta | The incremental text. |
| response_id | The response ID. |
| item_id | The message item ID. Events that belong to the same message item share this ID. |
| output_index | The index of the output item in the response. This value is always 0. |
| content_index | The index of the content part within the output item. This value is always 0. |
response.text.done
When the output modality is text only and the model finishes generating text, the server returns this event.
The server also returns this event if the response is interrupted, incomplete, or canceled.
| Field | Description |
| --- | --- |
| event_id | A unique identifier for this event. |
| type | The event type. This value is always `response.text.done`. |
| response_id | The response ID. |
| item_id | The message item ID. |
| output_index | The index of the output item in the response. |
| content_index | The index of the content part within the output item. |
| text | The full text generated by the model. |
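A text-only client typically shows the concatenated deltas while streaming and then switches to the full text from response.text.done, which also arrives when a response is interrupted or canceled. This is a minimal sketch; `TextStream` is hypothetical, and keying on (response_id, item_id) is a design choice rather than a documented requirement.

```python
from typing import Any, Dict, List, Tuple

Key = Tuple[str, str]


class TextStream:
    """Hypothetical accumulator for streamed text, keyed by (response_id, item_id)."""

    def __init__(self) -> None:
        self.partial: Dict[Key, List[str]] = {}
        self.final: Dict[Key, str] = {}

    def handle(self, event: Dict[str, Any]) -> None:
        key = (event["response_id"], event["item_id"])
        if event["type"] == "response.text.delta":
            self.partial.setdefault(key, []).append(event["delta"])
        elif event["type"] == "response.text.done":
            # The done event carries the full text, so it replaces the joined deltas.
            self.final[key] = event["text"]
            self.partial.pop(key, None)

    def current(self, response_id: str, item_id: str) -> str:
        key = (response_id, item_id)
        if key in self.final:
            return self.final[key]
        return "".join(self.partial.get(key, []))


stream = TextStream()
base = {"response_id": "resp_1", "item_id": "item_9", "output_index": 0, "content_index": 0}
stream.handle({**base, "event_id": "e1", "type": "response.text.delta", "delta": "Hello"})
stream.handle({**base, "event_id": "e2", "type": "response.text.delta", "delta": ", world"})
mid_stream = stream.current("resp_1", "item_9")
stream.handle({**base, "event_id": "e3", "type": "response.text.done", "text": "Hello, world!"})
```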
response.audio.delta
When the output modality includes audio and the model generates new audio data incrementally, the server returns this event.
| Field | Description |
| --- | --- |
| event_id | A unique identifier for this event. |
| type | The event type. This value is always `response.audio.delta`. |
| response_id | The response ID. |
| item_id | The message item ID. |
| output_index | The index of the output item in the response. |
| content_index | The index of the content part within the output item. |
| delta | The Base64-encoded incremental audio data generated by the model. |
response.audio.done
When the output modality includes audio and the model finishes generating audio data, the server returns this event.
The server also returns this event if the response is interrupted, incomplete, or canceled.
| Field | Description |
| --- | --- |
| event_id | A unique identifier for this event. |
| type | The event type. This value is always `response.audio.done`. |
| response_id | The response ID. |
| item_id | The message item ID. |
| output_index | The index of the output item in the response. |
| content_index | The index of the content part within the output item. |
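Since each audio delta is Base64-encoded, the client decodes it before playback or storage. This is a minimal sketch that only collects decoded bytes per item; the audio format (encoding, sample rate) depends on the session configuration and is not assumed here. `AudioCollector` is hypothetical and the sample bytes are made up.

```python
import base64
from typing import Any, Dict, Set


class AudioCollector:
    """Hypothetical collector of decoded model audio, per message item."""

    def __init__(self) -> None:
        self.buffers: Dict[str, bytearray] = {}
        self.finished: Set[str] = set()

    def handle(self, event: Dict[str, Any]) -> None:
        if event["type"] == "response.audio.delta":
            # delta is Base64 text; decode it to raw audio bytes.
            chunk = base64.b64decode(event["delta"])
            self.buffers.setdefault(event["item_id"], bytearray()).extend(chunk)
        elif event["type"] == "response.audio.done":
            # Also sent for interrupted, incomplete, or canceled responses.
            self.finished.add(event["item_id"])


collector = AudioCollector()
base = {"response_id": "resp_1", "item_id": "item_3", "output_index": 0, "content_index": 0}
for raw in (b"\x01\x02", b"\x03"):
    collector.handle({**base, "event_id": "e", "type": "response.audio.delta",
                      "delta": base64.b64encode(raw).decode("ascii")})
collector.handle({**base, "event_id": "e_done", "type": "response.audio.done"})
```

In a real client, each decoded chunk would usually be fed to an audio output stream immediately rather than buffered until done.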
response.audio_transcript.delta
When the output modality includes audio and the model incrementally generates the transcript text for that audio, the server returns a response.audio_transcript.delta event.
| Field | Description |
| --- | --- |
| event_id | A unique identifier for this event. |
| type | The event type. This value is always `response.audio_transcript.delta`. |
| response_id | The response ID. |
| item_id | The message item ID. |
| output_index | The index of the output item in the response. |
| content_index | The index of the content part within the output item. |
| delta | The incremental text. |
response.audio_transcript.done
When the output modality includes audio and the model finishes generating the transcript for that audio, the server returns a response.audio_transcript.done event.
| Field | Description |
| --- | --- |
| event_id | A unique identifier for this event. |
| type | The event type. This value is always `response.audio_transcript.done`. |
| response_id | The response ID. |
| item_id | The message item ID. |
| output_index | The index of the output item in the response. |
| content_index | The index of the content part within the output item. |
| transcript | The full transcription text. |
response.function_call_arguments.delta
When the model streams the argument string for a function call, the server sends this event for each new segment. Concatenate the delta fields from these events in the order they are received to get the argument text generated so far. The complete arguments are provided in the subsequent response.function_call_arguments.done event.
| Field | Description |
| --- | --- |
| event_id | A unique identifier for this event. |
| type | The event type. This value is always `response.function_call_arguments.delta`. |
| response_id | The response ID. |
| item_id | The message item ID. |
| output_index | The index of the output item in the response. |
| call_id | The unique ID for this function invocation. This is consistent with the |
| delta | The new segment of the argument string. Concatenate these segments in order. |
response.function_call_arguments.done
Indicates that the function call arguments have been fully generated. The arguments field in this event contains the complete argument string. After you receive this event, you can parse the arguments and call the local tool function. You must use the complete arguments from this event, not the concatenated delta result.
| Field | Description |
| --- | --- |
| event_id | A unique identifier for this event. |
| type | The event type. This value is always `response.function_call_arguments.done`. |
| response_id | The response ID. |
| item_id | The message item ID. |
| output_index | The index of the output item in the response. |
| call_id | The unique ID for this function invocation. |
| name | The name of the function to call. |
| arguments | The complete arguments for the function invocation, typically a JSON string. |
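The flow described above (show streamed deltas as a preview, then run the local tool with the complete arguments from the done event) can be sketched as follows. This assumes arguments is a JSON object string, which the description says is typical but not guaranteed. `get_weather`, `TOOLS`, and `FunctionCallHandler` are hypothetical, and returning the tool result to the model uses a client event that this section does not cover.

```python
import json
from typing import Any, Callable, Dict, List


def get_weather(city: str) -> str:
    """Hypothetical local tool used only for this example."""
    return f"Sunny in {city}"


TOOLS: Dict[str, Callable[..., Any]] = {"get_weather": get_weather}


class FunctionCallHandler:
    """Hypothetical handler that previews streamed arguments and runs tools on completion."""

    def __init__(self, tools: Dict[str, Callable[..., Any]]) -> None:
        self.tools = tools
        self.partial_args: Dict[str, List[str]] = {}
        self.results: Dict[str, Any] = {}

    def handle(self, event: Dict[str, Any]) -> None:
        if event["type"] == "response.function_call_arguments.delta":
            # Streaming preview only; concatenate segments in arrival order.
            self.partial_args.setdefault(event["call_id"], []).append(event["delta"])
        elif event["type"] == "response.function_call_arguments.done":
            # Use the complete `arguments` string from this event, not the joined deltas.
            self.partial_args.pop(event["call_id"], None)
            args = json.loads(event["arguments"]) if event["arguments"] else {}
            tool = self.tools[event["name"]]
            self.results[event["call_id"]] = tool(**args)

    def preview(self, call_id: str) -> str:
        return "".join(self.partial_args.get(call_id, []))


handler = FunctionCallHandler(TOOLS)
base = {"response_id": "resp_1", "item_id": "item_5", "output_index": 0, "call_id": "call_1"}
for piece in ['{"city": ', '"Hangzhou"}']:
    handler.handle({**base, "event_id": "e", "type": "response.function_call_arguments.delta", "delta": piece})
streamed = handler.preview("call_1")
handler.handle({**base, "event_id": "e_done", "type": "response.function_call_arguments.done",
                "name": "get_weather", "arguments": '{"city": "Hangzhou"}'})
```

Looking up the tool by name in a fixed mapping, instead of evaluating arbitrary names, keeps the model from invoking functions the client did not register.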
response.output_item.added
The server returns this event when it creates a new item during response generation. The item type can be message or function_call.
| Field | Description |
| --- | --- |
| event_id | A unique identifier for this event. |
| type | The event type. This value is always `response.output_item.added`. |
| response_id | The response ID. |
| output_index | The index of the output item in the response. |
| item | Information about the output item. |
response.output_item.done
The server returns this event when the generation of a new output item is complete.
| Field | Description |
| --- | --- |
| event_id | A unique identifier for this event. |
| type | The event type. This value is always `response.output_item.done`. |
| response_id | The response ID. |
| output_index | The index of the output item in the response. |
| item | The output item information. |
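Because an output item can be either a message or a function_call, a client can sort finished items by kind. This is a minimal sketch; `OutputItems` is hypothetical, and it assumes the item's kind is exposed under a type key inside the item object, which this section does not spell out.

```python
from typing import Any, Dict, List, Tuple


class OutputItems:
    """Hypothetical sorter for output items, keyed by (response_id, output_index)."""

    def __init__(self) -> None:
        self.in_progress: Dict[Tuple[str, int], Dict[str, Any]] = {}
        self.messages: List[Dict[str, Any]] = []
        self.function_calls: List[Dict[str, Any]] = []

    def handle(self, event: Dict[str, Any]) -> None:
        key = (event["response_id"], event["output_index"])
        if event["type"] == "response.output_item.added":
            self.in_progress[key] = event["item"]
        elif event["type"] == "response.output_item.done":
            self.in_progress.pop(key, None)
            item = event["item"]
            # Assumption: the item's kind is in item["type"] ("message" or "function_call").
            if item.get("type") == "function_call":
                self.function_calls.append(item)
            else:
                self.messages.append(item)


items = OutputItems()
for event in [
    {"event_id": "e1", "type": "response.output_item.added", "response_id": "resp_1", "output_index": 0, "item": {"type": "message"}},
    {"event_id": "e2", "type": "response.output_item.done", "response_id": "resp_1", "output_index": 0, "item": {"type": "message"}},
    {"event_id": "e3", "type": "response.output_item.added", "response_id": "resp_2", "output_index": 0, "item": {"type": "function_call"}},
    {"event_id": "e4", "type": "response.output_item.done", "response_id": "resp_2", "output_index": 0, "item": {"type": "function_call"}},
]:
    items.handle(event)
```

Including response_id in the key avoids collisions, because output_index restarts for each response.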
response.content_part.added
The server returns this event when it adds a new content part to an assistant message item during response generation.
| Field | Description |
| --- | --- |
| event_id | A unique identifier for this event. |
| type | The event type. This value is always `response.content_part.added`. |
| response_id | The response ID. |
| item_id | The message item ID. |
| output_index | The index of the output item in the response. This value is always 0. |
| content_index | The index of the content part within the output item. This value is always 0. |
| part | The content part information. |
response.content_part.done
The server returns this event when the streaming of a content part within an assistant message item finishes.
| Field | Description |
| --- | --- |
| event_id | A unique identifier for this event. |
| type | The event type. This value is always `response.content_part.done`. |
| response_id | The response ID. |
| item_id | The message item ID. |
| output_index | The index of the output item in the response. This value is always 0. |
| content_index | The index of the content part in the content array. This value is always 0. |
| part | The content part information. |