Alibaba Cloud Model Studio: Server-Side Events

Last Updated: Mar 31, 2026

This topic describes the server-side events for the Qwen-Omni-Realtime API.

References: Real-time (Qwen-Omni-Realtime).

error

The server-side error message.

event_id string

A unique identifier for this event.

{
  "event_id": "event_RoUu4T8yExPMI37GKwaOC",
  "type": "error",
  "error": {
    "type": "invalid_request_error",
    "code": "invalid_value",
    "message": "Invalid modalities: ['audio']. Supported combinations are: ['text'] and ['audio', 'text'].",
    "param": "session.modalities"
  }
}

type string

The event type. This value is always error.

error object

Detailed error information.

Properties

type string

The error type.

code string

The error code.

message string

The error message.

param string

The parameter related to the error, such as session.modalities.
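A client usually reads the fields of the error object to decide what to log or retry. The following is a minimal sketch, assuming the event arrives as a JSON string; the helper name describe_error is hypothetical and not part of the API.

```python
import json


def describe_error(raw_event: str) -> str:
    """Return a one-line summary of an `error` event, or '' for other events."""
    event = json.loads(raw_event)
    if event.get("type") != "error":
        return ""
    err = event.get("error", {})
    summary = f"{err.get('type')}/{err.get('code')}: {err.get('message')}"
    # `param` names the offending parameter when the server provides one.
    if err.get("param"):
        summary += f" (param: {err['param']})"
    return summary


# Shortened version of the sample event above.
sample = json.dumps({
    "event_id": "event_RoUu4T8yExPMI37GKwaOC",
    "type": "error",
    "error": {
        "type": "invalid_request_error",
        "code": "invalid_value",
        "message": "Invalid modalities: ['audio'].",
        "param": "session.modalities",
    },
})
print(describe_error(sample))
```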

session.created

The server returns this event after a client connects. It contains the default configuration for the session.

event_id string

A unique identifier for this event.

{
    "event_id": "event_RdvlSpbBb2ssyBjYrDHjt",
    "type": "session.created",
    "session": {
        "object": "realtime.session",
        "model": "qwen3-omni-flash-realtime",
        "modalities": [
            "text",
            "audio"
        ],
        "voice": "Cherry",
        "input_audio_format": "pcm",
        "output_audio_format": "pcm",
        "input_audio_transcription": {
            "model": "gummy-realtime-v1"
        },
        "turn_detection": {
            "type": "server_vad",
            "threshold": 0.5,
            "prefix_padding_ms": 300,
            "silence_duration_ms": 800,
            "create_response": true,
            "interrupt_response": true
        },
        "enable_search": false,
        "search_options": {},
        "tools": [],
        "tool_choice": "auto",
        "temperature": 0.8,
        "id": "sess_Ov7GOXoNXhNjlxXtOGKQS"
    }
}

type string

The event type. This value is always session.created.

session object

The session configuration.

Properties

object string

This value is always realtime.session.

model string

The model used.

modalities array

The output modalities for the model.

voice string

Specifies the timbre of the audio generated by the model.

input_audio_format string

The input audio format, which is always pcm.

output_audio_format string

The output audio format, which is always pcm.

input_audio_transcription object

The transcription configuration.

Properties

model string

The transcription model. This value is always gummy-realtime-v1.

turn_detection object

The voice activity detection (VAD) configuration.

Properties

type string

The server-side VAD type. This value is always server_vad.

threshold float

The VAD detection threshold.

silence_duration_ms integer

The duration of silence, in milliseconds, after which the server considers speech to have stopped.

enable_search boolean

Whether to enable web search. This parameter is supported only by the Qwen3.5-Omni-Realtime model.

search_options object

The options for the web search.

temperature float

The temperature parameter for the model.
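Because session.created carries the default configuration, a client can record those defaults before it sends any update. The sketch below is one way to do that on an already-parsed event; SessionDefaults and read_session_created are hypothetical names, and the id field is read because it appears in the sample event even though it is not listed among the properties.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SessionDefaults:
    session_id: Optional[str]
    model: str
    modalities: List[str]
    voice: str
    silence_duration_ms: Optional[int]


def read_session_created(event: dict) -> SessionDefaults:
    """Extract a few default settings from a parsed session.created event."""
    if event.get("type") != "session.created":
        raise ValueError(f"unexpected event type: {event.get('type')!r}")
    session = event["session"]
    vad = session.get("turn_detection") or {}
    return SessionDefaults(
        session_id=session.get("id"),
        model=session["model"],
        modalities=list(session.get("modalities", [])),
        voice=session.get("voice", ""),
        silence_duration_ms=vad.get("silence_duration_ms"),
    )


# Trimmed copy of the sample event above.
sample_event = {
    "type": "session.created",
    "session": {
        "object": "realtime.session",
        "model": "qwen3-omni-flash-realtime",
        "modalities": ["text", "audio"],
        "voice": "Cherry",
        "turn_detection": {"type": "server_vad", "silence_duration_ms": 800},
        "id": "sess_Ov7GOXoNXhNjlxXtOGKQS",
    },
}
defaults = read_session_created(sample_event)
print(defaults)
```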

session.updated

You receive this event after you send a session.update request and the request succeeds. If the request fails, the server returns an error event.

event_id string

A unique identifier for this event.

{
    "event_id": "event_X1HsXS4b4uptp6yo1LgKd",
    "type": "session.updated",
    "session": {
        "id": "sess_Aih6vAcY5Ddt6jwFx1tCa",
        "object": "realtime.session",
        "model": "qwen3-omni-flash-realtime",
        "modalities": [
            "text",
            "audio"
        ],
        "instructions": "You are Xiao Yun, a personal assistant. Answer user questions accurately and politely. Always respond with a helpful attitude.",
        "voice": "Cherry",
        "input_audio_format": "pcm",
        "output_audio_format": "pcm",
        "input_audio_transcription": {
            "model": "gummy-realtime-v1"
        },
        "turn_detection": {
            "type": "server_vad",
            "threshold": 0.1,
            "prefix_padding_ms": 500,
            "silence_duration_ms": 900,
            "create_response": true,
            "interrupt_response": true
        },
        "enable_search": true,
        "search_options": {
            "enable_source": true
        },
        "temperature": 0.8,
        "max_response_output_token": "inf",
        "max_tokens": 16384,
        "repetition_penalty": 1.05,
        "presence_penalty": 0.0,
        "top_k": 50,
        "top_p": 1.0,
        "seed": -1
    }
}

type string

The event type. This value is always session.updated.

session object

The session configuration.

Properties

temperature float

The temperature parameter for the model.

modalities array

The output modalities for the model.

voice string

The timbre of the audio generated by the model.

instructions string

The model's goal and role.

input_audio_format string

The input audio format. Only pcm is supported.

output_audio_format string

The output audio format. Only pcm is supported.

input_audio_transcription object

The transcription configuration.

Properties

model string

The transcription model. This value is always gummy-realtime-v1.

turn_detection object

The voice activity detection (VAD) configuration.

Properties

type string

The server-side VAD type. This value is always server_vad.

threshold float

The VAD detection threshold.

silence_duration_ms integer

The duration of silence, in milliseconds, that triggers speech stop detection.

enable_search boolean (optional)

Whether to enable web search. This parameter is supported only by the Qwen3.5-Omni-Realtime model.

search_options object (optional)

The web search options.

top_p float

The probability threshold for nucleus sampling.

top_k integer

The size of the candidate set used during model generation.

max_tokens integer

The maximum number of tokens the model returns in this request.

repetition_penalty float

Controls repetition in consecutive sequences during generation.

presence_penalty float

Controls repetition in generated content.

seed integer

The random seed, which controls how consistent results are across requests.
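After sending session.update, a client can wait for either session.updated or error to learn the outcome. A minimal sketch of that check follows; session_update_outcome is a hypothetical helper, and it assumes that any error event received while waiting relates to the pending update, which a real client may need to verify.

```python
from typing import Optional


def session_update_outcome(event: dict) -> Optional[str]:
    """Classify an incoming event while a session.update request is pending.

    Returns "ok" for session.updated, "failed" for error, and None for any
    other event (keep waiting).
    """
    kind = event.get("type")
    if kind == "session.updated":
        return "ok"
    if kind == "error":
        # Assumption: the error belongs to the pending session.update request.
        return "failed"
    return None


print(session_update_outcome({"type": "session.updated", "session": {}}))
print(session_update_outcome({"type": "error", "error": {"code": "invalid_value"}}))
print(session_update_outcome({"type": "response.created"}))
```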

input_audio_buffer.speech_started

In VAD mode, the server returns this event when it detects the start of speech in the audio buffer.

If the server has not yet detected speech, this event may trigger each time you add audio to the buffer.

event_id string

A unique identifier for this event.

{
    "event_id": "event_Pvp8nEhsQuGCQbFJ9x58n",
    "type": "input_audio_buffer.speech_started",
    "audio_start_ms": 3647,
    "item_id": "item_YbAiGvK2H7YaS34o4R6Ba"
}

type string

The event type. This value is always input_audio_buffer.speech_started.

audio_start_ms integer

The number of milliseconds from when audio starts writing to the buffer until speech is first detected.

item_id string

The ID of the user message item that will be created when speech stops.

User message items append user input to the conversation history for later model inference and generation.

input_audio_buffer.speech_stopped

In VAD mode, the server returns this event when it detects the end of speech in the audio buffer.

The server also returns a conversation.item.created event that creates the corresponding user message item.

event_id string

A unique identifier for this event.

{
    "event_id": "event_UhQiqNVRsgUiq4KUS5Xb5",
    "type": "input_audio_buffer.speech_stopped",
    "audio_end_ms": 4453,
    "item_id": "item_YbAiGvK2H7YaS34o4R6Ba"
}

type string

The event type. This value is always input_audio_buffer.speech_stopped.

audio_end_ms integer

The number of milliseconds from session start until speech stops.

item_id string

The ID of the user message item that will be created.
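The speech_started and speech_stopped events share an item_id, so a client can pair them to estimate how long the user spoke. The sketch below assumes both offsets are measured on the same timeline, which the field descriptions do not state explicitly; SpeechTimer is a hypothetical helper.

```python
from typing import Dict, Optional


class SpeechTimer:
    """Pair speech_started/speech_stopped events by item_id."""

    def __init__(self) -> None:
        self._starts: Dict[str, int] = {}

    def on_event(self, event: dict) -> Optional[int]:
        """Return the estimated speech length in ms when a segment ends."""
        kind = event.get("type")
        if kind == "input_audio_buffer.speech_started":
            self._starts[event["item_id"]] = event["audio_start_ms"]
            return None
        if kind == "input_audio_buffer.speech_stopped":
            start = self._starts.pop(event["item_id"], None)
            if start is None:
                return None  # No matching start was seen.
            return event["audio_end_ms"] - start
        return None


timer = SpeechTimer()
timer.on_event({
    "type": "input_audio_buffer.speech_started",
    "audio_start_ms": 3647,
    "item_id": "item_YbAiGvK2H7YaS34o4R6Ba",
})
duration_ms = timer.on_event({
    "type": "input_audio_buffer.speech_stopped",
    "audio_end_ms": 4453,
    "item_id": "item_YbAiGvK2H7YaS34o4R6Ba",
})
print(duration_ms)  # 806 with the sample values above
```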

input_audio_buffer.committed

The server returns this event when the input audio buffer is committed.

  • In VAD mode, the server automatically commits the audio buffer and returns this event when it detects the end of speech.

  • In manual mode, the server returns this event after the client sends an input_audio_buffer.commit event.

event_id string

A unique identifier for this event.

{
    "event_id": "event_Iy6sUzL1nmdFgshFYxJEz",
    "type": "input_audio_buffer.committed",
    "item_id": "item_YbAiGvK2H7YaS34o4R6Ba"
}

type string

The event type. This value is always input_audio_buffer.committed.

item_id string

The ID of the user message item that will be created.

input_audio_buffer.cleared

After the client sends an input_audio_buffer.clear event, the server returns this event.

event_id string

A unique identifier for this event.

{
  "event_id": "event_RoUu4T8yExPMI37GKwaOC",
  "type": "input_audio_buffer.cleared"
}

type string

The event type. This value is always input_audio_buffer.cleared.

conversation.item.created

The server returns this event when a conversation item is created.

event_id string

A unique identifier for this event.

{
    "event_id": "event_JEfkrr9gO3Ny7Xcv9bGVd",
    "type": "conversation.item.created",
    "item": {
        "id": "item_YbAiGvK2H7YaS34o4R6Ba",
        "object": "realtime.item",
        "type": "message",
        "status": "in_progress",
        "role": "assistant",
        "content": [
            {
                "type": "input_audio"
            }
        ]
    }
}

type string

The event type. This value is always conversation.item.created.

item object

The conversation item that was created.

Properties

id string

The unique ID of the conversation item.

object string

This value is always realtime.item.

status string

The status of the conversation item.

role string

The role of the message.

content array

The message content.

conversation.item.input_audio_transcription.completed

This event indicates that the user’s audio has been transcribed after being written to the buffer. The transcription is performed by a dedicated speech recognition model (currently fixed to gummy-realtime-v1).

The transcribed text from the speech recognition model may differ from the interpretation generated by the Qwen-Omni-Realtime model. Use the transcription for reference only.

event_id string

A unique identifier for this event.

{
    "event_id": "event_FrrZcxiDfTB9LD9p4pVng",
    "type": "conversation.item.input_audio_transcription.completed",
    "item_id": "item_YbAiGvK2H7YaS34o4R6Ba",
    "content_index": 0,
    "transcript": "Hello."
}

type string

The event type. This value is always conversation.item.input_audio_transcription.completed.

item_id string

The ID of the user message item.

content_index integer

This value is always 0.

transcript string

The transcribed text.

conversation.item.input_audio_transcription.failed

The server returns this event when input audio transcription is enabled and fails. This event is independent of the error event to help clients identify transcription failures.

event_id string

A unique identifier for this event.

{
  "type": "conversation.item.input_audio_transcription.failed",
  "item_id": "<item_id>",
  "content_index": 0,
  "error": {
    "code": "<code>",
    "message": "<message>",
    "param": "<param>"
  }
}

type string

The event type. This value is always conversation.item.input_audio_transcription.failed.

item_id string

The ID of the user message item.

content_index integer

This value is always 0.

error object

The error information.

Properties

code string

The error code.

message string

The error message.

param string

The parameter related to the error.
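Because transcription failures arrive as their own event rather than as error, a client can handle the completed and failed variants side by side. This is a minimal sketch on parsed events; record_transcription and the two dictionaries are hypothetical names.

```python
from typing import Dict


def record_transcription(event: dict,
                         transcripts: Dict[str, str],
                         failures: Dict[str, str]) -> None:
    """Store the transcript or the failure reason under the item's ID."""
    kind = event.get("type")
    if kind == "conversation.item.input_audio_transcription.completed":
        transcripts[event["item_id"]] = event["transcript"]
    elif kind == "conversation.item.input_audio_transcription.failed":
        err = event.get("error", {})
        failures[event["item_id"]] = f"{err.get('code')}: {err.get('message')}"


transcripts: Dict[str, str] = {}
failures: Dict[str, str] = {}
record_transcription({
    "type": "conversation.item.input_audio_transcription.completed",
    "item_id": "item_YbAiGvK2H7YaS34o4R6Ba",
    "content_index": 0,
    "transcript": "Hello.",
}, transcripts, failures)
record_transcription({
    "type": "conversation.item.input_audio_transcription.failed",
    "item_id": "item_example",  # placeholder ID for illustration
    "content_index": 0,
    "error": {"code": "example_code", "message": "example message"},
}, transcripts, failures)
print(transcripts, failures)
```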

response.created

The server returns this event when it generates a new model response.

event_id string

A unique identifier for this event.

{
    "event_id": "event_XuDavMzQN3KKepqGu3KRh",
    "type": "response.created",
    "response": {
        "id": "resp_HaVOPdbmX6vifiV5pAfJY",
        "object": "realtime.response",
        "conversation_id": "conv_FjJaccpnvwHNo9cPVuzGc",
        "status": "in_progress",
        "modalities": [
            "text",
            "audio"
        ],
        "voice": "Cherry",
        "output_audio_format": "pcm",
        "output": []
    }
}

type string

The event type. This value is always response.created.

response object

The response object.

Properties

id string

The unique ID of the response.

conversation_id string

The unique ID of the current session.

object string

The object type. For this event, this value is always realtime.response.

status string

The response status. Valid values are completed, failed, in_progress, or incomplete.

modalities array

The response modalities.

voice string

The timbre of the audio generated by the model.

output array

The output items. This array is empty for this event.

response.done

The server returns this event after the response finishes generating. The response object includes all output items except raw audio data.

event_id string

A unique identifier for this event.

{
    "event_id": "event_CSaxRRYLvbrfexDXAEuDG",
    "type": "response.done",
    "response": {
        "id": "resp_HaVOPdbmX6vifiV5pAfJY",
        "object": "realtime.response",
        "conversation_id": "conv_FjJaccpnvwHNo9cPVuzGc",
        "status": "completed",
        "modalities": [
            "text",
            "audio"
        ],
        "voice": "Cherry",
        "output_audio_format": "pcm",
        "output": [
            {
                "id": "item_Ls6MtCUWO7LM4E59QziNv",
                "object": "realtime.item",
                "type": "message",
                "status": "completed",
                "role": "assistant",
                "content": [
                    {
                        "type": "audio",
                        "transcript": "Hello! How can I help you?"
                    }
                ]
            }
        ],
        "usage": {
            "total_tokens": 377,
            "input_tokens": 336,
            "output_tokens": 41,
            "input_tokens_details": {
                "text_tokens": 228,
                "audio_tokens": 108
            },
            "output_tokens_details": {
                "text_tokens": 9,
                "audio_tokens": 32
            },
            "plugins": {
                "search": {
                    "count": 1,
                    "strategy": "agent"
                }
            }
        }
    }
}

type string

The event type. This value is always response.done.

response object

The response object.

Properties

id string

The unique ID of the response.

conversation_id string

The unique ID of the current session.

object string

The object type. For this event, this value is always realtime.response.

status string

The response status.

modalities array

The response modalities.

voice string

The audio voice used for the model's output.

output array

The response output.

Properties

id string

The ID of the response output.

type string

The output item type. This value is always message.

object string

The output item object type. This value is always realtime.item.

status string

The output item status.

role string

The output item role.

content array

The output item content.

Properties

type string

The content type. The value is text for plain text output and audio for audio output.

text string

The text output.

transcript string

The text transcript of the audio.

usage object

Token usage details for this response.

Properties

total_tokens integer

The total number of tokens used in this response.

input_tokens integer

The number of input tokens.

output_tokens integer

The number of output tokens.

input_tokens_details object

Details about input token usage, including text_tokens (text tokens) and audio_tokens (audio tokens).

output_tokens_details object

Details about output token usage, including text_tokens (text tokens) and audio_tokens (audio tokens).

plugins object (optional)

Plugin usage metrics. Returned when web search (enable_search) is enabled.

Properties

search object

Metering information for web search.

Properties

count integer

The number of searches.

strategy string

The search strategy.
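The usage object in response.done can be reduced to a few counters for logging or cost tracking. The following sketch reads only fields described above and treats plugins as optional; summarize_usage is a hypothetical helper.

```python
def summarize_usage(event: dict) -> dict:
    """Collect token and search counters from a parsed response.done event."""
    if event.get("type") != "response.done":
        raise ValueError("expected a response.done event")
    usage = event["response"].get("usage", {})
    out_details = usage.get("output_tokens_details", {})
    # `plugins` is present only when web search is enabled.
    search = usage.get("plugins", {}).get("search", {})
    return {
        "input_tokens": usage.get("input_tokens", 0),
        "output_tokens": usage.get("output_tokens", 0),
        "total_tokens": usage.get("total_tokens", 0),
        "output_audio_tokens": out_details.get("audio_tokens", 0),
        "search_count": search.get("count", 0),
    }


# Usage portion of the sample event above.
done_event = {
    "type": "response.done",
    "response": {
        "id": "resp_HaVOPdbmX6vifiV5pAfJY",
        "status": "completed",
        "usage": {
            "total_tokens": 377,
            "input_tokens": 336,
            "output_tokens": 41,
            "input_tokens_details": {"text_tokens": 228, "audio_tokens": 108},
            "output_tokens_details": {"text_tokens": 9, "audio_tokens": 32},
            "plugins": {"search": {"count": 1, "strategy": "agent"}},
        },
    },
}
summary = summarize_usage(done_event)
print(summary)
```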

response.text.delta

The server returns this event when the output modality is text only and the model generates new text incrementally.

event_id string

A unique identifier for this event.

{
    "delta": "Hello",
    "event_id": "event_TH49MauuPmRo1RGaMSlP7",
    "type": "response.text.delta",
    "response_id": "resp_PrRSvPVpnCExdUOGHHLuP",
    "item_id": "item_L8IRm9kRXFpxoOjDqDC96",
    "output_index": 0,
    "content_index": 0
}

type string

The event type. This value is always response.text.delta.

delta string

The incremental text returned.

response_id string

The response ID.

item_id string

The message item ID. You can use this to reference the same message item.

output_index integer

The index of the output item in the response. This value is always 0.

content_index integer

The index of the internal part within the output item. This value is always 0.

response.text.done

When the output modality is text only and the model finishes generating text, the server returns this event.

The server also returns this event if the response is interrupted, incomplete, or canceled.

event_id string

A unique identifier for this event.

{
  "event_id": "event_B1lIeE2Nac33zn5V7h2mm",
  "type": "response.text.done",
  "response_id": "resp_B1lIdtjF4Noqpn5NOjznj",
  "item_id": "item_B1lIdJsAJlJiFs8ztWpJt",
  "output_index": 0,
  "content_index": 0,
  "text": "How can I assist you today?"
}

type string

The event type. This value is always response.text.done.

response_id string

The response ID.

item_id string

The message item ID.

output_index integer

The index of the output item in the response.

content_index integer

The index of the content item in the response.

text string

The full text output by the model.
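A client can show text as it streams by joining the delta values per item_id, then switch to the text field once response.text.done arrives. The class below is a minimal sketch of that pattern; TextAssembler is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List


class TextAssembler:
    """Accumulate response.text.delta events until response.text.done."""

    def __init__(self) -> None:
        self._parts: Dict[str, List[str]] = defaultdict(list)
        self.finished: Dict[str, str] = {}

    def on_event(self, event: dict) -> None:
        kind = event.get("type")
        if kind == "response.text.delta":
            self._parts[event["item_id"]].append(event["delta"])
        elif kind == "response.text.done":
            # The done event carries the full text, so drop the partial pieces.
            self._parts.pop(event["item_id"], None)
            self.finished[event["item_id"]] = event["text"]

    def partial(self, item_id: str) -> str:
        """Text received so far for an item that is still streaming."""
        return "".join(self._parts.get(item_id, []))


assembler = TextAssembler()
for piece in ["How can I ", "assist you today?"]:
    assembler.on_event({
        "type": "response.text.delta",
        "item_id": "item_B1lIdJsAJlJiFs8ztWpJt",
        "delta": piece,
    })
streamed = assembler.partial("item_B1lIdJsAJlJiFs8ztWpJt")
assembler.on_event({
    "type": "response.text.done",
    "item_id": "item_B1lIdJsAJlJiFs8ztWpJt",
    "text": "How can I assist you today?",
})
print(streamed)
print(assembler.finished)
```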

response.audio.delta

When the output modality includes audio and the model generates new audio data incrementally, the server returns this event.

event_id string

A unique identifier for this event.

{
  "event_id": "event_B1osWMZBtrEQbiIwW0qHQ",
  "type": "response.audio.delta",
  "response_id": "resp_P79OOMs8LnrXVpiIHUCKR",
  "item_id": "item_OFaPGtzfWCPyGzxnuEX9i",
  "output_index": 0,
  "content_index": 0,
  "delta": "{base64 audio}"
}

type string

The event type. This value is always response.audio.delta.

response_id string

The response ID.

item_id string

The message item ID.

output_index integer

The index of the output item in the response.

content_index integer

The index of the content part within the output item.

delta string

The audio data output by the model, encoded in Base64.
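Since delta is Base64-encoded, a client decodes each chunk before buffering or playing it. The sketch below only decodes and appends bytes; it makes no assumption about sample rate or bit depth, which this topic does not specify, and append_audio_delta is a hypothetical helper.

```python
import base64


def append_audio_delta(event: dict, buffer: bytearray) -> int:
    """Decode one response.audio.delta event into `buffer`.

    Returns the number of raw bytes appended (0 for other event types).
    """
    if event.get("type") != "response.audio.delta":
        return 0
    chunk = base64.b64decode(event["delta"])
    buffer.extend(chunk)
    return len(chunk)


# Illustrative payload: four arbitrary bytes standing in for PCM audio.
fake_pcm = bytes([0, 1, 2, 3])
audio = bytearray()
appended = append_audio_delta({
    "type": "response.audio.delta",
    "response_id": "resp_P79OOMs8LnrXVpiIHUCKR",
    "item_id": "item_OFaPGtzfWCPyGzxnuEX9i",
    "output_index": 0,
    "content_index": 0,
    "delta": base64.b64encode(fake_pcm).decode("ascii"),
}, audio)
print(appended, bytes(audio))
```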

response.audio.done

When the output modality includes audio and the model finishes generating audio data, the server returns this event.

The server also returns this event if the response is interrupted, incomplete, or canceled.

event_id string

A unique identifier for this event.

{
    "event_id": "event_Le1TDl7VfyHQxl47DtGxI",
    "type": "response.audio.done",
    "response_id": "resp_HaVOPdbmX6vifiV5pAfJY",
    "item_id": "item_Ls6MtCUWO7LM4E59QziNv",
    "output_index": 0,
    "content_index": 0
}

type string

The event type. This value is always response.audio.done.

response_id string

The response ID.

item_id string

The message item ID.

output_index integer

The index of the output item in the response.

content_index integer

The index of the content part within the output item.

response.audio_transcript.delta

The server returns a response.audio_transcript.delta event when the output modality includes audio and the model generates new text for the audio incrementally.

event_id string

A unique identifier for this event.

{
    "event_id": "event_BksW7fOwnyavZdDxIzZYM",
    "type": "response.audio_transcript.delta",
    "response_id": "resp_HaVOPdbmX6vifiV5pAfJY",
    "item_id": "item_Ls6MtCUWO7LM4E59QziNv",
    "output_index": 0,
    "content_index": 0,
    "delta": "What"
}

type string

The event type. This value is always response.audio_transcript.delta.

response_id string

The response ID.

item_id string

The message item ID.

output_index integer

The index of the output item in the response.

content_index integer

The index of the content item in the response.

delta string

The incremental text.

response.audio_transcript.done

The server returns a response.audio_transcript.done event when the output modality includes audio and the model completes transcribing the audio.

event_id string

A unique identifier for this event.

{
    "event_id": "event_X49tL2WerT4WjxcmH16lS",
    "type": "response.audio_transcript.done",
    "response_id": "resp_HaVOPdbmX6vifiV5pAfJY",
    "item_id": "item_Ls6MtCUWO7LM4E59QziNv",
    "output_index": 0,
    "content_index": 0,
    "transcript": "Hello! How can I help you?"
}

type string

The event type. This value is always response.audio_transcript.done.

response_id string

The response ID.

item_id string

The message item ID.

output_index integer

The index of the output item in the response.

content_index integer

The index of the content item in the response.

transcript string

The full transcription text.

response.output_item.added

The server returns this event when it creates a new output item during response generation.

event_id string

A unique identifier for this event.

{
    "event_id": "event_DsCO341DEVtiATtCB6BUY",
    "type": "response.output_item.added",
    "response_id": "resp_HaVOPdbmX6vifiV5pAfJY",
    "output_index": 0,
    "item": {
        "id": "item_Ls6MtCUWO7LM4E59QziNv",
        "object": "realtime.item",
        "type": "message",
        "status": "in_progress",
        "role": "assistant",
        "content": []
    }
}

type string

The event type. This value is always response.output_item.added.

response_id string

The ID of the response.

output_index integer

The index of the output item in the response.

item object

Information about the output item.

Properties

id string

The unique ID of the output item.

object string

This value is always realtime.item.

status string

The status of the output item.

role string

The role of the sender.

content array

The message content.

response.output_item.done

The server returns this event when a new output item is complete.

event_id string

A unique identifier for this event.

{
    "event_id": "event_MEu5nlLw1LsOguHiehIP8",
    "type": "response.output_item.done",
    "response_id": "resp_HaVOPdbmX6vifiV5pAfJY",
    "output_index": 0,
    "item": {
        "id": "item_Ls6MtCUWO7LM4E59QziNv",
        "object": "realtime.item",
        "type": "message",
        "status": "completed",
        "role": "assistant",
        "content": [
            {
                "type": "audio",
                "text": "Hello! How can I help you?"
            }
        ]
    }
}

type string

The event type. This value is always response.output_item.done.

response_id string

The response ID.

output_index integer

The index of the output item in the response.

item object

The output item information.

Properties

id string

The unique ID of the output item.

object string

This value is always realtime.item.

status string

The status of the output item.

role string

The role of the sender.

content array

The message content.

response.content_part.added

The server returns this event when it adds a new content part to an assistant message item during response generation.

event_id string

A unique identifier for this event.

{
    "event_id": "event_AVBOmrgY3C8bjlRajfSUT",
    "type": "response.content_part.added",
    "response_id": "resp_HaVOPdbmX6vifiV5pAfJY",
    "item_id": "item_Ls6MtCUWO7LM4E59QziNv",
    "output_index": 0,
    "content_index": 0,
    "part": {
        "type": "audio",
        "text": ""
    }
}

type string

The event type. This value is always response.content_part.added.

response_id string

The response ID.

item_id string

The message item ID.

output_index integer

The index of the output item in the response. This value is always 0.

content_index integer

The index of the internal part within the output item. This value is always 0.

part object

The content part information.

Properties

type string

The type of the content part.

text string

The text of the content part.

response.content_part.done

The server returns this event when the streaming of a content part within an assistant message item finishes.

event_id string

A unique identifier for this event.

{
    "event_id": "event_Il8HD19v58Qr5IBkw7LtN",
    "type": "response.content_part.done",
    "response_id": "resp_HaVOPdbmX6vifiV5pAfJY",
    "item_id": "item_Ls6MtCUWO7LM4E59QziNv",
    "output_index": 0,
    "content_index": 0,
    "part": {
        "type": "audio",
        "text": "Hello! How can I help you?"
    }
}

type string

The event type. This value is always response.content_part.done.

response_id string

The response ID.

item_id string

The message item ID.

output_index integer

The index of the output item in the response. This value is always 0.

content_index integer

The index of the content part in the content array. This value is always 0.

part object

The content part information.

Properties

type string

The type of the content part.

text string

The text of the content part.
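Every server-side event in this topic carries a type field, so a client can route incoming messages with a small dispatch table. The following is one possible sketch, not a prescribed client design; EventRouter and its methods are hypothetical names, and events are assumed to arrive as JSON strings.

```python
import json
from typing import Callable, Dict, List, Optional

Handler = Callable[[dict], None]


class EventRouter:
    """Route server-side events to handlers keyed by the `type` field."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}
        self.unhandled: List[Optional[str]] = []

    def on(self, event_type: str) -> Callable[[Handler], Handler]:
        """Decorator that registers a handler for one event type."""
        def register(func: Handler) -> Handler:
            self._handlers[event_type] = func
            return func
        return register

    def dispatch(self, raw_event: str) -> bool:
        """Parse one message and call its handler; return False if none exists."""
        event = json.loads(raw_event)
        handler = self._handlers.get(event.get("type"))
        if handler is None:
            self.unhandled.append(event.get("type"))
            return False
        handler(event)
        return True


router = EventRouter()
seen: List[str] = []


@router.on("response.audio_transcript.done")
def on_transcript_done(event: dict) -> None:
    seen.append(event["transcript"])


handled = router.dispatch(json.dumps({
    "type": "response.audio_transcript.done",
    "transcript": "Hello! How can I help you?",
}))
skipped = router.dispatch(json.dumps({"type": "input_audio_buffer.cleared"}))
print(handled, skipped, seen, router.unhandled)
```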