Alibaba Cloud Model Studio: Server events

Last Updated: Mar 15, 2026

This topic describes the server events for the Qwen-Omni-Realtime API.

Reference: Real-time (Qwen-Omni-Realtime).
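
Every server event described in this topic is a JSON object with a type field, so a client usually routes incoming messages by that field. The following Python sketch shows one possible way to do this. The names on, handlers, seen, and dispatch are hypothetical helpers introduced for illustration and are not part of the API; the sketch also assumes each message arrives as one JSON text.

```python
import json
from typing import Callable, Dict, List

# Hypothetical registry: maps a server event "type" to a handler function.
Handler = Callable[[dict], None]
handlers: Dict[str, Handler] = {}
seen: List[str] = []  # Collects short notes so the routing is observable.


def on(event_type: str) -> Callable[[Handler], Handler]:
    """Register a handler for one server event type."""
    def register(func: Handler) -> Handler:
        handlers[event_type] = func
        return func
    return register


@on("session.created")
def handle_session_created(event: dict) -> None:
    seen.append("session " + event["session"]["id"])


@on("error")
def handle_error(event: dict) -> None:
    seen.append("error " + event["error"]["code"])


def dispatch(raw_message: str) -> bool:
    """Parse one JSON message and route it by its "type" field.

    Returns True if a handler was registered for the type, False otherwise.
    """
    event = json.loads(raw_message)
    handler = handlers.get(event.get("type"))
    if handler is None:
        return False
    handler(event)
    return True
```

Returning False for unregistered types, instead of raising, lets a client ignore events it does not care about.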

error

Error message from server.

event_id string

Unique event identifier.

{
  "event_id": "event_RoUu4T8yExPMI37GKwaOC",
  "type": "error",
  "error": {
    "type": "invalid_request_error",
    "code": "invalid_value",
    "message": "Invalid modalities: ['audio']. Supported combinations are: ['text'] and ['audio', 'text'].",
    "param": "session.modalities"
  }
}

type string

Always error.

error object

Error details.

Properties

type string

The error type.

code string

The error code.

message string

The error message.

param string

The parameter related to the error, such as session.modalities.
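
As an illustration of how the fields above fit together, the following Python sketch turns an error event into a single log line. The function name describe_error and the output format are assumptions made for this example, not something the API defines.

```python
def describe_error(event: dict) -> str:
    """Build a one-line summary from an "error" server event."""
    if event.get("type") != "error":
        raise ValueError("not an error event")
    err = event.get("error", {})
    text = (
        f'{err.get("type", "unknown")}/{err.get("code", "unknown")}: '
        f'{err.get("message", "")}'
    )
    # "param" names the offending parameter, such as session.modalities.
    param = err.get("param")
    if param:
        text += f" (param: {param})"
    return text
```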

session.created

First event returned after client connection, containing default session configuration.

event_id string

Unique event identifier.

{
    "event_id": "event_RdvlSpbBb2ssyBjYrDHjt",
    "type": "session.created",
    "session": {
        "object": "realtime.session",
        "model": "qwen3-omni-flash-realtime",
        "modalities": [
            "text",
            "audio"
        ],
        "voice": "Cherry",
        "input_audio_format": "pcm",
        "output_audio_format": "pcm",
        "input_audio_transcription": {
            "model": "gummy-realtime-v1"
        },
        "turn_detection": {
            "type": "server_vad",
            "threshold": 0.5,
            "prefix_padding_ms": 300,
            "silence_duration_ms": 800,
            "create_response": true,
            "interrupt_response": true
        },
        "tools": [],
        "tool_choice": "auto",
        "temperature": 0.8,
        "id": "sess_Ov7GOXoNXhNjlxXtOGKQS"
    }
}

type string

Always session.created.

session object

Session configuration.

Properties

object string

Always realtime.session.

model string

Model name.

modalities array

Model output modalities.

voice string

The timbre of the audio generated by the model.

input_audio_format string

The input audio format. Always pcm.

output_audio_format string

The output audio format. Always pcm.

input_audio_transcription object

Speech transcription configuration.

Properties

model string

The speech transcription model. Always gummy-realtime-v1.

turn_detection object

VAD configuration.

Properties

type string

The server VAD type. Always server_vad.

threshold float

The VAD detection threshold.

silence_duration_ms integer

Duration of silence to detect speech end.

temperature float

Model temperature.

session.updated

Returned after a successful session.update request. On error, an error event is returned instead.

event_id string

Unique event identifier.

{
    "event_id": "event_X1HsXS4b4uptp6yo1LgKd",
    "type": "session.updated",
    "session": {
        "id": "sess_Aih6vAcY5Ddt6jwFx1tCa",
        "object": "realtime.session",
        "model": "qwen3-omni-flash-realtime",
        "modalities": [
            "text",
            "audio"
        ],
        "instructions": "You are a personal assistant named Xiaoyun. Please answer user questions accurately and in a friendly manner, always responding with a helpful attitude.",
        "voice": "Cherry",
        "input_audio_format": "pcm",
        "output_audio_format": "pcm",
        "input_audio_transcription": {
            "model": "gummy-realtime-v1"
        },
        "turn_detection": {
            "type": "server_vad",
            "threshold": 0.1,
            "prefix_padding_ms": 500,
            "silence_duration_ms": 900,
            "create_response": true,
            "interrupt_response": true
        },
        "temperature": 0.8,
        "max_response_output_token": "inf",
        "max_tokens": 16384,
        "repetition_penalty": 1.05,
        "presence_penalty": 0.0,
        "top_k": 50,
        "top_p": 1.0,
        "seed": -1
    }
}

type string

Always session.updated.

session object

Session configuration.

Properties

temperature float

Model temperature.

modalities array

Model output modalities.

voice string

The timbre of the audio generated by the model.

instructions string

Model role and objective.

input_audio_format string

The input audio format. Always pcm.

output_audio_format string

The output audio format. Always pcm.

input_audio_transcription object

The configuration for speech transcription.

Properties

model string

The speech transcription model. Always gummy-realtime-v1.

turn_detection object

VAD configuration.

Properties

type string

The server VAD type. Always server_vad.

threshold float

The VAD detection threshold.

silence_duration_ms integer

The duration of silence to detect the end of speech.

top_p float

The probability threshold for nucleus sampling.

top_k integer

Candidate set size for sampling.

max_tokens integer

Maximum tokens per request.

repetition_penalty float

Controls repetition in consecutive token sequences during model generation.

presence_penalty float

Controls content repetition.

seed integer

Result consistency across requests.
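
Because session.created and session.updated both carry a session object, a client can compare the two to confirm which settings a session.update request actually changed. The following Python sketch shows one way to do that. The helper name changed_fields and the dotted-path output format are assumptions made for this example.

```python
def changed_fields(before: dict, after: dict, prefix: str = "") -> dict:
    """Return {dotted.path: (old, new)} for every value that differs.

    Nested objects such as turn_detection are compared key by key.
    A key missing on one side is reported with None for that side.
    """
    changes = {}
    for key in sorted(set(before) | set(after)):
        path = f"{prefix}{key}"
        old, new = before.get(key), after.get(key)
        if isinstance(old, dict) and isinstance(new, dict):
            changes.update(changed_fields(old, new, prefix=path + "."))
        elif old != new:
            changes[path] = (old, new)
    return changes
```

Applied to the two example events in this topic, the result would include entries such as turn_detection.threshold changing from 0.5 to 0.1.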

input_audio_buffer.speech_started

Returned when VAD detects speech start in audio buffer.

It can be triggered any time audio is added to the buffer, unless speech has already been detected.

event_id string

Unique event identifier.

{
    "event_id": "event_Pvp8nEhsQuGCQbFJ9x58n",
    "type": "input_audio_buffer.speech_started",
    "audio_start_ms": 3647,
    "item_id": "item_YbAiGvK2H7YaS34o4R6Ba"
}

type string

Always input_audio_buffer.speech_started.

audio_start_ms integer

Milliseconds from when audio writing begins until speech is first detected.

item_id string

User message item ID (created on speech stop).

User message items append input to conversation history for subsequent model inference and generation.

input_audio_buffer.speech_stopped

Returned when VAD detects speech end in audio buffer.

A conversation.item.created event is also returned to create the user message item.

event_id string

Unique event identifier.

{
    "event_id": "event_UhQiqNVRsgUiq4KUS5Xb5",
    "type": "input_audio_buffer.speech_stopped",
    "audio_end_ms": 4453,
    "item_id": "item_YbAiGvK2H7YaS34o4R6Ba"
}

type string

Always input_audio_buffer.speech_stopped.

audio_end_ms integer

Milliseconds from session start to speech stop.

item_id string

User message item ID (to be created).
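
The speech_started and speech_stopped events share an item_id, so a client can pair them to estimate how long the user spoke. The following Python sketch assumes that audio_start_ms and audio_end_ms are measured from the same origin, which this topic does not state explicitly. The class name SpeechTimer is hypothetical.

```python
from typing import Dict, Optional


class SpeechTimer:
    """Pair speech_started/speech_stopped events by item_id."""

    def __init__(self) -> None:
        self._starts: Dict[str, int] = {}

    def feed(self, event: dict) -> Optional[int]:
        """Return the speech duration in ms on speech_stopped, else None."""
        kind = event.get("type")
        if kind == "input_audio_buffer.speech_started":
            self._starts[event["item_id"]] = event["audio_start_ms"]
        elif kind == "input_audio_buffer.speech_stopped":
            start = self._starts.pop(event["item_id"], None)
            if start is not None:
                # Assumption: both offsets use the same time origin.
                return event["audio_end_ms"] - start
        return None
```

With the example values in this topic (3647 and 4453), the computed duration would be 806 ms under that assumption.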

input_audio_buffer.committed

Returned when audio buffer is committed.

  • In VAD mode, returned automatically when speech ends.

  • In Manual mode, returned after client sends input_audio_buffer.commit.

event_id string

Unique event identifier.

{
    "event_id": "event_Iy6sUzL1nmdFgshFYxJEz",
    "type": "input_audio_buffer.committed",
    "item_id": "item_YbAiGvK2H7YaS34o4R6Ba"
}

type string

Always input_audio_buffer.committed.

item_id string

User message item ID (to be created).

input_audio_buffer.cleared

Returned after client sends input_audio_buffer.clear.

event_id string

Unique event identifier.

{
  "event_id": "event_RoUu4T8yExPMI37GKwaOC",
  "type": "input_audio_buffer.cleared"
}

type string

Always input_audio_buffer.cleared.

conversation.item.created

Returned when a conversation item is created.

event_id string

Unique event identifier.

{
    "event_id": "event_JEfkrr9gO3Ny7Xcv9bGVd",
    "type": "conversation.item.created",
    "item": {
        "id": "item_YbAiGvK2H7YaS34o4R6Ba",
        "object": "realtime.item",
        "type": "message",
        "status": "in_progress",
        "role": "assistant",
        "content": [
            {
                "type": "input_audio"
            }
        ]
    }
}

type string

Always conversation.item.created.

item object

Conversation item to add.

Properties

id string

Conversation item ID.

object string

Always realtime.item.

status string

Item status.

role string

Message role.

content array

Message content.

conversation.item.input_audio_transcription.completed

Provides transcription results after user audio is buffered. Transcription is processed by the gummy-realtime-v1 speech recognition model.

The text transcribed by the speech recognition model may differ from the text processed by the Qwen-Omni-Realtime model, so treat the transcript as a reference only.

event_id string

Unique event identifier.

{
    "event_id": "event_FrrZcxiDfTB9LD9p4pVng",
    "type": "conversation.item.input_audio_transcription.completed",
    "item_id": "item_YbAiGvK2H7YaS34o4R6Ba",
    "content_index": 0,
    "transcript": "Hello."
}

type string

Always conversation.item.input_audio_transcription.completed.

item_id string

User message item ID.

content_index integer

Currently fixed at 0.

transcript string

Transcribed text.

conversation.item.input_audio_transcription.failed

Returned when input audio transcription fails. Separate from the error event for easier issue identification.

event_id string

Unique event identifier.

{
  "type": "conversation.item.input_audio_transcription.failed",
  "item_id": "<item_id>",
  "content_index": 0,
  "error": {
    "code": "<code>",
    "message": "<message>",
    "param": "<param>"
  }
}

type string

Always conversation.item.input_audio_transcription.failed.

item_id string

User message item ID.

content_index integer

Currently fixed at 0.

error object

Error details.

Properties

code string

The error code.

message string

Error message.

param string

Error parameter.
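
Since transcription success and failure arrive as two separate events, a client may want to normalize them into one result shape. The following Python sketch does this. The function name transcription_result and the tuple layout are assumptions made for illustration.

```python
from typing import Optional, Tuple

PREFIX = "conversation.item.input_audio_transcription."


def transcription_result(event: dict) -> Tuple[str, Optional[str], Optional[str]]:
    """Return (item_id, transcript, error_text).

    Exactly one of transcript and error_text is None.
    """
    kind = event.get("type")
    if kind == PREFIX + "completed":
        return event["item_id"], event["transcript"], None
    if kind == PREFIX + "failed":
        err = event.get("error", {})
        error_text = f'{err.get("code")}: {err.get("message")}'
        return event["item_id"], None, error_text
    raise ValueError(f"unexpected event type: {kind!r}")
```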

response.created

Returned when generating a new model response.

event_id string

Unique event identifier.

{
    "event_id": "event_XuDavMzQN3KKepqGu3KRh",
    "type": "response.created",
    "response": {
        "id": "resp_HaVOPdbmX6vifiV5pAfJY",
        "object": "realtime.response",
        "conversation_id": "conv_FjJaccpnvwHNo9cPVuzGc",
        "status": "in_progress",
        "modalities": [
            "text",
            "audio"
        ],
        "voice": "Cherry",
        "output_audio_format": "pcm",
        "output": []
    }
}

type string

Always response.created.

response object

Response details.

Properties

id string

Response ID.

conversation_id string

Conversation ID.

object string

Always realtime.response.

status string

Response status. Valid values: completed, failed, in_progress, and incomplete.

modalities array

Response modalities.

voice string

The timbre of the audio generated by the model.

output array

Empty for this event.

response.done

Returned after response generation. The response object contains all output items except raw audio data.

event_id string

Unique event identifier.

{
    "event_id": "event_CSaxRRYLvbrfexDXAEuDG",
    "type": "response.done",
    "response": {
        "id": "resp_HaVOPdbmX6vifiV5pAfJY",
        "object": "realtime.response",
        "conversation_id": "conv_FjJaccpnvwHNo9cPVuzGc",
        "status": "completed",
        "modalities": [
            "text",
            "audio"
        ],
        "voice": "Cherry",
        "output_audio_format": "pcm",
        "output": [
            {
                "id": "item_Ls6MtCUWO7LM4E59QziNv",
                "object": "realtime.item",
                "type": "message",
                "status": "completed",
                "role": "assistant",
                "content": [
                    {
                        "type": "audio",
                        "transcript": "Hello! Is there anything I can help you with?"
                    }
                ]
            }
        ],
        "usage": {
            "total_tokens": 377,
            "input_tokens": 336,
            "output_tokens": 41,
            "input_tokens_details": {
                "text_tokens": 228,
                "audio_tokens": 108
            },
            "output_tokens_details": {
                "text_tokens": 9,
                "audio_tokens": 32
            }
        }
    }
}

type string

Always response.done.

response object

Response details.

Properties

id string

Response ID.

conversation_id string

Conversation ID.

object string

Always realtime.response.

status string

Response status.

modalities array

Response modalities.

voice string

The timbre of the audio generated by the model.

output array

Response output.

Properties

id string

Output item ID.

type string

Output item type. Currently message.

object string

Always realtime.item.

status string

Output item status.

role string

Output item role.

content array

Output item content.

Properties

type string

The type of the output content. The value is text if the output is plain text, or audio if the output includes audio.

text string

Text content.

transcript string

Audio transcript.

usage object

Token usage for this response.
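
To track consumption across a session, a client can add up the usage object from each response.done event. The following Python sketch keeps running totals in a Counter. The function name add_usage and the flattened key names (for example, input_audio_tokens) are assumptions made for this example, and the sketch tolerates a missing usage object because this topic does not say whether it is always present.

```python
from collections import Counter


def add_usage(totals: Counter, event: dict) -> Counter:
    """Add the token usage of one response.done event to running totals."""
    if event.get("type") != "response.done":
        return totals
    usage = event.get("response", {}).get("usage") or {}
    for key in ("input_tokens", "output_tokens", "total_tokens"):
        totals[key] += usage.get(key, 0)
    # Flatten the per-modality breakdown, e.g. input_tokens_details.audio_tokens
    # becomes the key "input_audio_tokens".
    for side in ("input", "output"):
        details = usage.get(f"{side}_tokens_details", {})
        for kind, count in details.items():
            totals[f"{side}_{kind}"] += count
    return totals
```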

response.text.delta

Returned when model incrementally generates text (text-only output modality).

event_id string

Unique event identifier.

{
    "delta": "Hello",
    "event_id": "event_TH49MauuPmRo1RGaMSlP7",
    "type": "response.text.delta",
    "response_id": "resp_PrRSvPVpnCExdUOGHHLuP",
    "item_id": "item_L8IRm9kRXFpxoOjDqDC96",
    "output_index": 0,
    "content_index": 0
}

type string

Always response.text.delta.

delta string

Incremental text.

response_id string

Response ID.

item_id string

Message item ID. You can use this ID to associate items from the same message.

output_index integer

Output item index in response. Currently fixed at 0.

content_index integer

Content part index within output item. Currently fixed at 0.

response.text.done

Returned when model finishes generating text (text-only output modality).

Also returned on interruption, incompletion, or cancellation.

event_id string

Unique event identifier.

{
  "event_id": "event_B1lIeE2Nac33zn5V7h2mm",
  "type": "response.text.done",
  "response_id": "resp_B1lIdtjF4Noqpn5NOjznj",
  "item_id": "item_B1lIdJsAJlJiFs8ztWpJt",
  "output_index": 0,
  "content_index": 0,
  "text": "How can I assist you today?"
}

type string

Always response.text.done.

response_id string

Response ID.

item_id string

Message item ID.

output_index integer

Output item index in response.

content_index integer

Content part index within output item.

text string

Complete model text output.
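
A client that displays text as it streams can collect response.text.delta fragments per item_id and then switch to the complete text when response.text.done arrives. The following Python sketch shows this pattern. The class name TextAssembler and its attributes are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


class TextAssembler:
    """Collect streamed text per message item."""

    def __init__(self) -> None:
        self._parts: Dict[str, List[str]] = defaultdict(list)
        self.finished: Dict[str, str] = {}

    def feed(self, event: dict) -> None:
        kind = event.get("type")
        if kind == "response.text.delta":
            self._parts[event["item_id"]].append(event["delta"])
        elif kind == "response.text.done":
            item_id = event["item_id"]
            # Prefer the complete text from the server over the joined deltas.
            self._parts.pop(item_id, None)
            self.finished[item_id] = event["text"]

    def partial(self, item_id: str) -> str:
        """Return the text received so far for an unfinished item."""
        return "".join(self._parts.get(item_id, []))
```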

response.audio.delta

Returned when model incrementally generates audio data.

event_id string

Unique event identifier.

{
  "event_id": "event_B1osWMZBtrEQbiIwW0qHQ",
  "type": "response.audio.delta",
  "response_id": "resp_P79OOMs8LnrXVpiIHUCKR",
  "item_id": "item_OFaPGtzfWCPyGzxnuEX9i",
  "output_index": 0,
  "content_index": 0,
  "delta": "{base64 audio}"
}

type string

Always response.audio.delta.

response_id string

Response ID.

item_id string

Message item ID.

output_index integer

Output item index in response.

content_index integer

Content part index within output item.

delta string

Incremental audio data (Base64-encoded).
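
Because delta is Base64-encoded, it has to be decoded before the audio bytes can be played or saved. The following Python sketch appends each decoded chunk to a byte buffer. The function name append_audio is hypothetical, and the sketch makes no assumption about the sample rate or bit depth of the PCM data.

```python
import base64


def append_audio(buffer: bytearray, event: dict) -> int:
    """Decode one response.audio.delta event into the buffer.

    Returns the number of raw audio bytes appended (0 for other events).
    """
    if event.get("type") != "response.audio.delta":
        return 0
    chunk = base64.b64decode(event["delta"])
    buffer.extend(chunk)
    return len(chunk)
```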

response.audio.done

Returned when model finishes generating audio data.

Also returned on interruption, incompletion, or cancellation.

event_id string

Unique event identifier.

{
    "event_id": "event_Le1TDl7VfyHQxl47DtGxI",
    "type": "response.audio.done",
    "response_id": "resp_HaVOPdbmX6vifiV5pAfJY",
    "item_id": "item_Ls6MtCUWO7LM4E59QziNv",
    "output_index": 0,
    "content_index": 0
}

type string

Always response.audio.done.

response_id string

Response ID.

item_id string

Message item ID.

output_index integer

Output item index in response.

content_index integer

Content part index within output item.

response.audio_transcript.delta

Returned when model incrementally generates text for audio output.

event_id string

Unique event identifier.

{
    "event_id": "event_BksW7fOwnyavZdDxIzZYM",
    "type": "response.audio_transcript.delta",
    "response_id": "resp_HaVOPdbmX6vifiV5pAfJY",
    "item_id": "item_Ls6MtCUWO7LM4E59QziNv",
    "output_index": 0,
    "content_index": 0,
    "delta": "Is there anything"
}

type string

Always response.audio_transcript.delta.

response_id string

Response ID.

item_id string

Message item ID.

output_index integer

Output item index in response.

content_index integer

Content part index within output item.

delta string

Incremental text.

response.audio_transcript.done

Returned when model finishes transcribing audio output.

event_id string

Unique event identifier.

{
    "event_id": "event_X49tL2WerT4WjxcmH16lS",
    "type": "response.audio_transcript.done",
    "response_id": "resp_HaVOPdbmX6vifiV5pAfJY",
    "item_id": "item_Ls6MtCUWO7LM4E59QziNv",
    "output_index": 0,
    "content_index": 0,
    "transcript": "Hello! Is there anything I can help you with?"
}

type string

Always response.audio_transcript.done.

response_id string

Response ID.

item_id string

Message item ID.

output_index integer

Output item index in response.

content_index integer

Content part index within output item.

transcript string

Complete transcript.

response.output_item.added

Returned when creating a new item during response generation.

event_id string

Unique event identifier.

{
    "event_id": "event_DsCO341DEVtiATtCB6BUY",
    "type": "response.output_item.added",
    "response_id": "resp_HaVOPdbmX6vifiV5pAfJY",
    "output_index": 0,
    "item": {
        "id": "item_Ls6MtCUWO7LM4E59QziNv",
        "object": "realtime.item",
        "type": "message",
        "status": "in_progress",
        "role": "assistant",
        "content": []
    }
}

type string

Always response.output_item.added.

response_id string

Response ID.

output_index integer

Output item index in response.

item object

Output item details.

Properties

id string

Output item ID.

object string

Always realtime.item.

status string

Output item status.

role string

Message sender role.

content array

Message content.

response.output_item.done

Returned when new item output completes.

event_id string

Unique event identifier.

{
    "event_id": "event_MEu5nlLw1LsOguHiehIP8",
    "type": "response.output_item.done",
    "response_id": "resp_HaVOPdbmX6vifiV5pAfJY",
    "output_index": 0,
    "item": {
        "id": "item_Ls6MtCUWO7LM4E59QziNv",
        "object": "realtime.item",
        "type": "message",
        "status": "completed",
        "role": "assistant",
        "content": [
            {
                "type": "audio",
                "text": "Hello! Is there anything I can help you with?"
            }
        ]
    }
}

type string

Always response.output_item.done.

response_id string

Response ID.

output_index integer

Output item index in response.

item object

Output item details.

Properties

id string

Output item ID.

object string

Always realtime.item.

status string

Output item status.

role string

Message sender role.

content array

Message content.

response.content_part.added

Returned when adding a new content part to assistant message during response generation.

event_id string

Unique event identifier.

{
    "event_id": "event_AVBOmrgY3C8bjlRajfSUT",
    "type": "response.content_part.added",
    "response_id": "resp_HaVOPdbmX6vifiV5pAfJY",
    "item_id": "item_Ls6MtCUWO7LM4E59QziNv",
    "output_index": 0,
    "content_index": 0,
    "part": {
        "type": "audio",
        "text": ""
    }
}

type string

Always response.content_part.added.

response_id string

Response ID.

item_id string

Message item ID.

output_index integer

Output item index in response. Currently fixed at 0.

content_index integer

Content part index within output item. Currently fixed at 0.

part object

Content part details.

Properties

type string

Content part type.

text string

Content part text.

response.content_part.done

Returned when content part streaming completes in assistant message.

event_id string

Unique event identifier.

{
    "event_id": "event_Il8HD19v58Qr5IBkw7LtN",
    "type": "response.content_part.done",
    "response_id": "resp_HaVOPdbmX6vifiV5pAfJY",
    "item_id": "item_Ls6MtCUWO7LM4E59QziNv",
    "output_index": 0,
    "content_index": 0,
    "part": {
        "type": "audio",
        "text": "Hello! Is there anything I can help you with?"
    }
}

type string

Always response.content_part.done.

response_id string

Response ID.

item_id string

Message item ID.

output_index integer

Output item index in response. Currently fixed at 0.

content_index integer

The index of the content part within the content array of the item. Currently fixed at 0.

part object

Content part details.

Properties

type string

Content part type.

text string

Content part text.