Alibaba Cloud Model Studio: Server events

Last Updated: Jun 16, 2026

Server events for the Qwen-Omni-Realtime API, including function calling events.

See Qwen-Omni-Realtime.

error

Indicates a server error message.

event_id string

A unique identifier for this event.

{
  "event_id": "event_RoUu4T8yExPMI37GKwaOC",
  "type": "error",
  "error": {
    "type": "invalid_request_error",
    "code": "invalid_value",
    "message": "Invalid modalities: ['audio']. Supported combinations are: ['text'] and ['audio', 'text'].",
    "param": "session.modalities"
  }
}

type string

The event type. This value is always error.

error object

Detailed error information.

Properties

type string

The error type.

code string

The error code.

message string

The error message.

param string

The parameter associated with the error, such as session.modalities.
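As an illustration of how a client might surface the fields above, the following minimal Python sketch turns an error event into a one-line log message. The function name describe_error is hypothetical and not part of the API; only the field names come from the example event.

```python
import json
from typing import Optional


def describe_error(raw_message: str) -> Optional[str]:
    """Return a one-line summary if the message is an error event, else None."""
    event = json.loads(raw_message)
    if event.get("type") != "error":
        return None
    err = event.get("error", {})
    summary = f"{err.get('type')}/{err.get('code')}: {err.get('message')}"
    # param is only meaningful when the error is tied to a specific parameter.
    if err.get("param"):
        summary += f" (param: {err['param']})"
    return summary
```

Logging the param value alongside the message makes it easier to see which field of a session.update request was rejected.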

session.created

The server returns this event after a client connects. It contains the default configuration for the session.

event_id string

A unique identifier for this event.

{
    "event_id": "event_RdvlSpbBb2ssyBjYrDHjt",
    "type": "session.created",
    "session": {
        "object": "realtime.session",
        "model": "qwen3-omni-flash-realtime",
        "modalities": [
            "text",
            "audio"
        ],
        "voice": "Cherry",
        "input_audio_format": "pcm",
        "output_audio_format": "pcm",
        "input_audio_transcription": {
            "model": "qwen3-asr-flash-realtime"
        },
        "turn_detection": {
            // The value can be server_vad or semantic_vad (only supported by qwen3.5-omni-realtime).
            "type": "server_vad",
            "threshold": 0.5,
            "prefix_padding_ms": 300,
            "silence_duration_ms": 800,
            "create_response": true,
            "interrupt_response": true
        },
        "enable_search": false,
        "search_options": {},
        "tools": [],
        "temperature": 0.8,
        "id": "sess_Ov7GOXoNXhNjlxXtOGKQS"
    }
}

type string

The event type. This value is always session.created.

session object

The session configuration.

Properties

object string

This value is always realtime.session.

model string

The model used.

modalities array

The output modalities for the model.

voice string

The voice for model-generated audio.

input_audio_format string

The format of the user's input audio. Currently only supports pcm. The input audio must be a PCM audio stream at a 16 kHz sample rate.

output_audio_format string

The format of the model's output audio. Currently only supports pcm. The output audio is a PCM audio stream at a 24 kHz sample rate.

input_audio_transcription object

The transcription configuration.

Properties

model string

The transcription model. This value is always qwen3-asr-flash-realtime. This parameter is not configurable.

turn_detection object

The voice activity detection (VAD) configuration.

Properties

type string

The VAD type. Valid values are server_vad (default) or semantic_vad. See Client events.

threshold float

The VAD detection threshold.

silence_duration_ms integer

The duration of silence in milliseconds that triggers the detection of the end of speech.

idle_timeout_ms integer

The idle timeout in milliseconds. Returned only in server_vad mode with qwen3.5-omni-plus-realtime or qwen3.5-omni-flash-realtime models.

enable_search boolean

Whether to enable web search. Supported only by Qwen3.5-Omni-Realtime.

search_options object

The options for the web search.

temperature float

The temperature parameter for the model.
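The input_audio_format and output_audio_format fields above fix the sample rates at 16 kHz for input and 24 kHz for output. The sketch below shows how a client might convert a PCM byte count into a duration. It assumes 16-bit mono samples, which this page does not state, so treat sample_width and channels as assumptions to verify. The helper name pcm_duration_ms is hypothetical.

```python
INPUT_SAMPLE_RATE = 16_000   # Hz, from input_audio_format
OUTPUT_SAMPLE_RATE = 24_000  # Hz, from output_audio_format


def pcm_duration_ms(num_bytes: int, sample_rate: int,
                    sample_width: int = 2, channels: int = 1) -> float:
    """Duration of a raw PCM buffer in milliseconds.

    sample_width=2 (16-bit) and channels=1 (mono) are assumptions,
    not values documented on this page.
    """
    bytes_per_second = sample_rate * sample_width * channels
    return num_bytes / bytes_per_second * 1000.0
```

Under these assumptions, one second of input audio is 32,000 bytes and one second of output audio is 48,000 bytes.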

session.updated

The server returns this event after a session.update request succeeds. If the request fails, the server returns an error event.

event_id string

A unique identifier for this event.

{
    "event_id": "event_X1HsXS4b4uptp6yo1LgKd",
    "type": "session.updated",
    "session": {
        "id": "sess_Aih6vAcY5Ddt6jwFx1tCa",
        "object": "realtime.session",
        "model": "qwen3-omni-flash-realtime",
        "modalities": [
            "text",
            "audio"
        ],
        "instructions": "You are Xiao Yun, a personal assistant. Answer user questions accurately and in a friendly manner. Always respond with a helpful attitude.",
        "voice": "Cherry",
        "input_audio_format": "pcm",
        "output_audio_format": "pcm",
        "input_audio_transcription": {
            "model": "qwen3-asr-flash-realtime"
        },
        "turn_detection": {
            // The value can be server_vad or semantic_vad (only supported by qwen3.5-omni-realtime).
            "type": "server_vad",
            "threshold": 0.1,
            "prefix_padding_ms": 500,
            "silence_duration_ms": 900,
            "create_response": true,
            "interrupt_response": true
        },
        "enable_search": true,
        "search_options": {
            "enable_source": true
        },
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_current_weather",
                    "description": "Useful for querying the weather in a specific city.",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "location": {"type": "string", "description": "The city name"}
                        },
                        "required": ["location"]
                    }
                }
            }
        ],
        "temperature": 0.8,
        "max_response_output_token": "inf",
        "max_tokens": 16384,
        "repetition_penalty": 1.05,
        "presence_penalty": 0.0,
        "top_k": 50,
        "top_p": 1.0,
        "seed": -1
    }
}

type string

The event type. This value is always session.updated.

session object

The session configuration.

Properties

temperature float

The temperature parameter for the model.

modalities array

The output modalities for the model.

voice string

The voice for model-generated audio.

instructions string

The model's goal and role.

input_audio_format string

The format of the user's input audio. Currently only supports pcm. The input audio must be a PCM audio stream at a 16 kHz sample rate.

output_audio_format string

The format of the model's output audio. Currently only supports pcm. The output audio is a PCM audio stream at a 24 kHz sample rate.

input_audio_transcription object

The transcription configuration.

Properties

model string

The transcription model. This value is always qwen3-asr-flash-realtime. This parameter is not configurable.

turn_detection object

The voice activity detection (VAD) configuration.

Properties

type string

The VAD type. Valid values are server_vad (default) or semantic_vad. See Client events.

threshold float

The VAD detection threshold.

silence_duration_ms integer

The duration of silence in milliseconds that triggers the detection of the end of speech.

idle_timeout_ms integer

The idle timeout in milliseconds. Returned only in server_vad mode with qwen3.5-omni-plus-realtime or qwen3.5-omni-flash-realtime models.

enable_search boolean (optional)

Whether to enable web search. Supported only by Qwen3.5-Omni-Realtime.

search_options object (optional)

The web search options.

tools array (optional)

A list of tool definitions. When you configure tools, the model can decide whether to call a tool based on the user's input.

Properties

type string (required)

This value is always function.

function.name string (required)

The name of the custom tool function. We recommend using the same name as the function, such as get_current_weather or get_current_time.

function.description string (optional)

A description of the tool function's purpose. The model uses this field to decide whether to use the tool function.

function.parameters object (optional)

A description of the tool function's input parameters. The model uses this field to extract the input parameters. If the tool function does not require input parameters, you do not need to specify this parameter.

Properties

type string (required)

This value is always object.

properties object (optional)

Describes the name, data type, and description of each input parameter. The key is the parameter name, and the value is an object that contains the data type (type) and description (description).

required array (optional)

Specifies which input parameters are required.

top_p float

The probability threshold for nucleus sampling.

top_k integer

The size of the candidate set for sampling during model generation.

max_tokens integer

The maximum number of tokens that the model can return for the request.

repetition_penalty float

Controls repetition in consecutive sequences during generation.

presence_penalty float

Controls repetition in generated content.

seed integer

The random seed used for generation. Reusing the same seed with the same parameters makes the model's output more consistent across requests.
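Because session.updated echoes the resulting configuration, a client can compare it against what it asked for. The following is a minimal sketch of that check. The function name unapplied_settings is hypothetical, and a mismatch is not necessarily an error, since the server may normalize or ignore values.

```python
def unapplied_settings(requested: dict, session: dict) -> dict:
    """Return {key: (requested_value, actual_value)} for each requested
    top-level key whose value differs in the session.updated payload."""
    mismatches = {}
    for key, wanted in requested.items():
        actual = session.get(key)
        if actual != wanted:
            mismatches[key] = (wanted, actual)
    return mismatches
```

A typical use is to pass the session object sent in session.update as requested and the session object from the session.updated event as session, then log any non-empty result.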

input_audio_buffer.speech_started

In VAD mode, the server returns this event when it detects the start of speech in the audio buffer.

This event can be triggered whenever audio is added to the buffer, unless speech has already been detected.

event_id string

A unique identifier for this event.

{
    "event_id": "event_Pvp8nEhsQuGCQbFJ9x58n",
    "type": "input_audio_buffer.speech_started",
    "audio_start_ms": 3647,
    "item_id": "item_YbAiGvK2H7YaS34o4R6Ba"
}

type string

The event type. This value is always input_audio_buffer.speech_started.

audio_start_ms integer

The time in milliseconds from when audio writing to the buffer starts until speech is first detected.

item_id string

The ID of the user message item that is created when the end of speech is detected.

User message items are used to append user input to the conversation history for subsequent model inference and generation.

input_audio_buffer.speech_stopped

In VAD mode, the server returns this event when it detects the end of speech in the audio buffer.

The server also returns a conversation.item.created event that creates the corresponding user message item.

event_id string

A unique identifier for this event.

{
    "event_id": "event_UhQiqNVRsgUiq4KUS5Xb5",
    "type": "input_audio_buffer.speech_stopped",
    "audio_end_ms": 4453,
    "item_id": "item_YbAiGvK2H7YaS34o4R6Ba"
}

type string

The event type. This value is always input_audio_buffer.speech_stopped.

audio_end_ms integer

The time in milliseconds from the start of the session until the end of speech is detected.

item_id string

The ID of the user message item that will be created.
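The speech_started and speech_stopped events share an item_id, so a client can pair them to estimate how long a detected utterance lasted. The sketch below assumes that audio_start_ms and audio_end_ms are measured on the same timeline. Their descriptions name different reference points, so that assumption should be verified. The class name SpeechTracker is hypothetical.

```python
from typing import Optional


class SpeechTracker:
    """Pairs speech_started and speech_stopped events by item_id."""

    def __init__(self) -> None:
        self._starts = {}  # item_id -> audio_start_ms

    def handle(self, event: dict) -> Optional[int]:
        """Return the utterance length in ms on speech_stopped, else None."""
        kind = event.get("type")
        if kind == "input_audio_buffer.speech_started":
            self._starts[event["item_id"]] = event["audio_start_ms"]
        elif kind == "input_audio_buffer.speech_stopped":
            start = self._starts.pop(event["item_id"], None)
            if start is not None:
                # Assumes both timestamps use the same reference point.
                return event["audio_end_ms"] - start
        return None
```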

input_audio_buffer.committed

The server returns this event when the input audio buffer is committed.

  • In VAD mode, the server automatically commits the audio buffer and returns this event when it detects the end of speech.

  • In Manual mode, the server returns this event after the client sends an input_audio_buffer.commit event.

event_id string

A unique identifier for this event.

{
    "event_id": "event_Iy6sUzL1nmdFgshFYxJEz",
    "type": "input_audio_buffer.committed",
    "item_id": "item_YbAiGvK2H7YaS34o4R6Ba"
}

type string

The event type. This value is always input_audio_buffer.committed.

item_id string

The ID of the user message item that will be created.

input_audio_buffer.cleared

The server returns this event after the client sends an input_audio_buffer.clear event.

event_id string

A unique identifier for this event.

{
  "event_id": "event_RoUu4T8yExPMI37GKwaOC",
  "type": "input_audio_buffer.cleared"
}

type string

The event type. This value is always input_audio_buffer.cleared.

conversation.item.created

The server returns this event when a conversation item is created.

event_id string

A unique identifier for this event.

{
    "event_id": "event_JEfkrr9gO3Ny7Xcv9bGVd",
    "type": "conversation.item.created",
    "item": {
        "id": "item_YbAiGvK2H7YaS34o4R6Ba",
        "object": "realtime.item",
        "type": "message",
        "status": "in_progress",
        "role": "assistant",
        "content": [
            {
                "type": "input_audio"
            }
        ]
    }
}
// Tool calling scenario
{
    "event_id": "event_S1hkaIQgcuQD8OEdOpGHQ",
    "type": "conversation.item.created",
    "item": {
        "id": "item_FEG9qJGNkPcdf4et3p7BV",
        "object": "realtime.item",
        "type": "function_call",
        "status": "in_progress",
        "call_id": "call_bc0a7fb7235840f69ecfe4",
        "name": "get_current_weather",
        "arguments": ""
    }
}

type string

The event type. This value is always conversation.item.created.

item object

The conversation item to add.

Properties

id string

The unique ID of the conversation item.

object string

This value is always realtime.item.

status string

The status of the conversation item.

role string

The role of the message.

content array

The content of the message. This parameter is returned when the type is message.

type string

The type of the conversation item. Valid values are message or function_call.

name string

The name of the function that is called when the type is function_call.

call_id string

When the type is function_call, this is the unique ID of the function invocation.

arguments string

When the type is function_call, this parameter contains the arguments for the function invocation as a JSON string.
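Since the fields present on item depend on its type, a client usually branches on item.type before reading them. A minimal sketch, using only the field names from the examples above (the function name summarize_item is hypothetical):

```python
def summarize_item(event: dict) -> str:
    """Build a short description of the item in a conversation.item.created event."""
    item = event["item"]
    if item["type"] == "function_call":
        # name, call_id, and arguments are only present for function calls.
        return (f"function_call {item['name']} "
                f"(call_id={item['call_id']}, status={item['status']})")
    # For messages, list the type of each content part.
    parts = [part.get("type", "?") for part in item.get("content", [])]
    return (f"message from {item.get('role')} with content {parts} "
            f"(status={item['status']})")
```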

conversation.item.input_audio_transcription.delta

When input audio transcription is enabled, this event is sent frequently while the user is speaking. It provides real-time intermediate transcription results. You can concatenate text + stash to get the most complete sentence preview at any point in time.

event_id string

A unique identifier for this event.

{
    "event_id": "event_C7jzoeSFuiwOZS6tR14yx",
    "type": "conversation.item.input_audio_transcription.delta",
    "item_id": "item_ThVYhLHOdeXb4bBSvzSFF",
    "content_index": 0,
    "text": "",
    "stash": "How is the weather today?",
    "language": "en",
    "emotion": "neutral",
    "obfuscation": "ABEXGYmxdmc97u"
}

To get the most complete sentence preview at any moment, concatenate these two fields: real-time preview = text + stash.

Suppose the user is saying: "The weather is nice today, sunny and warm."

The following shows the event stream you might receive and how to interpret them:

| Time | User speech progress | API response (text and stash) | Client UI display (text + stash) |
| --- | --- | --- | --- |
| T1 | "The weather..." | text: ""; stash: "The weather" | The weather |
| T2 | "...is nice..." | text: ""; stash: "The weather is nice" | The weather is nice |
| T3 | "...today," | text: "The weather"; stash: " is nice today," | The weather is nice today, ("The weather" has been confirmed and moved to text) |
| T4 | (brief pause) | text: "The weather is nice today, "; stash: "" | The weather is nice today, (first clause fully confirmed) |
| T5 | "sunny..." | text: "The weather is nice today, "; stash: "sunny" | The weather is nice today, sunny |
| T6 | "...and warm." | text: "The weather is nice today, "; stash: "sunny and warm." | The weather is nice today, sunny and warm. |
| T7 | (stops speaking) | - | Use the transcript from conversation.item.input_audio_transcription.completed as the final result. |

type string

The event type. This value is always conversation.item.input_audio_transcription.delta.

item_id string

The ID of the associated conversation item.

content_index integer

The index of the content part that contains the audio.

text string

The confirmed text prefix. This is the portion of the current sentence that the model has confirmed and will not change.

stash string

The preliminary text suffix. This is the temporary draft that follows the confirmed portion, which the model is still processing and may revise.

language string

The detected language of the recognized audio.

emotion string

The detected emotion of the recognized audio. Valid values: neutral, happy, sad, angry, surprised, disgusted, fearful.
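The text + stash rule can be sketched as a small client-side view that keeps one preview line per item and replaces it with the final transcript once the completed event arrives. The class name TranscriptView is hypothetical. Only the event types and field names come from this page.

```python
DELTA = "conversation.item.input_audio_transcription.delta"
COMPLETED = "conversation.item.input_audio_transcription.completed"


class TranscriptView:
    """Keeps the latest displayable transcript for each user message item."""

    def __init__(self) -> None:
        self.lines = {}  # item_id -> text currently shown to the user

    def handle(self, event: dict) -> None:
        kind = event.get("type")
        if kind == DELTA:
            # Confirmed prefix plus the still-changing draft suffix.
            preview = event.get("text", "") + event.get("stash", "")
            self.lines[event["item_id"]] = preview
        elif kind == COMPLETED:
            # The completed transcript supersedes any preview.
            self.lines[event["item_id"]] = event["transcript"]
```

Each delta overwrites the previous preview rather than appending to it, because stash may be revised between events.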

conversation.item.input_audio_transcription.completed

Indicates that the user's audio has been transcribed. The transcription is performed by a built-in speech recognition model (qwen3-asr-flash-realtime). The transcription model is not configurable.

The transcribed text from the speech recognition model may differ from the interpretation generated by the Qwen-Omni-Realtime model. The transcription is for reference only.

event_id string

A unique identifier for this event.

{
    "event_id": "event_FrrZcxiDfTB9LD9p4pVng",
    "type": "conversation.item.input_audio_transcription.completed",
    "item_id": "item_YbAiGvK2H7YaS34o4R6Ba",
    "content_index": 0,
    "transcript": "Hello."
}

type string

The event type. This value is always conversation.item.input_audio_transcription.completed.

item_id string

The ID of the user message item.

content_index integer

This value is always 0.

transcript string

The transcribed text.

conversation.item.input_audio_transcription.failed

The server returns this event when input audio transcription is enabled and the transcription fails. This event is independent of the error event and helps clients identify transcription failures.

event_id string

A unique identifier for this event.

{
  "type": "conversation.item.input_audio_transcription.failed",
  "item_id": "<item_id>",
  "content_index": 0,
  "error": {
    "code": "<code>",
    "message": "<message>",
    "param": "<param>"
  }
}

type string

The event type. This value is always conversation.item.input_audio_transcription.failed.

item_id string

The ID of the user message item.

content_index integer

This value is always 0.

error object

The error information.

Properties

code string

The error code.

message string

The error message.

param string

The parameter related to the error.

response.created

The server returns this event when it generates a new model response.

event_id string

A unique identifier for this event.

{
    "event_id": "event_XuDavMzQN3KKepqGu3KRh",
    "type": "response.created",
    "response": {
        "id": "resp_HaVOPdbmX6vifiV5pAfJY",
        "object": "realtime.response",
        "conversation_id": "conv_FjJaccpnvwHNo9cPVuzGc",
        "status": "in_progress",
        "modalities": [
            "text",
            "audio"
        ],
        "voice": "Cherry",
        "output_audio_format": "pcm",
        "output": []
    }
}

type string

The event type. This value is always response.created.

response object

The response object.

Properties

id string

The unique ID of the response.

conversation_id string

The unique ID of the current session.

object string

The object type. For this event, this value is always realtime.response.

status string

The response status. Valid values are completed, failed, in_progress, or incomplete.

modalities array

The response modalities.

voice string

The voice for model-generated audio.

output array

This field is empty for this event.

response.done

The server returns this event after the response is completely generated. The response object includes all output items except for the raw audio data.

event_id string

A unique identifier for this event.

{
    "event_id": "event_CSaxRRYLvbrfexDXAEuDG",
    "type": "response.done",
    "response": {
        "id": "resp_HaVOPdbmX6vifiV5pAfJY",
        "object": "realtime.response",
        "conversation_id": "conv_FjJaccpnvwHNo9cPVuzGc",
        "status": "completed",
        "modalities": [
            "text",
            "audio"
        ],
        "voice": "Cherry",
        "output_audio_format": "pcm",
        "output": [
            {
                "id": "item_Ls6MtCUWO7LM4E59QziNv",
                "object": "realtime.item",
                "type": "message",
                "status": "completed",
                "role": "assistant",
                "content": [
                    {
                        "type": "audio",
                        "transcript": "Hello! How can I help you?"
                    }
                ]
            }
        ],
        "usage": {
            "total_tokens": 377,
            "input_tokens": 336,
            "output_tokens": 41,
            "input_tokens_details": {
                "text_tokens": 228,
                "audio_tokens": 108
            },
            "output_tokens_details": {
                "text_tokens": 9,
                "audio_tokens": 32
            },
            "plugins": {
                "search": {
                    "count": 1,
                    "strategy": "agent"
                }
            }
        }
    }
}
// Tool calling scenario
{
    "event_id": "event_T1EFAJp43X2DWtDRmxTtx",
    "type": "response.done",
    "response": {
        "id": "resp_TucN5QgymL5MA8vkJvFlS",
        "object": "realtime.response",
        "conversation_id": "conv_SEDZESRlefT8WvLSmEn6E",
        "status": "completed",
        "modalities": ["text", "audio"],
        "voice": "Ethan",
        "output_audio_format": "pcm",
        "output": [
            {
                "id": "item_FEG9qJGNkPcdf4et3p7BV",
                "object": "realtime.item",
                "type": "function_call",
                "status": "completed",
                "call_id": "call_bc0a7fb7235840f69ecfe4",
                "name": "get_current_weather",
                "arguments": " {\"location\": \"Hangzhou\"}"
            }
        ],
        "usage": {
            "total_tokens": 567,
            "input_tokens": 524,
            "output_tokens": 43,
            "input_tokens_details": {
                "text_tokens": 487,
                "audio_tokens": 37
            },
            "output_tokens_details": {
                "text_tokens": 43
            }
        }
    }
}

type string

The event type. This value is always response.done.

response object

The response object.

Properties

id string

The unique ID of the response.

conversation_id string

The unique ID of the current session.

object string

The object type. For this event, this value is always realtime.response.

status string

The response status.

modalities array

The response modalities.

voice string

The voice for model-generated audio.

output array

The list of output items in the response.

Properties

id string

The ID of the response output.

type string

The type of the output item. Valid values are message or function_call.

object string

The output item object type. This value is always realtime.item.

status string

The output item status.

role string

The output item role.

content array

The content of the output item. This field is returned only when the type is message.

Properties

type string

The content type. The value can be text for plain text output or audio for audio output.

text string

The text output.

transcript string

The text transcript of the audio.

name string

The name of the function that is invoked when the type is function_call.

call_id string

When the type is function_call, this is the unique ID of the function invocation.

arguments string

When the type is function_call, this field contains the full arguments for the function call as a JSON string.

usage object

Token usage details for this response.

Properties

total_tokens integer

The total number of tokens used in this response.

input_tokens integer

The number of input tokens.

output_tokens integer

The number of output tokens.

input_tokens_details object

Details about input token usage, including text_tokens and audio_tokens.

output_tokens_details object

Details about output token usage, including text_tokens and audio_tokens.

plugins object (optional)

Plugin usage metrics. This field is returned when web search (enable_search) is enabled.

Properties

search object

Search metering data.

Properties

count integer

The number of searches.

strategy string

The search strategy.
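A client might use response.done both for metering and for finding function calls that still need to be executed. The sketch below reads the usage object and collects function_call items from output. In both examples above, total_tokens equals input_tokens plus output_tokens, and the code checks that relationship rather than relying on it. The function names are hypothetical.

```python
def summarize_usage(event: dict) -> dict:
    """Extract token counts and search usage from a response.done event."""
    usage = event["response"]["usage"]
    # plugins is only returned when web search is enabled.
    search = usage.get("plugins", {}).get("search", {})
    return {
        "input_tokens": usage["input_tokens"],
        "output_tokens": usage["output_tokens"],
        "total_tokens": usage["total_tokens"],
        "totals_consistent": (usage["input_tokens"] + usage["output_tokens"]
                              == usage["total_tokens"]),
        "search_count": search.get("count", 0),
    }


def function_calls(event: dict) -> list:
    """Return the function_call items from a response.done event."""
    output = event["response"].get("output", [])
    return [item for item in output if item.get("type") == "function_call"]
```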

response.text.delta

The server returns this event when the output modality is text only and the model generates new text incrementally.

event_id string

A unique identifier for this event.

{
    "delta": "Hello",
    "event_id": "event_TH49MauuPmRo1RGaMSlP7",
    "type": "response.text.delta",
    "response_id": "resp_PrRSvPVpnCExdUOGHHLuP",
    "item_id": "item_L8IRm9kRXFpxoOjDqDC96",
    "output_index": 0,
    "content_index": 0
}

type string

The event type. This value is always response.text.delta.

delta string

The returned incremental text.

response_id string

The response ID.

item_id string

The message item ID. You can use this to reference the same message item.

output_index integer

The index of the output item in the response. This value is always 0.

content_index integer

The index of the internal part within the output item. This value is always 0.

response.text.done

When the output modality is text only and the model finishes generating text, the server returns this event.

The server also returns this event if the response is interrupted, incomplete, or canceled.

event_id string

A unique identifier for this event.

{
  "event_id": "event_B1lIeE2Nac33zn5V7h2mm",
  "type": "response.text.done",
  "response_id": "resp_B1lIdtjF4Noqpn5NOjznj",
  "item_id": "item_B1lIdJsAJlJiFs8ztWpJt",
  "output_index": 0,
  "content_index": 0,
  "text": "How can I assist you today?"
}

type string

The event type. This value is always response.text.done.

response_id string

The response ID.

item_id string

The message item ID.

output_index integer

The index of the output item in the response.

content_index integer

The index of the content part within the output item.

text string

The full text generated by the model.
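To show streaming text while it is generated, a client can append each response.text.delta and then switch to the text field of response.text.done as the final value. A minimal sketch, keyed by response_id and item_id (the class name TextAssembler is hypothetical):

```python
from typing import Optional


class TextAssembler:
    """Accumulates response.text.delta events per (response_id, item_id)."""

    def __init__(self) -> None:
        self._parts = {}

    def handle(self, event: dict) -> Optional[str]:
        """Return the final text on response.text.done, else None."""
        kind = event.get("type")
        key = (event.get("response_id"), event.get("item_id"))
        if kind == "response.text.delta":
            self._parts.setdefault(key, []).append(event["delta"])
        elif kind == "response.text.done":
            self._parts.pop(key, None)  # the done event carries the full text
            return event["text"]
        return None

    def partial(self, response_id: str, item_id: str) -> str:
        """Text received so far for an item that is still streaming."""
        return "".join(self._parts.get((response_id, item_id), []))
```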

response.audio.delta

When the output modality includes audio and the model generates new audio data incrementally, the server returns this event.

event_id string

A unique identifier for this event.

{
  "event_id": "event_B1osWMZBtrEQbiIwW0qHQ",
  "type": "response.audio.delta",
  "response_id": "resp_P79OOMs8LnrXVpiIHUCKR",
  "item_id": "item_OFaPGtzfWCPyGzxnuEX9i",
  "output_index": 0,
  "content_index": 0,
  "delta": "{base64 audio}"
}

type string

The event type. This value is always response.audio.delta.

response_id string

The response ID.

item_id string

The message item ID.

output_index integer

The index of the output item in the response.

content_index integer

The index of the content part within the output item.

delta string

The incremental audio data generated by the model, which is Base64-encoded.
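Each delta must be Base64-decoded before the audio can be played or stored. The sketch below collects the decoded PCM bytes per item and hands back the complete buffer when response.audio.done arrives. A real-time player would more likely pass each decoded chunk to the audio device as it arrives. The class name AudioCollector is hypothetical.

```python
import base64
from typing import Optional


class AudioCollector:
    """Decodes response.audio.delta payloads and groups them by item_id."""

    def __init__(self) -> None:
        self._chunks = {}  # item_id -> bytearray of raw PCM

    def handle(self, event: dict) -> Optional[bytes]:
        """Return the full PCM buffer on response.audio.done, else None."""
        kind = event.get("type")
        if kind == "response.audio.delta":
            pcm = base64.b64decode(event["delta"])
            self._chunks.setdefault(event["item_id"], bytearray()).extend(pcm)
        elif kind == "response.audio.done":
            return bytes(self._chunks.pop(event["item_id"], b""))
        return None
```

The returned bytes are raw PCM at the 24 kHz output sample rate and carry no container header, so they need to be wrapped (for example, in a WAV file) before most players can open them.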

response.audio.done

When the output modality includes audio and the model finishes generating audio data, the server returns this event.

The server also returns this event if the response is interrupted, incomplete, or canceled.

event_id string

A unique identifier for this event.

{
    "event_id": "event_Le1TDl7VfyHQxl47DtGxI",
    "type": "response.audio.done",
    "response_id": "resp_HaVOPdbmX6vifiV5pAfJY",
    "item_id": "item_Ls6MtCUWO7LM4E59QziNv",
    "output_index": 0,
    "content_index": 0
}

type string

The event type. This value is always response.audio.done.

response_id string

The response ID.

item_id string

The message item ID.

output_index integer

The index of the output item in the response.

content_index integer

The index of the content part within the output item.

response.audio_transcript.delta

The server returns a response.audio_transcript.delta event when the output modality includes audio and the model generates new text for the audio incrementally.

event_id string

A unique identifier for this event.

{
    "event_id": "event_BksW7fOwnyavZdDxIzZYM",
    "type": "response.audio_transcript.delta",
    "response_id": "resp_HaVOPdbmX6vifiV5pAfJY",
    "item_id": "item_Ls6MtCUWO7LM4E59QziNv",
    "output_index": 0,
    "content_index": 0,
    "delta": "What"
}

type string

The event type. This value is always response.audio_transcript.delta.

response_id string

The response ID.

item_id string

The message item ID.

output_index integer

The index of the output item in the response.

content_index integer

The index of the content part within the output item.

delta string

The incremental text.

response.audio_transcript.done

The server returns a response.audio_transcript.done event when the output modality includes audio and the model completes transcribing the audio.

event_id string

A unique identifier for this event.

{
    "event_id": "event_X49tL2WerT4WjxcmH16lS",
    "type": "response.audio_transcript.done",
    "response_id": "resp_HaVOPdbmX6vifiV5pAfJY",
    "item_id": "item_Ls6MtCUWO7LM4E59QziNv",
    "output_index": 0,
    "content_index": 0,
    "transcript": "Hello! How can I help you?"
}

type string

The event type. This value is always response.audio_transcript.done.

response_id string

The response ID.

item_id string

The message item ID.

output_index integer

The index of the output item in the response.

content_index integer

The index of the content part within the output item.

transcript string

The full transcription text.

response.function_call_arguments.delta

When the model generates the argument string for a function call in a streaming manner, the server pushes this event for each new segment of content. You should concatenate the delta fields from each event in the order they are received to obtain the current argument text. The complete content is provided in the subsequent response.function_call_arguments.done event.

event_id string

A unique identifier for this event.

{
    "event_id": "event_SlKoJyEbPEqLq14DSM1u5",
    "type": "response.function_call_arguments.delta",
    "response_id": "resp_JnTOsWXlFhKcFohZbtfz6",
    "item_id": "item_Rhcms7CauTNsQprV5S4Hr",
    "output_index": 0,
    "call_id": "call_2be200f4cafe419b9530dd",
    "delta": " {\"location\": \"Beijing\"}"
}

type string

The event type. This value is always response.function_call_arguments.delta.

response_id string

The response ID.

item_id string

The message item ID.

output_index integer

The index of the output item in the response.

call_id string

The unique ID for this function invocation. It matches the call_id in the response.function_call_arguments.done event for the same call.

delta string

The new segment of the argument string (delta). You must concatenate these segments in order.

response.function_call_arguments.done

Indicates that the function call arguments have been fully generated. The arguments field in this event contains the complete argument string. After you receive this event, you can parse the arguments and call the local tool function. You must use the complete arguments from this event, not the concatenated delta result.

event_id string

A unique identifier for this event.

{
    "event_id": "event_X6suLyuL5agdH7r6koesM",
    "type": "response.function_call_arguments.done",
    "response_id": "resp_JnTOsWXlFhKcFohZbtfz6",
    "item_id": "item_Rhcms7CauTNsQprV5S4Hr",
    "output_index": 0,
    "name": "get_current_weather",
    "call_id": "call_2be200f4cafe419b9530dd",
    "arguments": " {\"location\": \"Beijing\"}"
}

type string

The event type. This value is always response.function_call_arguments.done.

response_id string

The response ID.

item_id string

The message item ID.

output_index integer

The index of the output item in the response.

call_id string

The unique ID for this function invocation.

name string

The name of the function that was called.

arguments string

The complete arguments for the function invocation, typically represented as a JSON string.
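The parse-and-dispatch step described for this event can be sketched as follows. The tool implementation, the TOOLS registry, and the function name run_function_call are hypothetical placeholders. Only the event fields come from this page. Sending the result back to the server is done with client events and is not shown here.

```python
import json
from typing import Optional, Tuple


def get_current_weather(location: str) -> str:
    # Hypothetical local tool; a real one would query a weather service.
    return f"Weather lookup for {location} is not implemented in this sketch."


# Maps tool names declared in session.tools to local callables (assumption).
TOOLS = {"get_current_weather": get_current_weather}


def run_function_call(event: dict) -> Optional[Tuple[str, str]]:
    """Run the local tool for a function_call_arguments.done event.

    Returns (call_id, result), or None for any other event type.
    """
    if event.get("type") != "response.function_call_arguments.done":
        return None
    tool = TOOLS.get(event["name"])
    if tool is None:
        raise KeyError(f"no local tool registered for {event['name']!r}")
    # Use the complete arguments from the done event, not concatenated deltas.
    # json.loads tolerates the leading space seen in the example payload.
    arguments = json.loads(event["arguments"] or "{}")
    return event["call_id"], tool(**arguments)
```

Keeping the call_id with the result matters because the result has to be associated with the same function invocation when it is returned to the model.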

response.output_item.added

The server returns this event when it creates a new item during response generation. The item type can be message or function_call.

event_id string

A unique identifier for this event.

{
    "event_id": "event_DsCO341DEVtiATtCB6BUY",
    "type": "response.output_item.added",
    "response_id": "resp_HaVOPdbmX6vifiV5pAfJY",
    "output_index": 0,
    "item": {
        "id": "item_Ls6MtCUWO7LM4E59QziNv",
        "object": "realtime.item",
        "type": "message",
        "status": "in_progress",
        "role": "assistant",
        "content": []
    }
}
// Tool calling scenario
{
    "event_id": "event_HXmKt5pGoiRtXx7Hq7zpN",
    "type": "response.output_item.added",
    "response_id": "resp_TucN5QgymL5MA8vkJvFlS",
    "output_index": 0,
    "item": {
        "id": "item_FEG9qJGNkPcdf4et3p7BV",
        "object": "realtime.item",
        "type": "function_call",
        "status": "in_progress",
        "call_id": "call_bc0a7fb7235840f69ecfe4",
        "name": "get_current_weather",
        "arguments": ""
    }
}

type string

The event type. This value is always response.output_item.added.

response_id string

The response ID.

output_index integer

The index of the output item in the response.

item object

Information about the output item.

Properties

id string

The unique ID of the output item.

object string

This value is always realtime.item.

status string

The status of the output item. In the examples for this event, the value is in_progress.

role string

The role of the sender.

content array

The content of the message. This field is returned when the type is message.

type string

The type of the output item. Valid values are message or function_call.

name string

The name of the function to call. This field is returned when the type is function_call.

call_id string

The unique ID of the current function invocation. This field is returned when the type is function_call.

arguments string

The arguments of the function call as a JSON string. This field is returned when the type is function_call. In an added event, it is initially an empty string.
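
Because the item type determines which fields are present, a client may want to branch on item.type as soon as this event arrives. The sketch below assumes decoded JSON events; register_added_item and the pending_calls dictionary are hypothetical names.

```python
def register_added_item(pending_calls, event):
    """Record function_call items by call_id and return the item type."""
    item = event["item"]
    if item["type"] == "function_call":
        # arguments is still an empty string here, so only the name is stored.
        pending_calls[item["call_id"]] = item["name"]
    return item["type"]
```

With this in place, a later event that carries the same call_id can be matched to the function name recorded here.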

response.output_item.done

The server returns this event when the generation of a new output item is complete.

event_id string

A unique identifier for this event.

{
    "event_id": "event_MEu5nlLw1LsOguHiehIP8",
    "type": "response.output_item.done",
    "response_id": "resp_HaVOPdbmX6vifiV5pAfJY",
    "output_index": 0,
    "item": {
        "id": "item_Ls6MtCUWO7LM4E59QziNv",
        "object": "realtime.item",
        "type": "message",
        "status": "completed",
        "role": "assistant",
        "content": [
            {
                "type": "audio",
                "text": "Hello! How can I help you?"
            }
        ]
    }
}
// Tool calling scenario
{
    "event_id": "event_FHspdfAnCyjuME3mmAwSY",
    "type": "response.output_item.done",
    "response_id": "resp_TucN5QgymL5MA8vkJvFlS",
    "output_index": 0,
    "item": {
        "id": "item_FEG9qJGNkPcdf4et3p7BV",
        "object": "realtime.item",
        "type": "function_call",
        "status": "completed",
        "call_id": "call_bc0a7fb7235840f69ecfe4",
        "name": "get_current_weather",
        "arguments": " {\"location\": \"Hangzhou\"}"
    }
}

type string

The event type. This value is always response.output_item.done.

response_id string

The response ID.

output_index integer

The index of the output item in the response.

item object

The output item information.

Properties

id string

The unique ID of the output item.

object string

This value is always realtime.item.

status string

The status of the output item. In the examples for this event, the value is completed.

role string

The role of the sender.

content array

The content of the message. This field is returned when the type is message.

type string

The type of the output item. Valid values are message or function_call.

name string

The name of the called function. This field is returned when the type is function_call.

call_id string

The unique ID for the function invocation. This field is returned when the type is function_call.

arguments string

A JSON string that contains the full arguments for the function call. This field is returned when the type is function_call.
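
For a completed message item, the final text can be read from the content array. The following sketch assumes decoded JSON events and that each content part carries a text field, as in the example above; message_text is a hypothetical helper name.

```python
def message_text(event):
    """Return the joined text of a completed message item, or None otherwise."""
    item = event["item"]
    if item["type"] != "message":
        return None
    # Parts without a text field contribute nothing to the result.
    return "".join(part.get("text", "") for part in item.get("content", []))
```

Returning None for function_call items lets the caller route those to tool handling instead.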

response.content_part.added

The server returns this event when it adds a new content part to an assistant message item during response generation.

event_id string

A unique identifier for this event.

{
    "event_id": "event_AVBOmrgY3C8bjlRajfSUT",
    "type": "response.content_part.added",
    "response_id": "resp_HaVOPdbmX6vifiV5pAfJY",
    "item_id": "item_Ls6MtCUWO7LM4E59QziNv",
    "output_index": 0,
    "content_index": 0,
    "part": {
        "type": "audio",
        "text": ""
    }
}

type string

The event type. This value is always response.content_part.added.

response_id string

The response ID.

item_id string

The message item ID.

output_index integer

The index of the output item in the response. This value is always 0.

content_index integer

The index of the content part in the content array of the output item. This value is always 0.

part object

The content part information.

Properties

type string

The type of the content part.

text string

The text of the content part.

response.content_part.done

The server returns this event when the streaming of a content part within an assistant message item finishes.

event_id string

A unique identifier for this event.

{
    "event_id": "event_Il8HD19v58Qr5IBkw7LtN",
    "type": "response.content_part.done",
    "response_id": "resp_HaVOPdbmX6vifiV5pAfJY",
    "item_id": "item_Ls6MtCUWO7LM4E59QziNv",
    "output_index": 0,
    "content_index": 0,
    "part": {
        "type": "audio",
        "text": "Hello! How can I help you?"
    }
}

type string

The event type. This value is always response.content_part.done.

response_id string

The response ID.

item_id string

The message item ID.

output_index integer

The index of the output item in the response. This value is always 0.

content_index integer

The index of the content part in the content array. This value is always 0.

part object

The content part information.

Properties

type string

The type of the content part.

text string

The text of the content part.
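
The added and done events for a content part share item_id and content_index, so a client can keep one record per part and let the done event overwrite the empty placeholder from the added event. This is a minimal sketch under that assumption; apply_content_part and the parts dictionary are hypothetical names.

```python
CONTENT_PART_EVENTS = (
    "response.content_part.added",
    "response.content_part.done",
)


def apply_content_part(parts, event):
    """Store the latest part, keyed by (item_id, content_index)."""
    if event["type"] not in CONTENT_PART_EVENTS:
        return
    key = (event["item_id"], event["content_index"])
    # Copy the part so later changes to the event do not affect stored state.
    parts[key] = dict(event["part"])
```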