This document describes the events that the server sends to the client during a WebSocket session with the Qwen real-time speech recognition API.
error
This event is sent to the client when the server detects an error. The error can be a client-side or server-side error.
Parameter | Type | Description
type | string | The event type. Fixed to error.
event_id | string | The event ID.
error.type | string | The error type.
error.code | string | The error code.
error.message | string | The specific error message. For solutions, see Error messages.
error.param | string | The parameter related to the error.
error.event_id | string | The event ID related to the error.
{
  "event_id": "event_B2uoU7VOt1AAITsPRPH9n",
  "type": "error",
  "error": {
    "type": "invalid_request_error",
    "code": "invalid_value",
    "message": "Invalid value: 'whisper-1xx'. Supported values are: 'whisper-1'.",
    "param": "session.input_audio_transcription.model",
    "event_id": "event_123"
  }
}
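As a minimal sketch of how a client might turn the fields above into a single log line, the helper below reads only the documented error.* fields. The function name describe_error is hypothetical and not part of the API.

```python
def describe_error(event: dict) -> str:
    """Build a one-line log message from an `error` event (hypothetical helper)."""
    if event.get("type") != "error":
        raise ValueError("not an error event")
    err = event.get("error") or {}
    parts = [
        f"[{err.get('type', 'unknown')}/{err.get('code', 'unknown')}]",
        err.get("message", ""),
    ]
    # error.param and error.event_id are appended only when present.
    if err.get("param"):
        parts.append(f"(param: {err['param']})")
    if err.get("event_id"):
        parts.append(f"(related event: {err['event_id']})")
    return " ".join(parts)
```

Passing the example event above yields a message that names the error type, code, offending parameter, and related event ID.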
session.created
This is the first event that the server sends after a client successfully connects. It contains the default configurations that the server sets for the session.
Parameter | Type | Description
type | string | The event type. Fixed to session.created.
event_id | string | The event ID.
session.id | string | The ID of the current WebSocket session.
session.object | string | Fixed to realtime.session.
session.model | string | The model name.
session.modalities | array[string] | The output modalities of the model. Fixed to ["text"].
session.input_audio_format | string | The input audio format.
session.input_audio_transcription | object | Configuration parameters for speech recognition. For more information, see the input_audio_transcription parameter of the client's session.update event.
session.turn_detection | object | The Voice Activity Detection (VAD) configuration.
session.turn_detection.type | string | Fixed to server_vad.
session.turn_detection.threshold | float | The VAD detection threshold.
session.turn_detection.silence_duration_ms | integer | The silence duration, in milliseconds (ms), that the VAD uses as its sentence-break threshold.
{
  "event_id": "event_1234",
  "type": "session.created",
  "session": {
    "id": "sess_001",
    "object": "realtime.session",
    "model": "qwen3-asr-flash-realtime",
    "modalities": ["text"],
    "input_audio_format": "pcm16",
    "input_audio_transcription": null,
    "turn_detection": {
      "type": "server_vad",
      "threshold": 0.5,
      "silence_duration_ms": 200
    }
  }
}
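Because session.created carries the server's defaults, a client may want to capture them before sending any audio. The sketch below extracts a few of the documented fields into a small record. The SessionDefaults class and parse_session_created function are hypothetical names, and the code assumes turn_detection may be absent or null.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SessionDefaults:
    """Subset of the session.created payload (hypothetical container)."""
    session_id: str
    model: str
    input_audio_format: str
    vad_threshold: Optional[float]
    silence_duration_ms: Optional[int]


def parse_session_created(event: dict) -> SessionDefaults:
    if event.get("type") != "session.created":
        raise ValueError("expected a session.created event")
    session = event["session"]
    # Assumption: turn_detection can be missing or null, so fall back to {}.
    vad = session.get("turn_detection") or {}
    return SessionDefaults(
        session_id=session["id"],
        model=session["model"],
        input_audio_format=session["input_audio_format"],
        vad_threshold=vad.get("threshold"),
        silence_duration_ms=vad.get("silence_duration_ms"),
    )
```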
session.updated
The server sends this event after it successfully processes a session.update event from the client. If an error occurs during processing, the server sends an error event instead.
Parameter | Type | Description
type | string | The event type. Fixed to session.updated.
For descriptions of the other parameters, see session.created.
{
  "event_id": "event_1234",
  "type": "session.updated",
  "session": {
    "id": "sess_001",
    "object": "realtime.session",
    "model": "qwen3-asr-flash-realtime",
    "modalities": ["text"],
    "input_audio_format": "pcm16",
    "input_audio_transcription": null,
    "turn_detection": {
      "type": "server_vad",
      "threshold": 0.5,
      "silence_duration_ms": 200
    }
  }
}
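To confirm which settings a session.update actually changed, a client could compare the session object it received in session.created with the one in session.updated. The following is only a sketch: flatten and session_changes are hypothetical helpers, and nested objects are compared by dotted path.

```python
def flatten(obj: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into dotted keys, e.g. turn_detection.threshold."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def session_changes(before: dict, after: dict) -> dict:
    """Return {dotted_key: (old_value, new_value)} for every differing field."""
    old, new = flatten(before), flatten(after)
    return {
        key: (old.get(key), new.get(key))
        for key in sorted(old.keys() | new.keys())
        if old.get(key) != new.get(key)
    }
```

An empty result means the updated session matches the previous configuration.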
input_audio_buffer.speech_started
This event is sent only in VAD mode. The server sends it when it detects the start of speech in the audio buffer.
This event can occur each time audio is added to the buffer, unless the start of speech has already been detected.
Parameter | Type | Description
type | string | The event type. Fixed to input_audio_buffer.speech_started.
event_id | string | The event ID.
audio_start_ms | integer | The time, in milliseconds, from when audio was first written to the buffer during the session to when speech was detected.
item_id | string | The ID of the user message item that will be created.
{
  "event_id": "event_B1lV7FPbgTv9qGxPI1tH4",
  "type": "input_audio_buffer.speech_started",
  "audio_start_ms": 64,
  "item_id": "item_B1lV7jWLscp4mMV8hSs8c"
}
input_audio_buffer.speech_stopped
This event is sent only in VAD mode. The server sends it when it detects the end of speech in the audio buffer.
After this event is triggered, the server immediately sends a conversation.item.created event, which contains the user message item created from the audio buffer.
Parameter | Type | Description
type | string | The event type. Fixed to input_audio_buffer.speech_stopped.
event_id | string | The event ID.
audio_end_ms | integer | The elapsed time in milliseconds from the start of the session to when speech stopped.
item_id | string | The ID of the user message item that is created when speech stops.
{
  "event_id": "event_B3GGEYh2orwNIdhUagZPz",
  "type": "input_audio_buffer.speech_stopped",
  "audio_end_ms": 28128,
  "item_id": "item_B3GGE8ry4yqbqJGzrVhEM"
}
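The two VAD events can be paired to estimate how long a speech segment lasted. The sketch below rests on two assumptions that this document does not state explicitly: that speech_started and speech_stopped for the same segment carry the same item_id, and that audio_start_ms and audio_end_ms are measured on the same timeline. SpeechSegmentTracker is a hypothetical name.

```python
from typing import Dict, Optional


class SpeechSegmentTracker:
    """Pair speech_started/speech_stopped events by item_id (assumed to match)."""

    def __init__(self) -> None:
        self._starts: Dict[str, int] = {}

    def handle(self, event: dict) -> Optional[int]:
        """Return the segment duration in ms on speech_stopped, else None."""
        etype = event.get("type")
        if etype == "input_audio_buffer.speech_started":
            self._starts[event["item_id"]] = event["audio_start_ms"]
            return None
        if etype == "input_audio_buffer.speech_stopped":
            start = self._starts.pop(event["item_id"], None)
            if start is None:
                # No matching start was seen; duration is unknown.
                return None
            return event["audio_end_ms"] - start
        return None
```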
input_audio_buffer.committed
Parameter | Type | Description
type | string | The event type. Fixed to input_audio_buffer.committed.
event_id | string | The event ID.
previous_item_id | string | The ID of the previous conversation item.
item_id | string | The ID of the user conversation item to be created.
{
  "event_id": "event_1121",
  "type": "input_audio_buffer.committed",
  "previous_item_id": "msg_001",
  "item_id": "msg_002"
}
conversation.item.created
The server sends this event when a new conversation item is created.
Parameter | Type | Description
type | string | The event type. Fixed to conversation.item.created.
event_id | string | The event ID.
previous_item_id | string | The ID of the previous conversation item.
item | object | The item to add to the conversation.
item.id | string | The unique ID of the conversation item.
item.object | string | Fixed to realtime.item.
item.type | string | Fixed to message.
item.status | string | The status of the conversation item.
item.role | string | The role of the message sender.
item.content | array[object] | The content of the message.
item.content.type | string | Fixed to input_audio.
item.content.transcript | string | Fixed to null. The complete recognition result is provided in the conversation.item.input_audio_transcription.completed event.
{
  "type": "conversation.item.created",
  "event_id": "event_B3GGKbCfBZTpqFHZ0P8vg",
  "previous_item_id": "item_B3GGE8ry4yqbqJGzrVhEM",
  "item": {
    "id": "item_B3GGEPlolCqdMiVbYIf5L",
    "object": "realtime.item",
    "type": "message",
    "status": "completed",
    "role": "user",
    "content": [
      {
        "type": "input_audio",
        "transcript": null
      }
    ]
  }
}
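Since each conversation.item.created event names the item that precedes it, a client can keep a local list of item IDs in conversation order. This is a sketch under the assumption that a new item belongs directly after previous_item_id; place_item is a hypothetical helper.

```python
from typing import List


def place_item(order: List[str], event: dict) -> List[str]:
    """Insert the new item's ID after previous_item_id, or append if unknown."""
    item_id = event["item"]["id"]
    previous = event.get("previous_item_id")
    if item_id in order:
        # Ignore duplicates so replayed events do not corrupt the order.
        return order
    if previous in order:
        order.insert(order.index(previous) + 1, item_id)
    else:
        order.append(item_id)
    return order
```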
conversation.item.input_audio_transcription.text
This event is sent frequently to provide real-time recognition results.
Parameter | Type | Description
type | string | The event type. Fixed to conversation.item.input_audio_transcription.text.
event_id | string | The event ID.
item_id | string | The ID of the associated conversation item.
content_index | integer | The index of the content part that contains the audio.
text | string | The final and confirmed recognition result. This value will not change.
stash | string | The temporary recognition result. This is an intermediate value that might be corrected in subsequent events.
{
  "event_id": "event_R7Pfu8QVBfP5HmpcbEFSd",
  "type": "conversation.item.input_audio_transcription.text",
  "item_id": "item_MpJQPNQzqVRc9aC9zMwSj",
  "content_index": 0,
  "text": "",
  "stash": "Beijing's"
}
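One plausible way to render a live caption from this event is to show the confirmed text followed by the provisional stash. The sketch below assumes that stash continues directly after text, which this document does not state outright; live_caption is a hypothetical helper.

```python
TEXT_EVENT = "conversation.item.input_audio_transcription.text"


def live_caption(event: dict) -> str:
    """Concatenate confirmed `text` and provisional `stash` for display."""
    if event.get("type") != TEXT_EVENT:
        raise ValueError("expected a transcription text event")
    confirmed = event.get("text") or ""
    provisional = event.get("stash") or ""
    # Assumption: the stash picks up where the confirmed text ends.
    return confirmed + provisional
```

A user interface might style the stash portion differently, because it can still be corrected by later events.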
conversation.item.input_audio_transcription.completed
This event sends the final recognition result to the client. It marks the end of a conversation item.
Parameter | Type | Description
type | string | The event type. Fixed to conversation.item.input_audio_transcription.completed.
event_id | string | The event ID.
item_id | string | The ID of the associated conversation item.
content_index | integer | The index of the content part that contains the audio.
transcript | string | The transcription result.
{
  "event_id": "event_B3GGEjPT2sLzjBM74W6kB",
  "type": "conversation.item.input_audio_transcription.completed",
  "item_id": "item_B3GGC53jGOuIFcjZkmEQ9",
  "content_index": 0,
  "transcript": "What's the weather like today?"
}
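Because the completed event closes out a conversation item, a client can build the session transcript by collecting these events in arrival order. A minimal sketch, with collect_transcripts as a hypothetical helper that ignores all other event types:

```python
from typing import Iterable, List, Tuple

COMPLETED_EVENT = "conversation.item.input_audio_transcription.completed"


def collect_transcripts(events: Iterable[dict]) -> List[Tuple[str, str]]:
    """Return (item_id, transcript) pairs for every completed event, in order."""
    results = []
    for event in events:
        if event.get("type") == COMPLETED_EVENT:
            results.append((event["item_id"], event["transcript"]))
    return results
```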
conversation.item.input_audio_transcription.failed
The server sends this event if recognition fails for the input audio. This event is handled separately from other error events to help the client identify the specific item that failed.
Parameter | Type | Description
type | string | The event type. Fixed to conversation.item.input_audio_transcription.failed.
item_id | string | The ID of the associated conversation item.
content_index | integer | The index of the content part that contains the audio.
error.code | string | The error code.
error.message | string | The error message.
error.param | string | The parameter related to the error.
{
  "type": "conversation.item.input_audio_transcription.failed",
  "item_id": "<item_id>",
  "content_index": 0,
  "error": {
    "code": "<code>",
    "message": "<message>",
    "param": "<param>"
  }
}
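Because a failed transcription arrives as its own event type, a client can route it separately from generic error events. The sketch below dispatches raw JSON messages by their type field. EventRouter and its methods are hypothetical names, and the code assumes each WebSocket message is a single JSON object.

```python
import json
from typing import Callable, Dict


class EventRouter:
    """Dispatch server events to handlers keyed by the `type` field."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[dict], None]] = {}

    def on(self, event_type: str, handler: Callable[[dict], None]) -> None:
        self._handlers[event_type] = handler

    def dispatch(self, raw_message: str) -> bool:
        """Parse one JSON message; return True if a handler was invoked."""
        event = json.loads(raw_message)
        handler = self._handlers.get(event.get("type"))
        if handler is None:
            return False
        handler(event)
        return True


# Usage: record which items failed, keyed by item_id.
failed_items: Dict[str, str] = {}
router = EventRouter()
router.on(
    "conversation.item.input_audio_transcription.failed",
    lambda ev: failed_items.update({ev["item_id"]: ev["error"]["code"]}),
)
```

Unregistered event types return False from dispatch, so the caller can decide whether to log or ignore them.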