This topic describes the server-side events for the qwen3-livetranslate-flash-realtime API.
Reference: Real-time audio and video translation - Qwen
error
An error message that the server returns.
event_id string The unique identifier for this event. | {
"event_id": "event_RoUu4T8yExPMI37GKwaOC",
"type": "error",
"error": {
"type": "invalid_request_error",
"code": "invalid_value",
"message": "Invalid modalities: ['audio']. Supported combinations are: ['text'] and ['audio', 'text'].",
"param": "session.modalities"
}
}
|
type string The event type. The value is always error. |
error object Detailed information about the error. Properties:
  type string The error type.
  code string The error code.
  message string The error message.
  param string The parameter related to the error, such as session.modalities. |
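A client usually wants to turn an error event into something it can log or raise. The following is a minimal sketch, assuming each server message arrives as a JSON string; `RealtimeApiError` and `raise_if_error` are hypothetical names used for illustration and are not part of the API.

```python
import json


class RealtimeApiError(Exception):
    """Hypothetical exception carrying the fields of an error event."""

    def __init__(self, event: dict):
        err = event.get("error", {})
        self.event_id = event.get("event_id")
        self.error_type = err.get("type")
        self.code = err.get("code")
        self.param = err.get("param")
        super().__init__(err.get("message", "unknown error"))


def raise_if_error(raw_message: str) -> dict:
    """Parse one server message; raise if it is an error event."""
    event = json.loads(raw_message)
    if event.get("type") == "error":
        raise RealtimeApiError(event)
    return event
```

Keeping `code` and `param` on the exception lets the caller decide whether to correct a parameter and retry or to give up.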
session.created
After a client connects, the server returns this event first. This event contains the default configurations for the connection.
event_id string The unique identifier for this event. | {
"event_id": "event_QxBGpjBDmDDQQWDtrqBKB",
"type": "session.created",
"session": {
"id": "sess_OozZ1vtbPt2muDflHODIH",
"object": "realtime.session",
"model": "qwen3-livetranslate-flash-realtime",
"modalities": [
"text",
"audio"
],
"voice": "Cherry",
"input_audio_format": "pcm16",
"output_audio_format": "pcm24",
"translation": {
"language": "en"
}
}
}
|
type string The event type. The value is always session.created. |
session object The session configuration. Properties:
  id string The unique identifier for the session.
  object string The value is always realtime.session.
  model string The model in use.
  modalities array The output modality settings for the model.
  voice string The voice for the audio generated by the model.
  input_audio_format string The format of the input audio. The value is always pcm16.
  output_audio_format string The format of the output audio. The value is always pcm24.
  translation object (Optional) The translation configuration. Properties:
    language string (Optional) The target language for translation. |
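To make the session defaults easy to inspect, a client could copy them into a small typed record. This is a sketch only: `SessionInfo` and `session_from_event` are hypothetical names, and the field layout follows the example payload above, where the target language sits under `translation.language`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SessionInfo:
    """Hypothetical container for the documented session fields."""
    id: str
    model: str
    modalities: List[str]
    voice: str
    input_audio_format: str
    output_audio_format: str
    target_language: Optional[str]  # the translation object is optional


def session_from_event(event: dict) -> SessionInfo:
    """Build a SessionInfo from a session.created or session.updated event."""
    if event.get("type") not in ("session.created", "session.updated"):
        raise ValueError(f"unexpected event type: {event.get('type')!r}")
    s = event["session"]
    translation = s.get("translation") or {}
    return SessionInfo(
        id=s["id"],
        model=s["model"],
        modalities=list(s["modalities"]),
        voice=s["voice"],
        input_audio_format=s["input_audio_format"],
        output_audio_format=s["output_audio_format"],
        target_language=translation.get("language"),
    )
```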
session.updated
After receiving a session.update request, the server returns this event if the request is successful. If an error occurs, the server returns an error event.
event_id string The unique identifier for this event. | {
"event_id": "event_QxBGpjBDmDDQQWDtrqBKB",
"type": "session.updated",
"session": {
"id": "sess_OozZ1vtbPt2muDflHODIH",
"object": "realtime.session",
"model": "qwen3-livetranslate-flash-realtime",
"modalities": [
"text",
"audio"
],
"voice": "Ethan",
"input_audio_format": "pcm16",
"output_audio_format": "pcm24",
"translation": {
"language": "en"
}
}
}
|
type string The event type. The value is always session.updated. |
session object The session configuration. Properties:
  id string The unique identifier for the session.
  object string The value is always realtime.session.
  model string The model in use.
  modalities array The output modality settings for the model.
  voice string The voice for the audio generated by the model.
  input_audio_format string The format of the input audio. The value is always pcm16.
  output_audio_format string The format of the output audio. The value is always pcm24.
  translation object (Optional) The translation configuration. Properties:
    language string (Optional) The target language for translation. |
session.finished
This event indicates that the session is finished and all audio translations in the current session are complete.
The server sends this event only after the client requests the end of the session (see Client events). After receiving this event, the client can disconnect.
event_id string The unique identifier for this event. | {
"event_id": "event_xxx",
"type": "session.finished"
}
|
type string The event type. The value is always session.finished. |
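Because session.finished marks the point at which the client can disconnect, a receive loop can use it as its stop condition. The sketch below assumes `messages` is any iterable of JSON strings, for example the messages yielded by a WebSocket client; the function name is hypothetical.

```python
import json
from typing import Iterable, Iterator


def events_until_finished(messages: Iterable[str]) -> Iterator[dict]:
    """Yield parsed events and stop once session.finished has been seen."""
    for raw in messages:
        event = json.loads(raw)
        yield event
        if event.get("type") == "session.finished":
            return  # the caller can now close the connection
```

The session.finished event itself is still yielded, so downstream code can react to it before the loop ends.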
response.created
When the server generates a new model response, it returns this event.
event_id string The unique identifier for this event. | {
"event_id": "event_L8hHVI5jYis6BzAjnPWJh",
"type": "response.created",
"response": {
"id": "resp_P79OOMs8LnrXVpiIHUCKR",
"object": "realtime.response",
"conversation_id": "conv_UFClXtYkRkFXrs48y8pmK",
"status": "in_progress",
"modalities": [
"text",
"audio"
],
"voice": "Cherry",
"output_audio_format": "pcm24",
"output": []
}
}
|
type string The event type. The value is always response.created. |
response object The response object. Properties:
  id string The unique identifier for the response.
  conversation_id string The unique identifier for the current session.
  object string The object type. For this event, the value is always realtime.response.
  status string The response status. Valid values: completed, failed, in_progress, incomplete.
  modalities array The modalities of the response.
  voice string The voice of the generated audio.
  output_audio_format string The format of the output audio. The value is always pcm24.
  output array The output items. In this event, the array is currently empty. |
response.done
The server returns this event after the response is generated. The response object in the event contains all output items except for the raw audio data.
event_id string The unique identifier for this event. | {
"event_id": "event_CNea8oXNipVanSg2VIzkO",
"type": "response.done",
"response": {
"id": "resp_TfhYTqej692vsGA2jNEtH",
"object": "realtime.response",
"conversation_id": "conv_ZtyLfKVm8XqLwYRlsuDih",
"status": "completed",
"modalities": [
"text",
"audio"
],
"voice": "Cherry",
"output_audio_format": "pcm24",
"output": [
{
"id": "item_MKtkMwN9RtcyE9eJShyWy",
"object": "realtime.item",
"type": "message",
"status": "completed",
"role": "assistant",
"content": [
{
"type": "audio",
"transcript": "Hello? "
}
]
}
],
"usage": {
"total_tokens": 56,
"input_tokens": 47,
"output_tokens": 9,
"input_tokens_details": {
"text_tokens": 20,
"audio_tokens": 27
},
"output_tokens_details": {
"text_tokens": 2,
"audio_tokens": 7
}
}
}
}
|
type string The event type. The value is always response.done. |
response object The response object. Properties:
  id string The unique identifier for the response.
  conversation_id string The unique identifier for the current session.
  object string The object type. For this event, the value is always realtime.response.
  status string The status of the response.
  modalities array The modalities of the response.
  voice string The voice used for the audio generated by the model.
  output_audio_format string The format of the output audio. The value is always pcm24.
  output array The output items of the response. Properties of each item:
    id string The unique identifier for the output item.
    type string The type of the output item. The value is currently always message.
    object string The object type of the output item. The value is currently always realtime.item.
    status string The status of the output item.
    role string The role of the output item.
    content array The content of the output item. Properties of each element:
      type string The type of the output content. The value is text for plain text output and audio when the output includes audio.
      text string The text content of the output.
      transcript string The text transcription of the audio content.
  usage object The token consumption information for this response. |
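Since response.done carries every output item except the raw audio, it is a convenient place to collect the final text and token usage. The following sketch follows the example payload, in which `output` is a list of items and each content element holds either `text` or `transcript`; `summarize_response` is a hypothetical helper.

```python
def summarize_response(event: dict) -> dict:
    """Collect status, output text, and total token count from response.done."""
    response = event["response"]
    texts = []
    for item in response.get("output", []):
        for part in item.get("content", []):
            # Text output carries "text"; audio output carries "transcript".
            value = part.get("text") or part.get("transcript")
            if value:
                texts.append(value)
    usage = response.get("usage", {})
    return {
        "status": response.get("status"),
        "text": "".join(texts),
        "total_tokens": usage.get("total_tokens"),
    }
```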
response.text.text
The server returns this event when the output modality is text-only and the model generates text incrementally.
event_id string A unique identifier for the event. | {
"event_id": "event_B1lIeyOXR7qJMEExbqtTG",
"type": "response.text.text",
"response_id": "resp_B1lIdtjF4Noqpn5NOjznj",
"item_id": "item_B1lIdJsAJlJiFs8ztWpJt",
"output_index": 0,
"content_index": 0,
"text": "How are"
}
|
type string The type of the event. The value is always response.text.text. |
text string The incremental text that is returned. |
response_id string The response ID. |
item_id string A unique identifier for the message item. |
output_index integer Currently, the value is always 0. |
content_index integer Currently, the value is always 0. |
response.text.done
The server returns this event when the model finishes generating text for a text-only output.
The server also returns this event if the response is interrupted, incomplete, or canceled.
event_id string The unique identifier for this event. | {
"event_id": "event_B1lIeE2Nac33zn5V7h2mm",
"type": "response.text.done",
"response_id": "resp_B1lIdtjF4Noqpn5NOjznj",
"item_id": "item_B1lIdJsAJlJiFs8ztWpJt",
"output_index": 0,
"content_index": 0,
"text": "How can I assist you today?"
}
|
type string The event type. The value is always response.text.done. |
response_id string The unique identifier for the response. |
item_id string The unique identifier for the message item. |
output_index integer The value is currently always 0. |
content_index integer The value is currently always 0. |
text string The complete text output from the model. |
response.audio.delta
The server returns this event when the output modality includes audio and the model incrementally generates audio data.
event_id string A unique identifier for the event. | {
"event_id": "event_B1osWMZBtrEQbiIwW0qHQ",
"type": "response.audio.delta",
"response_id": "resp_P79OOMs8LnrXVpiIHUCKR",
"item_id": "item_OFaPGtzfWCPyGzxnuEX9i",
"output_index": 0,
"content_index": 0,
"delta": "UklGRnoGAABXQVZFZm10IBAAAAAB..."
}
|
type string The event type. The value is always response.audio.delta. |
response_id string A unique identifier for the response. |
item_id string A unique identifier for the message item. |
output_index integer The value is always 0. |
content_index integer The value is always 0. |
delta string The incremental audio data that is output by the model. The data is Base64-encoded. |
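Each `delta` must be Base64-decoded before the audio bytes can be used. The sketch below only decodes and concatenates the chunks; `AudioBuffer` is a hypothetical helper, and how the resulting bytes are played or saved depends on `output_audio_format`, which is outside the scope of this example.

```python
import base64


class AudioBuffer:
    """Hypothetical accumulator for decoded response.audio.delta chunks."""

    def __init__(self) -> None:
        self._chunks = bytearray()

    def feed(self, event: dict) -> int:
        """Decode one delta event and return the number of bytes added."""
        if event.get("type") != "response.audio.delta":
            return 0  # ignore every other event type
        chunk = base64.b64decode(event["delta"])
        self._chunks.extend(chunk)
        return len(chunk)

    def data(self) -> bytes:
        """Return all audio bytes received so far, in arrival order."""
        return bytes(self._chunks)
```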
response.audio.done
If the output modality includes audio, the server returns this event once audio generation is complete.
The server also returns this event if the response is interrupted, incomplete, or canceled.
This event does not contain the complete audio data.
event_id string The unique identifier for this event. | {
"event_id": "event_B1osWMWoDRYyITDyNYcBu",
"type": "response.audio.done",
"response_id": "resp_P79OOMs8LnrXVpiIHUCKR",
"item_id": "item_OFaPGtzfWCPyGzxnuEX9i",
"output_index": 0,
"content_index": 0
}
|
type string The event type. This is always response.audio.done. |
response_id string The unique identifier for the response. |
item_id string The unique identifier for the message item. |
output_index integer The value is always 0. |
content_index integer The value is always 0. |
conversation.item.input_audio_transcription.text
When the input_audio_transcription.model parameter is configured, the server streams the speech recognition results of the input audio as text in the original source language.
event_id string The unique identifier for this event. | {
"event_id": "event_xxx",
"type": "conversation.item.input_audio_transcription.text",
"item_id": "item_xxx",
"content_index": 0,
"text": "",
"stash": "The weather is really nice today",
"language": "zh"
}
|
type string The event type. The value is always conversation.item.input_audio_transcription.text. |
item_id string The unique identifier for the message item. |
content_index integer The value is currently always 0. |
text string The confirmed recognition text. |
stash string The recognition text that is pending confirmation. This text may be corrected by subsequent events. |
language string The detected source language. |
conversation.item.input_audio_transcription.completed
When the input_audio_transcription.model parameter is configured, the server returns this event after speech recognition is complete. It contains the final, complete recognition result.
event_id string The unique identifier for this event. | {
"event_id": "event_xxx",
"type": "conversation.item.input_audio_transcription.completed",
"item_id": "item_xxx",
"content_index": 0,
"transcript": "The weather is really nice today, let's go for a walk in the park.",
"language": "zh"
}
|
type string The event type. This is always conversation.item.input_audio_transcription.completed. |
item_id string The unique identifier for the message item. |
content_index integer This is currently always 0. |
transcript string The complete speech recognition result in the original source language. |
language string The detected source language. |
response.audio_transcript.text
If the output modality includes audio, the server returns this event to stream the translated text in real time.
event_id string The unique identifier for this event. | {
"event_id": "event_xxx",
"type": "response.audio_transcript.text",
"response_id": "resp_xxx",
"item_id": "item_xxx",
"output_index": 0,
"content_index": 0,
"text": "Hello,",
"stash": " who are you?"
}
|
type string The type of the event. The value is always response.audio_transcript.text. |
response_id string The unique identifier for the response. |
item_id string The unique identifier for the message item. |
output_index integer Currently, the value is always 0. |
content_index integer Currently, the value is always 0. |
text string The confirmed translation text segment. |
stash string The provisional translation text that is not yet confirmed. Concatenate it with the current text to form a temporary translation result. The server keeps updating text and stash through response.audio_transcript.text events until it sends a response.audio_transcript.done event, whose transcript field contains the complete, final translated text. |
response.audio_transcript.done
The server returns this event when the output modality includes audio and the model finishes generating text.
event_id string The unique identifier for this event. | {
"event_id": "event_VN4Q4GJugLcc1S23viW8E",
"type": "response.audio_transcript.done",
"response_id": "resp_P79OOMs8LnrXVpiIHUCKR",
"item_id": "item_JvJauNH2CTXb1D9WV6pD4",
"output_index": 0,
"content_index": 0,
"transcript": "How can I assist you today?"
}
|
type string The event type. This is always response.audio_transcript.done. |
response_id string The unique identifier for the response. |
item_id string The unique identifier for the message item. |
output_index integer This is currently always 0. |
content_index integer This is currently always 0. |
transcript string The complete translated text. |
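The interplay of `text`, `stash`, and the final `transcript` can be sketched as a small tracker. It assumes that `text` plus `stash` in each response.audio_transcript.text event forms the whole temporary result to display; if `text` turns out to carry only the newly confirmed segment, the confirmed segments would have to be accumulated instead. `TranslationPreview` is a hypothetical name.

```python
class TranslationPreview:
    """Hypothetical tracker for the latest translation text per item."""

    def __init__(self) -> None:
        self.current = {}   # item_id -> text currently shown to the user
        self.final = set()  # item_ids whose translation is complete

    def handle(self, event: dict) -> None:
        etype = event.get("type")
        item_id = event.get("item_id")
        if etype == "response.audio_transcript.text":
            # Confirmed text plus pending stash forms the temporary result.
            self.current[item_id] = event.get("text", "") + event.get("stash", "")
        elif etype == "response.audio_transcript.done":
            # The done event carries the complete, final translation.
            self.current[item_id] = event["transcript"]
            self.final.add(item_id)
```

Replacing the displayed string on every event, rather than appending to it, lets corrections to the stash overwrite earlier guesses.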
response.output_item.added
The server returns this event when a new output item is created while generating a response.
event_id string The unique identifier for this event. | {
"event_id": "event_B4O5yPt3Gjnjy5eYH3plG",
"type": "response.output_item.added",
"response_id": "resp_P79OOMs8LnrXVpiIHUCKR",
"output_index": 0,
"item": {
"id": "item_OFaPGtzfWCPyGzxnuEX9i",
"object": "realtime.item",
"type": "message",
"status": "in_progress",
"role": "assistant",
"content": []
}
}
|
type string The event type. The value is always response.output_item.added. |
response_id string The unique identifier for the response. |
output_index integer The value is currently always 0. |
item object Information about the output item. Properties:
  id string The unique identifier for the output item.
  type string The value is always message.
  object string The value is always realtime.item.
  status string The status of the output item.
  role string The role of the message.
  content array The content of the message. |
response.output_item.done
The server returns this event when an output item is complete.
event_id string The unique identifier for this event. | {
"event_id": "event_XkiwbYTBC9Wcdwy6uYJ2G",
"type": "response.output_item.done",
"response_id": "resp_P79OOMs8LnrXVpiIHUCKR",
"output_index": 0,
"item": {
"id": "item_JvJauNH2CTXb1D9WV6pD4",
"object": "realtime.item",
"type": "message",
"status": "completed",
"role": "assistant",
"content": [
{
"type": "audio",
"text": "Hello, I am a large language model developed by Alibaba Cloud. My name is Qwen. How can I help you?"
}
]
}
}
|
type string The event type. The value is always response.output_item.done. |
response_id string The unique identifier for the response. |
output_index integer The value is currently always 0. |
item object Information about the output item. Properties:
  id string The unique identifier for the output item.
  object string The value is always realtime.item.
  type string The value is always message.
  status string The status of the output item.
  role string The role of the message sender.
  content array The content of the message. |
response.content_part.added
The server returns this event when a new content part is added to the output.
event_id string The unique ID of the event. | {
"event_id": "event_J2UixwYKZsXg7c9YXZetL",
"type": "response.content_part.added",
"response_id": "resp_P79OOMs8LnrXVpiIHUCKR",
"item_id": "item_OFaPGtzfWCPyGzxnuEX9i",
"output_index": 0,
"content_index": 0,
"part": {
"type": "audio",
"text": ""
}
}
|
type string The type of the event. The value is always response.content_part.added. |
response_id string The unique ID of the response. |
item_id string The unique ID of the message item. |
output_index integer The value is always 0. |
content_index integer The value is always 0. |
part object Information about the content part. Properties:
  type string The type of the content part.
  text string The text of the content part. |
response.content_part.done
The server returns this event when a content part is complete.
event_id string The unique identifier for this event. | {
"event_id": "event_VN4Q4GJugLcc1S23viW8E",
"type": "response.content_part.done",
"response_id": "resp_P79OOMs8LnrXVpiIHUCKR",
"item_id": "item_JvJauNH2CTXb1D9WV6pD4",
"output_index": 0,
"content_index": 0,
"part": {
"type": "audio",
"text": "Hello, I am a large language model developed by Alibaba Cloud. My name is Qwen. How can I help you?"
}
}
|
type string The event type. This is always response.content_part.done. |
response_id string The unique identifier for the response. |
item_id string The unique identifier for the message item. |
output_index integer The value is always 0. |
content_index integer The value is always 0. |
part object Information about the content part. Properties:
  type string The type of the content part.
  text string The text of the content part. |
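Every event on this page carries a `type` field, so a client can route events with a simple lookup table. The following is one possible sketch; `EventRouter` and its methods are hypothetical and are not part of any SDK.

```python
from typing import Callable, Dict, List


class EventRouter:
    """Hypothetical dispatcher that routes server events by their type."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[dict], None]] = {}
        self.unhandled: List[str] = []  # types seen without a handler

    def on(self, event_type: str, handler: Callable[[dict], None]) -> None:
        """Register the handler to call for one event type."""
        self._handlers[event_type] = handler

    def dispatch(self, event: dict) -> bool:
        """Call the matching handler; return False if none is registered."""
        event_type = event.get("type")
        handler = self._handlers.get(event_type)
        if handler is None:
            self.unhandled.append(event_type)
            return False
        handler(event)
        return True
```

Recording unhandled types instead of raising keeps the client tolerant of lifecycle events it does not need, such as response.content_part.added.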