Server events - Alibaba Cloud Model Studio - Alibaba Cloud Documentation Center

This topic describes the server-side events for the qwen3-livetranslate-flash-realtime API.

For more information, see Real-time audio and video translation - Qwen.

error

An error message that the server returns.

event_id string

The unique identifier for this event.

{
  "event_id": "event_RoUu4T8yExPMI37GKwaOC",
  "type": "error",
  "error": {
    "type": "invalid_request_error",
    "code": "invalid_value",
    "message": "Invalid modalities: ['audio']. Supported combinations are: ['text'] and ['audio', 'text'].",
    "param": "session.modalities"
  }
}

type string

The event type. The value is always error.

error object

Detailed information about the error.

Properties

type string

The error type.

code string

The error code.

message string

The error message.

param string

The parameter that is related to the error, such as session.modalities.
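A client usually checks the type field of every incoming message and reports error events to the user or to logs. The following Python sketch assumes that each server message arrives as a JSON text frame. The describe_error function is a hypothetical helper, not part of any SDK.

```python
import json
from typing import Optional


def describe_error(raw_message: str) -> Optional[str]:
    """Return a readable summary if the message is an error event, else None."""
    event = json.loads(raw_message)
    if event.get("type") != "error":
        return None
    err = event.get("error", {})
    summary = f"{err.get('type')} ({err.get('code')}): {err.get('message')}"
    # param is only present when the error relates to a specific field.
    if err.get("param"):
        summary += f" [param: {err['param']}]"
    return summary
```

For the example event, the summary starts with `invalid_request_error (invalid_value):` and ends with `[param: session.modalities]`.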

session.created

After a client connects, the server returns this event first. This event contains the default configurations for the connection.

event_id string

The unique identifier for this event.

{
    "event_id": "event_QxBGpjBDmDDQQWDtrqBKB",
    "type": "session.created",
    "session": {
        "id": "sess_OozZ1vtbPt2muDflHODIH",
        "object": "realtime.session",
        "model": "qwen3-livetranslate-flash-realtime",
        "modalities": [
            "text",
            "audio"
        ],
        "voice": "Cherry",
        "input_audio_format": "pcm16",
        "output_audio_format": "pcm24",
        "translation": {
            "language": "en"
        }
    }
}

type string

The event type. The value is always session.created.

session object

The session configuration.

Properties

id string

The unique identifier for the session.

object string

The value is always realtime.session.

model string

The model in use.

modalities array

The output modality settings for the model.

voice string

The voice for the audio generated by the model.

input_audio_format string

The format of the input audio. The value is always pcm16.

output_audio_format string

The format of the output audio. The value is always pcm24.

translation object (Optional)

The translation configuration.

Properties

language string (Optional)

The target language for translation.
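The following Python sketch reads the session object of a session.created event into a typed structure, so that later code can check, for example, whether audio output is enabled. The SessionConfig class is a hypothetical helper. It keeps only the fields documented above and treats translation.language as optional.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SessionConfig:
    """Subset of the session object carried by session.created."""
    session_id: str
    model: str
    modalities: List[str] = field(default_factory=list)
    voice: Optional[str] = None
    input_audio_format: Optional[str] = None
    output_audio_format: Optional[str] = None
    target_language: Optional[str] = None  # translation.language, if present

    @classmethod
    def from_event(cls, event: dict) -> "SessionConfig":
        session = event["session"]
        translation = session.get("translation") or {}
        return cls(
            session_id=session["id"],
            model=session["model"],
            modalities=list(session.get("modalities", [])),
            voice=session.get("voice"),
            input_audio_format=session.get("input_audio_format"),
            output_audio_format=session.get("output_audio_format"),
            target_language=translation.get("language"),
        )

    @property
    def has_audio_output(self) -> bool:
        return "audio" in self.modalities
```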

session.updated

After receiving a session.update request, the server returns this event if the request is successful. If an error occurs, the server returns an error event.

event_id string

The unique identifier for this event.

{
    "event_id": "event_QxBGpjBDmDDQQWDtrqBKB",
    "type": "session.updated",
    "session": {
        "id": "sess_OozZ1vtbPt2muDflHODIH",
        "object": "realtime.session",
        "model": "qwen3-livetranslate-flash-realtime",
        "modalities": [
            "text",
            "audio"
        ],
        "voice": "Ethan",
        "input_audio_format": "pcm16",
        "output_audio_format": "pcm24",
        "translation": {
            "language": "en"
        }
    }
}

type string

The event type. The value is always session.updated.

session object

The session configuration.

Properties

id string

The unique identifier for the session.

object string

The value is always realtime.session.

model string

The model in use.

modalities array

The output modality settings for the model.

voice string

The voice for the audio generated by the model.

input_audio_format string

The format of the input audio. The value is always pcm16.

output_audio_format string

The format of the output audio. The value is always pcm24.

translation object (Optional)

The translation configuration.

Properties

language string (Optional)

The target language for translation.
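Because session.updated echoes the session configuration after a successful session.update request, a client can compare it against the settings it asked for. This Python sketch assumes the requested settings are held in a dictionary keyed by the same field names as the session object, such as voice or translation. The unapplied_settings function is hypothetical. Lists such as modalities are compared without regard to order, because the error example lists ['audio', 'text'] while the session examples return ["text", "audio"].

```python
from typing import Any, Dict, List


def _same(requested: Any, actual: Any) -> bool:
    # Lists such as modalities are compared without regard to order.
    if isinstance(requested, list) and isinstance(actual, list):
        return sorted(requested) == sorted(actual)
    return requested == actual


def unapplied_settings(requested: Dict[str, Any], updated_event: Dict[str, Any]) -> List[str]:
    """Return the names of requested session fields that session.updated does not echo back."""
    if updated_event.get("type") != "session.updated":
        raise ValueError(f"expected session.updated, got {updated_event.get('type')!r}")
    session = updated_event.get("session", {})
    return [name for name, value in requested.items() if not _same(value, session.get(name))]
```

An empty list means every requested setting was applied as sent.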

session.finished

This event indicates that the session is finished and all audio translations in the current session are complete.

The server sends this event only after the client requests to end the session (see Client events). After receiving this event, the client can disconnect.

event_id string

The unique identifier for this event.

{
    "event_id": "event_xxx",
    "type": "session.finished"
}

type string

The event type. The value is always session.finished.
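After asking the server to end the session, a client can keep reading messages until session.finished arrives and only then disconnect, so that no pending translation results are lost. The Python sketch below works on any iterable of JSON text frames, which is an assumption about how your WebSocket library delivers messages. The collect_until_finished function is a hypothetical helper.

```python
import json
from typing import Iterable, List


def collect_until_finished(raw_messages: Iterable[str]) -> List[dict]:
    """Collect parsed server events until session.finished arrives.

    raw_messages is any iterable of JSON text frames, for example the frames
    received from the connection after the client has asked to end the session.
    Returns the events received before session.finished.
    """
    events = []
    for raw in raw_messages:
        event = json.loads(raw)
        if event.get("type") == "session.finished":
            return events  # All translations are complete; safe to disconnect.
        events.append(event)
    raise ConnectionError("stream ended before session.finished was received")
```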

response.created

The server returns this event when it starts generating a new model response.

event_id string

The unique identifier for this event.

{
    "event_id": "event_L8hHVI5jYis6BzAjnPWJh",
    "type": "response.created",
    "response": {
        "id": "resp_P79OOMs8LnrXVpiIHUCKR",
        "object": "realtime.response",
        "conversation_id": "conv_UFClXtYkRkFXrs48y8pmK",
        "status": "in_progress",
        "modalities": [
            "text",
            "audio"
        ],
        "voice": "Cherry",
        "output_audio_format": "pcm24",
        "output": []
    }
}

type string

The event type. The value is always response.created.

response object

The response object.

Properties

id string

The unique identifier for the response.

conversation_id string

The unique identifier for the current conversation.

object string

The object type. For this event, the value is always realtime.response.

status string

The response status. Valid values:

completed
failed
in_progress
incomplete

modalities array

The output modalities of the response.

voice string

The voice of the generated audio.

output_audio_format string

The format of the output audio. The value is fixed to pcm24.

output array

The list of output items. This array is currently empty in this event.

response.done

The server returns this event after the response is generated. The response object in the event contains all output items except for the raw audio data.

event_id string

The unique identifier for this event.

{
  "event_id": "event_CNea8oXNipVanSg2VIzkO",
  "type": "response.done",
  "response": {
    "id": "resp_TfhYTqej692vsGA2jNEtH",
    "object": "realtime.response",
    "conversation_id": "conv_ZtyLfKVm8XqLwYRlsuDih",
    "status": "completed",
    "modalities": [
      "text",
      "audio"
    ],
    "voice": "Cherry",
    "output_audio_format": "pcm24",
    "output": [
      {
        "id": "item_MKtkMwN9RtcyE9eJShyWy",
        "object": "realtime.item",
        "type": "message",
        "status": "completed",
        "role": "assistant",
        "content": [
          {
            "type": "audio",
            "transcript": "Hello? "
          }
        ]
      }
    ],
    "usage": {
      "total_tokens": 56,
      "input_tokens": 47,
      "output_tokens": 9,
      "input_tokens_details": {
        "text_tokens": 20,
        "audio_tokens": 27
      },
      "output_tokens_details": {
        "text_tokens": 2,
        "audio_tokens": 7
      }
    }
  }
}

type string

The event type. The value is always response.done.

response object

The response object.

Properties

id string

The unique identifier for the response.

conversation_id string

The unique identifier for the current conversation.

object string

The object type. For this event, the value is always realtime.response.

status string

The status of the response.

modalities array

The output modalities of the response.

voice string

The voice used for the audio generated by the model.

output_audio_format string

The format of the output audio. The value is always pcm24.

output array

The list of output items in the response.

Properties

id string

The unique identifier for the output item.

type string

The type of the output item. The value is currently always message.

object string

The object type of the output item. The value is currently always realtime.item.

status string

The status of the output item.

role string

The role of the message sender, such as assistant.

content array

The content of the output item.

Properties

type string

The type of the output content. The value is text for plain text output and audio when the output includes audio.

text string

The text content of the output.

transcript string

The text transcription of the audio content.

usage object

The token consumption information for this response.
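The following Python sketch pulls the final text and token counts out of a response.done event. It reads transcript for audio content parts and falls back to text for text-only output, as described for the content properties above. In the example event, the counts add up as follows: input_tokens 47 = 20 text + 27 audio, output_tokens 9 = 2 text + 7 audio, and total_tokens 56 = 47 + 9. The summarize_response_done function is hypothetical.

```python
from typing import Dict, List


def summarize_response_done(event: Dict) -> Dict:
    """Extract the status, final text, and token usage from a response.done event."""
    response = event["response"]
    texts: List[str] = []
    for item in response.get("output", []):
        for part in item.get("content", []):
            # Audio parts carry their text in "transcript"; text parts in "text".
            texts.append(part.get("transcript") or part.get("text") or "")
    usage = response.get("usage", {})
    return {
        "status": response.get("status"),
        "text": "".join(texts),
        "input_tokens": usage.get("input_tokens", 0),
        "output_tokens": usage.get("output_tokens", 0),
        "total_tokens": usage.get("total_tokens", 0),
    }
```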

response.text.text

The server returns this event when the output modality is text-only and the model generates text incrementally.

event_id string

A unique identifier for the event.

{
    "event_id": "event_B1lIeyOXR7qJMEExbqtTG",
    "type": "response.text.text",
    "response_id": "resp_B1lIdtjF4Noqpn5NOjznj",
    "item_id": "item_B1lIdJsAJlJiFs8ztWpJt",
    "output_index": 0,
    "content_index": 0,
    "text": "How are"
}

type string

The type of the event. The value is always response.text.text.

text string

The incremental text that is returned.

response_id string

The response ID.

item_id string

A unique identifier for the message item.

output_index integer

Currently, the value is always 0.

content_index integer

Currently, the value is always 0.

response.text.done

The server returns this event when the model finishes generating text for a text-only output.

The server also returns this event if the response is interrupted, incomplete, or canceled.

event_id string

The unique identifier for this event.

{
    "event_id": "event_B1lIeE2Nac33zn5V7h2mm",
    "type": "response.text.done",
    "response_id": "resp_B1lIdtjF4Noqpn5NOjznj",
    "item_id": "item_B1lIdJsAJlJiFs8ztWpJt",
    "output_index": 0,
    "content_index": 0,
    "text": "How can I assist you today?"
}

type string

The event type. The value is always response.text.done.

response_id string

The unique identifier for the response.

item_id string

The unique identifier for the message item.

output_index integer

The value is currently always 0.

content_index integer

The value is currently always 0.

text string

The complete text output from the model.
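For text-only output, a client can append each response.text.text chunk to a per-item buffer for live display and then replace the buffer with the complete text from response.text.done. This Python sketch keys the buffers by item_id. The TextStream class is a hypothetical helper, not part of any SDK.

```python
from collections import defaultdict
from typing import Dict, Optional


class TextStream:
    """Accumulate response.text.text chunks per item until response.text.done."""

    def __init__(self) -> None:
        self._partial: Dict[str, str] = defaultdict(str)
        self.final: Dict[str, str] = {}

    def handle(self, event: dict) -> Optional[str]:
        """Return the text to display for the event's item, or None for other events."""
        kind = event.get("type")
        item_id = event.get("item_id")
        if kind == "response.text.text":
            self._partial[item_id] += event.get("text", "")
            return self._partial[item_id]
        if kind == "response.text.done":
            # The done event carries the complete text, so prefer it over the
            # locally accumulated chunks.
            self.final[item_id] = event.get("text", "")
            self._partial.pop(item_id, None)
            return self.final[item_id]
        return None
```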

response.audio.delta

The server returns this event when the output modality includes audio and the model incrementally generates audio data.

event_id string

A unique identifier for the event.

{
    "event_id": "event_B1osWMZBtrEQbiIwW0qHQ",
    "type": "response.audio.delta",
    "response_id": "resp_P79OOMs8LnrXVpiIHUCKR",
    "item_id": "item_OFaPGtzfWCPyGzxnuEX9i",
    "output_index": 0,
    "content_index": 0,
    "delta": "UklGRnoGAABXQVZFZm10IBAAAAAB..."
}

type string

The event type. The value is always response.audio.delta.

response_id string

A unique identifier for the response.

item_id string

A unique identifier for the message item.

output_index integer

The value is always 0.

content_index integer

The value is always 0.

delta string

The incremental audio data that is output by the model. The data is Base64-encoded.
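Because each response.audio.delta event carries only a Base64-encoded fragment, the client decodes the fragments and keeps them in order. The Python sketch below buffers the decoded bytes per item_id using the standard base64 module. AudioCollector is a hypothetical name. A client could instead pass each decoded chunk straight to an audio player for low-latency playback.

```python
import base64
from collections import defaultdict
from typing import Dict


class AudioCollector:
    """Decode response.audio.delta chunks and buffer them per output item."""

    def __init__(self) -> None:
        self.buffers: Dict[str, bytearray] = defaultdict(bytearray)

    def handle(self, event: dict) -> int:
        """Append decoded audio for a delta event; return the number of bytes added."""
        if event.get("type") != "response.audio.delta":
            return 0
        chunk = base64.b64decode(event["delta"])
        self.buffers[event["item_id"]].extend(chunk)
        return len(chunk)
```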

response.audio.done

If the output modality includes audio, the server returns this event once audio generation is complete.

The server also returns this event if the response is interrupted, incomplete, or canceled.

This event does not contain the complete audio data.

event_id string

The unique identifier for this event.

{
    "event_id": "event_B1osWMWoDRYyITDyNYcBu",
    "type": "response.audio.done",
    "response_id": "resp_P79OOMs8LnrXVpiIHUCKR",
    "item_id": "item_OFaPGtzfWCPyGzxnuEX9i",
    "output_index": 0,
    "content_index": 0
}

type string

The event type. This is always response.audio.done.

response_id string

The unique identifier for the response.

item_id string

The unique identifier for the message item.

output_index integer

The value is always 0.

content_index integer

The value is always 0.
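Because response.audio.done does not carry the audio, a client that wants a playable file must wrap the PCM bytes it collected from response.audio.delta events itself. The Python sketch below uses the standard wave module. It assumes that pcm24 means mono, 16-bit PCM at 24,000 Hz. This page does not state those parameters, so verify them before relying on the output. The pcm_to_wav function is hypothetical.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int = 24000,
               channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw PCM samples in a WAV container and return the file contents.

    The defaults (24 kHz, mono, 16-bit samples) are an assumption about the
    pcm24 output format; change them if the audio plays back incorrectly.
    """
    frame_size = channels * sample_width
    if len(pcm) % frame_size:
        raise ValueError("PCM data does not contain a whole number of frames")
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buffer.getvalue()
```

To save the result, write the returned bytes to a file opened in binary mode, for example `translation.wav`.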

conversation.item.input_audio_transcription.text

When the input_audio_transcription.model parameter is configured, the server streams the speech recognition results of the input audio as text in the original source language.

event_id string

The unique identifier for this event.

{
    "event_id": "event_xxx",
    "type": "conversation.item.input_audio_transcription.text",
    "item_id": "item_xxx",
    "content_index": 0,
    "text": "",
    "stash": "The weather is really nice today",
    "language": "zh"
}

type string

The event type. The value is always conversation.item.input_audio_transcription.text.

item_id string

The unique identifier for the message item.

content_index integer

The value is currently always 0.

text string

The confirmed recognition text.

stash string

The recognition text that is pending confirmation. This text may be corrected by subsequent events.

language string

The detected source language.
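A live source-language caption can show the confirmed text normally and mark the stash as provisional, because later events may correct it. The Python sketch below renders a single event and wraps the stash in brackets so that a UI can style it differently. This page does not state whether text accumulates across events, so the sketch assumes that the caption from the latest event replaces the previous one. source_caption is a hypothetical helper.

```python
from typing import Optional


def source_caption(event: dict) -> Optional[str]:
    """Build a live caption from an input_audio_transcription.text event.

    Confirmed text is shown as-is; the stash, which later events may still
    correct, is wrapped in brackets.
    """
    if event.get("type") != "conversation.item.input_audio_transcription.text":
        return None
    confirmed = event.get("text", "")
    pending = event.get("stash", "")
    return confirmed + (f"[{pending}]" if pending else "")
```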

conversation.item.input_audio_transcription.completed

When the input_audio_transcription.model parameter is configured, the server returns this event after speech recognition is complete. It contains the final, complete recognition result.

event_id string

The unique identifier for this event.

{
    "event_id": "event_xxx",
    "type": "conversation.item.input_audio_transcription.completed",
    "item_id": "item_xxx",
    "content_index": 0,
    "transcript": "The weather is really nice today, let's go for a walk in the park.",
    "language": "zh"
}

type string

The event type. This is always conversation.item.input_audio_transcription.completed.

item_id string

The unique identifier for the message item.

content_index integer

This is currently always 0.

transcript string

The complete speech recognition result in the original source language.

language string

The detected source language.

response.audio_transcript.text

When the output modality includes audio, the server returns this event to stream the translation text in real time.

event_id string

The unique identifier for this event.

{
  "event_id": "event_xxx",
  "type": "response.audio_transcript.text",
  "response_id": "resp_xxx",
  "item_id": "item_xxx",
  "output_index": 0,
  "content_index": 0,
  "text": "Hello,",
  "stash": " who are you?"
}

type string

The type of the event. The value is always response.audio_transcript.text.

response_id string

The unique identifier for the response.

item_id string

The unique identifier for the message item.

output_index integer

Currently, the value is always 0.

content_index integer

Currently, the value is always 0.

text string

The confirmed translation text segment.

stash string

The provisional translation text that has not yet been confirmed. Concatenate it with text to form the current provisional translation. The server keeps updating text and stash through response.audio_transcript.text events until it sends a response.audio_transcript.done event, whose transcript field contains the complete, final translation.
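The following Python sketch keeps a provisional caption per item by concatenating text and stash, and switches to the final transcript when response.audio_transcript.done arrives. It assumes that each response.audio_transcript.text event is a complete snapshot, which is one reading of the description of stash above. If your stream delivers text as separate confirmed segments instead, accumulate them. TranslationCaptions is a hypothetical class.

```python
from typing import Dict


class TranslationCaptions:
    """Track translated captions per item from response.audio_transcript.* events.

    Assumes each response.audio_transcript.text event is a snapshot, so the
    preview is text + stash from the latest event.
    """

    def __init__(self) -> None:
        self.preview: Dict[str, str] = {}
        self.final: Dict[str, str] = {}

    def handle(self, event: dict) -> None:
        kind = event.get("type")
        item_id = event.get("item_id")
        if kind == "response.audio_transcript.text":
            self.preview[item_id] = event.get("text", "") + event.get("stash", "")
        elif kind == "response.audio_transcript.done":
            # The done event carries the complete, final translation.
            self.final[item_id] = event.get("transcript", "")
            self.preview.pop(item_id, None)
```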

response.audio_transcript.done

The server returns this event when the output modality includes audio and the model finishes generating the translated text.

event_id string

The unique identifier for this event.

{
    "event_id": "event_VN4Q4GJugLcc1S23viW8E",
    "type": "response.audio_transcript.done",
    "response_id": "resp_P79OOMs8LnrXVpiIHUCKR",
    "item_id": "item_JvJauNH2CTXb1D9WV6pD4",
    "output_index": 0,
    "content_index": 0,
    "transcript": "How can I assist you today?"
}

type string

The event type. This is always response.audio_transcript.done.

response_id string

The unique identifier for the response.

item_id string

The unique identifier for the message item.

output_index integer

This is currently always 0.

content_index integer

This is currently always 0.

transcript string

The complete translated text.

response.output_item.added

The server returns this event when a new output item is created while generating a response.

event_id string

The unique identifier for this event.

{
    "event_id": "event_B4O5yPt3Gjnjy5eYH3plG",
    "type": "response.output_item.added",
    "response_id": "resp_P79OOMs8LnrXVpiIHUCKR",
    "output_index": 0,
    "item": {
        "id": "item_OFaPGtzfWCPyGzxnuEX9i",
        "object": "realtime.item",
        "type": "message",
        "status": "in_progress",
        "role": "assistant",
        "content": []
    }
}

type string

The event type. The value is always response.output_item.added.

response_id string

The unique identifier for the response.

output_index integer

The value is currently always 0.

item object

Information about the output item.

Properties

id string

The unique identifier for the output item.

type string

The value is always message.

object string

The value is always realtime.item.

status string

The status of the output item.

role string

The role of the message.

content array

The content parts of the message.

response.output_item.done

The server returns this event when an output item is complete.

event_id string

The unique identifier for this event.

{
    "event_id": "event_XkiwbYTBC9Wcdwy6uYJ2G",
    "type": "response.output_item.done",
    "response_id": "resp_P79OOMs8LnrXVpiIHUCKR",
    "output_index": 0,
    "item": {
        "id": "item_JvJauNH2CTXb1D9WV6pD4",
        "object": "realtime.item",
        "type": "message",
        "status": "completed",
        "role": "assistant",
        "content": [
            {
                "type": "audio",
                "text": "Hello, I am a large language model developed by Alibaba Cloud. My name is Qwen. How can I help you?"
            }
        ]
    }
}

type string

The event type. The value is always response.output_item.done.

response_id string

The unique identifier for the response.

output_index integer

The value is currently always 0.

item object

Information about the output item.

Properties

id string

The unique identifier for the output item.

object string

The value is always realtime.item.

type string

The value is always message.

status string

The status of the output item.

role string

The role of the message sender.

content array

The content parts of the message. Each part has a type, such as audio, and the text of that part.
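A client can follow each output item from response.output_item.added (status in_progress) to response.output_item.done (status completed). This Python sketch records the status per item id and, on completion, joins the text of the content parts. It reads text first and falls back to transcript, because the examples on this page use text in response.output_item.done and transcript in response.done. OutputItems is a hypothetical class.

```python
from typing import Dict


class OutputItems:
    """Track output items from response.output_item.added to response.output_item.done."""

    def __init__(self) -> None:
        self.status: Dict[str, str] = {}
        self.text: Dict[str, str] = {}

    def handle(self, event: dict) -> None:
        kind = event.get("type")
        if kind not in ("response.output_item.added", "response.output_item.done"):
            return
        item = event["item"]
        self.status[item["id"]] = item.get("status", "")
        if kind == "response.output_item.done":
            # Content parts may expose their text as "text" or "transcript".
            self.text[item["id"]] = "".join(
                part.get("text") or part.get("transcript") or ""
                for part in item.get("content", [])
            )
```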

response.content_part.added

The server returns this event when a new content part is added to an output item.

event_id string

The unique ID of the event.

{
    "event_id": "event_J2UixwYKZsXg7c9YXZetL",
    "type": "response.content_part.added",
    "response_id": "resp_P79OOMs8LnrXVpiIHUCKR",
    "item_id": "item_OFaPGtzfWCPyGzxnuEX9i",
    "output_index": 0,
    "content_index": 0,
    "part": {
        "type": "audio",
        "text": ""
    }
}

type string

The type of the event. The value is always response.content_part.added.

response_id string

The unique ID of the response.

item_id string

The unique ID of the message item.

output_index integer

The value is always 0.

content_index integer

The value is always 0.

part object

Information about the content part.

Properties

type string

The type of the content part.

text string

The text of the content part.

response.content_part.done

The server returns this event after a new content part is completely output.

event_id string

The unique identifier for this event.

{
    "event_id": "event_VN4Q4GJugLcc1S23viW8E",
    "type": "response.content_part.done",
    "response_id": "resp_P79OOMs8LnrXVpiIHUCKR",
    "item_id": "item_JvJauNH2CTXb1D9WV6pD4",
    "output_index": 0,
    "content_index": 0,
    "part": {
        "type": "audio",
        "text": "Hello, I am a large language model developed by Alibaba Cloud. My name is Qwen. How can I help you?"
    }
}

type string

The event type. This is always response.content_part.done.

response_id string

The unique identifier for the response.

item_id string

The unique identifier for the message item.

output_index integer

The value is always 0.

content_index integer

The value is always 0.

part object

Information about the content part.

Properties

type string

The type of the content part.

text string

The text of the content part.
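Every server event on this page carries a type field, so a client can route events through a single table of handlers and ignore the ones it does not need, such as the content_part events for a caption-only client. The Python sketch below shows one way to do this. EventRouter and its methods are hypothetical, not part of any SDK.

```python
import json
from typing import Callable, Dict

Handler = Callable[[dict], None]


class EventRouter:
    """Route parsed server events to handlers registered by event type."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}
        self.unhandled: Dict[str, int] = {}

    def on(self, event_type: str, handler: Handler) -> None:
        self._handlers[event_type] = handler

    def dispatch(self, raw_message: str) -> None:
        event = json.loads(raw_message)
        event_type = event.get("type", "")
        handler = self._handlers.get(event_type)
        if handler is None:
            # Count events without a handler instead of failing.
            self.unhandled[event_type] = self.unhandled.get(event_type, 0) + 1
            return
        handler(event)
```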