Alibaba Cloud Model Studio: Server-side events

Last Updated: Nov 15, 2025

This topic describes the server-side events for the Qwen-TTS-Realtime API.

For more information, see Real-time speech synthesis - Qwen.

Server-side events

error

The server returns this event when it encounters a client-side or server-side error.

| Parameter | Type | Description |
| --- | --- | --- |
| type | string | The event type. The value is always error. |
| error | object | The details of the error. |
| error.type | string | The error type. |
| error.code | string | The error code. |
| error.message | string | The error message. |
| error.param | string | The parameter related to the error, such as session.voice. |

{
  "event_id": "event_B2uoU7VOt1AAITsPRPH9n",
  "type": "error",
  "error": {
    "type": "invalid_request_error",
    "code": "invalid_value",
    "message": "Invalid value: 'qwen-tts'. Supported values are: 'Qwen2.5-tts'.",
    "param": "session.input_audio_transcription.model",
    "event_id": "event_123"
  }
}
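
The following is a minimal Python sketch of how a client might turn an error event into an exception. It only relies on the fields documented above. The names RealtimeError and raise_if_error are hypothetical helpers introduced for illustration and are not part of any SDK.

```python
import json
from typing import Optional


class RealtimeError(Exception):
    """Hypothetical exception type that wraps the fields of an error event."""

    def __init__(self, error_type: str, code: str, message: str,
                 param: Optional[str] = None):
        suffix = f" (param: {param})" if param else ""
        super().__init__(f"[{error_type}/{code}] {message}{suffix}")
        self.error_type = error_type
        self.code = code
        self.message = message
        self.param = param


def raise_if_error(raw_event: str) -> dict:
    """Parse one server event and raise RealtimeError if its type is "error".

    Any other event is returned unchanged as a dict.
    """
    event = json.loads(raw_event)
    if event.get("type") == "error":
        err = event.get("error", {})
        raise RealtimeError(
            err.get("type", "unknown"),
            err.get("code", "unknown"),
            err.get("message", ""),
            err.get("param"),  # may be absent, so default to None
        )
    return event
```

Calling raise_if_error on every incoming message before further processing keeps error handling in one place.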

session.created

The server returns this event immediately after a client connects. This event contains the default server configurations for the connection.

| Parameter | Type | Description |
| --- | --- | --- |
| type | string | The event type. The value is always session.created. |
| session | object | The session configuration. |
| session.id | string | The unique ID of the session. |
| session.object | string | The service name of the session. |
| session.mode | string | The response mode of the model. |
| session.model | string | The model used. |
| session.voice | string | The voice used to generate the audio. |
| session.response_format | string | The audio output format. Currently, only pcm is supported. |
| session.sample_rate | integer | The audio output sample rate. Currently, only 24000 is supported. |

{
  "event_id": "event_xxx",
  "type": "session.created",
  "session": {
    "object": "realtime.session",
    "mode": "server_commit",
    "model": "qwen-tts-realtime",
    "voice": "Cherry",
    "response_format": "pcm",
    "sample_rate": 24000,
    "id": "sess_xxx"
  }
}
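
As an illustration, the session object can be read into a small typed record so that later code (for example, audio playback setup) can use the sample rate and format. This is a sketch under the assumption that the event has already been parsed from JSON into a dict. SessionInfo and parse_session are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SessionInfo:
    """Hypothetical record holding the documented session fields."""
    session_id: str
    model: str
    voice: str
    mode: str
    response_format: str
    sample_rate: int
    language_type: Optional[str] = None  # only present in some events


def parse_session(event: dict) -> SessionInfo:
    """Extract the session configuration from a session.created or
    session.updated event."""
    if event.get("type") not in ("session.created", "session.updated"):
        raise ValueError(f"not a session event: {event.get('type')!r}")
    s = event["session"]
    return SessionInfo(
        session_id=s["id"],
        model=s["model"],
        voice=s["voice"],
        mode=s["mode"],
        response_format=s["response_format"],
        sample_rate=s["sample_rate"],
        language_type=s.get("language_type"),
    )
```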

session.updated

The server returns this event after it receives and successfully processes a session.update request from the client. If an error occurs, an error event is returned instead.

| Parameter | Type | Description |
| --- | --- | --- |
| type | string | The event type. The value is always session.updated. |
| session | object | The session configuration. |
| session.id | string | The unique ID of the session. |
| session.object | string | The service name of the session. |
| session.mode | string | The output mode of the model. |
| session.model | string | The model used. |
| session.voice | string | The voice of the generated audio. |
| session.language_type | string | The language for speech synthesis. The default value is Auto. Auto: use this value if the language of the text is uncertain or the text contains multiple languages. The model automatically detects and pronounces text segments in different languages, but the pronunciation may not be perfectly accurate. Specific language: use this option if the text is in a single language. Specifying a language significantly improves the synthesis quality and typically yields better results than Auto. Valid values include Chinese, English, German, Italian, Portuguese, Spanish, Japanese, Korean, French, and Russian. |
| session.response_format | string | The format of the audio output from the model. Currently, only pcm is supported. |
| session.sample_rate | integer | The sample rate of the audio output from the model. Currently, only 24000 is supported. |

{
  "event_id": "event_xxx",
  "type": "session.updated",
  "session": {
    "id": "sess_xxx",
    "object": "realtime.session",
    "model": "qwen-tts-realtime",
    "voice": "Cherry",
    "language_type": "Chinese",
    "mode": "commit",
    "response_format": "pcm",
    "sample_rate": 24000
  }
}
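
Because session.updated echoes the resulting session configuration, a client can compare it against the settings it asked for. The sketch below assumes the client keeps the requested settings in a dict keyed by the same field names as the session object. The helper name unapplied_settings is hypothetical.

```python
def unapplied_settings(requested: dict, updated_event: dict) -> dict:
    """Return the requested settings that the session.updated event does not
    confirm, as {field: (requested_value, confirmed_value)}."""
    confirmed = updated_event.get("session", {})
    return {
        key: (value, confirmed.get(key))
        for key, value in requested.items()
        if confirmed.get(key) != value
    }
```

An empty result means every requested field matches the configuration reported by the server.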

input_text_buffer.committed

The server returns this event after receiving an input_text_buffer.commit event from the client.

| Parameter | Type | Description |
| --- | --- | --- |
| event_id | string | The ID of the event. |
| type | string | The event type. The value is always input_text_buffer.committed. |
| item_id | string | The ID of the user message item to be created. |

{
  "event_id": "event_FC6MA88wS2oEeXkPvWsxX",
  "type": "input_text_buffer.committed",
  "item_id": ""
}

input_text_buffer.cleared

The server returns this event after receiving an input_text_buffer.clear event from the client.

| Parameter | Type | Description |
| --- | --- | --- |
| event_id | string | The ID of the event. |
| type | string | The event type. The value is always input_text_buffer.cleared. |

{
    "event_id": "event_1122",
    "type": "input_text_buffer.cleared"
}

response.created

The server returns this event when it generates a new model response.

| Parameter | Type | Description |
| --- | --- | --- |
| type | string | The event type. The value is always response.created. |
| event_id | string | The ID of the event. |
| response | object | The response object. |
| response.id | string | The unique ID of the response. |
| response.object | string | The object type. The value is always realtime.response. |
| response.status | string | The status of the response. Valid values are completed, failed, in_progress, and incomplete. |
| response.voice | string | The voice used for the generated audio. |
| response.output | array | The output of the response. This array is currently empty in this event. |

{
  "event_id": "event_IMnLqDvG6Ahhk7sWV2uOs",
  "type": "response.created",
  "response": {
    "id": "resp_USvBwHktHcz76r6GaIJUV",
    "object": "realtime.response",
    "conversation_id": "",
    "status": "in_progress",
    "voice": "Cherry",
    "output": []
  }
}

response.output_item.added

The server returns this event when a new item is added to the output.

| Parameter | Type | Description |
| --- | --- | --- |
| type | string | The event type. The value is always response.output_item.added. |
| response_id | string | The ID of the response. This ID associates all outputs from the same response. |
| output_index | integer | The index of the output item in the response. The value is currently always 0. |
| item | object | Information about the output item. |
| item.id | string | The unique ID of the output item. |
| item.object | string | The object type. The value is always realtime.item. |
| item.status | string | The status of the output item. |
| item.content | array | The content of the message. |

{
  "event_id": "event_INDGnGNulaXCrStd9ZM5X",
  "type": "response.output_item.added",
  "response_id": "resp_USvBwHktHcz76r6GaIJUV",
  "output_index": 0,
  "item": {
    "id": "item_FIrYGaNVK3rbIZqeY4QjM",
    "object": "realtime.item",
    "type": "message",
    "status": "in_progress",
    "role": "assistant",
    "content": []
  }
}

response.content_part.added

The server returns this event when a new content part is added to the output.

| Parameter | Type | Description |
| --- | --- | --- |
| type | string | The event type. The value is always response.content_part.added. |
| response_id | string | The ID of the response. |
| item_id | string | The ID of the message item. |
| output_index | integer | The index of the output item in the response. The value is currently always 0. |
| content_index | integer | The index of the content part within the response output item. The value is currently always 0. |
| part | object | The content part that was added. |
| part.type | string | The type of the content part. |
| part.text | string | The text of the content part. |

{
  "event_id": "event_DigZ95MWN36YYyyjcENoq",
  "type": "response.content_part.added",
  "response_id": "resp_USvBwHktHcz76r6GaIJUV",
  "item_id": "item_FIrYGaNVK3rbIZqeY4QjM",
  "output_index": 0,
  "content_index": 0,
  "part": {
    "type": "audio",
    "text": ""
  }
}

response.audio.delta

The server returns this event when the model incrementally generates new audio data.

| Parameter | Type | Description |
| --- | --- | --- |
| type | string | The event type. The value is always response.audio.delta. |
| response_id | string | The ID of the response. This ID associates all outputs from the same response. |
| item_id | string | The ID of the message item. This ID associates all parts of the same message item. |
| output_index | integer | The index of the output item in the response. The value is currently always 0. |
| content_index | integer | The index of the content part within the response output item. The value is currently always 0. |
| delta | string | The incremental audio data output by the model. The data is Base64-encoded. |

{
  "event_id": "event_B1osWMZBtrEQbiIwW0qHQ",
  "type": "response.audio.delta",
  "response_id": "resp_B1osWTzBb8hO0WsELHgVP",
  "item_id": "item_B1osWH81fXDoyim1T5fsF",
  "output_index": 0,
  "content_index": 0,
  "delta": "base64 audio"
}
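
Since delta is Base64-encoded, a client has to decode each chunk before it can play or store the audio. The following sketch collects decoded chunks per item_id using only the Python standard library. AudioCollector is a hypothetical helper, and the sketch assumes events arrive in order and are already parsed into dicts.

```python
import base64
from typing import Dict


class AudioCollector:
    """Hypothetical helper that accumulates decoded audio per message item."""

    def __init__(self) -> None:
        self._chunks: Dict[str, bytearray] = {}

    def add(self, event: dict) -> int:
        """Decode one response.audio.delta event and append its bytes.

        Returns the number of raw audio bytes added (0 for other events).
        """
        if event.get("type") != "response.audio.delta":
            return 0
        pcm = base64.b64decode(event["delta"])
        self._chunks.setdefault(event["item_id"], bytearray()).extend(pcm)
        return len(pcm)

    def audio_for(self, item_id: str) -> bytes:
        """Return all audio bytes collected so far for the given item."""
        return bytes(self._chunks.get(item_id, b""))
```

In a streaming player, the bytes returned by base64.b64decode could instead be written to the audio device as soon as each event arrives.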

response.content_part.done

The server returns this event when the output for a content part is complete.

| Parameter | Type | Description |
| --- | --- | --- |
| type | string | The event type. The value is always response.content_part.done. |
| response_id | string | The ID of the response. |
| item_id | string | The ID of the message item. |
| output_index | integer | The index of the output item in the response. The value is always 0. |
| content_index | integer | The index of the content part within the response output item. The value is always 0. |
| part | object | The completed content part. |
| part.type | string | The type of the content part. |
| part.text | string | The text of the content part. |

{
  "event_id": "event_Vo2YUjlYQJ4colH8nVzkU",
  "type": "response.content_part.done",
  "response_id": "resp_USvBwHktHcz76r6GaIJUV",
  "item_id": "item_FIrYGaNVK3rbIZqeY4QjM",
  "output_index": 0,
  "content_index": 0,
  "part": {
    "type": "audio",
    "text": ""
  }
}

response.output_item.done

The server returns this event when the output for an item is complete.

| Parameter | Type | Description |
| --- | --- | --- |
| type | string | The event type. The value is always response.output_item.done. |
| response_id | string | The ID of the response. |
| output_index | integer | The index of the output item in the response. The value is always 0. |
| item | object | Information about the output item. |
| item.id | string | The unique ID of the output item. |
| item.object | string | The object type. The value is always realtime.item. |
| item.status | string | The status of the output item. |
| item.content | array | The content of the message. |

{
  "event_id": "event_LO6SJRKIQ9NBayyYB8a1A",
  "type": "response.output_item.done",
  "response_id": "resp_USvBwHktHcz76r6GaIJUV",
  "output_index": 0,
  "item": {
    "id": "item_FIrYGaNVK3rbIZqeY4QjM",
    "object": "realtime.item",
    "type": "message",
    "status": "completed",
    "role": "assistant",
    "content": [
      {
        "type": "audio",
        "text": ""
      }
    ]
  }
}

response.audio.done

The server returns this event when the model finishes generating audio data.

| Parameter | Type | Description |
| --- | --- | --- |
| type | string | The event type. The value is always response.audio.done. |
| response_id | string | The ID of the response. This ID associates all outputs that belong to the same response. |
| item_id | string | The ID of the message item. This ID associates all parts of the same message item. |
| output_index | integer | The index of the output item in the response. The value is always 0. |
| content_index | integer | The index of the content part within the response output item. The value is always 0. |

{
  "event_id": "event_LZaOHPzXYMUXGBcVkBmKX",
  "type": "response.audio.done",
  "response_id": "resp_USvBwHktHcz76r6GaIJUV",
  "item_id": "item_FIrYGaNVK3rbIZqeY4QjM",
  "output_index": 0,
  "content_index": 0
}

response.done

The server returns this event when the response generation is complete. The Response object in this event contains all output items, but does not include the raw audio data that has already been returned.

| Parameter | Type | Description |
| --- | --- | --- |
| type | string | The event type. The value is always response.done. |
| response | object | The response object. |
| response.id | string | The unique ID of the response. |
| response.object | string | The object type. The value is always realtime.response. |
| response.output | array | The output of the response. |
| response.usage | object | The billing details for the speech synthesis. |
| response.usage.characters | integer | The number of billable characters for Qwen3-TTS Realtime. |
| response.usage.total_tokens | integer | The total number of tokens for the Qwen-TTS Realtime input and output (synthesized audio). |
| response.usage.input_tokens | integer | The total number of tokens for the Qwen-TTS Realtime input. |
| response.usage.output_tokens | integer | The total number of tokens for the Qwen-TTS Realtime output. |
| response.usage.input_tokens_details | object | Details of the tokens in the Qwen-TTS Realtime input. |
| response.usage.input_tokens_details.text_tokens | integer | The total number of tokens for the Qwen-TTS Realtime text input. |
| response.usage.output_tokens_details | object | Details of the tokens in the Qwen-TTS Realtime output. |
| response.usage.output_tokens_details.text_tokens | integer | The total number of tokens for the Qwen-TTS Realtime text output. |
| response.usage.output_tokens_details.audio_tokens | integer | The total number of tokens for the Qwen-TTS Realtime audio output. The audio is converted to tokens at a rate of 50 tokens per second. Audio with a duration of less than one second is counted as 50 tokens. |

Qwen3-TTS Realtime

{
    "event_id": "event_Aemy83XqHFFDDSeJIDn6N",
    "type": "response.done",
    "response": {
        "id": "resp_LFeR42yXZ9SxUAeXjmyTz",
        "object": "realtime.response",
        "conversation_id": "",
        "status": "completed",
        "modalities": [
            "text",
            "audio"
        ],
        "voice": "Cherry",
        "output": [
            {
                "id": "item_Ae1lv2XmRljRSG96L8Zm1",
                "object": "realtime.item",
                "type": "message",
                "status": "completed",
                "role": "assistant",
                "content": [
                    {
                        "type": "audio",
                        "transcript": ""
                    }
                ]
            }
        ],
        "usage": {
            "characters": 25
        }
    }
}

Qwen-TTS Realtime

{
  "event_id": "event_xxx",
  "type": "response.done",
  "response": {
    "id": "resp_xxx",
    "object": "realtime.response",
    "conversation_id": "",
    "status": "completed",
    "modalities": [
      "text",
      "audio"
    ],
    "voice": "Cherry",
    "output": [
      {
        "id": "item_FIrYGaNVK3rbIZqeY4QjM",
        "object": "realtime.item",
        "type": "message",
        "status": "completed",
        "role": "assistant",
        "content": [
          {
            "type": "audio",
            "transcript": ""
          }
        ]
      }
    ],
    "usage": {
      "total_tokens": 67,
      "input_tokens": 3,
      "output_tokens": 64,
      "input_tokens_details": {
        "text_tokens": 3
      },
      "output_tokens_details": {
        "text_tokens": 0,
        "audio_tokens": 64
      }
    }
  }
}
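
The two usage shapes shown above can be handled by a single function. The sketch below is one possible approach, assuming the event has been parsed into a dict: it reports characters when that field is present and tokens otherwise, and it estimates the audio duration from audio_tokens using the documented rate of 50 tokens per second. summarize_usage is a hypothetical name. Because audio shorter than one second is counted as 50 tokens, the estimate is only an upper bound when audio_tokens is 50.

```python
AUDIO_TOKENS_PER_SECOND = 50  # conversion rate stated for audio_tokens


def summarize_usage(event: dict) -> dict:
    """Summarize the usage object of a response.done event.

    Returns a dict with the billing unit, the billed amount, and, for
    token-based usage, an approximate audio duration in seconds.
    """
    usage = event.get("response", {}).get("usage", {})
    if "characters" in usage:
        # Character-based shape (the Qwen3-TTS Realtime example).
        return {"billing_unit": "characters", "amount": usage["characters"]}
    # Token-based shape (the Qwen-TTS Realtime example).
    audio_tokens = usage.get("output_tokens_details", {}).get("audio_tokens", 0)
    return {
        "billing_unit": "tokens",
        "amount": usage.get("total_tokens", 0),
        "approx_audio_seconds": audio_tokens / AUDIO_TOKENS_PER_SECOND,
    }
```

For the token-based example above, 64 audio tokens correspond to roughly 64 / 50 = 1.28 seconds of audio.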

session.finished

The server returns this event when the generation of all responses is complete.

| Parameter | Type | Description |
| --- | --- | --- |
| type | string | The event type. The value is always session.finished. |
| event_id | string | The ID of the event. |

{
  "event_id": "event_2239",
  "type": "session.finished"
}
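
To tie the events together, a client typically routes each incoming message by its type field and stops reading once session.finished arrives. The following is a minimal dispatch sketch. It assumes the messages are available as an iterable of JSON strings (for example, text frames read from the connection) and does not depend on any particular WebSocket library. dispatch_events and the handler mapping are hypothetical.

```python
import json
from typing import Callable, Dict, Iterable

Handler = Callable[[dict], None]


def dispatch_events(messages: Iterable[str], handlers: Dict[str, Handler]) -> int:
    """Route each server event to the handler registered for its type.

    Stops after the first session.finished event. Events without a
    registered handler are skipped. Returns the number of handled events.
    """
    handled = 0
    for raw in messages:
        event = json.loads(raw)
        event_type = event.get("type")
        handler = handlers.get(event_type)
        if handler is not None:
            handler(event)
            handled += 1
        if event_type == "session.finished":
            break  # no further events are processed for this session
    return handled
```

A handler table might map "response.audio.delta" to an audio writer and "error" to a function that logs or raises.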