Alibaba Cloud Model Studio: Client events

Last Updated: Dec 25, 2025

This topic describes the client events for the Qwen-TTS Realtime API.

Reference: Real-time speech synthesis - Qwen.

Client events

session.update

session.update is the first event a client sends over a new WebSocket connection. When the connection is established, the server returns the default input and output configurations. The client sends session.update immediately afterward to override these defaults. The server validates the parameters in the event. If any parameter is invalid, the server returns an error. Otherwise, the server updates the session configuration and returns the complete configuration.

| Parameter | Type | Description |
| --- | --- | --- |
| type | string | The event type. The value is fixed to session.update. |
| event_id | string | The identifier for this event. |
| session | object | The session configuration. |
| session.mode | string | The interaction mode. Valid values: server_commit (default), commit. |
| session.voice | string | The voice used for speech synthesis. For more information, see Supported voices. System voices and custom voices are supported. System voices are available only for Qwen3-TTS-Flash-Realtime and Qwen-TTS-Realtime. For voice samples, see Supported voices. |
| session.language_type | string | Specifies the language of the synthesized audio. Default value: Auto. Auto: Use this value when the language of the text is uncertain or the text contains multiple languages. The model automatically matches the pronunciation for segments in different languages, but cannot guarantee perfect accuracy. Specific language: Specify the language when the text is in a single language. This significantly improves synthesis quality and typically yields better results than Auto. Valid values include the following: Chinese, English, German, Italian, Portuguese, Spanish, Japanese, Korean, French, Russian. |
| session.response_format | string | The format of the audio output from the model. Supported formats: "pcm" (default), "wav", "mp3", "opus". Qwen-TTS-Realtime (see Supported models) supports only pcm. |
| session.sample_rate | integer | The sample rate (in Hz) of the audio output from the model. Supported sample rates: 8000, 16000, 24000 (default), 48000. Qwen-TTS-Realtime (see Supported models) supports only 24000. |
| session.speech_rate | float | The speech rate of the audio. A value of 1.0 indicates a normal speed. A value less than 1.0 indicates a slower speed, and a value greater than 1.0 indicates a faster speed. Default value: 1.0. Value range: [0.5, 2.0]. Qwen-TTS-Realtime (see Supported models) does not support this parameter. |
| session.volume | integer | The volume of the audio. Default value: 50. Value range: [0, 100]. Qwen-TTS-Realtime (see Supported models) does not support this parameter. |
| session.pitch_rate | float | The pitch of the synthesized audio. Default value: 1.0. Value range: [0.5, 2.0]. Qwen-TTS-Realtime (see Supported models) does not support this parameter. |
| session.bit_rate | integer | Specifies the bitrate (in kbps) of the audio. A higher bitrate results in better audio quality and a larger file size. This parameter is available only when the audio format (response_format) is set to opus. Default value: 128. Value range: [6, 510]. Qwen-TTS-Realtime (see Supported models) does not support this parameter. |

{
    "event_id": "event_123",
    "type": "session.update",
    "session": {
        "mode": "server_commit",
        "voice": "Cherry",
        "language_type": "Chinese",
        "response_format": "pcm",
        "sample_rate": 24000
    }
}
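
Because the server rejects a session.update event that carries invalid parameters, it can help to check values on the client before sending. The following is a minimal sketch in Python that builds the event and enforces the documented value sets and ranges. The helper name build_session_update and the "event_" plus random hex ID scheme are assumptions made for illustration and are not part of the API. The sketch does not open a WebSocket connection; it only produces the JSON text to send.

```python
import json
import uuid

# Value sets taken from the session.update parameter descriptions.
VALID_MODES = {"server_commit", "commit"}
VALID_FORMATS = {"pcm", "wav", "mp3", "opus"}
VALID_SAMPLE_RATES = {8000, 16000, 24000, 48000}


def build_session_update(voice, mode="server_commit", language_type="Auto",
                         response_format="pcm", sample_rate=24000,
                         speech_rate=None, volume=None):
    """Hypothetical helper: return a session.update event as a dict."""
    if mode not in VALID_MODES:
        raise ValueError(f"unsupported mode: {mode!r}")
    if response_format not in VALID_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    if sample_rate not in VALID_SAMPLE_RATES:
        raise ValueError(f"unsupported sample_rate: {sample_rate!r}")

    session = {
        "mode": mode,
        "voice": voice,
        "language_type": language_type,
        "response_format": response_format,
        "sample_rate": sample_rate,
    }
    # Optional parameters are included only when set, so the server
    # defaults (1.0 and 50) apply otherwise.
    if speech_rate is not None:
        if not 0.5 <= speech_rate <= 2.0:
            raise ValueError("speech_rate must be within [0.5, 2.0]")
        session["speech_rate"] = speech_rate
    if volume is not None:
        if not 0 <= volume <= 100:
            raise ValueError("volume must be within [0, 100]")
        session["volume"] = volume

    return {
        # Assumed ID scheme: any identifier string chosen by the client.
        "event_id": f"event_{uuid.uuid4().hex}",
        "type": "session.update",
        "session": session,
    }


event = build_session_update("Cherry", language_type="Chinese")
payload = json.dumps(event)  # JSON text to send as a WebSocket message
```

The sketch does not model the Qwen-TTS-Realtime restrictions (pcm only, 24000 Hz only, no speech_rate or volume). A client that targets that model would need to add those checks.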

input_text_buffer.append

This event appends text to the input text buffer. In server_commit mode, the text is appended to the text buffer on the server. In commit mode, the text is appended to the text buffer on the client.

| Parameter | Type | Description |
| --- | --- | --- |
| type | string | The event type. The value must be input_text_buffer.append. |
| event_id | string | The ID of the event. |
| text | string | The input text. |

{
  "event_id": "event_B4o9RHSTWobB5OQdEHLTo",
  "type": "input_text_buffer.append",
  "text": "Hello, I am Qwen."
}
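
When text arrives in pieces, for example from a streaming source, each piece can be sent as its own input_text_buffer.append event. The following Python sketch shows one way to turn a sequence of text chunks into events. The function name append_events, the ID scheme, and the choice to skip empty chunks are assumptions made for illustration.

```python
import json
import uuid


def append_events(chunks):
    """Hypothetical helper: yield one append event per non-empty chunk."""
    for chunk in chunks:
        if not chunk:
            continue  # skipping empty chunks is a client-side choice in this sketch
        yield {
            # Assumed ID scheme: any identifier string chosen by the client.
            "event_id": f"event_{uuid.uuid4().hex}",
            "type": "input_text_buffer.append",
            "text": chunk,
        }


# ensure_ascii=False keeps non-ASCII text (such as Chinese) readable in the JSON.
frames = [json.dumps(event, ensure_ascii=False)
          for event in append_events(["Hello, ", "", "I am Qwen."])]
```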

input_text_buffer.commit

This event commits the input text buffer, which creates a new user message item in the conversation. If the input text buffer is empty, this event generates an error.

  • In server_commit mode, this event signals the server to immediately synthesize all preceding text and stop caching text.

  • In commit mode, the client is responsible for committing the text buffer to create the user message item.

Committing the input text buffer does not create a response from the model. The server responds with an input_text_buffer.committed event.

| Parameter | Type | Description |
| --- | --- | --- |
| type | string | The event type. The value is fixed to input_text_buffer.commit. |
| event_id | string | The identifier for this event. |

{
  "event_id": "event_B4o9RHSTWobB5OQdEHLTo",
  "type": "input_text_buffer.commit"
}
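
Because committing an empty buffer generates an error, a client may want to track locally whether any text has been appended since the last commit. The following Python sketch illustrates that idea for commit mode, where the client decides when to commit. The class CommitGuard and its methods are hypothetical and exist only for this example. In server_commit mode the local count may not match the server's buffer state, so this guard is only a rough approximation there.

```python
import uuid


def _event(event_type, **fields):
    # "event_" plus random hex is an assumed ID scheme, not one the API mandates.
    return {"event_id": f"event_{uuid.uuid4().hex}", "type": event_type, **fields}


class CommitGuard:
    """Hypothetical helper: skips the commit when nothing has been appended."""

    def __init__(self):
        self.pending_chars = 0  # characters appended since the last commit

    def append(self, text):
        self.pending_chars += len(text)
        return _event("input_text_buffer.append", text=text)

    def commit(self):
        if self.pending_chars == 0:
            return None  # nothing buffered, so do not send a commit
        self.pending_chars = 0
        return _event("input_text_buffer.commit")


guard = CommitGuard()
skipped = guard.commit()          # None: no text has been appended yet
guard.append("Hello, I am Qwen.")
commit_event = guard.commit()     # a commit event ready to serialize and send
```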

session.finish

The client sends the session.finish event to indicate that no more text will be sent. The server then returns any remaining audio and closes the connection.

| Parameter | Type | Description |
| --- | --- | --- |
| type | string | The event type, which is always session.finish. |
| event_id | string | The unique identifier for the event. |

{
  "event_id": "event_2239",
  "type": "session.finish"
}
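
Taken together, the four client events form a simple sequence: session.update, one or more input_text_buffer.append events, input_text_buffer.commit, and session.finish. The following Python sketch sends that sequence in commit mode through a caller-supplied send function, so it can run without a network connection. The names new_event and run_session and the ID scheme are assumptions made for illustration. A real client would also read the server's events (such as input_text_buffer.committed and the audio output) between and after these sends, which this sketch omits.

```python
import json
import uuid


def new_event(event_type, **fields):
    # Assumed ID scheme: any identifier string chosen by the client.
    return {"event_id": f"event_{uuid.uuid4().hex}", "type": event_type, **fields}


def run_session(send, voice, texts):
    """Hypothetical helper: send a full client event sequence.

    `send` is any callable that accepts a JSON string, for example the
    send method of a WebSocket client object.
    """
    session = {"mode": "commit", "voice": voice}
    send(json.dumps(new_event("session.update", session=session)))

    texts = [text for text in texts if text]
    for text in texts:
        send(json.dumps(new_event("input_text_buffer.append", text=text),
                        ensure_ascii=False))
    if texts:
        # Committing an empty buffer generates an error, so commit only
        # when at least one chunk was appended.
        send(json.dumps(new_event("input_text_buffer.commit")))

    send(json.dumps(new_event("session.finish")))


sent = []  # stands in for a WebSocket connection in this sketch
run_session(sent.append, "Cherry", ["Hello, ", "I am Qwen."])
event_types = [json.loads(frame)["type"] for frame in sent]
```

Passing send as a parameter keeps the event logic independent of any particular WebSocket library.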