Alibaba Cloud Model Studio: Client events

Last Updated: Mar 15, 2026

This topic describes the client events for the qwen3-livetranslate-flash-realtime API.

Reference: Real-time audio and video translation - Qwen.

session.update

Send this event after establishing a WebSocket connection to update the default session configuration. The server validates parameters and returns either an error (if invalid) or the updated configuration (if valid).

type string (Required)

Event type. Must be set to session.update.

{
  "event_id": "event_ToPZqeobitzUJnt3QqtWg",
  "type": "session.update",
  "session": {
    "modalities": [
      "text",
      "audio"
    ],
    "voice": "Cherry",
    "input_audio_format": "pcm16",
    "output_audio_format": "pcm24",
    "input_audio_transcription": {
      "model": "qwen3-asr-flash-realtime",
      "language": "zh"
    },
    "translation": {
      "language": "en"
    }
  }
}

session object (Optional)

The session configuration.

Properties

modalities array (Optional)

Output modalities. Valid values:

  • ["text"]

    Outputs text only.

  • ["text","audio"] (Default)

    Outputs text and audio.

voice string (Optional)

Voice for generated audio. Valid values: see Supported voices. Default value: Cherry.

input_audio_transcription object (Optional)

Configuration for input audio transcription: the speech recognition model and the source language.

Properties

model string (Optional)

Speech recognition model. If set, the server returns both the recognition result (the original source-language text) and the translation through the conversation.item.input_audio_transcription.text and conversation.item.input_audio_transcription.completed events.

Valid value: qwen3-asr-flash-realtime.

language string (Optional)

Source language for translation. Valid values: see Supported languages. Default value: en.

input_audio_format string (Optional)

Input audio format. Currently, this parameter can only be set to pcm16.

output_audio_format string (Optional)

Output audio format. Currently, this parameter can only be set to pcm24.

translation object (Optional)

Translation configuration.

Properties

language string (Optional)

Target language for translation. Valid values: see Supported languages. Default value: en.
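
As an illustration of the session.update fields described above, the following Python sketch assembles the event as a JSON string. The helper name build_session_update and the way event_id is generated are assumptions for this example, not part of the API; the field names and values come from the sample and parameter list in this topic. Sending the string over the WebSocket connection is left to whichever client library you use.

```python
import json
import uuid


def build_session_update(source_lang="zh", target_lang="en",
                         voice="Cherry", with_audio=True):
    """Build a session.update event as a JSON string (hypothetical helper)."""
    session = {
        # ["text", "audio"] is the documented default; ["text"] is text only.
        "modalities": ["text", "audio"] if with_audio else ["text"],
        "input_audio_format": "pcm16",
        "input_audio_transcription": {
            "model": "qwen3-asr-flash-realtime",
            "language": source_lang,
        },
        "translation": {"language": target_lang},
    }
    if with_audio:
        # Audio-related options are optional, so this sketch only sends
        # them when audio output is requested.
        session["voice"] = voice
        session["output_audio_format"] = "pcm24"
    return json.dumps({
        # Assumption: any unique string is acceptable as event_id.
        "event_id": "event_" + uuid.uuid4().hex,
        "type": "session.update",
        "session": session,
    })
```

Calling build_session_update("zh", "en") yields a payload with the same shape as the sample above. The server then either returns an error or echoes the updated configuration, so a client would normally wait for that response before streaming audio.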

input_audio_buffer.append

Appends audio bytes to the input audio buffer. The server uses this buffer to detect speech and to decide when to submit the buffered audio.

type string (Required)

Event type. Must be set to input_audio_buffer.append.

{
    "event_id": "event_xxx",
    "type": "input_audio_buffer.append",
    "audio": "xxx"
}

audio string (Required)

Base64-encoded audio data.
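
A minimal Python sketch of how raw PCM bytes could be split and wrapped in input_audio_buffer.append events. The helper names and the 3200-byte chunk size are illustrative assumptions; this topic does not specify a chunk size.

```python
import base64
import json


def chunk_pcm(pcm: bytes, chunk_bytes: int = 3200):
    """Yield fixed-size slices of raw PCM data.

    The chunk size is an arbitrary value chosen for illustration.
    """
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start:start + chunk_bytes]


def build_audio_append(pcm_chunk: bytes, event_id: str = "event_xxx") -> str:
    """Wrap one PCM chunk in an input_audio_buffer.append event."""
    return json.dumps({
        "event_id": event_id,
        "type": "input_audio_buffer.append",
        # The audio field carries Base64 text, not raw bytes.
        "audio": base64.b64encode(pcm_chunk).decode("ascii"),
    })
```

A client would typically iterate over chunk_pcm(...) and send each build_audio_append(...) result as one WebSocket text message.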

input_image_buffer.append

Appends image data to the image buffer. The image can come from a local file or a real-time video stream.

Image input limits:

  • Image format: JPG or JPEG. Recommended resolution for optimal performance: 480p or 720p (maximum: 1080p).

  • Maximum image size: 500 KB (before Base64 encoding).

  • Image data must be Base64-encoded.

  • Maximum frequency: 2 images per second.

  • Send at least one input_audio_buffer.append event before sending input_image_buffer.append.

type string (Required)

Event type. Must be set to input_image_buffer.append.

{
    "event_id": "event_xxx",
    "type": "input_image_buffer.append",
    "image": "xxx"
}

image string (Required)

Base64-encoded image data.
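
The image input limits listed above can be checked on the client before an event is built. The following Python sketch is one way to do that. The class name ImageAppendBuilder, the reading of 500 KB as 500 * 1024 bytes, and the choice to raise exceptions instead of dropping frames are all assumptions for this example. The format check only looks for the JPEG start-of-image marker (FF D8) and does not validate resolution.

```python
import base64
import json

MAX_IMAGE_BYTES = 500 * 1024  # assumption: 1 KB = 1024 bytes
MIN_INTERVAL_S = 0.5          # at most 2 images per second


class ImageAppendBuilder:
    """Builds input_image_buffer.append events and enforces the limits."""

    def __init__(self):
        self.audio_sent = False
        self.last_image_time = None

    def mark_audio_sent(self):
        """Call after the first input_audio_buffer.append has been sent."""
        self.audio_sent = True

    def build(self, jpeg_bytes: bytes, now: float,
              event_id: str = "event_xxx") -> str:
        if not self.audio_sent:
            raise RuntimeError("send input_audio_buffer.append first")
        if not jpeg_bytes.startswith(b"\xff\xd8"):
            raise ValueError("image is not JPEG data")
        # The size limit applies to the raw bytes, before Base64 encoding.
        if len(jpeg_bytes) > MAX_IMAGE_BYTES:
            raise ValueError("image exceeds 500 KB")
        if (self.last_image_time is not None
                and now - self.last_image_time < MIN_INTERVAL_S):
            raise RuntimeError("more than 2 images per second")
        self.last_image_time = now
        return json.dumps({
            "event_id": event_id,
            "type": "input_image_buffer.append",
            "image": base64.b64encode(jpeg_bytes).decode("ascii"),
        })
```

In a real client, now would usually come from time.monotonic(). Passing the timestamp in explicitly keeps the rate check easy to test.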

session.finish

Send this event to end the current session. The client must disconnect after receiving the server's session.finished event.

type string (Required)

Event type. Must be set to session.finish.

{
    "event_id": "event_xxx",
    "type": "session.finish"
}
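
A short Python sketch of the shutdown handshake: build the session.finish event, then check incoming server messages for session.finished before closing the connection. The helper names are hypothetical, and the sketch assumes that server events are JSON objects with a type field, mirroring the client events in this topic.

```python
import json


def build_session_finish(event_id: str = "event_xxx") -> str:
    """Build the session.finish event as a JSON string."""
    return json.dumps({"event_id": event_id, "type": "session.finish"})


def should_disconnect(server_message: str) -> bool:
    """Return True once the server reports session.finished.

    Assumes server events are JSON objects carrying a "type" field.
    """
    try:
        event = json.loads(server_message)
    except json.JSONDecodeError:
        return False
    return isinstance(event, dict) and event.get("type") == "session.finished"
```

After sending build_session_finish(), a client would keep reading messages and close the WebSocket as soon as should_disconnect(...) returns True.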