This topic describes the client events for the qwen3-livetranslate-flash-realtime API.
Reference: Real-time audio and video translation - Qwen.
session.update
Send this event after establishing a WebSocket connection to update the default session configuration. The server validates parameters and returns either an error (if invalid) or the updated configuration (if valid).
| Parameter | Description |
| --- | --- |
| type | Event type. Must be set to `session.update`. |
| session | The session configuration. |
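A minimal sketch of constructing this event in Python. The envelope (`type` and `session`) follows the table above; the session fields inside it are illustrative assumptions, not the documented schema.

```python
import json


def build_session_update(session: dict) -> str:
    """Build a session.update event envelope as a JSON string."""
    return json.dumps({"type": "session.update", "session": session})


# Hypothetical configuration; consult the API reference for the real
# session field names and values.
event = build_session_update({
    "translation": {"source_language": "en", "target_language": "zh"},
})
```

Send the resulting string as a text frame over the WebSocket immediately after the connection is established.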
input_audio_buffer.append
Appends audio bytes to the input audio buffer. The service uses this buffer to detect speech and determine when to submit it.
| Parameter | Description |
| --- | --- |
| type | Event type. Must be set to `input_audio_buffer.append`. |
| audio | Base64-encoded audio data. |
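A sketch of wrapping one chunk of raw audio into this event. The Base64 encoding is required by the table above; the chunk size and send cadence shown are the client's choice, not a documented requirement.

```python
import base64
import json


def build_audio_append(pcm_bytes: bytes) -> str:
    """Wrap raw audio bytes into an input_audio_buffer.append event.

    The audio payload must be Base64-encoded before it is sent.
    """
    return json.dumps({
        "type": "input_audio_buffer.append",
        "audio": base64.b64encode(pcm_bytes).decode("ascii"),
    })


# One small illustrative PCM chunk; real clients stream chunks continuously.
event = build_audio_append(b"\x00\x01" * 160)
```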
input_image_buffer.append
Adds image data to the image buffer from a local file or real-time video stream.
Image input limits:
- Image format: JPG or JPEG. Recommended resolution for optimal performance: 480p or 720p (maximum: 1080p).
- Maximum image size: 500 KB (before Base64 encoding).
- Image data must be Base64-encoded.
- Maximum frequency: 2 images per second.
- Send at least one input_audio_buffer.append event before sending input_image_buffer.append.
| Parameter | Description |
| --- | --- |
| type | Event type. Must be set to `input_image_buffer.append`. |
| image | Base64-encoded image data. |
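A sketch that enforces the size limit from the list above before building the event. The 2-images-per-second pacing and the requirement to send audio first are left to the caller; the size constant comes from the documented limit.

```python
import base64
import json

MAX_IMAGE_BYTES = 500 * 1024  # 500 KB limit, measured before Base64 encoding


def build_image_append(jpeg_bytes: bytes) -> str:
    """Wrap a JPEG frame into an input_image_buffer.append event.

    The caller must pace sends to at most 2 images per second and must
    send at least one input_audio_buffer.append event beforehand.
    """
    if len(jpeg_bytes) > MAX_IMAGE_BYTES:
        raise ValueError("image exceeds the 500 KB limit")
    return json.dumps({
        "type": "input_image_buffer.append",
        "image": base64.b64encode(jpeg_bytes).decode("ascii"),
    })
```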
session.finish
Send this event to end the current session. Server responses:
- If speech is detected: the server completes speech recognition, sends conversation.item.input_audio_transcription.completed with the recognition result, then sends session.finished to indicate the session has ended.
- If no speech is detected: the server sends session.finished directly.

The client must disconnect after receiving session.finished.
| Parameter | Description |
| --- | --- |
| type | Event type. Must be set to `session.finish`. |
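The shutdown flow above can be sketched as follows. The event list here stands in for messages read off the WebSocket; the event type names match those in the text, while the `transcript` field name is an assumption.

```python
import json


def build_session_finish() -> str:
    """Build the session.finish event that ends the current session."""
    return json.dumps({"type": "session.finish"})


def drain_until_finished(events):
    """Consume server events (JSON strings) until session.finished arrives.

    Collects any transcription results seen along the way. The client
    must disconnect once session.finished has been received.
    """
    transcripts = []
    for raw in events:
        evt = json.loads(raw)
        if evt["type"] == "conversation.item.input_audio_transcription.completed":
            # "transcript" is an assumed field name for the recognition result.
            transcripts.append(evt.get("transcript"))
        elif evt["type"] == "session.finished":
            break  # session is over; stop reading and close the connection
    return transcripts
```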