Translate speech in real time using the DashScope Java SDK and the qwen3-livetranslate-flash-realtime model. The SDK connects over WebSocket, streams audio input, and returns translated text and synthesized speech.
Before you begin
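- Set the API key as an environment variable:

  ```shell
  # Linux / macOS: set for the current shell
  export DASHSCOPE_API_KEY=<your-api-key>
  # Optional: persist it for future shells
  echo 'export DASHSCOPE_API_KEY=<your-api-key>' >> ~/.bashrc
  source ~/.bashrc

  # Windows (Command Prompt)
  set DASHSCOPE_API_KEY=<your-api-key>
  ```

- Review the Qwen-LiveTranslate model overview for supported languages and voices.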
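System.getenv returns null when the variable is missing, so the problem would likely surface only later, at connect(). A minimal fail-fast check could look like the sketch below. The ApiKeys class is hypothetical (not part of the SDK) and takes the environment as a map so that it can be tested without touching the real environment.

```java
import java.util.Map;

/** Hypothetical helper: fail fast when DASHSCOPE_API_KEY is missing or blank. */
final class ApiKeys {
    static final String ENV_VAR = "DASHSCOPE_API_KEY";

    private ApiKeys() {}

    /** Returns the trimmed key from the given environment map, or throws with a clear message. */
    static String require(Map<String, String> env) {
        String key = env.get(ENV_VAR);
        if (key == null || key.isBlank()) {
            throw new IllegalStateException(
                    ENV_VAR + " is not set; export it before starting the JVM");
        }
        return key.trim();
    }
}
```

In application code, call `ApiKeys.require(System.getenv())` and pass the result to `.apikey(...)`.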
Quick start
Connect, send audio, and receive translated text:
```java
import com.alibaba.dashscope.audio.omni.*;
import com.google.gson.JsonObject;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.Base64;

public class QuickStart {
    public static void main(String[] args) throws Exception {
        // 1. Build connection parameters
        OmniRealtimeParam param = OmniRealtimeParam.builder()
                .model("qwen3-livetranslate-flash-realtime")
                // International endpoint. For the Chinese mainland, use:
                // wss://dashscope.aliyuncs.com/api-ws/v1/realtime
                .url("wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime")
                .apikey(System.getenv("DASHSCOPE_API_KEY"))
                .build();

        // 2. Define a callback to handle server events
        OmniRealtimeCallback callback = new OmniRealtimeCallback() {
            @Override
            public void onOpen() {
                System.out.println("Connected");
            }

            @Override
            public void onEvent(JsonObject message) {
                String type = message.get("type").getAsString();
                if ("response.audio_transcript.done".equals(type)) {
                    System.out.println("Translation: " + message.get("transcript").getAsString());
                }
            }

            @Override
            public void onClose(int code, String reason) {
                System.out.println("Closed: " + code + " " + reason);
            }
        };

        // 3. Open a session (connect() throws NoApiKeyException, InterruptedException)
        OmniRealtimeConversation conversation = new OmniRealtimeConversation(param, callback);
        conversation.connect();

        // 4. Configure the translation target language
        OmniRealtimeConfig config = OmniRealtimeConfig.builder()
                .modalities(Arrays.asList(OmniRealtimeModality.AUDIO, OmniRealtimeModality.TEXT))
                .translationConfig(OmniRealtimeTranslationParam.builder()
                        .language("en")
                        .build())
                .build();
        conversation.updateSession(config);

        // 5. Stream Base64-encoded audio (PCM 16 kHz, 16-bit, mono) in 100 ms chunks.
        //    "input.pcm" is a placeholder for a raw PCM file in that format;
        //    100 ms = 16,000 samples/s x 2 bytes x 0.1 s = 3,200 bytes.
        byte[] pcm = Files.readAllBytes(Paths.get("input.pcm"));
        for (int off = 0; off < pcm.length; off += 3200) {
            byte[] chunk = Arrays.copyOfRange(pcm, off, Math.min(off + 3200, pcm.length));
            conversation.appendAudio(Base64.getEncoder().encodeToString(chunk));
            Thread.sleep(100); // pace the stream at roughly real time
        }

        // 6. End the session; the server finishes in-progress translation first
        conversation.endSession();
    }
}
```
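The quick start streams raw PCM (16 kHz, 16-bit, mono). If your recording is a WAV file, one option is to let the JDK's javax.sound.sampled package parse the header and check the format before streaming. This is a sketch using only the JDK: WavToPcm is a hypothetical helper, and treating the expected input as signed little-endian samples is an assumption, not something this page states.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;

/** Hypothetical helper: loads a WAV file as raw PCM in the default input format. */
final class WavToPcm {
    /** PCM_16000HZ_MONO_16BIT; signed little-endian samples are an assumption. */
    static final AudioFormat TARGET = new AudioFormat(16000f, 16, 1, true, false);

    private WavToPcm() {}

    /** Returns the sample data without the WAV header, or throws if the format does not match. */
    static byte[] load(File wav) throws IOException, UnsupportedAudioFileException {
        try (AudioInputStream in = AudioSystem.getAudioInputStream(wav)) {
            AudioFormat f = in.getFormat();
            boolean matches = f.getEncoding() == AudioFormat.Encoding.PCM_SIGNED
                    && f.getSampleRate() == TARGET.getSampleRate()
                    && f.getSampleSizeInBits() == 16
                    && f.getChannels() == 1
                    && !f.isBigEndian();
            if (!matches) {
                throw new IllegalArgumentException("Expected 16 kHz mono 16-bit PCM, got " + f);
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192]; // a multiple of the 2-byte frame size
            int n;
            while ((n = in.read(buf)) > 0) {
                out.write(buf, 0, n);
            }
            return out.toByteArray(); // AudioInputStream has already skipped the header
        }
    }
}
```

The returned bytes can then be split into 3,200-byte (100 ms) chunks, Base64-encoded, and passed to appendAudio() as in step 5.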
Configuration overview
Three builder objects control different aspects of a translation session:
```text
OmniRealtimeParam                   --> Connection: model, endpoint, API key
OmniRealtimeConfig                  --> Session: audio formats, voice, modalities
  +-- OmniRealtimeTranslationParam  --> Translation: target language, custom terminology
      (set through OmniRealtimeConfig.translationConfig)
```

Pass OmniRealtimeParam to the OmniRealtimeConversation constructor. After connect() returns, call updateSession() with an OmniRealtimeConfig, which carries the OmniRealtimeTranslationParam in its translationConfig field. If you skip updateSession(), the session runs with the service defaults.
Request parameters
OmniRealtimeParam
Set these fields with OmniRealtimeParam.builder(). They define the WebSocket connection.

| Parameter | Type | Required | Description |
|---|---|---|---|
| model | String | Yes | Model name. Set to qwen3-livetranslate-flash-realtime. |
| url | String | Yes | WebSocket endpoint. Use wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime (international) or wss://dashscope.aliyuncs.com/api-ws/v1/realtime (Chinese mainland). |
| apikey | String | No | API key for authentication. If omitted, provide the key through the DASHSCOPE_API_KEY environment variable. |
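If the same code runs against both regions, a small enum can keep the two endpoint URLs from the table in one place. This is a hypothetical helper, not an SDK type, and the region names it accepts ("intl", "cn", and so on) are arbitrary choices for illustration.

```java
import java.util.Locale;

/** Hypothetical mapping from a deployment region to its realtime WebSocket endpoint. */
enum DashScopeRegion {
    INTERNATIONAL("wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime"),
    CHINESE_MAINLAND("wss://dashscope.aliyuncs.com/api-ws/v1/realtime");

    private final String realtimeUrl;

    DashScopeRegion(String realtimeUrl) {
        this.realtimeUrl = realtimeUrl;
    }

    String realtimeUrl() {
        return realtimeUrl;
    }

    /** Parses a short region name, for example from a config file or environment variable. */
    static DashScopeRegion parse(String name) {
        switch (name.trim().toLowerCase(Locale.ROOT)) {
            case "intl":
            case "international":
                return INTERNATIONAL;
            case "cn":
            case "mainland":
                return CHINESE_MAINLAND;
            default:
                throw new IllegalArgumentException("Unknown region: " + name);
        }
    }
}
```

For example, `.url(DashScopeRegion.parse(System.getenv().getOrDefault("DASHSCOPE_REGION", "intl")).realtimeUrl())`, where DASHSCOPE_REGION is a hypothetical variable of your own.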
OmniRealtimeConfig
Set these fields with OmniRealtimeConfig.builder(), then pass the result to conversation.updateSession(config) after connecting.

| Parameter | Type | Required | Description |
|---|---|---|---|
| modalities | `List<OmniRealtimeModality>` | No | Output modalities. Default: [AUDIO, TEXT]. Use [TEXT] for text-only output. |
| voice | String | No | Voice for synthesized speech. Default: Cherry. See the Qwen-LiveTranslate model overview for supported voices. |
| inputAudioFormat | OmniRealtimeAudioFormat | No | Input audio format. Default: PCM_16000HZ_MONO_16BIT. |
| outputAudioFormat | OmniRealtimeAudioFormat | No | Output audio format. Default: PCM_24000HZ_MONO_16BIT. |
| InputAudioTranscription | String | No | ASR model that transcribes the source speech. Set to qwen3-asr-flash-realtime to receive source-language transcription alongside the translation. |
| translationConfig | OmniRealtimeTranslationParam | No | Translation settings. See OmniRealtimeTranslationParam below. |
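Both default audio formats are 16-bit mono PCM and differ only in sample rate, so buffer sizes and durations follow directly from sampleRate × 2 bytes per second. The hypothetical PcmMath helper below makes the arithmetic explicit: 100 ms of input at 16 kHz is 3,200 bytes, while 100 ms of 24 kHz output is 4,800 bytes. Playing output audio at the input rate would make it sound slow and low-pitched.

```java
/**
 * Byte/duration arithmetic for 16-bit mono PCM, the sample layout of both
 * default formats (input PCM_16000HZ_MONO_16BIT, output PCM_24000HZ_MONO_16BIT).
 */
final class PcmMath {
    static final int BYTES_PER_SAMPLE = 2; // 16-bit samples, one channel

    private PcmMath() {}

    /** Bytes needed to hold {@code millis} of audio at {@code sampleRate} Hz. */
    static int bytesFor(int millis, int sampleRate) {
        return (int) ((long) sampleRate * BYTES_PER_SAMPLE * millis / 1000);
    }

    /** Playback duration, in milliseconds, of {@code byteCount} bytes at {@code sampleRate} Hz. */
    static double durationMillis(long byteCount, int sampleRate) {
        return byteCount * 1000.0 / ((long) sampleRate * BYTES_PER_SAMPLE);
    }
}
```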
OmniRealtimeTranslationParam
Set the target language and custom terminology with OmniRealtimeTranslationParam.builder().

| Parameter | Type | Required | Description |
|---|---|---|---|
| language | String | No | Target language code. Default: en. See the Qwen-LiveTranslate model overview for supported languages. |
| corpus | Corpus | No | Custom terminology for domain-specific terms. |
| corpus.phrases | `Map<String, Object>` | No | Term mapping: keys are source terms, values are their target translations. Example: {"Inteligencia Artificial": "Artificial Intelligence"} |
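Large glossaries are easier to maintain in a file than inline. As a sketch, the hypothetical Glossary class below reads tab-separated lines (source term, then target term) into the Map<String, Object> shape that corpus.phrases expects. This page does not show how the map is wrapped in a Corpus object, so that step is left to the SDK reference.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical glossary loader producing the map shape used by corpus.phrases. */
final class Glossary {
    private Glossary() {}

    /** Parses "source\ttarget" lines; blank lines and lines starting with '#' are skipped. */
    static Map<String, Object> parse(List<String> lines) {
        Map<String, Object> phrases = new LinkedHashMap<>(); // keeps file order
        for (String line : lines) {
            if (line.isBlank() || line.startsWith("#")) {
                continue;
            }
            String[] parts = line.split("\t", 2);
            if (parts.length != 2 || parts[0].isBlank() || parts[1].isBlank()) {
                throw new IllegalArgumentException("Expected source<TAB>target: " + line);
            }
            phrases.put(parts[0].trim(), parts[1].trim()); // later duplicates win
        }
        return phrases;
    }
}
```

Typical use: `Glossary.parse(Files.readAllLines(Path.of("glossary.tsv")))`, where glossary.tsv is a file of your own.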
Key interfaces
OmniRealtimeConversation
Manages the WebSocket connection and audio streaming lifecycle.
Import: com.alibaba.dashscope.audio.omni.OmniRealtimeConversation
| Method | Description |
|---|---|
| OmniRealtimeConversation(OmniRealtimeParam param, OmniRealtimeCallback callback) | Creates a conversation instance from connection parameters and an event callback. |
| void connect() | Opens the WebSocket connection. Triggers the session.created and session.updated events. Throws NoApiKeyException and InterruptedException. |
| void updateSession(OmniRealtimeConfig config) | Updates the session configuration after connecting. Triggers a session.updated event. Set only the parameters you need; omitted parameters keep their defaults. |
| void appendAudio(String audioBase64) | Sends a Base64-encoded audio chunk to the server. The server detects speech boundaries automatically and triggers translation. |
| void endSession() | Ends the session gracefully. The server completes any in-progress translation before sending session.finished. Throws InterruptedException. |
| void close(int code, String reason) | Stops the task and closes the WebSocket connection. |
| String getSessionId() | Returns the session ID. |
| String getResponseId() | Returns the ID of the most recent server response. |
| long getFirstTextDelay() | Returns the first-text delay of the most recent response, in milliseconds. |
| long getFirstAudioDelay() | Returns the first-audio delay of the most recent response, in milliseconds. |
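Because getFirstTextDelay() and getFirstAudioDelay() report only the most recent response, tracking latency over a whole session means sampling them after each response. The LatencyStats class below is a hypothetical, thread-safe accumulator using nearest-rank percentiles. When exactly to sample (for example, on each response.audio_transcript.done event) is an assumption to check against the server-side events reference.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical accumulator for per-response delays in milliseconds. */
final class LatencyStats {
    private final List<Long> samples = new ArrayList<>();

    synchronized void record(long millis) {
        samples.add(millis);
    }

    synchronized int count() {
        return samples.size();
    }

    /** Arithmetic mean, or NaN if nothing has been recorded. */
    synchronized double mean() {
        return samples.stream().mapToLong(Long::longValue).average().orElse(Double.NaN);
    }

    /** Nearest-rank percentile for p in (0, 100]. */
    synchronized long percentile(double p) {
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]: " + p);
        }
        if (samples.isEmpty()) {
            throw new IllegalStateException("no samples recorded");
        }
        List<Long> sorted = new ArrayList<>(samples);
        Collections.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.size());
        return sorted.get(Math.max(rank, 1) - 1);
    }
}
```

Usage sketch: `textDelays.record(conversation.getFirstTextDelay());` after each response, then report `textDelays.percentile(95)` when the session ends.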
OmniRealtimeCallback
Receives server events delivered over the WebSocket. Extend this class, implement the abstract methods onEvent() and onClose(), and optionally override onOpen().
Import: com.alibaba.dashscope.audio.omni.OmniRealtimeCallback

| Method | Parameters | Description |
|---|---|---|
| void onOpen() | None | Called when the WebSocket connection is established. |
| abstract void onEvent(JsonObject message) | message: a JSON object containing one server-side event. | Called for each server event. Read the type field to determine the event kind. |
| abstract void onClose(int code, String reason) | code: WebSocket status code. reason: description of why the connection closed. | Called when the WebSocket connection closes. |
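Server events reach onEvent() and onClose() asynchronously, so a main thread that has finished streaming may want to wait, with a timeout, for the session to wind down. A common JDK pattern is a CountDownLatch released either by session.finished (which, per the OmniRealtimeConversation table, the server sends after endSession()) or by the connection closing. SessionWaiter is a hypothetical helper, not an SDK class: call onEventType() from onEvent() and onClosed() from onClose().

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper that blocks until session.finished arrives or the socket closes. */
final class SessionWaiter {
    private final CountDownLatch finished = new CountDownLatch(1);
    private volatile String closeReason;

    /** Call with message.get("type").getAsString() from onEvent(). */
    void onEventType(String type) {
        if ("session.finished".equals(type)) {
            finished.countDown();
        }
    }

    /** Call from onClose(); a closed connection also ends the wait. */
    void onClosed(int code, String reason) {
        closeReason = code + " " + reason;
        finished.countDown();
    }

    /** Returns true if the session ended before the timeout elapsed. */
    boolean await(long timeout, TimeUnit unit) throws InterruptedException {
        return finished.await(timeout, unit);
    }

    /** Close code and reason, or null if the socket has not closed. */
    String closeReason() {
        return closeReason;
    }
}
```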
Common event types received in onEvent:
| Event type | Description |
|---|---|
| input_audio_buffer.speech_started | The server detected speech in the audio stream. |
| input_audio_buffer.speech_stopped | The server detected the end of a speech segment. |
| conversation.item.input_audio_transcription.completed | Source-language transcription is ready. Read message.get("transcript"). Requires InputAudioTranscription to be set. |
| response.audio_transcript.done | Translated text is ready. Read message.get("transcript"). |
| response.audio.delta | A chunk of translated audio is available. Read message.get("delta") for the Base64-encoded audio data. |
| error | An error occurred. Read message.get("error").getAsJsonObject().get("message") for details. |
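The response.audio.delta events deliver translated audio in pieces. A sketch that decodes and buffers them, then saves the result as a WAV file for inspection, might look like this. TranslatedAudioSink is hypothetical; it assumes the default output format (PCM_24000HZ_MONO_16BIT) and signed little-endian samples.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.util.Base64;
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;

/** Hypothetical buffer for response.audio.delta payloads. */
final class TranslatedAudioSink {
    /** Assumed layout of PCM_24000HZ_MONO_16BIT: signed, little-endian. */
    static final AudioFormat OUTPUT_FORMAT = new AudioFormat(24000f, 16, 1, true, false);

    private final ByteArrayOutputStream pcm = new ByteArrayOutputStream();

    /** Decodes one Base64 delta, e.g. message.get("delta").getAsString(), and buffers it. */
    synchronized void append(String base64Delta) {
        byte[] bytes = Base64.getDecoder().decode(base64Delta);
        pcm.write(bytes, 0, bytes.length);
    }

    /** Number of PCM bytes buffered so far. */
    synchronized int size() {
        return pcm.size();
    }

    /** Writes everything buffered so far as a WAV file. */
    synchronized void writeWav(File target) throws IOException {
        byte[] data = pcm.toByteArray();
        long frames = data.length / OUTPUT_FORMAT.getFrameSize();
        try (AudioInputStream in = new AudioInputStream(
                new ByteArrayInputStream(data), OUTPUT_FORMAT, frames)) {
            AudioSystem.write(in, AudioFileFormat.Type.WAVE, target);
        }
    }
}
```

In onEvent(), call `sink.append(message.get("delta").getAsString())` for each response.audio.delta event; after the session ends, `sink.writeWav(new File("translation.wav"))`.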
Complete example
This example captures microphone audio, translates it in real time, and plays translated speech through the system speaker.
What it does:

- Connects to Qwen-LiveTranslate over WebSocket.
- Configures Spanish-to-English translation with custom terminology.
- Streams microphone audio in 100 ms chunks.
- Prints the original transcription and the translated text.
- Plays the translated audio through the speaker.
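The SDK wiring for this example follows the quick start; the part specific to live input is audio I/O, which the JDK covers through javax.sound.sampled. The sketch below, built around a hypothetical AudioIo class, captures 100 ms microphone chunks at 16 kHz and opens a 24 kHz speaker line. It omits custom terminology, and it assumes signed little-endian PCM and that appendAudio() declares no checked exceptions (so that conversation::appendAudio fits a Consumer).

```java
import java.util.Arrays;
import java.util.Base64;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.SourceDataLine;
import javax.sound.sampled.TargetDataLine;

/** Hypothetical microphone capture and speaker playback for the live example. */
final class AudioIo {
    static final AudioFormat MIC_FORMAT = new AudioFormat(16000f, 16, 1, true, false);
    static final AudioFormat SPEAKER_FORMAT = new AudioFormat(24000f, 16, 1, true, false);
    /** 100 ms of microphone audio: 16,000 samples/s x 2 bytes x 0.1 s. */
    static final int CHUNK_BYTES = 3200;

    private AudioIo() {}

    /** Reads the default microphone until {@code running} is cleared, passing each chunk to {@code send}. */
    static void captureMic(Consumer<String> send, AtomicBoolean running)
            throws LineUnavailableException {
        try (TargetDataLine mic = AudioSystem.getTargetDataLine(MIC_FORMAT)) {
            mic.open(MIC_FORMAT, CHUNK_BYTES * 4);
            mic.start();
            byte[] buf = new byte[CHUNK_BYTES];
            while (running.get()) {
                int n = mic.read(buf, 0, buf.length); // blocks until the buffer is filled
                if (n > 0) {
                    send.accept(encode(buf, n));
                }
            }
            mic.stop();
        }
    }

    /** Base64-encodes the first {@code length} bytes of {@code buf}. */
    static String encode(byte[] buf, int length) {
        return Base64.getEncoder().encodeToString(Arrays.copyOf(buf, length));
    }

    /** Opens a speaker line; write decoded response.audio.delta bytes to it. */
    static SourceDataLine openSpeaker() throws LineUnavailableException {
        SourceDataLine speaker = AudioSystem.getSourceDataLine(SPEAKER_FORMAT);
        speaker.open(SPEAKER_FORMAT);
        speaker.start();
        return speaker;
    }
}
```

Run `AudioIo.captureMic(conversation::appendAudio, running)` on a background thread. In onEvent(), decode each response.audio.delta payload with Base64.getDecoder() and pass the bytes to `speaker.write(bytes, 0, bytes.length)`.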
Replace placeholders with your values:
| Placeholder | Description | Example |
|---|---|---|
| `<your-api-key>` | Model Studio API key | sk-xxxxxxxx |
Related topics
- Qwen-LiveTranslate model overview: supported languages, voices, and capabilities
- Server-side events reference: event types, JSON schemas, and error codes
- Install DashScope SDK: installation and dependency setup
- Get an API key: creation and management