Alibaba Cloud Model Studio: Qwen-LiveTranslate Java SDK - API reference

Last Updated: Mar 15, 2026

Translate speech in real time using the DashScope Java SDK and the qwen3-livetranslate-flash-realtime model. The SDK connects over WebSocket, streams audio input, and returns translated text and synthesized speech.

Before you begin

  1. Install DashScope SDK version 2.22.5 or later.

  2. Get an API key.

  3. Set the API key as an environment variable:

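       # Linux / macOS (applies to the current shell session)
       export DASHSCOPE_API_KEY=<your-api-key>
       # To persist the key, add the export line to ~/.bashrc, then run: source ~/.bashrc

       # Windows (Command Prompt, applies to the current session)
       set DASHSCOPE_API_KEY=<your-api-key>

  4. Review the Qwen-LiveTranslate model overview for supported languages and voices.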

Quick start

Connect, send audio, and receive translated text:

import com.alibaba.dashscope.audio.omni.*;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.google.gson.JsonObject;
import java.util.Arrays;

// 1. Build connection parameters
OmniRealtimeParam param = OmniRealtimeParam.builder()
        .model("qwen3-livetranslate-flash-realtime")
        // International endpoint. For Chinese mainland, use:
        // wss://dashscope.aliyuncs.com/api-ws/v1/realtime
        .url("wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime")
        .apikey(System.getenv("DASHSCOPE_API_KEY"))
        .build();

// 2. Define a callback to handle server events
OmniRealtimeCallback callback = new OmniRealtimeCallback() {
    @Override public void onOpen() { System.out.println("Connected"); }

    @Override
    public void onEvent(JsonObject message) {
        String type = message.get("type").getAsString();
        if ("response.audio_transcript.done".equals(type)) {
            System.out.println("Translation: " + message.get("transcript").getAsString());
        }
    }

    @Override
    public void onClose(int code, String reason) {
        System.out.println("Closed: " + code + " " + reason);
    }
};

// 3. Open a session and start streaming audio
OmniRealtimeConversation conversation = new OmniRealtimeConversation(param, callback);
conversation.connect();

// 4. Configure translation target language
OmniRealtimeConfig config = OmniRealtimeConfig.builder()
        .modalities(Arrays.asList(OmniRealtimeModality.AUDIO, OmniRealtimeModality.TEXT))
        .translationConfig(OmniRealtimeTranslationParam.builder()
                .language("en")
                .build())
        .build();
conversation.updateSession(config);

// 5. Send Base64-encoded audio chunks (PCM 16 kHz, 16-bit, mono).
//    audioBase64 is a String holding one encoded chunk, supplied by your application.
conversation.appendAudio(audioBase64);

// 6. End the session when done
conversation.endSession();
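
Step 5 of the quick start assumes that an `audioBase64` string already exists. The following is a minimal sketch of one way to produce such strings from raw PCM bytes. The class name `PcmChunker` and its method are hypothetical helpers, not part of the DashScope SDK, and the 3200-byte chunk size assumes 100 ms of 16 kHz, 16-bit, mono audio.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Base64;
import java.util.List;

/** Hypothetical helper (not part of the SDK): splits raw PCM into Base64 chunks. */
final class PcmChunker {
    /** 100 ms of 16 kHz, 16-bit, mono PCM: 16000 samples/s * 2 bytes * 0.1 s. */
    static final int CHUNK_BYTES = 3200;

    static List<String> toBase64Chunks(byte[] pcm, int chunkBytes) {
        if (chunkBytes <= 0) {
            throw new IllegalArgumentException("chunkBytes must be positive");
        }
        List<String> chunks = new ArrayList<>();
        for (int offset = 0; offset < pcm.length; offset += chunkBytes) {
            // The final slice may be shorter than chunkBytes.
            int end = Math.min(offset + chunkBytes, pcm.length);
            byte[] slice = Arrays.copyOfRange(pcm, offset, end);
            chunks.add(Base64.getEncoder().encodeToString(slice));
        }
        return chunks;
    }

    public static void main(String[] args) {
        byte[] oneSecondOfSilence = new byte[32000];
        List<String> chunks = toBase64Chunks(oneSecondOfSilence, CHUNK_BYTES);
        // Each element could then be passed to conversation.appendAudio(...).
        System.out.println(chunks.size() + " chunks");
    }
}
```

Encoding only the slice that holds audio keeps trailing buffer bytes out of the stream, which matters for the last chunk of a recording.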

Configuration overview

Three builder objects control different aspects of a translation session:

OmniRealtimeParam          --> Connection: model, endpoint, API key
  +-- OmniRealtimeConfig   --> Session: audio formats, voice, modalities
       +-- OmniRealtimeTranslationParam  --> Translation: target language, custom terminology

Pass OmniRealtimeParam to the constructor when creating the conversation. After connecting, call updateSession() with OmniRealtimeConfig to configure audio and translation. If you skip updateSession(), the session runs with the service defaults.

Request parameters

OmniRealtimeParam

Build the connection parameters with OmniRealtimeParam.builder(). They are used to establish the WebSocket connection.

OmniRealtimeParam param = OmniRealtimeParam.builder()
        .model("qwen3-livetranslate-flash-realtime")
        // The following URL is for the international version. If you use the Chinese Mainland version, replace the URL with: wss://dashscope.aliyuncs.com/api-ws/v1/realtime
        .url("wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime")
        // API keys for the international version and the Chinese Mainland version are different. To obtain an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key
        // If an environment variable is not configured, replace the next line with your Model Studio API key: .apikey("sk-xxx")
        .apikey(System.getenv("DASHSCOPE_API_KEY"))
        .build();

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | String | Yes | Model name. Set to qwen3-livetranslate-flash-realtime. |
| url | String | Yes | WebSocket endpoint. Use wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime (international) or wss://dashscope.aliyuncs.com/api-ws/v1/realtime (Chinese mainland). |
| apikey | String | No | API key for authentication. If not set, configure it via the DASHSCOPE_API_KEY environment variable. |
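
The endpoint and API key choices above can be resolved before building OmniRealtimeParam. The sketch below is one possible way to do that with the standard library only. `ConnectionSettings`, `endpointFor`, and `requireApiKey` are hypothetical names introduced for illustration. The two URLs and the DASHSCOPE_API_KEY variable name come from the table above.

```java
import java.util.Map;

/** Hypothetical helper (not part of the SDK): resolves endpoint and API key. */
final class ConnectionSettings {
    static final String INTERNATIONAL_URL =
            "wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime";
    static final String CHINESE_MAINLAND_URL =
            "wss://dashscope.aliyuncs.com/api-ws/v1/realtime";

    /** Returns the WebSocket endpoint for the chosen region. */
    static String endpointFor(boolean chineseMainland) {
        return chineseMainland ? CHINESE_MAINLAND_URL : INTERNATIONAL_URL;
    }

    /** Reads DASHSCOPE_API_KEY from the given environment map and fails fast if it is missing. */
    static String requireApiKey(Map<String, String> env) {
        String key = env.get("DASHSCOPE_API_KEY");
        if (key == null || key.trim().isEmpty()) {
            throw new IllegalStateException("Set the DASHSCOPE_API_KEY environment variable.");
        }
        return key.trim();
    }

    public static void main(String[] args) {
        System.out.println(endpointFor(false));
        try {
            String key = requireApiKey(System.getenv());
            System.out.println("API key found (" + key.length() + " characters)");
        } catch (IllegalStateException e) {
            System.err.println(e.getMessage());
        }
    }
}
```

The returned values would be passed to .url(...) and .apikey(...) on the builder. Taking the environment as a Map keeps the helper easy to test without changing real environment variables.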

OmniRealtimeConfig

Build the session configuration with OmniRealtimeConfig.builder(), then call conversation.updateSession(config) after connecting.

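import com.alibaba.dashscope.audio.omni.*;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Set custom translation phrases (key: source term, value: target translation)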
Map<String, Object> phrases = new HashMap<>();
phrases.put("Inteligencia Artificial", "Artificial Intelligence");
phrases.put("Aprendizaje Automático", "Machine Learning");

OmniRealtimeConfig config = OmniRealtimeConfig.builder()
        .modalities(Arrays.asList(OmniRealtimeModality.AUDIO, OmniRealtimeModality.TEXT))
        .voice("Cherry")
        .inputAudioFormat(OmniRealtimeAudioFormat.PCM_16000HZ_MONO_16BIT)
        .outputAudioFormat(OmniRealtimeAudioFormat.PCM_24000HZ_MONO_16BIT)
        .InputAudioTranscription("qwen3-asr-flash-realtime")
        .translationConfig(OmniRealtimeTranslationParam.builder()
                .language("en")
                .corpus(OmniRealtimeTranslationParam.Corpus.builder()
                        .phrases(phrases)
                        .build())
                .build())
        .build();

conversation.updateSession(config);

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| modalities | List<OmniRealtimeModality> | No | Output modalities. Default: [AUDIO, TEXT]. Use [TEXT] for text-only output. |
| voice | String | No | Voice for synthesized speech. Default: Cherry. See supported voices. |
| inputAudioFormat | OmniRealtimeAudioFormat | No | Input audio format. Default: PCM_16000HZ_MONO_16BIT. |
| outputAudioFormat | OmniRealtimeAudioFormat | No | Output audio format. Default: PCM_24000HZ_MONO_16BIT. |
| InputAudioTranscription | String | No | ASR model for transcribing original speech. Set to qwen3-asr-flash-realtime to receive source-language transcription alongside translation. |
| translationConfig | OmniRealtimeTranslationParam | No | Translation settings. See OmniRealtimeTranslationParam below. |

OmniRealtimeTranslationParam

Set the target language and custom terminology with OmniRealtimeTranslationParam.builder().

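import com.alibaba.dashscope.audio.omni.*;
import java.util.HashMap;
import java.util.Map;

// Set translation phrases
Map<String, Object> phrases = new HashMap<>();
phrases.put("Inteligencia Artificial", "Artificial Intelligence");  // Key: source-language term; value: target-language translation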
phrases.put("Aprendizaje Automático", "Machine Learning");

OmniRealtimeTranslationParam translationParam = OmniRealtimeTranslationParam.builder()
        .language("en")  // Target language code
        .corpus(OmniRealtimeTranslationParam.Corpus.builder()
                .phrases(phrases)
                .build())
        .build();

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| language | String | No | Target language code. Default: en. See supported languages. |
| corpus | Corpus | No | Custom terminology for domain-specific terms. |
| corpus.phrases | Map<String, Object> | No | Term mapping: keys are source terms, values are target translations. Example: {"Inteligencia Artificial": "Artificial Intelligence"} |
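
When the terminology list grows, it may be easier to keep it outside the code. The sketch below builds the Map<String, Object> that corpus.phrases expects from lines of the form `source = target`. The line format, the `#` comment convention, and the `GlossaryLoader` class are assumptions made for this illustration. The SDK does not define a glossary file format.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not part of the SDK): parses "source = target" lines. */
final class GlossaryLoader {
    static Map<String, Object> parse(List<String> lines) {
        // LinkedHashMap keeps the terms in the order they were listed.
        Map<String, Object> phrases = new LinkedHashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue; // Skip blank lines and comments.
            }
            int separator = trimmed.indexOf('=');
            if (separator <= 0 || separator == trimmed.length() - 1) {
                throw new IllegalArgumentException("Expected source = target, got: " + line);
            }
            String source = trimmed.substring(0, separator).trim();
            String target = trimmed.substring(separator + 1).trim();
            phrases.put(source, target);
        }
        return phrases;
    }

    public static void main(String[] args) {
        Map<String, Object> phrases = parse(Arrays.asList(
                "# Spanish to English terms",
                "Inteligencia Artificial = Artificial Intelligence"));
        // The map could then be passed to Corpus.builder().phrases(phrases).
        System.out.println(phrases);
    }
}
```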

Key interfaces

OmniRealtimeConversation

Manages the WebSocket connection and audio streaming lifecycle.

Import: com.alibaba.dashscope.audio.omni.OmniRealtimeConversation

| Method | Description |
| --- | --- |
| OmniRealtimeConversation(OmniRealtimeParam param, OmniRealtimeCallback callback) | Creates a conversation instance with connection parameters and an event callback. |
| void connect() | Opens WebSocket connection. Triggers session.created and session.updated events. Throws NoApiKeyException, InterruptedException. |
| void updateSession(OmniRealtimeConfig config) | Updates session configuration after connecting. Triggers session.updated event. Only set parameters you need; defaults apply for omitted parameters. |
| void appendAudio(String audioBase64) | Sends Base64-encoded audio chunk to the server. The server automatically detects speech boundaries and triggers translation. |
| void endSession() | Ends session gracefully. The server completes in-progress translation before sending session.finished. Throws InterruptedException. |
| void close(int code, String reason) | Stops the task and closes WebSocket connection. |
| String getSessionId() | Returns session ID. |
| String getResponseId() | Returns response ID of the most recent server response. |
| long getFirstTextDelay() | Returns first text delay of the most recent response (milliseconds). |
| long getFirstAudioDelay() | Returns first audio delay of the most recent response (milliseconds). |
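
Because the server sends session.finished only after it completes the in-progress translation, an application may want to wait for that event before it exits. The following sketch shows one way to coordinate this with a CountDownLatch. `SessionFinishedGate` and its methods are hypothetical and are not SDK classes. The callback is assumed to forward each event's type string and its close notification to the gate.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper (not part of the SDK): waits for the session.finished event. */
final class SessionFinishedGate {
    private final CountDownLatch finished = new CountDownLatch(1);

    /** Call from onEvent with the value of the event's "type" field. */
    void onEventType(String type) {
        if ("session.finished".equals(type)) {
            finished.countDown();
        }
    }

    /** Call from onClose so that a dropped connection does not leave the waiter blocked. */
    void onClosed() {
        finished.countDown();
    }

    /** Returns true if the session finished or closed within the timeout. */
    boolean awaitFinished(long timeoutMillis) {
        try {
            return finished.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // Preserve the interrupt status.
            return false;
        }
    }
}
```

A typical sequence under these assumptions would be to call conversation.endSession(), then gate.awaitFinished(...) with a timeout suited to the application, and only then release audio resources.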

OmniRealtimeCallback

Handles server events delivered over WebSocket. Extend this class, implement the abstract onEvent and onClose methods, and override onOpen if you need to react when the connection opens.

Import: com.alibaba.dashscope.audio.omni.OmniRealtimeCallback

| Method | Parameters | Description |
| --- | --- | --- |
| void onOpen() | None | Called when WebSocket connection is established. |
| abstract void onEvent(JsonObject message) | message: A JSON object containing a server-side event. | Called for each server event. Parse the type field to determine the event kind. |
| abstract void onClose(int code, String reason) | code: WebSocket status code. reason: Closure description. | Called when WebSocket connection closes. |

Common event types received in onEvent:

| Event type | Description |
| --- | --- |
| input_audio_buffer.speech_started | Server detected speech in audio stream. |
| input_audio_buffer.speech_stopped | Server detected end of speech segment. |
| conversation.item.input_audio_transcription.completed | Source-language transcription is ready. Read message.get("transcript"). Requires InputAudioTranscription to be set. |
| response.audio_transcript.done | Translated text is ready. Read message.get("transcript"). |
| response.audio.delta | A chunk of translated audio is available. Read message.get("delta") for Base64-encoded audio data. |
| error | Error occurred. Read message.get("error").getAsJsonObject().get("message") for details. |
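
The response.audio.delta payloads can also be collected instead of played immediately, for example to save the translated speech. The sketch below decodes each Base64 delta, appends the raw PCM, and wraps the result in a standard 44-byte PCM WAV header. `AudioDeltaCollector` is a hypothetical class, not part of the SDK. The caller is assumed to pass the string read from message.get("delta"), and the format arguments should match the configured output format (24 kHz, mono, 16-bit by default).

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical helper (not part of the SDK): accumulates audio deltas as PCM. */
final class AudioDeltaCollector {
    private final ByteArrayOutputStream pcm = new ByteArrayOutputStream();

    /** Decodes one Base64 delta and appends the raw PCM bytes. */
    void append(String base64Delta) {
        byte[] decoded = Base64.getDecoder().decode(base64Delta);
        pcm.write(decoded, 0, decoded.length);
    }

    int pcmLength() {
        return pcm.size();
    }

    /** Returns the collected PCM prefixed with a little-endian RIFF/WAVE header. */
    byte[] toWav(int sampleRateHz, int channels, int bitsPerSample) {
        byte[] data = pcm.toByteArray();
        int blockAlign = channels * bitsPerSample / 8;
        ByteBuffer out = ByteBuffer.allocate(44 + data.length).order(ByteOrder.LITTLE_ENDIAN);
        out.put("RIFF".getBytes(StandardCharsets.US_ASCII));
        out.putInt(36 + data.length);            // Size of everything after this field.
        out.put("WAVE".getBytes(StandardCharsets.US_ASCII));
        out.put("fmt ".getBytes(StandardCharsets.US_ASCII));
        out.putInt(16);                          // Size of the PCM fmt chunk.
        out.putShort((short) 1);                 // Audio format 1 = uncompressed PCM.
        out.putShort((short) channels);
        out.putInt(sampleRateHz);
        out.putInt(sampleRateHz * blockAlign);   // Byte rate.
        out.putShort((short) blockAlign);
        out.putShort((short) bitsPerSample);
        out.put("data".getBytes(StandardCharsets.US_ASCII));
        out.putInt(data.length);
        out.put(data);
        return out.array();
    }
}
```

The returned byte array could be written to disk with java.nio.file.Files.write once the session ends.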

Complete example

This example captures microphone audio, translates it in real time, and plays translated speech through the system speaker.

What it does:

  1. Connects to Qwen-LiveTranslate over WebSocket.

  2. Configures Spanish-to-English translation with custom terminology.

  3. Streams microphone audio in 100 ms chunks.

  4. Prints original transcription and translated text.

  5. Plays translated audio through speaker.
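
The 100 ms chunk size in step 3 follows from the audio format: bytes per chunk = sample rate x bytes per sample x channels x duration. The short sketch below works through that arithmetic for the input and output formats used in this example. `AudioChunkMath` is a hypothetical helper introduced here for illustration.

```java
/** Hypothetical helper (not part of the SDK): PCM size and duration arithmetic. */
final class AudioChunkMath {
    /** Number of bytes needed to hold chunkMillis of PCM audio. */
    static int bytesPerChunk(int sampleRateHz, int bitsPerSample, int channels, int chunkMillis) {
        int bytesPerSecond = sampleRateHz * (bitsPerSample / 8) * channels;
        return bytesPerSecond * chunkMillis / 1000;
    }

    /** Playback duration in milliseconds of byteCount bytes of PCM audio. */
    static long durationMillis(long byteCount, int sampleRateHz, int bitsPerSample, int channels) {
        long bytesPerSecond = (long) sampleRateHz * (bitsPerSample / 8) * channels;
        return byteCount * 1000 / bytesPerSecond;
    }

    public static void main(String[] args) {
        // Input: 16 kHz, 16-bit, mono. 16000 * 2 * 1 * 100 / 1000 = 3200 bytes.
        System.out.println(bytesPerChunk(16000, 16, 1, 100));
        // Output: 24 kHz, 16-bit, mono. 24000 * 2 * 1 * 100 / 1000 = 4800 bytes.
        System.out.println(bytesPerChunk(24000, 16, 1, 100));
    }
}
```

These values match the INPUT_CHUNK_SIZE and OUTPUT_CHUNK_SIZE constants in the sample code that follows.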

Sample code for real-time microphone translation

import com.alibaba.dashscope.audio.omni.*;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.google.gson.JsonObject;

import javax.sound.sampled.*;
import java.util.*;
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Example of using a microphone with the real-time audio and video translation model.
 */
public class Main {
    private static final int INPUT_CHUNK_SIZE = 3200;   // 100 ms of 16 kHz, 16-bit, mono audio
    private static final int OUTPUT_CHUNK_SIZE = 4800;  // 100 ms of 24 kHz, 16-bit, mono audio
    private static final AtomicBoolean running = new AtomicBoolean(true);
    private static SourceDataLine speaker;  // Speaker

    public static void main(String[] args) throws InterruptedException {
        String apiKey = System.getenv("DASHSCOPE_API_KEY");
        if (apiKey == null || apiKey.isEmpty()) {
            System.err.println("Set the DASHSCOPE_API_KEY environment variable.");
            System.exit(1);
        }

        // Create connection parameters.
        OmniRealtimeParam param = OmniRealtimeParam.builder()
                .model("qwen3-livetranslate-flash-realtime")
                .url("wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime")
                .apikey(apiKey)
                .build();

        // Create a callback handler.
        OmniRealtimeCallback callback = new OmniRealtimeCallback() {
            @Override
            public void onOpen() {
                System.out.println("[Connection established]");
            }

            @Override
            public void onEvent(JsonObject message) {
                String type = message.get("type").getAsString();
                switch (type) {
                    case "input_audio_buffer.speech_started":
                        System.out.println("====== Speech input detected ======");
                        break;
                    case "input_audio_buffer.speech_stopped":
                        System.out.println("====== Speech input ended ======");
                        break;
                    case "conversation.item.input_audio_transcription.completed":
                        String originalText = message.get("transcript").getAsString();
                        System.out.println("[Original text] " + originalText);
                        break;
                    case "response.audio_transcript.done":
                        String translatedText = message.get("transcript").getAsString();
                        System.out.println("[Translation result] " + translatedText);
                        break;
                    case "response.audio.delta":
                        // Decode and play the translated audio.
                        String audioB64 = message.get("delta").getAsString();
                        byte[] audioBytes = Base64.getDecoder().decode(audioB64);
                        if (speaker != null) {
                            speaker.write(audioBytes, 0, audioBytes.length);
                        }
                        break;
                    case "error":
                        JsonObject error = message.get("error").getAsJsonObject();
                        System.err.println("[Error] " + error.get("message").getAsString());
                        break;
                }
            }

            @Override
            public void onClose(int code, String reason) {
                System.out.println("[Connection closed] code: " + code + ", reason: " + reason);
            }
        };

        // Create a session.
        OmniRealtimeConversation conversation = new OmniRealtimeConversation(param, callback);

        try {
            // Initialize the speaker (for playing the translated speech).
            AudioFormat speakerFormat = new AudioFormat(24000, 16, 1, true, false);
            DataLine.Info speakerInfo = new DataLine.Info(SourceDataLine.class, speakerFormat);
            speaker = (SourceDataLine) AudioSystem.getLine(speakerInfo);
            speaker.open(speakerFormat, OUTPUT_CHUNK_SIZE * 4);
            speaker.start();

            // Initialize the microphone (for capturing speech input).
            AudioFormat micFormat = new AudioFormat(16000, 16, 1, true, false);
            DataLine.Info micInfo = new DataLine.Info(TargetDataLine.class, micFormat);
            if (!AudioSystem.isLineSupported(micInfo)) {
                System.err.println("Microphone is not available.");
                System.exit(1);
            }
            TargetDataLine microphone = (TargetDataLine) AudioSystem.getLine(micInfo);
            microphone.open(micFormat);
            microphone.start();

            // Connect to the server.
            conversation.connect();

            // Configure translation parameters.
            Map<String, Object> phrases = new HashMap<>();
            phrases.put("Inteligencia Artificial", "Artificial Intelligence");
            phrases.put("Aprendizaje Automático", "Machine Learning");

            OmniRealtimeConfig config = OmniRealtimeConfig.builder()
                    .modalities(Arrays.asList(OmniRealtimeModality.AUDIO, OmniRealtimeModality.TEXT))
                    .voice("Cherry")
                    .inputAudioFormat(OmniRealtimeAudioFormat.PCM_16000HZ_MONO_16BIT)
                    .outputAudioFormat(OmniRealtimeAudioFormat.PCM_24000HZ_MONO_16BIT)
                    .InputAudioTranscription("qwen3-asr-flash-realtime")
                    .translationConfig(OmniRealtimeTranslationParam.builder()
                            .language("en")
                            .corpus(OmniRealtimeTranslationParam.Corpus.builder()
                                    .phrases(phrases)
                                    .build())
                            .build())
                    .build();

            conversation.updateSession(config);

            // Register a shutdown hook.
            Runtime.getRuntime().addShutdownHook(new Thread(() -> {
                System.out.println("\n[Exiting...]");
                running.set(false);
                microphone.stop();
                microphone.close();
                speaker.stop();
                speaker.close();
                conversation.close(1000, "User stopped");
            }));

            System.out.println("[Starting real-time translation] Speak into the microphone. Press Ctrl+C to exit.");

            // Continuously capture and send microphone audio.
            byte[] buffer = new byte[INPUT_CHUNK_SIZE];
            while (running.get()) {
                int bytesRead = microphone.read(buffer, 0, buffer.length);
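                if (bytesRead > 0) {
                    // Encode only the bytes actually read; the buffer may be partially filled.
                    conversation.appendAudio(Base64.getEncoder().encodeToString(Arrays.copyOf(buffer, bytesRead)));
                }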
            }

        } catch (NoApiKeyException e) {
            System.err.println("API Key error: " + e.getMessage());
        } catch (Exception e) {
            System.err.println("An exception occurred: " + e.getMessage());
            e.printStackTrace();
        }
    }
}

Replace placeholders with your values:

| Placeholder | Description | Example |
| --- | --- | --- |
| <your-api-key> | Model Studio API key | sk-xxxxxxxx |