Alibaba Cloud Model Studio: Real-time speech synthesis - Qwen

Last Updated: Jan 30, 2026

The Qwen real-time speech synthesis model provides low-latency Text-to-Speech (TTS) with streaming text input and audio output. It offers a variety of human-like voices, supports multiple languages and dialects, and maintains a consistent voice across different languages. The model also automatically adjusts its tone and smoothly processes complex text.

Core features

  • Generates high-fidelity, real-time speech and supports natural-sounding voices in multiple languages, including Chinese and English.

  • Provides two voice customization methods: voice cloning (cloning a voice from reference audio) and voice design (generating a voice from a text description) to quickly create custom voices.

  • Supports streaming input and output for low-latency responses in real-time interactive scenarios.

  • Enables fine-grained control over speech performance by adjusting speed, pitch, volume, and bitrate.

  • Compatible with major audio formats and supports audio output with a sample rate of up to 48 kHz.

Availability

Supported models:

International

In the international deployment mode, endpoints and data storage are located in the Singapore region, and model inference compute resources are dynamically scheduled globally (excluding Mainland China).

When you call the following models, select an API Key from the Singapore region:

  • Qwen3-TTS-VD-Realtime: qwen3-tts-vd-realtime-2026-01-15 (latest snapshot), qwen3-tts-vd-realtime-2025-12-16 (snapshot)

  • Qwen3-TTS-VC-Realtime: qwen3-tts-vc-realtime-2026-01-15 (latest snapshot), qwen3-tts-vc-realtime-2025-11-27 (snapshot)

  • Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime (stable version, currently equivalent to qwen3-tts-flash-realtime-2025-11-27), qwen3-tts-flash-realtime-2025-11-27 (latest snapshot), qwen3-tts-flash-realtime-2025-09-18 (snapshot)

Mainland China

In the Mainland China deployment mode, the endpoint and data storage are located in the Beijing region, and model inference compute resources are limited to Mainland China.

When you call the following models, select an API Key from the Beijing region:

  • Qwen3-TTS-VD-Realtime: qwen3-tts-vd-realtime-2026-01-15 (latest snapshot), qwen3-tts-vd-realtime-2025-12-16 (snapshot)

  • Qwen3-TTS-VC-Realtime: qwen3-tts-vc-realtime-2026-01-15 (latest snapshot), qwen3-tts-vc-realtime-2025-11-27 (snapshot)

  • Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime (stable version, currently equivalent to qwen3-tts-flash-realtime-2025-11-27), qwen3-tts-flash-realtime-2025-11-27 (latest snapshot), qwen3-tts-flash-realtime-2025-09-18 (snapshot)

  • Qwen-TTS-Realtime: qwen-tts-realtime (stable version, currently equivalent to qwen-tts-realtime-2025-07-15), qwen-tts-realtime-latest (latest version, currently equivalent to qwen-tts-realtime-2025-07-15), qwen-tts-realtime-2025-07-15 (snapshot)

For more information, see Models.

Model selection

| Scenario | Recommended model | Reason | Notes |
| --- | --- | --- | --- |
| Customize voices for a brand image or exclusive use, or to extend system voices (based on a text description) | qwen3-tts-vd-realtime-2026-01-15 | Supports voice design. This method creates custom voices from text descriptions without requiring audio samples and is ideal for designing a unique brand voice from scratch. | Does not support system voices or voice cloning. |
| Customize voices for a brand image or exclusive use, or to extend system voices (based on audio samples) | qwen3-tts-vc-realtime-2026-01-15 | Supports voice cloning. This method quickly clones voices from real audio samples to create a human-like brand voiceprint with high fidelity and consistency. | Does not support system voices or voice design. |
| Intelligent customer service and conversational bots | qwen3-tts-flash-realtime-2025-11-27 | Supports streaming input and output. Adjustable speed and pitch provide a natural interactive experience, and multi-format audio output adapts to different devices. | Supports only system voices. Voice cloning and voice design are not supported. |
| Multilingual content broadcasting | qwen3-tts-flash-realtime-2025-11-27 | Supports multiple languages and Chinese dialects to meet global content delivery needs. | Supports only system voices. Voice cloning and voice design are not supported. |
| Audio reading and content production | qwen3-tts-flash-realtime-2025-11-27 | Adjustable volume, speed, and pitch meet fine-grained production requirements for content such as audiobooks and podcasts. | Supports only system voices. Voice cloning and voice design are not supported. |
| E-commerce livestreaming and short video dubbing | qwen3-tts-flash-realtime-2025-11-27 | Supports compressed formats such as MP3 and Opus, which suit bandwidth-limited scenarios. Adjustable parameters meet the needs of different dubbing styles. | Supports only system voices. Voice cloning and voice design are not supported. |

For more information, see Model feature comparison.

Getting started

Before you run the code, create and configure an API key. If you use the SDK to call the service, install the latest version of the DashScope SDK.
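For example, to install or upgrade the DashScope Python SDK with pip:

pip install -U dashscope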

Synthesize speech using a system voice

The following example shows how to use a system voice for speech synthesis. For more information, see Supported voices.

Use the DashScope SDK

Python

server_commit mode

import os
import base64
import threading
import time
import dashscope
from dashscope.audio.qwen_tts_realtime import *


qwen_tts_realtime: QwenTtsRealtime = None
text_to_synthesize = [
    'Right? I really love this kind of supermarket,',
    'especially during the Chinese New Year.',
    'Going to the supermarket',
    'makes me feel',
    'super, super happy!',
    'I want to buy so many things!'
]


def init_dashscope_api_key():
    """
        Set your DashScope API-key. More information:
        https://github.com/aliyun/alibabacloud-bailian-speech-demo/blob/master/PREREQUISITES.md
    """

    # The API keys for the Singapore and Beijing regions are different. To get an API key, visit: https://www.alibabacloud.com/help/en/model-studio/get-api-key
    if 'DASHSCOPE_API_KEY' in os.environ:
        dashscope.api_key = os.environ[
            'DASHSCOPE_API_KEY']  # load API-key from environment variable DASHSCOPE_API_KEY
    else:
        dashscope.api_key = 'your-dashscope-api-key'  # set API-key manually



class MyCallback(QwenTtsRealtimeCallback):
    def __init__(self):
        self.complete_event = threading.Event()
        self.file = open('result_24k.pcm', 'wb')

    def on_open(self) -> None:
        print('connection opened, init player')

    def on_close(self, close_status_code, close_msg) -> None:
        self.file.close()
        print('connection closed with code: {}, msg: {}, destroy player'.format(close_status_code, close_msg))

    def on_event(self, response: dict) -> None:
        try:
            global qwen_tts_realtime
            event_type = response['type']
            if event_type == 'session.created':
                print('start session: {}'.format(response['session']['id']))
            if event_type == 'response.audio.delta':
                recv_audio_b64 = response['delta']
                self.file.write(base64.b64decode(recv_audio_b64))
            if event_type == 'response.done':
                print(f'response {qwen_tts_realtime.get_last_response_id()} done')
            if event_type == 'session.finished':
                print('session finished')
                self.complete_event.set()
        except Exception as e:
            print('[Error] {}'.format(e))
            return

    def wait_for_finished(self):
        self.complete_event.wait()


if __name__ == '__main__':
    init_dashscope_api_key()

    print('Initializing ...')

    callback = MyCallback()

    qwen_tts_realtime = QwenTtsRealtime(
        model='qwen3-tts-flash-realtime',
        callback=callback,
        # The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: wss://dashscope.aliyuncs.com/api-ws/v1/realtime
        url='wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime'
        )

    qwen_tts_realtime.connect()
    qwen_tts_realtime.update_session(
        voice='Cherry',
        response_format=AudioFormat.PCM_24000HZ_MONO_16BIT,
        mode='server_commit'
    )
    for text_chunk in text_to_synthesize:
        print(f'send text: {text_chunk}')
        qwen_tts_realtime.append_text(text_chunk)
        time.sleep(0.1)
    qwen_tts_realtime.finish()
    callback.wait_for_finished()
    print('[Metric] session: {}, first audio delay: {}'.format(
                    qwen_tts_realtime.get_session_id(), 
                    qwen_tts_realtime.get_first_audio_delay(),
                    ))
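The example above writes raw PCM samples to result_24k.pcm. As a convenience, the following sketch wraps that file in a WAV container using Python's standard wave module, assuming the 24 kHz, mono, 16-bit format configured in update_session, so that common media players can open it:

import wave

# Read the raw PCM produced by the server_commit example.
with open('result_24k.pcm', 'rb') as pcm_file:
    pcm_data = pcm_file.read()

# Wrap it in a WAV container (24 kHz, mono, 16-bit, matching the session settings).
with wave.open('result_24k.wav', 'wb') as wav_file:
    wav_file.setnchannels(1)
    wav_file.setsampwidth(2)
    wav_file.setframerate(24000)
    wav_file.writeframes(pcm_data)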

commit mode

import base64
import os
import threading
import dashscope
from dashscope.audio.qwen_tts_realtime import *


qwen_tts_realtime: QwenTtsRealtime = None
text_to_synthesize = [
    'This is the first sentence.',
    'This is the second sentence.',
    'This is the third sentence.',
]


def init_dashscope_api_key():
    """
        Set your DashScope API-key. More information:
        https://github.com/aliyun/alibabacloud-bailian-speech-demo/blob/master/PREREQUISITES.md
    """

    # The API keys for the Singapore and Beijing regions are different. To get an API key, visit: https://www.alibabacloud.com/help/en/model-studio/get-api-key
    if 'DASHSCOPE_API_KEY' in os.environ:
        dashscope.api_key = os.environ[
            'DASHSCOPE_API_KEY']  # load API-key from environment variable DASHSCOPE_API_KEY
    else:
        dashscope.api_key = 'your-dashscope-api-key'  # set API-key manually



class MyCallback(QwenTtsRealtimeCallback):
    def __init__(self):
        super().__init__()
        self.response_counter = 0
        self.complete_event = threading.Event()
        self.file = open(f'result_{self.response_counter}_24k.pcm', 'wb')

    def reset_event(self):
        self.response_counter += 1
        self.file = open(f'result_{self.response_counter}_24k.pcm', 'wb')
        self.complete_event = threading.Event()

    def on_open(self) -> None:
        print('connection opened, init player')

    def on_close(self, close_status_code, close_msg) -> None:
        print('connection closed with code: {}, msg: {}, destroy player'.format(close_status_code, close_msg))

    def on_event(self, response: dict) -> None:
        try:
            global qwen_tts_realtime
            event_type = response['type']
            if event_type == 'session.created':
                print('start session: {}'.format(response['session']['id']))
            if event_type == 'response.audio.delta':
                recv_audio_b64 = response['delta']
                self.file.write(base64.b64decode(recv_audio_b64))
            if event_type == 'response.done':
                print(f'response {qwen_tts_realtime.get_last_response_id()} done')
                self.complete_event.set()
                self.file.close()
            if event_type == 'session.finished':
                print('session finished')
                self.complete_event.set()
        except Exception as e:
            print('[Error] {}'.format(e))
            return

    def wait_for_response_done(self):
        self.complete_event.wait()


if __name__ == '__main__':
    init_dashscope_api_key()

    print('Initializing ...')

    callback = MyCallback()

    qwen_tts_realtime = QwenTtsRealtime(
        model='qwen3-tts-flash-realtime',
        callback=callback, 
        # The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: wss://dashscope.aliyuncs.com/api-ws/v1/realtime
        url='wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime'
        )

    qwen_tts_realtime.connect()
    qwen_tts_realtime.update_session(
        voice='Cherry',
        response_format=AudioFormat.PCM_24000HZ_MONO_16BIT,
        mode='commit'
    )
    print(f'send text: {text_to_synthesize[0]}')
    qwen_tts_realtime.append_text(text_to_synthesize[0])
    qwen_tts_realtime.commit()
    callback.wait_for_response_done()
    callback.reset_event()
    
    print(f'send text: {text_to_synthesize[1]}')
    qwen_tts_realtime.append_text(text_to_synthesize[1])
    qwen_tts_realtime.commit()
    callback.wait_for_response_done()
    callback.reset_event()

    print(f'send text: {text_to_synthesize[2]}')
    qwen_tts_realtime.append_text(text_to_synthesize[2])
    qwen_tts_realtime.commit()
    callback.wait_for_response_done()
    
    qwen_tts_realtime.finish()
    print('[Metric] session: {}, first audio delay: {}'.format(
                    qwen_tts_realtime.get_session_id(), 
                    qwen_tts_realtime.get_first_audio_delay(),
                    ))

Java

Server commit mode

// The DashScope SDK version must be 2.21.16 or later.
import com.alibaba.dashscope.audio.qwen_tts_realtime.*;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.google.gson.JsonObject;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.SourceDataLine;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.AudioSystem;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.Base64;
import java.util.Queue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;

public class Main {
    static String[] textToSynthesize = {
            "Right? I especially love this kind of supermarket.",
            "Especially during the New Year.",
            "Going to the supermarket.",
            "It just makes me feel.",
            "Super, super happy!",
            "I want to buy so many things!"
    };

    // Real-time PCM audio player class
    public static class RealtimePcmPlayer {
        private int sampleRate;
        private SourceDataLine line;
        private AudioFormat audioFormat;
        private Thread decoderThread;
        private Thread playerThread;
        private AtomicBoolean stopped = new AtomicBoolean(false);
        private Queue<String> b64AudioBuffer = new ConcurrentLinkedQueue<>();
        private Queue<byte[]> RawAudioBuffer = new ConcurrentLinkedQueue<>();

        // The constructor initializes the audio format and audio line.
        public RealtimePcmPlayer(int sampleRate) throws LineUnavailableException {
            this.sampleRate = sampleRate;
            this.audioFormat = new AudioFormat(this.sampleRate, 16, 1, true, false);
            DataLine.Info info = new DataLine.Info(SourceDataLine.class, audioFormat);
            line = (SourceDataLine) AudioSystem.getLine(info);
            line.open(audioFormat);
            line.start();
            decoderThread = new Thread(new Runnable() {
                @Override
                public void run() {
                    while (!stopped.get()) {
                        String b64Audio = b64AudioBuffer.poll();
                        if (b64Audio != null) {
                            byte[] rawAudio = Base64.getDecoder().decode(b64Audio);
                            RawAudioBuffer.add(rawAudio);
                        } else {
                            try {
                                Thread.sleep(100);
                            } catch (InterruptedException e) {
                                throw new RuntimeException(e);
                            }
                        }
                    }
                }
            });
            playerThread = new Thread(new Runnable() {
                @Override
                public void run() {
                    while (!stopped.get()) {
                        byte[] rawAudio = RawAudioBuffer.poll();
                        if (rawAudio != null) {
                            try {
                                playChunk(rawAudio);
                            } catch (IOException e) {
                                throw new RuntimeException(e);
                            } catch (InterruptedException e) {
                                throw new RuntimeException(e);
                            }
                        } else {
                            try {
                                Thread.sleep(100);
                            } catch (InterruptedException e) {
                                throw new RuntimeException(e);
                            }
                        }
                    }
                }
            });
            decoderThread.start();
            playerThread.start();
        }

        // Play an audio chunk and block until playback is complete.
        private void playChunk(byte[] chunk) throws IOException, InterruptedException {
            if (chunk == null || chunk.length == 0) return;

            int bytesWritten = 0;
            while (bytesWritten < chunk.length) {
                bytesWritten += line.write(chunk, bytesWritten, chunk.length - bytesWritten);
            }
            int audioLengthMs = chunk.length / (this.sampleRate * 2 / 1000);
            // Wait for the audio in the buffer to finish playing. Guard against a negative sleep for very small chunks.
            Thread.sleep(Math.max(0, audioLengthMs - 10));
        }

        public void write(String b64Audio) {
            b64AudioBuffer.add(b64Audio);
        }

        public void cancel() {
            b64AudioBuffer.clear();
            RawAudioBuffer.clear();
        }

        public void waitForComplete() throws InterruptedException {
            while (!b64AudioBuffer.isEmpty() || !RawAudioBuffer.isEmpty()) {
                Thread.sleep(100);
            }
            line.drain();
        }

        public void shutdown() throws InterruptedException {
            stopped.set(true);
            decoderThread.join();
            playerThread.join();
            if (line != null && line.isRunning()) {
                line.drain();
                line.close();
            }
        }
    }

    public static void main(String[] args) throws InterruptedException, LineUnavailableException, FileNotFoundException {
        QwenTtsRealtimeParam param = QwenTtsRealtimeParam.builder()
                .model("qwen3-tts-flash-realtime")
                // The following URL is for the Singapore region. If you use a model in the China (Beijing) region, replace the URL with wss://dashscope.aliyuncs.com/api-ws/v1/realtime.
                .url("wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime")
                // The API keys for the Singapore and China (Beijing) regions are different. To obtain an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key.
                .apikey(System.getenv("DASHSCOPE_API_KEY"))
                .build();
        AtomicReference<CountDownLatch> completeLatch = new AtomicReference<>(new CountDownLatch(1));
        final AtomicReference<QwenTtsRealtime> qwenTtsRef = new AtomicReference<>(null);

        // Create a real-time audio player instance.
        RealtimePcmPlayer audioPlayer = new RealtimePcmPlayer(24000);

        QwenTtsRealtime qwenTtsRealtime = new QwenTtsRealtime(param, new QwenTtsRealtimeCallback() {
            @Override
            public void onOpen() {
                // Handle the event when the connection is established.
            }
            @Override
            public void onEvent(JsonObject message) {
                String type = message.get("type").getAsString();
                switch(type) {
                    case "session.created":
                        // Handle the event when the session is created.
                        break;
                    case "response.audio.delta":
                        String recvAudioB64 = message.get("delta").getAsString();
                        // Play the audio in real time.
                        audioPlayer.write(recvAudioB64);
                        break;
                    case "response.done":
                        // Handle the event when the response is complete.
                        break;
                    case "session.finished":
                        // Handle the event when the session is finished.
                        completeLatch.get().countDown();
                    default:
                        break;
                }
            }
            @Override
            public void onClose(int code, String reason) {
                // Handle the event when the connection is closed.
            }
        });
        qwenTtsRef.set(qwenTtsRealtime);
        try {
            qwenTtsRealtime.connect();
        } catch (NoApiKeyException e) {
            throw new RuntimeException(e);
        }
        QwenTtsRealtimeConfig config = QwenTtsRealtimeConfig.builder()
                .voice("Cherry")
                .responseFormat(QwenTtsRealtimeAudioFormat.PCM_24000HZ_MONO_16BIT)
                .mode("server_commit")
                .build();
        qwenTtsRealtime.updateSession(config);
        for (String text:textToSynthesize) {
            qwenTtsRealtime.appendText(text);
            Thread.sleep(100);
        }
        qwenTtsRealtime.finish();
        completeLatch.get().await();
        qwenTtsRealtime.close();

        // Wait for the audio to finish playing and then shut down the player.
        audioPlayer.waitForComplete();
        audioPlayer.shutdown();
        System.exit(0);
    }
}

Commit mode

// The DashScope SDK version must be 2.21.16 or later.
import com.alibaba.dashscope.audio.qwen_tts_realtime.*;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.google.gson.JsonObject;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.SourceDataLine;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.AudioSystem;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Base64;
import java.util.Queue;
import java.util.Scanner;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;

public class Commit {
    // Real-time PCM audio player class
    public static class RealtimePcmPlayer {
        private int sampleRate;
        private SourceDataLine line;
        private AudioFormat audioFormat;
        private Thread decoderThread;
        private Thread playerThread;
        private AtomicBoolean stopped = new AtomicBoolean(false);
        private Queue<String> b64AudioBuffer = new ConcurrentLinkedQueue<>();
        private Queue<byte[]> RawAudioBuffer = new ConcurrentLinkedQueue<>();

        // The constructor initializes the audio format and audio line.
        public RealtimePcmPlayer(int sampleRate) throws LineUnavailableException {
            this.sampleRate = sampleRate;
            this.audioFormat = new AudioFormat(this.sampleRate, 16, 1, true, false);
            DataLine.Info info = new DataLine.Info(SourceDataLine.class, audioFormat);
            line = (SourceDataLine) AudioSystem.getLine(info);
            line.open(audioFormat);
            line.start();
            decoderThread = new Thread(new Runnable() {
                @Override
                public void run() {
                    while (!stopped.get()) {
                        String b64Audio = b64AudioBuffer.poll();
                        if (b64Audio != null) {
                            byte[] rawAudio = Base64.getDecoder().decode(b64Audio);
                            RawAudioBuffer.add(rawAudio);
                        } else {
                            try {
                                Thread.sleep(100);
                            } catch (InterruptedException e) {
                                throw new RuntimeException(e);
                            }
                        }
                    }
                }
            });
            playerThread = new Thread(new Runnable() {
                @Override
                public void run() {
                    while (!stopped.get()) {
                        byte[] rawAudio = RawAudioBuffer.poll();
                        if (rawAudio != null) {
                            try {
                                playChunk(rawAudio);
                            } catch (IOException e) {
                                throw new RuntimeException(e);
                            } catch (InterruptedException e) {
                                throw new RuntimeException(e);
                            }
                        } else {
                            try {
                                Thread.sleep(100);
                            } catch (InterruptedException e) {
                                throw new RuntimeException(e);
                            }
                        }
                    }
                }
            });
            decoderThread.start();
            playerThread.start();
        }

        // Play an audio chunk and block until playback is complete.
        private void playChunk(byte[] chunk) throws IOException, InterruptedException {
            if (chunk == null || chunk.length == 0) return;

            int bytesWritten = 0;
            while (bytesWritten < chunk.length) {
                bytesWritten += line.write(chunk, bytesWritten, chunk.length - bytesWritten);
            }
            int audioLengthMs = chunk.length / (this.sampleRate * 2 / 1000);
            // Wait for the audio in the buffer to finish playing. Guard against a negative sleep for very small chunks.
            Thread.sleep(Math.max(0, audioLengthMs - 10));
        }

        public void write(String b64Audio) {
            b64AudioBuffer.add(b64Audio);
        }

        public void cancel() {
            b64AudioBuffer.clear();
            RawAudioBuffer.clear();
        }

        public void waitForComplete() throws InterruptedException {
            // Wait for all audio data in the buffers to finish playing.
            while (!b64AudioBuffer.isEmpty() || !RawAudioBuffer.isEmpty()) {
                Thread.sleep(100);
            }
            // Wait for the audio line to finish playing.
            line.drain();
        }

        public void shutdown() throws InterruptedException {
            stopped.set(true);
            decoderThread.join();
            playerThread.join();
            if (line != null && line.isRunning()) {
                line.drain();
                line.close();
            }
        }
    }

    public static void main(String[] args) throws InterruptedException, LineUnavailableException, FileNotFoundException {
        Scanner scanner = new Scanner(System.in);

        QwenTtsRealtimeParam param = QwenTtsRealtimeParam.builder()
                .model("qwen3-tts-flash-realtime")
                // The following URL is for the Singapore region. If you use a model in the China (Beijing) region, replace the URL with wss://dashscope.aliyuncs.com/api-ws/v1/realtime.
                .url("wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime")
                // The API keys for the Singapore and China (Beijing) regions are different. To obtain an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key.
                .apikey(System.getenv("DASHSCOPE_API_KEY"))
                .build();

        AtomicReference<CountDownLatch> completeLatch = new AtomicReference<>(new CountDownLatch(1));

        // Create a real-time player instance.
        RealtimePcmPlayer audioPlayer = new RealtimePcmPlayer(24000);

        final AtomicReference<QwenTtsRealtime> qwenTtsRef = new AtomicReference<>(null);
        QwenTtsRealtime qwenTtsRealtime = new QwenTtsRealtime(param, new QwenTtsRealtimeCallback() {
//            File file = new File("result_24k.pcm");
//            FileOutputStream fos = new FileOutputStream(file);
            @Override
            public void onOpen() {
                System.out.println("connection opened");
                System.out.println("Enter text and press Enter to send. Enter 'quit' to exit the program.");
            }
            @Override
            public void onEvent(JsonObject message) {
                String type = message.get("type").getAsString();
                switch(type) {
                    case "session.created":
                        System.out.println("start session: " + message.get("session").getAsJsonObject().get("id").getAsString());
                        break;
                    case "response.audio.delta":
                        String recvAudioB64 = message.get("delta").getAsString();
                        byte[] rawAudio = Base64.getDecoder().decode(recvAudioB64);
                        //                            fos.write(rawAudio);
                        // Play the audio in real time.
                        audioPlayer.write(recvAudioB64);
                        break;
                    case "response.done":
                        System.out.println("response done");
                        // Wait for the audio to finish playing.
                        try {
                            audioPlayer.waitForComplete();
                        } catch (InterruptedException e) {
                            throw new RuntimeException(e);
                        }
                        // Prepare for the next input.
                        completeLatch.get().countDown();
                        break;
                    case "session.finished":
                        System.out.println("session finished");
                        if (qwenTtsRef.get() != null) {
                            System.out.println("[Metric] response: " + qwenTtsRef.get().getResponseId() +
                                    ", first audio delay: " + qwenTtsRef.get().getFirstAudioDelay() + " ms");
                        }
                        completeLatch.get().countDown();
                    default:
                        break;
                }
            }
            @Override
            public void onClose(int code, String reason) {
                System.out.println("connection closed code: " + code + ", reason: " + reason);
                try {
//                    fos.close();
                    // Wait for playback to complete and then shut down the player.
                    audioPlayer.waitForComplete();
                    audioPlayer.shutdown();
                } catch (InterruptedException e) {
                    throw new RuntimeException(e);
                }
            }
        });
        qwenTtsRef.set(qwenTtsRealtime);
        try {
            qwenTtsRealtime.connect();
        } catch (NoApiKeyException e) {
            throw new RuntimeException(e);
        }
        QwenTtsRealtimeConfig config = QwenTtsRealtimeConfig.builder()
                .voice("Cherry")
                .responseFormat(QwenTtsRealtimeAudioFormat.PCM_24000HZ_MONO_16BIT)
                .mode("commit")
                .build();
        qwenTtsRealtime.updateSession(config);

        // Loop to read user input.
        while (true) {
            System.out.print("Enter the text to synthesize: ");
            String text = scanner.nextLine();

            // If the user enters 'quit', exit the program.
            if ("quit".equalsIgnoreCase(text.trim())) {
                System.out.println("Closing the connection...");
                qwenTtsRealtime.finish();
                completeLatch.get().await();
                break;
            }

            // If the user input is empty, skip.
            if (text.trim().isEmpty()) {
                continue;
            }

            // Reinitialize the countdown latch.
            completeLatch.set(new CountDownLatch(1));

            // Send the text.
            qwenTtsRealtime.appendText(text);
            qwenTtsRealtime.commit();

            // Wait for the current synthesis to complete.
            completeLatch.get().await();
        }

        // Clean up resources.
        audioPlayer.waitForComplete();
        audioPlayer.shutdown();
        scanner.close();
        System.exit(0);
    }
}

Use the WebSocket API

  1. Prepare the runtime environment

    Install pyaudio for your operating system.

    macOS

    brew install portaudio && pip install pyaudio

    Debian/Ubuntu

    sudo apt-get install python3-pyaudio
    
    or
    
    pip install pyaudio

    CentOS

    sudo yum install -y portaudio portaudio-devel && pip install pyaudio

    Windows

    pip install pyaudio

    After the installation, install the WebSocket dependencies using pip:

    pip install websocket-client==1.8.0 websockets
  2. Create a client

    Create a local Python file named tts_realtime_client.py and copy the following code into the file:

    tts_realtime_client.py

    # -*- coding: utf-8 -*-
    
    import asyncio
    import websockets
    import json
    import base64
    import time
    from typing import Optional, Callable, Dict, Any
    from enum import Enum
    
    
    class SessionMode(Enum):
        SERVER_COMMIT = "server_commit"
        COMMIT = "commit"
    
    
    class TTSRealtimeClient:
        """
        A client for interacting with the TTS Realtime API.
    
        This class provides methods to connect to the TTS Realtime API, send text data, receive audio output, and manage the WebSocket connection.
    
        Attributes:
            base_url (str):
                The base URL of the Realtime API.
            api_key (str):
                The API key for identity verification.
            voice (str):
                The voice used by the server for speech synthesis.
            mode (SessionMode):
                The session mode. Valid values: server_commit or commit.
            audio_callback (Callable[[bytes], None]):
                The callback function to receive audio data.
            language_type (str):
                The language of the synthesized speech. Valid values: Chinese, English, German, Italian, Portuguese, Spanish, Japanese, Korean, French, Russian, and Auto.
        """
    
        def __init__(
                self,
                base_url: str,
                api_key: str,
                voice: str = "Cherry",
                mode: SessionMode = SessionMode.SERVER_COMMIT,
                audio_callback: Optional[Callable[[bytes], None]] = None,
            language_type: str = "Auto"):
            self.base_url = base_url
            self.api_key = api_key
            self.voice = voice
            self.mode = mode
            self.ws = None
            self.audio_callback = audio_callback
            self.language_type = language_type
    
            # Current response status
            self._current_response_id = None
            self._current_item_id = None
            self._is_responding = False
            self._response_done_future = None
    
    
        async def connect(self) -> None:
            """Establish a WebSocket connection with the TTS Realtime API."""
            headers = {
                "Authorization": f"Bearer {self.api_key}"
            }
    
            self.ws = await websockets.connect(self.base_url, additional_headers=headers)
    
            # Set the default session configuration.
            await self.update_session({
                "mode": self.mode.value,
                "voice": self.voice,
                "language_type": self.language_type,
                "response_format": "pcm",
                "sample_rate": 24000
            })
    
    
        async def send_event(self, event) -> None:
            """Send an event to the server."""
            event['event_id'] = "event_" + str(int(time.time() * 1000))
            print(f"Send event: type={event['type']}, event_id={event['event_id']}")
            await self.ws.send(json.dumps(event))
    
    
        async def update_session(self, config: Dict[str, Any]) -> None:
            """Update the session configuration."""
            event = {
                "type": "session.update",
                "session": config
            }
            print("Update session configuration: ", event)
            await self.send_event(event)
    
    
        async def append_text(self, text: str) -> None:
            """Send text data to the API."""
            event = {
                "type": "input_text_buffer.append",
                "text": text
            }
            await self.send_event(event)
    
    
        async def commit_text_buffer(self) -> None:
            """Commit the text buffer to trigger processing."""
            event = {
                "type": "input_text_buffer.commit"
            }
            await self.send_event(event)
    
    
        async def clear_text_buffer(self) -> None:
            """Clear the text buffer."""
            event = {
                "type": "input_text_buffer.clear"
            }
            await self.send_event(event)
    
    
        async def finish_session(self) -> None:
            """End the session."""
            event = {
                "type": "session.finish"
            }
            await self.send_event(event)
    
    
        async def wait_for_response_done(self):
            """Wait for the response.done event."""
            if self._response_done_future:
                await self._response_done_future
    
    
        async def handle_messages(self) -> None:
            """Handle messages from the server."""
            try:
                async for message in self.ws:
                    event = json.loads(message)
                    event_type = event.get("type")
    
                    if event_type != "response.audio.delta":
                        print(f"Received event: {event_type}")
    
                    if event_type == "error":
                        print("Error: ", event.get('error', {}))
                        continue
                    elif event_type == "session.created":
                        print("Session created, ID: ", event.get('session', {}).get('id'))
                    elif event_type == "session.updated":
                        print("Session updated, ID: ", event.get('session', {}).get('id'))
                    elif event_type == "input_text_buffer.committed":
                        print("Text buffer committed, item ID: ", event.get('item_id'))
                    elif event_type == "input_text_buffer.cleared":
                        print("Text buffer cleared.")
                    elif event_type == "response.created":
                        self._current_response_id = event.get("response", {}).get("id")
                        self._is_responding = True
                        # Create a new future to wait for response.done.
                        self._response_done_future = asyncio.Future()
                        print("Response created, ID: ", self._current_response_id)
                    elif event_type == "response.output_item.added":
                        self._current_item_id = event.get("item", {}).get("id")
                        print("Output item added, ID: ", self._current_item_id)
                    # Process the audio delta.
                    elif event_type == "response.audio.delta" and self.audio_callback:
                        audio_bytes = base64.b64decode(event.get("delta", ""))
                        self.audio_callback(audio_bytes)
                    elif event_type == "response.audio.done":
                        print("Audio generation complete.")
                    elif event_type == "response.done":
                        self._is_responding = False
                        self._current_response_id = None
                        self._current_item_id = None
                        # Mark the future as complete.
                        if self._response_done_future and not self._response_done_future.done():
                            self._response_done_future.set_result(True)
                        print("Response complete.")
                    elif event_type == "session.finished":
                        print("Session finished.")
    
            except websockets.exceptions.ConnectionClosed:
                print("Connection closed.")
            except Exception as e:
                print("Error processing message: ", str(e))
    
    
        async def close(self) -> None:
            """Close the WebSocket connection."""
            if self.ws:
                await self.ws.close()
  3. Select a speech synthesis mode

    The Realtime API supports the following two modes:

    • server_commit mode

      The client only sends text. The server intelligently determines how to segment the text and when to synthesize it. This mode is suitable for low-latency scenarios where you do not need to manually control the synthesis timing, such as GPS navigation.

    • commit mode

      The client first adds text to a buffer and then actively triggers the server to synthesize the specified text. This mode is suitable for scenarios that require fine-grained control over sentence breaks and pauses, such as news broadcasting.
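    Both modes use the same event protocol: the mode is chosen in the session.update event, mirroring the default configuration that TTSRealtimeClient.connect() sends in step 2. A minimal sketch of the two payloads (voice and format values are the defaults used in this example):

    # server_commit: the server decides when to segment and synthesize appended text.
    session_update_server_commit = {
        "type": "session.update",
        "session": {"mode": "server_commit", "voice": "Cherry",
                    "response_format": "pcm", "sample_rate": 24000},
    }

    # commit: the client appends text, then sends input_text_buffer.commit
    # to trigger synthesis explicitly.
    session_update_commit = {
        "type": "session.update",
        "session": {"mode": "commit", "voice": "Cherry",
                    "response_format": "pcm", "sample_rate": 24000},
    }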

    server_commit mode

    In the same directory as tts_realtime_client.py, create another Python file named server_commit.py, and copy the following code into the file:

    server_commit.py

    import os
    import asyncio
    import logging
    import wave
    from tts_realtime_client import TTSRealtimeClient, SessionMode
    import pyaudio
    
    # QwenTTS service configuration
    # The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: wss://dashscope.aliyuncs.com/api-ws/v1/realtime?model=qwen3-tts-flash-realtime
    URL = "wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime?model=qwen3-tts-flash-realtime"
    # API keys are different for the Singapore and Beijing regions. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key/.
    # If you have not configured the environment variable, replace the following line with your Model Studio API key: API_KEY="sk-xxx"
    API_KEY = os.getenv("DASHSCOPE_API_KEY")
    
    if not API_KEY:
        raise ValueError("Please set the DASHSCOPE_API_KEY environment variable")
    
    # Collect audio data.
    _audio_chunks = []
    # Real-time playback related.
    _AUDIO_SAMPLE_RATE = 24000
    _audio_pyaudio = pyaudio.PyAudio()
    _audio_stream = None  # Will be opened at runtime.
    
    def _audio_callback(audio_bytes: bytes):
        """TTSRealtimeClient audio callback: Play back in real time and cache."""
        global _audio_stream
        if _audio_stream is not None:
            try:
                _audio_stream.write(audio_bytes)
            except Exception as exc:
                logging.error(f"PyAudio playback error: {exc}")
        _audio_chunks.append(audio_bytes)
        logging.info(f"Received audio chunk: {len(audio_bytes)} bytes")
    
    def _save_audio_to_file(filename: str = "output.wav", sample_rate: int = 24000) -> bool:
        """Save the collected audio data to a WAV file."""
        if not _audio_chunks:
            logging.warning("No audio data to save")
            return False
    
        try:
            audio_data = b"".join(_audio_chunks)
            with wave.open(filename, 'wb') as wav_file:
                wav_file.setnchannels(1)  # Mono
                wav_file.setsampwidth(2)  # 16-bit
                wav_file.setframerate(sample_rate)
                wav_file.writeframes(audio_data)
            logging.info(f"Audio saved to: {filename}")
            return True
        except Exception as exc:
            logging.error(f"Failed to save audio: {exc}")
            return False
    
    async def _produce_text(client: TTSRealtimeClient):
        """Send text fragments to the server."""
        text_fragments = [
            "Alibaba Cloud Model Studio is a one-stop platform for large model development and application building.",
            "Both developers and business personnel can be deeply involved in designing and building large model applications.", 
            "You can develop a large model application in 5 minutes using simple interface operations,",
            "or train a custom model in a few hours, allowing you to focus more on application innovation.",
        ]
    
        logging.info("Sending text fragments…")
        for text in text_fragments:
            logging.info(f"Sending fragment: {text}")
            await client.append_text(text)
            await asyncio.sleep(0.1)  # Add a short delay between fragments.
    
        # Wait for the server to complete internal processing before ending the session.
        await asyncio.sleep(1.0)
        await client.finish_session()
    
    async def _run_demo():
        """Run the full demo."""
        global _audio_stream
        # Open the PyAudio output stream.
        _audio_stream = _audio_pyaudio.open(
            format=pyaudio.paInt16,
            channels=1,
            rate=_AUDIO_SAMPLE_RATE,
            output=True,
            frames_per_buffer=1024
        )
    
        client = TTSRealtimeClient(
            base_url=URL,
            api_key=API_KEY,
            voice="Cherry",
            mode=SessionMode.SERVER_COMMIT,
            audio_callback=_audio_callback
        )
    
        # Establish a connection.
        await client.connect()
    
        # Execute message handling and text sending in parallel.
        consumer_task = asyncio.create_task(client.handle_messages())
        producer_task = asyncio.create_task(_produce_text(client))
    
        await producer_task  # Wait for the text to be sent.
    
        # Wait for response.done.
        await client.wait_for_response_done()
    
        # Close the connection and cancel the consumer task.
        await client.close()
        consumer_task.cancel()
    
        # Close the audio stream.
        if _audio_stream is not None:
            _audio_stream.stop_stream()
            _audio_stream.close()
        _audio_pyaudio.terminate()
    
        # Save the audio data.
        os.makedirs("outputs", exist_ok=True)
        _save_audio_to_file(os.path.join("outputs", "qwen_tts_output.wav"))
    
    def main():
        """Synchronous entry point."""
        logging.basicConfig(
            level=logging.INFO,
            format='%(asctime)s [%(levelname)s] %(message)s',
            datefmt='%Y-%m-%d %H:%M:%S'
        )
        logging.info("Starting QwenTTS Realtime Client demo…")
        asyncio.run(_run_demo())
    
    if __name__ == "__main__":
        main() 

    Run server_commit.py to hear the audio generated in real time by the Realtime API.

    commit mode

    In the same directory as tts_realtime_client.py, create another Python file named commit.py, and copy the following code into the file:

    commit.py

    import os
    import asyncio
    import logging
    import wave
    from tts_realtime_client import TTSRealtimeClient, SessionMode
    import pyaudio
    
    # QwenTTS service configuration
    # The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: wss://dashscope.aliyuncs.com/api-ws/v1/realtime?model=qwen3-tts-flash-realtime
    URL = "wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime?model=qwen3-tts-flash-realtime"
    # API keys are different for the Singapore and Beijing regions. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key/.
    # If you have not configured the environment variable, replace the following line with your Model Studio API key: API_KEY="sk-xxx"
    API_KEY = os.getenv("DASHSCOPE_API_KEY")
    
    if not API_KEY:
        raise ValueError("Please set the DASHSCOPE_API_KEY environment variable")
    
    # Collect audio data.
    _audio_chunks = []
    _AUDIO_SAMPLE_RATE = 24000
    _audio_pyaudio = pyaudio.PyAudio()
    _audio_stream = None
    
    def _audio_callback(audio_bytes: bytes):
        """TTSRealtimeClient audio callback: Play back in real time and cache."""
        global _audio_stream
        if _audio_stream is not None:
            try:
                _audio_stream.write(audio_bytes)
            except Exception as exc:
                logging.error(f"PyAudio playback error: {exc}")
        _audio_chunks.append(audio_bytes)
        logging.info(f"Received audio chunk: {len(audio_bytes)} bytes")
    
    def _save_audio_to_file(filename: str = "output.wav", sample_rate: int = 24000) -> bool:
        """Save the collected audio data to a WAV file."""
        if not _audio_chunks:
            logging.warning("No audio data to save")
            return False
    
        try:
            audio_data = b"".join(_audio_chunks)
            with wave.open(filename, 'wb') as wav_file:
                wav_file.setnchannels(1)  # Mono
                wav_file.setsampwidth(2)  # 16-bit
                wav_file.setframerate(sample_rate)
                wav_file.writeframes(audio_data)
            logging.info(f"Audio saved to: {filename}")
            return True
        except Exception as exc:
            logging.error(f"Failed to save audio: {exc}")
            return False
    
    async def _user_input_loop(client: TTSRealtimeClient):
        """Continuously get user input and send text. When the user enters empty text, send a commit event and end the current session."""
        print("Enter text. Press Enter to send a commit event and end the current session. Press Ctrl+C or Ctrl+D to end the program.")
        
        while True:
            try:
                user_text = input("> ")
                if not user_text:  # The user input is empty.
                    # Empty input is treated as the end of a conversation: commit buffer -> end session -> break loop.
                    logging.info("Empty input. Sending commit event and ending the current session.")
                    await client.commit_text_buffer()
                    # Wait for the server to process the commit to prevent losing audio due to premature session termination.
                    await asyncio.sleep(0.3)
                    await client.finish_session()
                    break  # Exit the user input loop directly. You do not need to press Enter again.
                else:
                    logging.info(f"Sending text: {user_text}")
                    await client.append_text(user_text)
                    
            except EOFError:  # The user presses Ctrl+D.
                break
            except KeyboardInterrupt:  # The user presses Ctrl+C.
                break
        
        # End the session.
        logging.info("Ending the session...")

    async def _run_demo():
        """Run the full demo."""
        global _audio_stream
        # Open the PyAudio output stream.
        _audio_stream = _audio_pyaudio.open(
            format=pyaudio.paInt16,
            channels=1,
            rate=_AUDIO_SAMPLE_RATE,
            output=True,
            frames_per_buffer=1024
        )
    
        client = TTSRealtimeClient(
            base_url=URL,
            api_key=API_KEY,
            voice="Cherry",
            mode=SessionMode.COMMIT,  # Change to COMMIT mode.
            audio_callback=_audio_callback
        )
    
        # Establish a connection.
        await client.connect()
    
        # Execute message handling and user input in parallel.
        consumer_task = asyncio.create_task(client.handle_messages())
        producer_task = asyncio.create_task(_user_input_loop(client))
    
        await producer_task  # Wait for user input to complete.
    
        # Wait for response.done.
        await client.wait_for_response_done()
    
        # Close the connection and cancel the consumer task.
        await client.close()
        consumer_task.cancel()
    
        # Close the audio stream.
        if _audio_stream is not None:
            _audio_stream.stop_stream()
            _audio_stream.close()
        _audio_pyaudio.terminate()
    
        # Save the audio data.
        os.makedirs("outputs", exist_ok=True)
        _save_audio_to_file(os.path.join("outputs", "qwen_tts_output.wav"))
    
    def main():
        logging.basicConfig(
            level=logging.INFO,
            format='%(asctime)s [%(levelname)s] %(message)s',
            datefmt='%Y-%m-%d %H:%M:%S'
        )
        logging.info("Starting QwenTTS Realtime Client demo…")
        asyncio.run(_run_demo())
    
    if __name__ == "__main__":
        main() 

    Run commit.py. You can enter text for synthesis multiple times. To hear the audio returned by the Realtime API, press Enter on an empty line.

Synthesize speech using a cloned voice

The voice cloning service does not provide an audio preview. To listen to and evaluate a cloned voice, you must apply it to speech synthesis.

The following example demonstrates how to use a custom voice generated by voice cloning for speech synthesis, producing an output that is highly similar to the original voice. This example is based on the sample code for the server_commit mode of the DashScope SDK and replaces the voice parameter with the custom cloned voice.

  • Key principle: The model used for voice cloning (target_model) must be the same as the model used for the subsequent speech synthesis (model). Otherwise, the synthesis fails.

  • This example uses the local audio file voice.mp3 for voice cloning. You must replace this file with your own audio file when you run the code.

Python

# coding=utf-8
# Installation instructions for pyaudio:
# APPLE Mac OS X
#   brew install portaudio
#   pip install pyaudio
# Debian/Ubuntu
#   sudo apt-get install python-pyaudio python3-pyaudio
#   or
#   pip install pyaudio
# CentOS
#   sudo yum install -y portaudio portaudio-devel && pip install pyaudio
# Microsoft Windows
#   python -m pip install pyaudio

import pyaudio
import os
import requests
import base64
import pathlib
import threading
import time
import dashscope  # The DashScope Python SDK version must be 1.23.9 or later.
from dashscope.audio.qwen_tts_realtime import QwenTtsRealtime, QwenTtsRealtimeCallback, AudioFormat

# ======= Constant configuration =======
DEFAULT_TARGET_MODEL = "qwen3-tts-vc-realtime-2026-01-15"  # The same model must be used for voice cloning and speech synthesis.
DEFAULT_PREFERRED_NAME = "guanyu"
DEFAULT_AUDIO_MIME_TYPE = "audio/mpeg"
VOICE_FILE_PATH = "voice.mp3"  # The relative path of the local audio file for voice cloning.

TEXT_TO_SYNTHESIZE = [
    'Right? I really like this kind of supermarket,',
    'especially during the New Year.',
    'Going to the supermarket',
    'just makes me feel',
    'super, super happy!',
    'I want to buy so many things!'
]

def create_voice(file_path: str,
                 target_model: str = DEFAULT_TARGET_MODEL,
                 preferred_name: str = DEFAULT_PREFERRED_NAME,
                 audio_mime_type: str = DEFAULT_AUDIO_MIME_TYPE) -> str:
    """
    Create a voice and return the voice parameter.
    """
    # API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key/.
    # If you have not configured an environment variable, replace the following line with your Model Studio API key: api_key = "sk-xxx"
    api_key = os.getenv("DASHSCOPE_API_KEY")

    file_path_obj = pathlib.Path(file_path)
    if not file_path_obj.exists():
        raise FileNotFoundError(f"Audio file not found: {file_path}")

    base64_str = base64.b64encode(file_path_obj.read_bytes()).decode()
    data_uri = f"data:{audio_mime_type};base64,{base64_str}"

    # The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
    url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"
    payload = {
        "model": "qwen-voice-enrollment", # Do not modify this value.
        "input": {
            "action": "create",
            "target_model": target_model,
            "preferred_name": preferred_name,
            "audio": {"data": data_uri}
        }
    }
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }

    resp = requests.post(url, json=payload, headers=headers)
    if resp.status_code != 200:
        raise RuntimeError(f"Failed to create voice: {resp.status_code}, {resp.text}")

    try:
        return resp.json()["output"]["voice"]
    except (KeyError, ValueError) as e:
        raise RuntimeError(f"Failed to parse voice response: {e}")

def init_dashscope_api_key():
    """
    Initialize the API key for the DashScope SDK.
    """
    # API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key/.
    # If you have not configured an environment variable, replace the following line with your Model Studio API key: dashscope.api_key = "sk-xxx"
    dashscope.api_key = os.getenv("DASHSCOPE_API_KEY")

# ======= Callback class =======
class MyCallback(QwenTtsRealtimeCallback):
    """
    Custom TTS streaming callback.
    """
    def __init__(self):
        self.complete_event = threading.Event()
        self._player = pyaudio.PyAudio()
        self._stream = self._player.open(
            format=pyaudio.paInt16, channels=1, rate=24000, output=True
        )

    def on_open(self) -> None:
        print('[TTS] Connection established.')

    def on_close(self, close_status_code, close_msg) -> None:
        self._stream.stop_stream()
        self._stream.close()
        self._player.terminate()
        print(f'[TTS] Connection closed code={close_status_code}, msg={close_msg}')

    def on_event(self, response: dict) -> None:
        try:
            event_type = response.get('type', '')
            if event_type == 'session.created':
                print(f'[TTS] Session started: {response["session"]["id"]}')
            elif event_type == 'response.audio.delta':
                audio_data = base64.b64decode(response['delta'])
                self._stream.write(audio_data)
            elif event_type == 'response.done':
                print(f'[TTS] Response complete, Response ID: {qwen_tts_realtime.get_last_response_id()}')
            elif event_type == 'session.finished':
                print('[TTS] Session finished.')
                self.complete_event.set()
        except Exception as e:
            print(f'[Error] Failed to process callback event: {e}')

    def wait_for_finished(self):
        self.complete_event.wait()

# ======= Main execution logic =======
if __name__ == '__main__':
    init_dashscope_api_key()
    print('[System] Initializing Qwen TTS Realtime ...')

    callback = MyCallback()
    qwen_tts_realtime = QwenTtsRealtime(
        model=DEFAULT_TARGET_MODEL,
        callback=callback,
        # The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: wss://dashscope.aliyuncs.com/api-ws/v1/realtime
        url='wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime'
    )
    qwen_tts_realtime.connect()
    
    qwen_tts_realtime.update_session(
        voice=create_voice(VOICE_FILE_PATH), # Replace the voice parameter with the custom voice generated by cloning.
        response_format=AudioFormat.PCM_24000HZ_MONO_16BIT,
        mode='server_commit'
    )

    for text_chunk in TEXT_TO_SYNTHESIZE:
        print(f'[Send text]: {text_chunk}')
        qwen_tts_realtime.append_text(text_chunk)
        time.sleep(0.1)

    qwen_tts_realtime.finish()
    callback.wait_for_finished()

    print(f'[Metric] session_id={qwen_tts_realtime.get_session_id()}, '
          f'first_audio_delay={qwen_tts_realtime.get_first_audio_delay()}s')
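
If you plan to reuse a cloned voice across runs, you do not need to enroll it again each time. A minimal excerpt, assuming the voice string returned by create_voice() was saved earlier and the enrolled voice is still available under your account (SAVED_VOICE is a placeholder, not a real value):

# Reuse a previously cloned voice instead of enrolling again on every run.
# SAVED_VOICE is a placeholder; use the exact string returned by create_voice().
SAVED_VOICE = "your-cloned-voice-id"
qwen_tts_realtime.update_session(
    voice=SAVED_VOICE,
    response_format=AudioFormat.PCM_24000HZ_MONO_16BIT,
    mode='server_commit'
)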

Java

Import the Gson dependency. If you use Maven or Gradle, add the dependency as follows:

Maven

Add the following content to the pom.xml file:

<!-- https://mvnrepository.com/artifact/com.google.code.gson/gson -->
<dependency>
    <groupId>com.google.code.gson</groupId>
    <artifactId>gson</artifactId>
    <version>2.13.1</version>
</dependency>

Gradle

Add the following content to the build.gradle file:

// https://mvnrepository.com/artifact/com.google.code.gson/gson
implementation("com.google.code.gson:gson:2.13.1")
import com.alibaba.dashscope.audio.qwen_tts_realtime.*;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.google.gson.Gson;
import com.google.gson.JsonObject;

import javax.sound.sampled.*;
import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.*;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Queue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;

public class Main {
    // ===== Constant definitions =====
    // The same model must be used for voice cloning and speech synthesis.
    private static final String TARGET_MODEL = "qwen3-tts-vc-realtime-2026-01-15";
    private static final String PREFERRED_NAME = "guanyu";
    // The relative path of the local audio file for voice cloning.
    private static final String AUDIO_FILE = "voice.mp3";
    private static final String AUDIO_MIME_TYPE = "audio/mpeg";
    private static String[] textToSynthesize = {
            "Right? I really like this kind of supermarket,",
            "especially during the New Year.",
            "Going to the supermarket",
            "just makes me feel",
            "super, super happy!",
            "I want to buy so many things!"
    };

    // Generate a data URI.
    public static String toDataUrl(String filePath) throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get(filePath));
        String encoded = Base64.getEncoder().encodeToString(bytes);
        return "data:" + AUDIO_MIME_TYPE + ";base64," + encoded;
    }

    // Call the API to create a voice.
    public static String createVoice() throws Exception {
        // API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key/.
        // If you have not configured an environment variable, replace the following line with your Model Studio API key: String apiKey = "sk-xxx"
        String apiKey = System.getenv("DASHSCOPE_API_KEY");

        String jsonPayload =
                "{"
                        + "\"model\": \"qwen-voice-enrollment\"," // Do not modify this value.
                        + "\"input\": {"
                        +     "\"action\": \"create\","
                        +     "\"target_model\": \"" + TARGET_MODEL + "\","
                        +     "\"preferred_name\": \"" + PREFERRED_NAME + "\","
                        +     "\"audio\": {"
                        +         "\"data\": \"" + toDataUrl(AUDIO_FILE) + "\""
                        +     "}"
                        + "}"
                        + "}";

        // The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
        HttpURLConnection con = (HttpURLConnection) new URL("https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization").openConnection();
        con.setRequestMethod("POST");
        con.setRequestProperty("Authorization", "Bearer " + apiKey);
        con.setRequestProperty("Content-Type", "application/json");
        con.setDoOutput(true);

        try (OutputStream os = con.getOutputStream()) {
            os.write(jsonPayload.getBytes(StandardCharsets.UTF_8));
        }

        int status = con.getResponseCode();
        System.out.println("HTTP status code: " + status);

        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(status >= 200 && status < 300 ? con.getInputStream() : con.getErrorStream(),
                        StandardCharsets.UTF_8))) {
            StringBuilder response = new StringBuilder();
            String line;
            while ((line = br.readLine()) != null) {
                response.append(line);
            }
            System.out.println("Response content: " + response);

            if (status == 200) {
                JsonObject jsonObj = new Gson().fromJson(response.toString(), JsonObject.class);
                return jsonObj.getAsJsonObject("output").get("voice").getAsString();
            }
            throw new IOException("Failed to create voice: " + status + " - " + response);
        }
    }

    // Real-time PCM player class
    public static class RealtimePcmPlayer {
        private int sampleRate;
        private SourceDataLine line;
        private AudioFormat audioFormat;
        private Thread decoderThread;
        private Thread playerThread;
        private AtomicBoolean stopped = new AtomicBoolean(false);
        private Queue<String> b64AudioBuffer = new ConcurrentLinkedQueue<>();
        private Queue<byte[]> RawAudioBuffer = new ConcurrentLinkedQueue<>();

        // The constructor initializes the audio format and audio line.
        public RealtimePcmPlayer(int sampleRate) throws LineUnavailableException {
            this.sampleRate = sampleRate;
            this.audioFormat = new AudioFormat(this.sampleRate, 16, 1, true, false);
            DataLine.Info info = new DataLine.Info(SourceDataLine.class, audioFormat);
            line = (SourceDataLine) AudioSystem.getLine(info);
            line.open(audioFormat);
            line.start();
            decoderThread = new Thread(new Runnable() {
                @Override
                public void run() {
                    while (!stopped.get()) {
                        String b64Audio = b64AudioBuffer.poll();
                        if (b64Audio != null) {
                            byte[] rawAudio = Base64.getDecoder().decode(b64Audio);
                            RawAudioBuffer.add(rawAudio);
                        } else {
                            try {
                                Thread.sleep(100);
                            } catch (InterruptedException e) {
                                throw new RuntimeException(e);
                            }
                        }
                    }
                }
            });
            playerThread = new Thread(new Runnable() {
                @Override
                public void run() {
                    while (!stopped.get()) {
                        byte[] rawAudio = RawAudioBuffer.poll();
                        if (rawAudio != null) {
                            try {
                                playChunk(rawAudio);
                            } catch (IOException e) {
                                throw new RuntimeException(e);
                            } catch (InterruptedException e) {
                                throw new RuntimeException(e);
                            }
                        } else {
                            try {
                                Thread.sleep(100);
                            } catch (InterruptedException e) {
                                throw new RuntimeException(e);
                            }
                        }
                    }
                }
            });
            decoderThread.start();
            playerThread.start();
        }

        // Play an audio chunk and block until playback is complete.
        private void playChunk(byte[] chunk) throws IOException, InterruptedException {
            if (chunk == null || chunk.length == 0) return;

            int bytesWritten = 0;
            while (bytesWritten < chunk.length) {
                bytesWritten += line.write(chunk, bytesWritten, chunk.length - bytesWritten);
            }
            // Chunk duration in milliseconds: 16-bit mono PCM is sampleRate * 2 bytes per second.
            int audioLength = chunk.length / (this.sampleRate * 2 / 1000);
            // Wait for the audio in the buffer to finish playing; clamp to avoid a negative sleep duration.
            Thread.sleep(Math.max(0, audioLength - 10));
        }

        public void write(String b64Audio) {
            b64AudioBuffer.add(b64Audio);
        }

        public void cancel() {
            b64AudioBuffer.clear();
            RawAudioBuffer.clear();
        }

        public void waitForComplete() throws InterruptedException {
            while (!b64AudioBuffer.isEmpty() || !RawAudioBuffer.isEmpty()) {
                Thread.sleep(100);
            }
            line.drain();
        }

        public void shutdown() throws InterruptedException {
            stopped.set(true);
            decoderThread.join();
            playerThread.join();
            if (line != null && line.isRunning()) {
                line.drain();
                line.close();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        QwenTtsRealtimeParam param = QwenTtsRealtimeParam.builder()
                .model(TARGET_MODEL)
                // The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: wss://dashscope.aliyuncs.com/api-ws/v1/realtime
                .url("wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime")
                // API keys for the Singapore and Beijing regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key/.
                // If you have not configured an environment variable, replace the following line with your Model Studio API key: .apikey("sk-xxx")
                .apikey(System.getenv("DASHSCOPE_API_KEY"))
                .build();
        AtomicReference<CountDownLatch> completeLatch = new AtomicReference<>(new CountDownLatch(1));
        final AtomicReference<QwenTtsRealtime> qwenTtsRef = new AtomicReference<>(null);

        // Create a real-time audio player instance.
        RealtimePcmPlayer audioPlayer = new RealtimePcmPlayer(24000);

        QwenTtsRealtime qwenTtsRealtime = new QwenTtsRealtime(param, new QwenTtsRealtimeCallback() {
            @Override
            public void onOpen() {
                // Handle connection establishment.
            }
            @Override
            public void onEvent(JsonObject message) {
                String type = message.get("type").getAsString();
                switch(type) {
                    case "session.created":
                        // Handle session creation.
                        break;
                    case "response.audio.delta":
                        String recvAudioB64 = message.get("delta").getAsString();
                        // Play the audio in real time.
                        audioPlayer.write(recvAudioB64);
                        break;
                    case "response.done":
                        // Handle response completion.
                        break;
                    case "session.finished":
                        // Handle session termination.
                        completeLatch.get().countDown();
                        break;
                    default:
                        break;
                }
            }
            @Override
            public void onClose(int code, String reason) {
                // Handle connection closure.
            }
        });
        qwenTtsRef.set(qwenTtsRealtime);
        try {
            qwenTtsRealtime.connect();
        } catch (NoApiKeyException e) {
            throw new RuntimeException(e);
        }
        QwenTtsRealtimeConfig config = QwenTtsRealtimeConfig.builder()
                .voice(createVoice()) // Replace the voice parameter with the custom voice generated by cloning.
                .responseFormat(QwenTtsRealtimeAudioFormat.PCM_24000HZ_MONO_16BIT)
                .mode("server_commit")
                .build();
        qwenTtsRealtime.updateSession(config);
        for (String text:textToSynthesize) {
            qwenTtsRealtime.appendText(text);
            Thread.sleep(100);
        }
        qwenTtsRealtime.finish();
        completeLatch.get().await();

        // Wait for the audio to finish playing and then shut down the player.
        audioPlayer.waitForComplete();
        audioPlayer.shutdown();
        System.exit(0);
    }
}

Synthesize speech using a designed voice

When you use the voice design feature, the service returns preview audio data. You can listen to the preview audio to ensure that it meets your needs before using it for speech synthesis. This practice helps reduce call costs.

  1. Generate a custom voice and listen to the preview. If satisfied, proceed; otherwise, regenerate.

    Python

    import requests
    import base64
    import os
    
    def create_voice_and_play():
        # API keys differ between the Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
        # If you haven't set an environment variable, replace the following line with: api_key = "sk-xxx"
        api_key = os.getenv("DASHSCOPE_API_KEY")
        
        if not api_key:
            print("Error: DASHSCOPE_API_KEY environment variable not found. Please set your API key.")
            return None, None, None
        
        # Prepare request data
        headers = {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json"
        }
        
        data = {
            "model": "qwen-voice-design",
            "input": {
                "action": "create",
                "target_model": "qwen3-tts-vd-realtime-2026-01-15",
                "voice_prompt": "A calm middle-aged male announcer with a deep, rich and magnetic voice, a steady speaking speed and clear articulation, is suitable for news broadcasting or documentary commentary.",
                "preview_text": "Dear listeners, hello everyone. Welcome to the evening news.",
                "preferred_name": "announcer",
                "language": "en"
            },
            "parameters": {
                "sample_rate": 24000,
                "response_format": "wav"
            }
        }
        
        # URL for the Singapore region. For the Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
        url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"
        
        try:
            # Send request
            response = requests.post(
                url,
                headers=headers,
                json=data,
                timeout=60  # Add timeout setting
            )
            
            if response.status_code == 200:
                result = response.json()
                
                # Get voice name
                voice_name = result["output"]["voice"]
                print(f"Voice name: {voice_name}")
                
                # Get preview audio data
                base64_audio = result["output"]["preview_audio"]["data"]
                
                # Decode Base64 audio data
                audio_bytes = base64.b64decode(base64_audio)
                
                # Save audio file locally
                filename = f"{voice_name}_preview.wav"
                
                # Write audio data to local file
                with open(filename, 'wb') as f:
                    f.write(audio_bytes)
                
                print(f"Audio saved to local file: {filename}")
                print(f"File path: {os.path.abspath(filename)}")
                
                return voice_name, audio_bytes, filename
            else:
                print(f"Request failed. Status code: {response.status_code}")
                print(f"Response content: {response.text}")
                return None, None, None
                
        except requests.exceptions.RequestException as e:
            print(f"Network request error: {e}")
            return None, None, None
        except KeyError as e:
            print(f"Response data format error: missing required field: {e}")
            print(f"Response content: {response.text if 'response' in locals() else 'No response'}")
            return None, None, None
        except Exception as e:
            print(f"Unexpected error: {e}")
            return None, None, None
    
    if __name__ == "__main__":
        print("Creating voice...")
        voice_name, audio_data, saved_filename = create_voice_and_play()
        
        if voice_name:
            print(f"\nSuccessfully created voice '{voice_name}'")
            print(f"Audio file saved: '{saved_filename}'")
            print(f"File size: {os.path.getsize(saved_filename)} bytes")
        else:
            print("\nVoice creation failed")

    Java

    Add the Gson dependency. If you use Maven or Gradle, add the dependency as follows:

    Maven

    Add the following to your pom.xml:

    <!-- https://mvnrepository.com/artifact/com.google.code.gson/gson -->
    <dependency>
        <groupId>com.google.code.gson</groupId>
        <artifactId>gson</artifactId>
        <version>2.13.1</version>
    </dependency>

    Gradle

    Add the following to your build.gradle:

    // https://mvnrepository.com/artifact/com.google.code.gson/gson
    implementation("com.google.code.gson:gson:2.13.1")

    Sample code:

    import com.google.gson.JsonObject;
    import com.google.gson.JsonParser;
    import java.io.*;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.util.Base64;
    
    public class Main {
        public static void main(String[] args) {
            Main example = new Main();
            example.createVoice();
        }
    
        public void createVoice() {
        // API keys differ between the Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
            // If you haven't set an environment variable, replace the following line with: String apiKey = "sk-xxx"
            String apiKey = System.getenv("DASHSCOPE_API_KEY");
    
            // Create JSON request body string
            String jsonBody = "{\n" +
                    "    \"model\": \"qwen-voice-design\",\n" +
                    "    \"input\": {\n" +
                    "        \"action\": \"create\",\n" +
                    "        \"target_model\": \"qwen3-tts-vd-realtime-2026-01-15\",\n" +
                    "        \"voice_prompt\": \"A calm middle-aged male announcer with a deep, rich and magnetic voice, a steady speaking speed and clear articulation, is suitable for news broadcasting or documentary commentary.\",\n" +
                    "        \"preview_text\": \"Dear listeners, hello everyone. Welcome to the evening news.\",\n" +
                    "        \"preferred_name\": \"announcer\",\n" +
                    "        \"language\": \"en\"\n" +
                    "    },\n" +
                    "    \"parameters\": {\n" +
                    "        \"sample_rate\": 24000,\n" +
                    "        \"response_format\": \"wav\"\n" +
                    "    }\n" +
                    "}";
    
            HttpURLConnection connection = null;
            try {
                // URL for the Singapore region. For the Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
                URL url = new URL("https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization");
                connection = (HttpURLConnection) url.openConnection();
    
                // Set request method and headers
                connection.setRequestMethod("POST");
                connection.setRequestProperty("Authorization", "Bearer " + apiKey);
                connection.setRequestProperty("Content-Type", "application/json");
                connection.setDoOutput(true);
                connection.setDoInput(true);
    
                // Send request body
                try (OutputStream os = connection.getOutputStream()) {
                    byte[] input = jsonBody.getBytes("UTF-8");
                    os.write(input, 0, input.length);
                    os.flush();
                }
    
                // Get response
                int responseCode = connection.getResponseCode();
                if (responseCode == HttpURLConnection.HTTP_OK) {
                    // Read response content
                    StringBuilder response = new StringBuilder();
                    try (BufferedReader br = new BufferedReader(
                            new InputStreamReader(connection.getInputStream(), "UTF-8"))) {
                        String responseLine;
                        while ((responseLine = br.readLine()) != null) {
                            response.append(responseLine.trim());
                        }
                    }
    
                    // Parse JSON response
                    JsonObject jsonResponse = JsonParser.parseString(response.toString()).getAsJsonObject();
                    JsonObject outputObj = jsonResponse.getAsJsonObject("output");
                    JsonObject previewAudioObj = outputObj.getAsJsonObject("preview_audio");
    
                    // Get voice name
                    String voiceName = outputObj.get("voice").getAsString();
                    System.out.println("Voice name: " + voiceName);
    
                    // Get Base64-encoded audio data
                    String base64Audio = previewAudioObj.get("data").getAsString();
    
                    // Decode Base64 audio data
                    byte[] audioBytes = Base64.getDecoder().decode(base64Audio);
    
                    // Save audio to local file
                    String filename = voiceName + "_preview.wav";
                    saveAudioToFile(audioBytes, filename);
    
                    System.out.println("Audio saved to local file: " + filename);
    
                } else {
                    // Read error response
                    StringBuilder errorResponse = new StringBuilder();
                    try (BufferedReader br = new BufferedReader(
                            new InputStreamReader(connection.getErrorStream(), "UTF-8"))) {
                        String responseLine;
                        while ((responseLine = br.readLine()) != null) {
                            errorResponse.append(responseLine.trim());
                        }
                    }
    
                    System.out.println("Request failed. Status code: " + responseCode);
                    System.out.println("Error response: " + errorResponse.toString());
                }
    
            } catch (Exception e) {
                System.err.println("Request error: " + e.getMessage());
                e.printStackTrace();
            } finally {
                if (connection != null) {
                    connection.disconnect();
                }
            }
        }
    
        private void saveAudioToFile(byte[] audioBytes, String filename) {
            try {
                File file = new File(filename);
                try (FileOutputStream fos = new FileOutputStream(file)) {
                    fos.write(audioBytes);
                }
                System.out.println("Audio saved to: " + file.getAbsolutePath());
            } catch (IOException e) {
                System.err.println("Error saving audio file: " + e.getMessage());
                e.printStackTrace();
            }
        }
    }
  2. Use the custom voice generated in the previous step for speech synthesis.

    This example is based on the DashScope SDK's server_commit mode sample code for system voices. Replace the voice parameter with the custom voice created through voice design.

    Key principle: The model used in voice design (target_model) must match the model used in the subsequent speech synthesis call (model). Otherwise, synthesis will fail.

    Python

    # coding=utf-8
    # Installation instructions for pyaudio:
    # APPLE Mac OS X
    #   brew install portaudio
    #   pip install pyaudio
    # Debian/Ubuntu
    #   sudo apt-get install python-pyaudio python3-pyaudio
    #   or
    #   pip install pyaudio
    # CentOS
    #   sudo yum install -y portaudio portaudio-devel && pip install pyaudio
    # Microsoft Windows
    #   python -m pip install pyaudio
    
    import pyaudio
    import os
    import base64
    import threading
    import time
    import dashscope  # The DashScope Python SDK version must be 1.23.9 or later.
    from dashscope.audio.qwen_tts_realtime import QwenTtsRealtime, QwenTtsRealtimeCallback, AudioFormat
    
    # ======= Constant configuration =======
    TEXT_TO_SYNTHESIZE = [
        'Right? I really love this kind of supermarket,',
        'especially during the New Year holidays.',
        'Going to the supermarket',
        'makes me feel',
        'super super happy!',
        'I want to buy so many things!'
    ]
    
    def init_dashscope_api_key():
        """
        Initialize the DashScope SDK API key.
        """
        # API keys differ between the Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
        # If you haven't set an environment variable, replace the following line with: dashscope.api_key = "sk-xxx"
        dashscope.api_key = os.getenv("DASHSCOPE_API_KEY")
    
    # ======= Callback class =======
    class MyCallback(QwenTtsRealtimeCallback):
        """
        Custom TTS streaming callback.
        """
        def __init__(self):
            self.complete_event = threading.Event()
            self._player = pyaudio.PyAudio()
            self._stream = self._player.open(
                format=pyaudio.paInt16, channels=1, rate=24000, output=True
            )
    
        def on_open(self) -> None:
            print('[TTS] Connection established')
    
        def on_close(self, close_status_code, close_msg) -> None:
            self._stream.stop_stream()
            self._stream.close()
            self._player.terminate()
            print(f'[TTS] Connection closed. Code={close_status_code}, msg={close_msg}')
    
        def on_event(self, response: dict) -> None:
            try:
                event_type = response.get('type', '')
                if event_type == 'session.created':
                    print(f'[TTS] Session started: {response["session"]["id"]}')
                elif event_type == 'response.audio.delta':
                    audio_data = base64.b64decode(response['delta'])
                    self._stream.write(audio_data)
                elif event_type == 'response.done':
                    print(f'[TTS] Response completed. Response ID: {qwen_tts_realtime.get_last_response_id()}')
                elif event_type == 'session.finished':
                    print('[TTS] Session ended')
                    self.complete_event.set()
            except Exception as e:
                print(f'[Error] Callback event processing error: {e}')
    
        def wait_for_finished(self):
            self.complete_event.wait()
    
    # ======= Main execution logic =======
    if __name__ == '__main__':
        init_dashscope_api_key()
        print('[System] Initializing Qwen TTS Realtime...')
    
        callback = MyCallback()
        qwen_tts_realtime = QwenTtsRealtime(
            # Use the same model for voice design and speech synthesis
            model="qwen3-tts-vd-realtime-2026-01-15",
            callback=callback,
            # URL for the Singapore region. For the Beijing region, use: wss://dashscope.aliyuncs.com/api-ws/v1/realtime
            url='wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime'
        )
        qwen_tts_realtime.connect()
        
        qwen_tts_realtime.update_session(
            voice="myvoice", # Replace the voice parameter with your custom voice from voice design
            response_format=AudioFormat.PCM_24000HZ_MONO_16BIT,
            mode='server_commit'
        )
    
        for text_chunk in TEXT_TO_SYNTHESIZE:
            print(f'[Sending text]: {text_chunk}')
            qwen_tts_realtime.append_text(text_chunk)
            time.sleep(0.1)
    
        qwen_tts_realtime.finish()
        callback.wait_for_finished()
    
        print(f'[Metric] session_id={qwen_tts_realtime.get_session_id()}, '
              f'first_audio_delay={qwen_tts_realtime.get_first_audio_delay()}s')

    Java

    import com.alibaba.dashscope.audio.qwen_tts_realtime.*;
    import com.alibaba.dashscope.exception.NoApiKeyException;
    import com.google.gson.JsonObject;
    
    import javax.sound.sampled.*;
    import java.io.*;
    import java.util.Base64;
    import java.util.Queue;
    import java.util.concurrent.CountDownLatch;
    import java.util.concurrent.atomic.AtomicReference;
    import java.util.concurrent.ConcurrentLinkedQueue;
    import java.util.concurrent.atomic.AtomicBoolean;
    
    public class Main {
        // ===== Constants =====
        private static String[] textToSynthesize = {
                "Right? I really love this kind of supermarket",
                "especially during the New Year holidays",
                "Going to the supermarket",
                "makes me feel",
                "super super happy!",
                "I want to buy so many things!"
        };
    
        // Real-time PCM player class
        public static class RealtimePcmPlayer {
            private int sampleRate;
            private SourceDataLine line;
            private AudioFormat audioFormat;
            private Thread decoderThread;
            private Thread playerThread;
            private AtomicBoolean stopped = new AtomicBoolean(false);
            private Queue<String> b64AudioBuffer = new ConcurrentLinkedQueue<>();
            private Queue<byte[]> RawAudioBuffer = new ConcurrentLinkedQueue<>();
    
            // Constructor to initialize audio format and audio line
            public RealtimePcmPlayer(int sampleRate) throws LineUnavailableException {
                this.sampleRate = sampleRate;
                this.audioFormat = new AudioFormat(this.sampleRate, 16, 1, true, false);
                DataLine.Info info = new DataLine.Info(SourceDataLine.class, audioFormat);
                line = (SourceDataLine) AudioSystem.getLine(info);
                line.open(audioFormat);
                line.start();
                decoderThread = new Thread(new Runnable() {
                    @Override
                    public void run() {
                        while (!stopped.get()) {
                            String b64Audio = b64AudioBuffer.poll();
                            if (b64Audio != null) {
                                byte[] rawAudio = Base64.getDecoder().decode(b64Audio);
                                RawAudioBuffer.add(rawAudio);
                            } else {
                                try {
                                    Thread.sleep(100);
                                } catch (InterruptedException e) {
                                    throw new RuntimeException(e);
                                }
                            }
                        }
                    }
                });
                playerThread = new Thread(new Runnable() {
                    @Override
                    public void run() {
                        while (!stopped.get()) {
                            byte[] rawAudio = RawAudioBuffer.poll();
                            if (rawAudio != null) {
                                try {
                                    playChunk(rawAudio);
                                } catch (IOException e) {
                                    throw new RuntimeException(e);
                                } catch (InterruptedException e) {
                                    throw new RuntimeException(e);
                                }
                            } else {
                                try {
                                    Thread.sleep(100);
                                } catch (InterruptedException e) {
                                    throw new RuntimeException(e);
                                }
                            }
                        }
                    }
                });
                decoderThread.start();
                playerThread.start();
            }
    
            // Play an audio chunk and block until playback completes
            private void playChunk(byte[] chunk) throws IOException, InterruptedException {
                if (chunk == null || chunk.length == 0) return;
    
                int bytesWritten = 0;
                while (bytesWritten < chunk.length) {
                    bytesWritten += line.write(chunk, bytesWritten, chunk.length - bytesWritten);
                }
                // Chunk duration in milliseconds: 16-bit mono PCM is sampleRate * 2 bytes per second.
                int audioLength = chunk.length / (this.sampleRate * 2 / 1000);
                // Wait for audio in buffer to finish playing; clamp to avoid a negative sleep duration.
                Thread.sleep(Math.max(0, audioLength - 10));
            }
    
            public void write(String b64Audio) {
                b64AudioBuffer.add(b64Audio);
            }
    
            public void cancel() {
                b64AudioBuffer.clear();
                RawAudioBuffer.clear();
            }
    
            public void waitForComplete() throws InterruptedException {
                while (!b64AudioBuffer.isEmpty() || !RawAudioBuffer.isEmpty()) {
                    Thread.sleep(100);
                }
                line.drain();
            }
    
            public void shutdown() throws InterruptedException {
                stopped.set(true);
                decoderThread.join();
                playerThread.join();
                if (line != null && line.isRunning()) {
                    line.drain();
                    line.close();
                }
            }
        }
    
        public static void main(String[] args) throws Exception {
            QwenTtsRealtimeParam param = QwenTtsRealtimeParam.builder()
                    // Use the same model for voice design and speech synthesis
                    .model("qwen3-tts-vd-realtime-2026-01-15")
                    // URL for the Singapore region. For the Beijing region, use: wss://dashscope.aliyuncs.com/api-ws/v1/realtime
                    .url("wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime")
                    // API keys differ between the Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
                    // If you haven't set an environment variable, replace the following line with: .apikey("sk-xxx")
                    .apikey(System.getenv("DASHSCOPE_API_KEY"))
                    .build();
            AtomicReference<CountDownLatch> completeLatch = new AtomicReference<>(new CountDownLatch(1));
            final AtomicReference<QwenTtsRealtime> qwenTtsRef = new AtomicReference<>(null);
    
            // Create real-time audio player instance
            RealtimePcmPlayer audioPlayer = new RealtimePcmPlayer(24000);
    
            QwenTtsRealtime qwenTtsRealtime = new QwenTtsRealtime(param, new QwenTtsRealtimeCallback() {
                @Override
                public void onOpen() {
                    // Handle connection established
                }
                @Override
                public void onEvent(JsonObject message) {
                    String type = message.get("type").getAsString();
                    switch(type) {
                        case "session.created":
                            // Handle session created
                            break;
                        case "response.audio.delta":
                            String recvAudioB64 = message.get("delta").getAsString();
                            // Play audio in real time
                            audioPlayer.write(recvAudioB64);
                            break;
                        case "response.done":
                            // Handle response completed
                            break;
                        case "session.finished":
                            // Handle session finished
                            completeLatch.get().countDown();
                            break;
                        default:
                            break;
                    }
                }
                @Override
                public void onClose(int code, String reason) {
                    // Handle connection closed
                }
            });
            qwenTtsRef.set(qwenTtsRealtime);
            try {
                qwenTtsRealtime.connect();
            } catch (NoApiKeyException e) {
                throw new RuntimeException(e);
            }
            QwenTtsRealtimeConfig config = QwenTtsRealtimeConfig.builder()
                    .voice("myvoice") // Replace the voice parameter with your custom voice from voice design
                    .responseFormat(QwenTtsRealtimeAudioFormat.PCM_24000HZ_MONO_16BIT)
                    .mode("server_commit")
                    .build();
            qwenTtsRealtime.updateSession(config);
            for (String text:textToSynthesize) {
                qwenTtsRealtime.appendText(text);
                Thread.sleep(100);
            }
            qwenTtsRealtime.finish();
            completeLatch.get().await();
    
            // Wait for audio playback to complete and shut down player
            audioPlayer.waitForComplete();
            audioPlayer.shutdown();
            System.exit(0);
        }
    }

For more sample code, see GitHub.

Interaction flow

server_commit mode

Set the session.mode of the session.update event to "server_commit" to enable this mode. The server then automatically manages the timing for text segmentation and synthesis.

The interaction flow is as follows (a sketch of the client events in JSON appears after the list):

  1. When the client sends a session.update event, the server responds with the session.created and session.updated events.

  2. The client uses the input_text_buffer.append event to append text to the server-side buffer.

  3. The server intelligently manages text segmentation and synthesis timing, returning the response.created, response.output_item.added, response.content_part.added, and response.audio.delta events.

  4. The server sends the response.audio.done, response.content_part.done, response.output_item.done, and response.done events after completing the response.

  5. The server ends the session by sending the session.finished event.
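
The SDK calls in the earlier samples (update_session, append_text, finish) emit these events for you. As a minimal sketch of what crosses the wire, the following prints the client-event sequence for server_commit mode as JSON. Apart from the event names and session.mode, which this document describes, the exact payload shapes are assumptions; consult the API reference for the authoritative schema.

import json

# Client events for one server_commit session, in order.
# Field layouts other than "type" and session.mode are illustrative assumptions.
client_events = [
    {"type": "session.update", "session": {"mode": "server_commit", "voice": "Cherry"}},
    {"type": "input_text_buffer.append", "text": "Hello,"},
    {"type": "input_text_buffer.append", "text": " world!"},
    {"type": "session.finish"},
]
for event in client_events:
    print(json.dumps(event))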

The lifecycle phases and their client and server events are as follows:

  • Session initialization

    • Client events: session.update (session configuration)

    • Server events: session.created (session created), session.updated (session configuration updated)

  • User text input

    • Client events: input_text_buffer.append (appends text to the server), input_text_buffer.commit (immediately synthesizes the text cached on the server), session.finish (notifies the server that there is no more text input)

    • Server events: input_text_buffer.committed (server received the submitted text)

  • Server audio output

    • Client events: none

    • Server events: response.created (server starts generating a response), response.output_item.added (new output content is available in the response), response.content_part.added (new output content is added to the assistant message), response.audio.delta (incrementally generated audio from the model), response.content_part.done (streaming of text or audio content for the assistant message is complete), response.output_item.done (streaming of the entire output item for the assistant message is complete), response.audio.done (audio generation is complete), response.done (response is complete)

commit mode

Set the session.mode for the session.update event to "commit" to enable this mode. In this mode, the client must submit the text buffer to the server to receive a response.

The interaction flow is as follows (see the client-event sketch after the list):

  1. When the client sends the session.update event, the server responds with the session.created and session.updated events.

  2. The client appends text to the server-side buffer by sending the input_text_buffer.append event.

  3. The client sends the input_text_buffer.commit event to commit the buffer to the server and the session.finish event to indicate that text input is complete.

  4. The server sends the response.created event to initiate response generation.

  5. The server sends the response.output_item.added, response.content_part.added, and response.audio.delta events.

  6. After the server completes its response, it returns response.audio.done, response.content_part.done, response.output_item.done, and response.done.

  7. The server responds with session.finished to end the session.
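
For contrast with server_commit mode, this sketch lists the client-event sequence for commit mode. Again, only the event names and session.mode come from this document; the remaining field layouts are assumptions.

import json

# Client events for one commit-mode session: append, then explicitly commit the buffer.
client_events = [
    {"type": "session.update", "session": {"mode": "commit", "voice": "Cherry"}},
    {"type": "input_text_buffer.append", "text": "Hello, world!"},
    {"type": "input_text_buffer.commit"},
    {"type": "session.finish"},
]
for event in client_events:
    print(json.dumps(event))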

The lifecycle phases and their client and server events are as follows:

  • Session initialization

    • Client events: session.update (session configuration)

    • Server events: session.created (session created), session.updated (session configuration updated)

  • User text input

    • Client events: input_text_buffer.append (appends text to the buffer), input_text_buffer.commit (commits the buffer to the server), input_text_buffer.clear (clears the buffer)

    • Server events: input_text_buffer.committed (server received the committed text)

  • Server audio output

    • Client events: none

    • Server events: response.created (server starts generating a response), response.output_item.added (new output content is available in the response), response.content_part.added (new output content is added to the assistant message), response.audio.delta (incrementally generated audio from the model), response.content_part.done (streaming of text or audio content for the assistant message is complete), response.output_item.done (streaming of the entire output item for the assistant message is complete), response.audio.done (audio generation is complete), response.done (response is complete)

API reference

Real-time speech synthesis - Qwen API reference

Voice cloning - API reference

Voice design - API reference

Feature comparison

The four model families compared below are:

  • VD: qwen3-tts-vd-realtime-2026-01-15, qwen3-tts-vd-realtime-2025-12-16

  • VC: qwen3-tts-vc-realtime-2026-01-15, qwen3-tts-vc-realtime-2025-11-27

  • Flash: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27, qwen3-tts-flash-realtime-2025-09-18

  • Qwen-TTS: qwen-tts-realtime, qwen-tts-realtime-latest, qwen-tts-realtime-2025-07-15

Supported languages

  • VD, VC: Chinese, English, Spanish, Russian, Italian, French, Korean, Japanese, German, and Portuguese

  • Flash: Chinese (Mandarin, Beijing, Shanghai, Sichuan, Nanjing, Shaanxi, Minnan, Tianjin, and Cantonese, varies by voice), English, Spanish, Russian, Italian, French, Korean, Japanese, German, and Portuguese

  • Qwen-TTS: Chinese and English

Audio formats

  • VD, VC, Flash: pcm, wav, mp3, and opus

  • Qwen-TTS: pcm

Audio sampling rates

  • VD, VC, Flash: 8 kHz, 16 kHz, 24 kHz, and 48 kHz

  • Qwen-TTS: 24 kHz

Voice cloning

  • VC: Supported

  • VD, Flash, Qwen-TTS: Not supported

Voice design

  • VD: Supported

  • VC, Flash, Qwen-TTS: Not supported

SSML

  • Not supported by any model

LaTeX

  • Not supported by any model

Volume, speed, pitch, and bitrate adjustment

  • VD, VC, Flash: Supported

  • Qwen-TTS: Not supported

Timestamp

  • Not supported by any model

Emotion setting

  • Not supported by any model

Streaming input

  • Supported by all models

Streaming output

  • Supported by all models

Rate limit

  • VD, VC: 180 requests per minute (RPM)

  • Flash: RPM 180 for qwen3-tts-flash-realtime and qwen3-tts-flash-realtime-2025-11-27; RPM 10 for qwen3-tts-flash-realtime-2025-09-18

  • Qwen-TTS: RPM 10; 100,000 tokens per minute (TPM)

Access methods

  • All models: Java/Python SDK, WebSocket API

Pricing

  • VD, VC: International: $0.143353 per 10,000 characters; Mainland China: $0.143353 per 10,000 characters

  • Flash: International: $0.13 per 10,000 characters; Mainland China: $0.143353 per 10,000 characters

  • Qwen-TTS (Mainland China only): input $0.345 per 1,000 tokens; output $1.721 per 1,000 tokens

Supported voices

Supported voices vary by model. Set the voice request parameter to the value shown at the start of each entry below.
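
For example, to use the system voice Cherry with the DashScope SDK from the samples above, pass its voice value to update_session (a minimal excerpt; session setup is otherwise unchanged):

# Select a system voice by the value shown in the list below.
qwen_tts_realtime.update_session(
    voice="Cherry",
    response_format=AudioFormat.PCM_24000HZ_MONO_16BIT,
    mode='server_commit'
)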

Unless otherwise noted below, the voice name is identical to the voice parameter, and every voice supports Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, and Korean. Model lists are abbreviated as follows:

  • Flash (all snapshots): qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27, qwen3-tts-flash-realtime-2025-09-18

  • Flash (stable and 2025-11-27): qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27

  • Qwen-TTS (all versions): qwen-tts-realtime, qwen-tts-realtime-latest, qwen-tts-realtime-2025-07-15

  • Cherry: A cheerful, positive, friendly, and natural young woman. Models: Flash (all snapshots); Qwen-TTS (all versions).

  • Serena: Gentle female. Models: Flash (stable and 2025-11-27); Qwen-TTS (all versions).

  • Ethan: A bright, warm, energetic, and vibrant male voice with a standard Mandarin pronunciation and a slight northern accent. Models: Flash (all snapshots); Qwen-TTS (all versions).

  • Chelsie: 2D virtual girlfriend. Models: Flash (stable and 2025-11-27); Qwen-TTS (all versions).

  • Momo: A playful and cute female voice designed to be cheerful. Models: Flash (stable and 2025-11-27).

  • Vivian: A cool, cute, and slightly feisty female voice. Models: Flash (stable and 2025-11-27).

  • Moon: Moon White (male), spirited and handsome. Models: Flash (stable and 2025-11-27).

  • Maia: A female voice that blends intelligence with gentleness. Models: Flash (stable and 2025-11-27).

  • Kai: A soothing voice that is like a spa for your ears. Models: Flash (stable and 2025-11-27).

  • Nofish: A male designer who cannot pronounce the 'sh' or 'zh' sounds. Models: Flash (all snapshots).

  • Bella: A young girl who drinks alcohol but does not practice Drunken Fist. Models: Flash (stable and 2025-11-27).

  • Jennifer: A premium, cinematic American English female voice. Models: Flash (all snapshots).

  • Ryan: A rhythmic and dramatic voice with a sense of realism and tension. Models: Flash (all snapshots).

  • Katerina: A mature female voice with a rich rhythm and lingering resonance. Models: Flash (all snapshots).

  • Aiden: The voice of a young American man who is skilled in cooking. Models: Flash (stable and 2025-11-27).

  • Eldric Sage: A calm and wise old man, with the weathered appearance of a pine tree but a mind as clear as a mirror (male). Models: Flash (stable and 2025-11-27).

  • Mia: Gentle as spring water and pure as the first snow (female). Models: Flash (stable and 2025-11-27).

  • Mochi: The voice of a clever and bright "little adult" who retains childlike innocence yet possesses Zen-like wisdom. Models: Flash (stable and 2025-11-27).

  • Bellona: A powerful and sonorous voice with clear articulation that brings characters to life and stirs passion in the listener. The clash of swords and the thunder of hooves echo in your dreams, revealing a world of countless voices through perfectly clear and resonant tones. Models: Flash (stable and 2025-11-27).

  • Vincent: A uniquely raspy and smoky voice that instantly evokes tales of vast armies and heroic adventures. Models: Flash (stable and 2025-11-27).

  • Bunny: A female character brimming with "moe" traits. Models: Flash (stable and 2025-11-27).

  • Neil: A professional news anchor's voice with a flat baseline intonation and precise, clear pronunciation. Models: Flash (stable and 2025-11-27).

  • Elias: Maintains academic rigor and uses narrative techniques to break down complex topics into digestible modules (female). Models: Flash (all snapshots).

  • Arthur: A rustic voice, weathered by time and dry tobacco, that leisurely recounts village tales and oddities. Models: Flash (stable and 2025-11-27).

  • Nini: A soft and sticky voice, like mochi, whose drawn-out calls of "older brother" are sweet enough to melt your bones. Models: Flash (stable and 2025-11-27).

  • Ebona: A whispery voice that is like a rusty key slowly turning in the darkest corners of your innermost self, where all your unacknowledged childhood shadows and unknown fears lie hidden. Models: Flash (stable and 2025-11-27).

  • Seren: A gentle and soothing voice to help you fall asleep faster. Good night and sweet dreams. Models: Flash (stable and 2025-11-27).

  • Pip: Naughty and mischievous, yet retaining a childlike innocence. Is this the Shin-chan you remember? (male) Models: Flash (stable and 2025-11-27).

  • Stella: A voice that is normally sickeningly sweet and dazed, but when shouting "In the name of the moon, I'll punish you!", it instantly fills with undeniable love and justice. Models: Flash (stable and 2025-11-27).

  • Bodega: Enthusiastic Spanish uncle. Models: Flash (stable and 2025-11-27).

  • Sonrisa: A warm and outgoing Latin American woman. Models: Flash (stable and 2025-11-27).

  • Alek: A voice that sounds cold at first, like Russia, yet is warm beneath the wool coat. Models: Flash (stable and 2025-11-27).

  • Dolce: A laid-back, middle-aged Italian man. Models: Flash (stable and 2025-11-27).

  • Sohee: A gentle, cheerful, and emotionally expressive Korean older-sister figure. Models: Flash (stable and 2025-11-27).

  • Ono Anna: A spirited and mischievous young woman and childhood sweetheart. Models: Flash (stable and 2025-11-27).

  • Lenn: Rational at the core, but rebellious in the details—a young German man who wears a suit and listens to post-punk. Models: Flash (stable and 2025-11-27).

  • Emilien: A romantic and mature French male. Models: Flash (stable and 2025-11-27).

  • Andre: A magnetic, natural, comfortable, and calm male voice. Models: Flash (stable and 2025-11-27).

  • Radio Gol: The voice of the football poet Rádio Gol! "Today I will call the football match for you using names." Models: Flash (stable and 2025-11-27).

  • Jada (name: Shanghai-Jada): An energetic woman from Shanghai. Chinese is supported as Shanghainese. Models: Flash (all snapshots).

  • Dylan (name: Beijing-Dylan): A teenage boy who grew up in the hutongs of Beijing. Chinese is supported as the Beijing dialect. Models: Flash (all snapshots).

  • Li (name: Nanjing-Li): A patient, male yoga teacher. Chinese is supported as the Nanjing dialect. Models: Flash (all snapshots).

  • Marcus (name: Shaanxi-Marcus): A voice that is broad-faced and brief-spoken, sincere-hearted and deep-voiced—the authentic flavor of Shaanxi.

Chinese (Shaanxi dialect), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

  • Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27, qwen3-tts-flash-realtime-2025-09-18

Roy

Name: Minnan-Roy

Description: The voice of a humorous, straightforward, and lively young Taiwanese man.

Chinese (Min Nan), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

  • Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27, qwen3-tts-flash-realtime-2025-09-18

Peter

Name: Tianjin-Peter

Description: The voice of a professional straight man in Tianjin crosstalk.

Chinese (Tianjin dialect), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

  • Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27, qwen3-tts-flash-realtime-2025-09-18

Sunny

Name: Sichuan-Sunny

Description: The voice of a Sichuan girl whose sweetness melts your heart.

Chinese (Sichuanese), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

  • Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27, qwen3-tts-flash-realtime-2025-09-18

Eric

Name: Sichuan-Eric

Description: A man from Chengdu, Sichuan, who is detached from the mundane.

Chinese (Sichuanese), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

  • Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27, qwen3-tts-flash-realtime-2025-09-18

Rocky

Name: Cantonese-Rocky

Description: The voice of the humorous and witty Rocky, here for online chatting.

Chinese (Cantonese), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

  • Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27, qwen3-tts-flash-realtime-2025-09-18

Kiki

Name: Cantonese-Kiki

Description: A sweet best female friend from Hong Kong.

Chinese (Cantonese), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean

  • Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27, qwen3-tts-flash-realtime-2025-09-18
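All of these voices are selected the same way: pass the value in the Name column (for dialect voices this differs from the display name, for example Cantonese-Kiki rather than Kiki) as the voice for the session. The snippet below is a minimal sketch of that flow over the realtime WebSocket connection, assuming the endpoint URL shown and OpenAI-realtime-style event names (`session.update`, `input_text_buffer.append`/`commit`, `response.audio.delta`); these field and event names are illustrative assumptions, so confirm them against the API reference for your model and region.

```python
import asyncio
import base64
import json
import os

import websockets  # pip install websockets

# Assumed endpoint for the international (Singapore) deployment; the model
# name comes from the Supported models column above.
URL = "wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime?model=qwen3-tts-flash-realtime"


async def main():
    headers = {"Authorization": f"Bearer {os.environ['DASHSCOPE_API_KEY']}"}
    # websockets >= 14 uses additional_headers; older releases call it extra_headers.
    async with websockets.connect(URL, additional_headers=headers) as ws:
        # Select a voice by the value in the Name column of the table above.
        # The session fields here (voice, response_format, sample_rate) are
        # assumptions for illustration.
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {
                "voice": "Cantonese-Kiki",
                "response_format": "pcm",
                "sample_rate": 24000,
            },
        }))

        # Stream a chunk of text, then commit to trigger synthesis.
        await ws.send(json.dumps({
            "type": "input_text_buffer.append",
            "text": "Hello! This is a quick real-time synthesis test.",
        }))
        await ws.send(json.dumps({"type": "input_text_buffer.commit"}))

        # Collect base64-encoded PCM deltas until the response completes.
        with open("output.pcm", "wb") as f:
            async for message in ws:
                event = json.loads(message)
                if event.get("type") == "response.audio.delta":
                    f.write(base64.b64decode(event["delta"]))
                elif event.get("type") == "response.done":
                    break


asyncio.run(main())
```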