Qwen's real-time speech synthesis model supports streaming text input and audio output. It offers multiple lifelike voice options, supports multilingual and dialect synthesis, can output multiple languages with the same voice, automatically adjusts intonation, and smoothly processes complex text.
Core features
- Generates high-fidelity speech in real time with natural pronunciation in multiple languages, such as Chinese and English
- Provides two voice customization methods: Voice cloning (Qwen) and Voice design (Qwen)
- Supports streaming input and output with low-latency responses for real-time interactive scenarios
- Adjustable speech rate, pitch, volume, and bitrate for fine-grained control over vocal expression
- Compatible with mainstream audio formats, with output sample rates up to 48 kHz
- Supports instruction control, enabling natural language instructions to control vocal expressiveness
Availability
Supported models:
International
In the international deployment mode, the access point and data storage are both located in the Singapore region. Model inference compute resources are dynamically scheduled globally (excluding Mainland China).
Select an API Key from the Singapore region when calling the following models:
- Qwen3-TTS-Instruct-Flash-Realtime: qwen3-tts-instruct-flash-realtime (stable version, equivalent to qwen3-tts-instruct-flash-realtime-2026-01-22), qwen3-tts-instruct-flash-realtime-2026-01-22 (latest snapshot)
- Qwen3-TTS-VD-Realtime: qwen3-tts-vd-realtime-2026-01-15 (latest snapshot), qwen3-tts-vd-realtime-2025-12-16 (snapshot)
- Qwen3-TTS-VC-Realtime: qwen3-tts-vc-realtime-2026-01-15 (latest snapshot), qwen3-tts-vc-realtime-2025-11-27 (snapshot)
- Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime (stable version, equivalent to qwen3-tts-flash-realtime-2025-11-27), qwen3-tts-flash-realtime-2025-11-27 (latest snapshot), qwen3-tts-flash-realtime-2025-09-18 (snapshot)
Mainland China
In the Mainland China deployment mode, the access point and data storage are both located in the Beijing region. Model inference compute resources are restricted to Mainland China.
Select an API key from the Beijing region when calling the following models:
- Qwen3-TTS-Instruct-Flash-Realtime: qwen3-tts-instruct-flash-realtime (stable version, equivalent to qwen3-tts-instruct-flash-realtime-2026-01-22), qwen3-tts-instruct-flash-realtime-2026-01-22 (latest snapshot)
- Qwen3-TTS-VD-Realtime: qwen3-tts-vd-realtime-2026-01-15 (latest snapshot), qwen3-tts-vd-realtime-2025-12-16 (snapshot)
- Qwen3-TTS-VC-Realtime: qwen3-tts-vc-realtime-2026-01-15 (latest snapshot), qwen3-tts-vc-realtime-2025-11-27 (snapshot)
- Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime (stable version, equivalent to qwen3-tts-flash-realtime-2025-11-27), qwen3-tts-flash-realtime-2025-11-27 (latest snapshot), qwen3-tts-flash-realtime-2025-09-18 (snapshot)
- Qwen-TTS-Realtime: qwen-tts-realtime (stable version, equivalent to qwen-tts-realtime-2025-07-15), qwen-tts-realtime-latest (latest version, equivalent to qwen-tts-realtime-2025-07-15), qwen-tts-realtime-2025-07-15 (snapshot)
For more information, see Model list.
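Whichever deployment mode you use, the WebSocket endpoint and the API key must come from the same region. The following is a minimal sketch of keeping that choice in one place; the endpoint URLs are the ones used throughout this topic, and the helper itself is only illustrative:
import os

# Realtime API endpoints by deployment mode (Singapore and Beijing, as listed above).
REALTIME_ENDPOINTS = {
    "international": "wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime",
    "mainland_china": "wss://dashscope.aliyuncs.com/api-ws/v1/realtime",
}

def realtime_endpoint(deployment_mode: str) -> str:
    """Return the Realtime API WebSocket URL for the chosen deployment mode."""
    return REALTIME_ENDPOINTS[deployment_mode]

# Use an API key issued in the same region as the endpoint you connect to.
api_key = os.getenv("DASHSCOPE_API_KEY")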
Model selection guide
| Scenario | Recommended model | Reason |
| --- | --- | --- |
| Voice customization for brand identity, exclusive voices, or extended system voices (based on text descriptions) | qwen3-tts-vd-realtime-2026-01-15 | Supports voice design. Creates customized voices from text descriptions without audio samples. Ideal for designing brand-exclusive voices from scratch. |
| Voice customization for brand identity, exclusive voices, or extended system voices (based on audio samples) | qwen3-tts-vc-realtime-2026-01-15 | Supports voice cloning. Quickly replicates voices from real audio samples to create lifelike brand voiceprints with high fidelity and consistency. |
| Emotional content production (audiobooks, radio dramas, game/animation dubbing) | qwen3-tts-instruct-flash-realtime | Supports instruction control. Precisely controls tone, speed, emotion, and character personality through natural language descriptions. Ideal for scenarios requiring rich expressiveness and character development. |
| Professional broadcasting (news, documentaries, advertising) | qwen3-tts-instruct-flash-realtime | Supports instruction control. Describes broadcasting styles and tonal characteristics (such as "authoritative and solemn" or "casual and friendly"). Suitable for professional content production. |
| Intelligent customer service and conversational bots | qwen3-tts-flash-realtime, qwen3-tts-instruct-flash-realtime | Supports streaming input and output with adjustable speech rate and pitch. The instruct version supports instruction control to dynamically adjust tone (such as reassuring, enthusiastic, or professional) based on conversation context. |
| Multilingual content broadcasting | qwen3-tts-flash-realtime, qwen3-tts-instruct-flash-realtime | Supports multiple languages and Chinese dialects, meeting global content distribution needs. |
| Audiobook reading and general content production | qwen3-tts-flash-realtime, qwen3-tts-instruct-flash-realtime | Adjustable volume, speech rate, and pitch to meet fine-grained production requirements for audiobooks, podcasts, and similar content. |
| E-commerce livestreaming and short video dubbing | qwen3-tts-flash-realtime, qwen3-tts-instruct-flash-realtime | Supports mp3/opus compressed formats, suitable for bandwidth-constrained scenarios. |
For more details, see Feature comparison.
Getting started
Get your API Key and install the latest DashScope SDK before running the code.
Use system voice
The following example performs speech synthesis using system voices (see Supported voices).
To use the instruction control feature, set the model parameter to qwen3-tts-instruct-flash-realtime and provide the instruction text through the instructions parameter, as sketched below.
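The following minimal sketch shows only the lines that change relative to the examples below; it is based on the commented-out instruction-control lines in those examples, and the instruction text itself is only an illustration.
# Sketch: the changes needed for instruction control (assumes the same setup as the examples below).
qwen_tts_realtime = QwenTtsRealtime(
    model='qwen3-tts-instruct-flash-realtime',   # instruction control requires the instruct model
    callback=callback,
    url='wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime'
)
qwen_tts_realtime.connect()
qwen_tts_realtime.update_session(
    voice='Cherry',
    response_format=AudioFormat.PCM_24000HZ_MONO_16BIT,
    # Natural-language description of the desired delivery (illustrative text from the examples below)
    instructions='Speak quickly with a rising intonation, suitable for introducing fashion products.',
    mode='server_commit'
)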
DashScope SDK
Python
Server commit mode
import os
import base64
import threading
import time
import dashscope
from dashscope.audio.qwen_tts_realtime import *
qwen_tts_realtime: QwenTtsRealtime = None
text_to_synthesize = [
'Right? I love supermarkets like this.',
'Especially during Chinese New Year,',
'I go shopping at supermarkets.',
'And I feel',
'absolutely thrilled!',
'I want to buy so many things!'
]
DO_VIDEO_TEST = False
def init_dashscope_api_key():
"""
Set your DashScope API key. More information:
https://github.com/aliyun/alibabacloud-bailian-speech-demo/blob/master/PREREQUISITES.md
"""
# API keys differ between the Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
if 'DASHSCOPE_API_KEY' in os.environ:
dashscope.api_key = os.environ[
'DASHSCOPE_API_KEY'] # Load API key from environment variable DASHSCOPE_API_KEY
else:
dashscope.api_key = 'your-dashscope-api-key' # Set API key manually
class MyCallback(QwenTtsRealtimeCallback):
def __init__(self):
self.complete_event = threading.Event()
self.file = open('result_24k.pcm', 'wb')
def on_open(self) -> None:
print('connection opened, init player')
def on_close(self, close_status_code, close_msg) -> None:
self.file.close()
print('connection closed with code: {}, msg: {}, destroy player'.format(close_status_code, close_msg))
def on_event(self, response: str) -> None:
try:
global qwen_tts_realtime
type = response['type']
if 'session.created' == type:
print('start session: {}'.format(response['session']['id']))
if 'response.audio.delta' == type:
recv_audio_b64 = response['delta']
self.file.write(base64.b64decode(recv_audio_b64))
if 'response.done' == type:
print(f'response {qwen_tts_realtime.get_last_response_id()} done')
if 'session.finished' == type:
print('session finished')
self.complete_event.set()
except Exception as e:
print('[Error] {}'.format(e))
return
def wait_for_finished(self):
self.complete_event.wait()
if __name__ == '__main__':
init_dashscope_api_key()
print('Initializing ...')
callback = MyCallback()
qwen_tts_realtime = QwenTtsRealtime(
# To use instruction control, replace the model with qwen3-tts-instruct-flash-realtime
model='qwen3-tts-flash-realtime',
callback=callback,
# This URL is for the Singapore region. If you use the Beijing region, replace it with: wss://dashscope.aliyuncs.com/api-ws/v1/realtime
url='wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime'
)
qwen_tts_realtime.connect()
qwen_tts_realtime.update_session(
voice = 'Cherry',
response_format = AudioFormat.PCM_24000HZ_MONO_16BIT,
# To use instruction control, uncomment the following lines and replace the model with qwen3-tts-instruct-flash-realtime
# instructions='Speak quickly with a rising intonation, suitable for introducing fashion products.',
# optimize_instructions=True,
mode = 'server_commit'
)
for text_chunk in text_to_synthesize:
print(f'send text: {text_chunk}')
qwen_tts_realtime.append_text(text_chunk)
time.sleep(0.1)
qwen_tts_realtime.finish()
callback.wait_for_finished()
print('[Metric] session: {}, first audio delay: {}'.format(
qwen_tts_realtime.get_session_id(),
qwen_tts_realtime.get_first_audio_delay(),
))
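The file written above is raw 16-bit, 24 kHz mono PCM. As a quick check, you can wrap it in a WAV container with the standard library so that ordinary audio players can open it (a minimal sketch):
import wave

# Wrap the raw PCM produced by the example above (PCM_24000HZ_MONO_16BIT) in a WAV container.
with open('result_24k.pcm', 'rb') as pcm_file:
    pcm_data = pcm_file.read()
with wave.open('result_24k.wav', 'wb') as wav_file:
    wav_file.setnchannels(1)      # mono
    wav_file.setsampwidth(2)      # 16-bit samples
    wav_file.setframerate(24000)  # 24 kHz
    wav_file.writeframes(pcm_data)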
Commit mode
import base64
import os
import threading
import dashscope
from dashscope.audio.qwen_tts_realtime import *
qwen_tts_realtime: QwenTtsRealtime = None
text_to_synthesize = [
'This is the first sentence.',
'This is the second sentence.',
'This is the third sentence.',
]
DO_VIDEO_TEST = False
def init_dashscope_api_key():
"""
Set your DashScope API key. More information:
https://github.com/aliyun/alibabacloud-bailian-speech-demo/blob/master/PREREQUISITES.md
"""
# API keys differ between the Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
if 'DASHSCOPE_API_KEY' in os.environ:
dashscope.api_key = os.environ[
'DASHSCOPE_API_KEY'] # Load API key from environment variable DASHSCOPE_API_KEY
else:
dashscope.api_key = 'your-dashscope-api-key' # Set API key manually
class MyCallback(QwenTtsRealtimeCallback):
def __init__(self):
super().__init__()
self.response_counter = 0
self.complete_event = threading.Event()
self.file = open(f'result_{self.response_counter}_24k.pcm', 'wb')
def reset_event(self):
self.response_counter += 1
self.file = open(f'result_{self.response_counter}_24k.pcm', 'wb')
self.complete_event = threading.Event()
def on_open(self) -> None:
print('connection opened, init player')
def on_close(self, close_status_code, close_msg) -> None:
print('connection closed with code: {}, msg: {}, destroy player'.format(close_status_code, close_msg))
def on_event(self, response: str) -> None:
try:
global qwen_tts_realtime
type = response['type']
if 'session.created' == type:
print('start session: {}'.format(response['session']['id']))
if 'response.audio.delta' == type:
recv_audio_b64 = response['delta']
self.file.write(base64.b64decode(recv_audio_b64))
if 'response.done' == type:
print(f'response {qwen_tts_realtime.get_last_response_id()} done')
self.complete_event.set()
self.file.close()
if 'session.finished' == type:
print('session finished')
self.complete_event.set()
except Exception as e:
print('[Error] {}'.format(e))
return
def wait_for_response_done(self):
self.complete_event.wait()
if __name__ == '__main__':
init_dashscope_api_key()
print('Initializing ...')
callback = MyCallback()
qwen_tts_realtime = QwenTtsRealtime(
# To use instruction control, replace the model with qwen3-tts-instruct-flash-realtime
model='qwen3-tts-flash-realtime',
callback=callback,
# This URL is for the Singapore region. If you use the Beijing region, replace it with: wss://dashscope.aliyuncs.com/api-ws/v1/realtime
url='wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime'
)
qwen_tts_realtime.connect()
qwen_tts_realtime.update_session(
voice = 'Cherry',
response_format = AudioFormat.PCM_24000HZ_MONO_16BIT,
# To use instruction control, uncomment the following lines and replace the model with qwen3-tts-instruct-flash-realtime
# instructions='Speak quickly with a rising intonation, suitable for introducing fashion products.',
# optimize_instructions=True,
mode = 'commit'
)
print(f'send text: {text_to_synthesize[0]}')
qwen_tts_realtime.append_text(text_to_synthesize[0])
qwen_tts_realtime.commit()
callback.wait_for_response_done()
callback.reset_event()
print(f'send text: {text_to_synthesize[1]}')
qwen_tts_realtime.append_text(text_to_synthesize[1])
qwen_tts_realtime.commit()
callback.wait_for_response_done()
callback.reset_event()
print(f'send text: {text_to_synthesize[2]}')
qwen_tts_realtime.append_text(text_to_synthesize[2])
qwen_tts_realtime.commit()
callback.wait_for_response_done()
qwen_tts_realtime.finish()
print('[Metric] session: {}, first audio delay: {}'.format(
qwen_tts_realtime.get_session_id(),
qwen_tts_realtime.get_first_audio_delay(),
))
Java
Server commit mode
// The Dashscope SDK version must be 2.21.16 or later.
import com.alibaba.dashscope.audio.qwen_tts_realtime.*;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.google.gson.JsonObject;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.SourceDataLine;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.AudioSystem;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.Base64;
import java.util.Queue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;
public class Main {
static String[] textToSynthesize = {
"Right? I especially love this kind of supermarket.",
"Especially during the New Year.",
"Going to the supermarket.",
"It just makes me feel.",
"Super, super happy!",
"I want to buy so many things!"
};
// Real-time PCM audio player class
public static class RealtimePcmPlayer {
private int sampleRate;
private SourceDataLine line;
private AudioFormat audioFormat;
private Thread decoderThread;
private Thread playerThread;
private AtomicBoolean stopped = new AtomicBoolean(false);
private Queue<String> b64AudioBuffer = new ConcurrentLinkedQueue<>();
private Queue<byte[]> RawAudioBuffer = new ConcurrentLinkedQueue<>();
// The constructor initializes the audio format and audio line.
public RealtimePcmPlayer(int sampleRate) throws LineUnavailableException {
this.sampleRate = sampleRate;
this.audioFormat = new AudioFormat(this.sampleRate, 16, 1, true, false);
DataLine.Info info = new DataLine.Info(SourceDataLine.class, audioFormat);
line = (SourceDataLine) AudioSystem.getLine(info);
line.open(audioFormat);
line.start();
decoderThread = new Thread(new Runnable() {
@Override
public void run() {
while (!stopped.get()) {
String b64Audio = b64AudioBuffer.poll();
if (b64Audio != null) {
byte[] rawAudio = Base64.getDecoder().decode(b64Audio);
RawAudioBuffer.add(rawAudio);
} else {
try {
Thread.sleep(100);
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
}
}
}
});
playerThread = new Thread(new Runnable() {
@Override
public void run() {
while (!stopped.get()) {
byte[] rawAudio = RawAudioBuffer.poll();
if (rawAudio != null) {
try {
playChunk(rawAudio);
} catch (IOException e) {
throw new RuntimeException(e);
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
} else {
try {
Thread.sleep(100);
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
}
}
}
});
decoderThread.start();
playerThread.start();
}
// Play an audio chunk and block until playback is complete.
private void playChunk(byte[] chunk) throws IOException, InterruptedException {
if (chunk == null || chunk.length == 0) return;
int bytesWritten = 0;
while (bytesWritten < chunk.length) {
bytesWritten += line.write(chunk, bytesWritten, chunk.length - bytesWritten);
}
int audioLength = chunk.length / (this.sampleRate*2/1000);
// Wait for the audio in the buffer to finish playing.
Thread.sleep(audioLength - 10);
}
public void write(String b64Audio) {
b64AudioBuffer.add(b64Audio);
}
public void cancel() {
b64AudioBuffer.clear();
RawAudioBuffer.clear();
}
public void waitForComplete() throws InterruptedException {
while (!b64AudioBuffer.isEmpty() || !RawAudioBuffer.isEmpty()) {
Thread.sleep(100);
}
line.drain();
}
public void shutdown() throws InterruptedException {
stopped.set(true);
decoderThread.join();
playerThread.join();
if (line != null && line.isRunning()) {
line.drain();
line.close();
}
}
}
public static void main(String[] args) throws InterruptedException, LineUnavailableException, FileNotFoundException {
QwenTtsRealtimeParam param = QwenTtsRealtimeParam.builder()
.model("qwen3-tts-flash-realtime")
// The following URL is for the Singapore region. If you use a model in the China (Beijing) region, replace the URL with wss://dashscope.aliyuncs.com/api-ws/v1/realtime.
.url("wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime")
// The API keys for the Singapore and China (Beijing) regions are different. To obtain an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key.
.apikey(System.getenv("DASHSCOPE_API_KEY"))
.build();
AtomicReference<CountDownLatch> completeLatch = new AtomicReference<>(new CountDownLatch(1));
final AtomicReference<QwenTtsRealtime> qwenTtsRef = new AtomicReference<>(null);
// Create a real-time audio player instance.
RealtimePcmPlayer audioPlayer = new RealtimePcmPlayer(24000);
QwenTtsRealtime qwenTtsRealtime = new QwenTtsRealtime(param, new QwenTtsRealtimeCallback() {
@Override
public void onOpen() {
// Handle the event when the connection is established.
}
@Override
public void onEvent(JsonObject message) {
String type = message.get("type").getAsString();
switch(type) {
case "session.created":
// Handle the event when the session is created.
break;
case "response.audio.delta":
String recvAudioB64 = message.get("delta").getAsString();
// Play the audio in real time.
audioPlayer.write(recvAudioB64);
break;
case "response.done":
// Handle the event when the response is complete.
break;
case "session.finished":
// Handle the event when the session is finished.
completeLatch.get().countDown();
default:
break;
}
}
@Override
public void onClose(int code, String reason) {
// Handle the event when the connection is closed.
}
});
qwenTtsRef.set(qwenTtsRealtime);
try {
qwenTtsRealtime.connect();
} catch (NoApiKeyException e) {
throw new RuntimeException(e);
}
QwenTtsRealtimeConfig config = QwenTtsRealtimeConfig.builder()
.voice("Cherry")
.responseFormat(QwenTtsRealtimeAudioFormat.PCM_24000HZ_MONO_16BIT)
.mode("server_commit")
.build();
qwenTtsRealtime.updateSession(config);
for (String text:textToSynthesize) {
qwenTtsRealtime.appendText(text);
Thread.sleep(100);
}
qwenTtsRealtime.finish();
completeLatch.get().await();
qwenTtsRealtime.close();
// Wait for the audio to finish playing and then shut down the player.
audioPlayer.waitForComplete();
audioPlayer.shutdown();
System.exit(0);
}
}
Commit mode
// The Dashscope SDK version must be 2.21.16 or later.
import com.alibaba.dashscope.audio.qwen_tts_realtime.*;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.google.gson.JsonObject;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.SourceDataLine;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.AudioSystem;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Base64;
import java.util.Queue;
import java.util.Scanner;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;
public class commit {
// Real-time PCM audio player class
public static class RealtimePcmPlayer {
private int sampleRate;
private SourceDataLine line;
private AudioFormat audioFormat;
private Thread decoderThread;
private Thread playerThread;
private AtomicBoolean stopped = new AtomicBoolean(false);
private Queue<String> b64AudioBuffer = new ConcurrentLinkedQueue<>();
private Queue<byte[]> RawAudioBuffer = new ConcurrentLinkedQueue<>();
// The constructor initializes the audio format and audio line.
public RealtimePcmPlayer(int sampleRate) throws LineUnavailableException {
this.sampleRate = sampleRate;
this.audioFormat = new AudioFormat(this.sampleRate, 16, 1, true, false);
DataLine.Info info = new DataLine.Info(SourceDataLine.class, audioFormat);
line = (SourceDataLine) AudioSystem.getLine(info);
line.open(audioFormat);
line.start();
decoderThread = new Thread(new Runnable() {
@Override
public void run() {
while (!stopped.get()) {
String b64Audio = b64AudioBuffer.poll();
if (b64Audio != null) {
byte[] rawAudio = Base64.getDecoder().decode(b64Audio);
RawAudioBuffer.add(rawAudio);
} else {
try {
Thread.sleep(100);
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
}
}
}
});
playerThread = new Thread(new Runnable() {
@Override
public void run() {
while (!stopped.get()) {
byte[] rawAudio = RawAudioBuffer.poll();
if (rawAudio != null) {
try {
playChunk(rawAudio);
} catch (IOException e) {
throw new RuntimeException(e);
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
} else {
try {
Thread.sleep(100);
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
}
}
}
});
decoderThread.start();
playerThread.start();
}
// Play an audio chunk and block until playback is complete.
private void playChunk(byte[] chunk) throws IOException, InterruptedException {
if (chunk == null || chunk.length == 0) return;
int bytesWritten = 0;
while (bytesWritten < chunk.length) {
bytesWritten += line.write(chunk, bytesWritten, chunk.length - bytesWritten);
}
int audioLength = chunk.length / (this.sampleRate*2/1000);
// Wait for the audio in the buffer to finish playing.
Thread.sleep(audioLength - 10);
}
public void write(String b64Audio) {
b64AudioBuffer.add(b64Audio);
}
public void cancel() {
b64AudioBuffer.clear();
RawAudioBuffer.clear();
}
public void waitForComplete() throws InterruptedException {
// Wait for all audio data in the buffers to finish playing.
while (!b64AudioBuffer.isEmpty() || !RawAudioBuffer.isEmpty()) {
Thread.sleep(100);
}
// Wait for the audio line to finish playing.
line.drain();
}
public void shutdown() throws InterruptedException {
stopped.set(true);
decoderThread.join();
playerThread.join();
if (line != null && line.isRunning()) {
line.drain();
line.close();
}
}
}
public static void main(String[] args) throws InterruptedException, LineUnavailableException, FileNotFoundException {
Scanner scanner = new Scanner(System.in);
QwenTtsRealtimeParam param = QwenTtsRealtimeParam.builder()
.model("qwen3-tts-flash-realtime")
// The following URL is for the Singapore region. If you use a model in the China (Beijing) region, replace the URL with wss://dashscope.aliyuncs.com/api-ws/v1/realtime.
.url("wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime")
// The API keys for the Singapore and China (Beijing) regions are different. To obtain an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key.
.apikey(System.getenv("DASHSCOPE_API_KEY"))
.build();
AtomicReference<CountDownLatch> completeLatch = new AtomicReference<>(new CountDownLatch(1));
// Create a real-time player instance.
RealtimePcmPlayer audioPlayer = new RealtimePcmPlayer(24000);
final AtomicReference<QwenTtsRealtime> qwenTtsRef = new AtomicReference<>(null);
QwenTtsRealtime qwenTtsRealtime = new QwenTtsRealtime(param, new QwenTtsRealtimeCallback() {
// File file = new File("result_24k.pcm");
// FileOutputStream fos = new FileOutputStream(file);
@Override
public void onOpen() {
System.out.println("connection opened");
System.out.println("Enter text and press Enter to send. Enter 'quit' to exit the program.");
}
@Override
public void onEvent(JsonObject message) {
String type = message.get("type").getAsString();
switch(type) {
case "session.created":
System.out.println("start session: " + message.get("session").getAsJsonObject().get("id").getAsString());
break;
case "response.audio.delta":
String recvAudioB64 = message.get("delta").getAsString();
byte[] rawAudio = Base64.getDecoder().decode(recvAudioB64);
// fos.write(rawAudio);
// Play the audio in real time.
audioPlayer.write(recvAudioB64);
break;
case "response.done":
System.out.println("response done");
// Wait for the audio to finish playing.
try {
audioPlayer.waitForComplete();
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
// Prepare for the next input.
completeLatch.get().countDown();
break;
case "session.finished":
System.out.println("session finished");
if (qwenTtsRef.get() != null) {
System.out.println("[Metric] response: " + qwenTtsRef.get().getResponseId() +
", first audio delay: " + qwenTtsRef.get().getFirstAudioDelay() + " ms");
}
completeLatch.get().countDown();
default:
break;
}
}
@Override
public void onClose(int code, String reason) {
System.out.println("connection closed code: " + code + ", reason: " + reason);
try {
// fos.close();
// Wait for playback to complete and then shut down the player.
audioPlayer.waitForComplete();
audioPlayer.shutdown();
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
}
});
qwenTtsRef.set(qwenTtsRealtime);
try {
qwenTtsRealtime.connect();
} catch (NoApiKeyException e) {
throw new RuntimeException(e);
}
QwenTtsRealtimeConfig config = QwenTtsRealtimeConfig.builder()
.voice("Cherry")
.responseFormat(QwenTtsRealtimeAudioFormat.PCM_24000HZ_MONO_16BIT)
.mode("commit")
.build();
qwenTtsRealtime.updateSession(config);
// Loop to read user input.
while (true) {
System.out.print("Enter the text to synthesize: ");
String text = scanner.nextLine();
// If the user enters 'quit', exit the program.
if ("quit".equalsIgnoreCase(text.trim())) {
System.out.println("Closing the connection...");
qwenTtsRealtime.finish();
completeLatch.get().await();
break;
}
// If the user input is empty, skip.
if (text.trim().isEmpty()) {
continue;
}
// Reinitialize the countdown latch.
completeLatch.set(new CountDownLatch(1));
// Send the text.
qwenTtsRealtime.appendText(text);
qwenTtsRealtime.commit();
// Wait for the current synthesis to complete.
completeLatch.get().await();
}
// Clean up resources.
audioPlayer.waitForComplete();
audioPlayer.shutdown();
scanner.close();
System.exit(0);
}
}
WebSocket API
- Prepare runtime environment
  Install pyaudio based on your operating system.
  macOS
  brew install portaudio && pip install pyaudio
  Debian/Ubuntu
  sudo apt-get install python3-pyaudio or pip install pyaudio
  CentOS
  sudo yum install -y portaudio portaudio-devel && pip install pyaudio
  Windows
  pip install pyaudio
  Then, install the WebSocket dependencies using pip:
  pip install websocket-client==1.8.0 websockets
- Create client
  Create a new Python file locally named tts_realtime_client.py and copy the following code into the file:
- Select speech synthesis mode
  The Realtime API supports two modes:
  - Server commit mode
    The client sends text only. The server intelligently determines text segmentation and synthesis timing. Use this mode for low-latency scenarios without manual synthesis control, such as GPS navigation.
  - Commit mode
    Add text to a buffer first, then trigger the server to synthesize the specified text. Use this mode for scenarios requiring fine-grained control over pauses and sentence breaks, such as news broadcasting.
  Server commit mode
  Create another Python file named server_commit.py in the same directory as tts_realtime_client.py, and copy the following code into the file:
  Run server_commit.py to listen to real-time audio generated by the Realtime API.
  Commit mode
  Create another Python file named commit.py in the same directory as tts_realtime_client.py, and copy the following code into the file:
  Run commit.py to input multiple texts for synthesis. Press Enter without entering text to listen to the audio returned by the Realtime API through your speakers.
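The tts_realtime_client.py code itself is not reproduced in this excerpt. For orientation, here is a minimal sketch of the server commit event flow over a raw WebSocket connection, using the websockets package installed in the steps above. The event names come from the Interaction flow section later in this topic; the authentication header, the model query parameter, and the payload field names are assumptions for illustration only.
# Minimal sketch of the server commit event flow over a raw WebSocket (not the full client).
# Assumptions: Bearer-token header, model selected via query parameter, payload field names.
import asyncio
import base64
import json
import os

import websockets

URL = "wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime?model=qwen3-tts-flash-realtime"

async def synthesize(chunks):
    headers = {"Authorization": f"Bearer {os.getenv('DASHSCOPE_API_KEY')}"}
    # Older websockets releases use the keyword extra_headers instead of additional_headers.
    async with websockets.connect(URL, additional_headers=headers) as ws:
        # 1. Configure the session: voice, output format, and server_commit mode.
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {"voice": "Cherry", "mode": "server_commit"},
        }))
        # 2. Stream text into the server-side buffer; the server decides segmentation and timing.
        for chunk in chunks:
            await ws.send(json.dumps({"type": "input_text_buffer.append", "text": chunk}))
        # 3. Signal that no more text will be sent.
        await ws.send(json.dumps({"type": "session.finish"}))
        # 4. Collect incremental audio until the session finishes.
        with open("result.pcm", "wb") as out:
            async for raw in ws:
                event = json.loads(raw)
                if event.get("type") == "response.audio.delta":
                    out.write(base64.b64decode(event["delta"]))
                elif event.get("type") == "session.finished":
                    break

asyncio.run(synthesize(["This is the first sentence.", "This is the second sentence."]))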
Use cloned voice
The voice cloning service does not provide preview audio. Test and evaluate the effect through the speech synthesis interface. Use short text for initial testing.
The following example demonstrates how to use a custom voice created through voice cloning to produce speech that closely matches the original recording. It is based on the "server commit mode" DashScope SDK example for system voices, with the voice parameter replaced by the cloned custom voice.
- Key principle: Match the voice cloning model (target_model) with the speech synthesis model (model). Otherwise, synthesis fails.
- The example uses a local audio file voice.mp3 for voice cloning. Replace it with your own audio file when running the code.
Python
# coding=utf-8
# Installation instructions for pyaudio:
# APPLE Mac OS X
# brew install portaudio
# pip install pyaudio
# Debian/Ubuntu
# sudo apt-get install python-pyaudio python3-pyaudio
# or
# pip install pyaudio
# CentOS
# sudo yum install -y portaudio portaudio-devel && pip install pyaudio
# Microsoft Windows
# python -m pip install pyaudio
import pyaudio
import os
import requests
import base64
import pathlib
import threading
import time
import dashscope # DashScope Python SDK version must be at least 1.23.9
from dashscope.audio.qwen_tts_realtime import QwenTtsRealtime, QwenTtsRealtimeCallback, AudioFormat
# ======= Constants =======
DEFAULT_TARGET_MODEL = "qwen3-tts-vc-realtime-2026-01-15" # Use the same model for voice cloning and speech synthesis
DEFAULT_PREFERRED_NAME = "guanyu"
DEFAULT_AUDIO_MIME_TYPE = "audio/mpeg"
VOICE_FILE_PATH = "voice.mp3" # Relative path to local audio file for voice cloning
TEXT_TO_SYNTHESIZE = [
'Right? I really love this kind of supermarket,',
'especially during Chinese New Year',
'when I go shopping',
'I feel',
'super super happy!',
'I want to buy so many things!'
]
def create_voice(file_path: str,
target_model: str = DEFAULT_TARGET_MODEL,
preferred_name: str = DEFAULT_PREFERRED_NAME,
audio_mime_type: str = DEFAULT_AUDIO_MIME_TYPE) -> str:
"""
Create voice and return voice parameter
"""
# API Keys differ between Singapore and Beijing regions. Get your API Key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
# Replace with your Model Studio API Key if environment variable is not configured: api_key = "sk-xxx"
api_key = os.getenv("DASHSCOPE_API_KEY")
file_path_obj = pathlib.Path(file_path)
if not file_path_obj.exists():
raise FileNotFoundError(f"Audio file not found: {file_path}")
base64_str = base64.b64encode(file_path_obj.read_bytes()).decode()
data_uri = f"data:{audio_mime_type};base64,{base64_str}"
# Singapore region URL. Replace with https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization for Beijing region models
url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"
payload = {
"model": "qwen-voice-enrollment", # Do not modify this value
"input": {
"action": "create",
"target_model": target_model,
"preferred_name": preferred_name,
"audio": {"data": data_uri}
}
}
headers = {
"Authorization": f"Bearer {api_key}",
"Content-Type": "application/json"
}
resp = requests.post(url, json=payload, headers=headers)
if resp.status_code != 200:
raise RuntimeError(f"Voice creation failed: {resp.status_code}, {resp.text}")
try:
return resp.json()["output"]["voice"]
except (KeyError, ValueError) as e:
raise RuntimeError(f"Failed to parse voice response: {e}")
def init_dashscope_api_key():
"""
Initialize DashScope SDK API key
"""
# API Keys differ between Singapore and Beijing regions. Get your API Key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
# Replace with your Model Studio API Key if environment variable is not configured: dashscope.api_key = "sk-xxx"
dashscope.api_key = os.getenv("DASHSCOPE_API_KEY")
# ======= Callback class =======
class MyCallback(QwenTtsRealtimeCallback):
"""
Custom TTS streaming callback
"""
def __init__(self):
self.complete_event = threading.Event()
self._player = pyaudio.PyAudio()
self._stream = self._player.open(
format=pyaudio.paInt16, channels=1, rate=24000, output=True
)
def on_open(self) -> None:
print('[TTS] Connection established')
def on_close(self, close_status_code, close_msg) -> None:
self._stream.stop_stream()
self._stream.close()
self._player.terminate()
print(f'[TTS] Connection closed code={close_status_code}, msg={close_msg}')
def on_event(self, response: dict) -> None:
try:
event_type = response.get('type', '')
if event_type == 'session.created':
print(f'[TTS] Session started: {response["session"]["id"]}')
elif event_type == 'response.audio.delta':
audio_data = base64.b64decode(response['delta'])
self._stream.write(audio_data)
elif event_type == 'response.done':
print(f'[TTS] Response completed, Response ID: {qwen_tts_realtime.get_last_response_id()}')
elif event_type == 'session.finished':
print('[TTS] Session ended')
self.complete_event.set()
except Exception as e:
print(f'[Error] Error handling callback event: {e}')
def wait_for_finished(self):
self.complete_event.wait()
# ======= Main execution logic =======
if __name__ == '__main__':
init_dashscope_api_key()
print('[System] Initializing Qwen TTS Realtime ...')
callback = MyCallback()
qwen_tts_realtime = QwenTtsRealtime(
model=DEFAULT_TARGET_MODEL,
callback=callback,
# Singapore region URL. Replace with wss://dashscope.aliyuncs.com/api-ws/v1/realtime for Beijing region models
url='wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime'
)
qwen_tts_realtime.connect()
qwen_tts_realtime.update_session(
voice=create_voice(VOICE_FILE_PATH), # Replace voice parameter with cloned custom voice
response_format=AudioFormat.PCM_24000HZ_MONO_16BIT,
mode='server_commit'
)
for text_chunk in TEXT_TO_SYNTHESIZE:
print(f'[Sending text]: {text_chunk}')
qwen_tts_realtime.append_text(text_chunk)
time.sleep(0.1)
qwen_tts_realtime.finish()
callback.wait_for_finished()
print(f'[Metric] session_id={qwen_tts_realtime.get_session_id()}, '
f'first_audio_delay={qwen_tts_realtime.get_first_audio_delay()}s')
Java
You need to import the Gson dependency. If you use Maven or Gradle, add the dependency as follows:
Maven
Add the following to your pom.xml:
<!-- https://mvnrepository.com/artifact/com.google.code.gson/gson -->
<dependency>
<groupId>com.google.code.gson</groupId>
<artifactId>gson</artifactId>
<version>2.13.1</version>
</dependency>
Gradle
Add the following to your build.gradle:
// https://mvnrepository.com/artifact/com.google.code.gson/gson
implementation("com.google.code.gson:gson:2.13.1")
import com.alibaba.dashscope.audio.qwen_tts_realtime.*;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.google.gson.Gson;
import com.google.gson.JsonObject;
import javax.sound.sampled.*;
import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.*;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Queue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;
public class Main {
// ===== Constants =====
// Use the same model for voice cloning and speech synthesis
private static final String TARGET_MODEL = "qwen3-tts-vc-realtime-2026-01-15";
private static final String PREFERRED_NAME = "guanyu";
// Relative path to local audio file for voice cloning
private static final String AUDIO_FILE = "voice.mp3";
private static final String AUDIO_MIME_TYPE = "audio/mpeg";
private static String[] textToSynthesize = {
"Right? I really love this kind of supermarket",
"especially during Chinese New Year",
"when I go shopping",
"I feel",
"super super happy!",
"I want to buy so many things!"
};
// Generate data URI
public static String toDataUrl(String filePath) throws IOException {
byte[] bytes = Files.readAllBytes(Paths.get(filePath));
String encoded = Base64.getEncoder().encodeToString(bytes);
return "data:" + AUDIO_MIME_TYPE + ";base64," + encoded;
}
// Call API to create voice
public static String createVoice() throws Exception {
// API Keys differ between Singapore and Beijing regions. Get your API Key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
// Replace with your Model Studio API Key if environment variable is not configured: String apiKey = "sk-xxx"
String apiKey = System.getenv("DASHSCOPE_API_KEY");
String jsonPayload =
"{"
+ "\"model\": \"qwen-voice-enrollment\"," // Do not modify this value
+ "\"input\": {"
+ "\"action\": \"create\","
+ "\"target_model\": \"" + TARGET_MODEL + "\","
+ "\"preferred_name\": \"" + PREFERRED_NAME + "\","
+ "\"audio\": {"
+ "\"data\": \"" + toDataUrl(AUDIO_FILE) + "\""
+ "}"
+ "}"
+ "}";
// Singapore region URL. Replace with https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization for Beijing region models
HttpURLConnection con = (HttpURLConnection) new URL("https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization").openConnection();
con.setRequestMethod("POST");
con.setRequestProperty("Authorization", "Bearer " + apiKey);
con.setRequestProperty("Content-Type", "application/json");
con.setDoOutput(true);
try (OutputStream os = con.getOutputStream()) {
os.write(jsonPayload.getBytes(StandardCharsets.UTF_8));
}
int status = con.getResponseCode();
System.out.println("HTTP status code: " + status);
try (BufferedReader br = new BufferedReader(
new InputStreamReader(status >= 200 && status < 300 ? con.getInputStream() : con.getErrorStream(),
StandardCharsets.UTF_8))) {
StringBuilder response = new StringBuilder();
String line;
while ((line = br.readLine()) != null) {
response.append(line);
}
System.out.println("Response content: " + response);
if (status == 200) {
JsonObject jsonObj = new Gson().fromJson(response.toString(), JsonObject.class);
return jsonObj.getAsJsonObject("output").get("voice").getAsString();
}
throw new IOException("Voice creation failed: " + status + " - " + response);
}
}
// Real-time PCM audio player class
public static class RealtimePcmPlayer {
private int sampleRate;
private SourceDataLine line;
private AudioFormat audioFormat;
private Thread decoderThread;
private Thread playerThread;
private AtomicBoolean stopped = new AtomicBoolean(false);
private Queue<String> b64AudioBuffer = new ConcurrentLinkedQueue<>();
private Queue<byte[]> RawAudioBuffer = new ConcurrentLinkedQueue<>();
// Constructor to initialize audio format and audio line
public RealtimePcmPlayer(int sampleRate) throws LineUnavailableException {
this.sampleRate = sampleRate;
this.audioFormat = new AudioFormat(this.sampleRate, 16, 1, true, false);
DataLine.Info info = new DataLine.Info(SourceDataLine.class, audioFormat);
line = (SourceDataLine) AudioSystem.getLine(info);
line.open(audioFormat);
line.start();
decoderThread = new Thread(new Runnable() {
@Override
public void run() {
while (!stopped.get()) {
String b64Audio = b64AudioBuffer.poll();
if (b64Audio != null) {
byte[] rawAudio = Base64.getDecoder().decode(b64Audio);
RawAudioBuffer.add(rawAudio);
} else {
try {
Thread.sleep(100);
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
}
}
}
});
playerThread = new Thread(new Runnable() {
@Override
public void run() {
while (!stopped.get()) {
byte[] rawAudio = RawAudioBuffer.poll();
if (rawAudio != null) {
try {
playChunk(rawAudio);
} catch (IOException e) {
throw new RuntimeException(e);
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
} else {
try {
Thread.sleep(100);
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
}
}
}
});
decoderThread.start();
playerThread.start();
}
// Play an audio chunk and block until playback completes
private void playChunk(byte[] chunk) throws IOException, InterruptedException {
if (chunk == null || chunk.length == 0) return;
int bytesWritten = 0;
while (bytesWritten < chunk.length) {
bytesWritten += line.write(chunk, bytesWritten, chunk.length - bytesWritten);
}
int audioLength = chunk.length / (this.sampleRate*2/1000);
// Wait for audio in buffer to finish playing
Thread.sleep(audioLength - 10);
}
public void write(String b64Audio) {
b64AudioBuffer.add(b64Audio);
}
public void cancel() {
b64AudioBuffer.clear();
RawAudioBuffer.clear();
}
public void waitForComplete() throws InterruptedException {
while (!b64AudioBuffer.isEmpty() || !RawAudioBuffer.isEmpty()) {
Thread.sleep(100);
}
line.drain();
}
public void shutdown() throws InterruptedException {
stopped.set(true);
decoderThread.join();
playerThread.join();
if (line != null && line.isRunning()) {
line.drain();
line.close();
}
}
}
public static void main(String[] args) throws Exception {
QwenTtsRealtimeParam param = QwenTtsRealtimeParam.builder()
.model(TARGET_MODEL)
// Singapore region URL. Replace with wss://dashscope.aliyuncs.com/api-ws/v1/realtime for Beijing region models
.url("wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime")
// API Keys differ between Singapore and Beijing regions. Get your API Key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
// Replace with your Model Studio API Key if environment variable is not configured: .apikey("sk-xxx")
.apikey(System.getenv("DASHSCOPE_API_KEY"))
.build();
AtomicReference<CountDownLatch> completeLatch = new AtomicReference<>(new CountDownLatch(1));
final AtomicReference<QwenTtsRealtime> qwenTtsRef = new AtomicReference<>(null);
// Create real-time audio player instance
RealtimePcmPlayer audioPlayer = new RealtimePcmPlayer(24000);
QwenTtsRealtime qwenTtsRealtime = new QwenTtsRealtime(param, new QwenTtsRealtimeCallback() {
@Override
public void onOpen() {
// Handle connection established
}
@Override
public void onEvent(JsonObject message) {
String type = message.get("type").getAsString();
switch(type) {
case "session.created":
// Handle session created
break;
case "response.audio.delta":
String recvAudioB64 = message.get("delta").getAsString();
// Play audio in real time
audioPlayer.write(recvAudioB64);
break;
case "response.done":
// Handle response completed
break;
case "session.finished":
// Handle session finished
completeLatch.get().countDown();
default:
break;
}
}
@Override
public void onClose(int code, String reason) {
// Handle connection closed
}
});
qwenTtsRef.set(qwenTtsRealtime);
try {
qwenTtsRealtime.connect();
} catch (NoApiKeyException e) {
throw new RuntimeException(e);
}
QwenTtsRealtimeConfig config = QwenTtsRealtimeConfig.builder()
.voice(createVoice()) // Replace voice parameter with cloned custom voice
.responseFormat(QwenTtsRealtimeAudioFormat.PCM_24000HZ_MONO_16BIT)
.mode("server_commit")
.build();
qwenTtsRealtime.updateSession(config);
for (String text:textToSynthesize) {
qwenTtsRealtime.appendText(text);
Thread.sleep(100);
}
qwenTtsRealtime.finish();
completeLatch.get().await();
// Wait for audio playback to complete and shut down player
audioPlayer.waitForComplete();
audioPlayer.shutdown();
System.exit(0);
}
}
Use designed voice
The voice design feature returns preview audio data. Listen to this preview audio first to confirm the effect meets your expectations before using it for speech synthesis.
Generate a custom voice and listen to the preview. If you are satisfied, proceed. Otherwise, regenerate the voice.
Python
import requests
import base64
import os

def create_voice_and_play():
    # API keys differ between Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    # If you haven't set an environment variable, replace the line below with: api_key = "sk-xxx"
    api_key = os.getenv("DASHSCOPE_API_KEY")
    if not api_key:
        print("Error: DASHSCOPE_API_KEY environment variable not found. Please set your API key.")
        return None, None, None
    # Prepare request data
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }
    data = {
        "model": "qwen-voice-design",
        "input": {
            "action": "create",
            "target_model": "qwen3-tts-vd-realtime-2026-01-15",
            "voice_prompt": "A composed middle-aged male announcer with a deep, rich and magnetic voice, a steady speaking speed and clear articulation, is suitable for news broadcasting or documentary commentary.",
            "preview_text": "Dear listeners, hello everyone. Welcome to the evening news.",
            "preferred_name": "announcer",
            "language": "en"
        },
        "parameters": {
            "sample_rate": 24000,
            "response_format": "wav"
        }
    }
    # URL for Singapore region. For Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
    url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"
    try:
        # Send request
        response = requests.post(
            url,
            headers=headers,
            json=data,
            timeout=60  # Add timeout setting
        )
        if response.status_code == 200:
            result = response.json()
            # Get voice name
            voice_name = result["output"]["voice"]
            print(f"Voice name: {voice_name}")
            # Get preview audio data
            base64_audio = result["output"]["preview_audio"]["data"]
            # Decode Base64 audio data
            audio_bytes = base64.b64decode(base64_audio)
            # Save audio file locally
            filename = f"{voice_name}_preview.wav"
            # Write audio data to local file
            with open(filename, 'wb') as f:
                f.write(audio_bytes)
            print(f"Audio saved to local file: {filename}")
            print(f"File path: {os.path.abspath(filename)}")
            return voice_name, audio_bytes, filename
        else:
            print(f"Request failed. Status code: {response.status_code}")
            print(f"Response: {response.text}")
            return None, None, None
    except requests.exceptions.RequestException as e:
        print(f"Network request error: {e}")
        return None, None, None
    except KeyError as e:
        print(f"Response format error: missing required field: {e}")
        print(f"Response: {response.text if 'response' in locals() else 'No response'}")
        return None, None, None
    except Exception as e:
        print(f"Unexpected error: {e}")
        return None, None, None

if __name__ == "__main__":
    print("Creating voice...")
    voice_name, audio_data, saved_filename = create_voice_and_play()
    if voice_name:
        print(f"\nSuccessfully created voice '{voice_name}'")
        print(f"Audio file saved: '{saved_filename}'")
        print(f"File size: {os.path.getsize(saved_filename)} bytes")
    else:
        print("\nVoice creation failed")
Java
You need to import the Gson dependency. If you use Maven or Gradle, add the dependency:
Maven
Add the following content to pom.xml:
<!-- https://mvnrepository.com/artifact/com.google.code.gson/gson -->
<dependency>
    <groupId>com.google.code.gson</groupId>
    <artifactId>gson</artifactId>
    <version>2.13.1</version>
</dependency>
Gradle
Add the following content to build.gradle:
// https://mvnrepository.com/artifact/com.google.code.gson/gson
implementation("com.google.code.gson:gson:2.13.1")

import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Base64;

public class Main {
    public static void main(String[] args) {
        Main example = new Main();
        example.createVoice();
    }

    public void createVoice() {
        // API keys differ between Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
        // If you haven't set an environment variable, replace the line below with: String apiKey = "sk-xxx"
        String apiKey = System.getenv("DASHSCOPE_API_KEY");
        // Create the JSON request body string
        String jsonBody = "{\n" +
                "  \"model\": \"qwen-voice-design\",\n" +
                "  \"input\": {\n" +
                "    \"action\": \"create\",\n" +
                "    \"target_model\": \"qwen3-tts-vd-realtime-2026-01-15\",\n" +
                "    \"voice_prompt\": \"A composed middle-aged male announcer with a deep, rich and magnetic voice, a steady speaking speed and clear articulation, is suitable for news broadcasting or documentary commentary.\",\n" +
                "    \"preview_text\": \"Dear listeners, hello everyone. Welcome to the evening news.\",\n" +
                "    \"preferred_name\": \"announcer\",\n" +
                "    \"language\": \"en\"\n" +
                "  },\n" +
                "  \"parameters\": {\n" +
                "    \"sample_rate\": 24000,\n" +
                "    \"response_format\": \"wav\"\n" +
                "  }\n" +
                "}";
        HttpURLConnection connection = null;
        try {
            // URL for Singapore region. For Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
            URL url = new URL("https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization");
            connection = (HttpURLConnection) url.openConnection();
            // Set request method and headers
            connection.setRequestMethod("POST");
            connection.setRequestProperty("Authorization", "Bearer " + apiKey);
            connection.setRequestProperty("Content-Type", "application/json");
            connection.setDoOutput(true);
            connection.setDoInput(true);
            // Send request body
            try (OutputStream os = connection.getOutputStream()) {
                byte[] input = jsonBody.getBytes("UTF-8");
                os.write(input, 0, input.length);
                os.flush();
            }
            // Get response
            int responseCode = connection.getResponseCode();
            if (responseCode == HttpURLConnection.HTTP_OK) {
                // Read response content
                StringBuilder response = new StringBuilder();
                try (BufferedReader br = new BufferedReader(
                        new InputStreamReader(connection.getInputStream(), "UTF-8"))) {
                    String responseLine;
                    while ((responseLine = br.readLine()) != null) {
                        response.append(responseLine.trim());
                    }
                }
                // Parse JSON response
                JsonObject jsonResponse = JsonParser.parseString(response.toString()).getAsJsonObject();
                JsonObject outputObj = jsonResponse.getAsJsonObject("output");
                JsonObject previewAudioObj = outputObj.getAsJsonObject("preview_audio");
                // Get voice name
                String voiceName = outputObj.get("voice").getAsString();
                System.out.println("Voice name: " + voiceName);
                // Get Base64-encoded audio data
                String base64Audio = previewAudioObj.get("data").getAsString();
                // Decode Base64 audio data
                byte[] audioBytes = Base64.getDecoder().decode(base64Audio);
                // Save audio to a local file
                String filename = voiceName + "_preview.wav";
                saveAudioToFile(audioBytes, filename);
                System.out.println("Audio saved to local file: " + filename);
            } else {
                // Read error response
                StringBuilder errorResponse = new StringBuilder();
                try (BufferedReader br = new BufferedReader(
                        new InputStreamReader(connection.getErrorStream(), "UTF-8"))) {
                    String responseLine;
                    while ((responseLine = br.readLine()) != null) {
                        errorResponse.append(responseLine.trim());
                    }
                }
                System.out.println("Request failed. Status code: " + responseCode);
                System.out.println("Error response: " + errorResponse.toString());
            }
        } catch (Exception e) {
            System.err.println("Request error: " + e.getMessage());
            e.printStackTrace();
        } finally {
            if (connection != null) {
                connection.disconnect();
            }
        }
    }

    private void saveAudioToFile(byte[] audioBytes, String filename) {
        try {
            File file = new File(filename);
            try (FileOutputStream fos = new FileOutputStream(file)) {
                fos.write(audioBytes);
            }
            System.out.println("Audio saved to: " + file.getAbsolutePath());
        } catch (IOException e) {
            System.err.println("Error saving audio file: " + e.getMessage());
            e.printStackTrace();
        }
    }
}
Use the custom voice generated in the previous step for speech synthesis.
This example is based on the "server commit mode" DashScope SDK example for speech synthesis with a system voice. Replace the voice parameter with the custom voice generated by voice design.
Key principle: The model used during voice design (target_model) must be the same as the model used for subsequent speech synthesis (model). Otherwise, synthesis fails.
Python
# coding=utf-8
# Installation instructions for pyaudio:
# APPLE Mac OS X
#   brew install portaudio
#   pip install pyaudio
# Debian/Ubuntu
#   sudo apt-get install python-pyaudio python3-pyaudio
#   or
#   pip install pyaudio
# CentOS
#   sudo yum install -y portaudio portaudio-devel && pip install pyaudio
# Microsoft Windows
#   python -m pip install pyaudio
import pyaudio
import os
import base64
import threading
import time
import dashscope  # DashScope Python SDK version 1.23.9 or later is required
from dashscope.audio.qwen_tts_realtime import QwenTtsRealtime, QwenTtsRealtimeCallback, AudioFormat

# ======= Constant Configuration =======
TEXT_TO_SYNTHESIZE = [
    'Right? I just love this kind of supermarket,',
    'especially during the New Year.',
    'Going to the supermarket',
    'just makes me feel',
    'super, super happy!',
    'I want to buy so many things!'
]

def init_dashscope_api_key():
    """
    Initializes the DashScope SDK API key
    """
    # API keys differ between Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    # If you haven't set an environment variable, replace the line below with: dashscope.api_key = "sk-xxx"
    dashscope.api_key = os.getenv("DASHSCOPE_API_KEY")

# ======= Callback Class =======
class MyCallback(QwenTtsRealtimeCallback):
    """
    Custom TTS streaming callback
    """
    def __init__(self):
        self.complete_event = threading.Event()
        self._player = pyaudio.PyAudio()
        self._stream = self._player.open(
            format=pyaudio.paInt16, channels=1, rate=24000, output=True
        )
    def on_open(self) -> None:
        print('[TTS] Connection established')
    def on_close(self, close_status_code, close_msg) -> None:
        self._stream.stop_stream()
        self._stream.close()
        self._player.terminate()
        print(f'[TTS] Connection closed, code={close_status_code}, msg={close_msg}')
    def on_event(self, response: dict) -> None:
        try:
            event_type = response.get('type', '')
            if event_type == 'session.created':
                print(f'[TTS] Session started: {response["session"]["id"]}')
            elif event_type == 'response.audio.delta':
                audio_data = base64.b64decode(response['delta'])
                self._stream.write(audio_data)
            elif event_type == 'response.done':
                print(f'[TTS] Response complete, Response ID: {qwen_tts_realtime.get_last_response_id()}')
            elif event_type == 'session.finished':
                print('[TTS] Session finished')
                self.complete_event.set()
        except Exception as e:
            print(f'[Error] Exception processing callback event: {e}')
    def wait_for_finished(self):
        self.complete_event.wait()

# ======= Main Execution Logic =======
if __name__ == '__main__':
    init_dashscope_api_key()
    print('[System] Initializing Qwen TTS Realtime ...')
    callback = MyCallback()
    qwen_tts_realtime = QwenTtsRealtime(
        # Voice design and speech synthesis must use the same model
        model="qwen3-tts-vd-realtime-2026-01-15",
        callback=callback,
        # URL for Singapore region. For Beijing region, use: wss://dashscope.aliyuncs.com/api-ws/v1/realtime
        url='wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime'
    )
    qwen_tts_realtime.connect()
    qwen_tts_realtime.update_session(
        voice="myvoice",  # Replace the voice parameter with the custom voice generated by voice design
        response_format=AudioFormat.PCM_24000HZ_MONO_16BIT,
        mode='server_commit'
    )
    for text_chunk in TEXT_TO_SYNTHESIZE:
        print(f'[Sending text]: {text_chunk}')
        qwen_tts_realtime.append_text(text_chunk)
        time.sleep(0.1)
    qwen_tts_realtime.finish()
    callback.wait_for_finished()
    print(f'[Metric] session_id={qwen_tts_realtime.get_session_id()}, '
          f'first_audio_delay={qwen_tts_realtime.get_first_audio_delay()}s')
Java
import com.alibaba.dashscope.audio.qwen_tts_realtime.*;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.google.gson.JsonObject;
import javax.sound.sampled.*;
import java.io.*;
import java.util.Base64;
import java.util.Queue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;

public class Main {
    // ===== Constant Definitions =====
    private static String[] textToSynthesize = {
            "Right? I just love this kind of supermarket,",
            "especially during the New Year.",
            "Going to the supermarket",
            "just makes me feel",
            "super, super happy!",
            "I want to buy so many things!"
    };

    // Real-time audio player class
    public static class RealtimePcmPlayer {
        private int sampleRate;
        private SourceDataLine line;
        private AudioFormat audioFormat;
        private Thread decoderThread;
        private Thread playerThread;
        private AtomicBoolean stopped = new AtomicBoolean(false);
        private Queue<String> b64AudioBuffer = new ConcurrentLinkedQueue<>();
        private Queue<byte[]> RawAudioBuffer = new ConcurrentLinkedQueue<>();

        // Constructor initializes audio format and audio line
        public RealtimePcmPlayer(int sampleRate) throws LineUnavailableException {
            this.sampleRate = sampleRate;
            this.audioFormat = new AudioFormat(this.sampleRate, 16, 1, true, false);
            DataLine.Info info = new DataLine.Info(SourceDataLine.class, audioFormat);
            line = (SourceDataLine) AudioSystem.getLine(info);
            line.open(audioFormat);
            line.start();
            decoderThread = new Thread(new Runnable() {
                @Override
                public void run() {
                    while (!stopped.get()) {
                        String b64Audio = b64AudioBuffer.poll();
                        if (b64Audio != null) {
                            byte[] rawAudio = Base64.getDecoder().decode(b64Audio);
                            RawAudioBuffer.add(rawAudio);
                        } else {
                            try {
                                Thread.sleep(100);
                            } catch (InterruptedException e) {
                                throw new RuntimeException(e);
                            }
                        }
                    }
                }
            });
            playerThread = new Thread(new Runnable() {
                @Override
                public void run() {
                    while (!stopped.get()) {
                        byte[] rawAudio = RawAudioBuffer.poll();
                        if (rawAudio != null) {
                            try {
                                playChunk(rawAudio);
                            } catch (IOException e) {
                                throw new RuntimeException(e);
                            } catch (InterruptedException e) {
                                throw new RuntimeException(e);
                            }
                        } else {
                            try {
                                Thread.sleep(100);
                            } catch (InterruptedException e) {
                                throw new RuntimeException(e);
                            }
                        }
                    }
                }
            });
            decoderThread.start();
            playerThread.start();
        }

        // Plays an audio chunk and blocks until playback is complete
        private void playChunk(byte[] chunk) throws IOException, InterruptedException {
            if (chunk == null || chunk.length == 0) return;
            int bytesWritten = 0;
            while (bytesWritten < chunk.length) {
                bytesWritten += line.write(chunk, bytesWritten, chunk.length - bytesWritten);
            }
            int audioLength = chunk.length / (this.sampleRate*2/1000);
            // Wait for the audio in the buffer to finish playing
            Thread.sleep(audioLength - 10);
        }

        public void write(String b64Audio) {
            b64AudioBuffer.add(b64Audio);
        }

        public void cancel() {
            b64AudioBuffer.clear();
            RawAudioBuffer.clear();
        }

        public void waitForComplete() throws InterruptedException {
            while (!b64AudioBuffer.isEmpty() || !RawAudioBuffer.isEmpty()) {
                Thread.sleep(100);
            }
            line.drain();
        }

        public void shutdown() throws InterruptedException {
            stopped.set(true);
            decoderThread.join();
            playerThread.join();
            if (line != null && line.isRunning()) {
                line.drain();
                line.close();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        QwenTtsRealtimeParam param = QwenTtsRealtimeParam.builder()
                // Voice design and speech synthesis must use the same model
                .model("qwen3-tts-vd-realtime-2026-01-15")
                // URL for Singapore region. For Beijing region, use: wss://dashscope.aliyuncs.com/api-ws/v1/realtime
                .url("wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime")
                // API keys differ between Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
                // If you haven't set an environment variable, replace the line below with: .apikey("sk-xxx")
                .apikey(System.getenv("DASHSCOPE_API_KEY"))
                .build();
        AtomicReference<CountDownLatch> completeLatch = new AtomicReference<>(new CountDownLatch(1));
        final AtomicReference<QwenTtsRealtime> qwenTtsRef = new AtomicReference<>(null);
        // Create a real-time audio player instance
        RealtimePcmPlayer audioPlayer = new RealtimePcmPlayer(24000);
        QwenTtsRealtime qwenTtsRealtime = new QwenTtsRealtime(param, new QwenTtsRealtimeCallback() {
            @Override
            public void onOpen() {
                // Handle connection open
            }
            @Override
            public void onEvent(JsonObject message) {
                String type = message.get("type").getAsString();
                switch(type) {
                    case "session.created":
                        // Handle session creation
                        break;
                    case "response.audio.delta":
                        String recvAudioB64 = message.get("delta").getAsString();
                        // Play audio in real time
                        audioPlayer.write(recvAudioB64);
                        break;
                    case "response.done":
                        // Handle response completion
                        break;
                    case "session.finished":
                        // Handle session finish
                        completeLatch.get().countDown();
                    default:
                        break;
                }
            }
            @Override
            public void onClose(int code, String reason) {
                // Handle connection close
            }
        });
        qwenTtsRef.set(qwenTtsRealtime);
        try {
            qwenTtsRealtime.connect();
        } catch (NoApiKeyException e) {
            throw new RuntimeException(e);
        }
        QwenTtsRealtimeConfig config = QwenTtsRealtimeConfig.builder()
                .voice("myvoice") // Replace the voice parameter with the custom voice generated by voice design
                .responseFormat(QwenTtsRealtimeAudioFormat.PCM_24000HZ_MONO_16BIT)
                .mode("server_commit")
                .build();
        qwenTtsRealtime.updateSession(config);
        for (String text:textToSynthesize) {
            qwenTtsRealtime.appendText(text);
            Thread.sleep(100);
        }
        qwenTtsRealtime.finish();
        completeLatch.get().await();
        // Wait for audio playback to complete and then shut down the player
        audioPlayer.waitForComplete();
        audioPlayer.shutdown();
        System.exit(0);
    }
}
For more example code, see GitHub.
Interaction flow
Server commit mode
Set the session.mode property of the session.update event to "server_commit" to enable this mode. The server intelligently handles text segmentation and synthesis timing.
Interaction flow:
- The client sends the session.update event. The server responds with the session.created and session.updated events.
- The client sends input_text_buffer.append events to append text to the server buffer.
- The server intelligently handles text segmentation and synthesis timing, returning response.created, response.output_item.added, response.content_part.added, and response.audio.delta events.
- After completing the response, the server returns response.audio.done, response.content_part.done, response.output_item.done, and response.done.
- The server responds with session.finished to end the session.
| Lifecycle | Client events | Server events |
| --- | --- | --- |
| Session initialization | session.update: Session configuration | session.created: Session created<br>session.updated: Session configuration updated |
| User text input | input_text_buffer.append: Add text to the server buffer<br>input_text_buffer.commit: Immediately synthesize server-cached text<br>session.finish: Notify the server that there is no more text input | input_text_buffer.committed: Server received the submitted text |
| Server audio output | None | response.created: Server starts generating a response<br>response.output_item.added: New output content in the response<br>response.content_part.added: New output content added to the assistant message<br>response.audio.delta: Incremental audio generated by the model<br>response.content_part.done: Text or audio content stream for the assistant message completed<br>response.output_item.done: Entire output item stream for the assistant message completed<br>response.audio.done: Audio generation completed<br>response.done: Response completed |
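The following sketch shows the client side of a server commit session as plain Python dictionaries, in the order described above. Only the event names and session.mode come from this guide; the text field name and the remaining session fields (voice, response_format) mirror the SDK example earlier and are assumptions about the raw WebSocket payload shape.

```python
# Sketch of the client event sequence for server_commit mode (printed, not sent anywhere).
# Event names follow this guide; field names other than "type" and session.mode are assumptions.
import json

client_events = [
    # Configure the session; the server replies with session.created and session.updated.
    {"type": "session.update", "session": {"mode": "server_commit", "voice": "Cherry", "response_format": "pcm"}},
    # Append text incrementally; the server decides segmentation and synthesis timing on its own.
    {"type": "input_text_buffer.append", "text": "Right? I just love this kind of supermarket,"},
    {"type": "input_text_buffer.append", "text": "especially during the New Year."},
    # Signal that no more text will be sent; the server finishes and emits session.finished.
    {"type": "session.finish"},
]

for event in client_events:
    print(json.dumps(event, ensure_ascii=False))
```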
Commit mode
Set the session.mode property of the session.update event to "commit" to enable this mode. The client must actively submit the text buffer to the server to obtain a response.
Interaction flow:
- The client sends the session.update event. The server responds with the session.created and session.updated events.
- The client sends input_text_buffer.append events to append text to the server buffer.
- The client sends the input_text_buffer.commit event to submit the buffer to the server, and sends a session.finish event to indicate that there is no more text input.
- The server responds with response.created, starting response generation.
- The server responds with the response.output_item.added, response.content_part.added, and response.audio.delta events.
- After completing the response, the server returns response.audio.done, response.content_part.done, response.output_item.done, and response.done.
- The server responds with session.finished to end the session.
| Lifecycle | Client events | Server events |
| --- | --- | --- |
| Session initialization | session.update: Session configuration | session.created: Session created<br>session.updated: Session configuration updated |
| User text input | input_text_buffer.append: Add text to the buffer<br>input_text_buffer.commit: Submit the buffer to the server<br>input_text_buffer.clear: Clear the buffer | input_text_buffer.committed: Server received the submitted text |
| Server audio output | None | response.created: Server starts generating a response<br>response.output_item.added: New output content in the response<br>response.content_part.added: New output content added to the assistant message<br>response.audio.delta: Incremental audio generated by the model<br>response.content_part.done: Text or audio content stream for the assistant message completed<br>response.output_item.done: Entire output item stream for the assistant message completed<br>response.audio.done: Audio generation completed<br>response.done: Response completed |
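For contrast, here is a minimal sketch of the commit mode sequence, again as plain Python dictionaries. The only difference from server commit mode is that the client explicitly sends input_text_buffer.commit to trigger synthesis of the buffered text. Field names other than "type" and session.mode are assumptions about the raw payload.

```python
# Sketch of the client event sequence for commit mode: the client controls when the
# buffered text is synthesized by sending input_text_buffer.commit explicitly.
commit_mode_events = [
    {"type": "session.update", "session": {"mode": "commit", "voice": "Cherry"}},
    {"type": "input_text_buffer.append", "text": "Going to the supermarket"},
    {"type": "input_text_buffer.append", "text": "just makes me feel super, super happy!"},
    {"type": "input_text_buffer.commit"},  # synthesize everything appended so far
    {"type": "session.finish"},            # no more text; the server replies with session.finished
]
```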
Instruction control
Instruction control is an advanced speech synthesis feature that precisely controls vocal expression through natural language descriptions. Use simple text descriptions to make synthesized speech exhibit specific tones, speeds, emotions, and voice characteristics without adjusting complex audio parameters.
Supported models: Qwen3-TTS-Instruct-Flash-Realtime models only.
Usage: Specify the instruction content using the instructions parameter, for example: "Speak quickly with a noticeably rising intonation, suitable for introducing fashion products."
Supported languages: The instruction text can be written in Chinese or English only.
Length limit: The instruction text must not exceed 1,600 tokens.
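Below is a minimal sketch of passing an instruction with the DashScope Python SDK, assuming update_session() forwards an instructions keyword for qwen3-tts-instruct-flash-realtime (the parameter name comes from this guide; the exact SDK signature may differ). It reuses the MyCallback class from the Python example above.

```python
import os
import dashscope
from dashscope.audio.qwen_tts_realtime import QwenTtsRealtime, AudioFormat

dashscope.api_key = os.getenv("DASHSCOPE_API_KEY")

tts = QwenTtsRealtime(
    model="qwen3-tts-instruct-flash-realtime",  # instruction control is specific to the Instruct models
    callback=MyCallback(),                      # callback class from the earlier Python example
    url="wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime",
)
tts.connect()
tts.update_session(
    voice="Cherry",
    response_format=AudioFormat.PCM_24000HZ_MONO_16BIT,
    mode="server_commit",
    # Assumed keyword: this guide documents an `instructions` parameter for instruction control.
    instructions="Speak quickly with a noticeably rising intonation, suitable for introducing fashion products.",
)
tts.append_text("This season's new arrivals are here, and they are absolutely stunning!")
tts.finish()
```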
Applicable scenarios:
- Audiobook and radio drama dubbing
- Advertising and promotional video dubbing
- Game character and animation dubbing
- Emotionally intelligent voice assistants
- Documentary and news broadcasting
How to write high-quality voice descriptions:
- Core principles:
  - Be specific, not vague: Use words that describe concrete voice characteristics, such as "deep," "crisp," or "fast-paced." Avoid subjective terms lacking information, such as "nice-sounding" or "ordinary."
  - Be multidimensional, not single-dimensional: Good descriptions typically combine multiple dimensions (pitch, speed, emotion, and so on, as described below). Single-dimensional descriptions (such as just "high-pitched") are too broad to generate distinctive effects.
  - Be objective, not subjective: Focus on the physical and perceptual characteristics of the voice itself, not personal preferences. For example, use "slightly high-pitched with energy" instead of "my favorite voice."
  - Be original, not imitative: Describe voice characteristics rather than requesting imitation of specific people (such as celebrities or actors). Such requests involve copyright risks, and the model does not support direct imitation.
  - Be concise, not redundant: Ensure every word has meaning. Avoid repeating synonyms or using meaningless intensifiers (such as "very very great voice").
- Description dimension reference: Combine multiple dimensions to create richer expressive effects.
| Dimension | Description examples |
| --- | --- |
| Pitch | High, medium, low, slightly high, slightly low |
| Speed | Fast, medium, slow, slightly fast, slightly slow |
| Emotion | Cheerful, composed, gentle, serious, lively, calm, soothing |
| Characteristics | Magnetic, crisp, husky, mellow, sweet, rich, powerful |
| Purpose | News broadcasting, advertising voiceover, audiobooks, animation characters, voice assistants, documentary narration |
- Examples:
  - Standard broadcasting style: Clear and precise pronunciation, perfect articulation
  - Emotional progression effect: Volume quickly increases from normal conversation to shouting, straightforward personality, easily excited and expressive
  - Special emotional state: Slightly muffled pronunciation due to crying, slightly hoarse, with obvious tension from crying
  - Advertising voiceover style: Slightly high pitch, medium speed, full of energy and appeal, suitable for advertising
  - Gentle and soothing style: Slightly slow speed, gentle and sweet tone, caring and warm like a close friend
API reference
Feature comparison
| Feature | Qwen3-TTS-Instruct-Flash-Realtime | Qwen3-TTS-VD-Realtime | Qwen3-TTS-VC-Realtime | Qwen3-TTS-Flash-Realtime | Qwen-TTS-Realtime |
| --- | --- | --- | --- | --- | --- |
| Supported languages | Chinese (Mandarin), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese | Chinese (Mandarin), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese | | Chinese (Mandarin, Beijing, Shanghai, Sichuan, Nanjing, Shaanxi, Minnan, Tianjin, Cantonese, varies by voice), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese | Chinese, English |
| Audio formats | pcm, wav, mp3, opus | pcm, wav, mp3, opus | pcm, wav, mp3, opus | pcm, wav, mp3, opus | pcm |
| Audio sample rates | 8 kHz, 16 kHz, 24 kHz, 48 kHz | 8 kHz, 16 kHz, 24 kHz, 48 kHz | 8 kHz, 16 kHz, 24 kHz, 48 kHz | 8 kHz, 16 kHz, 24 kHz, 48 kHz | 24 kHz |
| Voice cloning | | | | | |
| Voice design | | | | | |
| SSML | | | | | |
| LaTeX | | | | | |
| Volume adjustment | | | | | |
| Speed adjustment | | | | | |
| Pitch adjustment | | | | | |
| Bitrate adjustment | | | | | |
| Timestamps | | | | | |
| Instruct | | | | | |
| Streaming input | | | | | |
| Streaming output | | | | | |
| Rate limits | Requests per minute (RPM): 180 | | | qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27: RPM 180<br>qwen3-tts-flash-realtime-2025-09-18: RPM 10 | RPM: 10<br>Tokens per minute (TPM): 100,000 |
| Access methods | Java/Python SDK, WebSocket API | Java/Python SDK, WebSocket API | Java/Python SDK, WebSocket API | Java/Python SDK, WebSocket API | Java/Python SDK, WebSocket API |
| Pricing | International: $0.143 per 10,000 characters<br>Mainland China: $0.143 per 10,000 characters | International: $0.143353 per 10,000 characters<br>Mainland China: $0.143353 per 10,000 characters | | International: $0.13 per 10,000 characters<br>Mainland China: $0.143353 per 10,000 characters | Mainland China: |
Supported voices
Different models support different voices. Set the voice request parameter to the value listed in the voice parameter column of the voice list when making a request.
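For example, with the DashScope Python SDK the preset voice is selected in update_session(), reusing the qwen_tts_realtime client from the Python example above. This is a minimal sketch, assuming the listed name (here Cherry) is also the value accepted by the voice parameter.

```python
# Minimal sketch: select a preset voice from the list below by its voice parameter value.
# "Cherry" is used here as an assumed example value; see the voice list for other options.
qwen_tts_realtime.update_session(
    voice="Cherry",
    response_format=AudioFormat.PCM_24000HZ_MONO_16BIT,
    mode="server_commit",
)
```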
| Details | Supported languages | Supported models |
| --- | --- | --- |
| Name: Cherry<br>Description: A cheerful, positive, friendly, and natural young woman. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Name: Serena<br>Description: Gentle female | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Name: Ethan<br>Description: A bright, warm, energetic, and vibrant male voice with a standard Mandarin pronunciation and a slight northern accent. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Name: Chelsie<br>Description: 2D virtual girlfriend | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Name: Momo<br>Description: A playful and cute female voice designed to be cheerful. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Name: Vivian<br>Description: A cool, cute, and slightly feisty female voice. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Name: Moon<br>Description: Moon White (male), spirited and handsome | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Name: Maia<br>Description: A female voice that blends intelligence with gentleness. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Name: Kai<br>Description: A soothing voice that is like a spa for your ears. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Name: Nofish<br>Description: A male designer who cannot pronounce the 'sh' or 'zh' sounds. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Name: Bella<br>Description: A young girl who drinks alcohol but does not practice Drunken Fist. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Name: Jennifer<br>Description: A premium, cinematic American English female voice. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Name: Ryan<br>Description: A rhythmic and dramatic voice with a sense of realism and tension. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Name: Katerina<br>Description: A mature female voice with a rich rhythm and lingering resonance. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Name: Aiden<br>Description: The voice of a young American man who is skilled in cooking. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Name: Eldric Sage<br>Description: A calm and wise old man, with the weathered appearance of a pine tree but a mind as clear as a mirror (male) | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Name: Mia<br>Description: Gentle as spring water and pure as the first snow (female) | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Name: Mochi<br>Description: The voice of a clever and bright "little adult" who retains childlike innocence yet possesses Zen-like wisdom. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Name: Bellona<br>Description: A powerful and sonorous voice with clear articulation that brings characters to life and stirs passion in the listener. The clash of swords and the thunder of hooves echo in your dreams, revealing a world of countless voices through perfectly clear and resonant tones. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Name: Vincent<br>Description: A uniquely raspy and smoky voice that instantly evokes tales of vast armies and heroic adventures. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Name: Bunny<br>Description: A female character brimming with "moe" traits. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Name: Neil<br>Description: A professional news anchor's voice with a flat baseline intonation and precise, clear pronunciation. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Name: Elias<br>Description: Maintains academic rigor and uses narrative techniques to break down complex topics into digestible modules (female). | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Name: Arthur<br>Description: A rustic voice, weathered by time and dry tobacco, that leisurely recounts village tales and oddities. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Name: Nini<br>Description: A soft and sticky voice, like mochi, whose drawn-out calls of "older brother" are sweet enough to melt your bones. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Name: Ebona<br>Description: A whispery voice that is like a rusty key slowly turning in the darkest corners of your innermost self, where all your unacknowledged childhood shadows and unknown fears lie hidden. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Name: Seren<br>Description: A gentle and soothing voice to help you fall asleep faster. Good night and sweet dreams. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Name: Pip<br>Description: Naughty and mischievous, yet retaining a childlike innocence. Is this the Shin-chan you remember? (male) | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Name: Stella<br>Description: A voice that is normally sickeningly sweet and dazed, but when shouting "In the name of the moon, I'll punish you!", it instantly fills with undeniable love and justice. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Name: Bodega<br>Description: Enthusiastic Spanish uncle | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Name: Sonrisa<br>Description: A warm and outgoing Latin American woman. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Name: Alek<br>Description: A voice that sounds cold at first, like Russia, yet is warm beneath the wool coat. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Name: Dolce<br>Description: A laid-back, middle-aged Italian man | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Name: Sohee<br>Description: A gentle, cheerful, and emotionally expressive Korean older-sister figure. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Name: Ono Anna<br>Description: A spirited and mischievous young woman and childhood sweetheart. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Name: Lenn<br>Description: Rational at the core, but rebellious in the details—a young German man who wears a suit and listens to post-punk. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Name: Emilien<br>Description: A romantic and mature French male | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Name: Andre<br>Description: A magnetic, natural, comfortable, and calm male voice. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Name: Radio Gol<br>Description: The voice of the football poet Rádio Gol! "Today I will call the football match for you using names." | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Name: Shanghai-Jada<br>Description: An energetic woman from Shanghai | Chinese (Shanghainese), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Name: Beijing-Dylan<br>Description: A teenage boy who grew up in the hutongs of Beijing. | Chinese (Beijing dialect), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Name: Nanjing-Li<br>Description: A patient, male yoga teacher. | Chinese (Nanjing dialect), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Name: Shaanxi-Marcus<br>Description: A voice that is broad-faced and brief-spoken, sincere-hearted and deep-voiced—the authentic flavor of Shaanxi. | Chinese (Shaanxi dialect), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Name: Minnan-Roy<br>Description: The voice of a humorous, straightforward, and lively young Taiwanese man. | Chinese (Min Nan), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Name: Tianjin-Peter<br>Description: The voice of a professional straight man in Tianjin crosstalk. | Chinese (Tianjin dialect), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Name: Sichuan-Sunny<br>Description: The voice of a Sichuan girl whose sweetness melts your heart. | Chinese (Sichuanese), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Name: Sichuan-Eric<br>Description: A man from Chengdu, Sichuan, who is detached from the mundane. | Chinese (Sichuanese), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Name: Cantonese-Rocky<br>Description: The voice of the humorous and witty Rocky, here for online chatting. | Chinese (Cantonese), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Name: Cantonese-Kiki<br>Description: A sweet best female friend from Hong Kong. | Chinese (Cantonese), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |