The Qwen real-time speech synthesis models provide low-latency speech synthesis with streaming text input and streaming audio output. They offer a range of human-like voices, support multiple languages and dialects, and can keep a consistent voice across languages. The models also adjust tone automatically and handle complex text smoothly.
Key features
Generates high-quality speech in real time and supports natural-sounding voices in multiple languages, including Chinese and English.
Provides two voice customization methods for quickly creating custom voices: voice cloning (cloning a voice from reference audio) and voice design (creating a voice from a text description).
Supports streaming input and output for low-latency responses in real-time interactive scenarios.
Allows fine-grained control over vocal performance by adjusting the speech rate, pitch, volume, and bitrate.
Is compatible with major audio formats and supports audio output at sample rates of up to 48 kHz.
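The sample rate and sample width determine how many bytes of raw audio the service streams per second, which matters when you size playback buffers. As a back-of-envelope check, here is a standalone sketch (not part of the DashScope SDK) that computes the playback duration of a raw PCM chunk; the 24 kHz mono 16-bit defaults match the PCM format used in the examples below:

```python
# Illustrative helper (not part of the DashScope SDK): playback duration of a raw PCM chunk.
def pcm_duration_ms(num_bytes: int, sample_rate: int = 24000,
                    channels: int = 1, sample_width: int = 2) -> float:
    """Return the playback duration in milliseconds of a raw PCM buffer."""
    bytes_per_second = sample_rate * channels * sample_width
    return num_bytes * 1000.0 / bytes_per_second

# One second of 24 kHz mono 16-bit PCM is 48,000 bytes.
print(pcm_duration_ms(48000))  # 1000.0
```

This is the same arithmetic the Java player example below uses to pace playback (`chunk.length / (sampleRate*2/1000)`).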
Scope
Supported models:
International (Singapore)
When you call the following models, use an API key for the Singapore region:
Qwen3-TTS-VD-Realtime: qwen3-tts-vd-realtime-2025-12-16 (snapshot)
Qwen3-TTS-VC-Realtime: qwen3-tts-vc-realtime-2025-11-27 (snapshot)
Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime (stable version, currently equivalent to qwen3-tts-flash-realtime-2025-11-27), qwen3-tts-flash-realtime-2025-11-27 (latest snapshot), qwen3-tts-flash-realtime-2025-09-18 (snapshot)
China (Beijing)
When you call the following models, use an API key for the Beijing region:
Qwen3-TTS-VD-Realtime: qwen3-tts-vd-realtime-2025-12-16 (snapshot)
Qwen3-TTS-VC-Realtime: qwen3-tts-vc-realtime-2025-11-27 (snapshot)
Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime (stable version, currently equivalent to qwen3-tts-flash-realtime-2025-11-27), qwen3-tts-flash-realtime-2025-11-27 (latest snapshot), qwen3-tts-flash-realtime-2025-09-18 (snapshot)
Qwen-TTS-Realtime: qwen-tts-realtime (stable version, currently equivalent to qwen-tts-realtime-2025-07-15), qwen-tts-realtime-latest (latest version, currently equivalent to qwen-tts-realtime-2025-07-15), qwen-tts-realtime-2025-07-15 (snapshot)
For more information, see Models.
Model selection
Scenario | Recommended model | Reason | Notes |
Voice customization for brand identity, exclusive voices, or extending the system voices (based on a text description) | qwen3-tts-vd-realtime-2025-12-16 | Supports voice design. Creates a custom voice from a text description without audio samples, which makes it ideal for designing a unique brand voice from scratch. | System voices and voice cloning are not supported. |
Voice customization for brand identity, exclusive voices, or extending the system voices (based on an audio sample) | qwen3-tts-vc-realtime-2025-11-27 | Supports voice cloning. Quickly clones a voice from a real audio sample to create a human-like brand voiceprint with high fidelity and consistency. | Voice design and system voices are not supported. |
Intelligent customer service and conversational bots | qwen3-tts-flash-realtime-2025-11-27 | Supports streaming input and output. Adjustable speech rate and pitch provide a natural interactive experience. Multi-format audio output adapts to different devices. | Only system voices are supported. Voice cloning and voice design are not supported. |
Multilingual content broadcasting | qwen3-tts-flash-realtime-2025-11-27 | Supports multiple languages and Chinese dialects to meet global content delivery needs. | Only system voices are supported. Voice cloning and voice design are not supported. |
Audiobook narration and content production | qwen3-tts-flash-realtime-2025-11-27 | Adjustable volume, speech rate, and pitch meet fine-grained production requirements for content such as audiobooks and podcasts. | Only system voices are supported. Voice cloning and voice design are not supported. |
E-commerce livestreaming and short-video dubbing | qwen3-tts-flash-realtime-2025-11-27 | Supports compressed formats such as MP3 and Opus, which suit bandwidth-constrained scenarios. Adjustable parameters meet the needs of different dubbing styles. | Only system voices are supported. Voice cloning and voice design are not supported. |
For more information, see Feature comparison.
Get started
Before you run the code, you must obtain and configure an API key. If you use an SDK to call the service, you must also install the latest version of the DashScope SDK.
Speech synthesis using a system voice
The following examples show how to use a system voice for speech synthesis. For more information, see Supported voices.
Use the DashScope SDK
Python
server_commit mode
import os
import base64
import threading
import time
import dashscope
from dashscope.audio.qwen_tts_realtime import *
qwen_tts_realtime: QwenTtsRealtime = None
text_to_synthesize = [
'Right? I really love this kind of supermarket,',
'especially during the Chinese New Year.',
'Going to the supermarket',
'makes me feel',
'super, super happy!',
'I want to buy so many things!'
]
DO_VIDEO_TEST = False
def init_dashscope_api_key():
"""
Set your DashScope API-key. More information:
https://github.com/aliyun/alibabacloud-bailian-speech-demo/blob/master/PREREQUISITES.md
"""
# The API keys for the Singapore and Beijing regions are different. To get an API key, visit: https://www.alibabacloud.com/help/en/model-studio/get-api-key
if 'DASHSCOPE_API_KEY' in os.environ:
dashscope.api_key = os.environ[
'DASHSCOPE_API_KEY'] # load API-key from environment variable DASHSCOPE_API_KEY
else:
dashscope.api_key = 'your-dashscope-api-key' # set API-key manually
class MyCallback(QwenTtsRealtimeCallback):
def __init__(self):
self.complete_event = threading.Event()
self.file = open('result_24k.pcm', 'wb')
def on_open(self) -> None:
print('connection opened, init player')
def on_close(self, close_status_code, close_msg) -> None:
self.file.close()
print('connection closed with code: {}, msg: {}, destroy player'.format(close_status_code, close_msg))
def on_event(self, response: dict) -> None:
try:
global qwen_tts_realtime
type = response['type']
if 'session.created' == type:
print('start session: {}'.format(response['session']['id']))
if 'response.audio.delta' == type:
recv_audio_b64 = response['delta']
self.file.write(base64.b64decode(recv_audio_b64))
if 'response.done' == type:
print(f'response {qwen_tts_realtime.get_last_response_id()} done')
if 'session.finished' == type:
print('session finished')
self.complete_event.set()
except Exception as e:
print('[Error] {}'.format(e))
return
def wait_for_finished(self):
self.complete_event.wait()
if __name__ == '__main__':
init_dashscope_api_key()
print('Initializing ...')
callback = MyCallback()
qwen_tts_realtime = QwenTtsRealtime(
model='qwen3-tts-flash-realtime',
callback=callback,
# The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: wss://dashscope.aliyuncs.com/api-ws/v1/realtime
url='wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime'
)
qwen_tts_realtime.connect()
qwen_tts_realtime.update_session(
voice = 'Cherry',
response_format = AudioFormat.PCM_24000HZ_MONO_16BIT,
mode = 'server_commit'
)
for text_chunk in text_to_synthesize:
print(f'send text: {text_chunk}')
qwen_tts_realtime.append_text(text_chunk)
time.sleep(0.1)
qwen_tts_realtime.finish()
callback.wait_for_finished()
print('[Metric] session: {}, first audio delay: {}'.format(
qwen_tts_realtime.get_session_id(),
qwen_tts_realtime.get_first_audio_delay(),
))
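The example above writes raw PCM to result_24k.pcm, which most media players cannot open directly. A small standalone helper (standard library only; the 24 kHz mono 16-bit format matches the AudioFormat.PCM_24000HZ_MONO_16BIT setting used above) can wrap the raw bytes in a WAV container for playback:

```python
import wave

def pcm_to_wav(pcm_path: str, wav_path: str,
               sample_rate: int = 24000, channels: int = 1,
               sample_width: int = 2) -> None:
    """Wrap a raw PCM file in a WAV container so normal players can open it."""
    with open(pcm_path, 'rb') as f:
        pcm_data = f.read()
    with wave.open(wav_path, 'wb') as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(sample_rate)
        wav.writeframes(pcm_data)

# Example usage after running the synthesis script:
# pcm_to_wav('result_24k.pcm', 'result_24k.wav')
```

Alternatively, a tool such as ffmpeg can convert the raw file directly if you specify the same format parameters.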
commit mode
import base64
import os
import threading
import dashscope
from dashscope.audio.qwen_tts_realtime import *
qwen_tts_realtime: QwenTtsRealtime = None
text_to_synthesize = [
'This is the first sentence.',
'This is the second sentence.',
'This is the third sentence.',
]
DO_VIDEO_TEST = False
def init_dashscope_api_key():
"""
Set your DashScope API-key. More information:
https://github.com/aliyun/alibabacloud-bailian-speech-demo/blob/master/PREREQUISITES.md
"""
# The API keys for the Singapore and Beijing regions are different. To get an API key, visit: https://www.alibabacloud.com/help/en/model-studio/get-api-key
if 'DASHSCOPE_API_KEY' in os.environ:
dashscope.api_key = os.environ[
'DASHSCOPE_API_KEY'] # load API-key from environment variable DASHSCOPE_API_KEY
else:
dashscope.api_key = 'your-dashscope-api-key' # set API-key manually
class MyCallback(QwenTtsRealtimeCallback):
def __init__(self):
super().__init__()
self.response_counter = 0
self.complete_event = threading.Event()
self.file = open(f'result_{self.response_counter}_24k.pcm', 'wb')
def reset_event(self):
self.response_counter += 1
self.file = open(f'result_{self.response_counter}_24k.pcm', 'wb')
self.complete_event = threading.Event()
def on_open(self) -> None:
print('connection opened, init player')
def on_close(self, close_status_code, close_msg) -> None:
print('connection closed with code: {}, msg: {}, destroy player'.format(close_status_code, close_msg))
def on_event(self, response: dict) -> None:
try:
global qwen_tts_realtime
type = response['type']
if 'session.created' == type:
print('start session: {}'.format(response['session']['id']))
if 'response.audio.delta' == type:
recv_audio_b64 = response['delta']
self.file.write(base64.b64decode(recv_audio_b64))
if 'response.done' == type:
print(f'response {qwen_tts_realtime.get_last_response_id()} done')
self.complete_event.set()
self.file.close()
if 'session.finished' == type:
print('session finished')
self.complete_event.set()
except Exception as e:
print('[Error] {}'.format(e))
return
def wait_for_response_done(self):
self.complete_event.wait()
if __name__ == '__main__':
init_dashscope_api_key()
print('Initializing ...')
callback = MyCallback()
qwen_tts_realtime = QwenTtsRealtime(
model='qwen3-tts-flash-realtime',
callback=callback,
# The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: wss://dashscope.aliyuncs.com/api-ws/v1/realtime
url='wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime'
)
qwen_tts_realtime.connect()
qwen_tts_realtime.update_session(
voice = 'Cherry',
response_format = AudioFormat.PCM_24000HZ_MONO_16BIT,
mode = 'commit'
)
print(f'send text: {text_to_synthesize[0]}')
qwen_tts_realtime.append_text(text_to_synthesize[0])
qwen_tts_realtime.commit()
callback.wait_for_response_done()
callback.reset_event()
print(f'send text: {text_to_synthesize[1]}')
qwen_tts_realtime.append_text(text_to_synthesize[1])
qwen_tts_realtime.commit()
callback.wait_for_response_done()
callback.reset_event()
print(f'send text: {text_to_synthesize[2]}')
qwen_tts_realtime.append_text(text_to_synthesize[2])
qwen_tts_realtime.commit()
callback.wait_for_response_done()
qwen_tts_realtime.finish()
print('[Metric] session: {}, first audio delay: {}'.format(
qwen_tts_realtime.get_session_id(),
qwen_tts_realtime.get_first_audio_delay(),
))
Java
server_commit mode
// The Dashscope SDK version must be 2.21.16 or later.
import com.alibaba.dashscope.audio.qwen_tts_realtime.*;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.google.gson.JsonObject;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.SourceDataLine;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.AudioSystem;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.Base64;
import java.util.Queue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;
public class Main {
static String[] textToSynthesize = {
"Right? I especially love this kind of supermarket.",
"Especially during the New Year.",
"Going to the supermarket.",
"It just makes me feel.",
"Super, super happy!",
"I want to buy so many things!"
};
// Real-time PCM audio player class
public static class RealtimePcmPlayer {
private int sampleRate;
private SourceDataLine line;
private AudioFormat audioFormat;
private Thread decoderThread;
private Thread playerThread;
private AtomicBoolean stopped = new AtomicBoolean(false);
private Queue<String> b64AudioBuffer = new ConcurrentLinkedQueue<>();
private Queue<byte[]> RawAudioBuffer = new ConcurrentLinkedQueue<>();
// The constructor initializes the audio format and audio line.
public RealtimePcmPlayer(int sampleRate) throws LineUnavailableException {
this.sampleRate = sampleRate;
this.audioFormat = new AudioFormat(this.sampleRate, 16, 1, true, false);
DataLine.Info info = new DataLine.Info(SourceDataLine.class, audioFormat);
line = (SourceDataLine) AudioSystem.getLine(info);
line.open(audioFormat);
line.start();
decoderThread = new Thread(new Runnable() {
@Override
public void run() {
while (!stopped.get()) {
String b64Audio = b64AudioBuffer.poll();
if (b64Audio != null) {
byte[] rawAudio = Base64.getDecoder().decode(b64Audio);
RawAudioBuffer.add(rawAudio);
} else {
try {
Thread.sleep(100);
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
}
}
}
});
playerThread = new Thread(new Runnable() {
@Override
public void run() {
while (!stopped.get()) {
byte[] rawAudio = RawAudioBuffer.poll();
if (rawAudio != null) {
try {
playChunk(rawAudio);
} catch (IOException e) {
throw new RuntimeException(e);
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
} else {
try {
Thread.sleep(100);
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
}
}
}
});
decoderThread.start();
playerThread.start();
}
// Play an audio chunk and block until playback is complete.
private void playChunk(byte[] chunk) throws IOException, InterruptedException {
if (chunk == null || chunk.length == 0) return;
int bytesWritten = 0;
while (bytesWritten < chunk.length) {
bytesWritten += line.write(chunk, bytesWritten, chunk.length - bytesWritten);
}
int audioLength = chunk.length / (this.sampleRate*2/1000);
// Wait for the audio in the buffer to finish playing.
Thread.sleep(Math.max(0, audioLength - 10));
}
public void write(String b64Audio) {
b64AudioBuffer.add(b64Audio);
}
public void cancel() {
b64AudioBuffer.clear();
RawAudioBuffer.clear();
}
public void waitForComplete() throws InterruptedException {
while (!b64AudioBuffer.isEmpty() || !RawAudioBuffer.isEmpty()) {
Thread.sleep(100);
}
line.drain();
}
public void shutdown() throws InterruptedException {
stopped.set(true);
decoderThread.join();
playerThread.join();
if (line != null && line.isRunning()) {
line.drain();
line.close();
}
}
}
public static void main(String[] args) throws InterruptedException, LineUnavailableException, FileNotFoundException {
QwenTtsRealtimeParam param = QwenTtsRealtimeParam.builder()
.model("qwen3-tts-flash-realtime")
// The following URL is for the Singapore region. If you use a model in the China (Beijing) region, replace the URL with wss://dashscope.aliyuncs.com/api-ws/v1/realtime.
.url("wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime")
// The API keys for the Singapore and China (Beijing) regions are different. To obtain an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key.
.apikey(System.getenv("DASHSCOPE_API_KEY"))
.build();
AtomicReference<CountDownLatch> completeLatch = new AtomicReference<>(new CountDownLatch(1));
final AtomicReference<QwenTtsRealtime> qwenTtsRef = new AtomicReference<>(null);
// Create a real-time audio player instance.
RealtimePcmPlayer audioPlayer = new RealtimePcmPlayer(24000);
QwenTtsRealtime qwenTtsRealtime = new QwenTtsRealtime(param, new QwenTtsRealtimeCallback() {
@Override
public void onOpen() {
// Handle the event when the connection is established.
}
@Override
public void onEvent(JsonObject message) {
String type = message.get("type").getAsString();
switch(type) {
case "session.created":
// Handle the event when the session is created.
break;
case "response.audio.delta":
String recvAudioB64 = message.get("delta").getAsString();
// Play the audio in real time.
audioPlayer.write(recvAudioB64);
break;
case "response.done":
// Handle the event when the response is complete.
break;
case "session.finished":
// Handle the event when the session is finished.
completeLatch.get().countDown();
break;
default:
break;
}
}
@Override
public void onClose(int code, String reason) {
// Handle the event when the connection is closed.
}
});
qwenTtsRef.set(qwenTtsRealtime);
try {
qwenTtsRealtime.connect();
} catch (NoApiKeyException e) {
throw new RuntimeException(e);
}
QwenTtsRealtimeConfig config = QwenTtsRealtimeConfig.builder()
.voice("Cherry")
.responseFormat(QwenTtsRealtimeAudioFormat.PCM_24000HZ_MONO_16BIT)
.mode("server_commit")
.build();
qwenTtsRealtime.updateSession(config);
for (String text:textToSynthesize) {
qwenTtsRealtime.appendText(text);
Thread.sleep(100);
}
qwenTtsRealtime.finish();
completeLatch.get().await();
qwenTtsRealtime.close();
// Wait for the audio to finish playing and then shut down the player.
audioPlayer.waitForComplete();
audioPlayer.shutdown();
System.exit(0);
}
}
commit mode
// The Dashscope SDK version must be 2.21.16 or later.
import com.alibaba.dashscope.audio.qwen_tts_realtime.*;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.google.gson.JsonObject;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.SourceDataLine;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.AudioSystem;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Base64;
import java.util.Queue;
import java.util.Scanner;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;
public class commit {
// Real-time PCM audio player class
public static class RealtimePcmPlayer {
private int sampleRate;
private SourceDataLine line;
private AudioFormat audioFormat;
private Thread decoderThread;
private Thread playerThread;
private AtomicBoolean stopped = new AtomicBoolean(false);
private Queue<String> b64AudioBuffer = new ConcurrentLinkedQueue<>();
private Queue<byte[]> RawAudioBuffer = new ConcurrentLinkedQueue<>();
// The constructor initializes the audio format and audio line.
public RealtimePcmPlayer(int sampleRate) throws LineUnavailableException {
this.sampleRate = sampleRate;
this.audioFormat = new AudioFormat(this.sampleRate, 16, 1, true, false);
DataLine.Info info = new DataLine.Info(SourceDataLine.class, audioFormat);
line = (SourceDataLine) AudioSystem.getLine(info);
line.open(audioFormat);
line.start();
decoderThread = new Thread(new Runnable() {
@Override
public void run() {
while (!stopped.get()) {
String b64Audio = b64AudioBuffer.poll();
if (b64Audio != null) {
byte[] rawAudio = Base64.getDecoder().decode(b64Audio);
RawAudioBuffer.add(rawAudio);
} else {
try {
Thread.sleep(100);
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
}
}
}
});
playerThread = new Thread(new Runnable() {
@Override
public void run() {
while (!stopped.get()) {
byte[] rawAudio = RawAudioBuffer.poll();
if (rawAudio != null) {
try {
playChunk(rawAudio);
} catch (IOException e) {
throw new RuntimeException(e);
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
} else {
try {
Thread.sleep(100);
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
}
}
}
});
decoderThread.start();
playerThread.start();
}
// Play an audio chunk and block until playback is complete.
private void playChunk(byte[] chunk) throws IOException, InterruptedException {
if (chunk == null || chunk.length == 0) return;
int bytesWritten = 0;
while (bytesWritten < chunk.length) {
bytesWritten += line.write(chunk, bytesWritten, chunk.length - bytesWritten);
}
int audioLength = chunk.length / (this.sampleRate*2/1000);
// Wait for the audio in the buffer to finish playing.
Thread.sleep(Math.max(0, audioLength - 10));
}
public void write(String b64Audio) {
b64AudioBuffer.add(b64Audio);
}
public void cancel() {
b64AudioBuffer.clear();
RawAudioBuffer.clear();
}
public void waitForComplete() throws InterruptedException {
// Wait for all audio data in the buffers to finish playing.
while (!b64AudioBuffer.isEmpty() || !RawAudioBuffer.isEmpty()) {
Thread.sleep(100);
}
// Wait for the audio line to finish playing.
line.drain();
}
public void shutdown() throws InterruptedException {
stopped.set(true);
decoderThread.join();
playerThread.join();
if (line != null && line.isRunning()) {
line.drain();
line.close();
}
}
}
public static void main(String[] args) throws InterruptedException, LineUnavailableException, FileNotFoundException {
Scanner scanner = new Scanner(System.in);
QwenTtsRealtimeParam param = QwenTtsRealtimeParam.builder()
.model("qwen3-tts-flash-realtime")
// The following URL is for the Singapore region. If you use a model in the China (Beijing) region, replace the URL with wss://dashscope.aliyuncs.com/api-ws/v1/realtime.
.url("wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime")
// The API keys for the Singapore and China (Beijing) regions are different. To obtain an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key.
.apikey(System.getenv("DASHSCOPE_API_KEY"))
.build();
AtomicReference<CountDownLatch> completeLatch = new AtomicReference<>(new CountDownLatch(1));
// Create a real-time player instance.
RealtimePcmPlayer audioPlayer = new RealtimePcmPlayer(24000);
final AtomicReference<QwenTtsRealtime> qwenTtsRef = new AtomicReference<>(null);
QwenTtsRealtime qwenTtsRealtime = new QwenTtsRealtime(param, new QwenTtsRealtimeCallback() {
// File file = new File("result_24k.pcm");
// FileOutputStream fos = new FileOutputStream(file);
@Override
public void onOpen() {
System.out.println("connection opened");
System.out.println("Enter text and press Enter to send. Enter 'quit' to exit the program.");
}
@Override
public void onEvent(JsonObject message) {
String type = message.get("type").getAsString();
switch(type) {
case "session.created":
System.out.println("start session: " + message.get("session").getAsJsonObject().get("id").getAsString());
break;
case "response.audio.delta":
String recvAudioB64 = message.get("delta").getAsString();
byte[] rawAudio = Base64.getDecoder().decode(recvAudioB64);
// fos.write(rawAudio);
// Play the audio in real time.
audioPlayer.write(recvAudioB64);
break;
case "response.done":
System.out.println("response done");
// Wait for the audio to finish playing.
try {
audioPlayer.waitForComplete();
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
// Prepare for the next input.
completeLatch.get().countDown();
break;
case "session.finished":
System.out.println("session finished");
if (qwenTtsRef.get() != null) {
System.out.println("[Metric] response: " + qwenTtsRef.get().getResponseId() +
", first audio delay: " + qwenTtsRef.get().getFirstAudioDelay() + " ms");
}
completeLatch.get().countDown();
break;
default:
break;
}
}
@Override
public void onClose(int code, String reason) {
System.out.println("connection closed code: " + code + ", reason: " + reason);
try {
// fos.close();
// Wait for playback to complete and then shut down the player.
audioPlayer.waitForComplete();
audioPlayer.shutdown();
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
}
});
qwenTtsRef.set(qwenTtsRealtime);
try {
qwenTtsRealtime.connect();
} catch (NoApiKeyException e) {
throw new RuntimeException(e);
}
QwenTtsRealtimeConfig config = QwenTtsRealtimeConfig.builder()
.voice("Cherry")
.responseFormat(QwenTtsRealtimeAudioFormat.PCM_24000HZ_MONO_16BIT)
.mode("commit")
.build();
qwenTtsRealtime.updateSession(config);
// Loop to read user input.
while (true) {
System.out.print("Enter the text to synthesize: ");
String text = scanner.nextLine();
// If the user enters 'quit', exit the program.
if ("quit".equalsIgnoreCase(text.trim())) {
System.out.println("Closing the connection...");
qwenTtsRealtime.finish();
completeLatch.get().await();
break;
}
// If the user input is empty, skip.
if (text.trim().isEmpty()) {
continue;
}
// Reinitialize the countdown latch.
completeLatch.set(new CountDownLatch(1));
// Send the text.
qwenTtsRealtime.appendText(text);
qwenTtsRealtime.commit();
// Wait for the current synthesis to complete.
completeLatch.get().await();
}
// Clean up resources.
audioPlayer.waitForComplete();
audioPlayer.shutdown();
scanner.close();
System.exit(0);
}
}
Use the WebSocket API
Prepare the runtime environment
Install pyaudio for your operating system.
macOS
brew install portaudio && pip install pyaudio
Debian/Ubuntu
sudo apt-get install python3-pyaudio or pip install pyaudio
CentOS
sudo yum install -y portaudio portaudio-devel && pip install pyaudio
Windows
pip install pyaudio
After the installation, use pip to install the WebSocket dependencies:
pip install websocket-client==1.8.0 websockets
Create the client
Create a local Python file named tts_realtime_client.py and copy the following code into it:
Choose a speech synthesis mode
The Realtime API supports the following two modes:
server_commit mode
The client only sends text. The server intelligently decides how to segment the text and when to synthesize it. This mode suits low-latency scenarios where you do not need to control the timing of synthesis manually, such as GPS navigation.
commit mode
The client first appends text to a buffer and then actively triggers the server to synthesize the buffered text. This mode suits scenarios that require fine-grained control over sentence segmentation and pauses, such as news broadcasting.
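The difference between the two modes can be sketched with a small mock. This is for illustration only: MockTtsSession is not a real SDK class, and a real server in server_commit mode segments text on its own schedule rather than per appended chunk as modeled here.

```python
# Minimal mock (illustration only, not the real client) of when synthesis triggers.
class MockTtsSession:
    def __init__(self, mode: str):
        assert mode in ('server_commit', 'commit')
        self.mode = mode
        self.buffer = []       # text appended but not yet synthesized
        self.synthesized = []  # batches of text actually sent to synthesis

    def append_text(self, text: str) -> None:
        self.buffer.append(text)
        # server_commit: the server decides when to synthesize; modeled here
        # as consuming each appended chunk immediately.
        if self.mode == 'server_commit':
            self.synthesized.append(' '.join(self.buffer))
            self.buffer.clear()

    def commit(self) -> None:
        # commit: the client explicitly flushes the buffered text.
        if self.mode == 'commit' and self.buffer:
            self.synthesized.append(' '.join(self.buffer))
            self.buffer.clear()

session = MockTtsSession('commit')
session.append_text('This is the first sentence.')
session.commit()            # synthesis is triggered only by the commit
print(session.synthesized)  # ['This is the first sentence.']
```

In the real SDK, the equivalent calls are append_text() plus commit() in commit mode, and append_text() alone in server_commit mode, as the examples above show.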
server_commit mode
In the same directory as tts_realtime_client.py, create another Python file named server_commit.py and copy the following code into it:
Run server_commit.py to listen to the audio generated in real time by the Realtime API.
commit mode
In the same directory as tts_realtime_client.py, create another Python file named commit.py and copy the following code into it:
Run commit.py. You can enter text for synthesis multiple times. To listen to the audio returned by the Realtime API, press Enter on an empty line.
Speech synthesis using a cloned voice
The voice cloning service does not provide an audio preview. To listen to and evaluate a cloned voice, you must use it for speech synthesis.
The following example shows how to use a custom voice generated by voice cloning for speech synthesis, producing output that closely resembles the original voice. The example is based on the server_commit mode sample code for the DashScope SDK and replaces the voice parameter with the cloned custom voice.
Key principle: The model used for voice cloning (target_model) must be the same as the model used for the subsequent speech synthesis (model). Otherwise, synthesis fails.
This example uses the local audio file voice.mp3 for voice cloning. Replace this file with your own audio file when you run the code.
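Because a model mismatch only surfaces as a failed synthesis call, it can help to fail fast on the client side. The guard below is an illustrative sketch, not part of the SDK:

```python
# Illustrative guard (not part of the DashScope SDK): fail fast when the model
# used for voice cloning does not match the model used for synthesis.
def check_models_match(target_model: str, synthesis_model: str) -> None:
    if target_model != synthesis_model:
        raise ValueError(
            f'Voice was cloned with {target_model!r} but synthesis uses '
            f'{synthesis_model!r}; the two must be identical.'
        )

# No exception: cloning and synthesis use the same model.
check_models_match('qwen3-tts-vc-realtime-2025-11-27',
                   'qwen3-tts-vc-realtime-2025-11-27')
```

In the sample code below, this invariant is kept by using the single constant DEFAULT_TARGET_MODEL for both steps.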
Python
# coding=utf-8
# Installation instructions for pyaudio:
# APPLE Mac OS X
# brew install portaudio
# pip install pyaudio
# Debian/Ubuntu
# sudo apt-get install python-pyaudio python3-pyaudio
# or
# pip install pyaudio
# CentOS
# sudo yum install -y portaudio portaudio-devel && pip install pyaudio
# Microsoft Windows
# python -m pip install pyaudio
import pyaudio
import os
import requests
import base64
import pathlib
import threading
import time
import dashscope # The DashScope Python SDK version must be 1.23.9 or later.
from dashscope.audio.qwen_tts_realtime import QwenTtsRealtime, QwenTtsRealtimeCallback, AudioFormat
# ======= Constant configuration =======
DEFAULT_TARGET_MODEL = "qwen3-tts-vc-realtime-2025-11-27" # The same model must be used for voice cloning and speech synthesis.
DEFAULT_PREFERRED_NAME = "guanyu"
DEFAULT_AUDIO_MIME_TYPE = "audio/mpeg"
VOICE_FILE_PATH = "voice.mp3" # The relative path of the local audio file for voice cloning.
TEXT_TO_SYNTHESIZE = [
'Right? I really like this kind of supermarket,',
'especially during the New Year.',
'Going to the supermarket',
'just makes me feel',
'super, super happy!',
'I want to buy so many things!'
]
def create_voice(file_path: str,
target_model: str = DEFAULT_TARGET_MODEL,
preferred_name: str = DEFAULT_PREFERRED_NAME,
audio_mime_type: str = DEFAULT_AUDIO_MIME_TYPE) -> str:
"""
Create a voice and return the voice parameter.
"""
# API keys for the Singapore and China (Beijing) regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key/
# If you have not configured an environment variable, replace the following line with your Model Studio API key: api_key = "sk-xxx"
api_key = os.getenv("DASHSCOPE_API_KEY")
file_path_obj = pathlib.Path(file_path)
if not file_path_obj.exists():
raise FileNotFoundError(f"Audio file not found: {file_path}")
base64_str = base64.b64encode(file_path_obj.read_bytes()).decode()
data_uri = f"data:{audio_mime_type};base64,{base64_str}"
# The following URL is for the Singapore region. If you use a model in the China (Beijing) region, replace the URL with https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"
payload = {
"model": "qwen-voice-enrollment", # Do not modify this value.
"input": {
"action": "create",
"target_model": target_model,
"preferred_name": preferred_name,
"audio": {"data": data_uri}
}
}
headers = {
"Authorization": f"Bearer {api_key}",
"Content-Type": "application/json"
}
resp = requests.post(url, json=payload, headers=headers)
if resp.status_code != 200:
raise RuntimeError(f"Failed to create voice: {resp.status_code}, {resp.text}")
try:
return resp.json()["output"]["voice"]
    except (KeyError, ValueError) as e:
        raise RuntimeError(f"Failed to parse voice response: {e}")


def init_dashscope_api_key():
    """
    Initialize the API key for the DashScope SDK.
    """
    # API keys for the Singapore and China (Beijing) regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key/
    # If you have not configured an environment variable, replace the following line with your Model Studio API key: dashscope.api_key = "sk-xxx"
    dashscope.api_key = os.getenv("DASHSCOPE_API_KEY")


# ======= Callback class =======
class MyCallback(QwenTtsRealtimeCallback):
    """
    Custom TTS streaming callback.
    """
    def __init__(self):
        self.complete_event = threading.Event()
        self._player = pyaudio.PyAudio()
        self._stream = self._player.open(
            format=pyaudio.paInt16, channels=1, rate=24000, output=True
        )

    def on_open(self) -> None:
        print('[TTS] Connection established')

    def on_close(self, close_status_code, close_msg) -> None:
        self._stream.stop_stream()
        self._stream.close()
        self._player.terminate()
        print(f'[TTS] Connection closed: code={close_status_code}, msg={close_msg}')

    def on_event(self, response: dict) -> None:
        try:
            event_type = response.get('type', '')
            if event_type == 'session.created':
                print(f'[TTS] Session started: {response["session"]["id"]}')
            elif event_type == 'response.audio.delta':
                audio_data = base64.b64decode(response['delta'])
                self._stream.write(audio_data)
            elif event_type == 'response.done':
                print(f'[TTS] Response complete, Response ID: {qwen_tts_realtime.get_last_response_id()}')
            elif event_type == 'session.finished':
                print('[TTS] Session finished')
                self.complete_event.set()
        except Exception as e:
            print(f'[Error] Failed to process callback event: {e}')

    def wait_for_finished(self):
        self.complete_event.wait()


# ======= Main execution logic =======
if __name__ == '__main__':
    init_dashscope_api_key()
    print('[System] Initializing Qwen TTS Realtime ...')
    callback = MyCallback()
    qwen_tts_realtime = QwenTtsRealtime(
        model=DEFAULT_TARGET_MODEL,
        callback=callback,
        # The following URL is for the Singapore region. If you use a model in the China (Beijing) region, replace the URL with wss://dashscope.aliyuncs.com/api-ws/v1/realtime
        url='wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime'
    )
    qwen_tts_realtime.connect()
    qwen_tts_realtime.update_session(
        voice=create_voice(VOICE_FILE_PATH),  # Replace the voice parameter with the custom voice generated by cloning.
        response_format=AudioFormat.PCM_24000HZ_MONO_16BIT,
        mode='server_commit'
    )
    for text_chunk in TEXT_TO_SYNTHESIZE:
        print(f'[Send text]: {text_chunk}')
        qwen_tts_realtime.append_text(text_chunk)
        time.sleep(0.1)
    qwen_tts_realtime.finish()
    callback.wait_for_finished()
    print(f'[Metric] session_id={qwen_tts_realtime.get_session_id()}, '
          f'first_audio_delay={qwen_tts_realtime.get_first_audio_delay()}s')
Java
Import the Gson dependency. If you use Maven or Gradle, add the dependency as follows:
Maven
Add the following content to the pom.xml file:
<!-- https://mvnrepository.com/artifact/com.google.code.gson/gson -->
<dependency>
    <groupId>com.google.code.gson</groupId>
    <artifactId>gson</artifactId>
    <version>2.13.1</version>
</dependency>
Gradle
Add the following content to the build.gradle file:
// https://mvnrepository.com/artifact/com.google.code.gson/gson
implementation("com.google.code.gson:gson:2.13.1")
import com.alibaba.dashscope.audio.qwen_tts_realtime.*;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.google.gson.Gson;
import com.google.gson.JsonObject;
import javax.sound.sampled.*;
import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.*;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Queue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;
public class Main {
    // ===== Constant definitions =====
    // The same model must be used for voice cloning and speech synthesis.
    private static final String TARGET_MODEL = "qwen3-tts-vc-realtime-2025-11-27";
    private static final String PREFERRED_NAME = "guanyu";
    // The relative path of the local audio file for voice cloning.
    private static final String AUDIO_FILE = "voice.mp3";
    private static final String AUDIO_MIME_TYPE = "audio/mpeg";
    private static String[] textToSynthesize = {
        "Right? I really like this kind of supermarket,",
        "especially during the New Year.",
        "Going to the supermarket",
        "just makes me feel",
        "super, super happy!",
        "I want to buy so many things!"
    };

    // Generate a data URI.
    public static String toDataUrl(String filePath) throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get(filePath));
        String encoded = Base64.getEncoder().encodeToString(bytes);
        return "data:" + AUDIO_MIME_TYPE + ";base64," + encoded;
    }

    // Call the API to create a voice.
    public static String createVoice() throws Exception {
        // API keys for the Singapore and China (Beijing) regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key/
        // If you have not configured an environment variable, replace the following line with your Model Studio API key: String apiKey = "sk-xxx"
        String apiKey = System.getenv("DASHSCOPE_API_KEY");
        String jsonPayload =
            "{"
            + "\"model\": \"qwen-voice-enrollment\"," // Do not modify this value.
            + "\"input\": {"
            + "\"action\": \"create\","
            + "\"target_model\": \"" + TARGET_MODEL + "\","
            + "\"preferred_name\": \"" + PREFERRED_NAME + "\","
            + "\"audio\": {"
            + "\"data\": \"" + toDataUrl(AUDIO_FILE) + "\""
            + "}"
            + "}"
            + "}";
        // The following URL is for the Singapore region. If you use a model in the China (Beijing) region, replace the URL with https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
        HttpURLConnection con = (HttpURLConnection) new URL("https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization").openConnection();
        con.setRequestMethod("POST");
        con.setRequestProperty("Authorization", "Bearer " + apiKey);
        con.setRequestProperty("Content-Type", "application/json");
        con.setDoOutput(true);
        try (OutputStream os = con.getOutputStream()) {
            os.write(jsonPayload.getBytes(StandardCharsets.UTF_8));
        }
        int status = con.getResponseCode();
        System.out.println("HTTP status code: " + status);
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(status >= 200 && status < 300 ? con.getInputStream() : con.getErrorStream(),
                        StandardCharsets.UTF_8))) {
            StringBuilder response = new StringBuilder();
            String line;
            while ((line = br.readLine()) != null) {
                response.append(line);
            }
            System.out.println("Response content: " + response);
            if (status == 200) {
                JsonObject jsonObj = new Gson().fromJson(response.toString(), JsonObject.class);
                return jsonObj.getAsJsonObject("output").get("voice").getAsString();
            }
            throw new IOException("Failed to create voice: " + status + " - " + response);
        }
    }

    // Real-time PCM player class
    public static class RealtimePcmPlayer {
        private int sampleRate;
        private SourceDataLine line;
        private AudioFormat audioFormat;
        private Thread decoderThread;
        private Thread playerThread;
        private AtomicBoolean stopped = new AtomicBoolean(false);
        private Queue<String> b64AudioBuffer = new ConcurrentLinkedQueue<>();
        private Queue<byte[]> RawAudioBuffer = new ConcurrentLinkedQueue<>();

        // The constructor initializes the audio format and audio line.
        public RealtimePcmPlayer(int sampleRate) throws LineUnavailableException {
            this.sampleRate = sampleRate;
            this.audioFormat = new AudioFormat(this.sampleRate, 16, 1, true, false);
            DataLine.Info info = new DataLine.Info(SourceDataLine.class, audioFormat);
            line = (SourceDataLine) AudioSystem.getLine(info);
            line.open(audioFormat);
            line.start();
            decoderThread = new Thread(new Runnable() {
                @Override
                public void run() {
                    while (!stopped.get()) {
                        String b64Audio = b64AudioBuffer.poll();
                        if (b64Audio != null) {
                            byte[] rawAudio = Base64.getDecoder().decode(b64Audio);
                            RawAudioBuffer.add(rawAudio);
                        } else {
                            try {
                                Thread.sleep(100);
                            } catch (InterruptedException e) {
                                throw new RuntimeException(e);
                            }
                        }
                    }
                }
            });
            playerThread = new Thread(new Runnable() {
                @Override
                public void run() {
                    while (!stopped.get()) {
                        byte[] rawAudio = RawAudioBuffer.poll();
                        if (rawAudio != null) {
                            try {
                                playChunk(rawAudio);
                            } catch (IOException e) {
                                throw new RuntimeException(e);
                            } catch (InterruptedException e) {
                                throw new RuntimeException(e);
                            }
                        } else {
                            try {
                                Thread.sleep(100);
                            } catch (InterruptedException e) {
                                throw new RuntimeException(e);
                            }
                        }
                    }
                }
            });
            decoderThread.start();
            playerThread.start();
        }

        // Play an audio chunk and block until playback is complete.
        private void playChunk(byte[] chunk) throws IOException, InterruptedException {
            if (chunk == null || chunk.length == 0) return;
            int bytesWritten = 0;
            while (bytesWritten < chunk.length) {
                bytesWritten += line.write(chunk, bytesWritten, chunk.length - bytesWritten);
            }
            // Duration of the chunk in milliseconds (16-bit mono PCM: 2 bytes per sample).
            int audioLength = chunk.length / (this.sampleRate * 2 / 1000);
            // Wait for the audio in the buffer to finish playing. Guard against a negative sleep for very small chunks.
            Thread.sleep(Math.max(0, audioLength - 10));
        }
        public void write(String b64Audio) {
            b64AudioBuffer.add(b64Audio);
        }

        public void cancel() {
            b64AudioBuffer.clear();
            RawAudioBuffer.clear();
        }

        public void waitForComplete() throws InterruptedException {
            while (!b64AudioBuffer.isEmpty() || !RawAudioBuffer.isEmpty()) {
                Thread.sleep(100);
            }
            line.drain();
        }

        public void shutdown() throws InterruptedException {
            stopped.set(true);
            decoderThread.join();
            playerThread.join();
            if (line != null && line.isRunning()) {
                line.drain();
                line.close();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        QwenTtsRealtimeParam param = QwenTtsRealtimeParam.builder()
                .model(TARGET_MODEL)
                // The following URL is for the Singapore region. If you use a model in the China (Beijing) region, replace the URL with wss://dashscope.aliyuncs.com/api-ws/v1/realtime
                .url("wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime")
                // API keys for the Singapore and China (Beijing) regions are different. To get an API key, see https://www.alibabacloud.com/help/en/model-studio/get-api-key/
                // If you have not configured an environment variable, replace the following line with your Model Studio API key: .apikey("sk-xxx")
                .apikey(System.getenv("DASHSCOPE_API_KEY"))
                .build();
        AtomicReference<CountDownLatch> completeLatch = new AtomicReference<>(new CountDownLatch(1));
        final AtomicReference<QwenTtsRealtime> qwenTtsRef = new AtomicReference<>(null);
        // Create a real-time audio player instance.
        RealtimePcmPlayer audioPlayer = new RealtimePcmPlayer(24000);
        QwenTtsRealtime qwenTtsRealtime = new QwenTtsRealtime(param, new QwenTtsRealtimeCallback() {
            @Override
            public void onOpen() {
                // Handle connection establishment.
            }

            @Override
            public void onEvent(JsonObject message) {
                String type = message.get("type").getAsString();
                switch (type) {
                    case "session.created":
                        // Handle session creation.
                        break;
                    case "response.audio.delta":
                        String recvAudioB64 = message.get("delta").getAsString();
                        // Play the audio in real time.
                        audioPlayer.write(recvAudioB64);
                        break;
                    case "response.done":
                        // Handle response completion.
                        break;
                    case "session.finished":
                        // Handle session termination.
                        completeLatch.get().countDown();
                        break;
                    default:
                        break;
                }
            }

            @Override
            public void onClose(int code, String reason) {
                // Handle connection closure.
            }
        });
        qwenTtsRef.set(qwenTtsRealtime);
        try {
            qwenTtsRealtime.connect();
        } catch (NoApiKeyException e) {
            throw new RuntimeException(e);
        }
        QwenTtsRealtimeConfig config = QwenTtsRealtimeConfig.builder()
                .voice(createVoice()) // Replace the voice parameter with the custom voice generated by cloning.
                .responseFormat(QwenTtsRealtimeAudioFormat.PCM_24000HZ_MONO_16BIT)
                .mode("server_commit")
                .build();
        qwenTtsRealtime.updateSession(config);
        for (String text : textToSynthesize) {
            qwenTtsRealtime.appendText(text);
            Thread.sleep(100);
        }
        qwenTtsRealtime.finish();
        completeLatch.get().await();
        // Wait for the audio to finish playing and then shut down the player.
        audioPlayer.waitForComplete();
        audioPlayer.shutdown();
        System.exit(0);
    }
}
Speech synthesis using a designed voice
When you use the voice design feature, the service returns preview audio data. Listen to the preview to confirm that the voice meets your needs before you use it for speech synthesis. This practice helps reduce call costs.
Create a custom voice and listen to its preview. If you are satisfied, continue. Otherwise, regenerate the voice.
Python
import requests
import base64
import os

def create_voice_and_play():
    # API keys differ between the Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    # If you haven't set an environment variable, replace the line below with: api_key = "sk-xxx"
    api_key = os.getenv("DASHSCOPE_API_KEY")
    if not api_key:
        print("Error: DASHSCOPE_API_KEY environment variable not found. Please set your API key.")
        return None, None, None
    # Prepare request data
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }
    data = {
        "model": "qwen-voice-design",
        "input": {
            "action": "create",
            "target_model": "qwen3-tts-vd-realtime-2025-12-16",
            "voice_prompt": "A composed middle-aged male announcer with a deep, rich and magnetic voice, a steady speaking speed and clear articulation, is suitable for news broadcasting or documentary commentary.",
            "preview_text": "Dear listeners, hello everyone. Welcome to the evening news.",
            "preferred_name": "announcer",
            "language": "en"
        },
        "parameters": {
            "sample_rate": 24000,
            "response_format": "wav"
        }
    }
    # URL for the Singapore region. For the Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
    url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"
    try:
        # Send request
        response = requests.post(
            url,
            headers=headers,
            json=data,
            timeout=60  # Add timeout setting
        )
        if response.status_code == 200:
            result = response.json()
            # Get voice name
            voice_name = result["output"]["voice"]
            print(f"Voice name: {voice_name}")
            # Get preview audio data
            base64_audio = result["output"]["preview_audio"]["data"]
            # Decode Base64 audio data
            audio_bytes = base64.b64decode(base64_audio)
            # Save audio file locally
            filename = f"{voice_name}_preview.wav"
            # Write audio data to local file
            with open(filename, 'wb') as f:
                f.write(audio_bytes)
            print(f"Audio saved to local file: {filename}")
            print(f"File path: {os.path.abspath(filename)}")
            return voice_name, audio_bytes, filename
        else:
            print(f"Request failed. Status code: {response.status_code}")
            print(f"Response: {response.text}")
            return None, None, None
    except requests.exceptions.RequestException as e:
        print(f"Network request error: {e}")
        return None, None, None
    except KeyError as e:
        print(f"Response format error: missing required field: {e}")
        print(f"Response: {response.text if 'response' in locals() else 'No response'}")
        return None, None, None
    except Exception as e:
        print(f"Unexpected error: {e}")
        return None, None, None

if __name__ == "__main__":
    print("Creating voice...")
    voice_name, audio_data, saved_filename = create_voice_and_play()
    if voice_name:
        print(f"\nSuccessfully created voice '{voice_name}'")
        print(f"Audio file saved: '{saved_filename}'")
        print(f"File size: {os.path.getsize(saved_filename)} bytes")
    else:
        print("\nVoice creation failed")
Java
You need to import the Gson dependency. If you use Maven or Gradle, add the dependency as follows:
Maven
Add the following content to the pom.xml file:
<!-- https://mvnrepository.com/artifact/com.google.code.gson/gson -->
<dependency>
    <groupId>com.google.code.gson</groupId>
    <artifactId>gson</artifactId>
    <version>2.13.1</version>
</dependency>
Gradle
Add the following content to the build.gradle file:
// https://mvnrepository.com/artifact/com.google.code.gson/gson
implementation("com.google.code.gson:gson:2.13.1")
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Base64;

public class Main {
    public static void main(String[] args) {
        Main example = new Main();
        example.createVoice();
    }

    public void createVoice() {
        // API keys differ between the Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
        // If you haven't set an environment variable, replace the line below with: String apiKey = "sk-xxx"
        String apiKey = System.getenv("DASHSCOPE_API_KEY");
        // Create the JSON request body string
        String jsonBody = "{\n"
                + "  \"model\": \"qwen-voice-design\",\n"
                + "  \"input\": {\n"
                + "    \"action\": \"create\",\n"
                + "    \"target_model\": \"qwen3-tts-vd-realtime-2025-12-16\",\n"
                + "    \"voice_prompt\": \"A composed middle-aged male announcer with a deep, rich and magnetic voice, a steady speaking speed and clear articulation, is suitable for news broadcasting or documentary commentary.\",\n"
                + "    \"preview_text\": \"Dear listeners, hello everyone. Welcome to the evening news.\",\n"
                + "    \"preferred_name\": \"announcer\",\n"
                + "    \"language\": \"en\"\n"
                + "  },\n"
                + "  \"parameters\": {\n"
                + "    \"sample_rate\": 24000,\n"
                + "    \"response_format\": \"wav\"\n"
                + "  }\n"
                + "}";
        HttpURLConnection connection = null;
        try {
            // URL for the Singapore region. For the Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
            URL url = new URL("https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization");
            connection = (HttpURLConnection) url.openConnection();
            // Set request method and headers
            connection.setRequestMethod("POST");
            connection.setRequestProperty("Authorization", "Bearer " + apiKey);
            connection.setRequestProperty("Content-Type", "application/json");
            connection.setDoOutput(true);
            connection.setDoInput(true);
            // Send request body
            try (OutputStream os = connection.getOutputStream()) {
                byte[] input = jsonBody.getBytes("UTF-8");
                os.write(input, 0, input.length);
                os.flush();
            }
            // Get response
            int responseCode = connection.getResponseCode();
            if (responseCode == HttpURLConnection.HTTP_OK) {
                // Read response content
                StringBuilder response = new StringBuilder();
                try (BufferedReader br = new BufferedReader(
                        new InputStreamReader(connection.getInputStream(), "UTF-8"))) {
                    String responseLine;
                    while ((responseLine = br.readLine()) != null) {
                        response.append(responseLine.trim());
                    }
                }
                // Parse JSON response
                JsonObject jsonResponse = JsonParser.parseString(response.toString()).getAsJsonObject();
                JsonObject outputObj = jsonResponse.getAsJsonObject("output");
                JsonObject previewAudioObj = outputObj.getAsJsonObject("preview_audio");
                // Get voice name
                String voiceName = outputObj.get("voice").getAsString();
                System.out.println("Voice name: " + voiceName);
                // Get Base64-encoded audio data
                String base64Audio = previewAudioObj.get("data").getAsString();
                // Decode Base64 audio data
                byte[] audioBytes = Base64.getDecoder().decode(base64Audio);
                // Save audio to a local file
                String filename = voiceName + "_preview.wav";
                saveAudioToFile(audioBytes, filename);
                System.out.println("Audio saved to local file: " + filename);
            } else {
                // Read error response
                StringBuilder errorResponse = new StringBuilder();
                try (BufferedReader br = new BufferedReader(
                        new InputStreamReader(connection.getErrorStream(), "UTF-8"))) {
                    String responseLine;
                    while ((responseLine = br.readLine()) != null) {
                        errorResponse.append(responseLine.trim());
                    }
                }
                System.out.println("Request failed. Status code: " + responseCode);
                System.out.println("Error response: " + errorResponse.toString());
            }
        } catch (Exception e) {
            System.err.println("Request error: " + e.getMessage());
            e.printStackTrace();
        } finally {
            if (connection != null) {
                connection.disconnect();
            }
        }
    }

    private void saveAudioToFile(byte[] audioBytes, String filename) {
        try {
            File file = new File(filename);
            try (FileOutputStream fos = new FileOutputStream(file)) {
                fos.write(audioBytes);
            }
            System.out.println("Audio saved to: " + file.getAbsolutePath());
        } catch (IOException e) {
            System.err.println("Error saving audio file: " + e.getMessage());
            e.printStackTrace();
        }
    }
}
Use the custom voice generated in the previous step for speech synthesis.
This example uses the DashScope SDK "server commit mode" for speech synthesis with a system voice. Replace the voice parameter with the custom voice generated by voice design.
Key principle: The model used during voice design (target_model) must be the same as the model used for subsequent speech synthesis (model). Otherwise, synthesis fails.
Python
# coding=utf-8
# Installation instructions for pyaudio:
# APPLE Mac OS X
#   brew install portaudio
#   pip install pyaudio
# Debian/Ubuntu
#   sudo apt-get install python-pyaudio python3-pyaudio
#   or
#   pip install pyaudio
# CentOS
#   sudo yum install -y portaudio portaudio-devel && pip install pyaudio
# Microsoft Windows
#   python -m pip install pyaudio

import pyaudio
import os
import base64
import threading
import time
import dashscope  # DashScope Python SDK version 1.23.9 or later is required
from dashscope.audio.qwen_tts_realtime import QwenTtsRealtime, QwenTtsRealtimeCallback, AudioFormat

# ======= Constant configuration =======
TEXT_TO_SYNTHESIZE = [
    'Right? I just love this kind of supermarket,',
    'especially during the New Year.',
    'Going to the supermarket',
    'just makes me feel',
    'super, super happy!',
    'I want to buy so many things!'
]

def init_dashscope_api_key():
    """
    Initializes the DashScope SDK API key
    """
    # API keys differ between the Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    # If you haven't set an environment variable, replace the line below with: dashscope.api_key = "sk-xxx"
    dashscope.api_key = os.getenv("DASHSCOPE_API_KEY")

# ======= Callback class =======
class MyCallback(QwenTtsRealtimeCallback):
    """
    Custom TTS streaming callback
    """
    def __init__(self):
        self.complete_event = threading.Event()
        self._player = pyaudio.PyAudio()
        self._stream = self._player.open(
            format=pyaudio.paInt16, channels=1, rate=24000, output=True
        )

    def on_open(self) -> None:
        print('[TTS] Connection established')

    def on_close(self, close_status_code, close_msg) -> None:
        self._stream.stop_stream()
        self._stream.close()
        self._player.terminate()
        print(f'[TTS] Connection closed, code={close_status_code}, msg={close_msg}')

    def on_event(self, response: dict) -> None:
        try:
            event_type = response.get('type', '')
            if event_type == 'session.created':
                print(f'[TTS] Session started: {response["session"]["id"]}')
            elif event_type == 'response.audio.delta':
                audio_data = base64.b64decode(response['delta'])
                self._stream.write(audio_data)
            elif event_type == 'response.done':
                print(f'[TTS] Response complete, Response ID: {qwen_tts_realtime.get_last_response_id()}')
            elif event_type == 'session.finished':
                print('[TTS] Session finished')
                self.complete_event.set()
        except Exception as e:
            print(f'[Error] Exception processing callback event: {e}')

    def wait_for_finished(self):
        self.complete_event.wait()

# ======= Main execution logic =======
if __name__ == '__main__':
    init_dashscope_api_key()
    print('[System] Initializing Qwen TTS Realtime ...')
    callback = MyCallback()
    qwen_tts_realtime = QwenTtsRealtime(
        # Voice design and speech synthesis must use the same model
        model="qwen3-tts-vd-realtime-2025-12-16",
        callback=callback,
        # URL for the Singapore region. For the Beijing region, use: wss://dashscope.aliyuncs.com/api-ws/v1/realtime
        url='wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime'
    )
    qwen_tts_realtime.connect()
    qwen_tts_realtime.update_session(
        voice="myvoice",  # Replace the voice parameter with the custom voice generated by voice design
        response_format=AudioFormat.PCM_24000HZ_MONO_16BIT,
        mode='server_commit'
    )
    for text_chunk in TEXT_TO_SYNTHESIZE:
        print(f'[Sending text]: {text_chunk}')
        qwen_tts_realtime.append_text(text_chunk)
        time.sleep(0.1)
    qwen_tts_realtime.finish()
    callback.wait_for_finished()
    print(f'[Metric] session_id={qwen_tts_realtime.get_session_id()}, '
          f'first_audio_delay={qwen_tts_realtime.get_first_audio_delay()}s')
Java
import com.alibaba.dashscope.audio.qwen_tts_realtime.*;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.google.gson.JsonObject;
import javax.sound.sampled.*;
import java.io.*;
import java.util.Base64;
import java.util.Queue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;

public class Main {
    // ===== Constant definitions =====
    private static String[] textToSynthesize = {
        "Right? I just love this kind of supermarket,",
        "especially during the New Year.",
        "Going to the supermarket",
        "just makes me feel",
        "super, super happy!",
        "I want to buy so many things!"
    };

    // Real-time audio player class
    public static class RealtimePcmPlayer {
        private int sampleRate;
        private SourceDataLine line;
        private AudioFormat audioFormat;
        private Thread decoderThread;
        private Thread playerThread;
        private AtomicBoolean stopped = new AtomicBoolean(false);
        private Queue<String> b64AudioBuffer = new ConcurrentLinkedQueue<>();
        private Queue<byte[]> RawAudioBuffer = new ConcurrentLinkedQueue<>();

        // Constructor initializes audio format and audio line
        public RealtimePcmPlayer(int sampleRate) throws LineUnavailableException {
            this.sampleRate = sampleRate;
            this.audioFormat = new AudioFormat(this.sampleRate, 16, 1, true, false);
            DataLine.Info info = new DataLine.Info(SourceDataLine.class, audioFormat);
            line = (SourceDataLine) AudioSystem.getLine(info);
            line.open(audioFormat);
            line.start();
            decoderThread = new Thread(new Runnable() {
                @Override
                public void run() {
                    while (!stopped.get()) {
                        String b64Audio = b64AudioBuffer.poll();
                        if (b64Audio != null) {
                            byte[] rawAudio = Base64.getDecoder().decode(b64Audio);
                            RawAudioBuffer.add(rawAudio);
                        } else {
                            try {
                                Thread.sleep(100);
                            } catch (InterruptedException e) {
                                throw new RuntimeException(e);
                            }
                        }
                    }
                }
            });
            playerThread = new Thread(new Runnable() {
                @Override
                public void run() {
                    while (!stopped.get()) {
                        byte[] rawAudio = RawAudioBuffer.poll();
                        if (rawAudio != null) {
                            try {
                                playChunk(rawAudio);
                            } catch (IOException e) {
                                throw new RuntimeException(e);
                            } catch (InterruptedException e) {
                                throw new RuntimeException(e);
                            }
                        } else {
                            try {
                                Thread.sleep(100);
                            } catch (InterruptedException e) {
                                throw new RuntimeException(e);
                            }
                        }
                    }
                }
            });
            decoderThread.start();
            playerThread.start();
        }

        // Plays an audio chunk and blocks until playback is complete
        private void playChunk(byte[] chunk) throws IOException, InterruptedException {
            if (chunk == null || chunk.length == 0) return;
            int bytesWritten = 0;
            while (bytesWritten < chunk.length) {
                bytesWritten += line.write(chunk, bytesWritten, chunk.length - bytesWritten);
            }
            // Duration of the chunk in milliseconds (16-bit mono PCM: 2 bytes per sample)
            int audioLength = chunk.length / (this.sampleRate * 2 / 1000);
            // Wait for the audio in the buffer to finish playing; guard against a negative sleep for very small chunks
            Thread.sleep(Math.max(0, audioLength - 10));
        }

        public void write(String b64Audio) {
            b64AudioBuffer.add(b64Audio);
        }

        public void cancel() {
            b64AudioBuffer.clear();
            RawAudioBuffer.clear();
        }

        public void waitForComplete() throws InterruptedException {
            while (!b64AudioBuffer.isEmpty() || !RawAudioBuffer.isEmpty()) {
                Thread.sleep(100);
            }
            line.drain();
        }

        public void shutdown() throws InterruptedException {
            stopped.set(true);
            decoderThread.join();
            playerThread.join();
            if (line != null && line.isRunning()) {
                line.drain();
                line.close();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        QwenTtsRealtimeParam param = QwenTtsRealtimeParam.builder()
                // Voice design and speech synthesis must use the same model
                .model("qwen3-tts-vd-realtime-2025-12-16")
                // URL for the Singapore region. For the Beijing region, use: wss://dashscope.aliyuncs.com/api-ws/v1/realtime
                .url("wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime")
                // API keys differ between the Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
                // If you haven't set an environment variable, replace the line below with: .apikey("sk-xxx")
                .apikey(System.getenv("DASHSCOPE_API_KEY"))
                .build();
        AtomicReference<CountDownLatch> completeLatch = new AtomicReference<>(new CountDownLatch(1));
        final AtomicReference<QwenTtsRealtime> qwenTtsRef = new AtomicReference<>(null);
        // Create a real-time audio player instance
        RealtimePcmPlayer audioPlayer = new RealtimePcmPlayer(24000);
        QwenTtsRealtime qwenTtsRealtime = new QwenTtsRealtime(param, new QwenTtsRealtimeCallback() {
            @Override
            public void onOpen() {
                // Handle connection open
            }

            @Override
            public void onEvent(JsonObject message) {
                String type = message.get("type").getAsString();
                switch (type) {
                    case "session.created":
                        // Handle session creation
                        break;
                    case "response.audio.delta":
                        String recvAudioB64 = message.get("delta").getAsString();
                        // Play audio in real time
                        audioPlayer.write(recvAudioB64);
                        break;
                    case "response.done":
                        // Handle response completion
                        break;
                    case "session.finished":
                        // Handle session finish
                        completeLatch.get().countDown();
                        break;
                    default:
                        break;
                }
            }

            @Override
            public void onClose(int code, String reason) {
                // Handle connection close
            }
        });
        qwenTtsRef.set(qwenTtsRealtime);
        try {
            qwenTtsRealtime.connect();
        } catch (NoApiKeyException e) {
            throw new RuntimeException(e);
        }
        QwenTtsRealtimeConfig config = QwenTtsRealtimeConfig.builder()
                .voice("myvoice") // Replace the voice parameter with the custom voice generated by voice design
                .responseFormat(QwenTtsRealtimeAudioFormat.PCM_24000HZ_MONO_16BIT)
                .mode("server_commit")
                .build();
        qwenTtsRealtime.updateSession(config);
        for (String text : textToSynthesize) {
            qwenTtsRealtime.appendText(text);
            Thread.sleep(100);
        }
        qwenTtsRealtime.finish();
        completeLatch.get().await();
        // Wait for audio playback to complete and then shut down the player
        audioPlayer.waitForComplete();
        audioPlayer.shutdown();
        System.exit(0);
    }
}
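The model-consistency requirement stated above (the voice-design target_model must equal the model used for synthesis) can be made explicit with a small guard before connecting. This is an illustrative sketch, not part of the DashScope SDK; the helper name and constant are ours.

```python
# Illustrative guard (not part of the SDK): ensure the synthesis model matches
# the target_model that was used when the voice was designed.
VOICE_DESIGN_TARGET_MODEL = "qwen3-tts-vd-realtime-2025-12-16"

def check_model_consistency(tts_model: str) -> None:
    """Raise if the synthesis model differs from the voice-design target model."""
    if tts_model != VOICE_DESIGN_TARGET_MODEL:
        raise ValueError(
            f"Voice was designed for {VOICE_DESIGN_TARGET_MODEL} "
            f"but synthesis was requested with {tts_model}; synthesis would fail."
        )
```

Calling this guard before QwenTtsRealtime is constructed turns a remote synthesis failure into an immediate, descriptive local error.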
For more sample code, see GitHub.
Interaction flow
Server_commit mode
Set session.mode in the session.update event to "server_commit" to enable this mode. The server then automatically manages the timing of text segmentation and synthesis.
The interaction flow is as follows:
1. When the client sends a session.update event, the server responds with the session.created and session.updated events.
2. The client uses the input_text_buffer.append event to add text to the server-side buffer.
3. The server intelligently manages the timing of text segmentation and synthesis, returning the response.created, response.output_item.added, response.content_part.added, and response.audio.delta events.
4. The server sends the response.audio.done, response.content_part.done, response.output_item.done, and response.done events after it completes the response.
5. The server ends the session by sending the session.finished event.
Lifecycle | Client events | Server events |
Session initialization | session.update: configure the session | session.created: the session is created. session.updated: the session configuration is updated. |
User text input | input_text_buffer.append: append text to the server. input_text_buffer.commit: immediately synthesize the text cached on the server. session.finish: notify the server that there is no more text input. | input_text_buffer.committed: the server received the submitted text. |
Server audio output | None | response.created: the server starts generating a response. response.output_item.added: new output content is available in the response. response.content_part.added: new output content is added to the assistant message. response.audio.delta: audio generated incrementally by the model. response.content_part.done: streaming of the text or audio content for the assistant message is complete. response.output_item.done: streaming of the entire output item for the assistant message is complete. response.audio.done: audio generation is complete. response.done: the response is complete. |
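As a sketch, the client side of server_commit mode reduces to sending a handful of JSON events over the WebSocket. Only the event names ("type") come from the flow above; the other field names are illustrative assumptions, so consult the API reference for the exact schema.

```python
import json

# Hypothetical helpers that build the client events used in server_commit mode.
# Event names are taken from this document; payload shapes are assumptions.
def session_update_event(voice: str) -> str:
    # Configure the session; the server replies with session.created / session.updated.
    return json.dumps({"type": "session.update",
                       "session": {"mode": "server_commit", "voice": voice}})

def append_text_event(text: str) -> str:
    # Add text to the server-side buffer; the server decides when to synthesize.
    return json.dumps({"type": "input_text_buffer.append", "text": text})

def finish_event() -> str:
    # Tell the server that no more text input will follow.
    return json.dumps({"type": "session.finish"})

events = [
    session_update_event("myvoice"),
    append_text_event("Hello,"),
    append_text_event(" world."),
    finish_event(),
]
```

Note that in server_commit mode no explicit commit event is required: the server segments and synthesizes the appended text on its own schedule.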
Commit mode
Set session.mode in the session.update event to "commit" to enable this mode. In this mode, the client must commit the text buffer to the server to receive a response.
The interaction flow is as follows:
1. When the client sends a session.update event, the server responds with the session.created and session.updated events.
2. The client adds text to the server-side buffer by sending input_text_buffer.append events.
3. The client sends the input_text_buffer.commit event to submit the buffer to the server and the session.finish event to indicate that text input is complete.
4. The server sends the response.created event to start response generation.
5. The server sends the response.output_item.added, response.content_part.added, and response.audio.delta events.
6. After the server finishes responding, it sends the response.audio.done, response.content_part.done, response.output_item.done, and response.done events.
7. The server sends the session.finished event, which ends the session.
Lifecycle | Client events | Server events |
Session initialization | session.update: configure the session | session.created: the session is created. session.updated: the session configuration is updated. |
User text input | input_text_buffer.append: append text to the buffer. input_text_buffer.commit: submit the buffer to the server. input_text_buffer.clear: clear the buffer. | input_text_buffer.committed: the server received the submitted text. |
Server audio output | None | response.created: the server starts generating a response. response.output_item.added: new output content is available in the response. response.content_part.added: new output content is added to the assistant message. response.audio.delta: audio generated incrementally by the model. response.content_part.done: streaming of the text or audio content for the assistant message is complete. response.output_item.done: streaming of the entire output item for the assistant message is complete. response.audio.done: audio generation is complete. response.done: the response is complete. |
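The commit-mode client sequence differs from server_commit mode in one step: the client must explicitly commit the buffered text. A sketch of that ordering, using the event names from the table above (field names other than "type" are illustrative assumptions):

```python
import json

# Hypothetical event builder; only the "type" values come from this document.
def make_event(event_type: str, **fields) -> str:
    return json.dumps({"type": event_type, **fields})

commit_sequence = [
    make_event("session.update", session={"mode": "commit"}),
    make_event("input_text_buffer.append", text="Hello, world."),
    make_event("input_text_buffer.commit"),  # unlike server_commit mode, the client must commit
    make_event("session.finish"),            # no more text will follow
]
```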
API reference
Feature comparison
Feature | qwen3-tts-vd-realtime-2025-12-16 | qwen3-tts-vc-realtime-2025-11-27 | qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27, qwen3-tts-flash-realtime-2025-09-18 | qwen-tts-realtime, qwen-tts-realtime-latest, qwen-tts-realtime-2025-07-15 |
Supported languages | Chinese, English, Spanish, Russian, Italian, French, Korean, Japanese, German, and Portuguese | Chinese (Mandarin, Beijing, Shanghai, Sichuan, Nanjing, Shaanxi, Minnan, Tianjin, and Cantonese, varying by voice), English, Spanish, Russian, Italian, French, Korean, Japanese, German, and Portuguese | Chinese and English | |
Audio formats | pcm, wav, mp3, and opus | pcm ||
Audio sample rates | 8 kHz, 16 kHz, 24 kHz, and 48 kHz | 24 kHz ||
Voice cloning ||||
Voice design ||||
SSML ||||
LaTeX ||||
Volume adjustment ||||
Speed adjustment ||||
Pitch adjustment ||||
Bitrate adjustment ||||
Timestamps ||||
Emotion settings ||||
Streaming input ||||
Streaming output ||||
Rate limit | Requests per minute (RPM): 180 | qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27: RPM 180. qwen3-tts-flash-realtime-2025-09-18: RPM 10. | RPM: 10. Tokens per minute (TPM): 100,000. |
Access methods | Java/Python SDK, WebSocket API |||
Price | International (Singapore): $0.143353 per 10,000 characters. China (Beijing): $0.143353 per 10,000 characters. | International (Singapore): $0.13 per 10,000 characters. China (Beijing): $0.143353 per 10,000 characters. | China (Beijing):
|
Supported voices
The supported voices vary by model. Set the voice request parameter to the corresponding value from the voice parameter column in the table.