Realtime Speech Synthesis - Qwen provides low-latency, streaming text input and streaming audio output, offers a range of human-like voices, supports multilingual and dialect synthesis, can output multiple languages with a single voice, and adaptively adjusts tone to handle complex text fluently.
Core capabilities
Generates high-fidelity speech in real time, with natural pronunciation in Chinese, English, and other languages
Offers two voice customization methods: voice cloning (clone a voice from reference audio) and voice design (generate a voice from a text description), for quickly creating personalized voices
Supports streaming input and output, responding with low latency in real-time interaction scenarios
Adjustable speaking rate, pitch, volume, and bitrate for fine-grained control over speech delivery
Compatible with mainstream audio formats, with output sample rates up to 48 kHz
Scope
Supported regions:
Supported models
International (Singapore)
Qwen3-TTS-VD-Realtime: qwen3-tts-vd-realtime-2025-12-16 (snapshot)
Qwen3-TTS-VC-Realtime: qwen3-tts-vc-realtime-2025-11-27 (snapshot)
Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime (stable, currently equivalent to qwen3-tts-flash-realtime-2025-09-18), qwen3-tts-flash-realtime-2025-11-27 (latest snapshot), qwen3-tts-flash-realtime-2025-09-18 (snapshot)
Mainland China (Beijing)
Qwen3-TTS-VD-Realtime: qwen3-tts-vd-realtime-2025-12-16 (snapshot)
Qwen3-TTS-VC-Realtime: qwen3-tts-vc-realtime-2025-11-27 (snapshot)
Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime (stable, currently equivalent to qwen3-tts-flash-realtime-2025-09-18), qwen3-tts-flash-realtime-2025-11-27 (latest snapshot), qwen3-tts-flash-realtime-2025-09-18 (snapshot)
Qwen-TTS-Realtime: qwen-tts-realtime (stable, currently equivalent to qwen-tts-realtime-2025-07-15), qwen-tts-realtime-latest (latest, currently equivalent to qwen-tts-realtime-2025-07-15), qwen-tts-realtime-2025-07-15 (snapshot)
For more information, see the model list
Model selection
Scenario | Recommended model | Rationale | Notes |
Voice customization from a text description (brand voice, exclusive voice, extending system voices) | qwen3-tts-vd-realtime-2025-12-16 | Supports voice design: create a custom voice from a text description without audio samples, suitable for designing a brand voice from scratch | Does not support system voices or voice cloning |
Voice customization from an audio sample (brand voice, exclusive voice, extending system voices) | qwen3-tts-vc-realtime-2025-11-27 | Supports voice cloning: quickly clone a voice from a real audio sample to build a lifelike brand voiceprint with high fidelity and consistency | Does not support system voices or voice design |
Intelligent customer service and chat bots | qwen3-tts-flash-realtime-2025-11-27 | Streaming input and output with adjustable rate and pitch for natural interaction; multiple audio output formats suit different devices | System voices only; no voice cloning/design |
Multilingual content broadcasting | qwen3-tts-flash-realtime-2025-11-27 | Supports many languages and Chinese dialects, covering global content distribution needs | System voices only; no voice cloning/design |
Audiobooks and content production | qwen3-tts-flash-realtime-2025-11-27 | Adjustable volume, rate, and pitch for fine-grained production of audiobooks, podcasts, and similar content | System voices only; no voice cloning/design |
Live-commerce and short-video dubbing | qwen3-tts-flash-realtime-2025-11-27 | Supports mp3/opus compressed formats for bandwidth-constrained scenarios; adjustable parameters suit different dubbing styles | System voices only; no voice cloning/design |
For more details, see the model feature comparison
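The table above can be condensed into a small lookup for convenience (a sketch: the scenario keys are our own shorthand, not API values; only the model names come from the table):

```python
# Scenario-to-model lookup distilled from the selection table above.
# Keys are illustrative shorthand; values are the documented model names.
RECOMMENDED_MODELS = {
    "voice_design_from_text": "qwen3-tts-vd-realtime-2025-12-16",    # no system voices, no cloning
    "voice_cloning_from_audio": "qwen3-tts-vc-realtime-2025-11-27",  # no system voices, no design
    "customer_service_bot": "qwen3-tts-flash-realtime-2025-11-27",   # system voices only
    "multilingual_broadcast": "qwen3-tts-flash-realtime-2025-11-27",
    "audiobook_production": "qwen3-tts-flash-realtime-2025-11-27",
    "live_commerce_dubbing": "qwen3-tts-flash-realtime-2025-11-27",
}

def pick_model(scenario: str) -> str:
    """Return the recommended model, falling back to the stable flash alias."""
    return RECOMMENDED_MODELS.get(scenario, "qwen3-tts-flash-realtime")

print(pick_model("voice_cloning_from_audio"))  # qwen3-tts-vc-realtime-2025-11-27
```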
Quick start
Before running the code, obtain and configure an API Key. If you call the service through an SDK, also install the latest DashScope SDK.
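For example, on Linux/macOS the API Key can be set as an environment variable and the Python SDK installed via pip (a minimal sketch; the `sk-xxx` placeholder and the `dashscope` package name follow the conventions used in the samples below):

```shell
# Configure the API Key for the current shell session (Linux/macOS).
# API Keys differ between the Singapore and Beijing regions.
export DASHSCOPE_API_KEY="sk-xxx"

# Install or upgrade the DashScope Python SDK used by the samples below.
pip install -U dashscope
```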
Synthesizing speech with a system voice
The following samples demonstrate speech synthesis with a system voice (see the voice list).
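The samples below save raw PCM output (24 kHz, 16-bit, mono) to a local file. One quick sanity check, assuming those sample parameters, is to compute the playable duration from the file's byte count:

```python
import os

def pcm_duration_seconds(num_bytes: int, sample_rate: int = 24000,
                         bytes_per_sample: int = 2, channels: int = 1) -> float:
    """Duration of raw PCM audio given its size in bytes."""
    return num_bytes / (sample_rate * bytes_per_sample * channels)

# After running a sample that wrote result_24k.pcm:
# print(f"{pcm_duration_seconds(os.path.getsize('result_24k.pcm')):.2f} s of audio")
print(pcm_duration_seconds(48000))  # 48000 bytes of 24 kHz mono 16-bit PCM is 1.0 s
```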
Using the DashScope SDK
Python
server commit mode
import os
import base64
import threading
import time
import dashscope
from dashscope.audio.qwen_tts_realtime import *
qwen_tts_realtime: QwenTtsRealtime = None
text_to_synthesize = [
'對吧~我就特別喜歡這種超市,',
'尤其是過年的時候',
'去逛超市',
'就會覺得',
'超級超級開心!',
'想買好多好多的東西呢!'
]
def init_dashscope_api_key():
"""
Set your DashScope API-key. More information:
https://github.com/aliyun/alibabacloud-bailian-speech-demo/blob/master/PREREQUISITES.md
"""
    # API Keys differ between the Singapore and Beijing regions. Obtain an API Key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
if 'DASHSCOPE_API_KEY' in os.environ:
dashscope.api_key = os.environ[
'DASHSCOPE_API_KEY'] # load API-key from environment variable DASHSCOPE_API_KEY
else:
dashscope.api_key = 'your-dashscope-api-key' # set API-key manually
class MyCallback(QwenTtsRealtimeCallback):
def __init__(self):
self.complete_event = threading.Event()
self.file = open('result_24k.pcm', 'wb')
def on_open(self) -> None:
print('connection opened, init player')
def on_close(self, close_status_code, close_msg) -> None:
self.file.close()
print('connection closed with code: {}, msg: {}, destroy player'.format(close_status_code, close_msg))
    def on_event(self, response: dict) -> None:
try:
global qwen_tts_realtime
type = response['type']
if 'session.created' == type:
print('start session: {}'.format(response['session']['id']))
if 'response.audio.delta' == type:
recv_audio_b64 = response['delta']
self.file.write(base64.b64decode(recv_audio_b64))
if 'response.done' == type:
print(f'response {qwen_tts_realtime.get_last_response_id()} done')
if 'session.finished' == type:
print('session finished')
self.complete_event.set()
except Exception as e:
print('[Error] {}'.format(e))
return
def wait_for_finished(self):
self.complete_event.wait()
if __name__ == '__main__':
init_dashscope_api_key()
print('Initializing ...')
callback = MyCallback()
qwen_tts_realtime = QwenTtsRealtime(
model='qwen3-tts-flash-realtime',
callback=callback,
        # URL for the Singapore region. For models in the Beijing region, replace the url with: wss://dashscope.aliyuncs.com/api-ws/v1/realtime
url='wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime'
)
qwen_tts_realtime.connect()
qwen_tts_realtime.update_session(
voice = 'Cherry',
response_format = AudioFormat.PCM_24000HZ_MONO_16BIT,
mode = 'server_commit'
)
for text_chunk in text_to_synthesize:
        print(f'send text: {text_chunk}')
qwen_tts_realtime.append_text(text_chunk)
time.sleep(0.1)
qwen_tts_realtime.finish()
callback.wait_for_finished()
print('[Metric] session: {}, first audio delay: {}'.format(
qwen_tts_realtime.get_session_id(),
qwen_tts_realtime.get_first_audio_delay(),
))
commit mode
import base64
import os
import threading
import dashscope
from dashscope.audio.qwen_tts_realtime import *
qwen_tts_realtime: QwenTtsRealtime = None
text_to_synthesize = [
'這是第一句話。',
'這是第二句話。',
'這是第三句話。',
]
def init_dashscope_api_key():
"""
Set your DashScope API-key. More information:
https://github.com/aliyun/alibabacloud-bailian-speech-demo/blob/master/PREREQUISITES.md
"""
    # API Keys differ between the Singapore and Beijing regions. Obtain an API Key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
if 'DASHSCOPE_API_KEY' in os.environ:
dashscope.api_key = os.environ[
'DASHSCOPE_API_KEY'] # load API-key from environment variable DASHSCOPE_API_KEY
else:
dashscope.api_key = 'your-dashscope-api-key' # set API-key manually
class MyCallback(QwenTtsRealtimeCallback):
def __init__(self):
super().__init__()
self.response_counter = 0
self.complete_event = threading.Event()
self.file = open(f'result_{self.response_counter}_24k.pcm', 'wb')
def reset_event(self):
self.response_counter += 1
self.file = open(f'result_{self.response_counter}_24k.pcm', 'wb')
self.complete_event = threading.Event()
def on_open(self) -> None:
print('connection opened, init player')
def on_close(self, close_status_code, close_msg) -> None:
print('connection closed with code: {}, msg: {}, destroy player'.format(close_status_code, close_msg))
    def on_event(self, response: dict) -> None:
try:
global qwen_tts_realtime
type = response['type']
if 'session.created' == type:
print('start session: {}'.format(response['session']['id']))
if 'response.audio.delta' == type:
recv_audio_b64 = response['delta']
self.file.write(base64.b64decode(recv_audio_b64))
if 'response.done' == type:
print(f'response {qwen_tts_realtime.get_last_response_id()} done')
self.complete_event.set()
self.file.close()
if 'session.finished' == type:
print('session finished')
self.complete_event.set()
except Exception as e:
print('[Error] {}'.format(e))
return
def wait_for_response_done(self):
self.complete_event.wait()
if __name__ == '__main__':
init_dashscope_api_key()
print('Initializing ...')
callback = MyCallback()
qwen_tts_realtime = QwenTtsRealtime(
model='qwen3-tts-flash-realtime',
callback=callback,
        # URL for the Singapore region. For models in the Beijing region, replace the url with: wss://dashscope.aliyuncs.com/api-ws/v1/realtime
url='wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime'
)
qwen_tts_realtime.connect()
qwen_tts_realtime.update_session(
voice = 'Cherry',
response_format = AudioFormat.PCM_24000HZ_MONO_16BIT,
mode = 'commit'
)
    print(f'send text: {text_to_synthesize[0]}')
qwen_tts_realtime.append_text(text_to_synthesize[0])
qwen_tts_realtime.commit()
callback.wait_for_response_done()
callback.reset_event()
    print(f'send text: {text_to_synthesize[1]}')
qwen_tts_realtime.append_text(text_to_synthesize[1])
qwen_tts_realtime.commit()
callback.wait_for_response_done()
callback.reset_event()
    print(f'send text: {text_to_synthesize[2]}')
qwen_tts_realtime.append_text(text_to_synthesize[2])
qwen_tts_realtime.commit()
callback.wait_for_response_done()
qwen_tts_realtime.finish()
print('[Metric] session: {}, first audio delay: {}'.format(
qwen_tts_realtime.get_session_id(),
qwen_tts_realtime.get_first_audio_delay(),
))
Java
server commit mode
// Requires DashScope SDK version 2.21.16 or later
import com.alibaba.dashscope.audio.qwen_tts_realtime.*;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.google.gson.JsonObject;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.SourceDataLine;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.AudioSystem;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.Base64;
import java.util.Queue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;
public class Main {
static String[] textToSynthesize = {
"對吧~我就特別喜歡這種超市",
"尤其是過年的時候",
"去逛超市",
"就會覺得",
"超級超級開心!",
"想買好多好多的東西呢!"
};
    // Realtime PCM audio player class
public static class RealtimePcmPlayer {
private int sampleRate;
private SourceDataLine line;
private AudioFormat audioFormat;
private Thread decoderThread;
private Thread playerThread;
private AtomicBoolean stopped = new AtomicBoolean(false);
private Queue<String> b64AudioBuffer = new ConcurrentLinkedQueue<>();
private Queue<byte[]> RawAudioBuffer = new ConcurrentLinkedQueue<>();
        // Constructor: initialize the audio format and the audio line
public RealtimePcmPlayer(int sampleRate) throws LineUnavailableException {
this.sampleRate = sampleRate;
this.audioFormat = new AudioFormat(this.sampleRate, 16, 1, true, false);
DataLine.Info info = new DataLine.Info(SourceDataLine.class, audioFormat);
line = (SourceDataLine) AudioSystem.getLine(info);
line.open(audioFormat);
line.start();
decoderThread = new Thread(new Runnable() {
@Override
public void run() {
while (!stopped.get()) {
String b64Audio = b64AudioBuffer.poll();
if (b64Audio != null) {
byte[] rawAudio = Base64.getDecoder().decode(b64Audio);
RawAudioBuffer.add(rawAudio);
} else {
try {
Thread.sleep(100);
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
}
}
}
});
playerThread = new Thread(new Runnable() {
@Override
public void run() {
while (!stopped.get()) {
byte[] rawAudio = RawAudioBuffer.poll();
if (rawAudio != null) {
try {
playChunk(rawAudio);
} catch (IOException e) {
throw new RuntimeException(e);
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
} else {
try {
Thread.sleep(100);
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
}
}
}
});
decoderThread.start();
playerThread.start();
}
        // Play one audio chunk and block until it has finished playing
private void playChunk(byte[] chunk) throws IOException, InterruptedException {
if (chunk == null || chunk.length == 0) return;
int bytesWritten = 0;
while (bytesWritten < chunk.length) {
bytesWritten += line.write(chunk, bytesWritten, chunk.length - bytesWritten);
}
int audioLength = chunk.length / (this.sampleRate*2/1000);
            // Wait for the buffered audio to finish playing (avoid a negative sleep for very short chunks)
            Thread.sleep(Math.max(0, audioLength - 10));
}
public void write(String b64Audio) {
b64AudioBuffer.add(b64Audio);
}
public void cancel() {
b64AudioBuffer.clear();
RawAudioBuffer.clear();
}
public void waitForComplete() throws InterruptedException {
while (!b64AudioBuffer.isEmpty() || !RawAudioBuffer.isEmpty()) {
Thread.sleep(100);
}
line.drain();
}
public void shutdown() throws InterruptedException {
stopped.set(true);
decoderThread.join();
playerThread.join();
if (line != null && line.isRunning()) {
line.drain();
line.close();
}
}
}
public static void main(String[] args) throws InterruptedException, LineUnavailableException, FileNotFoundException {
QwenTtsRealtimeParam param = QwenTtsRealtimeParam.builder()
.model("qwen3-tts-flash-realtime")
                // URL for the Singapore region. For models in the Beijing region, replace the url with: wss://dashscope.aliyuncs.com/api-ws/v1/realtime
                .url("wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime")
                // API Keys differ between the Singapore and Beijing regions. Obtain an API Key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
                .apikey(System.getenv("DASHSCOPE_API_KEY"))
.build();
AtomicReference<CountDownLatch> completeLatch = new AtomicReference<>(new CountDownLatch(1));
final AtomicReference<QwenTtsRealtime> qwenTtsRef = new AtomicReference<>(null);
        // Create the realtime audio player instance
RealtimePcmPlayer audioPlayer = new RealtimePcmPlayer(24000);
QwenTtsRealtime qwenTtsRealtime = new QwenTtsRealtime(param, new QwenTtsRealtimeCallback() {
@Override
public void onOpen() {
                // Handle connection established
}
@Override
public void onEvent(JsonObject message) {
String type = message.get("type").getAsString();
switch(type) {
case "session.created":
                        // Handle session created
break;
case "response.audio.delta":
String recvAudioB64 = message.get("delta").getAsString();
                        // Play the audio in real time
audioPlayer.write(recvAudioB64);
break;
case "response.done":
                        // Handle response completed
break;
                    case "session.finished":
                        // Handle session finished
                        completeLatch.get().countDown();
                        break;
default:
break;
}
}
@Override
public void onClose(int code, String reason) {
                // Handle connection closed
}
});
qwenTtsRef.set(qwenTtsRealtime);
try {
qwenTtsRealtime.connect();
} catch (NoApiKeyException e) {
throw new RuntimeException(e);
}
QwenTtsRealtimeConfig config = QwenTtsRealtimeConfig.builder()
.voice("Cherry")
.responseFormat(QwenTtsRealtimeAudioFormat.PCM_24000HZ_MONO_16BIT)
.mode("server_commit")
.build();
qwenTtsRealtime.updateSession(config);
for (String text:textToSynthesize) {
qwenTtsRealtime.appendText(text);
Thread.sleep(100);
}
qwenTtsRealtime.finish();
completeLatch.get().await();
qwenTtsRealtime.close();
        // Wait for audio playback to finish, then shut down the player
audioPlayer.waitForComplete();
audioPlayer.shutdown();
System.exit(0);
}
}
commit mode
// Requires DashScope SDK version 2.21.16 or later
import com.alibaba.dashscope.audio.qwen_tts_realtime.*;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.google.gson.JsonObject;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.SourceDataLine;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.AudioSystem;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Base64;
import java.util.Queue;
import java.util.Scanner;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;
public class commit {
    // Realtime PCM audio player class
public static class RealtimePcmPlayer {
private int sampleRate;
private SourceDataLine line;
private AudioFormat audioFormat;
private Thread decoderThread;
private Thread playerThread;
private AtomicBoolean stopped = new AtomicBoolean(false);
private Queue<String> b64AudioBuffer = new ConcurrentLinkedQueue<>();
private Queue<byte[]> RawAudioBuffer = new ConcurrentLinkedQueue<>();
        // Constructor: initialize the audio format and the audio line
public RealtimePcmPlayer(int sampleRate) throws LineUnavailableException {
this.sampleRate = sampleRate;
this.audioFormat = new AudioFormat(this.sampleRate, 16, 1, true, false);
DataLine.Info info = new DataLine.Info(SourceDataLine.class, audioFormat);
line = (SourceDataLine) AudioSystem.getLine(info);
line.open(audioFormat);
line.start();
decoderThread = new Thread(new Runnable() {
@Override
public void run() {
while (!stopped.get()) {
String b64Audio = b64AudioBuffer.poll();
if (b64Audio != null) {
byte[] rawAudio = Base64.getDecoder().decode(b64Audio);
RawAudioBuffer.add(rawAudio);
} else {
try {
Thread.sleep(100);
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
}
}
}
});
playerThread = new Thread(new Runnable() {
@Override
public void run() {
while (!stopped.get()) {
byte[] rawAudio = RawAudioBuffer.poll();
if (rawAudio != null) {
try {
playChunk(rawAudio);
} catch (IOException e) {
throw new RuntimeException(e);
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
} else {
try {
Thread.sleep(100);
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
}
}
}
});
decoderThread.start();
playerThread.start();
}
        // Play one audio chunk and block until it has finished playing
private void playChunk(byte[] chunk) throws IOException, InterruptedException {
if (chunk == null || chunk.length == 0) return;
int bytesWritten = 0;
while (bytesWritten < chunk.length) {
bytesWritten += line.write(chunk, bytesWritten, chunk.length - bytesWritten);
}
int audioLength = chunk.length / (this.sampleRate*2/1000);
            // Wait for the buffered audio to finish playing (avoid a negative sleep for very short chunks)
            Thread.sleep(Math.max(0, audioLength - 10));
}
public void write(String b64Audio) {
b64AudioBuffer.add(b64Audio);
}
public void cancel() {
b64AudioBuffer.clear();
RawAudioBuffer.clear();
}
        public void waitForComplete() throws InterruptedException {
            // Wait until all buffered audio data has been played
            while (!b64AudioBuffer.isEmpty() || !RawAudioBuffer.isEmpty()) {
                Thread.sleep(100);
            }
            // Wait for the audio line to finish draining
            line.drain();
}
public void shutdown() throws InterruptedException {
stopped.set(true);
decoderThread.join();
playerThread.join();
if (line != null && line.isRunning()) {
line.drain();
line.close();
}
}
}
public static void main(String[] args) throws InterruptedException, LineUnavailableException, FileNotFoundException {
Scanner scanner = new Scanner(System.in);
QwenTtsRealtimeParam param = QwenTtsRealtimeParam.builder()
.model("qwen3-tts-flash-realtime")
                // URL for the Singapore region. For models in the Beijing region, replace the url with: wss://dashscope.aliyuncs.com/api-ws/v1/realtime
                .url("wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime")
                // API Keys differ between the Singapore and Beijing regions. Obtain an API Key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
                .apikey(System.getenv("DASHSCOPE_API_KEY"))
.build();
AtomicReference<CountDownLatch> completeLatch = new AtomicReference<>(new CountDownLatch(1));
        // Create the realtime player instance
RealtimePcmPlayer audioPlayer = new RealtimePcmPlayer(24000);
final AtomicReference<QwenTtsRealtime> qwenTtsRef = new AtomicReference<>(null);
QwenTtsRealtime qwenTtsRealtime = new QwenTtsRealtime(param, new QwenTtsRealtimeCallback() {
// File file = new File("result_24k.pcm");
// FileOutputStream fos = new FileOutputStream(file);
@Override
public void onOpen() {
System.out.println("connection opened");
                System.out.println("Type text and press Enter to send; type 'quit' to exit");
}
@Override
public void onEvent(JsonObject message) {
String type = message.get("type").getAsString();
switch(type) {
case "session.created":
System.out.println("start session: " + message.get("session").getAsJsonObject().get("id").getAsString());
break;
case "response.audio.delta":
String recvAudioB64 = message.get("delta").getAsString();
byte[] rawAudio = Base64.getDecoder().decode(recvAudioB64);
                        // fos.write(rawAudio);
                        // Play the audio in real time
                        audioPlayer.write(recvAudioB64);
                        break;
                    case "response.done":
                        System.out.println("response done");
                        // Wait for audio playback to finish
                        try {
                            audioPlayer.waitForComplete();
                        } catch (InterruptedException e) {
                            throw new RuntimeException(e);
                        }
                        // Prepare for the next input
                        completeLatch.get().countDown();
break;
                    case "session.finished":
                        System.out.println("session finished");
                        if (qwenTtsRef.get() != null) {
                            System.out.println("[Metric] response: " + qwenTtsRef.get().getResponseId() +
                                    ", first audio delay: " + qwenTtsRef.get().getFirstAudioDelay() + " ms");
                        }
                        completeLatch.get().countDown();
                        break;
default:
break;
}
}
@Override
public void onClose(int code, String reason) {
System.out.println("connection closed code: " + code + ", reason: " + reason);
try {
// fos.close();
                    // Wait for playback to finish and shut down the player
audioPlayer.waitForComplete();
audioPlayer.shutdown();
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
}
});
qwenTtsRef.set(qwenTtsRealtime);
try {
qwenTtsRealtime.connect();
} catch (NoApiKeyException e) {
throw new RuntimeException(e);
}
QwenTtsRealtimeConfig config = QwenTtsRealtimeConfig.builder()
.voice("Cherry")
.responseFormat(QwenTtsRealtimeAudioFormat.PCM_24000HZ_MONO_16BIT)
.mode("commit")
.build();
qwenTtsRealtime.updateSession(config);
        // Read user input in a loop
        while (true) {
            System.out.print("Enter text to synthesize: ");
            String text = scanner.nextLine();
            // Exit the program when the user types quit
            if ("quit".equalsIgnoreCase(text.trim())) {
                System.out.println("Closing connection...");
                qwenTtsRealtime.finish();
                completeLatch.get().await();
                break;
            }
            // Skip empty input
            if (text.trim().isEmpty()) {
                continue;
            }
            // Re-initialize the countdown latch
            completeLatch.set(new CountDownLatch(1));
            // Send the text
            qwenTtsRealtime.appendText(text);
            qwenTtsRealtime.commit();
            // Wait for this synthesis to complete
            completeLatch.get().await();
        }
        // Clean up resources
        audioPlayer.waitForComplete();
        audioPlayer.shutdown();
        scanner.close();
System.exit(0);
}
}
Using the WebSocket API
Prepare the runtime environment
Install pyaudio for your operating system.
macOS
brew install portaudio && pip install pyaudio
Debian/Ubuntu
sudo apt-get install python3-pyaudio or pip install pyaudio
CentOS
sudo yum install -y portaudio portaudio-devel && pip install pyaudio
Windows
pip install pyaudio
After installation, install the websocket-related dependencies via pip:
pip install websocket-client==1.8.0 websockets
Create the client
Create a local Python file named tts_realtime_client.py and copy the following code into it:
Choose a speech synthesis mode
The Realtime API supports the following two modes:
server_commit mode
The client only sends text; the server decides how to segment it and when to synthesize. Suitable for low-latency scenarios that do not require manual control of synthesis pacing, such as GPS navigation.
commit mode
The client first appends text to a buffer, then explicitly triggers the server to synthesize the buffered text. Suitable for scenarios that need fine-grained control over sentence breaks and pauses, such as news broadcasting.
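The contrast can be sketched with a toy client-side buffer (an illustration of the commit-mode semantics only, not the actual wire protocol): in server_commit mode each appended chunk is immediately eligible for synthesis, while in commit mode nothing is handed to the server until the client commits.

```python
class TextBuffer:
    """Toy model of the commit-mode text buffer."""
    def __init__(self):
        self._pending = []   # text appended but not yet committed
        self.committed = []  # utterances handed to the server

    def append_text(self, chunk: str) -> None:
        # In commit mode, appended text only accumulates locally.
        self._pending.append(chunk)

    def commit(self) -> str:
        # Explicitly hand all buffered text to the server as one utterance,
        # so the client controls where sentence breaks and pauses fall.
        utterance = ''.join(self._pending)
        self._pending.clear()
        self.committed.append(utterance)
        return utterance

buf = TextBuffer()
buf.append_text('This is the first half, ')
buf.append_text('and this is the second half.')
print(buf.commit())  # one utterance whose boundary the client chose
```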
server_commit mode
In the same directory as tts_realtime_client.py, create another Python file named server_commit.py and copy the following code into it. Run server_commit.py to hear the audio generated by the Realtime API in real time.
commit mode
In the same directory as tts_realtime_client.py, create another Python file named commit.py and copy the following code into it. Run commit.py and enter text to synthesize as many times as you like. Press Enter without entering any text and you will hear the audio returned by the Realtime API from your speaker.
Synthesizing speech with a cloned voice
The voice cloning service does not provide preview audio. You must apply the cloned voice in speech synthesis before you can audition and evaluate the result.
The following sample demonstrates how to use a custom voice created through voice cloning, producing output that closely matches the original recording. It is based on the "server commit mode" DashScope SDK sample in "Synthesizing speech with a system voice", with the voice parameter replaced by the cloned custom voice.
Key principle: the model used for voice cloning (target_model) must match the model used for subsequent speech synthesis (model); otherwise synthesis fails. The sample uses the local audio file voice.mp3 for voice cloning; replace it when running the code.
Python
# coding=utf-8
# Installation instructions for pyaudio:
# APPLE Mac OS X
# brew install portaudio
# pip install pyaudio
# Debian/Ubuntu
# sudo apt-get install python-pyaudio python3-pyaudio
# or
# pip install pyaudio
# CentOS
# sudo yum install -y portaudio portaudio-devel && pip install pyaudio
# Microsoft Windows
# python -m pip install pyaudio
import pyaudio
import os
import requests
import base64
import pathlib
import threading
import time
import dashscope  # requires DashScope Python SDK version 1.23.9 or later
from dashscope.audio.qwen_tts_realtime import QwenTtsRealtime, QwenTtsRealtimeCallback, AudioFormat
# ======= Constants =======
DEFAULT_TARGET_MODEL = "qwen3-tts-vc-realtime-2025-11-27"  # voice cloning and speech synthesis must use the same model
DEFAULT_PREFERRED_NAME = "guanyu"
DEFAULT_AUDIO_MIME_TYPE = "audio/mpeg"
VOICE_FILE_PATH = "voice.mp3"  # relative path to the local audio file used for voice cloning
TEXT_TO_SYNTHESIZE = [
'對吧~我就特別喜歡這種超市,',
'尤其是過年的時候',
'去逛超市',
'就會覺得',
'超級超級開心!',
'想買好多好多的東西呢!'
]
def create_voice(file_path: str,
                 target_model: str = DEFAULT_TARGET_MODEL,
                 preferred_name: str = DEFAULT_PREFERRED_NAME,
                 audio_mime_type: str = DEFAULT_AUDIO_MIME_TYPE) -> str:
    """
    Create a custom voice and return the voice parameter.
    """
    # API Keys differ between the Singapore and Beijing regions. Obtain an API Key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    # If you have not configured the environment variable, replace the next line with: api_key = "sk-xxx"
    api_key = os.getenv("DASHSCOPE_API_KEY")
    file_path_obj = pathlib.Path(file_path)
    if not file_path_obj.exists():
        raise FileNotFoundError(f"Audio file does not exist: {file_path}")
    base64_str = base64.b64encode(file_path_obj.read_bytes()).decode()
    data_uri = f"data:{audio_mime_type};base64,{base64_str}"
    # URL for the Singapore region. For models in the Beijing region, replace the url with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
    url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"
    payload = {
        "model": "qwen-voice-enrollment",  # do not change this value
        "input": {
            "action": "create",
            "target_model": target_model,
            "preferred_name": preferred_name,
            "audio": {"data": data_uri}
        }
    }
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }
    resp = requests.post(url, json=payload, headers=headers)
    if resp.status_code != 200:
        raise RuntimeError(f"Failed to create voice: {resp.status_code}, {resp.text}")
    try:
        return resp.json()["output"]["voice"]
    except (KeyError, ValueError) as e:
        raise RuntimeError(f"Failed to parse voice response: {e}")
def init_dashscope_api_key():
    """
    Initialize the API key for the dashscope SDK.
    """
    # API Keys differ between the Singapore and Beijing regions. Obtain an API Key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    # If you have not configured the environment variable, replace the next line with: dashscope.api_key = "sk-xxx"
    dashscope.api_key = os.getenv("DASHSCOPE_API_KEY")
# ======= Callback class =======
class MyCallback(QwenTtsRealtimeCallback):
    """
    Custom streaming TTS callback.
    """
    def __init__(self):
        self.complete_event = threading.Event()
        self._player = pyaudio.PyAudio()
        self._stream = self._player.open(
            format=pyaudio.paInt16, channels=1, rate=24000, output=True
        )
    def on_open(self) -> None:
        print('[TTS] connection established')
    def on_close(self, close_status_code, close_msg) -> None:
        self._stream.stop_stream()
        self._stream.close()
        self._player.terminate()
        print(f'[TTS] connection closed code={close_status_code}, msg={close_msg}')
    def on_event(self, response: dict) -> None:
        try:
            event_type = response.get('type', '')
            if event_type == 'session.created':
                print(f'[TTS] session started: {response["session"]["id"]}')
            elif event_type == 'response.audio.delta':
                audio_data = base64.b64decode(response['delta'])
                self._stream.write(audio_data)
            elif event_type == 'response.done':
                print(f'[TTS] response done, Response ID: {qwen_tts_realtime.get_last_response_id()}')
            elif event_type == 'session.finished':
                print('[TTS] session finished')
                self.complete_event.set()
        except Exception as e:
            print(f'[Error] exception while handling callback event: {e}')
    def wait_for_finished(self):
        self.complete_event.wait()
# ======= Main =======
if __name__ == '__main__':
    init_dashscope_api_key()
    print('[System] Initializing Qwen TTS Realtime ...')
    callback = MyCallback()
    qwen_tts_realtime = QwenTtsRealtime(
        model=DEFAULT_TARGET_MODEL,
        callback=callback,
        # URL for the Singapore region. For models in the Beijing region, replace the url with: wss://dashscope.aliyuncs.com/api-ws/v1/realtime
        url='wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime'
    )
    qwen_tts_realtime.connect()
    qwen_tts_realtime.update_session(
        voice=create_voice(VOICE_FILE_PATH),  # use the cloned custom voice as the voice parameter
        response_format=AudioFormat.PCM_24000HZ_MONO_16BIT,
        mode='server_commit'
    )
    for text_chunk in TEXT_TO_SYNTHESIZE:
        print(f'[sending text]: {text_chunk}')
        qwen_tts_realtime.append_text(text_chunk)
        time.sleep(0.1)
    qwen_tts_realtime.finish()
    callback.wait_for_finished()
    print(f'[Metric] session_id={qwen_tts_realtime.get_session_id()}, '
          f'first_audio_delay={qwen_tts_realtime.get_first_audio_delay()}s')
Java
You need to add the Gson dependency. With Maven or Gradle, add it as follows:
Maven
Add the following to pom.xml:
<!-- https://mvnrepository.com/artifact/com.google.code.gson/gson -->
<dependency>
    <groupId>com.google.code.gson</groupId>
    <artifactId>gson</artifactId>
    <version>2.13.1</version>
</dependency>
Gradle
Add the following to build.gradle:
// https://mvnrepository.com/artifact/com.google.code.gson/gson
implementation("com.google.code.gson:gson:2.13.1")
import com.alibaba.dashscope.audio.qwen_tts_realtime.*;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.google.gson.Gson;
import com.google.gson.JsonObject;
import javax.sound.sampled.*;
import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.*;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Queue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;
public class Main {
    // ===== Constants =====
    // Voice cloning and speech synthesis must use the same model
    private static final String TARGET_MODEL = "qwen3-tts-vc-realtime-2025-11-27";
    private static final String PREFERRED_NAME = "guanyu";
    // Relative path to the local audio file used for voice cloning
    private static final String AUDIO_FILE = "voice.mp3";
    private static final String AUDIO_MIME_TYPE = "audio/mpeg";
private static String[] textToSynthesize = {
"對吧~我就特別喜歡這種超市",
"尤其是過年的時候",
"去逛超市",
"就會覺得",
"超級超級開心!",
"想買好多好多的東西呢!"
};
    // Produce a data URI
public static String toDataUrl(String filePath) throws IOException {
byte[] bytes = Files.readAllBytes(Paths.get(filePath));
String encoded = Base64.getEncoder().encodeToString(bytes);
return "data:" + AUDIO_MIME_TYPE + ";base64," + encoded;
}
    // Call the API to create a voice
    public static String createVoice() throws Exception {
        // API Keys differ between the Singapore and Beijing regions. Obtain an API Key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
        // If you have not configured the environment variable, replace the next line with: String apiKey = "sk-xxx"
        String apiKey = System.getenv("DASHSCOPE_API_KEY");
String jsonPayload =
"{"
                + "\"model\": \"qwen-voice-enrollment\"," // do not change this value
+ "\"input\": {"
+ "\"action\": \"create\","
+ "\"target_model\": \"" + TARGET_MODEL + "\","
+ "\"preferred_name\": \"" + PREFERRED_NAME + "\","
+ "\"audio\": {"
+ "\"data\": \"" + toDataUrl(AUDIO_FILE) + "\""
+ "}"
+ "}"
+ "}";
        // URL for the Singapore region, matching the WebSocket URL used below. For the Beijing region, replace with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
        HttpURLConnection con = (HttpURLConnection) new URL("https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization").openConnection();
con.setRequestMethod("POST");
con.setRequestProperty("Authorization", "Bearer " + apiKey);
con.setRequestProperty("Content-Type", "application/json");
con.setDoOutput(true);
try (OutputStream os = con.getOutputStream()) {
os.write(jsonPayload.getBytes(StandardCharsets.UTF_8));
}
int status = con.getResponseCode();
        System.out.println("HTTP status code: " + status);
try (BufferedReader br = new BufferedReader(
new InputStreamReader(status >= 200 && status < 300 ? con.getInputStream() : con.getErrorStream(),
StandardCharsets.UTF_8))) {
StringBuilder response = new StringBuilder();
String line;
while ((line = br.readLine()) != null) {
response.append(line);
}
            System.out.println("Response body: " + response);
if (status == 200) {
JsonObject jsonObj = new Gson().fromJson(response.toString(), JsonObject.class);
return jsonObj.getAsJsonObject("output").get("voice").getAsString();
}
            throw new IOException("Failed to create voice: " + status + " - " + response);
}
}
    // Realtime PCM audio player class
public static class RealtimePcmPlayer {
private int sampleRate;
private SourceDataLine line;
private AudioFormat audioFormat;
private Thread decoderThread;
private Thread playerThread;
private AtomicBoolean stopped = new AtomicBoolean(false);
private Queue<String> b64AudioBuffer = new ConcurrentLinkedQueue<>();
private Queue<byte[]> RawAudioBuffer = new ConcurrentLinkedQueue<>();
        // Constructor: initialize the audio format and the audio line
public RealtimePcmPlayer(int sampleRate) throws LineUnavailableException {
this.sampleRate = sampleRate;
this.audioFormat = new AudioFormat(this.sampleRate, 16, 1, true, false);
DataLine.Info info = new DataLine.Info(SourceDataLine.class, audioFormat);
line = (SourceDataLine) AudioSystem.getLine(info);
line.open(audioFormat);
line.start();
decoderThread = new Thread(new Runnable() {
@Override
public void run() {
while (!stopped.get()) {
String b64Audio = b64AudioBuffer.poll();
if (b64Audio != null) {
byte[] rawAudio = Base64.getDecoder().decode(b64Audio);
RawAudioBuffer.add(rawAudio);
} else {
try {
Thread.sleep(100);
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
}
}
}
});
playerThread = new Thread(new Runnable() {
@Override
public void run() {
while (!stopped.get()) {
byte[] rawAudio = RawAudioBuffer.poll();
if (rawAudio != null) {
try {
playChunk(rawAudio);
} catch (IOException e) {
throw new RuntimeException(e);
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
} else {
try {
Thread.sleep(100);
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
}
}
}
});
decoderThread.start();
playerThread.start();
}
        // Play one audio chunk and block until it has finished playing
private void playChunk(byte[] chunk) throws IOException, InterruptedException {
if (chunk == null || chunk.length == 0) return;
int bytesWritten = 0;
while (bytesWritten < chunk.length) {
bytesWritten += line.write(chunk, bytesWritten, chunk.length - bytesWritten);
}
            int audioLength = chunk.length / (this.sampleRate * 2 / 1000);
            // Wait for the buffered audio to finish playing (avoid a negative sleep for very short chunks)
            Thread.sleep(Math.max(0, audioLength - 10));
}
public void write(String b64Audio) {
b64AudioBuffer.add(b64Audio);
}
public void cancel() {
b64AudioBuffer.clear();
RawAudioBuffer.clear();
}
public void waitForComplete() throws InterruptedException {
while (!b64AudioBuffer.isEmpty() || !RawAudioBuffer.isEmpty()) {
Thread.sleep(100);
}
line.drain();
}
public void shutdown() throws InterruptedException {
stopped.set(true);
decoderThread.join();
playerThread.join();
if (line != null && line.isRunning()) {
line.drain();
line.close();
}
}
}
public static void main(String[] args) throws Exception {
QwenTtsRealtimeParam param = QwenTtsRealtimeParam.builder()
.model(TARGET_MODEL)
                // URL for the Singapore region. For models in the Beijing region, replace the url with: wss://dashscope.aliyuncs.com/api-ws/v1/realtime
                .url("wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime")
                // API Keys differ between the Singapore and Beijing regions. Obtain an API Key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
                // If you have not configured the environment variable, replace the next line with: .apikey("sk-xxx")
                .apikey(System.getenv("DASHSCOPE_API_KEY"))
.build();
AtomicReference<CountDownLatch> completeLatch = new AtomicReference<>(new CountDownLatch(1));
final AtomicReference<QwenTtsRealtime> qwenTtsRef = new AtomicReference<>(null);
        // Create the realtime audio player instance
RealtimePcmPlayer audioPlayer = new RealtimePcmPlayer(24000);
QwenTtsRealtime qwenTtsRealtime = new QwenTtsRealtime(param, new QwenTtsRealtimeCallback() {
@Override
public void onOpen() {
// Handle connection opened
}
@Override
public void onEvent(JsonObject message) {
String type = message.get("type").getAsString();
switch(type) {
case "session.created":
// Handle session created
break;
case "response.audio.delta":
String recvAudioB64 = message.get("delta").getAsString();
// Play audio in real time
audioPlayer.write(recvAudioB64);
break;
case "response.done":
// Handle response completed
break;
case "session.finished":
// Handle session finished
completeLatch.get().countDown();
break;
default:
break;
}
}
@Override
public void onClose(int code, String reason) {
// Handle connection closed
}
});
qwenTtsRef.set(qwenTtsRealtime);
try {
qwenTtsRealtime.connect();
} catch (NoApiKeyException e) {
throw new RuntimeException(e);
}
QwenTtsRealtimeConfig config = QwenTtsRealtimeConfig.builder()
.voice(createVoice()) // Replace the voice parameter with your cloned custom voice
.responseFormat(QwenTtsRealtimeAudioFormat.PCM_24000HZ_MONO_16BIT)
.mode("server_commit")
.build();
qwenTtsRealtime.updateSession(config);
for (String text:textToSynthesize) {
qwenTtsRealtime.appendText(text);
Thread.sleep(100);
}
qwenTtsRealtime.finish();
completeLatch.get().await();
// Wait for audio playback to finish, then shut down the player
audioPlayer.waitForComplete();
audioPlayer.shutdown();
System.exit(0);
}
}
Use a Voice Design voice for speech synthesis
When you use the Voice Design feature, the service returns preview audio data. Listen to this preview first and confirm it meets your expectations before using the voice for synthesis, to reduce call costs.
Generate a custom voice and preview it. If you are satisfied with the result, proceed to the next step; otherwise, regenerate the voice.
Python
import requests
import base64
import os

def create_voice_and_play():
    # API Keys differ between the Singapore and Beijing regions. Get an API Key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    # If the environment variable is not configured, replace the line below with your Model Studio API Key: api_key = "sk-xxx"
    api_key = os.getenv("DASHSCOPE_API_KEY")
    if not api_key:
        print("Error: DASHSCOPE_API_KEY environment variable not found. Set your API Key first.")
        return None, None, None
    # Prepare the request data
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }
    data = {
        "model": "qwen-voice-design",
        "input": {
            "action": "create",
            "target_model": "qwen3-tts-vd-realtime-2025-12-16",
            "voice_prompt": "A composed middle-aged male announcer with a deep, rich and magnetic voice, a steady speaking speed and clear articulation, is suitable for news broadcasting or documentary commentary.",
            "preview_text": "Dear listeners, hello everyone. Welcome to the evening news.",
            "preferred_name": "announcer",
            "language": "en"
        },
        "parameters": {
            "sample_rate": 24000,
            "response_format": "wav"
        }
    }
    # The URL below is for the Singapore region. For models in the Beijing region, replace it with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
    url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"
    try:
        # Send the request
        response = requests.post(
            url,
            headers=headers,
            json=data,
            timeout=60  # add a timeout
        )
        if response.status_code == 200:
            result = response.json()
            # Get the voice name
            voice_name = result["output"]["voice"]
            print(f"Voice name: {voice_name}")
            # Get the preview audio data
            base64_audio = result["output"]["preview_audio"]["data"]
            # Decode the Base64 audio data
            audio_bytes = base64.b64decode(base64_audio)
            # Save the audio to a local file
            filename = f"{voice_name}_preview.wav"
            with open(filename, 'wb') as f:
                f.write(audio_bytes)
            print(f"Audio saved to local file: {filename}")
            print(f"File path: {os.path.abspath(filename)}")
            return voice_name, audio_bytes, filename
        else:
            print(f"Request failed, status code: {response.status_code}")
            print(f"Response body: {response.text}")
            return None, None, None
    except requests.exceptions.RequestException as e:
        print(f"Network request error: {e}")
        return None, None, None
    except KeyError as e:
        print(f"Malformed response data, missing required field: {e}")
        print(f"Response body: {response.text if 'response' in locals() else 'No response'}")
        return None, None, None
    except Exception as e:
        print(f"Unknown error: {e}")
        return None, None, None

if __name__ == "__main__":
    print("Creating voice...")
    voice_name, audio_data, saved_filename = create_voice_and_play()
    if voice_name:
        print(f"\nSuccessfully created voice '{voice_name}'")
        print(f"Audio file saved: '{saved_filename}'")
        print(f"File size: {os.path.getsize(saved_filename)} bytes")
    else:
        print("\nVoice creation failed")
Java
You need to import the Gson dependency. With Maven or Gradle, add it as follows:
Maven
Add the following to pom.xml:
<!-- https://mvnrepository.com/artifact/com.google.code.gson/gson -->
<dependency>
    <groupId>com.google.code.gson</groupId>
    <artifactId>gson</artifactId>
    <version>2.13.1</version>
</dependency>
Gradle
Add the following to build.gradle:
// https://mvnrepository.com/artifact/com.google.code.gson/gson
implementation("com.google.code.gson:gson:2.13.1")
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Base64;

public class Main {
    public static void main(String[] args) {
        Main example = new Main();
        example.createVoice();
    }

    public void createVoice() {
        // API Keys differ between the Singapore and Beijing regions. Get an API Key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
        // If the environment variable is not configured, replace the line below with your Model Studio API Key: String apiKey = "sk-xxx"
        String apiKey = System.getenv("DASHSCOPE_API_KEY");
        // Build the JSON request body string
        String jsonBody = "{\n" +
                "  \"model\": \"qwen-voice-design\",\n" +
                "  \"input\": {\n" +
                "    \"action\": \"create\",\n" +
                "    \"target_model\": \"qwen3-tts-vd-realtime-2025-12-16\",\n" +
                "    \"voice_prompt\": \"A composed middle-aged male announcer with a deep, rich and magnetic voice, a steady speaking speed and clear articulation, is suitable for news broadcasting or documentary commentary.\",\n" +
                "    \"preview_text\": \"Dear listeners, hello everyone. Welcome to the evening news.\",\n" +
                "    \"preferred_name\": \"announcer\",\n" +
                "    \"language\": \"en\"\n" +
                "  },\n" +
                "  \"parameters\": {\n" +
                "    \"sample_rate\": 24000,\n" +
                "    \"response_format\": \"wav\"\n" +
                "  }\n" +
                "}";
        HttpURLConnection connection = null;
        try {
            // The URL below is for the Singapore region. For models in the Beijing region, replace it with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
            URL url = new URL("https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization");
            connection = (HttpURLConnection) url.openConnection();
            // Set the request method and headers
            connection.setRequestMethod("POST");
            connection.setRequestProperty("Authorization", "Bearer " + apiKey);
            connection.setRequestProperty("Content-Type", "application/json");
            connection.setDoOutput(true);
            connection.setDoInput(true);
            // Send the request body
            try (OutputStream os = connection.getOutputStream()) {
                byte[] input = jsonBody.getBytes("UTF-8");
                os.write(input, 0, input.length);
                os.flush();
            }
            // Get the response
            int responseCode = connection.getResponseCode();
            if (responseCode == HttpURLConnection.HTTP_OK) {
                // Read the response body
                StringBuilder response = new StringBuilder();
                try (BufferedReader br = new BufferedReader(
                        new InputStreamReader(connection.getInputStream(), "UTF-8"))) {
                    String responseLine;
                    while ((responseLine = br.readLine()) != null) {
                        response.append(responseLine.trim());
                    }
                }
                // Parse the JSON response
                JsonObject jsonResponse = JsonParser.parseString(response.toString()).getAsJsonObject();
                JsonObject outputObj = jsonResponse.getAsJsonObject("output");
                JsonObject previewAudioObj = outputObj.getAsJsonObject("preview_audio");
                // Get the voice name
                String voiceName = outputObj.get("voice").getAsString();
                System.out.println("Voice name: " + voiceName);
                // Get the Base64-encoded audio data
                String base64Audio = previewAudioObj.get("data").getAsString();
                // Decode the Base64 audio data
                byte[] audioBytes = Base64.getDecoder().decode(base64Audio);
                // Save the audio to a local file
                String filename = voiceName + "_preview.wav";
                saveAudioToFile(audioBytes, filename);
                System.out.println("Audio saved to local file: " + filename);
            } else {
                // Read the error response
                StringBuilder errorResponse = new StringBuilder();
                try (BufferedReader br = new BufferedReader(
                        new InputStreamReader(connection.getErrorStream(), "UTF-8"))) {
                    String responseLine;
                    while ((responseLine = br.readLine()) != null) {
                        errorResponse.append(responseLine.trim());
                    }
                }
                System.out.println("Request failed, status code: " + responseCode);
                System.out.println("Error response: " + errorResponse.toString());
            }
        } catch (Exception e) {
            System.err.println("Request error: " + e.getMessage());
            e.printStackTrace();
        } finally {
            if (connection != null) {
                connection.disconnect();
            }
        }
    }

    private void saveAudioToFile(byte[] audioBytes, String filename) {
        try {
            File file = new File(filename);
            try (FileOutputStream fos = new FileOutputStream(file)) {
                fos.write(audioBytes);
            }
            System.out.println("Audio saved to: " + file.getAbsolutePath());
        } catch (IOException e) {
            System.err.println("Error saving audio file: " + e.getMessage());
            e.printStackTrace();
        }
    }
}
Use the custom voice generated in the previous step for speech synthesis.
This reuses the "server commit mode" sample code from Use a system voice for speech synthesis (DashScope SDK), replacing the voice parameter with the custom voice produced by Voice Design.
Key principle: the model used for Voice Design (target_model) must be the same as the model used for the subsequent speech synthesis (model); otherwise synthesis fails.
Python
# coding=utf-8
# Installation instructions for pyaudio:
# APPLE Mac OS X
#   brew install portaudio
#   pip install pyaudio
# Debian/Ubuntu
#   sudo apt-get install python-pyaudio python3-pyaudio
#   or
#   pip install pyaudio
# CentOS
#   sudo yum install -y portaudio portaudio-devel && pip install pyaudio
# Microsoft Windows
#   python -m pip install pyaudio

import pyaudio
import os
import base64
import threading
import time
import dashscope
# DashScope Python SDK version must be 1.23.9 or later
from dashscope.audio.qwen_tts_realtime import QwenTtsRealtime, QwenTtsRealtimeCallback, AudioFormat

# ======= Constants =======
TEXT_TO_SYNTHESIZE = [
    '對吧~我就特別喜歡這種超市,',
    '尤其是過年的時候',
    '去逛超市',
    '就會覺得',
    '超級超級開心!',
    '想買好多好多的東西呢!'
]

def init_dashscope_api_key():
    """
    Initialize the API key for the dashscope SDK
    """
    # API Keys differ between the Singapore and Beijing regions. Get an API Key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    # If the environment variable is not configured, replace the line below with your Model Studio API Key: dashscope.api_key = "sk-xxx"
    dashscope.api_key = os.getenv("DASHSCOPE_API_KEY")

# ======= Callback class =======
class MyCallback(QwenTtsRealtimeCallback):
    """
    Custom streaming TTS callback
    """
    def __init__(self):
        self.complete_event = threading.Event()
        self._player = pyaudio.PyAudio()
        self._stream = self._player.open(
            format=pyaudio.paInt16,
            channels=1,
            rate=24000,
            output=True
        )

    def on_open(self) -> None:
        print('[TTS] Connection established')

    def on_close(self, close_status_code, close_msg) -> None:
        self._stream.stop_stream()
        self._stream.close()
        self._player.terminate()
        print(f'[TTS] Connection closed code={close_status_code}, msg={close_msg}')

    def on_event(self, response: dict) -> None:
        try:
            event_type = response.get('type', '')
            if event_type == 'session.created':
                print(f'[TTS] Session started: {response["session"]["id"]}')
            elif event_type == 'response.audio.delta':
                audio_data = base64.b64decode(response['delta'])
                self._stream.write(audio_data)
            elif event_type == 'response.done':
                print(f'[TTS] Response complete, Response ID: {qwen_tts_realtime.get_last_response_id()}')
            elif event_type == 'session.finished':
                print('[TTS] Session finished')
                self.complete_event.set()
        except Exception as e:
            print(f'[Error] Exception while handling callback event: {e}')

    def wait_for_finished(self):
        self.complete_event.wait()

# ======= Main =======
if __name__ == '__main__':
    init_dashscope_api_key()
    print('[System] Initializing Qwen TTS Realtime ...')
    callback = MyCallback()
    qwen_tts_realtime = QwenTtsRealtime(
        # Voice Design and speech synthesis must use the same model
        model="qwen3-tts-vd-realtime-2025-12-16",
        callback=callback,
        # The URL below is for the Singapore region. For models in the Beijing region, replace it with: wss://dashscope.aliyuncs.com/api-ws/v1/realtime
        url='wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime'
    )
    qwen_tts_realtime.connect()
    qwen_tts_realtime.update_session(
        voice="myvoice",  # replace the voice parameter with the custom voice produced by Voice Design
        response_format=AudioFormat.PCM_24000HZ_MONO_16BIT,
        mode='server_commit'
    )
    for text_chunk in TEXT_TO_SYNTHESIZE:
        print(f'[Sending text]: {text_chunk}')
        qwen_tts_realtime.append_text(text_chunk)
        time.sleep(0.1)
    qwen_tts_realtime.finish()
    callback.wait_for_finished()
    print(f'[Metric] session_id={qwen_tts_realtime.get_session_id()}, '
          f'first_audio_delay={qwen_tts_realtime.get_first_audio_delay()}s')
Java
import com.alibaba.dashscope.audio.qwen_tts_realtime.*;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.google.gson.JsonObject;
import javax.sound.sampled.*;
import java.io.*;
import java.util.Base64;
import java.util.Queue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;

public class Main {
    // ===== Constants =====
    private static String[] textToSynthesize = {
        "對吧~我就特別喜歡這種超市",
        "尤其是過年的時候",
        "去逛超市",
        "就會覺得",
        "超級超級開心!",
        "想買好多好多的東西呢!"
    };

    // Real-time audio player class
    public static class RealtimePcmPlayer {
        private int sampleRate;
        private SourceDataLine line;
        private AudioFormat audioFormat;
        private Thread decoderThread;
        private Thread playerThread;
        private AtomicBoolean stopped = new AtomicBoolean(false);
        private Queue<String> b64AudioBuffer = new ConcurrentLinkedQueue<>();
        private Queue<byte[]> RawAudioBuffer = new ConcurrentLinkedQueue<>();

        // Constructor: initialize the audio format and audio line
        public RealtimePcmPlayer(int sampleRate) throws LineUnavailableException {
            this.sampleRate = sampleRate;
            this.audioFormat = new AudioFormat(this.sampleRate, 16, 1, true, false);
            DataLine.Info info = new DataLine.Info(SourceDataLine.class, audioFormat);
            line = (SourceDataLine) AudioSystem.getLine(info);
            line.open(audioFormat);
            line.start();
            decoderThread = new Thread(new Runnable() {
                @Override
                public void run() {
                    while (!stopped.get()) {
                        String b64Audio = b64AudioBuffer.poll();
                        if (b64Audio != null) {
                            byte[] rawAudio = Base64.getDecoder().decode(b64Audio);
                            RawAudioBuffer.add(rawAudio);
                        } else {
                            try {
                                Thread.sleep(100);
                            } catch (InterruptedException e) {
                                throw new RuntimeException(e);
                            }
                        }
                    }
                }
            });
            playerThread = new Thread(new Runnable() {
                @Override
                public void run() {
                    while (!stopped.get()) {
                        byte[] rawAudio = RawAudioBuffer.poll();
                        if (rawAudio != null) {
                            try {
                                playChunk(rawAudio);
                            } catch (IOException e) {
                                throw new RuntimeException(e);
                            } catch (InterruptedException e) {
                                throw new RuntimeException(e);
                            }
                        } else {
                            try {
                                Thread.sleep(100);
                            } catch (InterruptedException e) {
                                throw new RuntimeException(e);
                            }
                        }
                    }
                }
            });
            decoderThread.start();
            playerThread.start();
        }

        // Play one audio chunk and block until playback finishes
        private void playChunk(byte[] chunk) throws IOException, InterruptedException {
            if (chunk == null || chunk.length == 0) return;
            int bytesWritten = 0;
            while (bytesWritten < chunk.length) {
                bytesWritten += line.write(chunk, bytesWritten, chunk.length - bytesWritten);
            }
            int audioLength = chunk.length / (this.sampleRate * 2 / 1000);
            // Wait for the buffered audio to finish playing (guard against a negative sleep)
            Thread.sleep(Math.max(0, audioLength - 10));
        }

        public void write(String b64Audio) {
            b64AudioBuffer.add(b64Audio);
        }

        public void cancel() {
            b64AudioBuffer.clear();
            RawAudioBuffer.clear();
        }

        public void waitForComplete() throws InterruptedException {
            while (!b64AudioBuffer.isEmpty() || !RawAudioBuffer.isEmpty()) {
                Thread.sleep(100);
            }
            line.drain();
        }

        public void shutdown() throws InterruptedException {
            stopped.set(true);
            decoderThread.join();
            playerThread.join();
            if (line != null && line.isRunning()) {
                line.drain();
                line.close();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        QwenTtsRealtimeParam param = QwenTtsRealtimeParam.builder()
                // Voice Design and speech synthesis must use the same model
                .model("qwen3-tts-vd-realtime-2025-12-16")
                // The URL below is for the Singapore region. For models in the Beijing region, replace it with: wss://dashscope.aliyuncs.com/api-ws/v1/realtime
                .url("wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime")
                // API Keys differ between the Singapore and Beijing regions. Get an API Key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
                // If the environment variable is not configured, replace the line below with your Model Studio API Key: .apikey("sk-xxx")
                .apikey(System.getenv("DASHSCOPE_API_KEY"))
                .build();
        AtomicReference<CountDownLatch> completeLatch = new AtomicReference<>(new CountDownLatch(1));
        final AtomicReference<QwenTtsRealtime> qwenTtsRef = new AtomicReference<>(null);
        // Create the real-time audio player instance
        RealtimePcmPlayer audioPlayer = new RealtimePcmPlayer(24000);
        QwenTtsRealtime qwenTtsRealtime = new QwenTtsRealtime(param, new QwenTtsRealtimeCallback() {
            @Override
            public void onOpen() {
                // Handle connection opened
            }
            @Override
            public void onEvent(JsonObject message) {
                String type = message.get("type").getAsString();
                switch (type) {
                    case "session.created":
                        // Handle session created
                        break;
                    case "response.audio.delta":
                        String recvAudioB64 = message.get("delta").getAsString();
                        // Play audio in real time
                        audioPlayer.write(recvAudioB64);
                        break;
                    case "response.done":
                        // Handle response completed
                        break;
                    case "session.finished":
                        // Handle session finished
                        completeLatch.get().countDown();
                        break;
                    default:
                        break;
                }
            }
            @Override
            public void onClose(int code, String reason) {
                // Handle connection closed
            }
        });
        qwenTtsRef.set(qwenTtsRealtime);
        try {
            qwenTtsRealtime.connect();
        } catch (NoApiKeyException e) {
            throw new RuntimeException(e);
        }
        QwenTtsRealtimeConfig config = QwenTtsRealtimeConfig.builder()
                .voice("myvoice") // replace the voice parameter with the custom voice produced by Voice Design
                .responseFormat(QwenTtsRealtimeAudioFormat.PCM_24000HZ_MONO_16BIT)
                .mode("server_commit")
                .build();
        qwenTtsRealtime.updateSession(config);
        for (String text : textToSynthesize) {
            qwenTtsRealtime.appendText(text);
            Thread.sleep(100);
        }
        qwenTtsRealtime.finish();
        completeLatch.get().await();
        // Wait for audio playback to finish, then shut down the player
        audioPlayer.waitForComplete();
        audioPlayer.shutdown();
        System.exit(0);
    }
}
For more sample code, see GitHub.
Interaction flow
server_commit mode
Set session.mode in the session.update event to "server_commit" to enable this mode. The server then automatically decides text segmentation and synthesis timing.
The interaction flow is as follows:
1. The client sends a session.update event; the server responds with session.created and session.updated events.
2. The client sends input_text_buffer.append events to append text to the server-side buffer.
3. The server decides text segmentation and synthesis timing, returning response.created, response.output_item.added, response.content_part.added, and response.audio.delta events.
4. After the response completes, the server returns response.audio.done, response.content_part.done, response.output_item.done, and response.done.
5. The server sends session.finished to end the session.
Lifecycle | Client events | Server events |
Session initialization | session.update Configure the session | session.created Session created session.updated Session configuration updated |
User text input | input_text_buffer.append Append text to the server input_text_buffer.commit Immediately synthesize the text buffered on the server session.finish Notify the server that there is no more text input | input_text_buffer.committed The server received the committed text |
Server audio output | None | response.created The server starts generating a response response.output_item.added A new output item was added to the response response.content_part.added A new content part was added to the assistant message response.audio.delta Audio generated incrementally by the model response.content_part.done Streaming of the assistant message's text or audio content is complete response.output_item.done Streaming of the assistant message's entire output item is complete response.audio.done Audio generation complete response.done Response complete |
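The client-side half of the server_commit flow can be sketched as JSON event frames. The event names come from the flow above; the exact payload field layout shown here (and the voice name "Cherry") is an assumption for illustration, so consult the API reference for the authoritative schema.

```python
import json

# Hypothetical payload shapes for the client events named above.
session_update = {
    "type": "session.update",
    "session": {
        "mode": "server_commit",   # enable server_commit mode
        "voice": "Cherry",         # placeholder voice name
        "response_format": "pcm",
    },
}
append_event = {
    "type": "input_text_buffer.append",
    "text": "Dear listeners, welcome to the evening news.",
}
finish_event = {"type": "session.finish"}  # no more text will follow

# Events are sent as JSON text frames over the WebSocket connection.
for event in (session_update, append_event, finish_event):
    frame = json.dumps(event)
    assert json.loads(frame)["type"] == event["type"]
```

In this mode the client never sends input_text_buffer.commit; the server decides when to start synthesizing the buffered text.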
commit mode
Set session.mode in the session.update event to "commit" to enable this mode. The client must explicitly commit the text buffer to the server to obtain a response.
The interaction flow is as follows:
1. The client sends a session.update event; the server responds with session.created and session.updated events.
2. The client sends input_text_buffer.append events to append text to the server-side buffer.
3. The client sends an input_text_buffer.commit event to commit the buffer to the server, then sends a session.finish event to indicate that no more text will follow.
4. The server responds with response.created and starts generating a response.
5. The server returns response.output_item.added, response.content_part.added, and response.audio.delta events.
6. After the response completes, the server returns response.audio.done, response.content_part.done, response.output_item.done, and response.done.
7. The server sends session.finished to end the session.
Lifecycle | Client events | Server events |
Session initialization | session.update Configure the session | session.created Session created session.updated Session configuration updated |
User text input | input_text_buffer.append Append text to the buffer input_text_buffer.commit Commit the buffer to the server input_text_buffer.clear Clear the buffer | input_text_buffer.committed The server received the committed text |
Server audio output | None | response.created The server starts generating a response response.output_item.added A new output item was added to the response response.content_part.added A new content part was added to the assistant message response.audio.delta Audio generated incrementally by the model response.content_part.done Streaming of the assistant message's text or audio content is complete response.output_item.done Streaming of the assistant message's entire output item is complete response.audio.done Audio generation complete response.done Response complete |
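The client-driven commit sequence can likewise be sketched as an ordered list of JSON event frames. Only the event names come from the flow above; the payload fields are assumed for illustration.

```python
import json

# commit mode: the client decides when buffered text is synthesized by
# sending input_text_buffer.commit explicitly (payload shapes assumed).
events = [
    {"type": "session.update", "session": {"mode": "commit"}},
    {"type": "input_text_buffer.append", "text": "First sentence."},
    {"type": "input_text_buffer.append", "text": "Second sentence."},
    {"type": "input_text_buffer.commit"},  # synthesize everything buffered so far
    {"type": "session.finish"},            # no more text will follow
]
# Each event is serialized and sent as one WebSocket text frame, in order.
frames = [json.dumps(e) for e in events]
```

The key difference from server_commit mode is step 4 of the list above: without the explicit commit frame, the server does not start generating a response.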
API reference
Model feature comparison
Feature | qwen3-tts-vd-realtime-2025-12-16 | qwen3-tts-vc-realtime-2025-11-27 | qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27, qwen3-tts-flash-realtime-2025-09-18 | qwen-tts-realtime, qwen-tts-realtime-latest, qwen-tts-realtime-2025-07-15 |
Supported languages | Chinese, English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese | Chinese (Mandarin plus Beijing, Shanghai, Sichuan, Nanjing, Shaanxi, Minnan and Tianjin dialects and Cantonese, varying by voice), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese | Chinese, English | |
Audio formats | pcm, wav, mp3, opus | pcm | | |
Audio sample rates | 8kHz, 16kHz, 24kHz, 48kHz | 24kHz | | |
Voice cloning | | | | |
Voice design | | | | |
SSML | | | | |
LaTeX | | | | |
Volume control | | | | |
Speech-rate control | | | | |
Intonation (pitch) control | | | | |
Bitrate control | | | | |
Timestamps | | | | |
Emotion setting | | | | |
Streaming input | | | | |
Streaming output | | | | |
Rate limits | Calls per minute (RPM): 180 | qwen3-tts-flash-realtime-2025-11-27: 180 calls per minute (RPM) qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-09-18: 10 calls per minute (RPM) | Calls per minute (RPM): 10 Tokens per minute (TPM): 100,000 | |
Access methods | Java/Python SDK, WebSocket API | | | |
Price | International (Singapore): $0.143353 per 10,000 characters Mainland China (Beijing): $0.143353 per 10,000 characters | International (Singapore): $0.13 per 10,000 characters Mainland China (Beijing): $0.143353 per 10,000 characters | Mainland China (Beijing): | |
Supported voices
Qwen3-TTS-Flash-Realtime
Supported voices differ between models. Set the request parameter voice to the corresponding value in the voice column of the voice list.
Qwen-TTS-Realtime
All models use the same voices. Set the request parameter voice to the corresponding value in the voice column of the voice list.
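As a minimal sketch of the rule above, the voice request parameter is just a string taken from the voice list's voice column. "Cherry" and the helper below are placeholders for illustration, not values confirmed by this page.

```python
# Minimal sketch: build the session configuration with a system voice.
# "Cherry" is a placeholder; substitute a value from the voice column of
# the voice list for your chosen model.
def build_session_config(voice_name: str) -> dict:
    return {
        "voice": voice_name,        # value from the voice list's voice column
        "response_format": "pcm",
        "mode": "server_commit",
    }

config = build_session_config("Cherry")
```

The same string would be passed to update_session(voice=...) in the SDK samples above.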