Real-time speech synthesis - Qwen TTS - Alibaba Cloud Model Studio - Alibaba Cloud Documentation Center

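Qwen real-time TTS takes text as input and returns audio over WebSocket. A single voice can speak multiple languages and dialects with natural, human-like delivery, and the model adjusts intonation automatically so that complex text is read naturally.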

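Key features

Generates high-fidelity real-time speech with natural multilingual output, including Chinese and English
Offers two voice customization methods: Qwen voice cloning and Qwen voice design
Streams both input and output for low-latency real-time interaction
Gives fine-grained control over speech rate, pitch, volume, and bitrate
Supports PCM, WAV, MP3, and Opus formats at sample rates up to 48 kHz
Supports instruction control: natural-language instructions adjust how expressive the voice is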

Supported models

Chinese mainland

When you select Chinese mainland as the deployment scope, the compute resources for model inference are restricted to the Chinese mainland. Static data is stored in the region you select. Supported region: China (Beijing).

To call the following models, use an API key from the Beijing region:

Qwen3-TTS-Instruct-Flash-Realtime: qwen3-tts-instruct-flash-realtime (stable, currently equivalent to qwen3-tts-instruct-flash-realtime-2026-01-22), qwen3-tts-instruct-flash-realtime-2026-01-22 (latest snapshot)
Qwen3-TTS-VD-Realtime: qwen3-tts-vd-realtime-2026-01-15 (latest snapshot), qwen3-tts-vd-realtime-2025-12-16 (snapshot)
Qwen3-TTS-VC-Realtime: qwen3-tts-vc-realtime-2026-01-15 (latest snapshot), qwen3-tts-vc-realtime-2025-11-27 (snapshot)
Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime (stable, currently equivalent to qwen3-tts-flash-realtime-2025-11-27), qwen3-tts-flash-realtime-2025-11-27 (latest snapshot), qwen3-tts-flash-realtime-2025-09-18 (snapshot)
Qwen-TTS-Realtime: qwen-tts-realtime (stable, currently equivalent to qwen-tts-realtime-2025-07-15), qwen-tts-realtime-latest (latest, currently equivalent to qwen-tts-realtime-2025-07-15), qwen-tts-realtime-2025-07-15 (snapshot)

International

When you select International as the deployment scope, the compute resources for model inference are scheduled dynamically worldwide, excluding the Chinese mainland. Static data is stored in the region you select. Supported region: Singapore.

To call the following models, use an API key from the Singapore region:

Qwen3-TTS-Instruct-Flash-Realtime: qwen3-tts-instruct-flash-realtime (stable, currently equivalent to qwen3-tts-instruct-flash-realtime-2026-01-22), qwen3-tts-instruct-flash-realtime-2026-01-22 (latest snapshot)
Qwen3-TTS-VD-Realtime: qwen3-tts-vd-realtime-2026-01-15 (latest snapshot), qwen3-tts-vd-realtime-2025-12-16 (snapshot)
Qwen3-TTS-VC-Realtime: qwen3-tts-vc-realtime-2026-01-15 (latest snapshot), qwen3-tts-vc-realtime-2025-11-27 (snapshot)
Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime (stable, currently equivalent to qwen3-tts-flash-realtime-2025-11-27), qwen3-tts-flash-realtime-2025-11-27 (latest snapshot), qwen3-tts-flash-realtime-2025-09-18 (snapshot)
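Because API keys and endpoints are region-specific, it can help to keep the region-to-endpoint mapping in one place. The following is a minimal sketch, not part of the SDK: the two URLs are the ones used in the quick-start samples on this page, while the names REALTIME_ENDPOINTS and realtime_url are hypothetical helpers introduced here for illustration.

```python
# WebSocket endpoints as they appear in the quick-start samples on this page.
REALTIME_ENDPOINTS = {
    "singapore": "wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime",
    "beijing": "wss://dashscope.aliyuncs.com/api-ws/v1/realtime",
}


def realtime_url(region: str) -> str:
    """Return the WebSocket URL for a region name (hypothetical helper)."""
    key = region.strip().lower()
    if key not in REALTIME_ENDPOINTS:
        raise ValueError(f"unsupported region: {region!r}")
    return REALTIME_ENDPOINTS[key]


if __name__ == "__main__":
    # Pair this URL with an API key created in the same region.
    print(realtime_url("Singapore"))
```

The returned value would be passed as the url argument when constructing the client, together with an API key issued for the same region.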

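Model selection

Use case	Recommended model	Why
Custom voices for brand identity, dedicated voices, or extended system voices (text-based)	qwen3-tts-vd-realtime-2026-01-15	Voice design: creates a custom voice from a text description, with no audio sample required. Well suited to building a brand voice from scratch.
Custom voices for brand identity, dedicated voices, or extended system voices (audio-based)	qwen3-tts-vc-realtime-2026-01-15	Voice cloning: quickly replicates a voice from a real audio sample to produce a consistent, human-like brand voice.
Expressive content production (audiobooks, radio dramas, game or animation voice-overs)	qwen3-tts-instruct-flash-realtime	Instruction control: specify pitch, speech rate, emotion, and character traits in natural language for rich expressiveness and role-play.
Professional narration (news, documentaries, advertisements)	qwen3-tts-instruct-flash-realtime	Instruction control: describe the narration style and tone, such as "authoritative and formal" or "casual and friendly", for professional-quality production.
Intelligent customer service and chatbots	qwen3-tts-flash-realtime, qwen3-tts-instruct-flash-realtime	Streaming input and output with adjustable speech rate and pitch. The Instruct variant can also adjust tone dynamically based on the conversation context, for example calm, enthusiastic, or professional.
Multilingual content delivery	qwen3-tts-flash-realtime, qwen3-tts-instruct-flash-realtime	Supports multiple languages and Chinese dialects for global content delivery.
Text read-aloud and general content production	qwen3-tts-flash-realtime, qwen3-tts-instruct-flash-realtime	Adjustable volume, speech rate, and pitch for audiobook and podcast production.
E-commerce live streaming and short-video voice-overs	qwen3-tts-flash-realtime, qwen3-tts-instruct-flash-realtime	Supports the MP3 and Opus compressed formats for bandwidth-constrained scenarios.

For details, see "Model feature comparison".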

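Quick start

Before you run the code, obtain and configure an API key. For SDK-based integration, install the latest DashScope SDK.

Speech synthesis with a system voice

The following examples synthesize speech with a system voice (see Supported voices).

To enable instruction control, change model to qwen3-tts-instruct-flash-realtime and set the instructions parameter.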
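One way to keep the instruction-control switch in a single place is to build the session options as a dictionary and add the instruction fields only when they are needed. This is a rough sketch under stated assumptions: build_session_options is a hypothetical helper, not an SDK function, and the keys voice, mode, instructions, and optimize_instructions mirror the keyword arguments passed to update_session in the Python samples on this page.

```python
from typing import Any, Dict, Optional


def build_session_options(voice: str, mode: str = "server_commit",
                          instructions: Optional[str] = None) -> Dict[str, Any]:
    """Collect keyword arguments for update_session (hypothetical helper)."""
    if mode not in ("server_commit", "commit"):
        raise ValueError(f"unknown mode: {mode!r}")
    options: Dict[str, Any] = {"voice": voice, "mode": mode}
    if instructions:
        # Only meaningful when the model is qwen3-tts-instruct-flash-realtime.
        options["instructions"] = instructions
        options["optimize_instructions"] = True
    return options


if __name__ == "__main__":
    print(build_session_options("Cherry"))
    print(build_session_options("Cherry", instructions="Speak quickly with a rising intonation."))
```

The resulting dictionary could then be unpacked into the call, for example update_session(response_format=AudioFormat.PCM_24000HZ_MONO_16BIT, **options), with the model name switched separately when instructions are set.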

DashScope SDK

Python

Server commit mode

import os
import base64
import threading
import time
import dashscope
from dashscope.audio.qwen_tts_realtime import *


qwen_tts_realtime: QwenTtsRealtime = None
text_to_synthesize = [
    'Right? I love supermarkets like this.',
    'Especially during Chinese New Year,',
    'I go shopping at supermarkets.',
    'And I feel',
    'absolutely thrilled!',
    'I want to buy so many things!'
]

DO_VIDEO_TEST = False

def init_dashscope_api_key():
    """
        Set your DashScope API key. More information:
        https://github.com/aliyun/alibabacloud-bailian-speech-demo/blob/master/PREREQUISITES.md
    """

    # API keys differ between the Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    if 'DASHSCOPE_API_KEY' in os.environ:
        dashscope.api_key = os.environ[
            'DASHSCOPE_API_KEY']  # Load API key from environment variable DASHSCOPE_API_KEY
    else:
        dashscope.api_key = 'your-dashscope-api-key'  # Set API key manually



class MyCallback(QwenTtsRealtimeCallback):
    def __init__(self):
        self.complete_event = threading.Event()
        self.file = open('result_24k.pcm', 'wb')

    def on_open(self) -> None:
        print('connection opened, init player')

    def on_close(self, close_status_code, close_msg) -> None:
        self.file.close()
        print('connection closed with code: {}, msg: {}, destroy player'.format(close_status_code, close_msg))

    def on_event(self, response: dict) -> None:
        # Each server event arrives as a parsed JSON object; dispatch on its 'type' field.
        try:
            global qwen_tts_realtime
            event_type = response['type']
            if 'session.created' == event_type:
                print('start session: {}'.format(response['session']['id']))
            if 'response.audio.delta' == event_type:
                # Audio chunks are Base64-encoded PCM; decode and append them to the output file.
                recv_audio_b64 = response['delta']
                self.file.write(base64.b64decode(recv_audio_b64))
            if 'response.done' == event_type:
                print(f'response {qwen_tts_realtime.get_last_response_id()} done')
            if 'session.finished' == event_type:
                print('session finished')
                self.complete_event.set()
        except Exception as e:
            print('[Error] {}'.format(e))
            return

    def wait_for_finished(self):
        self.complete_event.wait()


if __name__  == '__main__':
    init_dashscope_api_key()

    print('Initializing ...')

    callback = MyCallback()

    qwen_tts_realtime = QwenTtsRealtime(
        # To use instruction control, replace the model with qwen3-tts-instruct-flash-realtime
        model='qwen3-tts-flash-realtime',
        callback=callback,
        # This URL is for the Singapore region. If you use the Beijing region, replace it with: wss://dashscope.aliyuncs.com/api-ws/v1/realtime
        url='wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime'
        )

    qwen_tts_realtime.connect()
    qwen_tts_realtime.update_session(
        voice = 'Cherry',
        response_format = AudioFormat.PCM_24000HZ_MONO_16BIT,
        # To use instruction control, uncomment the following lines and replace the model with qwen3-tts-instruct-flash-realtime
        # instructions='Speak quickly with a rising intonation, suitable for introducing fashion products.',
        # optimize_instructions=True,
        mode = 'server_commit'        
    )
    for text_chunk in text_to_synthesize:
        print(f'send text: {text_chunk}')
        qwen_tts_realtime.append_text(text_chunk)
        time.sleep(0.1)
    qwen_tts_realtime.finish()
    callback.wait_for_finished()
    print('[Metric] session: {}, first audio delay: {}'.format(
                    qwen_tts_realtime.get_session_id(), 
                    qwen_tts_realtime.get_first_audio_delay(),
                    ))
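The sample above writes headerless PCM to result_24k.pcm, which most media players cannot open directly. As a minimal sketch using only the Python standard library, the raw data can be wrapped in a WAV container. The function name pcm_to_wav is a hypothetical helper; it assumes 16-bit mono little-endian samples at 24 kHz, matching the PCM_24000HZ_MONO_16BIT format requested in the sample.

```python
import wave


def pcm_to_wav(pcm_path: str, wav_path: str, sample_rate: int = 24000) -> float:
    """Wrap raw 16-bit mono PCM in a WAV container and return its duration in seconds."""
    with open(pcm_path, "rb") as f:
        pcm = f.read()
    with wave.open(wav_path, "wb") as wav:
        wav.setnchannels(1)            # mono
        wav.setsampwidth(2)            # 16-bit samples = 2 bytes each
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    # One second of audio occupies sample_rate * 2 bytes.
    return len(pcm) / (sample_rate * 2)


if __name__ == "__main__":
    import os
    if os.path.exists("result_24k.pcm"):
        seconds = pcm_to_wav("result_24k.pcm", "result_24k.wav")
        print(f"wrote result_24k.wav ({seconds:.2f} s)")
```

The same arithmetic explains the buffer sizes in the stream: at 24 kHz, 16-bit mono, each millisecond of audio is 48 bytes.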

Commit mode

import base64
import os
import threading
import dashscope
from dashscope.audio.qwen_tts_realtime import *


qwen_tts_realtime: QwenTtsRealtime = None
text_to_synthesize = [
    'This is the first sentence.',
    'This is the second sentence.',
    'This is the third sentence.',
]

DO_VIDEO_TEST = False

def init_dashscope_api_key():
    """
        Set your DashScope API key. More information:
        https://github.com/aliyun/alibabacloud-bailian-speech-demo/blob/master/PREREQUISITES.md
    """

    # API keys differ between the Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    if 'DASHSCOPE_API_KEY' in os.environ:
        dashscope.api_key = os.environ[
            'DASHSCOPE_API_KEY']  # Load API key from environment variable DASHSCOPE_API_KEY
    else:
        dashscope.api_key = 'your-dashscope-api-key'  # Set API key manually



class MyCallback(QwenTtsRealtimeCallback):
    def __init__(self):
        super().__init__()
        self.response_counter = 0
        self.complete_event = threading.Event()
        self.file = open(f'result_{self.response_counter}_24k.pcm', 'wb')

    def reset_event(self):
        self.response_counter += 1
        self.file = open(f'result_{self.response_counter}_24k.pcm', 'wb')
        self.complete_event = threading.Event()

    def on_open(self) -> None:
        print('connection opened, init player')

    def on_close(self, close_status_code, close_msg) -> None:
        print('connection closed with code: {}, msg: {}, destroy player'.format(close_status_code, close_msg))

    def on_event(self, response: dict) -> None:
        # Each server event arrives as a parsed JSON object; dispatch on its 'type' field.
        try:
            global qwen_tts_realtime
            event_type = response['type']
            if 'session.created' == event_type:
                print('start session: {}'.format(response['session']['id']))
            if 'response.audio.delta' == event_type:
                # Audio chunks are Base64-encoded PCM; decode and append them to the current file.
                recv_audio_b64 = response['delta']
                self.file.write(base64.b64decode(recv_audio_b64))
            if 'response.done' == event_type:
                # In commit mode, each commit produces one response and one output file.
                print(f'response {qwen_tts_realtime.get_last_response_id()} done')
                self.complete_event.set()
                self.file.close()
            if 'session.finished' == event_type:
                print('session finished')
                self.complete_event.set()
        except Exception as e:
            print('[Error] {}'.format(e))
            return

    def wait_for_response_done(self):
        self.complete_event.wait()


if __name__  == '__main__':
    init_dashscope_api_key()

    print('Initializing ...')

    callback = MyCallback()

    qwen_tts_realtime = QwenTtsRealtime(
        # To use instruction control, replace the model with qwen3-tts-instruct-flash-realtime
        model='qwen3-tts-flash-realtime',
        callback=callback, 
        # This URL is for the Singapore region. If you use the Beijing region, replace it with: wss://dashscope.aliyuncs.com/api-ws/v1/realtime
        url='wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime'
        )

    qwen_tts_realtime.connect()
    qwen_tts_realtime.update_session(
        voice = 'Cherry',
        response_format = AudioFormat.PCM_24000HZ_MONO_16BIT,
        # To use instruction control, uncomment the following lines and replace the model with qwen3-tts-instruct-flash-realtime
        # instructions='Speak quickly with a rising intonation, suitable for introducing fashion products.',
        # optimize_instructions=True,
        mode = 'commit'        
    )
    print(f'send text: {text_to_synthesize[0]}')
    qwen_tts_realtime.append_text(text_to_synthesize[0])
    qwen_tts_realtime.commit()
    callback.wait_for_response_done()
    callback.reset_event()
    
    print(f'send text: {text_to_synthesize[1]}')
    qwen_tts_realtime.append_text(text_to_synthesize[1])
    qwen_tts_realtime.commit()
    callback.wait_for_response_done()
    callback.reset_event()

    print(f'send text: {text_to_synthesize[2]}')
    qwen_tts_realtime.append_text(text_to_synthesize[2])
    qwen_tts_realtime.commit()
    callback.wait_for_response_done()
    
    qwen_tts_realtime.finish()
    print('[Metric] session: {}, first audio delay: {}'.format(
                    qwen_tts_realtime.get_session_id(), 
                    qwen_tts_realtime.get_first_audio_delay(),
                    ))
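The three repeated append, commit, and wait steps in the sample above can be expressed as a loop. The following is a sketch only: synthesize_sentences is a hypothetical helper, and it assumes a client object exposing append_text() and commit() and a callback exposing wait_for_response_done() and reset_event(), as in the sample. The stand-in classes in the demo exist only to show the call order without a network connection.

```python
def synthesize_sentences(client, callback, sentences):
    """Send each sentence as its own commit and wait for its response (hypothetical helper)."""
    for index, sentence in enumerate(sentences):
        if index > 0:
            # Open a fresh output file and event for every sentence after the first.
            callback.reset_event()
        client.append_text(sentence)
        client.commit()
        callback.wait_for_response_done()


if __name__ == "__main__":
    # Offline stand-ins that just log the call sequence.
    class DemoClient:
        def append_text(self, text):
            print("append_text:", text)

        def commit(self):
            print("commit")

    class DemoCallback:
        def reset_event(self):
            print("reset_event")

        def wait_for_response_done(self):
            print("wait_for_response_done")

    synthesize_sentences(DemoClient(), DemoCallback(), ["First.", "Second."])
```

With the real objects from the sample, the call would be synthesize_sentences(qwen_tts_realtime, callback, text_to_synthesize), followed by finish().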

Java

Server commit mode

appendText()

import com.alibaba.dashscope.audio.qwen_tts_realtime.*;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.google.gson.JsonObject;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.SourceDataLine;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.AudioSystem;
import java.io.*;
import java.util.Base64;
import java.util.Queue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;

public class Main {
    static String[] textToSynthesize = {
            "Right? I really love this kind of supermarket.",
            "Especially during the Chinese New Year.",
            "Going to the supermarket.",
            "It just makes me feel.",
            "Super, super happy!",
            "I want to buy so many things!"
    };
    public static QwenTtsRealtimeAudioFormat ttsFormat = QwenTtsRealtimeAudioFormat.PCM_24000HZ_MONO_16BIT;

    // Real-time PCM audio player
    public static class RealtimePcmPlayer {
        private int sampleRate;
        private SourceDataLine line;
        private AudioFormat audioFormat;
        private Thread decoderThread;
        private Thread playerThread;
        private AtomicBoolean stopped = new AtomicBoolean(false);
        private Queue<String> b64AudioBuffer = new ConcurrentLinkedQueue<>();
        private Queue<byte[]> RawAudioBuffer = new ConcurrentLinkedQueue<>();
        private ByteArrayOutputStream totalAudioStream = new ByteArrayOutputStream();

        // Initialize the audio format and audio line.
        public RealtimePcmPlayer(int sampleRate) throws LineUnavailableException {
            this.sampleRate = sampleRate;
            this.audioFormat = new AudioFormat(this.sampleRate, 16, 1, true, false);
            DataLine.Info info = new DataLine.Info(SourceDataLine.class, audioFormat);
            line = (SourceDataLine) AudioSystem.getLine(info);
            line.open(audioFormat);
            line.start();
            decoderThread = new Thread(new Runnable() {
                @Override
                public void run() {
                    while (!stopped.get()) {
                        String b64Audio = b64AudioBuffer.poll();
                        if (b64Audio != null) {
                            byte[] rawAudio = Base64.getDecoder().decode(b64Audio);
                            RawAudioBuffer.add(rawAudio);
                            // Write audio data to totalAudioStream.
                            try {
                                totalAudioStream.write(rawAudio);
                            } catch (IOException e) {
                                throw new RuntimeException(e);
                            }
                        } else {
                            try {
                                Thread.sleep(100);
                            } catch (InterruptedException e) {
                                throw new RuntimeException(e);
                            }
                        }
                    }
                }
            });
            playerThread = new Thread(new Runnable() {
                @Override
                public void run() {
                    while (!stopped.get()) {
                        byte[] rawAudio = RawAudioBuffer.poll();
                        if (rawAudio != null) {
                            try {
                                playChunk(rawAudio);
                            } catch (IOException e) {
                                throw new RuntimeException(e);
                            } catch (InterruptedException e) {
                                throw new RuntimeException(e);
                            }
                        } else {
                            try {
                                Thread.sleep(100);
                            } catch (InterruptedException e) {
                                throw new RuntimeException(e);
                            }
                        }
                    }
                }
            });
            decoderThread.start();
            playerThread.start();
        }

        // Play an audio chunk and block until playback completes.
        private void playChunk(byte[] chunk) throws IOException, InterruptedException {
            if (chunk == null || chunk.length == 0) return;

            int bytesWritten = 0;
            while (bytesWritten < chunk.length) {
                bytesWritten += line.write(chunk, bytesWritten, chunk.length - bytesWritten);
            }
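            int audioLength = chunk.length / (this.sampleRate*2/1000);
            // Wait for the buffered audio to finish playing.
            // Clamp at zero: Thread.sleep throws IllegalArgumentException for negative values (chunks under 10 ms).
            Thread.sleep(Math.max(0, audioLength - 10));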
        }

        public void write(String b64Audio) {
            b64AudioBuffer.add(b64Audio);
        }

        public void cancel() {
            b64AudioBuffer.clear();
            RawAudioBuffer.clear();
        }

        public void waitForComplete() throws InterruptedException {
            while (!b64AudioBuffer.isEmpty() || !RawAudioBuffer.isEmpty()) {
                Thread.sleep(100);
            }
            line.drain();
        }

        public void shutdown() throws InterruptedException, IOException {
            stopped.set(true);
            decoderThread.join();
            playerThread.join();

            // Save the complete audio file.
            File file = new File("TotalAudio_"+ttsFormat.getSampleRate()+"."+ttsFormat.getFormat());
            try (FileOutputStream fos = new FileOutputStream(file)) {
                fos.write(totalAudioStream.toByteArray());
            }

            if (line != null && line.isRunning()) {
                line.drain();
                line.close();
            }
        }
    }

    public static void main(String[] args) throws InterruptedException, LineUnavailableException, IOException {
        QwenTtsRealtimeParam param = QwenTtsRealtimeParam.builder()
                // To use instruction control, replace the model with qwen3-tts-instruct-flash-realtime.
                .model("qwen3-tts-flash-realtime")
                // Singapore endpoint. For China (Beijing), use wss://dashscope.aliyuncs.com/api-ws/v1/realtime.
                .url("wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime")
                // API keys differ between Singapore and China (Beijing). See https://www.alibabacloud.com/help/zh/model-studio/get-api-key.
                .apikey(System.getenv("DASHSCOPE_API_KEY"))
                .build();
        AtomicReference<CountDownLatch> completeLatch = new AtomicReference<>(new CountDownLatch(1));
        final AtomicReference<QwenTtsRealtime> qwenTtsRef = new AtomicReference<>(null);

        // Create a real-time audio player instance.
        RealtimePcmPlayer audioPlayer = new RealtimePcmPlayer(24000);

        QwenTtsRealtime qwenTtsRealtime = new QwenTtsRealtime(param, new QwenTtsRealtimeCallback() {
            @Override
            public void onOpen() {
                // Handle connection establishment.
            }
            @Override
            public void onEvent(JsonObject message) {
                String type = message.get("type").getAsString();
                switch(type) {
                    case "session.created":
                        // Handle session creation.
                        if (message.has("session")) {
                            String eventId = message.get("event_id").getAsString();
                            String sessionId = message.get("session").getAsJsonObject().get("id").getAsString();
                            System.out.println("[onEvent] session.created, session_id: "
                                    + sessionId + ", event_id: " + eventId);
                        }
                        break;
                    case "response.audio.delta":
                        String recvAudioB64 = message.get("delta").getAsString();
                        // Play audio in real time.
                        audioPlayer.write(recvAudioB64);
                        break;
                    case "response.done":
                        // Handle response completion.
                        break;
                    case "session.finished":
                        // Handle session termination.
                        completeLatch.get().countDown();
                        break;
                    default:
                        break;
                }
            }
            @Override
            public void onClose(int code, String reason) {
                // Handle connection closure.
            }
        });
        qwenTtsRef.set(qwenTtsRealtime);
        try {
            qwenTtsRealtime.connect();
        } catch (NoApiKeyException e) {
            throw new RuntimeException(e);
        }
        QwenTtsRealtimeConfig config = QwenTtsRealtimeConfig.builder()
                .voice("Cherry")
                .responseFormat(ttsFormat)
                .mode("server_commit")
                // To use instruction control, uncomment the following lines and replace the model with qwen3-tts-instruct-flash-realtime.
                // .instructions("")
                // .optimizeInstructions(true)
                .build();
        qwenTtsRealtime.updateSession(config);
        for (String text:textToSynthesize) {
            qwenTtsRealtime.appendText(text);
            Thread.sleep(100);
        }
        qwenTtsRealtime.finish();
        completeLatch.get().await();
        qwenTtsRealtime.close();

        // Wait for audio playback to complete, then shut down the player.
        audioPlayer.waitForComplete();
        audioPlayer.shutdown();
        System.exit(0);
    }
}

Commit mode

commit()

import com.alibaba.dashscope.audio.qwen_tts_realtime.*;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.google.gson.JsonObject;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.SourceDataLine;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.AudioSystem;
import java.io.*;
import java.util.Base64;
import java.util.Queue;
import java.util.Scanner;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;

public class Main {
    public static QwenTtsRealtimeAudioFormat ttsFormat = QwenTtsRealtimeAudioFormat.PCM_24000HZ_MONO_16BIT;
    // Real-time PCM audio player
    public static class RealtimePcmPlayer {
        private int sampleRate;
        private SourceDataLine line;
        private AudioFormat audioFormat;
        private Thread decoderThread;
        private Thread playerThread;
        private AtomicBoolean stopped = new AtomicBoolean(false);
        private Queue<String> b64AudioBuffer = new ConcurrentLinkedQueue<>();
        private Queue<byte[]> RawAudioBuffer = new ConcurrentLinkedQueue<>();
        private ByteArrayOutputStream totalAudioStream = new ByteArrayOutputStream();


        // Initialize the audio format and audio line.
        public RealtimePcmPlayer(int sampleRate) throws LineUnavailableException {
            this.sampleRate = sampleRate;
            this.audioFormat = new AudioFormat(this.sampleRate, 16, 1, true, false);
            DataLine.Info info = new DataLine.Info(SourceDataLine.class, audioFormat);
            line = (SourceDataLine) AudioSystem.getLine(info);
            line.open(audioFormat);
            line.start();
            decoderThread = new Thread(new Runnable() {
                @Override
                public void run() {
                    while (!stopped.get()) {
                        String b64Audio = b64AudioBuffer.poll();
                        if (b64Audio != null) {
                            byte[] rawAudio = Base64.getDecoder().decode(b64Audio);
                            RawAudioBuffer.add(rawAudio);
                            // Write audio data to totalAudioStream.
                            try {
                                totalAudioStream.write(rawAudio);
                            } catch (IOException e) {
                                throw new RuntimeException(e);
                            }
                        } else {
                            try {
                                Thread.sleep(100);
                            } catch (InterruptedException e) {
                                throw new RuntimeException(e);
                            }
                        }
                    }
                }
            });
            playerThread = new Thread(new Runnable() {
                @Override
                public void run() {
                    while (!stopped.get()) {
                        byte[] rawAudio = RawAudioBuffer.poll();
                        if (rawAudio != null) {
                            try {
                                playChunk(rawAudio);
                            } catch (IOException e) {
                                throw new RuntimeException(e);
                            } catch (InterruptedException e) {
                                throw new RuntimeException(e);
                            }
                        } else {
                            try {
                                Thread.sleep(100);
                            } catch (InterruptedException e) {
                                throw new RuntimeException(e);
                            }
                        }
                    }
                }
            });
            decoderThread.start();
            playerThread.start();
        }

        // Play an audio chunk and block until playback completes.
        private void playChunk(byte[] chunk) throws IOException, InterruptedException {
            if (chunk == null || chunk.length == 0) return;

            int bytesWritten = 0;
            while (bytesWritten < chunk.length) {
                bytesWritten += line.write(chunk, bytesWritten, chunk.length - bytesWritten);
            }
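            int audioLength = chunk.length / (this.sampleRate*2/1000);
            // Wait for the buffered audio to finish playing.
            // Clamp at zero: Thread.sleep throws IllegalArgumentException for negative values (chunks under 10 ms).
            Thread.sleep(Math.max(0, audioLength - 10));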
        }

        public void write(String b64Audio) {
            b64AudioBuffer.add(b64Audio);
        }

        public void cancel() {
            b64AudioBuffer.clear();
            RawAudioBuffer.clear();
        }

        public void waitForComplete() throws InterruptedException {
            // Wait for all buffered audio data to finish playing.
            while (!b64AudioBuffer.isEmpty() || !RawAudioBuffer.isEmpty()) {
                Thread.sleep(100);
            }
            // Wait for the audio line to drain.
            line.drain();
        }

        public void shutdown() throws InterruptedException {
            stopped.set(true);
            decoderThread.join();
            playerThread.join();
            // Save the complete audio file.
            File file = new File("TotalAudio_"+ttsFormat.getSampleRate()+"."+ttsFormat.getFormat());
            try (FileOutputStream fos = new FileOutputStream(file)) {
                fos.write(totalAudioStream.toByteArray());
            } catch (FileNotFoundException e) {
                throw new RuntimeException(e);
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
            if (line != null && line.isRunning()) {
                line.drain();
                line.close();
            }
        }
    }

    public static void main(String[] args) throws InterruptedException, LineUnavailableException, FileNotFoundException {
        Scanner scanner = new Scanner(System.in);

        QwenTtsRealtimeParam param = QwenTtsRealtimeParam.builder()
                // To use instruction control, replace the model with qwen3-tts-instruct-flash-realtime.
                .model("qwen3-tts-flash-realtime")
                // Singapore endpoint. For China (Beijing), use wss://dashscope.aliyuncs.com/api-ws/v1/realtime.
                .url("wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime")
                // API keys differ between Singapore and China (Beijing). See https://www.alibabacloud.com/help/zh/model-studio/get-api-key.
                .apikey(System.getenv("DASHSCOPE_API_KEY"))
                .build();

        AtomicReference<CountDownLatch> completeLatch = new AtomicReference<>(new CountDownLatch(1));

        // Create a real-time player instance.
        RealtimePcmPlayer audioPlayer = new RealtimePcmPlayer(24000);

        final AtomicReference<QwenTtsRealtime> qwenTtsRef = new AtomicReference<>(null);
        QwenTtsRealtime qwenTtsRealtime = new QwenTtsRealtime(param, new QwenTtsRealtimeCallback() {
            @Override
            public void onOpen() {
                System.out.println("connection opened");
                System.out.println("Enter text and press Enter to send. Enter 'quit' to exit the program.");
            }
            @Override
            public void onEvent(JsonObject message) {
                String type = message.get("type").getAsString();
                switch(type) {
                    case "session.created":
                        System.out.println("start session: " + message.get("session").getAsJsonObject().get("id").getAsString());
                        break;
                    case "response.audio.delta":
                        String recvAudioB64 = message.get("delta").getAsString();
                        // Play audio in real time; the player decodes the Base64 payload itself.
                        audioPlayer.write(recvAudioB64);
                        break;
                    case "response.done":
                        System.out.println("response done");
                        // Wait for audio playback to complete.
                        try {
                            audioPlayer.waitForComplete();
                        } catch (InterruptedException e) {
                            throw new RuntimeException(e);
                        }
                        // Prepare for the next input.
                        completeLatch.get().countDown();
                        break;
                    case "session.finished":
                        System.out.println("session finished");
                        if (qwenTtsRef.get() != null) {
                            System.out.println("[Metric] response: " + qwenTtsRef.get().getResponseId() +
                                    ", first audio delay: " + qwenTtsRef.get().getFirstAudioDelay() + " ms");
                        }
                        completeLatch.get().countDown();
                        break;
                    default:
                        break;
                }
            }
            @Override
            public void onClose(int code, String reason) {
                System.out.println("connection closed code: " + code + ", reason: " + reason);
                try {
                    // Wait for playback to complete, then shut down the player.
                    audioPlayer.waitForComplete();
                    audioPlayer.shutdown();
                } catch (InterruptedException e) {
                    throw new RuntimeException(e);
                }
            }
        });
        qwenTtsRef.set(qwenTtsRealtime);
        try {
            qwenTtsRealtime.connect();
        } catch (NoApiKeyException e) {
            throw new RuntimeException(e);
        }
        QwenTtsRealtimeConfig config = QwenTtsRealtimeConfig.builder()
                .voice("Cherry")
                .responseFormat(ttsFormat)
                .mode("commit")
                // To use instruction control, uncomment the following lines and replace the model with qwen3-tts-instruct-flash-realtime.
                // .instructions("")
                // .optimizeInstructions(true)
                .build();
        qwenTtsRealtime.updateSession(config);

        // Read user input in a loop.
        while (true) {
            System.out.print("Enter the text to synthesize: ");
            String text = scanner.nextLine();

            // Exit when the user enters 'quit'.
            if ("quit".equalsIgnoreCase(text.trim())) {
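                System.out.println("Closing the connection...");
                // Use a fresh latch so that await() blocks until session.finished arrives.
                completeLatch.set(new CountDownLatch(1));
                qwenTtsRealtime.finish();
                completeLatch.get().await();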
                break;
            }

            // Skip empty input.
            if (text.trim().isEmpty()) {
                continue;
            }

            // Re-initialize the countdown latch.
            completeLatch.set(new CountDownLatch(1));

            // Send the text.
            qwenTtsRealtime.appendText(text);
            qwenTtsRealtime.commit();

            // Wait for the current synthesis to complete.
            completeLatch.get().await();
        }

        // Clean up resources.
        audioPlayer.waitForComplete();
        audioPlayer.shutdown();
        scanner.close();
        System.exit(0);
    }
}

WebSocket API

Set up the environment

Python

Install pyaudio for your operating system.

macOS

brew install portaudio && pip install pyaudio

Debian/Ubuntu

sudo apt-get install python3-pyaudio

# or

pip install pyaudio

CentOS

sudo yum install -y portaudio portaudio-devel && pip install pyaudio

Windows

pip install pyaudio

After installation, use pip to install the WebSocket dependencies.

pip install websocket-client==1.8.0 websockets
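
Before moving on, it can help to confirm that the packages import cleanly. The following is a minimal sketch, not part of the official samples: `missing_modules` is a hypothetical helper name. It relies on the import names of the packages installed above: PyAudio is imported as `pyaudio`, websocket-client as `websocket`, and websockets as `websockets`.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Import names for PyAudio, websocket-client, and websockets respectively.
    missing = missing_modules(["pyaudio", "websocket", "websockets"])
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All audio and WebSocket dependencies are importable.")
```

If a module is reported as missing, rerun the matching install command for your operating system.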

Java

Add the following dependencies to your project.

Maven

Add the following to pom.xml.

<!-- Java-WebSocket library -->
<dependency>
    <groupId>org.java-websocket</groupId>
    <artifactId>Java-WebSocket</artifactId>
    <version>1.5.7</version>
</dependency>
<!-- Gson for JSON processing -->
<dependency>
    <groupId>com.google.code.gson</groupId>
    <artifactId>gson</artifactId>
    <version>2.13.1</version>
</dependency>

Gradle

Add the following to build.gradle.

// Java-WebSocket library
implementation("org.java-websocket:Java-WebSocket:1.5.7")
// Gson for JSON processing
implementation("com.google.code.gson:gson:2.13.1")

Create the client

Python

Create a Python file named tts_realtime_client.py and copy the following code into it.

tts_realtime_client.py

# -*- coding: utf-8 -*-

import asyncio
import websockets
import json
import base64
import time
from typing import Optional, Callable, Dict, Any
from enum import Enum

class SessionMode(Enum):
    SERVER_COMMIT = "server_commit"
    COMMIT = "commit"

class TTSRealtimeClient:
    """
    A client for interacting with the TTS Realtime API.

    This class provides methods to connect to the TTS Realtime API, send text data, receive audio output, and manage the WebSocket connection.

    Attributes:
        base_url (str):
            The base URL of the Realtime API.
        api_key (str):
            The API Key used for authentication.
        voice (str):
            The voice used by the server for speech synthesis.
        mode (SessionMode):
            The session mode, either server_commit or commit.
        audio_callback (Callable[[bytes], None]):
            A callback function to receive audio data.
        language_type (str):
            The language of the synthesized speech. Valid values: Chinese, English, German, Italian, Portuguese, Spanish, Japanese, Korean, French, Russian, Auto.
    """

    def __init__(
            self,
            base_url: str,
            api_key: str,
            voice: str = "Cherry",
            mode: SessionMode = SessionMode.SERVER_COMMIT,
            audio_callback: Optional[Callable[[bytes], None]] = None,
            language_type: str = "Auto"):
        self.base_url = base_url
        self.api_key = api_key
        self.voice = voice
        self.mode = mode
        self.ws = None
        self.audio_callback = audio_callback
        self.language_type = language_type

        # Current response state
        self._current_response_id = None
        self._current_item_id = None
        self._is_responding = False
        self._response_done_future = None

    async def connect(self) -> None:
        """Establish a WebSocket connection to the TTS Realtime API."""
        headers = {
            "Authorization": f"Bearer {self.api_key}"
        }

        self.ws = await websockets.connect(self.base_url, additional_headers=headers)

        # Set default session configuration
        await self.update_session({
            "mode": self.mode.value,
            "voice": self.voice,
            # To use the instruction control feature, uncomment the two lines below and set the model in server_commit.py or commit.py to qwen3-tts-instruct-flash-realtime.
            # "instructions": "Speak quickly with a noticeable rising intonation, suitable for introducing fashion products.",
            # "optimize_instructions": True,
            "language_type": self.language_type,
            "response_format": "pcm",
            "sample_rate": 24000
        })

    async def send_event(self, event) -> None:
        """Send an event to the server."""
        event['event_id'] = "event_" + str(int(time.time() * 1000))
        print(f"Sending event: type={event['type']}, event_id={event['event_id']}")
        await self.ws.send(json.dumps(event))

    async def update_session(self, config: Dict[str, Any]) -> None:
        """Update the session configuration."""
        event = {
            "type": "session.update",
            "session": config
        }
        print("Updating session configuration: ", event)
        await self.send_event(event)

    async def append_text(self, text: str) -> None:
        """Send text data to the API."""
        event = {
            "type": "input_text_buffer.append",
            "text": text
        }
        await self.send_event(event)

    async def commit_text_buffer(self) -> None:
        """Commit the text buffer to trigger processing."""
        event = {
            "type": "input_text_buffer.commit"
        }
        await self.send_event(event)

    async def clear_text_buffer(self) -> None:
        """Clear the text buffer."""
        event = {
            "type": "input_text_buffer.clear"
        }
        await self.send_event(event)

    async def finish_session(self) -> None:
        """End the session."""
        event = {
            "type": "session.finish"
        }
        await self.send_event(event)

    async def wait_for_response_done(self):
        """Wait for the response.done event"""
        if self._response_done_future:
            await self._response_done_future

    async def handle_messages(self) -> None:
        """Handle messages from the server."""
        try:
            async for message in self.ws:
                event = json.loads(message)
                event_type = event.get("type")

                if event_type != "response.audio.delta":
                    print(f"Received event: {event_type}")

                if event_type == "error":
                    print("Error: ", event.get('error', {}))
                    continue
                elif event_type == "session.created":
                    print("Session created, ID: ", event.get('session', {}).get('id'))
                elif event_type == "session.updated":
                    print("Session updated, ID: ", event.get('session', {}).get('id'))
                elif event_type == "input_text_buffer.committed":
                    print("Text buffer committed, item ID: ", event.get('item_id'))
                elif event_type == "input_text_buffer.cleared":
                    print("Text buffer cleared")
                elif event_type == "response.created":
                    self._current_response_id = event.get("response", {}).get("id")
                    self._is_responding = True
                    # Create a new future to wait for response.done
                    self._response_done_future = asyncio.Future()
                    print("Response created, ID: ", self._current_response_id)
                elif event_type == "response.output_item.added":
                    self._current_item_id = event.get("item", {}).get("id")
                    print("Output item added, ID: ", self._current_item_id)
                # Handle audio delta
                elif event_type == "response.audio.delta" and self.audio_callback:
                    audio_bytes = base64.b64decode(event.get("delta", ""))
                    self.audio_callback(audio_bytes)
                elif event_type == "response.audio.done":
                    print("Audio generation complete")
                elif event_type == "response.done":
                    self._is_responding = False
                    self._current_response_id = None
                    self._current_item_id = None
                    # Mark the future as done
                    if self._response_done_future and not self._response_done_future.done():
                        self._response_done_future.set_result(True)
                    print("Response complete")
                elif event_type == "session.finished":
                    print("Session ended")

        except websockets.exceptions.ConnectionClosed:
            print("Connection closed")
        except Exception as e:
            print("Error handling message: ", str(e))

    async def close(self) -> None:
        """Close the WebSocket connection."""
        if self.ws:
            await self.ws.close()
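
The `response.audio.delta` branch in `handle_messages` base64-decodes each chunk before passing it to the callback. As a rough sketch of what those bytes represent, the helpers below decode a delta event and compute its playback time. `decode_audio_delta`, `pcm_duration_seconds`, and the hand-built sample event are illustrative names and data, not part of the API, and the calculation assumes the 16-bit mono PCM at 24 kHz that the session configuration above requests.

```python
import base64
import json


def pcm_duration_seconds(pcm: bytes, sample_rate: int = 24000,
                         sample_width: int = 2, channels: int = 1) -> float:
    """Playback time of raw PCM bytes, assuming fixed-width interleaved samples."""
    return len(pcm) / (sample_rate * sample_width * channels)


def decode_audio_delta(message: str) -> bytes:
    """Extract raw audio bytes from a response.audio.delta event; b'' for other events."""
    event = json.loads(message)
    if event.get("type") != "response.audio.delta":
        return b""
    return base64.b64decode(event.get("delta", ""))


if __name__ == "__main__":
    # Hand-built stand-in for a server event: 12,000 silent samples of 2 bytes each.
    silence = bytes(24000)
    fake_event = json.dumps({
        "type": "response.audio.delta",
        "delta": base64.b64encode(silence).decode("ascii"),
    })
    pcm = decode_audio_delta(fake_event)
    print(f"{len(pcm)} bytes, {pcm_duration_seconds(pcm):.2f} s")
```

Summing the durations of all chunks gives the length of the synthesized audio, which is handy when checking that nothing was dropped before the connection closed.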

Java

Create a Java file named TTSRealtimeClient.java and copy the following code into it.

import com.google.gson.Gson;
import com.google.gson.JsonObject;
import org.java_websocket.client.WebSocketClient;
import org.java_websocket.handshake.ServerHandshake;

import java.net.URI;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CountDownLatch;
import java.util.function.Consumer;

/**
 * A client for interacting with the TTS Realtime API.
 *
 * This class provides methods to connect to the TTS Realtime API, send text data, receive audio output, and manage the WebSocket connection.
 */
public class TTSRealtimeClient {

    public enum SessionMode {
        SERVER_COMMIT("server_commit"),
        COMMIT("commit");
        private final String value;
        SessionMode(String value) { this.value = value; }
        public String getValue() { return value; }
    }

    /**
     * Audio callback interface
     */
    public interface AudioCallback {
        void onAudio(byte[] audioData);
    }

    private final String baseUrl;
    private final String apiKey;
    private final String voice;
    private final SessionMode mode;
    private final String languageType;
    private final AudioCallback audioCallback;
    private final Gson gson = new Gson();

    private WebSocketClient ws;
    // Reassigned on the WebSocket thread and read on the caller's thread, hence volatile.
    private volatile CountDownLatch responseDoneLatch;
    private CountDownLatch sessionFinishedLatch;

    public TTSRealtimeClient(String baseUrl, String apiKey, String voice,
                             SessionMode mode, AudioCallback audioCallback,
                             String languageType) {
        this.baseUrl = baseUrl;
        this.apiKey = apiKey;
        this.voice = voice;
        this.mode = mode;
        this.audioCallback = audioCallback;
        this.languageType = languageType;
    }

    public TTSRealtimeClient(String baseUrl, String apiKey, String voice,
                             SessionMode mode, AudioCallback audioCallback) {
        this(baseUrl, apiKey, voice, mode, audioCallback, "Auto");
    }

    /**
     * Establish a WebSocket connection to the TTS Realtime API.
     */
    public void connect() throws Exception {
        Map<String, String> headers = new HashMap<>();
        headers.put("Authorization", "Bearer " + apiKey);

        responseDoneLatch = new CountDownLatch(0);
        sessionFinishedLatch = new CountDownLatch(1);

        ws = new WebSocketClient(new URI(baseUrl), headers) {
            @Override
            public void onOpen(ServerHandshake handshake) {
                System.out.println("WebSocket connection established");
                // Send default session configuration
                JsonObject session = new JsonObject();
                session.addProperty("mode", mode.getValue());
                session.addProperty("voice", TTSRealtimeClient.this.voice);
                // To use the instruction control feature, uncomment the lines below and replace model with qwen3-tts-instruct-flash-realtime
                // session.addProperty("instructions", "Speak quickly with a noticeable rising intonation, suitable for introducing fashion products.");
                // session.addProperty("optimize_instructions", true);
                session.addProperty("language_type", languageType);
                session.addProperty("response_format", "pcm");
                session.addProperty("sample_rate", 24000);
                updateSession(session);
            }

            @Override
            public void onMessage(String message) {
                JsonObject event = gson.fromJson(message, JsonObject.class);
                String eventType = event.has("type") ? event.get("type").getAsString() : "";

                if (!"response.audio.delta".equals(eventType)) {
                    System.out.println("Received event: " + eventType);
                }

                switch (eventType) {
                    case "error":
                        System.err.println("Error: " + event.get("error"));
                        break;
                    case "session.created":
                        System.out.println("Session created, ID: " +
                            event.getAsJsonObject("session").get("id").getAsString());
                        break;
                    case "session.updated":
                        System.out.println("Session updated, ID: " +
                            event.getAsJsonObject("session").get("id").getAsString());
                        break;
                    case "input_text_buffer.committed":
                        System.out.println("Text buffer committed, item ID: " + event.get("item_id"));
                        break;
                    case "input_text_buffer.cleared":
                        System.out.println("Text buffer cleared");
                        break;
                    case "response.created":
                        System.out.println("Response created, ID: " +
                            event.getAsJsonObject("response").get("id").getAsString());
                        responseDoneLatch = new CountDownLatch(1);
                        break;
                    case "response.output_item.added":
                        System.out.println("Output item added, ID: " +
                            event.getAsJsonObject("item").get("id").getAsString());
                        break;
                    case "response.audio.delta":
                        if (audioCallback != null) {
                            byte[] audioBytes = Base64.getDecoder().decode(
                                event.get("delta").getAsString());
                            audioCallback.onAudio(audioBytes);
                        }
                        break;
                    case "response.audio.done":
                        System.out.println("Audio generation complete");
                        break;
                    case "response.done":
                        System.out.println("Response complete");
                        responseDoneLatch.countDown();
                        break;
                    case "session.finished":
                        System.out.println("Session ended");
                        sessionFinishedLatch.countDown();
                        break;
                }
            }

            @Override
            public void onClose(int code, String reason, boolean remote) {
                System.out.println("Connection closed: " + reason);
            }

            @Override
            public void onError(Exception ex) {
                System.err.println("WebSocket error: " + ex.getMessage());
            }
        };
        ws.connectBlocking();
    }

    /**
     * Send an event to the server.
     */
    public void sendEvent(JsonObject event) {
        String eventId = "event_" + System.currentTimeMillis();
        event.addProperty("event_id", eventId);
        System.out.println("Sending event: type=" + event.get("type").getAsString()
            + ", event_id=" + eventId);
        ws.send(gson.toJson(event));
    }

    /**
     * Update the session configuration.
     */
    public void updateSession(JsonObject config) {
        JsonObject event = new JsonObject();
        event.addProperty("type", "session.update");
        event.add("session", config);
        System.out.println("Updating session configuration: " + event);
        sendEvent(event);
    }

    /**
     * Send text data to the API.
     */
    public void appendText(String text) {
        JsonObject event = new JsonObject();
        event.addProperty("type", "input_text_buffer.append");
        event.addProperty("text", text);
        sendEvent(event);
    }

    /**
     * Commit the text buffer to trigger processing.
     */
    public void commitTextBuffer() {
        JsonObject event = new JsonObject();
        event.addProperty("type", "input_text_buffer.commit");
        sendEvent(event);
    }

    /**
     * Clear the text buffer.
     */
    public void clearTextBuffer() {
        JsonObject event = new JsonObject();
        event.addProperty("type", "input_text_buffer.clear");
        sendEvent(event);
    }

    /**
     * End the session.
     */
    public void finishSession() {
        JsonObject event = new JsonObject();
        event.addProperty("type", "session.finish");
        sendEvent(event);
    }

    /**
     * Wait for the response.done event.
     */
    public void waitForResponseDone() throws InterruptedException {
        responseDoneLatch.await();
    }

    /**
     * Wait for the session.finished event.
     */
    public void waitForSessionFinished() throws InterruptedException {
        sessionFinishedLatch.await();
    }

    /**
     * Close the WebSocket connection.
     */
    public void close() {
        if (ws != null) {
            ws.close();
        }
    }
}

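Choose a synthesis mode

The Realtime API supports the following two modes.

Server commit mode: The client only sends text. The server segments the text and decides when to synthesize it. Best for low-latency scenarios that do not need manual timing control, such as GPS navigation.

Commit mode: The client appends text to a buffer and triggers synthesis by committing it explicitly. Best for scenarios that need fine control over sentence breaks and pauses, such as news broadcasting.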
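
To make the difference between the two modes concrete, the sketch below builds the client events that each mode would send for the same text fragments. The event types are the ones used by the client code in this document. `build_events` itself is a hypothetical helper, the session payload is reduced to the `mode` field, and `event_id` fields are omitted for brevity.

```python
from typing import Any, Dict, List


def build_events(mode: str, fragments: List[str]) -> List[Dict[str, Any]]:
    """Return the client events for one utterance in the given session mode."""
    if mode not in ("server_commit", "commit"):
        raise ValueError(f"unknown mode: {mode}")

    events: List[Dict[str, Any]] = [
        {"type": "session.update", "session": {"mode": mode}},
    ]
    # Both modes stream text into the buffer the same way.
    for text in fragments:
        events.append({"type": "input_text_buffer.append", "text": text})
    # Only commit mode needs an explicit trigger; in server_commit mode
    # the server decides when to synthesize.
    if mode == "commit":
        events.append({"type": "input_text_buffer.commit"})
    events.append({"type": "session.finish"})
    return events


if __name__ == "__main__":
    for mode in ("server_commit", "commit"):
        types = [e["type"] for e in build_events(mode, ["Hello, ", "world."])]
        print(mode, "->", types)
```

The only difference in the output is the extra `input_text_buffer.commit` event in commit mode. In a longer session, a commit-mode client would send one such event each time it wants the buffered text synthesized.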

Server commit mode

Python

Create a Python file named server_commit.py in the same directory as tts_realtime_client.py and copy the following code into it.

server_commit.py

import os
import asyncio
import logging
import wave
from tts_realtime_client import TTSRealtimeClient, SessionMode
import pyaudio

# QwenTTS service configuration
# To use the instruction control feature, replace model with qwen3-tts-instruct-flash-realtime and uncomment the instructions lines in tts_realtime_client.py
# The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: wss://dashscope.aliyuncs.com/api-ws/v1/realtime?model=qwen3-tts-flash-realtime
URL = "wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime?model=qwen3-tts-flash-realtime"
# The API Keys for the Singapore and Beijing regions are different. Obtain an API Key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
# If you have not configured the environment variable, replace the following line with your Model Studio API Key: API_KEY="sk-xxx"
API_KEY = os.getenv("DASHSCOPE_API_KEY")

if not API_KEY:
    raise ValueError("Please set DASHSCOPE_API_KEY environment variable")

# Collect audio data
_audio_chunks = []
# Realtime playback related
_AUDIO_SAMPLE_RATE = 24000
_audio_pyaudio = pyaudio.PyAudio()
_audio_stream = None  # Will be opened at runtime

def _audio_callback(audio_bytes: bytes):
    """TTSRealtimeClient audio callback: realtime playback and caching"""
    global _audio_stream
    if _audio_stream is not None:
        try:
            _audio_stream.write(audio_bytes)
        except Exception as exc:
            logging.error(f"PyAudio playback error: {exc}")
    _audio_chunks.append(audio_bytes)
    logging.info(f"Received audio chunk: {len(audio_bytes)} bytes")

def _save_audio_to_file(filename: str = "output.wav", sample_rate: int = 24000) -> bool:
    """Save collected audio data to a WAV file"""
    if not _audio_chunks:
        logging.warning("No audio data to save")
        return False

    try:
        audio_data = b"".join(_audio_chunks)
        with wave.open(filename, 'wb') as wav_file:
            wav_file.setnchannels(1)  # Mono
            wav_file.setsampwidth(2)  # 16-bit
            wav_file.setframerate(sample_rate)
            wav_file.writeframes(audio_data)
        logging.info(f"Audio saved to: {filename}")
        return True
    except Exception as exc:
        logging.error(f"Failed to save audio: {exc}")
        return False

async def _produce_text(client: TTSRealtimeClient):
    """Send text fragments to the server"""
    text_fragments = [
        "Alibaba Cloud's large language model platform, Model Studio, is an all-in-one platform for developing and building large language model applications.",
        "Both developers and business users can deeply participate in the design and development of large language model applications.",
        "You can develop a large language model application in five minutes using a simple interface,",
        "or train a dedicated model in a few hours, allowing you to focus more energy on application innovation.",
    ]

    logging.info("Sending text fragments…")
    for text in text_fragments:
        logging.info(f"Sending fragment: {text}")
        await client.append_text(text)
        await asyncio.sleep(0.1)  # Brief delay between fragments

    # Wait for the server to finish internal processing before ending the session
    await asyncio.sleep(1.0)
    await client.finish_session()

async def _run_demo():
    """Run the complete demo"""
    global _audio_stream
    # Open PyAudio output stream
    _audio_stream = _audio_pyaudio.open(
        format=pyaudio.paInt16,
        channels=1,
        rate=_AUDIO_SAMPLE_RATE,
        output=True,
        frames_per_buffer=1024
    )

    client = TTSRealtimeClient(
        base_url=URL,
        api_key=API_KEY,
        voice="Cherry",
        mode=SessionMode.SERVER_COMMIT,
        audio_callback=_audio_callback
    )

    # Establish connection
    await client.connect()

    # Execute message handling and text sending in parallel
    consumer_task = asyncio.create_task(client.handle_messages())
    producer_task = asyncio.create_task(_produce_text(client))

    await producer_task  # Wait for text sending to complete

    # Wait for response.done
    await client.wait_for_response_done()

    # Close connection and cancel the consumer task
    await client.close()
    consumer_task.cancel()

    # Close the audio stream
    if _audio_stream is not None:
        _audio_stream.stop_stream()
        _audio_stream.close()
    _audio_pyaudio.terminate()

    # Save audio data
    os.makedirs("outputs", exist_ok=True)
    _save_audio_to_file(os.path.join("outputs", "qwen_tts_output.wav"))

def main():
    """Synchronous entry point"""
    logging.basicConfig(
        level=logging.INFO,
        format='%(asctime)s [%(levelname)s] %(message)s',
        datefmt='%Y-%m-%d %H:%M:%S'
    )
    logging.info("Starting QwenTTS Realtime Client demo…")
    asyncio.run(_run_demo())

if __name__ == "__main__":
    main()

Run server_commit.py to hear the audio generated by the Realtime API in real time. The script also saves the audio to outputs/qwen_tts_output.wav.
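
After the script finishes, the saved file can be checked with the standard `wave` module. The following is a small sketch: `describe_wav` is a hypothetical helper, and the path is the one server_commit.py writes to (outputs/qwen_tts_output.wav). For the demo settings, the header should report one channel, a 2-byte sample width, and a 24000 Hz sample rate.

```python
import os
import wave
from typing import Dict


def describe_wav(path: str) -> Dict[str, float]:
    """Read a WAV header and return channel count, sample width, rate, and duration."""
    with wave.open(path, "rb") as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
        return {
            "channels": wav_file.getnchannels(),
            "sample_width_bytes": wav_file.getsampwidth(),
            "sample_rate": rate,
            "duration_seconds": frames / rate,
        }


if __name__ == "__main__":
    path = os.path.join("outputs", "qwen_tts_output.wav")
    if os.path.exists(path):
        print(describe_wav(path))
    else:
        print(f"{path} not found; run server_commit.py first.")
```

A duration far shorter than the spoken text suggests that the connection closed before all audio chunks arrived.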

Java

Create a Java file named ServerCommit.java in the same directory as TTSRealtimeClient.java and copy the following code into it.

import javax.sound.sampled.*;
import java.io.*;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;

public class ServerCommit {
    // The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: wss://dashscope.aliyuncs.com/api-ws/v1/realtime?model=qwen3-tts-flash-realtime
    private static final String URL = "wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime?model=qwen3-tts-flash-realtime";
    // The API Keys for the Singapore and Beijing regions are different. Obtain an API Key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    // If you have not configured the environment variable, replace the following line with your Model Studio API Key: private static final String API_KEY = "sk-xxx";
    private static final String API_KEY = System.getenv("DASHSCOPE_API_KEY");
    private static final int SAMPLE_RATE = 24000;

    // Audio data cache
    private static final List<byte[]> audioChunks = new ArrayList<>();
    // Realtime playback queue
    private static final ConcurrentLinkedQueue<byte[]> playbackQueue = new ConcurrentLinkedQueue<>();
    private static final AtomicBoolean playing = new AtomicBoolean(true);

    public static void main(String[] args) throws Exception {
        if (API_KEY == null || API_KEY.isEmpty()) {
            throw new IllegalStateException("Please set the DASHSCOPE_API_KEY environment variable");
        }

        // Initialize audio playback
        AudioFormat format = new AudioFormat(SAMPLE_RATE, 16, 1, true, false);
        DataLine.Info info = new DataLine.Info(SourceDataLine.class, format);
        SourceDataLine audioLine = (SourceDataLine) AudioSystem.getLine(info);
        audioLine.open(format);
        audioLine.start();

        // Start the playback thread
        Thread playerThread = new Thread(() -> {
            while (playing.get() || !playbackQueue.isEmpty()) {
                byte[] chunk = playbackQueue.poll();
                if (chunk != null) {
                    audioLine.write(chunk, 0, chunk.length);
                } else {
                    try { Thread.sleep(10); } catch (InterruptedException ignored) {}
                }
            }
        });
        playerThread.start();

        // Create the TTS client
        // To use the instruction control feature, replace model with qwen3-tts-instruct-flash-realtime and uncomment the instructions lines in TTSRealtimeClient.java
        TTSRealtimeClient client = new TTSRealtimeClient(
            URL, API_KEY, "Cherry",
            TTSRealtimeClient.SessionMode.SERVER_COMMIT,
            audioData -> {
                playbackQueue.add(audioData);
                audioChunks.add(audioData);
                System.out.println("Received audio data: " + audioData.length + " bytes");
            }
        );

        client.connect();

        // Send text fragments
        String[] textFragments = {
            "Alibaba Cloud's large language model platform, Model Studio, is an all-in-one platform for developing and building large language model applications.",
            "Both developers and business users can deeply participate in the design and development of large language model applications.",
            "You can develop a large language model application in five minutes using a simple interface,",
            "or train a dedicated model in a few hours, allowing you to focus more energy on application innovation.",
        };

        System.out.println("Starting to send text...");
        for (String text : textFragments) {
            System.out.println("Sending fragment: " + text);
            client.appendText(text);
            Thread.sleep(100);
        }

        Thread.sleep(1000);
        client.finishSession();

        // Wait for the response to complete
        client.waitForResponseDone();
        client.waitForSessionFinished();
        client.close();

        // Wait for playback to complete
        playing.set(false);
        playerThread.join();
        audioLine.drain();
        audioLine.close();

        // Save the audio file
        saveWav("output.wav");
        System.out.println("Done");
    }

    private static void saveWav(String filename) throws IOException {
        if (audioChunks.isEmpty()) {
            System.out.println("No audio data to save");
            return;
        }
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        for (byte[] chunk : audioChunks) {
            bos.write(chunk);
        }
        byte[] allAudio = bos.toByteArray();
        AudioFormat format = new AudioFormat(SAMPLE_RATE, 16, 1, true, false);
        AudioInputStream ais = new AudioInputStream(
            new ByteArrayInputStream(allAudio), format, allAudio.length / 2);
        new File("outputs").mkdirs();
        AudioSystem.write(ais, AudioFileFormat.Type.WAVE,
            new File("outputs/" + filename));
        System.out.println("Audio saved to: outputs/" + filename);
    }
}

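Compile and run ServerCommit.java to hear the audio generated by the Realtime API in real time. The program also saves the audio to outputs/output.wav.

Commit mode

Python

Create a Python file named commit.py in the same directory as tts_realtime_client.py and copy the following code into it.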

commit.py

import os
import asyncio
import logging
import wave
from tts_realtime_client import TTSRealtimeClient, SessionMode
import pyaudio

# QwenTTS service configuration
# To use the instruction control feature, replace model with qwen3-tts-instruct-flash-realtime and uncomment the instructions lines in tts_realtime_client.py
# The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: wss://dashscope.aliyuncs.com/api-ws/v1/realtime?model=qwen3-tts-flash-realtime
URL = "wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime?model=qwen3-tts-flash-realtime"
# The API Keys for the Singapore and Beijing regions are different. Obtain an API Key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
# If you have not configured the environment variable, replace the following line with your Model Studio API Key: API_KEY="sk-xxx"
API_KEY = os.getenv("DASHSCOPE_API_KEY")

if not API_KEY:
    raise ValueError("Please set DASHSCOPE_API_KEY environment variable")

# Collect audio data
_audio_chunks = []
_AUDIO_SAMPLE_RATE = 24000
_audio_pyaudio = pyaudio.PyAudio()
_audio_stream = None

def _audio_callback(audio_bytes: bytes):
    """TTSRealtimeClient audio callback: realtime playback and caching"""
    global _audio_stream
    if _audio_stream is not None:
        try:
            _audio_stream.write(audio_bytes)
        except Exception as exc:
            logging.error(f"PyAudio playback error: {exc}")
    _audio_chunks.append(audio_bytes)
    logging.info(f"Received audio chunk: {len(audio_bytes)} bytes")

def _save_audio_to_file(filename: str = "output.wav", sample_rate: int = 24000) -> bool:
    """Save collected audio data to a WAV file"""
    if not _audio_chunks:
        logging.warning("No audio data to save")
        return False

    try:
        audio_data = b"".join(_audio_chunks)
        with wave.open(filename, 'wb') as wav_file:
            wav_file.setnchannels(1)  # Mono
            wav_file.setsampwidth(2)  # 16-bit
            wav_file.setframerate(sample_rate)
            wav_file.writeframes(audio_data)
        logging.info(f"Audio saved to: {filename}")
        return True
    except Exception as exc:
        logging.error(f"Failed to save audio: {exc}")
        return False

async def _user_input_loop(client: TTSRealtimeClient):
    """Continuously read user input and send text. When the user enters empty text, send a commit event and end the current session"""
    print("Enter text (press Enter directly to send a commit event and end the current session, press Ctrl+C or Ctrl+D to exit the program):")
    
    while True:
        try:
            user_text = input("> ")
            if not user_text:  # User entered empty text
                # Empty input is treated as end of a conversation: commit buffer -> end session -> break out of loop
                logging.info("Empty input, sending commit event and ending current session")
                await client.commit_text_buffer()
                # Wait briefly for the server to process the commit, preventing premature session end from losing audio
                await asyncio.sleep(0.3)
                await client.finish_session()
                break  # Exit the user input loop directly, no need to press Enter again
            else:
                logging.info(f"Sending text: {user_text}")
                await client.append_text(user_text)
                
        except EOFError:  # User pressed Ctrl+D
            break
        except KeyboardInterrupt:  # User pressed Ctrl+C
            break
    
    # Input finished; _run_demo() waits for the response and closes the connection
    logging.info("Ending session...")

async def _run_demo():
    """Run the complete demo"""
    global _audio_stream
    # Open PyAudio output stream
    _audio_stream = _audio_pyaudio.open(
        format=pyaudio.paInt16,
        channels=1,
        rate=_AUDIO_SAMPLE_RATE,
        output=True,
        frames_per_buffer=1024
    )

    client = TTSRealtimeClient(
        base_url=URL,
        api_key=API_KEY,
        voice="Cherry",
        mode=SessionMode.COMMIT,  # Changed to COMMIT mode
        audio_callback=_audio_callback
    )

    # Establish connection
    await client.connect()

    # Execute message handling and user input in parallel
    consumer_task = asyncio.create_task(client.handle_messages())
    producer_task = asyncio.create_task(_user_input_loop(client))

    await producer_task  # Wait for user input to complete

    # Wait for response.done
    await client.wait_for_response_done()

    # Close connection and cancel the consumer task
    await client.close()
    consumer_task.cancel()

    # Close the audio stream
    if _audio_stream is not None:
        _audio_stream.stop_stream()
        _audio_stream.close()
    _audio_pyaudio.terminate()

    # Save audio data
    os.makedirs("outputs", exist_ok=True)
    _save_audio_to_file(os.path.join("outputs", "qwen_tts_output.wav"))

def main():
    logging.basicConfig(
        level=logging.INFO,
        format='%(asctime)s [%(levelname)s] %(message)s',
        datefmt='%Y-%m-%d %H:%M:%S'
    )
    logging.info("Starting QwenTTS Realtime Client demo…")
    asyncio.run(_run_demo())

if __name__ == "__main__":
    main()

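Run commit.py. You can enter text several times to queue it for synthesis. When you press Enter without entering any text, the buffer is committed and the audio returned by the Realtime API plays through your speaker.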
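After a run, commit.py also saves the collected audio to outputs/qwen_tts_output.wav as mono, 16-bit, 24 kHz PCM. The following is a minimal sketch for checking that file with the standard-library wave module. The helper name wav_summary is hypothetical and is not part of the sample code.

```python
import os
import wave


def wav_summary(path: str) -> dict:
    """Return the basic properties of a PCM WAV file."""
    with wave.open(path, "rb") as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
        return {
            "channels": wav_file.getnchannels(),
            "sample_width_bytes": wav_file.getsampwidth(),
            "sample_rate": rate,
            # Duration is the frame count divided by frames per second
            "duration_seconds": frames / rate,
        }


if __name__ == "__main__":
    # Path written by commit.py; adjust it if you changed the output location
    output_path = os.path.join("outputs", "qwen_tts_output.wav")
    if os.path.exists(output_path):
        print(wav_summary(output_path))
    else:
        print(f"{output_path} not found; run commit.py first")
```

If the reported channel count, sample width, or sample rate differs from 1, 2, and 24000, the playback settings and the saved file are probably out of sync.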

Java

Create a Java file named Commit.java in the same directory as TTSRealtimeClient.java, and copy the following code into it.

import javax.sound.sampled.*;
import java.io.*;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;

public class Commit {
    // The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: wss://dashscope.aliyuncs.com/api-ws/v1/realtime?model=qwen3-tts-flash-realtime
    private static final String URL = "wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime?model=qwen3-tts-flash-realtime";
    // The API Keys for the Singapore and Beijing regions are different. Obtain an API Key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    // If you have not configured the environment variable, replace the following line with your Model Studio API Key: private static final String API_KEY = "sk-xxx";
    private static final String API_KEY = System.getenv("DASHSCOPE_API_KEY");
    private static final int SAMPLE_RATE = 24000;

    private static final List<byte[]> audioChunks = new ArrayList<>();
    private static final ConcurrentLinkedQueue<byte[]> playbackQueue = new ConcurrentLinkedQueue<>();
    private static final AtomicBoolean playing = new AtomicBoolean(true);

    public static void main(String[] args) throws Exception {
        if (API_KEY == null || API_KEY.isEmpty()) {
            throw new IllegalStateException("Please set the DASHSCOPE_API_KEY environment variable");
        }

        // Initialize audio playback
        AudioFormat format = new AudioFormat(SAMPLE_RATE, 16, 1, true, false);
        DataLine.Info info = new DataLine.Info(SourceDataLine.class, format);
        SourceDataLine audioLine = (SourceDataLine) AudioSystem.getLine(info);
        audioLine.open(format);
        audioLine.start();

        // Start the playback thread
        Thread playerThread = new Thread(() -> {
            while (playing.get() || !playbackQueue.isEmpty()) {
                byte[] chunk = playbackQueue.poll();
                if (chunk != null) {
                    audioLine.write(chunk, 0, chunk.length);
                } else {
                    try { Thread.sleep(10); } catch (InterruptedException ignored) {}
                }
            }
        });
        playerThread.start();

        // Create the TTS client (commit mode)
        // To use the instruction control feature, replace model with qwen3-tts-instruct-flash-realtime and uncomment the instructions lines in TTSRealtimeClient.java
        TTSRealtimeClient client = new TTSRealtimeClient(
            URL, API_KEY, "Cherry",
            TTSRealtimeClient.SessionMode.COMMIT,
            audioData -> {
                playbackQueue.add(audioData);
                audioChunks.add(audioData);
                System.out.println("Received audio data: " + audioData.length + " bytes");
            }
        );

        client.connect();

        // Interactive input
        System.out.println("Enter text (press Enter directly to send a commit event and end the current session, press Ctrl+D to exit the program):");
        Scanner scanner = new Scanner(System.in);
        while (true) {
            System.out.print("> ");
            if (!scanner.hasNextLine()) {
                client.finishSession();
                break;
            }
            String userText = scanner.nextLine();
            if (userText.isEmpty()) {
                // Empty input: commit the buffer and end the session
                System.out.println("Empty input, sending commit event and ending current session");
                client.commitTextBuffer();
                Thread.sleep(300);
                client.finishSession();
                break;
            } else {
                System.out.println("Sending text: " + userText);
                client.appendText(userText);
            }
        }
        scanner.close();

        // Wait for the response to complete
        client.waitForResponseDone();
        client.waitForSessionFinished();
        client.close();

        // Wait for playback to complete
        playing.set(false);
        playerThread.join();
        audioLine.drain();
        audioLine.close();

        // Save the audio file
        saveWav("output.wav");
        System.out.println("Done");
    }

    private static void saveWav(String filename) throws IOException {
        if (audioChunks.isEmpty()) {
            System.out.println("No audio data to save");
            return;
        }
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        for (byte[] chunk : audioChunks) {
            bos.write(chunk);
        }
        byte[] allAudio = bos.toByteArray();
        AudioFormat format = new AudioFormat(SAMPLE_RATE, 16, 1, true, false);
        AudioInputStream ais = new AudioInputStream(
            new ByteArrayInputStream(allAudio), format, allAudio.length / 2);
        new File("outputs").mkdirs();
        AudioSystem.write(ais, AudioFileFormat.Type.WAVE,
            new File("outputs/" + filename));
        System.out.println("Audio saved to: outputs/" + filename);
    }
}

Compile and run Commit.java. You can enter text several times to queue it for synthesis. When you then press Enter without entering any text, the audio plays through your speaker.

Speech synthesis with a cloned voice

Voice cloning does not provide a preview audio clip. To evaluate the result, use the speech synthesis API. We recommend using short text for the first test.

The following example synthesizes speech with a voice cloned from an audio sample. It extends the DashScope SDK server commit mode code from the System voice tab, with voice set to the cloned voice.

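Key principle: the voice cloning model (target_model) must match the speech synthesis model (model). If they differ, synthesis fails.
This example uses the local audio file voice.mp3 for voice cloning. Replace it with your own audio file when you run the code.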
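The matching rule can be enforced locally before any request is sent. The following is a minimal sketch, assuming a strict string comparison is the right check. The helper ensure_models_match is hypothetical and is not part of the DashScope SDK.

```python
def ensure_models_match(target_model: str, synthesis_model: str) -> None:
    """Raise ValueError if the enrollment target model differs from the synthesis model."""
    if target_model != synthesis_model:
        raise ValueError(
            f"target_model {target_model!r} does not match "
            f"synthesis model {synthesis_model!r}"
        )


# Reusing one constant for both values is the simplest way to keep them aligned
TARGET_MODEL = "qwen3-tts-vc-realtime-2026-01-15"
ensure_models_match(TARGET_MODEL, TARGET_MODEL)  # passes silently
```

Failing early this way surfaces a mismatch as a clear local error before a voice is enrolled for the wrong model.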

Python

# coding=utf-8
# Installation instructions for pyaudio:
# APPLE Mac OS X
#   brew install portaudio
#   pip install pyaudio
# Debian/Ubuntu
#   sudo apt-get install python-pyaudio python3-pyaudio
#   or
#   pip install pyaudio
# CentOS
#   sudo yum install -y portaudio portaudio-devel && pip install pyaudio
# Microsoft Windows
#   python -m pip install pyaudio

import pyaudio
import os
import requests
import base64
import pathlib
import threading
import time
import dashscope  # DashScope Python SDK version must be 1.23.9 or later
from dashscope.audio.qwen_tts_realtime import QwenTtsRealtime, QwenTtsRealtimeCallback, AudioFormat

# ======= Constant configuration =======
DEFAULT_TARGET_MODEL = "qwen3-tts-vc-realtime-2026-01-15"  # Voice cloning and speech synthesis must use the same model
DEFAULT_PREFERRED_NAME = "guanyu"
DEFAULT_AUDIO_MIME_TYPE = "audio/mpeg"
VOICE_FILE_PATH = "voice.mp3"  # Relative path to the local audio file used for voice cloning

TEXT_TO_SYNTHESIZE = [
    'Right? I love supermarkets like this.',
    'Especially during Chinese New Year',
    'When I go shopping',
    'I feel',
    'Extremely happy!',
    'And want to buy so many things!'
]

def create_voice(file_path: str,
                 target_model: str = DEFAULT_TARGET_MODEL,
                 preferred_name: str = DEFAULT_PREFERRED_NAME,
                 audio_mime_type: str = DEFAULT_AUDIO_MIME_TYPE) -> str:
    """
    Create a voice and return the voice parameter
    """
    # The API Keys for the Singapore and Beijing regions are different. Obtain an API Key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    # If you have not configured the environment variable, replace the following line with your Model Studio API Key: api_key = "sk-xxx"
    api_key = os.getenv("DASHSCOPE_API_KEY")

    file_path_obj = pathlib.Path(file_path)
    if not file_path_obj.exists():
        raise FileNotFoundError(f"Audio file not found: {file_path}")

    base64_str = base64.b64encode(file_path_obj.read_bytes()).decode()
    data_uri = f"data:{audio_mime_type};base64,{base64_str}"

    # The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
    url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"
    payload = {
        "model": "qwen-voice-enrollment", # Do not modify this value
        "input": {
            "action": "create",
            "target_model": target_model,
            "preferred_name": preferred_name,
            "audio": {"data": data_uri}
        }
    }
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }

    resp = requests.post(url, json=payload, headers=headers)
    if resp.status_code != 200:
        raise RuntimeError(f"Failed to create voice: {resp.status_code}, {resp.text}")

    try:
        return resp.json()["output"]["voice"]
    except (KeyError, ValueError) as e:
        raise RuntimeError(f"Failed to parse voice response: {e}")

def init_dashscope_api_key():
    """
    Initialize the DashScope SDK API key
    """
    # The API Keys for the Singapore and Beijing regions are different. Obtain an API Key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    # If you have not configured the environment variable, replace the following line with your Model Studio API Key: dashscope.api_key = "sk-xxx"
    dashscope.api_key = os.getenv("DASHSCOPE_API_KEY")

# ======= Callback class =======
class MyCallback(QwenTtsRealtimeCallback):
    """
    Custom TTS streaming callback
    """
    def __init__(self):
        self.complete_event = threading.Event()
        self._player = pyaudio.PyAudio()
        self._stream = self._player.open(
            format=pyaudio.paInt16, channels=1, rate=24000, output=True
        )

    def on_open(self) -> None:
        print('[TTS] Connection established')

    def on_close(self, close_status_code, close_msg) -> None:
        self._stream.stop_stream()
        self._stream.close()
        self._player.terminate()
        print(f'[TTS] Connection closed code={close_status_code}, msg={close_msg}')

    def on_event(self, response: dict) -> None:
        try:
            event_type = response.get('type', '')
            if event_type == 'session.created':
                print(f'[TTS] Session started: {response["session"]["id"]}')
            elif event_type == 'response.audio.delta':
                audio_data = base64.b64decode(response['delta'])
                self._stream.write(audio_data)
            elif event_type == 'response.done':
                print(f'[TTS] Response complete, Response ID: {qwen_tts_realtime.get_last_response_id()}')
            elif event_type == 'session.finished':
                print('[TTS] Session ended')
                self.complete_event.set()
        except Exception as e:
            print(f'[Error] Exception handling callback event: {e}')

    def wait_for_finished(self):
        self.complete_event.wait()

# ======= Main execution logic =======
if __name__ == '__main__':
    init_dashscope_api_key()
    print('[System] Initializing Qwen TTS Realtime ...')

    callback = MyCallback()
    qwen_tts_realtime = QwenTtsRealtime(
        model=DEFAULT_TARGET_MODEL,
        callback=callback,
        # The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: wss://dashscope.aliyuncs.com/api-ws/v1/realtime
        url='wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime'
    )
    qwen_tts_realtime.connect()
    
    qwen_tts_realtime.update_session(
        voice=create_voice(VOICE_FILE_PATH), # Replace the voice parameter with the custom voice generated by voice cloning
        response_format=AudioFormat.PCM_24000HZ_MONO_16BIT,
        mode='server_commit'
    )

    for text_chunk in TEXT_TO_SYNTHESIZE:
        print(f'[Sending text]: {text_chunk}')
        qwen_tts_realtime.append_text(text_chunk)
        time.sleep(0.1)

    qwen_tts_realtime.finish()
    callback.wait_for_finished()

    print(f'[Metric] session_id={qwen_tts_realtime.get_session_id()}, '
          f'first_audio_delay={qwen_tts_realtime.get_first_audio_delay()}s')

Java

Import the Gson dependency. If you use Maven or Gradle, add the dependency as follows.

Maven

Add the following to pom.xml.

<!-- https://mvnrepository.com/artifact/com.google.code.gson/gson -->
<dependency>
    <groupId>com.google.code.gson</groupId>
    <artifactId>gson</artifactId>
    <version>2.13.1</version>
</dependency>

Gradle

Add the following to build.gradle.

// https://mvnrepository.com/artifact/com.google.code.gson/gson
implementation("com.google.code.gson:gson:2.13.1")

import com.alibaba.dashscope.audio.qwen_tts_realtime.*;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.google.gson.Gson;
import com.google.gson.JsonObject;

import javax.sound.sampled.*;
import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.*;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Queue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;

public class Main {
    // ===== Constant definitions =====
    // Voice cloning and speech synthesis must use the same model
    private static final String TARGET_MODEL = "qwen3-tts-vc-realtime-2026-01-15";
    private static final String PREFERRED_NAME = "guanyu";
    // Relative path to the local audio file used for voice cloning
    private static final String AUDIO_FILE = "voice.mp3";
    private static final String AUDIO_MIME_TYPE = "audio/mpeg";
    private static String[] textToSynthesize = {
            "Right? I love supermarkets like this.",
            "Especially during Chinese New Year",
            "When I go shopping",
            "I feel",
            "Extremely happy!",
            "And want to buy so many things!"
    };

    // Generate data URI
    public static String toDataUrl(String filePath) throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get(filePath));
        String encoded = Base64.getEncoder().encodeToString(bytes);
        return "data:" + AUDIO_MIME_TYPE + ";base64," + encoded;
    }

    // Call the API to create a voice
    public static String createVoice() throws Exception {
        // The API Keys for the Singapore and Beijing regions are different. Obtain an API Key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
        // If you have not configured the environment variable, replace the following line with your Model Studio API Key: String apiKey = "sk-xxx"
        String apiKey = System.getenv("DASHSCOPE_API_KEY");

        String jsonPayload =
                "{"
                        + "\"model\": \"qwen-voice-enrollment\"," // Do not modify this value
                        + "\"input\": {"
                        +     "\"action\": \"create\","
                        +     "\"target_model\": \"" + TARGET_MODEL + "\","
                        +     "\"preferred_name\": \"" + PREFERRED_NAME + "\","
                        +     "\"audio\": {"
                        +         "\"data\": \"" + toDataUrl(AUDIO_FILE) + "\""
                        +     "}"
                        + "}"
                        + "}";

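        // The following is the URL for the Singapore region, matching the WebSocket URL used in main(). If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
        HttpURLConnection con = (HttpURLConnection) new URL("https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization").openConnection();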
        con.setRequestMethod("POST");
        con.setRequestProperty("Authorization", "Bearer " + apiKey);
        con.setRequestProperty("Content-Type", "application/json");
        con.setDoOutput(true);

        try (OutputStream os = con.getOutputStream()) {
            os.write(jsonPayload.getBytes(StandardCharsets.UTF_8));
        }

        int status = con.getResponseCode();
        System.out.println("HTTP status code: " + status);

        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(status >= 200 && status < 300 ? con.getInputStream() : con.getErrorStream(),
                        StandardCharsets.UTF_8))) {
            StringBuilder response = new StringBuilder();
            String line;
            while ((line = br.readLine()) != null) {
                response.append(line);
            }
            System.out.println("Response body: " + response);

            if (status == 200) {
                JsonObject jsonObj = new Gson().fromJson(response.toString(), JsonObject.class);
                return jsonObj.getAsJsonObject("output").get("voice").getAsString();
            }
            throw new IOException("Failed to create voice: " + status + " - " + response);
        }
    }

    // Realtime PCM audio player class
    public static class RealtimePcmPlayer {
        private int sampleRate;
        private SourceDataLine line;
        private AudioFormat audioFormat;
        private Thread decoderThread;
        private Thread playerThread;
        private AtomicBoolean stopped = new AtomicBoolean(false);
        private Queue<String> b64AudioBuffer = new ConcurrentLinkedQueue<>();
        private Queue<byte[]> RawAudioBuffer = new ConcurrentLinkedQueue<>();

        // Constructor: initialize audio format and audio line
        public RealtimePcmPlayer(int sampleRate) throws LineUnavailableException {
            this.sampleRate = sampleRate;
            this.audioFormat = new AudioFormat(this.sampleRate, 16, 1, true, false);
            DataLine.Info info = new DataLine.Info(SourceDataLine.class, audioFormat);
            line = (SourceDataLine) AudioSystem.getLine(info);
            line.open(audioFormat);
            line.start();
            decoderThread = new Thread(new Runnable() {
                @Override
                public void run() {
                    while (!stopped.get()) {
                        String b64Audio = b64AudioBuffer.poll();
                        if (b64Audio != null) {
                            byte[] rawAudio = Base64.getDecoder().decode(b64Audio);
                            RawAudioBuffer.add(rawAudio);
                        } else {
                            try {
                                Thread.sleep(100);
                            } catch (InterruptedException e) {
                                throw new RuntimeException(e);
                            }
                        }
                    }
                }
            });
            playerThread = new Thread(new Runnable() {
                @Override
                public void run() {
                    while (!stopped.get()) {
                        byte[] rawAudio = RawAudioBuffer.poll();
                        if (rawAudio != null) {
                            try {
                                playChunk(rawAudio);
                            } catch (IOException e) {
                                throw new RuntimeException(e);
                            } catch (InterruptedException e) {
                                throw new RuntimeException(e);
                            }
                        } else {
                            try {
                                Thread.sleep(100);
                            } catch (InterruptedException e) {
                                throw new RuntimeException(e);
                            }
                        }
                    }
                }
            });
            decoderThread.start();
            playerThread.start();
        }

        // Play an audio chunk and block until playback is complete
        private void playChunk(byte[] chunk) throws IOException, InterruptedException {
            if (chunk == null || chunk.length == 0) return;

            int bytesWritten = 0;
            while (bytesWritten < chunk.length) {
                bytesWritten += line.write(chunk, bytesWritten, chunk.length - bytesWritten);
            }
            int audioLength = chunk.length / (this.sampleRate*2/1000);
            // Wait for most of the chunk to play; clamp to zero so that short chunks do not pass a negative value to Thread.sleep
            Thread.sleep(Math.max(0, audioLength - 10));
        }

        public void write(String b64Audio) {
            b64AudioBuffer.add(b64Audio);
        }

        public void cancel() {
            b64AudioBuffer.clear();
            RawAudioBuffer.clear();
        }

        public void waitForComplete() throws InterruptedException {
            while (!b64AudioBuffer.isEmpty() || !RawAudioBuffer.isEmpty()) {
                Thread.sleep(100);
            }
            line.drain();
        }

        public void shutdown() throws InterruptedException {
            stopped.set(true);
            decoderThread.join();
            playerThread.join();
            if (line != null && line.isRunning()) {
                line.drain();
                line.close();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        QwenTtsRealtimeParam param = QwenTtsRealtimeParam.builder()
                .model(TARGET_MODEL)
                // The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: wss://dashscope.aliyuncs.com/api-ws/v1/realtime
                .url("wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime")
                // The API Keys for the Singapore and Beijing regions are different. Obtain an API Key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
                // If you have not configured the environment variable, replace the following line with your Model Studio API Key: .apikey("sk-xxx")
                .apikey(System.getenv("DASHSCOPE_API_KEY"))
                .build();
        AtomicReference<CountDownLatch> completeLatch = new AtomicReference<>(new CountDownLatch(1));
        final AtomicReference<QwenTtsRealtime> qwenTtsRef = new AtomicReference<>(null);

        // Create a realtime audio player instance
        RealtimePcmPlayer audioPlayer = new RealtimePcmPlayer(24000);

        QwenTtsRealtime qwenTtsRealtime = new QwenTtsRealtime(param, new QwenTtsRealtimeCallback() {
            @Override
            public void onOpen() {
                // Handle connection establishment
            }
            @Override
            public void onEvent(JsonObject message) {
                String type = message.get("type").getAsString();
                switch(type) {
                    case "session.created":
                        // Handle session creation
                        break;
                    case "response.audio.delta":
                        String recvAudioB64 = message.get("delta").getAsString();
                        // Play audio in realtime
                        audioPlayer.write(recvAudioB64);
                        break;
                    case "response.done":
                        // Handle response completion
                        break;
                    case "session.finished":
                        // Handle session end
                        completeLatch.get().countDown();
                        break;
                    default:
                        break;
                }
            }
            @Override
            public void onClose(int code, String reason) {
                // Handle connection closure
            }
        });
        qwenTtsRef.set(qwenTtsRealtime);
        try {
            qwenTtsRealtime.connect();
        } catch (NoApiKeyException e) {
            throw new RuntimeException(e);
        }
        QwenTtsRealtimeConfig config = QwenTtsRealtimeConfig.builder()
                .voice(createVoice()) // Replace the voice parameter with the custom voice generated by voice cloning
                .responseFormat(QwenTtsRealtimeAudioFormat.PCM_24000HZ_MONO_16BIT)
                .mode("server_commit")
                .build();
        qwenTtsRealtime.updateSession(config);
        for (String text:textToSynthesize) {
            qwenTtsRealtime.appendText(text);
            Thread.sleep(100);
        }
        qwenTtsRealtime.finish();
        completeLatch.get().await();

        // Wait for audio playback to complete and shut down the player
        audioPlayer.waitForComplete();
        audioPlayer.shutdown();
        System.exit(0);
    }
}

Speech synthesis with a designed voice

The voice design feature returns a preview audio clip. Listen to the preview before using the voice for synthesis to confirm that it sounds as expected. This avoids unnecessary API calls.

Create a custom voice and preview it. If you are satisfied with it, continue. Otherwise, create it again.
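The create, preview, and recreate cycle can be sketched as a small retry loop. This is an illustration only: create_voice and is_acceptable are hypothetical callables that you supply (for example, a wrapper around the voice design request, and a prompt asking whether the saved preview sounds right), and max_attempts is an assumed safety limit.

```python
from typing import Callable, Optional, Tuple


def design_voice_until_accepted(
    create_voice: Callable[[], Tuple[Optional[str], Optional[str]]],
    is_acceptable: Callable[[str], bool],
    max_attempts: int = 3,
) -> Optional[str]:
    """Create voices until one preview is accepted or the attempts run out.

    create_voice returns (voice_name, preview_path), or (None, None) on failure.
    is_acceptable receives the preview path and returns True to keep the voice.
    """
    for _ in range(max_attempts):
        voice_name, preview_path = create_voice()
        if voice_name is None or preview_path is None:
            continue  # Creation failed; try again
        if is_acceptable(preview_path):
            return voice_name  # Use this name as the voice parameter
    return None  # No preview was accepted
```

Capping the number of attempts keeps the loop from issuing an unbounded number of creation requests when no preview is accepted.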

Python

import requests
import base64
import os

def create_voice_and_play():
    # API keys differ between Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    # If the environment variable is not set, replace the following line with your Model Studio API key: api_key = "sk-xxx"
    api_key = os.getenv("DASHSCOPE_API_KEY")
    
    if not api_key:
        print("Error: DASHSCOPE_API_KEY environment variable not found. Please set the API key first.")
        return None, None, None
    
    # Prepare request data
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }
    
    data = {
        "model": "qwen-voice-design",
        "input": {
            "action": "create",
            "target_model": "qwen3-tts-vd-realtime-2026-01-15",
            "voice_prompt": "A composed middle-aged male announcer with a deep, rich and magnetic voice, a steady speaking speed and clear articulation, is suitable for news broadcasting or documentary commentary.",
            "preview_text": "Dear listeners, hello everyone. Welcome to the evening news.",
            "preferred_name": "announcer",
            "language": "en"
        },
        "parameters": {
            "sample_rate": 24000,
            "response_format": "wav"
        }
    }
    
    # The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
    url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"
    
    try:
        # Send the request
        response = requests.post(
            url,
            headers=headers,
            json=data,
            timeout=60  # Add a timeout setting
        )
        
        if response.status_code == 200:
            result = response.json()
            
            # Get the voice name
            voice_name = result["output"]["voice"]
            print(f"Voice name: {voice_name}")
            
            # Get the preview audio data
            base64_audio = result["output"]["preview_audio"]["data"]
            
            # Decode the Base64 audio data
            audio_bytes = base64.b64decode(base64_audio)
            
            # Save the audio file locally
            filename = f"{voice_name}_preview.wav"
            
            # Write the audio data to a local file
            with open(filename, 'wb') as f:
                f.write(audio_bytes)
            
            print(f"Audio saved to local file: {filename}")
            print(f"File path: {os.path.abspath(filename)}")
            
            return voice_name, audio_bytes, filename
        else:
            print(f"Request failed with status code: {response.status_code}")
            print(f"Response content: {response.text}")
            return None, None, None
            
    except requests.exceptions.RequestException as e:
        print(f"A network request error occurred: {e}")
        return None, None, None
    except KeyError as e:
        print(f"Response data format error, missing required field: {e}")
        print(f"Response content: {response.text if 'response' in locals() else 'No response'}")
        return None, None, None
    except Exception as e:
        print(f"An unknown error occurred: {e}")
        return None, None, None

if __name__ == "__main__":
    print("Starting to create voice...")
    voice_name, audio_data, saved_filename = create_voice_and_play()
    
    if voice_name:
        print(f"\nSuccessfully created voice '{voice_name}'")
        print(f"Audio file saved as: '{saved_filename}'")
        print(f"File size: {os.path.getsize(saved_filename)} bytes")
    else:
        print("\nVoice creation failed")

Java

Add the Gson dependency to your project.

Maven

Add the following to pom.xml.

<!-- https://mvnrepository.com/artifact/com.google.code.gson/gson -->
<dependency>
    <groupId>com.google.code.gson</groupId>
    <artifactId>gson</artifactId>
    <version>2.13.1</version>
</dependency>

Gradle

Add the following to build.gradle.

// https://mvnrepository.com/artifact/com.google.code.gson/gson
implementation("com.google.code.gson:gson:2.13.1")

import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Base64;

public class Main {
    public static void main(String[] args) {
        Main example = new Main();
        example.createVoice();
    }

    public void createVoice() {
        // API keys differ between Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
        // If the environment variable is not set, replace the following line with your Model Studio API key: String apiKey = "sk-xxx"
        String apiKey = System.getenv("DASHSCOPE_API_KEY");

        // Create the JSON request body string
        String jsonBody = "{\n" +
                "    \"model\": \"qwen-voice-design\",\n" +
                "    \"input\": {\n" +
                "        \"action\": \"create\",\n" +
                "        \"target_model\": \"qwen3-tts-vd-realtime-2026-01-15\",\n" +
                "        \"voice_prompt\": \"A composed middle-aged male announcer with a deep, rich and magnetic voice, a steady speaking speed and clear articulation, is suitable for news broadcasting or documentary commentary.\",\n" +
                "        \"preview_text\": \"Dear listeners, hello everyone. Welcome to the evening news.\",\n" +
                "        \"preferred_name\": \"announcer\",\n" +
                "        \"language\": \"en\"\n" +
                "    },\n" +
                "    \"parameters\": {\n" +
                "        \"sample_rate\": 24000,\n" +
                "        \"response_format\": \"wav\"\n" +
                "    }\n" +
                "}";

        HttpURLConnection connection = null;
        try {
            // The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
            URL url = new URL("https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization");
            connection = (HttpURLConnection) url.openConnection();

            // Set the request method and headers
            connection.setRequestMethod("POST");
            connection.setRequestProperty("Authorization", "Bearer " + apiKey);
            connection.setRequestProperty("Content-Type", "application/json");
            connection.setDoOutput(true);
            connection.setDoInput(true);

            // Send the request body
            try (OutputStream os = connection.getOutputStream()) {
                byte[] input = jsonBody.getBytes("UTF-8");
                os.write(input, 0, input.length);
                os.flush();
            }

            // Get the response
            int responseCode = connection.getResponseCode();
            if (responseCode == HttpURLConnection.HTTP_OK) {
                // Read the response content
                StringBuilder response = new StringBuilder();
                try (BufferedReader br = new BufferedReader(
                        new InputStreamReader(connection.getInputStream(), "UTF-8"))) {
                    String responseLine;
                    while ((responseLine = br.readLine()) != null) {
                        response.append(responseLine.trim());
                    }
                }

                // Parse the JSON response
                JsonObject jsonResponse = JsonParser.parseString(response.toString()).getAsJsonObject();
                JsonObject outputObj = jsonResponse.getAsJsonObject("output");
                JsonObject previewAudioObj = outputObj.getAsJsonObject("preview_audio");

                // Get the voice name
                String voiceName = outputObj.get("voice").getAsString();
                System.out.println("Voice name: " + voiceName);

                // Get the Base64-encoded audio data
                String base64Audio = previewAudioObj.get("data").getAsString();

                // Decode the Base64 audio data
                byte[] audioBytes = Base64.getDecoder().decode(base64Audio);

                // Save the audio to a local file
                String filename = voiceName + "_preview.wav";
                saveAudioToFile(audioBytes, filename);

                System.out.println("Audio saved to local file: " + filename);

            } else {
                // Read the error response
                StringBuilder errorResponse = new StringBuilder();
                try (BufferedReader br = new BufferedReader(
                        new InputStreamReader(connection.getErrorStream(), "UTF-8"))) {
                    String responseLine;
                    while ((responseLine = br.readLine()) != null) {
                        errorResponse.append(responseLine.trim());
                    }
                }

                System.out.println("Request failed with status code: " + responseCode);
                System.out.println("Error response: " + errorResponse.toString());
            }

        } catch (Exception e) {
            System.err.println("An error occurred during the request: " + e.getMessage());
            e.printStackTrace();
        } finally {
            if (connection != null) {
                connection.disconnect();
            }
        }
    }

    private void saveAudioToFile(byte[] audioBytes, String filename) {
        try {
            File file = new File(filename);
            try (FileOutputStream fos = new FileOutputStream(file)) {
                fos.write(audioBytes);
            }
            System.out.println("Audio saved to: " + file.getAbsolutePath());
        } catch (IOException e) {
            System.err.println("An error occurred while saving the audio file: " + e.getMessage());
            e.printStackTrace();
        }
    }
}
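
Both voice design samples above read the voice name from output.voice and the Base64-encoded preview from output.preview_audio.data. The following is a minimal offline sketch of that parsing step, assuming the response has only the shape those samples rely on. The helper name `extract_preview` and the payload values are hypothetical and are not a real API response.

```python
import base64
from typing import Tuple


def extract_preview(result: dict) -> Tuple[str, bytes]:
    """Return (voice name, decoded preview audio) from a voice design response.

    Raises KeyError if a field used by the samples above is missing.
    """
    output = result["output"]
    voice_name = output["voice"]
    audio_bytes = base64.b64decode(output["preview_audio"]["data"])
    return voice_name, audio_bytes


# Fabricated payload for illustration only; not a real API response.
mock_result = {
    "output": {
        "voice": "hypothetical-voice-name",
        "preview_audio": {
            "data": base64.b64encode(b"RIFF....WAVE").decode("ascii")
        },
    }
}

voice, audio = extract_preview(mock_result)
print(f"voice={voice}, preview bytes={len(audio)}")
```

Keeping the parsing in one small function makes it easy to test without calling the service.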

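Use the custom voice created in the previous step for speech synthesis.

This example follows the DashScope SDK sample code for "server commit mode" with system voices, and replaces the voice parameter with the custom voice generated by voice design.

Key principle: The model used for voice design (target_model) must match the model used for the subsequent speech synthesis (model). Otherwise, synthesis fails.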
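
The matching rule above can be checked on the client before a connection is opened. The following is a small sketch, assuming "match" means the exact same model string, as in the samples in this document. The function name `check_model_match` is hypothetical and not part of the DashScope SDK.

```python
def check_model_match(target_model: str, synthesis_model: str) -> None:
    """Raise ValueError if the voice design model and synthesis model differ.

    Assumption: the two values must be identical strings.
    """
    if target_model != synthesis_model:
        raise ValueError(
            f"target_model '{target_model}' does not match "
            f"synthesis model '{synthesis_model}'"
        )


# Same snapshot for voice design and synthesis: passes silently.
check_model_match(
    "qwen3-tts-vd-realtime-2026-01-15",
    "qwen3-tts-vd-realtime-2026-01-15",
)

# Different snapshots: rejected before any request is sent.
try:
    check_model_match(
        "qwen3-tts-vd-realtime-2026-01-15",
        "qwen3-tts-vd-realtime-2025-12-16",
    )
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```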

Python

# coding=utf-8
# Installation instructions for pyaudio:
# APPLE Mac OS X
#   brew install portaudio
#   pip install pyaudio
# Debian/Ubuntu
#   sudo apt-get install python3-pyaudio
#   or
#   pip install pyaudio
# CentOS
#   sudo yum install -y portaudio portaudio-devel && pip install pyaudio
# Microsoft Windows
#   python -m pip install pyaudio

import pyaudio
import os
import base64
import threading
import time
import dashscope  # DashScope Python SDK version must be 1.23.9 or later
from dashscope.audio.qwen_tts_realtime import QwenTtsRealtime, QwenTtsRealtimeCallback, AudioFormat

# ======= Constant configuration =======
TEXT_TO_SYNTHESIZE = [
    'Right? I really like this kind of supermarket,',
    'especially during the New Year.',
    'Going to the supermarket',
    'just makes me feel',
    'super, super happy!',
    'I want to buy so many things!'
]

def init_dashscope_api_key():
    """
    Initialize the API key for the DashScope SDK.
    """
    # API keys differ between Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    # If the environment variable is not set, replace the following line with your Model Studio API key: dashscope.api_key = "sk-xxx"
    dashscope.api_key = os.getenv("DASHSCOPE_API_KEY")

# ======= Callback class =======
class MyCallback(QwenTtsRealtimeCallback):
    """
    Custom TTS streaming callback.
    """
    def __init__(self):
        self.complete_event = threading.Event()
        self._player = pyaudio.PyAudio()
        self._stream = self._player.open(
            format=pyaudio.paInt16, channels=1, rate=24000, output=True
        )

    def on_open(self) -> None:
        print('[TTS] Connection established')

    def on_close(self, close_status_code, close_msg) -> None:
        self._stream.stop_stream()
        self._stream.close()
        self._player.terminate()
        print(f'[TTS] Connection closed, code={close_status_code}, msg={close_msg}')

    def on_event(self, response: dict) -> None:
        try:
            event_type = response.get('type', '')
            if event_type == 'session.created':
                print(f'[TTS] Session started: {response["session"]["id"]}')
            elif event_type == 'response.audio.delta':
                audio_data = base64.b64decode(response['delta'])
                self._stream.write(audio_data)
            elif event_type == 'response.done':
                # qwen_tts_realtime is the module-level client created in the __main__ block below
                print(f'[TTS] Response complete, Response ID: {qwen_tts_realtime.get_last_response_id()}')
            elif event_type == 'session.finished':
                print('[TTS] Session finished')
                self.complete_event.set()
        except Exception as e:
            print(f'[Error] Exception processing callback event: {e}')

    def wait_for_finished(self):
        self.complete_event.wait()

# ======= Main execution logic =======
if __name__ == '__main__':
    init_dashscope_api_key()
    print('[System] Initializing Qwen TTS Realtime ...')

    callback = MyCallback()
    qwen_tts_realtime = QwenTtsRealtime(
        # Use the same model for voice design and speech synthesis
        model="qwen3-tts-vd-realtime-2026-01-15",
        callback=callback,
        # The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: wss://dashscope.aliyuncs.com/api-ws/v1/realtime
        url='wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime'
    )
    qwen_tts_realtime.connect()
    
    qwen_tts_realtime.update_session(
        voice="myvoice", # Replace the voice parameter with the custom voice generated by voice design
        response_format=AudioFormat.PCM_24000HZ_MONO_16BIT,
        mode='server_commit'
    )

    for text_chunk in TEXT_TO_SYNTHESIZE:
        print(f'[Sending text]: {text_chunk}')
        qwen_tts_realtime.append_text(text_chunk)
        time.sleep(0.1)

    qwen_tts_realtime.finish()
    callback.wait_for_finished()

    print(f'[Metric] session_id={qwen_tts_realtime.get_session_id()}, '
          f'first_audio_delay={qwen_tts_realtime.get_first_audio_delay()}s')

Java

import com.alibaba.dashscope.audio.qwen_tts_realtime.*;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.google.gson.JsonObject;

import javax.sound.sampled.*;
import java.io.*;
import java.util.Base64;
import java.util.Queue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;

public class Main {
    // ===== Constant definitions =====
    private static String[] textToSynthesize = {
            "Right? I really like this kind of supermarket,",
            "especially during the New Year.",
            "Going to the supermarket",
            "just makes me feel",
            "super, super happy!",
            "I want to buy so many things!"
    };

    // Real-time audio player class
    public static class RealtimePcmPlayer {
        private int sampleRate;
        private SourceDataLine line;
        private AudioFormat audioFormat;
        private Thread decoderThread;
        private Thread playerThread;
        private AtomicBoolean stopped = new AtomicBoolean(false);
        private Queue<String> b64AudioBuffer = new ConcurrentLinkedQueue<>();
        private Queue<byte[]> RawAudioBuffer = new ConcurrentLinkedQueue<>();

        // Constructor initializes audio format and audio line
        public RealtimePcmPlayer(int sampleRate) throws LineUnavailableException {
            this.sampleRate = sampleRate;
            this.audioFormat = new AudioFormat(this.sampleRate, 16, 1, true, false);
            DataLine.Info info = new DataLine.Info(SourceDataLine.class, audioFormat);
            line = (SourceDataLine) AudioSystem.getLine(info);
            line.open(audioFormat);
            line.start();
            decoderThread = new Thread(new Runnable() {
                @Override
                public void run() {
                    while (!stopped.get()) {
                        String b64Audio = b64AudioBuffer.poll();
                        if (b64Audio != null) {
                            byte[] rawAudio = Base64.getDecoder().decode(b64Audio);
                            RawAudioBuffer.add(rawAudio);
                        } else {
                            try {
                                Thread.sleep(100);
                            } catch (InterruptedException e) {
                                throw new RuntimeException(e);
                            }
                        }
                    }
                }
            });
            playerThread = new Thread(new Runnable() {
                @Override
                public void run() {
                    while (!stopped.get()) {
                        byte[] rawAudio = RawAudioBuffer.poll();
                        if (rawAudio != null) {
                            try {
                                playChunk(rawAudio);
                            } catch (IOException e) {
                                throw new RuntimeException(e);
                            } catch (InterruptedException e) {
                                throw new RuntimeException(e);
                            }
                        } else {
                            try {
                                Thread.sleep(100);
                            } catch (InterruptedException e) {
                                throw new RuntimeException(e);
                            }
                        }
                    }
                }
            });
            decoderThread.start();
            playerThread.start();
        }

        // Plays an audio chunk and blocks until playback is complete
        private void playChunk(byte[] chunk) throws IOException, InterruptedException {
            if (chunk == null || chunk.length == 0) return;

            int bytesWritten = 0;
            while (bytesWritten < chunk.length) {
                bytesWritten += line.write(chunk, bytesWritten, chunk.length - bytesWritten);
            }
            int audioLength = chunk.length / (this.sampleRate*2/1000);
            // Wait for most of the chunk to play; clamp at 0 because Thread.sleep rejects negative values
            Thread.sleep(Math.max(0, audioLength - 10));
        }

        public void write(String b64Audio) {
            b64AudioBuffer.add(b64Audio);
        }

        public void cancel() {
            b64AudioBuffer.clear();
            RawAudioBuffer.clear();
        }

        public void waitForComplete() throws InterruptedException {
            while (!b64AudioBuffer.isEmpty() || !RawAudioBuffer.isEmpty()) {
                Thread.sleep(100);
            }
            line.drain();
        }

        public void shutdown() throws InterruptedException {
            stopped.set(true);
            decoderThread.join();
            playerThread.join();
            if (line != null) {
                // Drain only while the line is still running, but always release it
                if (line.isRunning()) {
                    line.drain();
                }
                line.close();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        QwenTtsRealtimeParam param = QwenTtsRealtimeParam.builder()
                // Use the same model for voice design and speech synthesis
                .model("qwen3-tts-vd-realtime-2026-01-15")
                // The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: wss://dashscope.aliyuncs.com/api-ws/v1/realtime
                .url("wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime")
                // API keys differ between Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
                // If the environment variable is not set, replace the following line with your Model Studio API key: .apikey("sk-xxx")
                .apikey(System.getenv("DASHSCOPE_API_KEY"))
                .build();
        AtomicReference<CountDownLatch> completeLatch = new AtomicReference<>(new CountDownLatch(1));
        final AtomicReference<QwenTtsRealtime> qwenTtsRef = new AtomicReference<>(null);

        // Create a real-time audio player instance
        RealtimePcmPlayer audioPlayer = new RealtimePcmPlayer(24000);

        QwenTtsRealtime qwenTtsRealtime = new QwenTtsRealtime(param, new QwenTtsRealtimeCallback() {
            @Override
            public void onOpen() {
                // Handling for when the connection is established
            }
            @Override
            public void onEvent(JsonObject message) {
                String type = message.get("type").getAsString();
                switch(type) {
                    case "session.created":
                        // Handling for when the session is created
                        break;
                    case "response.audio.delta":
                        String recvAudioB64 = message.get("delta").getAsString();
                        // Play audio in real time
                        audioPlayer.write(recvAudioB64);
                        break;
                    case "response.done":
                        // Handling for when the response is complete
                        break;
                    case "session.finished":
                        // Handling for when the session is finished
                        completeLatch.get().countDown();
                        break;
                    default:
                        break;
                }
            }
            @Override
            public void onClose(int code, String reason) {
                // Handling for when the connection is closed
            }
        });
        qwenTtsRef.set(qwenTtsRealtime);
        try {
            qwenTtsRealtime.connect();
        } catch (NoApiKeyException e) {
            throw new RuntimeException(e);
        }
        QwenTtsRealtimeConfig config = QwenTtsRealtimeConfig.builder()
                .voice("myvoice") // Replace the voice parameter with the custom voice generated by voice design
                .responseFormat(QwenTtsRealtimeAudioFormat.PCM_24000HZ_MONO_16BIT)
                .mode("server_commit")
                .build();
        qwenTtsRealtime.updateSession(config);
        for (String text:textToSynthesize) {
            qwenTtsRealtime.appendText(text);
            Thread.sleep(100);
        }
        qwenTtsRealtime.finish();
        completeLatch.get().await();

        // Wait for audio playback to complete and shut down the player
        audioPlayer.waitForComplete();
        audioPlayer.shutdown();
        System.exit(0);
    }
}
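
The playChunk method in the Java player above converts a chunk's byte length to a playback time in milliseconds with chunk.length / (sampleRate*2/1000). The same arithmetic for 16-bit mono PCM is sketched below as a worked example. The function names `pcm_duration_ms` and `playback_wait_ms` are hypothetical, and the 10 ms margin mirrors the value used in that sample.

```python
def pcm_duration_ms(num_bytes: int, sample_rate: int = 24000,
                    channels: int = 1, bytes_per_sample: int = 2) -> float:
    """Playback time in milliseconds for a raw PCM chunk."""
    bytes_per_ms = sample_rate * channels * bytes_per_sample / 1000
    return num_bytes / bytes_per_ms


def playback_wait_ms(duration_ms: float, margin_ms: float = 10) -> float:
    """Time to wait before sending the next chunk, never negative."""
    return max(0, duration_ms - margin_ms)


# 24 kHz, 16-bit, mono: 48 bytes per millisecond.
one_second = pcm_duration_ms(48000)
short_chunk_wait = playback_wait_ms(pcm_duration_ms(240))
print(one_second, short_chunk_wait)
```

Clamping the wait at zero matters for chunks shorter than the margin, because a negative sleep duration is an error in both Java and Python.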

For more sample code, see alibabacloud-bailian-speech-demo on GitHub.

Interaction flow

Server commit mode

Set session.mode to "server_commit" in the session.update event. In this mode, the server automatically handles text segmentation and synthesis timing.

The client sends a session.update event. The server responds with session.created and session.updated events.
The client sends input_text_buffer.append events to append text to the server-side buffer.
The server automatically segments the text and decides when to synthesize, then returns response.created, response.output_item.added, response.content_part.added, and response.audio.delta events.
After the response is complete, the server sends response.audio.done, response.content_part.done, response.output_item.done, and response.done.
The server sends a session.finished event to end the session.

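Lifecycle	Client events	Server events
Session initialization	session.update: session configuration	session.created: session created; session.updated: session configuration updated
User text input	input_text_buffer.append: appends text to the server-side buffer; input_text_buffer.commit: immediately synthesizes the buffered text on the server; session.finish: tells the server that no more text will be sent	input_text_buffer.committed: the server confirms the committed text
Server audio output	None	response.created: the server starts generating a response; response.output_item.added: new output content is added to the response; response.content_part.added: a new content part is added to the assistant message; response.audio.delta: incremental audio data from the model; response.content_part.done: streaming of the assistant message's text or audio content is complete; response.output_item.done: streaming of the assistant message's complete output item is complete; response.audio.done: audio generation is complete; response.done: the response is complete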
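
The client side of the server commit flow above can be sketched as a sequence of JSON events. The event type names and session.mode come from this section. The rest of the layout (a "session" object that also carries "voice", and a "text" field on append events) is an assumption for illustration, so check the API reference for the actual schema. The function name `server_commit_events` is hypothetical.

```python
import json
from typing import Dict, List


def server_commit_events(voice: str, chunks: List[str]) -> List[Dict]:
    """Build the client events for one server_commit session, in send order."""
    events: List[Dict] = [
        # Assumed layout: session settings nested under "session".
        {"type": "session.update",
         "session": {"mode": "server_commit", "voice": voice}},
    ]
    for chunk in chunks:
        # Assumed field name "text" for the appended text.
        events.append({"type": "input_text_buffer.append", "text": chunk})
    # Tell the server that no more text will be sent.
    events.append({"type": "session.finish"})
    return events


events = server_commit_events("Cherry", ["Hello, ", "world."])
for event in events:
    print(json.dumps(event))
```

No input_text_buffer.commit event is needed here, because in this mode the server decides when to synthesize.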

Commit mode

Set session.mode to "commit" in the session.update event. In this mode, the client explicitly commits the text buffer to the server to trigger synthesis.

The client sends a session.update event. The server responds with session.created and session.updated events.
The client sends input_text_buffer.append events to append text to the server-side buffer.
The client sends an input_text_buffer.commit event to commit the buffer to the server, then sends a session.finish event to indicate that no more text will be sent.
The server responds with response.created and starts generating the response.
The server sends response.output_item.added, response.content_part.added, and response.audio.delta events.
After the response is complete, the server returns response.audio.done, response.content_part.done, response.output_item.done, and response.done.
The server sends a session.finished event to end the session.

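Lifecycle	Client events	Server events
Session initialization	session.update: session configuration	session.created: session created; session.updated: session configuration updated
User text input	input_text_buffer.append: appends text to the buffer; input_text_buffer.commit: commits the buffer to the server; input_text_buffer.clear: clears the buffer	input_text_buffer.committed: the server confirms the committed text
Server audio output	None	response.created: the server starts generating a response; response.output_item.added: new output content is added to the response; response.content_part.added: a new content part is added to the assistant message; response.audio.delta: incremental audio data from the model; response.content_part.done: streaming of the assistant message's text or audio content is complete; response.output_item.done: streaming of the assistant message's complete output item is complete; response.audio.done: audio generation is complete; response.done: the response is complete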
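
When debugging a commit mode client, it can help to check that the server events arrived in the order described above. The following is a loose sketch for a session with a single response. It checks only three points: response.created precedes the first response.audio.delta, the last response.audio.delta precedes response.done, and session.finished is the final event. The function name `follows_commit_flow` is hypothetical.

```python
from typing import List


def follows_commit_flow(event_types: List[str]) -> bool:
    """Loosely validate the server event order for one response."""
    try:
        created = event_types.index("response.created")
        first_delta = event_types.index("response.audio.delta")
        done = event_types.index("response.done")
    except ValueError:
        # A required event never arrived.
        return False
    # Index of the last audio delta, found by searching from the end.
    last_delta = (len(event_types) - 1
                  - event_types[::-1].index("response.audio.delta"))
    return (created < first_delta
            and last_delta < done
            and event_types[-1] == "session.finished")


received = [
    "session.created", "session.updated", "input_text_buffer.committed",
    "response.created", "response.output_item.added",
    "response.content_part.added", "response.audio.delta",
    "response.audio.delta", "response.audio.done",
    "response.content_part.done", "response.output_item.done",
    "response.done", "session.finished",
]
flow_ok = follows_commit_flow(received)
print(flow_ok)
```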

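Instruction control

Instruction control adjusts vocal expressiveness through natural language. You can describe the tone, speech rate, emotion, or voice characteristics directly, without tuning audio parameters.

Supported models: Qwen3-TTS-Instruct-Flash-Realtime series only.

Usage: Specify the instruction in the instructions parameter. Example: "Speak at a fast pace with rising intonation, suitable for introducing fashion products."

Supported languages: Instruction text supports only Chinese and English.

Length limit: An instruction must not exceed 1,600 tokens.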
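
The constraints above can be applied when the session configuration is built. The following is a minimal sketch, assuming the instructions parameter is sent inside the session object of a session.update event. That placement and the function name `build_instruct_session_update` are assumptions. The 1,600-token limit is not checked, because this section does not describe how tokens are counted.

```python
INSTRUCT_MODEL_PREFIX = "qwen3-tts-instruct-flash-realtime"


def build_instruct_session_update(model: str, voice: str,
                                  instructions: str) -> dict:
    """Build a session.update event that carries an instruction."""
    # Only the Qwen3-TTS-Instruct-Flash-Realtime series supports instructions.
    if not model.startswith(INSTRUCT_MODEL_PREFIX):
        raise ValueError(f"model '{model}' does not support instruction control")
    if not instructions.strip():
        raise ValueError("instructions must not be empty")
    return {
        "type": "session.update",
        # Assumed layout: instructions sits next to voice in "session".
        "session": {"voice": voice, "instructions": instructions},
    }


event = build_instruct_session_update(
    "qwen3-tts-instruct-flash-realtime",
    "Cherry",
    "Speak at a fast pace with rising intonation, "
    "suitable for introducing fashion products.",
)
print(event["session"]["instructions"])
```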

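Use cases:

Voice-overs for audiobooks and radio dramas
Voice-overs for advertising and promotions
Voice-overs for game characters and animation
Emotion-aware intelligent voice assistants
Narration for documentaries and news broadcasts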

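Tips for writing effective voice descriptions:

Basic principles:
1. Be specific: Use concrete descriptors such as "deep" or "crisp". Avoid vague words such as "pleasant" or "normal".
2. Cover multiple dimensions: Combine pitch, speech rate, and emotion. A description with a single dimension (for example, "high pitch") produces generic results.
3. Be objective: Describe physical and perceptual voice characteristics, not preferences. For example, use "slightly high pitch with an energetic tone" instead of "my favorite voice".
4. Describe qualities, do not imitate: Describe voice characteristics instead of asking the model to imitate a specific person, such as a celebrity or an actor. The model does not support direct imitation, and such requests carry copyright risk.
5. Be concise: Make every word count. Do not repeat synonyms or use empty intensifiers (for example, "a really great voice").

Dimension reference: Combining multiple dimensions produces richer, more expressive results.

Dimension	Examples
Pitch	High, medium, low, slightly high, slightly low
Speech rate	Fast, medium, slow, slightly fast, slightly slow
Emotion	Cheerful, composed, gentle, serious, lively, calm, soothing
Voice quality	Magnetic, crisp, husky, mellow, sweet, resonant, powerful
Use case	News broadcasting, advertising voice-over, audiobooks, animation characters, voice assistants, documentary narration

Examples:
- Standard broadcast style: clear, precise pronunciation with rounded articulation
- Rising emotion: volume climbs quickly from conversational level to a shout, with a direct personality that expresses emotion openly
- Special emotional state: slightly slurred speech from crying, a slight hoarseness, and audible tension from sobbing
- Advertising voice-over style: slightly high pitch, medium speech rate, energetic and charismatic, well suited to advertising voice-overs
- Gentle, soothing style: slightly slow speech rate, soft and sweet pitch, warm and reassuring tone (like a caring friend)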
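
The "cover multiple dimensions" principle can be applied mechanically by composing a description from one value per dimension. The following is a small sketch. The function name `compose_voice_description` and the sentence template are hypothetical, and the resulting text is only a starting point to refine by hand.

```python
from typing import Optional


def compose_voice_description(pitch: str, speed: str, emotion: str,
                              quality: str,
                              use_case: Optional[str] = None) -> str:
    """Combine one value per dimension into a single description."""
    parts = [
        f"{pitch} pitch",
        f"{speed} speech rate",
        f"{emotion} and {quality} tone",
    ]
    text = ", ".join(parts)
    if use_case:
        text += f", suited to {use_case}"
    return text


description = compose_voice_description(
    pitch="slightly high",
    speed="medium",
    emotion="energetic",
    quality="crisp",
    use_case="advertising voice-overs",
)
print(description)
```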

API reference

Real-time speech synthesis - Qwen API reference

Voice cloning - API reference

Voice design - API reference

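Model feature comparison

Feature	Qwen3-TTS-Instruct-Flash-Realtime	Qwen3-TTS-VD-Realtime	Qwen3-TTS-VC-Realtime	Qwen3-TTS-Flash-Realtime	Qwen-TTS-Realtime
Supported languages	Chinese (Mandarin), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese	Chinese (Mandarin), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese		Chinese (Mandarin and the Beijing, Shanghai, Sichuan, Nanjing, Shaanxi, Minnan, Tianjin, and Cantonese dialects, depending on the voice), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese	Chinese, English
Audio formats	PCM, WAV, MP3, Opus				PCM
Sample rates	8 kHz, 16 kHz, 24 kHz, 48 kHz				24 kHz
Voice cloning	Not supported		Supported	Not supported
Voice design	Not supported	Supported	Not supported
SSML	Not supported
LaTeX	Not supported
Volume control	Supported				Not supported
Speech rate control	Supported				Not supported
Pitch control	Supported				Not supported
Bitrate control	Supported				Not supported
Timestamps	Not supported
Instruction control	Supported	Not supported
Streaming input	Supported
Streaming output	Supported
Rate limits		Requests per minute (RPM): 180		qwen3-tts-flash-realtime and qwen3-tts-flash-realtime-2025-11-27: requests per minute (RPM): 180; qwen3-tts-flash-realtime-2025-09-18: requests per minute (RPM): 10	Requests per minute (RPM): 10; tokens per minute (TPM): 100,000
Access methods	Java/Python SDK, WebSocket API
Pricing	International: 0.143 USD per 10,000 characters; Chinese mainland: 0.143 USD per 10,000 characters	International: 0.143353 USD per 10,000 characters; Chinese mainland: 0.143353 USD per 10,000 characters	International: 0.13 USD per 10,000 characters; Chinese mainland: 0.143353 USD per 10,000 characters		Chinese mainland: input: 0.345 USD per 1,000 tokens; output: 1.721 USD per 1,000 tokens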
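
For the models priced per 10,000 characters, a rough cost estimate is a single proportion. The following is a worked sketch that uses the 0.143353 USD figure from the table as an example input. It assumes billing is strictly proportional to the character count and ignores any rules about how characters are counted, which this section does not describe. The function name `estimate_cost_usd` is hypothetical.

```python
def estimate_cost_usd(num_chars: int, price_per_10k_chars: float) -> float:
    """Estimated cost in USD for synthesizing num_chars characters."""
    if num_chars < 0:
        raise ValueError("num_chars must not be negative")
    return num_chars / 10_000 * price_per_10k_chars


# 25,000 characters at 0.143353 USD per 10,000 characters: 2.5 x 0.143353.
estimated = estimate_cost_usd(25_000, 0.143353)
print(f"{estimated:.4f} USD")
```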

Supported voices

Different models support different voices. To use a specific voice, set the voice request parameter to the value in the voice parameter column of the table below.

`voice` parameter	Details	Supported languages	Supported models
`Cherry`	Voice name: Cherry Description: A cheerful, positive, friendly, and natural young woman (female)	Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean	Qwen3-TTS-Instruct-Flash-Realtime: qwen3-tts-instruct-flash-realtime, qwen3-tts-instruct-flash-realtime-2026-01-22 Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27, qwen3-tts-flash-realtime-2025-09-18 Qwen-TTS-Realtime: qwen-tts-realtime, qwen-tts-realtime-latest, qwen-tts-realtime-2025-07-15
`Serena`	Voice name: Serena Description: A gentle young woman (female)	Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean	Qwen3-TTS-Instruct-Flash-Realtime: qwen3-tts-instruct-flash-realtime, qwen3-tts-instruct-flash-realtime-2026-01-22 Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27 Qwen-TTS-Realtime: qwen-tts-realtime, qwen-tts-realtime-latest, qwen-tts-realtime-2025-07-15
`Ethan`	Voice name: Ethan Description: Standard Mandarin with a slight northern accent. Cheerful, warm, energetic, and lively (male)	Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean	Qwen3-TTS-Instruct-Flash-Realtime: qwen3-tts-instruct-flash-realtime, qwen3-tts-instruct-flash-realtime-2026-01-22 Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27, qwen3-tts-flash-realtime-2025-09-18 Qwen-TTS-Realtime: qwen-tts-realtime, qwen-tts-realtime-latest, qwen-tts-realtime-2025-07-15
`Chelsie`	Voice name: Chelsie Description: An anime-style virtual girlfriend (female)	Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean	Qwen3-TTS-Instruct-Flash-Realtime: qwen3-tts-instruct-flash-realtime, qwen3-tts-instruct-flash-realtime-2026-01-22 Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27 Qwen-TTS-Realtime: qwen-tts-realtime, qwen-tts-realtime-latest, qwen-tts-realtime-2025-07-15
`Momo`	Voice name: Momo Description: Playful and mischievous, lifts your mood (female)	Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean	Qwen3-TTS-Instruct-Flash-Realtime: qwen3-tts-instruct-flash-realtime, qwen3-tts-instruct-flash-realtime-2026-01-22 Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27
`Vivian`	Voice name: Vivian Description: Confident, cute, and a little feisty (female)	Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean	Qwen3-TTS-Instruct-Flash-Realtime: qwen3-tts-instruct-flash-realtime, qwen3-tts-instruct-flash-realtime-2026-01-22 Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27
`Moon`	Voice name: Moon Description: A bold, handsome man named Yuebai (male)	Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean	Qwen3-TTS-Instruct-Flash-Realtime: qwen3-tts-instruct-flash-realtime, qwen3-tts-instruct-flash-realtime-2026-01-22 Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27
`Maia`	Voice name: Maia Description: Combines intelligence and gentleness (female)	Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean	Qwen3-TTS-Instruct-Flash-Realtime: qwen3-tts-instruct-flash-realtime, qwen3-tts-instruct-flash-realtime-2026-01-22 Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27
`Kai`	Voice name: Kai Description: An audio spa that is easy on the ears (male)	Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean	Qwen3-TTS-Instruct-Flash-Realtime: qwen3-tts-instruct-flash-realtime, qwen3-tts-instruct-flash-realtime-2026-01-22 Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27
`Nofish`	Voice name: Nofish Description: A designer who cannot pronounce retroflex sounds (male)	Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean	Qwen3-TTS-Instruct-Flash-Realtime: qwen3-tts-instruct-flash-realtime, qwen3-tts-instruct-flash-realtime-2026-01-22 Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27, qwen3-tts-flash-realtime-2025-09-18
`Bella`	Voice name: Bella Description: A girl who never picks a fight, even when drunk (female)	Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean	Qwen3-TTS-Instruct-Flash-Realtime: qwen3-tts-instruct-flash-realtime, qwen3-tts-instruct-flash-realtime-2026-01-22 Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27
`Jennifer`	Voice name: Jennifer Description: A premium, cinematic-quality American English female voice (female)	Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean	Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27, qwen3-tts-flash-realtime-2025-09-18
`Ryan`	Voice name: Ryan Description: Full of rhythm and dramatically expressive, balancing authenticity and tension (male)	Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean	Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27, qwen3-tts-flash-realtime-2025-09-18
`Katerina`	Voice name: Katerina Description: A mature female voice with a rich, memorable rhythm (female)	Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean	Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27, qwen3-tts-flash-realtime-2025-09-18
`Aiden`	Voice name: Aiden Description: A young American English-speaking man who is good at cooking (male)	Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean	Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27
`Eldric Sage`	Voice name: Eldric Sage Description: A calm, wise elder, weathered like a pine tree and clear as a mirror (male)	Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean	Qwen3-TTS-Instruct-Flash-Realtime: qwen3-tts-instruct-flash-realtime, qwen3-tts-instruct-flash-realtime-2026-01-22 Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27
`Mia`	Voice name: Mia Description: Gentle as spring water and docile as fresh snow (female)	Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean	Qwen3-TTS-Instruct-Flash-Realtime: qwen3-tts-instruct-flash-realtime, qwen3-tts-instruct-flash-realtime-2026-01-22 Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27
`Mochi`	Voice name: Mochi Description: A clever, quick-witted youngster who keeps a childlike innocence while showing flashes of wisdom (male)	Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean	Qwen3-TTS-Instruct-Flash-Realtime: qwen3-tts-instruct-flash-realtime, qwen3-tts-instruct-flash-realtime-2026-01-22 Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27
`Bellona`	Voice name: Bellona Description: A powerful, clear voice that brings characters to life and stirs the blood. Heroic grandeur and flawless diction capture the full range of human emotion.	Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean	Qwen3-TTS-Instruct-Flash-Realtime: qwen3-tts-instruct-flash-realtime, qwen3-tts-instruct-flash-realtime-2026-01-22 Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27
`Vincent`	Voice name: Vincent Description: A distinctive husky, smoky voice; a single line evokes armies and heroic tales (male)	Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean	Qwen3-TTS-Instruct-Flash-Realtime: qwen3-tts-instruct-flash-realtime, qwen3-tts-instruct-flash-realtime-2026-01-22 Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27
`Bunny`	Voice name: Bunny Description: A girl overflowing with cuteness (female)	Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean	Qwen3-TTS-Instruct-Flash-Realtime: qwen3-tts-instruct-flash-realtime, qwen3-tts-instruct-flash-realtime-2026-01-22 Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27
`Neil`	Voice name: Neil Description: Flat baseline intonation with precise, clear pronunciation; the most professional news anchor (male)	Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean	Qwen3-TTS-Instruct-Flash-Realtime: qwen3-tts-instruct-flash-realtime, qwen3-tts-instruct-flash-realtime-2026-01-22 Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27
`Elias`	音声名: Elias 説明: 学術的な厳密さを保ちながらストーリーテリング技法を用い、複雑な知識を消化しやすい学習モジュールに変える（女性）	中国語（標準語）、英語、フランス語、ドイツ語、ロシア語、イタリア語、スペイン語、ポルトガル語、日本語、韓国語	Qwen3-TTS-Instruct-Flash-Realtime: qwen3-tts-instruct-flash-realtime, qwen3-tts-instruct-flash-realtime-2026-01-22 Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27, qwen3-tts-flash-realtime-2025-09-18
`Arthur`	音声名: Arthur 説明: 時間とタバコの煙に染まった素朴で地味な音声 — ゆっくりと村の物語や珍事を語る（男性）	中国語（標準語）、英語、フランス語、ドイツ語、ロシア語、イタリア語、スペイン語、ポルトガル語、日本語、韓国語	Qwen3-TTS-Instruct-Flash-Realtime: qwen3-tts-instruct-flash-realtime, qwen3-tts-instruct-flash-realtime-2026-01-22 Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27
`Nini`	音声名: Nini 説明: 甘酒のような柔らかく甘えん坊な音声 — 「お兄ちゃん」と伸ばして呼ぶ声が骨の髄まで甘く溶かす（女性）	中国語（標準語）、英語、フランス語、ドイツ語、ロシア語、イタリア語、スペイン語、ポルトガル語、日本語、韓国語	Qwen3-TTS-Instruct-Flash-Realtime: qwen3-tts-instruct-flash-realtime, qwen3-tts-instruct-flash-realtime-2026-01-22 Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27
`Seren`	音声名: Seren 説明: 優しく癒し系の音声で、より速く眠りにつけるようにします。おやすみなさい、よい夢を（女性）	中国語（標準語）、英語、フランス語、ドイツ語、ロシア語、イタリア語、スペイン語、ポルトガル語、日本語、韓国語	Qwen3-TTS-Instruct-Flash-Realtime: qwen3-tts-instruct-flash-realtime, qwen3-tts-instruct-flash-realtime-2026-01-22 Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27
`Pip`	音声名: Pip 説明: 子供のような驚きに満ちた遊び心と悪ふざけ好きな男の子 — これはあなたの記憶の中のシンちゃんでしょうか？（男性）	中国語（標準語）、英語、フランス語、ドイツ語、ロシア語、イタリア語、スペイン語、ポルトガル語、日本語、韓国語	Qwen3-TTS-Instruct-Flash-Realtime: qwen3-tts-instruct-flash-realtime, qwen3-tts-instruct-flash-realtime-2026-01-22 Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27
`Stella`	音声名: Stella 説明: 通常は甘ったるく、ぼーっとした少女のような声ですが、「月にかわってお仕置きよ！」と叫ぶ瞬間、揺るぎない愛と正義を放ちます（女性）	中国語（標準語）、英語、フランス語、ドイツ語、ロシア語、イタリア語、スペイン語、ポルトガル語、日本語、韓国語	Qwen3-TTS-Instruct-Flash-Realtime: qwen3-tts-instruct-flash-realtime, qwen3-tts-instruct-flash-realtime-2026-01-22 Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27
`Bodega`	Voice name: Bodega Description: A passionate Spanish man (male)	Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean	Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27
`Sonrisa`	Voice name: Sonrisa Description: A cheerful, outgoing Latin American woman (female)	Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean	Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27
`Alek`	Voice name: Alek Description: As cold as the Russian spirit, yet as warm as the lining of a wool coat (male)	Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean	Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27
`Dolce`	Voice name: Dolce Description: A laid-back Italian man (male)	Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean	Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27
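`Sohee`	Voice name: Sohee Description: A warm, cheerful, and emotionally expressive Korean unnie (female)	Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean	Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27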
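`Ono Anna`	Voice name: Ono Anna Description: A clever, lively childhood friend (female)	Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean	Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27
`Lenn`	Voice name: Lenn Description: Rational at heart, rebellious in the details: a young German who wears a suit and listens to post-punk	Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean	Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27
`Emilien`	Voice name: Emilien Description: A romantic French big brother (male)	Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean	Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27
`Andre`	Voice name: Andre Description: A magnetic, natural, and steady male voice	Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean	Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27
`Radio Gol`	Voice name: Radio Gol Description: Football poet Radio Gol! Starting today, I will commentate on football matches under my own name (male)	Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean	Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27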
`Jada`	Voice name: Shanghai - Jada Description: A fast-talking, energetic Shanghai auntie (female)	Shanghainese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean	Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27, qwen3-tts-flash-realtime-2025-09-18
`Dylan`	Voice name: Beijing - Dylan Description: A young man who grew up in Beijing's hutongs (male)	Beijing dialect, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean	Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27, qwen3-tts-flash-realtime-2025-09-18
`Li`	Voice name: Nanjing - Li Description: A patient yoga instructor (male)	Nanjing dialect, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean	Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27, qwen3-tts-flash-realtime-2025-09-18
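`Marcus`	Voice name: Shaanxi - Marcus Description: Broad-faced and sparing with words, sincere and deep-voiced: authentic Shaanxi flavor (male)	Shaanxi dialect, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean	Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27, qwen3-tts-flash-realtime-2025-09-18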
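`Roy`	Voice name: Minnan - Roy Description: A humorous, straightforward, and lively Taiwanese man (male)	Minnan (Southern Min), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean	Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27, qwen3-tts-flash-realtime-2025-09-18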
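`Peter`	Voice name: Tianjin - Peter Description: Tianjin-style crosstalk comedy, a professional comic foil (male)	Tianjin dialect, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean	Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27, qwen3-tts-flash-realtime-2025-09-18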
`Sunny`	Voice name: Sichuan - Sunny Description: A Sichuan girl sweet enough to melt your heart (female)	Sichuan dialect, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean	Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27, qwen3-tts-flash-realtime-2025-09-18
`Eric`	Voice name: Sichuan - Eric Description: A Sichuan native from Chengdu who stands out from the everyday crowd (male)	Sichuan dialect, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean	Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27, qwen3-tts-flash-realtime-2025-09-18
`Rocky`	Voice name: Cantonese - Rocky Description: The humorous and witty A Qiang, on hand for live chat (male)	Cantonese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean	Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27, qwen3-tts-flash-realtime-2025-09-18
`Kiki`	Voice name: Cantonese - Kiki Description: A sweet best friend from Hong Kong (female)	Cantonese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean	Qwen3-TTS-Flash-Realtime: qwen3-tts-flash-realtime, qwen3-tts-flash-realtime-2025-11-27, qwen3-tts-flash-realtime-2025-09-18
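
Because the last column of the table limits each voice to specific models, it can help to check a voice/model pair on the client before opening a session. The following is a minimal sketch, not part of the official SDK: the names `VOICE_MODELS` and `is_supported` are hypothetical, and the mapping copies only four rows from the table for illustration.

```python
# Hypothetical client-side lookup; the data is copied from four rows of the
# voice table. Extend the mapping with the remaining rows as needed.
INSTRUCT = (
    "qwen3-tts-instruct-flash-realtime",
    "qwen3-tts-instruct-flash-realtime-2026-01-22",
)
FLASH = (
    "qwen3-tts-flash-realtime",
    "qwen3-tts-flash-realtime-2025-11-27",
)
# Some voices are also listed for the older 2025-09-18 snapshot.
FLASH_WITH_0918 = FLASH + ("qwen3-tts-flash-realtime-2025-09-18",)

VOICE_MODELS = {
    "Bellona": frozenset(INSTRUCT + FLASH),
    "Elias": frozenset(INSTRUCT + FLASH_WITH_0918),
    "Bodega": frozenset(FLASH),
    "Kiki": frozenset(FLASH_WITH_0918),
}


def is_supported(voice: str, model: str) -> bool:
    """Return True if the table lists `model` for `voice`.

    Voices missing from the (partial) mapping are reported as unsupported.
    """
    return model in VOICE_MODELS.get(voice, frozenset())


if __name__ == "__main__":
    print(is_supported("Kiki", "qwen3-tts-flash-realtime-2025-09-18"))
    print(is_supported("Bodega", "qwen3-tts-instruct-flash-realtime"))
```

A lookup like this only mirrors the table at the time it was copied; the service remains the source of truth for which voices a model accepts.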