Alibaba Cloud Model Studio: Qwen voice design API reference

Last Updated: Mar 15, 2026

Create custom voices from text descriptions specifying gender, age, tone, and pace. The API returns a reusable voice name and preview audio clip. Use the voice name with the Speech synthesis - Qwen or Real-time speech synthesis - Qwen API.

Important

Voice creation and synthesis are separate steps. The target_model you specify when creating a voice must match the model you use for synthesis, or synthesis will fail.

Prerequisites

  1. Get an API key and store it as the DASHSCOPE_API_KEY environment variable.

  2. Install the latest DashScope SDK (SDK examples only).

How it works

  1. Write a voice description (voice_prompt) and preview text (preview_text).

  2. Submit a Create voice request with your chosen target_model.

  3. The API returns a voice name and Base64-encoded preview audio.

  4. Listen to the preview. If satisfied, use the voice name with the speech synthesis API. Otherwise, create a new voice.

Note The synthesis model in step 4 must be the same target_model from step 2.
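
The workflow above can be sketched end to end in a short script. This is an illustrative sketch, not official sample code: it assumes the Singapore endpoint and the DASHSCOPE_API_KEY environment variable, and the helper names (build_create_request, create_and_preview) are hypothetical.

```python
import base64
import os
import requests

ENDPOINT = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"

def build_create_request(voice_prompt, preview_text, target_model, language="en"):
    """Steps 1-2: assemble the create-voice payload from the documented fields."""
    return {
        "model": "qwen-voice-design",
        "input": {
            "action": "create",
            "target_model": target_model,
            "voice_prompt": voice_prompt,
            "preview_text": preview_text,
            "language": language,
        },
        "parameters": {"sample_rate": 24000, "response_format": "wav"},
    }

def create_and_preview(voice_prompt, preview_text, target_model):
    """Steps 2-4: submit the request, save the preview clip, return the voice name."""
    resp = requests.post(
        ENDPOINT,
        headers={"Authorization": f"Bearer {os.environ['DASHSCOPE_API_KEY']}",
                 "Content-Type": "application/json"},
        json=build_create_request(voice_prompt, preview_text, target_model),
        timeout=60,
    )
    resp.raise_for_status()
    output = resp.json()["output"]
    # Step 3: decode the Base64 preview audio so you can listen to it.
    with open("preview.wav", "wb") as f:
        f.write(base64.b64decode(output["preview_audio"]["data"]))
    # Step 4: if the preview sounds right, reuse output["voice"] for synthesis
    # with the same target_model; otherwise create a new voice.
    return output["voice"]
```

If the preview falls short, adjust voice_prompt and call create_and_preview again; each call creates (and bills) a new voice.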

Supported models

Voice design requires two models: a design model and a target speech synthesis model.

Model Value Use with
Voice design model qwen-voice-design All voice design operations (fixed value)
Real-time synthesis target qwen3-tts-vd-realtime-2026-01-15 Real-time speech synthesis - Qwen
Real-time synthesis target (earlier version) qwen3-tts-vd-realtime-2025-12-16 Real-time speech synthesis - Qwen
Non-real-time synthesis target qwen3-tts-vd-2026-01-26 Speech synthesis - Qwen
Note Voice design synthesis models (qwen3-tts-vd-*) only support voices created through voice design. System voices (Chelsie, Serena, Ethan, Cherry) are not supported.

Language support

Supported languages for voice creation and speech synthesis:

Code Language
zh Chinese
en English
de German
it Italian
pt Portuguese
es Spanish
ja Japanese
ko Korean
fr French
ru Russian

The voice_prompt description text supports Chinese and English only. The language parameter must match the preview_text language.

Write effective voice descriptions

A voice description (voice_prompt) defines what voice to generate. Combine attributes (gender, age, tone, use case) for a distinctive voice.

Requirements and limitations

  • Maximum length: 2,048 characters.

  • Supported languages: Chinese and English only.

Description dimensions

Dimension Examples
Gender Male, female, neutral
Age Child (5--12), teenager (13--18), young adult (19--35), middle-aged (36--55), elderly (55+)
Pitch High, medium, low, high-pitched, low-pitched
Pace Fast, medium, slow, fast-paced, slow-paced
Emotion Cheerful, calm, gentle, serious, lively, composed, soothing
Characteristics Magnetic, crisp, hoarse, mellow, sweet, rich, powerful
Use case News broadcast, ad voice-over, audiobook, animation character, voice assistant, documentary narration

Principles for effective descriptions

  1. Be specific. Use concrete voice qualities like "deep," "crisp," or "fast-paced." Avoid subjective terms like "nice" or "normal."

  2. Combine multiple dimensions (gender, age, emotion, use case). Single-dimension descriptions like "female voice" are too broad.

  3. Be objective. Describe physical and perceptual features, not opinions ("high-pitched and energetic", not "my favorite voice").

  4. Be original. Describe voice qualities, not celebrity imitations (copyright risk, not supported).

  5. Be concise. Avoid synonyms and meaningless intensifiers ("a very, very great voice").

Examples

Good descriptions:

  • "A young, lively female voice with a fast pace and noticeable upward inflection, suitable for fashion product introductions." Combines age, personality, pace, intonation, and use case.

  • "A calm, middle-aged male voice with a slow pace and deep, magnetic tone, suitable for news or documentary narration." Defines gender, age, pace, vocal characteristics, and domain.

  • "A cute child's voice, around 8 years old, with a slightly childish tone, suitable for animation character voice-overs." Specifies age, vocal quality, and use case.

  • "A gentle, intellectual female voice, around 30 years old, with a calm tone, suitable for audiobook narration." Conveys emotion, style, age, and application clearly.

Ineffective descriptions:

Description Issue Improvement
"A nice voice" Too vague and subjective. "A young female voice with a clear vocal line and gentle tone."
"A voice like a certain celebrity" Celebrity imitation is not supported (copyright risk). "A mature, magnetic male voice with a calm pace."
"A very, very, very nice female voice" Redundant. Repetition does not improve results. "A female voice, 20--24 years old, with a light tone, lively pitch, and sweet quality."
"123456" Invalid input. Cannot be parsed as voice features. Provide a meaningful text description using the dimensions above.
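
One practical way to apply these principles is to assemble the description from the dimension table rather than free-typing it, so every prompt combines several dimensions. The helper below is hypothetical, purely an illustration of composing a specific, objective voice_prompt:

```python
def compose_voice_prompt(gender, age, emotion=None, pace=None,
                         characteristics=None, use_case=None):
    """Hypothetical helper: combine description dimensions (gender, age,
    emotion, pace, characteristics, use case) into one voice_prompt string."""
    parts = []
    if emotion:
        parts.append(emotion)
    parts.append(f"{age} {gender} voice")
    details = []
    if pace:
        details.append(f"a {pace} pace")
    if characteristics:
        details.append(f"a {characteristics} tone")
    if details:
        parts.append("with " + " and ".join(details))
    if use_case:
        parts.append(f"suitable for {use_case}")
    prompt = "A " + ", ".join(parts) + "."
    if len(prompt) > 2048:  # documented maximum length for voice_prompt
        raise ValueError("voice_prompt exceeds 2,048 characters")
    return prompt
```

For example, compose_voice_prompt("male", "middle-aged", emotion="calm", pace="slow", characteristics="deep, magnetic", use_case="documentary narration") yields a multi-dimension description in the style of the good examples above.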

API reference

All operations use the same endpoint and authentication. Specify the operation with the action parameter.

Common request details

Endpoint

Region URL
Chinese mainland POST https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
Singapore (International) POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization
Note API keys differ by region (Chinese mainland vs. Singapore). Use the key matching your endpoint.

Request headers

Header Type Required Description
Authorization string Yes Bearer <your-api-key>
Content-Type string Yes application/json
Important

Use one account for all voice design and synthesis operations.
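
Because every operation shares the same endpoint, headers, and model value, a single helper can issue any of the four actions. A minimal sketch (the function names are hypothetical; the Singapore endpoint and the DASHSCOPE_API_KEY environment variable are assumed):

```python
import os
import requests

ENDPOINT = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"

def build_payload(action, **fields):
    """Assemble the common request body; `fields` fill the rest of `input`."""
    return {"model": "qwen-voice-design", "input": {"action": action, **fields}}

def call_voice_design(action, **fields):
    """POST one voice design operation (create/list/query/delete) and
    return the parsed JSON response."""
    resp = requests.post(
        ENDPOINT,
        headers={
            "Authorization": f"Bearer {os.environ['DASHSCOPE_API_KEY']}",
            "Content-Type": "application/json",
        },
        json=build_payload(action, **fields),
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()
```

For example, call_voice_design("query", voice="<voice-name>") or call_voice_design("list", page_size=10, page_index=0).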

Create a voice

Create a custom voice from text and get a preview audio clip.

Request syntax

{
    "model": "qwen-voice-design",
    "input": {
        "action": "create",
        "target_model": "<target-synthesis-model>",
        "voice_prompt": "<voice-description>",
        "preview_text": "<text-for-preview-audio>",
        "preferred_name": "<keyword-for-voice-name>",
        "language": "<language-code>"
    },
    "parameters": {
        "sample_rate": 24000,
        "response_format": "wav"
    }
}
Important

model is the voice design model (always qwen-voice-design). target_model is the synthesis model for the created voice. Do not confuse them.

Request parameters

Parameter Type Default Required Description
model string -- Yes Voice design model. Fixed to qwen-voice-design.
action string -- Yes Operation type. Fixed to create.
target_model string -- Yes Synthesis model for the voice. Must match subsequent synthesis calls. Values: qwen3-tts-vd-realtime-2026-01-15 or qwen3-tts-vd-realtime-2025-12-16 (real-time), qwen3-tts-vd-2026-01-26 (non-real-time).
voice_prompt string -- Yes Voice description (max 2,048 chars, Chinese/English only). See Write effective voice descriptions.
preview_text string -- Yes Preview audio text (max 1,024 chars, supported languages only).
preferred_name string -- No Keyword for the voice name (alphanumeric/underscores, max 16 chars). Appears in the generated voice name. Example: announcer yields a name like qwen-tts-vd-announcer-voice-20251201102800-a1b2.
language string zh No Language code for the generated voice. Must match preview_text. Values: zh, en, de, it, pt, es, ja, ko, fr, ru.
sample_rate int 24000 No Preview audio sample rate (Hz). Values: 8000, 16000, 24000, 48000.
response_format string wav No Preview audio format. Values: pcm, wav, mp3, opus.
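
The documented limits (voice_prompt up to 2,048 characters, preview_text up to 1,024 characters, preferred_name up to 16 alphanumeric/underscore characters, and the ten language codes) can be checked client-side before spending a billable create call. A hypothetical validation sketch:

```python
import re

# Language codes from the Language support table.
SUPPORTED_LANGUAGES = {"zh", "en", "de", "it", "pt", "es", "ja", "ko", "fr", "ru"}

def validate_create_input(voice_prompt, preview_text,
                          preferred_name=None, language="zh"):
    """Return a list of problems with a create request; empty means valid."""
    errors = []
    if not voice_prompt or len(voice_prompt) > 2048:
        errors.append("voice_prompt must be 1-2,048 characters")
    if not preview_text or len(preview_text) > 1024:
        errors.append("preview_text must be 1-1,024 characters")
    if preferred_name is not None and not re.fullmatch(r"[A-Za-z0-9_]{1,16}",
                                                       preferred_name):
        errors.append("preferred_name must be 1-16 alphanumeric/underscore characters")
    if language not in SUPPORTED_LANGUAGES:
        errors.append(f"unsupported language code: {language}")
    return errors
```

Note that this only catches length and format problems; the service still rejects descriptions it cannot parse as voice features (see the ineffective-description examples above).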

Click to view a response example

{
    "output": {
        "preview_audio": {
            "data": "{base64_encoded_audio}",
            "sample_rate": 24000,
            "response_format": "wav"
        },
        "target_model": "qwen3-tts-vd-realtime-2026-01-15",
        "voice": "qwen-tts-vd-announcer-voice-20251201102800-a1b2"
    },
    "usage": {
        "count": 1
    },
    "request_id": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
}

Response parameters

Parameter Type Description
voice string Generated voice name. Use as the voice parameter in synthesis API calls.
preview_audio.data string Base64-encoded preview audio.
preview_audio.sample_rate int Preview audio sample rate (request value or default: 24000).
preview_audio.response_format string Preview audio format (request value or default: wav).
target_model string Synthesis model for this voice.
usage.count int Voice creation count (always 1 on success). Cost: $0.20 per voice created.
request_id string Request ID for troubleshooting.

List voices

Returns a paginated list of all voices created under your account.

Request syntax

{
    "model": "qwen-voice-design",
    "input": {
        "action": "list",
        "page_size": 10,
        "page_index": 0
    }
}

Request parameters

Parameter Type Default Required Description
model string -- Yes Fixed to qwen-voice-design.
action string -- Yes Fixed to list.
page_index integer 0 No Page number. Range: 0--200.
page_size integer 10 No Entries per page. Must be greater than 0.

Click to view a response example

{
    "output": {
        "page_index": 0,
        "page_size": 2,
        "total_count": 26,
        "voice_list": [
            {
                "gmt_create": "2025-12-10 17:04:54",
                "gmt_modified": "2025-12-10 17:04:54",
                "language": "zh",
                "preview_text": "Dear listeners, hello everyone. Welcome to today's program.",
                "target_model": "qwen3-tts-vd-realtime-2026-01-15",
                "voice": "qwen-tts-vd-announcer-voice-20251210170454-a1b2",
                "voice_prompt": "A composed middle-aged male announcer with a deep, rich and magnetic voice, a steady speaking speed and clear articulation, suitable for news broadcasting or documentary commentary."
            }
        ]
    },
    "usage": {},
    "request_id": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
}

Response parameters

Parameter Type Description
page_index integer Current page number.
page_size integer Entries per page.
total_count integer Total number of voices.
voice_list[].voice string Voice name.
voice_list[].target_model string Speech synthesis model bound to this voice.
voice_list[].language string Language code.
voice_list[].voice_prompt string Voice description used during creation.
voice_list[].preview_text string Preview text used during creation.
voice_list[].gmt_create string Creation time.
voice_list[].gmt_modified string Last modification time.
request_id string Request ID.
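
To collect every voice across pages, increment page_index until total_count entries have been returned. An illustrative sketch (helper names hypothetical; Singapore endpoint and DASHSCOPE_API_KEY assumed):

```python
import os
import requests

ENDPOINT = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"

def last_page_index(total_count, page_size):
    """0-based index of the final page that holds any entries."""
    return max(0, (total_count - 1) // page_size)

def iter_all_voices(page_size=10):
    """Yield every voice under the account by walking page_index upward."""
    headers = {"Authorization": f"Bearer {os.environ['DASHSCOPE_API_KEY']}",
               "Content-Type": "application/json"}
    page_index = 0
    while True:
        resp = requests.post(ENDPOINT, headers=headers, timeout=60, json={
            "model": "qwen-voice-design",
            "input": {"action": "list",
                      "page_size": page_size,
                      "page_index": page_index},
        })
        resp.raise_for_status()
        output = resp.json()["output"]
        yield from output["voice_list"]
        # Stop once the page holding the last of total_count entries is reached.
        if page_index >= last_page_index(output["total_count"], page_size):
            break
        page_index += 1
```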

Query a voice

Returns detailed information about a specific voice.

Request syntax

{
    "model": "qwen-voice-design",
    "input": {
        "action": "query",
        "voice": "<voice-name>"
    }
}

Request parameters

Parameter Type Default Required Description
model string -- Yes Fixed to qwen-voice-design.
action string -- Yes Fixed to query.
voice string -- Yes Voice name to query.

Click to view a response example

Response example (voice found)

{
    "output": {
        "gmt_create": "2025-12-10 14:54:09",
        "gmt_modified": "2025-12-10 17:47:48",
        "language": "zh",
        "preview_text": "Dear listeners, hello everyone.",
        "target_model": "qwen3-tts-vd-realtime-2026-01-15",
        "voice": "qwen-tts-vd-announcer-voice-20251210145409-a1b2",
        "voice_prompt": "A composed middle-aged male announcer with a deep, rich and magnetic voice, a steady speaking speed and clear articulation, suitable for news broadcasting or documentary commentary."
    },
    "usage": {},
    "request_id": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
}

Response example (voice not found)

When the voice does not exist, the API returns HTTP 400 with a VoiceNotFound error:

{
    "request_id": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
    "code": "VoiceNotFound",
    "message": "Voice not found: qwen-tts-vd-announcer-voice-xxxx"
}

Response parameters

Parameter Type Description
voice string Voice name.
target_model string Speech synthesis model bound to this voice.
language string Language code.
voice_prompt string Voice description.
preview_text string Preview text.
gmt_create string Creation time.
gmt_modified string Last modification time.
request_id string Request ID.

Delete a voice

Deletes a voice and releases the corresponding quota.

Request syntax

{
    "model": "qwen-voice-design",
    "input": {
        "action": "delete",
        "voice": "<voice-name>"
    }
}

Request parameters

Parameter Type Default Required Description
model string -- Yes Fixed to qwen-voice-design.
action string -- Yes Fixed to delete.
voice string -- Yes Voice name to delete.

Click to view a response example

{
    "output": {
        "voice": "qwen-tts-vd-announcer-voice-20251210145409-a1b2"
    },
    "usage": {},
    "request_id": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
}

Response parameters

Parameter Type Description
voice string Deleted voice name.
request_id string Request ID.
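
The Sample code section below has no delete example; a minimal sketch follows the same pattern as the other operations (the delete_voice helper and its session parameter are illustrative, assuming the Singapore endpoint and DASHSCOPE_API_KEY):

```python
import os
import requests

ENDPOINT = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"

def delete_voice(voice_name, session=requests):
    """Delete a custom voice and free its quota; returns the deleted
    voice name echoed back by the API."""
    resp = session.post(
        ENDPOINT,
        headers={"Authorization": f"Bearer {os.environ['DASHSCOPE_API_KEY']}",
                 "Content-Type": "application/json"},
        json={"model": "qwen-voice-design",
              "input": {"action": "delete", "voice": voice_name}},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["output"]["voice"]
```

The session parameter is only there so the call can be stubbed in tests; in normal use, call delete_voice("<voice-name>").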

Sample code

Examples use the Singapore endpoint. For Chinese mainland, replace URLs:

  • HTTP: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization

  • WebSocket: wss://dashscope.aliyuncs.com/api-ws/v1/realtime

Note API keys differ by region. Get one at Get an API key.

Create a voice and preview

cURL

curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen-voice-design",
    "input": {
        "action": "create",
        "target_model": "qwen3-tts-vd-realtime-2026-01-15",
        "voice_prompt": "A composed middle-aged male announcer with a deep, rich and magnetic voice, a steady speaking speed and clear articulation, suitable for news broadcasting or documentary commentary.",
        "preview_text": "Dear listeners, hello everyone. Welcome to the evening news.",
        "preferred_name": "announcer",
        "language": "en"
    },
    "parameters": {
        "sample_rate": 24000,
        "response_format": "wav"
    }
}'

Python

import requests
import base64
import os

def create_voice():
    """Create a custom voice and save the preview audio."""
    # Load API key from environment variable
    api_key = os.getenv("DASHSCOPE_API_KEY")
    if not api_key:
        print("Error: DASHSCOPE_API_KEY not set.")
        return None, None

    data = {
        "model": "qwen-voice-design",
        "input": {
            "action": "create",
            "target_model": "qwen3-tts-vd-realtime-2026-01-15",
            "voice_prompt": "A composed middle-aged male announcer with a deep, rich "
                           "and magnetic voice, a steady speaking speed and clear "
                           "articulation, suitable for news broadcasting or "
                           "documentary commentary.",
            "preview_text": "Dear listeners, hello everyone. Welcome to the evening news.",
            "preferred_name": "announcer",
            "language": "en"
        },
        "parameters": {
            "sample_rate": 24000,
            "response_format": "wav"
        }
    }

    response = requests.post(
        "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json"
        },
        json=data,
        timeout=60
    )

    if response.status_code == 200:
        result = response.json()
        voice_name = result["output"]["voice"]
        audio_bytes = base64.b64decode(result["output"]["preview_audio"]["data"])

        # Save preview audio
        filename = f"{voice_name}_preview.wav"
        with open(filename, "wb") as f:
            f.write(audio_bytes)

        print(f"Voice created: {voice_name}")
        print(f"Preview saved to: {filename}")
        return voice_name, filename
    else:
        print(f"Request failed ({response.status_code}): {response.text}")
        return None, None

if __name__ == "__main__":
    create_voice()

Java

Add the Gson dependency to your project:

Maven

<!-- https://mvnrepository.com/artifact/com.google.code.gson/gson -->
<dependency>
    <groupId>com.google.code.gson</groupId>
    <artifactId>gson</artifactId>
    <version>2.13.1</version>
</dependency>

Gradle

// https://mvnrepository.com/artifact/com.google.code.gson/gson
implementation("com.google.code.gson:gson:2.13.1")

import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Base64;

public class Main {
    public static void main(String[] args) {
        new Main().createVoice();
    }

    public void createVoice() {
        // Load API key from environment variable
        String apiKey = System.getenv("DASHSCOPE_API_KEY");

        String jsonBody = "{\n" +
                "    \"model\": \"qwen-voice-design\",\n" +
                "    \"input\": {\n" +
                "        \"action\": \"create\",\n" +
                "        \"target_model\": \"qwen3-tts-vd-realtime-2026-01-15\",\n" +
                "        \"voice_prompt\": \"A composed middle-aged male announcer with a deep, rich and magnetic voice, a steady speaking speed and clear articulation, suitable for news broadcasting or documentary commentary.\",\n" +
                "        \"preview_text\": \"Dear listeners, hello everyone. Welcome to the evening news.\",\n" +
                "        \"preferred_name\": \"announcer\",\n" +
                "        \"language\": \"en\"\n" +
                "    },\n" +
                "    \"parameters\": {\n" +
                "        \"sample_rate\": 24000,\n" +
                "        \"response_format\": \"wav\"\n" +
                "    }\n" +
                "}";

        HttpURLConnection connection = null;
        try {
            URL url = new URL("https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization");
            connection = (HttpURLConnection) url.openConnection();
            connection.setRequestMethod("POST");
            connection.setRequestProperty("Authorization", "Bearer " + apiKey);
            connection.setRequestProperty("Content-Type", "application/json");
            connection.setDoOutput(true);

            // Send request body
            try (OutputStream os = connection.getOutputStream()) {
                os.write(jsonBody.getBytes("UTF-8"));
                os.flush();
            }

            int responseCode = connection.getResponseCode();
            if (responseCode == HttpURLConnection.HTTP_OK) {
                StringBuilder response = new StringBuilder();
                try (BufferedReader br = new BufferedReader(
                        new InputStreamReader(connection.getInputStream(), "UTF-8"))) {
                    String line;
                    while ((line = br.readLine()) != null) {
                        response.append(line.trim());
                    }
                }

                // Parse response and save preview audio
                JsonObject jsonResponse = JsonParser.parseString(response.toString()).getAsJsonObject();
                JsonObject output = jsonResponse.getAsJsonObject("output");
                String voiceName = output.get("voice").getAsString();
                String base64Audio = output.getAsJsonObject("preview_audio").get("data").getAsString();

                byte[] audioBytes = Base64.getDecoder().decode(base64Audio);
                String filename = voiceName + "_preview.wav";
                try (FileOutputStream fos = new FileOutputStream(filename)) {
                    fos.write(audioBytes);
                }

                System.out.println("Voice created: " + voiceName);
                System.out.println("Preview saved to: " + filename);
            } else {
                StringBuilder error = new StringBuilder();
                try (BufferedReader br = new BufferedReader(
                        new InputStreamReader(connection.getErrorStream(), "UTF-8"))) {
                    String line;
                    while ((line = br.readLine()) != null) {
                        error.append(line.trim());
                    }
                }
                System.out.println("Request failed (" + responseCode + "): " + error);
            }
        } catch (Exception e) {
            System.err.println("Error: " + e.getMessage());
            e.printStackTrace();
        } finally {
            if (connection != null) connection.disconnect();
        }
    }
}

Use a custom voice for speech synthesis

Pass the returned voice name to the synthesis API. The synthesis model must match the design target_model.

Bidirectional streaming (real-time)

Uses qwen3-tts-vd-realtime-2026-01-15. See Real-time speech synthesis - Qwen for details.

Python
# pyaudio installation:
#   macOS:   brew install portaudio && pip install pyaudio
#   Ubuntu:  sudo apt-get install python3-pyaudio  (or pip install pyaudio)
#   CentOS:  sudo yum install -y portaudio portaudio-devel && pip install pyaudio
#   Windows: python -m pip install pyaudio

import pyaudio
import os
import base64
import threading
import time
import dashscope
from dashscope.audio.qwen_tts_realtime import QwenTtsRealtime, QwenTtsRealtimeCallback, AudioFormat

TEXT_TO_SYNTHESIZE = [
    "Right? I really like this kind of supermarket,",
    "especially during the New Year.",
    "Going to the supermarket",
    "just makes me feel",
    "super, super happy!",
    "I want to buy so many things!"
]

def init_dashscope_api_key():
    """Load the API key from environment variable."""
    dashscope.api_key = os.getenv("DASHSCOPE_API_KEY")

class MyCallback(QwenTtsRealtimeCallback):
    """Callback for streaming TTS playback."""
    def __init__(self):
        self.complete_event = threading.Event()
        self._player = pyaudio.PyAudio()
        self._stream = self._player.open(
            format=pyaudio.paInt16, channels=1, rate=24000, output=True
        )

    def on_open(self) -> None:
        print("[TTS] Connection established")

    def on_close(self, close_status_code, close_msg) -> None:
        self._stream.stop_stream()
        self._stream.close()
        self._player.terminate()
        print(f"[TTS] Connection closed, code={close_status_code}, msg={close_msg}")

    def on_event(self, response: dict) -> None:
        event_type = response.get("type", "")
        if event_type == "session.created":
            print(f'[TTS] Session started: {response["session"]["id"]}')
        elif event_type == "response.audio.delta":
            audio_data = base64.b64decode(response["delta"])
            self._stream.write(audio_data)
        elif event_type == "response.done":
            print(f"[TTS] Response complete, ID: {qwen_tts_realtime.get_last_response_id()}")
        elif event_type == "session.finished":
            print("[TTS] Session finished")
            self.complete_event.set()

    def wait_for_finished(self):
        self.complete_event.wait()

if __name__ == "__main__":
    init_dashscope_api_key()

    callback = MyCallback()
    qwen_tts_realtime = QwenTtsRealtime(
        model="qwen3-tts-vd-realtime-2026-01-15",
        callback=callback,
        url="wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime"
    )
    qwen_tts_realtime.connect()

    qwen_tts_realtime.update_session(
        voice="<your-voice-name>",  # Replace with your voice design voice name
        response_format=AudioFormat.PCM_24000HZ_MONO_16BIT,
        mode="server_commit"
    )

    for text_chunk in TEXT_TO_SYNTHESIZE:
        print(f"[Sending text]: {text_chunk}")
        qwen_tts_realtime.append_text(text_chunk)
        time.sleep(0.1)

    qwen_tts_realtime.finish()
    callback.wait_for_finished()

    print(f"[Metric] session_id={qwen_tts_realtime.get_session_id()}, "
          f"first_audio_delay={qwen_tts_realtime.get_first_audio_delay()}s")
Java
import com.alibaba.dashscope.audio.qwen_tts_realtime.*;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.google.gson.JsonObject;

import javax.sound.sampled.*;
import java.util.Base64;
import java.util.Queue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;

public class Main {
    private static String[] textToSynthesize = {
            "Right? I really like this kind of supermarket,",
            "especially during the New Year.",
            "Going to the supermarket",
            "just makes me feel",
            "super, super happy!",
            "I want to buy so many things!"
    };

    // Real-time PCM audio player
    public static class RealtimePcmPlayer {
        private int sampleRate;
        private SourceDataLine line;
        private Thread decoderThread;
        private Thread playerThread;
        private AtomicBoolean stopped = new AtomicBoolean(false);
        private Queue<String> b64AudioBuffer = new ConcurrentLinkedQueue<>();
        private Queue<byte[]> rawAudioBuffer = new ConcurrentLinkedQueue<>();

        public RealtimePcmPlayer(int sampleRate) throws LineUnavailableException {
            this.sampleRate = sampleRate;
            AudioFormat audioFormat = new AudioFormat(this.sampleRate, 16, 1, true, false);
            DataLine.Info info = new DataLine.Info(SourceDataLine.class, audioFormat);
            line = (SourceDataLine) AudioSystem.getLine(info);
            line.open(audioFormat);
            line.start();

            decoderThread = new Thread(() -> {
                while (!stopped.get()) {
                    String b64Audio = b64AudioBuffer.poll();
                    if (b64Audio != null) {
                        rawAudioBuffer.add(Base64.getDecoder().decode(b64Audio));
                    } else {
                        try { Thread.sleep(100); } catch (InterruptedException e) { throw new RuntimeException(e); }
                    }
                }
            });

            playerThread = new Thread(() -> {
                while (!stopped.get()) {
                    byte[] rawAudio = rawAudioBuffer.poll();
                    if (rawAudio != null) {
                        int bytesWritten = 0;
                        while (bytesWritten < rawAudio.length) {
                            bytesWritten += line.write(rawAudio, bytesWritten, rawAudio.length - bytesWritten);
                        }
                        int audioLength = rawAudio.length / (this.sampleRate * 2 / 1000);
                        // Guard against a negative sleep for very short chunks.
                        try { Thread.sleep(Math.max(0, audioLength - 10)); } catch (InterruptedException e) { throw new RuntimeException(e); }
                    } else {
                        try { Thread.sleep(100); } catch (InterruptedException e) { throw new RuntimeException(e); }
                    }
                }
            });

            decoderThread.start();
            playerThread.start();
        }

        public void write(String b64Audio) { b64AudioBuffer.add(b64Audio); }

        public void waitForComplete() throws InterruptedException {
            while (!b64AudioBuffer.isEmpty() || !rawAudioBuffer.isEmpty()) { Thread.sleep(100); }
            line.drain();
        }

        public void shutdown() throws InterruptedException {
            stopped.set(true);
            decoderThread.join();
            playerThread.join();
            if (line != null && line.isRunning()) { line.drain(); line.close(); }
        }
    }

    public static void main(String[] args) throws Exception {
        QwenTtsRealtimeParam param = QwenTtsRealtimeParam.builder()
                .model("qwen3-tts-vd-realtime-2026-01-15")
                .url("wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime")
                .apikey(System.getenv("DASHSCOPE_API_KEY"))
                .build();

        AtomicReference<CountDownLatch> completeLatch = new AtomicReference<>(new CountDownLatch(1));
        RealtimePcmPlayer audioPlayer = new RealtimePcmPlayer(24000);

        QwenTtsRealtime qwenTtsRealtime = new QwenTtsRealtime(param, new QwenTtsRealtimeCallback() {
            @Override
            public void onOpen() { }

            @Override
            public void onEvent(JsonObject message) {
                String type = message.get("type").getAsString();
                switch (type) {
                    case "response.audio.delta":
                        audioPlayer.write(message.get("delta").getAsString());
                        break;
                    case "session.finished":
                        completeLatch.get().countDown();
                        break;
                }
            }

            @Override
            public void onClose(int code, String reason) { }
        });

        try {
            qwenTtsRealtime.connect();
        } catch (NoApiKeyException e) {
            throw new RuntimeException(e);
        }

        QwenTtsRealtimeConfig config = QwenTtsRealtimeConfig.builder()
                .voice("<your-voice-name>")  // Replace with your voice design voice name
                .responseFormat(QwenTtsRealtimeAudioFormat.PCM_24000HZ_MONO_16BIT)
                .mode("server_commit")
                .build();
        qwenTtsRealtime.updateSession(config);

        for (String text : textToSynthesize) {
            qwenTtsRealtime.appendText(text);
            Thread.sleep(100);
        }
        qwenTtsRealtime.finish();
        completeLatch.get().await();

        audioPlayer.waitForComplete();
        audioPlayer.shutdown();
        System.exit(0);
    }
}

Non-streaming and unidirectional streaming

Uses qwen3-tts-vd-2026-01-26. See Speech synthesis - Qwen for details.

The setup follows the same pattern as above: create a voice first, then pass the returned voice name to the speech synthesis API with the matching model. For non-streaming and unidirectional streaming code examples, see Speech synthesis - Qwen.

Query voices

cURL

# Query a specific voice
curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen-voice-design",
    "input": {
        "action": "query",
        "voice": "<your-voice-name>"
    }
}'
# List all voices (paginated)
curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen-voice-design",
    "input": {
        "action": "list",
        "page_size": 10,
        "page_index": 0
    }
}'

Python

import requests
import os

def query_voice(voice_name):
    """Get details for a specific voice."""
    api_key = os.getenv("DASHSCOPE_API_KEY")

    response = requests.post(
        "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json"
        },
        json={
            "model": "qwen-voice-design",
            "input": {
                "action": "query",
                "voice": voice_name
            }
        }
    )

    if response.status_code == 200:
        result = response.json()
        print(f"Voice: {result['output']['voice']}")
        print(f"Model: {result['output']['target_model']}")
        print(f"Created: {result['output']['gmt_create']}")
        return result
    else:
        error = response.json()
        if error.get("code") == "VoiceNotFound":
            print(f"Voice not found: {voice_name}")
        else:
            print(f"Request failed ({response.status_code}): {response.text}")
        return None

def list_voices(page_index=0, page_size=10):
    """List all voices with pagination."""
    api_key = os.getenv("DASHSCOPE_API_KEY")

    response = requests.post(
        "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json"
        },
        json={
            "model": "qwen-voice-design",
            "input": {
                "action": "list",
                "page_size": page_size,
                "page_index": page_index
            }
        }
    )

    if response.status_code == 200:
        result = response.json()
        total = result["output"]["total_count"]
        voices = result["output"]["voice_list"]
        print(f"Total voices: {total}")
        for v in voices:
            print(f"  - {v['voice']} ({v['language']}, {v['target_model']})")
        return result
    else:
        print(f"Request failed ({response.status_code}): {response.text}")
        return None

if __name__ == "__main__":
    list_voices()

Java

Query a specific voice:

import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;

public class Main {

    public static void main(String[] args) {
        Main example = new Main();
        String voiceName = "<your-voice-name>";  // Replace with the actual voice name
        System.out.println("Querying voice: " + voiceName);
        example.queryVoice(voiceName);
    }

    public void queryVoice(String voiceName) {
        String apiKey = System.getenv("DASHSCOPE_API_KEY");

        String jsonBody = "{\n" +
                "    \"model\": \"qwen-voice-design\",\n" +
                "    \"input\": {\n" +
                "        \"action\": \"query\",\n" +
                "        \"voice\": \"" + voiceName + "\"\n" +
                "    }\n" +
                "}";

        HttpURLConnection connection = null;
        try {
            URL url = new URL("https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization");
            connection = (HttpURLConnection) url.openConnection();
            connection.setRequestMethod("POST");
            connection.setRequestProperty("Authorization", "Bearer " + apiKey);
            connection.setRequestProperty("Content-Type", "application/json");
            connection.setDoOutput(true);
            connection.setDoInput(true);

            try (OutputStream os = connection.getOutputStream()) {
                byte[] input = jsonBody.getBytes("UTF-8");
                os.write(input, 0, input.length);
                os.flush();
            }

            int responseCode = connection.getResponseCode();
            if (responseCode == HttpURLConnection.HTTP_OK) {
                StringBuilder response = new StringBuilder();
                try (BufferedReader br = new BufferedReader(
                        new InputStreamReader(connection.getInputStream(), "UTF-8"))) {
                    String responseLine;
                    while ((responseLine = br.readLine()) != null) {
                        response.append(responseLine.trim());
                    }
                }

                JsonObject jsonResponse = JsonParser.parseString(response.toString()).getAsJsonObject();

                if (jsonResponse.has("code") && "VoiceNotFound".equals(jsonResponse.get("code").getAsString())) {
                    String errorMessage = jsonResponse.has("message") ?
                            jsonResponse.get("message").getAsString() : "Voice not found";
                    System.out.println("Voice not found: " + voiceName);
                    System.out.println("Error message: " + errorMessage);
                    return;
                }

                JsonObject outputObj = jsonResponse.getAsJsonObject("output");
                System.out.println("Successfully queried voice information:");
                System.out.println("  Voice Name: " + outputObj.get("voice").getAsString());
                System.out.println("  Creation Time: " + outputObj.get("gmt_create").getAsString());
                System.out.println("  Modification Time: " + outputObj.get("gmt_modified").getAsString());
                System.out.println("  Language: " + outputObj.get("language").getAsString());
                System.out.println("  Preview Text: " + outputObj.get("preview_text").getAsString());
                System.out.println("  Model: " + outputObj.get("target_model").getAsString());
                System.out.println("  Voice Description: " + outputObj.get("voice_prompt").getAsString());
            } else {
                StringBuilder errorResponse = new StringBuilder();
                try (BufferedReader br = new BufferedReader(
                        new InputStreamReader(connection.getErrorStream(), "UTF-8"))) {
                    String responseLine;
                    while ((responseLine = br.readLine()) != null) {
                        errorResponse.append(responseLine.trim());
                    }
                }
                System.out.println("Request failed with status code: " + responseCode);
                System.out.println("Error response: " + errorResponse.toString());
            }
        } catch (Exception e) {
            System.err.println("An error occurred during the request: " + e.getMessage());
            e.printStackTrace();
        } finally {
            if (connection != null) {
                connection.disconnect();
            }
        }
    }
}

List all voices (paginated):

import com.google.gson.Gson;
import com.google.gson.JsonArray;
import com.google.gson.JsonObject;
import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;

public class Main {
    public static void main(String[] args) {
        String apiKey = System.getenv("DASHSCOPE_API_KEY");
        String apiUrl = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization";

        String jsonPayload =
                "{"
                        + "\"model\": \"qwen-voice-design\","
                        + "\"input\": {"
                        +     "\"action\": \"list\","
                        +     "\"page_size\": 10,"
                        +     "\"page_index\": 0"
                        + "}"
                        + "}";

        try {
            HttpURLConnection con = (HttpURLConnection) new URL(apiUrl).openConnection();
            con.setRequestMethod("POST");
            con.setRequestProperty("Authorization", "Bearer " + apiKey);
            con.setRequestProperty("Content-Type", "application/json");
            con.setDoOutput(true);

            try (OutputStream os = con.getOutputStream()) {
                os.write(jsonPayload.getBytes("UTF-8"));
            }

            int status = con.getResponseCode();
            BufferedReader br = new BufferedReader(new InputStreamReader(
                    status >= 200 && status < 300 ? con.getInputStream() : con.getErrorStream(), "UTF-8"));

            StringBuilder response = new StringBuilder();
            String line;
            while ((line = br.readLine()) != null) {
                response.append(line);
            }
            br.close();

            System.out.println("HTTP Status Code: " + status);

            if (status == 200) {
                Gson gson = new Gson();
                JsonObject jsonObj = gson.fromJson(response.toString(), JsonObject.class);
                JsonArray voiceList = jsonObj.getAsJsonObject("output").getAsJsonArray("voice_list");

                System.out.println("\nQueried voice list:");
                for (int i = 0; i < voiceList.size(); i++) {
                    JsonObject voiceItem = voiceList.get(i).getAsJsonObject();
                    String voice = voiceItem.get("voice").getAsString();
                    String gmtCreate = voiceItem.get("gmt_create").getAsString();
                    String targetModel = voiceItem.get("target_model").getAsString();
                    System.out.printf("- Voice: %s  Creation Time: %s  Model: %s\n",
                            voice, gmtCreate, targetModel);
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Delete a voice

cURL

curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen-voice-design",
    "input": {
        "action": "delete",
        "voice": "<your-voice-name>"
    }
}'

Python

import requests
import os

def delete_voice(voice_name):
    """Delete a voice and release the quota."""
    api_key = os.getenv("DASHSCOPE_API_KEY")

    response = requests.post(
        "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json"
        },
        json={
            "model": "qwen-voice-design",
            "input": {
                "action": "delete",
                "voice": voice_name
            }
        }
    )

    if response.status_code == 200:
        result = response.json()
        # The service may return HTTP 200 with an error code in the body.
        if result.get("code"):
            print(f"Delete failed ({result['code']}): {result.get('message', '')}")
            return False
        print(f"Deleted: {voice_name}")
        return True
    else:
        print(f"Request failed ({response.status_code}): {response.text}")
        return False

if __name__ == "__main__":
    delete_voice("<your-voice-name>")

Java

import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;

public class Main {

    public static void main(String[] args) {
        Main example = new Main();
        String voiceName = "<your-voice-name>";  // Replace with the actual voice name
        System.out.println("Deleting voice: " + voiceName);
        example.deleteVoice(voiceName);
    }

    public void deleteVoice(String voiceName) {
        String apiKey = System.getenv("DASHSCOPE_API_KEY");

        String jsonBody = "{\n" +
                "    \"model\": \"qwen-voice-design\",\n" +
                "    \"input\": {\n" +
                "        \"action\": \"delete\",\n" +
                "        \"voice\": \"" + voiceName + "\"\n" +
                "    }\n" +
                "}";

        HttpURLConnection connection = null;
        try {
            URL url = new URL("https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization");
            connection = (HttpURLConnection) url.openConnection();
            connection.setRequestMethod("POST");
            connection.setRequestProperty("Authorization", "Bearer " + apiKey);
            connection.setRequestProperty("Content-Type", "application/json");
            connection.setDoOutput(true);
            connection.setDoInput(true);

            try (OutputStream os = connection.getOutputStream()) {
                byte[] input = jsonBody.getBytes("UTF-8");
                os.write(input, 0, input.length);
                os.flush();
            }

            int responseCode = connection.getResponseCode();
            if (responseCode == HttpURLConnection.HTTP_OK) {
                StringBuilder response = new StringBuilder();
                try (BufferedReader br = new BufferedReader(
                        new InputStreamReader(connection.getInputStream(), "UTF-8"))) {
                    String responseLine;
                    while ((responseLine = br.readLine()) != null) {
                        response.append(responseLine.trim());
                    }
                }

                JsonObject jsonResponse = JsonParser.parseString(response.toString()).getAsJsonObject();

                if (jsonResponse.has("code") && jsonResponse.get("code").getAsString().contains("VoiceNotFound")) {
                    String errorMessage = jsonResponse.has("message") ?
                            jsonResponse.get("message").getAsString() : "Voice not found";
                    System.out.println("Voice does not exist: " + voiceName);
                    System.out.println("Error message: " + errorMessage);
                } else if (jsonResponse.has("usage")) {
                    System.out.println("Voice deleted successfully: " + voiceName);
                    String requestId = jsonResponse.has("request_id") ?
                            jsonResponse.get("request_id").getAsString() : "N/A";
                    System.out.println("Request ID: " + requestId);
                } else {
                    System.out.println("Unexpected response format: " + response.toString());
                }
            } else {
                StringBuilder errorResponse = new StringBuilder();
                try (BufferedReader br = new BufferedReader(
                        new InputStreamReader(connection.getErrorStream(), "UTF-8"))) {
                    String responseLine;
                    while ((responseLine = br.readLine()) != null) {
                        errorResponse.append(responseLine.trim());
                    }
                }
                System.out.println("Request failed with status code: " + responseCode);
                System.out.println("Error response: " + errorResponse.toString());
            }
        } catch (Exception e) {
            System.err.println("An error occurred during the request: " + e.getMessage());
            e.printStackTrace();
        } finally {
            if (connection != null) {
                connection.disconnect();
            }
        }
    }
}

Voice quota and automatic cleanup

  • Limit: 1,000 voices per account. Check your current count using the total_count field in the List voices response.

  • Automatic cleanup: Voices that have not been used for speech synthesis in the past year are automatically deleted.
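The quota check can be scripted with the List voices action. The sketch below is illustrative, not part of any SDK: the helper names are mine, and the 1,000-voice limit is hard-coded from the figure above.

```python
import os

VOICE_LIMIT = 1000  # per-account limit stated above


def remaining_quota(list_response, limit=VOICE_LIMIT):
    """Compute free voice slots from a List voices response body."""
    total = list_response["output"]["total_count"]
    return max(limit - total, 0)


def check_quota():
    """Call the List voices action and report remaining slots."""
    import requests  # deferred so remaining_quota stays dependency-free

    response = requests.post(
        "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization",
        headers={
            "Authorization": f"Bearer {os.getenv('DASHSCOPE_API_KEY')}",
            "Content-Type": "application/json",
        },
        json={
            "model": "qwen-voice-design",
            # page_size 1 is enough: only total_count is needed
            "input": {"action": "list", "page_size": 1, "page_index": 0},
        },
    )
    response.raise_for_status()
    left = remaining_quota(response.json())
    print(f"Remaining voice slots: {left}")
    return left
```

Calling check_quota() before a batch of Create voice requests avoids hitting the limit mid-run.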

Billing

Voice design and speech synthesis are billed separately.

Voice creation: $0.20 per voice. Failed creation attempts are not billed.

Note Free quota (Singapore region only): 10 voices within 90 days of activation. Failed creations do not count against the free quota, and deleting a voice does not restore it. Standard pricing applies after the quota is exhausted or the 90-day period ends.

Speech synthesis with custom voices: Billed per character. For pricing details, see Real-time speech synthesis - Qwen or Speech synthesis - Qwen.
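As a back-of-the-envelope check, the two line items can be estimated together. The $0.20 creation price is from above; the per-character synthesis rate varies by model and is a parameter here, not an actual price.

```python
def estimate_cost(successful_creations, chars_synthesized, price_per_char,
                  price_per_voice=0.20):
    """Rough cost estimate for voice design plus synthesis.

    Creation is billed per successful voice (failed creations are free,
    so pass only the successful count); synthesis is billed per character.
    """
    return (successful_creations * price_per_voice
            + chars_synthesized * price_per_char)
```

For example, 4 successful creations with no synthesis cost 4 × $0.20 = $0.80 before any free quota is applied.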

Error handling

Failed requests return code and message fields. See Error messages for the full reference.

Common errors for voice design:

HTTP status | Error code | Cause | Resolution
400 | VoiceNotFound | The specified voice does not exist. | Verify the voice name using the List voices or Query a voice API.
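The examples above inline this check. A small reusable helper makes the code/message handling explicit; the exception type and function name below are hypothetical, not part of any SDK.

```python
class VoiceNotFoundError(Exception):
    """Raised when the service reports the VoiceNotFound error code."""


def raise_for_dashscope_error(status_code, body):
    """Inspect a parsed response body and raise on error.

    Failed requests carry `code` and `message` fields; note that an
    error body can arrive with either an HTTP error status or HTTP 200.
    """
    code = body.get("code")
    if status_code == 200 and code is None:
        return  # success
    message = body.get("message", "")
    if code == "VoiceNotFound":
        raise VoiceNotFoundError(message or "The specified voice does not exist.")
    raise RuntimeError(f"Request failed ({status_code}): {code}: {message}")
```

Callers can then catch VoiceNotFoundError specifically and recover, for example by re-checking the name with the List voices action.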

Related topics