Alibaba Cloud Model Studio: Qwen voice design API reference

Last Updated: Mar 15, 2026

Create custom voices from text descriptions specifying gender, age, tone, and pace. The API returns a reusable voice name and preview audio clip. Use the voice name with the Speech synthesis - Qwen or Real-time speech synthesis - Qwen API.

Important

Voice creation and synthesis are separate steps. The target_model you specify when creating a voice must match the model you use for synthesis, or synthesis will fail.

Prerequisites

  1. Get an API key and store it as the DASHSCOPE_API_KEY environment variable.

  2. Install the latest DashScope SDK (SDK examples only).

How it works

  1. Write a voice description (voice_prompt) and preview text (preview_text).

  2. Submit a Create voice request with your chosen target_model.

  3. The API returns a voice name and Base64-encoded preview audio.

  4. Listen to the preview. If satisfied, use the voice name with the speech synthesis API. Otherwise, create a new voice.

Note The synthesis model in step 4 must be the same target_model from step 2.
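
The workflow above can be sketched end to end in a short script. This is an illustrative sketch, not official sample code: it assumes the Singapore endpoint and the DASHSCOPE_API_KEY environment variable, and the helper names (build_create_request, create_and_preview) are hypothetical.

```python
import base64
import os
import requests

ENDPOINT = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"

def build_create_request(voice_prompt, preview_text, target_model, language="en"):
    """Steps 1-2: assemble the create-voice payload from the documented fields."""
    return {
        "model": "qwen-voice-design",
        "input": {
            "action": "create",
            "target_model": target_model,
            "voice_prompt": voice_prompt,
            "preview_text": preview_text,
            "language": language,
        },
        "parameters": {"sample_rate": 24000, "response_format": "wav"},
    }

def create_and_preview(voice_prompt, preview_text, target_model):
    """Steps 2-4: submit the request, save the preview clip, return the voice name."""
    resp = requests.post(
        ENDPOINT,
        headers={"Authorization": f"Bearer {os.environ['DASHSCOPE_API_KEY']}",
                 "Content-Type": "application/json"},
        json=build_create_request(voice_prompt, preview_text, target_model),
        timeout=60,
    )
    resp.raise_for_status()
    output = resp.json()["output"]
    # Step 3: decode the Base64 preview audio so you can listen to it.
    with open("preview.wav", "wb") as f:
        f.write(base64.b64decode(output["preview_audio"]["data"]))
    # Step 4: if the preview sounds right, reuse output["voice"] for synthesis
    # with the same target_model; otherwise create a new voice.
    return output["voice"]
```

If the preview falls short, adjust voice_prompt and call create_and_preview again; each call creates (and bills) a new voice.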

Supported models

Voice design requires two models: a design model and a target speech synthesis model.

Model Value Use with
Voice design model qwen-voice-design All voice design operations (fixed value)
Real-time synthesis target qwen3-tts-vd-realtime-2026-01-15 Real-time speech synthesis - Qwen
Real-time synthesis target (earlier version) qwen3-tts-vd-realtime-2025-12-16 Real-time speech synthesis - Qwen
Non-real-time synthesis target qwen3-tts-vd-2026-01-26 Speech synthesis - Qwen
Note Voice design synthesis models (qwen3-tts-vd-*) only support voices created through voice design. System voices (Chelsie, Serena, Ethan, Cherry) are not supported.

Language support

Supported languages for voice creation and speech synthesis:

Code Language
zh Chinese
en English
de German
it Italian
pt Portuguese
es Spanish
ja Japanese
ko Korean
fr French
ru Russian

The voice_prompt description text supports Chinese and English only. The language parameter must match the preview_text language.

Write effective voice descriptions

A voice description (voice_prompt) defines what voice to generate. Combine attributes (gender, age, tone, use case) for a distinctive voice.

Requirements and limitations

  • Maximum length: 2,048 characters.

  • Supported languages: Chinese and English only.

Description dimensions

Dimension Examples
Gender Male, female, neutral
Age Child (5--12), teenager (13--18), young adult (19--35), middle-aged (36--55), elderly (55+)
Pitch High, medium, low, high-pitched, low-pitched
Pace Fast, medium, slow, fast-paced, slow-paced
Emotion Cheerful, calm, gentle, serious, lively, composed, soothing
Characteristics Magnetic, crisp, hoarse, mellow, sweet, rich, powerful
Use case News broadcast, ad voice-over, audiobook, animation character, voice assistant, documentary narration

Principles for effective descriptions

  1. Be specific. Use concrete voice qualities like "deep," "crisp," or "fast-paced." Avoid subjective terms like "nice" or "normal."

  2. Combine multiple dimensions (gender, age, emotion, use case). Single-dimension descriptions like "female voice" are too broad.

  3. Be objective. Describe physical and perceptual features, not opinions ("high-pitched and energetic", not "my favorite voice").

  4. Be original. Describe voice qualities, not celebrity imitations (copyright risk, not supported).

  5. Be concise. Avoid synonyms and meaningless intensifiers ("a very, very great voice").

Examples

Good descriptions:

  • "A young, lively female voice with a fast pace and noticeable upward inflection, suitable for fashion product introductions." Combines age, personality, pace, intonation, and use case.

  • "A calm, middle-aged male voice with a slow pace and deep, magnetic tone, suitable for news or documentary narration." Defines gender, age, pace, vocal characteristics, and domain.

  • "A cute child's voice, around 8 years old, with a slightly childish tone, suitable for animation character voice-overs." Specifies age, vocal quality, and use case.

  • "A gentle, intellectual female voice, around 30 years old, with a calm tone, suitable for audiobook narration." Conveys emotion, style, age, and application clearly.

Ineffective descriptions:

Description Issue Improvement
"A nice voice" Too vague and subjective. "A young female voice with a clear vocal line and gentle tone."
"A voice like a certain celebrity" Celebrity imitation is not supported (copyright risk). "A mature, magnetic male voice with a calm pace."
"A very, very, very nice female voice" Redundant. Repetition does not improve results. "A female voice, 20--24 years old, with a light tone, lively pitch, and sweet quality."
"123456" Invalid input. Cannot be parsed as voice features. Provide a meaningful text description using the dimensions above.
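
One practical way to apply these principles is to assemble the description from the dimension table rather than free-typing it, so every prompt combines several dimensions. The helper below is hypothetical, purely an illustration of composing a specific, objective voice_prompt:

```python
def compose_voice_prompt(gender, age, emotion=None, pace=None,
                         characteristics=None, use_case=None):
    """Hypothetical helper: combine description dimensions (gender, age,
    emotion, pace, characteristics, use case) into one voice_prompt string."""
    parts = []
    if emotion:
        parts.append(emotion)
    parts.append(f"{age} {gender} voice")
    details = []
    if pace:
        details.append(f"a {pace} pace")
    if characteristics:
        details.append(f"a {characteristics} tone")
    if details:
        parts.append("with " + " and ".join(details))
    if use_case:
        parts.append(f"suitable for {use_case}")
    prompt = "A " + ", ".join(parts) + "."
    if len(prompt) > 2048:  # documented maximum length for voice_prompt
        raise ValueError("voice_prompt exceeds 2,048 characters")
    return prompt
```

For example, compose_voice_prompt("male", "middle-aged", emotion="calm", pace="slow", characteristics="deep, magnetic", use_case="documentary narration") yields a multi-dimension description in the style of the good examples above.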

API reference

All operations use the same endpoint and authentication. Specify the operation with the action parameter.

Common request details

Endpoint

Region URL
Chinese mainland POST https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
Singapore (International) POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization
Note API keys differ by region (Chinese mainland vs. Singapore). Use the key matching your endpoint.

Request headers

Header Type Required Description
Authorization string Yes Bearer <your-api-key>
Content-Type string Yes application/json
Important

Use one account for all voice design and synthesis operations.
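
Because every operation shares the same endpoint, headers, and model value, a single helper can issue any of the four actions. A minimal sketch (the function names are hypothetical; the Singapore endpoint and the DASHSCOPE_API_KEY environment variable are assumed):

```python
import os
import requests

ENDPOINT = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"

def build_payload(action, **fields):
    """Assemble the common request body; `fields` fill the rest of `input`."""
    return {"model": "qwen-voice-design", "input": {"action": action, **fields}}

def call_voice_design(action, **fields):
    """POST one voice design operation (create/list/query/delete) and
    return the parsed JSON response."""
    resp = requests.post(
        ENDPOINT,
        headers={
            "Authorization": f"Bearer {os.environ['DASHSCOPE_API_KEY']}",
            "Content-Type": "application/json",
        },
        json=build_payload(action, **fields),
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()
```

For example, call_voice_design("query", voice="<voice-name>") or call_voice_design("list", page_size=10, page_index=0).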

Create a voice

Create a custom voice from text and get a preview audio clip.

Request syntax

{
    "model": "qwen-voice-design",
    "input": {
        "action": "create",
        "target_model": "<target-synthesis-model>",
        "voice_prompt": "<voice-description>",
        "preview_text": "<text-for-preview-audio>",
        "preferred_name": "<keyword-for-voice-name>",
        "language": "<language-code>"
    },
    "parameters": {
        "sample_rate": 24000,
        "response_format": "wav"
    }
}
Important

model is the voice design model (always qwen-voice-design). target_model is the synthesis model for the created voice. Do not confuse them.

Request parameters

Parameter Type Default Required Description
model string -- Yes Voice design model. Fixed to qwen-voice-design.
action string -- Yes Operation type. Fixed to create.
target_model string -- Yes Synthesis model for the voice. Must match subsequent synthesis calls. Values: qwen3-tts-vd-realtime-2026-01-15 or qwen3-tts-vd-realtime-2025-12-16 (real-time), qwen3-tts-vd-2026-01-26 (non-real-time).
voice_prompt string -- Yes Voice description (max 2,048 chars, Chinese/English only). See Write effective voice descriptions.
preview_text string -- Yes Preview audio text (max 1,024 chars, supported languages only).
preferred_name string -- No Keyword for the voice name (alphanumeric/underscores, max 16 chars). Appears in the generated voice name. Example: announcer yields a name like qwen-tts-vd-announcer-voice-20251201102800-a1b2.
language string zh No Language code for the generated voice. Must match preview_text. Values: zh, en, de, it, pt, es, ja, ko, fr, ru.
sample_rate int 24000 No Preview audio sample rate (Hz). Values: 8000, 16000, 24000, 48000.
response_format string wav No Preview audio format. Values: pcm, wav, mp3, opus.
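
The documented limits (voice_prompt up to 2,048 characters, preview_text up to 1,024 characters, preferred_name up to 16 alphanumeric/underscore characters, and the ten language codes) can be checked client-side before spending a billable create call. A hypothetical validation sketch:

```python
import re

# Language codes from the Language support table.
SUPPORTED_LANGUAGES = {"zh", "en", "de", "it", "pt", "es", "ja", "ko", "fr", "ru"}

def validate_create_input(voice_prompt, preview_text,
                          preferred_name=None, language="zh"):
    """Return a list of problems with a create request; empty means valid."""
    errors = []
    if not voice_prompt or len(voice_prompt) > 2048:
        errors.append("voice_prompt must be 1-2,048 characters")
    if not preview_text or len(preview_text) > 1024:
        errors.append("preview_text must be 1-1,024 characters")
    if preferred_name is not None and not re.fullmatch(r"[A-Za-z0-9_]{1,16}",
                                                       preferred_name):
        errors.append("preferred_name must be 1-16 alphanumeric/underscore characters")
    if language not in SUPPORTED_LANGUAGES:
        errors.append(f"unsupported language code: {language}")
    return errors
```

Note that this only catches length and format problems; the service still rejects descriptions it cannot parse as voice features (see the ineffective-description examples above).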

Click to view a response example

{
    "output": {
        "preview_audio": {
            "data": "{base64_encoded_audio}",
            "sample_rate": 24000,
            "response_format": "wav"
        },
        "target_model": "qwen3-tts-vd-realtime-2026-01-15",
        "voice": "qwen-tts-vd-announcer-voice-20251201102800-a1b2"
    },
    "usage": {
        "count": 1
    },
    "request_id": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
}

Response parameters

Parameter Type Description
voice string Generated voice name. Use as the voice parameter in synthesis API calls.
preview_audio.data string Base64-encoded preview audio.
preview_audio.sample_rate int Preview audio sample rate (request value or default: 24000).
preview_audio.response_format string Preview audio format (request value or default: wav).
target_model string Synthesis model for this voice.
usage.count int Voice creation count (always 1 on success). Cost: $0.20 per voice created.
request_id string Request ID for troubleshooting.

List voices

Returns a paginated list of all voices created under your account.

Request syntax

{
    "model": "qwen-voice-design",
    "input": {
        "action": "list",
        "page_size": 10,
        "page_index": 0
    }
}

Request parameters

Parameter Type Default Required Description
model string -- Yes Fixed to qwen-voice-design.
action string -- Yes Fixed to list.
page_index integer 0 No Page number. Range: 0--200.
page_size integer 10 No Entries per page. Must be greater than 0.

Click to view a response example

{
    "output": {
        "page_index": 0,
        "page_size": 2,
        "total_count": 26,
        "voice_list": [
            {
                "gmt_create": "2025-12-10 17:04:54",
                "gmt_modified": "2025-12-10 17:04:54",
                "language": "zh",
                "preview_text": "Dear listeners, hello everyone. Welcome to today's program.",
                "target_model": "qwen3-tts-vd-realtime-2026-01-15",
                "voice": "qwen-tts-vd-announcer-voice-20251210170454-a1b2",
                "voice_prompt": "A composed middle-aged male announcer with a deep, rich and magnetic voice, a steady speaking speed and clear articulation, suitable for news broadcasting or documentary commentary."
            }
        ]
    },
    "usage": {},
    "request_id": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
}

Response parameters

Parameter Type Description
page_index integer Current page number.
page_size integer Entries per page.
total_count integer Total number of voices.
voice_list[].voice string Voice name.
voice_list[].target_model string Speech synthesis model bound to this voice.
voice_list[].language string Language code.
voice_list[].voice_prompt string Voice description used during creation.
voice_list[].preview_text string Preview text used during creation.
voice_list[].gmt_create string Creation time.
voice_list[].gmt_modified string Last modification time.
request_id string Request ID.
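
To collect every voice across pages, increment page_index until total_count entries have been returned. An illustrative sketch (helper names hypothetical; Singapore endpoint and DASHSCOPE_API_KEY assumed):

```python
import os
import requests

ENDPOINT = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"

def last_page_index(total_count, page_size):
    """0-based index of the final page that holds any entries."""
    return max(0, (total_count - 1) // page_size)

def iter_all_voices(page_size=10):
    """Yield every voice under the account by walking page_index upward."""
    headers = {"Authorization": f"Bearer {os.environ['DASHSCOPE_API_KEY']}",
               "Content-Type": "application/json"}
    page_index = 0
    while True:
        resp = requests.post(ENDPOINT, headers=headers, timeout=60, json={
            "model": "qwen-voice-design",
            "input": {"action": "list",
                      "page_size": page_size,
                      "page_index": page_index},
        })
        resp.raise_for_status()
        output = resp.json()["output"]
        yield from output["voice_list"]
        # Stop once the page holding the last of total_count entries is reached.
        if page_index >= last_page_index(output["total_count"], page_size):
            break
        page_index += 1
```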

Query a voice

Returns detailed information about a specific voice.

Request syntax

{
    "model": "qwen-voice-design",
    "input": {
        "action": "query",
        "voice": "<voice-name>"
    }
}

Request parameters

Parameter Type Default Required Description
model string -- Yes Fixed to qwen-voice-design.
action string -- Yes Fixed to query.
voice string -- Yes Voice name to query.

Click to view a response example

Response example (voice found)

{
    "output": {
        "gmt_create": "2025-12-10 14:54:09",
        "gmt_modified": "2025-12-10 17:47:48",
        "language": "zh",
        "preview_text": "Dear listeners, hello everyone.",
        "target_model": "qwen3-tts-vd-realtime-2026-01-15",
        "voice": "qwen-tts-vd-announcer-voice-20251210145409-a1b2",
        "voice_prompt": "A composed middle-aged male announcer with a deep, rich and magnetic voice, a steady speaking speed and clear articulation, suitable for news broadcasting or documentary commentary."
    },
    "usage": {},
    "request_id": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
}

Response example (voice not found)

When the voice does not exist, the API returns HTTP 400 with a VoiceNotFound error:

{
    "request_id": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx",
    "code": "VoiceNotFound",
    "message": "Voice not found: qwen-tts-vd-announcer-voice-xxxx"
}

Response parameters

Parameter Type Description
voice string Voice name.
target_model string Speech synthesis model bound to this voice.
language string Language code.
voice_prompt string Voice description.
preview_text string Preview text.
gmt_create string Creation time.
gmt_modified string Last modification time.
request_id string Request ID.

Delete a voice

Deletes a voice and releases the corresponding quota.

Request syntax

{
    "model": "qwen-voice-design",
    "input": {
        "action": "delete",
        "voice": "<voice-name>"
    }
}

Request parameters

Parameter Type Default Required Description
model string -- Yes Fixed to qwen-voice-design.
action string -- Yes Fixed to delete.
voice string -- Yes Voice name to delete.

Click to view a response example

{
    "output": {
        "voice": "qwen-tts-vd-announcer-voice-20251210145409-a1b2"
    },
    "usage": {},
    "request_id": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
}

Response parameters

Parameter Type Description
voice string Deleted voice name.
request_id string Request ID.
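
The Sample code section below has no delete example; a minimal sketch follows the same pattern as the other operations (the delete_voice helper and its session parameter are illustrative, assuming the Singapore endpoint and DASHSCOPE_API_KEY):

```python
import os
import requests

ENDPOINT = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"

def delete_voice(voice_name, session=requests):
    """Delete a custom voice and free its quota; returns the deleted
    voice name echoed back by the API."""
    resp = session.post(
        ENDPOINT,
        headers={"Authorization": f"Bearer {os.environ['DASHSCOPE_API_KEY']}",
                 "Content-Type": "application/json"},
        json={"model": "qwen-voice-design",
              "input": {"action": "delete", "voice": voice_name}},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["output"]["voice"]
```

The session parameter is only there so the call can be stubbed in tests; in normal use, call delete_voice("<voice-name>").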

Sample code

Examples use the Singapore endpoint. For Chinese mainland, replace URLs:

  • HTTP: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization

  • WebSocket: wss://dashscope.aliyuncs.com/api-ws/v1/realtime

Note API keys differ by region. Get one at Get an API key.

Create a voice and preview

cURL

curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen-voice-design",
    "input": {
        "action": "create",
        "target_model": "qwen3-tts-vd-realtime-2026-01-15",
        "voice_prompt": "A composed middle-aged male announcer with a deep, rich and magnetic voice, a steady speaking speed and clear articulation, suitable for news broadcasting or documentary commentary.",
        "preview_text": "Dear listeners, hello everyone. Welcome to the evening news.",
        "preferred_name": "announcer",
        "language": "en"
    },
    "parameters": {
        "sample_rate": 24000,
        "response_format": "wav"
    }
}'

Python

import requests
import base64
import os

def create_voice():
    """Create a custom voice and save the preview audio."""
    # Load API key from environment variable
    api_key = os.getenv("DASHSCOPE_API_KEY")
    if not api_key:
        print("Error: DASHSCOPE_API_KEY not set.")
        return None, None

    data = {
        "model": "qwen-voice-design",
        "input": {
            "action": "create",
            "target_model": "qwen3-tts-vd-realtime-2026-01-15",
            "voice_prompt": "A composed middle-aged male announcer with a deep, rich "
                           "and magnetic voice, a steady speaking speed and clear "
                           "articulation, suitable for news broadcasting or "
                           "documentary commentary.",
            "preview_text": "Dear listeners, hello everyone. Welcome to the evening news.",
            "preferred_name": "announcer",
            "language": "en"
        },
        "parameters": {
            "sample_rate": 24000,
            "response_format": "wav"
        }
    }

    response = requests.post(
        "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json"
        },
        json=data,
        timeout=60
    )

    if response.status_code == 200:
        result = response.json()
        voice_name = result["output"]["voice"]
        audio_bytes = base64.b64decode(result["output"]["preview_audio"]["data"])

        # Save preview audio
        filename = f"{voice_name}_preview.wav"
        with open(filename, "wb") as f:
            f.write(audio_bytes)

        print(f"Voice created: {voice_name}")
        print(f"Preview saved to: {filename}")
        return voice_name, filename
    else:
        print(f"Request failed ({response.status_code}): {response.text}")
        return None, None

if __name__ == "__main__":
    create_voice()

Java

Add the Gson dependency to your project:

Maven

<!-- https://mvnrepository.com/artifact/com.google.code.gson/gson -->
<dependency>
    <groupId>com.google.code.gson</groupId>
    <artifactId>gson</artifactId>
    <version>2.13.1</version>
</dependency>

Gradle

// https://mvnrepository.com/artifact/com.google.code.gson/gson
implementation("com.google.code.gson:gson:2.13.1")

import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Base64;

public class Main {
    public static void main(String[] args) {
        new Main().createVoice();
    }

    public void createVoice() {
        // Load API key from environment variable
        String apiKey = System.getenv("DASHSCOPE_API_KEY");

        String jsonBody = "{\n" +
                "    \"model\": \"qwen-voice-design\",\n" +
                "    \"input\": {\n" +
                "        \"action\": \"create\",\n" +
                "        \"target_model\": \"qwen3-tts-vd-realtime-2026-01-15\",\n" +
                "        \"voice_prompt\": \"A composed middle-aged male announcer with a deep, rich and magnetic voice, a steady speaking speed and clear articulation, suitable for news broadcasting or documentary commentary.\",\n" +
                "        \"preview_text\": \"Dear listeners, hello everyone. Welcome to the evening news.\",\n" +
                "        \"preferred_name\": \"announcer\",\n" +
                "        \"language\": \"en\"\n" +
                "    },\n" +
                "    \"parameters\": {\n" +
                "        \"sample_rate\": 24000,\n" +
                "        \"response_format\": \"wav\"\n" +
                "    }\n" +
                "}";

        HttpURLConnection connection = null;
        try {
            URL url = new URL("https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization");
            connection = (HttpURLConnection) url.openConnection();
            connection.setRequestMethod("POST");
            connection.setRequestProperty("Authorization", "Bearer " + apiKey);
            connection.setRequestProperty("Content-Type", "application/json");
            connection.setDoOutput(true);

            // Send request body
            try (OutputStream os = connection.getOutputStream()) {
                os.write(jsonBody.getBytes("UTF-8"));
                os.flush();
            }

            int responseCode = connection.getResponseCode();
            if (responseCode == HttpURLConnection.HTTP_OK) {
                StringBuilder response = new StringBuilder();
                try (BufferedReader br = new BufferedReader(
                        new InputStreamReader(connection.getInputStream(), "UTF-8"))) {
                    String line;
                    while ((line = br.readLine()) != null) {
                        response.append(line.trim());
                    }
                }

                // Parse response and save preview audio
                JsonObject jsonResponse = JsonParser.parseString(response.toString()).getAsJsonObject();
                JsonObject output = jsonResponse.getAsJsonObject("output");
                String voiceName = output.get("voice").getAsString();
                String base64Audio = output.getAsJsonObject("preview_audio").get("data").getAsString();

                byte[] audioBytes = Base64.getDecoder().decode(base64Audio);
                String filename = voiceName + "_preview.wav";
                try (FileOutputStream fos = new FileOutputStream(filename)) {
                    fos.write(audioBytes);
                }

                System.out.println("Voice created: " + voiceName);
                System.out.println("Preview saved to: " + filename);
            } else {
                StringBuilder error = new StringBuilder();
                try (BufferedReader br = new BufferedReader(
                        new InputStreamReader(connection.getErrorStream(), "UTF-8"))) {
                    String line;
                    while ((line = br.readLine()) != null) {
                        error.append(line.trim());
                    }
                }
                System.out.println("Request failed (" + responseCode + "): " + error);
            }
        } catch (Exception e) {
            System.err.println("Error: " + e.getMessage());
            e.printStackTrace();
        } finally {
            if (connection != null) connection.disconnect();
        }
    }
}

Use a custom voice for speech synthesis

Pass the returned voice name to the synthesis API. The synthesis model must match the design target_model.

Bidirectional streaming (real-time)

Uses qwen3-tts-vd-realtime-2026-01-15. See Real-time speech synthesis - Qwen for details.

Python
# pyaudio installation:
#   macOS:   brew install portaudio && pip install pyaudio
#   Ubuntu:  sudo apt-get install python3-pyaudio  (or pip install pyaudio)
#   CentOS:  sudo yum install -y portaudio portaudio-devel && pip install pyaudio
#   Windows: python -m pip install pyaudio

import pyaudio
import os
import base64
import threading
import time
import dashscope
from dashscope.audio.qwen_tts_realtime import QwenTtsRealtime, QwenTtsRealtimeCallback, AudioFormat

TEXT_TO_SYNTHESIZE = [
    "Right? I really like this kind of supermarket,",
    "especially during the New Year.",
    "Going to the supermarket",
    "just makes me feel",
    "super, super happy!",
    "I want to buy so many things!"
]

def init_dashscope_api_key():
    """Load the API key from environment variable."""
    dashscope.api_key = os.getenv("DASHSCOPE_API_KEY")

class MyCallback(QwenTtsRealtimeCallback):
    """Callback for streaming TTS playback."""
    def __init__(self):
        self.complete_event = threading.Event()
        self._player = pyaudio.PyAudio()
        self._stream = self._player.open(
            format=pyaudio.paInt16, channels=1, rate=24000, output=True
        )

    def on_open(self) -> None:
        print("[TTS] Connection established")

    def on_close(self, close_status_code, close_msg) -> None:
        self._stream.stop_stream()
        self._stream.close()
        self._player.terminate()
        print(f"[TTS] Connection closed, code={close_status_code}, msg={close_msg}")

    def on_event(self, response: dict) -> None:
        event_type = response.get("type", "")
        if event_type == "session.created":
            print(f'[TTS] Session started: {response["session"]["id"]}')
        elif event_type == "response.audio.delta":
            audio_data = base64.b64decode(response["delta"])
            self._stream.write(audio_data)
        elif event_type == "response.done":
            print(f"[TTS] Response complete, ID: {qwen_tts_realtime.get_last_response_id()}")
        elif event_type == "session.finished":
            print("[TTS] Session finished")
            self.complete_event.set()

    def wait_for_finished(self):
        self.complete_event.wait()

if __name__ == "__main__":
    init_dashscope_api_key()

    callback = MyCallback()
    qwen_tts_realtime = QwenTtsRealtime(
        model="qwen3-tts-vd-realtime-2026-01-15",
        callback=callback,
        url="wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime"
    )
    qwen_tts_realtime.connect()

    qwen_tts_realtime.update_session(
        voice="<your-voice-name>",  # Replace with your voice design voice name
        response_format=AudioFormat.PCM_24000HZ_MONO_16BIT,
        mode="server_commit"
    )

    for text_chunk in TEXT_TO_SYNTHESIZE:
        print(f"[Sending text]: {text_chunk}")
        qwen_tts_realtime.append_text(text_chunk)
        time.sleep(0.1)

    qwen_tts_realtime.finish()
    callback.wait_for_finished()

    print(f"[Metric] session_id={qwen_tts_realtime.get_session_id()}, "
          f"first_audio_delay={qwen_tts_realtime.get_first_audio_delay()}s")
Java
import com.alibaba.dashscope.audio.qwen_tts_realtime.*;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.google.gson.JsonObject;

import javax.sound.sampled.*;
import java.util.Base64;
import java.util.Queue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;

public class Main {
    private static String[] textToSynthesize = {
            "Right? I really like this kind of supermarket,",
            "especially during the New Year.",
            "Going to the supermarket",
            "just makes me feel",
            "super, super happy!",
            "I want to buy so many things!"
    };

    // Real-time PCM audio player
    public static class RealtimePcmPlayer {
        private int sampleRate;
        private SourceDataLine line;
        private Thread decoderThread;
        private Thread playerThread;
        private AtomicBoolean stopped = new AtomicBoolean(false);
        private Queue<String> b64AudioBuffer = new ConcurrentLinkedQueue<>();
        private Queue<byte[]> rawAudioBuffer = new ConcurrentLinkedQueue<>();

        public RealtimePcmPlayer(int sampleRate) throws LineUnavailableException {
            this.sampleRate = sampleRate;
            AudioFormat audioFormat = new AudioFormat(this.sampleRate, 16, 1, true, false);
            DataLine.Info info = new DataLine.Info(SourceDataLine.class, audioFormat);
            line = (SourceDataLine) AudioSystem.getLine(info);
            line.open(audioFormat);
            line.start();

            decoderThread = new Thread(() -> {
                while (!stopped.get()) {
                    String b64Audio = b64AudioBuffer.poll();
                    if (b64Audio != null) {
                        rawAudioBuffer.add(Base64.getDecoder().decode(b64Audio));
                    } else {
                        try { Thread.sleep(100); } catch (InterruptedException e) { throw new RuntimeException(e); }
                    }
                }
            });

            playerThread = new Thread(() -> {
                while (!stopped.get()) {
                    byte[] rawAudio = rawAudioBuffer.poll();
                    if (rawAudio != null) {
                        int bytesWritten = 0;
                        while (bytesWritten < rawAudio.length) {
                            bytesWritten += line.write(rawAudio, bytesWritten, rawAudio.length - bytesWritten);
                        }
                        int audioLength = rawAudio.length / (this.sampleRate * 2 / 1000);
                        // Guard against a negative sleep for very short chunks.
                        try { Thread.sleep(Math.max(0, audioLength - 10)); } catch (InterruptedException e) { throw new RuntimeException(e); }
                    } else {
                        try { Thread.sleep(100); } catch (InterruptedException e) { throw new RuntimeException(e); }
                    }
                }
            });

            decoderThread.start();
            playerThread.start();
        }

        public void write(String b64Audio) { b64AudioBuffer.add(b64Audio); }

        public void waitForComplete() throws InterruptedException {
            while (!b64AudioBuffer.isEmpty() || !rawAudioBuffer.isEmpty()) { Thread.sleep(100); }
            line.drain();
        }

        public void shutdown() throws InterruptedException {
            stopped.set(true);
            decoderThread.join();
            playerThread.join();
            if (line != null && line.isRunning()) { line.drain(); line.close(); }
        }
    }

    public static void main(String[] args) throws Exception {
        QwenTtsRealtimeParam param = QwenTtsRealtimeParam.builder()
                .model("qwen3-tts-vd-realtime-2026-01-15")
                .url("wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime")
                .apikey(System.getenv("DASHSCOPE_API_KEY"))
                .build();

        AtomicReference<CountDownLatch> completeLatch = new AtomicReference<>(new CountDownLatch(1));
        RealtimePcmPlayer audioPlayer = new RealtimePcmPlayer(24000);

        QwenTtsRealtime qwenTtsRealtime = new QwenTtsRealtime(param, new QwenTtsRealtimeCallback() {
            @Override
            public void onOpen() { }

            @Override
            public void onEvent(JsonObject message) {
                String type = message.get("type").getAsString();
                switch (type) {
                    case "response.audio.delta":
                        audioPlayer.write(message.get("delta").getAsString());
                        break;
                    case "session.finished":
                        completeLatch.get().countDown();
                        break;
                }
            }

            @Override
            public void onClose(int code, String reason) { }
        });

        try {
            qwenTtsRealtime.connect();
        } catch (NoApiKeyException e) {
            throw new RuntimeException(e);
        }

        QwenTtsRealtimeConfig config = QwenTtsRealtimeConfig.builder()
                .voice("<your-voice-name>")  // Replace with your voice design voice name
                .responseFormat(QwenTtsRealtimeAudioFormat.PCM_24000HZ_MONO_16BIT)
                .mode("server_commit")
                .build();
        qwenTtsRealtime.updateSession(config);

        for (String text : textToSynthesize) {
            qwenTtsRealtime.appendText(text);
            Thread.sleep(100);
        }
        qwenTtsRealtime.finish();
        completeLatch.get().await();

        audioPlayer.waitForComplete();
        audioPlayer.shutdown();
        System.exit(0);
    }
}

Non-streaming and unidirectional streaming

Uses qwen3-tts-vd-2026-01-26. See Speech synthesis - Qwen for details.

The setup follows the same pattern as above: create a voice first, then pass the returned voice name to the speech synthesis API with the matching model. For non-streaming and unidirectional streaming code examples, see Speech synthesis - Qwen.

Query voices

cURL

# Query a specific voice
curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen-voice-design",
    "input": {
        "action": "query",
        "voice": "<your-voice-name>"
    }
}'
# List all voices (paginated)
curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen-voice-design",
    "input": {
        "action": "list",
        "page_size": 10,
        "page_index": 0
    }
}'

Python

import requests
import os

def query_voice(voice_name):
    """Get details for a specific voice."""
    api_key = os.getenv("DASHSCOPE_API_KEY")

    response = requests.post(
        "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json"
        },
        json={
            "model": "qwen-voice-design",
            "input": {
                "action": "query",
                "voice": voice_name
            }
        }
    )

    if response.status_code == 200:
        result = response.json()
        print(f"Voice: {result['output']['voice']}")
        print(f"Model: {result['output']['target_model']}")
        print(f"Created: {result['output']['gmt_create']}")
        return result
    else:
        error = response.json()
        if error.get("code") == "VoiceNotFound":
            print(f"Voice not found: {voice_name}")
        else:
            print(f"Request failed ({response.status_code}): {response.text}")
        return None

def list_voices(page_index=0, page_size=10):
    """List all voices with pagination."""
    api_key = os.getenv("DASHSCOPE_API_KEY")

    response = requests.post(
        "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json"
        },
        json={
            "model": "qwen-voice-design",
            "input": {
                "action": "list",
                "page_size": page_size,
                "page_index": page_index
            }
        }
    )

    if response.status_code == 200:
        result = response.json()
        total = result["output"]["total_count"]
        voices = result["output"]["voice_list"]
        print(f"Total voices: {total}")
        for v in voices:
            print(f"  - {v['voice']} ({v['language']}, {v['target_model']})")
        return result
    else:
        print(f"Request failed ({response.status_code}): {response.text}")
        return None

if __name__ == "__main__":
    list_voices()

Java

Query a specific voice:

import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;

public class Main {

    public static void main(String[] args) {
        Main example = new Main();
        String voiceName = "<your-voice-name>";  // Replace with the actual voice name
        System.out.println("Querying voice: " + voiceName);
        example.queryVoice(voiceName);
    }

    public void queryVoice(String voiceName) {
        String apiKey = System.getenv("DASHSCOPE_API_KEY");

        String jsonBody = "{\n" +
                "    \"model\": \"qwen-voice-design\",\n" +
                "    \"input\": {\n" +
                "        \"action\": \"query\",\n" +
                "        \"voice\": \"" + voiceName + "\"\n" +
                "    }\n" +
                "}";

        HttpURLConnection connection = null;
        try {
            URL url = new URL("https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization");
            connection = (HttpURLConnection) url.openConnection();
            connection.setRequestMethod("POST");
            connection.setRequestProperty("Authorization", "Bearer " + apiKey);
            connection.setRequestProperty("Content-Type", "application/json");
            connection.setDoOutput(true);
            connection.setDoInput(true);

            try (OutputStream os = connection.getOutputStream()) {
                byte[] input = jsonBody.getBytes("UTF-8");
                os.write(input, 0, input.length);
                os.flush();
            }

            int responseCode = connection.getResponseCode();
            if (responseCode == HttpURLConnection.HTTP_OK) {
                StringBuilder response = new StringBuilder();
                try (BufferedReader br = new BufferedReader(
                        new InputStreamReader(connection.getInputStream(), "UTF-8"))) {
                    String responseLine;
                    while ((responseLine = br.readLine()) != null) {
                        response.append(responseLine.trim());
                    }
                }

                JsonObject jsonResponse = JsonParser.parseString(response.toString()).getAsJsonObject();

                if (jsonResponse.has("code") && "VoiceNotFound".equals(jsonResponse.get("code").getAsString())) {
                    String errorMessage = jsonResponse.has("message") ?
                            jsonResponse.get("message").getAsString() : "Voice not found";
                    System.out.println("Voice not found: " + voiceName);
                    System.out.println("Error message: " + errorMessage);
                    return;
                }

                JsonObject outputObj = jsonResponse.getAsJsonObject("output");
                System.out.println("Successfully queried voice information:");
                System.out.println("  Voice Name: " + outputObj.get("voice").getAsString());
                System.out.println("  Creation Time: " + outputObj.get("gmt_create").getAsString());
                System.out.println("  Modification Time: " + outputObj.get("gmt_modified").getAsString());
                System.out.println("  Language: " + outputObj.get("language").getAsString());
                System.out.println("  Preview Text: " + outputObj.get("preview_text").getAsString());
                System.out.println("  Model: " + outputObj.get("target_model").getAsString());
                System.out.println("  Voice Description: " + outputObj.get("voice_prompt").getAsString());
            } else {
                StringBuilder errorResponse = new StringBuilder();
                try (BufferedReader br = new BufferedReader(
                        new InputStreamReader(connection.getErrorStream(), "UTF-8"))) {
                    String responseLine;
                    while ((responseLine = br.readLine()) != null) {
                        errorResponse.append(responseLine.trim());
                    }
                }
                System.out.println("Request failed with status code: " + responseCode);
                System.out.println("Error response: " + errorResponse.toString());
            }
        } catch (Exception e) {
            System.err.println("An error occurred during the request: " + e.getMessage());
            e.printStackTrace();
        } finally {
            if (connection != null) {
                connection.disconnect();
            }
        }
    }
}

List all voices (paginated):

import com.google.gson.Gson;
import com.google.gson.JsonArray;
import com.google.gson.JsonObject;
import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;

public class Main {
    public static void main(String[] args) {
        String apiKey = System.getenv("DASHSCOPE_API_KEY");
        String apiUrl = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization";

        String jsonPayload =
                "{"
                        + "\"model\": \"qwen-voice-design\","
                        + "\"input\": {"
                        +     "\"action\": \"list\","
                        +     "\"page_size\": 10,"
                        +     "\"page_index\": 0"
                        + "}"
                        + "}";

        try {
            HttpURLConnection con = (HttpURLConnection) new URL(apiUrl).openConnection();
            con.setRequestMethod("POST");
            con.setRequestProperty("Authorization", "Bearer " + apiKey);
            con.setRequestProperty("Content-Type", "application/json");
            con.setDoOutput(true);

            try (OutputStream os = con.getOutputStream()) {
                os.write(jsonPayload.getBytes("UTF-8"));
            }

            int status = con.getResponseCode();
            BufferedReader br = new BufferedReader(new InputStreamReader(
                    status >= 200 && status < 300 ? con.getInputStream() : con.getErrorStream(), "UTF-8"));

            StringBuilder response = new StringBuilder();
            String line;
            while ((line = br.readLine()) != null) {
                response.append(line);
            }
            br.close();

            System.out.println("HTTP Status Code: " + status);

            if (status == 200) {
                Gson gson = new Gson();
                JsonObject jsonObj = gson.fromJson(response.toString(), JsonObject.class);
                JsonArray voiceList = jsonObj.getAsJsonObject("output").getAsJsonArray("voice_list");

                System.out.println("\nQueried voice list:");
                for (int i = 0; i < voiceList.size(); i++) {
                    JsonObject voiceItem = voiceList.get(i).getAsJsonObject();
                    String voice = voiceItem.get("voice").getAsString();
                    String gmtCreate = voiceItem.get("gmt_create").getAsString();
                    String targetModel = voiceItem.get("target_model").getAsString();
                    System.out.printf("- Voice: %s  Creation Time: %s  Model: %s\n",
                            voice, gmtCreate, targetModel);
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Delete a voice

cURL

curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen-voice-design",
    "input": {
        "action": "delete",
        "voice": "<your-voice-name>"
    }
}'

Python

import requests
import os

def delete_voice(voice_name):
    """Delete a voice and release the quota."""
    api_key = os.getenv("DASHSCOPE_API_KEY")

    response = requests.post(
        "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json"
        },
        json={
            "model": "qwen-voice-design",
            "input": {
                "action": "delete",
                "voice": voice_name
            }
        }
    )

    if response.status_code == 200:
        result = response.json()
        # The service may return HTTP 200 with an error code in the body.
        if result.get("code"):
            print(f"Delete failed ({result['code']}): {result.get('message', '')}")
            return False
        print(f"Deleted: {voice_name}")
        return True
    else:
        print(f"Request failed ({response.status_code}): {response.text}")
        return False

if __name__ == "__main__":
    delete_voice("<your-voice-name>")

Java

import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;

public class Main {

    public static void main(String[] args) {
        Main example = new Main();
        String voiceName = "<your-voice-name>";  // Replace with the actual voice name
        System.out.println("Deleting voice: " + voiceName);
        example.deleteVoice(voiceName);
    }

    public void deleteVoice(String voiceName) {
        String apiKey = System.getenv("DASHSCOPE_API_KEY");

        String jsonBody = "{\n" +
                "    \"model\": \"qwen-voice-design\",\n" +
                "    \"input\": {\n" +
                "        \"action\": \"delete\",\n" +
                "        \"voice\": \"" + voiceName + "\"\n" +
                "    }\n" +
                "}";

        HttpURLConnection connection = null;
        try {
            URL url = new URL("https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization");
            connection = (HttpURLConnection) url.openConnection();
            connection.setRequestMethod("POST");
            connection.setRequestProperty("Authorization", "Bearer " + apiKey);
            connection.setRequestProperty("Content-Type", "application/json");
            connection.setDoOutput(true);
            connection.setDoInput(true);

            try (OutputStream os = connection.getOutputStream()) {
                byte[] input = jsonBody.getBytes("UTF-8");
                os.write(input, 0, input.length);
                os.flush();
            }

            int responseCode = connection.getResponseCode();
            if (responseCode == HttpURLConnection.HTTP_OK) {
                StringBuilder response = new StringBuilder();
                try (BufferedReader br = new BufferedReader(
                        new InputStreamReader(connection.getInputStream(), "UTF-8"))) {
                    String responseLine;
                    while ((responseLine = br.readLine()) != null) {
                        response.append(responseLine.trim());
                    }
                }

                JsonObject jsonResponse = JsonParser.parseString(response.toString()).getAsJsonObject();

                if (jsonResponse.has("code") && jsonResponse.get("code").getAsString().contains("VoiceNotFound")) {
                    String errorMessage = jsonResponse.has("message") ?
                            jsonResponse.get("message").getAsString() : "Voice not found";
                    System.out.println("Voice does not exist: " + voiceName);
                    System.out.println("Error message: " + errorMessage);
                } else if (jsonResponse.has("usage")) {
                    System.out.println("Voice deleted successfully: " + voiceName);
                    String requestId = jsonResponse.has("request_id") ?
                            jsonResponse.get("request_id").getAsString() : "N/A";
                    System.out.println("Request ID: " + requestId);
                } else {
                    System.out.println("Unexpected response format: " + response.toString());
                }
            } else {
                StringBuilder errorResponse = new StringBuilder();
                try (BufferedReader br = new BufferedReader(
                        new InputStreamReader(connection.getErrorStream(), "UTF-8"))) {
                    String responseLine;
                    while ((responseLine = br.readLine()) != null) {
                        errorResponse.append(responseLine.trim());
                    }
                }
                System.out.println("Request failed with status code: " + responseCode);
                System.out.println("Error response: " + errorResponse.toString());
            }
        } catch (Exception e) {
            System.err.println("An error occurred during the request: " + e.getMessage());
            e.printStackTrace();
        } finally {
            if (connection != null) {
                connection.disconnect();
            }
        }
    }
}

Voice quota and automatic cleanup

  • Limit: 1,000 voices per account. Check your current count using the total_count field in the List voices response.

  • Automatic cleanup: Voices that have not been used for speech synthesis in the past year are automatically deleted.
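The quota check can be scripted with the List voices action. The sketch below is illustrative, not part of any SDK: the helper names are mine, and the 1,000-voice limit is hard-coded from the figure above.

```python
import os

VOICE_LIMIT = 1000  # per-account limit stated above


def remaining_quota(list_response, limit=VOICE_LIMIT):
    """Compute free voice slots from a List voices response body."""
    total = list_response["output"]["total_count"]
    return max(limit - total, 0)


def check_quota():
    """Call the List voices action and report remaining slots."""
    import requests  # deferred so remaining_quota stays dependency-free

    response = requests.post(
        "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization",
        headers={
            "Authorization": f"Bearer {os.getenv('DASHSCOPE_API_KEY')}",
            "Content-Type": "application/json",
        },
        json={
            "model": "qwen-voice-design",
            # page_size 1 is enough: only total_count is needed
            "input": {"action": "list", "page_size": 1, "page_index": 0},
        },
    )
    response.raise_for_status()
    left = remaining_quota(response.json())
    print(f"Remaining voice slots: {left}")
    return left
```

Calling check_quota() before a batch of Create voice requests avoids hitting the limit mid-run.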

Billing

Voice design and speech synthesis are billed separately.

Voice creation: $0.20 per voice. Failed creation attempts are not billed.

Note Free quota (Singapore region only): 10 voices within 90 days of activation. Failed creations do not count against the free quota, and deleting a voice does not restore it. Standard pricing applies after the quota is exhausted or the 90-day period ends.

Speech synthesis with custom voices: Billed per character. For pricing details, see Real-time speech synthesis - Qwen or Speech synthesis - Qwen.
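As a back-of-the-envelope check, the two line items can be estimated together. The $0.20 creation price is from above; the per-character synthesis rate varies by model and is a parameter here, not an actual price.

```python
def estimate_cost(successful_creations, chars_synthesized, price_per_char,
                  price_per_voice=0.20):
    """Rough cost estimate for voice design plus synthesis.

    Creation is billed per successful voice (failed creations are free,
    so pass only the successful count); synthesis is billed per character.
    """
    return (successful_creations * price_per_voice
            + chars_synthesized * price_per_char)
```

For example, 4 successful creations with no synthesis cost 4 × $0.20 = $0.80 before any free quota is applied.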

Error handling

Failed requests return code and message fields. See Error messages for the full reference.

Common errors for voice design:

HTTP status | Error code | Cause | Resolution
400 | VoiceNotFound | The specified voice does not exist. | Verify the voice name using the List voices or Query a voice API.
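The examples above inline this check. A small reusable helper makes the code/message handling explicit; the exception type and function name below are hypothetical, not part of any SDK.

```python
class VoiceNotFoundError(Exception):
    """Raised when the service reports the VoiceNotFound error code."""


def raise_for_dashscope_error(status_code, body):
    """Inspect a parsed response body and raise on error.

    Failed requests carry `code` and `message` fields; note that an
    error body can arrive with either an HTTP error status or HTTP 200.
    """
    code = body.get("code")
    if status_code == 200 and code is None:
        return  # success
    message = body.get("message", "")
    if code == "VoiceNotFound":
        raise VoiceNotFoundError(message or "The specified voice does not exist.")
    raise RuntimeError(f"Request failed ({status_code}): {code}: {message}")
```

Callers can then catch VoiceNotFoundError specifically and recover, for example by re-checking the name with the List voices action.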

Related topics