
Alibaba Cloud Model Studio: Qwen-TTS voice design API reference

Last Updated: Dec 17, 2025

Voice design generates custom voices from text descriptions. It supports multi-language and multi-dimensional voice characteristics, making it suitable for applications such as ad voiceovers, character creation, and audiobook production. Voice design and speech synthesis are two sequential steps. This document focuses on the parameters and interface details of voice design. For more information about speech synthesis, see Real-time speech synthesis - Qwen.

User guide: For model introduction and selection recommendations, see Real-time speech synthesis - Qwen.

Supported languages

Voice design supports voice creation and speech synthesis in multiple languages, including the following: Chinese (zh), English (en), German (de), Italian (it), Portuguese (pt), Spanish (es), Japanese (ja), Korean (ko), French (fr), Russian (ru).

How to write high-quality voice descriptions

Limitations

When writing a voice description (voice_prompt), adhere to the following constraints:

  • Length limit: voice_prompt must not exceed 2048 characters.

  • Supported languages: The description text can only be in Chinese or English.
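These constraints can also be checked client-side before sending a request. The following is a minimal sketch; `validate_voice_prompt` is a hypothetical helper, not part of the DashScope SDK, and the service enforces the same limits server-side:

```python
def validate_voice_prompt(voice_prompt: str) -> None:
    """Check the documented voice_prompt limits before calling the API.

    Hypothetical client-side helper; the API applies the same rules.
    """
    if not voice_prompt.strip():
        raise ValueError("voice_prompt must not be empty")
    if len(voice_prompt) > 2048:
        raise ValueError(
            f"voice_prompt is {len(voice_prompt)} characters; the limit is 2048"
        )

# A compliant description passes silently:
validate_voice_prompt("A calm middle-aged male voice with a slow speaking rate.")
```

Note that the 2048-character limit is counted in characters, not bytes, so it applies equally to Chinese and English descriptions.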

Core principles

A high-quality voice description (voice_prompt) is key to creating your ideal voice. It acts as a blueprint that directly guides the model to generate a voice with specific characteristics.

Follow these core principles when describing a voice:

  1. Be specific, not vague: Use words that clearly describe vocal traits, such as "deep," "crisp," or "fast-paced." Avoid subjective and uninformative terms such as "nice-sounding" or "ordinary."

  2. Be multi-dimensional, not single-dimensional: Effective descriptions combine multiple dimensions, such as gender, age, and emotion, as described below. A single-dimension description, such as "female voice," is too broad to produce a distinctive voice.

  3. Be objective, not subjective: Focus on the physical and perceptual features of the voice itself, not personal preferences. For example, use "high-pitched and energetic" instead of "my favorite voice."

  4. Be original, not imitative: Describe vocal traits rather than requesting the imitation of specific people, such as celebrities. Such requests involve copyright risks, and the model does not support direct imitation.

  5. Be concise, not redundant: Ensure every word adds meaning. Avoid repeating synonyms or using meaningless intensifiers, such as "very very nice voice."

Description dimensions reference

  • Gender: Male, female, neutral

  • Age: Child (5–12 years), teen (13–18 years), young adult (19–35 years), middle-aged (36–55 years), senior (55+ years)

  • Pitch: High, mid, low, slightly high, slightly low

  • Speaking rate: Fast, medium, slow, slightly fast, slightly slow

  • Emotion: Cheerful, calm, gentle, serious, lively, composed, soothing

  • Characteristics: Magnetic, crisp, raspy, smooth, sweet, rich, powerful

  • Use case: News broadcast, ad voiceover, audiobook, animated character, voice assistant, documentary narration
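When generating many prompts programmatically, these dimensions can be combined into a single multi-dimensional description. The following is an illustrative sketch only; the function and its phrasing template are not part of any SDK:

```python
def build_voice_prompt(gender: str, age: str, *, pitch: str = "",
                       rate: str = "", emotion: str = "",
                       characteristics: str = "", use_case: str = "") -> str:
    """Combine dimension values into one multi-dimensional voice description."""
    prompt = f"A {emotion + ' ' if emotion else ''}{age} {gender} voice"
    details = []
    if pitch:
        details.append(f"a {pitch} pitch")
    if rate:
        details.append(f"a {rate} speaking rate")
    if characteristics:
        details.append(f"a {characteristics} timbre")
    if details:
        prompt += " with " + ", ".join(details)
    if use_case:
        prompt += f", suitable for {use_case}"
    return prompt + "."

print(build_voice_prompt("male", "middle-aged", pitch="low", rate="slow",
                         emotion="calm", characteristics="magnetic",
                         use_case="documentary narration"))
# → A calm middle-aged male voice with a low pitch, a slow speaking rate,
#   a magnetic timbre, suitable for documentary narration.
```

Combining at least three dimensions this way keeps the description specific while staying well under the 2048-character limit.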

Example comparison

✅ Recommended examples

  • "A young, lively female voice with a fast speaking rate and noticeably rising intonation, suitable for introducing fashion products."

    Analysis: This description combines age, personality, speaking rate, and intonation, and specifies a use case, creating a vivid and clear image.

  • "A calm middle-aged male voice with a slow speaking rate, deep and magnetic tone, ideal for news reading or documentary narration."

    Analysis: This description clearly defines gender, age range, speaking rate, tonal qualities, and application domain.

  • "A cute child’s voice, approximately an 8-year-old girl, with a slightly childish tone, perfect for animated character dubbing."

    Analysis: This description specifies an exact age and vocal trait ("childish"), with a clear purpose.

  • "A gentle and intellectual female voice, around 30 years old, with a calm tone, suitable for audiobook narration."

    Analysis: This description effectively conveys emotional and stylistic qualities through words such as "intellectual" and "calm."

❌ Not recommended examples and suggestions

  • Example: "Nice-sounding voice"

    Issue: Too vague and subjective. Lacks actionable features.

    Suggestion: Add specific dimensions, for example, "a clear-toned young female voice with a gentle intonation."

  • Example: "Sounds like a certain celebrity"

    Issue: Involves copyright risk. The model cannot directly imitate a specific person.

    Suggestion: Describe the vocal traits instead, for example, "a mature, magnetic male voice with a steady pace."

  • Example: "Very very very nice female voice"

    Issue: Redundant. Repeated words do not help define the voice.

    Suggestion: Remove repetition and add meaningful descriptors, for example, "a 20–24-year-old female voice with a light, upbeat tone and sweet timbre."

  • Example: "123456"

    Issue: Invalid input. It cannot be parsed as voice characteristics.

    Suggestion: Provide meaningful text descriptions. For more information, see the recommended examples above.

Getting started: From voice design to speech synthesis


1. Workflow

Voice design and speech synthesis are two closely linked but independent steps that follow a "create first, then use" workflow:

  1. Prepare the voice description and preview text for voice design.

    • Voice description (voice_prompt): Defines the target voice characteristics. For guidance, see "How to write high-quality voice descriptions."

    • Preview text (preview_text): The text that the preview audio will read aloud, for example, "Hello everyone, welcome to the show."

  2. Call the Create voice API to generate a custom voice and get its name and preview audio.

    In this step, you must specify target_model to declare which speech synthesis model will drive the created voice.

    Listen to the preview audio to evaluate if it meets your expectations. If it does, proceed. If not, redesign the voice.

    If you already have a created voice, which you can verify using the List voices API, you can skip this step and proceed to the next one.

  3. Use the voice for speech synthesis.

    Call the speech synthesis API and pass the voice obtained in the previous step. The speech synthesis model used here must match the target_model specified in the previous step.
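The create-then-use sequence above can be sketched with placeholder functions. `create_voice` and `synthesize` below are hypothetical stand-ins, not SDK calls; the real HTTP and WebSocket usage appears in the sample code section. The sketch exists only to show the model-matching constraint:

```python
TARGET_MODEL = "qwen3-tts-vd-realtime-2025-12-16"

def create_voice(voice_prompt: str, preview_text: str) -> str:
    """Stand-in for step 2 (Create voice API). A real implementation POSTs
    to the customization endpoint with model="qwen-voice-design" and
    target_model=TARGET_MODEL, then returns output.voice from the response."""
    return "announcer-demo"  # placeholder for the server-assigned voice name

def synthesize(model: str, voice: str, text: str) -> bytes:
    """Stand-in for step 3 (real-time synthesis). The model passed here must
    equal the target_model used at creation time; the real API rejects a
    mismatch."""
    if model != TARGET_MODEL:
        raise ValueError("synthesis model must match the voice's target_model")
    return b""  # placeholder for synthesized audio

# Create first, then use:
voice_name = create_voice(
    voice_prompt="A calm middle-aged male voice with a slow speaking rate.",
    preview_text="Hello everyone, welcome to the show.",
)
audio = synthesize(model=TARGET_MODEL, voice=voice_name, text="Welcome back.")
```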

2. Model configurations and preparations

Select the appropriate model and complete the setup tasks.

Model configurations

Specify the following two models during voice design:

  • Voice design model: qwen-voice-design

  • Speech synthesis model that drives the voice: Currently, only qwen3-tts-vd-realtime-2025-12-16 is supported.

Preparations

  1. Get an API key: For instructions, see Get and Configure an API Key. For security, store your API key in an environment variable.

  2. Install the SDK: Install the latest DashScope SDK.

3. Sample code

  1. Generate a custom voice and listen to the preview. If you are satisfied, proceed. Otherwise, regenerate the voice.

    Python

    import requests
    import base64
    import os
    
    def create_voice_and_play():
        # API keys differ between Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
        # If you haven't set an environment variable, replace the line below with: api_key = "sk-xxx"
        api_key = os.getenv("DASHSCOPE_API_KEY")
        
        if not api_key:
            print("Error: DASHSCOPE_API_KEY environment variable not found. Please set your API key.")
            return None, None, None
        
        # Prepare request data
        headers = {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json"
        }
        
        data = {
            "model": "qwen-voice-design",
            "input": {
                "action": "create",
                "target_model": "qwen3-tts-vd-realtime-2025-12-16",
                "voice_prompt": "A composed middle-aged male announcer with a deep, rich and magnetic voice, a steady speaking speed and clear articulation, is suitable for news broadcasting or documentary commentary.",
                "preview_text": "Dear listeners, hello everyone. Welcome to the evening news.",
                "preferred_name": "announcer",
                "language": "en"
            },
            "parameters": {
                "sample_rate": 24000,
                "response_format": "wav"
            }
        }
        
        # URL for Singapore region. For Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
        url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"
        
        try:
            # Send request
            response = requests.post(
                url,
                headers=headers,
                json=data,
                timeout=60  # Add timeout setting
            )
            
            if response.status_code == 200:
                result = response.json()
                
                # Get voice name
                voice_name = result["output"]["voice"]
                print(f"Voice name: {voice_name}")
                
                # Get preview audio data
                base64_audio = result["output"]["preview_audio"]["data"]
                
                # Decode Base64 audio data
                audio_bytes = base64.b64decode(base64_audio)
                
                # Save audio file locally
                filename = f"{voice_name}_preview.wav"
                
                # Write audio data to local file
                with open(filename, 'wb') as f:
                    f.write(audio_bytes)
                
                print(f"Audio saved to local file: {filename}")
                print(f"File path: {os.path.abspath(filename)}")
                
                return voice_name, audio_bytes, filename
            else:
                print(f"Request failed. Status code: {response.status_code}")
                print(f"Response: {response.text}")
                return None, None, None
                
        except requests.exceptions.RequestException as e:
            print(f"Network request error: {e}")
            return None, None, None
        except KeyError as e:
            print(f"Response format error: missing required field: {e}")
            print(f"Response: {response.text if 'response' in locals() else 'No response'}")
            return None, None, None
        except Exception as e:
            print(f"Unexpected error: {e}")
            return None, None, None
    
    if __name__ == "__main__":
        print("Creating voice...")
        voice_name, audio_data, saved_filename = create_voice_and_play()
        
        if voice_name:
            print(f"\nSuccessfully created voice '{voice_name}'")
            print(f"Audio file saved: '{saved_filename}'")
            print(f"File size: {os.path.getsize(saved_filename)} bytes")
        else:
            print("\nVoice creation failed")

    Java

    You need to import the Gson dependency. If you use Maven or Gradle, add the dependency:

    Maven

    Add the following content to pom.xml:

    <!-- https://mvnrepository.com/artifact/com.google.code.gson/gson -->
    <dependency>
        <groupId>com.google.code.gson</groupId>
        <artifactId>gson</artifactId>
        <version>2.13.1</version>
    </dependency>

    Gradle

    Add the following content to build.gradle:

    // https://mvnrepository.com/artifact/com.google.code.gson/gson
    implementation("com.google.code.gson:gson:2.13.1")

    import com.google.gson.JsonObject;
    import com.google.gson.JsonParser;
    import java.io.*;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.util.Base64;
    
    public class Main {
        public static void main(String[] args) {
            Main example = new Main();
            example.createVoice();
        }
    
        public void createVoice() {
            // API keys differ between Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
            // If you haven't set an environment variable, replace the line below with: String apiKey = "sk-xxx"
            String apiKey = System.getenv("DASHSCOPE_API_KEY");
    
            // Create the JSON request body string
            String jsonBody = "{\n" +
                    "    \"model\": \"qwen-voice-design\",\n" +
                    "    \"input\": {\n" +
                    "        \"action\": \"create\",\n" +
                    "        \"target_model\": \"qwen3-tts-vd-realtime-2025-12-16\",\n" +
                    "        \"voice_prompt\": \"A composed middle-aged male announcer with a deep, rich and magnetic voice, a steady speaking speed and clear articulation, is suitable for news broadcasting or documentary commentary.\",\n" +
                    "        \"preview_text\": \"Dear listeners, hello everyone. Welcome to the evening news.\",\n" +
                    "        \"preferred_name\": \"announcer\",\n" +
                    "        \"language\": \"en\"\n" +
                    "    },\n" +
                    "    \"parameters\": {\n" +
                    "        \"sample_rate\": 24000,\n" +
                    "        \"response_format\": \"wav\"\n" +
                    "    }\n" +
                    "}";
    
            HttpURLConnection connection = null;
            try {
                // URL for Singapore region. For Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
                URL url = new URL("https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization");
                connection = (HttpURLConnection) url.openConnection();
    
                // Set request method and headers
                connection.setRequestMethod("POST");
                connection.setRequestProperty("Authorization", "Bearer " + apiKey);
                connection.setRequestProperty("Content-Type", "application/json");
                connection.setDoOutput(true);
                connection.setDoInput(true);
    
                // Send request body
                try (OutputStream os = connection.getOutputStream()) {
                    byte[] input = jsonBody.getBytes("UTF-8");
                    os.write(input, 0, input.length);
                    os.flush();
                }
    
                // Get response
                int responseCode = connection.getResponseCode();
                if (responseCode == HttpURLConnection.HTTP_OK) {
                    // Read response content
                    StringBuilder response = new StringBuilder();
                    try (BufferedReader br = new BufferedReader(
                            new InputStreamReader(connection.getInputStream(), "UTF-8"))) {
                        String responseLine;
                        while ((responseLine = br.readLine()) != null) {
                            response.append(responseLine.trim());
                        }
                    }
    
                    // Parse JSON response
                    JsonObject jsonResponse = JsonParser.parseString(response.toString()).getAsJsonObject();
                    JsonObject outputObj = jsonResponse.getAsJsonObject("output");
                    JsonObject previewAudioObj = outputObj.getAsJsonObject("preview_audio");
    
                    // Get voice name
                    String voiceName = outputObj.get("voice").getAsString();
                    System.out.println("Voice name: " + voiceName);
    
                    // Get Base64-encoded audio data
                    String base64Audio = previewAudioObj.get("data").getAsString();
    
                    // Decode Base64 audio data
                    byte[] audioBytes = Base64.getDecoder().decode(base64Audio);
    
                    // Save audio to a local file
                    String filename = voiceName + "_preview.wav";
                    saveAudioToFile(audioBytes, filename);
    
                    System.out.println("Audio saved to local file: " + filename);
    
                } else {
                    // Read error response
                    StringBuilder errorResponse = new StringBuilder();
                    try (BufferedReader br = new BufferedReader(
                            new InputStreamReader(connection.getErrorStream(), "UTF-8"))) {
                        String responseLine;
                        while ((responseLine = br.readLine()) != null) {
                            errorResponse.append(responseLine.trim());
                        }
                    }
    
                    System.out.println("Request failed. Status code: " + responseCode);
                    System.out.println("Error response: " + errorResponse.toString());
                }
    
            } catch (Exception e) {
                System.err.println("Request error: " + e.getMessage());
                e.printStackTrace();
            } finally {
                if (connection != null) {
                    connection.disconnect();
                }
            }
        }
    
        private void saveAudioToFile(byte[] audioBytes, String filename) {
            try {
                File file = new File(filename);
                try (FileOutputStream fos = new FileOutputStream(file)) {
                    fos.write(audioBytes);
                }
                System.out.println("Audio saved to: " + file.getAbsolutePath());
            } catch (IOException e) {
                System.err.println("Error saving audio file: " + e.getMessage());
                e.printStackTrace();
            }
        }
    }
  2. Use the custom voice generated in the previous step for speech synthesis.

    This example is based on the DashScope SDK's "server commit mode" example for speech synthesis with a system voice. Replace the voice parameter with the custom voice generated by voice design.

    Key principle: The model used during voice design (target_model) must be the same as the model used for subsequent speech synthesis (model). Otherwise, the synthesis will fail.

    Python

    # coding=utf-8
    # Installation instructions for pyaudio:
    # APPLE Mac OS X
    #   brew install portaudio
    #   pip install pyaudio
    # Debian/Ubuntu
    #   sudo apt-get install python-pyaudio python3-pyaudio
    #   or
    #   pip install pyaudio
    # CentOS
    #   sudo yum install -y portaudio portaudio-devel && pip install pyaudio
    # Microsoft Windows
    #   python -m pip install pyaudio
    
    import pyaudio
    import os
    import base64
    import threading
    import time
    import dashscope  # DashScope Python SDK version 1.23.9 or later is required
    from dashscope.audio.qwen_tts_realtime import QwenTtsRealtime, QwenTtsRealtimeCallback, AudioFormat
    
    # ======= Constant Configuration =======
    TEXT_TO_SYNTHESIZE = [
        'Right? I just love this kind of supermarket,',
        'especially during the New Year.',
        'Going to the supermarket',
        'just makes me feel',
        'super, super happy!',
        'I want to buy so many things!'
    ]
    
    def init_dashscope_api_key():
        """
        Initializes the DashScope SDK API key
        """
        # API keys differ between Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
        # If you haven't set an environment variable, replace the line below with: dashscope.api_key = "sk-xxx"
        dashscope.api_key = os.getenv("DASHSCOPE_API_KEY")
    
    # ======= Callback Class =======
    class MyCallback(QwenTtsRealtimeCallback):
        """
        Custom TTS streaming callback
        """
        def __init__(self):
            self.complete_event = threading.Event()
            self._player = pyaudio.PyAudio()
            self._stream = self._player.open(
                format=pyaudio.paInt16, channels=1, rate=24000, output=True
            )
    
        def on_open(self) -> None:
            print('[TTS] Connection established')
    
        def on_close(self, close_status_code, close_msg) -> None:
            self._stream.stop_stream()
            self._stream.close()
            self._player.terminate()
            print(f'[TTS] Connection closed, code={close_status_code}, msg={close_msg}')
    
        def on_event(self, response: dict) -> None:
            try:
                event_type = response.get('type', '')
                if event_type == 'session.created':
                    print(f'[TTS] Session started: {response["session"]["id"]}')
                elif event_type == 'response.audio.delta':
                    audio_data = base64.b64decode(response['delta'])
                    self._stream.write(audio_data)
                elif event_type == 'response.done':
                    print(f'[TTS] Response complete, Response ID: {qwen_tts_realtime.get_last_response_id()}')
                elif event_type == 'session.finished':
                    print('[TTS] Session finished')
                    self.complete_event.set()
            except Exception as e:
                print(f'[Error] Exception processing callback event: {e}')
    
        def wait_for_finished(self):
            self.complete_event.wait()
    
    # ======= Main Execution Logic =======
    if __name__ == '__main__':
        init_dashscope_api_key()
        print('[System] Initializing Qwen TTS Realtime ...')
    
        callback = MyCallback()
        qwen_tts_realtime = QwenTtsRealtime(
            # Voice design and speech synthesis must use the same model
            model="qwen3-tts-vd-realtime-2025-12-16",
            callback=callback,
            # URL for Singapore region. For Beijing region, use: wss://dashscope.aliyuncs.com/api-ws/v1/realtime
            url='wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime'
        )
        qwen_tts_realtime.connect()
        
        qwen_tts_realtime.update_session(
            voice="myvoice", # Replace the voice parameter with the custom voice generated by voice design
            response_format=AudioFormat.PCM_24000HZ_MONO_16BIT,
            mode='server_commit'
        )
    
        for text_chunk in TEXT_TO_SYNTHESIZE:
            print(f'[Sending text]: {text_chunk}')
            qwen_tts_realtime.append_text(text_chunk)
            time.sleep(0.1)
    
        qwen_tts_realtime.finish()
        callback.wait_for_finished()
    
        print(f'[Metric] session_id={qwen_tts_realtime.get_session_id()}, '
              f'first_audio_delay={qwen_tts_realtime.get_first_audio_delay()}s')

    Java

    import com.alibaba.dashscope.audio.qwen_tts_realtime.*;
    import com.alibaba.dashscope.exception.NoApiKeyException;
    import com.google.gson.JsonObject;
    
    import javax.sound.sampled.*;
    import java.io.*;
    import java.util.Base64;
    import java.util.Queue;
    import java.util.concurrent.CountDownLatch;
    import java.util.concurrent.atomic.AtomicReference;
    import java.util.concurrent.ConcurrentLinkedQueue;
    import java.util.concurrent.atomic.AtomicBoolean;
    
    public class Main {
        // ===== Constant Definitions =====
        private static String[] textToSynthesize = {
                "Right? I just love this kind of supermarket,",
                "especially during the New Year.",
                "Going to the supermarket",
                "just makes me feel",
                "super, super happy!",
                "I want to buy so many things!"
        };
    
        // Real-time audio player class
        public static class RealtimePcmPlayer {
            private int sampleRate;
            private SourceDataLine line;
            private AudioFormat audioFormat;
            private Thread decoderThread;
            private Thread playerThread;
            private AtomicBoolean stopped = new AtomicBoolean(false);
            private Queue<String> b64AudioBuffer = new ConcurrentLinkedQueue<>();
            private Queue<byte[]> RawAudioBuffer = new ConcurrentLinkedQueue<>();
    
            // Constructor initializes audio format and audio line
            public RealtimePcmPlayer(int sampleRate) throws LineUnavailableException {
                this.sampleRate = sampleRate;
                this.audioFormat = new AudioFormat(this.sampleRate, 16, 1, true, false);
                DataLine.Info info = new DataLine.Info(SourceDataLine.class, audioFormat);
                line = (SourceDataLine) AudioSystem.getLine(info);
                line.open(audioFormat);
                line.start();
                decoderThread = new Thread(new Runnable() {
                    @Override
                    public void run() {
                        while (!stopped.get()) {
                            String b64Audio = b64AudioBuffer.poll();
                            if (b64Audio != null) {
                                byte[] rawAudio = Base64.getDecoder().decode(b64Audio);
                                RawAudioBuffer.add(rawAudio);
                            } else {
                                try {
                                    Thread.sleep(100);
                                } catch (InterruptedException e) {
                                    throw new RuntimeException(e);
                                }
                            }
                        }
                    }
                });
                playerThread = new Thread(new Runnable() {
                    @Override
                    public void run() {
                        while (!stopped.get()) {
                            byte[] rawAudio = RawAudioBuffer.poll();
                            if (rawAudio != null) {
                                try {
                                    playChunk(rawAudio);
                                } catch (IOException e) {
                                    throw new RuntimeException(e);
                                } catch (InterruptedException e) {
                                    throw new RuntimeException(e);
                                }
                            } else {
                                try {
                                    Thread.sleep(100);
                                } catch (InterruptedException e) {
                                    throw new RuntimeException(e);
                                }
                            }
                        }
                    }
                });
                decoderThread.start();
                playerThread.start();
            }
    
            // Plays an audio chunk and blocks until playback is complete
            private void playChunk(byte[] chunk) throws IOException, InterruptedException {
                if (chunk == null || chunk.length == 0) return;
    
                int bytesWritten = 0;
                while (bytesWritten < chunk.length) {
                    bytesWritten += line.write(chunk, bytesWritten, chunk.length - bytesWritten);
                }
                int audioLength = chunk.length / (this.sampleRate * 2 / 1000);
                // Wait for the audio in the buffer to finish playing.
                // Guard against a negative sleep for very short chunks.
                Thread.sleep(Math.max(0, audioLength - 10));
            }
    
            public void write(String b64Audio) {
                b64AudioBuffer.add(b64Audio);
            }
    
            public void cancel() {
                b64AudioBuffer.clear();
                RawAudioBuffer.clear();
            }
    
            public void waitForComplete() throws InterruptedException {
                while (!b64AudioBuffer.isEmpty() || !RawAudioBuffer.isEmpty()) {
                    Thread.sleep(100);
                }
                line.drain();
            }
    
            public void shutdown() throws InterruptedException {
                stopped.set(true);
                decoderThread.join();
                playerThread.join();
                if (line != null && line.isRunning()) {
                    line.drain();
                    line.close();
                }
            }
        }
    
        public static void main(String[] args) throws Exception {
            QwenTtsRealtimeParam param = QwenTtsRealtimeParam.builder()
                    // Voice design and speech synthesis must use the same model
                    .model("qwen3-tts-vd-realtime-2025-12-16")
                    // URL for Singapore region. For Beijing region, use: wss://dashscope.aliyuncs.com/api-ws/v1/realtime
                    .url("wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime")
                    // API keys differ between Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
                    // If you haven't set an environment variable, replace the line below with: .apikey("sk-xxx")
                    .apikey(System.getenv("DASHSCOPE_API_KEY"))
                    .build();
            AtomicReference<CountDownLatch> completeLatch = new AtomicReference<>(new CountDownLatch(1));
            final AtomicReference<QwenTtsRealtime> qwenTtsRef = new AtomicReference<>(null);
    
            // Create a real-time audio player instance
            RealtimePcmPlayer audioPlayer = new RealtimePcmPlayer(24000);
    
            QwenTtsRealtime qwenTtsRealtime = new QwenTtsRealtime(param, new QwenTtsRealtimeCallback() {
                @Override
                public void onOpen() {
                    // Handle connection open
                }
                @Override
                public void onEvent(JsonObject message) {
                    String type = message.get("type").getAsString();
                    switch(type) {
                        case "session.created":
                            // Handle session creation
                            break;
                        case "response.audio.delta":
                            String recvAudioB64 = message.get("delta").getAsString();
                            // Play audio in real time
                            audioPlayer.write(recvAudioB64);
                            break;
                        case "response.done":
                            // Handle response completion
                            break;
                        case "session.finished":
                            // Handle session finish
                            completeLatch.get().countDown();
                            break;
                        default:
                            break;
                    }
                }
                @Override
                public void onClose(int code, String reason) {
                    // Handle connection close
                }
            });
            qwenTtsRef.set(qwenTtsRealtime);
            try {
                qwenTtsRealtime.connect();
            } catch (NoApiKeyException e) {
                throw new RuntimeException(e);
            }
            QwenTtsRealtimeConfig config = QwenTtsRealtimeConfig.builder()
                    .voice("myvoice") // Replace the voice parameter with the custom voice generated by voice design
                    .responseFormat(QwenTtsRealtimeAudioFormat.PCM_24000HZ_MONO_16BIT)
                    .mode("server_commit")
                    .build();
            qwenTtsRealtime.updateSession(config);
            for (String text:textToSynthesize) {
                qwenTtsRealtime.appendText(text);
                Thread.sleep(100);
            }
            qwenTtsRealtime.finish();
            completeLatch.get().await();
    
            // Wait for audio playback to complete and then shut down the player
            audioPlayer.waitForComplete();
            audioPlayer.shutdown();
            System.exit(0);
        }
    }

API reference

Ensure that you use the same account when calling different APIs.

Create voice

Creates a custom voice by providing a voice description and preview text.

  • URL

    China (Beijing):

    POST https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization

    International (Singapore):

    POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization
  • Request headers

    Authorization (string, required): Authentication token. Format: Bearer <your_api_key>. Replace <your_api_key> with your actual API key.

    Content-Type (string, required): The media type of the data transmitted in the request body. Fixed value: application/json.

  • Request body

    The request body contains all request parameters. Omit optional fields as needed.

    Important

    Note the difference between the following parameters:

    • model: The voice design model. The value is fixed at qwen-voice-design.

    • target_model: The speech synthesis model that drives the voice. It must be consistent with the speech synthesis model used in subsequent API calls. Otherwise, the synthesis will fail.

    {
        "model": "qwen-voice-design",
        "input": {
            "action": "create",
            "target_model": "qwen3-tts-vd-realtime-2025-12-16",
            "voice_prompt": "A calm middle-aged male announcer with a deep, rich, and magnetic voice, steady speaking speed, and clear articulation, suitable for news broadcasting or documentary narration.",
            "preview_text": "Dear listeners, hello everyone. Welcome to the evening news.",
            "preferred_name": "announcer",
            "language": "en"
        },
        "parameters": {
            "sample_rate": 24000,
            "response_format": "wav"
        }
    }
  • Request parameters

    model (string, required): The voice design model. Fixed value: qwen-voice-design.

    action (string, required): The operation type. Fixed value: create.

    target_model (string, required): The speech synthesis model that drives the voice. Currently, only qwen3-tts-vd-realtime-2025-12-16 is supported. It must match the speech synthesis model used in subsequent API calls; otherwise, synthesis will fail.

    voice_prompt (string, required): Voice description. Maximum length: 2048 characters. Only Chinese and English are supported. For guidance on writing voice descriptions, see "How to write high-quality voice descriptions".

    preview_text (string, required): The text for the preview audio. Maximum length: 1024 characters. Supports Chinese (zh), English (en), German (de), Italian (it), Portuguese (pt), Spanish (es), Japanese (ja), Korean (ko), French (fr), Russian (ru).

    preferred_name (string, required): An easy-to-identify name for the voice (numbers, letters, and underscores only; maximum 16 characters). We recommend an identifier related to the character or scenario. The keyword appears in the generated voice name. For example, if the keyword is "announcer", the final voice name is "qwen-tts-vd-announcer-voice-20251201102800-a1b2".

    language (string, optional, default zh): Language code. Specifies the language preference for the generated voice. This parameter affects the linguistic features and pronunciation tendencies of the voice. Choose the code that matches your use case. If you set this parameter, it must match the language of preview_text. Valid values: zh (Chinese), en (English), de (German), it (Italian), pt (Portuguese), es (Spanish), ja (Japanese), ko (Korean), fr (French), ru (Russian).

    sample_rate (int, optional, default 24000): The sample rate (in Hz) of the preview audio generated by voice design. Valid values: 8000, 16000, 24000, 48000.

    response_format (string, optional, default wav): The format of the preview audio generated by voice design. Valid values: pcm, wav, mp3, opus.
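Before sending a create request, you can check the documented limits on the client side. The helper below is a minimal, illustrative sketch (`validate_create_input` is not part of the API); it enforces the voice_prompt, preview_text, and preferred_name constraints listed above.

```python
import re

# Illustrative client-side check (not part of the API): enforces the
# documented limits before you send a "create" request.
def validate_create_input(voice_prompt, preview_text, preferred_name=None):
    errors = []
    if not (1 <= len(voice_prompt) <= 2048):
        errors.append("voice_prompt must be 1-2048 characters")
    if not (1 <= len(preview_text) <= 1024):
        errors.append("preview_text must be 1-1024 characters")
    # preferred_name: numbers, letters, and underscores only; max 16 characters
    if preferred_name is not None and not re.fullmatch(r"\w{1,16}", preferred_name, re.ASCII):
        errors.append("preferred_name must be 1-16 letters, digits, or underscores")
    return errors
```

An empty list means the input passes the documented constraints; otherwise each entry explains what to fix before calling the API.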

  • Response parameters

    Response example:

    {
        "output": {
            "preview_audio": {
                "data": "{base64_encoded_audio}",
                "sample_rate": 24000,
                "response_format": "wav"
            },
            "target_model": "qwen3-tts-vd-realtime-2025-12-16",
            "voice": "yourVoice"
        },
        "usage": {
            "count": 1
        },
        "request_id": "yourRequestId"
    }

    The key parameters are:

    voice (string): The voice name. You can use it directly as the voice parameter in the speech synthesis API.

    data (string): The preview audio data generated by voice design, returned as a Base64-encoded string.

    sample_rate (int): The sample rate (in Hz) of the preview audio. It matches the sample rate set during voice creation. Default: 24000 Hz.

    response_format (string): The format of the preview audio. It matches the audio format set during voice creation. Default: wav.

    target_model (string): The speech synthesis model that drives the voice. Currently, only qwen3-tts-vd-realtime-2025-12-16 is supported. It must match the speech synthesis model used in subsequent API calls; otherwise, synthesis will fail.

    request_id (string): Request ID.

    count (integer): The number of "Create voice" operations billed for this request. For voice creation, count is always 1.

  • Sample code

    Important

    Note the difference between the following parameters:

    • model: The voice design model. The value is fixed at qwen-voice-design.

    • target_model: The speech synthesis model that drives the voice. It must be consistent with the speech synthesis model used in subsequent API calls. Otherwise, the synthesis will fail.

    cURL

    If you have not set your API key as an environment variable, replace $DASHSCOPE_API_KEY in the example with your actual API key.

    # ======= Important =======
    # The URL below is for the Singapore region. For the Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
    # API keys differ between Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    # === Delete this comment before execution ===
    
    curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization \
    -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
        "model": "qwen-voice-design",
        "input": {
            "action": "create",
            "target_model": "qwen3-tts-vd-realtime-2025-12-16",
            "voice_prompt": "A calm middle-aged male announcer with a deep, rich, and magnetic voice, steady speaking speed, and clear articulation, suitable for news broadcasting or documentary narration.",
            "preview_text": "Dear listeners, hello everyone. Welcome to the evening news.",
            "preferred_name": "announcer",
            "language": "en"
        },
        "parameters": {
            "sample_rate": 24000,
            "response_format": "wav"
        }
    }'

    Python

    import requests
    import base64
    import os
    
    def create_voice_and_play():
        # API keys differ between Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
        # If you haven't set an environment variable, replace the line below with: api_key = "sk-xxx"
        api_key = os.getenv("DASHSCOPE_API_KEY")
        
        if not api_key:
            print("Error: DASHSCOPE_API_KEY environment variable not found. Please set your API key.")
            return None, None, None
        
        # Prepare request data
        headers = {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json"
        }
        
        data = {
            "model": "qwen-voice-design",
            "input": {
                "action": "create",
                "target_model": "qwen3-tts-vd-realtime-2025-12-16",
                "voice_prompt": "A composed middle-aged male announcer with a deep, rich and magnetic voice, a steady speaking speed and clear articulation, is suitable for news broadcasting or documentary commentary.",
                "preview_text": "Dear listeners, hello everyone. Welcome to the evening news.",
                "preferred_name": "announcer",
                "language": "en"
            },
            "parameters": {
                "sample_rate": 24000,
                "response_format": "wav"
            }
        }
        
        # URL for Singapore region. For Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
        url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"
        
        try:
            # Send request
            response = requests.post(
                url,
                headers=headers,
                json=data,
                timeout=60  # Add timeout setting
            )
            
            if response.status_code == 200:
                result = response.json()
                
                # Get voice name
                voice_name = result["output"]["voice"]
                print(f"Voice name: {voice_name}")
                
                # Get preview audio data
                base64_audio = result["output"]["preview_audio"]["data"]
                
                # Decode Base64 audio data
                audio_bytes = base64.b64decode(base64_audio)
                
                # Save audio file locally
                filename = f"{voice_name}_preview.wav"
                
                # Write audio data to local file
                with open(filename, 'wb') as f:
                    f.write(audio_bytes)
                
                print(f"Audio saved to local file: {filename}")
                print(f"File path: {os.path.abspath(filename)}")
                
                return voice_name, audio_bytes, filename
            else:
                print(f"Request failed. Status code: {response.status_code}")
                print(f"Response: {response.text}")
                return None, None, None
                
        except requests.exceptions.RequestException as e:
            print(f"Network request error: {e}")
            return None, None, None
        except KeyError as e:
            print(f"Response format error: missing required field: {e}")
            print(f"Response: {response.text if 'response' in locals() else 'No response'}")
            return None, None, None
        except Exception as e:
            print(f"Unexpected error: {e}")
            return None, None, None
    
    if __name__ == "__main__":
        print("Creating voice...")
        voice_name, audio_data, saved_filename = create_voice_and_play()
        
        if voice_name:
            print(f"\nSuccessfully created voice '{voice_name}'")
            print(f"Audio file saved: '{saved_filename}'")
            print(f"File size: {os.path.getsize(saved_filename)} bytes")
        else:
            print("\nVoice creation failed")

    Java

    import com.google.gson.JsonObject;
    import com.google.gson.JsonParser;
    import java.io.*;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.util.Base64;
    
    public class Main {
        public static void main(String[] args) {
            Main example = new Main();
            example.createVoice();
        }
    
        public void createVoice() {
            // API keys differ between Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
            // If you haven't set an environment variable, replace the line below with: String apiKey = "sk-xxx"
            String apiKey = System.getenv("DASHSCOPE_API_KEY");
    
            // Create the JSON request body string
            String jsonBody = "{\n" +
                    "    \"model\": \"qwen-voice-design\",\n" +
                    "    \"input\": {\n" +
                    "        \"action\": \"create\",\n" +
                    "        \"target_model\": \"qwen3-tts-vd-realtime-2025-12-16\",\n" +
                    "        \"voice_prompt\": \"A composed middle-aged male announcer with a deep, rich and magnetic voice, a steady speaking speed and clear articulation, is suitable for news broadcasting or documentary commentary.\",\n" +
                    "        \"preview_text\": \"Dear listeners, hello everyone. Welcome to the evening news.\",\n" +
                    "        \"preferred_name\": \"announcer\",\n" +
                    "        \"language\": \"en\"\n" +
                    "    },\n" +
                    "    \"parameters\": {\n" +
                    "        \"sample_rate\": 24000,\n" +
                    "        \"response_format\": \"wav\"\n" +
                    "    }\n" +
                    "}";
    
            HttpURLConnection connection = null;
            try {
                // URL for Singapore region. For Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
                URL url = new URL("https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization");
                connection = (HttpURLConnection) url.openConnection();
    
                // Set request method and headers
                connection.setRequestMethod("POST");
                connection.setRequestProperty("Authorization", "Bearer " + apiKey);
                connection.setRequestProperty("Content-Type", "application/json");
                connection.setDoOutput(true);
                connection.setDoInput(true);
    
                // Send request body
                try (OutputStream os = connection.getOutputStream()) {
                    byte[] input = jsonBody.getBytes("UTF-8");
                    os.write(input, 0, input.length);
                    os.flush();
                }
    
                // Get response
                int responseCode = connection.getResponseCode();
                if (responseCode == HttpURLConnection.HTTP_OK) {
                    // Read response content
                    StringBuilder response = new StringBuilder();
                    try (BufferedReader br = new BufferedReader(
                            new InputStreamReader(connection.getInputStream(), "UTF-8"))) {
                        String responseLine;
                        while ((responseLine = br.readLine()) != null) {
                            response.append(responseLine.trim());
                        }
                    }
    
                    // Parse JSON response
                    JsonObject jsonResponse = JsonParser.parseString(response.toString()).getAsJsonObject();
                    JsonObject outputObj = jsonResponse.getAsJsonObject("output");
                    JsonObject previewAudioObj = outputObj.getAsJsonObject("preview_audio");
    
                    // Get voice name
                    String voiceName = outputObj.get("voice").getAsString();
                    System.out.println("Voice name: " + voiceName);
    
                    // Get Base64-encoded audio data
                    String base64Audio = previewAudioObj.get("data").getAsString();
    
                    // Decode Base64 audio data
                    byte[] audioBytes = Base64.getDecoder().decode(base64Audio);
    
                    // Save audio to a local file
                    String filename = voiceName + "_preview.wav";
                    saveAudioToFile(audioBytes, filename);
    
                    System.out.println("Audio saved to local file: " + filename);
    
                } else {
                    // Read error response
                    StringBuilder errorResponse = new StringBuilder();
                    try (BufferedReader br = new BufferedReader(
                            new InputStreamReader(connection.getErrorStream(), "UTF-8"))) {
                        String responseLine;
                        while ((responseLine = br.readLine()) != null) {
                            errorResponse.append(responseLine.trim());
                        }
                    }
    
                    System.out.println("Request failed. Status code: " + responseCode);
                    System.out.println("Error response: " + errorResponse.toString());
                }
    
            } catch (Exception e) {
                System.err.println("Request error: " + e.getMessage());
                e.printStackTrace();
            } finally {
                if (connection != null) {
                    connection.disconnect();
                }
            }
        }
    
        private void saveAudioToFile(byte[] audioBytes, String filename) {
            try {
                File file = new File(filename);
                try (FileOutputStream fos = new FileOutputStream(file)) {
                    fos.write(audioBytes);
                }
                System.out.println("Audio saved to: " + file.getAbsolutePath());
            } catch (IOException e) {
                System.err.println("Error saving audio file: " + e.getMessage());
                e.printStackTrace();
            }
        }
    }

List voices

Performs a paged query to list created voices.

  • URL

    China (Beijing):

    POST https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization

    International (Singapore):

    POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization
  • Request header

    Authorization (string, required): An authentication token. Format: Bearer <your_api_key>. Replace <your_api_key> with your actual API key.

    Content-Type (string, required): The media type of the data in the request body. Fixed value: application/json.

  • Request body

    The request body contains all request parameters. You can omit optional fields as needed.

    Important

    model: The voice design model. The value is fixed at qwen-voice-design. Do not modify this value.

    {
        "model": "qwen-voice-design",
        "input": {
            "action": "list",
            "page_size": 10,
            "page_index": 0
        }
    }
  • Request parameters

    model (string, required): The voice design model. Fixed value: qwen-voice-design.

    action (string, required): The operation type. Fixed value: list.

    page_index (integer, optional, default 0): Page index. Value range: [0, 200].

    page_size (integer, optional, default 10): The number of entries per page. The value must be greater than 0.
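Because results are paged, listing every voice means incrementing page_index until total_count entries have been collected. The sketch below separates the paging logic from the HTTP transport so it is easy to test; `fetch_page` is a caller-supplied function (an assumption, not part of the API) that issues one "list" request and returns the response's output object.

```python
# Illustrative pagination loop. `fetch_page(page_index, page_size)` is a
# caller-supplied function that performs one "list" request and returns the
# response "output" object: {"voice_list": [...], "total_count": N, ...}.
def list_all_voices(fetch_page, page_size=10):
    voices, page_index = [], 0
    while True:
        output = fetch_page(page_index, page_size)
        voices.extend(output["voice_list"])
        # Stop when every entry reported by total_count has been fetched,
        # or when the service returns an empty page.
        if len(voices) >= output["total_count"] or not output["voice_list"]:
            return voices
        page_index += 1
```

With requests, `fetch_page` would POST the {"action": "list", ...} body shown above and return `response.json()["output"]`.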

  • Response parameters

    Response example:

    {
        "output": {
            "page_index": 0,
            "page_size": 2,
            "total_count": 26,
            "voice_list": [
                {
                    "gmt_create": "2025-12-10 17:04:54",
                    "gmt_modified": "2025-12-10 17:04:54",
                    "language": "en",
                    "preview_text": "Dear listeners, hello everyone. Welcome to today's program.",
                    "target_model": "qwen3-tts-vd-realtime-2025-12-16",
                    "voice": "yourVoice1",
                    "voice_prompt": "A calm middle-aged male announcer with a deep, rich, and magnetic voice, steady speaking speed, and clear articulation, suitable for news broadcasting or documentary narration."
                },
                {
                    "gmt_create": "2025-12-10 15:31:35",
                    "gmt_modified": "2025-12-10 15:31:35",
                    "language": "en",
                    "preview_text": "Dear listeners, hello everyone",
                    "target_model": "qwen3-tts-vd-realtime-2025-12-16",
                    "voice": "yourVoice2",
                    "voice_prompt": "A calm middle-aged male announcer with a deep, rich, and magnetic voice, steady speaking speed, and clear articulation, suitable for news broadcasting or documentary narration."
                }
            ]
        },
        "usage": {},
        "request_id": "yourRequestId"
    }

    The key parameters are:

    voice (string): The voice name. You can use it directly as the voice parameter in the speech synthesis API.

    target_model (string): The speech synthesis model that drives the voice. Currently, only qwen3-tts-vd-realtime-2025-12-16 is supported. It must match the speech synthesis model used in subsequent API calls; otherwise, synthesis will fail.

    language (string): Language code. Valid values: zh (Chinese), en (English), de (German), it (Italian), pt (Portuguese), es (Spanish), ja (Japanese), ko (Korean), fr (French), ru (Russian).

    voice_prompt (string): Voice description.

    preview_text (string): Preview text.

    gmt_create (string): The time the voice was created.

    gmt_modified (string): The time the voice was last modified.

    page_index (integer): Page index.

    page_size (integer): The number of entries per page.

    total_count (integer): The total number of entries found.

    request_id (string): Request ID.

  • Sample code

    Important

    model: The voice design model. The value is fixed at qwen-voice-design. Do not modify this value.

    cURL

    If you have not set the API key as an environment variable, replace $DASHSCOPE_API_KEY in the example with your actual API key.

    # ======= Important =======
    # The URL below is for the Singapore region. For the Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
    # API keys differ between Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    # === Delete this comment before execution ===
    
    curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization \
    -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
        "model": "qwen-voice-design",
        "input": {
            "action": "list",
            "page_size": 10,
            "page_index": 0
        }
    }'

    Python

    import os
    import requests
    
    # API keys differ between Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    # If you haven't set an environment variable, replace the line below with: api_key = "sk-xxx"
    api_key = os.getenv("DASHSCOPE_API_KEY")
    # URL for Singapore region. For Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
    url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"
    
    payload = {
        "model": "qwen-voice-design", # Do not modify this value
        "input": {
            "action": "list",
            "page_size": 10,
            "page_index": 0
        }
    }
    
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }
    
    response = requests.post(url, json=payload, headers=headers)
    
    print("HTTP Status Code:", response.status_code)
    
    if response.status_code == 200:
        data = response.json()
        voice_list = data["output"]["voice_list"]
    
        print("Queried voice list:")
        for item in voice_list:
            print(f"- Voice: {item['voice']}  Created: {item['gmt_create']}  Model: {item['target_model']}")
    else:
        print("Request failed:", response.text)

    Java

    import com.google.gson.Gson;
    import com.google.gson.JsonArray;
    import com.google.gson.JsonObject;
    
    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    
    public class Main {
        public static void main(String[] args) {
            // API keys differ between Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
            // If you haven't set an environment variable, replace the line below with: String apiKey = "sk-xxx"
            String apiKey = System.getenv("DASHSCOPE_API_KEY");
            // URL for Singapore region. For Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
            String apiUrl = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization";
    
            // JSON request body (older Java versions do not support """ multiline strings)
            String jsonPayload =
                    "{"
                            + "\"model\": \"qwen-voice-design\"," // Do not modify this value
                            + "\"input\": {"
                            +     "\"action\": \"list\","
                            +     "\"page_size\": 10,"
                            +     "\"page_index\": 0"
                            + "}"
                            + "}";
    
            try {
                HttpURLConnection con = (HttpURLConnection) new URL(apiUrl).openConnection();
                con.setRequestMethod("POST");
                con.setRequestProperty("Authorization", "Bearer " + apiKey);
                con.setRequestProperty("Content-Type", "application/json");
                con.setDoOutput(true);
    
                try (OutputStream os = con.getOutputStream()) {
                    os.write(jsonPayload.getBytes("UTF-8"));
                }
    
                int status = con.getResponseCode();
                BufferedReader br = new BufferedReader(new InputStreamReader(
                        status >= 200 && status < 300 ? con.getInputStream() : con.getErrorStream(), "UTF-8"));
    
                StringBuilder response = new StringBuilder();
                String line;
                while ((line = br.readLine()) != null) {
                    response.append(line);
                }
                br.close();
    
                System.out.println("HTTP Status Code: " + status);
                System.out.println("Returned JSON: " + response.toString());
    
                if (status == 200) {
                    Gson gson = new Gson();
                    JsonObject jsonObj = gson.fromJson(response.toString(), JsonObject.class);
                    JsonArray voiceList = jsonObj.getAsJsonObject("output").getAsJsonArray("voice_list");
    
                    System.out.println("\n Queried voice list:");
                    for (int i = 0; i < voiceList.size(); i++) {
                        JsonObject voiceItem = voiceList.get(i).getAsJsonObject();
                        String voice = voiceItem.get("voice").getAsString();
                        String gmtCreate = voiceItem.get("gmt_create").getAsString();
                        String targetModel = voiceItem.get("target_model").getAsString();
    
                        System.out.printf("- Voice: %s  Created: %s  Model: %s\n",
                                voice, gmtCreate, targetModel);
                    }
                }
    
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }

Query a specific voice

Retrieves detailed information about a specific voice by its name.

  • URL

    China (Beijing):

    POST https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization

    International (Singapore):

    POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization
  • Request header

    Authorization (string, required): An authentication token. Format: Bearer <your_api_key>. Replace <your_api_key> with your actual API key.

    Content-Type (string, required): The media type of the data in the request body. Fixed value: application/json.

  • Request body

    The request body contains all request parameters. You can omit optional fields as needed.

    Important

    model: The voice design model. The value is fixed at qwen-voice-design. Do not modify this value.

    {
        "model": "qwen-voice-design",
        "input": {
            "action": "query",
            "voice": "voiceName"
        }
    }
  • Request parameters

    model (string, required): The voice design model. Fixed value: qwen-voice-design.

    action (string, required): The operation type. Fixed value: query.

    voice (string, required): The name of the voice to query.

  • Response parameters

    Response example:

    Data found

    {
        "output": {
            "gmt_create": "2025-12-10 14:54:09",
            "gmt_modified": "2025-12-10 17:47:48",
            "language": "en",
            "preview_text": "Dear listeners, hello everyone",
            "target_model": "qwen3-tts-vd-realtime-2025-12-16",
            "voice": "yourVoice",
            "voice_prompt": "A calm middle-aged male announcer with a deep, rich, and magnetic voice, steady speaking speed, and clear articulation, suitable for news broadcasting or documentary narration."
        },
        "usage": {},
        "request_id": "yourRequestId"
    }

    No data found

    When the queried voice does not exist, the API returns an HTTP 400 status code, and the response body contains the VoiceNotFound error code.

    {
        "request_id":"yourRequestId",
        "code":"VoiceNotFound",
        "message":"Voice not found: qwen-tts-vd-announcer-voice-xxxx"
    }
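When building on this endpoint, it helps to treat "not found" as a distinct outcome rather than a generic failure. The helper below is a hypothetical classifier (not part of the API) that interprets a query response using the status code and the VoiceNotFound error code shown above.

```python
# Hypothetical helper: classifies a "query" response body. A missing voice
# is reported as HTTP 400 with code "VoiceNotFound"; any other non-success
# response is treated as a generic error.
def interpret_query_response(status_code, body):
    if status_code == 200 and "output" in body:
        return ("found", body["output"])
    if body.get("code") == "VoiceNotFound":
        return ("not_found", body.get("message", "Voice not found"))
    return ("error", body)
```

The caller can then branch on the first element of the returned tuple instead of re-parsing the raw response everywhere.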

    The key parameters are:

    voice (string): The voice name. You can use it directly as the voice parameter in the speech synthesis API.

    target_model (string): The speech synthesis model that drives the voice. Currently, only qwen3-tts-vd-realtime-2025-12-16 is supported. It must match the speech synthesis model used in subsequent API calls; otherwise, synthesis will fail.

    language (string): Language code. Valid values: zh (Chinese), en (English), de (German), it (Italian), pt (Portuguese), es (Spanish), ja (Japanese), ko (Korean), fr (French), ru (Russian).

    voice_prompt (string): Voice description.

    preview_text (string): Preview text.

    gmt_create (string): The time the voice was created.

    gmt_modified (string): The time the voice was last modified.

    request_id (string): Request ID.

  • Sample code

    Important

    model: The voice design model. The value is fixed at qwen-voice-design. Do not modify this value.

    cURL

    If you have not set the API key as an environment variable, replace $DASHSCOPE_API_KEY in the example with your actual API key.

    # ======= Important =======
    # The URL below is for the Singapore region. For the Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
    # API keys differ between Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    # === Delete this comment before execution ===
    
    curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization \
    -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
        "model": "qwen-voice-design",
        "input": {
            "action": "query",
            "voice": "voiceName"
        }
    }'

    Python

    import requests
    import os
    
    def query_voice(voice_name):
        """
        Query information for a specific voice
        :param voice_name: The name of the voice
        :return: A dictionary with voice information, or None if not found
        """
        # API keys differ between Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
        # If you haven't set an environment variable, replace the line below with: api_key = "sk-xxx"
        api_key = os.getenv("DASHSCOPE_API_KEY")
        
        # Prepare request data
        headers = {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json"
        }
        
        data = {
            "model": "qwen-voice-design",
            "input": {
                "action": "query",
                "voice": voice_name
            }
        }
        
        # URL for Singapore region. For Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
        url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"
        # Send request
        response = requests.post(
            url,
            headers=headers,
            json=data
        )
        
        # The API returns HTTP 400 with the VoiceNotFound error code when the
        # voice does not exist, so inspect the response body before the status code
        try:
            result = response.json()
        except ValueError:
            result = {}
        
        if result.get("code") == "VoiceNotFound":
            print(f"Voice not found: {voice_name}")
            print(f"Error message: {result.get('message', 'Voice not found')}")
            return None
        
        if response.status_code == 200:
            
            # Get voice information
            voice_info = result["output"]
            print(f"Successfully queried voice information:")
            print(f"  Voice Name: {voice_info.get('voice')}")
            print(f"  Created: {voice_info.get('gmt_create')}")
            print(f"  Modified: {voice_info.get('gmt_modified')}")
            print(f"  Language: {voice_info.get('language')}")
            print(f"  Preview Text: {voice_info.get('preview_text')}")
            print(f"  Model: {voice_info.get('target_model')}")
            print(f"  Voice Prompt: {voice_info.get('voice_prompt')}")
            
            return voice_info
        else:
            print(f"Request failed. Status code: {response.status_code}")
            print(f"Response: {response.text}")
            return None
    
    def main():
        # Example: Query a voice
        voice_name = "myvoice"  # Replace with the actual voice name you want to query
        
        print(f"Querying voice: {voice_name}")
        voice_info = query_voice(voice_name)
        
        if voice_info:
            print("\nVoice query successful!")
        else:
            print("\nVoice query failed or voice does not exist.")
    
    if __name__ == "__main__":
        main()

    Java

    import com.google.gson.JsonObject;
    import com.google.gson.JsonParser;
    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    
    public class Main {
    
        public static void main(String[] args) {
            Main example = new Main();
            // Example: Query a voice
            String voiceName = "myvoice"; // Replace with the actual voice name you want to query
            System.out.println("Querying voice: " + voiceName);
            example.queryVoice(voiceName);
        }
    
        public void queryVoice(String voiceName) {
            // API keys differ between Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
            // If you haven't set an environment variable, replace the line below with: String apiKey = "sk-xxx"
            String apiKey = System.getenv("DASHSCOPE_API_KEY");
    
            // Create the JSON request body string
            String jsonBody = "{\n" +
                    "    \"model\": \"qwen-voice-design\",\n" +
                    "    \"input\": {\n" +
                    "        \"action\": \"query\",\n" +
                    "        \"voice\": \"" + voiceName + "\"\n" +
                    "    }\n" +
                    "}";
    
            HttpURLConnection connection = null;
            try {
                // URL for Singapore region. For Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
                URL url = new URL("https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization");
                connection = (HttpURLConnection) url.openConnection();
    
                // Set request method and headers
                connection.setRequestMethod("POST");
                connection.setRequestProperty("Authorization", "Bearer " + apiKey);
                connection.setRequestProperty("Content-Type", "application/json");
                connection.setDoOutput(true);
                connection.setDoInput(true);
    
                // Send request body
                try (OutputStream os = connection.getOutputStream()) {
                    byte[] input = jsonBody.getBytes("UTF-8");
                    os.write(input, 0, input.length);
                    os.flush();
                }
    
                // Get response
                int responseCode = connection.getResponseCode();
                if (responseCode == HttpURLConnection.HTTP_OK) {
                    // Read response content
                    StringBuilder response = new StringBuilder();
                    try (BufferedReader br = new BufferedReader(
                            new InputStreamReader(connection.getInputStream(), "UTF-8"))) {
                        String responseLine;
                        while ((responseLine = br.readLine()) != null) {
                            response.append(responseLine.trim());
                        }
                    }
    
                    // Parse JSON response
                    JsonObject jsonResponse = JsonParser.parseString(response.toString()).getAsJsonObject();
    
                    // Check for error message
                    if (jsonResponse.has("code") && "VoiceNotFound".equals(jsonResponse.get("code").getAsString())) {
                        String errorMessage = jsonResponse.has("message") ?
                                jsonResponse.get("message").getAsString() : "Voice not found";
                        System.out.println("Voice not found: " + voiceName);
                        System.out.println("Error message: " + errorMessage);
                        return;
                    }
    
                    // Get voice information
                    JsonObject outputObj = jsonResponse.getAsJsonObject("output");
    
                    System.out.println("Successfully queried voice information:");
                    System.out.println("  Voice Name: " + outputObj.get("voice").getAsString());
                    System.out.println("  Created: " + outputObj.get("gmt_create").getAsString());
                    System.out.println("  Modified: " + outputObj.get("gmt_modified").getAsString());
                    System.out.println("  Language: " + outputObj.get("language").getAsString());
                    System.out.println("  Preview Text: " + outputObj.get("preview_text").getAsString());
                    System.out.println("  Model: " + outputObj.get("target_model").getAsString());
                    System.out.println("  Voice Prompt: " + outputObj.get("voice_prompt").getAsString());
    
                } else {
                    // Read error response (a missing voice returns HTTP 400 with the VoiceNotFound error code)
                    StringBuilder errorResponse = new StringBuilder();
                    try (BufferedReader br = new BufferedReader(
                            new InputStreamReader(connection.getErrorStream(), "UTF-8"))) {
                        String responseLine;
                        while ((responseLine = br.readLine()) != null) {
                            errorResponse.append(responseLine.trim());
                        }
                    }

                    if (errorResponse.toString().startsWith("{")) {
                        JsonObject errorJson = JsonParser.parseString(errorResponse.toString()).getAsJsonObject();
                        if (errorJson.has("code") && "VoiceNotFound".equals(errorJson.get("code").getAsString())) {
                            System.out.println("Voice not found: " + voiceName);
                            System.out.println("Error message: " + (errorJson.has("message")
                                    ? errorJson.get("message").getAsString() : "Voice not found"));
                            return;
                        }
                    }

                    System.out.println("Request failed. Status code: " + responseCode);
                    System.out.println("Error response: " + errorResponse.toString());
                }
    
            } catch (Exception e) {
                System.err.println("Request error: " + e.getMessage());
                e.printStackTrace();
            } finally {
                if (connection != null) {
                    connection.disconnect();
                }
            }
        }
    }

Delete voice

Deletes a specified voice and releases the corresponding quota.

  • URL

    China (Beijing):

    POST https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization

    International (Singapore):

    POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization
  • Request header

    • Authorization (string, required): The authentication token, in the format Bearer <your_api_key>. Replace <your_api_key> with your actual API key.

    • Content-Type (string, required): The media type of the data in the request body. The value is fixed to application/json.

  • Request body

    The request body contains all request parameters. You can omit optional fields as needed:

    Important

    model: The voice design model. The value is fixed at qwen-voice-design. Do not modify this value.

    {
        "model": "qwen-voice-design",
        "input": {
            "action": "delete",
            "voice": "yourVoice"
        }
    }
  • Request parameters

    • model (string, required): The voice design model. Fixed value: qwen-voice-design.

    • action (string, required): The operation type. Fixed value: delete.

    • voice (string, required): The voice to delete.

  • Response parameters

    Response example:

    {
        "output": {
            "voice": "yourVoice"
        },
        "usage": {},
        "request_id": "yourRequestId"
    }

    The key parameters are:

    • request_id (string): The request ID.

    • voice (string): The deleted voice.

  • Sample code

    Important

    model: The voice design model. The value is fixed at qwen-voice-design. Do not modify this value.

    cURL

    If you have not set the API key as an environment variable, replace $DASHSCOPE_API_KEY in the example with your actual API key.

    # ======= Important =======
    # The URL below is for the Singapore region. For the Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
    # API keys differ between Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    # === Delete this comment before execution ===
    
    curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization \
    -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
        "model": "qwen-voice-design",
        "input": {
            "action": "delete",
            "voice": "yourVoice"
        }
    }'

    Python

    import requests
    import os
    
    def delete_voice(voice_name):
        """
        Delete a specified voice
        :param voice_name: The name of the voice
        :return: True if deletion is successful or the voice does not exist but the request succeeds, False if the operation fails
        """
        # API keys differ between Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
        # If you haven't set an environment variable, replace the line below with: api_key = "sk-xxx"
        api_key = os.getenv("DASHSCOPE_API_KEY")
        
        # Prepare request data
        headers = {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json"
        }
        
        data = {
            "model": "qwen-voice-design",
            "input": {
                "action": "delete",
                "voice": voice_name
            }
        }
        
        # URL for Singapore region. For Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
        url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"
        # Send request
        response = requests.post(
            url,
            headers=headers,
            json=data
        )
        
        # A VoiceNotFound error code may arrive with a non-200 status code, so
        # inspect the response body before the status code
        try:
            result = response.json()
        except ValueError:
            result = {}
        
        if result.get("code") == "VoiceNotFound":
            print(f"Voice does not exist: {voice_name}")
            print(f"Error message: {result.get('message', 'Voice not found')}")
            return True  # The target is already gone, so treat this as success
        
        if response.status_code == 200:
            
            # Check if deletion was successful
            if "usage" in result:
                print(f"Voice deleted successfully: {voice_name}")
                print(f"Request ID: {result.get('request_id', 'N/A')}")
                return True
            else:
                print(f"Delete operation returned an unexpected format: {result}")
                return False
        else:
            print(f"Delete voice request failed. Status code: {response.status_code}")
            print(f"Response: {response.text}")
            return False
    
    def main():
        # Example: Delete a voice
        voice_name = "myvoice"  # Replace with the actual voice name you want to delete
        
        print(f"Deleting voice: {voice_name}")
        success = delete_voice(voice_name)
        
        if success:
            print(f"\nVoice '{voice_name}' deletion operation complete!")
        else:
            print(f"\nVoice '{voice_name}' deletion operation failed!")
    
    if __name__ == "__main__":
        main()

    Java

    import com.google.gson.JsonObject;
    import com.google.gson.JsonParser;
    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    
    public class Main {
    
        public static void main(String[] args) {
            Main example = new Main();
            // Example: Delete a voice
            String voiceName = "myvoice"; // Replace with the actual voice name you want to delete
            System.out.println("Deleting voice: " + voiceName);
            example.deleteVoice(voiceName);
        }
    
        public void deleteVoice(String voiceName) {
            // API keys differ between Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
            // If you haven't set an environment variable, replace the line below with: String apiKey = "sk-xxx"
            String apiKey = System.getenv("DASHSCOPE_API_KEY");
    
            // Create the JSON request body string
            String jsonBody = "{\n" +
                    "    \"model\": \"qwen-voice-design\",\n" +
                    "    \"input\": {\n" +
                    "        \"action\": \"delete\",\n" +
                    "        \"voice\": \"" + voiceName + "\"\n" +
                    "    }\n" +
                    "}";
    
            HttpURLConnection connection = null;
            try {
                // URL for Singapore region. For Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
                URL url = new URL("https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization");
                connection = (HttpURLConnection) url.openConnection();
    
                // Set request method and headers
                connection.setRequestMethod("POST");
                connection.setRequestProperty("Authorization", "Bearer " + apiKey);
                connection.setRequestProperty("Content-Type", "application/json");
                connection.setDoOutput(true);
                connection.setDoInput(true);
    
                // Send request body
                try (OutputStream os = connection.getOutputStream()) {
                    byte[] input = jsonBody.getBytes("UTF-8");
                    os.write(input, 0, input.length);
                    os.flush();
                }
    
                // Get response
                int responseCode = connection.getResponseCode();
                if (responseCode == HttpURLConnection.HTTP_OK) {
                    // Read response content
                    StringBuilder response = new StringBuilder();
                    try (BufferedReader br = new BufferedReader(
                            new InputStreamReader(connection.getInputStream(), "UTF-8"))) {
                        String responseLine;
                        while ((responseLine = br.readLine()) != null) {
                            response.append(responseLine.trim());
                        }
                    }
    
                    // Parse JSON response
                    JsonObject jsonResponse = JsonParser.parseString(response.toString()).getAsJsonObject();
    
                    // Check for error message
                    if (jsonResponse.has("code") && jsonResponse.get("code").getAsString().contains("VoiceNotFound")) {
                        String errorMessage = jsonResponse.has("message") ?
                                jsonResponse.get("message").getAsString() : "Voice not found";
                        System.out.println("Voice does not exist: " + voiceName);
                        System.out.println("Error message: " + errorMessage);
                        // Voice not existing is also a successful operation (target is already gone)
                    } else if (jsonResponse.has("usage")) {
                        // Check if deletion was successful
                        System.out.println("Voice deleted successfully: " + voiceName);
                        String requestId = jsonResponse.has("request_id") ?
                                jsonResponse.get("request_id").getAsString() : "N/A";
                        System.out.println("Request ID: " + requestId);
                    } else {
                        System.out.println("Delete operation returned an unexpected format: " + response.toString());
                    }
    
                } else {
                    // Read error response (a VoiceNotFound error code may arrive with a non-200 status)
                    StringBuilder errorResponse = new StringBuilder();
                    try (BufferedReader br = new BufferedReader(
                            new InputStreamReader(connection.getErrorStream(), "UTF-8"))) {
                        String responseLine;
                        while ((responseLine = br.readLine()) != null) {
                            errorResponse.append(responseLine.trim());
                        }
                    }

                    if (errorResponse.toString().startsWith("{")) {
                        JsonObject errorJson = JsonParser.parseString(errorResponse.toString()).getAsJsonObject();
                        if (errorJson.has("code") && errorJson.get("code").getAsString().contains("VoiceNotFound")) {
                            System.out.println("Voice does not exist: " + voiceName);
                            System.out.println("Error message: " + (errorJson.has("message")
                                    ? errorJson.get("message").getAsString() : "Voice not found"));
                            // The target is already gone, so treat this as success
                            return;
                        }
                    }

                    System.out.println("Delete voice request failed. Status code: " + responseCode);
                    System.out.println("Error response: " + errorResponse.toString());
                }
    
            } catch (Exception e) {
                System.err.println("Request error: " + e.getMessage());
                e.printStackTrace();
            } finally {
                if (connection != null) {
                    connection.disconnect();
                }
            }
        }
    }

Speech synthesis

To learn how to use a custom voice from voice design for personalized speech synthesis, see Getting started: From voice design to speech synthesis.

Speech synthesis models for voice design, such as qwen3-tts-vd-realtime-2025-12-16, are specialized. They only support voices generated by voice design and do not support public preset voices such as Chelsie, Serena, Ethan, or Cherry.
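
Because these models reject the public preset voices, a simple client-side guard can catch the mistake before a synthesis request is sent. A sketch; the function and set names are illustrative, not part of the API:

```python
# Public preset voices listed above, which qwen3-tts-vd-* models do not accept
PRESET_VOICES = {"Chelsie", "Serena", "Ethan", "Cherry"}

def is_designed_voice(voice: str) -> bool:
    """Return False for the public preset voices; only voices generated
    by voice design work with the specialized synthesis models."""
    return voice not in PRESET_VOICES

print(is_designed_voice("Cherry"))                            # False
print(is_designed_voice("qwen-tts-vd-announcer-voice-xxxx"))  # True
```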

Voice quota and auto-cleanup

  • Total limit: 1,000 voices per account

    You can check the number of voices (total_count) by calling the List voices
  • Auto-cleanup: If a voice has not been used in any speech synthesis request in the past year, the system automatically deletes it.
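
The quota check above can be sketched as a small helper that compares total_count against the per-account limit. The helper itself is illustrative, not part of the API:

```python
VOICE_QUOTA = 1000  # per-account limit stated above

def remaining_voice_slots(total_count: int, quota: int = VOICE_QUOTA) -> int:
    """Slots left before the per-account voice limit is reached.
    total_count is the value returned by the List voices operation."""
    return max(quota - total_count, 0)

print(remaining_voice_slots(990))   # 10
print(remaining_voice_slots(1000))  # 0
```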

Billing

Voice design and speech synthesis are billed separately:

  • Voice design: Creating a voice is billed at $0.2 per voice. Failed creations are not billed.

    Note

    Free quota details (Singapore region only):

    • Within 90 days of activating Alibaba Cloud Model Studio, you receive 1,000 free voice creation opportunities.

    • Failed creations do not consume the free quota.

    • Deleting a voice does not restore the free quota.

    • After the free quota is used up or the 90-day validity period expires, voice creation is billed at $0.2 per voice.

  • Speech synthesis using custom voices from voice design: Billed based on the number of text characters. For details, see Real-time speech synthesis - Qwen.
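
The voice design pricing above can be estimated locally. A sketch, assuming the free-tier creations are consumed before paid ones; the helper is illustrative, not part of the API:

```python
PRICE_PER_VOICE = 0.2  # USD per successful voice creation

def voice_design_cost(successful_creations: int, free_quota_left: int = 0) -> float:
    """Estimated voice design cost in USD. Free-tier creations (Singapore
    region, within 90 days of activation) are consumed first; failed
    creations are not billed and should not be counted here."""
    billable = max(successful_creations - free_quota_left, 0)
    return billable * PRICE_PER_VOICE

print(voice_design_cost(1200, free_quota_left=1000))  # 200 billable voices
```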

Error messages

If you encounter an error, see Error messages for troubleshooting.