
Alibaba Cloud Model Studio: Qwen-TTS voice design API reference

Last Updated: Dec 17, 2025

Voice design generates custom voices from text descriptions. It supports multi-language and multi-dimensional voice characteristics, making it suitable for applications such as ad voiceovers, character creation, and audiobook production. Voice design and speech synthesis are two sequential steps. This document focuses on the parameters and interface details of voice design. For more information about speech synthesis, see Real-time speech synthesis - Qwen.

User guide: For model introduction and selection recommendations, see Real-time speech synthesis - Qwen.

Supported languages

Voice design supports voice creation and speech synthesis in multiple languages, including the following: Chinese (zh), English (en), German (de), Italian (it), Portuguese (pt), Spanish (es), Japanese (ja), Korean (ko), French (fr), Russian (ru).

How to write high-quality voice descriptions

Limitations

When writing a voice description (voice_prompt), adhere to the following constraints:

  • Length limit: voice_prompt must not exceed 2048 characters.

  • Supported languages: The description text can only be in Chinese or English.
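These constraints can also be checked client-side before sending a request. The following is a minimal sketch; `validate_voice_prompt` is a hypothetical helper, not part of the DashScope SDK, and the service enforces the same limits server-side:

```python
def validate_voice_prompt(voice_prompt: str) -> None:
    """Check the documented voice_prompt limits before calling the API.

    Hypothetical client-side helper; the API applies the same rules.
    """
    if not voice_prompt.strip():
        raise ValueError("voice_prompt must not be empty")
    if len(voice_prompt) > 2048:
        raise ValueError(
            f"voice_prompt is {len(voice_prompt)} characters; the limit is 2048"
        )

# A compliant description passes silently:
validate_voice_prompt("A calm middle-aged male voice with a slow speaking rate.")
```

Note that the 2048-character limit is counted in characters, not bytes, so it applies equally to Chinese and English descriptions.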

Core principles

A high-quality voice description (voice_prompt) is key to creating your ideal voice. It acts as a blueprint that directly guides the model to generate a voice with specific characteristics.

Follow these core principles when describing a voice:

  1. Be specific, not vague: Use words that clearly describe vocal traits, such as "deep," "crisp," or "fast-paced." Avoid subjective and uninformative terms such as "nice-sounding" or "ordinary."

  2. Be multi-dimensional, not single-dimensional: Effective descriptions combine multiple dimensions, such as gender, age, and emotion, as described below. A single-dimension description, such as "female voice," is too broad to produce a distinctive voice.

  3. Be objective, not subjective: Focus on the physical and perceptual features of the voice itself, not personal preferences. For example, use "high-pitched and energetic" instead of "my favorite voice."

  4. Be original, not imitative: Describe vocal traits rather than requesting the imitation of specific people, such as celebrities. Such requests involve copyright risks, and the model does not support direct imitation.

  5. Be concise, not redundant: Ensure every word adds meaning. Avoid repeating synonyms or using meaningless intensifiers, such as "very very nice voice."

Description dimensions reference

  • Gender: Male, female, neutral

  • Age: Child (5–12 years), teen (13–18 years), young adult (19–35 years), middle-aged (36–55 years), senior (55+ years)

  • Pitch: High, mid, low, slightly high, slightly low

  • Speaking rate: Fast, medium, slow, slightly fast, slightly slow

  • Emotion: Cheerful, calm, gentle, serious, lively, composed, soothing

  • Characteristics: Magnetic, crisp, raspy, smooth, sweet, rich, powerful

  • Use case: News broadcast, ad voiceover, audiobook, animated character, voice assistant, documentary narration
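When generating many prompts programmatically, these dimensions can be combined into a single multi-dimensional description. The following is an illustrative sketch only; the function and its phrasing template are not part of any SDK:

```python
def build_voice_prompt(gender: str, age: str, *, pitch: str = "",
                       rate: str = "", emotion: str = "",
                       characteristics: str = "", use_case: str = "") -> str:
    """Combine dimension values into one multi-dimensional voice description."""
    prompt = f"A {emotion + ' ' if emotion else ''}{age} {gender} voice"
    details = []
    if pitch:
        details.append(f"a {pitch} pitch")
    if rate:
        details.append(f"a {rate} speaking rate")
    if characteristics:
        details.append(f"a {characteristics} timbre")
    if details:
        prompt += " with " + ", ".join(details)
    if use_case:
        prompt += f", suitable for {use_case}"
    return prompt + "."

print(build_voice_prompt("male", "middle-aged", pitch="low", rate="slow",
                         emotion="calm", characteristics="magnetic",
                         use_case="documentary narration"))
# → A calm middle-aged male voice with a low pitch, a slow speaking rate,
#   a magnetic timbre, suitable for documentary narration.
```

Combining at least three dimensions this way keeps the description specific while staying well under the 2048-character limit.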

Example comparison

✅ Recommended examples

  • "A young, lively female voice with a fast speaking rate and noticeably rising intonation, suitable for introducing fashion products."

    Analysis: This description combines age, personality, speaking rate, and intonation, and specifies a use case, creating a vivid and clear image.

  • "A calm middle-aged male voice with a slow speaking rate, deep and magnetic tone, ideal for news reading or documentary narration."

    Analysis: This description clearly defines gender, age range, speaking rate, tonal qualities, and application domain.

  • "A cute child’s voice, approximately an 8-year-old girl, with a slightly childish tone, perfect for animated character dubbing."

    Analysis: This description specifies an exact age and vocal trait ("childish"), with a clear purpose.

  • "A gentle and intellectual female voice, around 30 years old, with a calm tone, suitable for audiobook narration."

    Analysis: This description effectively conveys emotional and stylistic qualities through words such as "intellectual" and "calm."

❌ Not recommended examples and suggestions

  • Example: "Nice-sounding voice"

    Issue: Too vague and subjective. Lacks actionable features.

    Suggestion: Add specific dimensions, for example, "a clear-toned young female voice with a gentle intonation."

  • Example: "Sounds like a certain celebrity"

    Issue: Involves copyright risk. The model cannot directly imitate a specific person.

    Suggestion: Describe the vocal traits instead, for example, "a mature, magnetic male voice with a steady pace."

  • Example: "Very very very nice female voice"

    Issue: Redundant. Repeated words do not help define the voice.

    Suggestion: Remove repetition and add meaningful descriptors, for example, "a 20–24-year-old female voice with a light, upbeat tone and sweet timbre."

  • Example: "123456"

    Issue: Invalid input. It cannot be parsed as voice characteristics.

    Suggestion: Provide meaningful text descriptions. For more information, see the recommended examples above.

Getting started: From voice design to speech synthesis


1. Workflow

Voice design and speech synthesis are two closely linked but independent steps that follow a "create first, then use" workflow:

  1. Prepare the voice description and preview text for voice design.

    • Voice description (voice_prompt): Defines the target voice characteristics. For guidance, see "How to write high-quality voice descriptions."

    • Preview text (preview_text): The text that the preview audio will read aloud, for example, "Hello everyone, welcome to the show."

  2. Call the Create voice API to generate a custom voice and get its name and preview audio.

    In this step, you must specify target_model to declare which speech synthesis model will drive the created voice.

    Listen to the preview audio to evaluate if it meets your expectations. If it does, proceed. If not, redesign the voice.

    If you already have a created voice, which you can verify using the List voices API, you can skip this step and proceed to the next one.

  3. Use the voice for speech synthesis.

    Call the speech synthesis API and pass the voice obtained in the previous step. The speech synthesis model used here must match the target_model specified in the previous step.
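The create-then-use sequence above can be sketched with placeholder functions. `create_voice` and `synthesize` below are hypothetical stand-ins, not SDK calls; the real HTTP and WebSocket usage appears in the sample code section. The sketch exists only to show the model-matching constraint:

```python
TARGET_MODEL = "qwen3-tts-vd-realtime-2025-12-16"

def create_voice(voice_prompt: str, preview_text: str) -> str:
    """Stand-in for step 2 (Create voice API). A real implementation POSTs
    to the customization endpoint with model="qwen-voice-design" and
    target_model=TARGET_MODEL, then returns output.voice from the response."""
    return "announcer-demo"  # placeholder for the server-assigned voice name

def synthesize(model: str, voice: str, text: str) -> bytes:
    """Stand-in for step 3 (real-time synthesis). The model passed here must
    equal the target_model used at creation time; the real API rejects a
    mismatch."""
    if model != TARGET_MODEL:
        raise ValueError("synthesis model must match the voice's target_model")
    return b""  # placeholder for synthesized audio

# Create first, then use:
voice_name = create_voice(
    voice_prompt="A calm middle-aged male voice with a slow speaking rate.",
    preview_text="Hello everyone, welcome to the show.",
)
audio = synthesize(model=TARGET_MODEL, voice=voice_name, text="Welcome back.")
```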

2. Model configurations and preparations

Select the appropriate model and complete the setup tasks.

Model configurations

Specify the following two models during voice design:

  • Voice design model: qwen-voice-design

  • Speech synthesis model that drives the voice: Currently, only qwen3-tts-vd-realtime-2025-12-16 is supported.

Preparations

  1. Get an API key: For instructions, see Get and Configure an API Key. For security, store your API key in an environment variable.

  2. Install the SDK: Install the latest DashScope SDK.

3. Sample code

  1. Generate a custom voice and listen to the preview. If you are satisfied, proceed. Otherwise, regenerate the voice.

    Python

    import requests
    import base64
    import os
    
    def create_voice_and_play():
        # API keys differ between Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
        # If you haven't set an environment variable, replace the line below with: api_key = "sk-xxx"
        api_key = os.getenv("DASHSCOPE_API_KEY")
        
        if not api_key:
            print("Error: DASHSCOPE_API_KEY environment variable not found. Please set your API key.")
            return None, None, None
        
        # Prepare request data
        headers = {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json"
        }
        
        data = {
            "model": "qwen-voice-design",
            "input": {
                "action": "create",
                "target_model": "qwen3-tts-vd-realtime-2025-12-16",
                "voice_prompt": "A composed middle-aged male announcer with a deep, rich and magnetic voice, a steady speaking speed and clear articulation, is suitable for news broadcasting or documentary commentary.",
                "preview_text": "Dear listeners, hello everyone. Welcome to the evening news.",
                "preferred_name": "announcer",
                "language": "en"
            },
            "parameters": {
                "sample_rate": 24000,
                "response_format": "wav"
            }
        }
        
        # URL for Singapore region. For Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
        url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"
        
        try:
            # Send request
            response = requests.post(
                url,
                headers=headers,
                json=data,
                timeout=60  # Add timeout setting
            )
            
            if response.status_code == 200:
                result = response.json()
                
                # Get voice name
                voice_name = result["output"]["voice"]
                print(f"Voice name: {voice_name}")
                
                # Get preview audio data
                base64_audio = result["output"]["preview_audio"]["data"]
                
                # Decode Base64 audio data
                audio_bytes = base64.b64decode(base64_audio)
                
                # Save audio file locally
                filename = f"{voice_name}_preview.wav"
                
                # Write audio data to local file
                with open(filename, 'wb') as f:
                    f.write(audio_bytes)
                
                print(f"Audio saved to local file: {filename}")
                print(f"File path: {os.path.abspath(filename)}")
                
                return voice_name, audio_bytes, filename
            else:
                print(f"Request failed. Status code: {response.status_code}")
                print(f"Response: {response.text}")
                return None, None, None
                
        except requests.exceptions.RequestException as e:
            print(f"Network request error: {e}")
            return None, None, None
        except KeyError as e:
            print(f"Response format error: missing required field: {e}")
            print(f"Response: {response.text if 'response' in locals() else 'No response'}")
            return None, None, None
        except Exception as e:
            print(f"Unexpected error: {e}")
            return None, None, None
    
    if __name__ == "__main__":
        print("Creating voice...")
        voice_name, audio_data, saved_filename = create_voice_and_play()
        
        if voice_name:
            print(f"\nSuccessfully created voice '{voice_name}'")
            print(f"Audio file saved: '{saved_filename}'")
            print(f"File size: {os.path.getsize(saved_filename)} bytes")
        else:
            print("\nVoice creation failed")

    Java

    You need to import the Gson dependency. If you use Maven or Gradle, add the dependency:

    Maven

    Add the following content to pom.xml:

    <!-- https://mvnrepository.com/artifact/com.google.code.gson/gson -->
    <dependency>
        <groupId>com.google.code.gson</groupId>
        <artifactId>gson</artifactId>
        <version>2.13.1</version>
    </dependency>

    Gradle

    Add the following content to build.gradle:

    // https://mvnrepository.com/artifact/com.google.code.gson/gson
    implementation("com.google.code.gson:gson:2.13.1")

    import com.google.gson.JsonObject;
    import com.google.gson.JsonParser;
    import java.io.*;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.util.Base64;
    
    public class Main {
        public static void main(String[] args) {
            Main example = new Main();
            example.createVoice();
        }
    
        public void createVoice() {
            // API keys differ between Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
            // If you haven't set an environment variable, replace the line below with: String apiKey = "sk-xxx"
            String apiKey = System.getenv("DASHSCOPE_API_KEY");
    
            // Create the JSON request body string
            String jsonBody = "{\n" +
                    "    \"model\": \"qwen-voice-design\",\n" +
                    "    \"input\": {\n" +
                    "        \"action\": \"create\",\n" +
                    "        \"target_model\": \"qwen3-tts-vd-realtime-2025-12-16\",\n" +
                    "        \"voice_prompt\": \"A composed middle-aged male announcer with a deep, rich and magnetic voice, a steady speaking speed and clear articulation, is suitable for news broadcasting or documentary commentary.\",\n" +
                    "        \"preview_text\": \"Dear listeners, hello everyone. Welcome to the evening news.\",\n" +
                    "        \"preferred_name\": \"announcer\",\n" +
                    "        \"language\": \"en\"\n" +
                    "    },\n" +
                    "    \"parameters\": {\n" +
                    "        \"sample_rate\": 24000,\n" +
                    "        \"response_format\": \"wav\"\n" +
                    "    }\n" +
                    "}";
    
            HttpURLConnection connection = null;
            try {
                // URL for Singapore region. For Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
                URL url = new URL("https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization");
                connection = (HttpURLConnection) url.openConnection();
    
                // Set request method and headers
                connection.setRequestMethod("POST");
                connection.setRequestProperty("Authorization", "Bearer " + apiKey);
                connection.setRequestProperty("Content-Type", "application/json");
                connection.setDoOutput(true);
                connection.setDoInput(true);
    
                // Send request body
                try (OutputStream os = connection.getOutputStream()) {
                    byte[] input = jsonBody.getBytes("UTF-8");
                    os.write(input, 0, input.length);
                    os.flush();
                }
    
                // Get response
                int responseCode = connection.getResponseCode();
                if (responseCode == HttpURLConnection.HTTP_OK) {
                    // Read response content
                    StringBuilder response = new StringBuilder();
                    try (BufferedReader br = new BufferedReader(
                            new InputStreamReader(connection.getInputStream(), "UTF-8"))) {
                        String responseLine;
                        while ((responseLine = br.readLine()) != null) {
                            response.append(responseLine.trim());
                        }
                    }
    
                    // Parse JSON response
                    JsonObject jsonResponse = JsonParser.parseString(response.toString()).getAsJsonObject();
                    JsonObject outputObj = jsonResponse.getAsJsonObject("output");
                    JsonObject previewAudioObj = outputObj.getAsJsonObject("preview_audio");
    
                    // Get voice name
                    String voiceName = outputObj.get("voice").getAsString();
                    System.out.println("Voice name: " + voiceName);
    
                    // Get Base64-encoded audio data
                    String base64Audio = previewAudioObj.get("data").getAsString();
    
                    // Decode Base64 audio data
                    byte[] audioBytes = Base64.getDecoder().decode(base64Audio);
    
                    // Save audio to a local file
                    String filename = voiceName + "_preview.wav";
                    saveAudioToFile(audioBytes, filename);
    
                    System.out.println("Audio saved to local file: " + filename);
    
                } else {
                    // Read error response
                    StringBuilder errorResponse = new StringBuilder();
                    try (BufferedReader br = new BufferedReader(
                            new InputStreamReader(connection.getErrorStream(), "UTF-8"))) {
                        String responseLine;
                        while ((responseLine = br.readLine()) != null) {
                            errorResponse.append(responseLine.trim());
                        }
                    }
    
                    System.out.println("Request failed. Status code: " + responseCode);
                    System.out.println("Error response: " + errorResponse.toString());
                }
    
            } catch (Exception e) {
                System.err.println("Request error: " + e.getMessage());
                e.printStackTrace();
            } finally {
                if (connection != null) {
                    connection.disconnect();
                }
            }
        }
    
        private void saveAudioToFile(byte[] audioBytes, String filename) {
            try {
                File file = new File(filename);
                try (FileOutputStream fos = new FileOutputStream(file)) {
                    fos.write(audioBytes);
                }
                System.out.println("Audio saved to: " + file.getAbsolutePath());
            } catch (IOException e) {
                System.err.println("Error saving audio file: " + e.getMessage());
                e.printStackTrace();
            }
        }
    }
  2. Use the custom voice generated in the previous step for speech synthesis.

    This example is based on the DashScope SDK's "server commit mode" example for speech synthesis with a system voice. Replace the voice parameter with the custom voice generated by voice design.

    Key principle: The model used during voice design (target_model) must be the same as the model used for subsequent speech synthesis (model). Otherwise, the synthesis will fail.

    Python

    # coding=utf-8
    # Installation instructions for pyaudio:
    # APPLE Mac OS X
    #   brew install portaudio
    #   pip install pyaudio
    # Debian/Ubuntu
    #   sudo apt-get install python-pyaudio python3-pyaudio
    #   or
    #   pip install pyaudio
    # CentOS
    #   sudo yum install -y portaudio portaudio-devel && pip install pyaudio
    # Microsoft Windows
    #   python -m pip install pyaudio
    
    import pyaudio
    import os
    import base64
    import threading
    import time
    import dashscope  # DashScope Python SDK version 1.23.9 or later is required
    from dashscope.audio.qwen_tts_realtime import QwenTtsRealtime, QwenTtsRealtimeCallback, AudioFormat
    
    # ======= Constant Configuration =======
    TEXT_TO_SYNTHESIZE = [
        'Right? I just love this kind of supermarket,',
        'especially during the New Year.',
        'Going to the supermarket',
        'just makes me feel',
        'super, super happy!',
        'I want to buy so many things!'
    ]
    
    def init_dashscope_api_key():
        """
        Initializes the DashScope SDK API key
        """
        # API keys differ between Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
        # If you haven't set an environment variable, replace the line below with: dashscope.api_key = "sk-xxx"
        dashscope.api_key = os.getenv("DASHSCOPE_API_KEY")
    
    # ======= Callback Class =======
    class MyCallback(QwenTtsRealtimeCallback):
        """
        Custom TTS streaming callback
        """
        def __init__(self):
            self.complete_event = threading.Event()
            self._player = pyaudio.PyAudio()
            self._stream = self._player.open(
                format=pyaudio.paInt16, channels=1, rate=24000, output=True
            )
    
        def on_open(self) -> None:
            print('[TTS] Connection established')
    
        def on_close(self, close_status_code, close_msg) -> None:
            self._stream.stop_stream()
            self._stream.close()
            self._player.terminate()
            print(f'[TTS] Connection closed, code={close_status_code}, msg={close_msg}')
    
        def on_event(self, response: dict) -> None:
            try:
                event_type = response.get('type', '')
                if event_type == 'session.created':
                    print(f'[TTS] Session started: {response["session"]["id"]}')
                elif event_type == 'response.audio.delta':
                    audio_data = base64.b64decode(response['delta'])
                    self._stream.write(audio_data)
                elif event_type == 'response.done':
                    print(f'[TTS] Response complete, Response ID: {qwen_tts_realtime.get_last_response_id()}')
                elif event_type == 'session.finished':
                    print('[TTS] Session finished')
                    self.complete_event.set()
            except Exception as e:
                print(f'[Error] Exception processing callback event: {e}')
    
        def wait_for_finished(self):
            self.complete_event.wait()
    
    # ======= Main Execution Logic =======
    if __name__ == '__main__':
        init_dashscope_api_key()
        print('[System] Initializing Qwen TTS Realtime ...')
    
        callback = MyCallback()
        qwen_tts_realtime = QwenTtsRealtime(
            # Voice design and speech synthesis must use the same model
            model="qwen3-tts-vd-realtime-2025-12-16",
            callback=callback,
            # URL for Singapore region. For Beijing region, use: wss://dashscope.aliyuncs.com/api-ws/v1/realtime
            url='wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime'
        )
        qwen_tts_realtime.connect()
        
        qwen_tts_realtime.update_session(
            voice="myvoice", # Replace the voice parameter with the custom voice generated by voice design
            response_format=AudioFormat.PCM_24000HZ_MONO_16BIT,
            mode='server_commit'
        )
    
        for text_chunk in TEXT_TO_SYNTHESIZE:
            print(f'[Sending text]: {text_chunk}')
            qwen_tts_realtime.append_text(text_chunk)
            time.sleep(0.1)
    
        qwen_tts_realtime.finish()
        callback.wait_for_finished()
    
        print(f'[Metric] session_id={qwen_tts_realtime.get_session_id()}, '
              f'first_audio_delay={qwen_tts_realtime.get_first_audio_delay()}s')

    Java

    import com.alibaba.dashscope.audio.qwen_tts_realtime.*;
    import com.alibaba.dashscope.exception.NoApiKeyException;
    import com.google.gson.JsonObject;
    
    import javax.sound.sampled.*;
    import java.io.*;
    import java.util.Base64;
    import java.util.Queue;
    import java.util.concurrent.CountDownLatch;
    import java.util.concurrent.atomic.AtomicReference;
    import java.util.concurrent.ConcurrentLinkedQueue;
    import java.util.concurrent.atomic.AtomicBoolean;
    
    public class Main {
        // ===== Constant Definitions =====
        private static String[] textToSynthesize = {
                "Right? I just love this kind of supermarket,",
                "especially during the New Year.",
                "Going to the supermarket",
                "just makes me feel",
                "super, super happy!",
                "I want to buy so many things!"
        };
    
        // Real-time audio player class
        public static class RealtimePcmPlayer {
            private int sampleRate;
            private SourceDataLine line;
            private AudioFormat audioFormat;
            private Thread decoderThread;
            private Thread playerThread;
            private AtomicBoolean stopped = new AtomicBoolean(false);
            private Queue<String> b64AudioBuffer = new ConcurrentLinkedQueue<>();
            private Queue<byte[]> RawAudioBuffer = new ConcurrentLinkedQueue<>();
    
            // Constructor initializes audio format and audio line
            public RealtimePcmPlayer(int sampleRate) throws LineUnavailableException {
                this.sampleRate = sampleRate;
                this.audioFormat = new AudioFormat(this.sampleRate, 16, 1, true, false);
                DataLine.Info info = new DataLine.Info(SourceDataLine.class, audioFormat);
                line = (SourceDataLine) AudioSystem.getLine(info);
                line.open(audioFormat);
                line.start();
                decoderThread = new Thread(new Runnable() {
                    @Override
                    public void run() {
                        while (!stopped.get()) {
                            String b64Audio = b64AudioBuffer.poll();
                            if (b64Audio != null) {
                                byte[] rawAudio = Base64.getDecoder().decode(b64Audio);
                                RawAudioBuffer.add(rawAudio);
                            } else {
                                try {
                                    Thread.sleep(100);
                                } catch (InterruptedException e) {
                                    throw new RuntimeException(e);
                                }
                            }
                        }
                    }
                });
                playerThread = new Thread(new Runnable() {
                    @Override
                    public void run() {
                        while (!stopped.get()) {
                            byte[] rawAudio = RawAudioBuffer.poll();
                            if (rawAudio != null) {
                                try {
                                    playChunk(rawAudio);
                                } catch (IOException e) {
                                    throw new RuntimeException(e);
                                } catch (InterruptedException e) {
                                    throw new RuntimeException(e);
                                }
                            } else {
                                try {
                                    Thread.sleep(100);
                                } catch (InterruptedException e) {
                                    throw new RuntimeException(e);
                                }
                            }
                        }
                    }
                });
                decoderThread.start();
                playerThread.start();
            }
    
            // Plays an audio chunk and blocks until playback is complete
            private void playChunk(byte[] chunk) throws IOException, InterruptedException {
                if (chunk == null || chunk.length == 0) return;
    
                int bytesWritten = 0;
                while (bytesWritten < chunk.length) {
                    bytesWritten += line.write(chunk, bytesWritten, chunk.length - bytesWritten);
                }
                int audioLength = chunk.length / (this.sampleRate * 2 / 1000);
                // Wait for the audio in the buffer to finish playing.
                // Guard against a negative sleep for very short chunks.
                Thread.sleep(Math.max(0, audioLength - 10));
            }
    
            public void write(String b64Audio) {
                b64AudioBuffer.add(b64Audio);
            }
    
            public void cancel() {
                b64AudioBuffer.clear();
                RawAudioBuffer.clear();
            }
    
            public void waitForComplete() throws InterruptedException {
                while (!b64AudioBuffer.isEmpty() || !RawAudioBuffer.isEmpty()) {
                    Thread.sleep(100);
                }
                line.drain();
            }
    
            public void shutdown() throws InterruptedException {
                stopped.set(true);
                decoderThread.join();
                playerThread.join();
                if (line != null && line.isRunning()) {
                    line.drain();
                    line.close();
                }
            }
        }
    
        public static void main(String[] args) throws Exception {
            QwenTtsRealtimeParam param = QwenTtsRealtimeParam.builder()
                    // Voice design and speech synthesis must use the same model
                    .model("qwen3-tts-vd-realtime-2025-12-16")
                    // URL for Singapore region. For Beijing region, use: wss://dashscope.aliyuncs.com/api-ws/v1/realtime
                    .url("wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime")
                    // API keys differ between Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
                    // If you haven't set an environment variable, replace the line below with: .apikey("sk-xxx")
                    .apikey(System.getenv("DASHSCOPE_API_KEY"))
                    .build();
            AtomicReference<CountDownLatch> completeLatch = new AtomicReference<>(new CountDownLatch(1));
            final AtomicReference<QwenTtsRealtime> qwenTtsRef = new AtomicReference<>(null);
    
            // Create a real-time audio player instance
            RealtimePcmPlayer audioPlayer = new RealtimePcmPlayer(24000);
    
            QwenTtsRealtime qwenTtsRealtime = new QwenTtsRealtime(param, new QwenTtsRealtimeCallback() {
                @Override
                public void onOpen() {
                    // Handle connection open
                }
                @Override
                public void onEvent(JsonObject message) {
                    String type = message.get("type").getAsString();
                    switch(type) {
                        case "session.created":
                            // Handle session creation
                            break;
                        case "response.audio.delta":
                            String recvAudioB64 = message.get("delta").getAsString();
                            // Play audio in real time
                            audioPlayer.write(recvAudioB64);
                            break;
                        case "response.done":
                            // Handle response completion
                            break;
                        case "session.finished":
                            // Handle session finish
                            completeLatch.get().countDown();
                            break;
                        default:
                            break;
                    }
                }
                @Override
                public void onClose(int code, String reason) {
                    // Handle connection close
                }
            });
            qwenTtsRef.set(qwenTtsRealtime);
            try {
                qwenTtsRealtime.connect();
            } catch (NoApiKeyException e) {
                throw new RuntimeException(e);
            }
            QwenTtsRealtimeConfig config = QwenTtsRealtimeConfig.builder()
                    .voice("myvoice") // Replace the voice parameter with the custom voice generated by voice design
                    .responseFormat(QwenTtsRealtimeAudioFormat.PCM_24000HZ_MONO_16BIT)
                    .mode("server_commit")
                    .build();
            qwenTtsRealtime.updateSession(config);
            for (String text:textToSynthesize) {
                qwenTtsRealtime.appendText(text);
                Thread.sleep(100);
            }
            qwenTtsRealtime.finish();
            completeLatch.get().await();
    
            // Wait for audio playback to complete and then shut down the player
            audioPlayer.waitForComplete();
            audioPlayer.shutdown();
            System.exit(0);
        }
    }

API reference

Ensure that you use the same account when calling different APIs.

Create voice

Creates a custom voice by providing a voice description and preview text.

  • URL

    China (Beijing):

    POST https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization

    International (Singapore):

    POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization
  • Request headers

    Authorization (string, required): Authentication token. Format: Bearer <your_api_key>. Replace <your_api_key> with your actual API key.

    Content-Type (string, required): The media type of the data transmitted in the request body. Fixed value: application/json.

  • Request body

    The request body contains all request parameters. Omit optional fields as needed.

    Important

    Note the difference between the following parameters:

    • model: The voice design model. The value is fixed at qwen-voice-design.

    • target_model: The speech synthesis model that drives the voice. It must be consistent with the speech synthesis model used in subsequent API calls. Otherwise, the synthesis will fail.

    {
        "model": "qwen-voice-design",
        "input": {
            "action": "create",
            "target_model": "qwen3-tts-vd-realtime-2025-12-16",
            "voice_prompt": "A calm middle-aged male announcer with a deep, rich, and magnetic voice, steady speaking speed, and clear articulation, suitable for news broadcasting or documentary narration.",
            "preview_text": "Dear listeners, hello everyone. Welcome to the evening news.",
            "preferred_name": "announcer",
            "language": "en"
        },
        "parameters": {
            "sample_rate": 24000,
            "response_format": "wav"
        }
    }
  • Request parameters

    model (string, required): The voice design model. Fixed value: qwen-voice-design.

    action (string, required): The operation type. Fixed value: create.

    target_model (string, required): The speech synthesis model that drives the voice. Currently, only qwen3-tts-vd-realtime-2025-12-16 is supported. It must match the speech synthesis model used in subsequent API calls; otherwise, synthesis will fail.

    voice_prompt (string, required): Voice description. Maximum length: 2048 characters. Only Chinese and English are supported. For guidance on writing voice descriptions, see "How to write high-quality voice descriptions".

    preview_text (string, required): The text for the preview audio. Maximum length: 1024 characters. Supports Chinese (zh), English (en), German (de), Italian (it), Portuguese (pt), Spanish (es), Japanese (ja), Korean (ko), French (fr), Russian (ru).

    preferred_name (string, required): An easy-to-identify name for the voice (numbers, letters, and underscores only; maximum 16 characters). We recommend an identifier related to the character or scenario. The keyword appears in the generated voice name. For example, if the keyword is "announcer", the final voice name is "qwen-tts-vd-announcer-voice-20251201102800-a1b2".

    language (string, optional, default zh): Language code. Specifies the language preference for the generated voice. This parameter affects the linguistic features and pronunciation tendencies of the voice. Choose the code that matches your use case. If you set this parameter, it must match the language of preview_text. Valid values: zh (Chinese), en (English), de (German), it (Italian), pt (Portuguese), es (Spanish), ja (Japanese), ko (Korean), fr (French), ru (Russian).

    sample_rate (int, optional, default 24000): The sample rate (in Hz) of the preview audio generated by voice design. Valid values: 8000, 16000, 24000, 48000.

    response_format (string, optional, default wav): The format of the preview audio generated by voice design. Valid values: pcm, wav, mp3, opus.
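Before sending a create request, you can check the documented limits on the client side. The helper below is a minimal, illustrative sketch (`validate_create_input` is not part of the API); it enforces the voice_prompt, preview_text, and preferred_name constraints listed above.

```python
import re

# Illustrative client-side check (not part of the API): enforces the
# documented limits before you send a "create" request.
def validate_create_input(voice_prompt, preview_text, preferred_name=None):
    errors = []
    if not (1 <= len(voice_prompt) <= 2048):
        errors.append("voice_prompt must be 1-2048 characters")
    if not (1 <= len(preview_text) <= 1024):
        errors.append("preview_text must be 1-1024 characters")
    # preferred_name: numbers, letters, and underscores only; max 16 characters
    if preferred_name is not None and not re.fullmatch(r"\w{1,16}", preferred_name, re.ASCII):
        errors.append("preferred_name must be 1-16 letters, digits, or underscores")
    return errors
```

An empty list means the input passes the documented constraints; otherwise each entry explains what to fix before calling the API.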

  • Response parameters

    Response example:

    {
        "output": {
            "preview_audio": {
                "data": "{base64_encoded_audio}",
                "sample_rate": 24000,
                "response_format": "wav"
            },
            "target_model": "qwen3-tts-vd-realtime-2025-12-16",
            "voice": "yourVoice"
        },
        "usage": {
            "count": 1
        },
        "request_id": "yourRequestId"
    }

    The key parameters are:

    voice (string): The voice name. You can use it directly as the voice parameter in the speech synthesis API.

    data (string): The preview audio data generated by voice design, returned as a Base64-encoded string.

    sample_rate (int): The sample rate (in Hz) of the preview audio. It matches the sample rate set during voice creation. Default: 24000 Hz.

    response_format (string): The format of the preview audio. It matches the audio format set during voice creation. Default: wav.

    target_model (string): The speech synthesis model that drives the voice. Currently, only qwen3-tts-vd-realtime-2025-12-16 is supported. It must match the speech synthesis model used in subsequent API calls; otherwise, synthesis will fail.

    request_id (string): Request ID.

    count (integer): The number of "Create voice" operations billed for this request. For voice creation, count is always 1.

  • Sample code

    Important

    Note the difference between the following parameters:

    • model: The voice design model. The value is fixed at qwen-voice-design.

    • target_model: The speech synthesis model that drives the voice. It must be consistent with the speech synthesis model used in subsequent API calls. Otherwise, the synthesis will fail.

    cURL

    If you have not set your API key as an environment variable, replace $DASHSCOPE_API_KEY in the example with your actual API key.

    # ======= Important =======
    # The URL below is for the Singapore region. For the Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
    # API keys differ between Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    # === Delete this comment before execution ===
    
    curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization \
    -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
        "model": "qwen-voice-design",
        "input": {
            "action": "create",
            "target_model": "qwen3-tts-vd-realtime-2025-12-16",
            "voice_prompt": "A calm middle-aged male announcer with a deep, rich, and magnetic voice, steady speaking speed, and clear articulation, suitable for news broadcasting or documentary narration.",
            "preview_text": "Dear listeners, hello everyone. Welcome to the evening news.",
            "preferred_name": "announcer",
            "language": "en"
        },
        "parameters": {
            "sample_rate": 24000,
            "response_format": "wav"
        }
    }'

    Python

    import requests
    import base64
    import os
    
    def create_voice_and_play():
        # API keys differ between Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
        # If you haven't set an environment variable, replace the line below with: api_key = "sk-xxx"
        api_key = os.getenv("DASHSCOPE_API_KEY")
        
        if not api_key:
            print("Error: DASHSCOPE_API_KEY environment variable not found. Please set your API key.")
            return None, None, None
        
        # Prepare request data
        headers = {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json"
        }
        
        data = {
            "model": "qwen-voice-design",
            "input": {
                "action": "create",
                "target_model": "qwen3-tts-vd-realtime-2025-12-16",
                "voice_prompt": "A composed middle-aged male announcer with a deep, rich and magnetic voice, a steady speaking speed and clear articulation, is suitable for news broadcasting or documentary commentary.",
                "preview_text": "Dear listeners, hello everyone. Welcome to the evening news.",
                "preferred_name": "announcer",
                "language": "en"
            },
            "parameters": {
                "sample_rate": 24000,
                "response_format": "wav"
            }
        }
        
        # URL for Singapore region. For Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
        url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"
        
        try:
            # Send request
            response = requests.post(
                url,
                headers=headers,
                json=data,
                timeout=60  # Add timeout setting
            )
            
            if response.status_code == 200:
                result = response.json()
                
                # Get voice name
                voice_name = result["output"]["voice"]
                print(f"Voice name: {voice_name}")
                
                # Get preview audio data
                base64_audio = result["output"]["preview_audio"]["data"]
                
                # Decode Base64 audio data
                audio_bytes = base64.b64decode(base64_audio)
                
                # Save audio file locally
                filename = f"{voice_name}_preview.wav"
                
                # Write audio data to local file
                with open(filename, 'wb') as f:
                    f.write(audio_bytes)
                
                print(f"Audio saved to local file: {filename}")
                print(f"File path: {os.path.abspath(filename)}")
                
                return voice_name, audio_bytes, filename
            else:
                print(f"Request failed. Status code: {response.status_code}")
                print(f"Response: {response.text}")
                return None, None, None
                
        except requests.exceptions.RequestException as e:
            print(f"Network request error: {e}")
            return None, None, None
        except KeyError as e:
            print(f"Response format error: missing required field: {e}")
            print(f"Response: {response.text if 'response' in locals() else 'No response'}")
            return None, None, None
        except Exception as e:
            print(f"Unexpected error: {e}")
            return None, None, None
    
    if __name__ == "__main__":
        print("Creating voice...")
        voice_name, audio_data, saved_filename = create_voice_and_play()
        
        if voice_name:
            print(f"\nSuccessfully created voice '{voice_name}'")
            print(f"Audio file saved: '{saved_filename}'")
            print(f"File size: {os.path.getsize(saved_filename)} bytes")
        else:
            print("\nVoice creation failed")

    Java

    import com.google.gson.JsonObject;
    import com.google.gson.JsonParser;
    import java.io.*;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.util.Base64;
    
    public class Main {
        public static void main(String[] args) {
            Main example = new Main();
            example.createVoice();
        }
    
        public void createVoice() {
            // API keys differ between Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
            // If you haven't set an environment variable, replace the line below with: String apiKey = "sk-xxx"
            String apiKey = System.getenv("DASHSCOPE_API_KEY");
    
            // Create the JSON request body string
            String jsonBody = "{\n" +
                    "    \"model\": \"qwen-voice-design\",\n" +
                    "    \"input\": {\n" +
                    "        \"action\": \"create\",\n" +
                    "        \"target_model\": \"qwen3-tts-vd-realtime-2025-12-16\",\n" +
                    "        \"voice_prompt\": \"A composed middle-aged male announcer with a deep, rich and magnetic voice, a steady speaking speed and clear articulation, is suitable for news broadcasting or documentary commentary.\",\n" +
                    "        \"preview_text\": \"Dear listeners, hello everyone. Welcome to the evening news.\",\n" +
                    "        \"preferred_name\": \"announcer\",\n" +
                    "        \"language\": \"en\"\n" +
                    "    },\n" +
                    "    \"parameters\": {\n" +
                    "        \"sample_rate\": 24000,\n" +
                    "        \"response_format\": \"wav\"\n" +
                    "    }\n" +
                    "}";
    
            HttpURLConnection connection = null;
            try {
                // URL for Singapore region. For Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
                URL url = new URL("https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization");
                connection = (HttpURLConnection) url.openConnection();
    
                // Set request method and headers
                connection.setRequestMethod("POST");
                connection.setRequestProperty("Authorization", "Bearer " + apiKey);
                connection.setRequestProperty("Content-Type", "application/json");
                connection.setDoOutput(true);
                connection.setDoInput(true);
    
                // Send request body
                try (OutputStream os = connection.getOutputStream()) {
                    byte[] input = jsonBody.getBytes("UTF-8");
                    os.write(input, 0, input.length);
                    os.flush();
                }
    
                // Get response
                int responseCode = connection.getResponseCode();
                if (responseCode == HttpURLConnection.HTTP_OK) {
                    // Read response content
                    StringBuilder response = new StringBuilder();
                    try (BufferedReader br = new BufferedReader(
                            new InputStreamReader(connection.getInputStream(), "UTF-8"))) {
                        String responseLine;
                        while ((responseLine = br.readLine()) != null) {
                            response.append(responseLine.trim());
                        }
                    }
    
                    // Parse JSON response
                    JsonObject jsonResponse = JsonParser.parseString(response.toString()).getAsJsonObject();
                    JsonObject outputObj = jsonResponse.getAsJsonObject("output");
                    JsonObject previewAudioObj = outputObj.getAsJsonObject("preview_audio");
    
                    // Get voice name
                    String voiceName = outputObj.get("voice").getAsString();
                    System.out.println("Voice name: " + voiceName);
    
                    // Get Base64-encoded audio data
                    String base64Audio = previewAudioObj.get("data").getAsString();
    
                    // Decode Base64 audio data
                    byte[] audioBytes = Base64.getDecoder().decode(base64Audio);
    
                    // Save audio to a local file
                    String filename = voiceName + "_preview.wav";
                    saveAudioToFile(audioBytes, filename);
    
                    System.out.println("Audio saved to local file: " + filename);
    
                } else {
                    // Read error response
                    StringBuilder errorResponse = new StringBuilder();
                    try (BufferedReader br = new BufferedReader(
                            new InputStreamReader(connection.getErrorStream(), "UTF-8"))) {
                        String responseLine;
                        while ((responseLine = br.readLine()) != null) {
                            errorResponse.append(responseLine.trim());
                        }
                    }
    
                    System.out.println("Request failed. Status code: " + responseCode);
                    System.out.println("Error response: " + errorResponse.toString());
                }
    
            } catch (Exception e) {
                System.err.println("Request error: " + e.getMessage());
                e.printStackTrace();
            } finally {
                if (connection != null) {
                    connection.disconnect();
                }
            }
        }
    
        private void saveAudioToFile(byte[] audioBytes, String filename) {
            try {
                File file = new File(filename);
                try (FileOutputStream fos = new FileOutputStream(file)) {
                    fos.write(audioBytes);
                }
                System.out.println("Audio saved to: " + file.getAbsolutePath());
            } catch (IOException e) {
                System.err.println("Error saving audio file: " + e.getMessage());
                e.printStackTrace();
            }
        }
    }

List voices

Performs a paged query to list created voices.

  • URL

    China (Beijing):

    POST https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization

    International (Singapore):

    POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization
  • Request header

    Authorization (string, required): An authentication token. Format: Bearer <your_api_key>. Replace <your_api_key> with your actual API key.

    Content-Type (string, required): The media type of the data in the request body. Fixed value: application/json.

  • Request body

    The request body contains all request parameters. You can omit optional fields as needed.

    Important

    model: The voice design model. The value is fixed at qwen-voice-design. Do not modify this value.

    {
        "model": "qwen-voice-design",
        "input": {
            "action": "list",
            "page_size": 10,
            "page_index": 0
        }
    }
  • Request parameters

    model (string, required): The voice design model. Fixed value: qwen-voice-design.

    action (string, required): The operation type. Fixed value: list.

    page_index (integer, optional, default 0): Page index. Value range: [0, 200].

    page_size (integer, optional, default 10): The number of entries per page. The value must be greater than 0.
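Because results are paged, listing every voice means incrementing page_index until total_count entries have been collected. The sketch below separates the paging logic from the HTTP transport so it is easy to test; `fetch_page` is a caller-supplied function (an assumption, not part of the API) that issues one "list" request and returns the response's output object.

```python
# Illustrative pagination loop. `fetch_page(page_index, page_size)` is a
# caller-supplied function that performs one "list" request and returns the
# response "output" object: {"voice_list": [...], "total_count": N, ...}.
def list_all_voices(fetch_page, page_size=10):
    voices, page_index = [], 0
    while True:
        output = fetch_page(page_index, page_size)
        voices.extend(output["voice_list"])
        # Stop when every entry reported by total_count has been fetched,
        # or when the service returns an empty page.
        if len(voices) >= output["total_count"] or not output["voice_list"]:
            return voices
        page_index += 1
```

With requests, `fetch_page` would POST the {"action": "list", ...} body shown above and return `response.json()["output"]`.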

  • Response parameters

    Response example:

    {
        "output": {
            "page_index": 0,
            "page_size": 2,
            "total_count": 26,
            "voice_list": [
                {
                    "gmt_create": "2025-12-10 17:04:54",
                    "gmt_modified": "2025-12-10 17:04:54",
                    "language": "en",
                    "preview_text": "Dear listeners, hello everyone. Welcome to today's program.",
                    "target_model": "qwen3-tts-vd-realtime-2025-12-16",
                    "voice": "yourVoice1",
                    "voice_prompt": "A calm middle-aged male announcer with a deep, rich, and magnetic voice, steady speaking speed, and clear articulation, suitable for news broadcasting or documentary narration."
                },
                {
                    "gmt_create": "2025-12-10 15:31:35",
                    "gmt_modified": "2025-12-10 15:31:35",
                    "language": "en",
                    "preview_text": "Dear listeners, hello everyone",
                    "target_model": "qwen3-tts-vd-realtime-2025-12-16",
                    "voice": "yourVoice2",
                    "voice_prompt": "A calm middle-aged male announcer with a deep, rich, and magnetic voice, steady speaking speed, and clear articulation, suitable for news broadcasting or documentary narration."
                }
            ]
        },
        "usage": {},
        "request_id": "yourRequestId"
    }

    The key parameters are:

    voice (string): The voice name. You can use it directly as the voice parameter in the speech synthesis API.

    target_model (string): The speech synthesis model that drives the voice. Currently, only qwen3-tts-vd-realtime-2025-12-16 is supported. It must match the speech synthesis model used in subsequent API calls; otherwise, synthesis will fail.

    language (string): Language code. Valid values: zh (Chinese), en (English), de (German), it (Italian), pt (Portuguese), es (Spanish), ja (Japanese), ko (Korean), fr (French), ru (Russian).

    voice_prompt (string): Voice description.

    preview_text (string): Preview text.

    gmt_create (string): The time the voice was created.

    gmt_modified (string): The time the voice was last modified.

    page_index (integer): Page index.

    page_size (integer): The number of entries per page.

    total_count (integer): The total number of entries found.

    request_id (string): Request ID.

  • Sample code

    Important

    model: The voice design model. The value is fixed at qwen-voice-design. Do not modify this value.

    cURL

    If you have not set the API key as an environment variable, replace $DASHSCOPE_API_KEY in the example with your actual API key.

    # ======= Important =======
    # The URL below is for the Singapore region. For the Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
    # API keys differ between Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    # === Delete this comment before execution ===
    
    curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization \
    -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
        "model": "qwen-voice-design",
        "input": {
            "action": "list",
            "page_size": 10,
            "page_index": 0
        }
    }'

    Python

    import os
    import requests
    
    # API keys differ between Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    # If you haven't set an environment variable, replace the line below with: api_key = "sk-xxx"
    api_key = os.getenv("DASHSCOPE_API_KEY")
    # URL for Singapore region. For Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
    url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"
    
    payload = {
        "model": "qwen-voice-design", # Do not modify this value
        "input": {
            "action": "list",
            "page_size": 10,
            "page_index": 0
        }
    }
    
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }
    
    response = requests.post(url, json=payload, headers=headers)
    
    print("HTTP Status Code:", response.status_code)
    
    if response.status_code == 200:
        data = response.json()
        voice_list = data["output"]["voice_list"]
    
        print("Queried voice list:")
        for item in voice_list:
            print(f"- Voice: {item['voice']}  Created: {item['gmt_create']}  Model: {item['target_model']}")
    else:
        print("Request failed:", response.text)

    Java

    import com.google.gson.Gson;
    import com.google.gson.JsonArray;
    import com.google.gson.JsonObject;
    
    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    
    public class Main {
        public static void main(String[] args) {
            // API keys differ between Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
            // If you haven't set an environment variable, replace the line below with: String apiKey = "sk-xxx"
            String apiKey = System.getenv("DASHSCOPE_API_KEY");
            // URL for Singapore region. For Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
            String apiUrl = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization";
    
            // JSON request body (older Java versions do not support """ multiline strings)
            String jsonPayload =
                    "{"
                            + "\"model\": \"qwen-voice-design\"," // Do not modify this value
                            + "\"input\": {"
                            +     "\"action\": \"list\","
                            +     "\"page_size\": 10,"
                            +     "\"page_index\": 0"
                            + "}"
                            + "}";
    
            try {
                HttpURLConnection con = (HttpURLConnection) new URL(apiUrl).openConnection();
                con.setRequestMethod("POST");
                con.setRequestProperty("Authorization", "Bearer " + apiKey);
                con.setRequestProperty("Content-Type", "application/json");
                con.setDoOutput(true);
    
                try (OutputStream os = con.getOutputStream()) {
                    os.write(jsonPayload.getBytes("UTF-8"));
                }
    
                int status = con.getResponseCode();
                BufferedReader br = new BufferedReader(new InputStreamReader(
                        status >= 200 && status < 300 ? con.getInputStream() : con.getErrorStream(), "UTF-8"));
    
                StringBuilder response = new StringBuilder();
                String line;
                while ((line = br.readLine()) != null) {
                    response.append(line);
                }
                br.close();
    
                System.out.println("HTTP Status Code: " + status);
                System.out.println("Returned JSON: " + response.toString());
    
                if (status == 200) {
                    Gson gson = new Gson();
                    JsonObject jsonObj = gson.fromJson(response.toString(), JsonObject.class);
                    JsonArray voiceList = jsonObj.getAsJsonObject("output").getAsJsonArray("voice_list");
    
                    System.out.println("\n Queried voice list:");
                    for (int i = 0; i < voiceList.size(); i++) {
                        JsonObject voiceItem = voiceList.get(i).getAsJsonObject();
                        String voice = voiceItem.get("voice").getAsString();
                        String gmtCreate = voiceItem.get("gmt_create").getAsString();
                        String targetModel = voiceItem.get("target_model").getAsString();
    
                        System.out.printf("- Voice: %s  Created: %s  Model: %s\n",
                                voice, gmtCreate, targetModel);
                    }
                }
    
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }

Query a specific voice

Retrieves detailed information about a specific voice by its name.

  • URL

    China (Beijing):

    POST https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization

    International (Singapore):

    POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization
  • Request header

    Authorization (string, required): An authentication token. Format: Bearer <your_api_key>. Replace <your_api_key> with your actual API key.

    Content-Type (string, required): The media type of the data in the request body. Fixed value: application/json.

  • Request body

    The request body contains all request parameters. You can omit optional fields as needed.

    Important

    model: The voice design model. The value is fixed at qwen-voice-design. Do not modify this value.

    {
        "model": "qwen-voice-design",
        "input": {
            "action": "query",
            "voice": "voiceName"
        }
    }
  • Request parameters

    model (string, required): The voice design model. Fixed value: qwen-voice-design.

    action (string, required): The operation type. Fixed value: query.

    voice (string, required): The name of the voice to query.

  • Response parameters

    Response example:

    Data found

    {
        "output": {
            "gmt_create": "2025-12-10 14:54:09",
            "gmt_modified": "2025-12-10 17:47:48",
            "language": "en",
            "preview_text": "Dear listeners, hello everyone",
            "target_model": "qwen3-tts-vd-realtime-2025-12-16",
            "voice": "yourVoice",
            "voice_prompt": "A calm middle-aged male announcer with a deep, rich, and magnetic voice, steady speaking speed, and clear articulation, suitable for news broadcasting or documentary narration."
        },
        "usage": {},
        "request_id": "yourRequestId"
    }

    No data found

    When the queried voice does not exist, the API returns an HTTP 400 status code, and the response body contains the VoiceNotFound error code.

    {
        "request_id":"yourRequestId",
        "code":"VoiceNotFound",
        "message":"Voice not found: qwen-tts-vd-announcer-voice-xxxx"
    }
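When building on this endpoint, it helps to treat "not found" as a distinct outcome rather than a generic failure. The helper below is a hypothetical classifier (not part of the API) that interprets a query response using the status code and the VoiceNotFound error code shown above.

```python
# Hypothetical helper: classifies a "query" response body. A missing voice
# is reported as HTTP 400 with code "VoiceNotFound"; any other non-success
# response is treated as a generic error.
def interpret_query_response(status_code, body):
    if status_code == 200 and "output" in body:
        return ("found", body["output"])
    if body.get("code") == "VoiceNotFound":
        return ("not_found", body.get("message", "Voice not found"))
    return ("error", body)
```

The caller can then branch on the first element of the returned tuple instead of re-parsing the raw response everywhere.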

    The key parameters are:

    voice (string): The voice name. You can use it directly as the voice parameter in the speech synthesis API.

    target_model (string): The speech synthesis model that drives the voice. Currently, only qwen3-tts-vd-realtime-2025-12-16 is supported. It must match the speech synthesis model used in subsequent API calls; otherwise, synthesis will fail.

    language (string): Language code. Valid values: zh (Chinese), en (English), de (German), it (Italian), pt (Portuguese), es (Spanish), ja (Japanese), ko (Korean), fr (French), ru (Russian).

    voice_prompt (string): Voice description.

    preview_text (string): Preview text.

    gmt_create (string): The time the voice was created.

    gmt_modified (string): The time the voice was last modified.

    request_id (string): Request ID.

  • Sample code

    Important

    model: The voice design model. The value is fixed at qwen-voice-design. Do not modify this value.

    cURL

    If you have not set the API key as an environment variable, replace $DASHSCOPE_API_KEY in the example with your actual API key.

    # ======= Important =======
    # The URL below is for the Singapore region. For the Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
    # API keys differ between Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    # === Delete this comment before execution ===
    
    curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization \
    -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
        "model": "qwen-voice-design",
        "input": {
            "action": "query",
            "voice": "voiceName"
        }
    }'

    Python

    import requests
    import os
    
    def query_voice(voice_name):
        """
        Query information for a specific voice
        :param voice_name: The name of the voice
        :return: A dictionary with voice information, or None if not found
        """
        # API keys differ between Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
        # If you haven't set an environment variable, replace the line below with: api_key = "sk-xxx"
        api_key = os.getenv("DASHSCOPE_API_KEY")
        
        # Prepare request data
        headers = {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json"
        }
        
        data = {
            "model": "qwen-voice-design",
            "input": {
                "action": "query",
                "voice": voice_name
            }
        }
        
        # URL for Singapore region. For Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
        url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"
        # Send request
        response = requests.post(
            url,
            headers=headers,
            json=data
        )
        
        # The API returns HTTP 400 with the VoiceNotFound error code when the
        # voice does not exist, so inspect the response body before the status code
        try:
            result = response.json()
        except ValueError:
            result = {}
        
        if result.get("code") == "VoiceNotFound":
            print(f"Voice not found: {voice_name}")
            print(f"Error message: {result.get('message', 'Voice not found')}")
            return None
        
        if response.status_code == 200:
            
            # Get voice information
            voice_info = result["output"]
            print(f"Successfully queried voice information:")
            print(f"  Voice Name: {voice_info.get('voice')}")
            print(f"  Created: {voice_info.get('gmt_create')}")
            print(f"  Modified: {voice_info.get('gmt_modified')}")
            print(f"  Language: {voice_info.get('language')}")
            print(f"  Preview Text: {voice_info.get('preview_text')}")
            print(f"  Model: {voice_info.get('target_model')}")
            print(f"  Voice Prompt: {voice_info.get('voice_prompt')}")
            
            return voice_info
        else:
            print(f"Request failed. Status code: {response.status_code}")
            print(f"Response: {response.text}")
            return None
    
    def main():
        # Example: Query a voice
        voice_name = "myvoice"  # Replace with the actual voice name you want to query
        
        print(f"Querying voice: {voice_name}")
        voice_info = query_voice(voice_name)
        
        if voice_info:
            print("\nVoice query successful!")
        else:
            print("\nVoice query failed or voice does not exist.")
    
    if __name__ == "__main__":
        main()

    Java

    import com.google.gson.JsonObject;
    import com.google.gson.JsonParser;
    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    
    public class Main {
    
        public static void main(String[] args) {
            Main example = new Main();
            // Example: Query a voice
            String voiceName = "myvoice"; // Replace with the actual voice name you want to query
            System.out.println("Querying voice: " + voiceName);
            example.queryVoice(voiceName);
        }
    
        public void queryVoice(String voiceName) {
            // API keys differ between Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
            // If you haven't set an environment variable, replace the line below with: String apiKey = "sk-xxx"
            String apiKey = System.getenv("DASHSCOPE_API_KEY");
    
            // Create the JSON request body string
            String jsonBody = "{\n" +
                    "    \"model\": \"qwen-voice-design\",\n" +
                    "    \"input\": {\n" +
                    "        \"action\": \"query\",\n" +
                    "        \"voice\": \"" + voiceName + "\"\n" +
                    "    }\n" +
                    "}";
    
            HttpURLConnection connection = null;
            try {
                // URL for Singapore region. For Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
                URL url = new URL("https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization");
                connection = (HttpURLConnection) url.openConnection();
    
                // Set request method and headers
                connection.setRequestMethod("POST");
                connection.setRequestProperty("Authorization", "Bearer " + apiKey);
                connection.setRequestProperty("Content-Type", "application/json");
                connection.setDoOutput(true);
                connection.setDoInput(true);
    
                // Send request body
                try (OutputStream os = connection.getOutputStream()) {
                    byte[] input = jsonBody.getBytes("UTF-8");
                    os.write(input, 0, input.length);
                    os.flush();
                }
    
                // Get response
                int responseCode = connection.getResponseCode();
                if (responseCode == HttpURLConnection.HTTP_OK) {
                    // Read response content
                    StringBuilder response = new StringBuilder();
                    try (BufferedReader br = new BufferedReader(
                            new InputStreamReader(connection.getInputStream(), "UTF-8"))) {
                        String responseLine;
                        while ((responseLine = br.readLine()) != null) {
                            response.append(responseLine.trim());
                        }
                    }
    
                    // Parse JSON response
                    JsonObject jsonResponse = JsonParser.parseString(response.toString()).getAsJsonObject();
    
                    // Check for error message
                    if (jsonResponse.has("code") && "VoiceNotFound".equals(jsonResponse.get("code").getAsString())) {
                        String errorMessage = jsonResponse.has("message") ?
                                jsonResponse.get("message").getAsString() : "Voice not found";
                        System.out.println("Voice not found: " + voiceName);
                        System.out.println("Error message: " + errorMessage);
                        return;
                    }
    
                    // Get voice information
                    JsonObject outputObj = jsonResponse.getAsJsonObject("output");
    
                    System.out.println("Successfully queried voice information:");
                    System.out.println("  Voice Name: " + outputObj.get("voice").getAsString());
                    System.out.println("  Created: " + outputObj.get("gmt_create").getAsString());
                    System.out.println("  Modified: " + outputObj.get("gmt_modified").getAsString());
                    System.out.println("  Language: " + outputObj.get("language").getAsString());
                    System.out.println("  Preview Text: " + outputObj.get("preview_text").getAsString());
                    System.out.println("  Model: " + outputObj.get("target_model").getAsString());
                    System.out.println("  Voice Prompt: " + outputObj.get("voice_prompt").getAsString());
    
                } else {
                    // Read error response (a missing voice returns HTTP 400 with the VoiceNotFound error code)
                    StringBuilder errorResponse = new StringBuilder();
                    try (BufferedReader br = new BufferedReader(
                            new InputStreamReader(connection.getErrorStream(), "UTF-8"))) {
                        String responseLine;
                        while ((responseLine = br.readLine()) != null) {
                            errorResponse.append(responseLine.trim());
                        }
                    }

                    if (errorResponse.toString().startsWith("{")) {
                        JsonObject errorJson = JsonParser.parseString(errorResponse.toString()).getAsJsonObject();
                        if (errorJson.has("code") && "VoiceNotFound".equals(errorJson.get("code").getAsString())) {
                            System.out.println("Voice not found: " + voiceName);
                            System.out.println("Error message: " + (errorJson.has("message")
                                    ? errorJson.get("message").getAsString() : "Voice not found"));
                            return;
                        }
                    }

                    System.out.println("Request failed. Status code: " + responseCode);
                    System.out.println("Error response: " + errorResponse.toString());
                }
    
            } catch (Exception e) {
                System.err.println("Request error: " + e.getMessage());
                e.printStackTrace();
            } finally {
                if (connection != null) {
                    connection.disconnect();
                }
            }
        }
    }

Delete voice

Deletes a specified voice and releases the corresponding quota.

  • URL

    China (Beijing):

    POST https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization

    International (Singapore):

    POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization
  • Request header

    • Authorization (string, required): The authentication token, in the format Bearer <your_api_key>. Replace <your_api_key> with your actual API key.

    • Content-Type (string, required): The media type of the data in the request body. The value is fixed to application/json.

  • Request body

    The request body contains all request parameters. You can omit optional fields as needed:

    Important

    model: The voice design model. The value is fixed at qwen-voice-design. Do not modify this value.

    {
        "model": "qwen-voice-design",
        "input": {
            "action": "delete",
            "voice": "yourVoice"
        }
    }
  • Request parameters

    • model (string, required): The voice design model. Fixed value: qwen-voice-design.

    • action (string, required): The operation type. Fixed value: delete.

    • voice (string, required): The voice to delete.

  • Response parameters

    Response example:

    {
        "output": {
            "voice": "yourVoice"
        },
        "usage": {},
        "request_id": "yourRequestId"
    }

    The key parameters are:

    • request_id (string): The request ID.

    • voice (string): The deleted voice.

  • Sample code

    Important

    model: The voice design model. The value is fixed at qwen-voice-design. Do not modify this value.

    cURL

    If you have not set the API key as an environment variable, replace $DASHSCOPE_API_KEY in the example with your actual API key.

    # ======= Important =======
    # The URL below is for the Singapore region. For the Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
    # API keys differ between Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    # === Delete this comment before execution ===
    
    curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization \
    -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
        "model": "qwen-voice-design",
        "input": {
            "action": "delete",
            "voice": "yourVoice"
        }
    }'

    Python

    import requests
    import os
    
    def delete_voice(voice_name):
        """
        Delete a specified voice
        :param voice_name: The name of the voice
        :return: True if deletion is successful or the voice does not exist but the request succeeds, False if the operation fails
        """
        # API keys differ between Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
        # If you haven't set an environment variable, replace the line below with: api_key = "sk-xxx"
        api_key = os.getenv("DASHSCOPE_API_KEY")
        
        # Prepare request data
        headers = {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json"
        }
        
        data = {
            "model": "qwen-voice-design",
            "input": {
                "action": "delete",
                "voice": voice_name
            }
        }
        
        # URL for Singapore region. For Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
        url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"
        # Send request
        response = requests.post(
            url,
            headers=headers,
            json=data
        )
        
        # A VoiceNotFound error code may arrive with a non-200 status code, so
        # inspect the response body before the status code
        try:
            result = response.json()
        except ValueError:
            result = {}
        
        if result.get("code") == "VoiceNotFound":
            print(f"Voice does not exist: {voice_name}")
            print(f"Error message: {result.get('message', 'Voice not found')}")
            return True  # The target is already gone, so treat this as success
        
        if response.status_code == 200:
            
            # Check if deletion was successful
            if "usage" in result:
                print(f"Voice deleted successfully: {voice_name}")
                print(f"Request ID: {result.get('request_id', 'N/A')}")
                return True
            else:
                print(f"Delete operation returned an unexpected format: {result}")
                return False
        else:
            print(f"Delete voice request failed. Status code: {response.status_code}")
            print(f"Response: {response.text}")
            return False
    
    def main():
        # Example: Delete a voice
        voice_name = "myvoice"  # Replace with the actual voice name you want to delete
        
        print(f"Deleting voice: {voice_name}")
        success = delete_voice(voice_name)
        
        if success:
            print(f"\nVoice '{voice_name}' deletion operation complete!")
        else:
            print(f"\nVoice '{voice_name}' deletion operation failed!")
    
    if __name__ == "__main__":
        main()

    Java

    import com.google.gson.JsonObject;
    import com.google.gson.JsonParser;
    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    
    public class Main {
    
        public static void main(String[] args) {
            Main example = new Main();
            // Example: Delete a voice
            String voiceName = "myvoice"; // Replace with the actual voice name you want to delete
            System.out.println("Deleting voice: " + voiceName);
            example.deleteVoice(voiceName);
        }
    
        public void deleteVoice(String voiceName) {
            // API keys differ between Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
            // If you haven't set an environment variable, replace the line below with: String apiKey = "sk-xxx"
            String apiKey = System.getenv("DASHSCOPE_API_KEY");
    
            // Create the JSON request body string
            String jsonBody = "{\n" +
                    "    \"model\": \"qwen-voice-design\",\n" +
                    "    \"input\": {\n" +
                    "        \"action\": \"delete\",\n" +
                    "        \"voice\": \"" + voiceName + "\"\n" +
                    "    }\n" +
                    "}";
    
            HttpURLConnection connection = null;
            try {
                // URL for Singapore region. For Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
                URL url = new URL("https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization");
                connection = (HttpURLConnection) url.openConnection();
    
                // Set request method and headers
                connection.setRequestMethod("POST");
                connection.setRequestProperty("Authorization", "Bearer " + apiKey);
                connection.setRequestProperty("Content-Type", "application/json");
                connection.setDoOutput(true);
                connection.setDoInput(true);
    
                // Send request body
                try (OutputStream os = connection.getOutputStream()) {
                    byte[] input = jsonBody.getBytes("UTF-8");
                    os.write(input, 0, input.length);
                    os.flush();
                }
    
                // Get response
                int responseCode = connection.getResponseCode();
                if (responseCode == HttpURLConnection.HTTP_OK) {
                    // Read response content
                    StringBuilder response = new StringBuilder();
                    try (BufferedReader br = new BufferedReader(
                            new InputStreamReader(connection.getInputStream(), "UTF-8"))) {
                        String responseLine;
                        while ((responseLine = br.readLine()) != null) {
                            response.append(responseLine.trim());
                        }
                    }
    
                    // Parse JSON response
                    JsonObject jsonResponse = JsonParser.parseString(response.toString()).getAsJsonObject();
    
                    // Check for error message
                    if (jsonResponse.has("code") && jsonResponse.get("code").getAsString().contains("VoiceNotFound")) {
                        String errorMessage = jsonResponse.has("message") ?
                                jsonResponse.get("message").getAsString() : "Voice not found";
                        System.out.println("Voice does not exist: " + voiceName);
                        System.out.println("Error message: " + errorMessage);
                        // Voice not existing is also a successful operation (target is already gone)
                    } else if (jsonResponse.has("usage")) {
                        // Check if deletion was successful
                        System.out.println("Voice deleted successfully: " + voiceName);
                        String requestId = jsonResponse.has("request_id") ?
                                jsonResponse.get("request_id").getAsString() : "N/A";
                        System.out.println("Request ID: " + requestId);
                    } else {
                        System.out.println("Delete operation returned an unexpected format: " + response.toString());
                    }
    
                } else {
                    // Read error response (a VoiceNotFound error code may arrive with a non-200 status)
                    StringBuilder errorResponse = new StringBuilder();
                    try (BufferedReader br = new BufferedReader(
                            new InputStreamReader(connection.getErrorStream(), "UTF-8"))) {
                        String responseLine;
                        while ((responseLine = br.readLine()) != null) {
                            errorResponse.append(responseLine.trim());
                        }
                    }

                    if (errorResponse.toString().startsWith("{")) {
                        JsonObject errorJson = JsonParser.parseString(errorResponse.toString()).getAsJsonObject();
                        if (errorJson.has("code") && errorJson.get("code").getAsString().contains("VoiceNotFound")) {
                            System.out.println("Voice does not exist: " + voiceName);
                            System.out.println("Error message: " + (errorJson.has("message")
                                    ? errorJson.get("message").getAsString() : "Voice not found"));
                            // The target is already gone, so treat this as success
                            return;
                        }
                    }

                    System.out.println("Delete voice request failed. Status code: " + responseCode);
                    System.out.println("Error response: " + errorResponse.toString());
                }
    
            } catch (Exception e) {
                System.err.println("Request error: " + e.getMessage());
                e.printStackTrace();
            } finally {
                if (connection != null) {
                    connection.disconnect();
                }
            }
        }
    }

Speech synthesis

To learn how to use a custom voice from voice design for personalized speech synthesis, see Getting started: From voice design to speech synthesis.

Speech synthesis models for voice design, such as qwen3-tts-vd-realtime-2025-12-16, are specialized. They only support voices generated by voice design and do not support public preset voices such as Chelsie, Serena, Ethan, or Cherry.
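
Because these models reject the public preset voices, a simple client-side guard can catch the mistake before a synthesis request is sent. A sketch; the function and set names are illustrative, not part of the API:

```python
# Public preset voices listed above, which qwen3-tts-vd-* models do not accept
PRESET_VOICES = {"Chelsie", "Serena", "Ethan", "Cherry"}

def is_designed_voice(voice: str) -> bool:
    """Return False for the public preset voices; only voices generated
    by voice design work with the specialized synthesis models."""
    return voice not in PRESET_VOICES

print(is_designed_voice("Cherry"))                            # False
print(is_designed_voice("qwen-tts-vd-announcer-voice-xxxx"))  # True
```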

Voice quota and auto-cleanup

  • Total limit: 1,000 voices per account

    You can check the number of voices (total_count) by calling the List voices
  • Auto-cleanup: If a voice has not been used in any speech synthesis request in the past year, the system automatically deletes it.
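
The quota check above can be sketched as a small helper that compares total_count against the per-account limit. The helper itself is illustrative, not part of the API:

```python
VOICE_QUOTA = 1000  # per-account limit stated above

def remaining_voice_slots(total_count: int, quota: int = VOICE_QUOTA) -> int:
    """Slots left before the per-account voice limit is reached.
    total_count is the value returned by the List voices operation."""
    return max(quota - total_count, 0)

print(remaining_voice_slots(990))   # 10
print(remaining_voice_slots(1000))  # 0
```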

Billing

Voice design and speech synthesis are billed separately:

  • Voice design: Creating a voice is billed at $0.2 per voice. Failed creations are not billed.

    Note

    Free quota details (Singapore region only):

    • Within 90 days of activating Alibaba Cloud Model Studio, you receive 1,000 free voice creation opportunities.

    • Failed creations do not consume the free quota.

    • Deleting a voice does not restore the free quota.

    • After the free quota is used up or the 90-day validity period expires, voice creation is billed at $0.2 per voice.

  • Speech synthesis using custom voices from voice design: Billed based on the number of text characters. For details, see Real-time speech synthesis - Qwen.
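
The voice design pricing above can be estimated locally. A sketch, assuming the free-tier creations are consumed before paid ones; the helper is illustrative, not part of the API:

```python
PRICE_PER_VOICE = 0.2  # USD per successful voice creation

def voice_design_cost(successful_creations: int, free_quota_left: int = 0) -> float:
    """Estimated voice design cost in USD. Free-tier creations (Singapore
    region, within 90 days of activation) are consumed first; failed
    creations are not billed and should not be counted here."""
    billable = max(successful_creations - free_quota_left, 0)
    return billable * PRICE_PER_VOICE

print(voice_design_cost(1200, free_quota_left=1000))  # 200 billable voices
```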

Error messages

If you encounter an error, see Error messages for troubleshooting.