Voice design generates custom voices from text descriptions. It supports multi-language and multi-dimensional voice feature definitions and is suitable for applications such as ad voice-overs, character creation, and audio content production. Voice design and speech synthesis are sequential steps. This document focuses on the parameters and interface details of voice design. For speech synthesis, see Real-time speech synthesis - Qwen or Speech synthesis - Qwen.
User guide: For model introductions and selection recommendations, see Real-time speech synthesis - Qwen or Speech synthesis - Qwen.
Language support
The voice design service supports creating voices and synthesizing speech in multiple languages, including the following: Chinese (zh), English (en), German (de), Italian (it), Portuguese (pt), Spanish (es), Japanese (ja), Korean (ko), French (fr), and Russian (ru).
Write voice descriptions
Requirements and limitations
When writing a voice description (voice_prompt), follow these technical constraints:
Length limit: The content of voice_prompt cannot exceed 2048 characters.
Supported languages: The description text supports only Chinese and English.
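The length limit can be enforced with a simple client-side check before calling the API. The following is an illustrative sketch: the helper name and error messages are our own, and only the 2048-character limit comes from the constraints above.

```python
MAX_VOICE_PROMPT_CHARS = 2048  # limit stated by the voice design service

def validate_voice_prompt(voice_prompt: str) -> None:
    """Raise ValueError if the description violates the documented limits."""
    if not voice_prompt.strip():
        raise ValueError("voice_prompt must not be empty")
    if len(voice_prompt) > MAX_VOICE_PROMPT_CHARS:
        raise ValueError(
            f"voice_prompt is {len(voice_prompt)} characters; "
            f"the limit is {MAX_VOICE_PROMPT_CHARS}"
        )

validate_voice_prompt("A calm, middle-aged male voice with a slow pace.")  # passes silently
```

Running the check locally avoids a round trip to the service for inputs that would be rejected anyway.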
Core principles
A high-quality voice description (voice_prompt) is key to creating your ideal voice. It acts as a blueprint for voice design and directly guides the model to generate a voice with specific features.
Follow these core principles to describe the voice:
Be specific, not vague. Use words that describe concrete voice qualities, such as "deep," "crisp," or "fast-paced." Avoid subjective and uninformative terms like "nice" or "normal."
Use multiple dimensions, not just one. A good description combines multiple dimensions, such as gender, age, and emotion, as described below. A single-dimension description, such as "female voice," is too broad to generate a distinctive voice.
Be objective, not subjective. Focus on the physical and perceptual features of the voice, not personal preferences. For example, use "high-pitched and energetic" instead of "my favorite voice."
Be original, not imitative. Describe the qualities of the voice instead of asking to imitate a specific person, such as a celebrity or actor. Such requests involve copyright risks and are not supported by the model.
Be concise, not redundant. Make sure every word has a purpose. Avoid using synonyms or meaningless intensifiers, such as "a very, very great voice."
Description dimensions
| Dimension | Example |
| --- | --- |
| Gender | Male, female, neutral |
| Age | Child (5-12 years), teenager (13-18 years), young adult (19-35 years), middle-aged (36-55 years), elderly (55+ years) |
| Pitch | High, medium, low, high-pitched, low-pitched |
| Pace | Fast, medium, slow, fast-paced, slow-paced |
| Emotion | Cheerful, calm, gentle, serious, lively, composed, soothing |
| Characteristics | Magnetic, crisp, hoarse, mellow, sweet, rich, powerful |
| Use case | News broadcast, ad voice-over, audiobook, animation character, voice assistant, documentary narration |
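If you generate descriptions at scale, the dimensions in the table can be combined programmatically. This is a minimal sketch; the dimension keys and the comma-join format are our own convention, since the service simply accepts a free-form string.

```python
# Order roughly follows the dimension table: identity first, then delivery, then use case.
DIMENSION_ORDER = ["gender", "age", "pitch", "pace", "emotion", "characteristics", "use_case"]

def build_voice_prompt(dimensions: dict) -> str:
    """Join non-empty dimension fragments into one free-form description."""
    fragments = [dimensions[k] for k in DIMENSION_ORDER if dimensions.get(k)]
    return ", ".join(fragments) + "."

prompt = build_voice_prompt({
    "gender": "a gentle female voice",
    "age": "around 30 years old",
    "pace": "with a calm, steady pace",
    "use_case": "suitable for audiobook narration",
})
print(prompt)
# a gentle female voice, around 30 years old, with a calm, steady pace, suitable for audiobook narration.
```

Covering several dimensions this way keeps each generated description specific and multi-dimensional, in line with the core principles above.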
Example comparison
✅ Recommended
"A young, lively female voice, with a fast pace and a noticeable upward inflection, suitable for introducing fashion products."
Analysis: This combines age, personality, pace, and intonation, and specifies a suitable scenario, creating a well-rounded profile.
"A calm, middle-aged male voice, with a slow pace and a deep, magnetic tone, suitable for reading news or narrating documentaries."
Analysis: This clearly defines the gender, age group, pace, vocal characteristics, and application domain.
"A cute child's voice, around 8 years old, speaking with a slightly childish tone, suitable for animation character voice-overs."
Analysis: This is precise down to a specific age and vocal quality (childish), with a clear objective.
"A gentle, intellectual female voice, around 30 years old, with a calm tone, suitable for audiobook narration."
Analysis: Words like "intellectual" and "calm" effectively convey the emotion and style of the voice.
❌ Non-recommended
| Example | Main issue | Improvement |
| --- | --- | --- |
| A nice voice | Too vague, highly subjective, and lacks actionable features. | Add specific dimensions, such as: "A young female voice with a clear vocal line and a gentle tone." |
| A voice like a certain celebrity | The model cannot directly imitate a specific person because of copyright risk. | Extract and describe the voice's characteristics, such as: "A mature, magnetic male voice with a calm pace." |
| A very, very, very nice female voice | Redundant. Repetitive words do not help define the voice. | Remove the repetition and add effective descriptions, such as: "A female voice, 20–24 years old, with a light tone, lively pitch, and sweet quality." |
| 123456 | Invalid input that cannot be parsed as voice features. | Provide a meaningful text description. Refer to the recommended examples above. |
Getting started: From voice design to speech synthesis
1. Workflow
Voice design and speech synthesis are two closely related but separate steps that follow a "create first, use later" workflow:
Prepare the voice description and preview text required for voice design.
Voice description (voice_prompt): Defines the features of the target voice. For information about how to write a voice description, see "Write voice descriptions".
Preview text (preview_text): The content that the preview audio of the target voice will read aloud, such as "Hello everyone, and welcome."
Call the Create voice API to create a custom voice and get the voice name and preview audio.
In this step, you must specify target_model to declare which speech synthesis model will drive the created voice.
Listen to the preview audio to determine whether it meets your expectations. If it does, proceed to the next step. Otherwise, redesign the voice.
If you already have a created voice (call the Query voice list API to check), you can skip this step and proceed to the next one.
Use the voice for speech synthesis.
Call the speech synthesis API and pass in the voice obtained in the previous step. The speech synthesis model specified in this step must be the same as the target_model from the previous step.
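The "create first, use later" workflow above can be sketched as follows. The two functions are hypothetical stand-ins for the Create voice and speech synthesis APIs, not SDK methods; the point is only that the synthesis model must match the voice's target_model.

```python
# Any model identifier from this document works here; the constant makes the
# matching constraint explicit by using it in both steps.
TARGET_MODEL = "qwen3-tts-vd-realtime-2026-01-15"

def create_voice(target_model: str, voice_prompt: str) -> dict:
    # Hypothetical stand-in for the Create voice API: in reality this is a POST
    # to the tts/customization endpoint that returns a generated voice name.
    return {"voice": "announcer-demo", "target_model": target_model}

def synthesize(model: str, voice: dict, text: str) -> bytes:
    # The synthesis model must equal the voice's target_model, or the real call fails.
    if model != voice["target_model"]:
        raise ValueError(
            f"model {model!r} does not match target_model {voice['target_model']!r}"
        )
    return b"<audio bytes>"  # placeholder for synthesized audio

voice = create_voice(TARGET_MODEL, "A calm, middle-aged male voice with a slow pace.")
audio = synthesize(TARGET_MODEL, voice, "Hello everyone, and welcome.")
```

Keeping the model identifier in one shared constant is an easy way to avoid the mismatch failure in real code.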
2. Model configuration and preparations
Select the appropriate model and complete the preparations.
Model configuration
The voice design workflow involves two kinds of models:
Voice design model: qwen-voice-design
Voice-driven speech synthesis models fall into two categories:
Qwen3-TTS-VD-Realtime (see Real-time Speech Synthesis - Qwen):
qwen3-tts-vd-realtime-2026-01-15
qwen3-tts-vd-realtime-2025-12-16
Qwen3-TTS-VD (see Speech Synthesis - Qwen):
qwen3-tts-vd-2026-01-26
Preparations
Get an API key: See Get an API key. For security, we recommend configuring the API key as an environment variable.
Install the SDK: Make sure you have installed the latest version of the DashScope SDK.
3. Example code
Bidirectional streaming synthesis
This applies to the Qwen3-TTS-VD-Realtime series of models. For more information, see Real-time Speech Synthesis - Qwen.
Generate a custom voice and preview the result. If you are satisfied with the result, proceed to the next step. Otherwise, generate it again.
Python
```python
import requests
import base64
import os

def create_voice_and_play():
    # API keys for the Singapore and Beijing regions are different.
    # Get an API key: https://www.alibabacloud.com/help/model-studio/get-api-key
    # If the environment variable is not set, replace the following line with
    # your Model Studio API key: api_key = "sk-xxx"
    api_key = os.getenv("DASHSCOPE_API_KEY")
    if not api_key:
        print("Error: DASHSCOPE_API_KEY environment variable not found. Please set the API key first.")
        return None, None, None

    # Prepare request data
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }
    data = {
        "model": "qwen-voice-design",
        "input": {
            "action": "create",
            "target_model": "qwen3-tts-vd-realtime-2026-01-15",
            "voice_prompt": "A composed middle-aged male announcer with a deep, rich and magnetic voice, a steady speaking speed and clear articulation, is suitable for news broadcasting or documentary commentary.",
            "preview_text": "Dear listeners, hello everyone. Welcome to the evening news.",
            "preferred_name": "announcer",
            "language": "en"
        },
        "parameters": {
            "sample_rate": 24000,
            "response_format": "wav"
        }
    }

    # The following is the URL for the Singapore region. If you use a model in the Beijing region,
    # replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
    url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"

    try:
        # Send the request
        response = requests.post(
            url,
            headers=headers,
            json=data,
            timeout=60  # Add a timeout setting
        )

        if response.status_code == 200:
            result = response.json()
            # Get the voice name
            voice_name = result["output"]["voice"]
            print(f"Voice name: {voice_name}")
            # Get the preview audio data and decode it from Base64
            base64_audio = result["output"]["preview_audio"]["data"]
            audio_bytes = base64.b64decode(base64_audio)
            # Save the audio file locally
            filename = f"{voice_name}_preview.wav"
            with open(filename, 'wb') as f:
                f.write(audio_bytes)
            print(f"Audio saved to local file: {filename}")
            print(f"File path: {os.path.abspath(filename)}")
            return voice_name, audio_bytes, filename
        else:
            print(f"Request failed with status code: {response.status_code}")
            print(f"Response content: {response.text}")
            return None, None, None
    except requests.exceptions.RequestException as e:
        print(f"A network request error occurred: {e}")
        return None, None, None
    except KeyError as e:
        print(f"Response data format error, missing required field: {e}")
        print(f"Response content: {response.text if 'response' in locals() else 'No response'}")
        return None, None, None
    except Exception as e:
        print(f"An unknown error occurred: {e}")
        return None, None, None

if __name__ == "__main__":
    print("Starting to create voice...")
    voice_name, audio_data, saved_filename = create_voice_and_play()
    if voice_name:
        print(f"\nSuccessfully created voice '{voice_name}'")
        print(f"Audio file saved as: '{saved_filename}'")
        print(f"File size: {os.path.getsize(saved_filename)} bytes")
    else:
        print("\nVoice creation failed")
```

Java
You need to import the Gson dependency. If you are using Maven or Gradle, add the dependency as follows:
Maven
Add the following content to pom.xml:

```xml
<!-- https://mvnrepository.com/artifact/com.google.code.gson/gson -->
<dependency>
    <groupId>com.google.code.gson</groupId>
    <artifactId>gson</artifactId>
    <version>2.13.1</version>
</dependency>
```

Gradle
Add the following content to build.gradle:

```groovy
// https://mvnrepository.com/artifact/com.google.code.gson/gson
implementation("com.google.code.gson:gson:2.13.1")
```

```java
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;

import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Base64;

public class Main {
    public static void main(String[] args) {
        Main example = new Main();
        example.createVoice();
    }

    public void createVoice() {
        // API keys for the Singapore and Beijing regions are different.
        // Get an API key: https://www.alibabacloud.com/help/model-studio/get-api-key
        // If the environment variable is not set, replace the following line with
        // your Model Studio API key: String apiKey = "sk-xxx";
        String apiKey = System.getenv("DASHSCOPE_API_KEY");

        // Create the JSON request body string
        String jsonBody = "{\n" +
                "  \"model\": \"qwen-voice-design\",\n" +
                "  \"input\": {\n" +
                "    \"action\": \"create\",\n" +
                "    \"target_model\": \"qwen3-tts-vd-realtime-2026-01-15\",\n" +
                "    \"voice_prompt\": \"A composed middle-aged male announcer with a deep, rich and magnetic voice, a steady speaking speed and clear articulation, is suitable for news broadcasting or documentary commentary.\",\n" +
                "    \"preview_text\": \"Dear listeners, hello everyone. Welcome to the evening news.\",\n" +
                "    \"preferred_name\": \"announcer\",\n" +
                "    \"language\": \"en\"\n" +
                "  },\n" +
                "  \"parameters\": {\n" +
                "    \"sample_rate\": 24000,\n" +
                "    \"response_format\": \"wav\"\n" +
                "  }\n" +
                "}";

        HttpURLConnection connection = null;
        try {
            // The following is the URL for the Singapore region. If you use a model in the Beijing region,
            // replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
            URL url = new URL("https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization");
            connection = (HttpURLConnection) url.openConnection();

            // Set the request method and headers
            connection.setRequestMethod("POST");
            connection.setRequestProperty("Authorization", "Bearer " + apiKey);
            connection.setRequestProperty("Content-Type", "application/json");
            connection.setDoOutput(true);
            connection.setDoInput(true);

            // Send the request body
            try (OutputStream os = connection.getOutputStream()) {
                byte[] input = jsonBody.getBytes("UTF-8");
                os.write(input, 0, input.length);
                os.flush();
            }

            // Get the response
            int responseCode = connection.getResponseCode();
            if (responseCode == HttpURLConnection.HTTP_OK) {
                // Read the response content
                StringBuilder response = new StringBuilder();
                try (BufferedReader br = new BufferedReader(
                        new InputStreamReader(connection.getInputStream(), "UTF-8"))) {
                    String responseLine;
                    while ((responseLine = br.readLine()) != null) {
                        response.append(responseLine.trim());
                    }
                }

                // Parse the JSON response
                JsonObject jsonResponse = JsonParser.parseString(response.toString()).getAsJsonObject();
                JsonObject outputObj = jsonResponse.getAsJsonObject("output");
                JsonObject previewAudioObj = outputObj.getAsJsonObject("preview_audio");

                // Get the voice name
                String voiceName = outputObj.get("voice").getAsString();
                System.out.println("Voice name: " + voiceName);

                // Get and decode the Base64-encoded audio data
                String base64Audio = previewAudioObj.get("data").getAsString();
                byte[] audioBytes = Base64.getDecoder().decode(base64Audio);

                // Save the audio to a local file
                String filename = voiceName + "_preview.wav";
                saveAudioToFile(audioBytes, filename);
                System.out.println("Audio saved to local file: " + filename);
            } else {
                // Read the error response
                StringBuilder errorResponse = new StringBuilder();
                try (BufferedReader br = new BufferedReader(
                        new InputStreamReader(connection.getErrorStream(), "UTF-8"))) {
                    String responseLine;
                    while ((responseLine = br.readLine()) != null) {
                        errorResponse.append(responseLine.trim());
                    }
                }
                System.out.println("Request failed with status code: " + responseCode);
                System.out.println("Error response: " + errorResponse.toString());
            }
        } catch (Exception e) {
            System.err.println("An error occurred during the request: " + e.getMessage());
            e.printStackTrace();
        } finally {
            if (connection != null) {
                connection.disconnect();
            }
        }
    }

    private void saveAudioToFile(byte[] audioBytes, String filename) {
        try {
            File file = new File(filename);
            try (FileOutputStream fos = new FileOutputStream(file)) {
                fos.write(audioBytes);
            }
            System.out.println("Audio saved to: " + file.getAbsolutePath());
        } catch (IOException e) {
            System.err.println("An error occurred while saving the audio file: " + e.getMessage());
            e.printStackTrace();
        }
    }
}
```

Use the custom voice generated in the previous step for speech synthesis.
This example refers to the "server commit mode" sample code from the DashScope SDK for speech synthesis with a system voice. It replaces the voice parameter with the custom voice generated by voice design.
Key principle: The model used for voice design (target_model) must be the same as the model used for subsequent speech synthesis (model). Otherwise, the synthesis will fail.
Python
```python
# coding=utf-8
# Installation instructions for pyaudio:
# APPLE Mac OS X
#   brew install portaudio
#   pip install pyaudio
# Debian/Ubuntu
#   sudo apt-get install python-pyaudio python3-pyaudio
#   or
#   pip install pyaudio
# CentOS
#   sudo yum install -y portaudio portaudio-devel && pip install pyaudio
# Microsoft Windows
#   python -m pip install pyaudio

import pyaudio
import os
import base64
import threading
import time
import dashscope  # DashScope Python SDK version must be 1.23.9 or later
from dashscope.audio.qwen_tts_realtime import QwenTtsRealtime, QwenTtsRealtimeCallback, AudioFormat

# ======= Constant Configuration =======
TEXT_TO_SYNTHESIZE = [
    'Right? I really like this kind of supermarket,',
    'especially during the New Year.',
    'Going to the supermarket',
    'just makes me feel',
    'super, super happy!',
    'I want to buy so many things!'
]

def init_dashscope_api_key():
    """
    Initialize the API key for the DashScope SDK.
    """
    # API keys for the Singapore and Beijing regions are different.
    # Get an API key: https://www.alibabacloud.com/help/model-studio/get-api-key
    # If the environment variable is not set, replace the following line with
    # your Model Studio API key: dashscope.api_key = "sk-xxx"
    dashscope.api_key = os.getenv("DASHSCOPE_API_KEY")

# ======= Callback Class =======
class MyCallback(QwenTtsRealtimeCallback):
    """
    Custom TTS streaming callback.
    """
    def __init__(self):
        self.complete_event = threading.Event()
        self._player = pyaudio.PyAudio()
        self._stream = self._player.open(
            format=pyaudio.paInt16,
            channels=1,
            rate=24000,
            output=True
        )

    def on_open(self) -> None:
        print('[TTS] Connection established')

    def on_close(self, close_status_code, close_msg) -> None:
        self._stream.stop_stream()
        self._stream.close()
        self._player.terminate()
        print(f'[TTS] Connection closed, code={close_status_code}, msg={close_msg}')

    def on_event(self, response: dict) -> None:
        try:
            event_type = response.get('type', '')
            if event_type == 'session.created':
                print(f'[TTS] Session started: {response["session"]["id"]}')
            elif event_type == 'response.audio.delta':
                audio_data = base64.b64decode(response['delta'])
                self._stream.write(audio_data)
            elif event_type == 'response.done':
                print(f'[TTS] Response complete, Response ID: {qwen_tts_realtime.get_last_response_id()}')
            elif event_type == 'session.finished':
                print('[TTS] Session finished')
                self.complete_event.set()
        except Exception as e:
            print(f'[Error] Exception processing callback event: {e}')

    def wait_for_finished(self):
        self.complete_event.wait()

# ======= Main Execution Logic =======
if __name__ == '__main__':
    init_dashscope_api_key()
    print('[System] Initializing Qwen TTS Realtime ...')
    callback = MyCallback()
    qwen_tts_realtime = QwenTtsRealtime(
        # Use the same model for voice design and speech synthesis
        model="qwen3-tts-vd-realtime-2026-01-15",
        callback=callback,
        # The following is the URL for the Singapore region. If you use a model in the Beijing
        # region, replace the URL with: wss://dashscope.aliyuncs.com/api-ws/v1/realtime
        url='wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime'
    )
    qwen_tts_realtime.connect()
    qwen_tts_realtime.update_session(
        voice="myvoice",  # Replace the voice parameter with the custom voice generated by voice design
        response_format=AudioFormat.PCM_24000HZ_MONO_16BIT,
        mode='server_commit'
    )
    for text_chunk in TEXT_TO_SYNTHESIZE:
        print(f'[Sending text]: {text_chunk}')
        qwen_tts_realtime.append_text(text_chunk)
        time.sleep(0.1)
    qwen_tts_realtime.finish()
    callback.wait_for_finished()
    print(f'[Metric] session_id={qwen_tts_realtime.get_session_id()}, '
          f'first_audio_delay={qwen_tts_realtime.get_first_audio_delay()}s')
```

Java
```java
import com.alibaba.dashscope.audio.qwen_tts_realtime.*;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.google.gson.JsonObject;

import javax.sound.sampled.*;
import java.io.*;
import java.util.Base64;
import java.util.Queue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;

public class Main {
    // ===== Constant Definitions =====
    private static String[] textToSynthesize = {
            "Right? I really like this kind of supermarket,",
            "especially during the New Year.",
            "Going to the supermarket",
            "just makes me feel",
            "super, super happy!",
            "I want to buy so many things!"
    };

    // Real-time audio player class
    public static class RealtimePcmPlayer {
        private int sampleRate;
        private SourceDataLine line;
        private AudioFormat audioFormat;
        private Thread decoderThread;
        private Thread playerThread;
        private AtomicBoolean stopped = new AtomicBoolean(false);
        private Queue<String> b64AudioBuffer = new ConcurrentLinkedQueue<>();
        private Queue<byte[]> RawAudioBuffer = new ConcurrentLinkedQueue<>();

        // Constructor initializes audio format and audio line
        public RealtimePcmPlayer(int sampleRate) throws LineUnavailableException {
            this.sampleRate = sampleRate;
            this.audioFormat = new AudioFormat(this.sampleRate, 16, 1, true, false);
            DataLine.Info info = new DataLine.Info(SourceDataLine.class, audioFormat);
            line = (SourceDataLine) AudioSystem.getLine(info);
            line.open(audioFormat);
            line.start();
            decoderThread = new Thread(new Runnable() {
                @Override
                public void run() {
                    while (!stopped.get()) {
                        String b64Audio = b64AudioBuffer.poll();
                        if (b64Audio != null) {
                            byte[] rawAudio = Base64.getDecoder().decode(b64Audio);
                            RawAudioBuffer.add(rawAudio);
                        } else {
                            try {
                                Thread.sleep(100);
                            } catch (InterruptedException e) {
                                throw new RuntimeException(e);
                            }
                        }
                    }
                }
            });
            playerThread = new Thread(new Runnable() {
                @Override
                public void run() {
                    while (!stopped.get()) {
                        byte[] rawAudio = RawAudioBuffer.poll();
                        if (rawAudio != null) {
                            try {
                                playChunk(rawAudio);
                            } catch (IOException e) {
                                throw new RuntimeException(e);
                            } catch (InterruptedException e) {
                                throw new RuntimeException(e);
                            }
                        } else {
                            try {
                                Thread.sleep(100);
                            } catch (InterruptedException e) {
                                throw new RuntimeException(e);
                            }
                        }
                    }
                }
            });
            decoderThread.start();
            playerThread.start();
        }

        // Plays an audio chunk and blocks until playback is complete
        private void playChunk(byte[] chunk) throws IOException, InterruptedException {
            if (chunk == null || chunk.length == 0) return;
            int bytesWritten = 0;
            while (bytesWritten < chunk.length) {
                bytesWritten += line.write(chunk, bytesWritten, chunk.length - bytesWritten);
            }
            int audioLength = chunk.length / (this.sampleRate * 2 / 1000);
            // Wait for the audio in the buffer to finish playing
            Thread.sleep(audioLength - 10);
        }

        public void write(String b64Audio) {
            b64AudioBuffer.add(b64Audio);
        }

        public void cancel() {
            b64AudioBuffer.clear();
            RawAudioBuffer.clear();
        }

        public void waitForComplete() throws InterruptedException {
            while (!b64AudioBuffer.isEmpty() || !RawAudioBuffer.isEmpty()) {
                Thread.sleep(100);
            }
            line.drain();
        }

        public void shutdown() throws InterruptedException {
            stopped.set(true);
            decoderThread.join();
            playerThread.join();
            if (line != null && line.isRunning()) {
                line.drain();
                line.close();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        QwenTtsRealtimeParam param = QwenTtsRealtimeParam.builder()
                // Use the same model for voice design and speech synthesis
                .model("qwen3-tts-vd-realtime-2026-01-15")
                // The following is the URL for the Singapore region. If you use a model in the Beijing
                // region, replace the URL with: wss://dashscope.aliyuncs.com/api-ws/v1/realtime
                .url("wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime")
                // API keys for the Singapore and Beijing regions are different.
                // Get an API key: https://www.alibabacloud.com/help/model-studio/get-api-key
                // If the environment variable is not set, replace the following line with
                // your Model Studio API key: .apikey("sk-xxx")
                .apikey(System.getenv("DASHSCOPE_API_KEY"))
                .build();
        AtomicReference<CountDownLatch> completeLatch = new AtomicReference<>(new CountDownLatch(1));
        final AtomicReference<QwenTtsRealtime> qwenTtsRef = new AtomicReference<>(null);
        // Create a real-time audio player instance
        RealtimePcmPlayer audioPlayer = new RealtimePcmPlayer(24000);
        QwenTtsRealtime qwenTtsRealtime = new QwenTtsRealtime(param, new QwenTtsRealtimeCallback() {
            @Override
            public void onOpen() {
                // Handling for when the connection is established
            }

            @Override
            public void onEvent(JsonObject message) {
                String type = message.get("type").getAsString();
                switch (type) {
                    case "session.created":
                        // Handling for when the session is created
                        break;
                    case "response.audio.delta":
                        String recvAudioB64 = message.get("delta").getAsString();
                        // Play audio in real time
                        audioPlayer.write(recvAudioB64);
                        break;
                    case "response.done":
                        // Handling for when the response is complete
                        break;
                    case "session.finished":
                        // Handling for when the session is finished
                        completeLatch.get().countDown();
                    default:
                        break;
                }
            }

            @Override
            public void onClose(int code, String reason) {
                // Handling for when the connection is closed
            }
        });
        qwenTtsRef.set(qwenTtsRealtime);
        try {
            qwenTtsRealtime.connect();
        } catch (NoApiKeyException e) {
            throw new RuntimeException(e);
        }
        QwenTtsRealtimeConfig config = QwenTtsRealtimeConfig.builder()
                .voice("myvoice") // Replace the voice parameter with the custom voice generated by voice design
                .responseFormat(QwenTtsRealtimeAudioFormat.PCM_24000HZ_MONO_16BIT)
                .mode("server_commit")
                .build();
        qwenTtsRealtime.updateSession(config);
        for (String text : textToSynthesize) {
            qwenTtsRealtime.appendText(text);
            Thread.sleep(100);
        }
        qwenTtsRealtime.finish();
        completeLatch.get().await();
        // Wait for audio playback to complete and shut down the player
        audioPlayer.waitForComplete();
        audioPlayer.shutdown();
        System.exit(0);
    }
}
```
Non-streaming/Unidirectional streaming synthesis
This applies to the Qwen3-TTS-VD series of models. For more information, see Speech Synthesis - Qwen.
Generate a custom voice and preview the result. If you are satisfied with the result, proceed to the next step. Otherwise, generate it again.
Python
```python
import requests
import base64
import os

def create_voice_and_play():
    # API keys for the Singapore and Beijing regions are different.
    # Get an API key: https://www.alibabacloud.com/help/model-studio/get-api-key
    # If the environment variable is not set, replace the following line with
    # your Model Studio API key: api_key = "sk-xxx"
    api_key = os.getenv("DASHSCOPE_API_KEY")
    if not api_key:
        print("Error: DASHSCOPE_API_KEY environment variable not found. Please set the API key first.")
        return None, None, None

    # Prepare request data
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }
    data = {
        "model": "qwen-voice-design",
        "input": {
            "action": "create",
            "target_model": "qwen3-tts-vd-2026-01-26",
            "voice_prompt": "A composed middle-aged male announcer with a deep, rich and magnetic voice, a steady speaking speed and clear articulation, is suitable for news broadcasting or documentary commentary.",
            "preview_text": "Dear listeners, hello everyone. Welcome to the evening news.",
            "preferred_name": "announcer",
            "language": "en"
        },
        "parameters": {
            "sample_rate": 24000,
            "response_format": "wav"
        }
    }

    # The following is the URL for the Singapore region. If you use a model in the Beijing region,
    # replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
    url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"

    try:
        # Send the request
        response = requests.post(
            url,
            headers=headers,
            json=data,
            timeout=60  # Add a timeout setting
        )

        if response.status_code == 200:
            result = response.json()
            # Get the voice name
            voice_name = result["output"]["voice"]
            print(f"Voice name: {voice_name}")
            # Get the preview audio data and decode it from Base64
            base64_audio = result["output"]["preview_audio"]["data"]
            audio_bytes = base64.b64decode(base64_audio)
            # Save the audio file locally
            filename = f"{voice_name}_preview.wav"
            with open(filename, 'wb') as f:
                f.write(audio_bytes)
            print(f"Audio saved to local file: {filename}")
            print(f"File path: {os.path.abspath(filename)}")
            return voice_name, audio_bytes, filename
        else:
            print(f"Request failed with status code: {response.status_code}")
            print(f"Response content: {response.text}")
            return None, None, None
    except requests.exceptions.RequestException as e:
        print(f"A network request error occurred: {e}")
        return None, None, None
    except KeyError as e:
        print(f"Response data format error, missing required field: {e}")
        print(f"Response content: {response.text if 'response' in locals() else 'No response'}")
        return None, None, None
    except Exception as e:
        print(f"An unknown error occurred: {e}")
        return None, None, None

if __name__ == "__main__":
    print("Starting to create voice...")
    voice_name, audio_data, saved_filename = create_voice_and_play()
    if voice_name:
        print(f"\nSuccessfully created voice '{voice_name}'")
        print(f"Audio file saved as: '{saved_filename}'")
        print(f"File size: {os.path.getsize(saved_filename)} bytes")
    else:
        print("\nVoice creation failed")
```

Java
You need to import the Gson dependency. If you are using Maven or Gradle, add the dependency as follows:
Maven
Add the following content to pom.xml:

```xml
<!-- https://mvnrepository.com/artifact/com.google.code.gson/gson -->
<dependency>
    <groupId>com.google.code.gson</groupId>
    <artifactId>gson</artifactId>
    <version>2.13.1</version>
</dependency>
```

Gradle
Add the following content to build.gradle:

```groovy
// https://mvnrepository.com/artifact/com.google.code.gson/gson
implementation("com.google.code.gson:gson:2.13.1")
```

Important: When using a custom voice generated by voice design for speech synthesis, you must set the voice as follows:

```java
MultiModalConversationParam param = MultiModalConversationParam.builder()
        .parameter("voice", "your_voice") // Replace the voice parameter with the custom voice generated by voice design
        .build();
```

```java
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;

import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Base64;

public class Main {
    public static void main(String[] args) {
        Main example = new Main();
        example.createVoice();
    }

    public void createVoice() {
        // API keys for the Singapore and Beijing regions are different.
        // Get an API key: https://www.alibabacloud.com/help/model-studio/get-api-key
        // If the environment variable is not set, replace the following line with
        // your Model Studio API key: String apiKey = "sk-xxx";
        String apiKey = System.getenv("DASHSCOPE_API_KEY");

        // Create the JSON request body string
        String jsonBody = "{\n" +
                "  \"model\": \"qwen-voice-design\",\n" +
                "  \"input\": {\n" +
                "    \"action\": \"create\",\n" +
                "    \"target_model\": \"qwen3-tts-vd-2026-01-26\",\n" +
                "    \"voice_prompt\": \"A composed middle-aged male announcer with a deep, rich and magnetic voice, a steady speaking speed and clear articulation, is suitable for news broadcasting or documentary commentary.\",\n" +
                "    \"preview_text\": \"Dear listeners, hello everyone. Welcome to the evening news.\",\n" +
                "    \"preferred_name\": \"announcer\",\n" +
                "    \"language\": \"en\"\n" +
                "  },\n" +
                "  \"parameters\": {\n" +
                "    \"sample_rate\": 24000,\n" +
                "    \"response_format\": \"wav\"\n" +
                "  }\n" +
                "}";

        HttpURLConnection connection = null;
        try {
            // The following is the URL for the Singapore region. If you use a model in the Beijing region,
            // replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
            URL url = new URL("https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization");
            connection = (HttpURLConnection) url.openConnection();

            // Set the request method and headers
            connection.setRequestMethod("POST");
            connection.setRequestProperty("Authorization", "Bearer " + apiKey);
            connection.setRequestProperty("Content-Type", "application/json");
            connection.setDoOutput(true);
            connection.setDoInput(true);

            // Send the request body
            try (OutputStream os = connection.getOutputStream()) {
                byte[] input = jsonBody.getBytes("UTF-8");
                os.write(input, 0, input.length);
                os.flush();
            }

            // Get the response
            int responseCode = connection.getResponseCode();
            if (responseCode == HttpURLConnection.HTTP_OK) {
                // Read the response content
                StringBuilder response = new StringBuilder();
                try (BufferedReader br = new BufferedReader(
                        new InputStreamReader(connection.getInputStream(), "UTF-8"))) {
                    String responseLine;
                    while ((responseLine = br.readLine()) != null) {
                        response.append(responseLine.trim());
                    }
                }

                // Parse the JSON response
                JsonObject jsonResponse = JsonParser.parseString(response.toString()).getAsJsonObject();
                JsonObject outputObj = jsonResponse.getAsJsonObject("output");
                JsonObject previewAudioObj = outputObj.getAsJsonObject("preview_audio");

                // Get the voice name
                String voiceName = outputObj.get("voice").getAsString();
                System.out.println("Voice name: " + voiceName);

                // Get and decode the Base64-encoded audio data
                String base64Audio = previewAudioObj.get("data").getAsString();
                byte[] audioBytes = Base64.getDecoder().decode(base64Audio);

                // Save the audio to a local file
                String filename = voiceName + "_preview.wav";
                saveAudioToFile(audioBytes, filename);
                System.out.println("Audio saved to local file: " + filename);
            } else {
                // Read the error response
                StringBuilder errorResponse = new StringBuilder();
                try (BufferedReader br = new BufferedReader(
                        new InputStreamReader(connection.getErrorStream(), "UTF-8"))) {
                    String responseLine;
                    while ((responseLine = br.readLine()) != null) {
                        errorResponse.append(responseLine.trim());
                    }
                }
                System.out.println("Request failed with status code: " + responseCode);
                System.out.println("Error response: " + errorResponse.toString());
            }
        } catch (Exception e) {
            System.err.println("An error occurred during the request: " + e.getMessage());
            e.printStackTrace();
        } finally {
            if (connection != null) {
                connection.disconnect();
            }
        }
    }

    private void saveAudioToFile(byte[] audioBytes, String filename) {
        try {
            File file = new File(filename);
            try (FileOutputStream fos = new FileOutputStream(file)) {
                fos.write(audioBytes);
            }
            System.out.println("Audio saved to: " + file.getAbsolutePath());
        } catch (IOException e) {
            System.err.println("An error occurred while saving the audio file: " + e.getMessage());
            e.printStackTrace();
        }
    }
}
```

Use the custom voice generated in the previous step for non-streaming speech synthesis.
This example is based on the "non-streaming output" sample code from the DashScope SDK for speech synthesis with a system voice, with the `voice` parameter replaced by the custom voice generated by voice design. For unidirectional streaming synthesis, see Speech synthesis - Qwen.

Key principle: The model used for voice design (`target_model`) must be the same as the model used for subsequent speech synthesis (`model`). Otherwise, the synthesis will fail.

Python
```python
import os
import dashscope

if __name__ == '__main__':
    # The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
    dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
    text = "How is the weather today?"
    # How to use the SpeechSynthesizer interface: dashscope.audio.qwen_tts.SpeechSynthesizer.call(...)
    response = dashscope.MultiModalConversation.call(
        model="qwen3-tts-vd-2026-01-26",
        # API keys for the Singapore and Beijing regions are different. Get an API key: https://www.alibabacloud.com/help/model-studio/get-api-key
        # If the environment variable is not set, replace the following line with your Model Studio API key: api_key="sk-xxx"
        api_key=os.getenv("DASHSCOPE_API_KEY"),
        text=text,
        voice="myvoice",  # Replace the voice parameter with the custom voice generated by voice design
        stream=False
    )
    print(response)
```

Java
```java
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.net.URL;

public class Main {
    private static final String MODEL = "qwen3-tts-vd-2026-01-26";

    public static void call() throws ApiException, NoApiKeyException, UploadFileException {
        MultiModalConversation conv = new MultiModalConversation();
        MultiModalConversationParam param = MultiModalConversationParam.builder()
                // API keys for the Singapore and Beijing regions are different. Get an API key: https://www.alibabacloud.com/help/model-studio/get-api-key
                // If the environment variable is not set, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .model(MODEL)
                .text("Today is a wonderful day to build something people love!")
                .parameter("voice", "myvoice") // Replace the voice parameter with the custom voice generated by voice design
                .build();
        MultiModalConversationResult result = conv.call(param);
        String audioUrl = result.getOutput().getAudio().getUrl();
        System.out.print(audioUrl);
        // Download the audio file locally
        try (InputStream in = new URL(audioUrl).openStream();
             FileOutputStream out = new FileOutputStream("downloaded_audio.wav")) {
            byte[] buffer = new byte[1024];
            int bytesRead;
            while ((bytesRead = in.read(buffer)) != -1) {
                out.write(buffer, 0, bytesRead);
            }
            System.out.println("\nAudio file downloaded to local: downloaded_audio.wav");
        } catch (Exception e) {
            System.out.println("\nError downloading audio file: " + e.getMessage());
        }
    }

    public static void main(String[] args) {
        try {
            // The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
            Constants.baseHttpApiUrl = "https://dashscope-intl.aliyuncs.com/api/v1";
            call();
        } catch (ApiException | NoApiKeyException | UploadFileException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}
```
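The rule that the synthesis `model` must equal the design-time `target_model` is easiest to satisfy by deriving both values from the create response rather than hard-coding them in two places. A minimal sketch; `synthesis_args` is a hypothetical helper, not part of the DashScope SDK, and the sample values are illustrative:

```python
# Hypothetical helper: build the synthesis call's arguments directly from the
# voice-design "create" response, so model can never drift from target_model.
def synthesis_args(create_output):
    return {
        "model": create_output["target_model"],  # same model that drove the voice design
        "voice": create_output["voice"],         # the generated voice name
    }

# Illustrative response shape (values are examples, not real API output).
sample_output = {
    "voice": "qwen-tts-vd-announcer-voice-20251201102800-a1b2",
    "target_model": "qwen3-tts-vd-2026-01-26",
}
print(synthesis_args(sample_output))
```

The returned dictionary supplies the `model` and `voice` values used by the synthesis examples above.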
API reference
When using different APIs, make sure to use the same account for all operations.
Create a voice
Creates a custom voice by providing a voice description and preview text.
URL
Chinese Mainland: `POST https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization`

International: `POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization`

Request headers
- `Authorization` (string, required) — The authentication token, in the format `Bearer <your_api_key>`. Replace `<your_api_key>` with your actual API key.
- `Content-Type` (string, required) — The media type of the data transferred in the request body. Fixed to `application/json`.

Message Body
The request body contains all request parameters as shown below. You can omit optional fields as needed for your business.
Important: Note the distinction between the following parameters:

- `model`: The voice design model, fixed to `qwen-voice-design`.
- `target_model`: The speech synthesis model that drives the voice. It must be the same as the speech synthesis model used when calling the speech synthesis API later. Otherwise, the synthesis will fail.

```json
{
    "model": "qwen-voice-design",
    "input": {
        "action": "create",
        "target_model": "qwen3-tts-vd-realtime-2026-01-15",
        "voice_prompt": "A composed middle-aged male announcer with a deep, rich and magnetic voice, a steady speaking speed and clear articulation, is suitable for news broadcasting or documentary commentary.",
        "preview_text": "Dear listeners, hello everyone. Welcome to the evening news.",
        "preferred_name": "announcer",
        "language": "zh"
    },
    "parameters": {
        "sample_rate": 24000,
        "response_format": "wav"
    }
}
```

Request parameters
- `model` (string) — The voice design model, fixed to `qwen-voice-design`.
- `action` (string) — The operation type, fixed to `create`.
- `target_model` (string) — The speech synthesis model that drives the voice. Supported models include (two types):
  - Qwen3-TTS-VD-Realtime (see Real-time speech synthesis - Qwen): `qwen3-tts-vd-realtime-2026-01-15`, `qwen3-tts-vd-realtime-2025-12-16`
  - Qwen3-TTS-VD (see Speech synthesis - Qwen): `qwen3-tts-vd-2026-01-26`

  This must be the same as the speech synthesis model used when calling the speech synthesis API later. Otherwise, the synthesis will fail.
- `voice_prompt` (string) — The voice description. Maximum length: 2048 characters. Only Chinese and English are supported. For information about how to write a voice description, see "Write voice descriptions".
- `preview_text` (string) — The text corresponding to the preview audio. Maximum length: 1024 characters. Supports Chinese (zh), English (en), German (de), Italian (it), Portuguese (pt), Spanish (es), Japanese (ja), Korean (ko), French (fr), and Russian (ru).
- `preferred_name` (string) — An easily recognizable name for the voice (only numbers, English letters, and underscores are allowed; up to 16 characters). We recommend an identifier related to the character or scenario. This keyword appears in the designed voice name. For example, if the keyword is "announcer", the final voice name will be "qwen-tts-vd-announcer-voice-20251201102800-a1b2".
- `language` (string; default: `zh`) — The language code, which specifies the language preference for the generated voice. This parameter affects the language features and pronunciation tendencies of the generated voice, so choose the code that matches your actual use case. If you set this parameter, it must match the language of the `preview_text`. Valid values: `zh` (Chinese), `en` (English), `de` (German), `it` (Italian), `pt` (Portuguese), `es` (Spanish), `ja` (Japanese), `ko` (Korean), `fr` (French), `ru` (Russian).
- `sample_rate` (int; default: 24000) — The sample rate (in Hz) of the preview audio generated by voice design. Valid values: 8000, 16000, 24000, 48000.
- `response_format` (string; default: `wav`) — The format of the preview audio generated by voice design. Valid values: `pcm`, `wav`, `mp3`, `opus`.
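The `preferred_name` constraints (numbers, English letters, and underscores only; at most 16 characters) can be checked client-side before sending the request. A minimal sketch based on the stated constraints; the helper name is illustrative, not part of any SDK:

```python
import re

# Matches 1-16 characters drawn from digits, ASCII letters, and underscores,
# per the preferred_name constraints above.
_PREFERRED_NAME = re.compile(r"^[A-Za-z0-9_]{1,16}$")

def is_valid_preferred_name(name: str) -> bool:
    return bool(_PREFERRED_NAME.match(name))

print(is_valid_preferred_name("announcer"))  # True
print(is_valid_preferred_name("my voice!"))  # False: spaces and punctuation are rejected
```

Rejecting an invalid name locally avoids a round trip that the service would refuse anyway.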
Response parameters
The response contains the following parameters:

- `voice` (string) — The voice name, which can be used directly as the `voice` parameter in the speech synthesis API.
- `data` (string) — The preview audio data generated by voice design, returned in the `preview_audio` object as a Base64-encoded string.
- `sample_rate` (int) — The sample rate (in Hz) of the preview audio generated by voice design. It matches the sample rate set during voice creation, defaulting to 24000 Hz if not specified.
- `response_format` (string) — The format of the preview audio generated by voice design. It matches the audio format set during voice creation, defaulting to wav if not specified.
- `target_model` (string) — The speech synthesis model that drives the voice. Supported models include (two types): Qwen3-TTS-VD-Realtime (see Real-time speech synthesis - Qwen): `qwen3-tts-vd-realtime-2026-01-15`, `qwen3-tts-vd-realtime-2025-12-16`; Qwen3-TTS-VD (see Speech synthesis - Qwen): `qwen3-tts-vd-2026-01-26`. This must be the same as the speech synthesis model used when calling the speech synthesis API later. Otherwise, the synthesis will fail.
- `request_id` (string) — The request ID.
- `count` (integer) — The number of "create voice" operations billed for this request; the cost for the request is `count` × the unit price. When creating a voice, `count` is always 1.
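The Base64-encoded `data` field decodes to a complete audio file in the requested `response_format`. As an offline sketch of handling it (assuming the default `wav` format), the clip can be decoded and measured with the Python standard library; the synthetic clip below stands in for a real response:

```python
import base64
import io
import wave

def preview_duration_seconds(b64_data: str) -> float:
    """Decode the Base64 `data` field (wav format assumed) and return the clip length."""
    with wave.open(io.BytesIO(base64.b64decode(b64_data)), "rb") as w:
        return w.getnframes() / w.getframerate()

# Demo with a synthetic 1-second, 24 kHz, 16-bit mono clip in place of real API output.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(24000)
    w.writeframes(b"\x00\x00" * 24000)
fake_data = base64.b64encode(buf.getvalue()).decode("ascii")
print(preview_duration_seconds(fake_data))  # 1.0
```

For `pcm` responses there is no header, so the duration is instead `len(raw_bytes) / (sample_rate * bytes_per_sample)`.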
Example code
Important: Note the distinction between the following parameters:

- `model`: The voice design model, fixed to `qwen-voice-design`.
- `target_model`: The speech synthesis model that drives the voice. It must be the same as the speech synthesis model used when calling the speech synthesis API later. Otherwise, the synthesis will fail.
cURL
If you have not configured the API key as an environment variable, replace `$DASHSCOPE_API_KEY` in the example with your actual API key.

```shell
# ======= Important Note =======
# The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
# API keys for the Singapore and Beijing regions are different. Get an API key: https://www.alibabacloud.com/help/model-studio/get-api-key
# === Delete this comment before execution ===
curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization \
  -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen-voice-design",
    "input": {
        "action": "create",
        "target_model": "qwen3-tts-vd-realtime-2026-01-15",
        "voice_prompt": "A composed middle-aged male announcer with a deep, rich and magnetic voice, a steady speaking speed and clear articulation, is suitable for news broadcasting or documentary commentary.",
        "preview_text": "Dear listeners, hello everyone. Welcome to the evening news.",
        "preferred_name": "announcer",
        "language": "zh"
    },
    "parameters": {
        "sample_rate": 24000,
        "response_format": "wav"
    }
}'
```

Python
```python
import requests
import base64
import os

def create_voice_and_play():
    # API keys for the Singapore and Beijing regions are different. Get an API key: https://www.alibabacloud.com/help/model-studio/get-api-key
    # If the environment variable is not set, replace the following line with your Model Studio API key: api_key = "sk-xxx"
    api_key = os.getenv("DASHSCOPE_API_KEY")
    if not api_key:
        print("Error: DASHSCOPE_API_KEY environment variable not found. Please set the API key first.")
        return None, None, None

    # Prepare request data
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }
    data = {
        "model": "qwen-voice-design",
        "input": {
            "action": "create",
            "target_model": "qwen3-tts-vd-realtime-2026-01-15",
            "voice_prompt": "A composed middle-aged male announcer with a deep, rich and magnetic voice, a steady speaking speed and clear articulation, is suitable for news broadcasting or documentary commentary.",
            "preview_text": "Dear listeners, hello everyone. Welcome to the evening news.",
            "preferred_name": "announcer",
            "language": "en"
        },
        "parameters": {
            "sample_rate": 24000,
            "response_format": "wav"
        }
    }

    # The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
    url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"

    try:
        # Send the request
        response = requests.post(
            url,
            headers=headers,
            json=data,
            timeout=60  # Add a timeout setting
        )
        if response.status_code == 200:
            result = response.json()
            # Get the voice name
            voice_name = result["output"]["voice"]
            print(f"Voice name: {voice_name}")
            # Get the preview audio data
            base64_audio = result["output"]["preview_audio"]["data"]
            # Decode the Base64 audio data
            audio_bytes = base64.b64decode(base64_audio)
            # Save the audio file locally
            filename = f"{voice_name}_preview.wav"
            # Write the audio data to a local file
            with open(filename, 'wb') as f:
                f.write(audio_bytes)
            print(f"Audio saved to local file: {filename}")
            print(f"File path: {os.path.abspath(filename)}")
            return voice_name, audio_bytes, filename
        else:
            print(f"Request failed with status code: {response.status_code}")
            print(f"Response content: {response.text}")
            return None, None, None
    except requests.exceptions.RequestException as e:
        print(f"A network request error occurred: {e}")
        return None, None, None
    except KeyError as e:
        print(f"Response data format error, missing required field: {e}")
        print(f"Response content: {response.text if 'response' in locals() else 'No response'}")
        return None, None, None
    except Exception as e:
        print(f"An unknown error occurred: {e}")
        return None, None, None

if __name__ == "__main__":
    print("Starting to create voice...")
    voice_name, audio_data, saved_filename = create_voice_and_play()
    if voice_name:
        print(f"\nSuccessfully created voice '{voice_name}'")
        print(f"Audio file saved as: '{saved_filename}'")
        print(f"File size: {os.path.getsize(saved_filename)} bytes")
    else:
        print("\nVoice creation failed")
```

Java
```java
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Base64;

public class Main {
    public static void main(String[] args) {
        Main example = new Main();
        example.createVoice();
    }

    public void createVoice() {
        // API keys for the Singapore and Beijing regions are different. Get an API key: https://www.alibabacloud.com/help/model-studio/get-api-key
        // If the environment variable is not set, replace the following line with your Model Studio API key: String apiKey = "sk-xxx"
        String apiKey = System.getenv("DASHSCOPE_API_KEY");

        // Create the JSON request body string
        String jsonBody = "{\n" +
                "  \"model\": \"qwen-voice-design\",\n" +
                "  \"input\": {\n" +
                "    \"action\": \"create\",\n" +
                "    \"target_model\": \"qwen3-tts-vd-realtime-2026-01-15\",\n" +
                "    \"voice_prompt\": \"A composed middle-aged male announcer with a deep, rich and magnetic voice, a steady speaking speed and clear articulation, is suitable for news broadcasting or documentary commentary.\",\n" +
                "    \"preview_text\": \"Dear listeners, hello everyone. Welcome to the evening news.\",\n" +
                "    \"preferred_name\": \"announcer\",\n" +
                "    \"language\": \"en\"\n" +
                "  },\n" +
                "  \"parameters\": {\n" +
                "    \"sample_rate\": 24000,\n" +
                "    \"response_format\": \"wav\"\n" +
                "  }\n" +
                "}";

        HttpURLConnection connection = null;
        try {
            // The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
            URL url = new URL("https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization");
            connection = (HttpURLConnection) url.openConnection();

            // Set the request method and headers
            connection.setRequestMethod("POST");
            connection.setRequestProperty("Authorization", "Bearer " + apiKey);
            connection.setRequestProperty("Content-Type", "application/json");
            connection.setDoOutput(true);
            connection.setDoInput(true);

            // Send the request body
            try (OutputStream os = connection.getOutputStream()) {
                byte[] input = jsonBody.getBytes("UTF-8");
                os.write(input, 0, input.length);
                os.flush();
            }

            // Get the response
            int responseCode = connection.getResponseCode();
            if (responseCode == HttpURLConnection.HTTP_OK) {
                // Read the response content
                StringBuilder response = new StringBuilder();
                try (BufferedReader br = new BufferedReader(
                        new InputStreamReader(connection.getInputStream(), "UTF-8"))) {
                    String responseLine;
                    while ((responseLine = br.readLine()) != null) {
                        response.append(responseLine.trim());
                    }
                }

                // Parse the JSON response
                JsonObject jsonResponse = JsonParser.parseString(response.toString()).getAsJsonObject();
                JsonObject outputObj = jsonResponse.getAsJsonObject("output");
                JsonObject previewAudioObj = outputObj.getAsJsonObject("preview_audio");

                // Get the voice name
                String voiceName = outputObj.get("voice").getAsString();
                System.out.println("Voice name: " + voiceName);

                // Get the Base64-encoded audio data
                String base64Audio = previewAudioObj.get("data").getAsString();

                // Decode the Base64 audio data
                byte[] audioBytes = Base64.getDecoder().decode(base64Audio);

                // Save the audio to a local file
                String filename = voiceName + "_preview.wav";
                saveAudioToFile(audioBytes, filename);
                System.out.println("Audio saved to local file: " + filename);
            } else {
                // Read the error response
                StringBuilder errorResponse = new StringBuilder();
                try (BufferedReader br = new BufferedReader(
                        new InputStreamReader(connection.getErrorStream(), "UTF-8"))) {
                    String responseLine;
                    while ((responseLine = br.readLine()) != null) {
                        errorResponse.append(responseLine.trim());
                    }
                }
                System.out.println("Request failed with status code: " + responseCode);
                System.out.println("Error response: " + errorResponse.toString());
            }
        } catch (Exception e) {
            System.err.println("An error occurred during the request: " + e.getMessage());
            e.printStackTrace();
        } finally {
            if (connection != null) {
                connection.disconnect();
            }
        }
    }

    private void saveAudioToFile(byte[] audioBytes, String filename) {
        try {
            File file = new File(filename);
            try (FileOutputStream fos = new FileOutputStream(file)) {
                fos.write(audioBytes);
            }
            System.out.println("Audio saved to: " + file.getAbsolutePath());
        } catch (IOException e) {
            System.err.println("An error occurred while saving the audio file: " + e.getMessage());
            e.printStackTrace();
        }
    }
}
```
Query voice list
Performs a paged query for the list of created voices.
URL

Chinese Mainland: `POST https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization`

International: `POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization`
Request headers
- `Authorization` (string, required) — The authentication token, in the format `Bearer <your_api_key>`. Replace `<your_api_key>` with your actual API key.
- `Content-Type` (string, required) — The media type of the data transmitted in the request body. Fixed as `application/json`.

Request body

The request body contains all request parameters as shown below. Omit optional fields as needed for your business.

Important: `model` is the voice design model, fixed to `qwen-voice-design`. Do not modify this value.

```json
{
    "model": "qwen-voice-design",
    "input": {
        "action": "list",
        "page_size": 10,
        "page_index": 0
    }
}
```

Request parameters
- `model` (string) — The voice design model, fixed to `qwen-voice-design`.
- `action` (string) — The operation type, fixed to `list`.
- `page_index` (integer; default: 0) — The page number index. Range: [0, 200].
- `page_size` (integer; default: 10) — Entries per page. The value must be greater than 0.
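Because results are paged, listing every voice means walking `page_index` until `page_index * page_size` reaches the `total_count` reported in the response. A minimal offline sketch of the paging arithmetic; the request itself is omitted, and the values are illustrative:

```python
import math

def pages_needed(total_count: int, page_size: int) -> int:
    """How many `list` calls are needed to enumerate all voices."""
    return math.ceil(total_count / page_size)

# With 23 voices and the default page_size of 10, pages 0, 1, and 2 are needed.
print(pages_needed(23, 10))  # 3

for page_index in range(pages_needed(23, 10)):
    # Each iteration would send {"action": "list", "page_index": page_index, "page_size": 10}
    pass
```

Reading `total_count` from the first page's response and then iterating avoids guessing when to stop.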
Response parameters
The response contains the following parameters:

- `voice` (string) — The voice name, which can be used directly as the `voice` parameter in the speech synthesis API.
- `target_model` (string) — The speech synthesis model that drives the voice. Supported models include (two types): Qwen3-TTS-VD-Realtime (see Real-time speech synthesis - Qwen): `qwen3-tts-vd-realtime-2026-01-15`, `qwen3-tts-vd-realtime-2025-12-16`; Qwen3-TTS-VD (see Speech synthesis - Qwen): `qwen3-tts-vd-2026-01-26`. This must be the same as the speech synthesis model used when calling the speech synthesis API later. Otherwise, the synthesis will fail.
- `language` (string) — The language code. Valid values: `zh` (Chinese), `en` (English), `de` (German), `it` (Italian), `pt` (Portuguese), `es` (Spanish), `ja` (Japanese), `ko` (Korean), `fr` (French), `ru` (Russian).
- `voice_prompt` (string) — The voice description.
- `preview_text` (string) — The preview text.
- `gmt_create` (string) — The time the voice was created.
- `gmt_modified` (string) — The time the voice was modified.
- `page_index` (integer) — The page number index.
- `page_size` (integer) — Entries per page.
- `total_count` (integer) — The total number of data entries found.
- `request_id` (string) — The request ID.
Example code
Important: `model` is the voice design model, fixed to `qwen-voice-design`. Do not modify this value.

cURL

If you haven't configured your API key as an environment variable, replace `$DASHSCOPE_API_KEY` in the examples with your actual API key.

```shell
# ======= Important Note =======
# The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
# API keys for the Singapore and Beijing regions are different. Get an API key: https://www.alibabacloud.com/help/model-studio/get-api-key
# === Delete this comment before execution ===
curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization \
  -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen-voice-design",
    "input": {
        "action": "list",
        "page_size": 10,
        "page_index": 0
    }
}'
```

Python
```python
import os
import requests

# API keys for the Singapore and Beijing regions are different. Get an API key: https://www.alibabacloud.com/help/model-studio/get-api-key
# If the environment variable is not set, replace the following line with your Model Studio API key: api_key = "sk-xxx"
api_key = os.getenv("DASHSCOPE_API_KEY")

# The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"

payload = {
    "model": "qwen-voice-design",  # Do not modify this value
    "input": {
        "action": "list",
        "page_size": 10,
        "page_index": 0
    }
}
headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json"
}

response = requests.post(url, json=payload, headers=headers)
print("HTTP Status Code:", response.status_code)

if response.status_code == 200:
    data = response.json()
    voice_list = data["output"]["voice_list"]
    print("Queried voice list:")
    for item in voice_list:
        print(f"- Voice: {item['voice']} Creation Time: {item['gmt_create']} Model: {item['target_model']}")
else:
    print("Request failed:", response.text)
```

Java
```java
import com.google.gson.Gson;
import com.google.gson.JsonArray;
import com.google.gson.JsonObject;
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class Main {
    public static void main(String[] args) {
        // API keys for the Singapore and Beijing regions are different. Get an API key: https://www.alibabacloud.com/help/model-studio/get-api-key
        // If the environment variable is not set, replace the following line with your Model Studio API key: String apiKey = "sk-xxx"
        String apiKey = System.getenv("DASHSCOPE_API_KEY");

        // The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
        String apiUrl = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization";

        // JSON request body (older Java versions do not have """ multiline strings)
        String jsonPayload = "{"
                + "\"model\": \"qwen-voice-design\"," // Do not modify this value
                + "\"input\": {"
                + "\"action\": \"list\","
                + "\"page_size\": 10,"
                + "\"page_index\": 0"
                + "}"
                + "}";

        try {
            HttpURLConnection con = (HttpURLConnection) new URL(apiUrl).openConnection();
            con.setRequestMethod("POST");
            con.setRequestProperty("Authorization", "Bearer " + apiKey);
            con.setRequestProperty("Content-Type", "application/json");
            con.setDoOutput(true);

            try (OutputStream os = con.getOutputStream()) {
                os.write(jsonPayload.getBytes("UTF-8"));
            }

            int status = con.getResponseCode();
            BufferedReader br = new BufferedReader(new InputStreamReader(
                    status >= 200 && status < 300 ? con.getInputStream() : con.getErrorStream(), "UTF-8"));
            StringBuilder response = new StringBuilder();
            String line;
            while ((line = br.readLine()) != null) {
                response.append(line);
            }
            br.close();

            System.out.println("HTTP Status Code: " + status);
            System.out.println("Returned JSON: " + response.toString());

            if (status == 200) {
                Gson gson = new Gson();
                JsonObject jsonObj = gson.fromJson(response.toString(), JsonObject.class);
                JsonArray voiceList = jsonObj.getAsJsonObject("output").getAsJsonArray("voice_list");
                System.out.println("\nQueried voice list:");
                for (int i = 0; i < voiceList.size(); i++) {
                    JsonObject voiceItem = voiceList.get(i).getAsJsonObject();
                    String voice = voiceItem.get("voice").getAsString();
                    String gmtCreate = voiceItem.get("gmt_create").getAsString();
                    String targetModel = voiceItem.get("target_model").getAsString();
                    System.out.printf("- Voice: %s Creation Time: %s Model: %s\n", voice, gmtCreate, targetModel);
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
```
Query specific voice
Gets detailed information about a specific voice by its name.
URL

Chinese Mainland: `POST https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization`

International: `POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization`
Request headers
- `Authorization` (string, required) — The authentication token, in the format `Bearer <your_api_key>`. Replace `<your_api_key>` with your actual API key.
- `Content-Type` (string, required) — The media type of the data transmitted in the request body. Fixed as `application/json`.

Request body

The request body contains all request parameters as shown below. Omit optional fields as needed for your business.

Important: `model` is the voice design model, fixed to `qwen-voice-design`. Do not modify this value.

```json
{
    "model": "qwen-voice-design",
    "input": {
        "action": "query",
        "voice": "voiceName"
    }
}
```

Request parameters
- `model` (string) — The voice design model, fixed to `qwen-voice-design`.
- `action` (string) — The operation type, fixed to `query`.
- `voice` (string) — The name of the voice to query.
Response parameters
The response contains the following parameters:

- `voice` (string) — The voice name, which can be used directly as the `voice` parameter in the speech synthesis API.
- `target_model` (string) — The speech synthesis model that drives the voice. Supported models include (two types): Qwen3-TTS-VD-Realtime (see Real-time speech synthesis - Qwen): `qwen3-tts-vd-realtime-2026-01-15`, `qwen3-tts-vd-realtime-2025-12-16`; Qwen3-TTS-VD (see Speech synthesis - Qwen): `qwen3-tts-vd-2026-01-26`. This must be the same as the speech synthesis model used when calling the speech synthesis API later. Otherwise, the synthesis will fail.
- `language` (string) — The language code. Valid values: `zh` (Chinese), `en` (English), `de` (German), `it` (Italian), `pt` (Portuguese), `es` (Spanish), `ja` (Japanese), `ko` (Korean), `fr` (French), `ru` (Russian).
- `voice_prompt` (string) — The voice description.
- `preview_text` (string) — The preview text.
- `gmt_create` (string) — The time the voice was created.
- `gmt_modified` (string) — The time the voice was modified.
- `request_id` (string) — The request ID.
Example code
Important: `model` is the voice design model, fixed to `qwen-voice-design`. Do not modify this value.

cURL

If you haven't configured your API key as an environment variable, replace `$DASHSCOPE_API_KEY` in the examples with your actual API key.

```shell
# ======= Important Note =======
# The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
# API keys for the Singapore and Beijing regions are different. Get an API key: https://www.alibabacloud.com/help/model-studio/get-api-key
# === Delete this comment before execution ===
curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization \
  -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen-voice-design",
    "input": {
        "action": "query",
        "voice": "voiceName"
    }
}'
```

Python
```python
import requests
import os

def query_voice(voice_name):
    """
    Query information for a specific voice.
    :param voice_name: The name of the voice.
    :return: A dictionary with voice information, or None if not found.
    """
    # API keys for the Singapore and Beijing regions are different. Get an API key: https://www.alibabacloud.com/help/model-studio/get-api-key
    # If the environment variable is not set, replace the following line with your Model Studio API key: api_key = "sk-xxx"
    api_key = os.getenv("DASHSCOPE_API_KEY")

    # Prepare request data
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }
    data = {
        "model": "qwen-voice-design",
        "input": {
            "action": "query",
            "voice": voice_name
        }
    }

    # The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
    url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"

    # Send the request
    response = requests.post(url, headers=headers, json=data)

    if response.status_code == 200:
        result = response.json()
        # Check for an error message
        if "code" in result and result["code"] == "VoiceNotFound":
            print(f"Voice not found: {voice_name}")
            print(f"Error message: {result.get('message', 'Voice not found')}")
            return None
        # Get the voice information
        voice_info = result["output"]
        print("Successfully queried voice information:")
        print(f"  Voice Name: {voice_info.get('voice')}")
        print(f"  Creation Time: {voice_info.get('gmt_create')}")
        print(f"  Modification Time: {voice_info.get('gmt_modified')}")
        print(f"  Language: {voice_info.get('language')}")
        print(f"  Preview Text: {voice_info.get('preview_text')}")
        print(f"  Model: {voice_info.get('target_model')}")
        print(f"  Voice Description: {voice_info.get('voice_prompt')}")
        return voice_info
    else:
        print(f"Request failed with status code: {response.status_code}")
        print(f"Response content: {response.text}")
        return None

def main():
    # Example: Query a voice
    voice_name = "myvoice"  # Replace with the actual voice name you want to query
    print(f"Querying voice: {voice_name}")
    voice_info = query_voice(voice_name)
    if voice_info:
        print("\nVoice query successful!")
    else:
        print("\nVoice query failed or voice does not exist.")

if __name__ == "__main__":
    main()
```

Java
```java
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class Main {
    public static void main(String[] args) {
        Main example = new Main();
        // Example: Query a voice
        String voiceName = "myvoice"; // Replace with the actual voice name you want to query
        System.out.println("Querying voice: " + voiceName);
        example.queryVoice(voiceName);
    }

    public void queryVoice(String voiceName) {
        // API keys for the Singapore and Beijing regions are different. Get an API key: https://www.alibabacloud.com/help/model-studio/get-api-key
        // If the environment variable is not set, replace the following line with your Model Studio API key: String apiKey = "sk-xxx";
        String apiKey = System.getenv("DASHSCOPE_API_KEY");

        // Create the JSON request body string
        String jsonBody = "{\n" +
                "  \"model\": \"qwen-voice-design\",\n" +
                "  \"input\": {\n" +
                "    \"action\": \"query\",\n" +
                "    \"voice\": \"" + voiceName + "\"\n" +
                "  }\n" +
                "}";

        HttpURLConnection connection = null;
        try {
            // The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
            URL url = new URL("https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization");
            connection = (HttpURLConnection) url.openConnection();

            // Set the request method and headers
            connection.setRequestMethod("POST");
            connection.setRequestProperty("Authorization", "Bearer " + apiKey);
            connection.setRequestProperty("Content-Type", "application/json");
            connection.setDoOutput(true);
            connection.setDoInput(true);

            // Send the request body
            try (OutputStream os = connection.getOutputStream()) {
                byte[] input = jsonBody.getBytes("UTF-8");
                os.write(input, 0, input.length);
                os.flush();
            }

            // Get the response
            int responseCode = connection.getResponseCode();
            if (responseCode == HttpURLConnection.HTTP_OK) {
                // Read the response content
                StringBuilder response = new StringBuilder();
                try (BufferedReader br = new BufferedReader(
                        new InputStreamReader(connection.getInputStream(), "UTF-8"))) {
                    String responseLine;
                    while ((responseLine = br.readLine()) != null) {
                        response.append(responseLine.trim());
                    }
                }

                // Parse the JSON response
                JsonObject jsonResponse = JsonParser.parseString(response.toString()).getAsJsonObject();

                // Check for an error message
                if (jsonResponse.has("code") && "VoiceNotFound".equals(jsonResponse.get("code").getAsString())) {
                    String errorMessage = jsonResponse.has("message")
                            ? jsonResponse.get("message").getAsString()
                            : "Voice not found";
                    System.out.println("Voice not found: " + voiceName);
                    System.out.println("Error message: " + errorMessage);
                    return;
                }

                // Get the voice information
                JsonObject outputObj = jsonResponse.getAsJsonObject("output");
                System.out.println("Successfully queried voice information:");
                System.out.println("  Voice Name: " + outputObj.get("voice").getAsString());
                System.out.println("  Creation Time: " + outputObj.get("gmt_create").getAsString());
                System.out.println("  Modification Time: " + outputObj.get("gmt_modified").getAsString());
                System.out.println("  Language: " + outputObj.get("language").getAsString());
                System.out.println("  Preview Text: " + outputObj.get("preview_text").getAsString());
                System.out.println("  Model: " + outputObj.get("target_model").getAsString());
                System.out.println("  Voice Description: " + outputObj.get("voice_prompt").getAsString());
            } else {
                // Read the error response
                StringBuilder errorResponse = new StringBuilder();
                try (BufferedReader br = new BufferedReader(
                        new InputStreamReader(connection.getErrorStream(), "UTF-8"))) {
                    String responseLine;
                    while ((responseLine = br.readLine()) != null) {
                        errorResponse.append(responseLine.trim());
                    }
                }
                System.out.println("Request failed with status code: " + responseCode);
                System.out.println("Error response: " + errorResponse.toString());
            }
        } catch (Exception e) {
            System.err.println("An error occurred during the request: " + e.getMessage());
            e.printStackTrace();
        } finally {
            if (connection != null) {
                connection.disconnect();
            }
        }
    }
}
```
Delete voice
Deletes a voice and releases the corresponding quota.
URL

Chinese Mainland:
POST https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization

International:
POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization

Request headers
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| Authorization | string | Yes | Authentication token in the format Bearer <your_api_key>. Replace <your_api_key> with your actual API key. |
| Content-Type | string | Yes | Media type of the data transmitted in the request body. Fixed as application/json. |

Request body
The request body contains all request parameters as shown below. Omit optional fields as needed for your business:
Important: model is the voice design model, fixed to qwen-voice-design. Do not modify this value.

```json
{
    "model": "qwen-voice-design",
    "input": {
        "action": "delete",
        "voice": "yourVoice"
    }
}
```

Request parameters
| Parameter | Type | Default | Required | Description |
| --- | --- | --- | --- | --- |
| model | string | - | Yes | The voice design model, fixed to qwen-voice-design. |
| action | string | - | Yes | The operation type, fixed to delete. |
| voice | string | - | Yes | The voice to be deleted. |
Response parameters
The response contains the following parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| request_id | string | The request ID. |
| voice | string | The deleted voice. |
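For orientation, a successful delete response can be parsed as sketched below. The field names follow the response-parameter table above; the sample values and the helper function are hypothetical, not part of the API.

```python
# Hypothetical example of a successful delete response body.
# Field names follow the response-parameter table; the values are made up.
sample_response = {
    "request_id": "8d5b9a3e-0000-0000-0000-000000000000",
    "voice": "myvoice",
}


def extract_delete_result(response: dict) -> tuple:
    """Pull the request ID and deleted voice name out of a delete response."""
    return response["request_id"], response["voice"]


request_id, deleted_voice = extract_delete_result(sample_response)
print(f"Deleted voice '{deleted_voice}' (request {request_id})")
```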
Example code
Important: model is the voice design model, fixed to qwen-voice-design. Do not modify this value.

cURL
If you haven't configured your API key as an environment variable, replace $DASHSCOPE_API_KEY in the examples with your actual API key.

```shell
# ======= Important Note =======
# The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
# API keys for the Singapore and Beijing regions are different. Get an API key: https://www.alibabacloud.com/help/model-studio/get-api-key
# === Delete this comment before execution ===
curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen-voice-design",
    "input": {
        "action": "delete",
        "voice": "yourVoice"
    }
}'
```

Python
```python
import requests
import os


def delete_voice(voice_name):
    """
    Delete a specified voice.

    :param voice_name: The name of the voice.
    :return: True if deletion succeeds, or if the voice does not exist but the
             request succeeds; False if the operation fails.
    """
    # API keys for the Singapore and Beijing regions are different. Get an API key: https://www.alibabacloud.com/help/model-studio/get-api-key
    # If the environment variable is not set, replace the following line with your Model Studio API key: api_key = "sk-xxx"
    api_key = os.getenv("DASHSCOPE_API_KEY")

    # Prepare request data
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }
    data = {
        "model": "qwen-voice-design",
        "input": {
            "action": "delete",
            "voice": voice_name
        }
    }

    # The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
    url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"

    # Send the request
    response = requests.post(url, headers=headers, json=data)

    if response.status_code == 200:
        result = response.json()

        # Check for an error message
        if "code" in result and "VoiceNotFound" in result["code"]:
            print(f"Voice does not exist: {voice_name}")
            print(f"Error message: {result.get('message', 'Voice not found')}")
            # If the voice does not exist, the operation is considered successful (the target is already gone)
            return True

        # Check whether deletion was successful
        if "usage" in result:
            print(f"Voice deleted successfully: {voice_name}")
            print(f"Request ID: {result.get('request_id', 'N/A')}")
            return True
        else:
            print(f"Unexpected response format from deletion operation: {result}")
            return False
    else:
        print(f"Request to delete voice failed with status code: {response.status_code}")
        print(f"Response content: {response.text}")
        return False


def main():
    # Example: Delete a voice
    voice_name = "myvoice"  # Replace with the actual voice name you want to delete
    print(f"Deleting voice: {voice_name}")
    success = delete_voice(voice_name)
    if success:
        print(f"\nDeletion of voice '{voice_name}' completed!")
    else:
        print(f"\nDeletion of voice '{voice_name}' failed!")


if __name__ == "__main__":
    main()
```

Java
```java
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class Main {
    public static void main(String[] args) {
        Main example = new Main();
        // Example: Delete a voice
        String voiceName = "myvoice"; // Replace with the actual voice name you want to delete
        System.out.println("Deleting voice: " + voiceName);
        example.deleteVoice(voiceName);
    }

    public void deleteVoice(String voiceName) {
        // API keys for the Singapore and Beijing regions are different. Get an API key: https://www.alibabacloud.com/help/model-studio/get-api-key
        // If the environment variable is not set, replace the following line with your Model Studio API key: String apiKey = "sk-xxx";
        String apiKey = System.getenv("DASHSCOPE_API_KEY");

        // Create the JSON request body string
        String jsonBody = "{\n" +
                "  \"model\": \"qwen-voice-design\",\n" +
                "  \"input\": {\n" +
                "    \"action\": \"delete\",\n" +
                "    \"voice\": \"" + voiceName + "\"\n" +
                "  }\n" +
                "}";

        HttpURLConnection connection = null;
        try {
            // The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
            URL url = new URL("https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization");
            connection = (HttpURLConnection) url.openConnection();

            // Set the request method and headers
            connection.setRequestMethod("POST");
            connection.setRequestProperty("Authorization", "Bearer " + apiKey);
            connection.setRequestProperty("Content-Type", "application/json");
            connection.setDoOutput(true);
            connection.setDoInput(true);

            // Send the request body
            try (OutputStream os = connection.getOutputStream()) {
                byte[] input = jsonBody.getBytes("UTF-8");
                os.write(input, 0, input.length);
                os.flush();
            }

            // Get the response
            int responseCode = connection.getResponseCode();
            if (responseCode == HttpURLConnection.HTTP_OK) {
                // Read the response content
                StringBuilder response = new StringBuilder();
                try (BufferedReader br = new BufferedReader(
                        new InputStreamReader(connection.getInputStream(), "UTF-8"))) {
                    String responseLine;
                    while ((responseLine = br.readLine()) != null) {
                        response.append(responseLine.trim());
                    }
                }

                // Parse the JSON response
                JsonObject jsonResponse = JsonParser.parseString(response.toString()).getAsJsonObject();

                // Check for an error message
                if (jsonResponse.has("code") && jsonResponse.get("code").getAsString().contains("VoiceNotFound")) {
                    String errorMessage = jsonResponse.has("message")
                            ? jsonResponse.get("message").getAsString()
                            : "Voice not found";
                    System.out.println("Voice does not exist: " + voiceName);
                    System.out.println("Error message: " + errorMessage);
                    // If the voice does not exist, the operation is considered successful (the target is already gone)
                } else if (jsonResponse.has("usage")) {
                    // Check whether deletion was successful
                    System.out.println("Voice deleted successfully: " + voiceName);
                    String requestId = jsonResponse.has("request_id")
                            ? jsonResponse.get("request_id").getAsString()
                            : "N/A";
                    System.out.println("Request ID: " + requestId);
                } else {
                    System.out.println("Unexpected response format from deletion operation: " + response.toString());
                }
            } else {
                // Read the error response
                StringBuilder errorResponse = new StringBuilder();
                try (BufferedReader br = new BufferedReader(
                        new InputStreamReader(connection.getErrorStream(), "UTF-8"))) {
                    String responseLine;
                    while ((responseLine = br.readLine()) != null) {
                        errorResponse.append(responseLine.trim());
                    }
                }
                System.out.println("Request to delete voice failed with status code: " + responseCode);
                System.out.println("Error response: " + errorResponse.toString());
            }
        } catch (Exception e) {
            System.err.println("An error occurred during the request: " + e.getMessage());
            e.printStackTrace();
        } finally {
            if (connection != null) {
                connection.disconnect();
            }
        }
    }
}
```
Speech synthesis
To synthesize personalized speech using voices created by voice design, see Getting started: From voice design to speech synthesis.
Voice design speech synthesis models (such as qwen3-tts-vd-realtime-2026-01-15) are dedicated models. They only support voices created by voice design and do not support system voices such as Chelsie, Serena, Ethan, and Cherry.
Voice quota and automatic cleanup rules
Total limit: 1,000 voices per account. You can query the current number of voices (total_count) by calling the Query voice list interface.

Automatic cleanup: If a voice has not been used in any speech synthesis request in the past year, the system automatically deletes it.
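As a rough illustration of the quota rule above, remaining capacity can be computed from the total_count returned by the voice-list query. The 1,000-voice limit comes from this section; the helper below is a hypothetical sketch, not part of the API.

```python
VOICE_QUOTA = 1000  # Per-account limit stated in this section.


def remaining_voice_quota(total_count: int) -> int:
    """Number of additional voices the account can still create."""
    return max(VOICE_QUOTA - total_count, 0)


print(remaining_voice_quota(998))  # 998 voices in use -> 2 creations left
```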
Billing information
Voice design and speech synthesis are billed separately.
Voice design: Creating a voice is billed at $0.2 per voice. Failed creations are not billed.
Note: Free quota information (only available in the Singapore region):
You get 10 free voice creations within 90 days after activating Alibaba Cloud Model Studio.
Failed creations do not count against your free quota.
Deleting a voice does not restore free quota usage.
After your free quota is exhausted or the 90-day validity period ends, voice creation is billed at $0.2 per voice.
Using custom voices created by voice design for speech synthesis: Billed per character count. For details, see Real-time speech synthesis - Qwen or Speech synthesis - Qwen.
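As a back-of-the-envelope illustration of the voice-design pricing above (10 free creations in the Singapore region, then $0.2 per successful creation, with failed creations unbilled), a hypothetical cost estimate might look like:

```python
FREE_CREATIONS = 10    # Singapore-region free quota stated in this section.
PRICE_PER_VOICE = 0.2  # USD per successful creation beyond the free quota.


def voice_design_cost(successful_creations: int) -> float:
    """Estimated voice-design cost, assuming the 90-day free quota is still valid."""
    billable = max(successful_creations - FREE_CREATIONS, 0)
    return billable * PRICE_PER_VOICE


print(voice_design_cost(25))  # 15 billable creations beyond the free quota
```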
Error messages
If you encounter errors, see Error messages for troubleshooting.