Voice design generates custom voices from text descriptions. It supports multi-language and multi-dimensional voice characteristics, making it suitable for applications such as ad voiceovers, character creation, and audiobook production. Voice design and speech synthesis are two sequential steps. This document focuses on the parameters and interface details of voice design. For more information about speech synthesis, see Real-time speech synthesis - Qwen.
User guide: For model introduction and selection recommendations, see Real-time speech synthesis - Qwen.
Supported languages
Voice design supports voice creation and speech synthesis in multiple languages, including the following: Chinese (zh), English (en), German (de), Italian (it), Portuguese (pt), Spanish (es), Japanese (ja), Korean (ko), French (fr), Russian (ru).
How to write high-quality voice descriptions
Limitations
When writing a voice description (voice_prompt), adhere to the following constraints:
Length limit: voice_prompt must not exceed 2048 characters.
Supported languages: The description text must be in Chinese or English.
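You can check both constraints on the client before calling the API. The helper below is a hypothetical sketch and is not part of the DashScope SDK. It approximates the "Chinese or English" rule by requiring at least one Latin letter or CJK ideograph. That rule is an assumption, and the service's own validation (including how it counts characters) may differ.

```python
import re

MAX_VOICE_PROMPT_CHARS = 2048  # limit stated in this document

# Assumption: a description "in Chinese or English" contains at least one
# Latin letter or CJK Unified Ideograph. The service's own check may differ.
_LETTER_PATTERN = re.compile(r"[A-Za-z\u4e00-\u9fff]")


def validate_voice_prompt(prompt: str) -> list[str]:
    """Return a list of problems found in a voice_prompt (empty list if none)."""
    problems = []
    if not prompt.strip():
        problems.append("voice_prompt is empty")
    # len() counts Unicode code points, which may not match the service's count
    if len(prompt) > MAX_VOICE_PROMPT_CHARS:
        problems.append(
            f"voice_prompt has {len(prompt)} characters; the limit is {MAX_VOICE_PROMPT_CHARS}"
        )
    if prompt.strip() and not _LETTER_PATTERN.search(prompt):
        problems.append("voice_prompt contains no Chinese or English text")
    return problems


if __name__ == "__main__":
    print(validate_voice_prompt("A calm middle-aged male voice with a slow speaking rate."))
    print(validate_voice_prompt("123456"))
```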
Core principles
A high-quality voice description (voice_prompt) is key to creating your ideal voice. It acts as a blueprint that directly guides the model to generate a voice with specific characteristics.
Follow these core principles when describing a voice:
Be specific, not vague: Use words that clearly describe vocal traits, such as "deep," "crisp," or "fast-paced." Avoid subjective and uninformative terms such as "nice-sounding" or "ordinary."
Be multi-dimensional, not single-dimensional: Effective descriptions combine multiple dimensions, such as gender, age, and emotion, as described below. A single-dimension description, such as "female voice," is too broad to produce a distinctive voice.
Be objective, not subjective: Focus on the physical and perceptual features of the voice itself, not personal preferences. For example, use "high-pitched and energetic" instead of "my favorite voice."
Be original, not imitative: Describe vocal traits rather than requesting the imitation of specific people, such as celebrities. Such requests involve copyright risks, and the model does not support direct imitation.
Be concise, not redundant: Ensure every word adds meaning. Avoid repeating synonyms or using meaningless intensifiers, such as "very very nice voice."
Description dimensions reference
| Dimension | Example descriptions |
| --- | --- |
| Gender | Male, female, neutral |
| Age | Child (5–12 years), teen (13–18 years), young adult (19–35 years), middle-aged (36–55 years), senior (over 55 years) |
| Pitch | High, mid, low, slightly high, slightly low |
| Speaking rate | Fast, medium, slow, slightly fast, slightly slow |
| Emotion | Cheerful, calm, gentle, serious, lively, composed, soothing |
| Characteristics | Magnetic, crisp, raspy, smooth, sweet, rich, powerful |
| Use case | News broadcast, ad voiceover, audiobook, animated character, voice assistant, documentary narration |
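One way to keep a description multi-dimensional is to assemble it from the dimensions in the table above. The helper below is a hypothetical illustration, and its sentence template is an assumption. The model accepts any free-form description that follows the core principles.

```python
from typing import Optional, Sequence


def _with_article(phrase: str) -> str:
    """Prefix "a" or "an" using a simple vowel-letter rule (adequate for these words)."""
    return ("an " if phrase[:1].lower() in "aeiou" else "a ") + phrase


def _join(items: list[str]) -> str:
    """Join phrases as "x", "x and y", or "x, y, and z"."""
    if len(items) <= 2:
        return " and ".join(items)
    return ", ".join(items[:-1]) + ", and " + items[-1]


def build_voice_prompt(
    gender: str,
    age: str,
    pitch: Optional[str] = None,
    speaking_rate: Optional[str] = None,
    emotion: Optional[str] = None,
    characteristics: Sequence[str] = (),
    use_case: Optional[str] = None,
) -> str:
    """Compose a voice_prompt sentence from the description dimensions (hypothetical template)."""
    # Core phrase, for example "calm middle-aged male voice"
    core = " ".join(part for part in (emotion, age, gender) if part) + " voice"
    details = []
    if pitch:
        details.append(f"{pitch} pitch")
    if speaking_rate:
        details.append(_with_article(f"{speaking_rate} speaking rate"))
    if characteristics:
        details.append(_with_article(", ".join(characteristics) + " tone"))
    sentence = _with_article(core)
    if details:
        sentence += " with " + _join(details)
    if use_case:
        sentence += f", suitable for {use_case}"
    # Capitalize only the first letter so the rest of the sentence is unchanged
    return sentence[0].upper() + sentence[1:] + "."
```

For example, `build_voice_prompt("male", "middle-aged", pitch="low", speaking_rate="slow", emotion="calm", characteristics=["deep", "magnetic"], use_case="news broadcasting")` returns "A calm middle-aged male voice with low pitch, a slow speaking rate, and a deep, magnetic tone, suitable for news broadcasting."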
Example comparison
✅ Recommended examples
"A young, lively female voice with a fast speaking rate and noticeably rising intonation, suitable for introducing fashion products."
Analysis: This description combines age, personality, speaking rate, and intonation, and specifies a use case, creating a vivid and clear image.
"A calm middle-aged male voice with a slow speaking rate, deep and magnetic tone, ideal for news reading or documentary narration."
Analysis: This description clearly defines gender, age range, speaking rate, tonal qualities, and application domain.
"A cute child’s voice, approximately an 8-year-old girl, with a slightly childish tone, perfect for animated character dubbing."
Analysis: This description specifies an exact age and vocal trait ("childish"), with a clear purpose.
"A gentle and intellectual female voice, around 30 years old, with a calm tone, suitable for audiobook narration."
Analysis: This description effectively conveys emotional and stylistic qualities through words such as "intellectual" and "calm."
❌ Not recommended examples and suggestions
| Example | Main issue | Suggestion |
| --- | --- | --- |
| Nice-sounding voice | Too vague and subjective. Lacks actionable features. | Add specific dimensions, for example, "a clear-toned young female voice with a gentle intonation." |
| Sounds like a certain celebrity | Involves copyright risk. The model cannot directly imitate a specific person. | Describe the vocal traits instead, for example, "a mature, magnetic male voice with a steady pace." |
| Very very very nice female voice | Redundant. Repeated words do not help define the voice. | Remove repetition and add meaningful descriptors, for example, "a 20–24-year-old female voice with a light, upbeat tone and sweet timbre." |
| 123456 | Invalid input. It cannot be parsed as voice characteristics. | Provide meaningful text descriptions. For more information, see the recommended examples above. |
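Some of the problems in this table can be caught mechanically while drafting a prompt. The sketch below flags immediately repeated words and a small list of vague terms. The term list is hypothetical and meant to be extended. This is a rough heuristic, not a check the service performs.

```python
import re

# Hypothetical list of vague, subjective terms; extend it for your own use.
VAGUE_TERMS = ("nice-sounding", "nice", "good", "ordinary", "beautiful")


def lint_voice_prompt(prompt: str) -> list[str]:
    """Return warnings for repeated words and vague terms in a voice_prompt."""
    warnings = []
    # Lowercase words, keeping hyphenated forms such as "nice-sounding" intact
    words = re.findall(r"[a-z][a-z'-]*", prompt.lower())
    # Flag immediately repeated words such as "very very", reporting each word once
    for previous, current in zip(words, words[1:]):
        message = f'repeated word: "{current}"'
        if previous == current and message not in warnings:
            warnings.append(message)
    # Flag whole-word matches against the vague-term list
    for term in VAGUE_TERMS:
        if term in words:
            warnings.append(f'vague term: "{term}"')
    return warnings
```

Applied to the table rows, "Very very very nice female voice" yields a repeated-word warning for "very" and a vague-term warning for "nice". A recommended example such as "A calm middle-aged male voice with a slow speaking rate." yields no warnings.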
Getting started: From voice design to speech synthesis
1. Workflow
Voice design and speech synthesis are two closely linked but independent steps that follow a "create first, then use" workflow:
1. Prepare the voice description and preview text for voice design.
Voice description (voice_prompt): Defines the target voice characteristics. For guidance, see "How to write high-quality voice descriptions."
Preview text (preview_text): The text that the preview audio reads aloud, for example, "Hello everyone, welcome to the show."
2. Call the Create voice API to generate a custom voice and get its name and preview audio.
In this step, you must specify target_model to declare which speech synthesis model will drive the created voice.
Listen to the preview audio to evaluate whether it meets your expectations. If it does, proceed. If not, redesign the voice.
If you already have a created voice, which you can verify using the List voices API, you can skip this step and proceed to the next one.
3. Use the voice for speech synthesis.
Call the speech synthesis API and pass the voice obtained in the previous step. The speech synthesis model used here must match the target_model specified in the previous step.
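Client code can enforce the "create first, then use" rule by storing the target_model alongside each created voice and checking it before synthesis. The record type and function below are a hypothetical sketch and are not part of the DashScope SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DesignedVoice:
    """A voice returned by the Create voice API (hypothetical client-side record)."""
    name: str          # output.voice from the Create voice response
    target_model: str  # the target_model sent when the voice was created


def check_synthesis_model(voice: DesignedVoice, synthesis_model: str) -> None:
    """Raise ValueError if the synthesis model differs from the voice's target_model."""
    if voice.target_model != synthesis_model:
        raise ValueError(
            f"Voice {voice.name!r} was designed for {voice.target_model!r}, "
            f"but synthesis was requested with {synthesis_model!r}"
        )


if __name__ == "__main__":
    voice = DesignedVoice(
        name="myvoice",  # placeholder; use the voice name returned by Create voice
        target_model="qwen3-tts-vd-realtime-2025-12-16",
    )
    check_synthesis_model(voice, "qwen3-tts-vd-realtime-2025-12-16")  # passes silently
```

Failing early on the client gives a clearer error than a rejected synthesis session.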
2. Model configurations and preparations
Select the appropriate model and complete the setup tasks.
Model configurations
Specify the following two models during voice design:
Voice design model: qwen-voice-design
Speech synthesis model that drives the voice: Currently, only qwen3-tts-vd-realtime-2025-12-16 is supported.
Preparations
Get an API key: Get and Configure an API Key. For security, store your API key in an environment variable.
Install the SDK: Install the latest DashScope SDK.
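You can collect the region-specific endpoints and the API key lookup used by the samples on this page in one place. This is a minimal sketch. The REGIONS mapping, the Settings type, and load_settings are hypothetical names. The URLs and the DASHSCOPE_API_KEY environment variable come from the sample code on this page.

```python
import os
from dataclasses import dataclass

VOICE_DESIGN_MODEL = "qwen-voice-design"
TARGET_MODEL = "qwen3-tts-vd-realtime-2025-12-16"  # currently the only supported target_model

# Endpoints used by the sample code on this page
REGIONS = {
    "singapore": {
        "customization_url": "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization",
        "realtime_url": "wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime",
    },
    "beijing": {
        "customization_url": "https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization",
        "realtime_url": "wss://dashscope.aliyuncs.com/api-ws/v1/realtime",
    },
}


@dataclass(frozen=True)
class Settings:
    api_key: str
    customization_url: str
    realtime_url: str


def load_settings(region: str = "singapore") -> Settings:
    """Read the API key from DASHSCOPE_API_KEY and pick the endpoints for a region."""
    api_key = os.getenv("DASHSCOPE_API_KEY")
    if not api_key:
        raise RuntimeError("Set the DASHSCOPE_API_KEY environment variable first.")
    if region not in REGIONS:
        raise ValueError(f"Unknown region {region!r}; expected one of {sorted(REGIONS)}")
    # API keys are region-specific, so the key must belong to the chosen region.
    return Settings(api_key=api_key, **REGIONS[region])
```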
3. Sample code
Generate a custom voice and listen to the preview. If you are satisfied, proceed. Otherwise, regenerate the voice.
Python

```python
import requests
import base64
import os


def create_voice_and_play():
    # API keys differ between Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    # If you haven't set an environment variable, replace the line below with: api_key = "sk-xxx"
    api_key = os.getenv("DASHSCOPE_API_KEY")

    if not api_key:
        print("Error: DASHSCOPE_API_KEY environment variable not found. Please set your API key.")
        return None, None, None

    # Prepare request data
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }

    data = {
        "model": "qwen-voice-design",
        "input": {
            "action": "create",
            "target_model": "qwen3-tts-vd-realtime-2025-12-16",
            "voice_prompt": "A composed middle-aged male announcer with a deep, rich and magnetic voice, a steady speaking speed and clear articulation, suitable for news broadcasting or documentary commentary.",
            "preview_text": "Dear listeners, hello everyone. Welcome to the evening news.",
            "preferred_name": "announcer",
            "language": "en"
        },
        "parameters": {
            "sample_rate": 24000,
            "response_format": "wav"
        }
    }

    # URL for Singapore region. For Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
    url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"

    try:
        # Send request
        response = requests.post(
            url,
            headers=headers,
            json=data,
            timeout=60  # Timeout in seconds
        )

        if response.status_code == 200:
            result = response.json()

            # Get voice name
            voice_name = result["output"]["voice"]
            print(f"Voice name: {voice_name}")

            # Get preview audio data
            base64_audio = result["output"]["preview_audio"]["data"]

            # Decode Base64 audio data
            audio_bytes = base64.b64decode(base64_audio)

            # Save audio file locally (the extension matches response_format="wav" above)
            filename = f"{voice_name}_preview.wav"

            # Write audio data to local file
            with open(filename, 'wb') as f:
                f.write(audio_bytes)

            print(f"Audio saved to local file: {filename}")
            print(f"File path: {os.path.abspath(filename)}")

            return voice_name, audio_bytes, filename
        else:
            print(f"Request failed. Status code: {response.status_code}")
            print(f"Response: {response.text}")
            return None, None, None

    except requests.exceptions.RequestException as e:
        print(f"Network request error: {e}")
        return None, None, None
    except KeyError as e:
        print(f"Response format error: missing required field: {e}")
        print(f"Response: {response.text if 'response' in locals() else 'No response'}")
        return None, None, None
    except Exception as e:
        print(f"Unexpected error: {e}")
        return None, None, None


if __name__ == "__main__":
    print("Creating voice...")
    voice_name, audio_data, saved_filename = create_voice_and_play()

    if voice_name:
        print(f"\nSuccessfully created voice '{voice_name}'")
        print(f"Audio file saved: '{saved_filename}'")
        print(f"File size: {os.path.getsize(saved_filename)} bytes")
    else:
        print("\nVoice creation failed")
```

Java
You need to import the Gson dependency. If you use Maven or Gradle, add the dependency:
Maven
Add the following content to pom.xml:

```xml
<!-- https://mvnrepository.com/artifact/com.google.code.gson/gson -->
<dependency>
    <groupId>com.google.code.gson</groupId>
    <artifactId>gson</artifactId>
    <version>2.13.1</version>
</dependency>
```

Gradle
Add the following content to build.gradle:

```gradle
// https://mvnrepository.com/artifact/com.google.code.gson/gson
implementation("com.google.code.gson:gson:2.13.1")
```

```java
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;

import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Base64;

public class Main {

    public static void main(String[] args) {
        Main example = new Main();
        example.createVoice();
    }

    public void createVoice() {
        // API keys differ between Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
        // If you haven't set an environment variable, replace the line below with: String apiKey = "sk-xxx";
        String apiKey = System.getenv("DASHSCOPE_API_KEY");

        // Create the JSON request body string
        String jsonBody = "{\n" +
                "  \"model\": \"qwen-voice-design\",\n" +
                "  \"input\": {\n" +
                "    \"action\": \"create\",\n" +
                "    \"target_model\": \"qwen3-tts-vd-realtime-2025-12-16\",\n" +
                "    \"voice_prompt\": \"A composed middle-aged male announcer with a deep, rich and magnetic voice, a steady speaking speed and clear articulation, suitable for news broadcasting or documentary commentary.\",\n" +
                "    \"preview_text\": \"Dear listeners, hello everyone. Welcome to the evening news.\",\n" +
                "    \"preferred_name\": \"announcer\",\n" +
                "    \"language\": \"en\"\n" +
                "  },\n" +
                "  \"parameters\": {\n" +
                "    \"sample_rate\": 24000,\n" +
                "    \"response_format\": \"wav\"\n" +
                "  }\n" +
                "}";

        HttpURLConnection connection = null;
        try {
            // URL for Singapore region. For Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
            URL url = new URL("https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization");
            connection = (HttpURLConnection) url.openConnection();

            // Set request method and headers
            connection.setRequestMethod("POST");
            connection.setRequestProperty("Authorization", "Bearer " + apiKey);
            connection.setRequestProperty("Content-Type", "application/json");
            connection.setDoOutput(true);
            connection.setDoInput(true);

            // Send request body
            try (OutputStream os = connection.getOutputStream()) {
                byte[] input = jsonBody.getBytes("UTF-8");
                os.write(input, 0, input.length);
                os.flush();
            }

            // Get response
            int responseCode = connection.getResponseCode();
            if (responseCode == HttpURLConnection.HTTP_OK) {
                // Read response content
                StringBuilder response = new StringBuilder();
                try (BufferedReader br = new BufferedReader(
                        new InputStreamReader(connection.getInputStream(), "UTF-8"))) {
                    String responseLine;
                    while ((responseLine = br.readLine()) != null) {
                        response.append(responseLine.trim());
                    }
                }

                // Parse JSON response
                JsonObject jsonResponse = JsonParser.parseString(response.toString()).getAsJsonObject();
                JsonObject outputObj = jsonResponse.getAsJsonObject("output");
                JsonObject previewAudioObj = outputObj.getAsJsonObject("preview_audio");

                // Get voice name
                String voiceName = outputObj.get("voice").getAsString();
                System.out.println("Voice name: " + voiceName);

                // Get Base64-encoded audio data
                String base64Audio = previewAudioObj.get("data").getAsString();

                // Decode Base64 audio data
                byte[] audioBytes = Base64.getDecoder().decode(base64Audio);

                // Save audio to a local file (saveAudioToFile reports the result)
                String filename = voiceName + "_preview.wav";
                saveAudioToFile(audioBytes, filename);

            } else {
                // Read error response
                StringBuilder errorResponse = new StringBuilder();
                try (BufferedReader br = new BufferedReader(
                        new InputStreamReader(connection.getErrorStream(), "UTF-8"))) {
                    String responseLine;
                    while ((responseLine = br.readLine()) != null) {
                        errorResponse.append(responseLine.trim());
                    }
                }

                System.out.println("Request failed. Status code: " + responseCode);
                System.out.println("Error response: " + errorResponse.toString());
            }

        } catch (Exception e) {
            System.err.println("Request error: " + e.getMessage());
            e.printStackTrace();
        } finally {
            if (connection != null) {
                connection.disconnect();
            }
        }
    }

    private void saveAudioToFile(byte[] audioBytes, String filename) {
        try {
            File file = new File(filename);
            try (FileOutputStream fos = new FileOutputStream(file)) {
                fos.write(audioBytes);
            }
            System.out.println("Audio saved to: " + file.getAbsolutePath());
        } catch (IOException e) {
            System.err.println("Error saving audio file: " + e.getMessage());
            e.printStackTrace();
        }
    }
}
```

Use the custom voice generated in the previous step for speech synthesis.
This example is based on the "server commit mode" sample of the DashScope SDK, which synthesizes speech with a system voice. Replace the voice parameter with the custom voice generated by voice design.
Key principle: The model used during voice design (target_model) must be the same as the model used for subsequent speech synthesis (model). Otherwise, synthesis fails.
Python
```python
# coding=utf-8
# Installation instructions for pyaudio:
# APPLE Mac OS X
#   brew install portaudio
#   pip install pyaudio
# Debian/Ubuntu
#   sudo apt-get install python-pyaudio python3-pyaudio
#   or
#   pip install pyaudio
# CentOS
#   sudo yum install -y portaudio portaudio-devel && pip install pyaudio
# Microsoft Windows
#   python -m pip install pyaudio

import pyaudio
import os
import base64
import threading
import time
import dashscope  # DashScope Python SDK version 1.23.9 or later is required
from dashscope.audio.qwen_tts_realtime import QwenTtsRealtime, QwenTtsRealtimeCallback, AudioFormat

# ======= Constant Configuration =======
TEXT_TO_SYNTHESIZE = [
    'Right? I just love this kind of supermarket,',
    'especially during the New Year.',
    'Going to the supermarket',
    'just makes me feel',
    'super, super happy!',
    'I want to buy so many things!'
]


def init_dashscope_api_key():
    """
    Initializes the DashScope SDK API key
    """
    # API keys differ between Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    # If you haven't set an environment variable, replace the line below with: dashscope.api_key = "sk-xxx"
    dashscope.api_key = os.getenv("DASHSCOPE_API_KEY")


# ======= Callback Class =======
class MyCallback(QwenTtsRealtimeCallback):
    """
    Custom TTS streaming callback
    """
    def __init__(self):
        self.complete_event = threading.Event()
        self._player = pyaudio.PyAudio()
        self._stream = self._player.open(
            format=pyaudio.paInt16, channels=1, rate=24000, output=True
        )

    def on_open(self) -> None:
        print('[TTS] Connection established')

    def on_close(self, close_status_code, close_msg) -> None:
        self._stream.stop_stream()
        self._stream.close()
        self._player.terminate()
        print(f'[TTS] Connection closed, code={close_status_code}, msg={close_msg}')

    def on_event(self, response: dict) -> None:
        try:
            event_type = response.get('type', '')
            if event_type == 'session.created':
                print(f'[TTS] Session started: {response["session"]["id"]}')
            elif event_type == 'response.audio.delta':
                audio_data = base64.b64decode(response['delta'])
                self._stream.write(audio_data)
            elif event_type == 'response.done':
                # qwen_tts_realtime is the module-level instance created in the main block
                print(f'[TTS] Response complete, Response ID: {qwen_tts_realtime.get_last_response_id()}')
            elif event_type == 'session.finished':
                print('[TTS] Session finished')
                self.complete_event.set()
        except Exception as e:
            print(f'[Error] Exception processing callback event: {e}')

    def wait_for_finished(self):
        self.complete_event.wait()


# ======= Main Execution Logic =======
if __name__ == '__main__':
    init_dashscope_api_key()
    print('[System] Initializing Qwen TTS Realtime ...')

    callback = MyCallback()
    qwen_tts_realtime = QwenTtsRealtime(
        # Voice design and speech synthesis must use the same model
        model="qwen3-tts-vd-realtime-2025-12-16",
        callback=callback,
        # URL for Singapore region. For Beijing region, use: wss://dashscope.aliyuncs.com/api-ws/v1/realtime
        url='wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime'
    )
    qwen_tts_realtime.connect()

    qwen_tts_realtime.update_session(
        voice="myvoice",  # Replace the voice parameter with the custom voice generated by voice design
        response_format=AudioFormat.PCM_24000HZ_MONO_16BIT,
        mode='server_commit'
    )

    for text_chunk in TEXT_TO_SYNTHESIZE:
        print(f'[Sending text]: {text_chunk}')
        qwen_tts_realtime.append_text(text_chunk)
        time.sleep(0.1)

    qwen_tts_realtime.finish()
    callback.wait_for_finished()

    print(f'[Metric] session_id={qwen_tts_realtime.get_session_id()}, '
          f'first_audio_delay={qwen_tts_realtime.get_first_audio_delay()}s')
```

Java
```java
import com.alibaba.dashscope.audio.qwen_tts_realtime.*;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.google.gson.JsonObject;

import javax.sound.sampled.*;
import java.io.*;
import java.util.Base64;
import java.util.Queue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;

public class Main {
    // ===== Constant Definitions =====
    private static String[] textToSynthesize = {
            "Right? I just love this kind of supermarket,",
            "especially during the New Year.",
            "Going to the supermarket",
            "just makes me feel",
            "super, super happy!",
            "I want to buy so many things!"
    };

    // Real-time audio player class
    public static class RealtimePcmPlayer {
        private int sampleRate;
        private SourceDataLine line;
        private AudioFormat audioFormat;
        private Thread decoderThread;
        private Thread playerThread;
        private AtomicBoolean stopped = new AtomicBoolean(false);
        private Queue<String> b64AudioBuffer = new ConcurrentLinkedQueue<>();
        private Queue<byte[]> RawAudioBuffer = new ConcurrentLinkedQueue<>();

        // Constructor initializes audio format and audio line
        public RealtimePcmPlayer(int sampleRate) throws LineUnavailableException {
            this.sampleRate = sampleRate;
            this.audioFormat = new AudioFormat(this.sampleRate, 16, 1, true, false);
            DataLine.Info info = new DataLine.Info(SourceDataLine.class, audioFormat);
            line = (SourceDataLine) AudioSystem.getLine(info);
            line.open(audioFormat);
            line.start();
            // Decodes Base64 chunks into raw PCM
            decoderThread = new Thread(new Runnable() {
                @Override
                public void run() {
                    while (!stopped.get()) {
                        String b64Audio = b64AudioBuffer.poll();
                        if (b64Audio != null) {
                            byte[] rawAudio = Base64.getDecoder().decode(b64Audio);
                            RawAudioBuffer.add(rawAudio);
                        } else {
                            try {
                                Thread.sleep(100);
                            } catch (InterruptedException e) {
                                throw new RuntimeException(e);
                            }
                        }
                    }
                }
            });
            // Plays raw PCM chunks in order
            playerThread = new Thread(new Runnable() {
                @Override
                public void run() {
                    while (!stopped.get()) {
                        byte[] rawAudio = RawAudioBuffer.poll();
                        if (rawAudio != null) {
                            try {
                                playChunk(rawAudio);
                            } catch (IOException | InterruptedException e) {
                                throw new RuntimeException(e);
                            }
                        } else {
                            try {
                                Thread.sleep(100);
                            } catch (InterruptedException e) {
                                throw new RuntimeException(e);
                            }
                        }
                    }
                }
            });
            decoderThread.start();
            playerThread.start();
        }

        // Plays an audio chunk and blocks until playback is complete
        private void playChunk(byte[] chunk) throws IOException, InterruptedException {
            if (chunk == null || chunk.length == 0) return;

            int bytesWritten = 0;
            while (bytesWritten < chunk.length) {
                bytesWritten += line.write(chunk, bytesWritten, chunk.length - bytesWritten);
            }
            // Chunk duration in ms: 16-bit mono PCM uses sampleRate * 2 / 1000 bytes per millisecond
            int audioLength = chunk.length / (this.sampleRate * 2 / 1000);
            // Wait for the audio in the buffer to finish playing (never sleep for a negative time)
            Thread.sleep(Math.max(0, audioLength - 10));
        }

        public void write(String b64Audio) {
            b64AudioBuffer.add(b64Audio);
        }

        public void cancel() {
            b64AudioBuffer.clear();
            RawAudioBuffer.clear();
        }

        public void waitForComplete() throws InterruptedException {
            while (!b64AudioBuffer.isEmpty() || !RawAudioBuffer.isEmpty()) {
                Thread.sleep(100);
            }
            line.drain();
        }

        public void shutdown() throws InterruptedException {
            stopped.set(true);
            decoderThread.join();
            playerThread.join();
            if (line != null && line.isRunning()) {
                line.drain();
                line.close();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        QwenTtsRealtimeParam param = QwenTtsRealtimeParam.builder()
                // Voice design and speech synthesis must use the same model
                .model("qwen3-tts-vd-realtime-2025-12-16")
                // URL for Singapore region. For Beijing region, use: wss://dashscope.aliyuncs.com/api-ws/v1/realtime
                .url("wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime")
                // API keys differ between Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
                // If you haven't set an environment variable, replace the line below with: .apikey("sk-xxx")
                .apikey(System.getenv("DASHSCOPE_API_KEY"))
                .build();
        AtomicReference<CountDownLatch> completeLatch = new AtomicReference<>(new CountDownLatch(1));
        final AtomicReference<QwenTtsRealtime> qwenTtsRef = new AtomicReference<>(null);

        // Create a real-time audio player instance
        RealtimePcmPlayer audioPlayer = new RealtimePcmPlayer(24000);

        QwenTtsRealtime qwenTtsRealtime = new QwenTtsRealtime(param, new QwenTtsRealtimeCallback() {
            @Override
            public void onOpen() {
                // Handle connection open
            }

            @Override
            public void onEvent(JsonObject message) {
                String type = message.get("type").getAsString();
                switch (type) {
                    case "session.created":
                        // Handle session creation
                        break;
                    case "response.audio.delta":
                        String recvAudioB64 = message.get("delta").getAsString();
                        // Play audio in real time
                        audioPlayer.write(recvAudioB64);
                        break;
                    case "response.done":
                        // Handle response completion
                        break;
                    case "session.finished":
                        // Handle session finish
                        completeLatch.get().countDown();
                        break;
                    default:
                        break;
                }
            }

            @Override
            public void onClose(int code, String reason) {
                // Handle connection close
            }
        });
        qwenTtsRef.set(qwenTtsRealtime);
        try {
            qwenTtsRealtime.connect();
        } catch (NoApiKeyException e) {
            throw new RuntimeException(e);
        }
        QwenTtsRealtimeConfig config = QwenTtsRealtimeConfig.builder()
                .voice("myvoice") // Replace the voice parameter with the custom voice generated by voice design
                .responseFormat(QwenTtsRealtimeAudioFormat.PCM_24000HZ_MONO_16BIT)
                .mode("server_commit")
                .build();
        qwenTtsRealtime.updateSession(config);
        for (String text : textToSynthesize) {
            qwenTtsRealtime.appendText(text);
            Thread.sleep(100);
        }
        qwenTtsRealtime.finish();
        completeLatch.get().await();

        // Wait for audio playback to complete and then shut down the player
        audioPlayer.waitForComplete();
        audioPlayer.shutdown();
        System.exit(0);
    }
}
```
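The real-time examples above request PCM_24000HZ_MONO_16BIT audio and play it immediately. To save the result instead (for example, on a machine without an audio device), you can wrap the raw PCM chunks in a WAV container with Python's standard wave module. This sketch assumes the chunks have already been Base64-decoded, as in the response.audio.delta handler. pcm_duration_seconds and save_pcm_as_wav are hypothetical helpers.

```python
import wave

SAMPLE_RATE = 24000  # matches PCM_24000HZ_MONO_16BIT
CHANNELS = 1
SAMPLE_WIDTH = 2     # 16-bit samples = 2 bytes


def pcm_duration_seconds(pcm: bytes, sample_rate: int = SAMPLE_RATE) -> float:
    """Duration of mono 16-bit PCM: bytes / (sample_rate * 2 bytes per sample)."""
    return len(pcm) / (sample_rate * SAMPLE_WIDTH * CHANNELS)


def save_pcm_as_wav(chunks: list[bytes], path: str, sample_rate: int = SAMPLE_RATE) -> float:
    """Write decoded PCM chunks to a WAV file and return the audio duration in seconds."""
    pcm = b"".join(chunks)
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(CHANNELS)
        wav_file.setsampwidth(SAMPLE_WIDTH)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)
    return pcm_duration_seconds(pcm, sample_rate)
```

At 24 kHz, one second of audio is 24000 × 2 = 48,000 bytes. The Java player uses the same arithmetic (sampleRate * 2 / 1000 bytes per millisecond) to estimate how long each chunk plays.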
API reference
Ensure that you use the same account when calling different APIs.
Create voice
Creates a custom voice by providing a voice description and preview text.
URL
China (Beijing):
POST https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
International (Singapore):
POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization
Request headers
| Parameter | Type | Description |
| --- | --- | --- |
| Authorization | string | Authentication token in the format `Bearer <your_api_key>`. Replace `<your_api_key>` with your actual API key. |
| Content-Type | string | The media type of the data in the request body. Fixed value: application/json. |

Request body
The request body contains all request parameters. Omit optional fields as needed.
Important: Note the difference between the following parameters:
model: The voice design model. The value is fixed at qwen-voice-design.
target_model: The speech synthesis model that drives the voice. It must match the speech synthesis model used in subsequent API calls. Otherwise, synthesis fails.

```json
{
    "model": "qwen-voice-design",
    "input": {
        "action": "create",
        "target_model": "qwen3-tts-vd-realtime-2025-12-16",
        "voice_prompt": "A calm middle-aged male announcer with a deep, rich, and magnetic voice, steady speaking speed, and clear articulation, suitable for news broadcasting or documentary narration.",
        "preview_text": "Dear listeners, hello everyone. Welcome to the evening news.",
        "preferred_name": "announcer",
        "language": "en"
    },
    "parameters": {
        "sample_rate": 24000,
        "response_format": "wav"
    }
}
```

Request parameters
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| model | string | - | The voice design model. Fixed value: qwen-voice-design. |
| action | string | - | The operation type. Fixed value: create. |
| target_model | string | - | The speech synthesis model that drives the voice. Currently, only qwen3-tts-vd-realtime-2025-12-16 is supported. It must match the speech synthesis model used in subsequent API calls. Otherwise, synthesis fails. |
| voice_prompt | string | - | The voice description. Maximum length: 2048 characters. Only Chinese and English are supported. For guidance, see "How to write high-quality voice descriptions". |
| preview_text | string | - | The text for the preview audio. Maximum length: 1024 characters. Supported languages: Chinese (zh), English (en), German (de), Italian (it), Portuguese (pt), Spanish (es), Japanese (ja), Korean (ko), French (fr), Russian (ru). |
| preferred_name | string | - | An easy-to-identify keyword for the voice. Only letters, digits, and underscores are allowed, with a maximum of 16 characters. We recommend an identifier related to the character or scenario. The keyword appears in the designed voice name. For example, if the keyword is "announcer", the final voice name is similar to "qwen-tts-vd-announcer-voice-20251201102800-a1b2". |
| language | string | zh | The language code. Specifies the language preference for the generated voice and affects its linguistic features and pronunciation tendencies. Choose the code that matches your use case. If you set this parameter, it must match the language of preview_text. Valid values: zh (Chinese), en (English), de (German), it (Italian), pt (Portuguese), es (Spanish), ja (Japanese), ko (Korean), fr (French), ru (Russian). |
| sample_rate | int | 24000 | The sample rate (in Hz) of the preview audio generated by voice design. Valid values: 8000, 16000, 24000, 48000. |
| response_format | string | wav | The format of the preview audio generated by voice design. Valid values: pcm, wav, mp3, opus. |
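You can check the documented limits in the table above before sending a request. The function below is a minimal sketch that validates them and returns a body in the JSON shape shown above. The function name, raising ValueError, and the assumptions that preferred_name "letters" means ASCII letters and that it must contain at least one character are not confirmed by this document.

```python
import re

SUPPORTED_LANGUAGES = {"zh", "en", "de", "it", "pt", "es", "ja", "ko", "fr", "ru"}
SAMPLE_RATES = {8000, 16000, 24000, 48000}
RESPONSE_FORMATS = {"pcm", "wav", "mp3", "opus"}
# Assumption: ASCII letters, digits, and underscores, 1 to 16 characters
PREFERRED_NAME_PATTERN = re.compile(r"[A-Za-z0-9_]{1,16}")


def build_create_voice_body(
    voice_prompt: str,
    preview_text: str,
    preferred_name: str,
    language: str = "zh",
    sample_rate: int = 24000,
    response_format: str = "wav",
    target_model: str = "qwen3-tts-vd-realtime-2025-12-16",
) -> dict:
    """Validate the documented limits and return a Create voice request body."""
    if len(voice_prompt) > 2048:
        raise ValueError("voice_prompt exceeds 2048 characters")
    if len(preview_text) > 1024:
        raise ValueError("preview_text exceeds 1024 characters")
    if not PREFERRED_NAME_PATTERN.fullmatch(preferred_name):
        raise ValueError("preferred_name: use 1-16 letters, digits, or underscores")
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language code: {language!r}")
    if sample_rate not in SAMPLE_RATES:
        raise ValueError(f"unsupported sample_rate: {sample_rate}")
    if response_format not in RESPONSE_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    return {
        "model": "qwen-voice-design",
        "input": {
            "action": "create",
            "target_model": target_model,
            "voice_prompt": voice_prompt,
            "preview_text": preview_text,
            "preferred_name": preferred_name,
            "language": language,  # must match the language of preview_text
        },
        "parameters": {"sample_rate": sample_rate, "response_format": response_format},
    }
```

You can pass the returned dictionary as the json argument of requests.post, in the same way as the data dictionary in the Python sample.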
Response parameters
The key parameters are:

| Parameter | Type | Description |
| --- | --- | --- |
| voice | string | The voice name (output.voice). You can use it directly as the voice parameter in the speech synthesis API. |
| data | string | The preview audio generated by voice design, returned as a Base64-encoded string (output.preview_audio.data). |
| sample_rate | int | The sample rate (in Hz) of the preview audio generated by voice design. It matches the sample rate set during voice creation. Default: 24000 Hz. |
| response_format | string | The format of the preview audio generated by voice design. It matches the audio format set during voice creation. Default: wav. |
| target_model | string | The speech synthesis model that drives the voice. Currently, only qwen3-tts-vd-realtime-2025-12-16 is supported. It must match the speech synthesis model used in subsequent API calls. Otherwise, synthesis fails. |
| request_id | string | The request ID. |
| count | integer | The number of "Create voice" operations billed for this request. The cost for this request is $ . For voice creation, count is always 1. |
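The sample code reads output.voice and output.preview_audio.data from the response. The helper below packages that step. It saves the preview with a file extension that matches the response_format you requested, whereas the sample code always uses .wav. It is a sketch. The function name is hypothetical, and the response is assumed to have the shape the sample code relies on.

```python
import base64
from pathlib import Path


def save_preview(result: dict, response_format: str = "wav", directory: str = ".") -> tuple[str, Path]:
    """Extract the voice name, decode the Base64 preview audio, and write it to disk."""
    output = result["output"]
    voice_name = output["voice"]
    audio_bytes = base64.b64decode(output["preview_audio"]["data"])
    # Name the file after the voice and the requested format, e.g. "<voice>_preview.mp3"
    path = Path(directory) / f"{voice_name}_preview.{response_format}"
    path.write_bytes(audio_bytes)
    return voice_name, path
```

Keep the returned voice name. It is the value to pass as the voice parameter when you call the speech synthesis API with the matching target_model.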
Sample code
Important: Note the difference between the following parameters:
model: The voice design model. The value is fixed at qwen-voice-design.
target_model: The speech synthesis model that drives the voice. It must match the speech synthesis model used in subsequent API calls. Otherwise, synthesis fails.
cURL
If you have not set your API key as an environment variable, replace $DASHSCOPE_API_KEY in the example with your actual API key.

```bash
# ======= Important =======
# The URL below is for the Singapore region. For the Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
# API keys differ between Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
# === Delete this comment before execution ===

curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen-voice-design",
    "input": {
        "action": "create",
        "target_model": "qwen3-tts-vd-realtime-2025-12-16",
        "voice_prompt": "A calm middle-aged male announcer with a deep, rich, and magnetic voice, steady speaking speed, and clear articulation, suitable for news broadcasting or documentary narration.",
        "preview_text": "Dear listeners, hello everyone. Welcome to the evening news.",
        "preferred_name": "announcer",
        "language": "en"
    },
    "parameters": {
        "sample_rate": 24000,
        "response_format": "wav"
    }
}'
```

Python
```python
import requests
import base64
import os


def create_voice_and_play():
    # API keys differ between Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    # If you haven't set an environment variable, replace the line below with: api_key = "sk-xxx"
    api_key = os.getenv("DASHSCOPE_API_KEY")

    if not api_key:
        print("Error: DASHSCOPE_API_KEY environment variable not found. Please set your API key.")
        return None, None, None

    # Prepare request data
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }

    data = {
        "model": "qwen-voice-design",
        "input": {
            "action": "create",
            "target_model": "qwen3-tts-vd-realtime-2025-12-16",
            "voice_prompt": "A composed middle-aged male announcer with a deep, rich and magnetic voice, a steady speaking speed and clear articulation, suitable for news broadcasting or documentary commentary.",
            "preview_text": "Dear listeners, hello everyone. Welcome to the evening news.",
            "preferred_name": "announcer",
            "language": "en"
        },
        "parameters": {
            "sample_rate": 24000,
            "response_format": "wav"
        }
    }

    # URL for Singapore region. For Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
    url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"

    try:
        # Send request
        response = requests.post(
            url,
            headers=headers,
            json=data,
            timeout=60  # Timeout in seconds
        )

        if response.status_code == 200:
            result = response.json()

            # Get voice name
            voice_name = result["output"]["voice"]
            print(f"Voice name: {voice_name}")

            # Get preview audio data
            base64_audio = result["output"]["preview_audio"]["data"]

            # Decode Base64 audio data
            audio_bytes = base64.b64decode(base64_audio)

            # Save audio file locally (the extension matches response_format="wav" above)
            filename = f"{voice_name}_preview.wav"

            # Write audio data to local file
            with open(filename, 'wb') as f:
                f.write(audio_bytes)

            print(f"Audio saved to local file: {filename}")
            print(f"File path: {os.path.abspath(filename)}")

            return voice_name, audio_bytes, filename
        else:
            print(f"Request failed. Status code: {response.status_code}")
            print(f"Response: {response.text}")
            return None, None, None

    except requests.exceptions.RequestException as e:
        print(f"Network request error: {e}")
        return None, None, None
    except KeyError as e:
        print(f"Response format error: missing required field: {e}")
        print(f"Response: {response.text if 'response' in locals() else 'No response'}")
        return None, None, None
    except Exception as e:
        print(f"Unexpected error: {e}")
        return None, None, None


if __name__ == "__main__":
    print("Creating voice...")
    voice_name, audio_data, saved_filename = create_voice_and_play()

    if voice_name:
        print(f"\nSuccessfully created voice '{voice_name}'")
        print(f"Audio file saved: '{saved_filename}'")
        print(f"File size: {os.path.getsize(saved_filename)} bytes")
    else:
        print("\nVoice creation failed")
```

Java
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Base64;

public class Main {

    public static void main(String[] args) {
        Main example = new Main();
        example.createVoice();
    }

    public void createVoice() {
        // API keys differ between Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
        // If you haven't set an environment variable, replace the line below with: String apiKey = "sk-xxx";
        String apiKey = System.getenv("DASHSCOPE_API_KEY");

        // Create the JSON request body string
        String jsonBody = "{\n" +
                " \"model\": \"qwen-voice-design\",\n" +
                " \"input\": {\n" +
                " \"action\": \"create\",\n" +
                " \"target_model\": \"qwen3-tts-vd-realtime-2025-12-16\",\n" +
                " \"voice_prompt\": \"A composed middle-aged male announcer with a deep, rich and magnetic voice, a steady speaking speed and clear articulation, is suitable for news broadcasting or documentary commentary.\",\n" +
                " \"preview_text\": \"Dear listeners, hello everyone. Welcome to the evening news.\",\n" +
                " \"preferred_name\": \"announcer\",\n" +
                " \"language\": \"en\"\n" +
                " },\n" +
                " \"parameters\": {\n" +
                " \"sample_rate\": 24000,\n" +
                " \"response_format\": \"wav\"\n" +
                " }\n" +
                "}";

        HttpURLConnection connection = null;
        try {
            // URL for Singapore region. For Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
            URL url = new URL("https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization");
            connection = (HttpURLConnection) url.openConnection();

            // Set request method and headers
            connection.setRequestMethod("POST");
            connection.setRequestProperty("Authorization", "Bearer " + apiKey);
            connection.setRequestProperty("Content-Type", "application/json");
            connection.setDoOutput(true);
            connection.setDoInput(true);

            // Send request body
            try (OutputStream os = connection.getOutputStream()) {
                byte[] input = jsonBody.getBytes("UTF-8");
                os.write(input, 0, input.length);
                os.flush();
            }

            // Get response
            int responseCode = connection.getResponseCode();
            if (responseCode == HttpURLConnection.HTTP_OK) {
                // Read response content
                StringBuilder response = new StringBuilder();
                try (BufferedReader br = new BufferedReader(
                        new InputStreamReader(connection.getInputStream(), "UTF-8"))) {
                    String responseLine;
                    while ((responseLine = br.readLine()) != null) {
                        response.append(responseLine.trim());
                    }
                }

                // Parse JSON response
                JsonObject jsonResponse = JsonParser.parseString(response.toString()).getAsJsonObject();
                JsonObject outputObj = jsonResponse.getAsJsonObject("output");
                JsonObject previewAudioObj = outputObj.getAsJsonObject("preview_audio");

                // Get voice name
                String voiceName = outputObj.get("voice").getAsString();
                System.out.println("Voice name: " + voiceName);

                // Get Base64-encoded audio data
                String base64Audio = previewAudioObj.get("data").getAsString();

                // Decode Base64 audio data
                byte[] audioBytes = Base64.getDecoder().decode(base64Audio);

                // Save audio to a local file
                String filename = voiceName + "_preview.wav";
                saveAudioToFile(audioBytes, filename);

                System.out.println("Audio saved to local file: " + filename);

            } else {
                // Read error response
                StringBuilder errorResponse = new StringBuilder();
                try (BufferedReader br = new BufferedReader(
                        new InputStreamReader(connection.getErrorStream(), "UTF-8"))) {
                    String responseLine;
                    while ((responseLine = br.readLine()) != null) {
                        errorResponse.append(responseLine.trim());
                    }
                }

                System.out.println("Request failed. Status code: " + responseCode);
                System.out.println("Error response: " + errorResponse.toString());
            }

        } catch (Exception e) {
            System.err.println("Request error: " + e.getMessage());
            e.printStackTrace();
        } finally {
            if (connection != null) {
                connection.disconnect();
            }
        }
    }

    private void saveAudioToFile(byte[] audioBytes, String filename) {
        try {
            File file = new File(filename);
            try (FileOutputStream fos = new FileOutputStream(file)) {
                fos.write(audioBytes);
            }
            System.out.println("Audio saved to: " + file.getAbsolutePath());
        } catch (IOException e) {
            System.err.println("Error saving audio file: " + e.getMessage());
            e.printStackTrace();
        }
    }
}
List voices
Performs a paged query to list the voices you have created.
URL
China (Beijing):
POST https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
International (Singapore):
POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization
Request header
Parameter | Type | Required | Description
Authorization | string | Yes | An authentication token in the format Bearer <your_api_key>. Replace <your_api_key> with your actual API key.
Content-Type | string | Yes | The media type of the request body. The value is fixed at application/json.
Request body
The request body contains all request parameters. You can omit optional fields as needed.
Important: model specifies the voice design model. Its value is fixed at qwen-voice-design. Do not modify this value.
{
    "model": "qwen-voice-design",
    "input": {
        "action": "list",
        "page_size": 10,
        "page_index": 0
    }
}
Request parameters
Parameter | Type | Default | Required | Description
model | string | - | Yes | The voice design model. Fixed value: qwen-voice-design.
action | string | - | Yes | The operation type. Fixed value: list.
page_index | integer | 0 | No | The page index. Value range: [0, 200].
page_size | integer | 10 | No | The number of entries per page. The value must be greater than 0.
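Every operation in this document (create, list, query, and delete) sends a POST request with the same headers to the same customization endpoint; only input.action and its companion fields change. The following is a minimal sketch of a shared helper that also enforces the list constraints from the table above (page_index in [0, 200], page_size greater than 0). It deliberately uses Python's standard library (urllib.request, json) instead of requests so it has no third-party dependency. The names ENDPOINTS, build_payload, and post_customization are hypothetical, and the sketch assumes the service replies with a JSON body.

```python
import json
import os
import urllib.request

# Region endpoints, copied from the URL sections of this document
ENDPOINTS = {
    "singapore": "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization",
    "beijing": "https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization",
}


def build_payload(action, parameters=None, **input_fields):
    """Build a request body; model is always qwen-voice-design (hypothetical helper)."""
    if action == "list":
        # Enforce the documented list constraints before sending anything
        page_index = input_fields.get("page_index", 0)
        page_size = input_fields.get("page_size", 10)
        if not 0 <= page_index <= 200:
            raise ValueError("page_index must be in [0, 200]")
        if page_size <= 0:
            raise ValueError("page_size must be greater than 0")
    payload = {"model": "qwen-voice-design", "input": {"action": action, **input_fields}}
    if parameters is not None:
        # In this document, only the create action sends top-level parameters
        payload["parameters"] = parameters
    return payload


def post_customization(payload, region="singapore", api_key=None, timeout=60):
    """POST the payload and return the parsed JSON response.

    urllib raises urllib.error.HTTPError for non-2xx status codes.
    """
    api_key = api_key or os.environ["DASHSCOPE_API_KEY"]
    request = urllib.request.Request(
        ENDPOINTS[region],
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, post_customization(build_payload("list", page_size=10, page_index=0)) sends the list request shown above, and build_payload("query", voice="myvoice") builds a query body. Validating locally surfaces out-of-range paging values before a request is spent on them.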
Response parameters
The key parameters are listed below. In a list response, the per-voice fields (voice through gmt_modified) are returned for each element of the output.voice_list array, as the sample code shows.
Parameter | Type | Description
voice | string | The voice name. You can use it directly as the voice parameter in the speech synthesis API.
target_model | string | The speech synthesis model that drives the voice. Currently, only qwen3-tts-vd-realtime-2025-12-16 is supported. It must match the speech synthesis model used in subsequent API calls; otherwise, synthesis fails.
language | string | The language code. Valid values: zh (Chinese), en (English), de (German), it (Italian), pt (Portuguese), es (Spanish), ja (Japanese), ko (Korean), fr (French), ru (Russian).
voice_prompt | string | The voice description.
preview_text | string | The preview text.
gmt_create | string | The time the voice was created.
gmt_modified | string | The time the voice was last modified.
page_index | integer | The page index.
page_size | integer | The number of entries per page.
total_count | integer | The total number of entries found.
request_id | string | The request ID.
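Because page_index is capped at 200, a single listing can reach at most 201 × page_size entries. With the default page_size of 10, the 1,000-voice account limit needs page indexes 0 to 99, which fits within that range. The following is a minimal sketch that walks every page using total_count. It takes the page-fetching function as a parameter so the paging logic can be tried offline; in real use, fetch_page would wrap the requests.post call from the Python sample, passing page_index and page_size in input. It assumes that both total_count and voice_list are located under output, which this document confirms only for voice_list. The names iter_all_voices and fetch_page are hypothetical.

```python
import math


def iter_all_voices(fetch_page, page_size=10, max_page_index=200):
    """Yield every voice entry by requesting pages 0, 1, 2, ...

    fetch_page(page_index, page_size) must return the parsed JSON response.
    Assumption: output.total_count and output.voice_list exist in that response.
    """
    if page_size <= 0:
        raise ValueError("page_size must be greater than 0")
    page_index = 0
    while page_index <= max_page_index:
        output = fetch_page(page_index, page_size)["output"]
        voices = output.get("voice_list", [])
        yield from voices
        total_pages = math.ceil(output.get("total_count", 0) / page_size)
        # Stop after the last page, or defensively if a page comes back empty
        if page_index + 1 >= total_pages or not voices:
            return
        page_index += 1
```

Stopping on an empty page as well as on total_count keeps the loop from running to page 200 if the reported total and the actual data ever disagree.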
Sample code
Important: model specifies the voice design model. Its value is fixed at qwen-voice-design. Do not modify this value.
cURL
If you have not set the API key as an environment variable, replace $DASHSCOPE_API_KEY in the example with your actual API key.
# ======= Important =======
# The URL below is for the Singapore region. For the Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
# API keys differ between Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
# === Delete this comment before execution ===

curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen-voice-design",
    "input": {
        "action": "list",
        "page_size": 10,
        "page_index": 0
    }
}'
Python
import os
import requests

# API keys differ between Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
# If you haven't set an environment variable, replace the line below with: api_key = "sk-xxx"
api_key = os.getenv("DASHSCOPE_API_KEY")
# URL for Singapore region. For Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"

payload = {
    "model": "qwen-voice-design",  # Do not modify this value
    "input": {
        "action": "list",
        "page_size": 10,
        "page_index": 0
    }
}

headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json"
}

response = requests.post(url, json=payload, headers=headers)

print("HTTP Status Code:", response.status_code)

if response.status_code == 200:
    data = response.json()
    voice_list = data["output"]["voice_list"]

    print("Queried voice list:")
    for item in voice_list:
        print(f"- Voice: {item['voice']} Created: {item['gmt_create']} Model: {item['target_model']}")
else:
    print("Request failed:", response.text)
Java
import com.google.gson.Gson;
import com.google.gson.JsonArray;
import com.google.gson.JsonObject;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class Main {
    public static void main(String[] args) {
        // API keys differ between Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
        // If you haven't set an environment variable, replace the line below with: String apiKey = "sk-xxx";
        String apiKey = System.getenv("DASHSCOPE_API_KEY");
        // URL for Singapore region. For Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
        String apiUrl = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization";

        // JSON request body (older Java versions do not support """ multiline strings)
        String jsonPayload =
                "{"
                        + "\"model\": \"qwen-voice-design\"," // Do not modify this value
                        + "\"input\": {"
                        + "\"action\": \"list\","
                        + "\"page_size\": 10,"
                        + "\"page_index\": 0"
                        + "}"
                        + "}";

        try {
            HttpURLConnection con = (HttpURLConnection) new URL(apiUrl).openConnection();
            con.setRequestMethod("POST");
            con.setRequestProperty("Authorization", "Bearer " + apiKey);
            con.setRequestProperty("Content-Type", "application/json");
            con.setDoOutput(true);

            try (OutputStream os = con.getOutputStream()) {
                os.write(jsonPayload.getBytes("UTF-8"));
            }

            int status = con.getResponseCode();
            BufferedReader br = new BufferedReader(new InputStreamReader(
                    status >= 200 && status < 300 ? con.getInputStream() : con.getErrorStream(), "UTF-8"));

            StringBuilder response = new StringBuilder();
            String line;
            while ((line = br.readLine()) != null) {
                response.append(line);
            }
            br.close();

            System.out.println("HTTP Status Code: " + status);
            System.out.println("Returned JSON: " + response.toString());

            if (status == 200) {
                Gson gson = new Gson();
                JsonObject jsonObj = gson.fromJson(response.toString(), JsonObject.class);
                JsonArray voiceList = jsonObj.getAsJsonObject("output").getAsJsonArray("voice_list");

                System.out.println("\n Queried voice list:");
                for (int i = 0; i < voiceList.size(); i++) {
                    JsonObject voiceItem = voiceList.get(i).getAsJsonObject();
                    String voice = voiceItem.get("voice").getAsString();
                    String gmtCreate = voiceItem.get("gmt_create").getAsString();
                    String targetModel = voiceItem.get("target_model").getAsString();

                    System.out.printf("- Voice: %s Created: %s Model: %s\n", voice, gmtCreate, targetModel);
                }
            }

        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Query a specific voice
Retrieves detailed information about a specific voice by its name.
URL
China (Beijing):
POST https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
International (Singapore):
POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization
Request header
Parameter | Type | Required | Description
Authorization | string | Yes | An authentication token in the format Bearer <your_api_key>. Replace <your_api_key> with your actual API key.
Content-Type | string | Yes | The media type of the request body. The value is fixed at application/json.
Request body
The request body contains all request parameters. You can omit optional fields as needed.
Important: model specifies the voice design model. Its value is fixed at qwen-voice-design. Do not modify this value.
{
    "model": "qwen-voice-design",
    "input": {
        "action": "query",
        "voice": "voiceName"
    }
}
Request parameters
Parameter | Type | Default | Required | Description
model | string | - | Yes | The voice design model. Fixed value: qwen-voice-design.
action | string | - | Yes | The operation type. Fixed value: query.
voice | string | - | Yes | The name of the voice to query.
Response parameters
The key parameters are listed below.
Parameter | Type | Description
voice | string | The voice name. You can use it directly as the voice parameter in the speech synthesis API.
target_model | string | The speech synthesis model that drives the voice. Currently, only qwen3-tts-vd-realtime-2025-12-16 is supported. It must match the speech synthesis model used in subsequent API calls; otherwise, synthesis fails.
language | string | The language code. Valid values: zh (Chinese), en (English), de (German), it (Italian), pt (Portuguese), es (Spanish), ja (Japanese), ko (Korean), fr (French), ru (Russian).
voice_prompt | string | The voice description.
preview_text | string | The preview text.
gmt_create | string | The time the voice was created.
gmt_modified | string | The time the voice was last modified.
request_id | string | The request ID.
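Since target_model must match the speech synthesis model you call later, it can be worth checking a query response before starting synthesis. The following is a minimal sketch of such a pre-flight check. It assumes the parsed JSON has the shape the query samples rely on (voice fields under output, and a code of VoiceNotFound when the voice does not exist); the function name ensure_voice_usable is hypothetical.

```python
def ensure_voice_usable(query_result, synthesis_model):
    """Return the voice name if the voice can be used with synthesis_model.

    query_result is the parsed JSON of a query response (hypothetical helper).
    Raises LookupError if the voice does not exist and ValueError on a model mismatch.
    """
    if query_result.get("code") == "VoiceNotFound":
        raise LookupError(query_result.get("message", "Voice not found"))
    output = query_result["output"]
    if output["target_model"] != synthesis_model:
        raise ValueError(
            f"Voice {output['voice']!r} targets {output['target_model']}, "
            f"but synthesis uses {synthesis_model}"
        )
    return output["voice"]
```

Failing early with a specific message is easier to diagnose than a rejected synthesis request later on.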
Sample code
Important: model specifies the voice design model. Its value is fixed at qwen-voice-design. Do not modify this value.
cURL
If you have not set the API key as an environment variable, replace $DASHSCOPE_API_KEY in the example with your actual API key.
# ======= Important =======
# The URL below is for the Singapore region. For the Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
# API keys differ between Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
# === Delete this comment before execution ===

curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen-voice-design",
    "input": {
        "action": "query",
        "voice": "voiceName"
    }
}'
Python
import requests
import os


def query_voice(voice_name):
    """
    Query information for a specific voice
    :param voice_name: The name of the voice
    :return: A dictionary with voice information, or None if not found
    """
    # API keys differ between Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    # If you haven't set an environment variable, replace the line below with: api_key = "sk-xxx"
    api_key = os.getenv("DASHSCOPE_API_KEY")

    # Prepare request data
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }

    data = {
        "model": "qwen-voice-design",
        "input": {
            "action": "query",
            "voice": voice_name
        }
    }

    # URL for Singapore region. For Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
    url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"

    # Send request
    response = requests.post(
        url,
        headers=headers,
        json=data
    )

    if response.status_code == 200:
        result = response.json()

        # Check for error message
        if "code" in result and result["code"] == "VoiceNotFound":
            print(f"Voice not found: {voice_name}")
            print(f"Error message: {result.get('message', 'Voice not found')}")
            return None

        # Get voice information
        voice_info = result["output"]
        print("Successfully queried voice information:")
        print(f" Voice Name: {voice_info.get('voice')}")
        print(f" Created: {voice_info.get('gmt_create')}")
        print(f" Modified: {voice_info.get('gmt_modified')}")
        print(f" Language: {voice_info.get('language')}")
        print(f" Preview Text: {voice_info.get('preview_text')}")
        print(f" Model: {voice_info.get('target_model')}")
        print(f" Voice Prompt: {voice_info.get('voice_prompt')}")

        return voice_info
    else:
        print(f"Request failed. Status code: {response.status_code}")
        print(f"Response: {response.text}")
        return None


def main():
    # Example: Query a voice
    voice_name = "myvoice"  # Replace with the actual voice name you want to query

    print(f"Querying voice: {voice_name}")
    voice_info = query_voice(voice_name)

    if voice_info:
        print("\nVoice query successful!")
    else:
        print("\nVoice query failed or voice does not exist.")


if __name__ == "__main__":
    main()
Java
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class Main {

    public static void main(String[] args) {
        Main example = new Main();
        // Example: Query a voice
        String voiceName = "myvoice"; // Replace with the actual voice name you want to query
        System.out.println("Querying voice: " + voiceName);
        example.queryVoice(voiceName);
    }

    public void queryVoice(String voiceName) {
        // API keys differ between Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
        // If you haven't set an environment variable, replace the line below with: String apiKey = "sk-xxx";
        String apiKey = System.getenv("DASHSCOPE_API_KEY");

        // Create the JSON request body string
        String jsonBody = "{\n" +
                " \"model\": \"qwen-voice-design\",\n" +
                " \"input\": {\n" +
                " \"action\": \"query\",\n" +
                " \"voice\": \"" + voiceName + "\"\n" +
                " }\n" +
                "}";

        HttpURLConnection connection = null;
        try {
            // URL for Singapore region. For Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
            URL url = new URL("https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization");
            connection = (HttpURLConnection) url.openConnection();

            // Set request method and headers
            connection.setRequestMethod("POST");
            connection.setRequestProperty("Authorization", "Bearer " + apiKey);
            connection.setRequestProperty("Content-Type", "application/json");
            connection.setDoOutput(true);
            connection.setDoInput(true);

            // Send request body
            try (OutputStream os = connection.getOutputStream()) {
                byte[] input = jsonBody.getBytes("UTF-8");
                os.write(input, 0, input.length);
                os.flush();
            }

            // Get response
            int responseCode = connection.getResponseCode();
            if (responseCode == HttpURLConnection.HTTP_OK) {
                // Read response content
                StringBuilder response = new StringBuilder();
                try (BufferedReader br = new BufferedReader(
                        new InputStreamReader(connection.getInputStream(), "UTF-8"))) {
                    String responseLine;
                    while ((responseLine = br.readLine()) != null) {
                        response.append(responseLine.trim());
                    }
                }

                // Parse JSON response
                JsonObject jsonResponse = JsonParser.parseString(response.toString()).getAsJsonObject();

                // Check for error message
                if (jsonResponse.has("code") && "VoiceNotFound".equals(jsonResponse.get("code").getAsString())) {
                    String errorMessage = jsonResponse.has("message") ?
                            jsonResponse.get("message").getAsString() : "Voice not found";
                    System.out.println("Voice not found: " + voiceName);
                    System.out.println("Error message: " + errorMessage);
                    return;
                }

                // Get voice information
                JsonObject outputObj = jsonResponse.getAsJsonObject("output");

                System.out.println("Successfully queried voice information:");
                System.out.println(" Voice Name: " + outputObj.get("voice").getAsString());
                System.out.println(" Created: " + outputObj.get("gmt_create").getAsString());
                System.out.println(" Modified: " + outputObj.get("gmt_modified").getAsString());
                System.out.println(" Language: " + outputObj.get("language").getAsString());
                System.out.println(" Preview Text: " + outputObj.get("preview_text").getAsString());
                System.out.println(" Model: " + outputObj.get("target_model").getAsString());
                System.out.println(" Voice Prompt: " + outputObj.get("voice_prompt").getAsString());

            } else {
                // Read error response
                StringBuilder errorResponse = new StringBuilder();
                try (BufferedReader br = new BufferedReader(
                        new InputStreamReader(connection.getErrorStream(), "UTF-8"))) {
                    String responseLine;
                    while ((responseLine = br.readLine()) != null) {
                        errorResponse.append(responseLine.trim());
                    }
                }

                System.out.println("Request failed. Status code: " + responseCode);
                System.out.println("Error response: " + errorResponse.toString());
            }

        } catch (Exception e) {
            System.err.println("Request error: " + e.getMessage());
            e.printStackTrace();
        } finally {
            if (connection != null) {
                connection.disconnect();
            }
        }
    }
}
Delete voice
Deletes a specified voice and releases the corresponding quota.
URL
China (Beijing):
POST https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
International (Singapore):
POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization
Request header
Parameter | Type | Required | Description
Authorization | string | Yes | An authentication token in the format Bearer <your_api_key>. Replace <your_api_key> with your actual API key.
Content-Type | string | Yes | The media type of the request body. The value is fixed at application/json.
Request body
The request body contains all request parameters. You can omit optional fields as needed.
Important: model specifies the voice design model. Its value is fixed at qwen-voice-design. Do not modify this value.
{
    "model": "qwen-voice-design",
    "input": {
        "action": "delete",
        "voice": "yourVoice"
    }
}
Request parameters
Parameter | Type | Default | Required | Description
model | string | - | Yes | The voice design model. Fixed value: qwen-voice-design.
action | string | - | Yes | The operation type. Fixed value: delete.
voice | string | - | Yes | The name of the voice to delete.
Response parameters
The key parameters are listed below.
Parameter | Type | Description
request_id | string | The request ID.
voice | string | The name of the deleted voice.
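Because deleting a voice releases its quota, you may need to delete several voices at once. The following is a minimal sketch of a batch delete that classifies each outcome. It reuses the success checks from the delete samples in this section (a code containing VoiceNotFound is treated as already deleted, and a response containing usage is treated as deleted); those checks come from the sample code, not from a documented response contract. The sending function is passed in as post so the logic can be tried offline; delete_voices and post are hypothetical names.

```python
def delete_voices(voice_names, post):
    """Delete each named voice and group the names by outcome.

    post(payload) must send the request and return the parsed JSON body.
    """
    outcome = {"deleted": [], "already_gone": [], "unexpected": []}
    for name in voice_names:
        result = post({"model": "qwen-voice-design",
                       "input": {"action": "delete", "voice": name}})
        # `or ""` guards against a missing or null code field
        if "VoiceNotFound" in (result.get("code") or ""):
            outcome["already_gone"].append(name)
        elif "usage" in result:
            outcome["deleted"].append(name)
        else:
            outcome["unexpected"].append(name)
    return outcome
```

Treating VoiceNotFound as "already gone" makes the batch safe to rerun after a partial failure.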
Sample code
Important: model specifies the voice design model. Its value is fixed at qwen-voice-design. Do not modify this value.
cURL
If you have not set the API key as an environment variable, replace $DASHSCOPE_API_KEY in the example with your actual API key.
# ======= Important =======
# The URL below is for the Singapore region. For the Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
# API keys differ between Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
# === Delete this comment before execution ===

curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen-voice-design",
    "input": {
        "action": "delete",
        "voice": "yourVoice"
    }
}'
Python
import requests
import os


def delete_voice(voice_name):
    """
    Delete a specified voice
    :param voice_name: The name of the voice
    :return: True if deletion is successful or the voice does not exist but the request succeeds, False if the operation fails
    """
    # API keys differ between Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    # If you haven't set an environment variable, replace the line below with: api_key = "sk-xxx"
    api_key = os.getenv("DASHSCOPE_API_KEY")

    # Prepare request data
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }

    data = {
        "model": "qwen-voice-design",
        "input": {
            "action": "delete",
            "voice": voice_name
        }
    }

    # URL for Singapore region. For Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
    url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"

    # Send request
    response = requests.post(
        url,
        headers=headers,
        json=data
    )

    if response.status_code == 200:
        result = response.json()

        # Check for error message
        if "code" in result and "VoiceNotFound" in result["code"]:
            print(f"Voice does not exist: {voice_name}")
            print(f"Error message: {result.get('message', 'Voice not found')}")
            return True  # Voice not existing is also a successful operation (target is already gone)

        # Check if deletion was successful
        if "usage" in result:
            print(f"Voice deleted successfully: {voice_name}")
            print(f"Request ID: {result.get('request_id', 'N/A')}")
            return True
        else:
            print(f"Delete operation returned an unexpected format: {result}")
            return False
    else:
        print(f"Delete voice request failed. Status code: {response.status_code}")
        print(f"Response: {response.text}")
        return False


def main():
    # Example: Delete a voice
    voice_name = "myvoice"  # Replace with the actual voice name you want to delete

    print(f"Deleting voice: {voice_name}")
    success = delete_voice(voice_name)

    if success:
        print(f"\nVoice '{voice_name}' deletion operation complete!")
    else:
        print(f"\nVoice '{voice_name}' deletion operation failed!")


if __name__ == "__main__":
    main()
Java
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class Main {

    public static void main(String[] args) {
        Main example = new Main();
        // Example: Delete a voice
        String voiceName = "myvoice"; // Replace with the actual voice name you want to delete
        System.out.println("Deleting voice: " + voiceName);
        example.deleteVoice(voiceName);
    }

    public void deleteVoice(String voiceName) {
        // API keys differ between Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
        // If you haven't set an environment variable, replace the line below with: String apiKey = "sk-xxx";
        String apiKey = System.getenv("DASHSCOPE_API_KEY");

        // Create the JSON request body string
        String jsonBody = "{\n" +
                " \"model\": \"qwen-voice-design\",\n" +
                " \"input\": {\n" +
                " \"action\": \"delete\",\n" +
                " \"voice\": \"" + voiceName + "\"\n" +
                " }\n" +
                "}";

        HttpURLConnection connection = null;
        try {
            // URL for Singapore region. For Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
            URL url = new URL("https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization");
            connection = (HttpURLConnection) url.openConnection();

            // Set request method and headers
            connection.setRequestMethod("POST");
            connection.setRequestProperty("Authorization", "Bearer " + apiKey);
            connection.setRequestProperty("Content-Type", "application/json");
            connection.setDoOutput(true);
            connection.setDoInput(true);

            // Send request body
            try (OutputStream os = connection.getOutputStream()) {
                byte[] input = jsonBody.getBytes("UTF-8");
                os.write(input, 0, input.length);
                os.flush();
            }

            // Get response
            int responseCode = connection.getResponseCode();
            if (responseCode == HttpURLConnection.HTTP_OK) {
                // Read response content
                StringBuilder response = new StringBuilder();
                try (BufferedReader br = new BufferedReader(
                        new InputStreamReader(connection.getInputStream(), "UTF-8"))) {
                    String responseLine;
                    while ((responseLine = br.readLine()) != null) {
                        response.append(responseLine.trim());
                    }
                }

                // Parse JSON response
                JsonObject jsonResponse = JsonParser.parseString(response.toString()).getAsJsonObject();

                // Check for error message
                if (jsonResponse.has("code") && jsonResponse.get("code").getAsString().contains("VoiceNotFound")) {
                    String errorMessage = jsonResponse.has("message") ?
                            jsonResponse.get("message").getAsString() : "Voice not found";
                    System.out.println("Voice does not exist: " + voiceName);
                    System.out.println("Error message: " + errorMessage);
                    // Voice not existing is also a successful operation (target is already gone)
                } else if (jsonResponse.has("usage")) {
                    // Check if deletion was successful
                    System.out.println("Voice deleted successfully: " + voiceName);
                    String requestId = jsonResponse.has("request_id") ?
                            jsonResponse.get("request_id").getAsString() : "N/A";
                    System.out.println("Request ID: " + requestId);
                } else {
                    System.out.println("Delete operation returned an unexpected format: " + response.toString());
                }

            } else {
                // Read error response
                StringBuilder errorResponse = new StringBuilder();
                try (BufferedReader br = new BufferedReader(
                        new InputStreamReader(connection.getErrorStream(), "UTF-8"))) {
                    String responseLine;
                    while ((responseLine = br.readLine()) != null) {
                        errorResponse.append(responseLine.trim());
                    }
                }

                System.out.println("Delete voice request failed. Status code: " + responseCode);
                System.out.println("Error response: " + errorResponse.toString());
            }

        } catch (Exception e) {
            System.err.println("Request error: " + e.getMessage());
            e.printStackTrace();
        } finally {
            if (connection != null) {
                connection.disconnect();
            }
        }
    }
}
Speech synthesis
To learn how to use a custom voice from voice design for personalized speech synthesis, see Getting started: From voice design to speech synthesis.
The speech synthesis models for voice design, such as qwen3-tts-vd-realtime-2025-12-16, support only voices generated by voice design. They do not support public preset voices such as Chelsie, Serena, Ethan, or Cherry.
Voice quota and auto-cleanup
Total limit: 1,000 voices per account
You can check the number of voices (total_count) by calling the List voices operation.
Auto-cleanup: If a voice has not been used in any speech synthesis request in the past year, the system automatically deletes it.
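Because an account can hold at most 1,000 voices and List voices reports the current total_count, you can check the remaining capacity before a batch of creations. A minimal sketch, assuming total_count covers every voice that counts toward the limit; the name remaining_voice_slots is hypothetical.

```python
ACCOUNT_VOICE_LIMIT = 1000  # Total limit per account, from this section


def remaining_voice_slots(total_count, planned_creations=0):
    """Return how many voices can still be created after planned_creations.

    A negative result means that many voices must be deleted first
    (deleting a voice releases its quota).
    """
    if total_count < 0 or planned_creations < 0:
        raise ValueError("counts must be non-negative")
    return ACCOUNT_VOICE_LIMIT - total_count - planned_creations
```

For example, with 950 existing voices, planning 80 new ones gives -30, so 30 voices would have to be deleted first.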
Billing
Voice design and speech synthesis are billed separately:
Voice design: Creating a voice is billed at $0.2 per voice. Failed creations are not billed.
Note: Free quota details (Singapore region only):
You receive a free quota of 1,000 voice creations, valid for 90 days after you activate Alibaba Cloud Model Studio.
Failed creations do not consume the free quota.
Deleting a voice does not restore the free quota.
After the free quota is used up or the 90-day validity period expires, voice creation is billed at $0.2 per voice.
Speech synthesis using custom voices from voice design: Billed based on the number of text characters. For details, see Real-time speech synthesis - Qwen.
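The voice design rules above can be turned into a small cost estimate: only successful creations are billed, they consume any remaining free quota first, and each further creation costs $0.2. The following sketch applies exactly those rules. It does not check whether the free quota is still within its 90-day validity period or available in your region (Singapore only); pass free_remaining=0 when it is not. Speech synthesis charges are billed separately and are not included. The name voice_creation_cost is hypothetical, and Decimal is used so currency amounts are exact.

```python
from decimal import Decimal

PRICE_PER_VOICE = Decimal("0.2")  # USD per successfully created voice


def voice_creation_cost(successful_creations, free_remaining=0):
    """Return (cost_usd, free_quota_left) for a batch of successful creations.

    Failed creations are neither billed nor deducted from the free quota,
    so pass only the number of successful creations.
    """
    if successful_creations < 0 or free_remaining < 0:
        raise ValueError("counts must be non-negative")
    free_used = min(successful_creations, free_remaining)
    billable = successful_creations - free_used
    return billable * PRICE_PER_VOICE, free_remaining - free_used
```

For example, 1,200 successful creations with the full free quota of 1,000 leave 200 billable voices, so voice_creation_cost(1200, 1000) returns a cost of $40.0 and 0 remaining free creations.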
Error messages
If you encounter an error, see Error messages for troubleshooting.