Create custom voices from text descriptions that specify gender, age, tone, and pace. The API returns a reusable voice name and a preview audio clip. Use the voice name with the Speech synthesis - Qwen or Real-time speech synthesis - Qwen API.
Voice creation and synthesis are separate steps. The target_model you specify when creating a voice must match the model you use for synthesis, or synthesis will fail.
Prerequisites
- Get an API key and store it as the DASHSCOPE_API_KEY environment variable.
- Install the latest DashScope SDK (needed only for the SDK examples).
How it works
1. Write a voice description (voice_prompt) and preview text (preview_text).
2. Submit a Create voice request with your chosen target_model.
3. The API returns a voice name and Base64-encoded preview audio.
4. Listen to the preview. If you are satisfied, use the voice name with the speech synthesis API. Otherwise, create a new voice.

The synthesis model you use must be the same as the target_model from step 2.

Supported models
Voice design requires two models: a design model and a target speech synthesis model.
| Model | Value | Use with |
|---|---|---|
| Voice design model | qwen-voice-design | All voice design operations (fixed value) |
| Real-time synthesis target | qwen3-tts-vd-realtime-2026-01-15 | Real-time speech synthesis - Qwen |
| Real-time synthesis target (earlier version) | qwen3-tts-vd-realtime-2025-12-16 | Real-time speech synthesis - Qwen |
| Non-real-time synthesis target | qwen3-tts-vd-2026-01-26 | Speech synthesis - Qwen |
The voice design target models (qwen3-tts-vd-*) support only voices created through voice design. System voices (Chelsie, Serena, Ethan, Cherry) are not supported.

Language support
Supported languages for voice creation and speech synthesis:
| Code | Language |
|---|---|
| zh | Chinese |
| en | English |
| de | German |
| it | Italian |
| pt | Portuguese |
| es | Spanish |
| ja | Japanese |
| ko | Korean |
| fr | French |
| ru | Russian |
The voice_prompt description text supports Chinese and English only. The language parameter must match the preview_text language.
Write effective voice descriptions
A voice description (voice_prompt) defines what voice to generate. Combine attributes (gender, age, tone, use case) for a distinctive voice.
Requirements and limitations
- Maximum length: 2,048 characters.
- Supported languages: Chinese and English only.
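A local pre-check can catch limit violations before you send a request. The helper below is a hypothetical sketch and is not part of the DashScope SDK. It makes two assumptions:

- The service counts characters as Unicode code points, the way Python's `len()` does.
- "Chinese or English" can be approximated as ASCII letters plus CJK Unified Ideographs. This is only a rough heuristic.

```python
MAX_VOICE_PROMPT_CHARS = 2048  # documented limit for voice_prompt


def _is_cjk(ch: str) -> bool:
    # CJK Unified Ideographs block only: a rough proxy for Chinese text
    return "\u4e00" <= ch <= "\u9fff"


def validate_voice_prompt(prompt: str) -> str:
    """Return the stripped prompt, or raise ValueError if it breaks a documented limit."""
    cleaned = prompt.strip()
    if not cleaned:
        raise ValueError("voice_prompt must not be empty")
    # Assumption: the service counts code points, as Python's len() does
    if len(cleaned) > MAX_VOICE_PROMPT_CHARS:
        raise ValueError(f"voice_prompt has {len(cleaned)} characters (max {MAX_VOICE_PROMPT_CHARS})")
    # Heuristic: every letter should be ASCII (English) or a CJK ideograph (Chinese)
    foreign = [ch for ch in cleaned if ch.isalpha() and not ch.isascii() and not _is_cjk(ch)]
    if foreign:
        raise ValueError(f"voice_prompt contains letters outside English/Chinese: {foreign[:5]}")
    return cleaned
```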
Description dimensions
| Dimension | Examples |
|---|---|
| Gender | Male, female, neutral |
| Age | Child (5--12), teenager (13--18), young adult (19--35), middle-aged (36--55), elderly (55+) |
| Pitch | High, medium, low, high-pitched, low-pitched |
| Pace | Fast, medium, slow, fast-paced, slow-paced |
| Emotion | Cheerful, calm, gentle, serious, lively, composed, soothing |
| Characteristics | Magnetic, crisp, hoarse, mellow, sweet, rich, powerful |
| Use case | News broadcast, ad voice-over, audiobook, animation character, voice assistant, documentary narration |
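To combine several dimensions consistently, you could generate the description from a template. `build_voice_prompt` below is a hypothetical helper that produces English phrasing similar to the examples in this section. The sentence template is an illustrative choice, not a format the API requires.

```python
from typing import Optional


def _with_article(phrase: str) -> str:
    # Naive English article choice based on the first letter only
    return ("an " if phrase[:1].lower() in "aeiou" else "a ") + phrase


def build_voice_prompt(gender: str, age: str, *, pitch: Optional[str] = None,
                       pace: Optional[str] = None, emotion: Optional[str] = None,
                       characteristics: Optional[str] = None,
                       use_case: Optional[str] = None) -> str:
    """Join several description dimensions into one English voice_prompt."""
    # Leading adjectives: emotion, then age (e.g. "calm, middle-aged")
    adjectives = ", ".join(p for p in (emotion, age) if p)
    head = _with_article(f"{adjectives} {gender} voice" if adjectives else f"{gender} voice")
    prompt = head[0].upper() + head[1:]
    details = []
    if pace:
        details.append(_with_article(f"{pace} pace"))
    if pitch:
        details.append(_with_article(f"{pitch} pitch"))
    if characteristics:
        details.append(_with_article(f"{characteristics} tone"))
    if details:
        if len(details) == 1:
            joined = details[0]
        else:
            joined = ", ".join(details[:-1]) + " and " + details[-1]
        prompt += " with " + joined
    if use_case:
        prompt += f", suitable for {use_case}"
    return prompt + "."
```

For example, `build_voice_prompt("male", "middle-aged", emotion="calm", pace="slow", characteristics="deep, magnetic", use_case="news or documentary narration")` yields a description that covers gender, age, emotion, pace, vocal characteristics, and use case in one sentence.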
Principles for effective descriptions
- Be specific. Use concrete voice qualities like "deep," "crisp," or "fast-paced." Avoid subjective terms like "nice" or "normal."
- Combine multiple dimensions (gender, age, emotion, use case). Single-dimension descriptions like "female voice" are too broad.
- Be objective. Describe physical and perceptual features, not opinions ("high-pitched and energetic", not "my favorite voice").
- Be original. Describe voice qualities instead of imitating a celebrity. Celebrity imitation is not supported and carries copyright risk.
- Be concise. Avoid redundant synonyms and meaningless intensifiers ("a very, very great voice").
Examples
Good descriptions:
- "A young, lively female voice with a fast pace and noticeable upward inflection, suitable for fashion product introductions." Combines age, personality, pace, intonation, and use case.
- "A calm, middle-aged male voice with a slow pace and deep, magnetic tone, suitable for news or documentary narration." Defines gender, age, pace, vocal characteristics, and domain.
- "A cute child's voice, around 8 years old, with a slightly childish tone, suitable for animation character voice-overs." Specifies age, vocal quality, and use case.
- "A gentle, intellectual female voice, around 30 years old, with a calm tone, suitable for audiobook narration." Conveys emotion, style, age, and application clearly.
Ineffective descriptions:
| Description | Issue | Improvement |
|---|---|---|
| "A nice voice" | Too vague and subjective. | "A young female voice with a clear vocal line and gentle tone." |
| "A voice like a certain celebrity" | Celebrity imitation is not supported (copyright risk). | "A mature, magnetic male voice with a calm pace." |
| "A very, very, very nice female voice" | Redundant. Repetition does not improve results. | "A female voice, 20--24 years old, with a light tone, lively pitch, and sweet quality." |
| "123456" | Invalid input. Cannot be parsed as voice features. | Provide a meaningful text description using the dimensions above. |
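A simple keyword check can approximate the issues in this table. `lint_voice_prompt` is a hypothetical heuristic, not a service-side validator. Its keyword lists come from the description dimensions table, and an empty result does not guarantee a good voice.

```python
import re
from typing import List

# Hypothetical keyword lists based on the description dimensions table; not exhaustive.
DIMENSION_KEYWORDS = {
    "gender": {"male", "female", "neutral"},
    "age": {"child", "child's", "teenager", "young", "adult", "middle-aged", "elderly"},
    "pitch": {"high", "low", "high-pitched", "low-pitched", "deep"},
    "pace": {"fast", "slow", "fast-paced", "slow-paced", "pace"},
    "emotion": {"cheerful", "calm", "gentle", "serious", "lively", "composed", "soothing"},
    "characteristics": {"magnetic", "crisp", "hoarse", "mellow", "sweet", "rich", "powerful"},
}
VAGUE_WORDS = {"nice", "normal", "good", "great"}


def lint_voice_prompt(prompt: str) -> List[str]:
    """Return heuristic warnings for a voice_prompt (an empty list means none found)."""
    words = re.findall(r"[a-z][a-z'-]*", prompt.lower())
    if not words:
        return ["no words found; the text cannot be read as voice features"]
    warnings = []
    # Back-to-back duplicates such as "very, very"
    if any(a == b for a, b in zip(words, words[1:])):
        warnings.append("repeated words; repetition does not improve results")
    vague = sorted(VAGUE_WORDS.intersection(words))
    if vague:
        warnings.append(f"subjective terms: {', '.join(vague)}")
    covered = [dim for dim, keys in DIMENSION_KEYWORDS.items() if keys.intersection(words)]
    if len(covered) < 2:
        warnings.append("covers fewer than two dimensions; combine gender, age, tone, or use case")
    return warnings
```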
API reference
All operations use the same endpoint and authentication. Specify the operation with the action parameter.
Common request details
Endpoint
| Region | URL |
|---|---|
| Chinese mainland | POST https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization |
| Singapore (International) | POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization |
Request headers
| Header | Type | Required | Description |
|---|---|---|---|
| Authorization | string | Yes | Bearer <your-api-key> |
| Content-Type | string | Yes | application/json |
Use one account for all voice design and synthesis operations.
Create a voice
Create a custom voice from text and get a preview audio clip.
Request syntax
{
"model": "qwen-voice-design",
"input": {
"action": "create",
"target_model": "<target-synthesis-model>",
"voice_prompt": "<voice-description>",
"preview_text": "<text-for-preview-audio>",
"preferred_name": "<keyword-for-voice-name>",
"language": "<language-code>"
},
"parameters": {
"sample_rate": 24000,
"response_format": "wav"
}
}
model is the voice design model (always qwen-voice-design). target_model is the synthesis model for the created voice. Do not confuse them.
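The voice and the synthesis call are tied together through target_model, so a client-side check can fail fast before synthesis. `check_synthesis_model` is a hypothetical helper. Its model list is copied from the Supported models table and needs updating when new target models are released. You can get a voice's target_model from the Create voice response or from Query a voice.

```python
# Target models from the Supported models table, grouped by the synthesis API they work with
TARGET_MODELS = {
    "qwen3-tts-vd-realtime-2026-01-15": "realtime",
    "qwen3-tts-vd-realtime-2025-12-16": "realtime",
    "qwen3-tts-vd-2026-01-26": "non-realtime",
}
DESIGN_MODEL = "qwen-voice-design"


def check_synthesis_model(voice_target_model: str, synthesis_model: str) -> str:
    """Return "realtime" or "non-realtime", or raise ValueError if the models do not line up."""
    if DESIGN_MODEL in (voice_target_model, synthesis_model):
        raise ValueError("qwen-voice-design is the design model, not a synthesis model")
    if voice_target_model not in TARGET_MODELS:
        raise ValueError(f"unknown target_model: {voice_target_model}")
    if synthesis_model != voice_target_model:
        raise ValueError(
            f"voice was created for {voice_target_model}, but synthesis uses {synthesis_model}"
        )
    return TARGET_MODELS[voice_target_model]
```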
Request parameters
| Parameter | Type | Default | Required | Description |
|---|---|---|---|---|
| model | string | -- | Yes | Voice design model. Fixed to qwen-voice-design. |
| action | string | -- | Yes | Operation type. Fixed to create. |
| target_model | string | -- | Yes | Synthesis model for the voice. Must match subsequent synthesis calls. Values: qwen3-tts-vd-realtime-2026-01-15 or qwen3-tts-vd-realtime-2025-12-16 (real-time), qwen3-tts-vd-2026-01-26 (non-real-time). |
| voice_prompt | string | -- | Yes | Voice description (max 2,048 chars, Chinese/English only). See Write effective voice descriptions. |
| preview_text | string | -- | Yes | Preview audio text (max 1,024 chars, supported languages only). |
| preferred_name | string | -- | No | Keyword for the voice name (alphanumeric/underscores, max 16 chars). Appears in the generated voice name. Example: announcer → qwen-tts-vd-announcer-voice-20251201102800-a1b2. |
| language | string | zh | No | Language code for the generated voice. Must match preview_text. Values: zh, en, de, it, pt, es, ja, ko, fr, ru. |
| sample_rate | int | 24000 | No | Preview audio sample rate (Hz). Values: 8000, 16000, 24000, 48000. |
| response_format | string | wav | No | Preview audio format. Values: pcm, wav, mp3, opus. |
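You can enforce the constraints in this table when building the request body. `build_create_payload` is a hypothetical helper with two limits:

- It assumes "alphanumeric" in preferred_name means ASCII letters and digits.
- It cannot check that preview_text is actually written in the selected language.

language defaults to zh, so pass it explicitly for non-Chinese preview text.

```python
import re
from typing import Any, Dict, Optional

# Allowed values copied from the Request parameters table
SUPPORTED_LANGUAGES = {"zh", "en", "de", "it", "pt", "es", "ja", "ko", "fr", "ru"}
SAMPLE_RATES = {8000, 16000, 24000, 48000}
RESPONSE_FORMATS = {"pcm", "wav", "mp3", "opus"}
# Assumption: "alphanumeric" means ASCII letters and digits
PREFERRED_NAME = re.compile(r"[A-Za-z0-9_]{1,16}")


def build_create_payload(target_model: str, voice_prompt: str, preview_text: str, *,
                         preferred_name: Optional[str] = None, language: str = "zh",
                         sample_rate: int = 24000, response_format: str = "wav") -> Dict[str, Any]:
    """Build a Create voice request body, raising ValueError on documented limit violations."""
    if not voice_prompt or len(voice_prompt) > 2048:
        raise ValueError("voice_prompt must be 1-2,048 characters")
    if not preview_text or len(preview_text) > 1024:
        raise ValueError("preview_text must be 1-1,024 characters")
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {language}")
    if sample_rate not in SAMPLE_RATES:
        raise ValueError(f"unsupported sample_rate: {sample_rate}")
    if response_format not in RESPONSE_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format}")
    body_input = {
        "action": "create",
        "target_model": target_model,
        "voice_prompt": voice_prompt,
        "preview_text": preview_text,
        "language": language,
    }
    # preferred_name is optional; omit it entirely when not given
    if preferred_name is not None:
        if not PREFERRED_NAME.fullmatch(preferred_name):
            raise ValueError("preferred_name: letters, digits, underscores; max 16 characters")
        body_input["preferred_name"] = preferred_name
    return {
        "model": "qwen-voice-design",
        "input": body_input,
        "parameters": {"sample_rate": sample_rate, "response_format": response_format},
    }
```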
Response parameters
| Parameter | Type | Description |
|---|---|---|
| voice | string | Generated voice name. Use as the voice parameter in synthesis API calls. |
| preview_audio.data | string | Base64-encoded preview audio. |
| preview_audio.sample_rate | int | Preview audio sample rate (request value or default: 24000). |
| preview_audio.response_format | string | Preview audio format (request value or default: wav). |
| target_model | string | Synthesis model for this voice. |
| usage.count | int | Number of voices created (always 1 on success). Each voice creation costs $0.2. |
| request_id | string | Request ID for troubleshooting. |
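The sample code reads these fields from the output object of the response body, for example result["output"]["voice"]. The helper below is a hypothetical sketch that follows that nesting. It decodes the preview audio and derives a file name from the returned format. With response_format set to pcm, the result is raw samples with no file header, so most players cannot open it directly.

```python
import base64
from typing import Any, Dict, Tuple


def parse_create_response(body: Dict[str, Any]) -> Tuple[str, bytes, str]:
    """Return (voice name, decoded preview audio, suggested file name) from a Create voice response."""
    output = body["output"]
    preview = output["preview_audio"]
    audio = base64.b64decode(preview["data"])
    # Fall back to the documented default format if the field is missing
    audio_format = preview.get("response_format", "wav")
    voice = output["voice"]
    return voice, audio, f"{voice}_preview.{audio_format}"
```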
List voices
Returns a paginated list of all voices created under your account.
Request syntax
{
"model": "qwen-voice-design",
"input": {
"action": "list",
"page_size": 10,
"page_index": 0
}
}
Request parameters
| Parameter | Type | Default | Required | Description |
|---|---|---|---|---|
| model | string | -- | Yes | Fixed to qwen-voice-design. |
| action | string | -- | Yes | Fixed to list. |
| page_index | integer | 0 | No | Page number. Range: 0--200. |
| page_size | integer | 10 | No | Entries per page. Must be greater than 0. |
Response parameters
| Parameter | Type | Description |
|---|---|---|
| page_index | integer | Current page number. |
| page_size | integer | Entries per page. |
| total_count | integer | Total number of voices. |
| voice_list[].voice | string | Voice name. |
| voice_list[].target_model | string | Speech synthesis model bound to this voice. |
| voice_list[].language | string | Language code. |
| voice_list[].voice_prompt | string | Voice description used during creation. |
| voice_list[].preview_text | string | Preview text used during creation. |
| voice_list[].gmt_create | string | Creation time. |
| voice_list[].gmt_modified | string | Last modification time. |
| request_id | string | Request ID. |
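page_index is zero-based and capped at 200, so retrieving every voice means requesting pages until total_count entries have arrived. `iter_voices` is a hypothetical sketch. `fetch_page` stands for any function that sends the list request for a given page and returns the response's output object. This keeps the pagination logic separate from HTTP.

```python
from typing import Any, Callable, Dict, Iterator

MAX_PAGE_INDEX = 200  # documented upper bound for page_index


def iter_voices(fetch_page: Callable[[int, int], Dict[str, Any]],
                page_size: int = 10) -> Iterator[Dict[str, Any]]:
    """Yield every voice_list entry, requesting pages until total_count entries have been seen."""
    if page_size <= 0:
        raise ValueError("page_size must be greater than 0")
    seen = 0
    for page_index in range(MAX_PAGE_INDEX + 1):
        output = fetch_page(page_index, page_size)
        voices = output.get("voice_list", [])
        yield from voices
        seen += len(voices)
        # Stop on an empty page or once all reported voices have been returned
        if not voices or seen >= output.get("total_count", 0):
            return
```

For example, fetch_page could call requests.post with the List voices request body and return response.json()["output"].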
Query a voice
Returns detailed information about a specific voice.
Request syntax
{
"model": "qwen-voice-design",
"input": {
"action": "query",
"voice": "<voice-name>"
}
}
Request parameters
| Parameter | Type | Default | Required | Description |
|---|---|---|---|---|
| model | string | -- | Yes | Fixed to qwen-voice-design. |
| action | string | -- | Yes | Fixed to query. |
| voice | string | -- | Yes | Voice name to query. |
Response parameters
| Parameter | Type | Description |
|---|---|---|
| voice | string | Voice name. |
| target_model | string | Speech synthesis model bound to this voice. |
| language | string | Language code. |
| voice_prompt | string | Voice description. |
| preview_text | string | Preview text. |
| gmt_create | string | Creation time. |
| gmt_modified | string | Last modification time. |
| request_id | string | Request ID. |
Delete a voice
Deletes a voice and releases the corresponding quota.
Request syntax
{
"model": "qwen-voice-design",
"input": {
"action": "delete",
"voice": "<voice-name>"
}
}
Request parameters
| Parameter | Type | Default | Required | Description |
|---|---|---|---|---|
| model | string | -- | Yes | Fixed to qwen-voice-design. |
| action | string | -- | Yes | Fixed to delete. |
| voice | string | -- | Yes | Voice name to delete. |
Response parameters
| Parameter | Type | Description |
|---|---|---|
| voice | string | Deleted voice name. |
| request_id | string | Request ID. |
Sample code
The examples use the Singapore endpoints. For the Chinese mainland, replace the URLs with:
- HTTP: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
- WebSocket: wss://dashscope.aliyuncs.com/api-ws/v1/realtime
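If you switch regions often, a small helper can rewrite the host so you don't have to edit URLs by hand. This sketch assumes the regional endpoints differ only in their hostname, as in the URLs listed above. `for_region` and its region keys are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

# Hostnames from the endpoint table (region keys are hypothetical labels)
REGION_HOSTS = {
    "singapore": "dashscope-intl.aliyuncs.com",
    "chinese_mainland": "dashscope.aliyuncs.com",
}


def for_region(url: str, region: str) -> str:
    """Return url with its DashScope host replaced by the host for region."""
    parts = urlsplit(url)
    if parts.netloc not in REGION_HOSTS.values():
        raise ValueError(f"not a known DashScope host: {parts.netloc}")
    if region not in REGION_HOSTS:
        raise ValueError(f"unknown region: {region}")
    # Scheme (https or wss), path, and query are kept as-is
    return urlunsplit(parts._replace(netloc=REGION_HOSTS[region]))
```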
Create a voice and preview
cURL
curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen-voice-design",
"input": {
"action": "create",
"target_model": "qwen3-tts-vd-realtime-2026-01-15",
"voice_prompt": "A composed middle-aged male announcer with a deep, rich and magnetic voice, a steady speaking speed and clear articulation, suitable for news broadcasting or documentary commentary.",
"preview_text": "Dear listeners, hello everyone. Welcome to the evening news.",
"preferred_name": "announcer",
"language": "en"
},
"parameters": {
"sample_rate": 24000,
"response_format": "wav"
}
}'
Python
```python
import base64
import os

import requests


def create_voice():
    """Create a custom voice and save the preview audio."""
    # Load the API key from the environment variable
    api_key = os.getenv("DASHSCOPE_API_KEY")
    if not api_key:
        print("Error: DASHSCOPE_API_KEY not set.")
        return None, None

    data = {
        "model": "qwen-voice-design",
        "input": {
            "action": "create",
            "target_model": "qwen3-tts-vd-realtime-2026-01-15",
            "voice_prompt": "A composed middle-aged male announcer with a deep, rich "
                            "and magnetic voice, a steady speaking speed and clear "
                            "articulation, suitable for news broadcasting or "
                            "documentary commentary.",
            "preview_text": "Dear listeners, hello everyone. Welcome to the evening news.",
            "preferred_name": "announcer",
            "language": "en"
        },
        "parameters": {
            "sample_rate": 24000,
            "response_format": "wav"
        }
    }

    response = requests.post(
        "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json"
        },
        json=data,
        timeout=60
    )

    if response.status_code == 200:
        result = response.json()
        voice_name = result["output"]["voice"]
        audio_bytes = base64.b64decode(result["output"]["preview_audio"]["data"])

        # Save the preview audio (WAV, as requested in "parameters")
        filename = f"{voice_name}_preview.wav"
        with open(filename, "wb") as f:
            f.write(audio_bytes)
        print(f"Voice created: {voice_name}")
        print(f"Preview saved to: {filename}")
        return voice_name, filename

    print(f"Request failed ({response.status_code}): {response.text}")
    return None, None


if __name__ == "__main__":
    create_voice()
```
Java
Add the Gson dependency to your project:
Maven
<!-- https://mvnrepository.com/artifact/com.google.code.gson/gson -->
<dependency>
<groupId>com.google.code.gson</groupId>
<artifactId>gson</artifactId>
<version>2.13.1</version>
</dependency>
Gradle
// https://mvnrepository.com/artifact/com.google.code.gson/gson
implementation("com.google.code.gson:gson:2.13.1")
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Base64;
public class Main {
public static void main(String[] args) {
new Main().createVoice();
}
public void createVoice() {
// Load API key from environment variable
String apiKey = System.getenv("DASHSCOPE_API_KEY");
String jsonBody = "{\n" +
" \"model\": \"qwen-voice-design\",\n" +
" \"input\": {\n" +
" \"action\": \"create\",\n" +
" \"target_model\": \"qwen3-tts-vd-realtime-2026-01-15\",\n" +
" \"voice_prompt\": \"A composed middle-aged male announcer with a deep, rich and magnetic voice, a steady speaking speed and clear articulation, suitable for news broadcasting or documentary commentary.\",\n" +
" \"preview_text\": \"Dear listeners, hello everyone. Welcome to the evening news.\",\n" +
" \"preferred_name\": \"announcer\",\n" +
" \"language\": \"en\"\n" +
" },\n" +
" \"parameters\": {\n" +
" \"sample_rate\": 24000,\n" +
" \"response_format\": \"wav\"\n" +
" }\n" +
"}";
HttpURLConnection connection = null;
try {
URL url = new URL("https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization");
connection = (HttpURLConnection) url.openConnection();
connection.setRequestMethod("POST");
connection.setRequestProperty("Authorization", "Bearer " + apiKey);
connection.setRequestProperty("Content-Type", "application/json");
connection.setDoOutput(true);
// Send request body
try (OutputStream os = connection.getOutputStream()) {
os.write(jsonBody.getBytes("UTF-8"));
os.flush();
}
int responseCode = connection.getResponseCode();
if (responseCode == HttpURLConnection.HTTP_OK) {
StringBuilder response = new StringBuilder();
try (BufferedReader br = new BufferedReader(
new InputStreamReader(connection.getInputStream(), "UTF-8"))) {
String line;
while ((line = br.readLine()) != null) {
response.append(line.trim());
}
}
// Parse response and save preview audio
JsonObject jsonResponse = JsonParser.parseString(response.toString()).getAsJsonObject();
JsonObject output = jsonResponse.getAsJsonObject("output");
String voiceName = output.get("voice").getAsString();
String base64Audio = output.getAsJsonObject("preview_audio").get("data").getAsString();
byte[] audioBytes = Base64.getDecoder().decode(base64Audio);
String filename = voiceName + "_preview.wav";
try (FileOutputStream fos = new FileOutputStream(filename)) {
fos.write(audioBytes);
}
System.out.println("Voice created: " + voiceName);
System.out.println("Preview saved to: " + filename);
} else {
StringBuilder error = new StringBuilder();
try (BufferedReader br = new BufferedReader(
new InputStreamReader(connection.getErrorStream(), "UTF-8"))) {
String line;
while ((line = br.readLine()) != null) {
error.append(line.trim());
}
}
System.out.println("Request failed (" + responseCode + "): " + error);
}
} catch (Exception e) {
System.err.println("Error: " + e.getMessage());
e.printStackTrace();
} finally {
if (connection != null) connection.disconnect();
}
}
}
Use a custom voice for speech synthesis
Pass the returned voice name as the voice parameter of the synthesis API. The synthesis model must be the same as the target_model used to create the voice.
Bidirectional streaming (real-time)
Uses qwen3-tts-vd-realtime-2026-01-15. See Real-time speech synthesis - Qwen for details.
Python
```python
# pyaudio installation:
#   macOS:   brew install portaudio && pip install pyaudio
#   Ubuntu:  sudo apt-get install python3-pyaudio (or pip install pyaudio)
#   CentOS:  sudo yum install -y portaudio portaudio-devel && pip install pyaudio
#   Windows: python -m pip install pyaudio
import base64
import os
import threading
import time

import dashscope
import pyaudio
from dashscope.audio.qwen_tts_realtime import QwenTtsRealtime, QwenTtsRealtimeCallback, AudioFormat

TEXT_TO_SYNTHESIZE = [
    "Right? I really like this kind of supermarket,",
    "especially during the New Year.",
    "Going to the supermarket",
    "just makes me feel",
    "super, super happy!",
    "I want to buy so many things!"
]


def init_dashscope_api_key():
    """Load the API key from the environment variable."""
    dashscope.api_key = os.getenv("DASHSCOPE_API_KEY")


class MyCallback(QwenTtsRealtimeCallback):
    """Callback that plays streamed TTS audio as it arrives."""

    def __init__(self):
        self.complete_event = threading.Event()
        self._player = pyaudio.PyAudio()
        # 24 kHz, 16-bit, mono: matches PCM_24000HZ_MONO_16BIT below
        self._stream = self._player.open(
            format=pyaudio.paInt16, channels=1, rate=24000, output=True
        )

    def on_open(self) -> None:
        print("[TTS] Connection established")

    def on_close(self, close_status_code, close_msg) -> None:
        self._stream.stop_stream()
        self._stream.close()
        self._player.terminate()
        print(f"[TTS] Connection closed, code={close_status_code}, msg={close_msg}")

    def on_event(self, response: dict) -> None:
        event_type = response.get("type", "")
        if event_type == "session.created":
            print(f'[TTS] Session started: {response["session"]["id"]}')
        elif event_type == "response.audio.delta":
            audio_data = base64.b64decode(response["delta"])
            self._stream.write(audio_data)
        elif event_type == "response.done":
            # qwen_tts_realtime is the module-level client created in __main__
            print(f"[TTS] Response complete, ID: {qwen_tts_realtime.get_last_response_id()}")
        elif event_type == "session.finished":
            print("[TTS] Session finished")
            self.complete_event.set()

    def wait_for_finished(self):
        self.complete_event.wait()


if __name__ == "__main__":
    init_dashscope_api_key()
    callback = MyCallback()
    qwen_tts_realtime = QwenTtsRealtime(
        model="qwen3-tts-vd-realtime-2026-01-15",
        callback=callback,
        url="wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime"
    )
    qwen_tts_realtime.connect()
    qwen_tts_realtime.update_session(
        voice="<your-voice-name>",  # Replace with the voice name returned by voice design
        response_format=AudioFormat.PCM_24000HZ_MONO_16BIT,
        mode="server_commit"
    )
    for text_chunk in TEXT_TO_SYNTHESIZE:
        print(f"[Sending text]: {text_chunk}")
        qwen_tts_realtime.append_text(text_chunk)
        time.sleep(0.1)
    qwen_tts_realtime.finish()
    callback.wait_for_finished()
    print(f"[Metric] session_id={qwen_tts_realtime.get_session_id()}, "
          f"first_audio_delay={qwen_tts_realtime.get_first_audio_delay()}s")
```
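With PCM_24000HZ_MONO_16BIT, each response.audio.delta carries Base64-encoded 16-bit mono samples at 24 kHz. If you would rather save the audio than play it, one option is to collect the delta strings in on_event and write them to a WAV file with Python's standard wave module. `save_deltas_as_wav` and `pcm_duration_ms` are hypothetical helpers. The duration formula, bytes / (sample rate * 2 bytes per sample), is the same calculation the Java player below uses to pace playback.

```python
import base64
import wave
from typing import Iterable

# Matches AudioFormat.PCM_24000HZ_MONO_16BIT in the example above
SAMPLE_RATE = 24000
SAMPLE_WIDTH = 2  # bytes per 16-bit sample
CHANNELS = 1


def pcm_duration_ms(num_bytes: int, sample_rate: int = SAMPLE_RATE) -> float:
    """Playback length of raw 16-bit mono PCM, in milliseconds."""
    return num_bytes * 1000 / (sample_rate * SAMPLE_WIDTH * CHANNELS)


def save_deltas_as_wav(b64_deltas: Iterable[str], path: str, sample_rate: int = SAMPLE_RATE) -> int:
    """Decode Base64 audio deltas, write them to a WAV file, and return the PCM byte count."""
    pcm = b"".join(base64.b64decode(delta) for delta in b64_deltas)
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(CHANNELS)
        wav_file.setsampwidth(SAMPLE_WIDTH)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)
    return len(pcm)
```

In MyCallback, you could replace self._stream.write(audio_data) with an append to a list such as a hypothetical self.chunks. Then call save_deltas_as_wav(callback.chunks, "output.wav") after wait_for_finished().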
Java
import com.alibaba.dashscope.audio.qwen_tts_realtime.*;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.google.gson.JsonObject;
import javax.sound.sampled.*;
import java.util.Base64;
import java.util.Queue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;
public class Main {
private static String[] textToSynthesize = {
"Right? I really like this kind of supermarket,",
"especially during the New Year.",
"Going to the supermarket",
"just makes me feel",
"super, super happy!",
"I want to buy so many things!"
};
// Real-time PCM audio player
public static class RealtimePcmPlayer {
private int sampleRate;
private SourceDataLine line;
private Thread decoderThread;
private Thread playerThread;
private AtomicBoolean stopped = new AtomicBoolean(false);
private Queue<String> b64AudioBuffer = new ConcurrentLinkedQueue<>();
private Queue<byte[]> rawAudioBuffer = new ConcurrentLinkedQueue<>();
public RealtimePcmPlayer(int sampleRate) throws LineUnavailableException {
this.sampleRate = sampleRate;
AudioFormat audioFormat = new AudioFormat(this.sampleRate, 16, 1, true, false);
DataLine.Info info = new DataLine.Info(SourceDataLine.class, audioFormat);
line = (SourceDataLine) AudioSystem.getLine(info);
line.open(audioFormat);
line.start();
decoderThread = new Thread(() -> {
while (!stopped.get()) {
String b64Audio = b64AudioBuffer.poll();
if (b64Audio != null) {
rawAudioBuffer.add(Base64.getDecoder().decode(b64Audio));
} else {
try { Thread.sleep(100); } catch (InterruptedException e) { throw new RuntimeException(e); }
}
}
});
playerThread = new Thread(() -> {
while (!stopped.get()) {
byte[] rawAudio = rawAudioBuffer.poll();
if (rawAudio != null) {
int bytesWritten = 0;
while (bytesWritten < rawAudio.length) {
bytesWritten += line.write(rawAudio, bytesWritten, rawAudio.length - bytesWritten);
}
int audioLength = rawAudio.length / (this.sampleRate * 2 / 1000);
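// Sleep for about the chunk's duration; clamp at 0 so chunks shorter than 10 ms don't trigger a negative sleep
try { Thread.sleep(Math.max(0, audioLength - 10)); } catch (InterruptedException e) { throw new RuntimeException(e); }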
} else {
try { Thread.sleep(100); } catch (InterruptedException e) { throw new RuntimeException(e); }
}
}
});
decoderThread.start();
playerThread.start();
}
public void write(String b64Audio) { b64AudioBuffer.add(b64Audio); }
public void waitForComplete() throws InterruptedException {
while (!b64AudioBuffer.isEmpty() || !rawAudioBuffer.isEmpty()) { Thread.sleep(100); }
line.drain();
}
public void shutdown() throws InterruptedException {
stopped.set(true);
decoderThread.join();
playerThread.join();
if (line != null && line.isRunning()) { line.drain(); line.close(); }
}
}
public static void main(String[] args) throws Exception {
QwenTtsRealtimeParam param = QwenTtsRealtimeParam.builder()
.model("qwen3-tts-vd-realtime-2026-01-15")
.url("wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime")
.apikey(System.getenv("DASHSCOPE_API_KEY"))
.build();
AtomicReference<CountDownLatch> completeLatch = new AtomicReference<>(new CountDownLatch(1));
RealtimePcmPlayer audioPlayer = new RealtimePcmPlayer(24000);
QwenTtsRealtime qwenTtsRealtime = new QwenTtsRealtime(param, new QwenTtsRealtimeCallback() {
@Override
public void onOpen() { }
@Override
public void onEvent(JsonObject message) {
String type = message.get("type").getAsString();
switch (type) {
case "response.audio.delta":
audioPlayer.write(message.get("delta").getAsString());
break;
case "session.finished":
completeLatch.get().countDown();
break;
}
}
@Override
public void onClose(int code, String reason) { }
});
try {
qwenTtsRealtime.connect();
} catch (NoApiKeyException e) {
throw new RuntimeException(e);
}
QwenTtsRealtimeConfig config = QwenTtsRealtimeConfig.builder()
.voice("<your-voice-name>") // Replace with your voice design voice name
.responseFormat(QwenTtsRealtimeAudioFormat.PCM_24000HZ_MONO_16BIT)
.mode("server_commit")
.build();
qwenTtsRealtime.updateSession(config);
for (String text : textToSynthesize) {
qwenTtsRealtime.appendText(text);
Thread.sleep(100);
}
qwenTtsRealtime.finish();
completeLatch.get().await();
audioPlayer.waitForComplete();
audioPlayer.shutdown();
System.exit(0);
}
}
Non-streaming and unidirectional streaming
Uses qwen3-tts-vd-2026-01-26. As with real-time synthesis, create a voice first, then pass the returned voice name to the speech synthesis API with the same model. For non-streaming and unidirectional streaming code examples, see Speech synthesis - Qwen.
Query voices
cURL
# Query a specific voice
curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen-voice-design",
"input": {
"action": "query",
"voice": "<your-voice-name>"
}
}'

# List all voices (paginated)
curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen-voice-design",
"input": {
"action": "list",
"page_size": 10,
"page_index": 0
}
}'
Python
```python
import os

import requests


def query_voice(voice_name):
    """Get details for a specific voice."""
    api_key = os.getenv("DASHSCOPE_API_KEY")
    response = requests.post(
        "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json"
        },
        json={
            "model": "qwen-voice-design",
            "input": {
                "action": "query",
                "voice": voice_name
            }
        },
        timeout=60
    )
    if response.status_code == 200:
        result = response.json()
        print(f"Voice: {result['output']['voice']}")
        print(f"Model: {result['output']['target_model']}")
        print(f"Created: {result['output']['gmt_create']}")
        return result
    # Fall back to an empty dict if the error body is not JSON
    try:
        error = response.json()
    except ValueError:
        error = {}
    if error.get("code") == "VoiceNotFound":
        print(f"Voice not found: {voice_name}")
    else:
        print(f"Request failed ({response.status_code}): {response.text}")
    return None


def list_voices(page_index=0, page_size=10):
    """List all voices with pagination."""
    api_key = os.getenv("DASHSCOPE_API_KEY")
    response = requests.post(
        "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json"
        },
        json={
            "model": "qwen-voice-design",
            "input": {
                "action": "list",
                "page_size": page_size,
                "page_index": page_index
            }
        },
        timeout=60
    )
    if response.status_code == 200:
        result = response.json()
        total = result["output"]["total_count"]
        voices = result["output"]["voice_list"]
        print(f"Total voices: {total}")
        for v in voices:
            print(f"  - {v['voice']} ({v['language']}, {v['target_model']})")
        return result
    print(f"Request failed ({response.status_code}): {response.text}")
    return None


if __name__ == "__main__":
    list_voices()
```
Java
Query a specific voice:
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;
public class Main {
public static void main(String[] args) {
Main example = new Main();
String voiceName = "<your-voice-name>"; // Replace with the actual voice name
System.out.println("Querying voice: " + voiceName);
example.queryVoice(voiceName);
}
public void queryVoice(String voiceName) {
String apiKey = System.getenv("DASHSCOPE_API_KEY");
String jsonBody = "{\n" +
" \"model\": \"qwen-voice-design\",\n" +
" \"input\": {\n" +
" \"action\": \"query\",\n" +
" \"voice\": \"" + voiceName + "\"\n" +
" }\n" +
"}";
HttpURLConnection connection = null;
try {
URL url = new URL("https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization");
connection = (HttpURLConnection) url.openConnection();
connection.setRequestMethod("POST");
connection.setRequestProperty("Authorization", "Bearer " + apiKey);
connection.setRequestProperty("Content-Type", "application/json");
connection.setDoOutput(true);
connection.setDoInput(true);
try (OutputStream os = connection.getOutputStream()) {
byte[] input = jsonBody.getBytes("UTF-8");
os.write(input, 0, input.length);
os.flush();
}
int responseCode = connection.getResponseCode();
if (responseCode == HttpURLConnection.HTTP_OK) {
StringBuilder response = new StringBuilder();
try (BufferedReader br = new BufferedReader(
new InputStreamReader(connection.getInputStream(), "UTF-8"))) {
String responseLine;
while ((responseLine = br.readLine()) != null) {
response.append(responseLine.trim());
}
}
JsonObject jsonResponse = JsonParser.parseString(response.toString()).getAsJsonObject();
if (jsonResponse.has("code") && "VoiceNotFound".equals(jsonResponse.get("code").getAsString())) {
String errorMessage = jsonResponse.has("message") ?
jsonResponse.get("message").getAsString() : "Voice not found";
System.out.println("Voice not found: " + voiceName);
System.out.println("Error message: " + errorMessage);
return;
}
JsonObject outputObj = jsonResponse.getAsJsonObject("output");
System.out.println("Successfully queried voice information:");
System.out.println(" Voice Name: " + outputObj.get("voice").getAsString());
System.out.println(" Creation Time: " + outputObj.get("gmt_create").getAsString());
System.out.println(" Modification Time: " + outputObj.get("gmt_modified").getAsString());
System.out.println(" Language: " + outputObj.get("language").getAsString());
System.out.println(" Preview Text: " + outputObj.get("preview_text").getAsString());
System.out.println(" Model: " + outputObj.get("target_model").getAsString());
System.out.println(" Voice Description: " + outputObj.get("voice_prompt").getAsString());
} else {
StringBuilder errorResponse = new StringBuilder();
try (BufferedReader br = new BufferedReader(
new InputStreamReader(connection.getErrorStream(), "UTF-8"))) {
String responseLine;
while ((responseLine = br.readLine()) != null) {
errorResponse.append(responseLine.trim());
}
}
System.out.println("Request failed with status code: " + responseCode);
System.out.println("Error response: " + errorResponse.toString());
}
} catch (Exception e) {
System.err.println("An error occurred during the request: " + e.getMessage());
e.printStackTrace();
} finally {
if (connection != null) {
connection.disconnect();
}
}
}
}
List all voices (paginated):
import com.google.gson.Gson;
import com.google.gson.JsonArray;
import com.google.gson.JsonObject;
import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;
public class Main {
public static void main(String[] args) {
String apiKey = System.getenv("DASHSCOPE_API_KEY");
String apiUrl = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization";
String jsonPayload =
"{"
+ "\"model\": \"qwen-voice-design\","
+ "\"input\": {"
+ "\"action\": \"list\","
+ "\"page_size\": 10,"
+ "\"page_index\": 0"
+ "}"
+ "}";
try {
HttpURLConnection con = (HttpURLConnection) new URL(apiUrl).openConnection();
con.setRequestMethod("POST");
con.setRequestProperty("Authorization", "Bearer " + apiKey);
con.setRequestProperty("Content-Type", "application/json");
con.setDoOutput(true);
try (OutputStream os = con.getOutputStream()) {
os.write(jsonPayload.getBytes("UTF-8"));
}
int status = con.getResponseCode();
BufferedReader br = new BufferedReader(new InputStreamReader(
status >= 200 && status < 300 ? con.getInputStream() : con.getErrorStream(), "UTF-8"));
StringBuilder response = new StringBuilder();
String line;
while ((line = br.readLine()) != null) {
response.append(line);
}
br.close();
System.out.println("HTTP Status Code: " + status);
if (status == 200) {
Gson gson = new Gson();
JsonObject jsonObj = gson.fromJson(response.toString(), JsonObject.class);
JsonArray voiceList = jsonObj.getAsJsonObject("output").getAsJsonArray("voice_list");
System.out.println("\nQueried voice list:");
for (int i = 0; i < voiceList.size(); i++) {
JsonObject voiceItem = voiceList.get(i).getAsJsonObject();
String voice = voiceItem.get("voice").getAsString();
String gmtCreate = voiceItem.get("gmt_create").getAsString();
String targetModel = voiceItem.get("target_model").getAsString();
System.out.printf("- Voice: %s Creation Time: %s Model: %s\n",
voice, gmtCreate, targetModel);
}
}
} catch (Exception e) {
e.printStackTrace();
}
}
}
Delete a voice
cURL
curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization \
  -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen-voice-design",
    "input": {
      "action": "delete",
      "voice": "<your-voice-name>"
    }
  }'
Python
import requests
import os


def delete_voice(voice_name):
    """Delete a voice and release the quota."""
    api_key = os.getenv("DASHSCOPE_API_KEY")
    response = requests.post(
        "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json"
        },
        json={
            "model": "qwen-voice-design",
            "input": {
                "action": "delete",
                "voice": voice_name
            }
        }
    )
    if response.status_code == 200:
        print(f"Deleted: {voice_name}")
        return True
    else:
        print(f"Request failed ({response.status_code}): {response.text}")
        return False


if __name__ == "__main__":
    delete_voice("<your-voice-name>")
Java
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;

public class Main {
    public static void main(String[] args) {
        Main example = new Main();
        String voiceName = "<your-voice-name>"; // Replace with the actual voice name
        System.out.println("Deleting voice: " + voiceName);
        example.deleteVoice(voiceName);
    }

    public void deleteVoice(String voiceName) {
        String apiKey = System.getenv("DASHSCOPE_API_KEY");
        String jsonBody = "{\n" +
                " \"model\": \"qwen-voice-design\",\n" +
                " \"input\": {\n" +
                " \"action\": \"delete\",\n" +
                " \"voice\": \"" + voiceName + "\"\n" +
                " }\n" +
                "}";
        HttpURLConnection connection = null;
        try {
            URL url = new URL("https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization");
            connection = (HttpURLConnection) url.openConnection();
            connection.setRequestMethod("POST");
            connection.setRequestProperty("Authorization", "Bearer " + apiKey);
            connection.setRequestProperty("Content-Type", "application/json");
            connection.setDoOutput(true);
            connection.setDoInput(true);
            try (OutputStream os = connection.getOutputStream()) {
                byte[] input = jsonBody.getBytes("UTF-8");
                os.write(input, 0, input.length);
                os.flush();
            }
            int responseCode = connection.getResponseCode();
            if (responseCode == HttpURLConnection.HTTP_OK) {
                StringBuilder response = new StringBuilder();
                try (BufferedReader br = new BufferedReader(
                        new InputStreamReader(connection.getInputStream(), "UTF-8"))) {
                    String responseLine;
                    while ((responseLine = br.readLine()) != null) {
                        response.append(responseLine.trim());
                    }
                }
                JsonObject jsonResponse = JsonParser.parseString(response.toString()).getAsJsonObject();
                if (jsonResponse.has("code") && jsonResponse.get("code").getAsString().contains("VoiceNotFound")) {
                    String errorMessage = jsonResponse.has("message") ?
                            jsonResponse.get("message").getAsString() : "Voice not found";
                    System.out.println("Voice does not exist: " + voiceName);
                    System.out.println("Error message: " + errorMessage);
                } else if (jsonResponse.has("usage")) {
                    System.out.println("Voice deleted successfully: " + voiceName);
                    String requestId = jsonResponse.has("request_id") ?
                            jsonResponse.get("request_id").getAsString() : "N/A";
                    System.out.println("Request ID: " + requestId);
                } else {
                    System.out.println("Unexpected response format: " + response.toString());
                }
            } else {
                StringBuilder errorResponse = new StringBuilder();
                try (BufferedReader br = new BufferedReader(
                        new InputStreamReader(connection.getErrorStream(), "UTF-8"))) {
                    String responseLine;
                    while ((responseLine = br.readLine()) != null) {
                        errorResponse.append(responseLine.trim());
                    }
                }
                System.out.println("Request failed with status code: " + responseCode);
                System.out.println("Error response: " + errorResponse.toString());
            }
        } catch (Exception e) {
            System.err.println("An error occurred during the request: " + e.getMessage());
            e.printStackTrace();
        } finally {
            if (connection != null) {
                connection.disconnect();
            }
        }
    }
}
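Deleting a voice releases quota. A common maintenance task is therefore removing voices created for a target_model you no longer synthesize with, for example the earlier real-time snapshot qwen3-tts-vd-realtime-2025-12-16. The following Python sketch shows that select-then-delete flow. It assumes voice_list is the output.voice_list array from a List voices response, where each entry carries voice and target_model fields, as read by the Java list example. select_voices_for_model, build_delete_payload, and delete_voice are hypothetical helpers built on urllib, not SDK functions.

```python
import json
import os
import urllib.error
import urllib.request

API_URL = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"


def select_voices_for_model(voice_list, target_model):
    """Return the names of voices whose target_model equals the given model."""
    return [item["voice"] for item in voice_list if item.get("target_model") == target_model]


def build_delete_payload(voice_name):
    """Build the JSON body for a delete request (same fields as the examples above)."""
    return {"model": "qwen-voice-design", "input": {"action": "delete", "voice": voice_name}}


def delete_voice(voice_name):
    """Send one delete request; return True on HTTP 200, False otherwise."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(build_delete_payload(voice_name)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ['DASHSCOPE_API_KEY']}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    try:
        with urllib.request.urlopen(request) as response:
            return response.status == 200
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for non-2xx responses, e.g. 400 VoiceNotFound.
        print(f"Delete failed for {voice_name} ({err.code}): {err.read().decode('utf-8')}")
        return False


if __name__ == "__main__":
    # Fill voice_list from a List voices response (output.voice_list).
    voice_list = []
    for name in select_voices_for_model(voice_list, "qwen3-tts-vd-realtime-2025-12-16"):
        delete_voice(name)
```

Selection is kept separate from deletion, so you can print and review the candidate list before sending any delete requests.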
Voice quota and automatic cleanup
- Limit: 1,000 voices per account. Check your current count with the total_count field in the List voices response.
- Automatic cleanup: Voices that have not been used in the past year are deleted automatically.
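Before creating voices in bulk, you can compare total_count against the 1,000-voice limit. This Python sketch assumes total_count is returned under output, alongside voice_list. Confirm the field location against an actual List voices response. remaining_quota and can_create are hypothetical helper names.

```python
VOICE_QUOTA = 1000  # per-account voice limit stated above


def remaining_quota(list_response, quota=VOICE_QUOTA):
    """Return how many more voices can be created, based on total_count.

    Assumes total_count sits under output in the parsed List voices response.
    """
    total = int(list_response["output"]["total_count"])
    return max(quota - total, 0)


def can_create(list_response, needed=1, quota=VOICE_QUOTA):
    """Return True if at least `needed` more voices fit under the quota."""
    return remaining_quota(list_response, quota) >= needed


if __name__ == "__main__":
    # Illustrative shape only; pass the parsed body of a real List voices response.
    sample = {"output": {"total_count": 998, "voice_list": []}}
    print(remaining_quota(sample))  # 2
```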
Billing
Voice design and speech synthesis are billed separately.
Voice creation: $0.2 per successfully created voice. Failed creation requests are not billed.
Speech synthesis with custom voices: Billed per character. For pricing details, see Real-time speech synthesis - Qwen or Speech synthesis - Qwen.
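Only successful creations are charged, so the voice-design part of a bill is the number of successful creations multiplied by $0.2. The following Python sketch encodes that rule using Decimal to avoid floating-point rounding. It deliberately leaves out per-character synthesis charges, whose rates are on the linked pricing pages. voice_creation_cost is a hypothetical helper name.

```python
from decimal import Decimal

CREATION_PRICE_USD = Decimal("0.2")  # per successfully created voice, as listed above


def voice_creation_cost(successful, failed=0):
    """Return the voice-design charge in USD; failed creations are free."""
    if successful < 0 or failed < 0:
        raise ValueError("counts must be non-negative")
    # `failed` is accepted for bookkeeping but adds nothing to the charge.
    return CREATION_PRICE_USD * successful


if __name__ == "__main__":
    # 12 attempts: 10 succeeded, 2 failed.
    print(voice_creation_cost(successful=10, failed=2))  # 2.0
```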
Error handling
A failed request returns a response body with code and message fields. See Error messages for the full reference.
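Client code can turn the code and message fields into an exception, so callers can branch on specific error codes such as VoiceNotFound. The following Python sketch assumes the error body is a JSON object with top-level code and message keys. VoiceDesignError and raise_for_error are hypothetical names. The fallback code Unknown, used when a body is not a JSON object, is a placeholder introduced here, not an API value.

```python
import json


class VoiceDesignError(Exception):
    """Hypothetical exception carrying the HTTP status and the API's code and message."""

    def __init__(self, status, code, message):
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code
        self.message = message


def raise_for_error(status, body_text):
    """Return the parsed body for 2xx responses; raise VoiceDesignError otherwise."""
    try:
        body = json.loads(body_text)
    except json.JSONDecodeError:
        body = {}
    if not isinstance(body, dict):
        body = {}
    if 200 <= status < 300:
        return body
    raise VoiceDesignError(status, body.get("code", "Unknown"), body.get("message", body_text))


if __name__ == "__main__":
    try:
        raise_for_error(400, '{"code": "VoiceNotFound"}')
    except VoiceDesignError as err:
        print(err.code)  # VoiceNotFound
```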
Common errors for voice design:
| HTTP status | Error code | Cause | Resolution |
|---|---|---|---|
| 400 | VoiceNotFound | The specified voice does not exist. | Check the voice name with the List voices or Query a voice API. |