Qwen voice cloning uses a feature extraction model to clone voices without training. Provide 10 to 20 seconds of audio to generate a highly similar, natural-sounding custom voice. Voice cloning and speech synthesis are sequential steps. This document covers voice cloning parameters and interface details. For speech synthesis, see Real-time speech synthesis - Qwen or Speech synthesis - Qwen.
User guide: For model descriptions and selection recommendations, see Real-time speech synthesis - Qwen or Speech synthesis - Qwen.
This document applies only to the Qwen voice cloning interface. If you use the CosyVoice model, see CosyVoice voice cloning API.
Audio requirements
High-quality input audio is essential for optimal cloning results.
- Supported formats: WAV (16-bit), MP3, M4A
- Audio duration: recommended 10–20 seconds; maximum 60 seconds
- File size: < 10 MB
- Sample rate: ≥ 24 kHz
- Sound channel: mono
- Content: the audio must contain at least 3 seconds of continuous, clear speech with no background noise. The rest may include short pauses (≤ 2 seconds). Avoid background music, noise, or other voices throughout the clip. Use normal speech as input; do not upload songs or singing, or the cloned voice may be inaccurate or unusable.
- Language: Chinese (zh), English (en), German (de), Italian (it), Portuguese (pt), Spanish (es), Japanese (ja), Korean (ko), French (fr), Russian (ru)
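As a pre-flight check, the WAV requirements above can be verified locally before uploading. This is a minimal sketch using Python's standard `wave` module; it covers WAV only (MP3/M4A would need a third-party probe), and the function name is our own:

```python
import os
import wave

# Pre-flight check against the requirements above. WAV only: 16-bit, mono,
# >= 24 kHz, under 10 MB, at most 60 s (10-20 s recommended).
# The function name is our own; this is not part of the DashScope API.
def check_wav_requirements(path: str) -> list:
    """Return a list of violated requirements (empty means the file looks OK)."""
    problems = []
    if os.path.getsize(path) >= 10 * 1024 * 1024:
        problems.append("file size must be under 10 MB")
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            problems.append("WAV must be 16-bit")
        if wf.getnchannels() != 1:
            problems.append("audio must be mono")
        if wf.getframerate() < 24000:
            problems.append("sample rate must be >= 24 kHz")
        duration = wf.getnframes() / wf.getframerate()
        if duration > 60:
            problems.append(f"duration {duration:.1f}s exceeds the 60 s maximum")
        elif duration < 10:
            problems.append(f"duration {duration:.1f}s is below the recommended 10 s")
    return problems
```

Running this before the Create voice call fails fast on files the service would reject anyway.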
Getting started: From cloning to synthesis
1. Workflow
Voice cloning and speech synthesis are two separate but closely linked steps that follow a "create first, use later" workflow:
- Create a voice: call the Create voice interface and upload an audio clip. The system analyzes the audio and creates a custom cloned voice. You must specify target_model, which defines the speech synthesis model that will drive the created voice. If you already have a voice (check by calling the List voices interface), skip this step and proceed to the next.
- Use the voice for speech synthesis: call the speech synthesis interface and pass in the voice from the previous step. The speech synthesis model specified here must match the target_model from the previous step.
2. Model configuration and preparations
Select an appropriate model and complete necessary preparations.
Model configuration
Specify these two models for voice cloning:
- Voice cloning model: qwen-voice-enrollment
- Speech synthesis model that drives the voice, one of:
  - Qwen3-TTS-VC-Realtime (see Real-time speech synthesis - Qwen):
    - qwen3-tts-vc-realtime-2026-01-15
    - qwen3-tts-vc-realtime-2025-11-27
  - Qwen3-TTS-VC (see Speech synthesis - Qwen):
    - qwen3-tts-vc-2026-01-22
Preparations
- Get an API key. For security, set your API key as an environment variable.
- Install the SDK: ensure you have installed the latest DashScope SDK.
- Prepare the audio for cloning: the audio must meet the Audio requirements above.
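The environment-variable step can look like this on macOS/Linux (the key value here is a placeholder):

```shell
# Set the DashScope API key for the current shell session (placeholder value)
export DASHSCOPE_API_KEY="sk-xxx"
# Confirm it is visible to child processes
echo "$DASHSCOPE_API_KEY"
```

On Windows, use `set DASHSCOPE_API_KEY=sk-xxx` in cmd or `$env:DASHSCOPE_API_KEY="sk-xxx"` in PowerShell.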
3. End-to-end example
The following example demonstrates how to use a custom cloned voice in speech synthesis to produce output that closely matches the original voice.
- Key principle: the target_model (the speech synthesis model that drives the voice) specified during voice cloning must match the model used in the speech synthesis call. Otherwise, synthesis fails.
- The example uses a local audio file, voice.mp3, for voice cloning. Replace it with your own file when running the code.
Bidirectional streaming synthesis
Applies to Qwen3-TTS-VC-Realtime series models. For more information, see Real-time speech synthesis - Qwen.
Python
# coding=utf-8
# Installation instructions for pyaudio:
# APPLE Mac OS X
# brew install portaudio
# pip install pyaudio
# Debian/Ubuntu
# sudo apt-get install python-pyaudio python3-pyaudio
# or
# pip install pyaudio
# CentOS
# sudo yum install -y portaudio portaudio-devel && pip install pyaudio
# Microsoft Windows
# python -m pip install pyaudio
import pyaudio
import os
import requests
import base64
import pathlib
import threading
import time
import dashscope # DashScope Python SDK version must be 1.23.9 or higher
from dashscope.audio.qwen_tts_realtime import QwenTtsRealtime, QwenTtsRealtimeCallback, AudioFormat
# ======= Constants =======
DEFAULT_TARGET_MODEL = "qwen3-tts-vc-realtime-2026-01-15" # Target model for voice cloning and speech synthesis must match
DEFAULT_PREFERRED_NAME = "guanyu"
DEFAULT_AUDIO_MIME_TYPE = "audio/mpeg"
VOICE_FILE_PATH = "voice.mp3" # Relative path to local audio file used for voice cloning
TEXT_TO_SYNTHESIZE = [
'Right? I love supermarkets like this.',
'Especially during Chinese New Year',
'When I go shopping',
'I feel',
'Extremely happy!',
'And want to buy so many things!'
]
def create_voice(file_path: str,
target_model: str = DEFAULT_TARGET_MODEL,
preferred_name: str = DEFAULT_PREFERRED_NAME,
audio_mime_type: str = DEFAULT_AUDIO_MIME_TYPE) -> str:
"""
Create a voice and return the voice parameter
"""
# API keys differ between the Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
# If you have not configured an environment variable, replace the next line with your Model Studio API key: api_key = "sk-xxx"
api_key = os.getenv("DASHSCOPE_API_KEY")
file_path_obj = pathlib.Path(file_path)
if not file_path_obj.exists():
raise FileNotFoundError(f"Audio file not found: {file_path}")
base64_str = base64.b64encode(file_path_obj.read_bytes()).decode()
data_uri = f"data:{audio_mime_type};base64,{base64_str}"
# This URL is for the Singapore region. To use a model from the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"
payload = {
"model": "qwen-voice-enrollment", # Do not change this value
"input": {
"action": "create",
"target_model": target_model,
"preferred_name": preferred_name,
"audio": {"data": data_uri}
}
}
headers = {
"Authorization": f"Bearer {api_key}",
"Content-Type": "application/json"
}
resp = requests.post(url, json=payload, headers=headers)
if resp.status_code != 200:
raise RuntimeError(f"Failed to create voice: {resp.status_code}, {resp.text}")
try:
return resp.json()["output"]["voice"]
except (KeyError, ValueError) as e:
raise RuntimeError(f"Failed to parse voice response: {e}")
def init_dashscope_api_key():
"""
Initialize DashScope SDK API key
"""
# API keys differ between the Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
# If you have not configured an environment variable, replace the next line with your Model Studio API key: dashscope.api_key = "sk-xxx"
dashscope.api_key = os.getenv("DASHSCOPE_API_KEY")
# ======= Callback class =======
class MyCallback(QwenTtsRealtimeCallback):
"""
Custom TTS streaming callback
"""
def __init__(self):
self.complete_event = threading.Event()
self._player = pyaudio.PyAudio()
self._stream = self._player.open(
format=pyaudio.paInt16, channels=1, rate=24000, output=True
)
def on_open(self) -> None:
print('[TTS] Connection established')
def on_close(self, close_status_code, close_msg) -> None:
self._stream.stop_stream()
self._stream.close()
self._player.terminate()
print(f'[TTS] Connection closed code={close_status_code}, msg={close_msg}')
def on_event(self, response: dict) -> None:
try:
event_type = response.get('type', '')
if event_type == 'session.created':
print(f'[TTS] Session started: {response["session"]["id"]}')
elif event_type == 'response.audio.delta':
audio_data = base64.b64decode(response['delta'])
self._stream.write(audio_data)
elif event_type == 'response.done':
print(f'[TTS] Response completed, Response ID: {qwen_tts_realtime.get_last_response_id()}')
elif event_type == 'session.finished':
print('[TTS] Session ended')
self.complete_event.set()
except Exception as e:
print(f'[Error] Exception in callback event handler: {e}')
def wait_for_finished(self):
self.complete_event.wait()
# ======= Main execution logic =======
if __name__ == '__main__':
init_dashscope_api_key()
print('[System] Initializing Qwen TTS Realtime ...')
callback = MyCallback()
qwen_tts_realtime = QwenTtsRealtime(
model=DEFAULT_TARGET_MODEL,
callback=callback,
# This URL is for the Singapore region. To use a model from the Beijing region, replace the URL with: wss://dashscope.aliyuncs.com/api-ws/v1/realtime
url='wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime'
)
qwen_tts_realtime.connect()
qwen_tts_realtime.update_session(
voice=create_voice(VOICE_FILE_PATH), # Replace voice parameter with cloned voice
response_format=AudioFormat.PCM_24000HZ_MONO_16BIT,
mode='server_commit'
)
for text_chunk in TEXT_TO_SYNTHESIZE:
print(f'[Sending text]: {text_chunk}')
qwen_tts_realtime.append_text(text_chunk)
time.sleep(0.1)
qwen_tts_realtime.finish()
callback.wait_for_finished()
print(f'[Metric] session_id={qwen_tts_realtime.get_session_id()}, '
f'first_audio_delay={qwen_tts_realtime.get_first_audio_delay()}s')
Java
You need to import the Gson dependency. If you use Maven or Gradle, add the dependency as follows:
Maven
Add the following to your pom.xml:
<!-- https://mvnrepository.com/artifact/com.google.code.gson/gson -->
<dependency>
<groupId>com.google.code.gson</groupId>
<artifactId>gson</artifactId>
<version>2.13.1</version>
</dependency>
Gradle
Add the following to your build.gradle:
// https://mvnrepository.com/artifact/com.google.code.gson/gson
implementation("com.google.code.gson:gson:2.13.1")
import com.alibaba.dashscope.audio.qwen_tts_realtime.*;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.google.gson.Gson;
import com.google.gson.JsonObject;
import javax.sound.sampled.*;
import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.*;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Queue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;
public class Main {
// ===== Constants =====
// Target model for voice cloning and speech synthesis must match
private static final String TARGET_MODEL = "qwen3-tts-vc-realtime-2026-01-15";
private static final String PREFERRED_NAME = "guanyu";
// Relative path to local audio file used for voice cloning
private static final String AUDIO_FILE = "voice.mp3";
private static final String AUDIO_MIME_TYPE = "audio/mpeg";
private static String[] textToSynthesize = {
"Right? I love supermarkets like this.",
"Especially during Chinese New Year",
"When I go shopping",
"I feel",
"Extremely happy!",
"And want to buy so many things!"
};
// Generate data URI
public static String toDataUrl(String filePath) throws IOException {
byte[] bytes = Files.readAllBytes(Paths.get(filePath));
String encoded = Base64.getEncoder().encodeToString(bytes);
return "data:" + AUDIO_MIME_TYPE + ";base64," + encoded;
}
// Call API to create voice
public static String createVoice() throws Exception {
// API keys differ between the Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
// If you have not configured an environment variable, replace the next line with your Model Studio API key: String apiKey = "sk-xxx"
String apiKey = System.getenv("DASHSCOPE_API_KEY");
String jsonPayload =
"{"
+ "\"model\": \"qwen-voice-enrollment\"," // Do not change this value
+ "\"input\": {"
+ "\"action\": \"create\","
+ "\"target_model\": \"" + TARGET_MODEL + "\","
+ "\"preferred_name\": \"" + PREFERRED_NAME + "\","
+ "\"audio\": {"
+ "\"data\": \"" + toDataUrl(AUDIO_FILE) + "\""
+ "}"
+ "}"
+ "}";
// This URL is for the Singapore region. To use a model from the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
HttpURLConnection con = (HttpURLConnection) new URL("https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization").openConnection();
con.setRequestMethod("POST");
con.setRequestProperty("Authorization", "Bearer " + apiKey);
con.setRequestProperty("Content-Type", "application/json");
con.setDoOutput(true);
try (OutputStream os = con.getOutputStream()) {
os.write(jsonPayload.getBytes(StandardCharsets.UTF_8));
}
int status = con.getResponseCode();
System.out.println("HTTP Status Code: " + status);
try (BufferedReader br = new BufferedReader(
new InputStreamReader(status >= 200 && status < 300 ? con.getInputStream() : con.getErrorStream(),
StandardCharsets.UTF_8))) {
StringBuilder response = new StringBuilder();
String line;
while ((line = br.readLine()) != null) {
response.append(line);
}
System.out.println("Response: " + response);
if (status == 200) {
JsonObject jsonObj = new Gson().fromJson(response.toString(), JsonObject.class);
return jsonObj.getAsJsonObject("output").get("voice").getAsString();
}
throw new IOException("Failed to create voice: " + status + " - " + response);
}
}
// Real-time PCM audio player class
public static class RealtimePcmPlayer {
private int sampleRate;
private SourceDataLine line;
private AudioFormat audioFormat;
private Thread decoderThread;
private Thread playerThread;
private AtomicBoolean stopped = new AtomicBoolean(false);
private Queue<String> b64AudioBuffer = new ConcurrentLinkedQueue<>();
private Queue<byte[]> RawAudioBuffer = new ConcurrentLinkedQueue<>();
// Constructor initializes audio format and audio line
public RealtimePcmPlayer(int sampleRate) throws LineUnavailableException {
this.sampleRate = sampleRate;
this.audioFormat = new AudioFormat(this.sampleRate, 16, 1, true, false);
DataLine.Info info = new DataLine.Info(SourceDataLine.class, audioFormat);
line = (SourceDataLine) AudioSystem.getLine(info);
line.open(audioFormat);
line.start();
decoderThread = new Thread(new Runnable() {
@Override
public void run() {
while (!stopped.get()) {
String b64Audio = b64AudioBuffer.poll();
if (b64Audio != null) {
byte[] rawAudio = Base64.getDecoder().decode(b64Audio);
RawAudioBuffer.add(rawAudio);
} else {
try {
Thread.sleep(100);
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
}
}
}
});
playerThread = new Thread(new Runnable() {
@Override
public void run() {
while (!stopped.get()) {
byte[] rawAudio = RawAudioBuffer.poll();
if (rawAudio != null) {
try {
playChunk(rawAudio);
} catch (IOException e) {
throw new RuntimeException(e);
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
} else {
try {
Thread.sleep(100);
} catch (InterruptedException e) {
throw new RuntimeException(e);
}
}
}
}
});
decoderThread.start();
playerThread.start();
}
// Play an audio chunk and block until playback completes
private void playChunk(byte[] chunk) throws IOException, InterruptedException {
if (chunk == null || chunk.length == 0) return;
int bytesWritten = 0;
while (bytesWritten < chunk.length) {
bytesWritten += line.write(chunk, bytesWritten, chunk.length - bytesWritten);
}
int audioLengthMs = chunk.length / (this.sampleRate * 2 / 1000);
// Wait for the audio in the buffer to finish playing; clamp to avoid a negative sleep for very short chunks
Thread.sleep(Math.max(0, audioLengthMs - 10));
}
public void write(String b64Audio) {
b64AudioBuffer.add(b64Audio);
}
public void cancel() {
b64AudioBuffer.clear();
RawAudioBuffer.clear();
}
public void waitForComplete() throws InterruptedException {
while (!b64AudioBuffer.isEmpty() || !RawAudioBuffer.isEmpty()) {
Thread.sleep(100);
}
line.drain();
}
public void shutdown() throws InterruptedException {
stopped.set(true);
decoderThread.join();
playerThread.join();
if (line != null && line.isRunning()) {
line.drain();
line.close();
}
}
}
public static void main(String[] args) throws Exception {
QwenTtsRealtimeParam param = QwenTtsRealtimeParam.builder()
.model(TARGET_MODEL)
// This URL is for the Singapore region. To use a model from the Beijing region, replace the URL with: wss://dashscope.aliyuncs.com/api-ws/v1/realtime
.url("wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime")
// API keys differ between the Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
// If you have not configured an environment variable, replace the next line with your Model Studio API key: .apikey("sk-xxx")
.apikey(System.getenv("DASHSCOPE_API_KEY"))
.build();
AtomicReference<CountDownLatch> completeLatch = new AtomicReference<>(new CountDownLatch(1));
final AtomicReference<QwenTtsRealtime> qwenTtsRef = new AtomicReference<>(null);
// Create real-time audio player instance
RealtimePcmPlayer audioPlayer = new RealtimePcmPlayer(24000);
QwenTtsRealtime qwenTtsRealtime = new QwenTtsRealtime(param, new QwenTtsRealtimeCallback() {
@Override
public void onOpen() {
// Handle connection established
}
@Override
public void onEvent(JsonObject message) {
String type = message.get("type").getAsString();
switch(type) {
case "session.created":
// Handle session created
break;
case "response.audio.delta":
String recvAudioB64 = message.get("delta").getAsString();
// Play audio in real time
audioPlayer.write(recvAudioB64);
break;
case "response.done":
// Handle response completed
break;
case "session.finished":
// Handle session ended
completeLatch.get().countDown();
break;
default:
break;
}
}
@Override
public void onClose(int code, String reason) {
// Handle connection closed
}
});
qwenTtsRef.set(qwenTtsRealtime);
try {
qwenTtsRealtime.connect();
} catch (NoApiKeyException e) {
throw new RuntimeException(e);
}
QwenTtsRealtimeConfig config = QwenTtsRealtimeConfig.builder()
.voice(createVoice()) // Replace voice parameter with the exclusive voice generated by voice cloning
.responseFormat(QwenTtsRealtimeAudioFormat.PCM_24000HZ_MONO_16BIT)
.mode("server_commit")
.build();
qwenTtsRealtime.updateSession(config);
for (String text:textToSynthesize) {
qwenTtsRealtime.appendText(text);
Thread.sleep(100);
}
qwenTtsRealtime.finish();
completeLatch.get().await();
// Wait for audio playback to complete and shut down player
audioPlayer.waitForComplete();
audioPlayer.shutdown();
System.exit(0);
}
}
Non-streaming/unidirectional streaming synthesis
Applies to Qwen3-TTS-VC series models. For details, see Speech synthesis - Qwen.
This example references the DashScope SDK non-streaming sample code for speech synthesis using system voices. It replaces the voice parameter with a custom cloned voice. For unidirectional streaming synthesis, see Speech synthesis - Qwen.
Python
import os
import requests
import base64
import pathlib
import dashscope
# ======= Constant configuration =======
DEFAULT_TARGET_MODEL = "qwen3-tts-vc-2026-01-22" # Use the same model for voice cloning and speech synthesis
DEFAULT_PREFERRED_NAME = "guanyu"
DEFAULT_AUDIO_MIME_TYPE = "audio/mpeg"
VOICE_FILE_PATH = "voice.mp3" # Relative path to the local audio file used for voice cloning
def create_voice(file_path: str,
target_model: str = DEFAULT_TARGET_MODEL,
preferred_name: str = DEFAULT_PREFERRED_NAME,
audio_mime_type: str = DEFAULT_AUDIO_MIME_TYPE) -> str:
"""
Create a voice and return the voice parameter.
"""
# API keys differ between the Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
# If you haven't configured an environment variable, replace the following line with: api_key = "sk-xxx"
api_key = os.getenv("DASHSCOPE_API_KEY")
file_path_obj = pathlib.Path(file_path)
if not file_path_obj.exists():
raise FileNotFoundError(f"Audio file does not exist: {file_path}")
base64_str = base64.b64encode(file_path_obj.read_bytes()).decode()
data_uri = f"data:{audio_mime_type};base64,{base64_str}"
# The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"
payload = {
"model": "qwen-voice-enrollment", # Do not change this value
"input": {
"action": "create",
"target_model": target_model,
"preferred_name": preferred_name,
"audio": {"data": data_uri}
}
}
headers = {
"Authorization": f"Bearer {api_key}",
"Content-Type": "application/json"
}
resp = requests.post(url, json=payload, headers=headers)
if resp.status_code != 200:
raise RuntimeError(f"Failed to create voice: {resp.status_code}, {resp.text}")
try:
return resp.json()["output"]["voice"]
except (KeyError, ValueError) as e:
raise RuntimeError(f"Failed to parse voice response: {e}")
if __name__ == '__main__':
# The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
text = "How's the weather today?"
# SpeechSynthesizer interface usage: dashscope.audio.qwen_tts.SpeechSynthesizer.call(...)
response = dashscope.MultiModalConversation.call(
model=DEFAULT_TARGET_MODEL,
# API keys differ between the Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
# If you haven't configured an environment variable, replace the following line with: api_key = "sk-xxx"
api_key=os.getenv("DASHSCOPE_API_KEY"),
text=text,
voice=create_voice(VOICE_FILE_PATH), # Replace the voice parameter with the custom voice generated by cloning
stream=False
)
print(response)
Java
Add the Gson dependency. If you use Maven or Gradle, add the dependency as follows:
Maven
Add the following content to your pom.xml:
<!-- https://mvnrepository.com/artifact/com.google.code.gson/gson -->
<dependency>
<groupId>com.google.code.gson</groupId>
<artifactId>gson</artifactId>
<version>2.13.1</version>
</dependency>
Gradle
Add the following content to your build.gradle:
// https://mvnrepository.com/artifact/com.google.code.gson/gson
implementation("com.google.code.gson:gson:2.13.1")
When using a custom voice generated by voice cloning for speech synthesis, set the voice as follows:
MultiModalConversationParam param = MultiModalConversationParam.builder()
.parameter("voice", "your_voice") // Replace the voice parameter with the custom voice generated by cloning
.build();
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.utils.Constants;
import com.google.gson.Gson;
import com.google.gson.JsonObject;
import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.*;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
public class Main {
// ===== Constant definitions =====
// Use the same model for voice cloning and speech synthesis
private static final String TARGET_MODEL = "qwen3-tts-vc-2026-01-22";
private static final String PREFERRED_NAME = "guanyu";
// Relative path to the local audio file used for voice cloning
private static final String AUDIO_FILE = "voice.mp3";
private static final String AUDIO_MIME_TYPE = "audio/mpeg";
// Generate a data URI
public static String toDataUrl(String filePath) throws IOException {
byte[] bytes = Files.readAllBytes(Paths.get(filePath));
String encoded = Base64.getEncoder().encodeToString(bytes);
return "data:" + AUDIO_MIME_TYPE + ";base64," + encoded;
}
// Call the API to create a voice
public static String createVoice() throws Exception {
// API keys differ between the Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
// If you haven't configured an environment variable, replace the following line with: String apiKey = "sk-xxx"
String apiKey = System.getenv("DASHSCOPE_API_KEY");
String jsonPayload =
"{"
+ "\"model\": \"qwen-voice-enrollment\"," // Do not change this value
+ "\"input\": {"
+ "\"action\": \"create\","
+ "\"target_model\": \"" + TARGET_MODEL + "\","
+ "\"preferred_name\": \"" + PREFERRED_NAME + "\","
+ "\"audio\": {"
+ "\"data\": \"" + toDataUrl(AUDIO_FILE) + "\""
+ "}"
+ "}"
+ "}";
// The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
String url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization";
HttpURLConnection con = (HttpURLConnection) new URL(url).openConnection();
con.setRequestMethod("POST");
con.setRequestProperty("Authorization", "Bearer " + apiKey);
con.setRequestProperty("Content-Type", "application/json");
con.setDoOutput(true);
try (OutputStream os = con.getOutputStream()) {
os.write(jsonPayload.getBytes(StandardCharsets.UTF_8));
}
int status = con.getResponseCode();
System.out.println("HTTP status code: " + status);
try (BufferedReader br = new BufferedReader(
new InputStreamReader(status >= 200 && status < 300 ? con.getInputStream() : con.getErrorStream(),
StandardCharsets.UTF_8))) {
StringBuilder response = new StringBuilder();
String line;
while ((line = br.readLine()) != null) {
response.append(line);
}
System.out.println("Response content: " + response);
if (status == 200) {
JsonObject jsonObj = new Gson().fromJson(response.toString(), JsonObject.class);
return jsonObj.getAsJsonObject("output").get("voice").getAsString();
}
throw new IOException("Failed to create voice: " + status + " - " + response);
}
}
public static void call() throws Exception {
MultiModalConversation conv = new MultiModalConversation();
MultiModalConversationParam param = MultiModalConversationParam.builder()
// API keys differ between the Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
// If you haven't configured an environment variable, replace the following line with: .apikey("sk-xxx")
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.model(TARGET_MODEL)
.text("How's the weather today?")
.parameter("voice", createVoice()) // Replace the voice parameter with the custom voice generated by cloning
.build();
MultiModalConversationResult result = conv.call(param);
String audioUrl = result.getOutput().getAudio().getUrl();
System.out.print(audioUrl);
// Download the audio file locally
try (InputStream in = new URL(audioUrl).openStream();
FileOutputStream out = new FileOutputStream("downloaded_audio.wav")) {
byte[] buffer = new byte[1024];
int bytesRead;
while ((bytesRead = in.read(buffer)) != -1) {
out.write(buffer, 0, bytesRead);
}
System.out.println("\nAudio file downloaded locally: downloaded_audio.wav");
} catch (Exception e) {
System.out.println("\nError downloading audio file: " + e.getMessage());
}
}
public static void main(String[] args) {
try {
// The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
Constants.baseHttpApiUrl = "https://dashscope-intl.aliyuncs.com/api/v1";
call();
} catch (Exception e) {
System.out.println(e.getMessage());
}
System.exit(0);
}
}
API reference
When using different APIs, ensure you use the same account.
Create voice
Uploads audio for cloning and creates a custom voice.
- URL
  - Chinese Mainland: POST https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
  - International: POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization
- Request headers
  - Authorization (string, required): Authentication token in the format Bearer <your_api_key>. Replace <your_api_key> with your actual API key.
  - Content-Type (string, required): Media type of the data transmitted in the request body. Fixed as application/json.
- Request body
  The request body contains all request parameters. Omit optional fields if not needed.
  Important: Note the following parameters:
  - model: Voice cloning model, fixed as qwen-voice-enrollment
  - target_model: Speech synthesis model that drives the voice. It must match the speech synthesis model used in subsequent speech synthesis calls. Otherwise, synthesis fails.
  Example:
  {
    "model": "qwen-voice-enrollment",
    "input": {
      "action": "create",
      "target_model": "qwen3-tts-vc-realtime-2026-01-15",
      "preferred_name": "guanyu",
      "audio": {
        "data": "https://xxx.wav"
      },
      "text": "Optional. Enter the text corresponding to audio.data.",
      "language": "Optional. Enter the language of audio.data, such as zh."
    }
  }
- Request parameters
  - model (string): Voice cloning model, fixed as qwen-voice-enrollment.
  - action (string): Operation type, fixed as create.
  - target_model (string): Speech synthesis model that drives the voice. Supported models (two series):
    - Qwen3-TTS-VC-Realtime (see Real-time speech synthesis - Qwen): qwen3-tts-vc-realtime-2026-01-15, qwen3-tts-vc-realtime-2025-11-27
    - Qwen3-TTS-VC (see Speech synthesis - Qwen): qwen3-tts-vc-2026-01-22
    It must match the speech synthesis model used in subsequent speech synthesis calls. Otherwise, synthesis fails.
  - preferred_name (string): A recognizable name for the voice (up to 16 characters: digits, letters, and underscores only). Use identifiers related to roles or scenarios. This keyword appears in the cloned voice name: for example, with the keyword "guanyu", the final voice name is "qwen-tts-vc-guanyu-voice-20250812105009984-838b".
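The naming constraint above (at most 16 characters, limited to digits, letters, and underscores) is easy to validate client-side before calling the interface. A minimal Python sketch, with a helper name of our own choosing:

```python
import re

# Hypothetical helper (not part of the DashScope API): checks the documented
# preferred_name constraint of 1-16 characters, digits/letters/underscores only.
def is_valid_preferred_name(name: str) -> bool:
    return re.fullmatch(r"[A-Za-z0-9_]{1,16}", name) is not None
```

For example, `is_valid_preferred_name("guanyu")` passes, while a name containing a hyphen or longer than 16 characters does not.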
  - audio.data (string): Audio for cloning (follow the Recording guide when recording, and ensure the audio meets the Audio requirements). Submit the audio in one of the following ways:
    - Data URI in the format data:<mediatype>;base64,<data>
      - <mediatype>: MIME type. WAV: audio/wav; MP3: audio/mpeg; M4A: audio/mp4.
      - <data>: Base64-encoded string of the audio. Base64 encoding increases file size, so keep the original file small enough that the encoded data stays under 10 MB.
      - Example: data:audio/wav;base64,SUQzBAAAAAAAI1RTU0UAAAAPAAADTGF2ZjU4LjI5LjEwMAAAAAAAAAAAAAAA//PAxABQ/BXRbMPe4IQAhl9
    - Audio URL (we recommend uploading the audio to OSS)
      - The file size must not exceed 10 MB.
      - The URL must be publicly accessible and require no authentication.
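Since the URL variant must be fetchable without credentials, a quick reachability probe before submitting can save a failed request. A best-effort sketch (the function name is ours; a HEAD request is used so the audio is not downloaded):

```python
import urllib.request

# Hypothetical helper (not part of the DashScope API): best-effort check
# that a URL answers 2xx without any authentication headers.
def is_publicly_accessible(url: str, timeout: float = 5.0) -> bool:
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except Exception:
        return False
```

A passing probe does not guarantee the service can fetch the file (regional firewalls may differ), but a failing one reliably predicts an error.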
  - text (string): Optional. Text matching the audio.data content. If provided, the server compares the audio against this text; if they differ significantly, it returns Audio.PreprocessError.
  - language (string): Optional. Language of the audio.data audio. Supported languages: zh (Chinese), en (English), de (German), it (Italian), pt (Portuguese), es (Spanish), ja (Japanese), ko (Korean), fr (French), ru (Russian). If you use this parameter, ensure the specified language matches the audio language.
- Response parameters
  Key parameters:
  - voice (string): Voice name. Use it directly as the voice parameter in the speech synthesis interface.
  - target_model (string): Speech synthesis model that drives the voice. Supported models (two series):
    - Qwen3-TTS-VC-Realtime (see Real-time speech synthesis - Qwen): qwen3-tts-vc-realtime-2026-01-15, qwen3-tts-vc-realtime-2025-11-27
    - Qwen3-TTS-VC (see Speech synthesis - Qwen): qwen3-tts-vc-2026-01-22
    It must match the speech synthesis model used in subsequent speech synthesis calls. Otherwise, synthesis fails.
  - request_id (string): Request ID.
  - count (integer): Number of "create voice" operations billed for this request. The cost is $ . For voice creation, count is always 1.
- Sample code
  Important: Note the following parameters:
  - model: Voice cloning model, fixed as qwen-voice-enrollment
  - target_model: Speech synthesis model that drives the voice. It must match the speech synthesis model used in subsequent speech synthesis calls. Otherwise, synthesis fails.
cURL
If you haven't configured your API key as an environment variable, replace $DASHSCOPE_API_KEY in the examples with your actual API key.
# The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
# API keys differ between the Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization \
  -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen-voice-enrollment",
    "input": {
      "action": "create",
      "target_model": "qwen3-tts-vc-realtime-2026-01-15",
      "preferred_name": "guanyu",
      "audio": {
        "data": "https://xxx.wav"
      }
    }
  }'
Python
import os
import requests
import base64, pathlib

target_model = "qwen3-tts-vc-realtime-2026-01-15"
preferred_name = "guanyu"
audio_mime_type = "audio/mpeg"
file_path = pathlib.Path("input.mp3")
base64_str = base64.b64encode(file_path.read_bytes()).decode()
data_uri = f"data:{audio_mime_type};base64,{base64_str}"

# API keys differ between the Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
# If you haven't configured an environment variable, replace the following line with: api_key = "sk-xxx"
api_key = os.getenv("DASHSCOPE_API_KEY")
# The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"

payload = {
    "model": "qwen-voice-enrollment",  # Do not change this value
    "input": {
        "action": "create",
        "target_model": target_model,
        "preferred_name": preferred_name,
        "audio": {
            "data": data_uri
        }
    }
}
headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json"
}

# Send POST request
resp = requests.post(url, json=payload, headers=headers)
if resp.status_code == 200:
    data = resp.json()
    voice = data["output"]["voice"]
    print(f"Generated voice parameter: {voice}")
else:
    print("Request failed:", resp.status_code, resp.text)
Java
import com.google.gson.Gson;
import com.google.gson.JsonObject;
import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.*;
import java.util.Base64;

public class Main {
    private static final String TARGET_MODEL = "qwen3-tts-vc-realtime-2026-01-15";
    private static final String PREFERRED_NAME = "guanyu";
    private static final String AUDIO_FILE = "input.mp3";
    private static final String AUDIO_MIME_TYPE = "audio/mpeg";

    public static String toDataUrl(String filePath) throws Exception {
        byte[] bytes = Files.readAllBytes(Paths.get(filePath));
        String encoded = Base64.getEncoder().encodeToString(bytes);
        return "data:" + AUDIO_MIME_TYPE + ";base64," + encoded;
    }

    public static void main(String[] args) {
        // API keys differ between the Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
        // If you haven't configured an environment variable, replace the following line with: String apiKey = "sk-xxx";
        String apiKey = System.getenv("DASHSCOPE_API_KEY");
        // The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
        String apiUrl = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization";
        try {
            // Construct JSON request body (escape internal quotes)
            String jsonPayload = "{"
                + "\"model\": \"qwen-voice-enrollment\","  // Do not change this value
                + "\"input\": {"
                + "\"action\": \"create\","
                + "\"target_model\": \"" + TARGET_MODEL + "\","
                + "\"preferred_name\": \"" + PREFERRED_NAME + "\","
                + "\"audio\": {"
                + "\"data\": \"" + toDataUrl(AUDIO_FILE) + "\""
                + "}"
                + "}"
                + "}";

            HttpURLConnection con = (HttpURLConnection) new URL(apiUrl).openConnection();
            con.setRequestMethod("POST");
            con.setRequestProperty("Authorization", "Bearer " + apiKey);
            con.setRequestProperty("Content-Type", "application/json");
            con.setDoOutput(true);

            // Send request body
            try (OutputStream os = con.getOutputStream()) {
                os.write(jsonPayload.getBytes("UTF-8"));
            }

            int status = con.getResponseCode();
            InputStream is = (status >= 200 && status < 300) ? con.getInputStream() : con.getErrorStream();
            StringBuilder response = new StringBuilder();
            try (BufferedReader br = new BufferedReader(new InputStreamReader(is, "UTF-8"))) {
                String line;
                while ((line = br.readLine()) != null) {
                    response.append(line);
                }
            }

            System.out.println("HTTP status code: " + status);
            System.out.println("Response content: " + response.toString());
            if (status == 200) {
                // Parse JSON
                Gson gson = new Gson();
                JsonObject jsonObj = gson.fromJson(response.toString(), JsonObject.class);
                String voice = jsonObj.getAsJsonObject("output").get("voice").getAsString();
                System.out.println("Generated voice parameter: " + voice);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
-
List voices
Performs a paged query to list voices you've created.
-
URL
Chinese Mainland:
POST https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
International:
POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization -
Request headers
Parameter
Type
Required
Description
Authorization
string
Authentication token in the format
Bearer <your_api_key>. Replace "<your_api_key>" with your actual API key.
Content-Type
string
Media type of the data transmitted in the request body. Fixed as
application/json. -
Request body
The request body contains all request parameters. Omit optional fields if not needed.
Important: model is the voice cloning model, fixed as qwen-voice-enrollment. Do not modify.
{
  "model": "qwen-voice-enrollment",
  "input": {
    "action": "list",
    "page_size": 2,
    "page_index": 0
  }
}
Request parameters
Parameter
Type
Default
Required
Description
model
string
-
Voice cloning model, fixed as
qwen-voice-enrollment.
action
string
-
Operation type, fixed as
list.
page_index
integer
0
Page index. Range: [0, 1000000].
page_size
integer
10
Entries per page. Range: [0, 1000000].
-
Response parameters
Key parameters:
Parameter
Type
Description
voice
string
Voice name. Use directly as the
voice parameter in the speech synthesis interface.
gmt_create
string
Voice creation time.
target_model
string
Speech synthesis model that drives the voice. Two model families are supported:
- Qwen3-TTS-VC-Realtime (see Real-time speech synthesis - Qwen): qwen3-tts-vc-realtime-2026-01-15, qwen3-tts-vc-realtime-2025-11-27
- Qwen3-TTS-VC (see Speech synthesis - Qwen): qwen3-tts-vc-2026-01-22
It must match the speech synthesis model used in subsequent speech synthesis calls. Otherwise, synthesis fails.
request_id
string
Request ID.
count
integer
Number of "create voice" operations billed for this request. Voice listing is free. Therefore, count is always 0.
-
-
Sample code
Important: model is the voice cloning model, fixed as qwen-voice-enrollment. Do not modify.
cURL
If you haven't configured your API key as an environment variable, replace
$DASHSCOPE_API_KEY in the examples with your actual API key.
# ======= Important notes =======
# The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
# API keys differ between the Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
# === Delete this comment before execution ===
curl --location --request POST 'https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization' \
  --header "Authorization: Bearer $DASHSCOPE_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "qwen-voice-enrollment",
    "input": {
      "action": "list",
      "page_size": 10,
      "page_index": 0
    }
  }'
Python
import os
import requests

# API keys differ between the Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
# If you haven't configured an environment variable, replace the following line with: api_key = "sk-xxx"
api_key = os.getenv("DASHSCOPE_API_KEY")
# The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"

payload = {
    "model": "qwen-voice-enrollment",  # Do not change this value
    "input": {
        "action": "list",
        "page_size": 10,
        "page_index": 0
    }
}
headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json"
}

response = requests.post(url, json=payload, headers=headers)
print("HTTP status code:", response.status_code)
if response.status_code == 200:
    data = response.json()
    voice_list = data["output"]["voice_list"]
    print("List of voices found:")
    for item in voice_list:
        print(f"- Voice: {item['voice']}  Creation time: {item['gmt_create']}  Model: {item['target_model']}")
else:
    print("Request failed:", response.text)
Java
import com.google.gson.Gson;
import com.google.gson.JsonArray;
import com.google.gson.JsonObject;
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class Main {
    public static void main(String[] args) {
        // API keys differ between the Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
        // If you haven't configured an environment variable, replace the following line with: String apiKey = "sk-xxx";
        String apiKey = System.getenv("DASHSCOPE_API_KEY");
        // The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
        String apiUrl = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization";

        // JSON request body (older Java versions don't support """ multiline strings)
        String jsonPayload = "{"
            + "\"model\": \"qwen-voice-enrollment\","  // Do not change this value
            + "\"input\": {"
            + "\"action\": \"list\","
            + "\"page_size\": 10,"
            + "\"page_index\": 0"
            + "}"
            + "}";

        try {
            HttpURLConnection con = (HttpURLConnection) new URL(apiUrl).openConnection();
            con.setRequestMethod("POST");
            con.setRequestProperty("Authorization", "Bearer " + apiKey);
            con.setRequestProperty("Content-Type", "application/json");
            con.setDoOutput(true);
            try (OutputStream os = con.getOutputStream()) {
                os.write(jsonPayload.getBytes("UTF-8"));
            }

            int status = con.getResponseCode();
            BufferedReader br = new BufferedReader(new InputStreamReader(
                    status >= 200 && status < 300 ? con.getInputStream() : con.getErrorStream(), "UTF-8"));
            StringBuilder response = new StringBuilder();
            String line;
            while ((line = br.readLine()) != null) {
                response.append(line);
            }
            br.close();

            System.out.println("HTTP status code: " + status);
            System.out.println("Response JSON: " + response.toString());
            if (status == 200) {
                Gson gson = new Gson();
                JsonObject jsonObj = gson.fromJson(response.toString(), JsonObject.class);
                JsonArray voiceList = jsonObj.getAsJsonObject("output").getAsJsonArray("voice_list");
                System.out.println("\nList of voices found:");
                for (int i = 0; i < voiceList.size(); i++) {
                    JsonObject voiceItem = voiceList.get(i).getAsJsonObject();
                    String voice = voiceItem.get("voice").getAsString();
                    String gmtCreate = voiceItem.get("gmt_create").getAsString();
                    String targetModel = voiceItem.get("target_model").getAsString();
                    System.out.printf("- Voice: %s  Creation time: %s  Model: %s\n", voice, gmtCreate, targetModel);
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Delete a voice
Deletes a specified voice to free up its quota.
-
URL
Chinese Mainland:
POST https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
International:
POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization -
Request headers
Parameter
Type
Required
Description
Authorization
string
Authentication token in the format
Bearer <your_api_key>. Replace "<your_api_key>" with your actual API key.
Content-Type
string
Media type of the data transmitted in the request body. Fixed as
application/json. -
Request body
The request body contains all request parameters. Omit optional fields if not needed:
Important: model is the voice cloning model, fixed as qwen-voice-enrollment. Do not modify.
{
  "model": "qwen-voice-enrollment",
  "input": {
    "action": "delete",
    "voice": "yourVoice"
  }
}
Request parameters
Parameter
Type
Default
Required
Description
model
string
-
Voice cloning model, fixed as
qwen-voice-enrollment.
action
string
-
Operation type, fixed as
delete.
voice
string
-
Voice to delete.
-
Response parameters
Key parameters:
Parameter
Type
Description
request_id
string
Request ID.
count
integer
Number of "create voice" operations billed for this request. Voice deletion is free. Therefore, count is always 0.
-
Sample code
Important: model is the voice cloning model, fixed as qwen-voice-enrollment. Do not modify.
cURL
If you haven't configured your API key as an environment variable, replace
$DASHSCOPE_API_KEY in the examples with your actual API key.
# ======= Important notes =======
# The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
# API keys differ between the Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
# === Delete this comment before execution ===
curl --location --request POST 'https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization' \
  --header "Authorization: Bearer $DASHSCOPE_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "model": "qwen-voice-enrollment",
    "input": {
      "action": "delete",
      "voice": "yourVoice"
    }
  }'
Python
import os
import requests

# API keys differ between the Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
# If you haven't configured an environment variable, replace the following line with: api_key = "sk-xxx"
api_key = os.getenv("DASHSCOPE_API_KEY")
# The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"

voice_to_delete = "yourVoice"  # Voice to delete (replace with actual value)

payload = {
    "model": "qwen-voice-enrollment",  # Do not change this value
    "input": {
        "action": "delete",
        "voice": voice_to_delete
    }
}
headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json"
}

response = requests.post(url, json=payload, headers=headers)
print("HTTP status code:", response.status_code)
if response.status_code == 200:
    data = response.json()
    request_id = data["request_id"]
    print("Deletion successful")
    print(f"Request ID: {request_id}")
else:
    print("Request failed:", response.text)
Java
import com.google.gson.Gson;
import com.google.gson.JsonObject;
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class Main {
    public static void main(String[] args) {
        // API keys differ between the Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
        // If you haven't configured an environment variable, replace the following line with: String apiKey = "sk-xxx";
        String apiKey = System.getenv("DASHSCOPE_API_KEY");
        // The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
        String apiUrl = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization";

        String voiceToDelete = "yourVoice"; // Voice to delete (replace with actual value)

        // Construct JSON request body (string concatenation for Java 8 compatibility)
        String jsonPayload = "{"
            + "\"model\": \"qwen-voice-enrollment\","  // Do not change this value
            + "\"input\": {"
            + "\"action\": \"delete\","
            + "\"voice\": \"" + voiceToDelete + "\""
            + "}"
            + "}";

        try {
            // Establish POST connection
            HttpURLConnection con = (HttpURLConnection) new URL(apiUrl).openConnection();
            con.setRequestMethod("POST");
            con.setRequestProperty("Authorization", "Bearer " + apiKey);
            con.setRequestProperty("Content-Type", "application/json");
            con.setDoOutput(true);

            // Send request body
            try (OutputStream os = con.getOutputStream()) {
                os.write(jsonPayload.getBytes("UTF-8"));
            }

            int status = con.getResponseCode();
            BufferedReader br = new BufferedReader(new InputStreamReader(
                    status >= 200 && status < 300 ? con.getInputStream() : con.getErrorStream(), "UTF-8"));
            StringBuilder response = new StringBuilder();
            String line;
            while ((line = br.readLine()) != null) {
                response.append(line);
            }
            br.close();

            System.out.println("HTTP status code: " + status);
            System.out.println("Response JSON: " + response.toString());
            if (status == 200) {
                Gson gson = new Gson();
                JsonObject jsonObj = gson.fromJson(response.toString(), JsonObject.class);
                String requestId = jsonObj.get("request_id").getAsString();
                System.out.println("Deletion successful");
                System.out.println("Request ID: " + requestId);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Speech synthesis
For details on using custom cloned voices for personalized speech synthesis, see Getting started: From cloning to synthesis.
Voice cloning-specific models (such as qwen3-tts-vc-realtime-2026-01-15) are dedicated models that only support custom cloned voices. They do not support system voices such as Chelsie, Serena, Ethan, or Cherry.
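To make the hand-off between the two steps concrete, the sketch below assembles the synthesis parameters for a cloned voice. The helper name and the dict layout are illustrative assumptions, not the actual request schema; see Real-time speech synthesis - Qwen or Speech synthesis - Qwen for the authoritative interface.

```python
def build_synthesis_input(voice, text, target_model):
    """Pair a cloned voice with its driving model for a synthesis call.

    `voice` is the value returned by the Create voice interface, and
    `target_model` must equal the target_model used when the voice was
    created -- a mismatch causes synthesis to fail. The dict layout is
    illustrative only; consult the speech synthesis documentation for
    the real request shape.
    """
    return {
        "model": target_model,  # must match the voice's target_model
        "voice": voice,         # value printed by the create-voice samples
        "text": text,
    }
```

Keeping the model/voice pairing in one helper makes the "must match" rule hard to violate by accident.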
Voice quota and cleanup rules
-
Total limit: 1000 voices per account
The current interface does not provide a voice count query feature. Call the List voices interface to count voices yourself.
-
Automatic cleanup: If a voice has not been used in any speech synthesis request for over a year, the system automatically deletes it.
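Counting voices yourself, as suggested above, amounts to paging through the List voices interface until a short page comes back. A minimal sketch, where `fetch_page` is a stand-in for one List voices call returning that page's voice_list (the HTTP request itself is omitted):

```python
def count_voices(fetch_page, page_size=10):
    """Count cloned voices by paging through List voices results.

    fetch_page(page_index, page_size) is assumed to return the
    voice_list for one page (a list of dicts), e.g. by POSTing
    {"action": "list", "page_index": ..., "page_size": ...}
    and reading output.voice_list from the response.
    """
    total = 0
    page_index = 0
    while True:
        page = fetch_page(page_index, page_size)
        total += len(page)
        if len(page) < page_size:  # a short or empty page means no more voices
            return total
        page_index += 1
```

Comparing the result with the 1000-voice account limit tells you how much quota remains.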
Billing details
Voice cloning and speech synthesis are billed separately:
-
Voice cloning: Creating a voice costs $0.01 per voice. Failed creations are not charged.
Note: Free quota details (available only in the Alibaba Cloud China Website (www.aliyun.com) Beijing region and the Alibaba Cloud International Website (www.alibabacloud.com) Singapore region):
-
Within 90 days of activating Alibaba Cloud Model Studio, you receive 1000 free voice creation attempts.
-
Failed creations do not consume free attempts.
-
Deleting a voice does not restore free attempts.
-
After the free quota expires or the 90-day validity period ends, voice creation is billed at $0.01 per voice.
-
-
Speech synthesis using custom cloned voices: Billed by volume (character count). For details, see Real-time speech synthesis - Qwen or Speech synthesis - Qwen.
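As a worked example of the free-quota rules above (1,000 free attempts within 90 days, then $0.01 per successful creation; the function name is ours, not part of the API):

```python
def voice_creation_cost(successful_creations, free_attempts=1000, price_usd=0.01):
    """Voice-creation cost during the 90-day free-quota window.

    Failed creations are not charged and do not consume free attempts,
    so only successful creations are counted here.
    """
    billable = max(0, successful_creations - free_attempts)
    return billable * price_usd

# 1,200 successful creations -> 200 billable -> $2.00
```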
Copyright and legality
You are responsible for ensuring that you own, or hold the legal rights to use, the voice you provide. Please read the Terms of Service.
Recording guide
Device
We recommend using a microphone with noise reduction or recording with your phone at close range in a quiet environment to ensure clean audio.
Environment
Location
-
Record in an enclosed space of 10 square meters or less.
-
Prioritize rooms with sound-absorbing materials (such as acoustic foam, carpets, or curtains).
-
Avoid large, reverberant spaces such as halls, meeting rooms, or classrooms.
Noise control
-
Outdoor noise: Close windows and doors to block traffic, construction, and other disturbances.
-
Indoor noise: Turn off air conditioners, fans, and fluorescent lamp ballasts. Record ambient noise with your phone and play back at high volume to identify noise sources.
Reverberation control
-
Reverberation causes muffled, unclear audio.
-
Reduce reflections: Draw curtains, open closet doors, and cover tables or cabinets with clothing or sheets.
-
Use irregular objects (such as bookshelves or upholstered furniture) to diffuse sound.
Script
-
Script content is flexible. Align it with your target scenario (e.g., use customer service dialogue style for customer service scenarios). Ensure it contains no sensitive or illegal content (such as political, pornographic, or violent material), as this will cause cloning to fail.
-
Avoid short phrases (like "Hello" or "Yes"). Use complete sentences.
-
Maintain semantic coherence. Avoid frequent pauses (aim for at least 3 seconds of continuous speech).
-
You may adopt the target emotion (such as friendly or serious), but avoid overly dramatic readings. Keep it natural.
Operational tips
Using a typical bedroom as an example:
-
Close windows and doors to block external noise.
-
Turn off air conditioners, fans, and other electrical appliances.
-
Draw curtains to reduce reflections.
-
Place clothing or blankets on the desk to reduce reflections.
-
Familiarize yourself with the script. Set your character's tone and deliver naturally.
-
Maintain about 10 cm distance from the recording device to avoid plosives or weak signals.
Error messages
If you encounter errors, see Error messages for troubleshooting.