Voice cloning creates a custom voice without any model training. Provide 10 to 20 seconds of audio to generate a voice that closely resembles the original, with natural sound quality. Voice cloning and model invocation are two sequential steps. This document covers the voice cloning parameters and API details. For model invocation, see Real-time (Qwen-Omni-Realtime).
This document applies only to the Qwen-Omni and Qwen-Omni-Realtime voice cloning API. If you use a text-to-speech model, see Speech synthesis.
Audio requirements
High-quality input audio is the foundation for a good cloning result.
| Item | Requirement |
| --- | --- |
| Supported formats | WAV (16-bit), MP3, M4A |
| Duration | 10 to 20 seconds recommended. Maximum: 60 seconds. |
| File size | < 10 MB |
| Sample rate | >= 24 kHz |
| Channels | Mono |
| Content | The audio must contain at least 3 seconds of continuous, clear speech with no background sounds. The remaining portion may include brief pauses (<= 2 seconds). Avoid background music, noise, or other voices throughout the entire audio. Use normal spoken audio as input. Don't upload songs or singing audio. |
| Languages | Chinese (zh), English (en), German (de), Italian (it), Portuguese (pt), Spanish (es), Japanese (ja), Korean (ko), French (fr), Russian (ru), Thai (th), Indonesian (id), Arabic (ar), Czech (cs), Danish (da), Dutch (nl), Finnish (fi), Hebrew (he), Hindi (hi), Icelandic (is), Malay (ms), Norwegian (no), Persian (fa), Polish (pl), Swedish (sv), Tagalog (tl), Turkish (tr), Urdu (ur), Vietnamese (vi). Chinese dialects: Dongbei, Shannxi, Sichuan, Henan, Changsha, Tianjin, Hangzhou, Liaoning, Shenyang, Anshan. |
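For WAV input, most of the constraints above (sample rate, channels, bit depth, duration) can be pre-checked locally before upload. A minimal sketch using Python's standard wave module; check_wav is a hypothetical helper, not part of the API, and the thresholds simply mirror the table:

```python
import wave

def check_wav(path):
    """Return a list of requirement violations for a WAV file (empty list = OK)."""
    problems = []
    with wave.open(path, "rb") as w:
        duration = w.getnframes() / w.getframerate()
        if w.getframerate() < 24000:
            problems.append("sample rate below 24 kHz")
        if w.getnchannels() != 1:
            problems.append("audio is not mono")
        if w.getsampwidth() != 2:
            problems.append("not 16-bit PCM")
        if duration > 60:
            problems.append("longer than 60 seconds")
        elif duration < 10:
            problems.append("shorter than the recommended 10 seconds")
    return problems
```

This only validates container properties; the content requirements (continuous clear speech, no background noise) still need a human ear.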
Quick start: from cloning to real-time conversation
1. Workflow
Voice cloning and real-time conversation are two closely related but independent steps that follow a "create first, then use" workflow:
- Create a voice
  Call the Create a voice API and upload an audio clip. The system analyzes the audio and creates a custom cloned voice. You must specify target_model in this step to declare which omni model will drive the voice. If you already have a created voice (call the List voices API to check), skip this step and proceed to the next one.
- Use the voice in a real-time conversation
  Call the real-time multimodal API and pass in the voice obtained in the previous step. The omni model specified in this step must match the target_model from the previous step.
2. Model configuration and prerequisites
Choose the appropriate models and complete the prerequisites.
Model configuration
Voice cloning requires two models:
- Voice cloning model: qwen-voice-enrollment
- Omni model that drives the voice:
  - qwen3.5-omni-plus-realtime
  - qwen3.5-omni-flash-realtime
Prerequisites
- Get an API key: Obtain an API key. For security, configure the API key as an environment variable.
- Install the SDK: Make sure you have installed the latest DashScope SDK.
- Prepare the audio for cloning: The audio must meet the audio requirements.
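The prerequisites above can be completed from a terminal. A sketch for Linux/macOS (an assumption: on Windows, use set or the system environment settings instead; the version pin matches the Python example below):

```shell
# Configure the API key as an environment variable (replace sk-xxx with your key)
export DASHSCOPE_API_KEY="sk-xxx"
# Install the SDK (the Python example requires dashscope >= 1.23.9) and PyAudio for microphone/speaker I/O
pip install -U "dashscope>=1.23.9" pyaudio
```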
3. End-to-end example
The following example demonstrates how to use a cloned voice in a real-time conversation to produce output that closely resembles the original voice.
Key principle: When cloning a voice, the target_model (the omni model that drives the voice) must match the model specified in the subsequent real-time multimodal API call. Otherwise, synthesis fails. The example uses a local audio file voice.mp3 for voice cloning. Replace it with your own file when you run the code.
Applicable to the Qwen3.5-Omni-Realtime series models. For more information, see Real-time (Qwen-Omni-Realtime).
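Because a target_model mismatch only surfaces as a synthesis failure at conversation time, it can help to assert the match up front, before opening the websocket. A small sketch; ensure_model_match is a hypothetical helper, not part of the SDK:

```python
def ensure_model_match(target_model: str, realtime_model: str) -> None:
    """Fail fast if the cloning target_model and the realtime model differ."""
    if target_model != realtime_model:
        raise ValueError(
            f"target_model '{target_model}' does not match realtime model "
            f"'{realtime_model}'; synthesis would fail"
        )
```

Call it with the target_model you used when creating the voice and the model you are about to pass to the real-time API.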
Python
# Requirements: dashscope >= 1.23.9, pyaudio
import os
import requests
import base64
import pathlib
import time
import pyaudio
from dashscope.audio.qwen_omni import MultiModality, OmniRealtimeCallback, OmniRealtimeConversation
import dashscope

# ======= Configuration =======
DEFAULT_TARGET_MODEL = "qwen3.5-omni-plus-realtime"  # Must be the same model for cloning and conversation
DEFAULT_PREFERRED_NAME = "guanyu"
DEFAULT_AUDIO_MIME_TYPE = "audio/mpeg"
VOICE_FILE_PATH = "voice.mp3"  # Path to the local audio file for voice cloning


def create_voice(file_path: str,
                 target_model: str = DEFAULT_TARGET_MODEL,
                 preferred_name: str = DEFAULT_PREFERRED_NAME,
                 audio_mime_type: str = DEFAULT_AUDIO_MIME_TYPE) -> str:
    """Create a custom voice and return the voice parameter."""
    # API keys differ between the Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
    # If you haven't set an environment variable, replace the following line with: api_key = "sk-xxx"
    api_key = os.getenv("DASHSCOPE_API_KEY")
    file_path_obj = pathlib.Path(file_path)
    if not file_path_obj.exists():
        raise FileNotFoundError(f"Audio file not found: {file_path}")
    base64_str = base64.b64encode(file_path_obj.read_bytes()).decode()
    data_uri = f"data:{audio_mime_type};base64,{base64_str}"
    # Singapore region URL. For the Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
    url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"
    payload = {
        "model": "qwen-voice-enrollment",
        "input": {
            "action": "create",
            "target_model": target_model,
            "preferred_name": preferred_name,
            "audio": {"data": data_uri}
        }
    }
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }
    resp = requests.post(url, json=payload, headers=headers)
    if resp.status_code != 200:
        raise RuntimeError(f"Failed to create voice: {resp.status_code}, {resp.text}")
    try:
        return resp.json()["output"]["voice"]
    except (KeyError, ValueError) as e:
        raise RuntimeError(f"Failed to parse voice response: {e}")


class SimpleCallback(OmniRealtimeCallback):
    def __init__(self, pya):
        self.pya = pya
        self.out = None

    def on_open(self):
        self.out = self.pya.open(
            format=pyaudio.paInt16,
            channels=1,
            rate=24000,
            output=True
        )

    def on_event(self, response):
        if response['type'] == 'response.audio.delta':
            self.out.write(base64.b64decode(response['delta']))
        elif response['type'] == 'conversation.item.input_audio_transcription.completed':
            print(f"[User] {response['transcript']}")
        elif response['type'] == 'response.audio_transcript.done':
            print(f"[LLM] {response['transcript']}")


if __name__ == '__main__':
    # If you haven't set an environment variable, replace the following line with: dashscope.api_key = "sk-xxx"
    dashscope.api_key = os.getenv("DASHSCOPE_API_KEY")
    # Singapore region URL. For the Beijing region, use: wss://dashscope.aliyuncs.com/api-ws/v1/realtime
    url = "wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime"
    # Step 1: Clone a voice
    voice = create_voice(VOICE_FILE_PATH)
    print(f"Voice cloning complete. Voice: {voice}")
    # Step 2: Start a real-time conversation with the cloned voice
    pya = pyaudio.PyAudio()
    callback = SimpleCallback(pya)
    conv = OmniRealtimeConversation(model=DEFAULT_TARGET_MODEL, callback=callback, url=url)
    conv.connect()
    conv.update_session(
        output_modalities=[MultiModality.AUDIO, MultiModality.TEXT],
        voice=voice  # Use the cloned voice
    )
    mic = pya.open(format=pyaudio.paInt16, channels=1, rate=16000, input=True)
    print("Conversation started. Speak into your microphone (Ctrl+C to exit)...")
    try:
        while True:
            audio_data = mic.read(3200, exception_on_overflow=False)
            conv.append_audio(base64.b64encode(audio_data).decode())
            time.sleep(0.01)
    except KeyboardInterrupt:
        conv.close()
        mic.close()
        callback.out.close()
        pya.terminate()
        print("\nConversation ended")
Java
import com.alibaba.dashscope.audio.omni.*;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.google.gson.Gson;
import com.google.gson.JsonObject;
import javax.sound.sampled.*;
import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.ByteBuffer;
import java.nio.file.*;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;

public class Main {
    // ===== Constants =====
    // Use the same model for voice cloning and real-time conversation
    private static final String TARGET_MODEL = "qwen3.5-omni-plus-realtime";
    private static final String PREFERRED_NAME = "guanyu";
    // Relative path to the local audio file for voice cloning
    private static final String AUDIO_FILE = "voice.mp3";
    private static final String AUDIO_MIME_TYPE = "audio/mpeg";

    // Generate a data URI
    public static String toDataUrl(String filePath) throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get(filePath));
        String encoded = Base64.getEncoder().encodeToString(bytes);
        return "data:" + AUDIO_MIME_TYPE + ";base64," + encoded;
    }

    // Call the API to create a voice
    public static String createVoice() throws Exception {
        // The API keys for the Singapore and Beijing regions are different. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
        // If you haven't configured the environment variable, replace the following line with: String apiKey = "sk-xxx"
        String apiKey = System.getenv("DASHSCOPE_API_KEY");
        String jsonPayload =
                "{"
                + "\"model\": \"qwen-voice-enrollment\","
                + "\"input\": {"
                + "\"action\": \"create\","
                + "\"target_model\": \"" + TARGET_MODEL + "\","
                + "\"preferred_name\": \"" + PREFERRED_NAME + "\","
                + "\"audio\": {"
                + "\"data\": \"" + toDataUrl(AUDIO_FILE) + "\""
                + "}"
                + "}"
                + "}";
        // The following URL is for the Singapore region. To use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
        String url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization";
        HttpURLConnection con = (HttpURLConnection) new URL(url).openConnection();
        con.setRequestMethod("POST");
        con.setRequestProperty("Authorization", "Bearer " + apiKey);
        con.setRequestProperty("Content-Type", "application/json");
        con.setDoOutput(true);
        try (OutputStream os = con.getOutputStream()) {
            os.write(jsonPayload.getBytes(StandardCharsets.UTF_8));
        }
        int status = con.getResponseCode();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(status >= 200 && status < 300 ? con.getInputStream() : con.getErrorStream(),
                        StandardCharsets.UTF_8))) {
            StringBuilder response = new StringBuilder();
            String line;
            while ((line = br.readLine()) != null) {
                response.append(line);
            }
            if (status == 200) {
                JsonObject jsonObj = new Gson().fromJson(response.toString(), JsonObject.class);
                return jsonObj.getAsJsonObject("output").get("voice").getAsString();
            }
            throw new IOException("Failed to create voice: " + status + " - " + response);
        }
    }

    // Simple audio player
    static class SimpleAudioPlayer {
        private final SourceDataLine line;
        private final Queue<byte[]> audioQueue = new ConcurrentLinkedQueue<>();
        private final Thread playerThread;
        private final AtomicBoolean shouldStop = new AtomicBoolean(false);

        public SimpleAudioPlayer() throws LineUnavailableException {
            AudioFormat format = new AudioFormat(24000, 16, 1, true, false);
            line = AudioSystem.getSourceDataLine(format);
            line.open(format);
            line.start();
            playerThread = new Thread(() -> {
                while (!shouldStop.get()) {
                    byte[] audio = audioQueue.poll();
                    if (audio != null) {
                        line.write(audio, 0, audio.length);
                    } else {
                        try { Thread.sleep(10); } catch (InterruptedException ignored) {}
                    }
                }
            }, "AudioPlayer");
            playerThread.start();
        }

        public void play(String base64Audio) {
            audioQueue.add(Base64.getDecoder().decode(base64Audio));
        }

        public void close() {
            shouldStop.set(true);
            try { playerThread.join(1000); } catch (InterruptedException ignored) {}
            line.drain();
            line.close();
        }
    }

    public static void main(String[] args) {
        try {
            // 1. Voice cloning: create a custom voice
            String voice = createVoice();
            System.out.println("Voice cloning complete. Voice: " + voice);
            // 2. Use the cloned voice in a real-time conversation
            SimpleAudioPlayer player = new SimpleAudioPlayer();
            AtomicBoolean shouldStop = new AtomicBoolean(false);
            OmniRealtimeParam param = OmniRealtimeParam.builder()
                    .model(TARGET_MODEL)
                    // The API keys for the Singapore and Beijing regions are different. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
                    // If you haven't configured the environment variable, replace the following line with: .apikey("sk-xxx")
                    .apikey(System.getenv("DASHSCOPE_API_KEY"))
                    // The following URL is for the Singapore region. To use a model in the Beijing region, replace the URL with: wss://dashscope.aliyuncs.com/api-ws/v1/realtime
                    .url("wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime")
                    .build();
            OmniRealtimeConversation conversation = new OmniRealtimeConversation(param, new OmniRealtimeCallback() {
                @Override public void onOpen() { System.out.println("Connection established"); }
                @Override public void onClose(int code, String reason) {
                    System.out.println("Connection closed (" + code + "): " + reason);
                    shouldStop.set(true);
                }
                @Override public void onEvent(JsonObject event) {
                    String type = event.get("type").getAsString();
                    if ("response.audio.delta".equals(type)) {
                        player.play(event.get("delta").getAsString());
                    } else if ("conversation.item.input_audio_transcription.completed".equals(type)) {
                        System.out.println("[User] " + event.get("transcript").getAsString());
                    } else if ("response.audio_transcript.done".equals(type)) {
                        System.out.println("[LLM] " + event.get("transcript").getAsString());
                    }
                }
            });
            conversation.connect();
            conversation.updateSession(OmniRealtimeConfig.builder()
                    .modalities(Arrays.asList(OmniRealtimeModality.AUDIO, OmniRealtimeModality.TEXT))
                    .voice(voice) // Use the cloned custom voice
                    .enableTurnDetection(true)
                    .enableInputAudioTranscription(true)
                    .build()
            );
            System.out.println("Conversation started. Speak into the microphone (Ctrl+C to exit)...");
            AudioFormat format = new AudioFormat(16000, 16, 1, true, false);
            TargetDataLine mic = AudioSystem.getTargetDataLine(format);
            mic.open(format);
            mic.start();
            ByteBuffer buffer = ByteBuffer.allocate(3200);
            while (!shouldStop.get()) {
                int bytesRead = mic.read(buffer.array(), 0, buffer.capacity());
                if (bytesRead > 0) {
                    // Encode only the bytes actually read from the microphone
                    conversation.appendAudio(Base64.getEncoder().encodeToString(Arrays.copyOf(buffer.array(), bytesRead)));
                }
                Thread.sleep(20);
            }
            conversation.close(1000, "Normal exit");
            player.close();
            mic.close();
            System.out.println("\nConversation ended");
        } catch (NoApiKeyException e) {
            System.err.println("API KEY not found. Set the DASHSCOPE_API_KEY environment variable.");
        } catch (Exception e) {
            e.printStackTrace();
        }
        System.exit(0);
    }
}
API reference
Make sure you use the same account across different APIs.
Create a voice
Upload audio for cloning and create a custom voice.
- URL
  China (Beijing): POST https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
  International (Singapore): POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization
- Request headers

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| Authorization | string | Yes | Authentication token in the format Bearer <your_api_key>. Replace <your_api_key> with your actual API key. |
| Content-Type | string | Yes | Media type of the request body. Set to application/json. |
- Request body
  The following request body includes all parameters. Optional fields can be omitted as needed. Distinguish between the following parameters:
  Important
  - model: The voice cloning model. Set to qwen-voice-enrollment.
  - target_model: The omni model that drives the voice. This must match the model used in the subsequent real-time multimodal API call. Otherwise, synthesis fails.

  {
      "model": "qwen-voice-enrollment",
      "input": {
          "action": "create",
          "target_model": "qwen3.5-omni-plus-realtime",
          "preferred_name": "guanyu",
          "audio": {
              "data": "https://xxx.wav"
          },
          "text": "Optional. The transcript of the audio in audio.data.",
          "language": "Optional. The language of the audio in audio.data, such as zh."
      }
  }
- Request parameters

| Parameter | Type | Default | Required | Description |
| --- | --- | --- | --- | --- |
| model | string | - | Yes | The voice cloning model. Set to qwen-voice-enrollment. |
| action | string | - | Yes | The action type. Set to create. |
| target_model | string | - | Yes | The omni model that drives the voice: qwen3.5-omni-plus-realtime or qwen3.5-omni-flash-realtime. This must match the model used in the subsequent real-time multimodal API call. Otherwise, synthesis fails. |
| preferred_name | string | - | Yes | A human-readable name for the voice. Only digits, letters, and underscores are allowed. Maximum: 16 characters. Use a name related to the role or scenario. This keyword appears in the final voice name. For example, if the keyword is "guanyu", the resulting voice name is "qwen-omni-vc-guanyu-voice-20250812105009984-838b". |
| audio.data | string | - | Yes | The audio for cloning (follow the Recording guide when recording, and make sure the audio meets the Audio requirements). Submit the audio in one of two ways. (1) Data URI in the format data:<mediatype>;base64,<data>, where <mediatype> is the MIME type (WAV: audio/wav; MP3: audio/mpeg; M4A: audio/mp4) and <data> is the Base64-encoded string of the audio. Base64 encoding increases the file size, so keep the original file small enough that the encoded result stays under 10 MB. Example: data:audio/wav;base64,SUQzBAAAAAAAI1RTU0UAAAAPAAADTGF2ZjU4LjI5LjEwMAAAAAAAAAAAAAAA//PAxABQ/BXRbMPe4IQAhl9 (2) Audio URL (we recommend uploading your audio to OSS). The file size must not exceed 10 MB, and the URL must be publicly accessible without authentication. |
| text | string | - | No | The transcript that matches the audio in audio.data. When this parameter is provided, the server compares the audio against the text. If the difference is too large, an Audio.PreprocessError is returned. |
| language | string | - | No | The language of the audio in audio.data. Supported values: zh (Chinese), en (English), de (German), it (Italian), pt (Portuguese), es (Spanish), ja (Japanese), ko (Korean), fr (French), ru (Russian), th (Thai), id (Indonesian), ar (Arabic), cs (Czech), da (Danish), nl (Dutch), fi (Finnish), he (Hebrew), hi (Hindi), is (Icelandic), ms (Malay), no (Norwegian), fa (Persian), pl (Polish), sv (Swedish), tl (Tagalog), tr (Turkish), ur (Urdu), vi (Vietnamese). Chinese dialects: Dongbei, Shannxi, Sichuan, Henan, Changsha, Tianjin, Hangzhou, Liaoning, Shenyang, Anshan. If you use this parameter, set it to the actual language of the audio used for cloning. |
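The 10 MB limit applies to the encoded result, not the source file, and Base64 output is about 4/3 of the input. The encoded size can therefore be estimated (or the data URI built and checked) locally before submitting. A sketch; encoded_length and build_data_uri are hypothetical helpers, and the MIME types are the ones listed in the table above:

```python
import base64
import math

MIME_BY_EXT = {".wav": "audio/wav", ".mp3": "audio/mpeg", ".m4a": "audio/mp4"}

def encoded_length(raw_bytes: int) -> int:
    """Length of the Base64 string for raw_bytes of input (4 output chars per 3 input bytes)."""
    return 4 * math.ceil(raw_bytes / 3)

def build_data_uri(data: bytes, mime: str) -> str:
    """Build a data:<mediatype>;base64,<data> URI, enforcing the 10 MB limit on the encoded part."""
    if encoded_length(len(data)) > 10 * 1024 * 1024:
        raise ValueError("Base64-encoded audio would exceed 10 MB")
    return f"data:{mime};base64,{base64.b64encode(data).decode()}"
```

In practice this means a source file of roughly 7.5 MB is already at the limit once encoded.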
- Response parameters
  Key response parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| voice | string | The voice name. Use this value directly as the voice parameter in the real-time multimodal API. |
| target_model | string | The omni model that drives the voice: qwen3.5-omni-plus-realtime or qwen3.5-omni-flash-realtime. This must match the model used in the subsequent real-time multimodal API call. Otherwise, synthesis fails. |
| request_id | string | Request ID. |
| count | integer | The number of "create voice" operations billed for this request. When creating a voice, count is always 1. |
- Sample code
  Important
  - model: The voice cloning model. Set to qwen-voice-enrollment.
  - target_model: The omni model that drives the voice. This must match the model used in the subsequent real-time multimodal API call. Otherwise, synthesis fails.

cURL
If you have not set the API key as an environment variable, you must replace $DASHSCOPE_API_KEY in the example with your actual API key.

# ======= Important =======
# Singapore region URL. For the Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
# API keys differ between regions. Get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
# === Remove these comments before running ===
curl --location --request POST 'https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "qwen-voice-enrollment",
    "input": {
        "action": "create",
        "target_model": "qwen3.5-omni-plus-realtime",
        "preferred_name": "guanyu",
        "audio": {
            "data": "https://xxx.wav"
        }
    }
}'

Python

import os
import base64
import pathlib
import requests

# API keys differ between regions. Get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
# If you haven't set an environment variable, replace the following line with: api_key = "sk-xxx"
api_key = os.getenv("DASHSCOPE_API_KEY")
# Singapore region URL. For the Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"
file_path = pathlib.Path("voice.mp3")  # Replace with your own audio file
data_uri = "data:audio/mpeg;base64," + base64.b64encode(file_path.read_bytes()).decode()
payload = {
    "model": "qwen-voice-enrollment",  # Do not modify
    "input": {
        "action": "create",
        "target_model": "qwen3.5-omni-plus-realtime",
        "preferred_name": "guanyu",
        "audio": {"data": data_uri}
    }
}
headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json"
}
response = requests.post(url, json=payload, headers=headers)
print("HTTP status:", response.status_code)
if response.status_code == 200:
    print("Voice:", response.json()["output"]["voice"])
else:
    print("Request failed:", response.text)

Java
See the createVoice method in the end-to-end example above, which sends this same create request over HttpURLConnection; reuse it, replacing the constants as needed.
List voices
Query your created voices with pagination.
- URL
  China (Beijing): POST https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
  International (Singapore): POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization
- Request headers

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| Authorization | string | Yes | Authentication token in the format Bearer <your_api_key>. Replace <your_api_key> with your actual API key. |
| Content-Type | string | Yes | Media type of the request body. Set to application/json. |
- Request parameters

| Parameter | Type | Default | Required | Description |
| --- | --- | --- | --- | --- |
| model | string | - | Yes | The voice cloning model. Set to qwen-voice-enrollment. |
| action | string | - | Yes | The action type. Set to list. |
| page_index | integer | 0 | No | Page number index. Valid values: 0 to 1,000,000. |
| page_size | integer | 10 | No | Number of items per page. Valid values: 0 to 1,000,000. |
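To walk all created voices, page through list calls until a page comes back short or empty. A sketch where fetch_page stands in for any callable that sends the list request above and returns output.voice_list (iter_voices and fetch_page are hypothetical helper names; page_index and page_size match the parameters in the table):

```python
def iter_voices(fetch_page, page_size=10):
    """Yield every voice item by paging action=list until a page is exhausted."""
    page_index = 0
    while True:
        items = fetch_page(page_index=page_index, page_size=page_size)
        if not items:
            return
        yield from items
        if len(items) < page_size:  # short page: nothing left
            return
        page_index += 1
```

Pass in a function that wraps the HTTP request shown in the sample code below and extracts output.voice_list from the JSON response.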
- Response parameters
  Key response parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| voice | string | The voice name. Use this value directly as the voice parameter in the real-time multimodal API. |
| gmt_create | string | The time when the voice was created. |
| target_model | string | The omni model that drives the voice: qwen3.5-omni-plus-realtime or qwen3.5-omni-flash-realtime. This must match the model used in the subsequent real-time multimodal API call. Otherwise, synthesis fails. |
| request_id | string | Request ID. |
| count | integer | The number of "create voice" operations billed for this request. Listing voices is free, so count is always 0. |
- Sample code
  Important
  model: The voice cloning model. The value is fixed as qwen-voice-enrollment. Do not modify this value.

cURL
If you have not set the API key as an environment variable, you must replace $DASHSCOPE_API_KEY in the example with your actual API key.

# ======= Important =======
# Singapore region URL. For the Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
# API keys differ between regions. Get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
# === Remove these comments before running ===
curl --location --request POST 'https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "qwen-voice-enrollment",
    "input": {
        "action": "list",
        "page_size": 10,
        "page_index": 0
    }
}'

Python

import os
import requests

# API keys differ between regions. Get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
# If you haven't set an environment variable, replace the following line with: api_key = "sk-xxx"
api_key = os.getenv("DASHSCOPE_API_KEY")
# Singapore region URL. For the Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"
payload = {
    "model": "qwen-voice-enrollment",  # Do not modify
    "input": {
        "action": "list",
        "page_size": 10,
        "page_index": 0
    }
}
headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json"
}
response = requests.post(url, json=payload, headers=headers)
print("HTTP status:", response.status_code)
if response.status_code == 200:
    voice_list = response.json()["output"]["voice_list"]
    print("Voice list:")
    for item in voice_list:
        print(f"- Voice: {item['voice']} Created: {item['gmt_create']} Model: {item['target_model']}")
else:
    print("Request failed:", response.text)

Java

import com.google.gson.Gson;
import com.google.gson.JsonArray;
import com.google.gson.JsonObject;
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class Main {
    public static void main(String[] args) {
        // API keys differ between regions. Get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
        // If you haven't set an environment variable, replace the following line with: String apiKey = "sk-xxx"
        String apiKey = System.getenv("DASHSCOPE_API_KEY");
        // Singapore region URL. For the Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
        String apiUrl = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization";
        // JSON request body
        String jsonPayload = "{"
                + "\"model\": \"qwen-voice-enrollment\","  // Do not modify
                + "\"input\": {"
                + "\"action\": \"list\","
                + "\"page_size\": 10,"
                + "\"page_index\": 0"
                + "}"
                + "}";
        try {
            HttpURLConnection con = (HttpURLConnection) new URL(apiUrl).openConnection();
            con.setRequestMethod("POST");
            con.setRequestProperty("Authorization", "Bearer " + apiKey);
            con.setRequestProperty("Content-Type", "application/json");
            con.setDoOutput(true);
            try (OutputStream os = con.getOutputStream()) {
                os.write(jsonPayload.getBytes("UTF-8"));
            }
            int status = con.getResponseCode();
            BufferedReader br = new BufferedReader(new InputStreamReader(
                    status >= 200 && status < 300 ? con.getInputStream() : con.getErrorStream(), "UTF-8"));
            StringBuilder response = new StringBuilder();
            String line;
            while ((line = br.readLine()) != null) {
                response.append(line);
            }
            br.close();
            System.out.println("HTTP status: " + status);
            System.out.println("Response JSON: " + response);
            if (status == 200) {
                JsonObject jsonObj = new Gson().fromJson(response.toString(), JsonObject.class);
                JsonArray voiceList = jsonObj.getAsJsonObject("output").getAsJsonArray("voice_list");
                System.out.println("\nVoice list:");
                for (int i = 0; i < voiceList.size(); i++) {
                    JsonObject voiceItem = voiceList.get(i).getAsJsonObject();
                    System.out.printf("- Voice: %s Created: %s Model: %s%n",
                            voiceItem.get("voice").getAsString(),
                            voiceItem.get("gmt_create").getAsString(),
                            voiceItem.get("target_model").getAsString());
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Delete a voice
Delete a specific voice and release the corresponding quota.
- URL
  China (Beijing): POST https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
  International (Singapore): POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization
- Request headers

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| Authorization | string | Yes | Authentication token in the format Bearer <your_api_key>. Replace <your_api_key> with your actual API key. |
| Content-Type | string | Yes | Media type of the request body. Set to application/json. |
- Request body
  The following request body includes all parameters. Optional fields can be omitted as needed.
  Important
  model: The voice cloning model. The value is fixed as qwen-voice-enrollment. Do not modify this value.

  {
      "model": "qwen-voice-enrollment",
      "input": {
          "action": "delete",
          "voice": "yourVoice"
      }
  }
- Request parameters

| Parameter | Type | Default | Required | Description |
| --- | --- | --- | --- | --- |
| model | string | - | Yes | The voice cloning model. Set to qwen-voice-enrollment. |
| action | string | - | Yes | The action type. Set to delete. |
| voice | string | - | Yes | The voice to delete. |
- Response parameters
  Key response parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| request_id | string | Request ID. |
| count | integer | The number of "create voice" operations billed for this request. Deleting a voice is free, so count is always 0. |
Sample code
Importantmodel: The voice cloning model. The value is fixed asqwen-voice-enrollment. Do not modify this value.cURL
If you have not set the API key as an environment variable, you must replace
$DASHSCOPE_API_KEYin the example with your actual API key.# ======= Important ======= # Singapore region URL. For the Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization # API keys differ between regions. Get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key # === Remove these comments before running === curl --location --request POST 'https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization' \ --header 'Authorization: Bearer $DASHSCOPE_API_KEY' \ --header 'Content-Type: application/json' \ --data '{ "model": "qwen-voice-enrollment", "input": { "action": "delete", "voice": "yourVoice" } }'Python
import os import requests # API keys differ between regions. Get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key # If you haven't set an environment variable, replace the following line with: api_key = "sk-xxx" api_key = os.getenv("DASHSCOPE_API_KEY") # Singapore region URL. For the Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization" voice_to_delete = "yourVoice" # Replace with the actual voice name payload = { "model": "qwen-voice-enrollment", # Do not modify "input": { "action": "delete", "voice": voice_to_delete } } headers = { "Authorization": f"Bearer {api_key}", "Content-Type": "application/json" } response = requests.post(url, json=payload, headers=headers) print("HTTP status:", response.status_code) if response.status_code == 200: data = response.json() request_id = data["request_id"] print(f"Deleted successfully") print(f"Request ID: {request_id}") else: print("Request failed:", response.text)Java
```java
import com.google.gson.Gson;
import com.google.gson.JsonObject;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class Main {
    public static void main(String[] args) {
        // API keys differ between regions. Get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
        // If you haven't set an environment variable, replace the following line with: String apiKey = "sk-xxx";
        String apiKey = System.getenv("DASHSCOPE_API_KEY");

        // Singapore region URL. For the Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
        String apiUrl = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization";

        String voiceToDelete = "yourVoice"; // Replace with the actual voice name

        // Build JSON request body
        String jsonPayload = "{"
                + "\"model\": \"qwen-voice-enrollment\"," // Do not modify
                + "\"input\": {"
                + "\"action\": \"delete\","
                + "\"voice\": \"" + voiceToDelete + "\""
                + "}"
                + "}";

        try {
            // Create POST connection
            HttpURLConnection con = (HttpURLConnection) new URL(apiUrl).openConnection();
            con.setRequestMethod("POST");
            con.setRequestProperty("Authorization", "Bearer " + apiKey);
            con.setRequestProperty("Content-Type", "application/json");
            con.setDoOutput(true);

            // Send request body
            try (OutputStream os = con.getOutputStream()) {
                os.write(jsonPayload.getBytes("UTF-8"));
            }

            int status = con.getResponseCode();
            BufferedReader br = new BufferedReader(new InputStreamReader(
                    status >= 200 && status < 300 ? con.getInputStream() : con.getErrorStream(), "UTF-8"));
            StringBuilder response = new StringBuilder();
            String line;
            while ((line = br.readLine()) != null) {
                response.append(line);
            }
            br.close();

            System.out.println("HTTP status: " + status);
            System.out.println("Response JSON: " + response.toString());

            if (status == 200) {
                Gson gson = new Gson();
                JsonObject jsonObj = gson.fromJson(response.toString(), JsonObject.class);
                String requestId = jsonObj.get("request_id").getAsString();
                System.out.println("Deleted successfully");
                System.out.println("Request ID: " + requestId);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
```
Real-time conversation
To use a cloned voice in a real-time conversation, see Quick start: from cloning to real-time conversation.
Voice quota and auto-cleanup
Total limit: 1,000 voices per account
The current API doesn't provide a dedicated voice count query. To check how many voices you have, call the voice list API and count the returned voices.
Auto-cleanup: If a voice hasn't been used in any model invocation request for one year, the system automatically deletes it.
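Because there is no dedicated count query, one workaround is to page through the voice list and count the results. The following is a minimal stdlib-only sketch; the `list` action and the `page_index`/`page_size` parameter names are assumptions here, so confirm them against the list-voices API described earlier in this document:

```python
import json
import urllib.request

# Singapore region URL. For the Beijing region, use:
# https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
URL = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"


def build_list_payload(page_index, page_size=10):
    # "qwen-voice-enrollment" is fixed; the "list" action and the
    # page_index/page_size names are assumptions to verify against the
    # list-voices section of this document.
    return {
        "model": "qwen-voice-enrollment",
        "input": {
            "action": "list",
            "page_index": page_index,
            "page_size": page_size,
        },
    }


def count_voices(api_key, page_size=10):
    """Page through the voice list and return the total number of voices."""
    total, page = 0, 0
    while True:
        req = urllib.request.Request(
            URL,
            data=json.dumps(build_list_payload(page, page_size)).encode("utf-8"),
            headers={
                "Authorization": f"Bearer {api_key}",
                "Content-Type": "application/json",
            },
            method="POST",
        )
        with urllib.request.urlopen(req) as resp:
            voices = json.load(resp).get("output", {}).get("voice_list", [])
        total += len(voices)
        if len(voices) < page_size:  # a short page means the end of the list
            return total
        page += 1


# Usage (makes real API calls; requires a valid key):
#   print(count_voices("your-api-key"))
```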
Billing
Voice cloning and model invocation are billed separately:
- Voice cloning: billed at $0.01/voice. Failed creations are not billed.
  Note
  Free quota (available only for the China site Beijing region and the International site Singapore region):
  - Within 90 days of activating Alibaba Cloud Model Studio, you get 1,000 free voice creations.
  - Failed creations don't consume the free quota.
  - Deleting a voice doesn't restore the free quota.
  - After the free quota is used up or the 90-day period expires, voice creation is billed at $0.01/voice.
- Real-time conversation with a cloned voice: billed by token usage for model invocation. For details, see Model invocation pricing.
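The voice cloning pricing rules above can be turned into a simple worked example. The helper below is purely illustrative (the function name and defaults are not part of the API): 1,200 successful creations against the full 1,000-voice free quota leave 200 billable creations at $0.01 each.

```python
def voice_cloning_cost(successful_creations, free_quota_remaining=1000,
                       price_per_voice=0.01):
    """Estimated cost in USD for voice creations beyond the free quota.

    Failed creations are excluded entirely: they are neither billed nor
    counted against the free quota.
    """
    billable = max(0, successful_creations - free_quota_remaining)
    return billable * price_per_voice


# 1,200 successful creations, full free quota: 200 billable at $0.01 each
print(voice_cloning_cost(1200))
```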
Copyright and legal compliance
You are responsible for the ownership and legal use of the voice you provide. Read the Service Agreement.
Recording guide
Recording equipment
Use a microphone with noise cancellation, or record with a phone at close range in a quiet environment to keep the audio clean.
Recording environment
Location
- Record in a small enclosed space of 10 square meters or less.
- Choose a room with sound-absorbing materials such as acoustic foam, carpets, or curtains.
- Avoid large open halls, conference rooms, or classrooms where reverberation is high.
Noise control
- Outdoor noise: Close doors and windows to block traffic, construction, and other external sounds.
- Indoor noise: Turn off air conditioners, fans, and fluorescent light ballasts.
- Record ambient sound with your phone and play it back at high volume to identify hidden noise sources.
Reverberation control
- Reverberation makes audio sound muffled and reduces clarity.
- Reduce reflections from smooth surfaces: close curtains, open wardrobe doors, and cover desks and shelves with clothing or blankets.
- Use irregular objects such as bookshelves and upholstered furniture to create diffuse reflections.
Script preparation
- Match the script to the target use case. For example, use a customer service dialog style for a customer service scenario.
- Make sure the script doesn't contain sensitive or illegal content (such as political, pornographic, or violent material); such content causes cloning to fail.
- Avoid short phrases (such as "hello" or "yes"). Use complete sentences.
- Maintain semantic coherence and avoid frequent pauses when reading. At least 3 consecutive seconds without interruption is recommended.
- You can convey the target emotion (such as friendly or serious), but avoid overly dramatic or theatrical delivery. Keep the tone natural.
Recommended steps
Using a typical bedroom as an example:
- Close doors and windows to block external noise.
- Turn off air conditioners, fans, and other appliances.
- Close curtains to reduce glass reflections.
- Cover the desk surface with clothing or a blanket to reduce desktop reflections.
- Familiarize yourself with the script, set the character's tone, and deliver naturally.
- Maintain a distance of about 10 cm from the recording device to avoid plosive distortion or a weak signal.
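Before uploading, you can sanity-check a WAV recording against the audio requirements table at the top of this document. The sketch below is illustrative and stdlib-only (the function name and messages are ours, not part of the API); MP3 or M4A files would need a separate decoding library.

```python
import os
import wave


def check_wav(path):
    """Return a list of problems; empty if the WAV file meets the documented
    requirements (16-bit mono PCM, >= 24 kHz, < 10 MB, 10-20 s recommended,
    60 s maximum)."""
    problems = []
    if os.path.getsize(path) >= 10 * 1024 * 1024:
        problems.append("file is 10 MB or larger")
    with wave.open(path, "rb") as w:
        if w.getnchannels() != 1:
            problems.append("audio is not mono")
        if w.getsampwidth() != 2:  # 2 bytes per sample = 16-bit
            problems.append("audio is not 16-bit PCM")
        if w.getframerate() < 24000:
            problems.append("sample rate is below 24 kHz")
        duration = w.getnframes() / w.getframerate()
        if duration > 60:
            problems.append("audio is longer than the 60-second maximum")
        elif not 10 <= duration <= 20:
            problems.append("duration outside the recommended 10-20 s range")
    return problems


if __name__ == "__main__" and os.path.exists("recording.wav"):
    for msg in check_wav("recording.wav"):  # replace with your file
        print("WARNING:", msg)
```

Note that this catches only the format requirements; content requirements (continuous clear speech, no background music or other voices) still need a listening check.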
Error messages
If you encounter errors, see Error messages for troubleshooting.