
Alibaba Cloud Model Studio: Qwen voice cloning API Reference

Last Updated: Feb 25, 2026

Qwen voice cloning uses a feature extraction model to clone voices without training. Provide 10 to 20 seconds of audio to generate a highly similar, natural-sounding custom voice. Voice cloning and speech synthesis are sequential steps. This document covers voice cloning parameters and interface details. For speech synthesis, see Real-time speech synthesis - Qwen or Speech synthesis - Qwen.

User guide: For model descriptions and selection recommendations, see Real-time speech synthesis - Qwen or Speech synthesis - Qwen.

Important

This document applies only to the Qwen voice cloning interface. If you use the CosyVoice model, see CosyVoice voice cloning API.

Audio requirements

High-quality input audio is essential for optimal cloning results.

  • Supported formats: WAV (16-bit), MP3, M4A

  • Audio duration: 10–20 seconds recommended; 60 seconds maximum

  • File size: < 10 MB

  • Sample rate: ≥ 24 kHz

  • Channels: Mono

  • Content: The audio must contain at least 3 seconds of continuous, clear speech; the rest may include short pauses (≤ 2 seconds). To keep the core speech content clean, avoid background music, noise, and other voices throughout the clip. Use normal speaking audio as input; do not upload songs or singing, which yield inaccurate cloning results.

  • Language: Chinese (zh), English (en), German (de), Italian (it), Portuguese (pt), Spanish (es), Japanese (ja), Korean (ko), French (fr), Russian (ru)
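As a quick local sanity check before uploading, the WAV-specific limits above can be verified with Python's standard library. This is an illustrative helper, not part of the API; the stdlib `wave` module only reads WAV files, so MP3/M4A inputs would need a third-party probe instead.

```python
import os
import wave

def check_wav(path: str) -> list:
    """Return a list of problems found against the documented audio limits (WAV only)."""
    problems = []
    if os.path.getsize(path) >= 10 * 1024 * 1024:
        problems.append("file is not under 10 MB")
    with wave.open(path, "rb") as w:
        if w.getframerate() < 24000:
            problems.append("sample rate is below 24 kHz")
        if w.getnchannels() != 1:
            problems.append("audio is not mono")
        if w.getsampwidth() != 2:  # bytes per sample; 2 = 16-bit
            problems.append("audio is not 16-bit PCM")
        duration = w.getnframes() / w.getframerate()
        if duration > 60:
            problems.append("audio exceeds the 60-second maximum")
        elif duration < 10:
            problems.append("audio is shorter than the recommended 10 seconds")
    return problems
```

The helper cannot verify content-level requirements (clear continuous speech, no background noise); those still need a listen.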

Getting started: From cloning to synthesis


1. Workflow

Voice cloning and speech synthesis are two separate but closely linked steps that follow a "create first, use later" workflow:

  1. Create a voice

    Call the Create voice interface and upload an audio clip. The system analyzes the audio and creates a custom cloned voice. You must specify target_model, which defines the speech synthesis model that will drive the created voice.

    If you already have a voice (check by calling the List voices interface), skip this step and proceed to the next.

  2. Use the voice for speech synthesis

    Call the speech synthesis interface and pass in the voice from the previous step. The speech synthesis model specified here must match the target_model from the previous step.
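The two steps above can be sketched offline. The payload builder below mirrors the Create voice request body documented later on this page, and the guard captures the one invariant the workflow depends on: the synthesis call must use the same model as target_model. The model name and preferred_name are illustrative values taken from the examples on this page.

```python
TARGET_MODEL = "qwen3-tts-vc-realtime-2026-01-15"  # example target model from this page

def build_enrollment_payload(audio_data_uri: str, target_model: str = TARGET_MODEL) -> dict:
    """Step 1: request body for the Create voice call (model is fixed as qwen-voice-enrollment)."""
    return {
        "model": "qwen-voice-enrollment",
        "input": {
            "action": "create",
            "target_model": target_model,
            "preferred_name": "guanyu",
            "audio": {"data": audio_data_uri},
        },
    }

def check_synthesis_model(voice_target_model: str, synthesis_model: str) -> None:
    """Step 2 precondition: a model mismatch makes the synthesis call fail."""
    if voice_target_model != synthesis_model:
        raise ValueError(
            f"voice was enrolled for {voice_target_model}, "
            f"but synthesis requested {synthesis_model}"
        )
```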

2. Model configuration and preparations

Select an appropriate model and complete necessary preparations.

Model configuration

Specify these two models for voice cloning:

  • Voice cloning model: qwen-voice-enrollment

  • Speech synthesis model that drives the voice: a Qwen3-TTS-VC-Realtime series model (bidirectional streaming) or a Qwen3-TTS-VC series model (non-streaming and unidirectional streaming)

Preparations

  1. Get an API key. For security, set your API key as an environment variable.

  2. Install the SDK: Ensure you have installed the latest DashScope SDK.

  3. Prepare the audio for cloning: The audio must meet the audio requirements.

3. End-to-end example

The following example demonstrates how to use a custom cloned voice in speech synthesis to produce output that closely matches the original voice.

  • Key principle: The target_model (the speech synthesis model that drives the voice) specified during voice cloning must match the model used in the speech synthesis call. Otherwise, synthesis fails.

  • The example uses a local audio file voice.mp3 for voice cloning. Replace it with your own file when running the code.

Bidirectional streaming synthesis

Applies to Qwen3-TTS-VC-Realtime series models. For more information, see Real-time speech synthesis - Qwen.

Python

# coding=utf-8
# Installation instructions for pyaudio:
# APPLE Mac OS X
#   brew install portaudio
#   pip install pyaudio
# Debian/Ubuntu
#   sudo apt-get install python-pyaudio python3-pyaudio
#   or
#   pip install pyaudio
# CentOS
#   sudo yum install -y portaudio portaudio-devel && pip install pyaudio
# Microsoft Windows
#   python -m pip install pyaudio

import pyaudio
import os
import requests
import base64
import pathlib
import threading
import time
import dashscope  # DashScope Python SDK version must be 1.23.9 or higher
from dashscope.audio.qwen_tts_realtime import QwenTtsRealtime, QwenTtsRealtimeCallback, AudioFormat

# ======= Constants =======
DEFAULT_TARGET_MODEL = "qwen3-tts-vc-realtime-2026-01-15"  # Target model for voice cloning and speech synthesis must match
DEFAULT_PREFERRED_NAME = "guanyu"
DEFAULT_AUDIO_MIME_TYPE = "audio/mpeg"
VOICE_FILE_PATH = "voice.mp3"  # Relative path to local audio file used for voice cloning

TEXT_TO_SYNTHESIZE = [
    'Right? I love supermarkets like this.',
    'Especially during Chinese New Year',
    'When I go shopping',
    'I feel',
    'Extremely happy!',
    'And want to buy so many things!'
]

def create_voice(file_path: str,
                 target_model: str = DEFAULT_TARGET_MODEL,
                 preferred_name: str = DEFAULT_PREFERRED_NAME,
                 audio_mime_type: str = DEFAULT_AUDIO_MIME_TYPE) -> str:
    """
    Create a voice and return the voice parameter
    """
    # API keys differ between the Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    # If you have not configured an environment variable, replace the next line with your Model Studio API key: api_key = "sk-xxx"
    api_key = os.getenv("DASHSCOPE_API_KEY")

    file_path_obj = pathlib.Path(file_path)
    if not file_path_obj.exists():
        raise FileNotFoundError(f"Audio file not found: {file_path}")

    base64_str = base64.b64encode(file_path_obj.read_bytes()).decode()
    data_uri = f"data:{audio_mime_type};base64,{base64_str}"

    # This URL is for the Singapore region. To use a model from the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
    url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"
    payload = {
        "model": "qwen-voice-enrollment", # Do not change this value
        "input": {
            "action": "create",
            "target_model": target_model,
            "preferred_name": preferred_name,
            "audio": {"data": data_uri}
        }
    }
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }

    resp = requests.post(url, json=payload, headers=headers)
    if resp.status_code != 200:
        raise RuntimeError(f"Failed to create voice: {resp.status_code}, {resp.text}")

    try:
        return resp.json()["output"]["voice"]
    except (KeyError, ValueError) as e:
        raise RuntimeError(f"Failed to parse voice response: {e}")

def init_dashscope_api_key():
    """
    Initialize DashScope SDK API key
    """
    # API keys differ between the Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    # If you have not configured an environment variable, replace the next line with your Model Studio API key: dashscope.api_key = "sk-xxx"
    dashscope.api_key = os.getenv("DASHSCOPE_API_KEY")

# ======= Callback class =======
class MyCallback(QwenTtsRealtimeCallback):
    """
    Custom TTS streaming callback
    """
    def __init__(self):
        self.complete_event = threading.Event()
        self._player = pyaudio.PyAudio()
        self._stream = self._player.open(
            format=pyaudio.paInt16, channels=1, rate=24000, output=True
        )

    def on_open(self) -> None:
        print('[TTS] Connection established')

    def on_close(self, close_status_code, close_msg) -> None:
        self._stream.stop_stream()
        self._stream.close()
        self._player.terminate()
        print(f'[TTS] Connection closed code={close_status_code}, msg={close_msg}')

    def on_event(self, response: dict) -> None:
        try:
            event_type = response.get('type', '')
            if event_type == 'session.created':
                print(f'[TTS] Session started: {response["session"]["id"]}')
            elif event_type == 'response.audio.delta':
                audio_data = base64.b64decode(response['delta'])
                self._stream.write(audio_data)
            elif event_type == 'response.done':
                print(f'[TTS] Response completed, Response ID: {qwen_tts_realtime.get_last_response_id()}')
            elif event_type == 'session.finished':
                print('[TTS] Session ended')
                self.complete_event.set()
        except Exception as e:
            print(f'[Error] Exception in callback event handler: {e}')

    def wait_for_finished(self):
        self.complete_event.wait()

# ======= Main execution logic =======
if __name__ == '__main__':
    init_dashscope_api_key()
    print('[System] Initializing Qwen TTS Realtime ...')

    callback = MyCallback()
    qwen_tts_realtime = QwenTtsRealtime(
        model=DEFAULT_TARGET_MODEL,
        callback=callback,
        # This URL is for the Singapore region. To use a model from the Beijing region, replace the URL with: wss://dashscope.aliyuncs.com/api-ws/v1/realtime
        url='wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime'
    )
    qwen_tts_realtime.connect()
    
    qwen_tts_realtime.update_session(
        voice=create_voice(VOICE_FILE_PATH), # Replace voice parameter with cloned voice
        response_format=AudioFormat.PCM_24000HZ_MONO_16BIT,
        mode='server_commit'
    )

    for text_chunk in TEXT_TO_SYNTHESIZE:
        print(f'[Sending text]: {text_chunk}')
        qwen_tts_realtime.append_text(text_chunk)
        time.sleep(0.1)

    qwen_tts_realtime.finish()
    callback.wait_for_finished()

    print(f'[Metric] session_id={qwen_tts_realtime.get_session_id()}, '
          f'first_audio_delay={qwen_tts_realtime.get_first_audio_delay()}s')

Java

You need to import the Gson dependency. If you use Maven or Gradle, add the dependency as follows:

Maven

Add the following to your pom.xml:

<!-- https://mvnrepository.com/artifact/com.google.code.gson/gson -->
<dependency>
    <groupId>com.google.code.gson</groupId>
    <artifactId>gson</artifactId>
    <version>2.13.1</version>
</dependency>

Gradle

Add the following to your build.gradle:

// https://mvnrepository.com/artifact/com.google.code.gson/gson
implementation("com.google.code.gson:gson:2.13.1")

import com.alibaba.dashscope.audio.qwen_tts_realtime.*;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.google.gson.Gson;
import com.google.gson.JsonObject;

import javax.sound.sampled.*;
import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.*;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Queue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;

public class Main {
    // ===== Constants =====
    // Target model for voice cloning and speech synthesis must match
    private static final String TARGET_MODEL = "qwen3-tts-vc-realtime-2026-01-15";
    private static final String PREFERRED_NAME = "guanyu";
    // Relative path to local audio file used for voice cloning
    private static final String AUDIO_FILE = "voice.mp3";
    private static final String AUDIO_MIME_TYPE = "audio/mpeg";
    private static String[] textToSynthesize = {
            "Right? I love supermarkets like this.",
            "Especially during Chinese New Year",
            "When I go shopping",
            "I feel",
            "Extremely happy!",
            "And want to buy so many things!"
    };

    // Generate data URI
    public static String toDataUrl(String filePath) throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get(filePath));
        String encoded = Base64.getEncoder().encodeToString(bytes);
        return "data:" + AUDIO_MIME_TYPE + ";base64," + encoded;
    }

    // Call API to create voice
    public static String createVoice() throws Exception {
        // API keys differ between the Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
        // If you have not configured an environment variable, replace the next line with your Model Studio API key: String apiKey = "sk-xxx"
        String apiKey = System.getenv("DASHSCOPE_API_KEY");

        String jsonPayload =
                "{"
                        + "\"model\": \"qwen-voice-enrollment\"," // Do not change this value
                        + "\"input\": {"
                        +     "\"action\": \"create\","
                        +     "\"target_model\": \"" + TARGET_MODEL + "\","
                        +     "\"preferred_name\": \"" + PREFERRED_NAME + "\","
                        +     "\"audio\": {"
                        +         "\"data\": \"" + toDataUrl(AUDIO_FILE) + "\""
                        +     "}"
                        + "}"
                        + "}";

        // This URL is for the Singapore region. To use a model from the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
        HttpURLConnection con = (HttpURLConnection) new URL("https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization").openConnection();
        con.setRequestMethod("POST");
        con.setRequestProperty("Authorization", "Bearer " + apiKey);
        con.setRequestProperty("Content-Type", "application/json");
        con.setDoOutput(true);

        try (OutputStream os = con.getOutputStream()) {
            os.write(jsonPayload.getBytes(StandardCharsets.UTF_8));
        }

        int status = con.getResponseCode();
        System.out.println("HTTP Status Code: " + status);

        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(status >= 200 && status < 300 ? con.getInputStream() : con.getErrorStream(),
                        StandardCharsets.UTF_8))) {
            StringBuilder response = new StringBuilder();
            String line;
            while ((line = br.readLine()) != null) {
                response.append(line);
            }
            System.out.println("Response: " + response);

            if (status == 200) {
                JsonObject jsonObj = new Gson().fromJson(response.toString(), JsonObject.class);
                return jsonObj.getAsJsonObject("output").get("voice").getAsString();
            }
            throw new IOException("Failed to create voice: " + status + " - " + response);
        }
    }

    // Real-time PCM audio player class
    public static class RealtimePcmPlayer {
        private int sampleRate;
        private SourceDataLine line;
        private AudioFormat audioFormat;
        private Thread decoderThread;
        private Thread playerThread;
        private AtomicBoolean stopped = new AtomicBoolean(false);
        private Queue<String> b64AudioBuffer = new ConcurrentLinkedQueue<>();
        private Queue<byte[]> RawAudioBuffer = new ConcurrentLinkedQueue<>();

        // Constructor initializes audio format and audio line
        public RealtimePcmPlayer(int sampleRate) throws LineUnavailableException {
            this.sampleRate = sampleRate;
            this.audioFormat = new AudioFormat(this.sampleRate, 16, 1, true, false);
            DataLine.Info info = new DataLine.Info(SourceDataLine.class, audioFormat);
            line = (SourceDataLine) AudioSystem.getLine(info);
            line.open(audioFormat);
            line.start();
            decoderThread = new Thread(new Runnable() {
                @Override
                public void run() {
                    while (!stopped.get()) {
                        String b64Audio = b64AudioBuffer.poll();
                        if (b64Audio != null) {
                            byte[] rawAudio = Base64.getDecoder().decode(b64Audio);
                            RawAudioBuffer.add(rawAudio);
                        } else {
                            try {
                                Thread.sleep(100);
                            } catch (InterruptedException e) {
                                throw new RuntimeException(e);
                            }
                        }
                    }
                }
            });
            playerThread = new Thread(new Runnable() {
                @Override
                public void run() {
                    while (!stopped.get()) {
                        byte[] rawAudio = RawAudioBuffer.poll();
                        if (rawAudio != null) {
                            try {
                                playChunk(rawAudio);
                            } catch (IOException e) {
                                throw new RuntimeException(e);
                            } catch (InterruptedException e) {
                                throw new RuntimeException(e);
                            }
                        } else {
                            try {
                                Thread.sleep(100);
                            } catch (InterruptedException e) {
                                throw new RuntimeException(e);
                            }
                        }
                    }
                }
            });
            decoderThread.start();
            playerThread.start();
        }

        // Play an audio chunk and block until playback completes
        private void playChunk(byte[] chunk) throws IOException, InterruptedException {
            if (chunk == null || chunk.length == 0) return;

            int bytesWritten = 0;
            while (bytesWritten < chunk.length) {
                bytesWritten += line.write(chunk, bytesWritten, chunk.length - bytesWritten);
            }
            int audioLengthMs = chunk.length / (this.sampleRate * 2 / 1000);
            // Wait for the buffered audio to finish playing; clamp so the sleep is never negative
            Thread.sleep(Math.max(0, audioLengthMs - 10));
        }

        public void write(String b64Audio) {
            b64AudioBuffer.add(b64Audio);
        }

        public void cancel() {
            b64AudioBuffer.clear();
            RawAudioBuffer.clear();
        }

        public void waitForComplete() throws InterruptedException {
            while (!b64AudioBuffer.isEmpty() || !RawAudioBuffer.isEmpty()) {
                Thread.sleep(100);
            }
            line.drain();
        }

        public void shutdown() throws InterruptedException {
            stopped.set(true);
            decoderThread.join();
            playerThread.join();
            if (line != null && line.isRunning()) {
                line.drain();
                line.close();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        QwenTtsRealtimeParam param = QwenTtsRealtimeParam.builder()
                .model(TARGET_MODEL)
                // This URL is for the Singapore region. To use a model from the Beijing region, replace the URL with: wss://dashscope.aliyuncs.com/api-ws/v1/realtime
                .url("wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime")
                // API keys differ between the Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
                // If you have not configured an environment variable, replace the next line with your Model Studio API key: .apikey("sk-xxx")
                .apikey(System.getenv("DASHSCOPE_API_KEY"))
                .build();
        AtomicReference<CountDownLatch> completeLatch = new AtomicReference<>(new CountDownLatch(1));
        final AtomicReference<QwenTtsRealtime> qwenTtsRef = new AtomicReference<>(null);

        // Create real-time audio player instance
        RealtimePcmPlayer audioPlayer = new RealtimePcmPlayer(24000);

        QwenTtsRealtime qwenTtsRealtime = new QwenTtsRealtime(param, new QwenTtsRealtimeCallback() {
            @Override
            public void onOpen() {
                // Handle connection established
            }
            @Override
            public void onEvent(JsonObject message) {
                String type = message.get("type").getAsString();
                switch(type) {
                    case "session.created":
                        // Handle session created
                        break;
                    case "response.audio.delta":
                        String recvAudioB64 = message.get("delta").getAsString();
                        // Play audio in real time
                        audioPlayer.write(recvAudioB64);
                        break;
                    case "response.done":
                        // Handle response completed
                        break;
                    case "session.finished":
                        // Handle session ended
                        completeLatch.get().countDown();
                        break;
                    default:
                        break;
                }
            }
            @Override
            public void onClose(int code, String reason) {
                // Handle connection closed
            }
        });
        qwenTtsRef.set(qwenTtsRealtime);
        try {
            qwenTtsRealtime.connect();
        } catch (NoApiKeyException e) {
            throw new RuntimeException(e);
        }
        QwenTtsRealtimeConfig config = QwenTtsRealtimeConfig.builder()
                .voice(createVoice()) // Replace voice parameter with the exclusive voice generated by voice cloning
                .responseFormat(QwenTtsRealtimeAudioFormat.PCM_24000HZ_MONO_16BIT)
                .mode("server_commit")
                .build();
        qwenTtsRealtime.updateSession(config);
        for (String text:textToSynthesize) {
            qwenTtsRealtime.appendText(text);
            Thread.sleep(100);
        }
        qwenTtsRealtime.finish();
        completeLatch.get().await();

        // Wait for audio playback to complete and shut down player
        audioPlayer.waitForComplete();
        audioPlayer.shutdown();
        System.exit(0);
    }
}

Non-streaming/unidirectional streaming synthesis

Applies to Qwen3-TTS-VC series models. For details, see Speech synthesis - Qwen.

This example references the DashScope SDK non-streaming sample code for speech synthesis using system voices. It replaces the voice parameter with a custom cloned voice. For unidirectional streaming synthesis, see Speech synthesis - Qwen.

Python

import os
import requests
import base64
import pathlib
import dashscope

# ======= Constant configuration =======
DEFAULT_TARGET_MODEL = "qwen3-tts-vc-2026-01-22"  # Use the same model for voice cloning and speech synthesis
DEFAULT_PREFERRED_NAME = "guanyu"
DEFAULT_AUDIO_MIME_TYPE = "audio/mpeg"
VOICE_FILE_PATH = "voice.mp3"  # Relative path to the local audio file used for voice cloning


def create_voice(file_path: str,
                 target_model: str = DEFAULT_TARGET_MODEL,
                 preferred_name: str = DEFAULT_PREFERRED_NAME,
                 audio_mime_type: str = DEFAULT_AUDIO_MIME_TYPE) -> str:
    """
    Create a voice and return the voice parameter.
    """
    # API keys differ between the Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    # If you haven't configured an environment variable, replace the following line with: api_key = "sk-xxx"
    api_key = os.getenv("DASHSCOPE_API_KEY")

    file_path_obj = pathlib.Path(file_path)
    if not file_path_obj.exists():
        raise FileNotFoundError(f"Audio file does not exist: {file_path}")

    base64_str = base64.b64encode(file_path_obj.read_bytes()).decode()
    data_uri = f"data:{audio_mime_type};base64,{base64_str}"

    # The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
    url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"
    payload = {
        "model": "qwen-voice-enrollment", # Do not change this value
        "input": {
            "action": "create",
            "target_model": target_model,
            "preferred_name": preferred_name,
            "audio": {"data": data_uri}
        }
    }
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }

    resp = requests.post(url, json=payload, headers=headers)
    if resp.status_code != 200:
        raise RuntimeError(f"Failed to create voice: {resp.status_code}, {resp.text}")

    try:
        return resp.json()["output"]["voice"]
    except (KeyError, ValueError) as e:
        raise RuntimeError(f"Failed to parse voice response: {e}")


if __name__ == '__main__':
    # The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
    dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

    text = "How's the weather today?"
    # SpeechSynthesizer interface usage: dashscope.audio.qwen_tts.SpeechSynthesizer.call(...)
    response = dashscope.MultiModalConversation.call(
        model=DEFAULT_TARGET_MODEL,
        # API keys differ between the Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
        # If you haven't configured an environment variable, replace the following line with: api_key = "sk-xxx"
        api_key=os.getenv("DASHSCOPE_API_KEY"),
        text=text,
        voice=create_voice(VOICE_FILE_PATH), # Replace the voice parameter with the custom voice generated by cloning
        stream=False
    )
    print(response)

Java

Add the Gson dependency. If you use Maven or Gradle, add the dependency as follows:

Maven

Add the following content to your pom.xml:

<!-- https://mvnrepository.com/artifact/com.google.code.gson/gson -->
<dependency>
    <groupId>com.google.code.gson</groupId>
    <artifactId>gson</artifactId>
    <version>2.13.1</version>
</dependency>

Gradle

Add the following content to your build.gradle:

// https://mvnrepository.com/artifact/com.google.code.gson/gson
implementation("com.google.code.gson:gson:2.13.1")

Important

When using a custom voice generated by voice cloning for speech synthesis, set the voice as follows:

MultiModalConversationParam param = MultiModalConversationParam.builder()
                .parameter("voice", "your_voice") // Replace the voice parameter with the custom voice generated by cloning
                .build();

import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.utils.Constants;
import com.google.gson.Gson;
import com.google.gson.JsonObject;

import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.*;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Main {
    // ===== Constant definitions =====
    // Use the same model for voice cloning and speech synthesis
    private static final String TARGET_MODEL = "qwen3-tts-vc-2026-01-22";
    private static final String PREFERRED_NAME = "guanyu";
    // Relative path to the local audio file used for voice cloning
    private static final String AUDIO_FILE = "voice.mp3";
    private static final String AUDIO_MIME_TYPE = "audio/mpeg";

    // Generate a data URI
    public static String toDataUrl(String filePath) throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get(filePath));
        String encoded = Base64.getEncoder().encodeToString(bytes);
        return "data:" + AUDIO_MIME_TYPE + ";base64," + encoded;
    }

    // Call the API to create a voice
    public static String createVoice() throws Exception {
        // API keys differ between the Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
        // If you haven't configured an environment variable, replace the following line with: String apiKey = "sk-xxx"
        String apiKey = System.getenv("DASHSCOPE_API_KEY");

        String jsonPayload =
                "{"
                        + "\"model\": \"qwen-voice-enrollment\"," // Do not change this value
                        + "\"input\": {"
                        +     "\"action\": \"create\","
                        +     "\"target_model\": \"" + TARGET_MODEL + "\","
                        +     "\"preferred_name\": \"" + PREFERRED_NAME + "\","
                        +     "\"audio\": {"
                        +         "\"data\": \"" + toDataUrl(AUDIO_FILE) + "\""
                        +     "}"
                        + "}"
                        + "}";

        // The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
        String url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization";
        HttpURLConnection con = (HttpURLConnection) new URL(url).openConnection();
        con.setRequestMethod("POST");
        con.setRequestProperty("Authorization", "Bearer " + apiKey);
        con.setRequestProperty("Content-Type", "application/json");
        con.setDoOutput(true);

        try (OutputStream os = con.getOutputStream()) {
            os.write(jsonPayload.getBytes(StandardCharsets.UTF_8));
        }

        int status = con.getResponseCode();
        System.out.println("HTTP status code: " + status);

        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(status >= 200 && status < 300 ? con.getInputStream() : con.getErrorStream(),
                        StandardCharsets.UTF_8))) {
            StringBuilder response = new StringBuilder();
            String line;
            while ((line = br.readLine()) != null) {
                response.append(line);
            }
            System.out.println("Response content: " + response);

            if (status == 200) {
                JsonObject jsonObj = new Gson().fromJson(response.toString(), JsonObject.class);
                return jsonObj.getAsJsonObject("output").get("voice").getAsString();
            }
            throw new IOException("Failed to create voice: " + status + " - " + response);
        }
    }

    public static void call() throws Exception {
        MultiModalConversation conv = new MultiModalConversation();
        MultiModalConversationParam param = MultiModalConversationParam.builder()
                // API keys differ between the Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
                // If you haven't configured an environment variable, replace the following line with: .apikey("sk-xxx")
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .model(TARGET_MODEL)
                .text("How's the weather today?")
                .parameter("voice", createVoice()) // Replace the voice parameter with the custom voice generated by cloning
                .build();
        MultiModalConversationResult result = conv.call(param);
        String audioUrl = result.getOutput().getAudio().getUrl();
        System.out.print(audioUrl);

        // Download the audio file locally
        try (InputStream in = new URL(audioUrl).openStream();
             FileOutputStream out = new FileOutputStream("downloaded_audio.wav")) {
            byte[] buffer = new byte[1024];
            int bytesRead;
            while ((bytesRead = in.read(buffer)) != -1) {
                out.write(buffer, 0, bytesRead);
            }
            System.out.println("\nAudio file downloaded locally: downloaded_audio.wav");
        } catch (Exception e) {
            System.out.println("\nError downloading audio file: " + e.getMessage());
        }
    }
    public static void main(String[] args) {
        try {
            // The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
            Constants.baseHttpApiUrl = "https://dashscope-intl.aliyuncs.com/api/v1";
            call();
        } catch (Exception e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}

API reference

When calling the following interfaces, always use the same Alibaba Cloud account.

Create voice

Uploads audio for cloning and creates a custom voice.

  • URL

    Chinese Mainland:

    POST https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization

    International:

    POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization
  • Request headers

    Parameter

    Type

    Required

    Description

    Authorization

    string

    Yes

    Authentication token in the format Bearer <your_api_key>. Replace "<your_api_key>" with your actual API key.

    Content-Type

    string

    Yes

    Media type of the data transmitted in the request body. Fixed as application/json.

  • Request body

    The request body contains all request parameters. Omit optional fields if not needed.

    Important

    Note the following parameters:

    • model: Voice cloning model, fixed as qwen-voice-enrollment

    • target_model: Speech synthesis model that drives the voice. It must match the speech synthesis model used in subsequent speech synthesis calls. Otherwise, synthesis fails.

    {
        "model": "qwen-voice-enrollment",
        "input": {
            "action": "create",
            "target_model": "qwen3-tts-vc-realtime-2026-01-15",
            "preferred_name": "guanyu",
            "audio": {
                "data": "https://xxx.wav"
            },
            "text": "Optional. Enter the text corresponding to audio.data.",
            "language": "Optional. Enter the language of audio.data, such as zh."
        }
    }
  • Request parameters

    Parameter

    Type

    Default

    Required

    Description

    model

    string

    -

    Yes

    Voice cloning model, fixed as qwen-voice-enrollment.

    action

    string

    -

    Yes

    Operation type, fixed as create.

    target_model

    string

    -

    Yes

    Speech synthesis model that drives the voice. Supported models fall into two types: real-time and non-real-time speech synthesis models. For the available models, see Real-time speech synthesis - Qwen or Speech synthesis - Qwen.

    It must match the speech synthesis model used in subsequent speech synthesis calls. Otherwise, synthesis fails.

    preferred_name

    string

    -

    Yes

    Assign a recognizable name to the voice (up to 16 characters: digits, letters, and underscores only). Use identifiers related to roles or scenarios.

    This keyword appears in the cloned voice name. For example, if the keyword is "guanyu", the final voice name is "qwen-tts-vc-guanyu-voice-20250812105009984-838b".

    audio.data

    string

    -

    Yes

    Audio for cloning (follow the Recording guide when recording, and ensure the audio meets Audio requirements).

    Submit audio data in one of the following ways:

    1. Data URL

      Format: data:<mediatype>;base64,<data>

      • <mediatype>: MIME type

        • WAV: audio/wav

        • MP3: audio/mpeg

        • M4A: audio/mp4

      • <data>: Base64-encoded string of the audio

        Base64 encoding increases file size. Control the original file size to keep encoded data under 10 MB.

      • Example: data:audio/wav;base64,SUQzBAAAAAAAI1RTU0UAAAAPAAADTGF2ZjU4LjI5LjEwMAAAAAAAAAAAAAAA//PAxABQ/BXRbMPe4IQAhl9

        View sample code

        Python

        import base64, pathlib
        
        # input.mp3 is the local audio file for voice cloning. Replace it with your own audio file path and ensure it meets the audio requirements.
        file_path = pathlib.Path("input.mp3")
        base64_str = base64.b64encode(file_path.read_bytes()).decode()
        data_uri = f"data:audio/mpeg;base64,{base64_str}"

        Java

        import java.nio.file.*;
        import java.util.Base64;
        
        public class Main {
            /**
             * filePath is the local audio file for voice cloning. Replace it with your own audio file path and ensure it meets the audio requirements.
             */
            public static String toDataUrl(String filePath) throws Exception {
                byte[] bytes = Files.readAllBytes(Paths.get(filePath));
                String encoded = Base64.getEncoder().encodeToString(bytes);
                return "data:audio/mpeg;base64," + encoded;
            }
        
            // Usage example
            public static void main(String[] args) throws Exception {
                System.out.println(toDataUrl("input.mp3"));
            }
        }
    2. Audio URL (we recommend uploading audio to OSS)

      • File size must not exceed 10 MB.

      • The URL must be publicly accessible and require no authentication.
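The 10 MB limit applies to the Base64-encoded payload, which is larger than the source file. A minimal sketch of choosing between the two submission options, assuming a local file path, MIME type, and a hosted fallback URL that you supply yourself:

```python
import base64
import pathlib

MAX_ENCODED_BYTES = 10 * 1024 * 1024  # 10 MB cap on the encoded payload

def audio_payload(file_path: str, mime_type: str, fallback_url: str) -> str:
    """Return a data URL if the Base64-encoded audio fits under the limit;
    otherwise return the publicly accessible hosted URL instead."""
    raw = pathlib.Path(file_path).read_bytes()
    encoded = base64.b64encode(raw).decode()
    data_url = f"data:{mime_type};base64,{encoded}"
    if len(data_url) <= MAX_ENCODED_BYTES:
        return data_url
    return fallback_url
```

The returned string can be used directly as the audio.data value in the create request.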

    text

    string

    -

    No

    Text matching the audio.data content.

    If provided, the server compares the audio against this text. If they differ significantly, it returns Audio.PreprocessError.

    language

    string

    -

    No

    Language of the audio.data audio.

    Supported languages: zh (Chinese), en (English), de (German), it (Italian), pt (Portuguese), es (Spanish), ja (Japanese), ko (Korean), fr (French), ru (Russian).

    If using this parameter, ensure the specified language matches the audio language.

  • Response parameters

    View response example

    {
        "output": {
            "voice": "yourVoice",
            "target_model": "qwen3-tts-vc-realtime-2026-01-15"
        },
        "usage": {
            "count": 1
        },
        "request_id": "yourRequestId"
    }

    Key parameters:

    Parameter

    Type

    Description

    voice

    string

    Voice name. Use directly as the voice parameter in the speech synthesis interface.

    target_model

    string

    Speech synthesis model that drives the voice. It must match the speech synthesis model used in subsequent speech synthesis calls. Otherwise, synthesis fails.

    request_id

    string

    Request ID.

    count

    integer

    Number of "create voice" operations billed for this request. For pricing, see Billing details.

    For voice creation, count is always 1.

  • Sample code

    Important

    Note the following parameters:

    • model: Voice cloning model, fixed as qwen-voice-enrollment

    • target_model: Speech synthesis model that drives the voice. It must match the speech synthesis model used in subsequent speech synthesis calls. Otherwise, synthesis fails.

    cURL

    If you haven't configured your API key as an environment variable, replace $DASHSCOPE_API_KEY in the examples with your actual API key.

    # ======= Important notes =======
    # The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
    # API keys differ between the Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    # === Delete this comment before execution ===
    
    curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization \
    -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
        "model": "qwen-voice-enrollment",
        "input": {
            "action": "create",
            "target_model": "qwen3-tts-vc-realtime-2026-01-15",
            "preferred_name": "guanyu",
            "audio": {
                "data": "https://xxx.wav"
            }
        }
    }'

    Python

    import os
    import requests
    import base64, pathlib
    
    target_model = "qwen3-tts-vc-realtime-2026-01-15"
    preferred_name = "guanyu"
    audio_mime_type = "audio/mpeg"
    
    file_path = pathlib.Path("input.mp3")
    base64_str = base64.b64encode(file_path.read_bytes()).decode()
    data_uri = f"data:{audio_mime_type};base64,{base64_str}"
    
    # API keys differ between the Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    # If you haven't configured an environment variable, replace the following line with: api_key = "sk-xxx"
    api_key = os.getenv("DASHSCOPE_API_KEY")
    # The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
    url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"
    
    payload = {
        "model": "qwen-voice-enrollment", # Do not change this value
        "input": {
            "action": "create",
            "target_model": target_model,
            "preferred_name": preferred_name,
            "audio": {
                "data": data_uri
            }
        }
    }
    
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }
    
    # Send POST request
    resp = requests.post(url, json=payload, headers=headers)
    
    if resp.status_code == 200:
        data = resp.json()
        voice = data["output"]["voice"]
        print(f"Generated voice parameter: {voice}")
    else:
        print("Request failed:", resp.status_code, resp.text)

    Java

    import com.google.gson.Gson;
    import com.google.gson.JsonObject;
    
    import java.io.*;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.file.*;
    import java.util.Base64;
    
    public class Main {
        private static final String TARGET_MODEL = "qwen3-tts-vc-realtime-2026-01-15";
        private static final String PREFERRED_NAME = "guanyu";
        private static final String AUDIO_FILE = "input.mp3";
        private static final String AUDIO_MIME_TYPE = "audio/mpeg";
    
        public static String toDataUrl(String filePath) throws Exception {
            byte[] bytes = Files.readAllBytes(Paths.get(filePath));
            String encoded = Base64.getEncoder().encodeToString(bytes);
            return "data:" + AUDIO_MIME_TYPE + ";base64," + encoded;
        }
    
        public static void main(String[] args) {
            // API keys differ between the Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
            // If you haven't configured an environment variable, replace the following line with: String apiKey = "sk-xxx"
            String apiKey = System.getenv("DASHSCOPE_API_KEY");
            // The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
            String apiUrl = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization";
    
            try {
                // Construct JSON request body (escape internal quotes)
                String jsonPayload =
                        "{"
                                + "\"model\": \"qwen-voice-enrollment\"," // Do not change this value
                                + "\"input\": {"
                                +     "\"action\": \"create\","
                                +     "\"target_model\": \"" + TARGET_MODEL + "\","
                                +     "\"preferred_name\": \"" + PREFERRED_NAME + "\","
                                +     "\"audio\": {"
                                +         "\"data\": \"" + toDataUrl(AUDIO_FILE) + "\""
                                +     "}"
                                + "}"
                                + "}";
    
                HttpURLConnection con = (HttpURLConnection) new URL(apiUrl).openConnection();
                con.setRequestMethod("POST");
                con.setRequestProperty("Authorization", "Bearer " + apiKey);
                con.setRequestProperty("Content-Type", "application/json");
                con.setDoOutput(true);
    
                // Send request body
                try (OutputStream os = con.getOutputStream()) {
                    os.write(jsonPayload.getBytes("UTF-8"));
                }
    
                int status = con.getResponseCode();
                InputStream is = (status >= 200 && status < 300)
                        ? con.getInputStream()
                        : con.getErrorStream();
    
                StringBuilder response = new StringBuilder();
                try (BufferedReader br = new BufferedReader(new InputStreamReader(is, "UTF-8"))) {
                    String line;
                    while ((line = br.readLine()) != null) {
                        response.append(line);
                    }
                }
    
                System.out.println("HTTP status code: " + status);
                System.out.println("Response content: " + response.toString());
    
                if (status == 200) {
                    // Parse JSON
                    Gson gson = new Gson();
                    JsonObject jsonObj = gson.fromJson(response.toString(), JsonObject.class);
                    String voice = jsonObj.getAsJsonObject("output").get("voice").getAsString();
                    System.out.println("Generated voice parameter: " + voice);
                }
    
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }

List voices

Performs a paged query to list voices you've created.

  • URL

    Chinese Mainland:

    POST https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization

    International:

    POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization
  • Request headers

    Parameter

    Type

    Required

    Description

    Authorization

    string

    Yes

    Authentication token in the format Bearer <your_api_key>. Replace "<your_api_key>" with your actual API key.

    Content-Type

    string

    Yes

    Media type of the data transmitted in the request body. Fixed as application/json.

  • Request body

    The request body contains all request parameters. Omit optional fields if not needed.

    Important

    model: Voice cloning model, fixed as qwen-voice-enrollment. Do not modify.

    {
        "model": "qwen-voice-enrollment",
        "input": {
            "action": "list",
            "page_size": 2,
            "page_index": 0
        }
    }
  • Request parameters

    Parameter

    Type

    Default

    Required

    Description

    model

    string

    -

    Yes

    Voice cloning model, fixed as qwen-voice-enrollment.

    action

    string

    -

    Yes

    Operation type, fixed as list.

    page_index

    integer

    0

    No

    Page index. Range: [0, 1000000].

    page_size

    integer

    10

    No

    Entries per page. Range: [0, 1000000].

  • Response parameters

    View response example

    {
        "output": {
            "voice_list": [
                {
                    "voice": "yourVoice1",
                    "gmt_create": "2025-08-11 17:59:32",
                    "target_model": "qwen3-tts-vc-realtime-2026-01-15"
                },
                {
                    "voice": "yourVoice2",
                    "gmt_create": "2025-08-11 17:38:10",
                    "target_model": "qwen3-tts-vc-realtime-2026-01-15"
                }
            ]
        },
        "usage": {
            "count": 0
        },
        "request_id": "yourRequestId"
    }

    Key parameters:

    Parameter

    Type

    Description

    voice

    string

    Voice name. Use directly as the voice parameter in the speech synthesis interface.

    gmt_create

    string

    Voice creation time.

    target_model

    string

    Speech synthesis model that drives the voice. It must match the speech synthesis model used in subsequent speech synthesis calls. Otherwise, synthesis fails.

    request_id

    string

    Request ID.

    count

    integer

    Number of "create voice" operations billed for this request. For pricing, see Billing details.

    Voice listing is free. Therefore, count is always 0.

  • Sample code

    Important

    model: Voice cloning model, fixed as qwen-voice-enrollment. Do not modify.

    cURL

    If you haven't configured your API key as an environment variable, replace $DASHSCOPE_API_KEY in the examples with your actual API key.

    # ======= Important notes =======
    # The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
    # API keys differ between the Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    # === Delete this comment before execution ===
    
    curl --location --request POST 'https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization' \
    --header 'Authorization: Bearer $DASHSCOPE_API_KEY' \
    --header 'Content-Type: application/json' \
    --data '{
        "model": "qwen-voice-enrollment",
        "input": {
            "action": "list",
            "page_size": 10,
            "page_index": 0
        }
    }'

    Python

    import os
    import requests
    
    # API keys differ between the Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    # If you haven't configured an environment variable, replace the following line with: api_key = "sk-xxx"
    api_key = os.getenv("DASHSCOPE_API_KEY")
    # The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
    url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"
    
    payload = {
        "model": "qwen-voice-enrollment", # Do not change this value
        "input": {
            "action": "list",
            "page_size": 10,
            "page_index": 0
        }
    }
    
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }
    
    response = requests.post(url, json=payload, headers=headers)
    
    print("HTTP status code:", response.status_code)
    
    if response.status_code == 200:
        data = response.json()
        voice_list = data["output"]["voice_list"]
    
        print("List of voices found:")
        for item in voice_list:
            print(f"- Voice: {item['voice']}  Creation time: {item['gmt_create']}  Model: {item['target_model']}")
    else:
        print("Request failed:", response.text)

    Java

    import com.google.gson.Gson;
    import com.google.gson.JsonArray;
    import com.google.gson.JsonObject;
    
    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    
    public class Main {
        public static void main(String[] args) {
            // API keys differ between the Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
            // If you haven't configured an environment variable, replace the following line with: String apiKey = "sk-xxx"
            String apiKey = System.getenv("DASHSCOPE_API_KEY");
            // The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
            String apiUrl = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization";
    
            // JSON request body (older Java versions don't support """ multiline strings)
            String jsonPayload =
                    "{"
                            + "\"model\": \"qwen-voice-enrollment\"," // Do not change this value
                            + "\"input\": {"
                            +     "\"action\": \"list\","
                            +     "\"page_size\": 10,"
                            +     "\"page_index\": 0"
                            + "}"
                            + "}";
    
            try {
                HttpURLConnection con = (HttpURLConnection) new URL(apiUrl).openConnection();
                con.setRequestMethod("POST");
                con.setRequestProperty("Authorization", "Bearer " + apiKey);
                con.setRequestProperty("Content-Type", "application/json");
                con.setDoOutput(true);
    
                try (OutputStream os = con.getOutputStream()) {
                    os.write(jsonPayload.getBytes("UTF-8"));
                }
    
                int status = con.getResponseCode();
                BufferedReader br = new BufferedReader(new InputStreamReader(
                        status >= 200 && status < 300 ? con.getInputStream() : con.getErrorStream(), "UTF-8"));
    
                StringBuilder response = new StringBuilder();
                String line;
                while ((line = br.readLine()) != null) {
                    response.append(line);
                }
                br.close();
    
                System.out.println("HTTP status code: " + status);
                System.out.println("Response JSON: " + response.toString());
    
                if (status == 200) {
                    Gson gson = new Gson();
                    JsonObject jsonObj = gson.fromJson(response.toString(), JsonObject.class);
                    JsonArray voiceList = jsonObj.getAsJsonObject("output").getAsJsonArray("voice_list");
    
                    System.out.println("\n List of voices found:");
                    for (int i = 0; i < voiceList.size(); i++) {
                        JsonObject voiceItem = voiceList.get(i).getAsJsonObject();
                        String voice = voiceItem.get("voice").getAsString();
                        String gmtCreate = voiceItem.get("gmt_create").getAsString();
                        String targetModel = voiceItem.get("target_model").getAsString();
    
                        System.out.printf("- Voice: %s  Creation time: %s  Model: %s\n",
                                voice, gmtCreate, targetModel);
                    }
                }
    
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
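Because a cloned voice only works when synthesis uses its own target_model, it can help to filter the listed voices before a synthesis call. A small sketch over the voice_list shape shown in the response example above (the model names below are illustrative):

```python
def voices_for_model(voice_list, desired_model):
    """Return the names of voices whose target_model matches desired_model.

    voice_list follows the list-voices response shape:
    [{"voice": ..., "gmt_create": ..., "target_model": ...}, ...]
    """
    return [item["voice"] for item in voice_list
            if item["target_model"] == desired_model]

# Example with the shape from the response sample above
sample = [
    {"voice": "yourVoice1", "gmt_create": "2025-08-11 17:59:32",
     "target_model": "qwen3-tts-vc-realtime-2026-01-15"},
    {"voice": "yourVoice2", "gmt_create": "2025-08-11 17:38:10",
     "target_model": "some-other-model"},
]
print(voices_for_model(sample, "qwen3-tts-vc-realtime-2026-01-15"))  # prints ['yourVoice1']
```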

Delete a voice

Deletes a specified voice to free up its quota.

  • URL

    Chinese Mainland:

    POST https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization

    International:

    POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization
  • Request headers

    Parameter

    Type

    Required

    Description

    Authorization

    string

    Yes

    Authentication token in the format Bearer <your_api_key>. Replace "<your_api_key>" with your actual API key.

    Content-Type

    string

    Yes

    Media type of the data transmitted in the request body. Fixed as application/json.

  • Request body

    The request body contains all request parameters. Omit optional fields if not needed:

    Important

    model: Voice cloning model, fixed as qwen-voice-enrollment. Do not modify.

    {
        "model": "qwen-voice-enrollment",
        "input": {
            "action": "delete",
            "voice": "yourVoice"
        }
    }
  • Request parameters

    Parameter

    Type

    Default

    Required

    Description

    model

    string

    -

    Yes

    Voice cloning model, fixed as qwen-voice-enrollment.

    action

    string

    -

    Yes

    Operation type, fixed as delete.

    voice

    string

    -

    Yes

    Voice to delete.

  • Response parameters

    View response example

    {
        "usage": {
            "count": 0
        },
        "request_id": "yourRequestId"
    }

    Key parameters:

    Parameter

    Type

    Description

    request_id

    string

    Request ID.

    count

    integer

    Number of "create voice" operations billed for this request. For pricing, see Billing details.

    Voice deletion is free. Therefore, count is always 0.

  • Sample code

    Important

    model: Voice cloning model, fixed as qwen-voice-enrollment. Do not modify.

    cURL

    If you haven't configured your API key as an environment variable, replace $DASHSCOPE_API_KEY in the examples with your actual API key.

    # ======= Important notes =======
    # The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
    # API keys differ between the Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    # === Delete this comment before execution ===
    
    curl --location --request POST 'https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization' \
    --header 'Authorization: Bearer $DASHSCOPE_API_KEY' \
    --header 'Content-Type: application/json' \
    --data '{
        "model": "qwen-voice-enrollment",
        "input": {
            "action": "delete",
            "voice": "yourVoice"
        }
    }'

    Python

    import os
    import requests
    
    # API keys differ between the Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    # If you haven't configured an environment variable, replace the following line with: api_key = "sk-xxx"
    api_key = os.getenv("DASHSCOPE_API_KEY")
    # The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
    url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"
    
    voice_to_delete = "yourVoice"  # Voice to delete (replace with actual value)
    
    payload = {
        "model": "qwen-voice-enrollment", # Do not change this value
        "input": {
            "action": "delete",
            "voice": voice_to_delete
        }
    }
    
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }
    
    response = requests.post(url, json=payload, headers=headers)
    
    print("HTTP status code:", response.status_code)
    
    if response.status_code == 200:
        data = response.json()
        request_id = data["request_id"]
    
        print(f"Deletion successful")
        print(f"Request ID: {request_id}")
    else:
        print("Request failed:", response.text)

    Java

    import com.google.gson.Gson;
    import com.google.gson.JsonObject;
    
    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    
    public class Main {
        public static void main(String[] args) {
            // API keys differ between the Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
            // If you haven't configured an environment variable, replace the following line with: String apiKey = "sk-xxx"
            String apiKey = System.getenv("DASHSCOPE_API_KEY");
            // The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
            String apiUrl = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization";
            String voiceToDelete = "yourVoice"; // Voice to delete (replace with actual value)
    
            // Construct JSON request body (string concatenation for Java 8 compatibility)
            String jsonPayload =
                    "{"
                            + "\"model\": \"qwen-voice-enrollment\"," // Do not change this value
                            + "\"input\": {"
                            +     "\"action\": \"delete\","
                            +     "\"voice\": \"" + voiceToDelete + "\""
                            + "}"
                            + "}";
    
            try {
                // Establish POST connection
                HttpURLConnection con = (HttpURLConnection) new URL(apiUrl).openConnection();
                con.setRequestMethod("POST");
                con.setRequestProperty("Authorization", "Bearer " + apiKey);
                con.setRequestProperty("Content-Type", "application/json");
                con.setDoOutput(true);
    
                // Send request body
                try (OutputStream os = con.getOutputStream()) {
                    os.write(jsonPayload.getBytes("UTF-8"));
                }
    
                int status = con.getResponseCode();
                BufferedReader br = new BufferedReader(new InputStreamReader(
                        status >= 200 && status < 300 ? con.getInputStream() : con.getErrorStream(), "UTF-8"));
    
                StringBuilder response = new StringBuilder();
                String line;
                while ((line = br.readLine()) != null) {
                    response.append(line);
                }
                br.close();
    
                System.out.println("HTTP status code: " + status);
                System.out.println("Response JSON: " + response.toString());
    
                if (status == 200) {
                    Gson gson = new Gson();
                    JsonObject jsonObj = gson.fromJson(response.toString(), JsonObject.class);
                    String requestId = jsonObj.get("request_id").getAsString();
    
                    System.out.println("Deletion successful");
                    System.out.println("Request ID: " + requestId);
                }
    
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }

Speech synthesis

For details on using custom cloned voices for personalized speech synthesis, see Getting started: From cloning to synthesis.

Voice cloning-specific models (such as qwen3-tts-vc-realtime-2026-01-15) are dedicated models that only support custom cloned voices. They do not support system voices such as Chelsie, Serena, Ethan, or Cherry.

Voice quota and cleanup rules

  • Total limit: 1000 voices per account

    The current interface does not provide a voice count query feature. Call the List voices interface to count voices yourself.
  • Automatic cleanup: If a voice has not been used in any speech synthesis request for over a year, the system automatically deletes it.
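Since there is no dedicated count endpoint, the total can be derived by walking the paged list until a short or empty page is returned. A minimal sketch; fetch_page mirrors the list-voices request shown in the API reference above, and the Singapore endpoint is assumed:

```python
import json
import os
import urllib.request

URL = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"

def fetch_page(page_index, page_size=100):
    """One list-voices call; returns that page's voice_list."""
    payload = {
        "model": "qwen-voice-enrollment",
        "input": {"action": "list", "page_size": page_size, "page_index": page_index},
    }
    req = urllib.request.Request(
        URL,
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {os.getenv('DASHSCOPE_API_KEY')}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["output"]["voice_list"]

def count_voices(page_size=100, fetch=fetch_page):
    """Walk pages until a page shorter than page_size signals the end."""
    total, page_index = 0, 0
    while True:
        page = fetch(page_index, page_size)
        total += len(page)
        if len(page) < page_size:
            return total
        page_index += 1
```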

Billing details

Voice cloning and speech synthesis are billed separately:

  • Voice cloning: Creating a voice costs $0.01 per voice. Failed creations are not charged.

    Note

    Free quota details (available only in Alibaba Cloud China Website (www.aliyun.com) Beijing region and Alibaba Cloud International Website (www.alibabacloud.com) Singapore region):

    • Within 90 days of activating Alibaba Cloud Model Studio, you receive 1000 free voice creation attempts.

    • Failed creations do not consume free attempts.

    • Deleting a voice does not restore free attempts.

    • After the free quota expires or the 90-day validity period ends, voice creation is billed at $0.01 per voice.

  • Speech synthesis using custom cloned voices: Billed by volume (character count). For details, see Real-time speech synthesis - Qwen or Speech synthesis - Qwen.

Copyright and legality

You are responsible for ensuring that you own, or hold the legal rights to use, the voice you provide. Please read the Terms of Service.

Recording guide

Device

We recommend using a microphone with noise reduction or recording with your phone at close range in a quiet environment to ensure clean audio.

Environment

Location

  • Record in an enclosed space of 10 square meters or less.

  • Prioritize rooms with sound-absorbing materials (such as acoustic foam, carpets, or curtains).

  • Avoid large, reverberant spaces such as halls, meeting rooms, or classrooms.

Noise control

  • Outdoor noise: Close windows and doors to block traffic, construction, and other disturbances.

  • Indoor noise: Turn off air conditioners, fans, and fluorescent lamp ballasts. To identify remaining noise sources, record the room's ambient sound with your phone and play it back at high volume.

Reverberation control

  • Reverberation causes muffled, unclear audio.

  • Reduce reflections: Draw curtains, open closet doors, and cover tables or cabinets with clothing or sheets.

  • Use irregular objects (such as bookshelves or upholstered furniture) to diffuse sound.

Script

  • Script content is flexible. Align it with your target scenario (e.g., use customer service dialogue style for customer service scenarios). Ensure it contains no sensitive or illegal content (such as political, pornographic, or violent material), as this will cause cloning to fail.

  • Avoid short phrases (like "Hello" or "Yes"). Use complete sentences.

  • Maintain semantic coherence. Avoid frequent pauses (aim for at least 3 seconds of continuous speech).

  • You may adopt the target emotion (such as friendly or serious), but avoid overly dramatic readings. Keep it natural.

Operational tips

Using a typical bedroom as an example:

  1. Close windows and doors to block external noise.

  2. Turn off air conditioners, fans, and other electrical appliances.

  3. Draw curtains to reduce reflections.

  4. Place clothing or blankets on the desk to reduce reflections.

  5. Familiarize yourself with the script. Set your character's tone and deliver naturally.

  6. Keep a distance of about 10 cm from the recording device to avoid plosives and weak signals.

Error messages

If you encounter errors, see Error messages for troubleshooting.