
Alibaba Cloud Model Studio: Qwen voice cloning API Reference

Last Updated: Feb 25, 2026

Qwen voice cloning uses a feature extraction model to clone voices without training. Provide 10 to 20 seconds of audio to generate a highly similar, natural-sounding custom voice. Voice cloning and speech synthesis are sequential steps. This document covers voice cloning parameters and interface details. For speech synthesis, see Real-time speech synthesis - Qwen or Speech synthesis - Qwen.

User guide: For model descriptions and selection recommendations, see Real-time speech synthesis - Qwen or Speech synthesis - Qwen.

Important

This document applies only to the Qwen voice cloning interface. If you use the CosyVoice model, see CosyVoice voice cloning API.

Audio requirements

High-quality input audio is essential for optimal cloning results.

  • Supported formats: WAV (16-bit), MP3, M4A

  • Audio duration: 10–20 seconds recommended; 60 seconds maximum

  • File size: < 10 MB

  • Sample rate: ≥ 24 kHz

  • Channels: Mono

  • Content: The audio must contain at least 3 seconds of continuous, clear speech; the rest may include short pauses (≤ 2 seconds). To keep the core speech content clean, avoid background music, noise, and other voices throughout the clip. Use normal speaking audio as input; do not upload songs or singing, which yield inaccurate cloning results.

  • Language: Chinese (zh), English (en), German (de), Italian (it), Portuguese (pt), Spanish (es), Japanese (ja), Korean (ko), French (fr), Russian (ru)
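As a quick local sanity check before uploading, the WAV-specific limits above can be verified with Python's standard library. This is an illustrative helper, not part of the API; the stdlib `wave` module only reads WAV files, so MP3/M4A inputs would need a third-party probe instead.

```python
import os
import wave

def check_wav(path: str) -> list:
    """Return a list of problems found against the documented audio limits (WAV only)."""
    problems = []
    if os.path.getsize(path) >= 10 * 1024 * 1024:
        problems.append("file is not under 10 MB")
    with wave.open(path, "rb") as w:
        if w.getframerate() < 24000:
            problems.append("sample rate is below 24 kHz")
        if w.getnchannels() != 1:
            problems.append("audio is not mono")
        if w.getsampwidth() != 2:  # bytes per sample; 2 = 16-bit
            problems.append("audio is not 16-bit PCM")
        duration = w.getnframes() / w.getframerate()
        if duration > 60:
            problems.append("audio exceeds the 60-second maximum")
        elif duration < 10:
            problems.append("audio is shorter than the recommended 10 seconds")
    return problems
```

The helper cannot verify content-level requirements (clear continuous speech, no background noise); those still need a listen.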

Getting started: From cloning to synthesis


1. Workflow

Voice cloning and speech synthesis are two separate but closely linked steps that follow a "create first, use later" workflow:

  1. Create a voice

    Call the Create voice interface and upload an audio clip. The system analyzes the audio and creates a custom cloned voice. You must specify target_model, which defines the speech synthesis model that will drive the created voice.

    If you already have a voice (check by calling the List voices interface), skip this step and proceed to the next.

  2. Use the voice for speech synthesis

    Call the speech synthesis interface and pass in the voice from the previous step. The speech synthesis model specified here must match the target_model from the previous step.
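The two steps above can be sketched offline. The payload builder below mirrors the Create voice request body documented later on this page, and the guard captures the one invariant the workflow depends on: the synthesis call must use the same model as target_model. The model name and preferred_name are illustrative values taken from the examples on this page.

```python
TARGET_MODEL = "qwen3-tts-vc-realtime-2026-01-15"  # example target model from this page

def build_enrollment_payload(audio_data_uri: str, target_model: str = TARGET_MODEL) -> dict:
    """Step 1: request body for the Create voice call (model is fixed as qwen-voice-enrollment)."""
    return {
        "model": "qwen-voice-enrollment",
        "input": {
            "action": "create",
            "target_model": target_model,
            "preferred_name": "guanyu",
            "audio": {"data": audio_data_uri},
        },
    }

def check_synthesis_model(voice_target_model: str, synthesis_model: str) -> None:
    """Step 2 precondition: a model mismatch makes the synthesis call fail."""
    if voice_target_model != synthesis_model:
        raise ValueError(
            f"voice was enrolled for {voice_target_model}, "
            f"but synthesis requested {synthesis_model}"
        )
```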

2. Model configuration and preparations

Select an appropriate model and complete necessary preparations.

Model configuration

Specify these two models for voice cloning:

  • Voice cloning model: qwen-voice-enrollment

  • Speech synthesis model that drives the voice: a Qwen3-TTS-VC-Realtime series model (bidirectional streaming) or a Qwen3-TTS-VC series model (non-streaming and unidirectional streaming)

Preparations

  1. Get an API key. For security, set your API key as an environment variable.

  2. Install the SDK: Ensure you have installed the latest DashScope SDK.

  3. Prepare the audio for cloning: The audio must meet the audio requirements.

3. End-to-end example

The following example demonstrates how to use a custom cloned voice in speech synthesis to produce output that closely matches the original voice.

  • Key principle: The target_model (the speech synthesis model that drives the voice) specified during voice cloning must match the model used in the speech synthesis call. Otherwise, synthesis fails.

  • The example uses a local audio file voice.mp3 for voice cloning. Replace it with your own file when running the code.

Bidirectional streaming synthesis

Applies to Qwen3-TTS-VC-Realtime series models. For more information, see Real-time speech synthesis - Qwen.

Python

# coding=utf-8
# Installation instructions for pyaudio:
# APPLE Mac OS X
#   brew install portaudio
#   pip install pyaudio
# Debian/Ubuntu
#   sudo apt-get install python-pyaudio python3-pyaudio
#   or
#   pip install pyaudio
# CentOS
#   sudo yum install -y portaudio portaudio-devel && pip install pyaudio
# Microsoft Windows
#   python -m pip install pyaudio

import pyaudio
import os
import requests
import base64
import pathlib
import threading
import time
import dashscope  # DashScope Python SDK version must be 1.23.9 or higher
from dashscope.audio.qwen_tts_realtime import QwenTtsRealtime, QwenTtsRealtimeCallback, AudioFormat

# ======= Constants =======
DEFAULT_TARGET_MODEL = "qwen3-tts-vc-realtime-2026-01-15"  # Target model for voice cloning and speech synthesis must match
DEFAULT_PREFERRED_NAME = "guanyu"
DEFAULT_AUDIO_MIME_TYPE = "audio/mpeg"
VOICE_FILE_PATH = "voice.mp3"  # Relative path to local audio file used for voice cloning

TEXT_TO_SYNTHESIZE = [
    'Right? I love supermarkets like this.',
    'Especially during Chinese New Year',
    'When I go shopping',
    'I feel',
    'Extremely happy!',
    'And want to buy so many things!'
]

def create_voice(file_path: str,
                 target_model: str = DEFAULT_TARGET_MODEL,
                 preferred_name: str = DEFAULT_PREFERRED_NAME,
                 audio_mime_type: str = DEFAULT_AUDIO_MIME_TYPE) -> str:
    """
    Create a voice and return the voice parameter
    """
    # API keys differ between the Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    # If you have not configured an environment variable, replace the next line with your Model Studio API key: api_key = "sk-xxx"
    api_key = os.getenv("DASHSCOPE_API_KEY")

    file_path_obj = pathlib.Path(file_path)
    if not file_path_obj.exists():
        raise FileNotFoundError(f"Audio file not found: {file_path}")

    base64_str = base64.b64encode(file_path_obj.read_bytes()).decode()
    data_uri = f"data:{audio_mime_type};base64,{base64_str}"

    # This URL is for the Singapore region. To use a model from the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
    url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"
    payload = {
        "model": "qwen-voice-enrollment", # Do not change this value
        "input": {
            "action": "create",
            "target_model": target_model,
            "preferred_name": preferred_name,
            "audio": {"data": data_uri}
        }
    }
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }

    resp = requests.post(url, json=payload, headers=headers)
    if resp.status_code != 200:
        raise RuntimeError(f"Failed to create voice: {resp.status_code}, {resp.text}")

    try:
        return resp.json()["output"]["voice"]
    except (KeyError, ValueError) as e:
        raise RuntimeError(f"Failed to parse voice response: {e}")

def init_dashscope_api_key():
    """
    Initialize DashScope SDK API key
    """
    # API keys differ between the Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    # If you have not configured an environment variable, replace the next line with your Model Studio API key: dashscope.api_key = "sk-xxx"
    dashscope.api_key = os.getenv("DASHSCOPE_API_KEY")

# ======= Callback class =======
class MyCallback(QwenTtsRealtimeCallback):
    """
    Custom TTS streaming callback
    """
    def __init__(self):
        self.complete_event = threading.Event()
        self._player = pyaudio.PyAudio()
        self._stream = self._player.open(
            format=pyaudio.paInt16, channels=1, rate=24000, output=True
        )

    def on_open(self) -> None:
        print('[TTS] Connection established')

    def on_close(self, close_status_code, close_msg) -> None:
        self._stream.stop_stream()
        self._stream.close()
        self._player.terminate()
        print(f'[TTS] Connection closed code={close_status_code}, msg={close_msg}')

    def on_event(self, response: dict) -> None:
        try:
            event_type = response.get('type', '')
            if event_type == 'session.created':
                print(f'[TTS] Session started: {response["session"]["id"]}')
            elif event_type == 'response.audio.delta':
                audio_data = base64.b64decode(response['delta'])
                self._stream.write(audio_data)
            elif event_type == 'response.done':
                print(f'[TTS] Response completed, Response ID: {qwen_tts_realtime.get_last_response_id()}')
            elif event_type == 'session.finished':
                print('[TTS] Session ended')
                self.complete_event.set()
        except Exception as e:
            print(f'[Error] Exception in callback event handler: {e}')

    def wait_for_finished(self):
        self.complete_event.wait()

# ======= Main execution logic =======
if __name__ == '__main__':
    init_dashscope_api_key()
    print('[System] Initializing Qwen TTS Realtime ...')

    callback = MyCallback()
    qwen_tts_realtime = QwenTtsRealtime(
        model=DEFAULT_TARGET_MODEL,
        callback=callback,
        # This URL is for the Singapore region. To use a model from the Beijing region, replace the URL with: wss://dashscope.aliyuncs.com/api-ws/v1/realtime
        url='wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime'
    )
    qwen_tts_realtime.connect()
    
    qwen_tts_realtime.update_session(
        voice=create_voice(VOICE_FILE_PATH), # Replace voice parameter with cloned voice
        response_format=AudioFormat.PCM_24000HZ_MONO_16BIT,
        mode='server_commit'
    )

    for text_chunk in TEXT_TO_SYNTHESIZE:
        print(f'[Sending text]: {text_chunk}')
        qwen_tts_realtime.append_text(text_chunk)
        time.sleep(0.1)

    qwen_tts_realtime.finish()
    callback.wait_for_finished()

    print(f'[Metric] session_id={qwen_tts_realtime.get_session_id()}, '
          f'first_audio_delay={qwen_tts_realtime.get_first_audio_delay()}s')

Java

You need to import the Gson dependency. If you use Maven or Gradle, add the dependency as follows:

Maven

Add the following to your pom.xml:

<!-- https://mvnrepository.com/artifact/com.google.code.gson/gson -->
<dependency>
    <groupId>com.google.code.gson</groupId>
    <artifactId>gson</artifactId>
    <version>2.13.1</version>
</dependency>

Gradle

Add the following to your build.gradle:

// https://mvnrepository.com/artifact/com.google.code.gson/gson
implementation("com.google.code.gson:gson:2.13.1")

import com.alibaba.dashscope.audio.qwen_tts_realtime.*;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.google.gson.Gson;
import com.google.gson.JsonObject;

import javax.sound.sampled.*;
import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.*;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Queue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;

public class Main {
    // ===== Constants =====
    // Target model for voice cloning and speech synthesis must match
    private static final String TARGET_MODEL = "qwen3-tts-vc-realtime-2026-01-15";
    private static final String PREFERRED_NAME = "guanyu";
    // Relative path to local audio file used for voice cloning
    private static final String AUDIO_FILE = "voice.mp3";
    private static final String AUDIO_MIME_TYPE = "audio/mpeg";
    private static String[] textToSynthesize = {
            "Right? I love supermarkets like this.",
            "Especially during Chinese New Year",
            "When I go shopping",
            "I feel",
            "Extremely happy!",
            "And want to buy so many things!"
    };

    // Generate data URI
    public static String toDataUrl(String filePath) throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get(filePath));
        String encoded = Base64.getEncoder().encodeToString(bytes);
        return "data:" + AUDIO_MIME_TYPE + ";base64," + encoded;
    }

    // Call API to create voice
    public static String createVoice() throws Exception {
        // API keys differ between the Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
        // If you have not configured an environment variable, replace the next line with your Model Studio API key: String apiKey = "sk-xxx"
        String apiKey = System.getenv("DASHSCOPE_API_KEY");

        String jsonPayload =
                "{"
                        + "\"model\": \"qwen-voice-enrollment\"," // Do not change this value
                        + "\"input\": {"
                        +     "\"action\": \"create\","
                        +     "\"target_model\": \"" + TARGET_MODEL + "\","
                        +     "\"preferred_name\": \"" + PREFERRED_NAME + "\","
                        +     "\"audio\": {"
                        +         "\"data\": \"" + toDataUrl(AUDIO_FILE) + "\""
                        +     "}"
                        + "}"
                        + "}";

        // This URL is for the Singapore region. To use a model from the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
        HttpURLConnection con = (HttpURLConnection) new URL("https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization").openConnection();
        con.setRequestMethod("POST");
        con.setRequestProperty("Authorization", "Bearer " + apiKey);
        con.setRequestProperty("Content-Type", "application/json");
        con.setDoOutput(true);

        try (OutputStream os = con.getOutputStream()) {
            os.write(jsonPayload.getBytes(StandardCharsets.UTF_8));
        }

        int status = con.getResponseCode();
        System.out.println("HTTP Status Code: " + status);

        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(status >= 200 && status < 300 ? con.getInputStream() : con.getErrorStream(),
                        StandardCharsets.UTF_8))) {
            StringBuilder response = new StringBuilder();
            String line;
            while ((line = br.readLine()) != null) {
                response.append(line);
            }
            System.out.println("Response: " + response);

            if (status == 200) {
                JsonObject jsonObj = new Gson().fromJson(response.toString(), JsonObject.class);
                return jsonObj.getAsJsonObject("output").get("voice").getAsString();
            }
            throw new IOException("Failed to create voice: " + status + " - " + response);
        }
    }

    // Real-time PCM audio player class
    public static class RealtimePcmPlayer {
        private int sampleRate;
        private SourceDataLine line;
        private AudioFormat audioFormat;
        private Thread decoderThread;
        private Thread playerThread;
        private AtomicBoolean stopped = new AtomicBoolean(false);
        private Queue<String> b64AudioBuffer = new ConcurrentLinkedQueue<>();
        private Queue<byte[]> RawAudioBuffer = new ConcurrentLinkedQueue<>();

        // Constructor initializes audio format and audio line
        public RealtimePcmPlayer(int sampleRate) throws LineUnavailableException {
            this.sampleRate = sampleRate;
            this.audioFormat = new AudioFormat(this.sampleRate, 16, 1, true, false);
            DataLine.Info info = new DataLine.Info(SourceDataLine.class, audioFormat);
            line = (SourceDataLine) AudioSystem.getLine(info);
            line.open(audioFormat);
            line.start();
            decoderThread = new Thread(new Runnable() {
                @Override
                public void run() {
                    while (!stopped.get()) {
                        String b64Audio = b64AudioBuffer.poll();
                        if (b64Audio != null) {
                            byte[] rawAudio = Base64.getDecoder().decode(b64Audio);
                            RawAudioBuffer.add(rawAudio);
                        } else {
                            try {
                                Thread.sleep(100);
                            } catch (InterruptedException e) {
                                throw new RuntimeException(e);
                            }
                        }
                    }
                }
            });
            playerThread = new Thread(new Runnable() {
                @Override
                public void run() {
                    while (!stopped.get()) {
                        byte[] rawAudio = RawAudioBuffer.poll();
                        if (rawAudio != null) {
                            try {
                                playChunk(rawAudio);
                            } catch (IOException e) {
                                throw new RuntimeException(e);
                            } catch (InterruptedException e) {
                                throw new RuntimeException(e);
                            }
                        } else {
                            try {
                                Thread.sleep(100);
                            } catch (InterruptedException e) {
                                throw new RuntimeException(e);
                            }
                        }
                    }
                }
            });
            decoderThread.start();
            playerThread.start();
        }

        // Play an audio chunk and block until playback completes
        private void playChunk(byte[] chunk) throws IOException, InterruptedException {
            if (chunk == null || chunk.length == 0) return;

            int bytesWritten = 0;
            while (bytesWritten < chunk.length) {
                bytesWritten += line.write(chunk, bytesWritten, chunk.length - bytesWritten);
            }
            int audioLengthMs = chunk.length / (this.sampleRate * 2 / 1000);
            // Wait for the buffered audio to finish playing; clamp so the sleep is never negative
            Thread.sleep(Math.max(0, audioLengthMs - 10));
        }

        public void write(String b64Audio) {
            b64AudioBuffer.add(b64Audio);
        }

        public void cancel() {
            b64AudioBuffer.clear();
            RawAudioBuffer.clear();
        }

        public void waitForComplete() throws InterruptedException {
            while (!b64AudioBuffer.isEmpty() || !RawAudioBuffer.isEmpty()) {
                Thread.sleep(100);
            }
            line.drain();
        }

        public void shutdown() throws InterruptedException {
            stopped.set(true);
            decoderThread.join();
            playerThread.join();
            if (line != null && line.isRunning()) {
                line.drain();
                line.close();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        QwenTtsRealtimeParam param = QwenTtsRealtimeParam.builder()
                .model(TARGET_MODEL)
                // This URL is for the Singapore region. To use a model from the Beijing region, replace the URL with: wss://dashscope.aliyuncs.com/api-ws/v1/realtime
                .url("wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime")
                // API keys differ between the Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
                // If you have not configured an environment variable, replace the next line with your Model Studio API key: .apikey("sk-xxx")
                .apikey(System.getenv("DASHSCOPE_API_KEY"))
                .build();
        AtomicReference<CountDownLatch> completeLatch = new AtomicReference<>(new CountDownLatch(1));
        final AtomicReference<QwenTtsRealtime> qwenTtsRef = new AtomicReference<>(null);

        // Create real-time audio player instance
        RealtimePcmPlayer audioPlayer = new RealtimePcmPlayer(24000);

        QwenTtsRealtime qwenTtsRealtime = new QwenTtsRealtime(param, new QwenTtsRealtimeCallback() {
            @Override
            public void onOpen() {
                // Handle connection established
            }
            @Override
            public void onEvent(JsonObject message) {
                String type = message.get("type").getAsString();
                switch(type) {
                    case "session.created":
                        // Handle session created
                        break;
                    case "response.audio.delta":
                        String recvAudioB64 = message.get("delta").getAsString();
                        // Play audio in real time
                        audioPlayer.write(recvAudioB64);
                        break;
                    case "response.done":
                        // Handle response completed
                        break;
                    case "session.finished":
                        // Handle session ended
                        completeLatch.get().countDown();
                        break;
                    default:
                        break;
                }
            }
            @Override
            public void onClose(int code, String reason) {
                // Handle connection closed
            }
        });
        qwenTtsRef.set(qwenTtsRealtime);
        try {
            qwenTtsRealtime.connect();
        } catch (NoApiKeyException e) {
            throw new RuntimeException(e);
        }
        QwenTtsRealtimeConfig config = QwenTtsRealtimeConfig.builder()
                .voice(createVoice()) // Replace voice parameter with the exclusive voice generated by voice cloning
                .responseFormat(QwenTtsRealtimeAudioFormat.PCM_24000HZ_MONO_16BIT)
                .mode("server_commit")
                .build();
        qwenTtsRealtime.updateSession(config);
        for (String text:textToSynthesize) {
            qwenTtsRealtime.appendText(text);
            Thread.sleep(100);
        }
        qwenTtsRealtime.finish();
        completeLatch.get().await();

        // Wait for audio playback to complete and shut down player
        audioPlayer.waitForComplete();
        audioPlayer.shutdown();
        System.exit(0);
    }
}

Non-streaming/unidirectional streaming synthesis

Applies to Qwen3-TTS-VC series models. For details, see Speech synthesis - Qwen.

This example references the DashScope SDK non-streaming sample code for speech synthesis using system voices. It replaces the voice parameter with a custom cloned voice. For unidirectional streaming synthesis, see Speech synthesis - Qwen.

Python

import os
import requests
import base64
import pathlib
import dashscope

# ======= Constant configuration =======
DEFAULT_TARGET_MODEL = "qwen3-tts-vc-2026-01-22"  # Use the same model for voice cloning and speech synthesis
DEFAULT_PREFERRED_NAME = "guanyu"
DEFAULT_AUDIO_MIME_TYPE = "audio/mpeg"
VOICE_FILE_PATH = "voice.mp3"  # Relative path to the local audio file used for voice cloning


def create_voice(file_path: str,
                 target_model: str = DEFAULT_TARGET_MODEL,
                 preferred_name: str = DEFAULT_PREFERRED_NAME,
                 audio_mime_type: str = DEFAULT_AUDIO_MIME_TYPE) -> str:
    """
    Create a voice and return the voice parameter.
    """
    # API keys differ between the Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    # If you haven't configured an environment variable, replace the following line with: api_key = "sk-xxx"
    api_key = os.getenv("DASHSCOPE_API_KEY")

    file_path_obj = pathlib.Path(file_path)
    if not file_path_obj.exists():
        raise FileNotFoundError(f"Audio file does not exist: {file_path}")

    base64_str = base64.b64encode(file_path_obj.read_bytes()).decode()
    data_uri = f"data:{audio_mime_type};base64,{base64_str}"

    # The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
    url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"
    payload = {
        "model": "qwen-voice-enrollment", # Do not change this value
        "input": {
            "action": "create",
            "target_model": target_model,
            "preferred_name": preferred_name,
            "audio": {"data": data_uri}
        }
    }
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }

    resp = requests.post(url, json=payload, headers=headers)
    if resp.status_code != 200:
        raise RuntimeError(f"Failed to create voice: {resp.status_code}, {resp.text}")

    try:
        return resp.json()["output"]["voice"]
    except (KeyError, ValueError) as e:
        raise RuntimeError(f"Failed to parse voice response: {e}")


if __name__ == '__main__':
    # The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
    dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

    text = "How's the weather today?"
    # SpeechSynthesizer interface usage: dashscope.audio.qwen_tts.SpeechSynthesizer.call(...)
    response = dashscope.MultiModalConversation.call(
        model=DEFAULT_TARGET_MODEL,
        # API keys differ between the Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
        # If you haven't configured an environment variable, replace the following line with: api_key = "sk-xxx"
        api_key=os.getenv("DASHSCOPE_API_KEY"),
        text=text,
        voice=create_voice(VOICE_FILE_PATH), # Replace the voice parameter with the custom voice generated by cloning
        stream=False
    )
    print(response)

Java

Add the Gson dependency. If you use Maven or Gradle, add the dependency as follows:

Maven

Add the following content to your pom.xml:

<!-- https://mvnrepository.com/artifact/com.google.code.gson/gson -->
<dependency>
    <groupId>com.google.code.gson</groupId>
    <artifactId>gson</artifactId>
    <version>2.13.1</version>
</dependency>

Gradle

Add the following content to your build.gradle:

// https://mvnrepository.com/artifact/com.google.code.gson/gson
implementation("com.google.code.gson:gson:2.13.1")

Important

When using a custom voice generated by voice cloning for speech synthesis, set the voice as follows:

MultiModalConversationParam param = MultiModalConversationParam.builder()
                .parameter("voice", "your_voice") // Replace the voice parameter with the custom voice generated by cloning
                .build();

import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.utils.Constants;
import com.google.gson.Gson;
import com.google.gson.JsonObject;

import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.*;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Main {
    // ===== Constant definitions =====
    // Use the same model for voice cloning and speech synthesis
    private static final String TARGET_MODEL = "qwen3-tts-vc-2026-01-22";
    private static final String PREFERRED_NAME = "guanyu";
    // Relative path to the local audio file used for voice cloning
    private static final String AUDIO_FILE = "voice.mp3";
    private static final String AUDIO_MIME_TYPE = "audio/mpeg";

    // Generate a data URI
    public static String toDataUrl(String filePath) throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get(filePath));
        String encoded = Base64.getEncoder().encodeToString(bytes);
        return "data:" + AUDIO_MIME_TYPE + ";base64," + encoded;
    }

    // Call the API to create a voice
    public static String createVoice() throws Exception {
        // API keys differ between the Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
        // If you haven't configured an environment variable, replace the following line with: String apiKey = "sk-xxx"
        String apiKey = System.getenv("DASHSCOPE_API_KEY");

        String jsonPayload =
                "{"
                        + "\"model\": \"qwen-voice-enrollment\"," // Do not change this value
                        + "\"input\": {"
                        +     "\"action\": \"create\","
                        +     "\"target_model\": \"" + TARGET_MODEL + "\","
                        +     "\"preferred_name\": \"" + PREFERRED_NAME + "\","
                        +     "\"audio\": {"
                        +         "\"data\": \"" + toDataUrl(AUDIO_FILE) + "\""
                        +     "}"
                        + "}"
                        + "}";

        // The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
        String url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization";
        HttpURLConnection con = (HttpURLConnection) new URL(url).openConnection();
        con.setRequestMethod("POST");
        con.setRequestProperty("Authorization", "Bearer " + apiKey);
        con.setRequestProperty("Content-Type", "application/json");
        con.setDoOutput(true);

        try (OutputStream os = con.getOutputStream()) {
            os.write(jsonPayload.getBytes(StandardCharsets.UTF_8));
        }

        int status = con.getResponseCode();
        System.out.println("HTTP status code: " + status);

        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(status >= 200 && status < 300 ? con.getInputStream() : con.getErrorStream(),
                        StandardCharsets.UTF_8))) {
            StringBuilder response = new StringBuilder();
            String line;
            while ((line = br.readLine()) != null) {
                response.append(line);
            }
            System.out.println("Response content: " + response);

            if (status == 200) {
                JsonObject jsonObj = new Gson().fromJson(response.toString(), JsonObject.class);
                return jsonObj.getAsJsonObject("output").get("voice").getAsString();
            }
            throw new IOException("Failed to create voice: " + status + " - " + response);
        }
    }

    public static void call() throws Exception {
        MultiModalConversation conv = new MultiModalConversation();
        MultiModalConversationParam param = MultiModalConversationParam.builder()
                // API keys differ between the Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
                // If you haven't configured an environment variable, replace the following line with: .apikey("sk-xxx")
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .model(TARGET_MODEL)
                .text("How's the weather today?")
                .parameter("voice", createVoice()) // Replace the voice parameter with the custom voice generated by cloning
                .build();
        MultiModalConversationResult result = conv.call(param);
        String audioUrl = result.getOutput().getAudio().getUrl();
        System.out.print(audioUrl);

        // Download the audio file locally
        try (InputStream in = new URL(audioUrl).openStream();
             FileOutputStream out = new FileOutputStream("downloaded_audio.wav")) {
            byte[] buffer = new byte[1024];
            int bytesRead;
            while ((bytesRead = in.read(buffer)) != -1) {
                out.write(buffer, 0, bytesRead);
            }
            System.out.println("\nAudio file downloaded locally: downloaded_audio.wav");
        } catch (Exception e) {
            System.out.println("\nError downloading audio file: " + e.getMessage());
        }
    }
    public static void main(String[] args) {
        try {
            // The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
            Constants.baseHttpApiUrl = "https://dashscope-intl.aliyuncs.com/api/v1";
            call();
        } catch (Exception e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}

API reference

When calling the following interfaces, always use the same Alibaba Cloud account.

Create voice

Uploads audio for cloning and creates a custom voice.

  • URL

    Chinese Mainland:

    POST https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization

    International:

    POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization
  • Request headers

    Parameter

    Type

    Required

    Description

    Authorization

    string

    Yes

    Authentication token in the format Bearer <your_api_key>. Replace "<your_api_key>" with your actual API key.

    Content-Type

    string

    Yes

    Media type of the data transmitted in the request body. Fixed as application/json.

  • Request body

    The request body contains all request parameters. Omit optional fields if not needed.

    Important

    Note the following parameters:

    • model: Voice cloning model, fixed as qwen-voice-enrollment

    • target_model: Speech synthesis model that drives the voice. It must match the speech synthesis model used in subsequent speech synthesis calls. Otherwise, synthesis fails.

    {
        "model": "qwen-voice-enrollment",
        "input": {
            "action": "create",
            "target_model": "qwen3-tts-vc-realtime-2026-01-15",
            "preferred_name": "guanyu",
            "audio": {
                "data": "https://xxx.wav"
            },
            "text": "Optional. Enter the text corresponding to audio.data.",
            "language": "Optional. Enter the language of audio.data, such as zh."
        }
    }
  • Request parameters

    Parameter

    Type

    Default

    Required

    Description

    model

    string

    -

    Yes

    Voice cloning model, fixed as qwen-voice-enrollment.

    action

    string

    -

    Yes

    Operation type, fixed as create.

    target_model

    string

    -

    Yes

    Speech synthesis model that drives the voice. Supported models fall into two types: real-time and non-real-time speech synthesis models. For the available models, see Real-time speech synthesis - Qwen or Speech synthesis - Qwen.

    It must match the speech synthesis model used in subsequent speech synthesis calls. Otherwise, synthesis fails.

    preferred_name

    string

    -

    Yes

    Assign a recognizable name to the voice (up to 16 characters: digits, letters, and underscores only). Use identifiers related to roles or scenarios.

    This keyword appears in the cloned voice name. For example, if the keyword is "guanyu", the final voice name is "qwen-tts-vc-guanyu-voice-20250812105009984-838b".

    audio.data

    string

    -

    Yes

    Audio for cloning (follow the Recording guide when recording, and ensure the audio meets Audio requirements).

    Submit audio data in one of the following ways:

    1. Data URL

      Format: data:<mediatype>;base64,<data>

      • <mediatype>: MIME type

        • WAV: audio/wav

        • MP3: audio/mpeg

        • M4A: audio/mp4

      • <data>: Base64-encoded string of the audio

        Base64 encoding increases file size. Control the original file size to keep encoded data under 10 MB.

      • Example: data:audio/wav;base64,SUQzBAAAAAAAI1RTU0UAAAAPAAADTGF2ZjU4LjI5LjEwMAAAAAAAAAAAAAAA//PAxABQ/BXRbMPe4IQAhl9

        View sample code

        Python

        import base64, pathlib
        
        # input.mp3 is the local audio file for voice cloning. Replace it with your own audio file path and ensure it meets the audio requirements.
        file_path = pathlib.Path("input.mp3")
        base64_str = base64.b64encode(file_path.read_bytes()).decode()
        data_uri = f"data:audio/mpeg;base64,{base64_str}"

        Java

        import java.nio.file.*;
        import java.util.Base64;
        
        public class Main {
            /**
             * filePath is the local audio file for voice cloning. Replace it with your own audio file path and ensure it meets the audio requirements.
             */
            public static String toDataUrl(String filePath) throws Exception {
                byte[] bytes = Files.readAllBytes(Paths.get(filePath));
                String encoded = Base64.getEncoder().encodeToString(bytes);
                return "data:audio/mpeg;base64," + encoded;
            }
        
            // Usage example
            public static void main(String[] args) throws Exception {
                System.out.println(toDataUrl("input.mp3"));
            }
        }
    2. Audio URL (we recommend uploading audio to OSS)

      • File size must not exceed 10 MB.

      • The URL must be publicly accessible and require no authentication.
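The 10 MB limit applies to the Base64-encoded payload, which is larger than the source file. A minimal sketch of choosing between the two submission options, assuming a local file path, MIME type, and a hosted fallback URL that you supply yourself:

```python
import base64
import pathlib

MAX_ENCODED_BYTES = 10 * 1024 * 1024  # 10 MB cap on the encoded payload

def audio_payload(file_path: str, mime_type: str, fallback_url: str) -> str:
    """Return a data URL if the Base64-encoded audio fits under the limit;
    otherwise return the publicly accessible hosted URL instead."""
    raw = pathlib.Path(file_path).read_bytes()
    encoded = base64.b64encode(raw).decode()
    data_url = f"data:{mime_type};base64,{encoded}"
    if len(data_url) <= MAX_ENCODED_BYTES:
        return data_url
    return fallback_url
```

The returned string can be used directly as the audio.data value in the create request.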

    text

    string

    -

    No

    Text matching the audio.data content.

    If provided, the server compares the audio against this text. If they differ significantly, it returns Audio.PreprocessError.

    language

    string

    -

    No

    Language of the audio.data audio.

    Supported languages: zh (Chinese), en (English), de (German), it (Italian), pt (Portuguese), es (Spanish), ja (Japanese), ko (Korean), fr (French), ru (Russian).

    If using this parameter, ensure the specified language matches the audio language.

  • Response parameters

    View response example

    {
        "output": {
            "voice": "yourVoice",
            "target_model": "qwen3-tts-vc-realtime-2026-01-15"
        },
        "usage": {
            "count": 1
        },
        "request_id": "yourRequestId"
    }

    Key parameters:

    Parameter

    Type

    Description

    voice

    string

    Voice name. Use directly as the voice parameter in the speech synthesis interface.

    target_model

    string

    Speech synthesis model that drives the voice. It must match the speech synthesis model used in subsequent speech synthesis calls. Otherwise, synthesis fails.

    request_id

    string

    Request ID.

    count

    integer

    Number of "create voice" operations billed for this request. For pricing, see Billing details.

    For voice creation, count is always 1.

  • Sample code

    Important

    Note the following parameters:

    • model: Voice cloning model, fixed as qwen-voice-enrollment

    • target_model: Speech synthesis model that drives the voice. It must match the speech synthesis model used in subsequent speech synthesis calls. Otherwise, synthesis fails.

    cURL

    If you haven't configured your API key as an environment variable, replace $DASHSCOPE_API_KEY in the examples with your actual API key.

    # ======= Important notes =======
    # The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
    # API keys differ between the Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    # === Delete this comment before execution ===
    
    curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization \
    -H "Authorization: Bearer $DASHSCOPE_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
        "model": "qwen-voice-enrollment",
        "input": {
            "action": "create",
            "target_model": "qwen3-tts-vc-realtime-2026-01-15",
            "preferred_name": "guanyu",
            "audio": {
                "data": "https://xxx.wav"
            }
        }
    }'

    Python

    import os
    import requests
    import base64, pathlib
    
    target_model = "qwen3-tts-vc-realtime-2026-01-15"
    preferred_name = "guanyu"
    audio_mime_type = "audio/mpeg"
    
    file_path = pathlib.Path("input.mp3")
    base64_str = base64.b64encode(file_path.read_bytes()).decode()
    data_uri = f"data:{audio_mime_type};base64,{base64_str}"
    
    # API keys differ between the Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    # If you haven't configured an environment variable, replace the following line with: api_key = "sk-xxx"
    api_key = os.getenv("DASHSCOPE_API_KEY")
    # The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
    url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"
    
    payload = {
        "model": "qwen-voice-enrollment", # Do not change this value
        "input": {
            "action": "create",
            "target_model": target_model,
            "preferred_name": preferred_name,
            "audio": {
                "data": data_uri
            }
        }
    }
    
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }
    
    # Send POST request
    resp = requests.post(url, json=payload, headers=headers)
    
    if resp.status_code == 200:
        data = resp.json()
        voice = data["output"]["voice"]
        print(f"Generated voice parameter: {voice}")
    else:
        print("Request failed:", resp.status_code, resp.text)

    Java

    import com.google.gson.Gson;
    import com.google.gson.JsonObject;
    
    import java.io.*;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.file.*;
    import java.util.Base64;
    
    public class Main {
        private static final String TARGET_MODEL = "qwen3-tts-vc-realtime-2026-01-15";
        private static final String PREFERRED_NAME = "guanyu";
        private static final String AUDIO_FILE = "input.mp3";
        private static final String AUDIO_MIME_TYPE = "audio/mpeg";
    
        public static String toDataUrl(String filePath) throws Exception {
            byte[] bytes = Files.readAllBytes(Paths.get(filePath));
            String encoded = Base64.getEncoder().encodeToString(bytes);
            return "data:" + AUDIO_MIME_TYPE + ";base64," + encoded;
        }
    
        public static void main(String[] args) {
            // API keys differ between the Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
            // If you haven't configured an environment variable, replace the following line with: String apiKey = "sk-xxx"
            String apiKey = System.getenv("DASHSCOPE_API_KEY");
            // The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
            String apiUrl = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization";
    
            try {
                // Construct JSON request body (escape internal quotes)
                String jsonPayload =
                        "{"
                                + "\"model\": \"qwen-voice-enrollment\"," // Do not change this value
                                + "\"input\": {"
                                +     "\"action\": \"create\","
                                +     "\"target_model\": \"" + TARGET_MODEL + "\","
                                +     "\"preferred_name\": \"" + PREFERRED_NAME + "\","
                                +     "\"audio\": {"
                                +         "\"data\": \"" + toDataUrl(AUDIO_FILE) + "\""
                                +     "}"
                                + "}"
                                + "}";
    
                HttpURLConnection con = (HttpURLConnection) new URL(apiUrl).openConnection();
                con.setRequestMethod("POST");
                con.setRequestProperty("Authorization", "Bearer " + apiKey);
                con.setRequestProperty("Content-Type", "application/json");
                con.setDoOutput(true);
    
                // Send request body
                try (OutputStream os = con.getOutputStream()) {
                    os.write(jsonPayload.getBytes("UTF-8"));
                }
    
                int status = con.getResponseCode();
                InputStream is = (status >= 200 && status < 300)
                        ? con.getInputStream()
                        : con.getErrorStream();
    
                StringBuilder response = new StringBuilder();
                try (BufferedReader br = new BufferedReader(new InputStreamReader(is, "UTF-8"))) {
                    String line;
                    while ((line = br.readLine()) != null) {
                        response.append(line);
                    }
                }
    
                System.out.println("HTTP status code: " + status);
                System.out.println("Response content: " + response.toString());
    
                if (status == 200) {
                    // Parse JSON
                    Gson gson = new Gson();
                    JsonObject jsonObj = gson.fromJson(response.toString(), JsonObject.class);
                    String voice = jsonObj.getAsJsonObject("output").get("voice").getAsString();
                    System.out.println("Generated voice parameter: " + voice);
                }
    
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }

List voices

Performs a paged query to list voices you've created.

  • URL

    Chinese Mainland:

    POST https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization

    International:

    POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization
  • Request headers

    Parameter

    Type

    Required

    Description

    Authorization

    string

    Yes

    Authentication token in the format Bearer <your_api_key>. Replace "<your_api_key>" with your actual API key.

    Content-Type

    string

    Yes

    Media type of the data transmitted in the request body. Fixed as application/json.

  • Request body

    The request body contains all request parameters. Omit optional fields if not needed.

    Important

    model: Voice cloning model, fixed as qwen-voice-enrollment. Do not modify.

    {
        "model": "qwen-voice-enrollment",
        "input": {
            "action": "list",
            "page_size": 2,
            "page_index": 0
        }
    }
  • Request parameters

    Parameter

    Type

    Default

    Required

    Description

    model

    string

    -

    Yes

    Voice cloning model, fixed as qwen-voice-enrollment.

    action

    string

    -

    Yes

    Operation type, fixed as list.

    page_index

    integer

    0

    No

    Page index. Range: [0, 1000000].

    page_size

    integer

    10

    No

    Entries per page. Range: [0, 1000000].

  • Response parameters

    View response example

    {
        "output": {
            "voice_list": [
                {
                    "voice": "yourVoice1",
                    "gmt_create": "2025-08-11 17:59:32",
                    "target_model": "qwen3-tts-vc-realtime-2026-01-15"
                },
                {
                    "voice": "yourVoice2",
                    "gmt_create": "2025-08-11 17:38:10",
                    "target_model": "qwen3-tts-vc-realtime-2026-01-15"
                }
            ]
        },
        "usage": {
            "count": 0
        },
        "request_id": "yourRequestId"
    }

    Key parameters:

    Parameter

    Type

    Description

    voice

    string

    Voice name. Use directly as the voice parameter in the speech synthesis interface.

    gmt_create

    string

    Voice creation time.

    target_model

    string

    Speech synthesis model that drives the voice. It must match the speech synthesis model used in subsequent speech synthesis calls. Otherwise, synthesis fails.

    request_id

    string

    Request ID.

    count

    integer

    Number of "create voice" operations billed for this request. For pricing, see Billing details.

    Voice listing is free. Therefore, count is always 0.

  • Sample code

    Important

    model: Voice cloning model, fixed as qwen-voice-enrollment. Do not modify.

    cURL

    If you haven't configured your API key as an environment variable, replace $DASHSCOPE_API_KEY in the examples with your actual API key.

    # ======= Important notes =======
    # The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
    # API keys differ between the Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    # === Delete this comment before execution ===
    
    curl --location --request POST 'https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization' \
    --header 'Authorization: Bearer $DASHSCOPE_API_KEY' \
    --header 'Content-Type: application/json' \
    --data '{
        "model": "qwen-voice-enrollment",
        "input": {
            "action": "list",
            "page_size": 10,
            "page_index": 0
        }
    }'

    Python

    import os
    import requests
    
    # API keys differ between the Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    # If you haven't configured an environment variable, replace the following line with: api_key = "sk-xxx"
    api_key = os.getenv("DASHSCOPE_API_KEY")
    # The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
    url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"
    
    payload = {
        "model": "qwen-voice-enrollment", # Do not change this value
        "input": {
            "action": "list",
            "page_size": 10,
            "page_index": 0
        }
    }
    
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }
    
    response = requests.post(url, json=payload, headers=headers)
    
    print("HTTP status code:", response.status_code)
    
    if response.status_code == 200:
        data = response.json()
        voice_list = data["output"]["voice_list"]
    
        print("List of voices found:")
        for item in voice_list:
            print(f"- Voice: {item['voice']}  Creation time: {item['gmt_create']}  Model: {item['target_model']}")
    else:
        print("Request failed:", response.text)

    Java

    import com.google.gson.Gson;
    import com.google.gson.JsonArray;
    import com.google.gson.JsonObject;
    
    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    
    public class Main {
        public static void main(String[] args) {
            // API keys differ between the Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
            // If you haven't configured an environment variable, replace the following line with: String apiKey = "sk-xxx"
            String apiKey = System.getenv("DASHSCOPE_API_KEY");
            // The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
            String apiUrl = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization";
    
            // JSON request body (older Java versions don't support """ multiline strings)
            String jsonPayload =
                    "{"
                            + "\"model\": \"qwen-voice-enrollment\"," // Do not change this value
                            + "\"input\": {"
                            +     "\"action\": \"list\","
                            +     "\"page_size\": 10,"
                            +     "\"page_index\": 0"
                            + "}"
                            + "}";
    
            try {
                HttpURLConnection con = (HttpURLConnection) new URL(apiUrl).openConnection();
                con.setRequestMethod("POST");
                con.setRequestProperty("Authorization", "Bearer " + apiKey);
                con.setRequestProperty("Content-Type", "application/json");
                con.setDoOutput(true);
    
                try (OutputStream os = con.getOutputStream()) {
                    os.write(jsonPayload.getBytes("UTF-8"));
                }
    
                int status = con.getResponseCode();
                BufferedReader br = new BufferedReader(new InputStreamReader(
                        status >= 200 && status < 300 ? con.getInputStream() : con.getErrorStream(), "UTF-8"));
    
                StringBuilder response = new StringBuilder();
                String line;
                while ((line = br.readLine()) != null) {
                    response.append(line);
                }
                br.close();
    
                System.out.println("HTTP status code: " + status);
                System.out.println("Response JSON: " + response.toString());
    
                if (status == 200) {
                    Gson gson = new Gson();
                    JsonObject jsonObj = gson.fromJson(response.toString(), JsonObject.class);
                    JsonArray voiceList = jsonObj.getAsJsonObject("output").getAsJsonArray("voice_list");
    
                    System.out.println("\n List of voices found:");
                    for (int i = 0; i < voiceList.size(); i++) {
                        JsonObject voiceItem = voiceList.get(i).getAsJsonObject();
                        String voice = voiceItem.get("voice").getAsString();
                        String gmtCreate = voiceItem.get("gmt_create").getAsString();
                        String targetModel = voiceItem.get("target_model").getAsString();
    
                        System.out.printf("- Voice: %s  Creation time: %s  Model: %s\n",
                                voice, gmtCreate, targetModel);
                    }
                }
    
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
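Because a cloned voice only works when synthesis uses its own target_model, it can help to filter the listed voices before a synthesis call. A small sketch over the voice_list shape shown in the response example above (the model names below are illustrative):

```python
def voices_for_model(voice_list, desired_model):
    """Return the names of voices whose target_model matches desired_model.

    voice_list follows the list-voices response shape:
    [{"voice": ..., "gmt_create": ..., "target_model": ...}, ...]
    """
    return [item["voice"] for item in voice_list
            if item["target_model"] == desired_model]

# Example with the shape from the response sample above
sample = [
    {"voice": "yourVoice1", "gmt_create": "2025-08-11 17:59:32",
     "target_model": "qwen3-tts-vc-realtime-2026-01-15"},
    {"voice": "yourVoice2", "gmt_create": "2025-08-11 17:38:10",
     "target_model": "some-other-model"},
]
print(voices_for_model(sample, "qwen3-tts-vc-realtime-2026-01-15"))  # prints ['yourVoice1']
```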

Delete a voice

Deletes a specified voice to free up its quota.

  • URL

    Chinese Mainland:

    POST https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization

    International:

    POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization
  • Request headers

    Parameter

    Type

    Required

    Description

    Authorization

    string

    Yes

    Authentication token in the format Bearer <your_api_key>. Replace "<your_api_key>" with your actual API key.

    Content-Type

    string

    Yes

    Media type of the data transmitted in the request body. Fixed as application/json.

  • Request body

    The request body contains all request parameters. Omit optional fields if not needed:

    Important

    model: Voice cloning model, fixed as qwen-voice-enrollment. Do not modify.

    {
        "model": "qwen-voice-enrollment",
        "input": {
            "action": "delete",
            "voice": "yourVoice"
        }
    }
  • Request parameters

    Parameter

    Type

    Default

    Required

    Description

    model

    string

    -

    Yes

    Voice cloning model, fixed as qwen-voice-enrollment.

    action

    string

    -

    Yes

    Operation type, fixed as delete.

    voice

    string

    -

    Yes

    Voice to delete.

  • Response parameters

    View response example

    {
        "usage": {
            "count": 0
        },
        "request_id": "yourRequestId"
    }

    Key parameters:

    Parameter

    Type

    Description

    request_id

    string

    Request ID.

    count

    integer

    Number of "create voice" operations billed for this request. For pricing, see Billing details.

    Voice deletion is free. Therefore, count is always 0.

  • Sample code

    Important

    model: Voice cloning model, fixed as qwen-voice-enrollment. Do not modify.

    cURL

    If you haven't configured your API key as an environment variable, replace $DASHSCOPE_API_KEY in the examples with your actual API key.

    # ======= Important notes =======
    # The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
    # API keys differ between the Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    # === Delete this comment before execution ===
    
    curl --location --request POST 'https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization' \
    --header 'Authorization: Bearer $DASHSCOPE_API_KEY' \
    --header 'Content-Type: application/json' \
    --data '{
        "model": "qwen-voice-enrollment",
        "input": {
            "action": "delete",
            "voice": "yourVoice"
        }
    }'

    Python

    import os
    import requests
    
    # API keys differ between the Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    # If you haven't configured an environment variable, replace the following line with: api_key = "sk-xxx"
    api_key = os.getenv("DASHSCOPE_API_KEY")
    # The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
    url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"
    
    voice_to_delete = "yourVoice"  # Voice to delete (replace with actual value)
    
    payload = {
        "model": "qwen-voice-enrollment", # Do not change this value
        "input": {
            "action": "delete",
            "voice": voice_to_delete
        }
    }
    
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }
    
    response = requests.post(url, json=payload, headers=headers)
    
    print("HTTP status code:", response.status_code)
    
    if response.status_code == 200:
        data = response.json()
        request_id = data["request_id"]
    
        print(f"Deletion successful")
        print(f"Request ID: {request_id}")
    else:
        print("Request failed:", response.text)

    Java

    import com.google.gson.Gson;
    import com.google.gson.JsonObject;
    
    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    
    public class Main {
        public static void main(String[] args) {
            // API keys differ between the Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
            // If you haven't configured an environment variable, replace the following line with: String apiKey = "sk-xxx"
            String apiKey = System.getenv("DASHSCOPE_API_KEY");
            // The following URL is for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
            String apiUrl = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization";
            String voiceToDelete = "yourVoice"; // Voice to delete (replace with actual value)
    
            // Construct JSON request body (string concatenation for Java 8 compatibility)
            String jsonPayload =
                    "{"
                            + "\"model\": \"qwen-voice-enrollment\"," // Do not change this value
                            + "\"input\": {"
                            +     "\"action\": \"delete\","
                            +     "\"voice\": \"" + voiceToDelete + "\""
                            + "}"
                            + "}";
    
            try {
                // Establish POST connection
                HttpURLConnection con = (HttpURLConnection) new URL(apiUrl).openConnection();
                con.setRequestMethod("POST");
                con.setRequestProperty("Authorization", "Bearer " + apiKey);
                con.setRequestProperty("Content-Type", "application/json");
                con.setDoOutput(true);
    
                // Send request body
                try (OutputStream os = con.getOutputStream()) {
                    os.write(jsonPayload.getBytes("UTF-8"));
                }
    
                int status = con.getResponseCode();
                BufferedReader br = new BufferedReader(new InputStreamReader(
                        status >= 200 && status < 300 ? con.getInputStream() : con.getErrorStream(), "UTF-8"));
    
                StringBuilder response = new StringBuilder();
                String line;
                while ((line = br.readLine()) != null) {
                    response.append(line);
                }
                br.close();
    
                System.out.println("HTTP status code: " + status);
                System.out.println("Response JSON: " + response.toString());
    
                if (status == 200) {
                    Gson gson = new Gson();
                    JsonObject jsonObj = gson.fromJson(response.toString(), JsonObject.class);
                    String requestId = jsonObj.get("request_id").getAsString();
    
                    System.out.println("Deletion successful");
                    System.out.println("Request ID: " + requestId);
                }
    
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }

Speech synthesis

For details on using custom cloned voices for personalized speech synthesis, see Getting started: From cloning to synthesis.

Voice cloning-specific models (such as qwen3-tts-vc-realtime-2026-01-15) are dedicated models that only support custom cloned voices. They do not support system voices such as Chelsie, Serena, Ethan, or Cherry.

Voice quota and cleanup rules

  • Total limit: 1000 voices per account

    The current interface does not provide a voice count query feature. Call the List voices interface to count voices yourself.
  • Automatic cleanup: If a voice has not been used in any speech synthesis request for over a year, the system automatically deletes it.
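Since there is no dedicated count endpoint, the total can be derived by walking the paged list until a short or empty page is returned. A minimal sketch; fetch_page mirrors the list-voices request shown in the API reference above, and the Singapore endpoint is assumed:

```python
import json
import os
import urllib.request

URL = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"

def fetch_page(page_index, page_size=100):
    """One list-voices call; returns that page's voice_list."""
    payload = {
        "model": "qwen-voice-enrollment",
        "input": {"action": "list", "page_size": page_size, "page_index": page_index},
    }
    req = urllib.request.Request(
        URL,
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {os.getenv('DASHSCOPE_API_KEY')}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["output"]["voice_list"]

def count_voices(page_size=100, fetch=fetch_page):
    """Walk pages until a page shorter than page_size signals the end."""
    total, page_index = 0, 0
    while True:
        page = fetch(page_index, page_size)
        total += len(page)
        if len(page) < page_size:
            return total
        page_index += 1
```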

Billing details

Voice cloning and speech synthesis are billed separately:

  • Voice cloning: Creating a voice costs $0.01 per voice. Failed creations are not charged.

    Note

    Free quota details (available only in Alibaba Cloud China Website (www.aliyun.com) Beijing region and Alibaba Cloud International Website (www.alibabacloud.com) Singapore region):

    • Within 90 days of activating Alibaba Cloud Model Studio, you receive 1000 free voice creation attempts.

    • Failed creations do not consume free attempts.

    • Deleting a voice does not restore free attempts.

    • After the free quota expires or the 90-day validity period ends, voice creation is billed at $0.01 per voice.

  • Speech synthesis using custom cloned voices: Billed by volume (character count). For details, see Real-time speech synthesis - Qwen or Speech synthesis - Qwen.

Copyright and legality

You are responsible for ensuring that you own, or hold the legal rights to use, the voice you provide. Please read the Terms of Service.

Recording guide

Device

We recommend using a microphone with noise reduction or recording with your phone at close range in a quiet environment to ensure clean audio.

Environment

Location

  • Record in an enclosed space of 10 square meters or less.

  • Prioritize rooms with sound-absorbing materials (such as acoustic foam, carpets, or curtains).

  • Avoid large, reverberant spaces such as halls, meeting rooms, or classrooms.

Noise control

  • Outdoor noise: Close windows and doors to block traffic, construction, and other disturbances.

  • Indoor noise: Turn off air conditioners, fans, and fluorescent lamp ballasts. To identify remaining noise sources, record the room's ambient sound with your phone and play it back at high volume.

Reverberation control

  • Reverberation causes muffled, unclear audio.

  • Reduce reflections: Draw curtains, open closet doors, and cover tables or cabinets with clothing or sheets.

  • Use irregular objects (such as bookshelves or upholstered furniture) to diffuse sound.

Script

  • Script content is flexible. Align it with your target scenario (e.g., use customer service dialogue style for customer service scenarios). Ensure it contains no sensitive or illegal content (such as political, pornographic, or violent material), as this will cause cloning to fail.

  • Avoid short phrases (like "Hello" or "Yes"). Use complete sentences.

  • Maintain semantic coherence. Avoid frequent pauses (aim for at least 3 seconds of continuous speech).

  • You may adopt the target emotion (such as friendly or serious), but avoid overly dramatic readings. Keep it natural.

Operational tips

Using a typical bedroom as an example:

  1. Close windows and doors to block external noise.

  2. Turn off air conditioners, fans, and other electrical appliances.

  3. Draw curtains to reduce reflections.

  4. Place clothing or blankets on the desk to reduce reflections.

  5. Familiarize yourself with the script. Set your character's tone and deliver naturally.

  6. Keep a distance of about 10 cm from the recording device to avoid plosives and weak signals.

Error messages

If you encounter errors, see Error messages for troubleshooting.