
Alibaba Cloud Model Studio: Qwen Voice Cloning API Reference

Last updated: Apr 21, 2026

Voice cloning lets you clone voices without training. Provide 10 to 20 seconds of audio to generate a custom voice that closely resembles the original with natural sound quality. Voice cloning and model invocation are two sequential steps. This document covers the voice cloning parameters and API details. For model invocation, see Real-time (Qwen-Omni-Realtime).

Important

This document applies only to the Qwen-Omni and Qwen-Omni-Realtime voice cloning API. If you use a text-to-speech model, see Speech synthesis.

Audio requirements

High-quality input audio is the foundation for a good cloning result.

Supported formats: WAV (16-bit), MP3, M4A

Duration: 10 to 20 seconds recommended. Maximum: 60 seconds.

File size: Less than 10 MB

Sample rate: At least 24 kHz

Channels: Mono

Content: The audio must contain at least 3 seconds of continuous, clear speech with no background sounds. The remaining portion may include brief pauses (2 seconds or less). Avoid background music, noise, or other voices throughout the entire audio. Use normal spoken audio as input. Do not upload songs or singing audio.

Languages: Chinese (zh), English (en), German (de), Italian (it), Portuguese (pt), Spanish (es), Japanese (ja), Korean (ko), French (fr), Russian (ru), Thai (th), Indonesian (id), Arabic (ar), Czech (cs), Danish (da), Dutch (nl), Finnish (fi), Hebrew (he), Hindi (hi), Icelandic (is), Malay (ms), Norwegian (no), Persian (fa), Polish (pl), Swedish (sv), Tagalog (tl), Turkish (tr), Urdu (ur), Vietnamese (vi)

Chinese dialects: Dongbei, Shaanxi, Sichuan, Henan, Changsha, Tianjin, Hangzhou, Liaoning, Shenyang, Anshan
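For WAV input, a quick pre-flight check against these requirements can catch problems before you upload. The following is a minimal sketch using only the Python standard library; it checks format properties only, not speech content, background noise, or the other content rules above:

```python
import pathlib
import wave

def check_wav(path: str) -> list:
    """Return a list of problems; an empty list means the basic format checks pass."""
    problems = []
    if pathlib.Path(path).stat().st_size >= 10 * 1024 * 1024:
        problems.append("file is 10 MB or larger")
    with wave.open(path, "rb") as w:
        if w.getnchannels() != 1:
            problems.append("audio is not mono")
        if w.getframerate() < 24000:
            problems.append("sample rate is below 24 kHz")
        if w.getsampwidth() != 2:
            problems.append("samples are not 16-bit")
        duration = w.getnframes() / float(w.getframerate())
        if duration > 60:
            problems.append("longer than the 60-second maximum")
        elif duration < 10:
            problems.append("shorter than the recommended 10 seconds")
    return problems
```

Run it on your recording before calling the API; an empty result only means the container metadata is acceptable, not that the cloning result will be good.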

Quick start: from cloning to real-time conversation


1. Workflow

Voice cloning and real-time conversation are two closely related but independent steps that follow a "create first, then use" workflow:

  1. Create a voice

    Call the Create a voice API and upload an audio clip. The system analyzes the audio and creates a custom cloned voice. You must specify target_model in this step to declare which omni model will drive the voice.

    If you already have a created voice (call the List voices API to check), skip this step and proceed to the next one.

  2. Use the voice in a real-time conversation

    Call the real-time multimodal API and pass in the voice obtained in the previous step. The omni model specified in this step must match the target_model from the previous step.

2. Model configuration and prerequisites

Choose the appropriate models and complete the prerequisites.

Model configuration

Voice cloning requires two models:

  • Voice cloning model: qwen-voice-enrollment

  • Omni model that drives the voice:

    • qwen3.5-omni-plus-realtime

    • qwen3.5-omni-flash-realtime

Prerequisites

  1. Get an API key: Obtain an API key. For security, configure the API key as an environment variable.

  2. Install the SDK: Make sure you have installed the latest DashScope SDK.

  3. Prepare the audio for cloning: The audio must meet the audio requirements.
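As a sketch, the first two prerequisites typically reduce to two shell commands (package names follow the Python example below; the API key value is a placeholder):

```shell
# Install or upgrade the DashScope SDK (PyAudio is needed for the microphone example)
pip install -U "dashscope>=1.23.9" pyaudio

# Configure the API key as an environment variable (replace with your actual key)
export DASHSCOPE_API_KEY="sk-xxx"
```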

3. End-to-end example

The following example demonstrates how to use a cloned voice in a real-time conversation so that the model's speech closely resembles the original voice.

Key principle: When cloning a voice, the target_model (the omni model that drives the voice) must match the model specified in the subsequent real-time multimodal API call. Otherwise, synthesis fails. The example uses a local audio file voice.mp3 for voice cloning. Replace it with your own file when you run the code.

Applicable to the Qwen3.5-Omni-Realtime series models. For more information, see Real-time (Qwen-Omni-Realtime).

Python

# Requirements: dashscope >= 1.23.9, pyaudio
import os
import requests
import base64
import pathlib
import time
import pyaudio
from dashscope.audio.qwen_omni import MultiModality, OmniRealtimeCallback, OmniRealtimeConversation
import dashscope

# ======= Configuration =======
DEFAULT_TARGET_MODEL = "qwen3.5-omni-plus-realtime"  # Must be the same model for cloning and conversation
DEFAULT_PREFERRED_NAME = "guanyu"
DEFAULT_AUDIO_MIME_TYPE = "audio/mpeg"
VOICE_FILE_PATH = "voice.mp3"  # Path to the local audio file for voice cloning


def create_voice(file_path: str,
                 target_model: str = DEFAULT_TARGET_MODEL,
                 preferred_name: str = DEFAULT_PREFERRED_NAME,
                 audio_mime_type: str = DEFAULT_AUDIO_MIME_TYPE) -> str:
    """
    Create a custom voice and return the voice parameter.
    """
    # API keys differ between the Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
    # If you haven't set an environment variable, replace the following line with: api_key = "sk-xxx"
    api_key = os.getenv("DASHSCOPE_API_KEY")

    file_path_obj = pathlib.Path(file_path)
    if not file_path_obj.exists():
        raise FileNotFoundError(f"Audio file not found: {file_path}")

    base64_str = base64.b64encode(file_path_obj.read_bytes()).decode()
    data_uri = f"data:{audio_mime_type};base64,{base64_str}"

    # Singapore region URL. For the Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
    url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"
    payload = {
        "model": "qwen-voice-enrollment",
        "input": {
            "action": "create",
            "target_model": target_model,
            "preferred_name": preferred_name,
            "audio": {"data": data_uri}
        }
    }
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }

    resp = requests.post(url, json=payload, headers=headers)
    if resp.status_code != 200:
        raise RuntimeError(f"Failed to create voice: {resp.status_code}, {resp.text}")

    try:
        return resp.json()["output"]["voice"]
    except (KeyError, ValueError) as e:
        raise RuntimeError(f"Failed to parse voice response: {e}")


class SimpleCallback(OmniRealtimeCallback):
    def __init__(self, pya):
        self.pya = pya
        self.out = None
    def on_open(self):
        self.out = self.pya.open(
            format=pyaudio.paInt16,
            channels=1,
            rate=24000,
            output=True
        )
    def on_event(self, response):
        if response['type'] == 'response.audio.delta':
            self.out.write(base64.b64decode(response['delta']))
        elif response['type'] == 'conversation.item.input_audio_transcription.completed':
            print(f"[User] {response['transcript']}")
        elif response['type'] == 'response.audio_transcript.done':
            print(f"[LLM] {response['transcript']}")


if __name__ == '__main__':
    # If you haven't set an environment variable, replace the following line with: dashscope.api_key = "sk-xxx"
    dashscope.api_key = os.getenv("DASHSCOPE_API_KEY")
    # Singapore region URL. For the Beijing region, use: wss://dashscope.aliyuncs.com/api-ws/v1/realtime
    url = "wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime"

    # Step 1: Clone a voice
    voice = create_voice(VOICE_FILE_PATH)
    print(f"Voice cloning complete. Voice: {voice}")

    # Step 2: Start a real-time conversation with the cloned voice
    pya = pyaudio.PyAudio()
    callback = SimpleCallback(pya)
    conv = OmniRealtimeConversation(model=DEFAULT_TARGET_MODEL, callback=callback, url=url)
    conv.connect()
    conv.update_session(
        output_modalities=[MultiModality.AUDIO, MultiModality.TEXT],
        voice=voice  # Use the cloned voice
    )
    mic = pya.open(format=pyaudio.paInt16, channels=1, rate=16000, input=True)
    print("Conversation started. Speak into your microphone (Ctrl+C to exit)...")
    try:
        while True:
            audio_data = mic.read(3200, exception_on_overflow=False)
            conv.append_audio(base64.b64encode(audio_data).decode())
            time.sleep(0.01)
    except KeyboardInterrupt:
        conv.close()
        mic.close()
        callback.out.close()
        pya.terminate()
        print("\nConversation ended")

Java

import com.alibaba.dashscope.audio.omni.*;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.google.gson.Gson;
import com.google.gson.JsonObject;

import javax.sound.sampled.*;
import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.ByteBuffer;
import java.nio.file.*;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;

public class Main {
    // ===== Constants =====
    // Use the same model for voice cloning and real-time conversation
    private static final String TARGET_MODEL = "qwen3.5-omni-plus-realtime";
    private static final String PREFERRED_NAME = "guanyu";
    // Relative path to the local audio file for voice cloning
    private static final String AUDIO_FILE = "voice.mp3";
    private static final String AUDIO_MIME_TYPE = "audio/mpeg";

    // Generate a data URI
    public static String toDataUrl(String filePath) throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get(filePath));
        String encoded = Base64.getEncoder().encodeToString(bytes);
        return "data:" + AUDIO_MIME_TYPE + ";base64," + encoded;
    }

    // Call the API to create a voice
    public static String createVoice() throws Exception {
        // The API keys for the Singapore and Beijing regions are different. Get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
        // If you haven't configured the environment variable, replace the following line with: String apiKey = "sk-xxx"
        String apiKey = System.getenv("DASHSCOPE_API_KEY");

        String jsonPayload =
                "{"
                        + "\"model\": \"qwen-voice-enrollment\","
                        + "\"input\": {"
                        +     "\"action\": \"create\","
                        +     "\"target_model\": \"" + TARGET_MODEL + "\","
                        +     "\"preferred_name\": \"" + PREFERRED_NAME + "\","
                        +     "\"audio\": {"
                        +         "\"data\": \"" + toDataUrl(AUDIO_FILE) + "\""
                        +     "}"
                        + "}"
                        + "}";

        // The following URL is for the Singapore region. To use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
        String url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization";
        HttpURLConnection con = (HttpURLConnection) new URL(url).openConnection();
        con.setRequestMethod("POST");
        con.setRequestProperty("Authorization", "Bearer " + apiKey);
        con.setRequestProperty("Content-Type", "application/json");
        con.setDoOutput(true);

        try (OutputStream os = con.getOutputStream()) {
            os.write(jsonPayload.getBytes(StandardCharsets.UTF_8));
        }

        int status = con.getResponseCode();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(status >= 200 && status < 300 ? con.getInputStream() : con.getErrorStream(),
                        StandardCharsets.UTF_8))) {
            StringBuilder response = new StringBuilder();
            String line;
            while ((line = br.readLine()) != null) {
                response.append(line);
            }
            if (status == 200) {
                JsonObject jsonObj = new Gson().fromJson(response.toString(), JsonObject.class);
                return jsonObj.getAsJsonObject("output").get("voice").getAsString();
            }
            throw new IOException("Failed to create voice: " + status + " - " + response);
        }
    }

    // Simple audio player
    static class SimpleAudioPlayer {
        private final SourceDataLine line;
        private final Queue<byte[]> audioQueue = new ConcurrentLinkedQueue<>();
        private final Thread playerThread;
        private final AtomicBoolean shouldStop = new AtomicBoolean(false);

        public SimpleAudioPlayer() throws LineUnavailableException {
            AudioFormat format = new AudioFormat(24000, 16, 1, true, false);
            line = AudioSystem.getSourceDataLine(format);
            line.open(format);
            line.start();
            playerThread = new Thread(() -> {
                while (!shouldStop.get()) {
                    byte[] audio = audioQueue.poll();
                    if (audio != null) {
                        line.write(audio, 0, audio.length);
                    } else {
                        try { Thread.sleep(10); } catch (InterruptedException ignored) {}
                    }
                }
            }, "AudioPlayer");
            playerThread.start();
        }

        public void play(String base64Audio) {
            audioQueue.add(Base64.getDecoder().decode(base64Audio));
        }

        public void close() {
            shouldStop.set(true);
            try { playerThread.join(1000); } catch (InterruptedException ignored) {}
            line.drain();
            line.close();
        }
    }

    public static void main(String[] args) {
        try {
            // 1. Voice cloning: create a custom voice
            String voice = createVoice();
            System.out.println("Voice cloning complete. Voice: " + voice);

            // 2. Use the cloned voice in a real-time conversation
            SimpleAudioPlayer player = new SimpleAudioPlayer();
            AtomicBoolean shouldStop = new AtomicBoolean(false);

            OmniRealtimeParam param = OmniRealtimeParam.builder()
                    .model(TARGET_MODEL)
                    // The API keys for the Singapore and Beijing regions are different. Get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
                    // If you haven't configured the environment variable, replace the following line with: .apikey("sk-xxx")
                    .apikey(System.getenv("DASHSCOPE_API_KEY"))
                    // The following URL is for the Singapore region. To use a model in the Beijing region, replace the URL with: wss://dashscope.aliyuncs.com/api-ws/v1/realtime
                    .url("wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime")
                    .build();

            OmniRealtimeConversation conversation = new OmniRealtimeConversation(param, new OmniRealtimeCallback() {
                @Override public void onOpen() { System.out.println("Connection established"); }
                @Override public void onClose(int code, String reason) {
                    System.out.println("Connection closed (" + code + "): " + reason);
                    shouldStop.set(true);
                }
                @Override public void onEvent(JsonObject event) {
                    String type = event.get("type").getAsString();
                    if ("response.audio.delta".equals(type)) {
                        player.play(event.get("delta").getAsString());
                    } else if ("conversation.item.input_audio_transcription.completed".equals(type)) {
                        System.out.println("[User] " + event.get("transcript").getAsString());
                    } else if ("response.audio_transcript.done".equals(type)) {
                        System.out.println("[LLM] " + event.get("transcript").getAsString());
                    }
                }
            });

            conversation.connect();
            conversation.updateSession(OmniRealtimeConfig.builder()
                    .modalities(Arrays.asList(OmniRealtimeModality.AUDIO, OmniRealtimeModality.TEXT))
                    .voice(voice)  // Use the cloned custom voice
                    .enableTurnDetection(true)
                    .enableInputAudioTranscription(true)
                    .build()
            );

            System.out.println("Conversation started. Speak into the microphone (Ctrl+C to exit)...");
            AudioFormat format = new AudioFormat(16000, 16, 1, true, false);
            TargetDataLine mic = AudioSystem.getTargetDataLine(format);
            mic.open(format);
            mic.start();

            ByteBuffer buffer = ByteBuffer.allocate(3200);
            while (!shouldStop.get()) {
                int bytesRead = mic.read(buffer.array(), 0, buffer.capacity());
                if (bytesRead > 0) {
                    // Encode only the bytes actually read from the microphone
                    conversation.appendAudio(Base64.getEncoder().encodeToString(
                            Arrays.copyOf(buffer.array(), bytesRead)));
                }
                Thread.sleep(20);
            }

            conversation.close(1000, "Normal exit");
            player.close();
            mic.close();
            System.out.println("\nConversation ended");
        } catch (NoApiKeyException e) {
            System.err.println("API KEY not found. Set the DASHSCOPE_API_KEY environment variable.");
        } catch (Exception e) {
            e.printStackTrace();
        }
        System.exit(0);
    }
}

API reference

Make sure you use the same account across different APIs.

Create a voice

Upload audio for cloning and create a custom voice.

  • URL

    China (Beijing):

    POST https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization

    International (Singapore):

    POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization
  • Request headers

    Authorization (string, required): Authentication token in the format Bearer <your_api_key>. Replace <your_api_key> with your actual API key.

    Content-Type (string, required): Media type of the request body. Set to application/json.

  • Request body

    The following request body includes all parameters. Optional fields can be omitted as needed. Be careful to distinguish the following two parameters:

    Important

    model: The voice cloning model. Set to qwen-voice-enrollment.

    target_model: The omni model that drives the voice. This must match the model used in the subsequent real-time multimodal API call. Otherwise, synthesis fails.

    {
        "model": "qwen-voice-enrollment",
        "input": {
            "action": "create",
            "target_model": "qwen3.5-omni-plus-realtime",
            "preferred_name": "guanyu",
            "audio": {
                "data": "https://xxx.wav"
            },
            "text": "Optional. The transcript of the audio in audio.data.",
            "language": "Optional. The language of the audio in audio.data, such as zh."
        }
    }
  • Request parameters

    model (string, required): The voice cloning model. Set to qwen-voice-enrollment.

    action (string, required): The action type. Set to create.

    target_model (string, required): The omni model that drives the voice:

    • qwen3.5-omni-plus-realtime

    • qwen3.5-omni-flash-realtime

    This must match the model used in the subsequent real-time multimodal API call. Otherwise, synthesis fails.

    preferred_name (string, required): A human-readable name for the voice. Only digits, letters, and underscores are allowed. Maximum: 16 characters. Use a name related to the role or scenario.

    This keyword appears in the final voice name. For example, if the keyword is "guanyu", the resulting voice name is "qwen-omni-vc-guanyu-voice-20250812105009984-838b".

    audio.data (string, required): The audio for cloning (follow the Recording guide when recording, and make sure the audio meets the Audio requirements).

    Submit audio data in one of the following ways:

    1. Data URL

      Format: data:<mediatype>;base64,<data>

      • <mediatype>: The MIME type

        • WAV: audio/wav

        • MP3: audio/mpeg

        • M4A: audio/mp4

      • <data>: The Base64-encoded string of the audio

        Base64 encoding increases the file size. Keep the original file small enough so the encoded result stays under 10 MB.

      • Example: data:audio/wav;base64,SUQzBAAAAAAAI1RTU0UAAAAPAAADTGF2ZjU4LjI5LjEwMAAAAAAAAAAAAAAA//PAxABQ/BXRbMPe4IQAhl9

        View sample code

        Python:

        import base64, pathlib

        # input.mp3 is the local audio file for voice cloning. Replace with your own file path. Make sure the file meets the audio requirements.
        file_path = pathlib.Path("input.mp3")
        base64_str = base64.b64encode(file_path.read_bytes()).decode()
        data_uri = f"data:audio/mpeg;base64,{base64_str}"

        Java:

        import java.nio.file.*;
        import java.util.Base64;

        public class Main {
            /**
             * filePath is the local audio file for voice cloning. Replace with your own file path. Make sure the file meets the audio requirements.
             */
            public static String toDataUrl(String filePath) throws Exception {
                byte[] bytes = Files.readAllBytes(Paths.get(filePath));
                String encoded = Base64.getEncoder().encodeToString(bytes);
                return "data:audio/mpeg;base64," + encoded;
            }

            // Usage example
            public static void main(String[] args) throws Exception {
                System.out.println(toDataUrl("input.mp3"));
            }
        }
    2. Audio URL (we recommend uploading your audio to OSS)

      • File size must not exceed 10 MB.

      • The URL must be publicly accessible without authentication.

    text (string, optional): The transcript that matches the audio in audio.data.

    When this parameter is provided, the server compares the audio against the text. If the difference is too large, an Audio.PreprocessError is returned.

    language (string, optional): The language of the audio in audio.data.

    Supported values: zh (Chinese), en (English), de (German), it (Italian), pt (Portuguese), es (Spanish), ja (Japanese), ko (Korean), fr (French), ru (Russian), th (Thai), id (Indonesian), ar (Arabic), cs (Czech), da (Danish), nl (Dutch), fi (Finnish), he (Hebrew), hi (Hindi), is (Icelandic), ms (Malay), no (Norwegian), fa (Persian), pl (Polish), sv (Swedish), tl (Tagalog), tr (Turkish), ur (Urdu), vi (Vietnamese).

    Chinese dialects: Dongbei, Shaanxi, Sichuan, Henan, Changsha, Tianjin, Hangzhou, Liaoning, Shenyang, Anshan.

    If you use this parameter, set it to the actual language of the audio used for cloning.
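To tie the parameters together, here is a hedged Python sketch that assembles a create request body, including the optional text and language fields, and enforces the preferred_name rule. The helper name build_create_payload is illustrative, not part of any SDK:

```python
import re
from typing import Optional

def build_create_payload(target_model: str, preferred_name: str, audio_data: str,
                         text: Optional[str] = None, language: Optional[str] = None) -> dict:
    """Assemble the request body for the create action."""
    # preferred_name: only digits, letters, and underscores; at most 16 characters
    if not re.fullmatch(r"[A-Za-z0-9_]{1,16}", preferred_name):
        raise ValueError("preferred_name must be 1-16 letters, digits, or underscores")
    body = {
        "action": "create",
        "target_model": target_model,
        "preferred_name": preferred_name,
        "audio": {"data": audio_data},   # audio URL or data URI
    }
    if text is not None:        # optional transcript of the audio
        body["text"] = text
    if language is not None:    # optional language code, e.g. "zh"
        body["language"] = language
    return {"model": "qwen-voice-enrollment", "input": body}
```

Post the returned dictionary as the JSON body of the create request, as in the sample code below.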

  • Response parameters

    View response example

    {
        "output": {
            "voice": "yourVoice",
            "target_model": "qwen3.5-omni-plus-realtime"
        },
        "usage": {
            "count": 1
        },
        "request_id": "yourRequestId"
    }

    Key response parameters:

    voice (string): The voice name. Use this value directly as the voice parameter in the real-time multimodal API.

    target_model (string): The omni model that drives the voice (qwen3.5-omni-plus-realtime or qwen3.5-omni-flash-realtime). This must match the model used in the subsequent real-time multimodal API call. Otherwise, synthesis fails.

    request_id (string): The request ID.

    count (integer): The number of voice creation operations billed for this request. When creating a voice, count is always 1.

  • Sample code

    Important

    model: The voice cloning model. Set to qwen-voice-enrollment.

    target_model: The omni model that drives the voice. This must match the model used in the subsequent real-time multimodal API call. Otherwise, synthesis fails.

    cURL

    If you have not set the API key as an environment variable, you must replace $DASHSCOPE_API_KEY in the example with your actual API key.

    # ======= Important =======
    # Singapore region URL. For the Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
    # API keys differ between regions. Get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
    # === Remove these comments before running ===
    
    curl --location --request POST 'https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization' \
    --header "Authorization: Bearer $DASHSCOPE_API_KEY" \
    --header 'Content-Type: application/json' \
    --data '{
        "model": "qwen-voice-enrollment",
        "input": {
            "action": "create",
            "target_model": "qwen3.5-omni-plus-realtime",
            "preferred_name": "guanyu",
            "audio": {
                "data": "https://xxx.wav"
            }
        }
    }'

    Python

    import os
    import requests
    
    # API keys differ between regions. Get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
    # If you haven't set an environment variable, replace the following line with: api_key = "sk-xxx"
    api_key = os.getenv("DASHSCOPE_API_KEY")
    # Singapore region URL. For the Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
    url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"
    
    payload = {
        "model": "qwen-voice-enrollment", # Do not modify
        "input": {
            "action": "create",
            "target_model": "qwen3.5-omni-plus-realtime",
            "preferred_name": "guanyu",
            "audio": {"data": "https://xxx.wav"}  # Replace with your audio URL or data URI
        }
    }
    
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }
    
    response = requests.post(url, json=payload, headers=headers)
    
    print("HTTP status:", response.status_code)
    
    if response.status_code == 200:
        data = response.json()
        print("Voice:", data["output"]["voice"])
    else:
        print("Request failed:", response.text)

    Java

    import com.google.gson.Gson;
    import com.google.gson.JsonObject;
    
    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;
    
    public class Main {
        public static void main(String[] args) {
            // API keys differ between regions. Get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
            // If you haven't set an environment variable, replace the following line with: String apiKey = "sk-xxx"
            String apiKey = System.getenv("DASHSCOPE_API_KEY");
            // Singapore region URL. For the Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
            String apiUrl = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization";
    
            // JSON request body. Replace the audio URL with your own file (URL or data URI).
            String jsonPayload =
                    "{"
                            + "\"model\": \"qwen-voice-enrollment\"," // Do not modify
                            + "\"input\": {"
                            +     "\"action\": \"create\","
                            +     "\"target_model\": \"qwen3.5-omni-plus-realtime\","
                            +     "\"preferred_name\": \"guanyu\","
                            +     "\"audio\": {\"data\": \"https://xxx.wav\"}"
                            + "}"
                            + "}";
    
            try {
                HttpURLConnection con = (HttpURLConnection) new URL(apiUrl).openConnection();
                con.setRequestMethod("POST");
                con.setRequestProperty("Authorization", "Bearer " + apiKey);
                con.setRequestProperty("Content-Type", "application/json");
                con.setDoOutput(true);
    
                try (OutputStream os = con.getOutputStream()) {
                    os.write(jsonPayload.getBytes(StandardCharsets.UTF_8));
                }
    
                int status = con.getResponseCode();
                BufferedReader br = new BufferedReader(new InputStreamReader(
                        status >= 200 && status < 300 ? con.getInputStream() : con.getErrorStream(),
                        StandardCharsets.UTF_8));
    
                StringBuilder response = new StringBuilder();
                String line;
                while ((line = br.readLine()) != null) {
                    response.append(line);
                }
                br.close();
    
                System.out.println("HTTP status: " + status);
                System.out.println("Response JSON: " + response);
    
                if (status == 200) {
                    JsonObject jsonObj = new Gson().fromJson(response.toString(), JsonObject.class);
                    String voice = jsonObj.getAsJsonObject("output").get("voice").getAsString();
                    System.out.println("Voice: " + voice);
                }
    
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }

List voices

Query your created voices with pagination.

  • URL

    China (Beijing):

    POST https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization

    International (Singapore):

    POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization
  • Request headers

    Authorization (string, required): Authentication token in the format Bearer <your_api_key>. Replace <your_api_key> with your actual API key.

    Content-Type (string, required): Media type of the request body. Set to application/json.

  • Request parameters

    model (string, required): The voice cloning model. Set to qwen-voice-enrollment.

    action (string, required): The action type. Set to list.

    page_index (integer, optional, default 0): Page number index. Valid values: 0 to 1,000,000.

    page_size (integer, optional, default 10): Number of items per page. Valid values: 0 to 1,000,000.
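Because page_index and page_size page through the results, collecting every voice takes a loop. Below is a minimal Python sketch with a pluggable fetch function, so the HTTP call itself (for example, requests.post against the URL above) stays outside. The helper iter_voices is illustrative, not an SDK function:

```python
from typing import Callable, Dict, Iterator

def iter_voices(fetch: Callable[[Dict], Dict], page_size: int = 10) -> Iterator[Dict]:
    """Yield every voice entry, advancing page_index until a page comes back short."""
    page_index = 0
    while True:
        payload = {
            "model": "qwen-voice-enrollment",
            "input": {"action": "list", "page_size": page_size, "page_index": page_index},
        }
        # fetch takes the request payload and returns the parsed JSON response
        page = fetch(payload)["output"]["voice_list"]
        for item in page:
            yield item
        if len(page) < page_size:   # a short (or empty) page means we are done
            return
        page_index += 1
```

A short page signals the end; when the total is an exact multiple of page_size, the next request simply returns an empty list and the loop stops.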

  • Response parameters

    View response example

    {
        "output": {
            "voice_list": [
                {
                    "voice": "yourVoice1",
                    "gmt_create": "2025-08-11 17:59:32",
                    "target_model": "qwen3.5-omni-plus-realtime"
                },
                {
                    "voice": "yourVoice2",
                    "gmt_create": "2025-08-11 17:38:10",
                    "target_model": "qwen3.5-omni-plus-realtime"
                }
            ]
        },
        "usage": {
            "count": 0
        },
        "request_id": "yourRequestId"
    }

    Key response parameters:

    voice (string): The voice name. Use this value directly as the voice parameter in the real-time multimodal API.

    gmt_create (string): The time when the voice was created.

    target_model (string): The omni model that drives the voice (qwen3.5-omni-plus-realtime or qwen3.5-omni-flash-realtime). This must match the model used in the subsequent real-time multimodal API call. Otherwise, synthesis fails.

    request_id (string): The request ID.

    count (integer): The number of voice creation operations billed for this request. Listing voices is free, so count is always 0.

  • Sample code

    Important

    model: The voice cloning model. The value is fixed as qwen-voice-enrollment. Do not modify this value.

    cURL

    If you have not set the API key as an environment variable, you must replace $DASHSCOPE_API_KEY in the example with your actual API key.

    # ======= Important =======
    # Singapore region URL. For the Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
    # API keys differ between regions. Get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
    # === Remove these comments before running ===
    
    curl --location --request POST 'https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization' \
    --header "Authorization: Bearer $DASHSCOPE_API_KEY" \
    --header 'Content-Type: application/json' \
    --data '{
        "model": "qwen-voice-enrollment",
        "input": {
            "action": "list",
            "page_size": 10,
            "page_index": 0
        }
    }'

    Python

    import os
    import requests
    
    # API keys differ between regions. Get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
    # If you haven't set an environment variable, replace the following line with: api_key = "sk-xxx"
    api_key = os.getenv("DASHSCOPE_API_KEY")
    # Singapore region URL. For the Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
    url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"
    
    payload = {
        "model": "qwen-voice-enrollment", # Do not modify
        "input": {
            "action": "list",
            "page_size": 10,
            "page_index": 0
        }
    }
    
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }
    
    response = requests.post(url, json=payload, headers=headers)
    
    print("HTTP status:", response.status_code)
    
    if response.status_code == 200:
        data = response.json()
        voice_list = data["output"]["voice_list"]
    
        print("Voice list:")
        for item in voice_list:
            print(f"- Voice: {item['voice']}  Created: {item['gmt_create']}  Model: {item['target_model']}")
    else:
        print("Request failed:", response.text)

    Java

    import com.google.gson.Gson;
    import com.google.gson.JsonArray;
    import com.google.gson.JsonObject;
    
    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    
    public class Main {
        public static void main(String[] args) {
            // API keys differ between regions. Get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
            // If you haven't set an environment variable, replace the following line with: String apiKey = "sk-xxx"
            String apiKey = System.getenv("DASHSCOPE_API_KEY");
            // Singapore region URL. For the Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
            String apiUrl = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization";
    
            // JSON request body
            String jsonPayload =
                    "{"
                            + "\"model\": \"qwen-voice-enrollment\"," // Do not modify
                            + "\"input\": {"
                            +     "\"action\": \"list\","
                            +     "\"page_size\": 10,"
                            +     "\"page_index\": 0"
                            + "}"
                            + "}";
    
            try {
                HttpURLConnection con = (HttpURLConnection) new URL(apiUrl).openConnection();
                con.setRequestMethod("POST");
                con.setRequestProperty("Authorization", "Bearer " + apiKey);
                con.setRequestProperty("Content-Type", "application/json");
                con.setDoOutput(true);
    
                try (OutputStream os = con.getOutputStream()) {
                    os.write(jsonPayload.getBytes("UTF-8"));
                }
    
                int status = con.getResponseCode();
                BufferedReader br = new BufferedReader(new InputStreamReader(
                        status >= 200 && status < 300 ? con.getInputStream() : con.getErrorStream(), "UTF-8"));
    
                StringBuilder response = new StringBuilder();
                String line;
                while ((line = br.readLine()) != null) {
                    response.append(line);
                }
                br.close();
    
                System.out.println("HTTP status: " + status);
                System.out.println("Response JSON: " + response.toString());
    
                if (status == 200) {
                    Gson gson = new Gson();
                    JsonObject jsonObj = gson.fromJson(response.toString(), JsonObject.class);
                    JsonArray voiceList = jsonObj.getAsJsonObject("output").getAsJsonArray("voice_list");
    
                    System.out.println("\nVoice list:");
                    for (int i = 0; i < voiceList.size(); i++) {
                        JsonObject voiceItem = voiceList.get(i).getAsJsonObject();
                        String voice = voiceItem.get("voice").getAsString();
                        String gmtCreate = voiceItem.get("gmt_create").getAsString();
                        String targetModel = voiceItem.get("target_model").getAsString();
    
                        System.out.printf("- Voice: %s  Created: %s  Model: %s\n",
                                voice, gmtCreate, targetModel);
                    }
                }
    
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }

Delete a voice

Delete a specific voice and release the corresponding quota.

  • URL

    China (Beijing):

    POST https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization

    International (Singapore):

    POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization
  • Request headers

    Authorization (string, required): Authentication token in the format Bearer <your_api_key>. Replace <your_api_key> with your actual API key.

    Content-Type (string, required): Media type of the request body. Set to application/json.

  • Request body

    The following request body includes all parameters. Optional fields can be omitted as needed.

    Important

    model: The voice cloning model. The value is fixed as qwen-voice-enrollment. Do not modify this value.

    {
        "model": "qwen-voice-enrollment",
        "input": {
            "action": "delete",
            "voice": "yourVoice"
        }
    }
  • Request parameters

    model (string, required): The voice cloning model. Set to qwen-voice-enrollment.

    action (string, required): The action type. Set to delete.

    voice (string, required): The name of the voice to delete.

  • Response parameters

    View response example

    {
        "usage": {
            "count": 0
        },
        "request_id": "yourRequestId"
    }

    Key response parameters:

    request_id (string): Request ID.

    count (integer): The number of billable 'Create Voice' operations in this request. Deleting a voice is free, so count is always 0.

  • Sample code

    Important

    model: The voice cloning model. The value is fixed as qwen-voice-enrollment. Do not modify this value.

    cURL

    If you have not set the API key as an environment variable, you must replace $DASHSCOPE_API_KEY in the example with your actual API key.

    # ======= Important =======
    # Singapore region URL. For the Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
    # API keys differ between regions. Get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
    # === Remove these comments before running ===
    
    curl --location --request POST 'https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization' \
    --header "Authorization: Bearer $DASHSCOPE_API_KEY" \
    --header 'Content-Type: application/json' \
    --data '{
        "model": "qwen-voice-enrollment",
        "input": {
            "action": "delete",
            "voice": "yourVoice"
        }
    }'

    Python

    import os
    import requests
    
    # API keys differ between regions. Get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
    # If you haven't set an environment variable, replace the following line with: api_key = "sk-xxx"
    api_key = os.getenv("DASHSCOPE_API_KEY")
    # Singapore region URL. For the Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
    url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"
    
    voice_to_delete = "yourVoice"  # Replace with the actual voice name
    
    payload = {
        "model": "qwen-voice-enrollment", # Do not modify
        "input": {
            "action": "delete",
            "voice": voice_to_delete
        }
    }
    
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }
    
    response = requests.post(url, json=payload, headers=headers)
    
    print("HTTP status:", response.status_code)
    
    if response.status_code == 200:
        data = response.json()
        request_id = data["request_id"]
    
        print("Deleted successfully")
        print(f"Request ID: {request_id}")
    else:
        print("Request failed:", response.text)

    Java

    import com.google.gson.Gson;
    import com.google.gson.JsonObject;
    
    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    
    public class Main {
        public static void main(String[] args) {
            // API keys differ between regions. Get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
            // If you haven't set an environment variable, replace the following line with: String apiKey = "sk-xxx"
            String apiKey = System.getenv("DASHSCOPE_API_KEY");
            // Singapore region URL. For the Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
            String apiUrl = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization";
            String voiceToDelete = "yourVoice"; // Replace with the actual voice name
    
            // Build JSON request body
            String jsonPayload =
                    "{"
                            + "\"model\": \"qwen-voice-enrollment\"," // Do not modify
                            + "\"input\": {"
                            +     "\"action\": \"delete\","
                            +     "\"voice\": \"" + voiceToDelete + "\""
                            + "}"
                            + "}";
    
            try {
                // Create POST connection
                HttpURLConnection con = (HttpURLConnection) new URL(apiUrl).openConnection();
                con.setRequestMethod("POST");
                con.setRequestProperty("Authorization", "Bearer " + apiKey);
                con.setRequestProperty("Content-Type", "application/json");
                con.setDoOutput(true);
    
                // Send request body
                try (OutputStream os = con.getOutputStream()) {
                    os.write(jsonPayload.getBytes("UTF-8"));
                }
    
                int status = con.getResponseCode();
                BufferedReader br = new BufferedReader(new InputStreamReader(
                        status >= 200 && status < 300 ? con.getInputStream() : con.getErrorStream(), "UTF-8"));
    
                StringBuilder response = new StringBuilder();
                String line;
                while ((line = br.readLine()) != null) {
                    response.append(line);
                }
                br.close();
    
                System.out.println("HTTP status: " + status);
                System.out.println("Response JSON: " + response.toString());
    
                if (status == 200) {
                    Gson gson = new Gson();
                    JsonObject jsonObj = gson.fromJson(response.toString(), JsonObject.class);
                    String requestId = jsonObj.get("request_id").getAsString();
    
                    System.out.println("Deleted successfully");
                    System.out.println("Request ID: " + requestId);
                }
    
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
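The list and delete requests above can be combined into a housekeeping script that removes every cloned voice except the ones you want to keep. The following is a minimal sketch, not an official utility: the helper names are illustrative, the endpoint and payloads follow the requests documented above, and it assumes the Singapore region URL and a DASHSCOPE_API_KEY environment variable.

```python
import json
import os
import urllib.request

URL = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"

def _post(payload):
    """Send one request to the voice cloning endpoint documented above."""
    req = urllib.request.Request(
        URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.getenv('DASHSCOPE_API_KEY')}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def stale_voices(voice_list, keep):
    """Pick the voice names to delete, preserving those listed in `keep`."""
    return [v["voice"] for v in voice_list if v["voice"] not in keep]

def delete_voices(names):
    """Issue one 'delete' request per voice name."""
    for name in names:
        _post({"model": "qwen-voice-enrollment",
               "input": {"action": "delete", "voice": name}})

# Usage (illustrative voice name):
# page = _post({"model": "qwen-voice-enrollment",
#               "input": {"action": "list", "page_index": 0, "page_size": 10}})
# delete_voices(stale_voices(page["output"]["voice_list"], keep={"prodVoice"}))
```

Deletion is irreversible and releases quota immediately, so keep the `keep` set explicit rather than deleting everything a list request returns.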

Real-time conversation

To use a cloned voice in a real-time conversation, see Quick start: from cloning to real-time conversation.

Voice quota and auto-cleanup

Total limit: 1,000 voices per account

The API doesn't provide a dedicated voice-count query. To count your voices, call the list action and page through the results.
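Paging through the list action to obtain a count can be scripted. The following is a minimal sketch (helper names are illustrative; endpoint and payload follow the Query voices request documented above): it keeps requesting pages until a page returns fewer than page_size entries.

```python
import json
import os
import urllib.request

URL = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"

def count_voices(fetch_page, page_size=10):
    """Count voices by paging until a page has fewer than page_size items.

    fetch_page(page_index, page_size) must return the voice_list for that page.
    """
    total, page_index = 0, 0
    while True:
        voices = fetch_page(page_index, page_size)
        total += len(voices)
        if len(voices) < page_size:  # last (possibly empty) page
            return total
        page_index += 1

def fetch_page(page_index, page_size):
    """One 'list' request against the voice cloning endpoint (see above)."""
    payload = {
        "model": "qwen-voice-enrollment",
        "input": {"action": "list", "page_index": page_index,
                  "page_size": page_size},
    }
    req = urllib.request.Request(
        URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.getenv('DASHSCOPE_API_KEY')}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["output"]["voice_list"]

# Usage: total = count_voices(fetch_page)
```

Separating the pagination loop from the HTTP call also lets you reuse `count_voices` with any page-fetching function.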

Auto-cleanup: If a voice hasn't been used in any model invocation request for one year, the system automatically deletes it.

Billing

Voice cloning and model invocation are billed separately:

  • Voice cloning: billed at $0.01/voice. Failed creations are not billed.

    Note

    Free quota (available only for the China site Beijing region and the International site Singapore region):

    • Within 90 days of activating Alibaba Cloud Model Studio, you get 1,000 free voice creations.

    • Failed creations don't consume the free quota.

    • Deleting a voice doesn't restore the free quota.

    • After the free quota is used up or the 90-day period expires, voice creation is billed at $0.01/voice.

  • Real-time conversation with a cloned voice: Billed by token usage for model invocation. For details, see Model invocation pricing.
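As a worked example of the cloning rates above (a sketch using the $0.01/voice price and the 1,000-creation free quota stated on this page; real billing depends on your account's remaining quota and region):

```python
PRICE_PER_VOICE = 0.01  # USD per successful creation (rate stated above)
FREE_QUOTA = 1000       # free creations within 90 days of activation

def voice_cloning_cost(successful_creations, free_quota_remaining=FREE_QUOTA):
    """Cost of successful creations; failed creations are never billed."""
    billable = max(0, successful_creations - free_quota_remaining)
    return billable * PRICE_PER_VOICE

# 1,200 successful creations with the full free quota available:
# 1,000 are free, 200 are billable at $0.01 each, i.e. $2.00.
```

Remember that deleting a voice does not restore the free quota, so `free_quota_remaining` only ever decreases.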

Copyright and legal compliance

You are responsible for the ownership and legal use of the voice you provide. Read the Service Agreement.

Recording guide

Recording equipment

Use a microphone with noise cancellation, or record with a phone at close range in a quiet environment to keep the audio clean.

Recording environment

Location

  • Record in a small enclosed space of 10 square meters or less.

  • Choose a room with sound-absorbing materials such as acoustic foam, carpets, or curtains.

  • Avoid large open halls, conference rooms, or classrooms where reverberation is high.

Noise control

  • Outdoor noise: Close doors and windows to block traffic, construction, and other external sounds.

  • Indoor noise: Turn off air conditioners, fans, and fluorescent light ballasts.

  • Record ambient sound with your phone and play it back at high volume to identify hidden noise sources.

Reverberation control

  • Reverberation causes audio to sound muffled and reduces clarity.

  • Reduce reflections from smooth surfaces: close curtains, open wardrobe doors, and cover desks and shelves with clothing or blankets.

  • Use irregular objects such as bookshelves and upholstered furniture to create diffuse reflections.

Script preparation

  • Match the script to the target use case. For example, use customer service dialog style for a customer service scenario.

  • Make sure the script doesn't contain sensitive or illegal content (such as political, pornographic, or violent material), as this causes cloning to fail.

  • Avoid short phrases (such as "hello" or "yes"). Use complete sentences.

  • Maintain semantic coherence and avoid frequent pauses when reading. At least 3 consecutive seconds without interruption is recommended.

  • You can convey the target emotion (such as friendly or serious), but avoid overly dramatic or theatrical delivery. Keep the tone natural.

Recommended steps

Using a typical bedroom as an example:

Error messages

If you encounter errors, see Error messages for troubleshooting.