Voice cloning creates a custom voice without any model training. Provide 10 to 20 seconds of audio to generate a voice that closely resembles the original, with natural sound quality. Voice cloning and model invocation are two sequential steps. This document covers the voice cloning parameters and API details. For model invocation, see Real-time (Qwen-Omni-Realtime).
This document applies only to the Qwen-Omni and Qwen-Omni-Realtime voice cloning API. If you use a text-to-speech model, see Speech synthesis.
Audio requirements
High-quality input audio is the foundation for a good cloning result.
| Item | Requirement |
| --- | --- |
| Supported formats | WAV (16-bit), MP3, M4A |
| Duration | 10 to 20 seconds recommended. Maximum: 60 seconds. |
| File size | < 10 MB |
| Sample rate | >= 24 kHz |
| Channels | Mono |
| Content | The audio must contain at least 3 seconds of continuous, clear speech with no background sounds. The remaining portion may include brief pauses (<= 2 seconds). Avoid background music, noise, or other voices throughout the entire audio. Use normal spoken audio as input. Don't upload songs or singing audio. |
| Languages | Chinese (zh), English (en), German (de), Italian (it), Portuguese (pt), Spanish (es), Japanese (ja), Korean (ko), French (fr), Russian (ru), Thai (th), Indonesian (id), Arabic (ar), Czech (cs), Danish (da), Dutch (nl), Finnish (fi), Hebrew (he), Hindi (hi), Icelandic (is), Malay (ms), Norwegian (no), Persian (fa), Polish (pl), Swedish (sv), Tagalog (tl), Turkish (tr), Urdu (ur), Vietnamese (vi). Chinese dialects: Dongbei, Shannxi, Sichuan, Henan, Changsha, Tianjin, Hangzhou, Liaoning, Shenyang, Anshan. |
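For WAV input, most of the constraints above (sample rate, channels, bit depth, duration) can be pre-checked locally before upload. A minimal sketch using Python's standard wave module; check_wav is a hypothetical helper, not part of the API, and the thresholds simply mirror the table:

```python
import wave

def check_wav(path):
    """Return a list of requirement violations for a WAV file (empty list = OK)."""
    problems = []
    with wave.open(path, "rb") as w:
        duration = w.getnframes() / w.getframerate()
        if w.getframerate() < 24000:
            problems.append("sample rate below 24 kHz")
        if w.getnchannels() != 1:
            problems.append("audio is not mono")
        if w.getsampwidth() != 2:
            problems.append("not 16-bit PCM")
        if duration > 60:
            problems.append("longer than 60 seconds")
        elif duration < 10:
            problems.append("shorter than the recommended 10 seconds")
    return problems
```

This only validates container properties; the content requirements (continuous clear speech, no background noise) still need a human ear.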
Quick start: from cloning to real-time conversation
1. Workflow
Voice cloning and real-time conversation are two closely related but independent steps that follow a "create first, then use" workflow:
- Create a voice
  Call the Create a voice API and upload an audio clip. The system analyzes the audio and creates a custom cloned voice. You must specify target_model in this step to declare which omni model will drive the voice. If you already have a created voice (call the List voices API to check), skip this step and proceed to the next one.
- Use the voice in a real-time conversation
  Call the real-time multimodal API and pass in the voice obtained in the previous step. The omni model specified in this step must match the target_model from the previous step.
2. Model configuration and prerequisites
Choose the appropriate models and complete the prerequisites.
Model configuration
Voice cloning requires two models:
- Voice cloning model: qwen-voice-enrollment
- Omni model that drives the voice:
  - qwen3.5-omni-plus-realtime
  - qwen3.5-omni-flash-realtime
Prerequisites
- Get an API key: Obtain an API key. For security, configure the API key as an environment variable.
- Install the SDK: Make sure you have installed the latest DashScope SDK.
- Prepare the audio for cloning: The audio must meet the audio requirements.
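The prerequisites above can be completed from a terminal. A sketch for Linux/macOS (an assumption: on Windows, use set or the system environment settings instead; the version pin matches the Python example below):

```shell
# Configure the API key as an environment variable (replace sk-xxx with your key)
export DASHSCOPE_API_KEY="sk-xxx"
# Install the SDK (the Python example requires dashscope >= 1.23.9) and PyAudio for microphone/speaker I/O
pip install -U "dashscope>=1.23.9" pyaudio
```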
3. End-to-end example
The following example demonstrates how to use a cloned voice in a real-time conversation to produce output that closely resembles the original voice.
Key principle: When cloning a voice, the target_model (the omni model that drives the voice) must match the model specified in the subsequent real-time multimodal API call. Otherwise, synthesis fails. The example uses a local audio file voice.mp3 for voice cloning. Replace it with your own file when you run the code.
Applicable to the Qwen3.5-Omni-Realtime series models. For more information, see Real-time (Qwen-Omni-Realtime).
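Because a target_model mismatch only surfaces as a synthesis failure at conversation time, it can help to assert the match up front, before opening the websocket. A small sketch; ensure_model_match is a hypothetical helper, not part of the SDK:

```python
def ensure_model_match(target_model: str, realtime_model: str) -> None:
    """Fail fast if the cloning target_model and the realtime model differ."""
    if target_model != realtime_model:
        raise ValueError(
            f"target_model '{target_model}' does not match realtime model "
            f"'{realtime_model}'; synthesis would fail"
        )
```

Call it with the target_model you used when creating the voice and the model you are about to pass to the real-time API.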
Python
# Requirements: dashscope >= 1.23.9, pyaudio
import os
import requests
import base64
import pathlib
import time
import pyaudio
from dashscope.audio.qwen_omni import MultiModality, OmniRealtimeCallback, OmniRealtimeConversation
import dashscope

# ======= Configuration =======
DEFAULT_TARGET_MODEL = "qwen3.5-omni-plus-realtime"  # Must be the same model for cloning and conversation
DEFAULT_PREFERRED_NAME = "guanyu"
DEFAULT_AUDIO_MIME_TYPE = "audio/mpeg"
VOICE_FILE_PATH = "voice.mp3"  # Path to the local audio file for voice cloning


def create_voice(file_path: str,
                 target_model: str = DEFAULT_TARGET_MODEL,
                 preferred_name: str = DEFAULT_PREFERRED_NAME,
                 audio_mime_type: str = DEFAULT_AUDIO_MIME_TYPE) -> str:
    """Create a custom voice and return the voice parameter."""
    # API keys differ between the Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
    # If you haven't set an environment variable, replace the following line with: api_key = "sk-xxx"
    api_key = os.getenv("DASHSCOPE_API_KEY")
    file_path_obj = pathlib.Path(file_path)
    if not file_path_obj.exists():
        raise FileNotFoundError(f"Audio file not found: {file_path}")
    base64_str = base64.b64encode(file_path_obj.read_bytes()).decode()
    data_uri = f"data:{audio_mime_type};base64,{base64_str}"
    # Singapore region URL. For the Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
    url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"
    payload = {
        "model": "qwen-voice-enrollment",
        "input": {
            "action": "create",
            "target_model": target_model,
            "preferred_name": preferred_name,
            "audio": {"data": data_uri}
        }
    }
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }
    resp = requests.post(url, json=payload, headers=headers)
    if resp.status_code != 200:
        raise RuntimeError(f"Failed to create voice: {resp.status_code}, {resp.text}")
    try:
        return resp.json()["output"]["voice"]
    except (KeyError, ValueError) as e:
        raise RuntimeError(f"Failed to parse voice response: {e}")


class SimpleCallback(OmniRealtimeCallback):
    def __init__(self, pya):
        self.pya = pya
        self.out = None

    def on_open(self):
        self.out = self.pya.open(
            format=pyaudio.paInt16,
            channels=1,
            rate=24000,
            output=True
        )

    def on_event(self, response):
        if response['type'] == 'response.audio.delta':
            self.out.write(base64.b64decode(response['delta']))
        elif response['type'] == 'conversation.item.input_audio_transcription.completed':
            print(f"[User] {response['transcript']}")
        elif response['type'] == 'response.audio_transcript.done':
            print(f"[LLM] {response['transcript']}")


if __name__ == '__main__':
    # If you haven't set an environment variable, replace the following line with: dashscope.api_key = "sk-xxx"
    dashscope.api_key = os.getenv("DASHSCOPE_API_KEY")
    # Singapore region URL. For the Beijing region, use: wss://dashscope.aliyuncs.com/api-ws/v1/realtime
    url = "wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime"
    # Step 1: Clone a voice
    voice = create_voice(VOICE_FILE_PATH)
    print(f"Voice cloning complete. Voice: {voice}")
    # Step 2: Start a real-time conversation with the cloned voice
    pya = pyaudio.PyAudio()
    callback = SimpleCallback(pya)
    conv = OmniRealtimeConversation(model=DEFAULT_TARGET_MODEL, callback=callback, url=url)
    conv.connect()
    conv.update_session(
        output_modalities=[MultiModality.AUDIO, MultiModality.TEXT],
        voice=voice  # Use the cloned voice
    )
    mic = pya.open(format=pyaudio.paInt16, channels=1, rate=16000, input=True)
    print("Conversation started. Speak into your microphone (Ctrl+C to exit)...")
    try:
        while True:
            audio_data = mic.read(3200, exception_on_overflow=False)
            conv.append_audio(base64.b64encode(audio_data).decode())
            time.sleep(0.01)
    except KeyboardInterrupt:
        conv.close()
        mic.close()
        callback.out.close()
        pya.terminate()
        print("\nConversation ended")
Java
import com.alibaba.dashscope.audio.omni.*;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.google.gson.Gson;
import com.google.gson.JsonObject;
import javax.sound.sampled.*;
import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.ByteBuffer;
import java.nio.file.*;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;

public class Main {
    // ===== Constants =====
    // Use the same model for voice cloning and real-time conversation
    private static final String TARGET_MODEL = "qwen3.5-omni-plus-realtime";
    private static final String PREFERRED_NAME = "guanyu";
    // Relative path to the local audio file for voice cloning
    private static final String AUDIO_FILE = "voice.mp3";
    private static final String AUDIO_MIME_TYPE = "audio/mpeg";

    // Generate a data URI
    public static String toDataUrl(String filePath) throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get(filePath));
        String encoded = Base64.getEncoder().encodeToString(bytes);
        return "data:" + AUDIO_MIME_TYPE + ";base64," + encoded;
    }

    // Call the API to create a voice
    public static String createVoice() throws Exception {
        // The API keys for the Singapore and Beijing regions are different. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
        // If you haven't configured the environment variable, replace the following line with: String apiKey = "sk-xxx"
        String apiKey = System.getenv("DASHSCOPE_API_KEY");
        String jsonPayload =
                "{"
                + "\"model\": \"qwen-voice-enrollment\","
                + "\"input\": {"
                + "\"action\": \"create\","
                + "\"target_model\": \"" + TARGET_MODEL + "\","
                + "\"preferred_name\": \"" + PREFERRED_NAME + "\","
                + "\"audio\": {"
                + "\"data\": \"" + toDataUrl(AUDIO_FILE) + "\""
                + "}"
                + "}"
                + "}";
        // The following URL is for the Singapore region. To use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
        String url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization";
        HttpURLConnection con = (HttpURLConnection) new URL(url).openConnection();
        con.setRequestMethod("POST");
        con.setRequestProperty("Authorization", "Bearer " + apiKey);
        con.setRequestProperty("Content-Type", "application/json");
        con.setDoOutput(true);
        try (OutputStream os = con.getOutputStream()) {
            os.write(jsonPayload.getBytes(StandardCharsets.UTF_8));
        }
        int status = con.getResponseCode();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(status >= 200 && status < 300 ? con.getInputStream() : con.getErrorStream(),
                        StandardCharsets.UTF_8))) {
            StringBuilder response = new StringBuilder();
            String line;
            while ((line = br.readLine()) != null) {
                response.append(line);
            }
            if (status == 200) {
                JsonObject jsonObj = new Gson().fromJson(response.toString(), JsonObject.class);
                return jsonObj.getAsJsonObject("output").get("voice").getAsString();
            }
            throw new IOException("Failed to create voice: " + status + " - " + response);
        }
    }

    // Simple audio player
    static class SimpleAudioPlayer {
        private final SourceDataLine line;
        private final Queue<byte[]> audioQueue = new ConcurrentLinkedQueue<>();
        private final Thread playerThread;
        private final AtomicBoolean shouldStop = new AtomicBoolean(false);

        public SimpleAudioPlayer() throws LineUnavailableException {
            AudioFormat format = new AudioFormat(24000, 16, 1, true, false);
            line = AudioSystem.getSourceDataLine(format);
            line.open(format);
            line.start();
            playerThread = new Thread(() -> {
                while (!shouldStop.get()) {
                    byte[] audio = audioQueue.poll();
                    if (audio != null) {
                        line.write(audio, 0, audio.length);
                    } else {
                        try { Thread.sleep(10); } catch (InterruptedException ignored) {}
                    }
                }
            }, "AudioPlayer");
            playerThread.start();
        }

        public void play(String base64Audio) {
            audioQueue.add(Base64.getDecoder().decode(base64Audio));
        }

        public void close() {
            shouldStop.set(true);
            try { playerThread.join(1000); } catch (InterruptedException ignored) {}
            line.drain();
            line.close();
        }
    }

    public static void main(String[] args) {
        try {
            // 1. Voice cloning: create a custom voice
            String voice = createVoice();
            System.out.println("Voice cloning complete. Voice: " + voice);
            // 2. Use the cloned voice in a real-time conversation
            SimpleAudioPlayer player = new SimpleAudioPlayer();
            AtomicBoolean shouldStop = new AtomicBoolean(false);
            OmniRealtimeParam param = OmniRealtimeParam.builder()
                    .model(TARGET_MODEL)
                    // The API keys for the Singapore and Beijing regions are different. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
                    // If you haven't configured the environment variable, replace the following line with: .apikey("sk-xxx")
                    .apikey(System.getenv("DASHSCOPE_API_KEY"))
                    // The following URL is for the Singapore region. To use a model in the Beijing region, replace the URL with: wss://dashscope.aliyuncs.com/api-ws/v1/realtime
                    .url("wss://dashscope-intl.aliyuncs.com/api-ws/v1/realtime")
                    .build();
            OmniRealtimeConversation conversation = new OmniRealtimeConversation(param, new OmniRealtimeCallback() {
                @Override public void onOpen() { System.out.println("Connection established"); }
                @Override public void onClose(int code, String reason) {
                    System.out.println("Connection closed (" + code + "): " + reason);
                    shouldStop.set(true);
                }
                @Override public void onEvent(JsonObject event) {
                    String type = event.get("type").getAsString();
                    if ("response.audio.delta".equals(type)) {
                        player.play(event.get("delta").getAsString());
                    } else if ("conversation.item.input_audio_transcription.completed".equals(type)) {
                        System.out.println("[User] " + event.get("transcript").getAsString());
                    } else if ("response.audio_transcript.done".equals(type)) {
                        System.out.println("[LLM] " + event.get("transcript").getAsString());
                    }
                }
            });
            conversation.connect();
            conversation.updateSession(OmniRealtimeConfig.builder()
                    .modalities(Arrays.asList(OmniRealtimeModality.AUDIO, OmniRealtimeModality.TEXT))
                    .voice(voice) // Use the cloned custom voice
                    .enableTurnDetection(true)
                    .enableInputAudioTranscription(true)
                    .build()
            );
            System.out.println("Conversation started. Speak into the microphone (Ctrl+C to exit)...");
            AudioFormat format = new AudioFormat(16000, 16, 1, true, false);
            TargetDataLine mic = AudioSystem.getTargetDataLine(format);
            mic.open(format);
            mic.start();
            ByteBuffer buffer = ByteBuffer.allocate(3200);
            while (!shouldStop.get()) {
                int bytesRead = mic.read(buffer.array(), 0, buffer.capacity());
                if (bytesRead > 0) {
                    // Encode only the bytes actually read from the microphone
                    conversation.appendAudio(Base64.getEncoder().encodeToString(Arrays.copyOf(buffer.array(), bytesRead)));
                }
                Thread.sleep(20);
            }
            conversation.close(1000, "Normal exit");
            player.close();
            mic.close();
            System.out.println("\nConversation ended");
        } catch (NoApiKeyException e) {
            System.err.println("API KEY not found. Set the DASHSCOPE_API_KEY environment variable.");
        } catch (Exception e) {
            e.printStackTrace();
        }
        System.exit(0);
    }
}
API reference
Make sure you use the same account across different APIs.
Create a voice
Upload audio for cloning and create a custom voice.
- URL
  China (Beijing): POST https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
  International (Singapore): POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization
- Request headers

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| Authorization | string | Yes | Authentication token in the format Bearer <your_api_key>. Replace <your_api_key> with your actual API key. |
| Content-Type | string | Yes | Media type of the request body. Set to application/json. |
- Request body
  The following request body includes all parameters. Optional fields can be omitted as needed. Distinguish between the following parameters:
  Important
  - model: The voice cloning model. Set to qwen-voice-enrollment.
  - target_model: The omni model that drives the voice. This must match the model used in the subsequent real-time multimodal API call. Otherwise, synthesis fails.

  {
      "model": "qwen-voice-enrollment",
      "input": {
          "action": "create",
          "target_model": "qwen3.5-omni-plus-realtime",
          "preferred_name": "guanyu",
          "audio": {
              "data": "https://xxx.wav"
          },
          "text": "Optional. The transcript of the audio in audio.data.",
          "language": "Optional. The language of the audio in audio.data, such as zh."
      }
  }
- Request parameters

| Parameter | Type | Default | Required | Description |
| --- | --- | --- | --- | --- |
| model | string | - | Yes | The voice cloning model. Set to qwen-voice-enrollment. |
| action | string | - | Yes | The action type. Set to create. |
| target_model | string | - | Yes | The omni model that drives the voice: qwen3.5-omni-plus-realtime or qwen3.5-omni-flash-realtime. This must match the model used in the subsequent real-time multimodal API call. Otherwise, synthesis fails. |
| preferred_name | string | - | Yes | A human-readable name for the voice. Only digits, letters, and underscores are allowed. Maximum: 16 characters. Use a name related to the role or scenario. This keyword appears in the final voice name. For example, if the keyword is "guanyu", the resulting voice name is "qwen-omni-vc-guanyu-voice-20250812105009984-838b". |
| audio.data | string | - | Yes | The audio for cloning (follow the Recording guide when recording, and make sure the audio meets the Audio requirements). Submit the audio in one of two ways. (1) Data URI in the format data:<mediatype>;base64,<data>, where <mediatype> is the MIME type (WAV: audio/wav; MP3: audio/mpeg; M4A: audio/mp4) and <data> is the Base64-encoded string of the audio. Base64 encoding increases the file size, so keep the original file small enough that the encoded result stays under 10 MB. Example: data:audio/wav;base64,SUQzBAAAAAAAI1RTU0UAAAAPAAADTGF2ZjU4LjI5LjEwMAAAAAAAAAAAAAAA//PAxABQ/BXRbMPe4IQAhl9 (2) Audio URL (we recommend uploading your audio to OSS). The file size must not exceed 10 MB, and the URL must be publicly accessible without authentication. |
| text | string | - | No | The transcript that matches the audio in audio.data. When this parameter is provided, the server compares the audio against the text. If the difference is too large, an Audio.PreprocessError is returned. |
| language | string | - | No | The language of the audio in audio.data. Supported values: zh (Chinese), en (English), de (German), it (Italian), pt (Portuguese), es (Spanish), ja (Japanese), ko (Korean), fr (French), ru (Russian), th (Thai), id (Indonesian), ar (Arabic), cs (Czech), da (Danish), nl (Dutch), fi (Finnish), he (Hebrew), hi (Hindi), is (Icelandic), ms (Malay), no (Norwegian), fa (Persian), pl (Polish), sv (Swedish), tl (Tagalog), tr (Turkish), ur (Urdu), vi (Vietnamese). Chinese dialects: Dongbei, Shannxi, Sichuan, Henan, Changsha, Tianjin, Hangzhou, Liaoning, Shenyang, Anshan. If you use this parameter, set it to the actual language of the audio used for cloning. |
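The 10 MB limit applies to the encoded result, not the source file, and Base64 output is about 4/3 of the input. The encoded size can therefore be estimated (or the data URI built and checked) locally before submitting. A sketch; encoded_length and build_data_uri are hypothetical helpers, and the MIME types are the ones listed in the table above:

```python
import base64
import math

MIME_BY_EXT = {".wav": "audio/wav", ".mp3": "audio/mpeg", ".m4a": "audio/mp4"}

def encoded_length(raw_bytes: int) -> int:
    """Length of the Base64 string for raw_bytes of input (4 output chars per 3 input bytes)."""
    return 4 * math.ceil(raw_bytes / 3)

def build_data_uri(data: bytes, mime: str) -> str:
    """Build a data:<mediatype>;base64,<data> URI, enforcing the 10 MB limit on the encoded part."""
    if encoded_length(len(data)) > 10 * 1024 * 1024:
        raise ValueError("Base64-encoded audio would exceed 10 MB")
    return f"data:{mime};base64,{base64.b64encode(data).decode()}"
```

In practice this means a source file of roughly 7.5 MB is already at the limit once encoded.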
- Response parameters
  Key response parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| voice | string | The voice name. Use this value directly as the voice parameter in the real-time multimodal API. |
| target_model | string | The omni model that drives the voice: qwen3.5-omni-plus-realtime or qwen3.5-omni-flash-realtime. This must match the model used in the subsequent real-time multimodal API call. Otherwise, synthesis fails. |
| request_id | string | Request ID. |
| count | integer | The number of "create voice" operations billed for this request. When creating a voice, count is always 1. |
- Sample code
  Important
  - model: The voice cloning model. Set to qwen-voice-enrollment.
  - target_model: The omni model that drives the voice. This must match the model used in the subsequent real-time multimodal API call. Otherwise, synthesis fails.

cURL
If you have not set the API key as an environment variable, you must replace $DASHSCOPE_API_KEY in the example with your actual API key.

# ======= Important =======
# Singapore region URL. For the Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
# API keys differ between regions. Get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
# === Remove these comments before running ===
curl --location --request POST 'https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "qwen-voice-enrollment",
    "input": {
        "action": "create",
        "target_model": "qwen3.5-omni-plus-realtime",
        "preferred_name": "guanyu",
        "audio": {
            "data": "https://xxx.wav"
        }
    }
}'

Python

import os
import base64
import pathlib
import requests

# API keys differ between regions. Get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
# If you haven't set an environment variable, replace the following line with: api_key = "sk-xxx"
api_key = os.getenv("DASHSCOPE_API_KEY")
# Singapore region URL. For the Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"
file_path = pathlib.Path("voice.mp3")  # Replace with your own audio file
data_uri = "data:audio/mpeg;base64," + base64.b64encode(file_path.read_bytes()).decode()
payload = {
    "model": "qwen-voice-enrollment",  # Do not modify
    "input": {
        "action": "create",
        "target_model": "qwen3.5-omni-plus-realtime",
        "preferred_name": "guanyu",
        "audio": {"data": data_uri}
    }
}
headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json"
}
response = requests.post(url, json=payload, headers=headers)
print("HTTP status:", response.status_code)
if response.status_code == 200:
    print("Voice:", response.json()["output"]["voice"])
else:
    print("Request failed:", response.text)

Java
See the createVoice method in the end-to-end example above, which sends this same create request over HttpURLConnection; reuse it, replacing the constants as needed.
List voices
Query your created voices with pagination.
- URL
  China (Beijing): POST https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
  International (Singapore): POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization
- Request headers

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| Authorization | string | Yes | Authentication token in the format Bearer <your_api_key>. Replace <your_api_key> with your actual API key. |
| Content-Type | string | Yes | Media type of the request body. Set to application/json. |
- Request parameters

| Parameter | Type | Default | Required | Description |
| --- | --- | --- | --- | --- |
| model | string | - | Yes | The voice cloning model. Set to qwen-voice-enrollment. |
| action | string | - | Yes | The action type. Set to list. |
| page_index | integer | 0 | No | Page number index. Valid values: 0 to 1,000,000. |
| page_size | integer | 10 | No | Number of items per page. Valid values: 0 to 1,000,000. |
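To walk all created voices, page through list calls until a page comes back short or empty. A sketch where fetch_page stands in for any callable that sends the list request above and returns output.voice_list (iter_voices and fetch_page are hypothetical helper names; page_index and page_size match the parameters in the table):

```python
def iter_voices(fetch_page, page_size=10):
    """Yield every voice item by paging action=list until a page is exhausted."""
    page_index = 0
    while True:
        items = fetch_page(page_index=page_index, page_size=page_size)
        if not items:
            return
        yield from items
        if len(items) < page_size:  # short page: nothing left
            return
        page_index += 1
```

Pass in a function that wraps the HTTP request shown in the sample code below and extracts output.voice_list from the JSON response.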
- Response parameters
  Key response parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| voice | string | The voice name. Use this value directly as the voice parameter in the real-time multimodal API. |
| gmt_create | string | The time when the voice was created. |
| target_model | string | The omni model that drives the voice: qwen3.5-omni-plus-realtime or qwen3.5-omni-flash-realtime. This must match the model used in the subsequent real-time multimodal API call. Otherwise, synthesis fails. |
| request_id | string | Request ID. |
| count | integer | The number of "create voice" operations billed for this request. Listing voices is free, so count is always 0. |
- Sample code
  Important
  model: The voice cloning model. The value is fixed as qwen-voice-enrollment. Do not modify this value.

cURL
If you have not set the API key as an environment variable, you must replace $DASHSCOPE_API_KEY in the example with your actual API key.

# ======= Important =======
# Singapore region URL. For the Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
# API keys differ between regions. Get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
# === Remove these comments before running ===
curl --location --request POST 'https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--data '{
    "model": "qwen-voice-enrollment",
    "input": {
        "action": "list",
        "page_size": 10,
        "page_index": 0
    }
}'

Python

import os
import requests

# API keys differ between regions. Get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
# If you haven't set an environment variable, replace the following line with: api_key = "sk-xxx"
api_key = os.getenv("DASHSCOPE_API_KEY")
# Singapore region URL. For the Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"
payload = {
    "model": "qwen-voice-enrollment",  # Do not modify
    "input": {
        "action": "list",
        "page_size": 10,
        "page_index": 0
    }
}
headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json"
}
response = requests.post(url, json=payload, headers=headers)
print("HTTP status:", response.status_code)
if response.status_code == 200:
    voice_list = response.json()["output"]["voice_list"]
    print("Voice list:")
    for item in voice_list:
        print(f"- Voice: {item['voice']} Created: {item['gmt_create']} Model: {item['target_model']}")
else:
    print("Request failed:", response.text)

Java

import com.google.gson.Gson;
import com.google.gson.JsonArray;
import com.google.gson.JsonObject;
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class Main {
    public static void main(String[] args) {
        // API keys differ between regions. Get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
        // If you haven't set an environment variable, replace the following line with: String apiKey = "sk-xxx"
        String apiKey = System.getenv("DASHSCOPE_API_KEY");
        // Singapore region URL. For the Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
        String apiUrl = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization";
        // JSON request body
        String jsonPayload = "{"
                + "\"model\": \"qwen-voice-enrollment\","  // Do not modify
                + "\"input\": {"
                + "\"action\": \"list\","
                + "\"page_size\": 10,"
                + "\"page_index\": 0"
                + "}"
                + "}";
        try {
            HttpURLConnection con = (HttpURLConnection) new URL(apiUrl).openConnection();
            con.setRequestMethod("POST");
            con.setRequestProperty("Authorization", "Bearer " + apiKey);
            con.setRequestProperty("Content-Type", "application/json");
            con.setDoOutput(true);
            try (OutputStream os = con.getOutputStream()) {
                os.write(jsonPayload.getBytes("UTF-8"));
            }
            int status = con.getResponseCode();
            BufferedReader br = new BufferedReader(new InputStreamReader(
                    status >= 200 && status < 300 ? con.getInputStream() : con.getErrorStream(), "UTF-8"));
            StringBuilder response = new StringBuilder();
            String line;
            while ((line = br.readLine()) != null) {
                response.append(line);
            }
            br.close();
            System.out.println("HTTP status: " + status);
            System.out.println("Response JSON: " + response);
            if (status == 200) {
                JsonObject jsonObj = new Gson().fromJson(response.toString(), JsonObject.class);
                JsonArray voiceList = jsonObj.getAsJsonObject("output").getAsJsonArray("voice_list");
                System.out.println("\nVoice list:");
                for (int i = 0; i < voiceList.size(); i++) {
                    JsonObject voiceItem = voiceList.get(i).getAsJsonObject();
                    System.out.printf("- Voice: %s Created: %s Model: %s%n",
                            voiceItem.get("voice").getAsString(),
                            voiceItem.get("gmt_create").getAsString(),
                            voiceItem.get("target_model").getAsString());
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Delete a voice
Delete a specific voice and release the corresponding quota.
- URL
  China (Beijing): POST https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
  International (Singapore): POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization
- Request headers

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| Authorization | string | Yes | Authentication token in the format Bearer <your_api_key>. Replace <your_api_key> with your actual API key. |
| Content-Type | string | Yes | Media type of the request body. Set to application/json. |
- Request body
  The following request body includes all parameters. Optional fields can be omitted as needed.
  Important
  model: The voice cloning model. The value is fixed as qwen-voice-enrollment. Do not modify this value.

  {
      "model": "qwen-voice-enrollment",
      "input": {
          "action": "delete",
          "voice": "yourVoice"
      }
  }
- Request parameters

| Parameter | Type | Default | Required | Description |
| --- | --- | --- | --- | --- |
| model | string | - | Yes | The voice cloning model. Set to qwen-voice-enrollment. |
| action | string | - | Yes | The action type. Set to delete. |
| voice | string | - | Yes | The voice to delete. |
- Response parameters
  Key response parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| request_id | string | Request ID. |
| count | integer | The number of "create voice" operations billed for this request. Deleting a voice is free, so count is always 0. |
Sample code
Importantmodel: The voice cloning model. The value is fixed asqwen-voice-enrollment. Do not modify this value.cURL
If you have not set the API key as an environment variable, you must replace
$DASHSCOPE_API_KEYin the example with your actual API key.# ======= Important ======= # Singapore region URL. For the Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization # API keys differ between regions. Get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key # === Remove these comments before running === curl --location --request POST 'https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization' \ --header 'Authorization: Bearer $DASHSCOPE_API_KEY' \ --header 'Content-Type: application/json' \ --data '{ "model": "qwen-voice-enrollment", "input": { "action": "delete", "voice": "yourVoice" } }'Python
import os import requests # API keys differ between regions. Get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key # If you haven't set an environment variable, replace the following line with: api_key = "sk-xxx" api_key = os.getenv("DASHSCOPE_API_KEY") # Singapore region URL. For the Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization" voice_to_delete = "yourVoice" # Replace with the actual voice name payload = { "model": "qwen-voice-enrollment", # Do not modify "input": { "action": "delete", "voice": voice_to_delete } } headers = { "Authorization": f"Bearer {api_key}", "Content-Type": "application/json" } response = requests.post(url, json=payload, headers=headers) print("HTTP status:", response.status_code) if response.status_code == 200: data = response.json() request_id = data["request_id"] print(f"Deleted successfully") print(f"Request ID: {request_id}") else: print("Request failed:", response.text)Java
```java
import com.google.gson.Gson;
import com.google.gson.JsonObject;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class Main {
    public static void main(String[] args) {
        // API keys differ between regions. Get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
        // If you haven't set an environment variable, replace the following line with: String apiKey = "sk-xxx";
        String apiKey = System.getenv("DASHSCOPE_API_KEY");

        // Singapore region URL. For the Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
        String apiUrl = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization";

        String voiceToDelete = "yourVoice"; // Replace with the actual voice name

        // Build JSON request body
        String jsonPayload = "{"
                + "\"model\": \"qwen-voice-enrollment\"," // Do not modify
                + "\"input\": {"
                + "\"action\": \"delete\","
                + "\"voice\": \"" + voiceToDelete + "\""
                + "}"
                + "}";

        try {
            // Create POST connection
            HttpURLConnection con = (HttpURLConnection) new URL(apiUrl).openConnection();
            con.setRequestMethod("POST");
            con.setRequestProperty("Authorization", "Bearer " + apiKey);
            con.setRequestProperty("Content-Type", "application/json");
            con.setDoOutput(true);

            // Send request body
            try (OutputStream os = con.getOutputStream()) {
                os.write(jsonPayload.getBytes("UTF-8"));
            }

            int status = con.getResponseCode();
            BufferedReader br = new BufferedReader(new InputStreamReader(
                    status >= 200 && status < 300 ? con.getInputStream() : con.getErrorStream(), "UTF-8"));
            StringBuilder response = new StringBuilder();
            String line;
            while ((line = br.readLine()) != null) {
                response.append(line);
            }
            br.close();

            System.out.println("HTTP status: " + status);
            System.out.println("Response JSON: " + response.toString());

            if (status == 200) {
                Gson gson = new Gson();
                JsonObject jsonObj = gson.fromJson(response.toString(), JsonObject.class);
                String requestId = jsonObj.get("request_id").getAsString();
                System.out.println("Deleted successfully");
                System.out.println("Request ID: " + requestId);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
```
Real-time conversation
To use a cloned voice in a real-time conversation, see Quick start: from cloning to real-time conversation.
Voice quota and auto-cleanup
Total limit: 1,000 voices per account
The current API doesn't provide a dedicated voice count query. To check how many voices you have, call the voice list API and count the returned voices.
Auto-cleanup: If a voice hasn't been used in any model invocation request for one year, the system automatically deletes it.
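Because there is no dedicated count query, one workaround is to page through the voice list and count the results. The following is a minimal stdlib-only sketch; the `list` action and the `page_index`/`page_size` parameter names are assumptions here, so confirm them against the list-voices API described earlier in this document:

```python
import json
import urllib.request

# Singapore region URL. For the Beijing region, use:
# https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
URL = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"


def build_list_payload(page_index, page_size=10):
    # "qwen-voice-enrollment" is fixed; the "list" action and the
    # page_index/page_size names are assumptions to verify against the
    # list-voices section of this document.
    return {
        "model": "qwen-voice-enrollment",
        "input": {
            "action": "list",
            "page_index": page_index,
            "page_size": page_size,
        },
    }


def count_voices(api_key, page_size=10):
    """Page through the voice list and return the total number of voices."""
    total, page = 0, 0
    while True:
        req = urllib.request.Request(
            URL,
            data=json.dumps(build_list_payload(page, page_size)).encode("utf-8"),
            headers={
                "Authorization": f"Bearer {api_key}",
                "Content-Type": "application/json",
            },
            method="POST",
        )
        with urllib.request.urlopen(req) as resp:
            voices = json.load(resp).get("output", {}).get("voice_list", [])
        total += len(voices)
        if len(voices) < page_size:  # a short page means the end of the list
            return total
        page += 1


# Usage (makes real API calls; requires a valid key):
#   print(count_voices("your-api-key"))
```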
Billing
Voice cloning and model invocation are billed separately:
- Voice cloning: billed at $0.01/voice. Failed creations are not billed.
  Note
  Free quota (available only for the China site Beijing region and the International site Singapore region):
  - Within 90 days of activating Alibaba Cloud Model Studio, you get 1,000 free voice creations.
  - Failed creations don't consume the free quota.
  - Deleting a voice doesn't restore the free quota.
  - After the free quota is used up or the 90-day period expires, voice creation is billed at $0.01/voice.
- Real-time conversation with a cloned voice: billed by token usage for model invocation. For details, see Model invocation pricing.
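The voice cloning pricing rules above can be turned into a simple worked example. The helper below is purely illustrative (the function name and defaults are not part of the API): 1,200 successful creations against the full 1,000-voice free quota leave 200 billable creations at $0.01 each.

```python
def voice_cloning_cost(successful_creations, free_quota_remaining=1000,
                       price_per_voice=0.01):
    """Estimated cost in USD for voice creations beyond the free quota.

    Failed creations are excluded entirely: they are neither billed nor
    counted against the free quota.
    """
    billable = max(0, successful_creations - free_quota_remaining)
    return billable * price_per_voice


# 1,200 successful creations, full free quota: 200 billable at $0.01 each
print(voice_cloning_cost(1200))
```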
Copyright and legal compliance
You are responsible for the ownership and legal use of the voice you provide. Read the Service Agreement.
Recording guide
Recording equipment
Use a microphone with noise cancellation, or record with a phone at close range in a quiet environment to keep the audio clean.
Recording environment
Location
- Record in a small enclosed space of 10 square meters or less.
- Choose a room with sound-absorbing materials such as acoustic foam, carpets, or curtains.
- Avoid large open halls, conference rooms, or classrooms where reverberation is high.
Noise control
- Outdoor noise: Close doors and windows to block traffic, construction, and other external sounds.
- Indoor noise: Turn off air conditioners, fans, and fluorescent light ballasts.
- Record ambient sound with your phone and play it back at high volume to identify hidden noise sources.
Reverberation control
- Reverberation makes audio sound muffled and reduces clarity.
- Reduce reflections from smooth surfaces: close curtains, open wardrobe doors, and cover desks and shelves with clothing or blankets.
- Use irregular objects such as bookshelves and upholstered furniture to create diffuse reflections.
Script preparation
- Match the script to the target use case. For example, use a customer service dialog style for a customer service scenario.
- Make sure the script doesn't contain sensitive or illegal content (such as political, pornographic, or violent material); such content causes cloning to fail.
- Avoid short phrases (such as "hello" or "yes"). Use complete sentences.
- Maintain semantic coherence and avoid frequent pauses when reading. At least 3 consecutive seconds without interruption is recommended.
- You can convey the target emotion (such as friendly or serious), but avoid overly dramatic or theatrical delivery. Keep the tone natural.
Recommended steps
Using a typical bedroom as an example:
- Close doors and windows to block external noise.
- Turn off air conditioners, fans, and other appliances.
- Close curtains to reduce glass reflections.
- Cover the desk surface with clothing or a blanket to reduce desktop reflections.
- Familiarize yourself with the script, set the character's tone, and deliver naturally.
- Maintain a distance of about 10 cm from the recording device to avoid plosive distortion or a weak signal.
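Before uploading, you can sanity-check a WAV recording against the audio requirements table at the top of this document. The sketch below is illustrative and stdlib-only (the function name and messages are ours, not part of the API); MP3 or M4A files would need a separate decoding library.

```python
import os
import wave


def check_wav(path):
    """Return a list of problems; empty if the WAV file meets the documented
    requirements (16-bit mono PCM, >= 24 kHz, < 10 MB, 10-20 s recommended,
    60 s maximum)."""
    problems = []
    if os.path.getsize(path) >= 10 * 1024 * 1024:
        problems.append("file is 10 MB or larger")
    with wave.open(path, "rb") as w:
        if w.getnchannels() != 1:
            problems.append("audio is not mono")
        if w.getsampwidth() != 2:  # 2 bytes per sample = 16-bit
            problems.append("audio is not 16-bit PCM")
        if w.getframerate() < 24000:
            problems.append("sample rate is below 24 kHz")
        duration = w.getnframes() / w.getframerate()
        if duration > 60:
            problems.append("audio is longer than the 60-second maximum")
        elif not 10 <= duration <= 20:
            problems.append("duration outside the recommended 10-20 s range")
    return problems


if __name__ == "__main__" and os.path.exists("recording.wav"):
    for msg in check_wav("recording.wav"):  # replace with your file
        print("WARNING:", msg)
```

Note that this catches only the format requirements; content requirements (continuous clear speech, no background music or other voices) still need a listening check.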
Error messages
If you encounter errors, see Error messages for troubleshooting.