
Alibaba Cloud Model Studio: CosyVoice voice cloning API

Last Updated: Feb 11, 2026

CosyVoice voice cloning creates a custom voice from a 10- to 20-second audio sample. This document covers voice cloning parameters and API details. For speech synthesis, see Real-time speech synthesis - CosyVoice.

User guide: For model introductions and selection recommendations, see Real-time speech synthesis - CosyVoice.

Important

This document covers CosyVoice voice cloning API. For Qwen models, see Voice cloning (Qwen).

Audio requirements

High-quality input audio is essential for excellent cloning results.

Supported formats: WAV (16-bit), MP3, M4A

Audio duration: Recommended 10 to 20 seconds; maximum 60 seconds.

File size: ≤ 10 MB

Sample rate: ≥ 16 kHz

Sound channel: Mono or stereo. For stereo audio, only the first channel is processed; make sure that channel contains a valid human voice.

Content: The audio must contain at least 5 seconds of continuous, clear reading without background sound, and any remaining pauses must be short (≤ 2 seconds). The entire segment must be free of background music, noise, and other voices. Use normal speaking audio; do not upload songs or singing, or the cloned voice may be inaccurate or unusable.

Language: Varies with the speech synthesis model that drives the voice (specified by the target_model/targetModel parameter):

  • cosyvoice-v2: Chinese (Mandarin), English

  • cosyvoice-v3-flash, cosyvoice-v3-plus: Chinese (Mandarin, Cantonese, Northeastern, Gansu, Guizhou, Henan, Hubei, Jiangxi, Minnan, Ningxia, Shanxi, Shaanxi, Shandong, Shanghainese, Sichuan, Tianjin, Yunnan), English, French, German, Japanese, Korean, Russian

Voice cloning only supports the languages listed above (Mandarin Chinese and listed dialects, English, French, German, Japanese, Korean, and Russian). Other languages such as Spanish and Italian are not supported.
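The duration, size, and sample-rate limits above can be checked locally before upload. The following is a minimal sketch for WAV input using Python's standard wave module; the function name and constants are illustrative, not part of the API, and content-level requirements (background noise, other voices) still need a separate check:

```python
import os
import wave

MAX_BYTES = 10 * 1024 * 1024   # file size <= 10 MB
MIN_RATE = 16_000              # sample rate >= 16 kHz
MIN_SECONDS, MAX_SECONDS = 10, 60

def check_wav_sample(path):
    """Return a list of requirement violations for a local WAV file (empty list = looks OK)."""
    problems = []
    if os.path.getsize(path) > MAX_BYTES:
        problems.append("file larger than 10 MB")
    with wave.open(path, "rb") as w:
        if w.getframerate() < MIN_RATE:
            problems.append("sample rate below 16 kHz")
        if w.getsampwidth() != 2:
            problems.append("not 16-bit PCM")
        seconds = w.getnframes() / w.getframerate()
        if not MIN_SECONDS <= seconds <= MAX_SECONDS:
            problems.append(f"duration {seconds:.1f}s outside the 10-60s window")
    return problems
```

Run this on the sample before uploading; an empty list means the file passes the mechanical checks.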

Getting started: From cloning to synthesis


1. Workflow

Voice cloning and speech synthesis are two closely related but separate steps following a "create first, then use" flow:

  1. Create a voice

    Call the Create a voice API and upload an audio segment. The system analyzes the audio and creates a unique cloned voice. Specify target_model / targetModel to declare the target speech synthesis model for the voice.

    If you already have a created voice (check by calling the Query the voice list API), skip this step.

  2. Use the voice for speech synthesis.

    After creating a voice using the Create a voice API, the system returns a voice_id/voiceId:

    • This voice_id/voiceId can be used directly as the voice parameter in the speech synthesis API or SDKs for subsequent text-to-speech.

    • Supports multiple call methods: non-streaming, unidirectional streaming, and bidirectional streaming synthesis.

    • The speech synthesis model must match the target_model/targetModel specified when creating the voice. Otherwise, synthesis fails.

2. Model configuration and preparations

Select a suitable model and complete the preparations.

Model configuration

Important

In international deployment mode (Singapore region), cosyvoice-v3-plus and cosyvoice-v3-flash do not support voice cloning. Select other models.

When cloning a voice, specify these two models:

  • Voice cloning model: voice-enrollment

  • Speech synthesis model to drive the voice:

    For best results, use cosyvoice-v3-plus if resources and budget allow.

    • cosyvoice-v3-plus: The best sound quality and expressiveness when budget allows.

    • cosyvoice-v3-flash: Balances performance and cost for high overall value.

    • cosyvoice-v2: For compatibility with older versions or low-requirement scenarios.

Preparations

  1. Get an API key: For security, set the API key as an environment variable instead of hardcoding it.

  2. Install the SDK: Make sure you have installed the latest version of the DashScope SDK.

  3. Prepare an audio URL: Upload an audio file that meets the audio requirements to a publicly accessible location, such as Object Storage Service (OSS), and confirm that the URL is publicly accessible.
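The preparation steps can be sketched as shell commands. The key and URL below are placeholders, and the pip command assumes the Python SDK:

```shell
# 1. Set the API key as an environment variable (replace with your own key).
export DASHSCOPE_API_KEY="sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

# 2. Install or upgrade the DashScope SDK (Python shown here).
pip install -U dashscope

# 3. Confirm the audio URL is publicly accessible (expect HTTP 200).
curl -I "https://your-audio-file-url"
```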

3. End-to-end example: From cloning to synthesis

The following example shows how to use a custom voice generated by voice cloning in speech synthesis to achieve output highly similar to the original voice.

  • Key principle: When cloning a voice, the target_model (the speech synthesis model that drives the voice) must match the speech synthesis model specified in subsequent speech synthesis API calls. Otherwise, synthesis will fail.

  • Note: Replace AUDIO_URL in the example with your actual audio URL.

import os
import time
import dashscope
from dashscope.audio.tts_v2 import VoiceEnrollmentService, SpeechSynthesizer

# 1. Prepare the environment
# Set the API key as an environment variable.
# export DASHSCOPE_API_KEY="sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
dashscope.api_key = os.getenv("DASHSCOPE_API_KEY")
if not dashscope.api_key:
    raise ValueError("DASHSCOPE_API_KEY environment variable not set.")

# 2. Define cloning parameters
TARGET_MODEL = "cosyvoice-v3-plus"
# Give the voice a recognizable prefix.
VOICE_PREFIX = "myvoice"  # Only digits and lowercase letters are allowed, up to 10 characters.
# Publicly accessible audio URL
AUDIO_URL = "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/cosyvoice/cosyvoice-zeroshot-sample.wav"  # Example URL; replace with your own.

# 3. Create the voice (asynchronous task)
print("--- Step 1: Creating voice enrollment ---")
service = VoiceEnrollmentService()
try:
    voice_id = service.create_voice(
        target_model=TARGET_MODEL,
        prefix=VOICE_PREFIX,
        url=AUDIO_URL
    )
    print(f"Voice enrollment submitted successfully. Request ID: {service.get_last_request_id()}")
    print(f"Generated Voice ID: {voice_id}")
except Exception as e:
    print(f"Error during voice creation: {e}")
    raise e
# 4. Poll for voice status
print("\n--- Step 2: Polling for voice status ---")
max_attempts = 30
poll_interval = 10  # seconds
for attempt in range(max_attempts):
    try:
        voice_info = service.query_voice(voice_id=voice_id)
    except Exception as e:
        # Transient query errors: log and retry.
        print(f"Error during status polling: {e}")
        time.sleep(poll_interval)
        continue
    status = voice_info.get("status")
    print(f"Attempt {attempt + 1}/{max_attempts}: Voice status is '{status}'")
    if status == "OK":
        print("Voice is ready for synthesis.")
        break
    if status == "UNDEPLOYED":
        # Raised outside the try block so the error is not swallowed by the retry logic.
        raise RuntimeError(f"Voice processing failed with status: {status}. Check audio quality or contact support.")
    # For intermediate statuses such as "DEPLOYING", continue to wait.
    time.sleep(poll_interval)
else:
    raise RuntimeError("Polling timed out. The voice is not ready after several attempts.")

# 5. Use the cloned voice for speech synthesis
print("\n--- Step 3: Synthesizing speech with the new voice ---")
try:
    synthesizer = SpeechSynthesizer(model=TARGET_MODEL, voice=voice_id)
    text_to_synthesize = "Congratulations, you have successfully cloned and synthesized your own voice!"
    
    # The call() method returns binary audio data.
    audio_data = synthesizer.call(text_to_synthesize)
    print(f"Speech synthesis successful. Request ID: {synthesizer.get_last_request_id()}")

    # 6. Save the audio file
    output_file = "my_custom_voice_output.mp3"
    with open(output_file, "wb") as f:
        f.write(audio_data)
    print(f"Audio saved to {output_file}")

except Exception as e:
    print(f"Error during speech synthesis: {e}")

API reference

Use the same account for all voice cloning and synthesis operations, regardless of which API you call.

Create a voice

Uploads an audio file for cloning to create a custom voice.

Python SDK

API description

def create_voice(self, target_model: str, prefix: str, url: str, language_hints: List[str] = None) -> str:
    '''
    Creates a new custom voice.
    param: target_model The speech synthesis model for the voice. This must match the model used in subsequent speech synthesis API calls, otherwise synthesis will fail. Recommended models are cosyvoice-v3-flash or cosyvoice-v3-plus.
    param: prefix A recognizable name for the voice (only digits, uppercase and lowercase letters, and underscores are allowed, up to 10 characters). We recommend using an identifier related to the role or scenario. This keyword appears in the cloned voice name. Generated voice name format: model_name-prefix-unique_identifier, such as cosyvoice-v3-plus-myvoice-xxxxxxxx.
    param: url The URL of the audio file for voice cloning. The URL must be publicly accessible.
    param: language_hints Specifies the language of the sample audio for extracting voice features. This parameter applies only to the cosyvoice-v3-flash and cosyvoice-v3-plus models.
            It helps the model identify the language of the sample audio to extract features more accurately and improve cloning results.
            If the language hint does not match the actual audio language (for example, setting en for Chinese audio), the system ignores the hint and automatically detects the language from audio content.
            Valid values: zh (default), en, fr, de, ja, ko, ru. This parameter is an array, but the current version processes only the first element. Pass only one value.
    return: voice_id The voice ID. It can be used directly as the voice parameter in the speech synthesis API.
    '''
Important
  • target_model: The speech synthesis model for the voice. This must match the model used in subsequent speech synthesis API calls, otherwise synthesis will fail.

  • language_hints: Specifies the language of the sample audio for extracting voice features. This parameter applies only to the cosyvoice-v3-flash and cosyvoice-v3-plus models.

    Description: This parameter helps the model identify the language of the sample audio (original reference audio) to extract voice features more accurately and improve cloning results. If the language hint does not match the actual audio language (for example, setting en for Chinese audio), the system ignores the hint and automatically detects the language from audio content.

    Valid values:

    • zh: Chinese (default)

    • en: English

    • fr: French

    • de: German

    • ja: Japanese

    • ko: Korean

    • ru: Russian

    For Chinese dialects (such as Northeastern or Cantonese), set language_hints to zh. Control the dialect style in subsequent speech synthesis calls through text content or parameters such as instruct.

    Note: This parameter is an array, but the current version processes only the first element. Pass only one value.
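As a quick local sanity check, the prefix rule and the generated voice name format described above can be expressed in a few lines. This is a sketch: PREFIX_RE and split_voice_id are illustrative helpers, and the server-side validation may be stricter:

```python
import re

# Prefix rule from the description above: digits, upper/lowercase letters,
# and underscores, up to 10 characters.
PREFIX_RE = re.compile(r"^[A-Za-z0-9_]{1,10}$")

def split_voice_id(voice_id: str, prefix: str):
    """Split model_name-prefix-unique_identifier into its parts, given a known prefix."""
    model, sep, unique_id = voice_id.partition(f"-{prefix}-")
    if not sep:
        raise ValueError(f"voice_id does not embed prefix {prefix!r}")
    return model, prefix, unique_id
```

For example, split_voice_id("cosyvoice-v3-plus-myvoice-xxxxxxxx", "myvoice") separates the target model name from the unique identifier.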

Request example

from dashscope.audio.tts_v2 import VoiceEnrollmentService

service = VoiceEnrollmentService()

# Avoid frequent calls. Each call creates a new voice. You cannot create more voices after you reach the quota limit.
voice_id = service.create_voice(
    target_model='cosyvoice-v3-plus',
    prefix='myvoice',
    url='https://your-audio-file-url',
    language_hints=['zh']
)

print(f"Request ID: {service.get_last_request_id()}")
print(f"Voice ID: {voice_id}")

Java SDK

API description

/**
 * Creates a new custom voice.
 *
 * @param targetModel The speech synthesis model for the voice. This must match the model used in subsequent speech synthesis API calls, otherwise synthesis will fail. Recommended models are cosyvoice-v3-flash or cosyvoice-v3-plus.
 * @param prefix A recognizable name for the voice (only digits, uppercase and lowercase letters, and underscores are allowed, up to 10 characters). We recommend using an identifier related to the role or scenario. This keyword appears in the cloned voice name. Generated voice name format: model_name-prefix-unique_identifier, such as cosyvoice-v3-plus-myvoice-xxxxxxxx.
 * @param url The URL of the audio file for voice cloning. The URL must be publicly accessible.
 * @param customParam Custom parameters. You can specify languageHints here.
 *                  languageHints specifies the language of the sample audio for extracting voice features. This parameter applies only to the cosyvoice-v3-flash and cosyvoice-v3-plus models.
 *                  This parameter helps the model identify the language of the sample audio (original reference audio) to extract voice features more accurately and improve cloning results.
 *                  If the language hint does not match the actual audio language (for example, setting en for Chinese audio), the system ignores the hint and automatically detects the language from audio content.
 *                  Valid values: zh (default), en, fr, de, ja, ko, ru. This parameter is an array, but the current version processes only the first element. Pass only one value.
 * @return Voice The newly created voice. You can get the voice ID using the getVoiceId method of the Voice object. The voice ID can be used directly as the voice parameter in the speech synthesis API.
 * @throws NoApiKeyException if the API key is empty.
 * @throws InputRequiredException if a required parameter is empty.
 */
public Voice createVoice(String targetModel, String prefix, String url, VoiceEnrollmentParam customParam) throws NoApiKeyException, InputRequiredException
Important
  • targetModel: The speech synthesis model for the voice. This must match the model used in subsequent speech synthesis API calls, otherwise synthesis will fail.

  • languageHints: Specifies the language of the sample audio for extracting voice features. This parameter applies only to the cosyvoice-v3-flash and cosyvoice-v3-plus models.

    Description: This parameter helps the model identify the language of the sample audio (original reference audio) to extract voice features more accurately and improve cloning results. If the language hint does not match the actual audio language (for example, setting en for Chinese audio), the system ignores the hint and automatically detects the language from audio content.

    Valid values:

    • zh: Chinese (default)

    • en: English

    • fr: French

    • de: German

    • ja: Japanese

    • ko: Korean

    • ru: Russian

    For Chinese dialects (such as Northeastern or Cantonese), set languageHints to zh. Control the dialect style in subsequent speech synthesis calls through text content or parameters such as instruct.

    Note: This parameter is an array, but the current version processes only the first element. Pass only one value.

Request example

import com.alibaba.dashscope.audio.ttsv2.enrollment.Voice;
import com.alibaba.dashscope.audio.ttsv2.enrollment.VoiceEnrollmentParam;
import com.alibaba.dashscope.audio.ttsv2.enrollment.VoiceEnrollmentService;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.Collections;

public class Main {
    private static final Logger logger = LoggerFactory.getLogger(Main.class);

    public static void main(String[] args) {
        String apiKey = System.getenv("DASHSCOPE_API_KEY");
        String targetModel = "cosyvoice-v3-plus";
        String prefix = "myvoice";
        String fileUrl = "https://your-audio-file-url";
        String cloneModelName = "voice-enrollment";

        try {
            VoiceEnrollmentService service = new VoiceEnrollmentService(apiKey);
            Voice myVoice = service.createVoice(
                    targetModel,
                    prefix,
                    fileUrl,
                    VoiceEnrollmentParam.builder()
                    .model(cloneModelName)
                    .languageHints(Collections.singletonList("zh")).build());

            logger.info("Voice creation submitted. Request ID: {}", service.getLastRequestId());
            logger.info("Generated Voice ID: {}", myVoice.getVoiceId());
        } catch (Exception e) {
            logger.error("Failed to create voice", e);
        }
    }
}

RESTful API

Basic information

URL

The Chinese mainland:

https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization

International:

https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization

Request method

POST

Request header

Authorization: Bearer {api-key} // Replace with your API key
Content-Type: application/json

Message body

The message body contains all request parameters. Optional fields can be omitted as needed:

Important
  • model: The voice cloning model. Set to voice-enrollment.

  • target_model: The speech synthesis model for the voice. This must match the model used in subsequent speech synthesis API calls, otherwise synthesis will fail.

  • language_hints: Specifies the language of the sample audio for extracting voice features. This parameter applies only to the cosyvoice-v3-flash and cosyvoice-v3-plus models.

    Description: This parameter helps the model identify the language of the sample audio (original reference audio) to extract voice features more accurately and improve cloning results. If the language hint does not match the actual audio language (for example, setting en for Chinese audio), the system ignores the hint and automatically detects the language from audio content.

    Valid values:

    • zh: Chinese (default)

    • en: English

    • fr: French

    • de: German

    • ja: Japanese

    • ko: Korean

    • ru: Russian

    For Chinese dialects (such as Northeastern or Cantonese), set language_hints to zh. Control the dialect style in subsequent speech synthesis calls through text content or parameters such as instruct.

    Note: This parameter is an array, but the current version processes only the first element. Pass only one value.

{
    "model": "voice-enrollment",
    "input": {
        "action": "create_voice",
        "target_model": "cosyvoice-v3-plus",
        "prefix": "myvoice",
        "url": "https://yourAudioFileUrl",
        "language_hints": ["zh"]
    }
}

Request parameters

Request example:

curl -X POST https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "voice-enrollment",
    "input": {
        "action": "create_voice",
        "target_model": "cosyvoice-v3-plus",
        "prefix": "myvoice",
        "url": "https://yourAudioFileUrl",
        "language_hints": ["zh"]
    }
}'

  • model (string, required): The voice cloning model. Set to voice-enrollment.

  • action (string, required): The operation type. Set to create_voice.

  • target_model (string, required): A speech synthesis model that controls voice timbre. We recommend cosyvoice-v3-flash or cosyvoice-v3-plus. This must match the model used in subsequent speech synthesis API calls; otherwise, synthesis fails.

  • prefix (string, required): A recognizable name for the voice (only digits, uppercase and lowercase letters, and underscores are allowed, up to 10 characters). We recommend an identifier related to the role or scenario. This keyword appears in the cloned voice name. Generated voice name format: model_name-prefix-unique_identifier, such as cosyvoice-v3-plus-myvoice-xxxxxxxx.

  • url (string, required): The URL of the audio file for voice cloning. Must be publicly accessible.

  • language_hints (array[string], optional, default ["zh"]): Specifies the language of the sample audio for extracting voice features. This parameter applies only to the cosyvoice-v3-flash and cosyvoice-v3-plus models. It helps the model identify the language of the sample audio (original reference audio) to extract voice features more accurately and improve cloning results. If the language hint does not match the actual audio language (for example, setting en for Chinese audio), the system ignores the hint and automatically detects the language from the audio content.

    Valid values: zh (Chinese, default), en (English), fr (French), de (German), ja (Japanese), ko (Korean), ru (Russian).

    For Chinese dialects (such as Northeastern or Cantonese), set language_hints to zh. Control the dialect style in subsequent speech synthesis calls through text content or parameters such as instruct.

    Note: This parameter is an array, but the current version processes only the first element. Pass only one value.
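The same request can be issued from Python without the SDK. Below is a sketch using only the standard library; build_create_voice_payload and create_voice_rest are illustrative helper names, and the endpoint is the Chinese mainland URL from above:

```python
import json
import os
import urllib.request

ENDPOINT = "https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization"

def build_create_voice_payload(target_model, prefix, url, language_hints=("zh",)):
    """Assemble the create_voice message body documented above."""
    return {
        "model": "voice-enrollment",
        "input": {
            "action": "create_voice",
            "target_model": target_model,
            "prefix": prefix,
            "url": url,
            # Array parameter; only the first element is processed.
            "language_hints": list(language_hints),
        },
    }

def create_voice_rest(payload):
    """POST the payload and return the new voice_id from the response."""
    req = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ['DASHSCOPE_API_KEY']}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["output"]["voice_id"]
```

Calling create_voice_rest(build_create_voice_payload(...)) mirrors the curl example; replace the audio URL with your own before running.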

Response parameters

Response example:

{
    "output": {
        "voice_id": "yourVoiceId"
    },
    "usage": {
        "count": 1
    },
    "request_id": "yourRequestId"
}

  • voice_id (string): The voice ID. It can be used directly as the voice parameter in the speech synthesis API.

Query the voice list

Performs a paged query of created voices.

Python SDK

API description

def list_voices(self, prefix=None, page_index: int = 0, page_size: int = 10) -> List[dict]:
    '''
    Queries all created voices.
    param: prefix The custom prefix of the voice. Only digits and lowercase letters are allowed, up to 10 characters.
    param: page_index The page index for the query.
    param: page_size The page size for the query.
    return: List[dict] A list of voices, including the ID, creation time, modification time, and status of each voice. Format: [{'gmt_create': '2025-10-09 14:51:01', 'gmt_modified': '2025-10-09 14:51:07', 'status': 'OK', 'voice_id': 'cosyvoice-v3-myvoice-xxx'}]
    Three voice statuses:
        DEPLOYING: Under review
        OK: Approved and ready to use
        UNDEPLOYED: Rejected and not usable
    '''

Request example

from dashscope.audio.tts_v2 import VoiceEnrollmentService

service = VoiceEnrollmentService()

# Filter by prefix, or set to None to query all.
voices = service.list_voices(prefix='myvoice', page_index=0, page_size=10)

print(f"Request ID: {service.get_last_request_id()}")
print(f"Found voices: {voices}")

Response example

[
    {
        "gmt_create": "2024-09-13 11:29:41",
        "voice_id": "yourVoiceId",
        "gmt_modified": "2024-09-13 11:29:41",
        "status": "OK"
    },
    {
        "gmt_create": "2024-09-13 13:22:38",
        "voice_id": "yourVoiceId",
        "gmt_modified": "2024-09-13 13:22:38",
        "status": "OK"
    }
]

Response parameters

  • voice_id (string): The voice ID.

  • gmt_create (string): The time when the voice was created.

  • gmt_modified (string): The time when the voice was modified.

  • status (string): The voice status: DEPLOYING (under review), OK (approved and ready to use), UNDEPLOYED (rejected and not usable).
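Because results are paged (default page size 10), collecting every voice takes a loop over page_index. A small, SDK-agnostic sketch follows; iter_all_voices is an illustrative helper, and the page-fetch callable stands in for service.list_voices:

```python
def iter_all_voices(fetch_page, page_size=10):
    """Yield every voice across pages.

    fetch_page(page_index, page_size) must return one page as a list of dicts,
    e.g. lambda i, n: service.list_voices(prefix='myvoice', page_index=i, page_size=n).
    Iteration stops at the first short or empty page.
    """
    page_index = 0
    while True:
        page = fetch_page(page_index, page_size)
        yield from page
        if len(page) < page_size:
            break
        page_index += 1
```

Filtering the result, for example keeping only entries whose status is "OK", yields the voices that are ready for synthesis.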

Java SDK

API description

// Three voice statuses:
//        DEPLOYING: Under review
//        OK: Approved and ready to use
//        UNDEPLOYED: Rejected and not usable
/**
 * Queries all created voices. The default page index is 0, and the default page size is 10.
 *
 * @param prefix The custom prefix of the voice. Only digits and lowercase letters are allowed, up to 10 characters. Can be null.
 * @return Voice[] An array of Voice objects. The Voice object encapsulates the ID, creation time, modification time, and status of the voice.
 * @throws NoApiKeyException if the API key is empty.
 * @throws InputRequiredException if a required parameter is empty.
 */
public Voice[] listVoice(String prefix) throws NoApiKeyException, InputRequiredException 

/**
 * Queries all created voices.
 *
 * @param prefix The custom prefix of the voice. Only digits and lowercase letters are allowed, up to 10 characters.
 * @param pageIndex The page index for the query.
 * @param pageSize The page size for the query.
 * @return Voice[] An array of Voice objects. The Voice object encapsulates the ID, creation time, modification time, and status of the voice.
 * @throws NoApiKeyException if the API key is empty.
 * @throws InputRequiredException if a required parameter is empty.
 */
public Voice[] listVoice(String prefix, int pageIndex, int pageSize) throws NoApiKeyException, InputRequiredException

Request example

You need to import the third-party library com.google.gson.Gson.

import com.alibaba.dashscope.audio.ttsv2.enrollment.Voice;
import com.alibaba.dashscope.audio.ttsv2.enrollment.VoiceEnrollmentService;
import com.alibaba.dashscope.exception.InputRequiredException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.google.gson.Gson;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class Main {
    public static String apiKey = System.getenv("DASHSCOPE_API_KEY");  // If you have not configured an environment variable, replace this with your API key.
    private static String prefix = "myvoice"; // Replace this with the actual value.
    private static final Logger logger = LoggerFactory.getLogger(Main.class);

    public static void main(String[] args)
            throws NoApiKeyException, InputRequiredException {
        VoiceEnrollmentService service = new VoiceEnrollmentService(apiKey);
        // Query voices
        Voice[] voices = service.listVoice(prefix, 0, 10);
        logger.info("List successful. Request ID: {}", service.getLastRequestId());
        logger.info("Voices Details: {}", new Gson().toJson(voices));
    }
}

Response example

[
    {
        "gmt_create": "2024-09-13 11:29:41",
        "voice_id": "yourVoiceId",
        "gmt_modified": "2024-09-13 11:29:41",
        "status": "OK"
    },
    {
        "gmt_create": "2024-09-13 13:22:38",
        "voice_id": "yourVoiceId",
        "gmt_modified": "2024-09-13 13:22:38",
        "status": "OK"
    }
]

Response parameters

  • voice_id (string): The voice ID.

  • gmt_create (string): The time when the voice was created.

  • gmt_modified (string): The time when the voice was modified.

  • status (string): The voice status: DEPLOYING (under review), OK (approved and ready to use), UNDEPLOYED (rejected and not usable).

RESTful API

Basic information

URL

The Chinese mainland:

https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization

International:

https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization

Request method

POST

Request header

Authorization: Bearer {api-key} // Replace with your API key
Content-Type: application/json

Message body

The message body contains all request parameters. Optional fields can be omitted as needed:

Important

The model is the voice cloning model. Set it to voice-enrollment.

{
    "model": "voice-enrollment",
    "input": {
        "action": "list_voice",
        "prefix": "myvoice",
        "page_index": 0,
        "page_size": 10
    }
}

Request parameters

Request example:

curl -X POST https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "voice-enrollment",
    "input": {
        "action": "list_voice",
        "prefix": "myvoice",
        "page_index": 0,
        "page_size": 10
    }
}'

  • model (string, required): The voice cloning model. Set to voice-enrollment.

  • action (string, required): The operation type. Set to list_voice.

  • prefix (string, optional, default null): The custom prefix of the voice. Only digits and lowercase letters are allowed, up to 10 characters.

  • page_index (integer, optional, default 0): The page index, starting from 0.

  • page_size (integer, optional, default 10): The number of entries on each page.

Response parameters

Response example:

{
    "output": {
        "voice_list": [
            {
                "gmt_create": "2024-12-11 13:38:02",
                "voice_id": "yourVoiceId",
                "gmt_modified": "2024-12-11 13:38:02",
                "status": "OK"
            }
        ]
    },
    "usage": {
        "count": 1
    },
    "request_id": "yourRequestId"
}

  • voice_id (string): The voice ID.

  • gmt_create (string): The time when the voice was created.

  • gmt_modified (string): The time when the voice was modified.

  • status (string): The voice status: DEPLOYING (under review), OK (approved and ready to use), UNDEPLOYED (rejected and not usable).

Query a specific voice

Gets details of a specific voice.

Python SDK

API description

def query_voice(self, voice_id: str) -> dict:
    '''
    Queries details of a specific voice.
    param: voice_id The ID of the voice to query.
    return: dict Voice details, including status, creation time, audio link, and more.
    '''

Request example

from dashscope.audio.tts_v2 import VoiceEnrollmentService

service = VoiceEnrollmentService()
voice_id = 'cosyvoice-v3-plus-myvoice-xxxxxxxx'

voice_details = service.query_voice(voice_id=voice_id)

print(f"Request ID: {service.get_last_request_id()}")
print(f"Voice Details: {voice_details}")

Response example

{
    "gmt_create": "2024-09-13 11:29:41",
    "resource_link": "https://yourAudioFileUrl",
    "target_model": "cosyvoice-v3-plus",
    "gmt_modified": "2024-09-13 11:29:41",
    "status": "OK"
}

Response parameters

  • resource_link (string): The URL of the audio that was cloned.

  • target_model (string): The speech synthesis model that controls voice timbre. We recommend cosyvoice-v3-flash or cosyvoice-v3-plus. This must match the model used in subsequent speech synthesis API calls; otherwise, synthesis fails.

  • gmt_create (string): The time when the voice was created.

  • gmt_modified (string): The time when the voice was modified.

  • status (string): The voice status: DEPLOYING (under review), OK (approved and ready to use), UNDEPLOYED (rejected and not usable).

Java SDK

API description

/**
 * Queries details of a specific voice.
 *
 * @param voiceId The ID of the voice to query.
 * @return Voice Voice details, including status, creation time, audio link, and more.
 * @throws NoApiKeyException if the API key is empty.
 * @throws InputRequiredException if a required parameter is empty.
 */
public Voice queryVoice(String voiceId) throws NoApiKeyException, InputRequiredException

Request example

You need to import the third-party library com.google.gson.Gson.

import com.alibaba.dashscope.audio.ttsv2.enrollment.Voice;
import com.alibaba.dashscope.audio.ttsv2.enrollment.VoiceEnrollmentService;
import com.alibaba.dashscope.exception.InputRequiredException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.google.gson.Gson;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class Main {
    public static String apiKey = System.getenv("DASHSCOPE_API_KEY");  // If you have not configured an environment variable, replace this with your API key.
    private static String voiceId = "cosyvoice-v3-plus-myvoice-xxx"; // Replace this with the actual value.
    private static final Logger logger = LoggerFactory.getLogger(Main.class);

    public static void main(String[] args)
            throws NoApiKeyException, InputRequiredException {
        VoiceEnrollmentService service = new VoiceEnrollmentService(apiKey);
        Voice voice = service.queryVoice(voiceId);
        
        logger.info("Query successful. Request ID: {}", service.getLastRequestId());
        logger.info("Voice Details: {}", new Gson().toJson(voice));
    }
}

Response example

{
    "gmt_create": "2024-09-13 11:29:41",
    "resource_link": "https://yourAudioFileUrl",
    "target_model": "cosyvoice-v3-plus",
    "gmt_modified": "2024-09-13 11:29:41",
    "status": "OK"
}

Response parameters

| Parameter | Type | Description |
| --- | --- | --- |
| resource_link | string | The URL of the source audio used for cloning. |
| target_model | string | The speech synthesis model that drives the voice; cosyvoice-v3-flash or cosyvoice-v3-plus is recommended. This must match the model used in subsequent speech synthesis API calls, or synthesis fails. |
| gmt_create | string | The time when the voice was created. |
| gmt_modified | string | The time when the voice was last modified. |
| status | string | The voice status. DEPLOYING: under review. OK: approved and ready to use. UNDEPLOYED: rejected and not usable. |

RESTful API

Basic information

URL

The Chinese mainland:

https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization

International:

https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization

Request method

POST

Request header

Authorization: Bearer {api-key} // Replace with your API key
Content-Type: application/json

Message body

The message body containing all request parameters is as follows. Optional fields can be omitted as needed:

Important

The model parameter specifies the voice cloning model. Set it to voice-enrollment.

{
    "model": "voice-enrollment",
    "input": {
        "action": "query_voice",
        "voice_id": "yourVoiceId"
    }
}

Request parameters

Request example

Important

The model parameter specifies the voice cloning model. Set it to voice-enrollment.

curl -X POST https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "voice-enrollment",
    "input": {
        "action": "query_voice",
        "voice_id": "yourVoiceId"
    }
}'

| Parameter | Type | Default | Required | Description |
| --- | --- | --- | --- | --- |
| model | string | - | Yes | The voice cloning model. Set to voice-enrollment. |
| action | string | - | Yes | The operation type. Set to query_voice. |
| voice_id | string | - | Yes | The ID of the voice to query. |

Response parameters

Response example

{
    "output": {
        "gmt_create": "2024-12-11 13:38:02",
        "resource_link": "https://yourAudioFileUrl",
        "target_model": "cosyvoice-v3-plus",
        "gmt_modified": "2024-12-11 13:38:02",
        "status": "OK"
    },
    "usage": {
        "count": 1
    },
    "request_id": "2450f969-d9ea-9483-bafc-************"
}

| Parameter | Type | Description |
| --- | --- | --- |
| resource_link | string | The URL of the source audio used for cloning. |
| target_model | string | The speech synthesis model that drives the voice; cosyvoice-v3-flash or cosyvoice-v3-plus is recommended. This must match the model used in subsequent speech synthesis API calls, or synthesis fails. |
| gmt_create | string | The time when the voice was created. |
| gmt_modified | string | The time when the voice was last modified. |
| status | string | The voice status. DEPLOYING: under review. OK: approved and ready to use. UNDEPLOYED: rejected and not usable. |
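
For languages without an official SDK, the RESTful endpoint above can be called with plain HTTP. The following is a minimal Python sketch using only the standard library; the endpoint, header, and message body follow the documentation above, and error handling is simplified:

```python
import json
import os
import urllib.request

ENDPOINT = "https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization"

def build_query_payload(voice_id: str) -> dict:
    """Message body for the query_voice action, as documented above."""
    return {
        "model": "voice-enrollment",
        "input": {"action": "query_voice", "voice_id": voice_id},
    }

def query_voice(voice_id: str) -> dict:
    """POST the query and return the "output" object of the response."""
    req = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(build_query_payload(voice_id)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ['DASHSCOPE_API_KEY']}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read())["output"]
```

The same payload shape works for the update_voice and delete_voice actions by changing the "action" field and input parameters accordingly.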

Update a voice

Updates an existing voice with new audio.

Python SDK

API description

def update_voice(self, voice_id: str, url: str) -> None:
    '''
    Updates a voice.
    param: voice_id The ID of the voice.
    param: url The URL of the audio file for voice cloning.
    '''

Request example

from dashscope.audio.tts_v2 import VoiceEnrollmentService

service = VoiceEnrollmentService()
service.update_voice(
    voice_id='cosyvoice-v3-plus-myvoice-xxxxxxxx',
    url='https://your-new-audio-file-url'
)
print(f"Update submitted. Request ID: {service.get_last_request_id()}")

Java SDK

API description

/**
 * Updates a voice.
 *
 * @param voiceId The voice to update.
 * @param url The URL of the audio file for voice cloning.
 * @throws NoApiKeyException if the API key is empty.
 * @throws InputRequiredException if a required parameter is empty.
 */
public void updateVoice(String voiceId, String url)
    throws NoApiKeyException, InputRequiredException

Request example

import com.alibaba.dashscope.audio.ttsv2.enrollment.VoiceEnrollmentService;
import com.alibaba.dashscope.exception.InputRequiredException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class Main {
    public static String apiKey = System.getenv("DASHSCOPE_API_KEY");  // If you have not configured an environment variable, replace this with your API key.
    private static String fileUrl = "https://your-audio-file-url";  // Replace this with the actual value.
    private static String voiceId = "cosyvoice-v3-plus-myvoice-xxx"; // Replace this with the actual value.
    private static final Logger logger = LoggerFactory.getLogger(Main.class);
    
    public static void main(String[] args)
            throws NoApiKeyException, InputRequiredException {
        VoiceEnrollmentService service = new VoiceEnrollmentService(apiKey);
        // Update the voice
        service.updateVoice(voiceId, fileUrl);
        logger.info("Update submitted. Request ID: {}", service.getLastRequestId());
    }
}

RESTful API

Basic information

URL

The Chinese mainland:

https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization

International:

https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization

Request method

POST

Request header

Authorization: Bearer {api-key} // Replace with your API key
Content-Type: application/json

Message body

The message body containing all request parameters is as follows. Optional fields can be omitted as needed:

Important

The model parameter specifies the voice cloning model. Set it to voice-enrollment.

{
    "model": "voice-enrollment",
    "input": {
        "action": "update_voice",
        "voice_id": "yourVoiceId",
        "url": "https://yourAudioFileUrl"
    }
}

Request parameters

Request example

Important

The model parameter specifies the voice cloning model. Set it to voice-enrollment.

curl -X POST https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "voice-enrollment",
    "input": {
        "action": "update_voice",
        "voice_id": "yourVoiceId",
        "url": "https://yourAudioFileUrl"
    }
}'

| Parameter | Type | Default | Required | Description |
| --- | --- | --- | --- | --- |
| model | string | - | Yes | The voice cloning model. Set to voice-enrollment. |
| action | string | - | Yes | The operation type. Set to update_voice. |
| voice_id | string | - | Yes | The ID of the voice to update. |
| url | string | - | Yes | The URL of the audio file used to update the voice. Must be publicly accessible. For how to record audio, see Recording Guide. |

Response example

{
    "output": {},
    "usage": {
        "count": 1
    },
    "request_id": "yourRequestId"
}

Delete a voice

Deletes a voice that is no longer needed, releasing its quota. This operation is irreversible.

Python SDK

API description

def delete_voice(self, voice_id: str) -> None:
    '''
    Deletes a voice.
    param: voice_id The voice to delete.
    '''

Request example

from dashscope.audio.tts_v2 import VoiceEnrollmentService

service = VoiceEnrollmentService()
service.delete_voice(voice_id='cosyvoice-v3-plus-myvoice-xxxxxxxx')
print(f"Deletion submitted. Request ID: {service.get_last_request_id()}")

Java SDK

API description

/**
 * Deletes a voice.
 *
 * @param voiceId The voice to delete.
 * @throws NoApiKeyException if the API key is empty.
 * @throws InputRequiredException if a required parameter is empty.
 */
public void deleteVoice(String voiceId) throws NoApiKeyException, InputRequiredException 

Request example

import com.alibaba.dashscope.audio.ttsv2.enrollment.VoiceEnrollmentService;
import com.alibaba.dashscope.exception.InputRequiredException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class Main {
    public static String apiKey = System.getenv("DASHSCOPE_API_KEY");  // If you have not configured an environment variable, replace this with your API key.
    private static String voiceId = "cosyvoice-v3-plus-myvoice-xxx"; // Replace this with the actual value.
    private static final Logger logger = LoggerFactory.getLogger(Main.class);
    
    public static void main(String[] args)
            throws NoApiKeyException, InputRequiredException {
        VoiceEnrollmentService service = new VoiceEnrollmentService(apiKey);
        // Delete the voice
        service.deleteVoice(voiceId);
        logger.info("Deletion submitted. Request ID: {}", service.getLastRequestId());
    }
}

RESTful API

Basic information

URL

The Chinese mainland:

https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization

International:

https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization

Request method

POST

Request header

Authorization: Bearer {api-key} // Replace with your API key
Content-Type: application/json

Message body

The message body containing all request parameters is as follows. Optional fields can be omitted as needed:

Important

The model parameter specifies the voice cloning model. Set it to voice-enrollment.

{
    "model": "voice-enrollment",
    "input": {
        "action": "delete_voice",
        "voice_id": "yourVoiceId"
    }
}

Request parameters

Request example

Important

The model parameter specifies the voice cloning model. Set it to voice-enrollment.

curl -X POST https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "voice-enrollment",
    "input": {
        "action": "delete_voice",
        "voice_id": "yourVoiceId"
    }
}'

| Parameter | Type | Default | Required | Description |
| --- | --- | --- | --- | --- |
| model | string | - | Yes | The voice cloning model. Set to voice-enrollment. |
| action | string | - | Yes | The operation type. Set to delete_voice. |
| voice_id | string | - | Yes | The ID of the voice to delete. |

Response example

{
    "output": {},
    "usage": {
        "count": 1
    },
    "request_id": "yourRequestId"
}

Voice quota and cleanup rules

  • Total limit: 1,000 voices per account

    The API does not return a voice count directly. To count your voices, page through the results of the Query the voice list API.
  • Automatic cleanup: If a voice is not used in any speech synthesis request for one year, the system deletes it automatically.
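
Counting voices by paging can be sketched as follows. The helper below is generic: it accepts any page-fetching callable, so it can wrap the Python SDK's list_voices (check your SDK version for the exact signature) or the RESTful list action:

```python
from typing import Callable, List

def count_items(fetch_page: Callable[[int, int], List], page_size: int = 100) -> int:
    """Count items by requesting pages until a short or empty page arrives.

    fetch_page(page_index, page_size) must return one page as a list.
    """
    total, page_index = 0, 0
    while True:
        page = fetch_page(page_index, page_size) or []
        total += len(page)
        if len(page) < page_size:
            return total
        page_index += 1

# Hypothetical usage with the DashScope Python SDK (the list_voices
# signature is assumed from the "Query the voice list" section):
# from dashscope.audio.tts_v2 import VoiceEnrollmentService
# service = VoiceEnrollmentService()
# n = count_items(lambda i, s: service.list_voices(page_index=i, page_size=s))
```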

Billing

  • Voice cloning: Creating, querying, updating, and deleting voices are free of charge.

  • Using cloned voices for speech synthesis: Billed on a pay-as-you-go basis (by the number of text characters). For more information, see Real-time speech synthesis - CosyVoice.

Copyright and legality

You are responsible for ensuring that you own, or have the legal right to use, the voices you provide. Read the Terms of Service.

Error codes

If you encounter an error, see Error messages for troubleshooting.

FAQ

Features

Q: How do I adjust the speech rate and volume of a custom voice?

The process is the same as for preset voices. When calling the speech synthesis API, pass the corresponding parameters, such as speech_rate (Python) or speechRate (Java) to adjust speech rate, and volume to adjust volume. For more information, see the speech synthesis API documentation (Java SDK/Python SDK/WebSocket API).

Q: How can I make calls in other languages such as Go, C#, and Node.js, besides Java and Python?

For voice management, use the RESTful API provided in this document. For speech synthesis, use the WebSocket API and pass the cloned voice_id as the voice parameter.

Troubleshooting

If you encounter an error code, troubleshoot based on the information in Error codes.

Q: What should I do if the synthesized audio from a cloned voice contains extra content?

If you find that output audio synthesized from a cloned voice contains extra characters or noise beyond the input text, follow these steps to troubleshoot:

  1. Check source audio quality

    The quality of the cloned audio directly affects synthesis results. Ensure the source audio meets these requirements:

    • No background noise or static

    • Clear sound quality (recommended sample rate ≥ 16 kHz)

    • Audio format: WAV is preferred over MP3 (avoids lossy-compression artifacts)

    • Mono (stereo may introduce interference)

    • No silent segments or long pauses

    • Moderate speech rate (a fast speech rate affects feature extraction)

  2. Check the input text

    Confirm the input text does not contain special symbols or markers:

    • Avoid special symbols such as **, "", or ''.

    • Unless such symbols are intentional (for example, wrapping LaTeX formulas), pre-process the text to filter them out.

  3. Verify voice cloning parameters

    When creating a voice, ensure language parameters (language_hints/languageHints) are set correctly.

  4. Try cloning again

    Use higher-quality source audio to clone the voice again and test the synthesis result.

  5. Compare with system voices

    Test the same text with a preset system voice to confirm if the issue is specific to the cloned voice.
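
Step 2 above (filtering special symbols from the input text) can be sketched as follows. The symbol set is illustrative, not part of the API; extend the pattern to whatever markers appear in your content:

```python
import re

# Hypothetical pre-processing: strip Markdown bold markers and stray
# straight/curly quotes before sending text to speech synthesis.
_SPECIAL = re.compile(r"\*\*|[\u201c\u201d\u2018\u2019\"']")

def sanitize_tts_text(text: str) -> str:
    """Remove special symbols and collapse the whitespace they leave."""
    cleaned = _SPECIAL.sub("", text)
    return re.sub(r"\s{2,}", " ", cleaned).strip()
```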

Q: How do I troubleshoot when the audio generated from a cloned voice has no sound?

  1. Confirm the voice status

    Call the Query a specific voice API to check if the voice status is OK.

  2. Check for model version consistency

    Ensure the target_model parameter used for voice cloning matches the model parameter used for speech synthesis. For example, if you used cosyvoice-v3-plus for cloning, you must also use cosyvoice-v3-plus for synthesis.

  3. Verify source audio quality

    Check if the source audio used for voice cloning meets the audio requirements:

    • Audio duration: 10 to 20 seconds

    • Clear sound quality

    • No background noise

  4. Check request parameters

    Confirm that the voice request parameter for speech synthesis is set to the ID of the cloned voice.
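
Step 1 above (confirming the voice status) can be automated with a small polling helper. This wrapper is illustrative, not part of the SDK; pass it any callable that returns the current status string, for example one that calls the query API:

```python
import time

def wait_until_ready(get_status, timeout_s: float = 300.0,
                     interval_s: float = 5.0) -> bool:
    """Poll get_status() until the voice is usable.

    Returns True once the status is "OK", and False if the voice is
    rejected ("UNDEPLOYED") or the timeout elapses.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = get_status()
        if status == "OK":
            return True
        if status == "UNDEPLOYED":  # rejected; retrying will not help
            return False
        time.sleep(interval_s)
    return False
```

With the Python SDK from the query section, a possible call is wait_until_ready(lambda: service.query_voice(voice_id=voice_id)["status"]), assuming query_voice returns the fields shown in the response example.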

Q: What should I do if the synthesis result is unstable or the speech is incomplete after voice cloning?

If synthesized speech from a cloned voice has these issues:

  • Incomplete speech playback, where only part of the text is read.

  • Unstable synthesis results, varying between good and bad.

  • Abnormal pauses or silent segments in the speech.

Possible cause: Source audio quality does not meet requirements.

Solution: Check if the source audio meets the following requirements. We recommend re-recording according to the Recording Guide.

  • Check audio continuity: Ensure speech content in the source audio is continuous. Avoid long pauses or silent segments (over 2 s). If the audio contains significant blank segments, the model may treat silence or noise as part of voice features, affecting generation results.

  • Check speech activity ratio: Ensure valid speech accounts for more than 60% of total audio duration. Excessive background noise or non-speech segments can interfere with voice feature extraction.

  • Verify audio quality details:

    • Audio duration: 10 to 20 seconds (15 seconds is recommended)

    • Clear pronunciation and steady speech rate

    • No background noise, echo, or static

    • Concentrated speech energy with no long silent segments

Q: Why can't I find the VoiceEnrollmentService class?

The SDK version is too old. Install the latest version of DashScope SDK.

Q: What should I do if the voice cloning result is poor, with noise or unclear audio?

This is usually caused by low-quality input audio. Re-record the audio strictly following the Recording Guide, then upload it again.

Q: Why is there a long silence at the beginning or abnormal audio duration when using a cloned voice to synthesize very short text (such as a single word)?

The voice cloning model learns pauses and rhythm from the sample audio. If the original recording contains long initial silences or pauses, the synthesized result may retain a similar pattern. For a single word or very short text, this silence ratio is amplified, making the audio seem long but mostly silent.

To avoid this, do not leave long silences when recording sample audio, and synthesize complete sentences or longer text where possible. If you must synthesize a single word, add context before or after it, or use a homophone to avoid extreme cases.