The CosyVoice voice cloning service uses a generative large voice model to create a highly similar, natural-sounding custom voice from an audio sample of only 10 to 20 seconds, with no traditional training required. Voice cloning and speech synthesis are two sequential steps. This document describes the parameters and API details for voice cloning. For speech synthesis, see Real-time speech synthesis - CosyVoice.
This document applies only to the China (Beijing) region. To use the model, you must use an API key from the China (Beijing) region.
User guide: For an introduction to the models and how to select them, see Real-time speech synthesis - CosyVoice.
Audio requirements
High-quality input audio is crucial for high-quality cloning results.
Item | Requirement |
Supported formats | WAV (16-bit), MP3, M4A |
Audio duration | Recommended: 10 to 20 seconds. Maximum: 60 seconds. |
File size | ≤ 10 MB |
Sample rate | ≥ 16 kHz |
Sound channel | Mono / Stereo |
Content | The audio must contain at least 5 seconds of continuous, clear reading without background sound. The rest of the audio can contain only short pauses (≤ 2 seconds). The entire audio clip must be free of background music, noise, and other voices so that the core reading content stays clean. Use normal speech audio as input. To keep the cloned voice accurate and usable, do not upload songs or singing audio. |
Language | Varies depending on the speech synthesis model that drives the voice (specified by the
|
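The hard limits in the table above can be checked locally before a file is uploaded. The following is a minimal sketch for WAV input only, using Python's standard `wave` module. The function name `check_wav` and the message wording are illustrative and not part of the service. The content rules (clear reading, no background music, short pauses) are not checked here.

```python
import os
import wave

MAX_BYTES = 10 * 1024 * 1024  # File size: <= 10 MB
MIN_RATE_HZ = 16_000          # Sample rate: >= 16 kHz
MAX_SECONDS = 60.0            # Maximum audio duration


def check_wav(path: str) -> list[str]:
    """Return a list of problems; an empty list means the basic checks passed."""
    problems = []
    if os.path.getsize(path) > MAX_BYTES:
        problems.append("file size exceeds 10 MB")
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        seconds = wav.getnframes() / rate
        if wav.getsampwidth() != 2:  # 2 bytes per sample = 16-bit
            problems.append("WAV is not 16-bit")
        if rate < MIN_RATE_HZ:
            problems.append(f"sample rate {rate} Hz is below 16 kHz")
        if wav.getnchannels() not in (1, 2):
            problems.append("audio is neither mono nor stereo")
        if seconds > MAX_SECONDS:
            problems.append(f"duration {seconds:.1f} s exceeds 60 s")
    return problems
```

A clip that passes these checks can still be shorter or longer than the recommended 10 to 20 seconds, so it is worth checking the duration separately if cloning quality matters.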
Getting started: From cloning to synthesis
1. Workflow
Voice cloning and speech synthesis are two closely related but separate steps. They follow a "create first, then use" flow:
Create a voice
Call the Create voice API and upload an audio clip. The system analyzes the audio and creates a unique cloned voice. In this step, you must specify target_model/targetModel to select the speech synthesis model that will be used with the created voice.
If you already have a created voice, skip this step. Call the Query voice list API to check for existing voices.
Use the voice for speech synthesis
Call the speech synthesis API and pass the voice obtained in the previous step. The speech synthesis model specified in this step must be the same as the target_model/targetModel specified in the previous step.
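The model-matching rule in step 2 can be enforced on the client before synthesis is attempted. The sketch below assumes the voice details are a dictionary with a `target_model` field, as in the response example for the Query a specific voice API in this document. The function name `check_model_match` is hypothetical.

```python
def check_model_match(voice_details: dict, synthesis_model: str) -> None:
    """Raise ValueError if the synthesis model differs from the voice's target model."""
    target = voice_details.get("target_model")
    if target != synthesis_model:
        raise ValueError(
            f"Voice was created for {target!r} but synthesis uses {synthesis_model!r}"
        )


# Example input shaped like the Query a specific voice response.
details = {"target_model": "cosyvoice-v3-plus", "status": "OK"}
check_model_match(details, "cosyvoice-v3-plus")  # passes silently
```

Failing early with a clear message is usually easier to debug than a failed synthesis request.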
2. Model configuration and preparations
Select the appropriate models and complete the required preparations.
Model configuration
When cloning a voice, you need to specify the following two models:
Voice cloning model: voice-enrollment
Speech synthesis model to be used with the voice:
For the best results, use cosyvoice-v3-plus if your resources and budget allow.
Version | Scenarios |
cosyvoice-v3-plus | For the best sound quality and expressiveness with a sufficient budget. |
cosyvoice-v3-flash | For a balance between performance and cost, offering high value. |
cosyvoice-v2 | For compatibility with older versions or low-requirement scenarios. |
Preparations
Obtain an API key: Obtain and configure an API key. For security, configure the API key as an environment variable.
Install the SDK: Make sure you have installed the latest version of the DashScope SDK.
Prepare the audio URL: Upload an audio file that meets the audio requirements to a publicly accessible location, such as Object Storage Service (OSS). Make sure the URL is publicly accessible.
3. End-to-end example: From cloning to synthesis
The following example shows how to use a custom voice generated by voice cloning in speech synthesis to produce an output that closely resembles the original voice.
Key principle: When cloning a voice, the target_model (the speech synthesis model to be used with the voice) must be the same as the speech synthesis model specified when you call the speech synthesis API. Otherwise, the synthesis will fail.
Remember to replace AUDIO_URL in the example with your actual audio URL.
import os
import time
import dashscope
from dashscope.audio.tts_v2 import VoiceEnrollmentService, SpeechSynthesizer

# 1. Prepare the environment
# Configure the API key as an environment variable.
# export DASHSCOPE_API_KEY="sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
dashscope.api_key = os.getenv("DASHSCOPE_API_KEY")
if not dashscope.api_key:
    raise ValueError("DASHSCOPE_API_KEY environment variable not set.")

# 2. Define cloning parameters
TARGET_MODEL = "cosyvoice-v3-plus"
# Give the voice a meaningful prefix.
VOICE_PREFIX = "myvoice"  # Only digits and lowercase letters are allowed, less than 10 characters.
# Publicly accessible audio URL
AUDIO_URL = "https://dashscope.oss-cn-beijing.aliyuncs.com/samples/audio/cosyvoice/cosyvoice-zeroshot-sample.wav"  # Example URL, replace it with your own.

# 3. Create the voice (asynchronous task)
print("--- Step 1: Creating voice enrollment ---")
service = VoiceEnrollmentService()
try:
    voice_id = service.create_voice(
        target_model=TARGET_MODEL,
        prefix=VOICE_PREFIX,
        url=AUDIO_URL
    )
    print(f"Voice enrollment submitted successfully. Request ID: {service.get_last_request_id()}")
    print(f"Generated Voice ID: {voice_id}")
except Exception as e:
    print(f"Error during voice creation: {e}")
    raise e

# 4. Poll for the voice status
print("\n--- Step 2: Polling for voice status ---")
max_attempts = 30
poll_interval = 10  # seconds
for attempt in range(max_attempts):
    try:
        voice_info = service.query_voice(voice_id=voice_id)
        status = voice_info.get("status")
    except Exception as e:
        # A failed query is treated as transient: log it and retry.
        print(f"Error during status polling: {e}")
        time.sleep(poll_interval)
        continue
    print(f"Attempt {attempt + 1}/{max_attempts}: Voice status is '{status}'")
    if status == "OK":
        print("Voice is ready for synthesis.")
        break
    if status == "UNDEPLOYED":
        # Raised outside the try block so that the failure stops the loop
        # instead of being caught and retried.
        print(f"Voice processing failed with status: {status}. Please check audio quality or contact support.")
        raise RuntimeError(f"Voice processing failed with status: {status}")
    # For intermediate statuses such as "DEPLOYING", continue to wait.
    time.sleep(poll_interval)
else:
    print("Polling timed out. The voice is not ready after several attempts.")
    raise RuntimeError("Polling timed out. The voice is not ready after several attempts.")

# 5. Synthesize speech using the cloned voice
print("\n--- Step 3: Synthesizing speech with the new voice ---")
try:
    synthesizer = SpeechSynthesizer(model=TARGET_MODEL, voice=voice_id)
    text_to_synthesize = "Congratulations, you have successfully cloned and synthesized your own voice!"
    # The call() method returns binary audio data.
    audio_data = synthesizer.call(text_to_synthesize)
    print(f"Speech synthesis successful. Request ID: {synthesizer.get_last_request_id()}")
    # 6. Save the audio file
    output_file = "my_custom_voice_output.mp3"
    with open(output_file, "wb") as f:
        f.write(audio_data)
    print(f"Audio saved to {output_file}")
except Exception as e:
    print(f"Error during speech synthesis: {e}")
API reference
When using different APIs, ensure all operations are performed using the same account.
Create voice
Upload an audio file for cloning to create a custom voice.
Python SDK
API description
def create_voice(self, target_model: str, prefix: str, url: str, language_hints: List[str] = None) -> str:
    '''
    Creates a new custom voice.
    param: target_model The speech synthesis model that drives the voice. This must be the same as the speech synthesis model used when you later call the speech synthesis API. Otherwise, synthesis will fail. Recommended models are cosyvoice-v3-flash or cosyvoice-v3-plus.
    param: prefix A user-friendly name for the voice (only digits, uppercase and lowercase letters, and underscores are allowed, up to 10 characters). Use an identifier related to the role or scenario. This keyword appears in the cloned voice name. The generated voice name format is: model_name-prefix-unique_identifier, for example, cosyvoice-v3-plus-myvoice-xxxxxxxx.
    param: url The URL of the audio file for voice cloning. The URL must be publicly accessible.
    param: language_hints Specifies the target language to help improve synthesis accuracy. This parameter is effective for English, French, German, Japanese, Korean, and Russian (Chinese does not need to be specified). Currently, only one language can be selected. Valid values: en, fr, de, ja, ko, ru.
    return: voice_id The voice ID. It can be directly used for the voice parameter in the speech synthesis API.
    '''
target_model: The speech synthesis model to be used with the voice. This must be the same as the speech synthesis model used when you call the speech synthesis API. Otherwise, the synthesis will fail.
language_hints: This parameter applies only to the cosyvoice-v3-flash and cosyvoice-v3-plus models.
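Because each create call counts against the voice quota, it can help to validate the prefix locally first. This is a minimal sketch that assumes the rule stated in the prefix parameter description above (digits, uppercase and lowercase letters, and underscores, up to 10 characters). The helper name `is_valid_prefix` and the minimum length of one character are assumptions, and the service remains the authority on what it accepts.

```python
import re

# Digits, ASCII letters, and underscores; 1 to 10 characters (lower bound assumed).
_PREFIX_RE = re.compile(r"[A-Za-z0-9_]{1,10}")


def is_valid_prefix(prefix: str) -> bool:
    """Return True if the prefix matches the documented character and length rule."""
    return _PREFIX_RE.fullmatch(prefix) is not None


print(is_valid_prefix("myvoice"))   # True
print(is_valid_prefix("my-voice"))  # False: hyphen is not allowed
```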
Request example
from dashscope.audio.tts_v2 import VoiceEnrollmentService
service = VoiceEnrollmentService()
# Avoid frequent calls. Each call creates a new voice. You cannot create more voices after you reach the quota limit.
voice_id = service.create_voice(
    target_model='cosyvoice-v3-plus',
    prefix='myvoice',
    url='https://your-audio-file-url',
    language_hints=['en']
)
print(f"Request ID: {service.get_last_request_id()}")
print(f"Voice ID: {voice_id}")
Java SDK
API description
/**
* Creates a new custom voice.
*
* @param targetModel The speech synthesis model that drives the voice. This must be the same as the speech synthesis model used when you later call the speech synthesis API. Otherwise, synthesis will fail. Recommended models are cosyvoice-v3-flash or cosyvoice-v3-plus.
* @param prefix A user-friendly name for the voice (only digits, uppercase and lowercase letters, and underscores are allowed, up to 10 characters). Use an identifier related to the role or scenario. This keyword appears in the cloned voice name. The generated voice name format is: model_name-prefix-unique_identifier, for example, cosyvoice-v3-plus-myvoice-xxxxxxxx.
* @param url The URL of the audio file for voice cloning. The URL must be publicly accessible.
* @param customParam Custom parameters. You can specify the target language (languageHints) here to help improve synthesis accuracy. This parameter is effective for English, French, German, Japanese, Korean, and Russian (Chinese does not need to be specified). Currently, only one language can be selected. Valid values: en, fr, de, ja, ko, ru.
* @return Voice The newly created voice. You can get the voice ID using the getVoiceId method of the Voice object. The ID can be directly used for the voice parameter in the speech synthesis API.
* @throws NoApiKeyException if the API key is empty.
* @throws InputRequiredException if a required parameter is empty.
*/
public Voice createVoice(String targetModel, String prefix, String url, VoiceEnrollmentParam customParam) throws NoApiKeyException, InputRequiredException
targetModel: The speech synthesis model to be used with the voice. This must be the same as the speech synthesis model used when you call the speech synthesis API. Otherwise, the synthesis will fail.
languageHints: This parameter applies only to the cosyvoice-v3-flash and cosyvoice-v3-plus models.
Request example
import com.alibaba.dashscope.audio.ttsv2.enrollment.Voice;
import com.alibaba.dashscope.audio.ttsv2.enrollment.VoiceEnrollmentParam;
import com.alibaba.dashscope.audio.ttsv2.enrollment.VoiceEnrollmentService;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.util.Collections;
public class Main {
    private static final Logger logger = LoggerFactory.getLogger(Main.class);
    public static void main(String[] args) {
        String apiKey = System.getenv("DASHSCOPE_API_KEY");
        String targetModel = "cosyvoice-v3-plus";
        String prefix = "myvoice";
        String fileUrl = "https://your-audio-file-url";
        String cloneModelName = "voice-enrollment";
        try {
            VoiceEnrollmentService service = new VoiceEnrollmentService(apiKey);
            Voice myVoice = service.createVoice(
                    targetModel,
                    prefix,
                    fileUrl,
                    VoiceEnrollmentParam.builder()
                            .model(cloneModelName)
                            .languageHints(Collections.singletonList("en")).build());
            logger.info("Voice creation submitted. Request ID: {}", service.getLastRequestId());
            logger.info("Generated Voice ID: {}", myVoice.getVoiceId());
        } catch (Exception e) {
            logger.error("Failed to create voice", e);
        }
    }
}
RESTful API
Basic information
URL | |
Request method | POST |
Request headers | |
Message body | The message body that contains all request parameters is as follows. You can omit optional fields as needed: Important
|
Request parameters
Parameter | Type | Default value | Required | Description |
model | string | - | Yes | The voice cloning model. The value is fixed to |
action | string | - | Yes | The operation type. The value is fixed to |
target_model | string | - | Yes | The speech synthesis model that drives the voice. Recommended models are cosyvoice-v3-flash or cosyvoice-v3-plus. This must be the same as the speech synthesis model used when you later call the speech synthesis API. Otherwise, synthesis will fail. |
prefix | string | - | Yes | A user-friendly name for the voice (only digits, uppercase and lowercase letters, and underscores are allowed, up to 10 characters). Use an identifier related to the role or scenario. This keyword appears in the cloned voice name. The generated voice name format is: |
url | string | - | Yes | The URL of the audio file for voice cloning. The URL must be publicly accessible. |
language_hints | array[string] | - | No | Specifies the target language to help improve synthesis accuracy. This parameter is effective for English, French, German, Japanese, Korean, and Russian (Chinese does not need to be specified). Currently, only one language can be selected. This parameter applies only to the cosyvoice-v3-flash and cosyvoice-v3-plus models. Valid values:
|
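The constraints on language_hints (one language only, the values en, fr, de, ja, ko, ru listed in the SDK descriptions above, and support limited to cosyvoice-v3-flash and cosyvoice-v3-plus) can be checked before a request is built. The sketch below is illustrative: `check_language_hints` is a hypothetical helper, and rejecting hints for other models is a client-side choice, because this document does not state how the service responds in that case.

```python
from typing import List, Optional

VALID_HINTS = {"en", "fr", "de", "ja", "ko", "ru"}
HINT_MODELS = {"cosyvoice-v3-flash", "cosyvoice-v3-plus"}


def check_language_hints(target_model: str,
                         language_hints: Optional[List[str]]) -> Optional[List[str]]:
    """Return the hints to send, or None if no hints were given.

    Raises ValueError when the hints break a documented constraint.
    """
    if not language_hints:
        return None
    if target_model not in HINT_MODELS:
        raise ValueError(f"language_hints does not apply to {target_model!r}")
    if len(language_hints) != 1:
        raise ValueError("only one language can be selected")
    if language_hints[0] not in VALID_HINTS:
        raise ValueError(f"unsupported language hint: {language_hints[0]!r}")
    return list(language_hints)
```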
Response parameters
Parameter | Type | Description |
voice_id | string | The voice ID. It can be directly used for the |
Query voice list
Retrieve a list of created voices using a paged query.
Python SDK
API description
def list_voices(self, prefix=None, page_index: int = 0, page_size: int = 10) -> List[dict]:
    '''
    Queries all created voices.
    param: prefix The custom prefix for the voice. Only digits and lowercase letters are allowed, up to 10 characters.
    param: page_index The page index for the query.
    param: page_size The page size for the query.
    return: List[dict] A list of voices. It includes the ID, creation time, modification time, and status of each voice. Format: [{'gmt_create': '2025-10-09 14:51:01', 'gmt_modified': '2025-10-09 14:51:07', 'status': 'OK', 'voice_id': 'cosyvoice-v3-myvoice-xxx'}]
    There are three voice statuses:
        DEPLOYING: Under review
        OK: Approved and ready to use
        UNDEPLOYED: Not approved and cannot be used
    '''
Request example
from dashscope.audio.tts_v2 import VoiceEnrollmentService
service = VoiceEnrollmentService()
# Filter by prefix, or set to None to query all.
voices = service.list_voices(prefix='myvoice', page_index=0, page_size=10)
print(f"Request ID: {service.get_last_request_id()}")
print(f"Found voices: {voices}")
Response example
[
    {
        "gmt_create": "2024-09-13 11:29:41",
        "voice_id": "yourVoiceId",
        "gmt_modified": "2024-09-13 11:29:41",
        "status": "OK"
    },
    {
        "gmt_create": "2024-09-13 13:22:38",
        "voice_id": "yourVoiceId",
        "gmt_modified": "2024-09-13 13:22:38",
        "status": "OK"
    }
]
Response parameters
Parameter | Type | Description |
voice_id | string | The voice ID. |
gmt_create | string | The time when the voice was created. |
gmt_modified | string | The time when the voice was modified. |
status | string | The voice status:
|
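A list response can be reduced to the voices that are ready for synthesis. The following sketch works on the list-of-dictionaries format shown in the response example above and uses the three statuses named in the API description (DEPLOYING, OK, UNDEPLOYED). The function name `summarize_voices` is illustrative.

```python
from collections import Counter
from typing import Dict, List, Tuple


def summarize_voices(voices: List[dict]) -> Tuple[List[str], Dict[str, int]]:
    """Return the IDs of voices with status OK and a count of voices per status."""
    ready = [v["voice_id"] for v in voices if v.get("status") == "OK"]
    counts = dict(Counter(v.get("status") for v in voices))
    return ready, counts


sample = [
    {"voice_id": "voice-a", "status": "OK"},
    {"voice_id": "voice-b", "status": "DEPLOYING"},
]
ready_ids, status_counts = summarize_voices(sample)
print(ready_ids)      # ['voice-a']
print(status_counts)  # {'OK': 1, 'DEPLOYING': 1}
```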
Java SDK
API description
// There are three voice statuses:
// DEPLOYING: Under review
// OK: Approved and ready to use
// UNDEPLOYED: Not approved and cannot be used
/**
* Queries all created voices. The default page index is 0, and the default page size is 10.
*
* @param prefix The custom prefix for the voice. Only digits and lowercase letters are allowed, up to 10 characters. Can be null.
* @return Voice[] An array of Voice objects. The Voice object encapsulates the voice ID, creation time, modification time, and status.
* @throws NoApiKeyException if the API key is empty.
* @throws InputRequiredException if a required parameter is empty.
*/
public Voice[] listVoice(String prefix) throws NoApiKeyException, InputRequiredException
/**
* Queries all created voices.
*
* @param prefix The custom prefix for the voice. Only digits and lowercase letters are allowed, up to 10 characters.
* @param pageIndex The page index for the query.
* @param pageSize The page size for the query.
* @return Voice[] An array of Voice objects. The Voice object encapsulates the voice ID, creation time, modification time, and status.
* @throws NoApiKeyException if the API key is empty.
* @throws InputRequiredException if a required parameter is empty.
*/
public Voice[] listVoice(String prefix, int pageIndex, int pageSize) throws NoApiKeyException, InputRequiredException
Request example
You must import the third-party library com.google.gson.Gson.
import com.alibaba.dashscope.audio.ttsv2.enrollment.Voice;
import com.alibaba.dashscope.audio.ttsv2.enrollment.VoiceEnrollmentService;
import com.alibaba.dashscope.exception.InputRequiredException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.google.gson.Gson;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
public class Main {
    public static String apiKey = System.getenv("DASHSCOPE_API_KEY"); // If you have not configured an environment variable, replace this with your API key.
    private static String prefix = "myvoice"; // Replace this with your actual value.
    private static final Logger logger = LoggerFactory.getLogger(Main.class);
    public static void main(String[] args)
            throws NoApiKeyException, InputRequiredException {
        VoiceEnrollmentService service = new VoiceEnrollmentService(apiKey);
        // Query voices.
        Voice[] voices = service.listVoice(prefix, 0, 10);
        logger.info("List successful. Request ID: {}", service.getLastRequestId());
        logger.info("Voices Details: {}", new Gson().toJson(voices));
    }
}
Response example
[
    {
        "gmt_create": "2024-09-13 11:29:41",
        "voice_id": "yourVoiceId",
        "gmt_modified": "2024-09-13 11:29:41",
        "status": "OK"
    },
    {
        "gmt_create": "2024-09-13 13:22:38",
        "voice_id": "yourVoiceId",
        "gmt_modified": "2024-09-13 13:22:38",
        "status": "OK"
    }
]
Response parameters
Parameter | Type | Description |
voice_id | string | The voice ID. |
gmt_create | string | The time when the voice was created. |
gmt_modified | string | The time when the voice was modified. |
status | string | The voice status:
|
RESTful API
Basic information
URL | |
Request method | POST |
Request headers | |
Message body | The message body that contains all request parameters is as follows. You can omit optional fields as needed: Important The |
Request parameters
Parameter | Type | Default value | Required | Description |
model | string | - | Yes | The voice cloning model. The value is fixed to |
action | string | - | Yes | The operation type. The value is fixed to |
prefix | string | null | No | The custom prefix for the voice. Only digits and lowercase letters are allowed, up to 10 characters. |
page_index | integer | 0 | No | The page number index, starting from 0. |
page_size | integer | 10 | No | The number of data entries per page. |
Response parameters
Parameter | Type | Description |
voice_id | string | The voice ID. |
gmt_create | string | The time when the voice was created. |
gmt_modified | string | The time when the voice was modified. |
status | string | The voice status:
|
Query a specific voice
Retrieves the details of a specific voice.
Python SDK
API description
def query_voice(self, voice_id: str) -> List[str]:
    '''
    Queries the details of a specific voice.
    param: voice_id The ID of the voice to query.
    return: List[str] The voice details, including status, creation time, audio link, and more.
    '''
Request example
from dashscope.audio.tts_v2 import VoiceEnrollmentService
service = VoiceEnrollmentService()
voice_id = 'cosyvoice-v3-plus-myvoice-xxxxxxxx'
voice_details = service.query_voice(voice_id=voice_id)
print(f"Request ID: {service.get_last_request_id()}")
print(f"Voice Details: {voice_details}")
Response example
{
    "gmt_create": "2024-09-13 11:29:41",
    "resource_link": "https://yourAudioFileUrl",
    "target_model": "cosyvoice-v3-plus",
    "gmt_modified": "2024-09-13 11:29:41",
    "status": "OK"
}
Response parameters
Parameter | Type | Description |
resource_link | string | The URL of the audio that was cloned. |
target_model | string | The speech synthesis model that drives the voice. Recommended models are cosyvoice-v3-flash or cosyvoice-v3-plus. This must be the same as the speech synthesis model used when you later call the speech synthesis API. Otherwise, synthesis will fail. |
gmt_create | string | The time when the voice was created. |
gmt_modified | string | The time when the voice was modified. |
status | string | The voice status:
|
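The status field returned by this query can drive a reusable wait loop. The sketch below takes the query function as an argument, so it can be called with `service.query_voice` or with a stub in tests. The name `wait_until_ready`, its default attempt count, and its default interval are assumptions that mirror the end-to-end example rather than documented limits.

```python
import time
from typing import Callable


def wait_until_ready(query: Callable[[str], dict],
                     voice_id: str,
                     max_attempts: int = 30,
                     interval_s: float = 10.0,
                     sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until the voice status is OK and return the voice details.

    Raises RuntimeError if the status becomes UNDEPLOYED, and TimeoutError
    if the voice is still not ready after max_attempts queries.
    """
    for _ in range(max_attempts):
        details = query(voice_id)
        status = details.get("status")
        if status == "OK":
            return details
        if status == "UNDEPLOYED":
            raise RuntimeError(f"Voice {voice_id} was not approved")
        # Any other status, such as DEPLOYING, means the review is still running.
        sleep(interval_s)
    raise TimeoutError(f"Voice {voice_id} not ready after {max_attempts} attempts")
```

With the SDK, a call would look like `wait_until_ready(service.query_voice, voice_id)`. Passing `sleep` explicitly keeps tests fast.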
Java SDK
API description
/**
* Queries the details of a specific voice.
*
* @param voiceId The ID of the voice to query.
* @return Voice The voice details, including status, creation time, audio link, and more.
* @throws NoApiKeyException if the API key is empty.
* @throws InputRequiredException if a required parameter is empty.
*/
public Voice queryVoice(String voiceId) throws NoApiKeyException, InputRequiredException
Request example
You must import the third-party library com.google.gson.Gson.
import com.alibaba.dashscope.audio.ttsv2.enrollment.Voice;
import com.alibaba.dashscope.audio.ttsv2.enrollment.VoiceEnrollmentService;
import com.alibaba.dashscope.exception.InputRequiredException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.google.gson.Gson;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
public class Main {
    public static String apiKey = System.getenv("DASHSCOPE_API_KEY"); // If you have not configured an environment variable, replace this with your API key.
    private static String voiceId = "cosyvoice-v3-plus-myvoice-xxx"; // Replace this with your actual value.
    private static final Logger logger = LoggerFactory.getLogger(Main.class);
    public static void main(String[] args)
            throws NoApiKeyException, InputRequiredException {
        VoiceEnrollmentService service = new VoiceEnrollmentService(apiKey);
        Voice voice = service.queryVoice(voiceId);
        logger.info("Query successful. Request ID: {}", service.getLastRequestId());
        logger.info("Voice Details: {}", new Gson().toJson(voice));
    }
}
Response example
{
    "gmt_create": "2024-09-13 11:29:41",
    "resource_link": "https://yourAudioFileUrl",
    "target_model": "cosyvoice-v3-plus",
    "gmt_modified": "2024-09-13 11:29:41",
    "status": "OK"
}
Response parameters
Parameter | Type | Description |
resource_link | string | The URL of the audio that was cloned. |
target_model | string | The speech synthesis model that drives the voice. Recommended models are cosyvoice-v3-flash or cosyvoice-v3-plus. This must be the same as the speech synthesis model used when you later call the speech synthesis API. Otherwise, synthesis will fail. |
gmt_create | string | The time when the voice was created. |
gmt_modified | string | The time when the voice was modified. |
status | string | The voice status:
|
RESTful API
Basic information
URL | |
Request method | POST |
Request headers | |
Message body | The message body that contains all request parameters is as follows. You can omit optional fields as needed: Important The |
Request parameters
Parameter | Type | Default value | Required | Description |
model | string | - | Yes | The voice cloning model. The value is fixed to |
action | string | - | Yes | The operation type. The value is fixed to |
voice_id | string | - | Yes | The ID of the voice to query. |
Response parameters
Parameter | Type | Description |
resource_link | string | The URL of the audio that was cloned. |
target_model | string | The speech synthesis model that drives the voice. Recommended models are cosyvoice-v3-flash or cosyvoice-v3-plus. This must be the same as the speech synthesis model used when you later call the speech synthesis API. Otherwise, synthesis will fail. |
gmt_create | string | The time when the voice was created. |
gmt_modified | string | The time when the voice was modified. |
status | string | The voice status:
|
Update a voice
Updates an existing voice with a new audio file.
Python SDK
API description
def update_voice(self, voice_id: str, url: str) -> None:
    '''
    Updates a voice.
    param: voice_id The ID of the voice.
    param: url The URL of the audio file for voice cloning.
    '''
Request example
from dashscope.audio.tts_v2 import VoiceEnrollmentService
service = VoiceEnrollmentService()
service.update_voice(
    voice_id='cosyvoice-v3-plus-myvoice-xxxxxxxx',
    url='https://your-new-audio-file-url'
)
print(f"Update submitted. Request ID: {service.get_last_request_id()}")
Java SDK
API description
/**
* Updates a voice.
*
* @param voiceId The voice to update.
* @param url The URL of the audio file for voice cloning.
* @throws NoApiKeyException if the API key is empty.
* @throws InputRequiredException if a required parameter is empty.
*/
public void updateVoice(String voiceId, String url)
        throws NoApiKeyException, InputRequiredException
Request example
import com.alibaba.dashscope.audio.ttsv2.enrollment.VoiceEnrollmentService;
import com.alibaba.dashscope.exception.InputRequiredException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
public class Main {
    public static String apiKey = System.getenv("DASHSCOPE_API_KEY"); // If you have not configured an environment variable, replace this with your API key.
    private static String fileUrl = "https://your-audio-file-url"; // Replace this with your actual value.
    private static String voiceId = "cosyvoice-v3-plus-myvoice-xxx"; // Replace this with your actual value.
    private static final Logger logger = LoggerFactory.getLogger(Main.class);
    public static void main(String[] args)
            throws NoApiKeyException, InputRequiredException {
        VoiceEnrollmentService service = new VoiceEnrollmentService(apiKey);
        // Update the voice.
        service.updateVoice(voiceId, fileUrl);
        logger.info("Update submitted. Request ID: {}", service.getLastRequestId());
    }
}
RESTful API
Basic information
URL | |
Request method | POST |
Request headers | |
Message body | The message body that contains all request parameters is as follows. You can omit optional fields as needed: Important The |
Request parameters
Parameter | Type | Default value | Required | Description |
model | string | - | Yes | The voice cloning model. The value is fixed to |
action | string | - | Yes | The operation type. The value is fixed to |
voice_id | string | - | Yes | The ID of the voice to update. |
url | string | - | Yes | The URL of the audio file to update the voice. The URL must be publicly accessible. For information about how to record audio, see Recording guide. |
Delete a voice
Deletes a voice that is no longer needed to free up your quota. This operation is irreversible.
Python SDK
API description
def delete_voice(self, voice_id: str) -> None:
    '''
    Deletes a voice.
    param: voice_id The voice to delete.
    '''
Request example
from dashscope.audio.tts_v2 import VoiceEnrollmentService
service = VoiceEnrollmentService()
service.delete_voice(voice_id='cosyvoice-v3-plus-myvoice-xxxxxxxx')
print(f"Deletion submitted. Request ID: {service.get_last_request_id()}")
Java SDK
API description
/**
* Deletes a voice.
*
* @param voiceId The voice to delete.
* @throws NoApiKeyException if the API key is empty.
* @throws InputRequiredException if a required parameter is empty.
*/
public void deleteVoice(String voiceId) throws NoApiKeyException, InputRequiredException
Request example
import com.alibaba.dashscope.audio.ttsv2.enrollment.VoiceEnrollmentService;
import com.alibaba.dashscope.exception.InputRequiredException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
public class Main {
    public static String apiKey = System.getenv("DASHSCOPE_API_KEY"); // If you have not configured an environment variable, replace this with your API key.
    private static String voiceId = "cosyvoice-v3-plus-myvoice-xxx"; // Replace this with your actual value.
    private static final Logger logger = LoggerFactory.getLogger(Main.class);
    public static void main(String[] args)
            throws NoApiKeyException, InputRequiredException {
        VoiceEnrollmentService service = new VoiceEnrollmentService(apiKey);
        // Delete the voice.
        service.deleteVoice(voiceId);
        logger.info("Deletion submitted. Request ID: {}", service.getLastRequestId());
    }
}
RESTful API
Basic information
URL | |
Request method | POST |
Request headers | |
Message body | The message body that contains all request parameters is as follows. You can omit optional fields as needed: Important The |
Request parameters
Parameter | Type | Default value | Required | Description |
model | string | - | Yes | The voice cloning model. The value is fixed to |
action | string | - | Yes | The operation type. The value is fixed to |
voice_id | string | - | Yes | The ID of the voice to delete. |
Voice quota and automatic cleanup rules
Total limit: 1000 voices per account
This API does not provide a feature for querying the number of voices. You can call the Query voice list API to count the number of voices.
Automatic cleanup: If a voice has not been used for any speech synthesis requests in the past year, the system automatically deletes it.
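Counting voices with the Query voice list API means walking through every page. The sketch below is one way to do that. It accepts any function with the same keyword arguments as `list_voices` (prefix, page_index, page_size) and assumes that an empty page or a page shorter than page_size marks the end of the results, which this document does not state explicitly. The name `iter_all_voices` is hypothetical.

```python
from typing import Callable, Iterator, List, Optional


def iter_all_voices(list_page: Callable[..., List[dict]],
                    prefix: Optional[str] = None,
                    page_size: int = 10) -> Iterator[dict]:
    """Yield every voice by requesting pages until one comes back empty or short."""
    page_index = 0
    while True:
        page = list_page(prefix=prefix, page_index=page_index, page_size=page_size)
        if not page:
            return
        yield from page
        if len(page) < page_size:  # assumed end-of-results signal
            return
        page_index += 1
```

With the Python SDK, the total could then be computed as `sum(1 for _ in iter_all_voices(service.list_voices))` and compared against the limit of 1000 voices per account.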
Billing
Voice cloning: Creating, querying, updating, and deleting voices are free of charge.
Speech synthesis using custom voices: You are billed based on the number of text characters. For more information, see Real-time speech synthesis - CosyVoice.
Copyright and legality
You are responsible for ensuring that you have the ownership and legal rights to use the voices you provide. Read the Terms of Service.
Error codes
If you encounter an error, see Error messages for troubleshooting information.
FAQ
Features
Q: How do I adjust the speech rate and volume of a custom voice?
A: The process is the same as that for preset voices. When you call the speech synthesis API, pass the required parameters. For example, use speech_rate (Python) or speechRate (Java) to adjust the speech rate, and volume to adjust the volume. For more information, see the speech synthesis API documentation (Java SDK / Python SDK / WebSocket API).
Q: How can I make calls in other languages, such as Go, C#, or Node.js?
A: For voice management, use the RESTful API provided in this document. For speech synthesis, use the WebSocket API and pass the voice_id retrieved from cloning for the voice parameter.
Troubleshooting
If you encounter a code error, refer to the information in Error codes to troubleshoot the issue.
Q: Why can't I find the VoiceEnrollmentService class?
A: This error occurs because your SDK version is outdated. Install the latest version of the SDK.
Q: What should I do if the voice cloning result is poor, noisy, or unclear?
A: This issue is usually caused by low-quality input audio. Refer to the Recording guide to re-record and upload the audio.