CosyVoice voice cloning is accessible through the DashScope Python SDK.
Service endpoint
The SDK uses the China (Beijing) region endpoint by default. To switch to a different region, set dashscope.base_http_api_url before you initialize the client.
International
If you select the International deployment scope, model inference compute resources are dynamically scheduled worldwide, excluding the Chinese mainland. Static data is stored in your selected region. Supported region: Singapore.
https://dashscope-intl.aliyuncs.com/api/v1
Chinese mainland
If you select the Chinese mainland deployment scope, model inference compute resources are restricted to the Chinese mainland. Static data is stored in your selected region. Supported region: China (Beijing).
https://dashscope.aliyuncs.com/api/v1
Switch to the Singapore region:
import dashscope
# Set at the beginning of your code
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
Note:
API keys differ between regions. Use the API key that corresponds to the target region.
The region setting is global and affects all DashScope SDK API calls.
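Because the endpoint and the API key must belong to the same region, it can help to set both from one deployment-scope value. The following is a minimal sketch, not part of the SDK: the `ENDPOINTS` table, the `endpoint_for` and `configure_dashscope` helpers, and the environment variable names are hypothetical. Only `dashscope.base_http_api_url` and the two URLs come from the text above, and `dashscope.api_key` is assumed to be the module-level key attribute.

```python
import os

# Endpoint URLs as listed above, keyed by a hypothetical scope name.
ENDPOINTS = {
    "intl": "https://dashscope-intl.aliyuncs.com/api/v1",  # Singapore
    "cn": "https://dashscope.aliyuncs.com/api/v1",         # China (Beijing)
}


def endpoint_for(scope: str) -> str:
    """Return the base URL for a deployment scope, or raise on an unknown one."""
    try:
        return ENDPOINTS[scope]
    except KeyError:
        raise ValueError(f"unknown deployment scope: {scope!r}") from None


def configure_dashscope(scope: str) -> None:
    """Point the SDK at one region and load that region's API key.

    The environment variable names are hypothetical; keys differ per region,
    so each scope reads its own variable.
    """
    import dashscope  # imported lazily so the helpers above work without the SDK

    dashscope.base_http_api_url = endpoint_for(scope)
    dashscope.api_key = os.environ[f"DASHSCOPE_API_KEY_{scope.upper()}"]
```

Call `configure_dashscope("intl")` once at startup, before any other SDK call, since the setting is global.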
VoiceEnrollmentService class
Package path: dashscope.audio.tts_v2.VoiceEnrollmentService
Purpose: Manage the lifecycle of CosyVoice cloned voices, including creating, querying, updating, and deleting voices.
Constructor
VoiceEnrollmentService()
create_voice() — Create a voice
Method signature:
def create_voice(self, target_model: str, prefix: str, url: str,
                 language_hints: List[str] = None,
                 max_prompt_audio_length: float = None,
                 enable_preprocess: bool = None) -> str
Parameters:
Parameter | Type | Required | Description |
target_model | str | Yes | The text-to-speech (TTS) model that drives the cloned voice. It must match the model you specify when calling the TTS API; otherwise, synthesis fails. |
prefix | str | Yes | A prefix for the voice name. Only alphanumeric characters are allowed, with a maximum length of 10 characters. The resulting voice name follows this format: |
url | str | Yes | The URL of the audio file for voice cloning. The URL must be publicly accessible. |
language_hints | List[str] | No | Important: Applies only to CosyVoice voice cloning (when model is [...]). Helps the model identify the language of the sample audio to extract voice features more accurately and improve cloning quality. If the specified language doesn't match the actual audio language (for example, setting [...]) [...]. This parameter is an array, but the current version processes only the first element. Valid values vary by model: [...]. Default: ["zh"]. |
max_prompt_audio_length | float | No | Important: Applies only to CosyVoice voice cloning (when model is [...]). The maximum duration (in seconds) of the reference audio after preprocessing. Valid values: [3.0, 30.0]. Longer durations produce better results. Default: 10.0. |
enable_preprocess | bool | No | Important: Applies only to CosyVoice voice cloning (when model is [...]). Whether to enable audio preprocessing (noise reduction, audio enhancement, and volume normalization). Enable this for recordings with background noise. Disable it for recordings in quiet environments to preserve the original voice characteristics. Default: false. |
Return value: str — The voice ID (voice_id).
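The limits in the parameter table can be checked locally before a request is sent. The sketch below assumes that "alphanumeric" means ASCII letters and digits, which the table does not state. `check_create_voice_args` and `create_checked_voice` are hypothetical helpers, and `service` stands for a `VoiceEnrollmentService()` instance.

```python
import re
from typing import List, Optional

# Assumption: "alphanumeric" means ASCII letters and digits only.
_PREFIX_RE = re.compile(r"[A-Za-z0-9]{1,10}")


def check_create_voice_args(prefix: str,
                            max_prompt_audio_length: Optional[float] = None) -> None:
    """Raise ValueError if an argument violates the documented limits."""
    if not _PREFIX_RE.fullmatch(prefix):
        raise ValueError("prefix must be 1-10 alphanumeric characters")
    if (max_prompt_audio_length is not None
            and not 3.0 <= max_prompt_audio_length <= 30.0):
        raise ValueError("max_prompt_audio_length must be within [3.0, 30.0]")


def create_checked_voice(service, target_model: str, prefix: str, url: str,
                         language_hints: Optional[List[str]] = None,
                         max_prompt_audio_length: Optional[float] = None,
                         enable_preprocess: Optional[bool] = None) -> str:
    """Validate locally, then call create_voice() and return the voice ID."""
    check_create_voice_args(prefix, max_prompt_audio_length)
    return service.create_voice(
        target_model=target_model,
        prefix=prefix,
        url=url,
        language_hints=language_hints,
        max_prompt_audio_length=max_prompt_audio_length,
        enable_preprocess=enable_preprocess,
    )
```

Validating first avoids a network round trip for inputs the service would reject anyway. The server remains the authority on what is accepted.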
list_voice() — List voices
Method signature:
def list_voice(self, prefix: str = None, page_index: int = 0, page_size: int = 10) -> list
Parameters:
Parameter | Type | Required | Description |
prefix | str | No | Filter by voice name prefix. |
page_index | int | No | Page index. Default: 0. |
page_size | int | No | Number of entries per page. Default: 10. |
Return value: list — A list of voices.
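Since `list_voice()` returns one page per call, collecting every voice means advancing `page_index` until the results run out. The sketch below assumes that a page shorter than `page_size` (or an empty page) marks the end, which the reference does not state. `iter_voices` is a hypothetical helper, and `service` stands for a `VoiceEnrollmentService()` instance.

```python
from typing import Iterator, Optional


def iter_voices(service, prefix: Optional[str] = None,
                page_size: int = 10) -> Iterator:
    """Yield every entry from list_voice(), fetching one page at a time."""
    page_index = 0
    while True:
        page = service.list_voice(prefix=prefix, page_index=page_index,
                                  page_size=page_size)
        yield from page
        # Assumption: a page shorter than page_size (or empty) is the last one.
        if len(page) < page_size:
            break
        page_index += 1
```

For example, `all_voices = list(iter_voices(service, prefix="demo"))` gathers all pages into one list.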
query_voice() — Query voice details
Method signature:
def query_voice(self, voice_id: str) -> dict
Parameters:
Parameter | Type | Required | Description |
voice_id | str | Yes | The voice ID to query. |
Return value: dict — Voice details.
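A common use of `query_voice()` is to wait until a newly created voice is usable. The reference does not list the fields of the returned dict, so the sketch below rests on an assumption: that the dict carries a `"status"` key whose value becomes `"OK"` when the voice is ready. Check both against what your SDK version returns. `wait_until_ready` is a hypothetical helper.

```python
import time
from typing import Callable


def wait_until_ready(service, voice_id: str, ready_status: str = "OK",
                     timeout_s: float = 60.0, interval_s: float = 2.0,
                     sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll query_voice() until the voice reports ready_status or time runs out.

    Assumption: the returned dict has a "status" key. The key name and the
    ready value are not confirmed by the reference above.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        details = service.query_voice(voice_id)
        if details.get("status") == ready_status:
            return details
        if time.monotonic() >= deadline:
            raise TimeoutError(f"voice {voice_id} not ready after {timeout_s}s")
        sleep(interval_s)
```

The `sleep` parameter exists only so that the wait can be replaced in tests. Production code can leave it at its default.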
update_voice() — Update a voice
Method signature:
def update_voice(self, voice_id: str, url: str, language_hints: List[str] = None,
                 max_prompt_audio_length: float = None, enable_preprocess: bool = None) -> None
Parameters:
Parameter | Type | Required | Description |
voice_id | str | Yes | The voice ID to update. |
url | str | Yes | The new audio file URL. |
language_hints | List[str] | No | Language hints for the sample audio. |
max_prompt_audio_length | float | No | Maximum duration of the reference audio. |
enable_preprocess | bool | No | Whether to enable audio preprocessing. |
delete_voice() — Delete a voice
Method signature:
def delete_voice(self, voice_id: str) -> None
Parameters:
Parameter | Type | Required | Description |
voice_id | str | Yes | The voice ID to delete. |
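The four lifecycle methods can be chained as in the sketch below, which uses only the signatures documented above. `run_voice_lifecycle` is a hypothetical helper, `service` stands for a `VoiceEnrollmentService()` instance, and the model name and audio URLs are supplied by the caller.

```python
def run_voice_lifecycle(service, target_model: str, prefix: str,
                        first_url: str, second_url: str) -> dict:
    """Create a voice, inspect it, replace its audio, then delete it.

    Returns the details seen right after the update.
    """
    voice_id = service.create_voice(target_model=target_model,
                                    prefix=prefix, url=first_url)
    try:
        service.query_voice(voice_id)  # details right after creation
        service.update_voice(voice_id=voice_id, url=second_url)
        return service.query_voice(voice_id)  # details after the update
    finally:
        # Always clean up so a failed step does not leave the voice behind.
        service.delete_voice(voice_id)
```

The `try`/`finally` block is a design choice for throwaway voices. Drop the `delete_voice()` call when the voice should persist for later synthesis.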