Alibaba Cloud Model Studio: Voice cloning Python SDK reference

Last Updated: May 12, 2026

CosyVoice voice cloning is accessible through the DashScope Python SDK.

Service endpoint

The SDK uses the China (Beijing) region endpoint by default. To switch to a different region, set dashscope.base_http_api_url before you initialize the client.

International

If you select the International deployment scope, model inference compute resources are dynamically scheduled worldwide, excluding the Chinese mainland. Static data is stored in your selected region. Supported region: Singapore.

https://dashscope-intl.aliyuncs.com/api/v1

Chinese mainland

If you select the Chinese mainland deployment scope, model inference compute resources are restricted to the Chinese mainland. Static data is stored in your selected region. Supported region: China (Beijing).

https://dashscope.aliyuncs.com/api/v1

Switch to the Singapore region:

import dashscope

# Set at the beginning of your code
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

Note:
  • API keys differ between regions. Use the API key that corresponds to the target region.
  • The region setting is global and affects all DashScope SDK API calls.
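Because the endpoint and the API key must belong to the same region, it can help to set both in one place. The following is a minimal sketch, not an official helper: the scope labels "intl" and "cn" and the environment variable names DASHSCOPE_API_KEY_INTL and DASHSCOPE_API_KEY_CN are hypothetical, and the dashscope import is deferred so that the lookup function can be used without the SDK installed.

```python
import os

# Endpoints copied from the "Service endpoint" section above.
# The "intl" and "cn" labels are illustrative names, not SDK constants.
ENDPOINTS = {
    "intl": "https://dashscope-intl.aliyuncs.com/api/v1",  # Singapore
    "cn": "https://dashscope.aliyuncs.com/api/v1",         # China (Beijing)
}


def resolve_endpoint(scope: str) -> str:
    """Return the base URL for a deployment scope ("intl" or "cn")."""
    try:
        return ENDPOINTS[scope]
    except KeyError:
        raise ValueError(f"unknown deployment scope: {scope!r}") from None


def configure_region(scope: str) -> None:
    """Point the SDK at one region and load the API key for that region."""
    import dashscope  # deferred import: resolve_endpoint() works without the SDK

    dashscope.base_http_api_url = resolve_endpoint(scope)
    # Hypothetical variable names: one key per region, because keys differ.
    dashscope.api_key = os.environ[f"DASHSCOPE_API_KEY_{scope.upper()}"]
```

Calling configure_region("intl") once at program start would then apply to every later DashScope call, since the setting is global.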

VoiceEnrollmentService class

Package path: dashscope.audio.tts_v2.VoiceEnrollmentService

Purpose: Manage the lifecycle of CosyVoice cloned voices, including creating, querying, updating, and deleting voices.

Constructor

VoiceEnrollmentService()

create_voice() — Create a voice

Method signature:

def create_voice(self, target_model: str, prefix: str, url: str,
                 language_hints: List[str] = None,
                 max_prompt_audio_length: float = None,
                 enable_preprocess: bool = None) -> str

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| target_model | str | Yes | The text-to-speech (TTS) model that drives the cloned voice. It must match the model you specify when calling the TTS API; otherwise, synthesis fails. |
| prefix | str | Yes | A prefix for the voice name. Only alphanumeric characters are allowed, with a maximum length of 10 characters. The resulting voice name follows this format: {target_model}-{prefix}-{unique_id}. |
| url | str | Yes | The URL of the audio file for voice cloning. The URL must be publicly accessible. |
| language_hints | List[str] | No | Important: Applies only to CosyVoice voice cloning (when model is voice-enrollment). Supported only by cosyvoice-v3.5-plus, v3.5-flash, v3-plus, and v3-flash. Helps the model identify the language of the sample audio to extract voice features more accurately and improve cloning quality. If the specified language doesn't match the actual audio language (for example, setting en when the audio is in Chinese), the system ignores this value and detects the language automatically. This parameter is an array, but the current version processes only the first element. Valid values vary by model. cosyvoice-v3-plus: zh (Chinese), en (English), fr (French), de (German), ja (Japanese), ko (Korean), ru (Russian). cosyvoice-v3.5-plus, cosyvoice-v3.5-flash, cosyvoice-v3-flash: zh (Chinese), en (English), fr (French), de (German), ja (Japanese), ko (Korean), ru (Russian), pt (Portuguese), th (Thai), id (Indonesian), vi (Vietnamese). Default: ["zh"]. |
| max_prompt_audio_length | float | No | Important: Applies only to CosyVoice voice cloning (when model is voice-enrollment). Supported only by cosyvoice-v3.5-plus, v3.5-flash, and v3-flash. The maximum duration (in seconds) of the reference audio after preprocessing. Valid values: [3.0, 30.0]. Longer durations produce better results. Default: 10.0. |
| enable_preprocess | bool | No | Important: Applies only to CosyVoice voice cloning (when model is voice-enrollment). Supported only by cosyvoice-v3.5-plus, v3.5-flash, and v3-flash. Whether to enable audio preprocessing (noise reduction, audio enhancement, and volume normalization). Enable this for recordings with background noise. Disable it for recordings in quiet environments to preserve the original voice characteristics. Default: false. |

Return value: str — The voice ID (voice_id).
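The constraints on prefix and max_prompt_audio_length can be checked locally before the request is sent. The sketch below is illustrative only: check_create_args and create_cloned_voice are hypothetical helper names, and reading "alphanumeric" as ASCII letters and digits is an assumption. The create_voice() call itself uses the signature documented above.

```python
import re
from typing import List, Optional

# Assumption: "alphanumeric" means ASCII letters and digits, 1 to 10 characters.
_PREFIX_RE = re.compile(r"[A-Za-z0-9]{1,10}")


def check_create_args(prefix: str,
                      max_prompt_audio_length: Optional[float] = None) -> None:
    """Raise ValueError if an argument violates the documented limits."""
    if not _PREFIX_RE.fullmatch(prefix):
        raise ValueError("prefix must be 1-10 alphanumeric characters")
    if (max_prompt_audio_length is not None
            and not 3.0 <= max_prompt_audio_length <= 30.0):
        raise ValueError("max_prompt_audio_length must be within [3.0, 30.0]")


def create_cloned_voice(target_model: str, prefix: str, url: str,
                        language_hints: Optional[List[str]] = None,
                        max_prompt_audio_length: Optional[float] = None,
                        enable_preprocess: Optional[bool] = None) -> str:
    """Validate locally, then create the voice and return its voice_id."""
    # Deferred import so the validator above can be used without the SDK.
    from dashscope.audio.tts_v2 import VoiceEnrollmentService

    check_create_args(prefix, max_prompt_audio_length)
    service = VoiceEnrollmentService()
    return service.create_voice(
        target_model=target_model,
        prefix=prefix,
        url=url,
        language_hints=language_hints,
        max_prompt_audio_length=max_prompt_audio_length,
        enable_preprocess=enable_preprocess,
    )
```

The returned voice_id is the value to keep: it identifies the voice in query_voice(), update_voice(), and delete_voice().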

list_voice() — List voices

Method signature:

def list_voice(self, prefix: str = None, page_index: int = 0, page_size: int = 10) -> list

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| prefix | str | No | Filter by voice name prefix. |
| page_index | int | No | Page index. Default: 0. |
| page_size | int | No | Number of entries per page. Default: 10. |

Return value: list — A list of voices.
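Since list_voice() returns one page at a time, collecting every voice means stepping page_index. The following sketch shows one way to do that. It assumes that a page shorter than page_size (or an empty page) marks the end of the results, which this reference does not state, and iter_voices is a hypothetical helper name. The function takes the list method as an argument, so it can be exercised without network access.

```python
from typing import Callable, Iterator, Optional


def iter_voices(list_page: Callable[..., list],
                prefix: Optional[str] = None,
                page_size: int = 10) -> Iterator:
    """Yield every entry by requesting pages until a short or empty page."""
    page_index = 0
    while True:
        page = list_page(prefix=prefix, page_index=page_index,
                         page_size=page_size)
        if not page:
            return
        yield from page
        # Assumption: fewer entries than requested means this was the last page.
        if len(page) < page_size:
            return
        page_index += 1
```

With the SDK installed, the call would look like iter_voices(VoiceEnrollmentService().list_voice, prefix="demo").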

query_voice() — Query voice details

Method signature:

def query_voice(self, voice_id: str) -> dict

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| voice_id | str | Yes | The voice ID to query. |

Return value: dict — Voice details.
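A common use of query_voice() is to poll a newly created voice until it is usable. The sketch below is an illustration under stated assumptions: this reference does not document the fields of the returned dictionary, so the "status" key and the "OK" value are assumptions to be checked against a real response, and wait_until_ready is a hypothetical helper name.

```python
import time
from typing import Callable


def wait_until_ready(query: Callable[[str], dict],
                     voice_id: str,
                     ready_status: str = "OK",   # assumed status value
                     attempts: int = 30,
                     delay_seconds: float = 10.0,
                     sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll query(voice_id) until the assumed "status" field is ready."""
    for attempt in range(attempts):
        details = query(voice_id)
        # Assumption: the details dict carries a "status" field.
        if details.get("status") == ready_status:
            return details
        if attempt < attempts - 1:
            sleep(delay_seconds)
    raise TimeoutError(f"voice {voice_id} not ready after {attempts} attempts")
```

With the SDK installed, query would be VoiceEnrollmentService().query_voice. The sleep parameter exists only so that the loop can be tested without waiting.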

update_voice() — Update a voice

Method signature:

def update_voice(self, voice_id: str, url: str, language_hints: List[str] = None,
                 max_prompt_audio_length: float = None, enable_preprocess: bool = None) -> None

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| voice_id | str | Yes | The voice ID to update. |
| url | str | Yes | The new audio file URL. |
| language_hints | List[str] | No | Language hints for the sample audio. |
| max_prompt_audio_length | float | No | Maximum duration of the reference audio. |
| enable_preprocess | bool | No | Whether to enable audio preprocessing. |
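Because update_voice() returns None, reading the voice back with query_voice() is one way to confirm the result of an update. The sketch below assumes only the two signatures documented in this reference. replace_sample is a hypothetical helper name, and the service argument is expected to be a VoiceEnrollmentService instance (or any object with the same two methods).

```python
# Optional keyword arguments accepted by update_voice(), per the signature above.
ALLOWED_OPTIONS = {"language_hints", "max_prompt_audio_length",
                   "enable_preprocess"}


def replace_sample(service, voice_id: str, url: str, **options) -> dict:
    """Update a voice with a new audio URL, then return its current details."""
    unknown = set(options) - ALLOWED_OPTIONS
    if unknown:
        raise TypeError(f"unsupported options: {sorted(unknown)}")
    service.update_voice(voice_id=voice_id, url=url, **options)
    # update_voice() returns None, so read the details back explicitly.
    return service.query_voice(voice_id)
```

Rejecting unknown option names up front keeps a typo from surfacing later as a less specific error raised inside the SDK call.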

delete_voice() — Delete a voice

Method signature:

def delete_voice(self, voice_id: str) -> None

Parameters:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| voice_id | str | Yes | The voice ID to delete. |
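Combining list_voice() with delete_voice() gives a simple cleanup routine for voices that share a prefix. The following is a sketch under stated assumptions: each list entry is assumed to be a dictionary with a "voice_id" key, and a page shorter than page_size is assumed to be the last one. Neither detail is documented in this reference. delete_voices_with_prefix is a hypothetical helper name.

```python
def delete_voices_with_prefix(service, prefix: str, page_size: int = 10) -> list:
    """Delete every voice whose name uses the prefix; return the deleted IDs."""
    voice_ids = []
    page_index = 0
    # Collect all IDs first: deleting while paging would shift later pages.
    while True:
        page = service.list_voice(prefix=prefix, page_index=page_index,
                                  page_size=page_size)
        # Assumption: each entry is a dict that contains a "voice_id" key.
        voice_ids.extend(entry["voice_id"] for entry in page)
        if len(page) < page_size:
            break
        page_index += 1
    for voice_id in voice_ids:
        service.delete_voice(voice_id)
    return voice_ids
```

Here, too, service stands for a VoiceEnrollmentService instance. Logging the returned IDs before relying on this routine is a reasonable precaution, since a deleted voice can no longer be used for synthesis.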