Voice Design HTTP API reference - Alibaba Cloud Model Studio

Use the Voice Design HTTP API to create, list, query, and delete custom voices.

Endpoint

International

If you select the International deployment scope, model inference compute resources are dynamically scheduled worldwide, excluding the Chinese mainland. Static data is stored in your selected region. Supported region: Singapore.

POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization

Chinese mainland

If you select the Chinese mainland deployment scope, model inference compute resources are restricted to the Chinese mainland. Static data is stored in your selected region. Supported region: China (Beijing).

POST https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization

Request headers

Parameter	Type	Required	Description
Authorization	string	Yes	Authorization token in the format `Bearer <your_api_key>`. Replace `<your_api_key>` with your actual API key.
Content-Type	string	Yes	Media type of the request body. Set to `application/json`.
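
The endpoint and headers above can be assembled in a few lines of Python. The following is a minimal sketch using only the standard library, not an official SDK; the names `ENDPOINTS`, `build_request`, and `send` are illustrative assumptions, as are the `intl` and `cn` scope keys.

```python
import json
import urllib.request

# Endpoint URLs copied from the Endpoint section above.
# The "intl" / "cn" keys are hypothetical labels chosen for this sketch.
ENDPOINTS = {
    "intl": "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization",
    "cn": "https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization",
}


def build_request(payload, api_key, scope="intl"):
    """Build (but do not send) a POST request carrying the two required headers."""
    return urllib.request.Request(
        ENDPOINTS[scope],
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout=60):
    """Send a prepared request and parse the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Building the request separately from sending it makes it possible to inspect the URL, headers, and body before any network call is made.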

Create a voice

Request body

The following example uses the Singapore region URL. To use a model deployed in the Beijing region, replace the URL with: `https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization`.

CosyVoice voice design

curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "voice-enrollment",
    "input": {
        "action": "create_voice",
        "target_model": "cosyvoice-v3.5-plus",
        "voice_prompt": "A composed middle-aged male announcer with a deep, rich and magnetic voice, a steady speaking speed and clear articulation, is suitable for news broadcasting or documentary commentary.",
        "preview_text": "Dear listeners, hello everyone. Welcome to the evening news.",
        "prefix": "announcer",
        "language_hints": ["en"]
    },
    "parameters": {
        "sample_rate": 24000,
        "response_format": "wav"
    }
}'

Qwen voice design

curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen-voice-design",
    "input": {
        "action": "create",
        "target_model": "qwen3-tts-vd-realtime-2026-01-15",
        "preferred_name": "announcer",
        "voice_prompt": "A composed middle-aged male announcer with a deep, rich and magnetic voice, a steady speaking speed and clear articulation, is suitable for news broadcasting or documentary commentary.",
        "preview_text": "Dear listeners, hello everyone. Welcome to the evening news.",
        "language": "en"
    },
    "parameters": {
        "sample_rate": 24000,
        "response_format": "wav"
    }
}'

model `string` (Required)

The voice design model. Valid values:

`voice-enrollment`: CosyVoice voice design.
`qwen-voice-design`: Qwen voice design.

input `object` (Required)

The input parameter object.

Properties

action `string` (Required)

The operation type.

CosyVoice (`voice-enrollment`): Set to `create_voice`.
Qwen (`qwen-voice-design`): Set to `create`.

target_model `string` (Required)

The text-to-speech (TTS) model that drives the voice. This value must match the model used when you call the TTS API. A mismatch causes synthesis to fail.

voice_prompt `string` (Required)

A voice description of the desired voice characteristics. Only Chinese and English are supported.

CosyVoice (`voice-enrollment`): Maximum 500 characters.
Qwen (`qwen-voice-design`): Maximum 2,048 characters.

preview_text `string` (Required)

The text for the preview audio.

CosyVoice (`voice-enrollment`): Maximum 200 characters. Chinese and English are supported.
Qwen (`qwen-voice-design`): Maximum 1,024 characters. Chinese, English, German, Italian, Portuguese, Spanish, Japanese, Korean, French, and Russian are supported.

prefix `string` (Conditionally required)

Important

Only applicable to CosyVoice (when model is `voice-enrollment`).

The voice name prefix. Only digits and letters are allowed, with a maximum of 10 characters. The generated voice name follows the format: `{target_model}-vd-{prefix}-{unique_id}`

preferred_name `string` (Conditionally required)

Important

Only applicable to Qwen (when model is `qwen-voice-design`).

The voice name prefix. Only digits, letters, and underscores are allowed, with a maximum of 16 characters.

language_hints `array[string]` (Optional)

Important

Only applicable to CosyVoice (when model is `voice-enrollment`).

The language hints for the generated voice. This determines the voice's language characteristics and pronunciation patterns. Set this to the language that matches your use case. The specified language must match the language of `preview_text`. Currently, only the first element is used.

Valid values:

zh: Chinese
en: English

Default: ["zh"].

language `string` (Optional)

Important

Only applicable to Qwen (when model is `qwen-voice-design`).

The language hints for the generated voice. This determines the voice's language characteristics and pronunciation patterns. Set this to the language that matches your use case. The specified language must match the language of `preview_text`.

Valid values:

zh: Chinese
en: English
de: German
it: Italian
pt: Portuguese
es: Spanish
ja: Japanese
ko: Korean
fr: French
ru: Russian

Default: zh.

parameters `object` (Optional)

Configuration for voice design.

Properties

sample_rate `int` (Optional)

Sample rate of the preview audio, in Hz.

CosyVoice: 16000, 24000, or 48000.
Qwen: 8000, 16000, 24000, or 48000.

Default: 24000.

response_format `string` (Optional)

Format of the preview audio.

CosyVoice: pcm, wav, or mp3.
Qwen: pcm, wav, mp3, or opus.

Default: wav.
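
Because the two models differ in action names, name fields, length limits, and supported audio settings, a local check before sending a create request can catch simple mistakes. The sketch below transcribes the limits listed above into a lookup table. `LIMITS` and `check_create_payload` are hypothetical names, and two details are assumptions: that "letters" means ASCII letters, and that "characters" can be approximated by Python's `len()`. The service may count or validate differently.

```python
import re

# Limits transcribed from the parameter descriptions above.
# Assumption: "letters" means ASCII letters; the service may accept more.
LIMITS = {
    "voice-enrollment": {
        "action": "create_voice",
        "name_field": "prefix",
        "name_pattern": r"[A-Za-z0-9]{1,10}",
        "voice_prompt_max": 500,
        "preview_text_max": 200,
        "sample_rates": {16000, 24000, 48000},
        "formats": {"pcm", "wav", "mp3"},
    },
    "qwen-voice-design": {
        "action": "create",
        "name_field": "preferred_name",
        "name_pattern": r"[A-Za-z0-9_]{1,16}",
        "voice_prompt_max": 2048,
        "preview_text_max": 1024,
        "sample_rates": {8000, 16000, 24000, 48000},
        "formats": {"pcm", "wav", "mp3", "opus"},
    },
}


def check_create_payload(payload):
    """Return a list of problems found in a create-voice payload (empty if none)."""
    limits = LIMITS.get(payload.get("model"))
    if limits is None:
        return ["model must be 'voice-enrollment' or 'qwen-voice-design'"]
    problems = []
    inp = payload.get("input", {})
    params = payload.get("parameters", {})
    if inp.get("action") != limits["action"]:
        problems.append("action must be " + repr(limits["action"]))
    if not inp.get("target_model"):
        problems.append("target_model is required")
    name_field = limits["name_field"]
    if not re.fullmatch(limits["name_pattern"], inp.get(name_field, "")):
        problems.append(name_field + " is missing or malformed")
    for field in ("voice_prompt", "preview_text"):
        text = inp.get(field, "")
        max_len = limits[field + "_max"]
        if not text:
            problems.append(field + " is required")
        elif len(text) > max_len:  # len() counts code points: an approximation
            problems.append(f"{field} exceeds {max_len} characters")
    # Omitted parameters fall back to the documented defaults (24000, wav).
    if params.get("sample_rate", 24000) not in limits["sample_rates"]:
        problems.append("unsupported sample_rate")
    if params.get("response_format", "wav") not in limits["formats"]:
        problems.append("unsupported response_format")
    return problems
```

An empty list means the payload passed these local checks only; the service remains the authority on what it accepts.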

Response body

CosyVoice voice design

{
    "output": {
        "preview_audio": {
            "data": "{base64_encoded_audio}",
            "sample_rate": 24000,
            "response_format": "wav"
        },
        "target_model": "cosyvoice-v3.5-plus",
        "voice_id": "cosyvoice-v3.5-plus-vd-announcer-xxxxxx"
    },
    "usage": {
        "count": 1
    },
    "request_id": "xxxx-xxxx-xxxx"
}

Qwen voice design

{
    "output": {
        "preview_audio": {
            "data": "{base64_encoded_audio}",
            "sample_rate": 24000,
            "response_format": "wav"
        },
        "target_model": "qwen3-tts-vd-realtime-2026-01-15",
        "voice": "yourVoice"
    },
    "usage": {
        "count": 1
    },
    "request_id": "xxxx-xxxx-xxxx"
}

Important

CosyVoice returns the `voice_id` field, while Qwen returns the `voice` field.

request_id `string`

The unique identifier of this request.

output `object`

The data returned by the model.

Properties

voice_id / voice `string`

The voice ID. CosyVoice returns `voice_id`, and Qwen returns `voice`. Use this value directly as the voice parameter in the TTS API.

preview_audio `object`

The preview audio data.

Properties

data `string`

The preview audio data, Base64-encoded.

sample_rate `int`

The sample rate of the preview audio, in Hz.

response_format `string`

The format of the preview audio.

target_model `string`

The TTS model that drives the voice.

usage `object`

Usage information for this request.

Properties

count `integer`

The number of voices created. Always 1.
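
Since the preview audio arrives Base64-encoded and the voice identifier lives under a different key for each model, a small amount of client code is usually needed to handle a create response. The following sketch assumes the response has already been parsed into a Python dict; `voice_id_of` and `save_preview` are hypothetical helper names, and using `response_format` as the file extension is a convenience chosen for this example.

```python
import base64


def voice_id_of(output):
    """Return the voice identifier from an `output` object of either model."""
    # CosyVoice uses `voice_id`; Qwen uses `voice`.
    return output.get("voice_id") or output.get("voice")


def save_preview(response, path_stem):
    """Decode the Base64 preview audio and write it to `<path_stem>.<format>`."""
    preview = response["output"]["preview_audio"]
    audio_bytes = base64.b64decode(preview["data"])
    # Assumption: the response_format value (for example "wav") doubles as the extension.
    path = f"{path_stem}.{preview['response_format']}"
    with open(path, "wb") as audio_file:
        audio_file.write(audio_bytes)
    return path
```

The value returned by `voice_id_of` is what the reference above says to pass as the voice parameter in the TTS API.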

List voices

Request body

The following example uses the Singapore region URL. To use a model deployed in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization.

CosyVoice

curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "voice-enrollment",
    "input": {
        "action": "list_voice",
        "prefix": "myvoice",
        "page_size": 10,
        "page_index": 0
    }
}'

Qwen voice design

curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen-voice-design",
    "input": {
        "action": "list",
        "page_size": 10,
        "page_index": 0
    }
}'

model string (Required)

The voice design model. Valid values:

voice-enrollment: CosyVoice voice design.
qwen-voice-design: Qwen voice design.

input object (Required)

The input parameter object.

Properties

action string (Required)

The operation type. CosyVoice: list_voice. Qwen: list.

prefix string (Optional)

Important

Only applicable to CosyVoice.

Filter voices by name prefix.

page_index integer (Optional)

The page index.

page_size integer (Optional)

The number of entries per page.

Response body

CosyVoice

{
    "output": {
        "voice_list": [
            {
                "voice_id": "cosyvoice-v3.5-plus-vd-announcer-xxxxxx",
                "gmt_create": "2025-12-10 14:54:09",
                "gmt_modified": "2025-12-10 17:47:48",
                "status": "OK",
                "voice_prompt": "A composed middle-aged male announcer with a deep, rich and magnetic voice, a steady speaking speed and clear articulation, is suitable for news broadcasting or documentary commentary.",
                "preview_text": "Dear listeners, hello everyone. Welcome to the evening news."
            }
        ]
    },
    "usage": {
        "count": 1
    },
    "request_id": "xxxx-xxxx-xxxx"
}

Qwen

{
    "output": {
        "page_index": 0,
        "page_size": 10,
        "total_count": 1,
        "voice_list": [
            {
                "voice": "yourVoice",
                "gmt_create": "2025-08-11 17:59:32",
                "gmt_modified": "2025-08-11 17:59:32",
                "language": "zh",
                "target_model": "qwen3-tts-vd-realtime-2026-01-15",
                "voice_prompt": "A composed middle-aged male announcer with a deep, rich and magnetic voice, a steady speaking speed and clear articulation, is suitable for news broadcasting or documentary commentary.",
                "preview_text": "Dear listeners, hello everyone. Welcome to the evening news."
            }
        ]
    },
    "usage": {
        "count": 0
    },
    "request_id": "xxxx-xxxx-xxxx"
}

Important

CosyVoice returns a `voice_list` array where each item contains a `voice_id` field. Qwen also returns a `voice_list` array, but each item contains a `voice` field instead. The Qwen output also includes the `page_index`, `page_size`, and `total_count` pagination fields.

request_id `string`

The unique identifier of this request.

output `object`

The data returned by the model.

Properties

page_index `integer`

Important

Returned by Qwen only.

The current page index.

page_size `integer`

Important

Returned by Qwen only.

The number of entries per page.

total_count `integer`

Important

Returned by Qwen only.

The total number of voices.

voice_list `array[object]`

The list of voices returned by the query.

Properties

voice_id / voice `string`

The voice ID. CosyVoice uses `voice_id`, and Qwen uses `voice`.

gmt_create `string`

The creation time.

gmt_modified `string`

The last modification time.

status `string`

Important

Returned by CosyVoice only.

The voice status. For valid values, see "Voice status reference".

target_model `string`

Important

Returned by Qwen only.

The TTS model that drives the voice.

language `string`

The voice language.

voice_prompt `string`

The voice description text.

preview_text `string`

The preview audio text.

usage `object`

Usage information for this request.

Properties

count `integer`

CosyVoice: always 1. Qwen: always 0.
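
Listing every voice usually means walking through pages. The sketch below shows one way to do that for either model. It takes a `call` argument, which stands for any function that posts a payload to the endpoint and returns the parsed JSON response. `list_payload` and `iter_voices` are hypothetical names, and two behaviors are assumptions rather than documented facts: that `page_index` starts at 0 (as in the examples above), and that a page shorter than `page_size` marks the end of the list.

```python
def list_payload(model, page_index, page_size=10, prefix=None):
    """Build the request body for one page of the voice list."""
    action = "list_voice" if model == "voice-enrollment" else "list"
    payload_input = {
        "action": action,
        "page_index": page_index,
        "page_size": page_size,
    }
    # `prefix` is documented for CosyVoice only, so it is dropped for Qwen.
    if prefix is not None and model == "voice-enrollment":
        payload_input["prefix"] = prefix
    return {"model": model, "input": payload_input}


def iter_voices(call, model, page_size=10, prefix=None):
    """Yield every voice entry, requesting pages until one comes back short."""
    page_index = 0  # assumption: pages are zero-based, as in the examples
    while True:
        response = call(list_payload(model, page_index, page_size, prefix))
        voices = response["output"].get("voice_list", [])
        yield from voices
        if len(voices) < page_size:  # assumption: a short page is the last page
            return
        page_index += 1
```

For Qwen, the `total_count` field in the response offers another stopping condition; the short-page rule is used here because it works without that field.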

Query voice details

Request body

The following example uses the Singapore region URL. To use a model deployed in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization.

CosyVoice

curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "voice-enrollment",
    "input": {
        "action": "query_voice",
        "voice_id": "yourVoiceId"
    }
}'

Qwen voice design

curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen-voice-design",
    "input": {
        "action": "query",
        "voice": "yourVoice"
    }
}'

model string (Required)

The voice design model. Valid values:

voice-enrollment: CosyVoice voice design.
qwen-voice-design: Qwen voice design.

input object (Required)

The input parameter object.

Properties

action string (Required)

The operation type. CosyVoice: query_voice. Qwen voice design: query.

voice_id string (Conditionally required)

Important

Only applicable to CosyVoice.

The voice ID to query.

voice string (Conditionally required)

Important

Only applicable to Qwen voice design (when model is qwen-voice-design).

The voice name to query.
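
Query and delete requests in this reference share one shape: an action name plus a single identifier field, both of which depend on the model. A small lookup table, sketched below, keeps those differences in one place. `ACTIONS` and `voice_payload` are hypothetical names; the action and field values are taken from the request examples in this reference.

```python
# Action names and identifier fields per model, as used in the request examples.
ACTIONS = {
    "voice-enrollment": {
        "query": "query_voice",
        "delete": "delete_voice",
        "id_field": "voice_id",
    },
    "qwen-voice-design": {
        "query": "query",
        "delete": "delete",
        "id_field": "voice",
    },
}


def voice_payload(model, operation, voice):
    """Build a query or delete request body for the given model and voice."""
    spec = ACTIONS[model]
    return {
        "model": model,
        "input": {"action": spec[operation], spec["id_field"]: voice},
    }
```

For example, `voice_payload("qwen-voice-design", "query", "yourVoice")` produces the same body as the Qwen curl example above.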

Response body

CosyVoice voice design

{
    "output": {
        "voice_id": "cosyvoice-v3.5-plus-vd-announcer-xxxxxx",
        "gmt_create": "2025-12-10 14:54:09",
        "gmt_modified": "2025-12-10 17:47:48",
        "preview_text": "Dear listeners, hello everyone. Welcome to the evening news.",
        "target_model": "cosyvoice-v3.5-plus",
        "status": "OK",
        "voice_prompt": "A composed middle-aged male announcer with a deep, rich and magnetic voice, a steady speaking speed and clear articulation, is suitable for news broadcasting or documentary commentary."
    },
    "usage": {},
    "request_id": "xxxx-xxxx-xxxx"
}

Qwen voice design

{
    "output": {
        "voice": "yourVoice",
        "gmt_create": "2025-08-11 17:59:32",
        "gmt_modified": "2025-08-11 17:59:32",
        "language": "zh",
        "target_model": "qwen3-tts-vd-realtime-2026-01-15"
    },
    "usage": {
        "count": 0
    },
    "request_id": "xxxx-xxxx-xxxx"
}

Important

CosyVoice returns `voice_id`, `voice_prompt`, and other fields. Qwen returns the `voice` and `language` fields.

request_id `string`

The unique identifier of this request.

output `object`

The data returned by the model.

Properties

voice_id / voice `string`

The voice ID. CosyVoice returns `voice_id`, and Qwen returns `voice`.

gmt_create `string`

The creation time.

gmt_modified `string`

The last modification time.

status `string`

Important

Returned by CosyVoice only.

The voice status. For valid values, see "Voice status reference".

target_model `string`

The TTS model that drives the voice.

language `string`

Important

Returned by Qwen voice design only.

The voice language.

voice_prompt `string`

Important

Returned by CosyVoice voice design only.

The voice description text.

preview_text `string`

Important

Returned by CosyVoice voice design only.

The preview audio text.
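usage `object`

Usage information for this request.

Properties

count `integer`

The count reported for this request. In the examples above, CosyVoice returns an empty `usage` object and Qwen returns a count of 0.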

Delete a voice

Request body

The following example uses the Singapore region URL. To use a model deployed in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization.

CosyVoice

curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "voice-enrollment",
    "input": {
        "action": "delete_voice",
        "voice_id": "yourVoiceId"
    }
}'

Qwen voice design

curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen-voice-design",
    "input": {
        "action": "delete",
        "voice": "yourVoice"
    }
}'

model string (Required)

The voice design model. Valid values:

voice-enrollment: CosyVoice voice design.
qwen-voice-design: Qwen voice design.

input object (Required)

The input parameter object.

Properties

action string (Required)

The operation type. CosyVoice: delete_voice. Qwen: delete.

voice_id string (Conditionally required)

Important

Only applicable to CosyVoice.

The voice ID to delete.

voice string (Conditionally required)

Important

Only applicable to Qwen.

The voice name to delete.

Response body

CosyVoice

{
    "output": {},
    "usage": {
        "count": 1
    },
    "request_id": "xxxx-xxxx-xxxx"
}

Qwen

{
    "output": {
        "voice": "yourVoice"
    },
    "usage": {
        "count": 0
    },
    "request_id": "xxxx-xxxx-xxxx"
}

Important

CosyVoice returns an empty output object, while Qwen returns the `voice` field.

request_id `string`

The unique identifier of this request.

output `object`

The data returned by the model. CosyVoice returns an empty object. Qwen returns the name of the deleted voice.

Properties

voice `string`

Important

Returned by Qwen only.

The name of the deleted voice.
usage `object`

Usage information for this request.

Properties

count `integer`

The count reported for this request. In the examples above, CosyVoice returns 1 and Qwen returns 0.

Voice status reference

After a voice is created, it goes through a review process. The following table describes each status. This status system applies only to CosyVoice (when model is voice-enrollment). Qwen query and list responses don't include a status field.

Status	Description
DEPLOYING	Under review or processing.
OK	Review passed. The voice is ready for use.
UNDEPLOYED	Review rejected. The voice can't be used.
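
Because a new CosyVoice voice starts in review, a client typically queries the voice repeatedly until the status settles. The following is a minimal polling sketch under stated assumptions: `call` stands for any function that posts a payload and returns the parsed JSON response, `wait_until_ready` is a hypothetical name, and the 5-second interval and 60-attempt cap are arbitrary illustration values, not documented recommendations.

```python
import time


def wait_until_ready(call, voice_id, interval=5.0, max_attempts=60, sleep=time.sleep):
    """Poll a CosyVoice voice until review finishes.

    Returns True when the status is OK and False when it is UNDEPLOYED.
    Raises TimeoutError if the voice is still not settled after max_attempts.
    """
    payload = {
        "model": "voice-enrollment",
        "input": {"action": "query_voice", "voice_id": voice_id},
    }
    for _ in range(max_attempts):
        status = call(payload)["output"]["status"]
        if status == "OK":
            return True
        if status == "UNDEPLOYED":
            return False
        # DEPLOYING (or any unrecognized status): wait and try again.
        sleep(interval)
    raise TimeoutError(f"voice {voice_id} not ready after {max_attempts} attempts")
```

The `sleep` parameter exists so the delay can be replaced in tests. This loop does not apply to Qwen, whose responses carry no status field.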
