Alibaba Cloud Model Studio: Voice Design API reference

Last Updated: May 12, 2026

Use the Voice Design HTTP API to create, list, query, and delete custom voices.

Endpoint

International

If you select the International deployment scope, model inference compute resources are dynamically scheduled worldwide, excluding the Chinese mainland. Static data is stored in your selected region. Supported region: Singapore.

POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization

Chinese mainland

If you select the Chinese mainland deployment scope, model inference compute resources are restricted to the Chinese mainland. Static data is stored in your selected region. Supported region: China (Beijing).

POST https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization

Request headers

Authorization string (Required)

Authorization token in the format Bearer <your_api_key>. Replace <your_api_key> with your actual API key.

Content-Type string (Required)

Media type of the request body. Set to application/json.
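
As a minimal sketch of how the endpoints and headers above fit together, the following Python code builds a request with the standard library only. The names ENDPOINTS, build_request, and send, the region keys "intl" and "cn", and the DASHSCOPE_API_KEY environment variable are illustrative assumptions, not part of the API.

```python
import json
import os
import urllib.request

# Endpoints listed in the "Endpoint" section of this document.
ENDPOINTS = {
    "intl": "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization",
    "cn": "https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization",
}


def build_request(body, region="intl", api_key=None):
    """Return a POST request for the Voice Design endpoint (hypothetical helper)."""
    # Assumption: the API key is stored in the DASHSCOPE_API_KEY environment variable.
    key = api_key or os.environ.get("DASHSCOPE_API_KEY")
    if not key:
        raise ValueError("API key missing: pass api_key or set DASHSCOPE_API_KEY")
    return urllib.request.Request(
        ENDPOINTS[region],
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request):
    """Send the request over the network and return the parsed JSON response."""
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)
```

Building the request separately from sending it makes it possible to inspect the URL and headers without network access.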

Create a voice

Request body

The following example uses the Singapore region URL. To use a model deployed in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization.

CosyVoice voice design

curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "voice-enrollment",
    "input": {
        "action": "create_voice",
        "target_model": "cosyvoice-v3.5-plus",
        "voice_prompt": "A composed middle-aged male announcer with a deep, rich and magnetic voice, a steady speaking speed and clear articulation, is suitable for news broadcasting or documentary commentary.",
        "preview_text": "Dear listeners, hello everyone. Welcome to the evening news.",
        "prefix": "announcer",
        "language_hints": ["en"]
    },
    "parameters": {
        "sample_rate": 24000,
        "response_format": "wav"
    }
}'

Qwen voice design

curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen-voice-design",
    "input": {
        "action": "create",
        "target_model": "qwen3-tts-vd-realtime-2026-01-15",
        "preferred_name": "announcer",
        "voice_prompt": "A composed middle-aged male announcer with a deep, rich and magnetic voice, a steady speaking speed and clear articulation, is suitable for news broadcasting or documentary commentary.",
        "preview_text": "Dear listeners, hello everyone. Welcome to the evening news.",
        "language": "en"
    },
    "parameters": {
        "sample_rate": 24000,
        "response_format": "wav"
    }
}'

model string (Required)

The voice design model. Valid values:

  • voice-enrollment: CosyVoice voice design.

  • qwen-voice-design: Qwen voice design.

input object (Required)

The input parameter object.

Properties

action string (Required)

The operation type.

  • CosyVoice (voice-enrollment): Set to create_voice.

  • Qwen (qwen-voice-design): Set to create.

target_model string (Required)

The text-to-speech (TTS) model that drives the voice. This value must match the model used when you call the TTS API. A mismatch causes synthesis to fail.

voice_prompt string (Required)

A description of the desired voice characteristics. Only Chinese and English are supported.

  • CosyVoice (voice-enrollment): Maximum 500 characters.

  • Qwen (qwen-voice-design): Maximum 2,048 characters.

preview_text string (Required)

The text for the preview audio.

  • CosyVoice (voice-enrollment): Maximum 200 characters. Chinese and English are supported.

  • Qwen (qwen-voice-design): Maximum 1,024 characters. Chinese, English, German, Italian, Portuguese, Spanish, Japanese, Korean, French, and Russian are supported.

prefix string (Conditionally required)

Important

Only applicable to CosyVoice (when model is voice-enrollment).

The voice name prefix. Only digits and letters are allowed, with a maximum of 10 characters. The generated voice name follows the format: {target_model}-vd-{prefix}-{unique_id}

preferred_name string (Conditionally required)

Important

Only applicable to Qwen (when model is qwen-voice-design).

The voice name prefix. Only digits, letters, and underscores are allowed, with a maximum of 16 characters.

language_hints array[string] (Optional)

Important

Only applicable to CosyVoice (when model is voice-enrollment).

The language hint for the generated voice, which determines its language characteristics and pronunciation patterns. Set it to the language of your use case. The specified language must match the language of preview_text.

Currently, only the first element is used.

Valid values:

  • zh: Chinese

  • en: English

Default: ["zh"].

language string (Optional)

Important

Only applicable to Qwen (when model is qwen-voice-design).

The language of the generated voice, which determines its language characteristics and pronunciation patterns. Set it to the language of your use case. The specified language must match the language of preview_text.

Valid values:

  • zh: Chinese

  • en: English

  • de: German

  • it: Italian

  • pt: Portuguese

  • es: Spanish

  • ja: Japanese

  • ko: Korean

  • fr: French

  • ru: Russian

Default: zh.

parameters object (Optional)

Configuration for voice design.

Properties

sample_rate int (Optional)

Sample rate of the preview audio, in Hz.

  • CosyVoice: 16000, 24000, or 48000.

  • Qwen: 8000, 16000, 24000, or 48000.

Default: 24000.

response_format string (Optional)

Format of the preview audio.

  • CosyVoice: pcm, wav, or mp3.

  • Qwen: pcm, wav, mp3, or opus.

Default: wav.
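
The per-model differences in the parameters above can be checked on the client before a request is sent. The sketch below is one possible way to do that. The helper name build_create_body and the SPECS table are hypothetical, and the sketch assumes that "characters" means Unicode code points and that "letters" means ASCII letters, which this page doesn't state.

```python
import re

# Per-model limits copied from the parameter descriptions above.
SPECS = {
    "voice-enrollment": {
        "action": "create_voice",
        "max_prompt": 500,
        "max_preview": 200,
        "name_field": "prefix",
        "name_pattern": r"[A-Za-z0-9]{1,10}",
        "sample_rates": {16000, 24000, 48000},
        "formats": {"pcm", "wav", "mp3"},
    },
    "qwen-voice-design": {
        "action": "create",
        "max_prompt": 2048,
        "max_preview": 1024,
        "name_field": "preferred_name",
        "name_pattern": r"[A-Za-z0-9_]{1,16}",
        "sample_rates": {8000, 16000, 24000, 48000},
        "formats": {"pcm", "wav", "mp3", "opus"},
    },
}


def build_create_body(model, target_model, voice_prompt, preview_text, name,
                      language="zh", sample_rate=24000, response_format="wav"):
    """Validate inputs on the client and return a create request body."""
    spec = SPECS[model]  # raises KeyError for an unknown model
    # Assumption: len() (code points) matches how the service counts characters.
    if len(voice_prompt) > spec["max_prompt"]:
        raise ValueError("voice_prompt exceeds the documented limit")
    if len(preview_text) > spec["max_preview"]:
        raise ValueError("preview_text exceeds the documented limit")
    # Assumption: only ASCII letters count as "letters".
    if not re.fullmatch(spec["name_pattern"], name):
        raise ValueError(f"invalid {spec['name_field']}: {name!r}")
    if sample_rate not in spec["sample_rates"]:
        raise ValueError(f"unsupported sample_rate for {model}: {sample_rate}")
    if response_format not in spec["formats"]:
        raise ValueError(f"unsupported response_format for {model}: {response_format}")

    body_input = {
        "action": spec["action"],
        "target_model": target_model,
        "voice_prompt": voice_prompt,
        "preview_text": preview_text,
        spec["name_field"]: name,
    }
    # CosyVoice takes a list of hints; Qwen takes a single language string.
    if model == "voice-enrollment":
        body_input["language_hints"] = [language]
    else:
        body_input["language"] = language
    return {
        "model": model,
        "input": body_input,
        "parameters": {"sample_rate": sample_rate, "response_format": response_format},
    }
```

The service remains the authority on what is valid. A check like this only catches obvious mistakes early.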

Response body

CosyVoice voice design

{
    "output": {
        "preview_audio": {
            "data": "{base64_encoded_audio}",
            "sample_rate": 24000,
            "response_format": "wav"
        },
        "target_model": "cosyvoice-v3.5-plus",
        "voice_id": "cosyvoice-v3.5-plus-vd-announcer-xxxxxx"
    },
    "usage": {
        "count": 1
    },
    "request_id": "xxxx-xxxx-xxxx"
}

Qwen voice design

{
    "output": {
        "preview_audio": {
            "data": "{base64_encoded_audio}",
            "sample_rate": 24000,
            "response_format": "wav"
        },
        "target_model": "qwen3-tts-vd-realtime-2026-01-15",
        "voice": "yourVoice"
    },
    "usage": {
        "count": 1
    },
    "request_id": "xxxx-xxxx-xxxx"
}

Important

CosyVoice returns the voice_id field, while Qwen returns the voice field.

request_id string

The unique identifier of this request.

output object

The data returned by the model.

Properties

voice_id / voice string

The voice ID. CosyVoice returns voice_id, and Qwen returns voice. Use this value directly as the voice parameter in the TTS API.

preview_audio object

The preview audio data.

Properties

data string

The preview audio data, Base64-encoded.

sample_rate int

The sample rate of the preview audio, in Hz.

response_format string

The format of the preview audio.

target_model string

The TTS model that drives the voice.

usage object

Usage information for this request.

Properties

count integer

The number of voices created. Always 1.
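
Because the two models name the identifier field differently, client code may want to normalize the create response. The sketch below shows one way to do that and to decode the Base64 preview audio. The helper name parse_create_response is hypothetical.

```python
import base64


def parse_create_response(response):
    """Return (voice, audio_bytes) from a parsed create response."""
    output = response["output"]
    # CosyVoice returns voice_id, and Qwen returns voice.
    voice = output.get("voice_id") or output.get("voice")
    if voice is None:
        raise KeyError("response contains neither voice_id nor voice")
    # preview_audio.data is Base64-encoded audio in the requested response_format.
    audio = base64.b64decode(output["preview_audio"]["data"])
    return voice, audio
```

The returned bytes can be written to a file opened in binary mode, with an extension that matches response_format, to listen to the preview.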

List voices

Request body

The following example uses the Singapore region URL. To use a model deployed in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization.

CosyVoice

curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "voice-enrollment",
    "input": {
        "action": "list_voice",
        "prefix": "myvoice",
        "page_size": 10,
        "page_index": 0
    }
}'

Qwen voice design

curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen-voice-design",
    "input": {
        "action": "list",
        "page_size": 10,
        "page_index": 0
    }
}'

model string (Required)

The voice design model. Valid values:

  • voice-enrollment: CosyVoice voice design.

  • qwen-voice-design: Qwen voice design.

input object (Required)

The input parameter object.

Properties

action string (Required)

The operation type. CosyVoice: list_voice. Qwen: list.

prefix string (Optional)

Important

Only applicable to CosyVoice.

Filter voices by name prefix.

page_index integer (Optional)

The index of the page to return. The examples in this section use 0 for the first page.

page_size integer (Optional)

The number of entries per page.

Response body

CosyVoice

{
    "output": {
        "voice_list": [
            {
                "voice_id": "cosyvoice-v3.5-plus-vd-announcer-xxxxxx",
                "gmt_create": "2025-12-10 14:54:09",
                "gmt_modified": "2025-12-10 17:47:48",
                "status": "OK",
                "voice_prompt": "A composed middle-aged male announcer with a deep, rich and magnetic voice, a steady speaking speed and clear articulation, is suitable for news broadcasting or documentary commentary.",
                "preview_text": "Dear listeners, hello everyone. Welcome to the evening news."
            }
        ]
    },
    "usage": {
        "count": 1
    },
    "request_id": "xxxx-xxxx-xxxx"
}

Qwen

{
    "output": {
        "page_index": 0,
        "page_size": 10,
        "total_count": 1,
        "voice_list": [
            {
                "voice": "yourVoice",
                "gmt_create": "2025-08-11 17:59:32",
                "gmt_modified": "2025-08-11 17:59:32",
                "language": "zh",
                "target_model": "qwen3-tts-vd-realtime-2026-01-15",
                "voice_prompt": "A composed middle-aged male announcer with a deep, rich and magnetic voice, a steady speaking speed and clear articulation, is suitable for news broadcasting or documentary commentary.",
                "preview_text": "Dear listeners, hello everyone. Welcome to the evening news."
            }
        ]
    },
    "usage": {
        "count": 0
    },
    "request_id": "xxxx-xxxx-xxxx"
}

Important

CosyVoice returns a voice_list array where each item contains a voice_id field. Qwen also returns a voice_list array, but each item contains a voice field instead. The Qwen output also includes the page_index, page_size, and total_count pagination fields.

request_id string

The unique identifier of this request.

output object

The data returned by the model.

Properties

page_index integer

Important

Returned by Qwen only.

The current page index.

page_size integer

Important

Returned by Qwen only.

The number of entries per page.

total_count integer

Important

Returned by Qwen only.

The total number of voices.

voice_list array[object]

The list of voices returned by the query.

Properties

voice_id / voice string

The voice ID. CosyVoice uses voice_id, and Qwen uses voice.

gmt_create string

The creation time.

gmt_modified string

The last modification time.

status string

Important

Returned by CosyVoice only.

The voice status. For valid values, see "Voice status reference".

target_model string

Important

Returned by Qwen only.

The TTS model that drives the voice.

language string

Important

Returned by Qwen only.

The voice language.

voice_prompt string

The voice description text.

preview_text string

The preview audio text.

usage object

Usage information for this request.

Properties

count integer

CosyVoice: always 1. Qwen: always 0.
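
To collect every voice, a client can request pages one after another. The sketch below stops when a page comes back with fewer entries than page_size, which works for both models because only Qwen returns total_count. It assumes that page_index is zero-based, as in the examples above, and that a page past the end returns an empty voice_list. Neither point is stated on this page. The name iter_voices and the fetch_page callable, which sends a request body and returns the parsed JSON response, are hypothetical.

```python
def iter_voices(fetch_page, model, page_size=10, prefix=None):
    """Yield every voice entry, requesting pages until one comes back short."""
    action = "list_voice" if model == "voice-enrollment" else "list"
    page_index = 0  # assumption: zero-based, as in the examples above
    while True:
        body_input = {
            "action": action,
            "page_size": page_size,
            "page_index": page_index,
        }
        # prefix is only applicable to CosyVoice.
        if prefix is not None and model == "voice-enrollment":
            body_input["prefix"] = prefix
        response = fetch_page({"model": model, "input": body_input})
        voices = response["output"].get("voice_list", [])
        yield from voices
        # A short (or empty) page means there is nothing left to fetch.
        if len(voices) < page_size:
            break
        page_index += 1
```

Passing the transport in as fetch_page keeps the paging logic independent of the HTTP client in use.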

Query voice details

Request body

The following example uses the Singapore region URL. To use a model deployed in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization.

CosyVoice

curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "voice-enrollment",
    "input": {
        "action": "query_voice",
        "voice_id": "yourVoiceId"
    }
}'

Qwen voice design

curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen-voice-design",
    "input": {
        "action": "query",
        "voice": "yourVoice"
    }
}'

model string (Required)

The voice design model. Valid values:

  • voice-enrollment: CosyVoice voice design.

  • qwen-voice-design: Qwen voice design.

input object (Required)

The input parameter object.

Properties

action string (Required)

The operation type. CosyVoice: query_voice. Qwen voice design: query.

voice_id string (Conditionally required)

Important

Only applicable to CosyVoice.

The voice ID to query.

voice string (Conditionally required)

Important

Only applicable to Qwen voice design (when model is qwen-voice-design).

The voice name to query.
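
Query and delete requests share a shape: an action plus one identifier, with both names depending on the model. The sketch below captures that mapping. The table and helper names are hypothetical, and the delete action names are the ones documented in the "Delete a voice" section.

```python
# Action names and identifier fields per model, as documented on this page.
ACTIONS = {
    "voice-enrollment": {"query": "query_voice", "delete": "delete_voice"},
    "qwen-voice-design": {"query": "query", "delete": "delete"},
}
ID_FIELD = {
    "voice-enrollment": "voice_id",
    "qwen-voice-design": "voice",
}


def build_voice_body(model, operation, voice):
    """Build a query or delete request body for a single voice."""
    if operation not in ("query", "delete"):
        raise ValueError(f"unsupported operation: {operation}")
    return {
        "model": model,
        "input": {
            "action": ACTIONS[model][operation],
            ID_FIELD[model]: voice,
        },
    }
```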

Response body

CosyVoice voice design

{
    "output": {
        "voice_id": "cosyvoice-v3.5-plus-vd-announcer-xxxxxx",
        "gmt_create": "2025-12-10 14:54:09",
        "gmt_modified": "2025-12-10 17:47:48",
        "preview_text": "Dear listeners, hello everyone. Welcome to the evening news.",
        "target_model": "cosyvoice-v3.5-plus",
        "status": "OK",
        "voice_prompt": "A composed middle-aged male announcer with a deep, rich and magnetic voice, a steady speaking speed and clear articulation, is suitable for news broadcasting or documentary commentary."
    },
    "usage": {},
    "request_id": "xxxx-xxxx-xxxx"
}

Qwen voice design

{
    "output": {
        "voice": "yourVoice",
        "gmt_create": "2025-08-11 17:59:32",
        "gmt_modified": "2025-08-11 17:59:32",
        "language": "zh",
        "target_model": "qwen3-tts-vd-realtime-2026-01-15"
    },
    "usage": {
        "count": 0
    },
    "request_id": "xxxx-xxxx-xxxx"
}

Important

CosyVoice returns voice_id, status, voice_prompt, and preview_text. Qwen returns voice and language instead. Both return gmt_create, gmt_modified, and target_model.

request_id string

The unique identifier of this request.

output object

The data returned by the model.

Properties

voice_id / voice string

The voice ID. CosyVoice returns voice_id, and Qwen returns voice.

gmt_create string

The creation time.

gmt_modified string

The last modification time.

status string

Important

Returned by CosyVoice only.

The voice status. For valid values, see "Voice status reference".

target_model string

The TTS model that drives the voice.

language string

Important

Returned by Qwen voice design only.

The voice language.

voice_prompt string

Important

Returned by CosyVoice voice design only.

The voice description text.

preview_text string

Important

Returned by CosyVoice voice design only.

The preview audio text.

usage object

Usage information for this request.

Properties

count integer

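The number of voices created. Query requests create none: in the examples above, CosyVoice returns an empty usage object and Qwen returns 0.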

Delete a voice

Request body

The following example uses the Singapore region URL. To use a model deployed in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization.

CosyVoice

curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "voice-enrollment",
    "input": {
        "action": "delete_voice",
        "voice_id": "yourVoiceId"
    }
}'

Qwen voice design

curl -X POST https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H "Content-Type: application/json" \
-d '{
    "model": "qwen-voice-design",
    "input": {
        "action": "delete",
        "voice": "yourVoice"
    }
}'

model string (Required)

The voice design model. Valid values:

  • voice-enrollment: CosyVoice voice design.

  • qwen-voice-design: Qwen voice design.

input object (Required)

The input parameter object.

Properties

action string (Required)

The operation type. CosyVoice: delete_voice. Qwen: delete.

voice_id string (Conditionally required)

Important

Only applicable to CosyVoice.

The voice ID to delete.

voice string (Conditionally required)

Important

Only applicable to Qwen.

The voice name to delete.

Response body

CosyVoice

{
    "output": {},
    "usage": {
        "count": 1
    },
    "request_id": "xxxx-xxxx-xxxx"
}

Qwen

{
    "output": {
        "voice": "yourVoice"
    },
    "usage": {
        "count": 0
    },
    "request_id": "xxxx-xxxx-xxxx"
}

Important

CosyVoice returns an empty output object, while Qwen returns the voice field.

request_id string

The unique identifier of this request.

output object

The data returned by the model. CosyVoice returns an empty object. Qwen returns the name of the deleted voice.

Properties

voice string

Important

Returned by Qwen only.

The name of the deleted voice.

usage object

Usage information for this request.

Properties

count integer

CosyVoice returns 1 and Qwen returns 0, as shown in the examples above.

Voice status reference

After a voice is created, it goes through a review process. The following list describes each status. This status system applies only to CosyVoice (when model is voice-enrollment). Qwen query and list responses don't include a status field.

  • DEPLOYING: Under review or processing.

  • OK: Review passed. The voice is ready for use.

  • UNDEPLOYED: Review rejected. The voice can't be used.
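
Since a CosyVoice voice is usable only once its status is OK, a client may want to poll the query operation after creating a voice. The following is a minimal sketch under stated assumptions: the helper name wait_until_ready, the query callable (which sends a request body and returns the parsed JSON response), and the interval and timeout defaults are all hypothetical. This page doesn't specify how long review takes.

```python
import time


def wait_until_ready(query, voice_id, interval=5.0, timeout=300.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll query_voice until the status is OK.

    Raises RuntimeError if the voice is rejected (UNDEPLOYED) and
    TimeoutError if it is still not OK when the timeout expires.
    """
    body = {
        "model": "voice-enrollment",
        "input": {"action": "query_voice", "voice_id": voice_id},
    }
    deadline = clock() + timeout
    while True:
        status = query(body)["output"]["status"]
        if status == "OK":
            return
        if status == "UNDEPLOYED":
            raise RuntimeError(f"voice {voice_id} was rejected in review")
        # Any other status (for example DEPLOYING) means: keep waiting.
        if clock() >= deadline:
            raise TimeoutError(f"voice {voice_id} not ready after {timeout} seconds")
        sleep(interval)
```

The sleep and clock parameters exist only so that the loop can be exercised without real delays.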