Intelligent Media Services: Create a custom voice based on CosyVoice

Last Updated: May 06, 2025

This topic describes how to generate a custom voice from a recorded audio file by using the CosyVoice model of Alibaba Cloud Model Studio, and how to apply the voice in AI real-time interaction.

Prerequisites

Prepare an audio file

Make sure that the audio file meets the following requirements:

  • Number of channels: mono or stereo

  • Sampling rate: greater than or equal to 16,000 Hz

  • Format: WAV (16-bit), MP3, or M4A

  • File size: no larger than 10 MB
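The requirements above can be checked locally before you upload a file. The following is a minimal sketch that uses only the Python standard library and covers WAV files only; MP3 and M4A files would need a third-party decoder. The function name check_wav and the wording of the messages are illustrative and are not part of any Alibaba Cloud SDK.

```python
import os
import wave

MAX_BYTES = 10 * 1024 * 1024  # File size limit: 10 MB.
MIN_SAMPLE_RATE = 16_000      # Minimum sampling rate in Hz.


def check_wav(path):
    """Return a list of problems found in a WAV file; an empty list means it passes."""
    problems = []
    if os.path.getsize(path) > MAX_BYTES:
        problems.append("file is larger than 10 MB")
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() not in (1, 2):
            problems.append("audio must have one or two channels")
        if wav.getframerate() < MIN_SAMPLE_RATE:
            problems.append("sampling rate is below 16,000 Hz")
        if wav.getsampwidth() != 2:  # 2 bytes per sample = 16-bit.
            problems.append("WAV samples must be 16-bit")
    return problems
```

For example, check_wav("my_recording.wav") returns an empty list if the file meets all of the listed requirements, and a list of messages otherwise.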

After you record an audio file, upload it to a public URL. We recommend that you upload it to Object Storage Service (OSS). For more information, see Simple upload.

Note

You are responsible for the ownership and legal use of the voice. Read the Terms of Service.

Voice cloning

The following sample code shows how to clone a voice:

import os

import dashscope
from dashscope.audio.tts_v2 import VoiceEnrollmentService

# Read the API key from the DASHSCOPE_API_KEY environment variable.
# If you have not configured the environment variable, specify the API key here.
dashscope.api_key = os.getenv("DASHSCOPE_API_KEY")
url = "https://your-audio-file-url"  # Replace with the public URL of your audio file.
prefix = "prefix"  # Custom prefix that is included in the generated voice ID.
target_model = "cosyvoice-v2"

# Create a voice enrollment service instance.
service = VoiceEnrollmentService()

# Call the create_voice method to clone the voice and generate a voice ID.
voice_id = service.create_voice(target_model=target_model, prefix=prefix, url=url)
# The voice ID has the format cosyvoice-prefix-xxxxx.
print(f"Your voice ID is {voice_id}")

After the call is complete, save the returned voice_id. You need this value to configure the voice for your AI agent.
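One simple way to keep the voice ID for later use is to write it to a local file. The following is a minimal sketch that assumes a JSON file is acceptable storage. The helper names save_voice_id and load_voice_id and the default file name voice_id.json are hypothetical and are not part of the DashScope SDK.

```python
import json
from pathlib import Path


def save_voice_id(voice_id, path="voice_id.json"):
    """Write the voice ID to a small JSON file (hypothetical helper)."""
    Path(path).write_text(json.dumps({"voice_id": voice_id}), encoding="utf-8")


def load_voice_id(path="voice_id.json"):
    """Read back a voice ID that was stored by save_voice_id."""
    return json.loads(Path(path).read_text(encoding="utf-8"))["voice_id"]
```

You could call save_voice_id(voice_id) right after create_voice returns, and call load_voice_id() later when you fill in the Voice parameter.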

Use the voice

  1. Go to the Real-time Workflow Template page.

  2. Click the workflow that you want to manage and click Modify in the upper-right corner.

  3. In the Text-to-speech node, select Alibaba Cloud Model Studio as the model and set other parameters.

    • ApiKey: the API key that is used to call Alibaba Cloud Model Studio. The API key must be the same as the API key used for voice cloning.

    • ModelId: the model ID in Alibaba Cloud Model Studio. In this example, cosyvoice-v2 is used.

    • Voice: the ID of the voice. Use the voice ID returned during voice cloning.

  4. Click Save.