This topic describes how to generate a custom voice from a recorded audio file by using CosyVoice in Alibaba Cloud Model Studio, and how to use that voice in AI real-time interaction.
Prerequisites
Alibaba Cloud Model Studio has been activated. To activate the service, go to the Alibaba Cloud Model Studio console.
The Alibaba Cloud Model Studio SDK (DashScope SDK) is installed. For more information, see Install SDKs.
An API key has been created. For more information, see Obtain an API key.
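Before you run the cloning sample, you can confirm the two local prerequisites it depends on: the SDK must be importable as `dashscope`, and the API key must be available in the `DASHSCOPE_API_KEY` environment variable, as in the sample code. The following is a minimal sketch; the helper name `check_prerequisites` is hypothetical, and it cannot verify that Model Studio is activated for your account.

```python
import importlib.util
import os


def check_prerequisites(env=None):
    """Return a list of local problems that would block the cloning sample.

    Hypothetical helper. It only checks that the `dashscope` package can be
    imported and that DASHSCOPE_API_KEY is set; it does not call any service.
    """
    env = os.environ if env is None else env
    problems = []
    # The sample code imports the SDK as `dashscope`.
    if importlib.util.find_spec("dashscope") is None:
        problems.append("dashscope SDK is not installed")
    # The sample code reads the API key from this environment variable.
    if not env.get("DASHSCOPE_API_KEY"):
        problems.append("DASHSCOPE_API_KEY is not set")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("Missing prerequisite:", problem)
```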
Prepare an audio file
The audio file must meet the following requirements:
Number of channels: mono or stereo
Sampling rate: greater than or equal to 16,000 Hz
Format: WAV (16-bit), MP3, or M4A
File size: no larger than 10 MB
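For WAV recordings, you can check the requirements above locally before you upload the file. The following sketch uses only the Python standard library `wave` module, so it covers WAV files only; MP3 and M4A files would need a third-party decoder. The helper name `check_wav` is hypothetical, and the sketch assumes that 10 MB means 10 × 1024 × 1024 bytes.

```python
import os
import wave

MAX_FILE_SIZE = 10 * 1024 * 1024  # 10 MB, assuming 1 MB = 1024 * 1024 bytes
MIN_SAMPLE_RATE = 16000           # Hz


def check_wav(path):
    """Return a list of requirement violations for a local WAV file (hypothetical helper)."""
    problems = []
    if os.path.getsize(path) > MAX_FILE_SIZE:
        problems.append("file is larger than 10 MB")
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        if channels not in (1, 2):  # mono or stereo
            problems.append(f"unsupported channel count: {channels}")
        rate = wav.getframerate()
        if rate < MIN_SAMPLE_RATE:
            problems.append(f"sampling rate {rate} Hz is below 16,000 Hz")
        width = wav.getsampwidth()
        if width != 2:  # 2 bytes per sample = 16-bit PCM
            problems.append(f"sample width is {8 * width} bits, expected 16")
    return problems


if __name__ == "__main__":
    print(check_wav("my_recording.wav") or "OK")  # placeholder file name
```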
After you record the audio file, upload it so that it is accessible over a public URL. We recommend that you upload it to Object Storage Service (OSS). For more information, see Simple upload.
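A URL that only works on your own machine or network, such as a `file://` path, `localhost`, or a private IP address, is not a public URL. The following heuristic sketch rejects those cases before you pass the URL to the cloning call. The helper name `looks_like_public_url` is hypothetical; it does not resolve DNS names or download the file, so a URL that passes can still be unreachable (for example, an OSS object in a private bucket without a signed URL).

```python
import ipaddress
from urllib.parse import urlparse


def looks_like_public_url(url):
    """Heuristic check that a remote service could fetch `url` (hypothetical helper)."""
    parsed = urlparse(url)
    # Only HTTP(S) URLs with a host name can be fetched remotely.
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    host = parsed.hostname  # already lowercased by urlparse
    if host == "localhost":
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return True  # a DNS name; reachability is not checked here
    # Literal IP addresses must not be private, loopback, or link-local.
    return not (ip.is_private or ip.is_loopback or ip.is_link_local)
```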
You are responsible for the ownership and legal use of the voice. Read the Terms of Service.
Voice cloning
The following sample code shows how to clone a voice:
```python
import os
import dashscope
from dashscope.audio.tts_v2 import VoiceEnrollmentService, SpeechSynthesizer

# Read the API key from the environment. If you have not configured the
# environment variable, assign your API key to dashscope.api_key directly.
dashscope.api_key = os.getenv('DASHSCOPE_API_KEY')

url = "https://your-audio-file-url"  # Replace with the public URL of your audio file.
prefix = 'prefix'  # A custom prefix that is included in the generated voice ID.
target_model = "cosyvoice-v2"

# Create a voice enrollment service instance.
service = VoiceEnrollmentService()

# Call create_voice to clone the voice and obtain its voice ID.
voice_id = service.create_voice(target_model=target_model, prefix=prefix, url=url)
print(f"your voice id is {voice_id}")
# Example voice ID format: cosyvoice-prefix-xxxxx
```
After the call is complete, save the returned voice_id value. Your AI agent uses this voice ID to speak in the cloned voice.
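One way to keep the voice ID is to store it together with the model it was cloned for, because the workflow's ModelId should be the same model. The following sketch writes both values to a small JSON file. The file layout and the helper names `save_voice` and `load_voice` are hypothetical; any storage you already use works equally well.

```python
import json
from pathlib import Path


def save_voice(path, voice_id, target_model):
    """Store a cloned voice ID with the model it was created for (hypothetical helper)."""
    if not voice_id:
        raise ValueError("voice_id is empty; the cloning call may have failed")
    record = {"voice_id": voice_id, "target_model": target_model}
    Path(path).write_text(json.dumps(record, indent=2), encoding="utf-8")


def load_voice(path):
    """Read back a record written by save_voice()."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


if __name__ == "__main__":
    # Placeholder values: use the voice_id printed by the cloning sample.
    save_voice("my_voice.json", "cosyvoice-prefix-xxxxx", "cosyvoice-v2")
    print(load_voice("my_voice.json"))
```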
Use the voice
Go to the Real-time Workflow Template page.
Click the workflow that you want to manage and click Modify in the upper-right corner.
In the Text-to-speech node, select Alibaba Cloud Model Studio as the model and set the following parameters:
ApiKey: the API key that is used to call Alibaba Cloud Model Studio. It must be the same API key that you used to clone the voice.
ModelId: the model ID in Alibaba Cloud Model Studio. Set it to cosyvoice-v2, the model that was specified as target_model when the voice was cloned.
Voice: the ID of the voice. Use the voice ID returned during voice cloning.
Click Save.
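To listen to the cloned voice outside the workflow, for example to confirm that a voice ID works before you enter it in the Voice field, you can synthesize a short sample with the SDK's `SpeechSynthesizer`, passing the same model and voice ID. The following is a minimal sketch: the helper names are hypothetical, the optional `synthesize` parameter exists only so the logic can be exercised without network access, and the `.mp3` file name assumes the SDK's default output format is MP3.

```python
import os


def _dashscope_synthesize(model, voice, text):
    """Call the DashScope SDK; imported lazily so the module loads without it."""
    import dashscope
    from dashscope.audio.tts_v2 import SpeechSynthesizer

    dashscope.api_key = os.getenv("DASHSCOPE_API_KEY")
    # The model should be the target_model that was used to clone the voice.
    synthesizer = SpeechSynthesizer(model=model, voice=voice)
    return synthesizer.call(text)  # audio bytes


def synthesize_preview(voice_id, text, out_path, model="cosyvoice-v2",
                       synthesize=_dashscope_synthesize):
    """Synthesize `text` with a cloned voice and write the audio to out_path (hypothetical helper)."""
    audio = synthesize(model, voice_id, text)
    if not audio:
        raise RuntimeError("no audio returned; check the voice ID, model, and API key")
    with open(out_path, "wb") as f:
        f.write(audio)
    return len(audio)  # number of bytes written


if __name__ == "__main__":
    # Placeholder voice ID: use the value returned by create_voice.
    size = synthesize_preview("cosyvoice-prefix-xxxxx", "Hello, this is my cloned voice.", "preview.mp3")
    print(f"wrote {size} bytes to preview.mp3")
```

If the preview sounds right, the same voice ID and model can be used in the Text-to-speech node.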