High-quality recording data is crucial for model training. This document describes how to create high-quality recordings by choosing a suitable device and environment and by following good recording practices.
This document applies only to the China (Beijing) region. To use the models, you must use an API key from the China (Beijing) region.
Device
You can use devices such as mobile phones, digital voice recorders, or professional audio recorders.
Environment
Environment selection
When choosing a recording environment, the main goals are to reduce noise and reverberation. We recommend a small room of less than 10 square meters, ideally one fitted with sound-absorbing materials. You can also treat a room with low-cost sound-absorbing cotton, which turns specular (mirror-like) reflections of sound waves into diffuse reflections, reducing reverberation and improving recording quality.
Noise control
Outdoor noise: Close doors and windows to mitigate noise.
Indoor noise: Common sources of indoor noise include air conditioners, fans (including computer fans), fluorescent light ballasts, and human voices. To identify and eliminate these noise sources, you can record the ambient sound with a mobile phone and listen to the recording at a high volume.
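Listening at high volume remains the main check, but you can also put a rough number on the ambient noise. The following Python sketch is an illustration under stated assumptions, not part of the service: it assumes the room recording is saved as a mono, 16-bit PCM WAV file, and the -60 dBFS threshold and the helper names (`rms_dbfs`, `noisy_seconds`, `load_mono_16bit`) are hypothetical choices for this example. It reports the overall level and each one-second window that is louder than the threshold, which can help locate intermittent sources such as a fan that switches on partway through.

```python
import math
import struct
import wave
from typing import List, Sequence, Tuple

FULL_SCALE = 32768  # magnitude of the most negative 16-bit PCM sample


def rms_dbfs(samples: Sequence[int]) -> float:
    """RMS level of 16-bit PCM samples in dBFS (0 dBFS = full scale).

    Empty input or digital silence returns -inf.
    """
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")
    return 20 * math.log10(math.sqrt(mean_square) / FULL_SCALE)


def noisy_seconds(samples: Sequence[int], rate: int,
                  threshold_dbfs: float = -60.0) -> List[float]:
    """Start times (s) of one-second windows louder than the threshold.

    The -60 dBFS default is a hypothetical value; tune it for your device.
    """
    flagged = []
    for start in range(0, len(samples), rate):
        if rms_dbfs(samples[start:start + rate]) > threshold_dbfs:
            flagged.append(start / rate)
    return flagged


def load_mono_16bit(path: str) -> Tuple[List[int], int]:
    """Decode a mono 16-bit PCM WAV file into (samples, sample_rate)."""
    with wave.open(path, "rb") as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise ValueError("expected a mono, 16-bit PCM WAV file")
        rate = wf.getframerate()
        frames = wf.readframes(wf.getnframes())
    # WAV stores PCM samples little-endian; "<h" decodes signed 16-bit ints.
    samples = list(struct.unpack(f"<{len(frames) // 2}h", frames))
    return samples, rate
```

For example, `samples, rate = load_mono_16bit("room.wav")` (a hypothetical file name) followed by `print(rms_dbfs(samples), noisy_seconds(samples, rate))`. Absolute dBFS values depend on the device's input gain, so only compare readings taken with the same device and settings.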
Reverberation control
Reverberation is the persistence of sound in a space after the source stops, caused by sound reflecting, refracting, and diffusing off surfaces while it gradually decays. Strong reflections from smooth surfaces such as walls and glass make a recording sound muddy.
Avoid recording in an empty room. Instead, choose a location with sound-absorbing materials or an irregular layout, both of which reduce reverberation. Office areas and conference rooms typically have high reverberation and are not recommended for recording.
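As a rough rule of thumb from room acoustics (not from this service's documentation), Sabine's classic formula estimates the reverberation time, that is, how long sound takes to decay by 60 dB after the source stops, from the room volume and the total absorption of its surfaces. It is consistent with the advice above: adding absorbing material increases the denominator and shortens the reverberation time, while a smaller volume reduces the numerator. The formula assumes a diffuse sound field, so treat it as approximate in small or heavily treated rooms.

```latex
T_{60} \approx \frac{0.161\,V}{\sum_i S_i \alpha_i}
```

Here T_60 is in seconds, V is the room volume in cubic meters, S_i is the area of surface i in square meters, and α_i is that surface's absorption coefficient, from 0 (fully reflective) to 1 (fully absorbing).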
Instructions
A typical bedroom is a common and ideal recording environment. When you record, consider the following:
Keep your mouth about 10 cm from the mobile phone. If you are too close, plosives and breath noise become audible; if you are too far away, the recording picks up more room noise and reverberation.
Close doors and windows to reduce outdoor noise.
Turn off the air conditioner or fan to reduce indoor noise interference.
Draw the curtains to reduce sound reflection from the glass.
Open cabinet doors and use items such as clothes or bed sheets to cover cabinet and desk surfaces. This reduces sound reflection from smooth surfaces and improves recording quality.
Scripts
Avoid scripts made up of very short sentences with only a few words. Read fluently and avoid frequent or unnecessary pauses. A pause of 5 seconds or more can degrade the cloning result or cause cloning to fail.
Read through the script before recording to settle on a persona and performance style. Read expressively rather than mechanically so that the cloned voice meets your expectations.
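To check a take for long pauses before submitting it, a simple energy-based silence detector is usually enough. The following Python sketch is illustrative only: it assumes 16-bit PCM samples already decoded into a list of integers, and the -45 dBFS silence threshold, the 20 ms frame size, and the function name `find_long_pauses` are hypothetical choices. It returns the start and end time of every silent stretch that reaches the 5-second limit described above.

```python
import math
from typing import List, Optional, Sequence, Tuple

FULL_SCALE = 32768  # 16-bit signed PCM


def _frame_dbfs(frame: Sequence[int]) -> float:
    """RMS level of one non-empty frame in dBFS; digital silence is -inf."""
    mean_square = sum(s * s for s in frame) / len(frame)
    if mean_square == 0:
        return float("-inf")
    return 20 * math.log10(math.sqrt(mean_square) / FULL_SCALE)


def find_long_pauses(samples: Sequence[int], rate: int,
                     max_pause_s: float = 5.0,
                     silence_dbfs: float = -45.0,
                     frame_s: float = 0.02) -> List[Tuple[float, float]]:
    """Return (start_s, end_s) for every silent stretch of max_pause_s or longer.

    A frame counts as silent when its RMS level is below silence_dbfs
    (a hypothetical default; set it between your noise floor and speech level).
    """
    frame_len = max(1, int(rate * frame_s))
    pauses: List[Tuple[float, float]] = []
    run_start: Optional[int] = None  # first sample of the current silent run

    for start in range(0, len(samples), frame_len):
        silent = _frame_dbfs(samples[start:start + frame_len]) < silence_dbfs
        if silent and run_start is None:
            run_start = start
        elif not silent and run_start is not None:
            if (start - run_start) / rate >= max_pause_s:
                pauses.append((run_start / rate, start / rate))
            run_start = None

    # A pause that runs to the end of the recording also counts.
    if run_start is not None and (len(samples) - run_start) / rate >= max_pause_s:
        pauses.append((run_start / rate, len(samples) / rate))
    return pauses
```

For example, one second of speech-level signal followed by six seconds of silence at 16 kHz, `find_long_pauses([1000] * 16000 + [0] * 96000, 16000)`, returns `[(1.0, 7.0)]`. To run it on a real take, decode the WAV file into samples with Python's standard `wave` and `struct` modules. Room noise raises the level of the "silent" gaps, so adjust `silence_dbfs` if pauses are missed.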
There are no special restrictions on the script content. You can use content that is similar to the content you plan to synthesize.
If your use case mixes Chinese and English, you only need to record the language that you can read. After cloning, the model can automatically speak both Chinese and English.
Do not read scripts that contain sensitive words. This will cause the cloning to fail.