High-quality source audio is essential for voice cloning. Learn environment setup, script preparation, and recording techniques to produce clean, natural-sounding recordings for Model Studio.
Quick-start checklist
For home recording in a bedroom or similar small room:
- Close all windows and doors to block external noise.
- Turn off air conditioners, fans, and other electrical devices.
- Draw curtains to reduce glass reflections.
- Cover hard surfaces with cloth to reduce reflections.
- Read your script and practice natural delivery in your target tone.
- Position the microphone 10 cm from your mouth (too close: plosive distortion; too far: weak signal).
- Start recording.
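A short test take can help confirm that the microphone distance is reasonable. The sketch below is only an illustration, not part of Model Studio: it assumes 16-bit PCM samples already loaded as integers, and the function names and the -1 dBFS and -30 dBFS thresholds are hypothetical starting points to tune for your own device.

```python
import math
from typing import Sequence

FULL_SCALE = 32768  # magnitude of the most negative 16-bit PCM sample


def peak_dbfs(samples: Sequence[int]) -> float:
    """Return the peak level in dBFS (0 dBFS means full scale)."""
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return float("-inf")  # digital silence has no finite level
    return 20 * math.log10(peak / FULL_SCALE)


def classify_level(
    samples: Sequence[int],
    too_hot_db: float = -1.0,     # assumed threshold: at or above this, clipping is likely
    too_quiet_db: float = -30.0,  # assumed threshold: at or below this, the signal is weak
) -> str:
    """Label a test take as 'too hot', 'too quiet', or 'ok'."""
    level = peak_dbfs(samples)
    if level >= too_hot_db:
        return "too hot"    # microphone probably too close, or gain too high
    if level <= too_quiet_db:
        return "too quiet"  # microphone probably too far away
    return "ok"
```

A result of "too hot" suggests moving the microphone slightly farther away, and "too quiet" suggests moving it closer.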
Recording devices
Use a smartphone, a digital voice recorder, or a professional audio recorder.
Set up your recording environment
Focus on three key areas: room selection, noise control, and reverberation reduction.
Choose the right room
| Requirement | Details |
|---|---|
| Room size | Use small enclosed spaces (under 10 m²). |
| Acoustic treatment | Choose rooms with sound-absorbing materials like acoustic foam, carpets, or curtains. |
| Spaces to avoid | Avoid auditoriums, conference rooms, and classrooms. These large open spaces cause reverberation that degrades clone quality. |
Control noise
| Noise source | Mitigation |
|---|---|
| Outdoor noise | Close windows and doors; avoid recording near traffic or construction. |
| Indoor noise | Turn off air conditioners, fans, and fluorescent lamps (their ballasts can hum) before recording. |
Tip: Record a few seconds of ambient sound on your smartphone, then play it back at high volume to identify hidden noise sources.
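Listening is the main check, but a rough number can make it easier to compare takes before and after you switch a device off. The following sketch estimates the RMS level of an ambient recording in dBFS. It assumes the recording was exported as a 16-bit PCM WAV file; the function names are hypothetical, and the approach is a generic level measurement rather than anything Model Studio prescribes.

```python
import array
import math
import sys
import wave

FULL_SCALE = 32768  # magnitude of the most negative 16-bit PCM sample


def read_pcm16(path: str) -> array.array:
    """Read samples from a 16-bit PCM WAV file (channels stay interleaved)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM audio")
        frames = wav.readframes(wav.getnframes())
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    return samples


def rms_dbfs(samples) -> float:
    """Return the RMS level in dBFS; lower values mean a quieter room."""
    if len(samples) == 0:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")  # digital silence
    return 10 * math.log10(mean_square / FULL_SCALE**2)
```

For example, calling `rms_dbfs(read_pcm16("ambient.wav"))` on two takes (the file name is a placeholder) shows whether turning off a fan actually lowered the noise floor.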
Reduce reverberation
Reverberation blurs speech and reduces clone fidelity.
- Draw curtains, open closet doors, or cover desks and cabinets with cloth to reduce reflections.
- Add irregular objects like bookshelves or upholstered furniture to scatter sound waves.
Prepare your script
| Guideline | Details |
|---|---|
| Content | Align content with your target use case. |
| Sentence structure | Use complete sentences and avoid short phrases like "Hello" or "Yes" that lack vocal information for cloning. |
| Continuity | Keep the text semantically continuous, pause infrequently, and aim for at least 3 seconds of uninterrupted speech per segment. |
| Emotional expression | Add appropriate emotion (warmth, friendliness, or seriousness). Monotone delivery reduces naturalness. |
| Content restrictions | Avoid sensitive content such as politics, pornography, or violence. Recordings that contain it will fail cloning. |
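The sentence-structure and continuity guidelines can be pre-checked on a draft script before recording. The sketch below flags sentences that are probably too short to give 3 seconds of speech. It is a rough heuristic: the speaking rate of 2.5 words per second is an assumption, the function names are hypothetical, and the word count only works for space-delimited languages such as English.

```python
import re

WORDS_PER_SECOND = 2.5  # assumed speaking rate; measure your own pace and adjust
MIN_SECONDS = 3.0       # target from the continuity guideline


def split_sentences(script: str) -> list[str]:
    """Split a script at whitespace that follows '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", script.strip())
    return [p for p in parts if p]


def estimated_seconds(sentence: str) -> float:
    """Estimate spoken duration from the word count."""
    return len(sentence.split()) / WORDS_PER_SECOND


def short_sentences(script: str) -> list[str]:
    """Return the sentences estimated to last under MIN_SECONDS."""
    return [s for s in split_sentences(script) if estimated_seconds(s) < MIN_SECONDS]
```

With these assumed values, any sentence of fewer than eight words is flagged, so phrases like "Hello." or "Yes." show up for rewriting or merging into a longer sentence.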