Alibaba Cloud Model Studio: Recording guide

Last Updated: Mar 15, 2026

High-quality source audio is essential for voice cloning. Learn environment setup, script preparation, and recording techniques to produce clean, natural-sounding recordings for Model Studio.

Quick-start checklist

For home recording in a bedroom or similar small room:

  1. Close all windows and doors to block external noise.

  2. Turn off air conditioners, fans, and other noisy electrical devices.

  3. Draw curtains to reduce glass reflections.

  4. Cover hard surfaces with cloth to reduce reflections.

  5. Read your script and practice natural delivery in your target tone.

  6. Position the microphone about 10 cm from your mouth. Closer causes plosive distortion; farther weakens the signal.

  7. Start recording.
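
To sanity-check microphone distance after a test take, you can inspect the signal levels. The sketch below is only an illustration: it assumes `samples` is a sequence of signed 16-bit PCM values (for example, decoded from a WAV file with Python's `wave` and `array` modules). The function name `level_report` and both thresholds are hypothetical and are not Model Studio requirements. Clipping is only a rough symptom of recording too close, and a low peak only a rough symptom of recording too far.

```python
import math

FULL_SCALE = 32768  # magnitude of the most negative 16-bit sample


def level_report(samples, clip_threshold=32760, weak_peak_dbfs=-20.0):
    """Summarize peak level and clipping for signed 16-bit PCM samples.

    clip_threshold and weak_peak_dbfs are illustrative assumptions,
    not values taken from the recording guide.
    """
    peak = max((abs(s) for s in samples), default=0)
    # Peak level relative to digital full scale (0 dBFS is the maximum).
    peak_dbfs = 20 * math.log10(peak / FULL_SCALE) if peak else float("-inf")
    # Count samples at or near full scale; these suggest distortion.
    clipped = sum(1 for s in samples if abs(s) >= clip_threshold)
    return {
        "peak_dbfs": peak_dbfs,
        "clipped_samples": clipped,
        "clipping": clipped > 0,                    # try moving the mic farther away
        "weak_signal": peak_dbfs < weak_peak_dbfs,  # try moving the mic closer
    }
```

If `clipping` is true, re-record slightly farther from the microphone. If `weak_signal` is true, move closer or raise the input gain.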

Recording devices

Use a smartphone, a digital voice recorder, or a professional audio recorder.

Set up your recording environment

Focus on three key areas: room selection, noise control, and reverberation reduction.

Choose the right room

| Requirement | Details |
| --- | --- |
| Room size | Use small enclosed spaces (under 10 m²). |
| Acoustic treatment | Choose rooms with sound-absorbing materials like acoustic foam, carpets, or curtains. |
| Spaces to avoid | Avoid auditoriums, conference rooms, and classrooms. These large open spaces cause reverberation that degrades clone quality. |

Control noise

| Noise source | Mitigation |
| --- | --- |
| Outdoor noise | Close windows and doors; avoid recording near traffic or construction. |
| Indoor noise | Turn off air conditioners, fans, and fluorescent lamps with humming ballasts before recording. |

Tip: Record a few seconds of ambient sound on your smartphone, then play it back at high volume to identify hidden noise sources.
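
Listening is the primary check, but you can also measure the ambient recording. The following is a minimal sketch, assuming the ambient clip is saved as a 16-bit mono PCM WAV file. The helper names `read_mono_pcm16`, `rms_dbfs`, and `window_levels` are hypothetical, and the guide does not specify any target noise level. The sketch reports RMS level per time window so that intermittent noise, such as a compressor cycling on, stands out.

```python
import array
import math
import sys
import wave


def read_mono_pcm16(path):
    """Return (samples, sample_rate) from a 16-bit mono PCM WAV file."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("expected 16-bit mono PCM audio")
        rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    return samples.tolist(), rate


def rms_dbfs(samples):
    """RMS level in dB relative to 16-bit full scale (-inf for silence)."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")
    return 10 * math.log10(mean_square / 32768 ** 2)


def window_levels(samples, sample_rate, window_s=0.5):
    """Return (start_time_s, level_dbfs) for consecutive windows."""
    size = max(1, int(sample_rate * window_s))
    return [
        (start / sample_rate, rms_dbfs(samples[start:start + size]))
        for start in range(0, len(samples), size)
    ]
```

For example, load a clip with `samples, rate = read_mono_pcm16("room_tone.wav")` (a hypothetical file name) and pass both to `window_levels`. Windows that are noticeably louder than the rest point to the times worth replaying by ear.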

Reduce reverberation

Reverberation blurs speech and reduces clone fidelity.

  • Draw curtains, open closet doors, or cover desks and cabinets with cloth to reduce reflections.

  • Add irregular objects like bookshelves or upholstered furniture to scatter sound waves.

Prepare your script

| Guideline | Details |
| --- | --- |
| Content | Align content with your target use case. |
| Sentence structure | Use complete sentences. Avoid short phrases like "Hello" or "Yes", which carry too little vocal information for cloning. |
| Continuity | Maintain semantic continuity with infrequent pauses, and aim for at least 3 seconds of uninterrupted speech per segment. |
| Emotional expression | Add appropriate emotion (warmth, friendliness, or seriousness). Monotone delivery reduces naturalness. |
| Content restrictions | Avoid sensitive content (politics, pornography, violence). Recordings with such content will fail cloning. |
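
The sentence-structure and continuity guidelines can be roughly pre-checked before recording. The sketch below is illustrative only. It assumes an English script and an average speaking rate of about 2.5 words per second, a rough figure you should adjust for your own delivery. It flags sentences that would probably take under 3 seconds to read. The function names and the sentence-splitting rule are hypothetical simplifications.

```python
import re

WORDS_PER_SECOND = 2.5     # assumption: roughly 150 words per minute
MIN_SEGMENT_SECONDS = 3.0  # from the continuity guideline


def split_sentences(script):
    """Split on whitespace that follows '.', '!' or '?' (a simple heuristic)."""
    parts = re.split(r"(?<=[.!?])\s+", script.strip())
    return [p for p in parts if p]


def estimate_seconds(sentence):
    """Estimate speaking time from word count and the assumed rate."""
    return len(sentence.split()) / WORDS_PER_SECOND


def flag_short_sentences(script):
    """Return sentences likely to be spoken in under MIN_SEGMENT_SECONDS."""
    return [
        s for s in split_sentences(script)
        if estimate_seconds(s) < MIN_SEGMENT_SECONDS
    ]
```

Consider merging flagged sentences with a neighbor or expanding them into full sentences. The estimate does not account for pauses, so treat the result as a hint, not a measurement.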