Qwen speech synthesis delivers human-like voices with natural intonation and expressive delivery. It supports multiple languages and dialects, including Chinese dialects, and enables multilingual output using a single voice. The system automatically adapts tone and handles complex text smoothly.
Core features
Supports streaming output, enabling real-time audio synthesis and playback.
Supports multiple languages and dialects, including Chinese dialects.
Provides a wide range of voices to suit diverse use cases.
Offers two voice customization methods: voice cloning and voice design.
Supports instruction control, which lets you adjust speech expressiveness using natural language instructions.
Applicability
Available models:
International
In international deployment mode, the endpoint and data storage are in the Singapore region. Model inference computing resources are dynamically scheduled worldwide, excluding the Chinese mainland.
When you call the following models, select an API key for the Singapore region:
Qwen3-TTS-Instruct-Flash: qwen3-tts-instruct-flash (stable version, currently equivalent to qwen3-tts-instruct-flash-2026-01-26), qwen3-tts-instruct-flash-2026-01-26 (latest snapshot version)
Qwen3-TTS-VD: qwen3-tts-vd-2026-01-26 (latest snapshot version)
Qwen3-TTS-VC: qwen3-tts-vc-2026-01-22 (latest snapshot version)
Qwen3-TTS-Flash: qwen3-tts-flash (stable version, currently equivalent to qwen3-tts-flash-2025-11-27), qwen3-tts-flash-2025-11-27, qwen3-tts-flash-2025-09-18
Chinese mainland
In Chinese mainland deployment mode, the endpoint and data storage are in the Beijing region. Model inference computing resources are limited to the Chinese mainland.
When you call the following models, select an API key for the Beijing region:
Qwen3-TTS-Instruct-Flash: qwen3-tts-instruct-flash (stable version, currently equivalent to qwen3-tts-instruct-flash-2026-01-26), qwen3-tts-instruct-flash-2026-01-26 (latest snapshot version)
Qwen3-TTS-VD: qwen3-tts-vd-2026-01-26 (latest snapshot version)
Qwen3-TTS-VC: qwen3-tts-vc-2026-01-22 (latest snapshot version)
Qwen3-TTS-Flash: qwen3-tts-flash (stable version, currently equivalent to qwen3-tts-flash-2025-11-27), qwen3-tts-flash-2025-11-27, qwen3-tts-flash-2025-09-18
Qwen-TTS: qwen-tts (stable version, currently equivalent to qwen-tts-2025-04-10), qwen-tts-latest (latest version, currently equivalent to qwen-tts-2025-05-22), qwen-tts-2025-05-22 (snapshot version), qwen-tts-2025-04-10 (snapshot version)
See Model list.
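The region rules above can be checked in code before you configure the SDK. The sketch below is a hypothetical helper (resolve_base_url is not part of the DashScope SDK). It hard-codes the model lists and the two base URLs that appear on this page, so treat it as a snapshot and update it when the model list changes.

```python
# Hypothetical helper: return the DashScope base URL for a region and check that
# the requested model is listed for that region on this page (snapshot, may go stale).
BASE_URLS = {
    "singapore": "https://dashscope-intl.aliyuncs.com/api/v1",
    "beijing": "https://dashscope.aliyuncs.com/api/v1",
}

_QWEN3_MODELS = {
    "qwen3-tts-instruct-flash",
    "qwen3-tts-instruct-flash-2026-01-26",
    "qwen3-tts-vd-2026-01-26",
    "qwen3-tts-vc-2026-01-22",
    "qwen3-tts-flash",
    "qwen3-tts-flash-2025-11-27",
    "qwen3-tts-flash-2025-09-18",
}

# Qwen-TTS models are listed only for the Chinese mainland (Beijing) region.
_QWEN_TTS_MODELS = {
    "qwen-tts",
    "qwen-tts-latest",
    "qwen-tts-2025-05-22",
    "qwen-tts-2025-04-10",
}

MODELS_BY_REGION = {
    "singapore": _QWEN3_MODELS,
    "beijing": _QWEN3_MODELS | _QWEN_TTS_MODELS,
}


def resolve_base_url(model: str, region: str) -> str:
    """Return the base URL for `region`, or raise ValueError if `model` is not offered there."""
    region = region.lower()
    if region not in BASE_URLS:
        raise ValueError(f"Unknown region: {region}")
    if model not in MODELS_BY_REGION[region]:
        raise ValueError(f"Model {model} is not available in the {region} region")
    return BASE_URLS[region]


if __name__ == "__main__":
    print(resolve_base_url("qwen3-tts-flash", "singapore"))
```

The API key you pass must belong to the same region as the base URL this returns.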
Choose a model
Scenario | Recommended model | Reason |
Custom voice for branding, exclusive voices, or extended system voices (from a text description) | qwen3-tts-vd-2026-01-26 | Supports voice design. Create a custom voice from a text description without an audio sample. Ideal for designing a brand voice from scratch. |
Custom voice for branding, exclusive voices, or extended system voices (from an audio sample) | qwen3-tts-vc-2026-01-22 | Supports voice cloning. Clone a voice from an audio sample to create a human-like brand voiceprint with high fidelity and consistency. |
Emotional content production (audiobooks, radio dramas, game/animation dubbing) | qwen3-tts-instruct-flash | Supports instruction control. Use natural language descriptions to control pitch, speed, emotion, and character personality. Ideal for scenarios requiring rich expression and character creation. |
Mobile navigation or notification announcements | qwen3-tts-flash | Simple and transparent per-character billing. Ideal for high-frequency calls with short text. |
Online education courseware dubbing | qwen3-tts-flash | Supports multiple languages and dialects to meet regional teaching needs. |
Batch audiobook production | qwen3-tts-flash | Cost-effective. A wide selection of voices enriches content expression. |
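The table reduces to a few yes/no questions. The function below is a hypothetical sketch of that decision order, not an official selector. It only encodes the rows above: a custom voice from text or audio comes first, then instruction control, and everything else defaults to qwen3-tts-flash.

```python
from typing import Optional


def recommend_model(custom_voice_from: Optional[str] = None,
                    needs_instructions: bool = False) -> str:
    """Return the model recommended by the table (hypothetical helper).

    custom_voice_from: "text" for voice design, "audio" for voice cloning,
    or None to use a system voice.
    """
    if custom_voice_from == "text":
        return "qwen3-tts-vd-2026-01-26"   # voice design from a text description
    if custom_voice_from == "audio":
        return "qwen3-tts-vc-2026-01-22"   # voice cloning from an audio sample
    if custom_voice_from is not None:
        raise ValueError("custom_voice_from must be 'text', 'audio', or None")
    if needs_instructions:
        return "qwen3-tts-instruct-flash"  # pitch, speed, emotion via natural language
    return "qwen3-tts-flash"               # navigation, courseware, batch audiobooks
```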
Getting started
Preparations
Configure an API key and export it as an environment variable.
If you call the service using the DashScope SDK, install the latest SDK version. The DashScope Java SDK must be version 2.21.9 or later. The DashScope Python SDK must be version 1.24.6 or later.
Note: In the DashScope Python SDK, the SpeechSynthesizer interface has been unified into MultiModalConversation. To migrate, replace the interface name; all other parameters remain compatible.
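Before running the Python samples, you can confirm the installed SDK meets the minimum version stated above. The following is a minimal sketch that uses the standard importlib.metadata module. It assumes the package is installed under the distribution name dashscope and compares only the first three numeric components of the version string; check_dashscope_sdk and meets_minimum are illustrative names.

```python
import re
from importlib.metadata import PackageNotFoundError, version

MIN_PYTHON_SDK = "1.24.6"


def _as_tuple(v: str) -> tuple:
    # "1.24.10" -> (1, 24, 10); suffixes such as ".post1" or "rc1" are ignored.
    return tuple(int(n) for n in re.findall(r"\d+", v)[:3])


def meets_minimum(installed: str, minimum: str = MIN_PYTHON_SDK) -> bool:
    return _as_tuple(installed) >= _as_tuple(minimum)


def check_dashscope_sdk() -> str:
    try:
        installed = version("dashscope")
    except PackageNotFoundError:
        return "dashscope is not installed: pip install -U dashscope"
    if meets_minimum(installed):
        return f"dashscope {installed} is OK"
    return f"dashscope {installed} is too old; need >= {MIN_PYTHON_SDK}"


if __name__ == "__main__":
    print(check_dashscope_sdk())
```

Tuple comparison is used instead of string comparison because "1.9.0" sorts after "1.24.6" as a string.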
Synthesize speech with a system voice
These examples synthesize speech with a system voice.
Non-streaming output
Retrieve the synthesized audio from the url field of the response. The URL is valid for 24 hours, so download the file within that period if you need to keep it.
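If you call the REST API directly, as the cURL example does, you need to extract and save the audio yourself. The sketch below assumes the response JSON exposes the link at output.audio.url. This is the same field the Java sample reads with result.getOutput().getAudio().getUrl(). extract_audio_url and download_audio are hypothetical helper names built on the Python standard library.

```python
import json
import urllib.request


def extract_audio_url(response_body: str) -> str:
    """Return output.audio.url from a non-streaming JSON response body."""
    body = json.loads(response_body)
    try:
        url = body["output"]["audio"]["url"]
    except (KeyError, TypeError) as exc:
        raise ValueError("Response does not contain output.audio.url") from exc
    if not url:
        raise ValueError("output.audio.url is empty")
    return url


def download_audio(url: str, path: str, timeout: float = 60.0) -> int:
    """Save the audio at `url` to `path` and return the number of bytes written."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        data = resp.read()
    with open(path, "wb") as out:
        out.write(data)
    return len(data)
```

Because the link expires after 24 hours, call download_audio soon after synthesis rather than storing only the URL.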
Python
import os
import dashscope
# This is the URL for the Singapore region. If you use a model in the China (Beijing) region, change the URL to: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
text = "Today is a wonderful day to build something people love!"
# How to use the SpeechSynthesizer interface: dashscope.audio.qwen_tts.SpeechSynthesizer.call(...)
response = dashscope.MultiModalConversation.call(
# To use the instruction control feature, change the model to qwen3-tts-instruct-flash.
model="qwen3-tts-flash",
# The API keys for the Singapore and China (Beijing) regions are different. Get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
# If you have not configured the environment variable, replace the following line with your Model Studio API key: api_key = "sk-xxx"
api_key=os.getenv("DASHSCOPE_API_KEY"),
text=text,
voice="Cherry",
language_type="English", # Match the language type to the text language for correct pronunciation and natural intonation.
# To use the instruction control feature, uncomment the following lines and change the model to qwen3-tts-instruct-flash.
# instructions='Speak at a relatively fast speed with a noticeable rising intonation, suitable for introducing fashion products.',
# optimize_instructions=True,
stream=False
)
print(response)
Java
Add the Gson dependency to your project. If you use Maven or Gradle, add it as follows:
Maven
Add the following content to pom.xml:
<!-- https://mvnrepository.com/artifact/com.google.code.gson/gson -->
<dependency>
<groupId>com.google.code.gson</groupId>
<artifactId>gson</artifactId>
<version>2.13.1</version>
</dependency>
Gradle
Add the following content to build.gradle:
// https://mvnrepository.com/artifact/com.google.code.gson/gson
implementation("com.google.code.gson:gson:2.13.1")
import com.alibaba.dashscope.aigc.multimodalconversation.AudioParameters;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.protocol.Protocol;
import com.alibaba.dashscope.utils.Constants;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.net.URL;
public class Main {
// To use the instruction control feature, change MODEL to qwen3-tts-instruct-flash.
private static final String MODEL = "qwen3-tts-flash";
public static void call() throws ApiException, NoApiKeyException, UploadFileException {
MultiModalConversation conv = new MultiModalConversation();
MultiModalConversationParam param = MultiModalConversationParam.builder()
// The API keys for the Singapore and China (Beijing) regions are different. Get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
// If you have not configured the environment variable, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.model(MODEL)
.text("Today is a wonderful day to build something people love!")
.voice(AudioParameters.Voice.CHERRY)
.languageType("English") // Match the language type to the text language for correct pronunciation and natural intonation.
// To use the instruction control feature, uncomment the following lines and change the model to qwen3-tts-instruct-flash.
// .parameter("instructions","Speak at a relatively fast speed with a noticeable rising intonation, suitable for introducing fashion products.")
// .parameter("optimize_instructions",true)
.build();
MultiModalConversationResult result = conv.call(param);
String audioUrl = result.getOutput().getAudio().getUrl();
System.out.print(audioUrl);
// Download the audio file to your local machine.
try (InputStream in = new URL(audioUrl).openStream();
FileOutputStream out = new FileOutputStream("downloaded_audio.wav")) {
byte[] buffer = new byte[1024];
int bytesRead;
while ((bytesRead = in.read(buffer)) != -1) {
out.write(buffer, 0, bytesRead);
}
System.out.println("\nAudio file downloaded to: downloaded_audio.wav");
} catch (Exception e) {
System.out.println("\nError downloading audio file: " + e.getMessage());
}
}
public static void main(String[] args) {
// This is the URL for the Singapore region. If you use a model in the China (Beijing) region, change the URL to: https://dashscope.aliyuncs.com/api/v1
Constants.baseHttpApiUrl = "https://dashscope-intl.aliyuncs.com/api/v1";
try {
call();
} catch (ApiException | NoApiKeyException | UploadFileException e) {
System.out.println(e.getMessage());
}
System.exit(0);
}
}
cURL
# ======= Important =======
# This is the URL for the Singapore region. If you use a model in the China (Beijing) region, change the URL to: https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation
# The API keys for the Singapore and China (Beijing) regions are different. Get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
# === Delete this comment before execution ===
curl -X POST 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation' \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
"model": "qwen3-tts-flash",
"input": {
"text": "Today is a wonderful day to build something people love!",
"voice": "Cherry",
"language_type": "English"
}
}'
Streaming output
Audio data is streamed in Base64 format. The final data packet contains the URL of the complete audio file.
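To keep a local copy of streamed audio as well as play it, you can decode each Base64 chunk and append it to a WAV file. This is a sketch, not SDK code. It assumes the chunks are raw 16-bit signed mono PCM at 24 kHz, the format the playback samples below configure (paInt16, 1 channel, 24000 Hz). save_pcm_chunks_as_wav is a hypothetical name.

```python
import base64
import wave
from typing import Iterable, Optional

SAMPLE_RATE = 24000  # Hz, as configured in the playback samples
SAMPLE_WIDTH = 2     # bytes per sample (16-bit)
CHANNELS = 1         # mono


def save_pcm_chunks_as_wav(b64_chunks: Iterable[Optional[str]], path: str) -> float:
    """Decode Base64 PCM chunks into a WAV file and return its duration in seconds."""
    total_bytes = 0
    with wave.open(path, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        for chunk in b64_chunks:
            if not chunk:  # a packet may carry no audio data
                continue
            pcm = base64.b64decode(chunk)
            wav.writeframes(pcm)
            total_bytes += len(pcm)
    # Duration = bytes / (sample rate * bytes per frame); 48,000 bytes is 1 s here.
    return total_bytes / (SAMPLE_RATE * SAMPLE_WIDTH * CHANNELS)
```

With the Python SDK, you could pass a generator such as (c.output.audio.data for c in response if c.output is not None).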
Python
# coding=utf-8
#
# Installation instructions for pyaudio:
# APPLE Mac OS X
# brew install portaudio
# pip install pyaudio
# Debian/Ubuntu
# sudo apt-get install python-pyaudio python3-pyaudio
# or
# pip install pyaudio
# CentOS
# sudo yum install -y portaudio portaudio-devel && pip install pyaudio
# Microsoft Windows
# python -m pip install pyaudio
import os
import dashscope
import pyaudio
import time
import base64
import numpy as np
# This is the URL for the Singapore region. If you use a model in the China (Beijing) region, change the URL to: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
p = pyaudio.PyAudio()
# Create an audio stream.
stream = p.open(format=pyaudio.paInt16,
channels=1,
rate=24000,
output=True)
text = "Today is a wonderful day to build something people love!"
response = dashscope.MultiModalConversation.call(
# The API keys for the Singapore and China (Beijing) regions are different. Get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
# If you have not configured the environment variable, replace the following line with your Model Studio API key: api_key = "sk-xxx"
api_key=os.getenv("DASHSCOPE_API_KEY"),
# To use the instruction control feature, change the model to qwen3-tts-instruct-flash.
model="qwen3-tts-flash",
text=text,
voice="Cherry",
language_type="English", # Match the language type to the text language for correct pronunciation and natural intonation.
# To use the instruction control feature, uncomment the following lines and change the model to qwen3-tts-instruct-flash.
# instructions='Speak at a relatively fast speed with a noticeable rising intonation, suitable for introducing fashion products.',
# optimize_instructions=True,
stream=True
)
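for chunk in response:
    if chunk.output is not None:
        audio = chunk.output.audio
        if audio.data is not None:
            # Each chunk is Base64-encoded 16-bit mono PCM that matches the stream opened above.
            pcm_bytes = base64.b64decode(audio.data)
            audio_np = np.frombuffer(pcm_bytes, dtype=np.int16)
            # Play the audio data directly.
            stream.write(audio_np.tobytes())
        if chunk.output.finish_reason == "stop":
            print("Synthesis finished. Audio URL expires at:", chunk.output.audio.expires_at)
# Wait briefly so the last buffered samples finish playing before the stream is closed.
time.sleep(0.8)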
# Clean up resources.
stream.stop_stream()
stream.close()
p.terminate()
Java
Add the Gson dependency to your project. If you use Maven or Gradle, add it as follows:
Maven
Add the following content to pom.xml:
<!-- https://mvnrepository.com/artifact/com.google.code.gson/gson -->
<dependency>
<groupId>com.google.code.gson</groupId>
<artifactId>gson</artifactId>
<version>2.13.1</version>
</dependency>
Gradle
Add the following content to build.gradle:
// https://mvnrepository.com/artifact/com.google.code.gson/gson
implementation("com.google.code.gson:gson:2.13.1")
// Install the latest version of the DashScope SDK.
import com.alibaba.dashscope.aigc.multimodalconversation.AudioParameters;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.protocol.Protocol;
import com.alibaba.dashscope.utils.Constants;
import io.reactivex.Flowable;
import javax.sound.sampled.*;
import java.util.Base64;
public class Main {
// To use the instruction control feature, change MODEL to qwen3-tts-instruct-flash.
private static final String MODEL = "qwen3-tts-flash";
public static void streamCall() throws ApiException, NoApiKeyException, UploadFileException {
MultiModalConversation conv = new MultiModalConversation();
MultiModalConversationParam param = MultiModalConversationParam.builder()
// The API keys for the Singapore and China (Beijing) regions are different. Get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
// If you have not configured the environment variable, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.model(MODEL)
.text("Today is a wonderful day to build something people love!")
.voice(AudioParameters.Voice.CHERRY)
.languageType("English") // Match the language type to the text language for correct pronunciation and natural intonation.
// To use the instruction control feature, uncomment the following lines and change the model to qwen3-tts-instruct-flash.
// .parameter("instructions","Speak at a relatively fast speed with a noticeable rising intonation, suitable for introducing fashion products.")
// .parameter("optimize_instructions",true)
.build();
Flowable<MultiModalConversationResult> result = conv.streamCall(param);
result.blockingForEach(r -> {
try {
// 1. Get the Base64-encoded audio data. Skip packets that carry no audio data.
String base64Data = r.getOutput().getAudio().getData();
if (base64Data == null || base64Data.isEmpty()) {
return;
}
byte[] audioBytes = Base64.getDecoder().decode(base64Data);
// 2. Configure the audio format. It must match the format returned by the API.
AudioFormat format = new AudioFormat(
AudioFormat.Encoding.PCM_SIGNED,
24000, // Sample rate in Hz. This must match the format returned by the API.
16, // Sample size in bits
1, // Number of channels (mono)
2, // Frame size in bytes (channels * bits per sample / 8)
24000, // Frame rate in frames per second. For PCM, this equals the sample rate.
false // Byte order: false means little-endian.
);
// 3. Play the audio data in real time.
DataLine.Info info = new DataLine.Info(SourceDataLine.class, format);
try (SourceDataLine line = (SourceDataLine) AudioSystem.getLine(info)) {
if (line != null) {
line.open(format);
line.start();
line.write(audioBytes, 0, audioBytes.length);
line.drain();
}
}
} catch (LineUnavailableException e) {
e.printStackTrace();
}
});
}
public static void main(String[] args) {
// This is the URL for the Singapore region. If you use a model in the China (Beijing) region, change the URL to: https://dashscope.aliyuncs.com/api/v1
Constants.baseHttpApiUrl = "https://dashscope-intl.aliyuncs.com/api/v1";
try {
streamCall();
} catch (ApiException | NoApiKeyException | UploadFileException e) {
System.out.println(e.getMessage());
}
System.exit(0);
}
}
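If you consume the streaming endpoint without an SDK, as the cURL example does with X-DashScope-SSE: enable, you receive Server-Sent Events text rather than objects. The sketch below makes two assumptions. First, that each event carries its JSON payload on a line starting with "data:". Second, that the payload has the same shape the SDK exposes: output.audio.data, output.audio.url, and output.finish_reason. parse_sse_audio is a hypothetical name.

```python
import json
from typing import Iterable, List, Optional, Tuple


def parse_sse_audio(lines: Iterable[str]) -> Tuple[List[str], Optional[str]]:
    """Return (Base64 audio chunks, final audio URL or None) from SSE text lines."""
    chunks: List[str] = []
    final_url: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip id:, event:, comment lines, and blank separators
        payload = json.loads(line[len("data:"):])
        output = payload.get("output") or {}
        audio = output.get("audio") or {}
        if audio.get("data"):
            chunks.append(audio["data"])
        if output.get("finish_reason") == "stop":
            final_url = audio.get("url") or None
    return chunks, final_url
```

Decode each returned chunk with base64.b64decode to get the PCM bytes for playback or storage.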
cURL
# ======= Important =======
# This is the URL for the Singapore region. If you use a model in the China (Beijing) region, change the URL to: https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation
# The API keys for the Singapore and China (Beijing) regions are different. Get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
# === Delete this comment before execution ===
curl -X POST 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation' \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-H 'X-DashScope-SSE: enable' \
-d '{
"model": "qwen3-tts-flash",
"input": {
"text": "Today is a wonderful day to build something people love!",
"voice": "Cherry",
"language_type": "English"
}
}'
Synthesize speech with a cloned voice
Voice cloning does not provide preview audio. To evaluate the result, apply the cloned voice to speech synthesis.
This example uses a cloned voice for speech synthesis, producing output closely matching the original voice. It builds on the non-streaming output sample for the DashScope SDK, with the voice parameter set to the cloned voice.
Key principle: The model used for voice cloning (target_model) must match the model used for speech synthesis (model). Otherwise, synthesis fails.
The example uses the local audio file voice.mp3 for voice cloning. When you run the code, replace the file path with your own.
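Two mistakes are easy to make here: a MIME type that does not match the audio file, and a target_model that differs from the synthesis model. The sketch below guards against both. build_enrollment_payload and ensure_same_model are hypothetical helpers, not SDK functions. The payload mirrors the one in the samples below, and the MIME type is guessed from the file extension with the standard mimetypes module (for example, .mp3 maps to audio/mpeg).

```python
import base64
import mimetypes
import pathlib


def ensure_same_model(target_model: str, model: str) -> None:
    """Raise if the cloning model and the synthesis model differ."""
    if target_model != model:
        raise ValueError(
            f"target_model ({target_model}) must equal the synthesis model ({model})")


def build_enrollment_payload(file_path: str, target_model: str,
                             preferred_name: str) -> dict:
    """Build the voice-enrollment request body with a data URI for the audio file."""
    path = pathlib.Path(file_path)
    mime_type, _ = mimetypes.guess_type(path.name)
    if mime_type is None or not mime_type.startswith("audio/"):
        raise ValueError(f"Cannot determine an audio MIME type for {file_path}")
    encoded = base64.b64encode(path.read_bytes()).decode()
    return {
        "model": "qwen-voice-enrollment",
        "input": {
            "action": "create",
            "target_model": target_model,
            "preferred_name": preferred_name,
            "audio": {"data": f"data:{mime_type};base64,{encoded}"},
        },
    }
```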
Python
import os
import requests
import base64
import pathlib
import dashscope
# ======= Constant configuration =======
DEFAULT_TARGET_MODEL = "qwen3-tts-vc-2026-01-22" # Use the same model for voice cloning and speech synthesis
DEFAULT_PREFERRED_NAME = "guanyu"
DEFAULT_AUDIO_MIME_TYPE = "audio/mpeg"
VOICE_FILE_PATH = "voice.mp3" # Relative path to the local audio file used for voice cloning
def create_voice(file_path: str,
                 target_model: str = DEFAULT_TARGET_MODEL,
                 preferred_name: str = DEFAULT_PREFERRED_NAME,
                 audio_mime_type: str = DEFAULT_AUDIO_MIME_TYPE) -> str:
    """
    Create a voice and return the voice parameter.
    """
    # If you haven't configured an environment variable, replace the following line with: api_key = "sk-xxx"
    api_key = os.getenv("DASHSCOPE_API_KEY")
    file_path_obj = pathlib.Path(file_path)
    if not file_path_obj.exists():
        raise FileNotFoundError(f"Audio file does not exist: {file_path}")
    base64_str = base64.b64encode(file_path_obj.read_bytes()).decode()
    data_uri = f"data:{audio_mime_type};base64,{base64_str}"
    # Singapore region endpoint. For the Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
    url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"
    payload = {
        "model": "qwen-voice-enrollment",  # Do not change this value
        "input": {
            "action": "create",
            "target_model": target_model,
            "preferred_name": preferred_name,
            "audio": {"data": data_uri}
        }
    }
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }
    resp = requests.post(url, json=payload, headers=headers, timeout=60)
    if resp.status_code != 200:
        raise RuntimeError(f"Failed to create voice: {resp.status_code}, {resp.text}")
    try:
        return resp.json()["output"]["voice"]
    except (KeyError, ValueError) as e:
        raise RuntimeError(f"Failed to parse voice response: {e}")


if __name__ == '__main__':
    # Singapore region endpoint. For the Beijing region, use: https://dashscope.aliyuncs.com/api/v1
    dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
    text = "How's the weather today?"
    # SpeechSynthesizer interface usage: dashscope.audio.qwen_tts.SpeechSynthesizer.call(...)
    response = dashscope.MultiModalConversation.call(
        model=DEFAULT_TARGET_MODEL,
        # If you haven't configured an environment variable, replace the following line with: api_key = "sk-xxx"
        api_key=os.getenv("DASHSCOPE_API_KEY"),
        text=text,
        voice=create_voice(VOICE_FILE_PATH),  # Replace the voice parameter with the custom voice generated by cloning
        stream=False
    )
    print(response)
Java
Add the Gson dependency to your project.
Maven
Add the following content to pom.xml:
<!-- https://mvnrepository.com/artifact/com.google.code.gson/gson -->
<dependency>
<groupId>com.google.code.gson</groupId>
<artifactId>gson</artifactId>
<version>2.13.1</version>
</dependency>
Gradle
Add the following content to build.gradle:
// https://mvnrepository.com/artifact/com.google.code.gson/gson
implementation("com.google.code.gson:gson:2.13.1")
When you use speech synthesis with a custom voice generated by voice cloning, configure the voice as follows:
MultiModalConversationParam param = MultiModalConversationParam.builder()
.parameter("voice", "your_voice") // Replace the voice parameter with the custom voice generated by cloning
.build();
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.utils.Constants;
import com.google.gson.Gson;
import com.google.gson.JsonObject;
import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.*;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
public class Main {
// ===== Constant definitions =====
// Use the same model for voice cloning and speech synthesis
private static final String TARGET_MODEL = "qwen3-tts-vc-2026-01-22";
private static final String PREFERRED_NAME = "guanyu";
// Relative path to the local audio file used for voice cloning
private static final String AUDIO_FILE = "voice.mp3";
private static final String AUDIO_MIME_TYPE = "audio/mpeg";
// Generate a data URI
public static String toDataUrl(String filePath) throws IOException {
byte[] bytes = Files.readAllBytes(Paths.get(filePath));
String encoded = Base64.getEncoder().encodeToString(bytes);
return "data:" + AUDIO_MIME_TYPE + ";base64," + encoded;
}
// Call the API to create a voice
public static String createVoice() throws Exception {
// API keys differ between the Singapore and Beijing regions. To obtain an API key, see https://www.alibabacloud.com/help/zh/model-studio/get-api-key
// If you haven't configured an environment variable, replace the following line with: String apiKey = "sk-xxx"
String apiKey = System.getenv("DASHSCOPE_API_KEY");
String jsonPayload =
"{"
+ "\"model\": \"qwen-voice-enrollment\"," // Do not change this value
+ "\"input\": {"
+ "\"action\": \"create\","
+ "\"target_model\": \"" + TARGET_MODEL + "\","
+ "\"preferred_name\": \"" + PREFERRED_NAME + "\","
+ "\"audio\": {"
+ "\"data\": \"" + toDataUrl(AUDIO_FILE) + "\""
+ "}"
+ "}"
+ "}";
// Singapore region endpoint. For the Beijing region, use: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
String url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization";
HttpURLConnection con = (HttpURLConnection) new URL(url).openConnection();
con.setRequestMethod("POST");
con.setRequestProperty("Authorization", "Bearer " + apiKey);
con.setRequestProperty("Content-Type", "application/json");
con.setDoOutput(true);
try (OutputStream os = con.getOutputStream()) {
os.write(jsonPayload.getBytes(StandardCharsets.UTF_8));
}
int status = con.getResponseCode();
System.out.println("HTTP status code: " + status);
try (BufferedReader br = new BufferedReader(
new InputStreamReader(status >= 200 && status < 300 ? con.getInputStream() : con.getErrorStream(),
StandardCharsets.UTF_8))) {
StringBuilder response = new StringBuilder();
String line;
while ((line = br.readLine()) != null) {
response.append(line);
}
System.out.println("Response content: " + response);
if (status == 200) {
JsonObject jsonObj = new Gson().fromJson(response.toString(), JsonObject.class);
return jsonObj.getAsJsonObject("output").get("voice").getAsString();
}
throw new IOException("Failed to create voice: " + status + " - " + response);
}
}
public static void call() throws Exception {
MultiModalConversation conv = new MultiModalConversation();
MultiModalConversationParam param = MultiModalConversationParam.builder()
// API keys differ between the Singapore and Beijing regions. To obtain an API key, see https://www.alibabacloud.com/help/zh/model-studio/get-api-key
// If you haven't configured an environment variable, replace the following line with: .apiKey("sk-xxx")
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.model(TARGET_MODEL)
.text("How's the weather today?")
.parameter("voice", createVoice()) // Replace the voice parameter with the custom voice generated by cloning
.build();
MultiModalConversationResult result = conv.call(param);
String audioUrl = result.getOutput().getAudio().getUrl();
System.out.print(audioUrl);
// Download the audio file locally
try (InputStream in = new URL(audioUrl).openStream();
FileOutputStream out = new FileOutputStream("downloaded_audio.wav")) {
byte[] buffer = new byte[1024];
int bytesRead;
while ((bytesRead = in.read(buffer)) != -1) {
out.write(buffer, 0, bytesRead);
}
System.out.println("\nAudio file downloaded locally: downloaded_audio.wav");
} catch (Exception e) {
System.out.println("\nError downloading audio file: " + e.getMessage());
}
}
public static void main(String[] args) {
try {
// Singapore region endpoint. For the Beijing region, use: https://dashscope.aliyuncs.com/api/v1
Constants.baseHttpApiUrl = "https://dashscope-intl.aliyuncs.com/api/v1";
call();
} catch (Exception e) {
System.out.println(e.getMessage());
}
System.exit(0);
}
}
Synthesize speech with a designed voice
The voice design feature returns preview audio. Listen to the preview to confirm that the voice meets your expectations before you use it for synthesis. This reduces call costs.
First, create a custom voice and listen to its preview. If you are satisfied, proceed to synthesis; otherwise, create the voice again.
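Before you listen to a saved preview, you can confirm that the file decoded correctly and check its length. The sketch below assumes the preview is a WAV container, as requested in the samples below ("response_format": "wav", "sample_rate": 24000). If the file is not valid WAV, wave.open raises wave.Error. describe_wav is a hypothetical name.

```python
import wave


def describe_wav(path: str) -> dict:
    """Return basic properties of a WAV file, including its duration in seconds."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "sample_rate": rate,
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "duration_s": frames / rate,
        }
```

This check only tells you the file is well-formed. Listening to the preview is still the way to judge whether the voice matches your description.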
Python
import requests
import base64
import os


def create_voice_and_play():
    # API keys differ between Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    # If the environment variable is not set, replace the following line with your Model Studio API key: api_key = "sk-xxx"
    api_key = os.getenv("DASHSCOPE_API_KEY")
    if not api_key:
        print("Error: DASHSCOPE_API_KEY environment variable not found. Please set the API key first.")
        return None, None, None

    # Prepare request data
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }
    data = {
        "model": "qwen-voice-design",
        "input": {
            "action": "create",
            "target_model": "qwen3-tts-vd-2026-01-26",
            "voice_prompt": "A composed middle-aged male announcer with a deep, rich and magnetic voice, a steady speaking speed and clear articulation, is suitable for news broadcasting or documentary commentary.",
            "preview_text": "Dear listeners, hello everyone. Welcome to the evening news.",
            "preferred_name": "announcer",
            "language": "en"
        },
        "parameters": {
            "sample_rate": 24000,
            "response_format": "wav"
        }
    }

    # The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
    url = "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization"

    try:
        # Send the request
        response = requests.post(
            url,
            headers=headers,
            json=data,
            timeout=60  # Add a timeout setting
        )
        if response.status_code == 200:
            result = response.json()
            # Get the voice name
            voice_name = result["output"]["voice"]
            print(f"Voice name: {voice_name}")
            # Get the preview audio data and decode it from Base64
            base64_audio = result["output"]["preview_audio"]["data"]
            audio_bytes = base64.b64decode(base64_audio)
            # Write the audio data to a local file
            filename = f"{voice_name}_preview.wav"
            with open(filename, 'wb') as f:
                f.write(audio_bytes)
            print(f"Audio saved to local file: {filename}")
            print(f"File path: {os.path.abspath(filename)}")
            return voice_name, audio_bytes, filename
        else:
            print(f"Request failed with status code: {response.status_code}")
            print(f"Response content: {response.text}")
            return None, None, None
    except requests.exceptions.RequestException as e:
        print(f"A network request error occurred: {e}")
        return None, None, None
    except KeyError as e:
        print(f"Response data format error, missing required field: {e}")
        print(f"Response content: {response.text if 'response' in locals() else 'No response'}")
        return None, None, None
    except Exception as e:
        print(f"An unknown error occurred: {e}")
        return None, None, None


if __name__ == "__main__":
    print("Starting to create voice...")
    voice_name, audio_data, saved_filename = create_voice_and_play()
    if voice_name:
        print(f"\nSuccessfully created voice '{voice_name}'")
        print(f"Audio file saved as: '{saved_filename}'")
        print(f"File size: {os.path.getsize(saved_filename)} bytes")
    else:
        print("\nVoice creation failed")
Java
Add the Gson dependency to your project:
Maven
Add the following to your pom.xml:
<!-- https://mvnrepository.com/artifact/com.google.code.gson/gson -->
<dependency>
<groupId>com.google.code.gson</groupId>
<artifactId>gson</artifactId>
<version>2.13.1</version>
</dependency>
Gradle
Add the following to your build.gradle:
// https://mvnrepository.com/artifact/com.google.code.gson/gson
implementation("com.google.code.gson:gson:2.13.1")
Important: To use a custom voice generated by voice design for speech synthesis, configure the voice as follows:
MultiModalConversationParam param = MultiModalConversationParam.builder()
.parameter("voice", "your_voice") // Replace the voice parameter with the custom voice generated by voice design
.build();
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Base64;

public class Main {
    public static void main(String[] args) {
        Main example = new Main();
        example.createVoice();
    }

    public void createVoice() {
        // API keys differ between Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
        // If the environment variable is not set, replace the following line with your Model Studio API key: String apiKey = "sk-xxx"
        String apiKey = System.getenv("DASHSCOPE_API_KEY");

        // Create the JSON request body string
        String jsonBody = "{\n" +
                "  \"model\": \"qwen-voice-design\",\n" +
                "  \"input\": {\n" +
                "    \"action\": \"create\",\n" +
                "    \"target_model\": \"qwen3-tts-vd-2026-01-26\",\n" +
                "    \"voice_prompt\": \"A composed middle-aged male announcer with a deep, rich and magnetic voice, a steady speaking speed and clear articulation, is suitable for news broadcasting or documentary commentary.\",\n" +
                "    \"preview_text\": \"Dear listeners, hello everyone. Welcome to the evening news.\",\n" +
                "    \"preferred_name\": \"announcer\",\n" +
                "    \"language\": \"en\"\n" +
                "  },\n" +
                "  \"parameters\": {\n" +
                "    \"sample_rate\": 24000,\n" +
                "    \"response_format\": \"wav\"\n" +
                "  }\n" +
                "}";

        HttpURLConnection connection = null;
        try {
            // The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/audio/tts/customization
            URL url = new URL("https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts/customization");
            connection = (HttpURLConnection) url.openConnection();

            // Set the request method and headers
            connection.setRequestMethod("POST");
            connection.setRequestProperty("Authorization", "Bearer " + apiKey);
            connection.setRequestProperty("Content-Type", "application/json");
            connection.setDoOutput(true);
            connection.setDoInput(true);

            // Send the request body
            try (OutputStream os = connection.getOutputStream()) {
                byte[] input = jsonBody.getBytes("UTF-8");
                os.write(input, 0, input.length);
                os.flush();
            }

            // Get the response
            int responseCode = connection.getResponseCode();
            if (responseCode == HttpURLConnection.HTTP_OK) {
                // Read the response content
                StringBuilder response = new StringBuilder();
                try (BufferedReader br = new BufferedReader(
                        new InputStreamReader(connection.getInputStream(), "UTF-8"))) {
                    String responseLine;
                    while ((responseLine = br.readLine()) != null) {
                        response.append(responseLine.trim());
                    }
                }

                // Parse the JSON response
                JsonObject jsonResponse = JsonParser.parseString(response.toString()).getAsJsonObject();
                JsonObject outputObj = jsonResponse.getAsJsonObject("output");
                JsonObject previewAudioObj = outputObj.getAsJsonObject("preview_audio");

                // Get the voice name
                String voiceName = outputObj.get("voice").getAsString();
                System.out.println("Voice name: " + voiceName);

                // Get the Base64-encoded audio data and decode it
                String base64Audio = previewAudioObj.get("data").getAsString();
                byte[] audioBytes = Base64.getDecoder().decode(base64Audio);

                // Save the audio to a local file
                String filename = voiceName + "_preview.wav";
                saveAudioToFile(audioBytes, filename);
                System.out.println("Audio saved to local file: " + filename);
            } else {
                // Read the error response
                StringBuilder errorResponse = new StringBuilder();
                try (BufferedReader br = new BufferedReader(
                        new InputStreamReader(connection.getErrorStream(), "UTF-8"))) {
                    String responseLine;
                    while ((responseLine = br.readLine()) != null) {
                        errorResponse.append(responseLine.trim());
                    }
                }
                System.out.println("Request failed with status code: " + responseCode);
                System.out.println("Error response: " + errorResponse.toString());
            }
        } catch (Exception e) {
            System.err.println("An error occurred during the request: " + e.getMessage());
            e.printStackTrace();
        } finally {
            if (connection != null) {
                connection.disconnect();
            }
        }
    }

    private void saveAudioToFile(byte[] audioBytes, String filename) {
        try {
            File file = new File(filename);
            try (FileOutputStream fos = new FileOutputStream(file)) {
                fos.write(audioBytes);
            }
            System.out.println("Audio saved to: " + file.getAbsolutePath());
        } catch (IOException e) {
            System.err.println("An error occurred while saving the audio file: " + e.getMessage());
            e.printStackTrace();
        }
    }
}
Use the custom voice created in the previous step for non-streaming speech synthesis.
This example follows the "non-streaming output" sample code for system voices in the DashScope SDK. Replace the voice parameter with the custom voice generated by voice design. For unidirectional streaming synthesis, see Speech synthesis - Qwen.
Key principle: The model used for voice design (target_model) must match the model used for subsequent speech synthesis (model). Otherwise, synthesis fails.
Python
import os
import dashscope

if __name__ == '__main__':
    # The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
    dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

    text = "What's the weather like today?"
    # How to use SpeechSynthesizer: dashscope.audio.qwen_tts.SpeechSynthesizer.call(...)
    response = dashscope.MultiModalConversation.call(
        model="qwen3-tts-vd-2026-01-26",
        # API keys differ between Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
        # If the environment variable is not set, replace the following line with your Model Studio API key: api_key = "sk-xxx"
        api_key=os.getenv("DASHSCOPE_API_KEY"),
        text=text,
        voice="myvoice",  # Replace the voice parameter with the custom voice generated by voice design
        stream=False
    )
    print(response)
Java
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.Constants;

import java.io.FileOutputStream;
import java.io.InputStream;
import java.net.URL;

public class Main {
    private static final String MODEL = "qwen3-tts-vd-2026-01-26";

    public static void call() throws ApiException, NoApiKeyException, UploadFileException {
        MultiModalConversation conv = new MultiModalConversation();
        MultiModalConversationParam param = MultiModalConversationParam.builder()
                // API keys differ between Singapore and Beijing regions. Get an API key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
                // If the environment variable is not set, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .model(MODEL)
                .text("Today is a wonderful day to build something people love!")
                .parameter("voice", "myvoice") // Replace the voice parameter with the custom voice generated by voice design
                .build();
        MultiModalConversationResult result = conv.call(param);
        String audioUrl = result.getOutput().getAudio().getUrl();
        System.out.print(audioUrl);

        // Download the audio file locally
        try (InputStream in = new URL(audioUrl).openStream();
             FileOutputStream out = new FileOutputStream("downloaded_audio.wav")) {
            byte[] buffer = new byte[1024];
            int bytesRead;
            while ((bytesRead = in.read(buffer)) != -1) {
                out.write(buffer, 0, bytesRead);
            }
            System.out.println("\nAudio file downloaded locally: downloaded_audio.wav");
        } catch (Exception e) {
            System.out.println("\nError downloading audio file: " + e.getMessage());
        }
    }

    public static void main(String[] args) {
        try {
            // The following is the URL for the Singapore region. If you use a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
            Constants.baseHttpApiUrl = "https://dashscope-intl.aliyuncs.com/api/v1";
            call();
        } catch (ApiException | NoApiKeyException | UploadFileException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}
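The voice design workflow above can also be expressed in Python. The sketch below is a minimal, offline illustration, assuming the request and response shapes used in the Java example (`input.action`, `input.target_model`, `output.voice`, `output.preview_audio.data`). It builds the "create" request body, decodes the Base64 preview from a response, and derives synthesis arguments from the design request so that `target_model` and `model` cannot drift apart. The helper names are hypothetical, and sending the HTTP request is left out.

```python
import base64
import json

# Assumption: one snapshot is shared by voice design and synthesis (the key principle above).
VD_MODEL = "qwen3-tts-vd-2026-01-26"


def build_voice_design_body(voice_prompt, preview_text, preferred_name,
                            language="en", target_model=VD_MODEL):
    """Build the JSON body for the voice design "create" action (shape taken from the Java example)."""
    return {
        "model": "qwen-voice-design",
        "input": {
            "action": "create",
            "target_model": target_model,
            "voice_prompt": voice_prompt,
            "preview_text": preview_text,
            "preferred_name": preferred_name,
            "language": language,
        },
        "parameters": {"sample_rate": 24000, "response_format": "wav"},
    }


def parse_voice_design_response(raw_json):
    """Return (voice_name, preview_audio_bytes) from a voice design response body."""
    output = json.loads(raw_json)["output"]
    voice_name = output["voice"]
    audio = base64.b64decode(output["preview_audio"]["data"])
    return voice_name, audio


def synthesis_kwargs(design_body, voice_name, text):
    """Derive synthesis arguments whose `model` equals the design request's `target_model`."""
    return {
        "model": design_body["input"]["target_model"],
        "text": text,
        "voice": voice_name,
        "stream": False,
    }


if __name__ == "__main__":
    body = build_voice_design_body(
        voice_prompt="A composed middle-aged male announcer with a deep, rich voice.",
        preview_text="Dear listeners, hello everyone.",
        preferred_name="announcer",
    )
    # A fake response body stands in for the real HTTP reply in this offline sketch.
    fake = json.dumps({"output": {"voice": "myvoice",
                                  "preview_audio": {"data": base64.b64encode(b"RIFF").decode()}}})
    name, audio = parse_voice_design_response(fake)
    print(synthesis_kwargs(body, name, "What's the weather like today?"))
```

The resulting dictionary could then be passed as `dashscope.MultiModalConversation.call(api_key=os.getenv("DASHSCOPE_API_KEY"), **kwargs)`, matching the keyword arguments used in the Python example above.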
Instruction control
Instruction control lets you adjust speech expressiveness with natural language descriptions: you can change pitch, speed, emotion, and timbre without manually tuning audio parameters.
Supported models: Qwen3-TTS-Instruct-Flash series only.
Usage: Set the instructions parameter, for example, "Speak at a relatively fast speed with a noticeable rising intonation, suitable for introducing fashion products."
Supported languages: Instruction text supports only Chinese and English.
Length limit: 1,600 tokens maximum.
Scenarios:
Audiobook and radio drama dubbing
Advertising and promotional video dubbing
Game character and animation dubbing
Emotional intelligent voice assistants
Documentary and news broadcasting
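As a rough illustration of the constraints listed above, the helper below validates an instruction string before it is sent. It assumes the SDK accepts `instructions` as a keyword argument alongside `text` and `voice`, mirroring how `voice` is passed in the earlier Python example; verify the parameter placement against the API reference. The Chinese/English check is a simple heuristic over alphabetic characters, and the 1,600-token limit is not enforced because counting tokens requires the service's tokenizer.

```python
import unicodedata

# Models that support instruction control (Qwen3-TTS-Instruct-Flash series only).
INSTRUCT_MODELS = {"qwen3-tts-instruct-flash", "qwen3-tts-instruct-flash-2026-01-26"}


def is_chinese_or_english(text):
    """Heuristic: every alphabetic character must be ASCII Latin or a CJK ideograph."""
    for ch in text:
        if not ch.isalpha():
            continue  # digits, spaces, and punctuation are not checked
        if ch.isascii():
            continue
        if unicodedata.name(ch, "").startswith("CJK UNIFIED IDEOGRAPH"):
            continue
        return False
    return True


def build_instruct_kwargs(text, voice, instructions, model="qwen3-tts-instruct-flash"):
    """Return keyword arguments for an instruction-controlled call (hypothetical helper)."""
    if model not in INSTRUCT_MODELS:
        raise ValueError(f"{model} does not support instruction control")
    if not instructions.strip():
        raise ValueError("instructions must not be empty")
    if not is_chinese_or_english(instructions):
        raise ValueError("instructions must be written in Chinese or English")
    # Assumption: `instructions` is passed the same way as `voice`.
    return {"model": model, "text": text, "voice": voice,
            "instructions": instructions, "stream": False}
```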
How to write high-quality voice descriptions:
Core principles:
Be specific, not vague: Use words that describe concrete voice characteristics, such as "deep," "crisp," or "fast-paced." Avoid subjective and uninformative terms such as "nice" or "normal."
Be multi-dimensional, not single-dimensional: A good description combines multiple dimensions, such as pitch, speed, and emotion. A single-dimensional description, such as only "high-pitched," is too broad to generate a distinctive effect.
Be objective, not subjective: Focus on the physical and perceptual characteristics of the sound itself, not personal preferences. For example, use "high-pitched and energetic" instead of "my favorite sound."
Be original, not imitative: Describe voice characteristics rather than requesting imitation of a specific person, such as a celebrity or actor. Such requests involve copyright risks and are not supported.
Be concise, not redundant: Ensure every word adds meaning. Avoid repeating synonyms or meaningless intensifiers, such as "a very, very good voice."
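The core principles can be turned into a quick self-check before you submit a description. The sketch below is a hypothetical linter: its word lists are illustrative assumptions drawn from the examples above ("nice," "normal," "my favorite," repeated intensifiers) and are far from exhaustive, so treat its output as hints rather than a verdict.

```python
import re

# Illustrative word lists based on the principles above; extend them as needed.
VAGUE_TERMS = {"nice", "normal", "good", "okay"}
SUBJECTIVE_PHRASES = ("my favorite", "i like", "i love")
IMITATION_CUES = ("sounds like", "imitate", "just like")


def lint_voice_description(description):
    """Return a list of warnings for a voice description (heuristic, not exhaustive)."""
    warnings = []
    lowered = description.lower()
    words = re.findall(r"[a-z]+", lowered)
    for term in sorted(VAGUE_TERMS.intersection(words)):
        warnings.append(f"vague term: {term!r}")
    for phrase in SUBJECTIVE_PHRASES:
        if phrase in lowered:
            warnings.append(f"subjective phrase: {phrase!r}")
    for cue in IMITATION_CUES:
        if cue in lowered:
            warnings.append(f"possible imitation request: {cue!r}")
    # Repeated words such as "very, very" add no information.
    if re.search(r"\b(\w+),?\s+\1\b", lowered):
        warnings.append("repeated word")
    # Very short descriptions usually cover only one dimension.
    if len(words) < 4:
        warnings.append("description may be single-dimensional; add pitch, speed, or emotion")
    return warnings
```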
Dimension reference: Combine multiple dimensions for richer expression.
| Dimension | Description example |
| Pitch | High, medium, low, high-pitched, low-pitched |
| Speed | Fast, medium, slow, fast-paced, slow-paced |
| Emotion | Cheerful, calm, gentle, serious, lively, composed, soothing |
| Characteristics | Magnetic, crisp, hoarse, mellow, sweet, deep, powerful |
| Usage | News broadcast, ad voice-over, audiobook, animation character, voice assistant, documentary narration |
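Because a good description combines several dimensions, it can help to assemble it from one value per dimension. The helper below is a hypothetical convenience, assuming the five dimensions in the table; it joins the chosen values into a comma-separated description and rejects input that covers fewer than two dimensions.

```python
# Dimension order follows the table above.
DIMENSIONS = ("pitch", "speed", "emotion", "characteristics", "usage")


def compose_voice_description(**choices):
    """Join per-dimension choices into one description string (hypothetical helper)."""
    unknown = set(choices) - set(DIMENSIONS)
    if unknown:
        raise ValueError(f"unknown dimensions: {sorted(unknown)}")
    parts = []
    for dim in DIMENSIONS:
        value = choices.get(dim)
        if not value:
            continue
        # The usage dimension reads more naturally as a "suitable for ..." clause.
        parts.append(f"suitable for {value}" if dim == "usage" else value)
    if len(parts) < 2:
        raise ValueError("combine at least two dimensions for a distinctive voice")
    text = ", ".join(parts)
    return text[0].upper() + text[1:] + "."
```

For example, `compose_voice_description(pitch="high-pitched", speed="medium speed", emotion="full of energy", usage="ad voice-overs")` produces a description close to the ad voice-over example below.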
Examples:
Standard broadcast style: Clear and precise articulation, well-rounded pronunciation.
Progressive emotional effect: Volume rapidly increases from normal conversation to a shout, with a straightforward personality and easily excited, expressive emotions.
Special emotional state: A sobbing tone causes slightly slurred and hoarse pronunciation, with noticeable tension in the crying voice.
Ad voice-over style: High-pitched, medium speed, full of energy and appeal, suitable for ad voice-overs.
Gentle and soothing style: Slow-paced, with a gentle and sweet pitch, and a soothing, warm tone, like a caring friend.
API reference
Model feature comparison
Features | Qwen3-TTS-Instruct-Flash | Qwen3-TTS-VD | Qwen3-TTS-VC | Qwen3-TTS-Flash | Qwen-TTS |
Supported languages | Varies by voice: Chinese (Mandarin), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese | Chinese (Mandarin), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese | Varies by voice: Chinese (Mandarin, Shanghainese, Beijing dialect, Sichuan dialect, Nanjing dialect, Shaanxi dialect, Southern Min, Tianjin dialect), Cantonese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | Varies by voice: Chinese (Mandarin, Shanghainese, Beijing dialect, Sichuan dialect), English | |
Audio format | | | | | |
Audio sample rate | 24 kHz | ||||
Voice cloning | |||||
Voice design | |||||
SSML | |||||
LaTeX | |||||
Volume control | Adjustable via instruction control | ||||
Speech rate control | Adjustable via instruction control | ||||
Pitch control | Adjustable via instruction control | ||||
Bitrate control | |||||
Timestamp | |||||
Instruction control | |||||
Streaming input | |||||
Streaming output | |||||
Rate limiting | Requests per minute (RPM): 180 | RPM: 180 | RPM: 180 | RPM varies by model | RPM: 10; tokens per minute (TPM), including input and output tokens: 100,000 |
Connection type | Java/Python SDK, WebSocket API | ||||
Pricing | International: $0.115/10,000 characters; Chinese mainland: $0.115/10,000 characters | International: $0.115/10,000 characters; Chinese mainland: $0.115/10,000 characters | International: $0.115/10,000 characters; Chinese mainland: $0.115/10,000 characters | International: $0.1/10,000 characters; Chinese mainland: $0.114682/10,000 characters | Chinese mainland. Token conversion: 1 second of audio equals 50 tokens. Audio shorter than 1 second counts as 50 tokens. |
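For Qwen-TTS, the token conversion rule and the per-minute limits in the table can support rough capacity planning. The sketch below assumes that audio longer than 1 second scales linearly at 50 tokens per second (rounded up), which the table does not state explicitly, and implements a simple client-side sliding-window throttle. The service enforces its own limits, so the throttle only reduces the chance of hitting them.

```python
import math
import time
from collections import deque

TOKENS_PER_SECOND = 50
MIN_AUDIO_TOKENS = 50  # audio shorter than 1 second still counts as 50 tokens


def audio_tokens(seconds):
    """Estimate Qwen-TTS audio tokens (linear scaling above 1 s is an assumption)."""
    if seconds < 0:
        raise ValueError("duration must be non-negative")
    return max(MIN_AUDIO_TOKENS, math.ceil(seconds * TOKENS_PER_SECOND))


class SlidingWindowLimiter:
    """Allow at most `limit` calls in any `window`-second span (client-side only)."""

    def __init__(self, limit, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.calls = deque()

    def try_acquire(self):
        now = self.clock()
        # Drop timestamps that have left the window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) < self.limit:
            self.calls.append(now)
            return True
        return False
```

For the models limited to 180 RPM, `SlidingWindowLimiter(limit=180)` would be created once and `try_acquire()` checked before each request; for Qwen-TTS, `limit=10`.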
Supported system voices
Supported voices vary by model. Set the voice request parameter to the corresponding value in the voice parameter column of the voice list.
| Details | Supported languages | Supported models |
| Voice name: Cherry. Description: A sunny, positive, friendly, and natural young woman (female) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Voice name: Serena. Description: A gentle young woman (female) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Voice name: Ethan. Description: Standard Mandarin with a slight northern accent. Sunny, warm, energetic, and vibrant (male) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Voice name: Chelsie. Description: An anime-style virtual girlfriend (female) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Voice name: Momo. Description: Playful and mischievous, cheering you up (female) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Voice name: Vivian. Description: Confident, cute, and slightly feisty (female) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Voice name: Moon. Description: A bold and handsome man named Yuebai (male) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Voice name: Maia. Description: A blend of intellect and gentleness (female) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Voice name: Kai. Description: A soothing audio spa for your ears (male) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Voice name: Nofish. Description: A designer who cannot pronounce retroflex sounds (male) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Voice name: Bella. Description: A little girl who drinks but never throws punches when drunk (female) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Voice name: Jennifer. Description: A premium, cinematic-quality American English female voice (female) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Voice name: Ryan. Description: Full of rhythm, bursting with dramatic flair, balancing authenticity and tension (male) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Voice name: Katerina. Description: A mature-woman voice with rich, memorable rhythm (female) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Voice name: Aiden. Description: An American English young man skilled in cooking (male) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Voice name: Eldric Sage. Description: A calm and wise elder, weathered like a pine tree yet clear-minded as a mirror (male) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Voice name: Mia. Description: Gentle as spring water, obedient as fresh snow (female) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Voice name: Mochi. Description: A clever, quick-witted young adult: childlike innocence remains, yet wisdom shines through (male) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Voice name: Bellona. Description: A powerful, clear voice that brings characters to life, so stirring it makes your blood boil. With heroic grandeur and perfect diction, this voice captures the full spectrum of human expression. | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Voice name: Vincent. Description: A uniquely raspy, smoky voice; just one line evokes armies and heroic tales (male) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Voice name: Bunny. Description: A little girl overflowing with "cuteness" (female) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Voice name: Neil. Description: A flat baseline intonation with precise, clear pronunciation: the most professional news anchor (male) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Voice name: Elias. Description: Maintains academic rigor while using storytelling techniques to turn complex knowledge into digestible learning modules (female) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Voice name: Arthur. Description: A simple, earthy voice steeped in time and tobacco smoke, slowly unfolding village stories and curiosities (male) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Voice name: Nini. Description: A soft, clingy voice like sweet rice cakes; those drawn-out calls of “Big Brother” are so sweet they melt your bones (female) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Voice name: Seren. Description: A gentle, soothing voice to help you fall asleep faster. Good night, sweet dreams (female) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Voice name: Pip. Description: A playful, mischievous boy full of childlike wonder. Is this your memory of Shin-chan? (male) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Voice name: Stella. Description: Normally a cloyingly sweet, dazed teenage-girl voice, but when shouting “I represent the moon to defeat you!”, she instantly radiates unwavering love and justice (female) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Voice name: Bodega. Description: A passionate Spanish man (male) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Voice name: Sonisa. Description: A cheerful, outgoing Latin American woman (female) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Voice name: Alek. Description: Cold like the Russian spirit, yet warm like wool coat lining (male) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Voice name: Dolce. Description: A laid-back Italian man (male) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Voice name: Sohee. Description: A warm, cheerful, emotionally expressive Korean unnie (female) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Voice name: Ono Anna. Description: A clever, spirited childhood friend (female) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Voice name: Lenn. Description: Rational at heart, rebellious in detail: a German youth who wears suits and listens to post-punk | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Voice name: Emilien. Description: A romantic French big brother (male) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Voice name: Andre. Description: A magnetic, natural, and steady male voice | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Voice name: Radio Gol. Description: Football poet Radio Gol! Today I’ll commentate on football using my name (male) | Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Voice name: Shanghai - Jada. Description: A fast-paced, energetic Shanghai auntie (female) | Shanghainese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Voice name: Beijing - Dylan. Description: A young man raised in Beijing’s hutongs (male) | Beijing dialect, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Voice name: Nanjing - Li. Description: A patient yoga teacher (male) | Nanjing dialect, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Voice name: Shaanxi - Marcus. Description: Broad face, few words, sincere heart, deep voice: the authentic Shaanxi flavor (male) | Shaanxi dialect, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Voice name: Southern Min - Roy. Description: A humorous, straightforward, lively Taiwanese guy (male) | Southern Min, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Voice name: Tianjin - Peter. Description: A professional foil in Tianjin-style crosstalk (male) | Tianjin dialect, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Voice name: Sichuan - Sunny. Description: A Sichuan girl sweet enough to melt your heart (female) | Sichuan dialect, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Voice name: Sichuan - Eric. Description: A Sichuanese man from Chengdu who stands out in everyday life (male) | Sichuan dialect, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Voice name: Cantonese - Rocky. Description: A Qiang, a humorous and witty guy here to chat with you online (male) | Cantonese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
| Voice name: Cantonese - Kiki. Description: A sweet Hong Kong girl best friend (female) | Cantonese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean | |
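To find system voices for a particular language or dialect, you can filter a local copy of the table above. The snippet below hard-codes only a few rows as an illustration. The keys are the display names from the table; the exact value to pass as the `voice` request parameter should be taken from the voice parameter column of the voice list, and may differ from the display name.

```python
# A small excerpt of the table above: display name -> supported languages.
COMMON = ["English", "French", "German", "Russian", "Italian",
          "Spanish", "Portuguese", "Japanese", "Korean"]

VOICES = {
    "Cherry": ["Chinese (Mandarin)"] + COMMON,
    "Ethan": ["Chinese (Mandarin)"] + COMMON,
    "Shanghai - Jada": ["Shanghainese"] + COMMON,
    "Sichuan - Sunny": ["Sichuan dialect"] + COMMON,
    "Sichuan - Eric": ["Sichuan dialect"] + COMMON,
    "Cantonese - Rocky": ["Cantonese"] + COMMON,
    "Cantonese - Kiki": ["Cantonese"] + COMMON,
}


def voices_for(language):
    """Return display names of voices whose language list includes `language`."""
    return [name for name, langs in VOICES.items() if language in langs]
```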
FAQ
Q: How long is the audio file URL valid?
A: The audio file URL expires after 24 hours.
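Because the URL expires after 24 hours, it is safest to save the audio soon after synthesis. The sketch below assumes you record when the URL was returned; it checks whether the URL is still within its 24-hour lifetime and downloads the file with the Python standard library. The function names are hypothetical.

```python
import shutil
import urllib.request
from datetime import datetime, timedelta, timezone

URL_LIFETIME = timedelta(hours=24)


def url_still_valid(issued_at, now=None):
    """Return True if fewer than 24 hours have passed since the URL was issued."""
    now = now or datetime.now(timezone.utc)
    return now - issued_at < URL_LIFETIME


def download_audio(url, path, issued_at):
    """Download the audio file to `path`, refusing URLs that have already expired."""
    if not url_still_valid(issued_at):
        raise RuntimeError("audio URL has expired; synthesize the audio again")
    with urllib.request.urlopen(url) as resp, open(path, "wb") as out:
        shutil.copyfileobj(resp, out)
```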