Non-real-time speech synthesis converts text to speech (TTS) through an HTTP API, making it suitable for latency-tolerant scenarios such as audiobook production, e-learning narration, and batch content production. The service supports the Qwen-TTS model families and offers a wide selection of voices, multilingual support, voice cloning, and voice design.
Overview
Convert text to speech files through an HTTP API. This approach suits latency-tolerant scenarios such as audiobook production, e-learning narration, and batch content production.
Submit the complete text to the HTTP API to receive the synthesized audio. Streaming output, which lets playback start while synthesis is still in progress, is also supported.
Supports multiple languages, including Chinese dialects.
Supports voice cloning and voice design for custom voice creation.
Supports instruction control, which lets you control speech expressiveness through natural-language instructions.
For real-time, low-latency speech synthesis, see Real-time speech synthesis (WebSocket API). To choose a model, see Speech synthesis.
Prerequisites
To call the API through the DashScope SDK, install the latest SDK version.
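Before running the examples, it can help to confirm that the SDK is importable and that an API key is set. The following is a minimal sketch, assuming the SDK is installed as the `dashscope` package (for example with `pip install -U dashscope`) and that the key is read from the `DASHSCOPE_API_KEY` environment variable, as the Quick start examples do. The `check_prerequisites` helper is hypothetical and not part of the SDK.

```python
import importlib.util
import os


def check_prerequisites(env=None):
    """Return a list of problems; an empty list means the environment looks ready."""
    env = os.environ if env is None else env
    problems = []
    # The SDK examples import the `dashscope` package.
    if importlib.util.find_spec("dashscope") is None:
        problems.append("dashscope SDK is not installed")
    # The examples read the API key from DASHSCOPE_API_KEY.
    if not env.get("DASHSCOPE_API_KEY"):
        problems.append("DASHSCOPE_API_KEY is not set")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("Missing prerequisite:", problem)
```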
Quick start
The following examples demonstrate how to synthesize speech with each model family. For more language examples and detailed parameter descriptions, see the API reference for each model.
Qwen-TTS
The following examples show how to synthesize speech with a built-in voice.
Non-streaming output
In non-streaming mode, the response includes a url field pointing to the synthesized audio file. The URL expires after 24 hours.
Python
import os
import dashscope
# The following is the Singapore region URL. To use models in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
text = "Today is a wonderful day to build something people love!"
# SpeechSynthesizer usage: dashscope.audio.qwen_tts.SpeechSynthesizer.call(...)
response = dashscope.MultiModalConversation.call(
    # To use the instruction control feature, replace model with qwen3-tts-instruct-flash
    model="qwen3-tts-flash",
    # The API Keys for the Singapore and Beijing regions are different. Get an API Key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    # If you have not configured an environment variable, replace the following line with your Model Studio API Key: api_key = "sk-xxx"
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    text=text,
    voice="Cherry",
    language_type="English",  # It is recommended to match the language of the text to ensure correct pronunciation and natural intonation.
    # To use the instruction control feature, uncomment the following lines and replace model with qwen3-tts-instruct-flash
    # instructions='Speak at a relatively fast speed with a noticeable rising intonation, suitable for introducing fashion products.',
    # optimize_instructions=True,
    stream=False
)
print(response)
Java
Import the Gson dependency. If you use Maven or Gradle, add the dependency as follows:
Maven
Add the following to pom.xml:
<!-- https://mvnrepository.com/artifact/com.google.code.gson/gson -->
<dependency>
    <groupId>com.google.code.gson</groupId>
    <artifactId>gson</artifactId>
    <version>2.13.1</version>
</dependency>
Gradle
Add the following to build.gradle:
// https://mvnrepository.com/artifact/com.google.code.gson/gson
implementation("com.google.code.gson:gson:2.13.1")
import com.alibaba.dashscope.aigc.multimodalconversation.AudioParameters;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.protocol.Protocol;
import com.alibaba.dashscope.utils.Constants;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.net.URL;
public class Main {
    // To use the instruction control feature, replace MODEL with qwen3-tts-instruct-flash
    private static final String MODEL = "qwen3-tts-flash";
    public static void call() throws ApiException, NoApiKeyException, UploadFileException {
        MultiModalConversation conv = new MultiModalConversation();
        MultiModalConversationParam param = MultiModalConversationParam.builder()
                // The API Keys for the Singapore and Beijing regions are different. Get an API Key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
                // If you have not configured an environment variable, replace the following line with your Model Studio API Key: .apiKey("sk-xxx")
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .model(MODEL)
                .text("Today is a wonderful day to build something people love!")
                .voice(AudioParameters.Voice.CHERRY)
                .languageType("English") // It is recommended to match the language of the text to ensure correct pronunciation and natural intonation.
                // To use the instruction control feature, uncomment the following lines and replace model with qwen3-tts-instruct-flash
                // .parameter("instructions","Speak at a relatively fast speed with a noticeable rising intonation, suitable for introducing fashion products.")
                // .parameter("optimize_instructions",true)
                .build();
        MultiModalConversationResult result = conv.call(param);
        String audioUrl = result.getOutput().getAudio().getUrl();
        System.out.print(audioUrl);
        // Download the audio file to local storage
        try (InputStream in = new URL(audioUrl).openStream();
             FileOutputStream out = new FileOutputStream("downloaded_audio.wav")) {
            byte[] buffer = new byte[1024];
            int bytesRead;
            while ((bytesRead = in.read(buffer)) != -1) {
                out.write(buffer, 0, bytesRead);
            }
            System.out.println("\nAudio file downloaded to local storage: downloaded_audio.wav");
        } catch (Exception e) {
            System.out.println("\nError downloading audio file: " + e.getMessage());
        }
    }
    public static void main(String[] args) {
        // The following is the Singapore region URL. To use models in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
        Constants.baseHttpApiUrl = "https://dashscope-intl.aliyuncs.com/api/v1";
        try {
            call();
        } catch (ApiException | NoApiKeyException | UploadFileException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}
cURL
# ======= IMPORTANT =======
# The URL below points to the Singapore region. If you are using a model in the China (Beijing) region, replace it with: https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation
# Note: API Keys differ between the Singapore and Beijing regions. To obtain an API Key, visit: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
# === Remove this comment before running ===
curl -X POST 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation' \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
    "model": "qwen3-tts-flash",
    "input": {
        "text": "Today is a wonderful day to build something people love!",
        "voice": "Cherry",
        "language_type": "English"
    }
}'
Streaming output
In streaming mode, audio data is returned incrementally as Base64-encoded PCM segments. The last packet includes a URL for the complete audio file.
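To keep streamed audio instead of playing it, the decoded segments can be collected and wrapped in a WAV container. The following is a minimal sketch, assuming each segment is Base64-encoded 16-bit mono PCM at 24,000 Hz, which is the format the playback examples in this section configure. `pcm_chunks_to_wav` is a hypothetical helper, not an SDK function.

```python
import base64
import io
import wave


def pcm_chunks_to_wav(b64_chunks, sample_rate=24000, channels=1, sample_width=2):
    """Decode Base64 PCM segments and return the bytes of a complete WAV file."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)  # 2 bytes per sample = 16-bit audio
        wav_file.setframerate(sample_rate)
        for chunk in b64_chunks:
            if chunk:  # skip packets that carry no audio data
                wav_file.writeframes(base64.b64decode(chunk))
    return buffer.getvalue()
```

In a streaming loop, each non-empty `audio.data` value could be appended to a list and passed to this helper once the stream finishes.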
Python
# coding=utf-8
#
# Installation instructions for pyaudio:
# APPLE Mac OS X
# brew install portaudio
# pip install pyaudio
# Debian/Ubuntu
# sudo apt-get install python-pyaudio python3-pyaudio
# or
# pip install pyaudio
# CentOS
# sudo yum install -y portaudio portaudio-devel && pip install pyaudio
# Microsoft Windows
# python -m pip install pyaudio
import os
import dashscope
import pyaudio
import time
import base64
import numpy as np
# The following is the Singapore region URL. To use models in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
p = pyaudio.PyAudio()
# Create an audio stream
stream = p.open(format=pyaudio.paInt16,
                channels=1,
                rate=24000,
                output=True)
text = "Today is a wonderful day to build something people love!"
response = dashscope.MultiModalConversation.call(
    # The API Keys for the Singapore and Beijing regions are different. Get an API Key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    # If you have not configured an environment variable, replace the following line with your Model Studio API Key: api_key = "sk-xxx"
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # To use the instruction control feature, replace model with qwen3-tts-instruct-flash
    model="qwen3-tts-flash",
    text=text,
    voice="Cherry",
    language_type="English",  # It is recommended to match the language of the text to ensure correct pronunciation and natural intonation.
    # To use the instruction control feature, uncomment the following lines and replace model with qwen3-tts-instruct-flash
    # instructions='Speak at a relatively fast speed with a noticeable rising intonation, suitable for introducing fashion products.',
    # optimize_instructions=True,
    stream=True
)
for chunk in response:
    if chunk.output is not None:
        audio = chunk.output.audio
        if audio.data is not None:
            # Decode the Base64 payload into raw 16-bit PCM samples
            wav_bytes = base64.b64decode(audio.data)
            audio_np = np.frombuffer(wav_bytes, dtype=np.int16)
            # Play the audio data directly
            stream.write(audio_np.tobytes())
        if chunk.output.finish_reason == "stop":
            print("finish at: {}".format(chunk.output.audio.expires_at))
# Wait briefly so that buffered audio can finish playing
time.sleep(0.8)
# Clean up resources
stream.stop_stream()
stream.close()
p.terminate()
Java
Import the Gson dependency. If you use Maven or Gradle, add the dependency as follows:
Maven
Add the following to pom.xml:
<!-- https://mvnrepository.com/artifact/com.google.code.gson/gson -->
<dependency>
    <groupId>com.google.code.gson</groupId>
    <artifactId>gson</artifactId>
    <version>2.13.1</version>
</dependency>
Gradle
Add the following to build.gradle:
// https://mvnrepository.com/artifact/com.google.code.gson/gson
implementation("com.google.code.gson:gson:2.13.1")
// Install the latest version of the DashScope SDK
import com.alibaba.dashscope.aigc.multimodalconversation.AudioParameters;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.protocol.Protocol;
import com.alibaba.dashscope.utils.Constants;
import io.reactivex.Flowable;
import javax.sound.sampled.*;
import java.util.Base64;
public class Main {
    // To use the instruction control feature, replace MODEL with qwen3-tts-instruct-flash
    private static final String MODEL = "qwen3-tts-flash";
    public static void streamCall() throws ApiException, NoApiKeyException, UploadFileException {
        MultiModalConversation conv = new MultiModalConversation();
        MultiModalConversationParam param = MultiModalConversationParam.builder()
                // The API Keys for the Singapore and Beijing regions are different. Get an API Key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
                // If you have not configured an environment variable, replace the following line with your Model Studio API Key: .apiKey("sk-xxx")
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .model(MODEL)
                .text("Today is a wonderful day to build something people love!")
                .voice(AudioParameters.Voice.CHERRY)
                .languageType("English") // It is recommended to match the language of the text to ensure correct pronunciation and natural intonation.
                // To use the instruction control feature, uncomment the following lines and replace model with qwen3-tts-instruct-flash
                // .parameter("instructions","Speak at a relatively fast speed with a noticeable rising intonation, suitable for introducing fashion products.")
                // .parameter("optimize_instructions",true)
                .build();
        Flowable<MultiModalConversationResult> result = conv.streamCall(param);
        result.blockingForEach(r -> {
            try {
                // 1. Get the Base64-encoded audio data
                String base64Data = r.getOutput().getAudio().getData();
                if (base64Data == null || base64Data.isEmpty()) {
                    // Skip packets that carry no audio payload
                    return;
                }
                byte[] audioBytes = Base64.getDecoder().decode(base64Data);
                // 2. Configure the audio format (adjust according to the audio format returned by the API)
                AudioFormat format = new AudioFormat(
                        AudioFormat.Encoding.PCM_SIGNED,
                        24000, // Sample rate (must match the format returned by the API)
                        16, // Bits per sample
                        1, // Number of channels
                        2, // Frame size (bytes)
                        24000, // Frame rate (must match the sample rate)
                        false // Big-endian flag (false = little-endian)
                );
                // 3. Play the audio data in real time
                DataLine.Info info = new DataLine.Info(SourceDataLine.class, format);
                try (SourceDataLine line = (SourceDataLine) AudioSystem.getLine(info)) {
                    if (line != null) {
                        line.open(format);
                        line.start();
                        line.write(audioBytes, 0, audioBytes.length);
                        line.drain();
                    }
                }
            } catch (LineUnavailableException e) {
                e.printStackTrace();
            }
        });
    }
    public static void main(String[] args) {
        // The following is the Singapore region URL. To use models in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
        Constants.baseHttpApiUrl = "https://dashscope-intl.aliyuncs.com/api/v1";
        try {
            streamCall();
        } catch (ApiException | NoApiKeyException | UploadFileException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}
cURL
# ======= IMPORTANT =======
# The URL below points to the Singapore region. If you are using a model in the China (Beijing) region, replace it with: https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation
# Note: API Keys differ between the Singapore and Beijing regions. To obtain an API Key, visit: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
# === Remove this comment before running ===
curl -X POST 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation' \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-H 'X-DashScope-SSE: enable' \
-d '{
    "model": "qwen3-tts-flash",
    "input": {
        "text": "Today is a wonderful day to build something people love!",
        "voice": "Cherry",
        "language_type": "English"
    }
}'
Advanced features
Instruction control
Instruction-based control lets you precisely shape the vocal expression through natural language descriptions, without adjusting complex audio parameters. Describe the desired tone, speed, emotion, or timbre in plain text to produce the corresponding speech effect.
Supported models: Qwen3-TTS-Instruct-Flash family
Usage: Pass the instruction text in the instructions parameter. For example: "Speak quickly with a noticeable rising tone, as if you're introducing a fashion item."
Supported instruction languages: Chinese and English.
Instruction text length limit: Up to 1,600 tokens.
Use cases:
Audiobook and radio drama voiceover
Advertising and promotional voiceover
Game character and animation voiceover
Emotionally expressive voice assistants
Documentary narration and news broadcasting
Tips for writing high-quality voice descriptions:
Core principles:
Be specific, not vague: Use words that describe concrete vocal qualities, such as "deep," "crisp," or "slightly fast." Avoid subjective, low-information terms like "nice" or "normal."
Be multidimensional, not single-faceted: A good description combines multiple dimensions (pitch, speed, emotion, etc.). Describing only one dimension (e.g., "high pitch") is too broad to produce a distinctive effect.
Be objective, not subjective: Focus on the physical and perceptual qualities of the voice, not personal preferences. For example, use "slightly high pitch with energy" rather than "my favorite voice."
Be original, not imitative: Describe the vocal qualities you want, rather than requesting imitation of specific public figures (such as celebrities or actors). Imitation requests involve copyright risks and are not supported.
Be concise, not redundant: Make every word count. Avoid repeating synonyms or stacking meaningless intensifiers (e.g., "a very very great voice").
Description dimensions: Combining multiple dimensions creates richer expression effects.
Dimension: Example descriptions
Pitch: High, mid, low, slightly high, slightly low
Speed: Fast, moderate, slow, slightly fast, slightly slow
Emotion: Cheerful, calm, gentle, serious, lively, composed, soothing
Timbre: Magnetic, crisp, husky, mellow, sweet, rich, powerful
Use case: News broadcasting, advertising, audiobook, animation character, voice assistant, documentary narration
Examples:
Standard broadcasting style: Clear and precise articulation with standard pronunciation
Emotional escalation: Volume rising rapidly from normal conversation to a shout; straightforward personality with externalized, easily agitated emotions
Special emotional state: Slightly slurred pronunciation from a teary voice, slightly husky, with noticeable tension from a sobbing tone
Advertising voiceover style: Slightly high pitch, moderate speed, energetic and engaging, suitable for advertising
Gentle soothing style: Slightly slow speed, soft and sweet tone, warm and comforting like a caring friend
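The description dimensions can also be combined programmatically. The following is a minimal sketch with a hypothetical `build_instruction` helper that joins the chosen dimensions into one English sentence that could be passed as the `instructions` value shown in the Quick start examples. The wording template is an illustration, not a required format.

```python
def build_instruction(pitch=None, speed=None, emotion=None, timbre=None, use_case=None):
    """Combine voice-description dimensions into one concise instruction."""
    parts = []
    if pitch:
        parts.append(f"{pitch} pitch")
    if speed:
        parts.append(f"{speed} speed")
    if emotion:
        parts.append(f"{emotion} emotion")
    if timbre:
        parts.append(f"{timbre} timbre")
    if len(parts) < 2:
        # A single dimension is too broad to produce a distinctive effect.
        raise ValueError("describe at least two vocal dimensions")
    text = ", ".join(parts)
    text = text[0].upper() + text[1:]
    if use_case:
        text += f", suitable for {use_case}"
    return text + "."


print(build_instruction(pitch="slightly high", speed="moderate",
                        emotion="cheerful", use_case="advertising"))
```

Requiring at least two dimensions mirrors the "multidimensional, not single-faceted" principle; the threshold itself is a choice made for this sketch.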
Supported scope
Model availability varies by deployment region:
International
If you select the International deployment scope, model inference compute resources are dynamically scheduled worldwide, excluding the Chinese mainland. Static data is stored in your selected region. Supported region: Singapore.
To call the following models, use an API key from the Singapore region:
Qwen-TTS:
Qwen3-TTS-Instruct-Flash: qwen3-tts-instruct-flash (stable, currently equivalent to qwen3-tts-instruct-flash-2026-01-26), qwen3-tts-instruct-flash-2026-01-26 (latest snapshot)
Qwen3-TTS-VD: qwen3-tts-vd-2026-01-26 (latest snapshot)
Qwen3-TTS-VC: qwen3-tts-vc-2026-01-22 (latest snapshot)
Qwen3-TTS-Flash: qwen3-tts-flash (stable, currently equivalent to qwen3-tts-flash-2025-11-27), qwen3-tts-flash-2025-11-27, qwen3-tts-flash-2025-09-18
Chinese mainland
If you select the Chinese mainland deployment scope, model inference compute resources are restricted to the Chinese mainland. Static data is stored in your selected region. Supported region: China (Beijing).
To call the following models, use an API key from the Beijing region:
Qwen-TTS:
Qwen3-TTS-Instruct-Flash: qwen3-tts-instruct-flash (stable, currently equivalent to qwen3-tts-instruct-flash-2026-01-26), qwen3-tts-instruct-flash-2026-01-26 (latest snapshot)
Qwen3-TTS-VD: qwen3-tts-vd-2026-01-26 (latest snapshot)
Qwen3-TTS-VC: qwen3-tts-vc-2026-01-22 (latest snapshot)
Qwen3-TTS-Flash: qwen3-tts-flash (stable, currently equivalent to qwen3-tts-flash-2025-11-27), qwen3-tts-flash-2025-11-27, qwen3-tts-flash-2025-09-18
Qwen-TTS: qwen-tts (stable, currently equivalent to qwen-tts-2025-04-10), qwen-tts-latest (latest, currently equivalent to qwen-tts-2025-05-22), qwen-tts-2025-05-22 (snapshot), qwen-tts-2025-04-10 (snapshot)
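The region rules above can be encoded as a small lookup. The following is a minimal sketch, assuming the base URLs given in the Quick start comments and matching models by family prefix, a simplification that does not validate individual snapshot names. `resolve_base_url` is a hypothetical helper.

```python
BASE_URLS = {
    "singapore": "https://dashscope-intl.aliyuncs.com/api/v1",
    "beijing": "https://dashscope.aliyuncs.com/api/v1",
}

# Model family prefixes listed for each region in this section.
QWEN3_FAMILIES = ("qwen3-tts-instruct-flash", "qwen3-tts-vd", "qwen3-tts-vc", "qwen3-tts-flash")
REGION_MODEL_PREFIXES = {
    "singapore": QWEN3_FAMILIES,
    "beijing": QWEN3_FAMILIES + ("qwen-tts",),
}


def resolve_base_url(region, model):
    """Return the base URL for a region, or raise if the model is not listed there."""
    key = region.lower()
    if key not in BASE_URLS:
        raise ValueError(f"unknown region: {region}")
    if not model.startswith(REGION_MODEL_PREFIXES[key]):
        raise ValueError(f"{model} is not listed for the {region} region")
    return BASE_URLS[key]
```

Remember that each region also needs its own API key, so a real configuration would pair the URL with the matching key.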
Built-in voices
Voices vary by model. To specify a voice, set the voice parameter to the value in the voice parameter column of the tables below.
Qwen-TTS voice list:
voice parameter
Details
Supported languages
Supported models
Cherry
Voice name: Cherry
Description: A sunny, positive, friendly, and natural young woman (female)
Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean
Qwen3-TTS-Instruct-Flash: qwen3-tts-instruct-flash, qwen3-tts-instruct-flash-2026-01-26
Qwen3-TTS-Flash: qwen3-tts-flash, qwen3-tts-flash-2025-11-27, qwen3-tts-flash-2025-09-18
Qwen-TTS: qwen-tts, qwen-tts-2025-04-10, qwen-tts-latest, qwen-tts-2025-05-22
Serena
Voice name: Serena
Description: A gentle young woman (female)
Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean
Qwen3-TTS-Instruct-Flash: qwen3-tts-instruct-flash, qwen3-tts-instruct-flash-2026-01-26
Qwen3-TTS-Flash: qwen3-tts-flash, qwen3-tts-flash-2025-11-27
Qwen-TTS: qwen-tts, qwen-tts-2025-04-10, qwen-tts-latest, qwen-tts-2025-05-22
Ethan
Voice name: Ethan
Description: Standard Mandarin with a slight northern accent. Sunny, warm, energetic, and vibrant (male)
Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean
Qwen3-TTS-Instruct-Flash: qwen3-tts-instruct-flash, qwen3-tts-instruct-flash-2026-01-26
Qwen3-TTS-Flash: qwen3-tts-flash, qwen3-tts-flash-2025-11-27, qwen3-tts-flash-2025-09-18
Qwen-TTS: qwen-tts, qwen-tts-2025-04-10, qwen-tts-latest, qwen-tts-2025-05-22
Chelsie
Voice name: Chelsie
Description: An anime-style virtual girlfriend (female)
Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean
Qwen3-TTS-Instruct-Flash: qwen3-tts-instruct-flash, qwen3-tts-instruct-flash-2026-01-26
Qwen3-TTS-Flash: qwen3-tts-flash, qwen3-tts-flash-2025-11-27
Qwen-TTS: qwen-tts, qwen-tts-2025-04-10, qwen-tts-latest, qwen-tts-2025-05-22
Momo
Voice name: Momo
Description: Playful and mischievous, cheering you up (female)
Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean
Qwen3-TTS-Instruct-Flash: qwen3-tts-instruct-flash, qwen3-tts-instruct-flash-2026-01-26
Qwen3-TTS-Flash: qwen3-tts-flash, qwen3-tts-flash-2025-11-27
Vivian
Voice name: Vivian
Description: Confident, cute, and slightly feisty (female)
Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean
Qwen3-TTS-Instruct-Flash: qwen3-tts-instruct-flash, qwen3-tts-instruct-flash-2026-01-26
Qwen3-TTS-Flash: qwen3-tts-flash, qwen3-tts-flash-2025-11-27
Moon
Voice name: Moon
Description: A bold and handsome man named Yuebai (male)
Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean
Qwen3-TTS-Instruct-Flash: qwen3-tts-instruct-flash, qwen3-tts-instruct-flash-2026-01-26
Qwen3-TTS-Flash: qwen3-tts-flash, qwen3-tts-flash-2025-11-27
Maia
Voice name: Maia
Description: A blend of intellect and gentleness (female)
Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean
Qwen3-TTS-Instruct-Flash: qwen3-tts-instruct-flash, qwen3-tts-instruct-flash-2026-01-26
Qwen3-TTS-Flash: qwen3-tts-flash, qwen3-tts-flash-2025-11-27
Kai
Voice name: Kai
Description: A soothing audio spa for your ears (male)
Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean
Qwen3-TTS-Instruct-Flash: qwen3-tts-instruct-flash, qwen3-tts-instruct-flash-2026-01-26
Qwen3-TTS-Flash: qwen3-tts-flash, qwen3-tts-flash-2025-11-27
Nofish
Voice name: Nofish
Description: A designer who cannot pronounce retroflex sounds (male)
Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean
Qwen3-TTS-Instruct-Flash: qwen3-tts-instruct-flash, qwen3-tts-instruct-flash-2026-01-26
Qwen3-TTS-Flash: qwen3-tts-flash, qwen3-tts-flash-2025-11-27, qwen3-tts-flash-2025-09-18
Bella
Voice name: Bella
Description: A little girl who drinks but never throws punches when drunk (female)
Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean
Qwen3-TTS-Instruct-Flash: qwen3-tts-instruct-flash, qwen3-tts-instruct-flash-2026-01-26
Qwen3-TTS-Flash: qwen3-tts-flash, qwen3-tts-flash-2025-11-27
Jennifer
Voice name: Jennifer
Description: A premium, cinematic-quality American English female voice (female)
Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean
Qwen3-TTS-Flash: qwen3-tts-flash, qwen3-tts-flash-2025-11-27, qwen3-tts-flash-2025-09-18
Ryan
Voice name: Ryan
Description: Full of rhythm, bursting with dramatic flair, balancing authenticity and tension (male)
Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean
Qwen3-TTS-Flash: qwen3-tts-flash, qwen3-tts-flash-2025-11-27, qwen3-tts-flash-2025-09-18
Katerina
Voice name: Katerina
Description: A mature-woman voice with rich, memorable rhythm (female)
Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean
Qwen3-TTS-Flash: qwen3-tts-flash, qwen3-tts-flash-2025-11-27, qwen3-tts-flash-2025-09-18
Aiden
Voice name: Aiden
Description: An American English young man skilled in cooking (male)
Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean
Qwen3-TTS-Flash: qwen3-tts-flash, qwen3-tts-flash-2025-11-27
Eldric Sage
Voice name: Eldric Sage
Description: A calm and wise elder—weathered like a pine tree, yet clear-minded as a mirror (male)
Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean
Qwen3-TTS-Instruct-Flash: qwen3-tts-instruct-flash, qwen3-tts-instruct-flash-2026-01-26
Qwen3-TTS-Flash: qwen3-tts-flash, qwen3-tts-flash-2025-11-27
Mia
Voice name: Mia
Description: Gentle as spring water, obedient as fresh snow (female)
Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean
Qwen3-TTS-Instruct-Flash: qwen3-tts-instruct-flash, qwen3-tts-instruct-flash-2026-01-26
Qwen3-TTS-Flash: qwen3-tts-flash, qwen3-tts-flash-2025-11-27
Mochi
Voice name: Mochi
Description: A clever, quick-witted young adult—childlike innocence remains, yet wisdom shines through (male)
Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean
Qwen3-TTS-Instruct-Flash: qwen3-tts-instruct-flash, qwen3-tts-instruct-flash-2026-01-26
Qwen3-TTS-Flash: qwen3-tts-flash, qwen3-tts-flash-2025-11-27
Bellona
Voice name: Bellona
Description: A powerful, clear voice that brings characters to life—so stirring it makes your blood boil. With heroic grandeur and perfect diction, this voice captures the full spectrum of human expression.
Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean
Qwen3-TTS-Instruct-Flash: qwen3-tts-instruct-flash, qwen3-tts-instruct-flash-2026-01-26
Qwen3-TTS-Flash: qwen3-tts-flash, qwen3-tts-flash-2025-11-27
Vincent
Voice name: Vincent
Description: A uniquely raspy, smoky voice—just one line evokes armies and heroic tales (male)
Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean
Qwen3-TTS-Instruct-Flash: qwen3-tts-instruct-flash, qwen3-tts-instruct-flash-2026-01-26
Qwen3-TTS-Flash: qwen3-tts-flash, qwen3-tts-flash-2025-11-27
Bunny
Voice name: Bunny
Description: A little girl overflowing with "cuteness" (female)
Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean
Qwen3-TTS-Instruct-Flash: qwen3-tts-instruct-flash, qwen3-tts-instruct-flash-2026-01-26
Qwen3-TTS-Flash: qwen3-tts-flash, qwen3-tts-flash-2025-11-27
Neil
Voice name: Neil
Description: A flat baseline intonation with precise, clear pronunciation—the most professional news anchor (male)
Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean
Qwen3-TTS-Instruct-Flash: qwen3-tts-instruct-flash, qwen3-tts-instruct-flash-2026-01-26
Qwen3-TTS-Flash: qwen3-tts-flash, qwen3-tts-flash-2025-11-27
Elias
Voice name: Elias
Description: Maintains academic rigor while using storytelling techniques to turn complex knowledge into digestible learning modules (female)
Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean
Qwen3-TTS-Instruct-Flash: qwen3-tts-instruct-flash, qwen3-tts-instruct-flash-2026-01-26
Qwen3-TTS-Flash: qwen3-tts-flash, qwen3-tts-flash-2025-11-27, qwen3-tts-flash-2025-09-18
Arthur
Voice name: Arthur
Description: A simple, earthy voice steeped in time and tobacco smoke—slowly unfolding village stories and curiosities (male)
Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean
Qwen3-TTS-Instruct-Flash: qwen3-tts-instruct-flash, qwen3-tts-instruct-flash-2026-01-26
Qwen3-TTS-Flash: qwen3-tts-flash, qwen3-tts-flash-2025-11-27
Nini
Voice name: Nini
Description: A soft, clingy voice like sweet rice cakes—those drawn-out calls of “Big Brother” are so sweet they melt your bones (female)
Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean
Qwen3-TTS-Instruct-Flash: qwen3-tts-instruct-flash, qwen3-tts-instruct-flash-2026-01-26
Qwen3-TTS-Flash: qwen3-tts-flash, qwen3-tts-flash-2025-11-27
Seren
Voice name: Seren
Description: A gentle, soothing voice to help you fall asleep faster. Good night, sweet dreams (female)
Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean
Qwen3-TTS-Instruct-Flash: qwen3-tts-instruct-flash, qwen3-tts-instruct-flash-2026-01-26
Qwen3-TTS-Flash: qwen3-tts-flash, qwen3-tts-flash-2025-11-27
Pip
Voice name: Pip
Description: A playful, mischievous boy full of childlike wonder—is this your memory of Shin-chan? (male)
Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean
Qwen3-TTS-Instruct-Flash: qwen3-tts-instruct-flash, qwen3-tts-instruct-flash-2026-01-26
Qwen3-TTS-Flash: qwen3-tts-flash, qwen3-tts-flash-2025-11-27
Stella
Voice name: Stella
Description: Normally a cloyingly sweet, dazed teenage-girl voice—but when shouting “I represent the moon to defeat you!”, she instantly radiates unwavering love and justice (female)
Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean
Qwen3-TTS-Instruct-Flash: qwen3-tts-instruct-flash, qwen3-tts-instruct-flash-2026-01-26
Qwen3-TTS-Flash: qwen3-tts-flash, qwen3-tts-flash-2025-11-27
Bodega
Voice name: Bodega
Description: A passionate Spanish man (male)
Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean
Qwen3-TTS-Flash: qwen3-tts-flash, qwen3-tts-flash-2025-11-27
Sonrisa
Voice name: Sonrisa
Description: A cheerful, outgoing Latin American woman (female)
Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean
Qwen3-TTS-Flash: qwen3-tts-flash, qwen3-tts-flash-2025-11-27
Alek
Voice name: Alek
Description: Cold like the Russian spirit, yet warm like wool coat lining (male)
Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean
Qwen3-TTS-Flash: qwen3-tts-flash, qwen3-tts-flash-2025-11-27
Dolce
Voice name: Dolce
Description: A laid-back Italian man (male)
Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean
Qwen3-TTS-Flash: qwen3-tts-flash, qwen3-tts-flash-2025-11-27
Sohee
Voice name: Sohee
Description: A warm, cheerful, emotionally expressive Korean unnie (female)
Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean
Qwen3-TTS-Flash: qwen3-tts-flash, qwen3-tts-flash-2025-11-27
Ono Anna
Voice name: Ono Anna
Description: A clever, spirited childhood friend (female)
Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean
Qwen3-TTS-Flash: qwen3-tts-flash, qwen3-tts-flash-2025-11-27
Lenn
Voice name: Lenn
Description: Rational at heart, rebellious in detail—a German youth who wears suits and listens to post-punk
Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean
Qwen3-TTS-Flash: qwen3-tts-flash, qwen3-tts-flash-2025-11-27
Emilien
Voice name: Emilien
Description: A romantic French big brother (male)
Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean
Qwen3-TTS-Flash: qwen3-tts-flash, qwen3-tts-flash-2025-11-27
Andre
Voice name: Andre
Description: A magnetic, natural, and steady male voice
Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean
Qwen3-TTS-Flash: qwen3-tts-flash, qwen3-tts-flash-2025-11-27
Radio Gol
Voice name: Radio Gol
Description: Football poet Radio Gol! Today I’ll commentate on football using my name (male)
Chinese (Mandarin), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean
Qwen3-TTS-Flash: qwen3-tts-flash, qwen3-tts-flash-2025-11-27
Jada
Voice name: Shanghai - Jada
Description: A fast-paced, energetic Shanghai auntie (female)
Shanghainese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean
Qwen3-TTS-Flash: qwen3-tts-flash, qwen3-tts-flash-2025-11-27, qwen3-tts-flash-2025-09-18
Qwen-TTS: qwen-tts-latest, qwen-tts-2025-05-22
Dylan
Voice name: Beijing - Dylan
Description: A young man raised in Beijing’s hutongs (male)
Beijing dialect, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean
Qwen3-TTS-Flash: qwen3-tts-flash, qwen3-tts-flash-2025-11-27, qwen3-tts-flash-2025-09-18
Qwen-TTS: qwen-tts-latest, qwen-tts-2025-05-22
Li
Voice name: Nanjing - Li
Description: A patient yoga teacher (male)
Nanjing dialect, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean
Qwen3-TTS-Flash: qwen3-tts-flash, qwen3-tts-flash-2025-11-27, qwen3-tts-flash-2025-09-18
Marcus
Voice name: Shaanxi - Marcus
Description: Broad face, few words, sincere heart, deep voice—the authentic Shaanxi flavor (male)
Shaanxi dialect, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean
Qwen3-TTS-Flash: qwen3-tts-flash, qwen3-tts-flash-2025-11-27, qwen3-tts-flash-2025-09-18
Roy
Voice name: Southern Min - Roy
Description: A humorous, straightforward, lively Taiwanese guy (male)
Southern Min, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean
Qwen3-TTS-Flash: qwen3-tts-flash, qwen3-tts-flash-2025-11-27, qwen3-tts-flash-2025-09-18
Peter
Voice name: Tianjin - Peter
Description: Tianjin-style crosstalk, professional foil (male)
Tianjin dialect, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean
Qwen3-TTS-Flash: qwen3-tts-flash, qwen3-tts-flash-2025-11-27, qwen3-tts-flash-2025-09-18
Sunny
Voice name: Sichuan - Sunny
Description: A Sichuan girl sweet enough to melt your heart (female)
Sichuan dialect, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean
Qwen3-TTS-Flash: qwen3-tts-flash, qwen3-tts-flash-2025-11-27, qwen3-tts-flash-2025-09-18
Qwen-TTS: qwen-tts-latest, qwen-tts-2025-05-22
Eric
Voice name: Sichuan - Eric
Description: A Sichuanese man from Chengdu who stands out in everyday life (male)
Sichuan dialect, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean
Qwen3-TTS-Flash: qwen3-tts-flash, qwen3-tts-flash-2025-11-27, qwen3-tts-flash-2025-09-18
Rocky
Voice name: Cantonese - Rocky
Description: A humorous, witty A Qiang providing live chat (male)
Cantonese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean
Qwen3-TTS-Flash: qwen3-tts-flash, qwen3-tts-flash-2025-11-27, qwen3-tts-flash-2025-09-18
Kiki
Voice name: Cantonese - Kiki
Description: A sweet Hong Kong girl best friend (female)
Cantonese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean
Qwen3-TTS-Flash: qwen3-tts-flash, qwen3-tts-flash-2025-11-27, qwen3-tts-flash-2025-09-18
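Because voice availability differs by model, a request can be checked against the voice list before it is sent. The following is a minimal sketch with a hypothetical `check_voice` helper and a three-voice excerpt of the list above (Cherry, Jennifer, and Jada). A real lookup would need every row.

```python
FLASH = {"qwen3-tts-flash", "qwen3-tts-flash-2025-11-27", "qwen3-tts-flash-2025-09-18"}
INSTRUCT = {"qwen3-tts-instruct-flash", "qwen3-tts-instruct-flash-2026-01-26"}
QWEN_TTS_ALL = {"qwen-tts", "qwen-tts-2025-04-10", "qwen-tts-latest", "qwen-tts-2025-05-22"}
QWEN_TTS_NEWER = {"qwen-tts-latest", "qwen-tts-2025-05-22"}

# Excerpt of the voice list: voice parameter -> models that support it.
VOICE_MODELS = {
    "Cherry": INSTRUCT | FLASH | QWEN_TTS_ALL,
    "Jennifer": FLASH,
    "Jada": FLASH | QWEN_TTS_NEWER,
}


def check_voice(voice, model):
    """Return True if the excerpt lists `model` as supporting `voice`."""
    if voice not in VOICE_MODELS:
        raise KeyError(f"voice not in excerpt: {voice}")
    return model in VOICE_MODELS[voice]
```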
API reference
FAQ
Q: How long does the audio file URL remain valid?
A: The audio file URL expires 24 hours after it's generated. To get a new URL, call the API again.