Qwen speech synthesis models offer a variety of human-like voices. They support multiple languages and dialects and can generate multilingual content in a single voice. The system automatically adapts its tone and processes complex text fluently.
Supported models
We recommend Qwen3-TTS.
Qwen3-TTS offers 17 voices and supports multiple languages and dialects.
Qwen-TTS offers up to 7 voices and supports only Chinese and English.
International (Singapore)
Model | Version | Unit price | Maximum input characters | Supported languages | Free quota (Note) |
qwen3-tts-flash (same capabilities as qwen3-tts-flash-2025-09-18) | Stable version | $0.1/10,000 characters | 600 | Chinese (Mandarin, Beijing, Shanghai, Sichuan, Nanjing, Shaanxi, Minnan, Tianjin, Cantonese), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese | 2,000 characters each. Validity: within 90 days after you activate Alibaba Cloud Model Studio |
qwen3-tts-flash-2025-09-18 | Snapshot version |
Qwen3-TTS is billed based on the number of input characters. The billing rules are as follows:
1 Chinese character = 2 characters
1 English letter, 1 punctuation mark, or 1 space = 1 character
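The billing rules above can be sketched as a small estimator. This is an illustration only, not the service's actual metering logic. It assumes that "Chinese character" means a code point in the CJK Unified Ideographs block (U+4E00 to U+9FFF) and that every other character, including full-width punctuation, counts as 1. The function names are hypothetical.

```python
def billable_characters(text: str) -> int:
    """Count billable characters: 2 per Chinese character, 1 for anything else."""
    total = 0
    for ch in text:
        # Assumption: a "Chinese character" is a CJK Unified Ideograph.
        if "\u4e00" <= ch <= "\u9fff":
            total += 2
        else:
            total += 1
    return total


def estimated_cost_usd(text: str, price_per_10k: float = 0.1) -> float:
    """Estimate the cost using the per-10,000-character unit price from the table."""
    return billable_characters(text) * price_per_10k / 10_000


# 7 characters for "Hello, ", 4 for the two Chinese characters, 1 for "!"
print(billable_characters("Hello, 世界!"))
```

Pass a different `price_per_10k` to estimate costs for another region.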
Mainland China (Beijing)
Qwen3-TTS
Model | Version | Unit price | Maximum input characters | Supported languages |
qwen3-tts-flash (same capabilities as qwen3-tts-flash-2025-09-18) | Stable version | $0.114682/10,000 characters | 600 | Chinese (Mandarin, Beijing, Shanghai, Sichuan, Nanjing, Shaanxi, Minnan, Tianjin, Cantonese), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese |
qwen3-tts-flash-2025-09-18 | Snapshot version |
Qwen3-TTS is billed based on the number of input characters. The billing rules are as follows:
1 Chinese character = 2 characters
1 English letter, 1 punctuation mark, or 1 space = 1 character
Qwen-TTS
Model | Version | Context window (tokens) | Maximum input (tokens) | Maximum output (tokens) | Input cost (per 1,000 tokens) | Output cost (per 1,000 tokens) |
qwen-tts (same capabilities as qwen-tts-2025-04-10) | Stable version | 8,192 | 512 | 7,680 | $0.230 | $1.434 |
qwen-tts-latest (always the same capabilities as the latest snapshot version) | Latest version |
qwen-tts-2025-05-22 | Snapshot version |
qwen-tts-2025-04-10 | Snapshot version |
Audio is converted to tokens at a rate of 50 tokens per second. Audio shorter than 1 second is counted as 50 tokens.
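As a rough illustration of this conversion, the sketch below turns an audio duration into a token count and an output cost. The rounding behavior for fractional seconds is not stated in this document, so rounding up is an assumption, and the function names are hypothetical. The default price is the Qwen-TTS output cost from the table above.

```python
import math

TOKENS_PER_SECOND = 50
MINIMUM_TOKENS = 50  # audio shorter than 1 second is counted as 50 tokens


def audio_tokens(duration_seconds: float) -> int:
    """Convert an audio duration to tokens (assumption: partial tokens round up)."""
    if duration_seconds < 0:
        raise ValueError("duration_seconds must not be negative")
    return max(MINIMUM_TOKENS, math.ceil(duration_seconds * TOKENS_PER_SECOND))


def output_cost_usd(duration_seconds: float, price_per_1k_tokens: float = 1.434) -> float:
    """Estimate the output cost for synthesized audio of the given length."""
    return audio_tokens(duration_seconds) * price_per_1k_tokens / 1000


print(audio_tokens(0.4))   # below one second, so the 50-token minimum applies
print(audio_tokens(2.5))   # 2.5 s * 50 tokens/s
```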
Features
Features | Qwen3-TTS | Qwen-TTS |
Connection type | Python, Java, HTTP | |
Streaming output | Supported | |
Streaming input | Not supported | |
Synthesized audio format | | |
Synthesized audio sample rate | 24 kHz | |
Timestamp | Not supported | |
Language | Chinese (Mandarin, Beijing, Shanghai, Sichuan, Nanjing, Shaanxi, Minnan, Tianjin, Cantonese), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese. Varies by voice. For more information, see Supported voices | Chinese (Mandarin, Beijing, Shanghai, Sichuan), English. Varies by model and voice. For more information, see Supported voices |
Voice cloning | Not supported | |
SSML | Not supported | |
Getting started
Preparations
You have configured an API key and added it to an environment variable.
To use the DashScope SDK, install the latest version of the SDK. The DashScope Java SDK must be version 2.21.9 or later. The DashScope Python SDK must be version 1.24.6 or later.
Note: In the DashScope Python SDK, the SpeechSynthesizer interface has been replaced by MultiModalConversation. To use the new interface, simply replace the interface name. All other parameters are fully compatible.
Key parameters
text: The text to synthesize.
voice: The voice to use.
language_type: The language of the synthesized audio. Valid values are Chinese, English, German, Italian, Portuguese, Spanish, Japanese, Korean, French, Russian, and Auto (default).
You can use the returned url to retrieve the synthesized audio. The URL is valid for 24 hours.
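The key parameters can be checked before a request is sent. The sketch below is a hypothetical helper, not part of the DashScope SDK. It builds a request body shaped like the JSON in the HTTP sample in this section and rejects a language_type that is not in the list above. It does not check whether the chosen voice exists.

```python
VALID_LANGUAGE_TYPES = {
    "Chinese", "English", "German", "Italian", "Portuguese", "Spanish",
    "Japanese", "Korean", "French", "Russian", "Auto",
}


def build_tts_payload(text: str, voice: str, language_type: str = "Auto",
                      model: str = "qwen3-tts-flash") -> dict:
    """Build a request body with the three key parameters (hypothetical helper)."""
    if not text:
        raise ValueError("text must not be empty")
    if language_type not in VALID_LANGUAGE_TYPES:
        raise ValueError(f"unsupported language_type: {language_type!r}")
    return {
        "model": model,
        "input": {
            "text": text,
            "voice": voice,
            "language_type": language_type,
        },
    }


payload = build_tts_payload("Hello, I am Qwen", voice="Cherry", language_type="English")
print(payload["input"]["language_type"])
```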
# DashScope SDK version 1.24.6 or later
import os
import dashscope
# This is the URL for the Singapore region. If you use a model in the China (Beijing) region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
text = "I want to recommend a T-shirt. It's incredibly stylish, and the color is very flattering. It's a perfect piece for any outfit. You can't go wrong with it. It's really beautiful and suits all body types, so anyone will look great wearing it. I highly recommend ordering one."
# To use the SpeechSynthesizer interface: dashscope.audio.qwen_tts.SpeechSynthesizer.call(...)
response = dashscope.MultiModalConversation.call(
    model="qwen3-tts-flash",
    # API keys for the Singapore and China (Beijing) regions are different. To get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
    # If you have not configured an environment variable, replace the following line with your Model Studio API key: api_key = "sk-xxx"
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    text=text,
    voice="Cherry",
    language_type="English",  # We recommend that this matches the text language to ensure correct pronunciation and natural intonation.
    stream=False
)
print(response)

// DashScope SDK version 2.21.9 or later
// Versions 2.20.7 and later support the Dylan, Jada, and Sunny voices.
import com.alibaba.dashscope.aigc.multimodalconversation.AudioParameters;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.protocol.Protocol;
import com.alibaba.dashscope.utils.Constants;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.net.URL;
public class Main {
    private static final String MODEL = "qwen3-tts-flash";

    public static void call() throws ApiException, NoApiKeyException, UploadFileException {
        MultiModalConversation conv = new MultiModalConversation();
        MultiModalConversationParam param = MultiModalConversationParam.builder()
                // API keys for the Singapore and China (Beijing) regions are different. To get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
                // If you have not configured an environment variable, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .model(MODEL)
                .text("Today is a wonderful day to build something people love!")
                .voice(AudioParameters.Voice.CHERRY)
                .languageType("English") // We recommend that this matches the text language to ensure correct pronunciation and natural intonation.
                .build();
        MultiModalConversationResult result = conv.call(param);
        String audioUrl = result.getOutput().getAudio().getUrl();
        System.out.print(audioUrl);
        // Download the audio file locally
        try (InputStream in = new URL(audioUrl).openStream();
             FileOutputStream out = new FileOutputStream("downloaded_audio.wav")) {
            byte[] buffer = new byte[1024];
            int bytesRead;
            while ((bytesRead = in.read(buffer)) != -1) {
                out.write(buffer, 0, bytesRead);
            }
            System.out.println("\nAudio file downloaded to local path: downloaded_audio.wav");
        } catch (Exception e) {
            System.out.println("\nError downloading audio file: " + e.getMessage());
        }
    }

    public static void main(String[] args) {
        // This is the URL for the Singapore region. If you use a model in the China (Beijing) region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
        Constants.baseHttpApiUrl = "https://dashscope-intl.aliyuncs.com/api/v1";
        try {
            call();
        } catch (ApiException | NoApiKeyException | UploadFileException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}

# ======= Important =======
# This is the URL for the Singapore region. If you use a model in the China (Beijing) region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation
# API keys for the Singapore and China (Beijing) regions are different. To get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
# === Delete this comment before execution ===
curl -X POST 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation' \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
    "model": "qwen3-tts-flash",
    "input": {
        "text": "I want to recommend a T-shirt. It'\''s incredibly stylish, and the color is very flattering. It'\''s a perfect piece for any outfit. You can'\''t go wrong with it. It'\''s really beautiful and suits all body types, so anyone will look great wearing it. I highly recommend ordering one.",
        "voice": "Cherry",
        "language_type": "English"
    }
}'

Real-time playback
The Qwen-TTS model can stream audio data in Base64 format. The last packet contains the URL for the complete audio file.
# DashScope SDK version 1.24.6 or later
# coding=utf-8
#
# Installation instructions for pyaudio:
# APPLE Mac OS X
# brew install portaudio
# pip install pyaudio
# Debian/Ubuntu
# sudo apt-get install python-pyaudio python3-pyaudio
# or
# pip install pyaudio
# CentOS
# sudo yum install -y portaudio portaudio-devel && pip install pyaudio
# Microsoft Windows
# python -m pip install pyaudio
import os
import dashscope
import pyaudio
import time
import base64
import numpy as np
# This is the URL for the Singapore region. If you use a model in the China (Beijing) region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
p = pyaudio.PyAudio()
# Create an audio stream
stream = p.open(format=pyaudio.paInt16,
                channels=1,
                rate=24000,
                output=True)
text = "Hello, I am Qwen"
response = dashscope.MultiModalConversation.call(
    # API keys for the Singapore and China (Beijing) regions are different. To get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
    # If you have not configured an environment variable, replace the following line with your Model Studio API key: api_key = "sk-xxx"
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    model="qwen3-tts-flash",
    text=text,
    voice="Cherry",
    language_type="English",  # We recommend that this matches the text language to ensure correct pronunciation and natural intonation.
    stream=True
)
for chunk in response:
    if chunk.output is not None:
        audio = chunk.output.audio
        if audio.data is not None:
            wav_bytes = base64.b64decode(audio.data)
            audio_np = np.frombuffer(wav_bytes, dtype=np.int16)
            # Play the audio data directly
            stream.write(audio_np.tobytes())
        if chunk.output.finish_reason == "stop":
            print("finish at: {}".format(chunk.output.audio.expires_at))
# Give the audio device time to finish playing the buffered data
time.sleep(0.8)
# Clean up resources
stream.stop_stream()
stream.close()
p.terminate()
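
As a variation on the playback loop above, the decoded chunks could be written to a WAV file instead of an audio device. This is a minimal sketch using only the standard library. It assumes, as the playback sample does, that each Base64 packet holds raw 16-bit mono PCM at 24 kHz. The function name is hypothetical.

```python
import base64
import wave


def write_pcm_chunks_to_wav(base64_chunks, path, sample_rate=24000):
    """Decode Base64 PCM packets and store them in a single WAV file."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)      # mono, as in the playback sample
        wav_file.setsampwidth(2)      # 16-bit samples, 2 bytes each
        wav_file.setframerate(sample_rate)
        for chunk in base64_chunks:
            if chunk:                 # skip packets that carry no audio data
                wav_file.writeframes(base64.b64decode(chunk))
    return path
```

To use it with the streaming response, collect each non-empty `chunk.output.audio.data` value into a list and pass the list to the function.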
// Install the latest version of the DashScope SDK
// Versions 2.20.7 and later support the Dylan, Jada, and Sunny voices.
import com.alibaba.dashscope.aigc.multimodalconversation.AudioParameters;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.protocol.Protocol;
import com.alibaba.dashscope.utils.Constants;
import io.reactivex.Flowable;
import javax.sound.sampled.*;
import java.util.Base64;
public class Main {
    private static final String MODEL = "qwen3-tts-flash";

    public static void streamCall() throws ApiException, NoApiKeyException, UploadFileException {
        MultiModalConversation conv = new MultiModalConversation();
        MultiModalConversationParam param = MultiModalConversationParam.builder()
                // API keys for the Singapore and China (Beijing) regions are different. To get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
                // If you have not configured an environment variable, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .model(MODEL)
                .text("Today is a wonderful day to build something people love!")
                .voice(AudioParameters.Voice.CHERRY)
                .languageType("English") // We recommend that this matches the text language to ensure correct pronunciation and natural intonation.
                .build();
        Flowable<MultiModalConversationResult> result = conv.streamCall(param);
        result.blockingForEach(r -> {
            try {
                // 1. Get the Base64-encoded audio data
                String base64Data = r.getOutput().getAudio().getData();
                // Skip packets that carry no audio data
                if (base64Data == null || base64Data.isEmpty()) {
                    return;
                }
                byte[] audioBytes = Base64.getDecoder().decode(base64Data);
                // 2. Configure the audio format (adjust based on the format returned by the API)
                AudioFormat format = new AudioFormat(
                        AudioFormat.Encoding.PCM_SIGNED,
                        24000, // Sample rate (must match the format returned by the API)
                        16,    // Audio bit depth
                        1,     // Number of sound channels
                        2,     // Frame size in bytes (channels * bit depth / 8)
                        24000, // Frame rate (equals the sample rate for PCM)
                        false  // Big-endian byte order (false = little-endian)
                );
                // 3. Play the audio data in real time
                DataLine.Info info = new DataLine.Info(SourceDataLine.class, format);
                try (SourceDataLine line = (SourceDataLine) AudioSystem.getLine(info)) {
                    if (line != null) {
                        line.open(format);
                        line.start();
                        line.write(audioBytes, 0, audioBytes.length);
                        line.drain();
                    }
                }
            } catch (LineUnavailableException e) {
                e.printStackTrace();
            }
        });
    }

    public static void main(String[] args) {
        // This is the URL for the Singapore region. If you use a model in the China (Beijing) region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
        Constants.baseHttpApiUrl = "https://dashscope-intl.aliyuncs.com/api/v1";
        try {
            streamCall();
        } catch (ApiException | NoApiKeyException | UploadFileException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}
# ======= Important =======
# This is the URL for the Singapore region. If you use a model in the China (Beijing) region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation
# API keys for the Singapore and China (Beijing) regions are different. To get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
# === Delete this comment before execution ===
curl -X POST 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation' \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-H 'X-DashScope-SSE: enable' \
-d '{
    "model": "qwen3-tts-flash",
    "input": {
        "text": "I want to recommend a T-shirt. It'\''s incredibly stylish, and the color is very flattering. It'\''s a perfect piece for any outfit. You can'\''t go wrong with it. It'\''s really beautiful and suits all body types, so anyone will look great wearing it. I highly recommend ordering one.",
        "voice": "Cherry",
        "language_type": "English"
    }
}'

API reference
For more information, see Speech synthesis (Qwen-TTS).
Supported voices
The supported voices vary by model. When you make a request, set the voice parameter to the corresponding value in the voice parameter column of the following tables.
Qwen3-TTS
Name | voice parameter | Description | Supported languages |
Cherry | Cherry | A cheerful, friendly, and natural young woman's voice. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai |
Ethan | Ethan | Standard Mandarin with a slight northern accent. A bright, warm, and energetic voice. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai |
Nofish | Nofish | A designer who does not use retroflex consonants. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai |
Jennifer | Jennifer | A premium, cinematic American English female voice. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai |
Ryan | Ryan | A rhythmic, dramatic voice with realism and tension. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai |
Katerina | Katerina | A mature and rhythmic female voice. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai |
Elias | Elias | Explains complex topics with academic rigor and clear storytelling. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai |
Shanghai-Jada | Jada | A lively woman from Shanghai. | Chinese (Shanghainese), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai |
Beijing-Dylan | Dylan | A teenager who grew up in the hutongs of Beijing. | Chinese (Beijing dialect), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai |
Sichuan-Sunny | Sunny | A sweet female voice from Sichuan. | Chinese (Sichuanese), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai |
Nanjing-Li | Li | A patient yoga teacher. | Chinese (Nanjing dialect), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai |
Shaanxi-Marcus | Marcus | A sincere and deep voice from Shaanxi. | Chinese (Shaanxi dialect), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai |
Minnan-Roy | Roy | A humorous and lively young male voice with a Minnan accent. | Chinese (Min Nan), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai |
Tianjin-Peter | Peter | A voice for the straight man in Tianjin crosstalk. | Chinese (Tianjin dialect), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai |
Cantonese-Rocky | Rocky | A witty and humorous male voice for online chats. | Chinese (Cantonese), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai |
Cantonese-Kiki | Kiki | A sweet best friend from Hong Kong. | Chinese (Cantonese), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai |
Sichuan-Eric | Eric | An unconventional and refined male voice from Chengdu, Sichuan. | Chinese (Sichuanese), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai |
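Because the display name and the value passed as voice can differ (the first two columns of the table above), a small lookup can help avoid sending the display name by mistake. The sketch below copies only a few rows from the table for illustration. The dictionary and function names are hypothetical.

```python
# Display name -> value for the voice parameter (subset of the table above).
VOICE_PARAMETERS = {
    "Cherry": "Cherry",
    "Shanghai-Jada": "Jada",
    "Beijing-Dylan": "Dylan",
    "Sichuan-Sunny": "Sunny",
    "Cantonese-Rocky": "Rocky",
}


def voice_parameter(display_name: str) -> str:
    """Return the voice parameter value for a display name from the table."""
    try:
        return VOICE_PARAMETERS[display_name]
    except KeyError:
        raise ValueError(f"unknown voice name: {display_name!r}") from None


print(voice_parameter("Shanghai-Jada"))
```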
Qwen-TTS
Voices supported by qwen-tts, qwen-tts-2025-05-22, and qwen-tts-2025-04-10:
Name | voice parameter | Description | Supported languages |
Cherry | Cherry | A cheerful, friendly, and genuine young woman. | Chinese, English |
Serena | Serena | A gentle young lady. | Chinese, English |
Ethan | Ethan | Standard Mandarin with a slight northern accent. A bright, warm, and energetic voice. | Chinese, English |
Chelsie | Chelsie | An anime-style virtual girlfriend voice. | Chinese, English |
Voices supported by qwen-tts-latest and qwen-tts-2025-05-22:
Name | voice parameter | Description | Supported languages |
Cherry | Cherry | A bright, friendly, and natural female voice. | Chinese, English |
Serena | Serena | Gentle young woman. | Chinese, English |
Ethan | Ethan | Standard Mandarin with a slight northern accent. A bright, warm, and energetic voice. | Chinese, English |
Chelsie | Chelsie | An anime-style virtual girlfriend voice. | Chinese, English |
Sichuan-Sunny | Sunny | A sweet and endearing girl from Sichuan. | Chinese (Sichuanese), English |
Shanghai-Jada | Jada | A lively and energetic female voice with a Shanghai accent. | Chinese (Shanghainese), English |
Beijing-Dylan | Dylan | A teenager who grew up in Beijing's hutongs. | Chinese (Beijing dialect), English |
FAQ
Q: How long is the audio file URL valid?
A: The audio file URL expires after 24 hours.
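Because the link expires, it can be worth saving the audio locally soon after synthesis. The following is a minimal sketch that uses only the Python standard library. The function name is hypothetical, and the URL is whatever the response returned.

```python
import shutil
import urllib.request


def download_audio(url: str, path: str) -> str:
    """Stream the audio at `url` into a local file and return the file path."""
    with urllib.request.urlopen(url) as response, open(path, "wb") as out_file:
        shutil.copyfileobj(response, out_file)
    return path
```

For example, `download_audio(audio_url, "downloaded_audio.wav")` would store the file next to the script, where `audio_url` is the URL taken from the response.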
Q: Can I input text in Markdown format?
A: Markdown format is not currently supported. Convert the text to plain text first.
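One way to do the conversion is a few regular-expression passes over the text before it is sent. The sketch below is a rough approximation, not a full Markdown parser. It handles headings, list and quote markers, links, images, emphasis, and inline code, and it can mangle identifiers that contain paired underscores. The function name is hypothetical.

```python
import re


def markdown_to_plain_text(markdown: str) -> str:
    """Strip common Markdown markers so that only the readable text remains."""
    text = markdown
    # Images: ![alt](url) -> alt
    text = re.sub(r"!\[([^\]]*)\]\([^)]*\)", r"\1", text)
    # Links: [label](url) -> label
    text = re.sub(r"\[([^\]]+)\]\([^)]*\)", r"\1", text)
    # Headings: leading "#" markers
    text = re.sub(r"^[ \t]{0,3}#{1,6}[ \t]+", "", text, flags=re.MULTILINE)
    # List bullets and block quotes at the start of a line
    text = re.sub(r"^[ \t]*(?:[-*+]|>)[ \t]+", "", text, flags=re.MULTILINE)
    # Bold, italic, and inline code markers around a span
    text = re.sub(r"(\*\*|__|\*|_|`)(.+?)\1", r"\2", text)
    return text.strip()


print(markdown_to_plain_text("# Title\n**bold** and [link](https://example.com)"))
```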