The Qwen speech synthesis models offer a variety of human-like voices. They support multiple languages and dialects and can generate multilingual content with a single voice. The models automatically adapt their tone and process complex text fluently.
Model availability
We recommend Qwen3-TTS-Flash.
Qwen3-TTS-Flash offers 49 voices and supports multiple languages and dialects.
Qwen-TTS offers up to 7 voices and supports only Chinese and English.
International (Singapore)
Model | Version | Unit price | Max input characters | Supported languages | Free quota (Note)
qwen3-tts-flash (same capabilities as qwen3-tts-flash-2025-09-18) | Stable | $0.1/10,000 characters | 600 | Chinese (Mandarin, Beijing, Shanghai, Sichuan, Nanjing, Shaanxi, Minnan, Tianjin, Cantonese), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese | If you activate Model Studio before 00:00 on November 13, 2025: 2,000 characters; on or after 00:00 on November 13, 2025: 10,000 characters. Valid for 90 days after you activate Model Studio.
qwen3-tts-flash-2025-11-27 | Snapshot | Same as above | Same as above | Same as above | 10,000 characters. Valid for 90 days after you activate Model Studio.
qwen3-tts-flash-2025-09-18 | Snapshot | Same as above | Same as above | Same as above | If you activate Model Studio before 00:00 on November 13, 2025: 2,000 characters; on or after 00:00 on November 13, 2025: 10,000 characters. Valid for 90 days after you activate Model Studio.
Billing is based on the number of input characters. The calculation rules are as follows:
Each Chinese character (including simplified/traditional Chinese, Japanese Kanji, and Korean Hanja) is counted as 2 characters.
All other characters, such as English letters, punctuation marks, and spaces, are counted as 1 character each.
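As a worked illustration, the Python sketch below applies these counting rules. The Unicode ranges used to detect CJK ideographs are an assumption for illustration, not the service's exact definition.

# Sketch of the billing rule above: CJK ideographs (simplified/traditional
# Chinese, Japanese Kanji, Korean Hanja) count as 2 characters; everything
# else counts as 1. The Unicode ranges are an illustrative assumption.
def billable_characters(text: str) -> int:
    total = 0
    for ch in text:
        cp = ord(ch)
        if 0x4E00 <= cp <= 0x9FFF or 0x3400 <= cp <= 0x4DBF:
            total += 2
        else:
            total += 1
    return total

print(billable_characters("Hello, 世界!"))  # 8 x 1 + 2 x 2 = 12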
China (Beijing)
Qwen3-TTS-Flash
Model | Version | Unit price | Max input characters | Supported languages
qwen3-tts-flash (same capabilities as qwen3-tts-flash-2025-09-18) | Stable | $0.114682/10,000 characters | 600 | Chinese (Mandarin, Beijing, Shanghai, Sichuan, Nanjing, Shaanxi, Minnan, Tianjin, Cantonese), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese
qwen3-tts-flash-2025-11-27 | Snapshot | Same as above | Same as above | Same as above
qwen3-tts-flash-2025-09-18 | Snapshot | Same as above | Same as above | Same as above
Billing is based on the number of input characters. The calculation rules are as follows:
Each Chinese character (including simplified/traditional Chinese, Japanese Kanji, and Korean Hanja) is counted as 2 characters.
All other characters, such as English letters, punctuation marks, and spaces, are counted as 1 character each.
Qwen-TTS
Model | Version | Context window (tokens) | Max input (tokens) | Max output (tokens) | Input cost (per 1,000 tokens) | Output cost (per 1,000 tokens)
qwen-tts (same capabilities as qwen-tts-2025-04-10) | Stable | 8,192 | 512 | 7,680 | $0.230 | $1.434
qwen-tts-latest (same capabilities as the latest snapshot) | Latest | 8,192 | 512 | 7,680 | $0.230 | $1.434
qwen-tts-2025-05-22 | Snapshot | 8,192 | 512 | 7,680 | $0.230 | $1.434
qwen-tts-2025-04-10 | Snapshot | 8,192 | 512 | 7,680 | $0.230 | $1.434
Audio is converted to tokens at a rate of 50 tokens per second. Audio clips shorter than 1 second are billed as 50 tokens.
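For example, a 12.5-second clip is billed as 625 output tokens. The Python sketch below shows the conversion and a cost estimate at the output price from the table above; whether partial seconds round up is an assumption here.

import math

# 50 tokens per second of audio, with a 50-token minimum for clips
# shorter than 1 second. Rounding partial seconds up is an assumption.
def audio_output_tokens(duration_seconds: float) -> int:
    return max(50, math.ceil(duration_seconds * 50))

tokens = audio_output_tokens(12.5)
print(tokens)                 # 625
print(tokens / 1000 * 1.434)  # output cost in USD, about 0.896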
Features
Features | Qwen3-TTS-Flash | Qwen-TTS
Connection type | Java/Python SDK, RESTful API | Java/Python SDK, RESTful API
Streaming output | Supported | Supported
Streaming input | Not supported | Not supported
Synthesized audio format | |
Synthesized audio sample rate | 24 kHz | 24 kHz
Timestamp | Not supported | Not supported
Language | Chinese (Mandarin, Beijing, Shanghai, Sichuan, Nanjing, Shaanxi, Minnan, Tianjin, Cantonese), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese. Varies by voice. For more information, see Supported voices. | Chinese (Mandarin, Beijing, Shanghai, Sichuan), English. Varies by model and voice. For more information, see Supported voices.
Voice cloning | Not supported | Not supported
SSML | Not supported | Not supported
Getting started
Preparations
You have created an API key and exported it as an environment variable.
To use the DashScope SDK, install the latest version of the SDK. The DashScope Java SDK must be version 2.21.9 or later. The DashScope Python SDK must be version 1.24.6 or later.
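For example, you can install or upgrade the Python SDK with pip (the Java SDK is published on Maven Central as com.alibaba:dashscope-sdk-java):

pip install -U dashscope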
Note: In the DashScope Python SDK, the SpeechSynthesizer interface has been replaced by MultiModalConversation. To use the new interface, simply replace the interface name; all other parameters are fully compatible.
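For example, migrating a legacy call is a one-name change (a sketch; the parameter values shown are illustrative, and the API key is read from the DASHSCOPE_API_KEY environment variable):

import dashscope

# Legacy interface (replaced):
# response = dashscope.audio.qwen_tts.SpeechSynthesizer.call(
#     model="qwen-tts", text="Hello", voice="Cherry")

# New interface: only the name changes, parameters are identical.
response = dashscope.MultiModalConversation.call(
    model="qwen-tts", text="Hello", voice="Cherry")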
Key parameters
text: The text to synthesize.
voice: The voice to use.
language_type: The language of the synthesized audio. Valid values are Chinese, English, German, Italian, Portuguese, Spanish, Japanese, Korean, French, Russian, and Auto (default).
Use the returned url to retrieve the synthesized audio. The URL is valid for 24 hours.
Python
# DashScope SDK version 1.24.6 or later
import os
import dashscope
# This is the URL for the Singapore region. If you use a model in the China (Beijing) region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
text = "Today is a wonderful day to build something people love!"
# To use the SpeechSynthesizer interface: dashscope.audio.qwen_tts.SpeechSynthesizer.call(...)
response = dashscope.MultiModalConversation.call(
model="qwen3-tts-flash",
# API keys for the Singapore and China (Beijing) regions are different. To get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
# If you have not configured an environment variable, replace the following line with your Model Studio API key: api_key = "sk-xxx"
api_key=os.getenv("DASHSCOPE_API_KEY"),
text=text,
voice="Cherry",
language_type="English", # We recommend that this matches the text language to ensure correct pronunciation and natural intonation.
stream=False
)
print(response)
Java
You must add the Gson dependency to your project. If you use Maven or Gradle, add it as follows:
Maven
Add the following content to pom.xml:
<!-- https://mvnrepository.com/artifact/com.google.code.gson/gson -->
<dependency>
<groupId>com.google.code.gson</groupId>
<artifactId>gson</artifactId>
<version>2.13.1</version>
</dependency>
Gradle
Add the following content to build.gradle:
// https://mvnrepository.com/artifact/com.google.code.gson/gson
implementation("com.google.code.gson:gson:2.13.1")// DashScope SDK version 2.21.9 or later
// Versions 2.20.7 and later support the Dylan, Jada, and Sunny voices.
import com.alibaba.dashscope.aigc.multimodalconversation.AudioParameters;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.protocol.Protocol;
import com.alibaba.dashscope.utils.Constants;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.net.URL;
public class Main {
private static final String MODEL = "qwen3-tts-flash";
public static void call() throws ApiException, NoApiKeyException, UploadFileException {
MultiModalConversation conv = new MultiModalConversation();
MultiModalConversationParam param = MultiModalConversationParam.builder()
// API keys for the Singapore and China (Beijing) regions are different. To get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
// If you have not configured an environment variable, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.model(MODEL)
.text("Today is a wonderful day to build something people love!")
.voice(AudioParameters.Voice.CHERRY)
.languageType("English") // We recommend that this matches the text language to ensure correct pronunciation and natural intonation.
.build();
MultiModalConversationResult result = conv.call(param);
String audioUrl = result.getOutput().getAudio().getUrl();
System.out.print(audioUrl);
// Download the audio file locally
try (InputStream in = new URL(audioUrl).openStream();
FileOutputStream out = new FileOutputStream("downloaded_audio.wav")) {
byte[] buffer = new byte[1024];
int bytesRead;
while ((bytesRead = in.read(buffer)) != -1) {
out.write(buffer, 0, bytesRead);
}
System.out.println("\nAudio file downloaded to local path: downloaded_audio.wav");
} catch (Exception e) {
System.out.println("\nError downloading audio file: " + e.getMessage());
}
}
public static void main(String[] args) {
// This is the URL for the Singapore region. If you use a model in the China (Beijing) region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
Constants.baseHttpApiUrl = "https://dashscope-intl.aliyuncs.com/api/v1";
try {
call();
} catch (ApiException | NoApiKeyException | UploadFileException e) {
System.out.println(e.getMessage());
}
System.exit(0);
}
}
cURL
# ======= Important =======
# This is the URL for the Singapore region. If you use a model in the China (Beijing) region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation
# API keys for the Singapore and China (Beijing) regions are different. To get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
# === Delete this comment before execution ===
curl -X POST 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation' \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
"model": "qwen3-tts-flash",
"input": {
"text": "Today is a wonderful day to build something people love!",
"voice": "Cherry",
"language_type": "English"
}
}'
Real-time playback
Stream audio data in Base64 format. The last packet contains the URL for the complete audio file.
Python
# DashScope SDK version 1.24.6 or later
# coding=utf-8
#
# Installation instructions for pyaudio:
# APPLE Mac OS X
# brew install portaudio
# pip install pyaudio
# Debian/Ubuntu
# sudo apt-get install python-pyaudio python3-pyaudio
# or
# pip install pyaudio
# CentOS
# sudo yum install -y portaudio portaudio-devel && pip install pyaudio
# Microsoft Windows
# python -m pip install pyaudio
import os
import dashscope
import pyaudio
import time
import base64
import numpy as np
# This is the URL for the Singapore region. If you use a model in the China (Beijing) region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
p = pyaudio.PyAudio()
# Create an audio stream
stream = p.open(format=pyaudio.paInt16,
channels=1,
rate=24000,
output=True)
text = "Today is a wonderful day to build something people love!"
response = dashscope.MultiModalConversation.call(
# API keys for the Singapore and China (Beijing) regions are different. To get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
# If you have not configured an environment variable, replace the following line with your Model Studio API key: api_key = "sk-xxx"
api_key=os.getenv("DASHSCOPE_API_KEY"),
model="qwen3-tts-flash",
text=text,
voice="Cherry",
language_type="English", # We recommend that this matches the text language to ensure correct pronunciation and natural intonation.
stream=True
)
for chunk in response:
    if chunk.output is not None:
        audio = chunk.output.audio
        if audio.data is not None:
            wav_bytes = base64.b64decode(audio.data)
            audio_np = np.frombuffer(wav_bytes, dtype=np.int16)
            # Play the audio data directly
            stream.write(audio_np.tobytes())
        if chunk.output.finish_reason == "stop":
            print(f"finish at: {chunk.output.audio.expires_at}")

# Let the buffered audio finish playing before closing the stream
time.sleep(0.8)
# Clean up resources
stream.stop_stream()
stream.close()
p.terminate()
Java
You must add the Gson dependency to your project. If you use Maven or Gradle, add it as follows:
Maven
Add the following content to pom.xml:
<!-- https://mvnrepository.com/artifact/com.google.code.gson/gson -->
<dependency>
<groupId>com.google.code.gson</groupId>
<artifactId>gson</artifactId>
<version>2.13.1</version>
</dependency>
Gradle
Add the following content to build.gradle:
// https://mvnrepository.com/artifact/com.google.code.gson/gson
implementation("com.google.code.gson:gson:2.13.1")// Install the latest version of the DashScope SDK
// Versions 2.20.7 and later support the Dylan, Jada, and Sunny voices.
import com.alibaba.dashscope.aigc.multimodalconversation.AudioParameters;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.protocol.Protocol;
import com.alibaba.dashscope.utils.Constants;
import io.reactivex.Flowable;
import javax.sound.sampled.*;
import java.util.Base64;
public class Main {
private static final String MODEL = "qwen3-tts-flash";
public static void streamCall() throws ApiException, NoApiKeyException, UploadFileException {
MultiModalConversation conv = new MultiModalConversation();
MultiModalConversationParam param = MultiModalConversationParam.builder()
// API keys for the Singapore and China (Beijing) regions are different. To get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
// If you have not configured an environment variable, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.model(MODEL)
.text("Today is a wonderful day to build something people love!")
.voice(AudioParameters.Voice.CHERRY)
.languageType("English") // We recommend that this matches the text language to ensure correct pronunciation and natural intonation.
.build();
Flowable<MultiModalConversationResult> result = conv.streamCall(param);
result.blockingForEach(r -> {
try {
// 1. Get the Base64-encoded audio data
String base64Data = r.getOutput().getAudio().getData();
byte[] audioBytes = Base64.getDecoder().decode(base64Data);
// 2. Configure the audio format (adjust based on the format returned by the API)
AudioFormat format = new AudioFormat(
AudioFormat.Encoding.PCM_SIGNED,
24000, // Sample rate in Hz (must match the audio returned by the API)
16, // Bit depth in bits
1, // Number of channels (mono)
2, // Frame size in bytes (bit depth / 8 x channels)
24000, // Frame rate; equals the sample rate for uncompressed PCM
false // bigEndian flag: false = little-endian byte order
);
// 3. Play the audio data in real time
DataLine.Info info = new DataLine.Info(SourceDataLine.class, format);
try (SourceDataLine line = (SourceDataLine) AudioSystem.getLine(info)) {
if (line != null) {
line.open(format);
line.start();
line.write(audioBytes, 0, audioBytes.length);
line.drain();
}
}
} catch (LineUnavailableException e) {
e.printStackTrace();
}
});
}
public static void main(String[] args) {
// This is the URL for the Singapore region. If you use a model in the China (Beijing) region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
Constants.baseHttpApiUrl = "https://dashscope-intl.aliyuncs.com/api/v1";
try {
streamCall();
} catch (ApiException | NoApiKeyException | UploadFileException e) {
System.out.println(e.getMessage());
}
System.exit(0);
}
}
cURL
# ======= Important =======
# This is the URL for the Singapore region. If you use a model in the China (Beijing) region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation
# API keys for the Singapore and China (Beijing) regions are different. To get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
# === Delete this comment before execution ===
curl -X POST 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation' \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-H 'X-DashScope-SSE: enable' \
-d '{
"model": "qwen3-tts-flash",
"input": {
"text": "Today is a wonderful day to build something people love!",
"voice": "Cherry",
"language_type": "Chinese"
}
}'
Supported voices
Qwen3-TTS-Flash
Supported voices vary by model. Set the voice request parameter to the value in the voice parameter column of the table.
Qwen-TTS
Supported voices vary by model. Set the voice request parameter to the value in the voice parameter column of the table.
API reference
For more information, see Speech synthesis (Qwen-TTS).
FAQ
Q: How long is the audio file URL valid?
A: The audio file URL expires after 24 hours.
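To keep the audio beyond that window, download it before the URL expires. A minimal Python sketch (the placeholder URL stands in for the url field from the API response):

import urllib.request

audio_url = "https://..."  # the url returned by the API (valid for 24 hours)
urllib.request.urlretrieve(audio_url, "downloaded_audio.wav")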