Alibaba Cloud Model Studio: Qwen speech synthesis models

Last Updated: Nov 12, 2025

Qwen speech synthesis models offer a variety of human-like voices. They support multiple languages and dialects and can generate multilingual content in a single voice. The system automatically adapts its tone and processes complex text fluently.

Supported models

We recommend Qwen3-TTS.

Qwen3-TTS offers 17 voices and supports multiple languages and dialects.

Qwen-TTS offers up to 7 voices and supports only Chinese and English.

International (Singapore)

| Model | Version | Unit price | Maximum input characters | Supported languages | Free quota (Note) |
| --- | --- | --- | --- | --- | --- |
| qwen3-tts-flash (its capabilities are the same as qwen3-tts-flash-2025-09-18) | Stable version | $0.1/10,000 characters | 600 | Chinese (Mandarin, Beijing, Shanghai, Sichuan, Nanjing, Shaanxi, Minnan, Tianjin, Cantonese), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese | 2,000 characters for each model. Validity: within 90 days after you activate Alibaba Cloud Model Studio |
| qwen3-tts-flash-2025-09-18 | Snapshot version | $0.1/10,000 characters | 600 | Same as qwen3-tts-flash | Same as qwen3-tts-flash |

Qwen3-TTS is billed based on the number of input characters. The billing rules are as follows:

  • 1 Chinese character = 2 characters

  • 1 English letter, 1 punctuation mark, or 1 space = 1 character
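As a rough illustration of the billing rules above, the following sketch estimates the billed character count and cost of a text. The function names are made up for this example. Detecting Chinese characters through the CJK Unified Ideographs range (U+4E00 to U+9FFF) is an assumption, and because the rules do not say how digits or other scripts are counted, the sketch counts every non-Chinese character as 1. The default price is the Singapore list price from the table above.

```python
def billed_characters(text: str) -> int:
    """Estimate billed characters: a Chinese character counts as 2, anything else as 1."""
    total = 0
    for ch in text:
        # Assumption: "Chinese character" means the basic CJK Unified Ideographs block.
        if "\u4e00" <= ch <= "\u9fff":
            total += 2
        else:
            total += 1
    return total


def estimated_cost_usd(text: str, price_per_10k: float = 0.1) -> float:
    """Estimate the cost in USD at a given price per 10,000 billed characters."""
    return billed_characters(text) * price_per_10k / 10_000


sample = "Hello, 世界!"
print(billed_characters(sample), f"{estimated_cost_usd(sample):.8f}")
```

Treat the result as an estimate only; the bill issued by the service is authoritative.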

Mainland China (Beijing)

Qwen3-TTS

| Model | Version | Unit price | Maximum input characters | Supported languages |
| --- | --- | --- | --- | --- |
| qwen3-tts-flash (its capabilities are the same as qwen3-tts-flash-2025-09-18) | Stable version | $0.114682/10,000 characters | 600 | Chinese (Mandarin, Beijing, Shanghai, Sichuan, Nanjing, Shaanxi, Minnan, Tianjin, Cantonese), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese |
| qwen3-tts-flash-2025-09-18 | Snapshot version | $0.114682/10,000 characters | 600 | Same as qwen3-tts-flash |

Qwen3-TTS is billed based on the number of input characters. The billing rules are as follows:

  • 1 Chinese character = 2 characters

  • 1 English letter, 1 punctuation mark, or 1 space = 1 character

Qwen-TTS

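| Model | Version | Context window (tokens) | Maximum input (tokens) | Maximum output (tokens) | Input cost (per 1,000 tokens) | Output cost (per 1,000 tokens) |
| --- | --- | --- | --- | --- | --- | --- |
| qwen-tts (its capabilities are the same as qwen-tts-2025-04-10) | Stable version | 8,192 | 512 | 7,680 | $0.230 | $1.434 |
| qwen-tts-latest (its capabilities are always the same as the latest snapshot version) | Latest version | 8,192 | 512 | 7,680 | $0.230 | $1.434 |
| qwen-tts-2025-05-22 | Snapshot version | 8,192 | 512 | 7,680 | $0.230 | $1.434 |
| qwen-tts-2025-04-10 | Snapshot version | 8,192 | 512 | 7,680 | $0.230 | $1.434 |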

Audio is converted to tokens at a rate of 50 tokens per second. Audio shorter than 1 second is counted as 50 tokens.
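The conversion rule above can be sketched as a small calculation. The function names are illustrative, and rounding fractional token counts up is an assumption, since the rounding rule is not stated here. The default price is the Qwen-TTS output cost from the table above.

```python
import math

TOKENS_PER_SECOND = 50  # conversion rate stated above


def audio_tokens(duration_seconds: float) -> int:
    """Estimate the token count for synthesized audio of the given duration."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    if duration_seconds < 1:
        # Audio shorter than 1 second is counted as 50 tokens.
        return TOKENS_PER_SECOND
    # Assumption: fractional results round up. round() guards against float noise.
    return math.ceil(round(duration_seconds * TOKENS_PER_SECOND, 6))


def output_cost_usd(duration_seconds: float, price_per_1k_tokens: float = 1.434) -> float:
    """Estimate the output cost in USD at a given price per 1,000 tokens."""
    return audio_tokens(duration_seconds) * price_per_1k_tokens / 1000


print(audio_tokens(0.3), audio_tokens(2.5), audio_tokens(10))
```

For example, 20 seconds of audio corresponds to 1,000 output tokens under this rule.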

Features

| Feature | Qwen3-TTS | Qwen-TTS |
| --- | --- | --- |
| Connection type | Python, Java, HTTP | Python, Java, HTTP |
| Streaming output | Supported | Supported |
| Streaming input | Not supported | Not supported |
| Synthesized audio format | wav; Base64-encoded PCM for streaming output | wav; Base64-encoded PCM for streaming output |
| Synthesized audio sample rate | 24 kHz | 24 kHz |
| Timestamp | Not supported | Not supported |
| Language | Chinese (Mandarin, Beijing, Shanghai, Sichuan, Nanjing, Shaanxi, Minnan, Tianjin, Cantonese), English, Spanish, Russian, Italian, French, Korean, Japanese, German, Portuguese. Varies by voice. For more information, see Supported voices. | Chinese (Mandarin, Beijing, Shanghai, Sichuan), English. Varies by model and voice. For more information, see Supported voices. |
| Voice cloning | Not supported | Not supported |
| SSML | Not supported | Not supported |

Getting started

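Preparations

Get an API key and set it as the DASHSCOPE_API_KEY environment variable. To call the models through an SDK, install the DashScope SDK: version 1.24.6 or later for Python, or version 2.21.9 or later for Java.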

Key parameters

  • text: The text to synthesize.

  • voice: The voice to use.

  • language_type: The language for the synthesized audio. Valid values are Chinese, English, German, Italian, Portuguese, Spanish, Japanese, Korean, French, Russian, and Auto (default).

You can use the returned url to retrieve the synthesized audio. The URL is valid for 24 hours.
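To make the parameter list concrete, the sketch below builds an HTTP request body with the same shape as the curl example later in this section and checks language_type against the values listed above. `build_tts_request` is a hypothetical helper written for this page, not part of the DashScope SDK, and the non-empty text check is an extra safeguard rather than a documented rule.

```python
import json

# Valid language_type values listed above.
LANGUAGE_TYPES = {
    "Auto", "Chinese", "English", "German", "Italian", "Portuguese",
    "Spanish", "Japanese", "Korean", "French", "Russian",
}


def build_tts_request(text: str, voice: str, language_type: str = "Auto",
                      model: str = "qwen3-tts-flash") -> str:
    """Return a JSON request body shaped like the curl example in this section."""
    if not text:
        raise ValueError("text must not be empty")
    if language_type not in LANGUAGE_TYPES:
        raise ValueError(f"unsupported language_type: {language_type!r}")
    body = {
        "model": model,
        "input": {"text": text, "voice": voice, "language_type": language_type},
    }
    # ensure_ascii=False keeps non-ASCII text readable in the serialized body.
    return json.dumps(body, ensure_ascii=False)


print(build_tts_request("Hello, I am Qwen", "Cherry", "English"))
```

Validating the value locally avoids a round trip for a request that the service would probably reject.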

Python

# DashScope SDK version 1.24.6 or later
import os
import dashscope

# This is the URL for the Singapore region. If you use a model in the China (Beijing) region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

text = "I want to recommend a T-shirt. It's incredibly stylish, and the color is very flattering. It's a perfect piece for any outfit. You can't go wrong with it. It's really beautiful and suits all body types, so anyone will look great wearing it. I highly recommend ordering one."
# To use the SpeechSynthesizer interface: dashscope.audio.qwen_tts.SpeechSynthesizer.call(...)
response = dashscope.MultiModalConversation.call(
    model="qwen3-tts-flash",
    # API keys for the Singapore and China (Beijing) regions are different. To get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
    # If you have not configured an environment variable, replace the following line with your Model Studio API key: api_key = "sk-xxx"
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    text=text,
    voice="Cherry",
    language_type="English", # We recommend that this matches the text language to ensure correct pronunciation and natural intonation.
    stream=False
)
print(response)

Java

// DashScope SDK version 2.21.9 or later
// Versions 2.20.7 and later support the Dylan, Jada, and Sunny voices.
import com.alibaba.dashscope.aigc.multimodalconversation.AudioParameters;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.protocol.Protocol;
import com.alibaba.dashscope.utils.Constants;

import java.io.FileOutputStream;
import java.io.InputStream;
import java.net.URL;

public class Main {
    private static final String MODEL = "qwen3-tts-flash";
    public static void call() throws ApiException, NoApiKeyException, UploadFileException {
        MultiModalConversation conv = new MultiModalConversation();
        MultiModalConversationParam param = MultiModalConversationParam.builder()
                // API keys for the Singapore and China (Beijing) regions are different. To get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
                // If you have not configured an environment variable, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .model(MODEL)
                .text("Today is a wonderful day to build something people love!")
                .voice(AudioParameters.Voice.CHERRY)
                .languageType("English") // We recommend that this matches the text language to ensure correct pronunciation and natural intonation.
                .build();
        MultiModalConversationResult result = conv.call(param);
        String audioUrl = result.getOutput().getAudio().getUrl();
        System.out.print(audioUrl);

        // Download the audio file locally
        try (InputStream in = new URL(audioUrl).openStream();
             FileOutputStream out = new FileOutputStream("downloaded_audio.wav")) {
            byte[] buffer = new byte[1024];
            int bytesRead;
            while ((bytesRead = in.read(buffer)) != -1) {
                out.write(buffer, 0, bytesRead);
            }
            System.out.println("\nAudio file downloaded to local path: downloaded_audio.wav");
        } catch (Exception e) {
            System.out.println("\nError downloading audio file: " + e.getMessage());
        }
    }
    public static void main(String[] args) {
        // This is the URL for the Singapore region. If you use a model in the China (Beijing) region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
        Constants.baseHttpApiUrl = "https://dashscope-intl.aliyuncs.com/api/v1";
        try {
            call();
        } catch (ApiException | NoApiKeyException | UploadFileException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}

curl

# ======= Important =======
# This is the URL for the Singapore region. If you use a model in the China (Beijing) region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation
# API keys for the Singapore and China (Beijing) regions are different. To get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
# === Delete this comment before execution ===

curl -X POST 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation' \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
    "model": "qwen3-tts-flash",
    "input": {
        "text": "I want to recommend a T-shirt. It'\''s incredibly stylish, and the color is very flattering. It'\''s a perfect piece for any outfit. You can'\''t go wrong with it. It'\''s really beautiful and suits all body types, so anyone will look great wearing it. I highly recommend ordering one.",
        "voice": "Cherry",
        "language_type": "English"
    }
}'
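The Java sample downloads the audio file, while the Python sample only prints the response. A minimal Python sketch of the same download step follows. It uses only the standard library. The `save_audio` name is made up for this example, and reading the URL from `response.output.audio.url` is an assumption based on the fields used by the other samples on this page.

```python
import shutil
import urllib.request


def save_audio(audio_url: str, path: str = "downloaded_audio.wav") -> str:
    """Download the synthesized audio from its URL and write it to a local file."""
    # The URL expires after 24 hours, so download the file soon after synthesis.
    with urllib.request.urlopen(audio_url) as resp, open(path, "wb") as out:
        shutil.copyfileobj(resp, out)
    return path


# Hypothetical usage with the response from the Python sample above:
# save_audio(response.output.audio.url)
```

Because the URL is temporary, store the downloaded file rather than the URL if the audio is needed later.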

Real-time playback

Qwen3-TTS and Qwen-TTS can stream audio data as Base64-encoded PCM. The last packet contains the URL of the complete audio file.

Python

# DashScope SDK version 1.24.6 or later
# coding=utf-8
#
# Installation instructions for pyaudio:
# APPLE Mac OS X
#   brew install portaudio
#   pip install pyaudio
# Debian/Ubuntu
#   sudo apt-get install python-pyaudio python3-pyaudio
#   or
#   pip install pyaudio
# CentOS
#   sudo yum install -y portaudio portaudio-devel && pip install pyaudio
# Microsoft Windows
#   python -m pip install pyaudio

import os
import dashscope
import pyaudio
import time
import base64
import numpy as np

# This is the URL for the Singapore region. If you use a model in the China (Beijing) region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

p = pyaudio.PyAudio()
# Create an audio stream
stream = p.open(format=pyaudio.paInt16,
                channels=1,
                rate=24000,
                output=True)


text = "Hello, I am Qwen"
response = dashscope.MultiModalConversation.call(
    # API keys for the Singapore and China (Beijing) regions are different. To get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
    # If you have not configured an environment variable, replace the following line with your Model Studio API key: api_key = "sk-xxx"
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    model="qwen3-tts-flash",
    text=text,
    voice="Cherry",
    language_type="English", # We recommend that this matches the text language to ensure correct pronunciation and natural intonation.
    stream=True
)

for chunk in response:
    if chunk.output is not None:
        audio = chunk.output.audio
        if audio.data is not None:
            # Decode the Base64 payload into raw 16-bit PCM samples
            wav_bytes = base64.b64decode(audio.data)
            audio_np = np.frombuffer(wav_bytes, dtype=np.int16)
            # Play the audio data directly
            stream.write(audio_np.tobytes())
        if chunk.output.finish_reason == "stop":
            print("finish at: {}".format(chunk.output.audio.expires_at))
time.sleep(0.8)
# Clean up resources
stream.stop_stream()
stream.close()
p.terminate()

Java

// Install the latest version of the DashScope SDK
// Versions 2.20.7 and later support the Dylan, Jada, and Sunny voices.
import com.alibaba.dashscope.aigc.multimodalconversation.AudioParameters;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.protocol.Protocol;
import com.alibaba.dashscope.utils.Constants;
import io.reactivex.Flowable;
import javax.sound.sampled.*;
import java.util.Base64;

public class Main {
    private static final String MODEL = "qwen3-tts-flash";
    public static void streamCall() throws ApiException, NoApiKeyException, UploadFileException {
        MultiModalConversation conv = new MultiModalConversation();
        MultiModalConversationParam param = MultiModalConversationParam.builder()
                // API keys for the Singapore and China (Beijing) regions are different. To get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
                // If you have not configured an environment variable, replace the following line with your Model Studio API key: .apiKey("sk-xxx")
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .model(MODEL)
                .text("Today is a wonderful day to build something people love!")
                .voice(AudioParameters.Voice.CHERRY)
                .languageType("English") // We recommend that this matches the text language to ensure correct pronunciation and natural intonation.
                .build();
        Flowable<MultiModalConversationResult> result = conv.streamCall(param);
        result.blockingForEach(r -> {
            try {
                // 1. Get the Base64-encoded audio data
                String base64Data = r.getOutput().getAudio().getData();
                byte[] audioBytes = Base64.getDecoder().decode(base64Data);

                // 2. Configure the audio format (adjust based on the format returned by the API)
                AudioFormat format = new AudioFormat(
                        AudioFormat.Encoding.PCM_SIGNED,
                        24000, // Sample rate (must match the format returned by the API)
                        16,    // Audio bit depth
                        1,     // Number of sound channels
                        2,     // Frame size in bytes (channels * bit depth / 8)
                        24000, // Frame rate (equals the sample rate for PCM)
                        false  // Big-endian flag: false means little-endian byte order
                );

                // 3. Play the audio data in real time
                DataLine.Info info = new DataLine.Info(SourceDataLine.class, format);
                try (SourceDataLine line = (SourceDataLine) AudioSystem.getLine(info)) {
                    if (line != null) {
                        line.open(format);
                        line.start();
                        line.write(audioBytes, 0, audioBytes.length);
                        line.drain();
                    }
                }
            } catch (LineUnavailableException e) {
                e.printStackTrace();
            }
        });
    }
    public static void main(String[] args) {
        // This is the URL for the Singapore region. If you use a model in the China (Beijing) region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
        Constants.baseHttpApiUrl = "https://dashscope-intl.aliyuncs.com/api/v1";
        try {
            streamCall();
        } catch (ApiException | NoApiKeyException | UploadFileException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}

curl

# ======= Important =======
# This is the URL for the Singapore region. If you use a model in the China (Beijing) region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation
# API keys for the Singapore and China (Beijing) regions are different. To get an API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
# === Delete this comment before execution ===

curl -X POST 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation' \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-H 'X-DashScope-SSE: enable' \
-d '{
    "model": "qwen3-tts-flash",
    "input": {
        "text": "I want to recommend a T-shirt. It'\''s incredibly stylish, and the color is very flattering. It'\''s a perfect piece for any outfit. You can'\''t go wrong with it. It'\''s really beautiful and suits all body types, so anyone will look great wearing it. I highly recommend ordering one.",
        "voice": "Cherry",
        "language_type": "English"
    }
}'
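When no audio device is available, the streamed chunks can be written to a file instead of played. The sketch below decodes Base64 PCM chunks and wraps them in a WAV container with the standard-library wave module. It assumes the stream is 16-bit mono PCM at 24 kHz, which matches the parameters that the playback samples above pass to the audio device. `chunks_to_wav` is a hypothetical helper name.

```python
import base64
import wave
from typing import Iterable, Optional


def chunks_to_wav(b64_chunks: Iterable[Optional[str]], path: str,
                  sample_rate: int = 24000) -> int:
    """Decode Base64 PCM chunks into a WAV file and return the number of frames written."""
    total_bytes = 0
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)            # mono
        wav.setsampwidth(2)            # 16-bit samples, 2 bytes each
        wav.setframerate(sample_rate)
        for chunk in b64_chunks:
            if not chunk:
                continue               # skip packets that carry no audio data
            pcm = base64.b64decode(chunk)
            wav.writeframes(pcm)
            total_bytes += len(pcm)
    # One mono 16-bit frame occupies 2 bytes.
    return total_bytes // 2


# Hypothetical usage with the streaming Python sample above:
# chunks_to_wav((c.output.audio.data for c in response if c.output is not None), "out.wav")
```

The complete file is also available from the URL in the last packet, so this is mainly useful when the audio must be processed while it is still arriving.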

API reference

For more information, see Speech synthesis (Qwen-TTS).

Supported voices

The supported voices vary by model. When you make a request, set the voice parameter to the corresponding value in the voice parameter column of the following tables.

Qwen3-TTS

| Name | voice parameter | Description | Supported languages |
| --- | --- | --- | --- |
| Cherry | Cherry | A cheerful, friendly, and natural young woman's voice. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai |
| Ethan | Ethan | Standard Mandarin with a slight northern accent. A bright, warm, and energetic voice. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai |
| Nofish | Nofish | A designer who does not use retroflex consonants. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai |
| Jennifer | Jennifer | A premium, cinematic American English female voice. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai |
| Ryan | Ryan | A rhythmic, dramatic voice with realism and tension. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai |
| Katerina | Katerina | A mature and rhythmic female voice. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai |
| Elias | Elias | Explains complex topics with academic rigor and clear storytelling. | Chinese, English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai |
| Shanghai-Jada | Jada | A lively woman from Shanghai. | Chinese (Shanghainese), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai |
| Beijing-Dylan | Dylan | A teenager who grew up in the hutongs of Beijing. | Chinese (Beijing dialect), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai |
| Sichuan-Sunny | Sunny | A sweet female voice from Sichuan. | Chinese (Sichuanese), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai |
| Nanjing-Li | Li | A patient yoga teacher. | Chinese (Nanjing dialect), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai |
| Shaanxi-Marcus | Marcus | A sincere and deep voice from Shaanxi. | Chinese (Shaanxi dialect), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai |
| Minnan-Roy | Roy | A humorous and lively young male voice with a Minnan accent. | Chinese (Min Nan), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai |
| Tianjin-Peter | Peter | A voice for the straight man in Tianjin crosstalk. | Chinese (Tianjin dialect), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai |
| Cantonese-Rocky | Rocky | A witty and humorous male voice for online chats. | Chinese (Cantonese), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai |
| Cantonese-Kiki | Kiki | A sweet best friend from Hong Kong. | Chinese (Cantonese), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai |
| Sichuan-Eric | Eric | An unconventional and refined male voice from Chengdu, Sichuan. | Chinese (Sichuanese), English, French, German, Russian, Italian, Spanish, Portuguese, Japanese, Korean, Thai |
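Choosing a Qwen3-TTS voice for a given Chinese dialect amounts to a lookup in the voice list above. The sketch below encodes the dialect voices from that list in a dictionary. `DIALECT_VOICES` and `voices_for_dialect` are illustrative names, not part of any SDK.

```python
# Dialect -> voice parameter values, copied from the Qwen3-TTS voice list above.
DIALECT_VOICES = {
    "Shanghainese": ["Jada"],
    "Beijing dialect": ["Dylan"],
    "Sichuanese": ["Sunny", "Eric"],
    "Nanjing dialect": ["Li"],
    "Shaanxi dialect": ["Marcus"],
    "Min Nan": ["Roy"],
    "Tianjin dialect": ["Peter"],
    "Cantonese": ["Rocky", "Kiki"],
}


def voices_for_dialect(dialect: str) -> list:
    """Return the voice parameter values listed for a dialect, or an empty list."""
    return list(DIALECT_VOICES.get(dialect, []))


print(voices_for_dialect("Cantonese"))
```

The remaining seven voices (Cherry, Ethan, Nofish, Jennifer, Ryan, Katerina, and Elias) are listed without a specific dialect.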

Qwen-TTS

  • Voices supported by qwen-tts, qwen-tts-2025-05-22, and qwen-tts-2025-04-10:

    | Name | voice parameter | Description | Supported languages |
    | --- | --- | --- | --- |
    | Cherry | Cherry | A cheerful, friendly, and genuine young woman. | Chinese, English |
    | Serena | Serena | A gentle young lady. | Chinese, English |
    | Ethan | Ethan | Standard Mandarin with a slight northern accent. A bright, warm, and energetic voice. | Chinese, English |
    | Chelsie | Chelsie | An anime-style virtual girlfriend voice. | Chinese, English |

  • Voices supported by qwen-tts-latest and qwen-tts-2025-05-22:

    | Name | voice parameter | Description | Supported languages |
    | --- | --- | --- | --- |
    | Cherry | Cherry | A bright, friendly, and natural female voice. | Chinese, English |
    | Serena | Serena | Gentle young woman. | Chinese, English |
    | Ethan | Ethan | Standard Mandarin with a slight northern accent. A bright, warm, and energetic voice. | Chinese, English |
    | Chelsie | Chelsie | An anime-style virtual girlfriend voice. | Chinese, English |
    | Sichuan-Sunny | Sunny | A sweet and endearing girl from Sichuan. | Chinese (Sichuanese), English |
    | Shanghai-Jada | Jada | A lively and energetic female voice with a Shanghai accent. | Chinese (Shanghainese), English |
    | Beijing-Dylan | Dylan | A teenager who grew up in Beijing's hutongs. | Chinese (Beijing dialect), English |
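Because the available Qwen-TTS voices depend on the model, it can help to check a model and voice combination before sending a request. The sketch below is one way to do that. The names are illustrative, and it reads the two lists above as follows: the first list holds the four voices common to all models, and the second holds the full set for qwen-tts-latest and qwen-tts-2025-05-22. That reading is an assumption.

```python
COMMON_VOICES = {"Cherry", "Serena", "Ethan", "Chelsie"}
EXTRA_DIALECT_VOICES = {"Sunny", "Jada", "Dylan"}

# Assumption: qwen-tts-2025-05-22 appears in both lists above, so it gets the larger set.
VOICES_BY_MODEL = {
    "qwen-tts": COMMON_VOICES,
    "qwen-tts-2025-04-10": COMMON_VOICES,
    "qwen-tts-latest": COMMON_VOICES | EXTRA_DIALECT_VOICES,
    "qwen-tts-2025-05-22": COMMON_VOICES | EXTRA_DIALECT_VOICES,
}


def check_voice(model: str, voice: str) -> None:
    """Raise ValueError if the voice is not listed for the given Qwen-TTS model."""
    if model not in VOICES_BY_MODEL:
        raise ValueError(f"unknown model: {model!r}")
    if voice not in VOICES_BY_MODEL[model]:
        allowed = ", ".join(sorted(VOICES_BY_MODEL[model]))
        raise ValueError(f"{voice!r} is not listed for {model}; choose one of: {allowed}")


check_voice("qwen-tts-latest", "Jada")  # passes silently
```

A local check like this only mirrors the lists on this page, so it needs updating whenever the supported voices change.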

FAQ

Q: How long is the audio file URL valid?

A: The audio file URL expires after 24 hours.

Q: Can I input text in Markdown format?

A: Markdown format is not currently supported. Convert the text to plain text first.
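One way to convert Markdown to plain text before synthesis is a handful of regular expressions, as in the sketch below. This is a minimal approximation written for this page, not a full Markdown parser: it drops fenced code blocks entirely, and it can alter text that uses underscores or asterisks literally (for example, identifiers such as snake_case_name). `markdown_to_plain` is a hypothetical helper name.

```python
import re


def markdown_to_plain(md: str) -> str:
    """Strip common Markdown syntax with regular expressions (approximate)."""
    text = re.sub(r"`{3}.*?`{3}", "", md, flags=re.DOTALL)                 # drop fenced code blocks
    text = re.sub(r"`([^`]*)`", r"\1", text)                               # inline code -> its text
    text = re.sub(r"!\[([^\]]*)\]\([^)]*\)", r"\1", text)                  # images -> alt text
    text = re.sub(r"\[([^\]]*)\]\([^)]*\)", r"\1", text)                   # links -> link text
    text = re.sub(r"^[ \t]{0,3}#{1,6}[ \t]*", "", text, flags=re.MULTILINE)          # heading markers
    text = re.sub(r"^[ \t]*(?:[-*+]|\d+\.)[ \t]+", "", text, flags=re.MULTILINE)     # list markers
    text = re.sub(r"^[ \t]*>[ \t]?", "", text, flags=re.MULTILINE)         # blockquote markers
    text = re.sub(r"(\*\*|__|\*|_)(.+?)\1", r"\2", text)                   # bold and italic markers
    return re.sub(r"\n{3,}", "\n\n", text).strip()                         # collapse extra blank lines


print(markdown_to_plain("# Title\n\n**Bold** and *italic* with [a link](https://example.com)."))
```

For documents with tables, nested lists, or embedded HTML, a dedicated Markdown library is likely to give cleaner results than these patterns.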