Alibaba Cloud Model Studio: Speech synthesis - Qwen

Last Updated: May 12, 2026

Non-real-time speech synthesis converts text to speech (TTS) through an HTTP API, making it suitable for latency-tolerant scenarios such as audiobook production, e-learning narration, and batch content production. The service supports the Qwen-TTS model families and offers a wide selection of voices, multilingual support, voice cloning, and voice design.

Overview

Convert text to speech files through an HTTP API. This approach suits latency-tolerant scenarios such as audiobook production, e-learning narration, and batch content production.

  • Submit the complete text to the HTTP API and receive the audio output. Streaming output (play audio while it is still being synthesized) is also supported.

  • Supports multiple languages, including Chinese dialects.

  • Supports voice cloning and voice design for custom voice creation.

  • Supports instruction control, which lets you shape speech expressiveness through natural-language instructions.

For real-time, low-latency speech synthesis, see Real-time speech synthesis (WebSocket API). To choose a model, see Speech synthesis.

Prerequisites

You have obtained a Model Studio API key for the region you call (API keys differ between the Singapore and Beijing regions), preferably configured it as the DASHSCOPE_API_KEY environment variable, and installed the latest version of the DashScope SDK for the Python and Java examples.

Quick start

The following examples demonstrate how to synthesize speech with each model family. For more language examples and detailed parameter descriptions, see the API reference for each model.

Qwen-TTS

The following examples show how to synthesize speech with a built-in voice.

Non-streaming output

In non-streaming mode, the response includes a url field pointing to the synthesized audio file. The URL expires after 24 hours.

Python

import os
import dashscope

# The following is the Singapore region URL. To use models in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

text = "Today is a wonderful day to build something people love!"
# qwen3-tts models are called through the MultiModalConversation interface
response = dashscope.MultiModalConversation.call(
    # To use the instruction control feature, replace model with qwen3-tts-instruct-flash
    model="qwen3-tts-flash",
    # The API Keys for the Singapore and Beijing regions are different. Get an API Key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    # If you have not configured an environment variable, replace the following line with your Model Studio API Key: api_key = "sk-xxx"
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    text=text,
    voice="Cherry",
    language_type="English", # It is recommended to match the language of the text to ensure correct pronunciation and natural intonation.
    # To use the instruction control feature, uncomment the following lines and replace model with qwen3-tts-instruct-flash
    # instructions='Speak at a relatively fast speed with a noticeable rising intonation, suitable for introducing fashion products.',
    # optimize_instructions=True,
    stream=False
)
print(response)
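
If you only need the file, you can pull the URL out of the response and download it. A minimal sketch using only the standard library, assuming the response exposes the URL as output.audio.url (mirroring the Java example below):

import urllib.request

# Extract the temporary audio URL from the response and save the file locally
audio_url = response.output.audio.url
urllib.request.urlretrieve(audio_url, "downloaded_audio.wav")
print("Audio file downloaded to local storage: downloaded_audio.wav")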

Java

Import the Gson dependency. If you use Maven or Gradle, add the dependency as follows:

Maven

Add the following to pom.xml:

<!-- https://mvnrepository.com/artifact/com.google.code.gson/gson -->
<dependency>
    <groupId>com.google.code.gson</groupId>
    <artifactId>gson</artifactId>
    <version>2.13.1</version>
</dependency>

Gradle

Add the following to build.gradle:

// https://mvnrepository.com/artifact/com.google.code.gson/gson
implementation("com.google.code.gson:gson:2.13.1")
import com.alibaba.dashscope.aigc.multimodalconversation.AudioParameters;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.protocol.Protocol;
import com.alibaba.dashscope.utils.Constants;

import java.io.FileOutputStream;
import java.io.InputStream;
import java.net.URL;

public class Main {
    // To use the instruction control feature, replace MODEL with qwen3-tts-instruct-flash
    private static final String MODEL = "qwen3-tts-flash";
    public static void call() throws ApiException, NoApiKeyException, UploadFileException {
        MultiModalConversation conv = new MultiModalConversation();
        MultiModalConversationParam param = MultiModalConversationParam.builder()
                // The API Keys for the Singapore and Beijing regions are different. Get an API Key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
                // If you have not configured an environment variable, replace the following line with your Model Studio API Key: .apiKey("sk-xxx")
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .model(MODEL)
                .text("Today is a wonderful day to build something people love!")
                .voice(AudioParameters.Voice.CHERRY)
                .languageType("English") // It is recommended to match the language of the text to ensure correct pronunciation and natural intonation.
                // To use the instruction control feature, uncomment the following lines and replace model with qwen3-tts-instruct-flash
                // .parameter("instructions","Speak at a relatively fast speed with a noticeable rising intonation, suitable for introducing fashion products.")
                // .parameter("optimize_instructions",true)
                .build();
        MultiModalConversationResult result = conv.call(param);
        String audioUrl = result.getOutput().getAudio().getUrl();
        System.out.print(audioUrl);

        // Download the audio file to local storage
        try (InputStream in = new URL(audioUrl).openStream();
             FileOutputStream out = new FileOutputStream("downloaded_audio.wav")) {
            byte[] buffer = new byte[1024];
            int bytesRead;
            while ((bytesRead = in.read(buffer)) != -1) {
                out.write(buffer, 0, bytesRead);
            }
            System.out.println("\nAudio file downloaded to local storage: downloaded_audio.wav");
        } catch (Exception e) {
            System.out.println("\nError downloading audio file: " + e.getMessage());
        }
    }
    public static void main(String[] args) {
        // The following is the Singapore region URL. To use models in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
        Constants.baseHttpApiUrl = "https://dashscope-intl.aliyuncs.com/api/v1";
        try {
            call();
        } catch (ApiException | NoApiKeyException | UploadFileException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}

cURL

# ======= IMPORTANT =======
# The URL below points to the Singapore region. If you are using a model in the China (Beijing) region, replace it with: https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation
# Note: API keys differ between the Singapore and Beijing regions. To obtain an API key, visit: https://www.alibabacloud.com/help/zh/model-studio/get-api-key

curl -X POST 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation' \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
    "model": "qwen3-tts-flash",
    "input": {
        "text": "Today is a wonderful day to build something people love!",
        "voice": "Cherry",
        "language_type": "English"
    }
}'
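
For reference, a successful non-streaming response is a JSON object in roughly the following shape. This is an illustrative sketch: the values are placeholders, and only the output.audio.url, output.audio.expires_at, and output.finish_reason fields are relied on by the examples in this topic.

{
    "output": {
        "finish_reason": "stop",
        "audio": {
            "url": "https://...",
            "expires_at": 1767168000
        }
    },
    "request_id": "..."
}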

Streaming output

In streaming mode, audio data is returned incrementally as Base64-encoded PCM segments. The last packet includes a URL for the complete audio file.

Python

# coding=utf-8
#
# Installation instructions for pyaudio:
# APPLE Mac OS X
#   brew install portaudio
#   pip install pyaudio
# Debian/Ubuntu
#   sudo apt-get install python-pyaudio python3-pyaudio
#   or
#   pip install pyaudio
# CentOS
#   sudo yum install -y portaudio portaudio-devel && pip install pyaudio
# Microsoft Windows
#   python -m pip install pyaudio

import os
import dashscope
import pyaudio
import time
import base64
import numpy as np

# The following is the Singapore region URL. To use models in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

p = pyaudio.PyAudio()
# Create an audio stream
stream = p.open(format=pyaudio.paInt16,
                channels=1,
                rate=24000,
                output=True)

text = "Today is a wonderful day to build something people love!"
response = dashscope.MultiModalConversation.call(
    # The API Keys for the Singapore and Beijing regions are different. Get an API Key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    # If you have not configured an environment variable, replace the following line with your Model Studio API Key: api_key = "sk-xxx"
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    # To use the instruction control feature, replace model with qwen3-tts-instruct-flash
    model="qwen3-tts-flash",
    text=text,
    voice="Cherry",
    language_type="English", # It is recommended to match the language of the text to ensure correct pronunciation and natural intonation.
    # To use the instruction control feature, uncomment the following lines and replace model with qwen3-tts-instruct-flash
    # instructions='Speak at a relatively fast speed with a noticeable rising intonation, suitable for introducing fashion products.',
    # optimize_instructions=True,
    stream=True
)

for chunk in response:
    if chunk.output is not None:
        audio = chunk.output.audio
        if audio.data is not None:
            wav_bytes = base64.b64decode(audio.data)
            audio_np = np.frombuffer(wav_bytes, dtype=np.int16)
            # Play the audio data directly
            stream.write(audio_np.tobytes())
        if chunk.output.finish_reason == "stop":
            print("finish at:", chunk.output.audio.expires_at)
time.sleep(0.8)
# Clean up resources
stream.stop_stream()
stream.close()
p.terminate()
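
To save the streamed audio instead of (or in addition to) playing it, collect the decoded PCM bytes inside the loop and write them to a WAV file. A minimal sketch with the standard-library wave module, assuming the same 24 kHz, 16-bit, mono PCM format as above:

import wave

pcm_chunks = []  # append wav_bytes to this list inside the streaming loop

def save_wav(chunks, path="output.wav"):
    with wave.open(path, "wb") as f:
        f.setnchannels(1)      # mono
        f.setsampwidth(2)      # 16-bit samples (2 bytes)
        f.setframerate(24000)  # 24 kHz sample rate
        f.writeframes(b"".join(chunks))

save_wav(pcm_chunks)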

Java

Import the Gson dependency. If you use Maven or Gradle, add the dependency as follows:

Maven

Add the following to pom.xml:

<!-- https://mvnrepository.com/artifact/com.google.code.gson/gson -->
<dependency>
    <groupId>com.google.code.gson</groupId>
    <artifactId>gson</artifactId>
    <version>2.13.1</version>
</dependency>

Gradle

Add the following to build.gradle:

// https://mvnrepository.com/artifact/com.google.code.gson/gson
implementation("com.google.code.gson:gson:2.13.1")
// Install the latest version of the DashScope SDK
import com.alibaba.dashscope.aigc.multimodalconversation.AudioParameters;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.protocol.Protocol;
import com.alibaba.dashscope.utils.Constants;
import io.reactivex.Flowable;
import javax.sound.sampled.*;
import java.util.Base64;

public class Main {
    // To use the instruction control feature, replace MODEL with qwen3-tts-instruct-flash
    private static final String MODEL = "qwen3-tts-flash";
    public static void streamCall() throws ApiException, NoApiKeyException, UploadFileException {
        MultiModalConversation conv = new MultiModalConversation();
        MultiModalConversationParam param = MultiModalConversationParam.builder()
                // The API Keys for the Singapore and Beijing regions are different. Get an API Key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
                // If you have not configured an environment variable, replace the following line with your Model Studio API Key: .apiKey("sk-xxx")
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .model(MODEL)
                .text("Today is a wonderful day to build something people love!")
                .voice(AudioParameters.Voice.CHERRY)
                .languageType("English") // It is recommended to match the language of the text to ensure correct pronunciation and natural intonation.
                // To use the instruction control feature, uncomment the following lines and replace model with qwen3-tts-instruct-flash
                // .parameter("instructions","Speak at a relatively fast speed with a noticeable rising intonation, suitable for introducing fashion products.")
                // .parameter("optimize_instructions",true)
                .build();
        Flowable<MultiModalConversationResult> result = conv.streamCall(param);
        result.blockingForEach(r -> {
            try {
                // 1. Get the Base64-encoded audio data (the final packet carries a URL instead of audio data)
                String base64Data = r.getOutput().getAudio().getData();
                if (base64Data == null) {
                    return; // Skip packets without audio data, such as the final packet
                }
                byte[] audioBytes = Base64.getDecoder().decode(base64Data);

                // 2. Configure the audio format (adjust according to the audio format returned by the API)
                AudioFormat format = new AudioFormat(
                        AudioFormat.Encoding.PCM_SIGNED,
                        24000, // Sample rate (must match the format returned by the API)
                        16,    // Bits per sample
                        1,     // Number of channels
                        2,     // Frame size (bytes)
                        24000, // Frame rate (must match the sample rate)
                        false  // bigEndian flag: false means little-endian PCM
                );

                // 3. Play the audio data in real time
                DataLine.Info info = new DataLine.Info(SourceDataLine.class, format);
                try (SourceDataLine line = (SourceDataLine) AudioSystem.getLine(info)) {
                    if (line != null) {
                        line.open(format);
                        line.start();
                        line.write(audioBytes, 0, audioBytes.length);
                        line.drain();
                    }
                }
            } catch (LineUnavailableException e) {
                e.printStackTrace();
            }
        });
    }
    public static void main(String[] args) {
        // The following is the Singapore region URL. To use models in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
        Constants.baseHttpApiUrl = "https://dashscope-intl.aliyuncs.com/api/v1";
        try {
            streamCall();
        } catch (ApiException | NoApiKeyException | UploadFileException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}

cURL

# ======= IMPORTANT =======
# The URL below points to the Singapore region. If you are using a model in the China (Beijing) region, replace it with: https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation
# Note: API keys differ between the Singapore and Beijing regions. To obtain an API key, visit: https://www.alibabacloud.com/help/zh/model-studio/get-api-key

curl -X POST 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation' \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-H 'X-DashScope-SSE: enable' \
-d '{
    "model": "qwen3-tts-flash",
    "input": {
        "text": "Today is a wonderful day to build something people love!",
        "voice": "Cherry",
        "language_type": "English"
    }
}'
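
With the X-DashScope-SSE: enable header, the response body is a server-sent event stream. Each event's data payload is a JSON object; intermediate packets carry a Base64-encoded PCM segment in output.audio.data, and the final packet carries the audio URL. A rough, illustrative sketch of the stream:

data: {"output":{"audio":{"data":"<Base64-encoded PCM segment>"}}}
...
data: {"output":{"finish_reason":"stop","audio":{"url":"https://...","expires_at":1767168000}}}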

Advanced features

Instruction control

Instruction-based control lets you precisely shape the vocal expression through natural language descriptions, without adjusting complex audio parameters. Describe the desired tone, speed, emotion, or timbre in plain text to produce the corresponding speech effect.

Supported models: Qwen3-TTS-Instruct-Flash family

Usage: Pass the instruction in the instructions parameter (see the sketch below). For example: "Speak quickly with a noticeable rising tone, as if you're introducing a fashion item."

Supported instruction languages: Chinese and English.

Instruction text length limit: Up to 1,600 tokens.
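
The quick-start Python example can be adapted for instruction control as follows. This is a sketch based on the commented-out lines in the quick start; the instruction text and input text are illustrative:

import os
import dashscope

# Singapore region endpoint; for the Beijing region use: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

response = dashscope.MultiModalConversation.call(
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    model="qwen3-tts-instruct-flash",  # Instruction control requires an instruct model
    text="This season's collection is all about bold colors and lightweight fabrics!",
    voice="Cherry",
    language_type="English",
    # Natural-language instruction describing the desired delivery
    instructions="Speak at a relatively fast speed with a noticeable rising intonation, suitable for introducing fashion products.",
    optimize_instructions=True,
    stream=False
)
print(response)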

Use cases:

  • Audiobook and radio drama voiceover

  • Advertising and promotional voiceover

  • Game character and animation voiceover

  • Emotionally expressive voice assistants

  • Documentary narration and news broadcasting

Tips for writing high-quality voice descriptions:

  • Core principles:

    1. Be specific, not vague: Use words that describe concrete vocal qualities, such as "deep," "crisp," or "slightly fast." Avoid subjective, low-information terms like "nice" or "normal."

    2. Be multidimensional, not single-faceted: A good description combines multiple dimensions (pitch, speed, emotion, etc.). Describing only one dimension (e.g., "high pitch") is too broad to produce a distinctive effect.

    3. Be objective, not subjective: Focus on the physical and perceptual qualities of the voice, not personal preferences. For example, use "slightly high pitch with energy" rather than "my favorite voice."

    4. Be original, not imitative: Describe the vocal qualities you want, rather than requesting imitation of specific public figures (such as celebrities or actors). Imitation requests involve copyright risks and are not supported.

    5. Be concise, not redundant: Make every word count. Avoid repeating synonyms or stacking meaningless intensifiers (e.g., "a very very great voice").

  • Description dimensions: Combining multiple dimensions creates richer expression effects.

    • Pitch: high, mid, low, slightly high, slightly low

    • Speed: fast, moderate, slow, slightly fast, slightly slow

    • Emotion: cheerful, calm, gentle, serious, lively, composed, soothing

    • Timbre: magnetic, crisp, husky, mellow, sweet, rich, powerful

    • Use case: news broadcasting, advertising, audiobook, animation character, voice assistant, documentary narration

  • Examples:

    • Standard broadcasting style: Clear and precise articulation with standard pronunciation

    • Emotional escalation: Volume rising rapidly from normal conversation to a shout; straightforward personality with externalized, easily agitated emotions

    • Special emotional state: Slightly slurred pronunciation from a teary voice, slightly husky, with noticeable tension from a sobbing tone

    • Advertising voiceover style: Slightly high pitch, moderate speed, energetic and engaging, suitable for advertising

    • Gentle soothing style: Slightly slow speed, soft and sweet tone, warm and comforting like a caring friend

Supported scope

Model availability varies by deployment region:

International

If you select the International deployment scope, model inference compute resources are dynamically scheduled worldwide, excluding the Chinese mainland. Static data is stored in your selected region. Supported region: Singapore.

To call the following models, use an API key from the Singapore region:

  • Qwen-TTS:

    • Qwen3-TTS-Instruct-Flash: qwen3-tts-instruct-flash (stable, currently equivalent to qwen3-tts-instruct-flash-2026-01-26), qwen3-tts-instruct-flash-2026-01-26 (latest snapshot)

    • Qwen3-TTS-VD: qwen3-tts-vd-2026-01-26 (latest snapshot)

    • Qwen3-TTS-VC: qwen3-tts-vc-2026-01-22 (latest snapshot)

    • Qwen3-TTS-Flash: qwen3-tts-flash (stable, currently equivalent to qwen3-tts-flash-2025-11-27), qwen3-tts-flash-2025-11-27, qwen3-tts-flash-2025-09-18

Chinese mainland

If you select the Chinese mainland deployment scope, model inference compute resources are restricted to the Chinese mainland. Static data is stored in your selected region. Supported region: China (Beijing).

To call the following models, use an API key from the Beijing region:

  • Qwen-TTS:

    • Qwen3-TTS-Instruct-Flash: qwen3-tts-instruct-flash (stable, currently equivalent to qwen3-tts-instruct-flash-2026-01-26), qwen3-tts-instruct-flash-2026-01-26 (latest snapshot)

    • Qwen3-TTS-VD: qwen3-tts-vd-2026-01-26 (latest snapshot)

    • Qwen3-TTS-VC: qwen3-tts-vc-2026-01-22 (latest snapshot)

    • Qwen3-TTS-Flash: qwen3-tts-flash (stable, currently equivalent to qwen3-tts-flash-2025-11-27), qwen3-tts-flash-2025-11-27, qwen3-tts-flash-2025-09-18

    • Qwen-TTS: qwen-tts (stable, currently equivalent to qwen-tts-2025-04-10), qwen-tts-latest (latest, currently equivalent to qwen-tts-2025-05-22), qwen-tts-2025-05-22 (snapshot), qwen-tts-2025-04-10 (snapshot)

Built-in voices

Voices vary by model. To specify a voice, set the voice parameter to the voice's identifier (for example, Cherry). For the full list of built-in voices that each model supports, see that model's API reference.

API reference

FAQ

Q: How long does the audio file URL remain valid?

A: The audio file URL expires 24 hours after it's generated. To get a new URL, call the API again.