Alibaba Cloud Model Studio: Speech synthesis models

Last Updated: Apr 23, 2026

Choose the right model for speech synthesis, voice cloning, and voice design.

This page lists models for speech synthesis and voice services, including previous versions. Answer two questions to narrow your selection:

  1. Do you need a custom voice, or will a built-in one be sufficient?

  2. Do you need real-time streaming output, or is non-streaming acceptable?

Standard voice synthesis or a custom voice?

Standard voice synthesis

Use built-in voices without extra configuration. Select a model and a voice to start synthesis.

International

| Model | Series | Key advantage |
| --- | --- | --- |
| cosyvoice-v3-plus | CosyVoice | High quality, with a rich voice library |
| cosyvoice-v3-flash | CosyVoice | Fast synthesis |
| qwen3-tts-flash | Qwen3-TTS | Low latency, high quality |
| qwen3-tts-flash-2025-11-27 | Qwen3-TTS | Low latency, high quality (snapshot version) |
| qwen3-tts-flash-2025-09-18 | Qwen3-TTS | Low latency, high quality (snapshot version) |
| qwen3-tts-flash-realtime | Qwen3-TTS | Real-time streaming output, low latency |
| qwen3-tts-flash-realtime-2025-11-27 | Qwen3-TTS | Real-time streaming output, low latency (snapshot version) |
| qwen3-tts-flash-realtime-2025-09-18 | Qwen3-TTS | Real-time streaming output, low latency (snapshot version) |
| qwen3-tts-instruct-flash | Qwen3-TTS | Instruction control (speech rate, emotion, and style) |
| qwen3-tts-instruct-flash-2026-01-26 | Qwen3-TTS | Instruction control (speech rate, emotion, and style) (snapshot version) |
| qwen3-tts-instruct-flash-realtime | Qwen3-TTS | Real-time streaming output and instruction control (speech rate, emotion, and style) |
| qwen3-tts-instruct-flash-realtime-2026-01-22 | Qwen3-TTS | Real-time streaming output and instruction control (speech rate, emotion, and style) (snapshot version) |

Chinese mainland

| Model | Series | Key advantage |
| --- | --- | --- |
| cosyvoice-v3.5-plus | CosyVoice | High quality, with a continuously updated voice library |
| cosyvoice-v3.5-flash | CosyVoice | Fast synthesis |
| cosyvoice-v3-plus | CosyVoice | High quality, with a rich voice library |
| cosyvoice-v3-flash | CosyVoice | Fast synthesis |
| cosyvoice-v2 | CosyVoice | Legacy high-quality synthesis |
| cosyvoice-v1 | CosyVoice | Legacy basic synthesis |
| qwen3-tts-flash | Qwen3-TTS | Low latency, high quality |
| qwen3-tts-flash-2025-11-27 | Qwen3-TTS | Low latency, high quality (snapshot version) |
| qwen3-tts-flash-2025-09-18 | Qwen3-TTS | Low latency, high quality (snapshot version) |
| qwen3-tts-flash-realtime | Qwen3-TTS | Real-time streaming output, low latency |
| qwen3-tts-flash-realtime-2025-11-27 | Qwen3-TTS | Real-time streaming output, low latency (snapshot version) |
| qwen3-tts-flash-realtime-2025-09-18 | Qwen3-TTS | Real-time streaming output, low latency (snapshot version) |
| qwen3-tts-instruct-flash | Qwen3-TTS | Instruction control (speech rate, emotion, and style) |
| qwen3-tts-instruct-flash-2026-01-26 | Qwen3-TTS | Instruction control (speech rate, emotion, and style) (snapshot version) |
| qwen3-tts-instruct-flash-realtime | Qwen3-TTS | Real-time streaming output and instruction control (speech rate, emotion, and style) |
| qwen3-tts-instruct-flash-realtime-2026-01-22 | Qwen3-TTS | Real-time streaming output and instruction control (speech rate, emotion, and style) (snapshot version) |
| MiniMax/speech-2.8-hd | MiniMax | High-fidelity speech synthesis |
| MiniMax/speech-02-hd | MiniMax | High-fidelity speech synthesis |
| MiniMax/speech-2.8-turbo | MiniMax | Low-latency, fast synthesis |
| MiniMax/speech-02-turbo | MiniMax | Low-latency, fast synthesis |
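
Several models above are marked as snapshot versions, and their names appear to follow the pattern of a base model name plus a YYYY-MM-DD date. The following is a minimal sketch that splits a model name into its base name and snapshot date. The naming pattern is inferred from the names listed on this page, and `split_snapshot` is a hypothetical helper, not part of any SDK.

```python
import re
from datetime import date
from typing import Optional, Tuple

# Assumed convention (inferred from the model names on this page):
# "<base-name>-YYYY-MM-DD" marks a snapshot version.
_SNAPSHOT_RE = re.compile(r"^(?P<base>.+)-(?P<y>\d{4})-(?P<m>\d{2})-(?P<d>\d{2})$")


def split_snapshot(model: str) -> Tuple[str, Optional[date]]:
    """Return (base name, snapshot date), or (model, None) if the name has no date suffix."""
    match = _SNAPSHOT_RE.match(model)
    if match is None:
        return model, None
    snapshot = date(int(match["y"]), int(match["m"]), int(match["d"]))
    return match["base"], snapshot


if __name__ == "__main__":
    for name in ("qwen3-tts-flash-realtime-2025-11-27", "cosyvoice-v3-plus"):
        print(name, "->", split_snapshot(name))
```

A helper like this can be useful for grouping snapshot versions under their base model, for example when logging which version a request used.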

Custom voice

Create unique voices from audio samples or text descriptions.

International

| Model | Series | Key advantage |
| --- | --- | --- |
| qwen3-tts-vc-2026-01-22 | Qwen3-TTS | Voice cloning from audio samples |
| qwen3-tts-vc-realtime-2026-01-15 | Qwen3-TTS | Real-time voice cloning |
| qwen3-tts-vc-realtime-2025-11-27 | Qwen3-TTS | Real-time voice cloning |
| qwen3-tts-vd-2026-01-26 | Qwen3-TTS | Voice design from text descriptions |
| qwen3-tts-vd-realtime-2026-01-15 | Qwen3-TTS | Real-time voice design |
| qwen3-tts-vd-realtime-2025-12-16 | Qwen3-TTS | Real-time voice design |
| qwen-voice-enrollment | Qwen Voice Cloning | Voice cloning (voice enrollment and management) |
| qwen-voice-design | Qwen Voice Design | Voice design (creating voices from text) |

Note

Voice cloning vs. voice design: Voice cloning duplicates a specific voice from audio samples. Voice design creates a new voice from a text description, such as "a warm, low-pitched female voice". Use voice cloning when you have a target voice. Use voice design when you want to create a voice from scratch.
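
The rule in this note can be sketched as a small lookup: an available audio sample of the target voice points to voice cloning, otherwise to voice design. The model names come from the International custom voice table on this page. `pick_custom_voice_models` and its parameters are hypothetical names used for illustration only.

```python
from typing import Dict, List, Tuple

# Custom-voice synthesis models from the International table on this page,
# keyed by (approach, real-time output required).
CUSTOM_VOICE_MODELS: Dict[Tuple[str, bool], List[str]] = {
    ("cloning", False): ["qwen3-tts-vc-2026-01-22"],
    ("cloning", True): [
        "qwen3-tts-vc-realtime-2026-01-15",
        "qwen3-tts-vc-realtime-2025-11-27",
    ],
    ("design", False): ["qwen3-tts-vd-2026-01-26"],
    ("design", True): [
        "qwen3-tts-vd-realtime-2026-01-15",
        "qwen3-tts-vd-realtime-2025-12-16",
    ],
}


def pick_custom_voice_models(has_audio_sample: bool, realtime: bool) -> List[str]:
    """Clone when a sample of the target voice exists; otherwise design from text."""
    approach = "cloning" if has_audio_sample else "design"
    return CUSTOM_VOICE_MODELS[(approach, realtime)]


if __name__ == "__main__":
    print(pick_custom_voice_models(has_audio_sample=True, realtime=False))
    print(pick_custom_voice_models(has_audio_sample=False, realtime=True))
```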

Control voice expression

Three options are available, ordered by flexibility:

  1. Instruction control (qwen3-tts-instruct-flash, qwen3-tts-instruct-flash-realtime): Controls speech rate, emotion, and style on demand from a natural-language description of the desired expression.

  2. Voice design (qwen3-tts-vd-*): Creates a custom voice from a text description. Ideal for creating a brand voice without audio samples.

  3. Voice cloning (qwen3-tts-vc-*): Copies an existing voice from an audio sample. Suitable for replicating a specific person's voice.
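
As an illustration, the three options above can be recognized from a model name with shell-style patterns. This sketch reuses the `qwen3-tts-vd-*` and `qwen3-tts-vc-*` patterns from the list. The pattern `qwen3-tts-instruct-flash*` is an assumption chosen here so that the snapshot versions of the instruct models also match, and `expression_option` is a hypothetical helper.

```python
from fnmatch import fnmatchcase
from typing import List, Optional, Tuple

# (option, name pattern) pairs. The vd/vc patterns are quoted from the list above;
# the instruct pattern is an assumption that also covers dated snapshot names.
EXPRESSION_OPTIONS: List[Tuple[str, str]] = [
    ("instruction control", "qwen3-tts-instruct-flash*"),
    ("voice design", "qwen3-tts-vd-*"),
    ("voice cloning", "qwen3-tts-vc-*"),
]


def expression_option(model: str) -> Optional[str]:
    """Return the expression-control option a model name falls under, or None."""
    for option, pattern in EXPRESSION_OPTIONS:
        if fnmatchcase(model, pattern):
            return option
    return None


if __name__ == "__main__":
    for name in (
        "qwen3-tts-instruct-flash-realtime",
        "qwen3-tts-vd-2026-01-26",
        "qwen3-tts-flash",
    ):
        print(name, "->", expression_option(name))
```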

Full comparison

| Model | Series | Streaming | Custom voice | Instruction control |
| --- | --- | --- | --- | --- |
| cosyvoice-v3.5-plus | CosyVoice | Supported | Not supported | Not supported |
| cosyvoice-v3.5-flash | CosyVoice | Supported | Not supported | Not supported |
| cosyvoice-v3-plus | CosyVoice | Supported | Not supported | Not supported |
| cosyvoice-v3-flash | CosyVoice | Supported | Not supported | Not supported |
| cosyvoice-v2 | CosyVoice | Supported | Not supported | Not supported |
| qwen3-tts-flash | Qwen3-TTS | Supported | Not supported | Not supported |
| qwen3-tts-flash-2025-11-27 | Qwen3-TTS | Supported | Not supported | Not supported |
| qwen3-tts-flash-2025-09-18 | Qwen3-TTS | Supported | Not supported | Not supported |
| qwen3-tts-flash-realtime | Qwen3-TTS | Supported | Not supported | Not supported |
| qwen3-tts-flash-realtime-2025-11-27 | Qwen3-TTS | Supported | Not supported | Not supported |
| qwen3-tts-flash-realtime-2025-09-18 | Qwen3-TTS | Supported | Not supported | Not supported |
| qwen3-tts-instruct-flash | Qwen3-TTS | Supported | Not supported | Supported |
| qwen3-tts-instruct-flash-2026-01-26 | Qwen3-TTS | Supported | Not supported | Supported |
| qwen3-tts-instruct-flash-realtime | Qwen3-TTS | Supported | Not supported | Supported |
| qwen3-tts-instruct-flash-realtime-2026-01-22 | Qwen3-TTS | Supported | Not supported | Supported |
| qwen3-tts-vc-2026-01-22 | Voice cloning | Not supported | Supported | Not supported |
| qwen3-tts-vc-realtime-2026-01-15 | Voice cloning | Supported | Supported | Not supported |
| qwen3-tts-vc-realtime-2025-11-27 | Voice cloning | Supported | Supported | Not supported |
| qwen3-tts-vd-2026-01-26 | Voice design | Not supported | Supported | Not supported |
| qwen3-tts-vd-realtime-2026-01-15 | Voice design | Supported | Supported | Not supported |
| qwen3-tts-vd-realtime-2025-12-16 | Voice design | Supported | Supported | Not supported |
| qwen-tts | Qwen-TTS (Legacy) | Not supported (full-passage generation) | Not supported | Not supported |
| qwen-tts-latest | Qwen-TTS (Legacy) | Not supported (full-passage generation) | Not supported | Not supported |
| qwen-tts-2025-05-22 | Qwen-TTS (Legacy) | Not supported (full-passage generation) | Not supported | Not supported |
| qwen-tts-2025-04-10 | Qwen-TTS (Legacy) | Not supported (full-passage generation) | Not supported | Not supported |
| qwen-tts-realtime | Qwen-TTS (Legacy) | Supported | Not supported | Not supported |
| qwen-tts-realtime-latest | Qwen-TTS (Legacy) | Supported | Not supported | Not supported |
| qwen-tts-realtime-2025-07-15 | Qwen-TTS (Legacy) | Supported | Not supported | Not supported |
| qwen-voice-enrollment | Voice service | N/A | Supported (voice enrollment) | Not supported |
| qwen-voice-design | Voice service | N/A | Supported (voice design) | Not supported |
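
The comparison above can also be treated as data and filtered by the capabilities you need, which mirrors the two selection questions at the top of this page. The sketch below transcribes only a few rows for brevity. `Capabilities`, `ROWS`, and `find_models` are hypothetical names, and a complete version would include every row of the table.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Capabilities:
    model: str
    streaming: bool
    custom_voice: bool
    instruction_control: bool


# A subset of rows transcribed from the comparison above (not the full table).
ROWS: List[Capabilities] = [
    Capabilities("cosyvoice-v3-plus", True, False, False),
    Capabilities("qwen3-tts-flash", True, False, False),
    Capabilities("qwen3-tts-instruct-flash", True, False, True),
    Capabilities("qwen3-tts-vc-2026-01-22", False, True, False),
    Capabilities("qwen3-tts-vc-realtime-2026-01-15", True, True, False),
    Capabilities("qwen3-tts-vd-2026-01-26", False, True, False),
    Capabilities("qwen3-tts-vd-realtime-2026-01-15", True, True, False),
    Capabilities("qwen-tts", False, False, False),
]


def find_models(
    streaming: Optional[bool] = None,
    custom_voice: Optional[bool] = None,
    instruction_control: Optional[bool] = None,
) -> List[str]:
    """Return models matching every requirement that is not None."""
    wanted = {
        "streaming": streaming,
        "custom_voice": custom_voice,
        "instruction_control": instruction_control,
    }
    return [
        row.model
        for row in ROWS
        if all(v is None or getattr(row, k) == v for k, v in wanted.items())
    ]


if __name__ == "__main__":
    # Custom voice with streaming output.
    print(find_models(streaming=True, custom_voice=True))
```

Leaving a parameter as `None` means "no preference", so the same function answers either question on its own or both together.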

Legacy models (Qwen-TTS, token-based billing)

Legacy Qwen-TTS models use token-based billing and are accessible over HTTP or WebSocket. If you have migrated to Qwen3-TTS, use the standard speech synthesis models above.

International

| Model | Access method | Description |
| --- | --- | --- |
| qwen-tts | HTTP | Non-streaming synthesis, token-based billing |
| qwen-tts-latest | HTTP | Non-streaming synthesis, token-based billing |
| qwen-tts-2025-05-22 | HTTP | Snapshot version, token-based billing |
| qwen-tts-2025-04-10 | HTTP | Snapshot version, token-based billing |
| qwen-tts-realtime | WebSocket | Streaming synthesis, token-based billing |
| qwen-tts-realtime-latest | WebSocket | Streaming synthesis, token-based billing |
| qwen-tts-realtime-2025-07-15 | WebSocket | Snapshot version, streaming synthesis, token-based billing |
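
If existing code still targets legacy models, a lookup of the access method for each one may help when deciding which client path to use. This is a minimal sketch built only from the rows above. `LEGACY_ACCESS` and `legacy_access_method` are hypothetical names, and no request is sent.

```python
from typing import Dict

# Access method per legacy model, transcribed from the table above.
LEGACY_ACCESS: Dict[str, str] = {
    "qwen-tts": "HTTP",
    "qwen-tts-latest": "HTTP",
    "qwen-tts-2025-05-22": "HTTP",
    "qwen-tts-2025-04-10": "HTTP",
    "qwen-tts-realtime": "WebSocket",
    "qwen-tts-realtime-latest": "WebSocket",
    "qwen-tts-realtime-2025-07-15": "WebSocket",
}


def legacy_access_method(model: str) -> str:
    """Return "HTTP" or "WebSocket" for a legacy Qwen-TTS model name."""
    try:
        return LEGACY_ACCESS[model]
    except KeyError:
        raise ValueError(f"{model!r} is not a legacy Qwen-TTS model listed on this page") from None


if __name__ == "__main__":
    print(legacy_access_method("qwen-tts-realtime"))
```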