Qwen3-TTS is a family of speech generation models developed by Qwen. It supports voice clone, voice design, ultra-high-quality human-like speech generation, and voice control through natural-language instructions, giving developers and users the most extensive set of speech generation features available.

Qwen3-TTS is built on Qwen3-TTS-Tokenizer-12Hz, an innovative multi-codebook speech encoder that compresses speech efficiently while keeping a robust representation of the signal. The tokenizer fully preserves paralinguistic information and acoustic-environment features, and a lightweight non-DiT architecture reconstructs speech from its codes quickly and with high fidelity. With Dual-Track modeling, Qwen3-TTS supports extremely fast bidirectional streaming generation: the first audio packet can be emitted after only a single character of input has been processed.

The entire Qwen3-TTS multi-codebook model series is now open-sourced in two sizes, 1.7B and 0.6B. The 1.7B models deliver peak performance and strong control capabilities, while the 0.6B models offer a good balance between performance and efficiency. The models support 10 major languages (Chinese, English, Japanese, Korean, German, French, Russian, Portuguese, Spanish, and Italian) as well as various dialects, to meet the needs of global applications. They also show strong contextual understanding: they adapt tone, rhythm, and emotional expression to instructions and to the semantics of the text, and they are significantly more robust to noise in the input text. Qwen3-TTS is open source on GitHub and also available through the Qwen API.
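To make the compression claim concrete, the sketch below estimates how many discrete codes a clip turns into. It assumes that "12Hz" in the tokenizer name is the frame rate (frames per second) and uses a hypothetical count of 16 codebooks per frame and a hypothetical 24 kHz waveform for comparison; the post states neither value, so treat the numbers as illustrative only.

```python
import math

# Assumption: "12Hz" is read as the tokenizer's frame rate (frames per second).
FRAME_RATE_HZ = 12.0
# Hypothetical: the post does not state how many codebooks each frame uses.
NUM_CODEBOOKS = 16
# Hypothetical waveform sample rate, used only for the comparison below.
SAMPLE_RATE_HZ = 24_000


def token_budget(duration_s: float,
                 frame_rate_hz: float = FRAME_RATE_HZ,
                 num_codebooks: int = NUM_CODEBOOKS) -> tuple[int, int]:
    """Return (frames, total discrete codes) for a clip of the given length."""
    if duration_s < 0:
        raise ValueError("duration must be non-negative")
    frames = math.ceil(duration_s * frame_rate_hz)  # a partial frame rounds up
    return frames, frames * num_codebooks


if __name__ == "__main__":
    frames, codes = token_budget(10.0)
    samples_per_frame = SAMPLE_RATE_HZ / FRAME_RATE_HZ  # waveform samples per frame
    print(f"10 s of speech -> {frames} frames, {codes} codes")
    print(f"each frame stands in for {samples_per_frame:.0f} waveform samples")
```

Under these assumptions, a low frame rate keeps the sequence the language model must generate short, which is one reason a low-rate multi-codebook tokenizer helps streaming latency.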
| Model | Features | Language Support | Streaming | Instruction Control |
|---|---|---|---|---|
| Qwen3-TTS-12Hz-1.7B-VoiceDesign | Performs voice design based on user-provided descriptions. | Chinese, English, Japanese, Korean, German, French, Russian, Portuguese, Spanish, Italian | ✅ | ✅ |
| Qwen3-TTS-12Hz-1.7B-CustomVoice | Provides style control over target timbres via user instructions; supports 9 premium timbres covering various combinations of gender, age, language, and dialect. | Chinese, English, Japanese, Korean, German, French, Russian, Portuguese, Spanish, Italian | ✅ | ✅ |
| Qwen3-TTS-12Hz-1.7B-Base | Base model supporting rapid voice clone from a 3-second user audio sample; can also be fine-tuned (FT) to build other models. | Chinese, English, Japanese, Korean, German, French, Russian, Portuguese, Spanish, Italian | ✅ | |
| Qwen3-TTS-12Hz-0.6B-CustomVoice | Supports 9 premium timbres covering various combinations of gender, age, language, and dialect. | Chinese, English, Japanese, Korean, German, French, Russian, Portuguese, Spanish, Italian | ✅ | |
| Qwen3-TTS-12Hz-0.6B-Base | Base model supporting rapid voice clone from a 3-second user audio sample; can also be fine-tuned (FT) to build other models. | Chinese, English, Japanese, Korean, German, French, Russian, Portuguese, Spanish, Italian | ✅ | |
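As a quick way to read the model listing above, the sketch below encodes it as data and filters it by task and by whether instruction control is required. The task labels (`voice_design`, `custom_voice`, `voice_clone`) and the `pick_models` helper are hypothetical shorthand introduced here; this is not an official Qwen API, only a transcription of the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelInfo:
    name: str
    size_b: float              # parameter count in billions, taken from the model name
    task: str                  # hypothetical shorthand for the "Features" column
    streaming: bool
    instruction_control: bool


# Transcribed from the model table; task labels are illustrative, not official.
MODELS = [
    ModelInfo("Qwen3-TTS-12Hz-1.7B-VoiceDesign", 1.7, "voice_design", True, True),
    ModelInfo("Qwen3-TTS-12Hz-1.7B-CustomVoice", 1.7, "custom_voice", True, True),
    ModelInfo("Qwen3-TTS-12Hz-1.7B-Base", 1.7, "voice_clone", True, False),
    ModelInfo("Qwen3-TTS-12Hz-0.6B-CustomVoice", 0.6, "custom_voice", True, False),
    ModelInfo("Qwen3-TTS-12Hz-0.6B-Base", 0.6, "voice_clone", True, False),
]


def pick_models(task: str, need_instruction_control: bool = False,
                prefer_small: bool = True) -> list[str]:
    """Return names of models that match `task`, smallest first by default."""
    matches = [m for m in MODELS
               if m.task == task
               and (m.instruction_control or not need_instruction_control)]
    # Sort by size; reverse=True puts the larger (higher-quality) model first.
    matches.sort(key=lambda m: m.size_b, reverse=not prefer_small)
    return [m.name for m in matches]


if __name__ == "__main__":
    print(pick_models("custom_voice"))
    print(pick_models("custom_voice", need_instruction_control=True))
```

The second call shows the main practical difference between the sizes: per the table, instruction control over preset timbres is listed only for the 1.7B CustomVoice model.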
Main Features:

We comprehensively evaluated Qwen3-TTS on voice clone, voice design, and voice control. The results show that it achieves state-of-the-art (SOTA) performance on multiple metrics.


We also evaluated Qwen-TTS-Tokenizer on speech reconstruction. On the LibriSpeech test-clean set, it achieves SOTA results on all key metrics. For Perceptual Evaluation of Speech Quality (PESQ), it scores 3.21 (wideband) and 3.68 (narrowband), well ahead of comparable tokenizers. It scores 0.96 on Short-Time Objective Intelligibility (STOI) and 4.16 on UTMOS, reflecting high reconstruction quality. For speaker similarity, it scores 0.95, clearly surpassing the compared models and indicating that speaker identity is preserved almost losslessly.
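The post does not say how speaker similarity was computed. A common convention, assumed here, is the cosine similarity between speaker embeddings extracted from the original and the reconstructed audio by a speaker-verification model. The minimal sketch below shows only that final scoring step; the embedding values are hypothetical placeholders, not data from the evaluation.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length embedding vectors (range -1..1)."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero-norm embedding")
    return dot / (norm_a * norm_b)


# Hypothetical embeddings: in practice these would come from a speaker-verification
# model applied to the original and the reconstructed waveform.
original_emb = [0.12, 0.85, -0.33, 0.40]
reconstructed_emb = [0.10, 0.80, -0.30, 0.45]

if __name__ == "__main__":
    score = cosine_similarity(original_emb, reconstructed_emb)
    print(f"speaker similarity: {score:.3f}")
```

A score near 1 means the two embeddings point in almost the same direction, which is why a value such as 0.95 is read as near-lossless preservation of speaker identity.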
