
Alibaba Cloud Model Studio: Speech synthesis (Qwen-TTS)

Last Updated: Mar 15, 2026

This topic describes the request and response parameters for the Qwen speech synthesis model.

Model usage: Speech synthesis - Qwen

Request body

Non-streaming output

Python

The SpeechSynthesizer interface in the DashScope Python SDK is now unified under MultiModalConversation. To use the new interface, replace only the interface name; all parameters remain fully compatible.
# Install the latest version of the DashScope SDK
import os
import dashscope

# This is the URL for the Singapore region. If using a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'

text = "Let me recommend a T-shirt to everyone. This one is really super nice. The color is very elegant, and it's also a perfect item to match. Everyone can buy it without hesitation. It's truly beautiful and very forgiving on the figure. No matter what body type you have, it will look great. I recommend everyone to place an order."
# SpeechSynthesizer interface usage: dashscope.audio.qwen_tts.SpeechSynthesizer.call(...)
response = dashscope.MultiModalConversation.call(
    # To use the instruction control feature, replace the model with qwen3-tts-instruct-flash
    model="qwen3-tts-flash",
    # The API keys for Singapore and Beijing regions are different. Get your API Key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
    # If the environment variable is not configured, replace the following line with your Model Studio API key: api_key="sk-xxx"
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    text=text,
    voice="Cherry",
    # To use the instruction control feature, uncomment the following line and replace the model with qwen3-tts-instruct-flash
    # instructions='Fast speech rate, with a clear rising intonation, suitable for introducing fashion products.',
    # optimize_instructions=True
)
print(response)
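In a non-streaming call the audio arrives as a URL in the response (see the Return object section below). As a minimal sketch of post-processing, the dict below mirrors the documented response shape with a placeholder URL; with a real SDK response you would read the same fields from the returned object (for example `response.output.audio.url`):

```python
import time
import urllib.request

# Sample response shaped like the documented return object;
# the URL here is a placeholder, not a real DashScope result.
response = {
    "status_code": 200,
    "output": {
        "audio": {
            "url": "https://example.com/output.wav",
            "expires_at": int(time.time()) + 24 * 3600,  # valid for 24 hours
        }
    },
}

audio = response["output"]["audio"]
if response["status_code"] == 200 and time.time() < audio["expires_at"]:
    print("audio URL:", audio["url"])
    # Uncomment to download the file before the URL expires:
    # urllib.request.urlretrieve(audio["url"], "output.wav")
else:
    print("request failed or URL expired")
```

Because the URL expires 24 hours after generation, download the file promptly rather than storing the URL itself.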

Java

// Install the latest version of the DashScope SDK
import com.alibaba.dashscope.aigc.multimodalconversation.AudioParameters;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.JsonUtils;
import com.alibaba.dashscope.utils.Constants;

public class Main {
    // To use the instruction control feature, replace MODEL with qwen3-tts-instruct-flash
    private static final String MODEL = "qwen3-tts-flash";
    public static void call() throws ApiException, NoApiKeyException, UploadFileException {
        MultiModalConversation conv = new MultiModalConversation();
        MultiModalConversationParam param = MultiModalConversationParam.builder()
                .model(MODEL)
                // The API keys for Singapore and Beijing regions are different. Get your API Key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
                // If the environment variable is not configured, replace the following line with your Model Studio API key: apiKey("sk-xxx")
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .text("Today is a wonderful day to build something people love!")
                .voice(AudioParameters.Voice.CHERRY)
                .languageType("English")
                // To use the instruction control feature, uncomment the following line and replace the model with qwen3-tts-instruct-flash
                // .parameter("instructions","Fast speech rate, with a clear rising intonation, suitable for introducing fashion products.")
                // .parameter("optimize_instructions",true)
                .build();
        MultiModalConversationResult result = conv.call(param);
        System.out.println(JsonUtils.toJson(result));
    }
    public static void main(String[] args) {
        // This is the URL for the Singapore region. If using a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
        Constants.baseHttpApiUrl = "https://dashscope-intl.aliyuncs.com/api/v1";
        try {
            call();
        } catch (ApiException | NoApiKeyException | UploadFileException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}

curl

# ======= IMPORTANT NOTE =======
# This is the URL for the Singapore region. If using a model in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation
# The API keys for Singapore and Beijing regions are different. Get your API Key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
# If the environment variable is not configured, replace $DASHSCOPE_API_KEY with your Model Studio API key: sk-xxx.
# === DELETE THIS COMMENT WHEN EXECUTING ===

curl -X POST 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation' \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-d '{
    "model": "qwen3-tts-flash",
    "input": {
        "text": "Let me recommend a T-shirt to everyone. This one is really super nice. The color is very elegant, and it'\''s also a perfect item to match. Everyone can buy it without hesitation. It'\''s truly beautiful and very forgiving on the figure. No matter what body type you have, it will look great. I recommend everyone to place an order.",
        "voice": "Cherry",
        "language_type": "English"
    }
}'

Streaming output

Python

The SpeechSynthesizer interface in the DashScope Python SDK is now unified under MultiModalConversation. To use the new interface, replace only the interface name. All other parameters remain fully compatible.
# DashScope SDK version 1.24.5 or later
import os
import dashscope

# The following URL is for the Singapore region. If you use models in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
dashscope.base_http_api_url = 'https://dashscope-intl.aliyuncs.com/api/v1'
text = "Let me recommend a T-shirt to you. This one is truly stunning. Its color highlights your elegance and makes it an ideal match for any outfit. You can buy it without hesitation - it looks great on everyone. It flatters all body types. Whether you're tall, short, slim, or curvy, this T-shirt suits you perfectly. We highly recommend ordering it."
# Use SpeechSynthesizer as follows: dashscope.audio.qwen_tts.SpeechSynthesizer.call(...)
response = dashscope.MultiModalConversation.call(
    # To use instruction control, set model to qwen3-tts-instruct-flash
    model="qwen3-tts-flash",
    # API keys differ between the Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
    # If you have not set an environment variable, replace the next line with: api_key="sk-xxx"
    api_key=os.getenv("DASHSCOPE_API_KEY"),
    text=text,
    voice="Cherry",
    # To use instruction control, uncomment the lines below and set model to qwen3-tts-instruct-flash
    # instructions='Speak quickly with a clear rising intonation, suitable for promoting fashion items.',
    # optimize_instructions=True,
    stream=True
)
for chunk in response:
    print(chunk)

Java

// DashScope SDK version 2.19.0 or later
import com.alibaba.dashscope.aigc.multimodalconversation.AudioParameters;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversation;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationParam;
import com.alibaba.dashscope.aigc.multimodalconversation.MultiModalConversationResult;
import com.alibaba.dashscope.exception.ApiException;
import com.alibaba.dashscope.exception.NoApiKeyException;
import com.alibaba.dashscope.exception.UploadFileException;
import com.alibaba.dashscope.utils.JsonUtils;
import io.reactivex.Flowable;
import com.alibaba.dashscope.utils.Constants;

public class Main {
    // To use instruction control, set MODEL to qwen3-tts-instruct-flash
    private static final String MODEL = "qwen3-tts-flash";
    public static void streamCall() throws ApiException, NoApiKeyException, UploadFileException {
        MultiModalConversation conv = new MultiModalConversation();
        MultiModalConversationParam param = MultiModalConversationParam.builder()
                .model(MODEL)
                // API keys differ between the Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
                // If you have not set an environment variable, replace the next line with: .apiKey("sk-xxx")
                .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                .text("Today is a wonderful day to build something people love!")
                .voice(AudioParameters.Voice.CHERRY)
                .languageType("English")
                // To use instruction control, uncomment the lines below and set model to qwen3-tts-instruct-flash
                // .parameter("instructions","Speak quickly with a clear rising intonation, suitable for promoting fashion items.")
                // .parameter("optimize_instructions",true)
                .build();
        Flowable<MultiModalConversationResult> result = conv.streamCall(param);
        result.blockingForEach(r -> System.out.println(JsonUtils.toJson(r)));
    }
    public static void main(String[] args) {
        // The following URL is for the Singapore region. If you use models in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1
        Constants.baseHttpApiUrl = "https://dashscope-intl.aliyuncs.com/api/v1";
        try {
            streamCall();
        } catch (ApiException | NoApiKeyException | UploadFileException e) {
            System.out.println(e.getMessage());
        }
        System.exit(0);
    }
}

curl

# ======= Important notice =======
# The following URL is for the Singapore region. If you use models in the Beijing region, replace the URL with: https://dashscope.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation
# API keys differ between the Singapore and Beijing regions. Get your API key: https://www.alibabacloud.com/help/en/model-studio/get-api-key
# If you have not set an environment variable, replace $DASHSCOPE_API_KEY with your Model Studio API key: sk-xxx.
# === Remove this comment before running ===

curl -X POST 'https://dashscope-intl.aliyuncs.com/api/v1/services/aigc/multimodal-generation/generation' \
-H "Authorization: Bearer $DASHSCOPE_API_KEY" \
-H 'Content-Type: application/json' \
-H 'X-DashScope-SSE: enable' \
-d '{
    "model": "qwen3-tts-flash",
    "input": {
        "text": "Let me recommend a T-shirt to you. This one is truly stunning. Its color highlights your elegance and makes it an ideal match for any outfit. You can buy it without hesitation - it looks great on everyone. It flatters all body types. Whether you'\''re tall, short, slim, or curvy, this T-shirt suits you perfectly. We highly recommend ordering it.",
        "voice": "Cherry",
        "language_type": "English"
    }
}'
To play Base64-encoded audio in real time, see Speech synthesis - Qwen.

model string (Required)

The model name. For details, see Supported models.

text string (Required)

The text to synthesize. Mixed-language text is supported. Maximum input length: 512 tokens for Qwen-TTS; 600 characters for other models.

voice string (Required)

The voice to use. See Supported system voices.

language_type string (Optional)

Specify the language of the synthesized audio. The default value is Auto.

  • Auto: Use when text language is uncertain or contains multiple languages. The model automatically matches pronunciation for different language segments, but accuracy is not guaranteed.

  • Specify language: Use when text is in a single language. Specifying the exact language significantly improves synthesis quality and usually outperforms Auto. Supported values include the following:

    • Chinese

    • English

    • German

    • Italian

    • Portuguese

    • Spanish

    • Japanese

    • Korean

    • French

    • Russian

instructions string (Optional)

Provide instructions to guide speech synthesis. See Real-time speech synthesis - Qwen. Defaults to None (no effect if not set).

  • Length limit: Max 1600 tokens.

  • Supported languages: Chinese and English only.

  • Scope: Qwen3-TTS-Instruct-Flash model series only.

optimize_instructions boolean (Optional)

Optimize instructions to improve speech naturalness and expressiveness. Defaults to false.

Behavior: When true, the system semantically enhances and rewrites instructions to generate internal instructions better suited for speech synthesis.

Scenarios: Enable for high-quality, fine-grained speech expression.

Dependency: Requires instructions parameter. Has no effect if instructions is empty.

Scope: This feature applies only to the Qwen3-TTS-Instruct-Flash model series.

stream boolean (Optional) Defaults to false

Stream the response. Valid values:

  • false: Return the audio URL after generation completes.

  • true: Output Base64-encoded audio data as it's generated. Read segments in real time to obtain the complete result. See Speech synthesis - Qwen.

This parameter applies only to the Python SDK. To stream with the Java SDK, call streamCall. Over HTTP, set the header X-DashScope-SSE: enable.
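With streaming enabled, each chunk carries a Base64 fragment in `output.audio.data`, and concatenating the decoded fragments yields the complete audio. A minimal sketch of the assembly loop; the chunks below are simulated stand-ins for the objects a real stream yields:

```python
import base64

# Simulated streaming chunks: in a real stream, each chunk's
# output.audio.data field holds one Base64-encoded audio segment.
chunks = [
    {"output": {"audio": {"data": base64.b64encode(b"\x00\x01").decode()}}},
    {"output": {"audio": {"data": base64.b64encode(b"\x02\x03").decode()}}},
]

pcm = bytearray()
for chunk in chunks:
    data = chunk["output"]["audio"]["data"]
    if data:  # a chunk may omit data (e.g. the final chunk carrying the URL)
        pcm.extend(base64.b64decode(data))

print(len(pcm), "bytes of audio assembled")  # → 4 bytes of audio assembled
```

The assembled bytes can then be written to a file or fed to an audio player as they accumulate.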

Return object (Same for streaming and non-streaming outputs)

Qwen3-TTS-Flash

{
    "status_code": 200,
    "request_id": "5c63c65c-cad8-4bf4-959d-xxxxxxxxxxxx",
    "code": "",
    "message": "",
    "output": {
        "text": null,
        "finish_reason": "stop",
        "choices": null,
        "audio": {
            "data": "",
            "url": "http://dashscope-result-bj.oss-cn-beijing.aliyuncs.com/1d/ab/20251218/d2033070/39b6d8f2-c0db-4daa-9073-5d27bfb66b78.wav?Expires=1766113409&OSSAccessKeyId=LTAI5xxxxxxxxxxxx&Signature=NOrqxxxxxxxxxxxx%3D",
            "id": "audio_5c63c65c-cad8-4bf4-959d-xxxxxxxxxxxx",
            "expires_at": 1766113409
        }
    },
    "usage": {
        "input_tokens": 0,
        "output_tokens": 0,
        "characters": 195
    }
}

Qwen-TTS

{
    "status_code": 200,
    "request_id": "f4e8139b-3203-4887-92cb-xxxxxxxxxxxx",
    "code": "",
    "message": "",
    "output": {
        "text": null,
        "finish_reason": "stop",
        "choices": null,
        "audio": {
            "data": "",
            "url": "http://dashscope-result-wlcb.oss-cn-wulanchabu.aliyuncs.com/1d/50/20251218/e6c1b9cc/9acec74e-e317-4dbd-9e76-745c47bcbf2d.wav?Expires=1766116806&OSSAccessKeyId=LTAxxxxxxxxx&Signature=afYZxxxxxxxxxxxx%2FAX9bk%3D",
            "id": "audio_f4e8139b-3203-4887-92cb-xxxxxxxxxxxx",
            "expires_at": 1766116806
        }
    },
    "usage": {
        "input_tokens": 76,
        "output_tokens": 1045,
        "characters": 0,
        "input_tokens_details": {
            "text_tokens": 76
        },
        "output_tokens_details": {
            "audio_tokens": 1045,
            "text_tokens": 0
        },
        "total_tokens": 1121
    }
}

status_code integer

HTTP status code (follows RFC 9110). Examples:
• 200: Request successful and normal result returned
• 400: Client parameter error
• 401: Unauthorized
• 404: Not found
• 500: Server error
request_id string

Unique request ID. Use it to locate and troubleshoot issues.

code string

Error code (returned when request fails). See Error messages.

message string

Error message (returned when request fails). See Error messages.

output object

Model output.

Properties

text string

Always null. Ignore this field.

choices string

Always null. Ignore this field.

finish_reason string

There are two scenarios:

  • null: Generation is in progress.

  • "stop": When the model output ends naturally or when a stop condition specified in the input parameters is triggered.

audio object

Audio information from model output.

Properties

url string

URL of the complete audio file (valid for 24 hours).

data string

Base64-encoded audio data for streaming output.

id string

ID for the audio information.

expires_at integer

The UNIX timestamp when the URL expires.
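Because expires_at is a plain UNIX timestamp, it can be compared directly against the current time or converted to a readable datetime. A small sketch using the sample value from the response above:

```python
import time
from datetime import datetime, timezone

expires_at = 1766113409  # sample expires_at from the example response

# Convert the UNIX timestamp to an aware UTC datetime for display.
expiry = datetime.fromtimestamp(expires_at, tz=timezone.utc)
still_valid = time.time() < expires_at
print("URL expires at", expiry.isoformat(), "- still valid:", still_valid)
```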

usage object

Token or character consumption for this request. Qwen-TTS reports tokens; Qwen3-TTS-Flash reports characters.

Properties

input_tokens_details object

Input text token consumption (Qwen-TTS only).

Properties

text_tokens integer

The number of tokens consumed by the input text.

total_tokens integer

Total tokens consumed (Qwen-TTS only).

output_tokens integer

Tokens consumed by output audio (always 0 for Qwen3-TTS-Flash).

input_tokens integer

Tokens consumed by input text (always 0 for Qwen3-TTS-Flash).

output_tokens_details object

Output token consumption (Qwen-TTS only).

Properties

audio_tokens integer

Tokens consumed by output audio.

text_tokens integer

Tokens consumed by output text (currently fixed at 0).

characters integer

Character count in input text (Qwen3-TTS-Flash only).
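Because the two model families report usage differently, code that logs or meters consumption needs to check which fields are populated. A hedged helper (the field names follow the sample responses above; `billed_amount` is an illustrative name, not part of the SDK):

```python
def billed_amount(usage: dict) -> tuple[str, int]:
    """Return the billing unit and amount from a usage dict.

    Qwen3-TTS-Flash reports characters; Qwen-TTS reports tokens.
    """
    if usage.get("characters"):
        return ("characters", usage["characters"])
    return ("tokens", usage.get("total_tokens", 0))

# Usage blocks from the two sample responses above:
print(billed_amount({"input_tokens": 0, "output_tokens": 0, "characters": 195}))
# → ('characters', 195)
print(billed_amount({"input_tokens": 76, "output_tokens": 1045,
                     "characters": 0, "total_tokens": 1121}))
# → ('tokens', 1121)
```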
