Alibaba Cloud Model Studio: SSML

Last Updated: May 12, 2026

Use SSML (Speech Synthesis Markup Language) to fine-tune speech characteristics such as speed, pauses, and pronunciation.

Overview

SSML (Speech Synthesis Markup Language) is an XML-based markup language for speech synthesis. Embed SSML tags in your text to control speech rate, intonation, pauses, and volume, or to add background music and sound effects for richer audio output.
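
As a rough illustration of embedding tags in text, the sketch below wraps a plain string in a <speak> element with a rate attribute, the same markup used in the quick start on this page. The helper name build_speak is hypothetical, and escaping XML special characters such as & and < in the spoken text is an assumption based on SSML being XML; check the tag reference for the exact attributes and value ranges the service accepts.

```python
from xml.sax.saxutils import escape


def build_speak(text: str, rate: float = 1.0) -> str:
    """Wrap plain text in a <speak> element with a rate attribute.

    build_speak is a hypothetical helper, not part of the DashScope SDK.
    The text is XML-escaped so that &, < and > do not break the markup.
    """
    return f'<speak rate="{rate:g}">{escape(text)}</speak>'


# Build the SSML string that would be passed to the synthesizer.
ssml = build_speak("Tom & Jerry speak twice as fast.", rate=2)
print(ssml)
```

The resulting string can be passed wherever the samples on this page pass a literal SSML string.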

Typical use cases include:

  • Audiobooks: control pauses and speech rate with precision, and add background music for an immersive listening experience

  • Intelligent customer service: use the <say-as> tag to ensure accurate reading of phone numbers, dates, and similar information

  • Multilingual broadcasting: use the <phoneme> tag to specify precise pronunciations for foreign words

  • Online education: convert LaTeX formulas into natural speech with the formula reading feature

SSML and formula reading are both available for the CosyVoice model family. For model selection guidance, see Speech synthesis.

SSML

Limitations

  • Models: SSML is supported only by cosyvoice-v3.5-flash, cosyvoice-v3.5-plus, cosyvoice-v3-flash, cosyvoice-v3-plus, and cosyvoice-v2.

  • Voices: Only cloned voices and preset voices marked as SSML-compatible in the Voice list are supported.

  • APIs: SSML is supported through the following APIs:

    • Java SDK (version 2.20.3 or later): non-streaming and unidirectional streaming calls

    • Python SDK (version 1.23.4 or later): non-streaming and unidirectional streaming calls

    • WebSocket API: set the enable_ssml parameter to true in the run-task command, and send exactly one continue-task command
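
The WebSocket constraint can be sketched as a fixed three-message sequence: one run-task with enable_ssml set to true, exactly one continue-task carrying the SSML text, and one finish-task. The snippet below only builds the JSON messages and does not open a connection. The field names mirror the WebSocket samples later on this page; the helper build_ssml_messages and the constant SSML_MODELS are hypothetical names introduced for illustration.

```python
import json
import uuid
from typing import List

# Model names copied from the limitations list above.
SSML_MODELS = {
    "cosyvoice-v3.5-flash",
    "cosyvoice-v3.5-plus",
    "cosyvoice-v3-flash",
    "cosyvoice-v3-plus",
    "cosyvoice-v2",
}


def build_ssml_messages(model: str, voice: str, ssml: str) -> List[str]:
    """Return the run-task, continue-task, and finish-task messages as JSON.

    Hypothetical helper: it enforces the single continue-task rule simply
    by always producing exactly one such message.
    """
    if model not in SSML_MODELS:
        raise ValueError(f"{model} is not listed as supporting SSML")

    task_id = uuid.uuid4().hex

    def header(action: str) -> dict:
        return {"action": action, "task_id": task_id, "streaming": "duplex"}

    run_task = {
        "header": header("run-task"),
        "payload": {
            "task_group": "audio",
            "task": "tts",
            "function": "SpeechSynthesizer",
            "model": model,
            "parameters": {
                "text_type": "PlainText",
                "voice": voice,
                "format": "mp3",
                "sample_rate": 22050,
                "enable_ssml": True,  # required for SSML input
            },
            "input": {},
        },
    }
    continue_task = {
        "header": header("continue-task"),
        "payload": {"input": {"text": ssml}},
    }
    finish_task = {
        "header": header("finish-task"),
        "payload": {"input": {}},
    }
    return [json.dumps(m) for m in (run_task, continue_task, finish_task)]


messages = build_ssml_messages(
    "cosyvoice-v3-flash", "longanyang", '<speak rate="2">Hello.</speak>'
)
print(len(messages))
```

In a real client, the continue-task message should be sent only after the task-started event arrives, as the WebSocket samples on this page do.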

Quick start

The example below uses SSML to control speech rate during synthesis. Before running the code, complete these prerequisites:

  1. Create an API key

  2. Install the DashScope SDK (Python 1.23.4 or later, Java 2.20.3 or later). For details, see Install the SDK.
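
A minimal sketch of checking both prerequisites before running the Python samples. The helper names are hypothetical, and the version comparison assumes plain dotted numeric versions such as 1.23.4; pre-release suffixes are not handled.

```python
import os

# Minimum Python SDK version stated in the prerequisites above.
MIN_PYTHON_SDK = "1.23.4"


def parse_version(version: str) -> tuple:
    """Turn '1.23.4' into (1, 23, 4). Assumes numeric dotted versions only."""
    return tuple(int(part) for part in version.split("."))


def sdk_supports_ssml(installed: str, minimum: str = MIN_PYTHON_SDK) -> bool:
    """Return True if the installed SDK version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


def api_key_configured(env=os.environ) -> bool:
    """Return True if DASHSCOPE_API_KEY is set to a non-empty value."""
    return bool(env.get("DASHSCOPE_API_KEY"))


print(sdk_supports_ssml("1.23.4"))
print(api_key_configured({"DASHSCOPE_API_KEY": "sk-xxx"}))
```

The installed version string can usually be read with importlib.metadata.version("dashscope"), assuming the SDK is installed under that distribution name.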

Important

The cosyvoice-v3.5-plus and cosyvoice-v3.5-flash models are currently available only in the Beijing region and are designed exclusively for voice cloning scenarios (no preset voices are provided). Before using these models, create a target voice by following the instructions in Voice cloning.
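
The region restriction can be sketched as a small lookup that returns the WebSocket URL for a region and rejects the v3.5 models outside Beijing. The two URLs are the ones used in the samples on this page; the region labels "beijing" and "singapore" and the helper websocket_url are hypothetical names chosen for illustration.

```python
# Endpoint URLs as they appear in the samples on this page.
BEIJING_URL = "wss://dashscope.aliyuncs.com/api-ws/v1/inference"
SINGAPORE_URL = "wss://dashscope-intl.aliyuncs.com/api-ws/v1/inference"

# Models described above as currently available only in the Beijing region.
BEIJING_ONLY_MODELS = {"cosyvoice-v3.5-plus", "cosyvoice-v3.5-flash"}


def websocket_url(model: str, region: str) -> str:
    """Return the WebSocket URL for a region, validating the model first.

    The region labels are hypothetical; they are not SDK constants.
    """
    if region not in ("beijing", "singapore"):
        raise ValueError(f"unknown region: {region}")
    if model in BEIJING_ONLY_MODELS and region != "beijing":
        raise ValueError(f"{model} is available only in the Beijing region")
    return BEIJING_URL if region == "beijing" else SINGAPORE_URL


print(websocket_url("cosyvoice-v3-flash", "singapore"))
```

Remember that the API keys for the two regions differ, so the key must match the chosen endpoint.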

Java SDK

Non-streaming call

import com.alibaba.dashscope.audio.ttsv2.SpeechSynthesisParam;
import com.alibaba.dashscope.audio.ttsv2.SpeechSynthesizer;
import com.alibaba.dashscope.utils.Constants;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;

/**
 * SSML feature description:
 *     1. Only non-streaming calls and unidirectional streaming calls support the SSML feature
 *     2. Only cloned voices of the cosyvoice-v3-flash, cosyvoice-v3-plus, and cosyvoice-v2 models, as well as system voices marked as SSML-supported in the voice list, support the SSML feature (for example, the longanyang voice of the cosyvoice-v3-flash model)
 */
public class Main {
    private static String model = "cosyvoice-v3-flash";
    private static String voice = "longanyang";

    public static void main(String[] args) {
        // The following is the Singapore region URL. If you use a model in the Beijing region, replace the URL with: wss://dashscope.aliyuncs.com/api-ws/v1/inference
        Constants.baseWebsocketApiUrl = "wss://dashscope-intl.aliyuncs.com/api-ws/v1/inference";
        streamAudioDataToSpeaker();
        System.exit(0);
    }

    public static void streamAudioDataToSpeaker() {
        SpeechSynthesisParam param =
                SpeechSynthesisParam.builder()
                        // The API Keys for the Singapore and Beijing regions are different. Get an API Key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
                        // If you have not configured an environment variable, replace the following line with your Model Studio API Key: .apiKey("sk-xxx")
                        .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                        .model(model)
                        .voice(voice)
                        .build();

        SpeechSynthesizer synthesizer = new SpeechSynthesizer(param, null);
        ByteBuffer audio = null;
        try {
            // Non-streaming call, blocks until audio is returned
            // Special characters need to be escaped
            audio = synthesizer.call("<speak rate=\"2\">My speaking rate is faster than a normal person's.</speak>");
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            // Close the WebSocket connection when the task ends
            synthesizer.getDuplexApi().close(1000, "bye");
        }
        if (audio != null) {
            // Save audio data to local file "output.mp3"
            File file = new File("output.mp3");
            try (FileOutputStream fos = new FileOutputStream(file)) {
                fos.write(audio.array());
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        }

        // The first text transmission requires establishing a WebSocket connection, so the first packet latency includes the connection setup time
        System.out.println(
                "[Metric] requestId: "
                        + synthesizer.getLastRequestId()
                        + ", first packet latency (ms): "
                        + synthesizer.getFirstPackageDelay());
    }
}

Unidirectional streaming call

import com.alibaba.dashscope.audio.tts.SpeechSynthesisResult;
import com.alibaba.dashscope.audio.ttsv2.SpeechSynthesisAudioFormat;
import com.alibaba.dashscope.audio.ttsv2.SpeechSynthesisParam;
import com.alibaba.dashscope.audio.ttsv2.SpeechSynthesizer;
import com.alibaba.dashscope.common.ResultCallback;
import com.alibaba.dashscope.utils.Constants;

import java.io.FileOutputStream;
import java.io.IOException;
import java.util.concurrent.CountDownLatch;

/**
 * SSML feature description:
 *     1. Only non-streaming calls and unidirectional streaming calls support the SSML feature
 *     2. Only cloned voices of the cosyvoice-v3-flash, cosyvoice-v3-plus, and cosyvoice-v2 models, as well as system voices marked as SSML-supported in the voice list, support the SSML feature (for example, the longanyang voice of the cosyvoice-v3-flash model)
 */
public class Main {
    private static String model = "cosyvoice-v3-flash";
    private static String voice = "longanyang";

    public static void main(String[] args) {
        // The following is the Singapore region URL. If you use a model in the Beijing region, replace the URL with: wss://dashscope.aliyuncs.com/api-ws/v1/inference
        Constants.baseWebsocketApiUrl = "wss://dashscope-intl.aliyuncs.com/api-ws/v1/inference";
        streamAudioDataToSpeaker();
        System.out.println("Audio has been saved to the output.mp3 file");
        System.exit(0);
    }

    public static void streamAudioDataToSpeaker() {
        CountDownLatch latch = new CountDownLatch(1);
        final FileOutputStream[] fileOutputStream = new FileOutputStream[1];

        try {
            fileOutputStream[0] = new FileOutputStream("output.mp3");
        } catch (IOException e) {
            System.err.println("Failed to create output file: " + e.getMessage());
            return;
        }

        // Implement the ResultCallback interface
        ResultCallback<SpeechSynthesisResult> callback = new ResultCallback<SpeechSynthesisResult>() {
            @Override
            public void onEvent(SpeechSynthesisResult result) {
                if (result.getAudioFrame() != null) {
                    // Write audio data to local file
                    try {
                        byte[] audioData = result.getAudioFrame().array();
                        fileOutputStream[0].write(audioData);
                        fileOutputStream[0].flush();
                    } catch (IOException e) {
                        System.err.println("Failed to write audio data: " + e.getMessage());
                    }
                }
            }

            @Override
            public void onComplete() {
                System.out.println("Received Complete, speech synthesis finished");
                closeFileOutputStream(fileOutputStream[0]);
                latch.countDown();
            }

            @Override
            public void onError(Exception e) {
                System.out.println("An error occurred: " + e.toString());
                closeFileOutputStream(fileOutputStream[0]);
                latch.countDown();
            }
        };

        SpeechSynthesisParam param =
                SpeechSynthesisParam.builder()
                        // The API Keys for the Singapore and Beijing regions are different. Get an API Key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
                        // If you have not configured an environment variable, replace the following line with your Model Studio API Key: .apiKey("sk-xxx")
                        .apiKey(System.getenv("DASHSCOPE_API_KEY"))
                        .model(model)
                        .voice(voice)
                        .format(SpeechSynthesisAudioFormat.MP3_22050HZ_MONO_256KBPS)
                        .build();

        SpeechSynthesizer synthesizer = new SpeechSynthesizer(param, callback);

        try {
            // Unidirectional streaming call: call() returns null immediately and the results are delivered asynchronously.
            // Retrieve the binary audio in real time from the onEvent method of the callback interface.
            // Special characters need to be escaped
            synthesizer.call("<speak rate=\"2\">My speaking rate is faster than a normal person's.</speak>");
            // Wait for synthesis to complete
            latch.await();
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            // Close the WebSocket connection after the task ends
            try {
                synthesizer.getDuplexApi().close(1000, "bye");
            } catch (Exception e) {
                System.err.println("Failed to close WebSocket connection: " + e.getMessage());
            }

            // Ensure the file stream is closed
            closeFileOutputStream(fileOutputStream[0]);
        }

        // The first text transmission requires establishing a WebSocket connection, so the first packet latency includes the connection setup time
        System.out.println(
                "[Metric] requestId: "
                        + synthesizer.getLastRequestId()
                        + ", first packet latency (ms): "
                        + synthesizer.getFirstPackageDelay());
    }

    private static void closeFileOutputStream(FileOutputStream fileOutputStream) {
        try {
            if (fileOutputStream != null) {
                fileOutputStream.close();
            }
        } catch (IOException e) {
            System.err.println("Failed to close file stream: " + e.getMessage());
        }
    }
}

Python SDK

Non-streaming call

# coding=utf-8
# SSML feature description:
#     1. Only non-streaming calls and unidirectional streaming calls support the SSML feature
#     2. Only cloned voices of the cosyvoice-v3-flash, cosyvoice-v3-plus, and cosyvoice-v2 models, as well as system voices marked as SSML-supported in the voice list, support the SSML feature (for example, the longanyang voice of the cosyvoice-v3-flash model)

import dashscope
from dashscope.audio.tts_v2 import *
import os

# The API Keys for the Singapore and Beijing regions are different. Get an API Key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
# If you have not configured an environment variable, replace the following line with your Model Studio API Key: dashscope.api_key = "sk-xxx"
dashscope.api_key = os.environ.get('DASHSCOPE_API_KEY')

# The following is the Singapore region URL. If you use a model in the Beijing region, replace the URL with: wss://dashscope.aliyuncs.com/api-ws/v1/inference
dashscope.base_websocket_api_url='wss://dashscope-intl.aliyuncs.com/api-ws/v1/inference'

# Model
model = "cosyvoice-v3-flash"
# Voice
voice = "longanyang"

# Instantiate SpeechSynthesizer and pass request parameters such as model and voice in the constructor
synthesizer = SpeechSynthesizer(model=model, voice=voice)
# Non-streaming call, blocks until audio is returned
# Special characters need to be escaped
audio = synthesizer.call("<speak rate=\"2\">My speaking rate is faster than a normal person's.</speak>")

# Save the audio to a local file
with open('output.mp3', 'wb') as f:
    f.write(audio)

# The first text transmission requires establishing a WebSocket connection, so the first packet latency includes the connection setup time
print('[Metric] requestId: {}, first packet latency: {} ms'.format(
    synthesizer.get_last_request_id(),
    synthesizer.get_first_package_delay()))

Unidirectional streaming call

# coding=utf-8
# SSML feature description:
#     1. Only non-streaming calls and unidirectional streaming calls support the SSML feature
#     2. Only cloned voices of the cosyvoice-v3-flash, cosyvoice-v3-plus, and cosyvoice-v2 models, as well as system voices marked as SSML-supported in the voice list, support the SSML feature (for example, the longanyang voice of the cosyvoice-v3-flash model)

import dashscope
from dashscope.audio.tts_v2 import *
import os
from datetime import datetime

def get_timestamp():
    now = datetime.now()
    formatted_timestamp = now.strftime("[%Y-%m-%d %H:%M:%S.%f]")
    return formatted_timestamp

# The API Keys for the Singapore and Beijing regions are different. Get an API Key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
# If you have not configured an environment variable, replace the following line with your Model Studio API Key: dashscope.api_key = "sk-xxx"
dashscope.api_key = os.environ.get('DASHSCOPE_API_KEY')

# The following is the Singapore region URL. If you use a model in the Beijing region, replace the URL with: wss://dashscope.aliyuncs.com/api-ws/v1/inference
dashscope.base_websocket_api_url='wss://dashscope-intl.aliyuncs.com/api-ws/v1/inference'

# Model
model = "cosyvoice-v3-flash"
# Voice
voice = "longanyang"

# Define the callback interface
class Callback(ResultCallback):
    # Output file handle; opened in on_open and closed when the task ends
    file = None

    def on_open(self):
        # Open the output file for writing audio data
        self.file = open("output.mp3", "wb")
        print("Connection established: " + get_timestamp())

    def on_complete(self):
        print("Speech synthesis completed, all results have been received: " + get_timestamp())
        if hasattr(self, 'file') and self.file:
            self.file.close()
        # The first text transmission requires establishing a WebSocket connection, so the first packet latency includes the connection setup time
        print('[Metric] requestId: {}, first packet latency: {} ms'.format(
            self.synthesizer.get_last_request_id(),
            self.synthesizer.get_first_package_delay()))

    def on_error(self, message: str):
        print(f"Speech synthesis error occurred: {message}")
        if hasattr(self, 'file') and self.file:
            self.file.close()

    def on_close(self):
        print("Connection closed: " + get_timestamp())
        if hasattr(self, 'file') and self.file:
            self.file.close()

    def on_event(self, message):
        pass

    def on_data(self, data: bytes) -> None:
        print(get_timestamp() + " Binary audio length: " + str(len(data)))
        # Write audio data to file
        self.file.write(data)

callback = Callback()

# Instantiate SpeechSynthesizer and pass request parameters such as model and voice in the constructor
synthesizer = SpeechSynthesizer(
    model=model,
    voice=voice,
    callback=callback,
)

# Assign the synthesizer instance to callback for use in on_complete
callback.synthesizer = synthesizer

# Unidirectional streaming call, send text for synthesis, retrieve binary audio in real time from the on_data method of the callback interface
# Special characters need to be escaped
synthesizer.call("<speak rate=\"2\">My speaking rate is faster than a normal person's.</speak>")

WebSocket API

Go

// SSML feature description:
//     1. When sending the run-task command, set the enable_ssml parameter to true to enable SSML support
//     2. Send text containing SSML via the continue-task command, and only one continue-task command is allowed
//     3. Only cloned voices of the cosyvoice-v3-flash, cosyvoice-v3-plus, and cosyvoice-v2 models, as well as system voices marked as SSML-supported in the voice list, support the SSML feature (for example, the longanyang voice of the cosyvoice-v3-flash model)

package main

import (
    "encoding/json"
    "fmt"
    "net/http"
    "os"
    "strings"
    "time"

    "github.com/google/uuid"
    "github.com/gorilla/websocket"
)

const (
    // The following is the Singapore region URL. If you use a model in the Beijing region, replace the URL with: wss://dashscope.aliyuncs.com/api-ws/v1/inference/
    wsURL      = "wss://dashscope-intl.aliyuncs.com/api-ws/v1/inference/"
    outputFile = "output.mp3"
)

func main() {
    // The API Keys for the Singapore and Beijing regions are different. Get an API Key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    // If you have not configured an environment variable, replace the following line with your Model Studio API Key: apiKey := "sk-xxx"
    apiKey := os.Getenv("DASHSCOPE_API_KEY")

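    // Create or truncate the output file, then close the handle (it is reopened in append mode for each audio chunk)
    f, err := os.Create(outputFile)
    if err != nil {
        fmt.Println("Failed to create output file:", err)
        return
    }
    f.Close()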

    // Connect to WebSocket
    header := make(http.Header)
    header.Add("X-DashScope-DataInspection", "enable")
    header.Add("Authorization", fmt.Sprintf("bearer %s", apiKey))

    conn, resp, err := websocket.DefaultDialer.Dial(wsURL, header)
    if err != nil {
        if resp != nil {
            fmt.Printf("Connection failed, HTTP status code: %d\n", resp.StatusCode)
        }
        fmt.Println("Connection failed:", err)
        return
    }
    defer conn.Close()

    // Generate task ID
    taskID := uuid.New().String()
    fmt.Printf("Generated task ID: %s\n", taskID)

    // Send run-task command
    runTaskCmd := map[string]interface{}{
        "header": map[string]interface{}{
            "action":    "run-task",
            "task_id":   taskID,
            "streaming": "duplex",
        },
        "payload": map[string]interface{}{
            "task_group": "audio",
            "task":       "tts",
            "function":   "SpeechSynthesizer",
            "model":      "cosyvoice-v3-flash",
            "parameters": map[string]interface{}{
                "text_type":   "PlainText",
                "voice":       "longanyang",
                "format":      "mp3",
                "sample_rate": 22050,
                "volume":      50,
                "rate":        1,
                "pitch":       1,
                // If enable_ssml is set to true, only one continue-task command is allowed; otherwise the error "Text request limit violated, expected 1." will be returned
                "enable_ssml": true,
            },
            "input": map[string]interface{}{},
        },
    }

    runTaskJSON, _ := json.Marshal(runTaskCmd)
    fmt.Printf("Sending run-task command: %s\n", string(runTaskJSON))

    err = conn.WriteMessage(websocket.TextMessage, runTaskJSON)
    if err != nil {
        fmt.Println("Failed to send run-task:", err)
        return
    }

    textSent := false

    // Process messages
    for {
        messageType, message, err := conn.ReadMessage()
        if err != nil {
            fmt.Println("Failed to read message:", err)
            break
        }

        // Process binary messages
        if messageType == websocket.BinaryMessage {
            fmt.Printf("Received binary message, length: %d\n", len(message))
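            file, err := os.OpenFile(outputFile, os.O_APPEND|os.O_WRONLY|os.O_CREATE, 0644)
            if err != nil {
                fmt.Println("Failed to open output file:", err)
                return
            }
            if _, err := file.Write(message); err != nil {
                fmt.Println("Failed to write audio data:", err)
            }
            file.Close()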
            continue
        }

        // Process text messages
        messageStr := string(message)
        fmt.Printf("Received text message: %s\n", strings.ReplaceAll(messageStr, "\n", ""))

        // Simple JSON parsing to get event type
        var msgMap map[string]interface{}
        if json.Unmarshal(message, &msgMap) == nil {
            if header, ok := msgMap["header"].(map[string]interface{}); ok {
                if event, ok := header["event"].(string); ok {
                    fmt.Printf("Event type: %s\n", event)

                    switch event {
                    case "task-started":
                        fmt.Println("=== Received task-started event ===")

                        if !textSent {
                            // Send continue-task command. When using SSML, this command can only be sent once
                            continueTaskCmd := map[string]interface{}{
                                "header": map[string]interface{}{
                                    "action":    "continue-task",
                                    "task_id":   taskID,
                                    "streaming": "duplex",
                                },
                                "payload": map[string]interface{}{
                                    "input": map[string]interface{}{
                                        // Special characters need to be escaped
                                        "text": "<speak rate=\"2\">My speaking rate is faster than a normal person's.</speak>",
                                    },
                                },
                            }

                            continueTaskJSON, _ := json.Marshal(continueTaskCmd)
                            fmt.Printf("Sending continue-task command: %s\n", string(continueTaskJSON))

                            err = conn.WriteMessage(websocket.TextMessage, continueTaskJSON)
                            if err != nil {
                                fmt.Println("Failed to send continue-task:", err)
                                return
                            }

                            textSent = true

                            // Delay before sending finish-task
                            time.Sleep(500 * time.Millisecond)

                            // Send finish-task command
                            finishTaskCmd := map[string]interface{}{
                                "header": map[string]interface{}{
                                    "action":    "finish-task",
                                    "task_id":   taskID,
                                    "streaming": "duplex",
                                },
                                "payload": map[string]interface{}{
                                    "input": map[string]interface{}{},
                                },
                            }

                            finishTaskJSON, _ := json.Marshal(finishTaskCmd)
                            fmt.Printf("Sending finish-task command: %s\n", string(finishTaskJSON))

                            err = conn.WriteMessage(websocket.TextMessage, finishTaskJSON)
                            if err != nil {
                                fmt.Println("Failed to send finish-task:", err)
                                return
                            }
                        }

                    case "task-finished":
                        fmt.Println("=== Task completed ===")
                        return

                    case "task-failed":
                        fmt.Println("=== Task failed ===")
                        if header["error_message"] != nil {
                            fmt.Printf("Error message: %s\n", header["error_message"])
                        }
                        return

                    case "result-generated":
                        fmt.Println("Received result-generated event")
                    }
                }
            }
        }
    }
}

C#

using System.Net.WebSockets;
using System.Text;
using System.Text.Json;

// SSML feature description:
//     1. When sending the run-task command, set the enable_ssml parameter to true to enable SSML support
//     2. Send text containing SSML via the continue-task command, and only one continue-task command is allowed
//     3. Only cloned voices of the cosyvoice-v3-flash, cosyvoice-v3-plus, and cosyvoice-v2 models, as well as system voices marked as SSML-supported in the voice list, support the SSML feature (for example, the longanyang voice of the cosyvoice-v3-flash model)
class Program {
    // The API Keys for the Singapore and Beijing regions are different. Get an API Key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    // If you have not configured an environment variable, replace the following line with your Model Studio API Key: private static readonly string ApiKey = "sk-xxx"
    private static readonly string ApiKey = Environment.GetEnvironmentVariable("DASHSCOPE_API_KEY") ?? throw new InvalidOperationException("DASHSCOPE_API_KEY environment variable is not set.");

    // The following is the Singapore region URL. If you use a model in the Beijing region, replace the URL with: wss://dashscope.aliyuncs.com/api-ws/v1/inference/
    private const string WebSocketUrl = "wss://dashscope-intl.aliyuncs.com/api-ws/v1/inference/";
    // Output file path
    private const string OutputFilePath = "output.mp3";

    // WebSocket client
    private static ClientWebSocket _webSocket = new ClientWebSocket();
    // Cancellation token source
    private static CancellationTokenSource _cancellationTokenSource = new CancellationTokenSource();
    // Task ID
    private static string? _taskId;
    // Whether the task has started
    private static TaskCompletionSource<bool> _taskStartedTcs = new TaskCompletionSource<bool>();

    static async Task Main(string[] args) {
        try {
            // Clear the output file
            ClearOutputFile(OutputFilePath);

            // Connect to the WebSocket service
            await ConnectToWebSocketAsync(WebSocketUrl);

            // Start the message receiving task
            Task receiveTask = ReceiveMessagesAsync();

            // Send run-task command
            _taskId = GenerateTaskId();
            await SendRunTaskCommandAsync(_taskId);

            // Wait for the task-started event
            await _taskStartedTcs.Task;

            // Send continue-task command. When using SSML, this command can only be sent once
            // Special characters need to be escaped
            await SendContinueTaskCommandAsync("<speak rate=\"2\">My speaking rate is faster than a normal person's.</speak>");

            // Send finish-task command
            await SendFinishTaskCommandAsync(_taskId);

            // Wait for the receiving task to complete
            await receiveTask;

            Console.WriteLine("Task completed, connection closed.");
        } catch (OperationCanceledException) {
            Console.WriteLine("Task was cancelled.");
        } catch (Exception ex) {
            Console.WriteLine($"An error occurred: {ex.Message}");
        } finally {
            _cancellationTokenSource.Cancel();
            _webSocket.Dispose();
        }
    }

    private static void ClearOutputFile(string filePath) {
        if (File.Exists(filePath)) {
            File.WriteAllText(filePath, string.Empty);
            Console.WriteLine("Output file cleared.");
        } else {
            Console.WriteLine("Output file does not exist, no need to clear.");
        }
    }

    private static async Task ConnectToWebSocketAsync(string url) {
        var uri = new Uri(url);
        if (_webSocket.State == WebSocketState.Connecting || _webSocket.State == WebSocketState.Open) {
            return;
        }

        // Set WebSocket connection headers
        _webSocket.Options.SetRequestHeader("Authorization", $"bearer {ApiKey}");
        _webSocket.Options.SetRequestHeader("X-DashScope-DataInspection", "enable");

        try {
            await _webSocket.ConnectAsync(uri, _cancellationTokenSource.Token);
            Console.WriteLine("Successfully connected to the WebSocket service.");
        } catch (OperationCanceledException) {
            Console.WriteLine("WebSocket connection was cancelled.");
        } catch (Exception ex) {
            Console.WriteLine($"WebSocket connection failed: {ex.Message}");
            throw;
        }
    }

    private static async Task SendRunTaskCommandAsync(string taskId) {
        var command = CreateCommand("run-task", taskId, "duplex", new {
            task_group = "audio",
            task = "tts",
            function = "SpeechSynthesizer",
            model = "cosyvoice-v3-flash",
            parameters = new
            {
                text_type = "PlainText",
                voice = "longanyang",
                format = "mp3",
                sample_rate = 22050,
                volume = 50,
                rate = 1,
                pitch = 1,
                // If enable_ssml is set to true, only one continue-task command is allowed; otherwise the error "Text request limit violated, expected 1." will be returned
                enable_ssml = true
            },
            input = new { }
        });

        await SendJsonMessageAsync(command);
        Console.WriteLine("run-task command sent.");
    }

    private static async Task SendContinueTaskCommandAsync(string text) {
        if (_taskId == null) {
            throw new InvalidOperationException("Task ID is not initialized.");
        }

        var command = CreateCommand("continue-task", _taskId, "duplex", new {
            input = new {
                text
            }
        });

        await SendJsonMessageAsync(command);
        Console.WriteLine("continue-task command sent.");
    }

    private static async Task SendFinishTaskCommandAsync(string taskId) {
        var command = CreateCommand("finish-task", taskId, "duplex", new {
            input = new { }
        });

        await SendJsonMessageAsync(command);
        Console.WriteLine("finish-task command sent.");
    }

    private static async Task SendJsonMessageAsync(string message) {
        var buffer = Encoding.UTF8.GetBytes(message);
        try {
            await _webSocket.SendAsync(new ArraySegment<byte>(buffer), WebSocketMessageType.Text, true, _cancellationTokenSource.Token);
        } catch (OperationCanceledException) {
            Console.WriteLine("Message sending was cancelled.");
        }
    }

    private static async Task ReceiveMessagesAsync() {
        while (_webSocket.State == WebSocketState.Open) {
            var response = await ReceiveMessageAsync();
            if (response != null) {
                var eventStr = response.RootElement.GetProperty("header").GetProperty("event").GetString();
                switch (eventStr) {
                    case "task-started":
                        Console.WriteLine("Task started.");
                        _taskStartedTcs.TrySetResult(true);
                        break;
                    case "task-finished":
                        Console.WriteLine("Task finished.");
                        _cancellationTokenSource.Cancel();
                        break;
                    case "task-failed":
                        Console.WriteLine("Task failed: " + response.RootElement.GetProperty("header").GetProperty("error_message").GetString());
                        _cancellationTokenSource.Cancel();
                        break;
                    default:
                        // result-generated can be handled here
                        break;
                }
            }
        }
    }

    private static async Task<JsonDocument?> ReceiveMessageAsync() {
        var buffer = new byte[1024 * 4];
        var segment = new ArraySegment<byte>(buffer);

        try {
            WebSocketReceiveResult result = await _webSocket.ReceiveAsync(segment, _cancellationTokenSource.Token);

            if (result.MessageType == WebSocketMessageType.Close) {
                await _webSocket.CloseAsync(WebSocketCloseStatus.NormalClosure, "Closing", _cancellationTokenSource.Token);
                return null;
            }

            if (result.MessageType == WebSocketMessageType.Binary) {
                // Process binary data
                Console.WriteLine("Received binary data...");

                // Save binary data to file
                using (var fileStream = new FileStream(OutputFilePath, FileMode.Append)) {
                    fileStream.Write(buffer, 0, result.Count);
                }

                return null;
            }

            string message = Encoding.UTF8.GetString(buffer, 0, result.Count);
            return JsonDocument.Parse(message);
        } catch (OperationCanceledException) {
            Console.WriteLine("Message receiving was cancelled.");
            return null;
        }
    }

    private static string GenerateTaskId() {
        // The "N" format already yields 32 hexadecimal digits without hyphens
        return Guid.NewGuid().ToString("N");
    }

    private static string CreateCommand(string action, string taskId, string streaming, object payload) {
        var command = new {
            header = new {
                action,
                task_id = taskId,
                streaming
            },
            payload
        };

        return JsonSerializer.Serialize(command);
    }
}

PHP

The sample code directory structure is:

my-php-project/

├── composer.json

├── vendor/

└── index.php

The composer.json file contains the following dependencies. Adjust the version numbers as needed:

{
    "require": {
        "react/event-loop": "^1.3",
        "react/socket": "^1.11",
        "react/stream": "^1.2",
        "react/http": "^1.1",
        "ratchet/pawl": "^0.4"
    },
    "autoload": {
        "psr-4": {
            "App\\": "src/"
        }
    }
}

The index.php file contains the following code:

<?php

// SSML feature description:
//     1. When sending the run-task command, set the enable_ssml parameter to true to enable SSML support
//     2. Send text containing SSML via the continue-task command, and only one continue-task command is allowed
//     3. Only cloned voices of the cosyvoice-v3-flash, cosyvoice-v3-plus, and cosyvoice-v2 models, as well as system voices marked as SSML-supported in the voice list, support the SSML feature (for example, the longanyang voice of the cosyvoice-v3-flash model)

require __DIR__ . '/vendor/autoload.php';

use Ratchet\Client\Connector;
use React\EventLoop\Loop;
use React\Socket\Connector as SocketConnector;

// The API Keys for the Singapore and Beijing regions are different. Get an API Key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
// If you have not configured an environment variable, replace the following line with your Model Studio API Key: $api_key = "sk-xxx"
$api_key = getenv("DASHSCOPE_API_KEY");
// The following is the Singapore region URL. If you use a model in the Beijing region, replace the URL with: wss://dashscope.aliyuncs.com/api-ws/v1/inference/
$websocket_url = 'wss://dashscope-intl.aliyuncs.com/api-ws/v1/inference/'; // WebSocket server address
$output_file = 'output.mp3'; // Output file path

$loop = Loop::get();

if (file_exists($output_file)) {
    // Clear file contents
    file_put_contents($output_file, '');
}

// Create a custom connector
$socketConnector = new SocketConnector($loop, [
    'tcp' => [
        'bindto' => '0.0.0.0:0',
    ],
    'tls' => [
        // Keep certificate verification enabled so the API key is sent only to the genuine server
        'verify_peer' => true,
        'verify_peer_name' => true,
    ],
]);

$connector = new Connector($loop, $socketConnector);

$headers = [
    'Authorization' => 'bearer ' . $api_key,
    'X-DashScope-DataInspection' => 'enable'
];

$connector($websocket_url, [], $headers)->then(function ($conn) use ($loop, $output_file) {
    echo "Connected to WebSocket server\n";

    // Generate task ID
    $taskId = generateTaskId();

    // Send run-task command
    sendRunTaskMessage($conn, $taskId);

    // Define the function to send continue-task command
    $sendContinueTask = function() use ($conn, $loop, $taskId) {
        // Send continue-task command. When using SSML, this command can only be sent once
        $continueTaskMessage = json_encode([
            "header" => [
                "action" => "continue-task",
                "task_id" => $taskId,
                "streaming" => "duplex"
            ],
            "payload" => [
                "input" => [
                    // Special characters need to be escaped
                    "text" => "<speak rate=\"2\">My speaking rate is faster than a normal person's.</speak>"
                ]
            ]
        ]);
        $conn->send($continueTaskMessage);

        // Send finish-task command
        sendFinishTaskMessage($conn, $taskId);
    };

    // Flag for whether task-started event has been received
    $taskStarted = false;

    // Listen for messages
    $conn->on('message', function($msg) use ($conn, $sendContinueTask, $loop, &$taskStarted, $taskId, $output_file) {
        if ($msg->isBinary()) {
            // Write binary data to local file
            file_put_contents($output_file, $msg->getPayload(), FILE_APPEND);
        } else {
            // Process non-binary messages
            $response = json_decode($msg, true);

            if (isset($response['header']['event'])) {
                handleEvent($conn, $response, $sendContinueTask, $loop, $taskId, $taskStarted);
            } else {
                echo "Unknown message format\n";
            }
        }
    });

    // Listen for connection close
    $conn->on('close', function($code = null, $reason = null) {
        echo "Connection closed\n";
        if ($code !== null) {
            echo "Close code: " . $code . "\n";
        }
        if ($reason !== null) {
            echo "Close reason: " . $reason . "\n";
        }
    });
}, function ($e) {
    echo "Unable to connect: {$e->getMessage()}\n";
});

$loop->run();

/**
 * Generate task ID
 * @return string
 */
function generateTaskId(): string {
    return bin2hex(random_bytes(16));
}

/**
 * Send run-task command
 * @param $conn
 * @param $taskId
 */
function sendRunTaskMessage($conn, $taskId) {
    $runTaskMessage = json_encode([
        "header" => [
            "action" => "run-task",
            "task_id" => $taskId,
            "streaming" => "duplex"
        ],
        "payload" => [
            "task_group" => "audio",
            "task" => "tts",
            "function" => "SpeechSynthesizer",
            "model" => "cosyvoice-v3-flash",
            "parameters" => [
                "text_type" => "PlainText",
                "voice" => "longanyang",
                "format" => "mp3",
                "sample_rate" => 22050,
                "volume" => 50,
                "rate" => 1,
                "pitch" => 1,
                // If enable_ssml is set to true, only one continue-task command is allowed; otherwise the error "Text request limit violated, expected 1." will be returned
                "enable_ssml" => true
            ],
            "input" => (object) []
        ]
    ]);
    echo "Preparing to send run-task command: " . $runTaskMessage . "\n";
    $conn->send($runTaskMessage);
    echo "run-task command sent\n";
}

/**
 * Send finish-task command
 * @param $conn
 * @param $taskId
 */
function sendFinishTaskMessage($conn, $taskId) {
    $finishTaskMessage = json_encode([
        "header" => [
            "action" => "finish-task",
            "task_id" => $taskId,
            "streaming" => "duplex"
        ],
        "payload" => [
            "input" => (object) []
        ]
    ]);
    echo "Preparing to send finish-task command: " . $finishTaskMessage . "\n";
    $conn->send($finishTaskMessage);
    echo "finish-task command sent\n";
}

/**
 * Handle events
 * @param $conn
 * @param $response
 * @param $sendContinueTask
 * @param $loop
 * @param $taskId
 * @param $taskStarted
 */
function handleEvent($conn, $response, $sendContinueTask, $loop, $taskId, &$taskStarted) {
    switch ($response['header']['event']) {
        case 'task-started':
            echo "Task started, sending continue-task command...\n";
            $taskStarted = true;
            // Send continue-task command
            $sendContinueTask();
            break;
        case 'result-generated':
            // Ignore result-generated events
            break;
        case 'task-finished':
            echo "Task completed\n";
            // The connection is closed by the timer below, once any remaining audio data has arrived
            break;
        case 'task-failed':
            echo "Task failed\n";
            echo "Error code: " . $response['header']['error_code'] . "\n";
            echo "Error message: " . $response['header']['error_message'] . "\n";
            $conn->close();
            break;
        case 'error':
            echo "Error: " . $response['payload']['message'] . "\n";
            break;
        default:
            echo "Unknown event: " . $response['header']['event'] . "\n";
            break;
    }

    // If the task is completed, close the connection
    if ($response['header']['event'] == 'task-finished') {
        // Wait 1 second to ensure all data has been transmitted
        $loop->addTimer(1, function() use ($conn) {
            $conn->close();
            echo "Client closed connection\n";
        });
    }

    // If task-started event has not been received, close the connection
    if (!$taskStarted && in_array($response['header']['event'], ['task-failed', 'error'])) {
        $conn->close();
    }
}

Node.js

Install the required dependencies:

npm install ws
npm install uuid

Sample code:

// SSML feature description:
//     1. When sending the run-task command, set the enable_ssml parameter to true to enable SSML support
//     2. Send text containing SSML via the continue-task command, and only one continue-task command is allowed
//     3. Only cloned voices of the cosyvoice-v3-flash, cosyvoice-v3-plus, and cosyvoice-v2 models, as well as system voices marked as SSML-supported in the voice list, support the SSML feature (for example, the longanyang voice of the cosyvoice-v3-flash model)

import fs from 'fs';
import WebSocket from 'ws';
import { v4 as uuid } from 'uuid'; // For generating UUIDs

// The API Keys for the Singapore and Beijing regions are different. Get an API Key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
// If you have not configured an environment variable, replace the following line with your Model Studio API Key: const apiKey = "sk-xxx"
const apiKey = process.env.DASHSCOPE_API_KEY;
// The following is the Singapore region URL. If you use a model in the Beijing region, replace the URL with: wss://dashscope.aliyuncs.com/api-ws/v1/inference/
const url = 'wss://dashscope-intl.aliyuncs.com/api-ws/v1/inference/';
// Output file path
const outputFilePath = 'output.mp3';

// Clear the output file
fs.writeFileSync(outputFilePath, '');

// Create WebSocket client
const ws = new WebSocket(url, {
  headers: {
    Authorization: `bearer ${apiKey}`,
    'X-DashScope-DataInspection': 'enable'
  }
});

let taskStarted = false;
let taskId = uuid();

ws.on('open', () => {
  console.log('Connected to WebSocket server');

  // Send run-task command
  const runTaskMessage = JSON.stringify({
    header: {
      action: 'run-task',
      task_id: taskId,
      streaming: 'duplex'
    },
    payload: {
      task_group: 'audio',
      task: 'tts',
      function: 'SpeechSynthesizer',
      model: 'cosyvoice-v3-flash',
      parameters: {
        text_type: 'PlainText',
        voice: 'longanyang', // Voice
        format: 'mp3', // Audio format
        sample_rate: 22050, // Sample rate
        volume: 50, // Volume
        rate: 1, // Speech rate
        pitch: 1, // Pitch
        enable_ssml: true // Whether to enable SSML. If enable_ssml is set to true, only one continue-task command is allowed; otherwise the error "Text request limit violated, expected 1." will be returned
      },
      input: {}
    }
  });
  ws.send(runTaskMessage);
  console.log('run-task message sent');
});

const fileStream = fs.createWriteStream(outputFilePath, { flags: 'a' });
ws.on('message', (data, isBinary) => {
  if (isBinary) {
    // Write binary data to file
    fileStream.write(data);
  } else {
    const message = JSON.parse(data);

    switch (message.header.event) {
      case 'task-started':
        taskStarted = true;
        console.log('Task started');
        // Send continue-task command
        sendContinueTasks(ws);
        break;
      case 'task-finished':
        console.log('Task completed');
        ws.close();
        fileStream.end(() => {
          console.log('File stream closed');
        });
        break;
      case 'task-failed':
        console.error('Task failed:', message.header.error_message);
        ws.close();
        fileStream.end(() => {
          console.log('File stream closed');
        });
        break;
      default:
        // result-generated can be handled here
        break;
    }
  }
});

function sendContinueTasks(ws) {

  if (taskStarted) {
    // Send continue-task command. When using SSML, this command can only be sent once
    const continueTaskMessage = JSON.stringify({
      header: {
        action: 'continue-task',
        task_id: taskId,
        streaming: 'duplex'
      },
      payload: {
        input: {
          // Special characters need to be escaped
          text: "<speak rate=\"2\">My speaking rate is faster than a normal person's.</speak>"
        }
      }
    });
    ws.send(continueTaskMessage);

    // Send finish-task command
    const finishTaskMessage = JSON.stringify({
      header: {
        action: 'finish-task',
        task_id: taskId,
        streaming: 'duplex'
      },
      payload: {
        input: {}
      }
    });
    ws.send(finishTaskMessage);
  }
}

ws.on('close', () => {
  console.log('Disconnected from WebSocket server');
});

Java

For Java development, use the Java DashScope SDK. For details, see Java SDK.

This Java WebSocket example requires the following dependencies:

  • Java-WebSocket

  • jackson-databind

Manage these dependencies with Maven or Gradle:

pom.xml

<dependencies>
    <!-- WebSocket Client -->
    <dependency>
        <groupId>org.java-websocket</groupId>
        <artifactId>Java-WebSocket</artifactId>
        <version>1.5.3</version>
    </dependency>

    <!-- JSON Processing -->
    <dependency>
        <groupId>com.fasterxml.jackson.core</groupId>
        <artifactId>jackson-databind</artifactId>
        <version>2.13.0</version>
    </dependency>
</dependencies>

build.gradle

dependencies {
  // WebSocket Client
  implementation 'org.java-websocket:Java-WebSocket:1.5.3'
  // JSON Processing
  implementation 'com.fasterxml.jackson.core:jackson-databind:2.13.0'
}

Java code:

import com.fasterxml.jackson.databind.ObjectMapper;

import org.java_websocket.client.WebSocketClient;
import org.java_websocket.handshake.ServerHandshake;

import java.io.FileOutputStream;
import java.io.IOException;
import java.net.URI;
import java.nio.ByteBuffer;
import java.util.*;

/**
 * SSML feature description:
 *     1. When sending the run-task command, set the enable_ssml parameter to true to enable SSML support
 *     2. Send text containing SSML via the continue-task command, and only one continue-task command is allowed
 *     3. Only cloned voices of the cosyvoice-v3-flash, cosyvoice-v3-plus, and cosyvoice-v2 models, as well as system voices marked as SSML-supported in the voice list, support the SSML feature (for example, the longanyang voice of the cosyvoice-v3-flash model)
 */
public class TTSWebSocketClient extends WebSocketClient {
    private final String taskId = UUID.randomUUID().toString();
    private final String outputFile = "output_" + System.currentTimeMillis() + ".mp3";
    // volatile: written by the WebSocket thread and read by the main thread
    private volatile boolean taskFinished = false;

    public TTSWebSocketClient(URI serverUri, Map<String, String> headers) {
        super(serverUri, headers);
    }

    @Override
    public void onOpen(ServerHandshake serverHandshake) {
        System.out.println("Connection established");

        // Send run-task command
        // If enable_ssml is set to true, only one continue-task command is allowed; otherwise the error "Text request limit violated, expected 1." will be returned
        String runTaskCommand = "{ \"header\": { \"action\": \"run-task\", \"task_id\": \"" + taskId + "\", \"streaming\": \"duplex\" }, \"payload\": { \"task_group\": \"audio\", \"task\": \"tts\", \"function\": \"SpeechSynthesizer\", \"model\": \"cosyvoice-v3-flash\", \"parameters\": { \"text_type\": \"PlainText\", \"voice\": \"longanyang\", \"format\": \"mp3\", \"sample_rate\": 22050, \"volume\": 50, \"rate\": 1, \"pitch\": 1, \"enable_ssml\": true }, \"input\": {} }}";
        send(runTaskCommand);
    }

    @Override
    public void onMessage(String message) {
        System.out.println("Received message from server: " + message);
        try {
            // Parse JSON message
            Map<String, Object> messageMap = new ObjectMapper().readValue(message, Map.class);

            if (messageMap.containsKey("header")) {
                Map<String, Object> header = (Map<String, Object>) messageMap.get("header");

                if (header.containsKey("event")) {
                    String event = (String) header.get("event");

                    if ("task-started".equals(event)) {
                        System.out.println("Received task-started event from server");

                        // Send continue-task command. When using SSML, this command can only be sent once
                        // Special characters need to be escaped
                        sendContinueTask("<speak rate=\\\"2\\\">My speaking rate is faster than a normal person's.</speak>");

                        // Send finish-task command
                        sendFinishTask();
                    } else if ("task-finished".equals(event)) {
                        System.out.println("Received task-finished event from server");
                        taskFinished = true;
                        closeConnection();
                    } else if ("task-failed".equals(event)) {
                        System.out.println("Task failed: " + message);
                        closeConnection();
                    }
                }
            }
        } catch (Exception e) {
            System.err.println("An error occurred: " + e.getMessage());
        }
    }

    @Override
    public void onMessage(ByteBuffer message) {
        System.out.println("Received binary audio data, size: " + message.remaining());

        try (FileOutputStream fos = new FileOutputStream(outputFile, true)) {
            byte[] buffer = new byte[message.remaining()];
            message.get(buffer);
            fos.write(buffer);
            System.out.println("Audio data written to local file " + outputFile);
        } catch (IOException e) {
            System.err.println("Failed to write audio data to local file: " + e.getMessage());
        }
    }

    @Override
    public void onClose(int code, String reason, boolean remote) {
        System.out.println("Connection closed: " + reason + " (" + code + ")");
    }

    @Override
    public void onError(Exception ex) {
        System.err.println("Error: " + ex.getMessage());
        ex.printStackTrace();
    }

    private void sendContinueTask(String text) {
        String command = "{ \"header\": { \"action\": \"continue-task\", \"task_id\": \"" + taskId + "\", \"streaming\": \"duplex\" }, \"payload\": { \"input\": { \"text\": \"" + text + "\" } }}";
        send(command);
    }

    private void sendFinishTask() {
        String command = "{ \"header\": { \"action\": \"finish-task\", \"task_id\": \"" + taskId + "\", \"streaming\": \"duplex\" }, \"payload\": { \"input\": {} }}";
        send(command);
    }

    private void closeConnection() {
        if (!isClosed()) {
            close();
        }
    }

    public static void main(String[] args) {
        try {
            // The API Keys for the Singapore and Beijing regions are different. Get an API Key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
            // If you have not configured an environment variable, replace the following line with your Model Studio API Key: String apiKey = "sk-xxx"
            String apiKey = System.getenv("DASHSCOPE_API_KEY");
            if (apiKey == null || apiKey.isEmpty()) {
                System.err.println("Please set the DASHSCOPE_API_KEY environment variable");
                return;
            }

            Map<String, String> headers = new HashMap<>();
            headers.put("Authorization", "bearer " + apiKey);
            // The following is the Singapore region URL. If you use a model in the Beijing region, replace the URL with: wss://dashscope.aliyuncs.com/api-ws/v1/inference/
            TTSWebSocketClient client = new TTSWebSocketClient(new URI("wss://dashscope-intl.aliyuncs.com/api-ws/v1/inference/"), headers);

            client.connect();

            while (!client.isClosed() && !client.taskFinished) {
                Thread.sleep(1000);
            }
        } catch (Exception e) {
            System.err.println("Failed to connect to WebSocket service: " + e.getMessage());
            e.printStackTrace();
        }
    }
}

Python

For Python development, use the Python DashScope SDK. For details, see Python SDK.

The following is a Python WebSocket example. Before running it, install the required dependency:

pip uninstall websocket-client
pip uninstall websocket
pip install websocket-client

Important

Don't name your Python file "websocket.py". Doing so causes a naming conflict that results in an error (AttributeError: module 'websocket' has no attribute 'WebSocketApp'. Did you mean: 'WebSocket'?).

# SSML feature description:
#     1. When sending the run-task command, set the enable_ssml parameter to true to enable SSML support
#     2. Send text containing SSML via the continue-task command, and only one continue-task command is allowed
#     3. Only cloned voices of the cosyvoice-v3-flash, cosyvoice-v3-plus, and cosyvoice-v2 models, as well as system voices marked as SSML-supported in the voice list, support the SSML feature (for example, the longanyang voice of the cosyvoice-v3-flash model)

import websocket
import json
import uuid
import os
import time

class TTSClient:
    def __init__(self, api_key, uri):
        """
        Initialize a TTSClient instance.

        Parameters:
            api_key (str): API Key for authentication
            uri (str): WebSocket service address
        """
        self.api_key = api_key  # Replace with your API Key
        self.uri = uri  # Replace with your WebSocket address
        self.task_id = str(uuid.uuid4())  # Generate a unique task ID
        self.output_file = f"output_{int(time.time())}.mp3"  # Output audio file path
        self.ws = None  # WebSocketApp instance
        self.task_started = False  # Whether task-started has been received
        self.task_finished = False  # Whether task-finished / task-failed has been received

    def on_open(self, ws):
        """
        Callback when the WebSocket connection is established.
        Sends the run-task command to start the speech synthesis task.
        """
        print("WebSocket connected")

        # Construct run-task command
        run_task_cmd = {
            "header": {
                "action": "run-task",
                "task_id": self.task_id,
                "streaming": "duplex"
            },
            "payload": {
                "task_group": "audio",
                "task": "tts",
                "function": "SpeechSynthesizer",
                "model": "cosyvoice-v3-flash",
                "parameters": {
                    "text_type": "PlainText",
                    "voice": "longanyang",
                    "format": "mp3",
                    "sample_rate": 22050,
                    "volume": 50,
                    "rate": 1,
                    "pitch": 1,
                    # If enable_ssml is set to True, only one continue-task command is allowed; otherwise the error "Text request limit violated, expected 1." will be returned
                    "enable_ssml": True
                },
                "input": {}
            }
        }

        # Send run-task command
        ws.send(json.dumps(run_task_cmd))
        print("run-task command sent")

    def on_message(self, ws, message):
        """
        Callback when a message is received.
        Handles text and binary messages separately.
        """
        if isinstance(message, str):
            # Process JSON text messages
            try:
                msg_json = json.loads(message)
                print(f"Received JSON message: {msg_json}")

                if "header" in msg_json:
                    header = msg_json["header"]

                    if "event" in header:
                        event = header["event"]

                        if event == "task-started":
                            print("Task started")
                            self.task_started = True

                            # Send continue-task command. When using SSML, this command can only be sent once
                            # Special characters need to be escaped
                            self.send_continue_task("<speak rate=\"2\">My speaking rate is faster than a normal person's.</speak>")

                            # Send finish-task after continue-task is sent
                            self.send_finish_task()

                        elif event == "task-finished":
                            print("Task completed")
                            self.task_finished = True
                            self.close(ws)

                        elif event == "task-failed":
                            # error_message is carried in the header of the task-failed event
                            error_msg = header.get("error_message", "Unknown error")
                            print(f"Task failed: {error_msg}")
                            self.task_finished = True
                            self.close(ws)

            except json.JSONDecodeError as e:
                print(f"JSON parsing failed: {e}")
        else:
            # Process binary messages (audio data)
            print(f"Received binary message, size: {len(message)} bytes")
            with open(self.output_file, "ab") as f:
                f.write(message)
            print(f"Audio data written to local file {self.output_file}")

    def on_error(self, ws, error):
        """Callback when an error occurs"""
        print(f"WebSocket error: {error}")

    def on_close(self, ws, close_status_code, close_msg):
        """Callback when connection is closed"""
        print(f"WebSocket closed: {close_msg} ({close_status_code})")

    def send_continue_task(self, text):
        """Send continue-task command with the text content to synthesize"""
        cmd = {
            "header": {
                "action": "continue-task",
                "task_id": self.task_id,
                "streaming": "duplex"
            },
            "payload": {
                "input": {
                    "text": text
                }
            }
        }

        self.ws.send(json.dumps(cmd))
        print(f"continue-task command sent, text content: {text}")

    def send_finish_task(self):
        """Send finish-task command to end the speech synthesis task"""
        cmd = {
            "header": {
                "action": "finish-task",
                "task_id": self.task_id,
                "streaming": "duplex"
            },
            "payload": {
                "input": {}
            }
        }

        self.ws.send(json.dumps(cmd))
        print("finish-task command sent")

    def close(self, ws):
        """Actively close the connection"""
        if ws and ws.sock and ws.sock.connected:
            ws.close()
            print("Connection closed actively")

    def run(self):
        """Start the WebSocket client"""
        # Set request headers (authentication)
        header = {
            "Authorization": f"bearer {self.api_key}",
            "X-DashScope-DataInspection": "enable"
        }

        # Create WebSocketApp instance
        self.ws = websocket.WebSocketApp(
            self.uri,
            header=header,
            on_open=self.on_open,
            on_message=self.on_message,
            on_error=self.on_error,
            on_close=self.on_close
        )

        print("Listening for WebSocket messages...")
        self.ws.run_forever()  # Start persistent connection listener

# Example usage
if __name__ == "__main__":
    # The API Keys for the Singapore and Beijing regions are different. Get an API Key: https://www.alibabacloud.com/help/zh/model-studio/get-api-key
    # If you have not configured an environment variable, replace the following line with your Model Studio API Key: API_KEY = "sk-xxx"
    API_KEY = os.environ.get("DASHSCOPE_API_KEY")
    # The following is the Singapore region URL. If you use a model in the Beijing region, replace the URL with: wss://dashscope.aliyuncs.com/api-ws/v1/inference/
    SERVER_URI = "wss://dashscope-intl.aliyuncs.com/api-ws/v1/inference/"

    client = TTSClient(API_KEY, SERVER_URI)
    client.run()

cURL

curl --location 'https://dashscope.aliyuncs.com/api/v1/services/aigc/text-generation/generation' \
--header "Authorization: Bearer $DASHSCOPE_API_KEY" \
--header 'Content-Type: application/json' \
--header 'X-DashScope-DataInspection: enable' \
--data '{
    "model": "cosyvoice-v3-flash",
    "input": {
        "text": "<speak rate=\"2\">My speaking rate is faster than a normal person&apos;s.</speak>"
    },
    "parameters": {
        "voice": "longanyang",
        "format": "mp3"
    }
}'

Tag reference

Note

The Alibaba Cloud SSML implementation is based on the W3C SSML 1.0 specification. Not all standard tags are supported; the service implements the most commonly used tags for production scenarios.

  • When using SSML, all text content must be enclosed within <speak></speak> tags.

  • Multiple <speak> tags can be used in sequence (for example, <speak></speak><speak></speak>), but nesting is not supported (for example, <speak><speak></speak></speak>).

  • If text inside a tag contains XML special characters, escape them as follows:

    • " (double quote) → &quot;

    • ' (single quote/apostrophe) → &apos;

    • & (ampersand) → &amp;

    • < (less-than sign) → &lt;

    • > (greater-than sign) → &gt;
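
The escaping and nesting rules above can be checked on the client before the text is sent. The following is a minimal sketch, not part of the official SDK: the helper names to_speak and check_ssml are hypothetical, and the check only covers the rules listed in this note (well-formed XML, all text inside <speak>, no nested <speak>). The service may apply additional validation of its own.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# escape() handles &, < and > by default; quotes are added explicitly
_QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}


def to_speak(text: str) -> str:
    """Escape XML special characters in plain text and wrap it in <speak>."""
    return f"<speak>{escape(text, _QUOTE_ENTITIES)}</speak>"


def check_ssml(ssml: str) -> None:
    """Raise ValueError if the input is malformed or nests <speak> tags."""
    try:
        # A synthetic wrapper lets the parser accept several <speak> tags in sequence
        root = ET.fromstring(f"<wrapper>{ssml}</wrapper>")
    except ET.ParseError as exc:
        raise ValueError(f"SSML is not well-formed XML: {exc}") from exc
    if (root.text or "").strip():
        raise ValueError("text found outside <speak></speak>")
    for child in root:
        if child.tag != "speak":
            raise ValueError(f"top-level element must be <speak>, got <{child.tag}>")
        if (child.tail or "").strip():
            raise ValueError("text found outside <speak></speak>")
        if child.find(".//speak") is not None:
            raise ValueError("nested <speak> tags are not supported")


ssml = to_speak('Tom & Jerry say "5 < 6"')
check_ssml(ssml)
print(ssml)
```

Escaping applies only to the text content. Tags that you write yourself, such as <speak rate="2">, must stay unescaped, so run to_speak on raw text and not on a string that already contains SSML markup.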

<speak>: root element

  • Description

    <speak> is the root element for all SSML content. All text must be enclosed within <speak></speak> tags.

  • Syntax

     <speak>Text that requires SSML processing</speak>

  • Attributes

    Attribute

    Type

    Required

    Description

    voice

    String

    No

    Specifies the voice.

    This attribute takes precedence over the voice parameter in the API request.

    • Valid values: a specific voice name. For details, see cosyvoice-v2 voices.

    • Example:

      <speak voice="longcheng_v2">
        I am a male voice.
      </speak>

    rate

    String

    No

    Specifies the speech rate. This attribute takes precedence over the speech_rate parameter in the API request (the WebSocket samples in this document pass it as rate).

    • Valid values: a decimal number between 0.5 and 2 (inclusive)

    • Default value: 1

      • Values greater than 1 increase the speech rate

      • Values less than 1 decrease the speech rate

    • Example:

      <speak rate="2">
        My speech rate is faster than normal.
      </speak>

    pitch

    String

    No

    Specifies the pitch. This attribute takes precedence over the pitch_rate parameter in the API request.

    • Valid values: a decimal number between 0.5 and 2 (inclusive)

    • Default value: 1

      • Values greater than 1 raise the pitch

      • Values less than 1 lower the pitch

    • Example:

      <speak pitch="0.5">
        However, my pitch is lower than others.
      </speak>

    volume

    String

    No

    Specifies the volume. This attribute takes precedence over the volume parameter in the API request.

    • Valid values: an integer between 0 and 100 (inclusive)

    • Default value: 50

      • Values greater than 50 increase the volume

      • Values less than 50 decrease the volume

    • Example:

      <speak volume="80">
        My volume is also very high.
      </speak>

    effect

    String

    No

    Specifies the audio effect.

    • Valid values:

      • robot: robot voice effect

      • lolita: lolita voice effect

      • lowpass: low-pass filter effect

      • echo: echo effect

      • eq: equalizer (advanced)

      • lpfilter: low-pass filter (advanced)

      • hpfilter: high-pass filter (advanced)

      Note
      • eq, lpfilter, and hpfilter are advanced effect types. Use the effectValue parameter to customize specific effects.

      • Each SSML tag supports only one effect. Setting multiple effect attributes simultaneously isn't allowed.

      • Enabling audio effects increases synthesis latency.

    • Example:

      <speak effect="robot">
        Do you like the robot WALL-E?
      </speak>

    effectValue

    String

    No

    Configures the specific behavior of the audio effect (the effect parameter). Applies to three advanced effect types: eq, lpfilter, and hpfilter.

    • Valid values:

      • eq (equalizer): the system supports 8 frequency bands by default, corresponding to the following frequencies:

        ["40 Hz","100 Hz", "200 Hz", "400 Hz", "800 Hz", "1600 Hz", "4000 Hz", "12000 Hz"].

        Each band has a bandwidth of 1.0q.

        Use the effectValue parameter to specify the gain for each band. The parameter is a string of 8 integers ranging from -20 to 20, separated by spaces. A value of 0 means no gain adjustment for that frequency.

        Example: effectValue="1 1 1 1 1 1 1 1"

      • lpfilter (low-pass filter): specifies the cutoff frequency. Valid values: an integer in the range (0, target_sample_rate/2]. Example: effectValue="800".

      • hpfilter (high-pass filter): specifies the cutoff frequency. Valid values: an integer in the range (0, target_sample_rate/2]. Example: effectValue="1200".

    • Example:

      <speak effect="eq" effectValue="1 -20 1 1 1 1 20 1">
        Do you like the robot WALL-E?
      </speak>
      
      <speak effect="lpfilter" effectValue="1200">
        Do you like the robot WALL-E?
      </speak>
      
      <speak effect="hpfilter" effectValue="1200">
        Do you like the robot WALL-E?
      </speak>

    bgm

    String

    No

    Adds background music to the synthesized speech. The audio file must be stored on Alibaba Cloud OSS (see Upload objects), and the bucket must have at least public-read access.

    If the background music URL contains XML special characters (such as &, <, >), escape them.

    • Audio requirements:

      There's no upper limit on the background music file size, but larger files take longer to download. If the synthesized speech is longer than the background music, the music loops automatically.

      • Sample rate: 16 kHz

      • Channels: mono

      • Format: WAV

        To convert a non-WAV file, use ffmpeg:

        ffmpeg -i input_audio -acodec pcm_s16le -ac 1 -ar 16000 output.wav
      • Bit depth: 16-bit

    • Example:

      <speak bgm="http://nls.alicdn.com/bgm/2.wav" backgroundMusicVolume="30" rate="0.5" volume="40">
        <break time="2s"/>
        The old trees on the shady cliff are shrouded in mist
        <break time="700ms"/>
        The sound of rain is still in the bamboo forest
        <break time="700ms"/>
        I know that cotton contributes to the country's plan
        <break time="700ms"/>
        The scenery of Mianzhou is always pitiable
        <break time="2s"/>
      </speak>
    Important

    You're responsible for the copyright of any uploaded audio.

    backgroundMusicVolume

    String

    No

    Specifies the background music volume. Use this attribute together with the bgm attribute.

  • Tag relationships

    The <speak> tag can contain text and the child tags described in this reference: <break>, <sub>, <phoneme>, <soundEvent>, and <say-as>.

  • More examples

    • No attributes

      <speak>
        Text that requires SSML tags
      </speak>
    • Combined attributes (space-separated)

      <speak rate="1.5" pitch="0.8" volume="80">
        So when put together, my voice sounds like this.
      </speak>
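The documented ranges for rate, pitch, and volume can be checked before a request is sent. The following is a minimal sketch under that assumption; `attr_problems` and `speak` are hypothetical helper names, and the checks cover only the three numeric attributes.

```python
from xml.sax.saxutils import escape, quoteattr


def attr_problems(rate=None, pitch=None, volume=None):
    """Return a list of range violations for the <speak> attributes."""
    problems = []
    for name, value in (("rate", rate), ("pitch", pitch)):
        if value is not None and not 0.5 <= value <= 2:
            problems.append(f"{name} must be between 0.5 and 2, got {value}")
    if volume is not None and not (isinstance(volume, int) and 0 <= volume <= 100):
        problems.append(f"volume must be an integer from 0 to 100, got {volume}")
    return problems


def speak(text, **attrs):
    """Build a <speak> element after validating rate, pitch, and volume."""
    problems = attr_problems(**attrs)
    if problems:
        raise ValueError("; ".join(problems))
    rendered = "".join(
        f" {name}={quoteattr(str(value))}"
        for name, value in attrs.items()
        if value is not None
    )
    return f"<speak{rendered}>{escape(text)}</speak>"


print(speak("So my voice sounds like this.", rate=1.5, pitch=0.8, volume=80))
```

Validating locally gives a clear error message instead of a failed synthesis call when a value falls outside the range.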

<break>: control pause duration

  • Description

    Inserts a silent pause during speech synthesis to simulate natural pauses in conversation. Supports seconds (s) and milliseconds (ms) as time units.

  • Syntax

    # No attributes
    <break/>
    # With time attribute
    <break time="string"/>
  • Attributes

    Note

    A <break> tag without attributes pauses for 1 second by default.

    Attribute

    Type

    Required

    Description

    time

    String

    No

    Specifies the pause duration, in seconds or milliseconds (for example, "2s" or "50ms").

    • Valid values:

      • In seconds (s): an integer between 1 and 10 (inclusive)

      • In milliseconds (ms): an integer between 50 and 10000 (inclusive)

    • Example:

      <speak>
        Please close your eyes and take a rest.<break time="500ms"/>Okay, please open your eyes.
      </speak>
    Important

    When multiple <break> tags are used consecutively, the total pause duration is the sum of all individual durations. If the total exceeds 10 seconds, only the first 10 seconds are applied.

    For example, in the following SSML, the cumulative <break> duration is 15 seconds. Because this exceeds the 10-second limit, the actual pause is truncated to 10 seconds:

    <speak>
      Please close your eyes and take a rest.<break time="5s"/><break time="5s"/><break time="5s"/>Okay, please open your eyes.
    </speak>
  • Tag relationships

    <break> is a self-closing element and cannot contain child elements.
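The time ranges and the 10-second cap on consecutive pauses can be modeled in a few lines. This is a sketch of the rules as stated above, not of the service's internal behavior; the function names are illustrative.

```python
import re

_TIME = re.compile(r"(\d+)(ms|s)")
MAX_TOTAL_MS = 10_000  # consecutive pauses are capped at 10 seconds


def parse_break_time(value):
    """Return the pause in milliseconds, or raise ValueError if it is invalid."""
    match = _TIME.fullmatch(value)
    if not match:
        raise ValueError(f"expected a value such as '2s' or '500ms', got {value!r}")
    amount, unit = int(match.group(1)), match.group(2)
    if unit == "s" and 1 <= amount <= 10:
        return amount * 1000
    if unit == "ms" and 50 <= amount <= 10000:
        return amount
    raise ValueError(f"{value!r} is outside the documented range")


def effective_pause_ms(times):
    """Total pause for consecutive <break> tags after the 10-second cap."""
    return min(sum(parse_break_time(t) for t in times), MAX_TOTAL_MS)


print(effective_pause_ms(["5s", "5s", "5s"]))  # 10000
```

For three consecutive 5-second breaks the sum is 15,000 ms, so the function returns the capped value of 10,000 ms.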

<sub>: substitute text

  • Description

    Replaces specified text with content that's more suitable for speech. For example, reads "W3C" as "World Wide Web Consortium."

  • Syntax

    <sub alias="string"></sub>
  • Attributes

    Attribute

    Type

    Required

    Description

    alias

    String

    Yes

    Specifies the replacement text to be read aloud.

    Example:

     <speak>
       <sub alias="World Wide Web Consortium">W3C</sub>
     </speak>
  • Tag relationships

    The <sub> tag can contain only plain text.
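When substitutions are generated from data, the alias and the displayed text both need escaping. A minimal sketch follows; the helper name `sub` is hypothetical, and `quoteattr` from the standard library adds the surrounding quotes and escapes the attribute value.

```python
from xml.sax.saxutils import escape, quoteattr


def sub(text, alias):
    """Return a <sub> element that reads `alias` in place of `text`."""
    return f"<sub alias={quoteattr(alias)}>{escape(text)}</sub>"


ssml = f"<speak>{sub('W3C', 'World Wide Web Consortium')} publishes web standards.</speak>"
print(ssml)
```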

<phoneme>: specify pronunciation (pinyin/phonetic)

  • Description

    Provides precise control over how text is pronounced. Chinese text supports pinyin notation, and English text supports CMU phonetic notation. This is useful for disambiguating polyphonic characters and handling foreign language pronunciation.

  • Syntax

    <phoneme alphabet="string" ph="string">Text</phoneme>
  • Attributes

    Attribute

    Type

    Required

    Description

    alphabet

    String

    Yes

    Specifies the pronunciation type: pinyin (for Chinese) or phonetic symbols (for English).

    Valid values:

    • py: pinyin (for Chinese)

    • cmu: phonetic symbols (for English)

    ph

    String

    Yes

    Specifies the exact pinyin or phonetic notation. Usage rules:

    • Separate pinyin for multiple characters with spaces. The number of pinyin entries must match the number of characters.

    • Each pinyin entry consists of the pronunciation and a tone number. Tone numbers range from 1 to 5, where 5 represents the neutral tone.

    • Example:

      <speak>
        How to spell <phoneme alphabet="cmu" ph="S AY N">sin</phoneme>?
      </speak>
  • Tag relationships

    The <phoneme> tag can contain only plain text.
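The two pinyin rules (one entry per character, tone number from 1 to 5) are easy to check before building the tag. The sketch below rests on two assumptions: `py` is the alphabet value for pinyin, and a pinyin entry consists of lowercase letters followed by a single tone digit. Both helper names are hypothetical.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Simplified pattern (an assumption): lowercase letters plus a tone number, 5 = neutral
_ENTRY = re.compile(r"[a-zü]+[1-5]")


def pinyin_problems(text, pinyin):
    """Check that there is one entry per character and that each entry is well formed."""
    entries = pinyin.split()
    problems = []
    if len(entries) != len(text):
        problems.append(f"{len(text)} characters but {len(entries)} pinyin entries")
    problems += [f"invalid entry {e!r}" for e in entries if not _ENTRY.fullmatch(e)]
    return problems


def pinyin_phoneme(text, pinyin, alphabet="py"):  # "py" is an assumed value
    """Build a <phoneme> element, raising ValueError if a rule is violated."""
    problems = pinyin_problems(text, pinyin)
    if problems:
        raise ValueError("; ".join(problems))
    ph = " ".join(pinyin.split())  # normalize to single spaces
    return f"<phoneme alphabet={quoteattr(alphabet)} ph={quoteattr(ph)}>{escape(text)}</phoneme>"


print(pinyin_phoneme("银行", "yin2 hang2"))
```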

<soundEvent>: insert an external sound (ringtone, cat meow, etc.)

  • Description

    Inserts a sound effect file (such as an alert tone or ambient sound) at a specific point in the speech to enrich the audio output.

  • Syntax

     <soundEvent src="URL"/>
  • Attributes

    Attribute

    Type

    Required

    Description

    src

    String

    Yes

    Specifies the URL of an external audio file.

    The audio file must be stored on Alibaba Cloud OSS (see Upload objects), and the bucket must have at least public-read access. If the URL contains XML special characters (such as &, <, >), escape them.

    • Audio requirements:

      • Sample rate: 16 kHz

      • Channels: mono

      • Format: WAV

        To convert a non-WAV file, use ffmpeg:

        ffmpeg -i input_audio -acodec pcm_s16le -ac 1 -ar 16000 output.wav
      • File size: 2 MB maximum

      • Bit depth: 16-bit

    • Example:

      <speak>
        A horse was frightened<soundEvent src="http://nls.alicdn.com/sound-event/horse-neigh.wav"/>and people scattered to avoid it.
      </speak>
    Important

    You are legally responsible for the copyright of the uploaded audio.

  • Tag relationships

    <soundEvent> is a self-closing element and cannot contain child elements.
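Before uploading a sound effect, the audio requirements can be verified locally with Python's standard `wave` module. This is a sketch only: the function name `wav_problems` is hypothetical, and reading "2 MB" as 2 × 1024 × 1024 bytes is an assumption.

```python
import os
import wave

MAX_BYTES = 2 * 1024 * 1024  # assumed interpretation of the 2 MB limit


def wav_problems(path, max_bytes=MAX_BYTES):
    """Return the ways a file violates the audio requirements (empty list if none)."""
    problems = []
    if os.path.getsize(path) > max_bytes:
        problems.append("file is larger than the size limit")
    with wave.open(path, "rb") as wav:  # raises wave.Error for non-PCM or non-WAV input
        if wav.getframerate() != 16000:
            problems.append(f"sample rate is {wav.getframerate()} Hz, expected 16000")
        if wav.getnchannels() != 1:
            problems.append(f"{wav.getnchannels()} channels, expected mono")
        if wav.getsampwidth() != 2:
            problems.append(f"{8 * wav.getsampwidth()}-bit samples, expected 16-bit")
    return problems
```

Calling `wav_problems("output.wav")` on the file produced by the ffmpeg command above should return an empty list if the conversion succeeded and the file is small enough.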

<say-as>: set text interpretation (numbers, dates, phone numbers, etc.)

  • Description

    Specifies the content type of text (such as numbers, dates, or phone numbers) so the system reads it according to the appropriate rules for that type.

  • Syntax

     <say-as interpret-as="string">Text</say-as>
  • Attributes

    Attribute

    Type

    Required

    Description

    interpret-as

    String

    Yes

    Specifies the content type of the text within the tag.

    Valid values:

    • cardinal: reads as a standard integer or decimal number

    • digits: reads each digit individually (for example, 123 is read as "one two three")

    • telephone: reads digit by digit in the standard phone number format

    • name: reads using standard name pronunciation rules

    • address: reads using standard address pronunciation rules

    • id: reads using standard identifier (account name, nickname) pronunciation rules

    • characters: reads each character in the text individually

    • punctuation: reads the name of each punctuation mark

    • date: reads using standard date pronunciation rules

    • time: reads using standard time pronunciation rules

    • currency: reads using standard monetary amount pronunciation rules

    • measure: reads using standard unit of measurement pronunciation rules

  • Supported ranges for each <say-as> type

    • cardinal

      Format

      Example

      English output

      Notes

      Digit string

      145

      one hundred forty five

      Integer range: positive and negative integers of up to 12 digits, [-999999999999, 999999999999].

      Decimal range: no specific limit on decimal places, but 10 or fewer is recommended.

      Digit string starting with zero

      0145

      one hundred forty five

      Minus sign + digit string

      -145

      minus one hundred forty five

      Digit string with comma separating every 3 digits

      60,000

      sixty thousand

      Minus sign + comma-separated digit string

      -208,000

      minus two hundred eight thousand

      Digit string + decimal point + zero

      12.00

      twelve

      Digit string + decimal point + digit string

      12.34

      twelve point three four

      Comma-separated digit string + decimal point + digit string

      1,000.1

      one thousand point one

      Minus sign + digit string + decimal point + digit string

      -12.34

      minus twelve point three four

      Minus sign + comma-separated digit string + decimal point + digit string

      -1,000.1

      minus one thousand point one

      (Comma-separated) digit string + hyphen + (comma-separated) digit string

      1-1,000

      one to one thousand

      Other default readings

      012.34

      twelve point three four

      None

      1/2

      one half

      -3/4

      minus three quarters

      5.1/6

      five point one over six

      -3 1/2

      minus three and a half

      1,000.3^3

      one thousand point three to the power of three

      3e9.1

      three times ten to the power of nine point one

      23.10%

      twenty three point one percent

    • digits

      Format

      Example

      English output

      Notes

      Digit string

      12034

      one two zero three four

      No specific limit on digit string length, but 20 or fewer digits is recommended.

      When digit strings are grouped by spaces or hyphens, a comma pause is inserted between groups. Up to 5 groups are supported.

      Digit string + space/hyphen + digit string + space/hyphen + digit string + space/hyphen + digit string

      1-23-456 7890

      one, two three, four five six, seven eight nine zero

    • telephone

      Format

      Example

      English output

      Notes

      Digit string

      12034

      one two oh three four

      No specific limit on digit string length, but 20 or fewer digits is recommended.

      When digit strings are grouped by spaces or hyphens, a comma pause is inserted between groups. Up to 5 groups are supported.

      Digit string + space/hyphen + digit string + space/hyphen + digit string

      1-23-456 7890

      one, two three, four five six, seven eight nine oh

      Plus sign + digit string + space/hyphen + digit string

      +43-211-0567

      plus four three, two one one, oh five six seven

      Left paren + digit string + right paren + space + digit string + space/hyphen + digit string

      (21) 654-3210

      (two one) six five four, three two one oh

    • address

      This tag isn't supported for English text.

    • id

      For English text, this tag functions the same as the characters tag.

    • characters

      Format

      Example

      English output

      Notes

      String

      *b+3$.c-0'=α

      asterisk B plus three dollar dot C dash zero apostrophe equals alpha

      Supports Chinese characters, uppercase and lowercase letters, digits 0-9, and some full-width and half-width characters.

      Spaces in the output indicate pauses between characters, meaning each character is read individually.

      If the text inside the tag contains XML special characters, escape them.

    • punctuation

      For English text, this tag functions the same as the characters tag.

    • date

      Format

      Example

      English output

      Notes

      Four digits/two digits or four digits-two digits

      2000/01

      two thousand, oh one

      Spans across years.

      1900-01

      nineteen hundred, oh one

      2001-02

      twenty oh one, oh two

      2019-20

      twenty nineteen, twenty

      1998-99

      nineteen ninety eight, ninety nine

      1999-00

      nineteen ninety nine, oh oh

      Four-digit number starting with 1 or 2

      2000

      two thousand

      4-digit year.

      1900

      nineteen hundred

      1905

      nineteen oh five

      2021

      twenty twenty one

      Day of week-day of week

      or

      Day of week~day of week

      or

      Day of week&day of week

      mon-wed

      monday to wednesday

      If the text inside the tag contains XML special characters (such as &), escape them.

      tue~fri

      tuesday to friday

      sat&sun

      saturday and sunday

      DD-DD MMM, YYYY

      or

      DD~DD MMM, YYYY

      or

      DD&DD MMM, YYYY

      19-20 Jan, 2000

      the nineteenth to the twentieth of january two thousand

      DD: 2-digit day. MMM: 3-letter month abbreviation or full word. YYYY: 4-digit year starting with 1 or 2.

      01 ~ 10 Jul, 2020

      the first to the tenth of july twenty twenty

      05&06 Apr, 2009

      the fifth and the sixth of april two thousand nine

      MMM DD-DD

      or

      MMM DD~DD

      or

      MMM DD&DD

      Feb 01 - 03

      february the first to the third

      MMM: 3-letter month abbreviation or full word. DD: 2-digit day.

      Aug 10~20

      august the tenth to the twentieth

      Dec 11&12

      december the eleventh and the twelfth

      MMM-MMM

      or

      MMM~MMM

      or

      MMM&MMM

      Jan-Jun

      january to june

      MMM: 3-letter month abbreviation or full word.

      jul ~ dec

      july to december

      sep&oct

      september and october

      YYYY-YYYY

      or

      YYYY~YYYY

      1990 - 2000

      nineteen ninety to two thousand

      YYYY: 4-digit year starting with 1 or 2.

      2001~2021

      two thousand one to twenty twenty one

      WWW DD MMM YYYY

      Sun 20 Nov 2011

      sunday the twentieth of november twenty eleven

      WWW is the three-letter abbreviation or full name for a day of the week. DD is a two-digit day. MMM is the three-letter abbreviation or full name for a month. MM is a two-digit month (or the three-letter abbreviation or full name for a month). YYYY is a four-digit year starting with 1 or 2.

      WWW DD MMM

      Sun 20 Nov

      sunday the twentieth of november

      WWW MMM DD YYYY

      Sun Nov 20 2011

      sunday november the twentieth twenty eleven

      WWW MMM DD

      Sun Nov 20

      sunday november the twentieth

      WWW YYYY-MM-DD

      Sat 2010-10-01

      saturday october the first twenty ten

      WWW YYYY/MM/DD

      Sat 2010/10/01

      saturday october the first twenty ten

      WWW MM/DD/YYYY

      Sun 11/20/2011

      sunday november the twentieth twenty eleven

      MM/DD/YYYY

      11/20/2011

      november the twentieth twenty eleven

      YYYY

      1998

      nineteen ninety eight

      Other default readings

      10 Mar, 2001

      the tenth of march two thousand one

      None

      10 Mar

      the tenth of march

      Mar 2001

      march two thousand one

      Fri. 10/Mar/2001

      friday the tenth of march two thousand one

      Mar 10th, 2001

      march the tenth two thousand one

      Mar 10

      march the tenth

      2001/03/10

      march the tenth two thousand one

      2001-03-10

      march the tenth two thousand one

      2000s

      two thousands

      2010's

      twenty tens

      1900's

      nineteen hundreds

      1990s

      nineteen nineties

    • time

      Format

      Example

      English output

      Notes

      HH:MM AM or PM

      09:00 AM

      nine A M

      HH: 1 or 2-digit hour. MM: 2-digit minute. AM/PM: morning/afternoon.

      09:03 PM

      nine oh three P M

      09:13 p.m.

      nine thirteen p m

      HH:MM

      21:00

      twenty one hundred

      HHMM

      100

      one oclock

      Time point-Time point

      8:00 am - 05:30 pm

      eight a m to five thirty p m

      Supports common time formats and ranges.

      7:05~10:15 AM

      seven oh five to ten fifteen A M

      09:00-13:00

      nine oclock to thirteen hundred

    • currency

      Format

      Example

      English output

      Notes

      Number + currency identifier

      1.00 RMB

      one yuan

      Supported number formats: integers, decimals, and comma-separated international notation.

      Supported currency identifiers:

      CN¥ (yuan)

      CNY (yuan)

      RMB (yuan)

      AUD (australian dollar)

      CAD (canadian dollar)

      CHF (swiss franc)

      DKK (danish krone)

      EUR (euro)

      GBP (british pound)

      HKD (Hong Kong (China) dollar)

      JPY (japanese yen)

      NOK (norwegian krone)

      SEK (swedish krona)

      SGD (singapore dollar)

      USD (united states dollar)

      2.02 CNY

      two point zero two yuan

      1,000.23 CN¥

      one thousand point two three yuan

      1.01 SGD

      one singapore dollar and one cent

      2.01 CAD

      two canadian dollars and one cent

      3.1 HKD

      three hong kong dollars and ten cents

      1,000.00 EUR

      one thousand euros

      Currency identifier + number

      US$ 1.00

      one US dollar

      Supported number formats: integers, decimals, and comma-separated international notation.

      Supported currency identifiers:

      US$ (US dollar)

      CA$ (Canadian dollar)

      AU$ (Australian dollar)

      SG$ (Singapore dollar)

      HK$ (Hong Kong dollar)

      C$ (Canadian dollar)

      A$ (Australian dollar)

      $ (dollar)

      £ (pound)

      € (euro)

      CN¥ (yuan)

      CNY (yuan)

      RMB (yuan)

      AUD (australian dollar)

      CAD (canadian dollar)

      CHF (swiss franc)

      DKK (danish krone)

      EUR (euro)

      GBP (british pound)

      HKD (Hong Kong (China) dollar)

      JPY (japanese yen)

      NOK (norwegian krone)

      SEK (swedish krona)

      SGD (singapore dollar)

      USD (united states dollar)

      $0.01

      one cent

      JPY 1.01

      one japanese yen and one sen

      £1.1

      one pound and ten pence

      €2.01

      two euros and one cent

      USD 1,000

      one thousand united states dollars

      Number + quantifier + currency identifier

      or

      Currency identifier + number + quantifier

      1.23 Tn RMB

      one point two three trillion yuan

      Supported quantifier formats:

      thousand

      million

      billion

      trillion

      Mil (million)

      mil (million)

      Bil (billion)

      bil (billion)

      MM (million)

      Bn (billion)

      bn (billion)

      Tn (trillion)

      tn (trillion)

      K (thousand)

      k (thousand)

      M (million)

      m (million)

      $1.2 K

      one point two thousand dollars

    • measure

      Format

      Example

      English output

      Notes

      Number + unit of measurement

      1.0 kg

      one kilogram

      Supported number formats: integers, decimals, and comma-separated international notation.

      Supports common unit abbreviations.

      1,234.01 km

      one thousand two hundred thirty four point zero one kilometres.

      Unit of measurement

      mm2

      square millimetre

    • The following table shows how common symbols are read with <say-as>.

      Symbol

      English pronunciation

      !

      exclamation mark

      "

      double quote

      #

      pound

      $

      dollar

      %

      percent

      &

      and

      '

      left quote

      (

      left parenthesis

      )

      right parenthesis

      *

      asterisk

      +

      plus

      ,

      comma

      -

      dash

      .

      dot

      /

      slash

      :

      colon

      ;

      semicolon

      <

      less than

      =

      equals

      >

      greater than

      ?

      question mark

      @

      at

      [

      left bracket

      \

      back slash

      ]

      right bracket

      ^

      caret

      _

      underscore

      `

      back quote

      {

      left brace

      |

      vertical bar

      }

      right brace

      ~

      tilde

      exclamation mark

      left double quote

      right double quote

      left quote

      right quote

      left parenthesis

      right parenthesis

      comma

      full stop

      em dash

      colon

      semicolon

      question mark

      enumeration comma

      ellipsis

      ……

      ellipsis

      left guillemet

      right guillemet

      yuan

      ≥

      greater than or equal to

      ≤

      less than or equal to

      ≠

      not equal

      ≈

      approximately equal

      ±

      plus or minus

      ×

      times

      π

      pi

      Α

      alpha

      Β

      beta

      Γ

      gamma

      Δ

      delta

      Ε

      epsilon

      Ζ

      zeta

      Θ

      theta

      Ι

      iota

      Κ

      kappa

      Λ

      lambda

      Μ

      mu

      Ν

      nu

      Ξ

      ksi

      Ο

      omicron

      Π

      pi

      Ρ

      rho

      Σ

      sigma

      Τ

      tau

      Υ

      upsilon

      Φ

      phi

      Χ

      chi

      Ψ

      psi

      Ω

      omega

      α

      alpha

      β

      beta

      γ

      gamma

      δ

      delta

      ε

      epsilon

      ζ

      zeta

      η

      eta

      θ

      theta

      ι

      iota

      κ

      kappa

      λ

      lambda

      μ

      mu

      ν

      nu

      ξ

      ksi

      ο

      omicron

      π

      pi

      ρ

      rho

      σ

      sigma

      τ

      tau

      υ

      upsilon

      φ

      phi

      χ

      chi

      ψ

      psi

      ω

      omega

    • The following table shows how common units of measurement are read with <say-as>.

      Format

      Category

      English example

      Abbreviation

      Length

      nm (nanometre), μm (micrometre), mm (millimetre), cm (centimetre), m (metre), km (kilometre), ft (foot), in (inch)

      Area

      cm² (square centimetre), ㎡ (square metre), km2 (square kilometre), SqFt (square foot)

      Volume

      cm³ (cubic centimetre), m³ (cubic metre), km3 (cubic kilometre), mL (millilitre), L (litre), gal (gallon)

      Weight

      μg (microgram), mg (milligram), g (gram), kg (kilogram)

      Time

      min (minute), sec (second), ms (millisecond)

      Electromagnetism

      μA (microamp), mA (milliamp), Hz (hertz), kHz (kilohertz), MHz (megahertz), GHz (gigahertz), V (volt), kV (kilovolt), kWh (kilowatt hour)

      Sound

      dB (decibel)

      Atmospheric pressure

      Pa (pascal), kPa (kilopascal), MPa (megapascal)

      Other common units

      Supports units beyond those listed above, such as tsp (teaspoon), rpm (revolutions per minute), KB (kilobyte), and mmHg (millimetre of mercury).

  • Tag relationships

    The <say-as> tag can contain text and <vhml/>.

  • Examples

    • cardinal

      <speak>
        <say-as interpret-as="cardinal">12345</say-as>
      </speak>
      <speak>
        <say-as interpret-as="cardinal">10234</say-as>
      </speak>
    • digits

      <speak>
        <say-as interpret-as="digits">12345</say-as>
      </speak>
      <speak>
        <say-as interpret-as="digits">10234</say-as>
      </speak>
    • telephone

      <speak>
        <say-as interpret-as="telephone">12345</say-as>
      </speak>
      <speak>
        <say-as interpret-as="telephone">10234</say-as>
      </speak>
    • name

      <speak>
        Her former name is <say-as interpret-as="name">Zeng Xiaofan</say-as>
      </speak>
    • address

      <speak>
        <say-as interpret-as="address">Fulu International, Building 1, Unit 3, Room 304</say-as>
      </speak>
    • id

      <speak>
        <say-as interpret-as="id">myid_1998</say-as>
      </speak>
    • characters

      <speak>
        <say-as interpret-as="characters">Greek letters αβ</say-as>
      </speak>
      <speak>
        <say-as interpret-as="characters">*b+3.c$=α</say-as>
      </speak>
    • punctuation

      <speak>
        <say-as interpret-as="punctuation"> -./:;</say-as>
      </speak>
    • date

      <speak>
        <say-as interpret-as="date">1000-10-10</say-as>
      </speak>
      <speak>
        <say-as interpret-as="date">10-01-2020</say-as>
      </speak>
    • time

      <speak>
        <say-as interpret-as="time">5:00am</say-as>
      </speak>
      <speak>
        <say-as interpret-as="time">0500</say-as>
      </speak>
    • currency

      <speak>
        <say-as interpret-as="currency">13,000,000.00RMB</say-as>
      </speak>
      <speak>
        <say-as interpret-as="currency">$1,000.01</say-as>
      </speak>
    • measure

      <speak>
        <say-as interpret-as="measure">100m12cm6mm</say-as>
      </speak>
      <speak>
        <say-as interpret-as="measure">1,000.01kg</say-as>
      </speak>
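When <say-as> tags are generated from structured data, a small helper can reject unsupported interpret-as values early. The sketch below takes the set of values from the list in this section; the helper name `say_as` is hypothetical.

```python
from xml.sax.saxutils import escape

# interpret-as values listed in the attribute table above
INTERPRET_AS = {
    "cardinal", "digits", "telephone", "name", "address", "id",
    "characters", "punctuation", "date", "time", "currency", "measure",
}


def say_as(text, interpret_as):
    """Wrap text in <say-as>, rejecting interpret-as values that are not listed."""
    if interpret_as not in INTERPRET_AS:
        raise ValueError(f"unsupported interpret-as value: {interpret_as!r}")
    return f'<say-as interpret-as="{interpret_as}">{escape(text)}</say-as>'


ssml = (
    "<speak>Call "
    + say_as("+43-211-0567", "telephone")
    + " before "
    + say_as("09:00 AM", "time")
    + ".</speak>"
)
print(ssml)
```

The helper escapes the text content, so an input such as sat&sun is emitted as sat&amp;sun inside the tag.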