Speech Synthesis Markup Language (SSML) is an XML-based markup language for speech synthesis. It enables large speech synthesis models to process rich text content and gives you fine-grained control over speech features, such as speech rate, pitch, pauses, and volume. You can also add background music to create more expressive speech effects. This topic describes the SSML features of CosyVoice and how to use them.
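For example, the markup used throughout the examples in this topic doubles the speech rate by wrapping the text in a speak element:

```xml
<speak rate="2">My speaking rate is faster than a normal person's.</speak>
```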
Limitations
Models: SSML only supports the cosyvoice-v3-flash, cosyvoice-v3-plus, and cosyvoice-v2 models.
Voices: SSML only supports cloned voices and system voices that are marked in the Voice list as supporting SSML.
APIs: SSML is supported by only the following APIs:
Java SDK (version 2.20.3 or later): SSML is supported only for non-streaming and unidirectional streaming calls. For more information, see SSML support - Java SDK.
Python SDK (version 1.23.4 or later): SSML is supported only for non-streaming and unidirectional streaming calls. For more information, see SSML support - Python SDK.
WebSocket API: When sending the run-task instruction, set the enable_ssml parameter to true, and send the continue-task instruction only once. For more information, see SSML support - WebSocket API.
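For the WebSocket API, this switch lives in the parameters object of the run-task instruction. The following is a minimal fragment, trimmed to the SSML-relevant fields from the full examples later in this topic (task_id is a placeholder):

```json
{
  "header": { "action": "run-task", "task_id": "your-task-id", "streaming": "duplex" },
  "payload": {
    "task_group": "audio",
    "task": "tts",
    "function": "SpeechSynthesizer",
    "model": "cosyvoice-v3-flash",
    "parameters": {
      "text_type": "PlainText",
      "voice": "longanyang",
      "enable_ssml": true
    },
    "input": {}
  }
}
```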
Getting started
Before you run the code, complete the following steps:
Install the SDK (if you plan to run the Java/Python SDK examples)
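Every example below passes the SSML as a single string and notes that special characters must be escaped. A minimal sketch of that escaping step in Python, using only the standard library (build_ssml is a hypothetical helper for illustration, not part of the SDK):

```python
from xml.sax.saxutils import escape

def build_ssml(text: str, rate: str = "1") -> str:
    # Escape the XML special characters (& < >) in the text content
    # so the SSML document stays well-formed.
    return f'<speak rate="{rate}">{escape(text)}</speak>'

print(build_ssml("Tom & Jerry say: 1 < 2"))
# → <speak rate="1">Tom &amp; Jerry say: 1 &lt; 2</speak>
```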
Java SDK
Non-streaming call
import com.alibaba.dashscope.audio.ttsv2.SpeechSynthesisParam;
import com.alibaba.dashscope.audio.ttsv2.SpeechSynthesizer;
import com.alibaba.dashscope.utils.Constants;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
/**
* SSML feature notes:
* 1. SSML is supported only for non-streaming and unidirectional streaming calls.
* 2. SSML is supported only for cloned voices from cosyvoice-v3-flash, cosyvoice-v3-plus, and cosyvoice-v2 models,
* and system voices marked as SSML-supported in the voice list (for example, the longanyang voice for cosyvoice-v3-flash).
*/
public class Main {
private static String model = "cosyvoice-v3-flash";
private static String voice = "longanyang";
public static void main(String[] args) {
// If you use a model from Beijing region, replace the URL with: wss://dashscope.aliyuncs.com/api-ws/v1/inference
Constants.baseWebsocketApiUrl = "wss://dashscope-intl.aliyuncs.com/api-ws/v1/inference";
streamAudioDataToSpeaker();
System.exit(0);
}
public static void streamAudioDataToSpeaker() {
SpeechSynthesisParam param =
SpeechSynthesisParam.builder()
// The API keys for Singapore and Beijing regions are different. To obtain an API key, see https://www.alibabacloud.com/help/zh/model-studio/get-api-key
// If you have not configured an environment variable, replace the following line with: .apiKey("sk-xxx")
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.model(model)
.voice(voice)
.build();
SpeechSynthesizer synthesizer = new SpeechSynthesizer(param, null);
ByteBuffer audio = null;
try {
// Non-streaming call; blocks until audio is returned
// Escape special characters
audio = synthesizer.call("<speak rate=\"2\">My speaking rate is faster than a normal person's.</speak>");
} catch (Exception e) {
throw new RuntimeException(e);
} finally {
// Close the WebSocket connection after the task ends
synthesizer.getDuplexApi().close(1000, "bye");
}
if (audio != null) {
// Save the audio data to a local file named "output.mp3"
File file = new File("output.mp3");
try (FileOutputStream fos = new FileOutputStream(file)) {
fos.write(audio.array());
} catch (IOException e) {
throw new RuntimeException(e);
}
}
// The first packet latency includes the time required to establish the WebSocket connection
System.out.println(
"[Metric] Request ID: "
+ synthesizer.getLastRequestId()
+ ", First packet latency (ms): "
+ synthesizer.getFirstPackageDelay());
}
}
Unidirectional streaming call
import com.alibaba.dashscope.audio.tts.SpeechSynthesisResult;
import com.alibaba.dashscope.audio.ttsv2.SpeechSynthesisAudioFormat;
import com.alibaba.dashscope.audio.ttsv2.SpeechSynthesisParam;
import com.alibaba.dashscope.audio.ttsv2.SpeechSynthesizer;
import com.alibaba.dashscope.common.ResultCallback;
import com.alibaba.dashscope.utils.Constants;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.concurrent.CountDownLatch;
/**
* SSML feature notes:
* 1. SSML is supported only for non-streaming and unidirectional streaming calls.
* 2. SSML is supported only for cloned voices from cosyvoice-v3-flash, cosyvoice-v3-plus, and cosyvoice-v2 models,
* and system voices marked as SSML-supported in the voice list (for example, the longanyang voice for cosyvoice-v3-flash).
*/
public class Main {
private static String model = "cosyvoice-v3-flash";
private static String voice = "longanyang";
public static void main(String[] args) {
// If you use a model from Beijing region, replace the URL with: wss://dashscope.aliyuncs.com/api-ws/v1/inference
Constants.baseWebsocketApiUrl = "wss://dashscope-intl.aliyuncs.com/api-ws/v1/inference";
streamAudioDataToSpeaker();
System.out.println("Audio saved to output.mp3");
System.exit(0);
}
public static void streamAudioDataToSpeaker() {
CountDownLatch latch = new CountDownLatch(1);
final FileOutputStream[] fileOutputStream = new FileOutputStream[1];
try {
fileOutputStream[0] = new FileOutputStream("output.mp3");
} catch (IOException e) {
System.err.println("Failed to create output file: " + e.getMessage());
return;
}
// Implement the ResultCallback interface
ResultCallback<SpeechSynthesisResult> callback = new ResultCallback<SpeechSynthesisResult>() {
@Override
public void onEvent(SpeechSynthesisResult result) {
if (result.getAudioFrame() != null) {
// Write audio data to a local file
try {
byte[] audioData = result.getAudioFrame().array();
fileOutputStream[0].write(audioData);
fileOutputStream[0].flush();
} catch (IOException e) {
System.err.println("Failed to write audio data: " + e.getMessage());
}
}
}
@Override
public void onComplete() {
System.out.println("Received Complete; speech synthesis finished");
closeFileOutputStream(fileOutputStream[0]);
latch.countDown();
}
@Override
public void onError(Exception e) {
System.out.println("Error occurred: " + e.toString());
closeFileOutputStream(fileOutputStream[0]);
latch.countDown();
}
};
SpeechSynthesisParam param =
SpeechSynthesisParam.builder()
// The API keys for Singapore and Beijing regions are different. To obtain an API key, see https://www.alibabacloud.com/help/zh/model-studio/get-api-key
// If you have not configured an environment variable, replace the following line with: .apiKey("sk-xxx")
.apiKey(System.getenv("DASHSCOPE_API_KEY"))
.model(model)
.voice(voice)
.format(SpeechSynthesisAudioFormat.MP3_22050HZ_MONO_256KBPS)
.build();
SpeechSynthesizer synthesizer = new SpeechSynthesizer(param, callback);
try {
// Unidirectional streaming call; returns null immediately (results are delivered asynchronously through the callback)
// Escape special characters
synthesizer.call("<speak rate=\"2\">My speaking rate is faster than a normal person's.</speak>");
// Wait for synthesis to complete
latch.await();
} catch (Exception e) {
throw new RuntimeException(e);
} finally {
// Close the WebSocket connection after the task ends
try {
synthesizer.getDuplexApi().close(1000, "bye");
} catch (Exception e) {
System.err.println("Failed to close WebSocket connection: " + e.getMessage());
}
// Ensure the file stream is closed
closeFileOutputStream(fileOutputStream[0]);
}
// The first packet latency includes the time required to establish the WebSocket connection
System.out.println(
"[Metric] Request ID: "
+ synthesizer.getLastRequestId()
+ ", First packet latency (ms): "
+ synthesizer.getFirstPackageDelay());
}
private static void closeFileOutputStream(FileOutputStream fileOutputStream) {
try {
if (fileOutputStream != null) {
fileOutputStream.close();
}
} catch (IOException e) {
System.err.println("Failed to close file stream: " + e.getMessage());
}
}
}
Python SDK
Non-streaming call
# coding=utf-8
# SSML feature notes:
# 1. SSML is supported only for non-streaming and unidirectional streaming calls.
# 2. SSML is supported only for cloned voices from cosyvoice-v3-flash, cosyvoice-v3-plus, and cosyvoice-v2 models,
# and system voices marked as SSML-supported in the voice list (for example, the longanyang voice for cosyvoice-v3-flash)
import dashscope
from dashscope.audio.tts_v2 import *
import os
# The API keys for Singapore and Beijing regions are different. To obtain an API key, see https://www.alibabacloud.com/help/zh/model-studio/get-api-key
# If you have not configured an environment variable, replace the following line with: dashscope.api_key = "sk-xxx"
dashscope.api_key = os.environ.get('DASHSCOPE_API_KEY')
# If you use a model from Beijing region, replace the URL with: wss://dashscope.aliyuncs.com/api-ws/v1/inference
dashscope.base_websocket_api_url='wss://dashscope-intl.aliyuncs.com/api-ws/v1/inference'
# Model
model = "cosyvoice-v3-flash"
# Voice
voice = "longanyang"
# Instantiate SpeechSynthesizer and pass model, voice, and other request parameters to the constructor
synthesizer = SpeechSynthesizer(model=model, voice=voice)
# Non-streaming call; blocks until audio is returned
# Escape special characters
audio = synthesizer.call("<speak rate=\"2\">My speaking rate is faster than a normal person's.</speak>")
# Save the audio locally
with open('output.mp3', 'wb') as f:
f.write(audio)
# The first packet latency includes the time required to establish the WebSocket connection
print('[Metric] Request ID: {}, First packet latency: {} ms'.format(
synthesizer.get_last_request_id(),
synthesizer.get_first_package_delay()))
Unidirectional streaming call
# coding=utf-8
# SSML feature notes:
# 1. SSML is supported only for non-streaming and unidirectional streaming calls.
# 2. SSML is supported only for cloned voices from cosyvoice-v3-flash, cosyvoice-v3-plus, and cosyvoice-v2 models,
# and system voices marked as SSML-supported in the voice list (for example, the longanyang voice for cosyvoice-v3-flash)
import dashscope
from dashscope.audio.tts_v2 import *
import os
from datetime import datetime
def get_timestamp():
now = datetime.now()
formatted_timestamp = now.strftime("[%Y-%m-%d %H:%M:%S.%f]")
return formatted_timestamp
# The API keys for Singapore and Beijing regions are different. To obtain an API key, see https://www.alibabacloud.com/help/zh/model-studio/get-api-key
# If you have not configured an environment variable, replace the following line with: dashscope.api_key = "sk-xxx"
dashscope.api_key = os.environ.get('DASHSCOPE_API_KEY')
# If you use a model from Beijing region, replace the URL with: wss://dashscope.aliyuncs.com/api-ws/v1/inference
dashscope.base_websocket_api_url='wss://dashscope-intl.aliyuncs.com/api-ws/v1/inference'
# Model
model = "cosyvoice-v3-flash"
# Voice
voice = "longanyang"
# Define callback interface
class Callback(ResultCallback):
_player = None
_stream = None
def on_open(self):
# Open output file to write audio data
self.file = open("output.mp3", "wb")
print("Connection established: " + get_timestamp())
def on_complete(self):
print("Speech synthesis completed; all results received: " + get_timestamp())
if hasattr(self, 'file') and self.file:
self.file.close()
# The first packet latency includes the time required to establish the WebSocket connection
print('[Metric] Request ID: {}, First packet latency: {} ms'.format(
self.synthesizer.get_last_request_id(),
self.synthesizer.get_first_package_delay()))
def on_error(self, message: str):
print(f"Speech synthesis error: {message}")
if hasattr(self, 'file') and self.file:
self.file.close()
def on_close(self):
print("Connection closed: " + get_timestamp())
if hasattr(self, 'file') and self.file:
self.file.close()
def on_event(self, message):
pass
def on_data(self, data: bytes) -> None:
print(get_timestamp() + " Binary audio length: " + str(len(data)))
# Write audio data to file
self.file.write(data)
callback = Callback()
# Instantiate SpeechSynthesizer and pass model, voice, and other request parameters to the constructor
synthesizer = SpeechSynthesizer(
model=model,
voice=voice,
callback=callback,
)
# Assign the synthesizer instance to callback for use in on_complete
callback.synthesizer = synthesizer
# Unidirectional streaming call; send text to synthesize and receive binary audio in real time via the on_data method of the callback
# Escape special characters
synthesizer.call("<speak rate=\"2\">My speaking rate is faster than a normal person's.</speak>")
WebSocket API
Go
// SSML feature notes:
// 1. Set the enable_ssml parameter to true in the run-task command to enable SSML.
// 2. Send SSML text using the continue-task command; you can send this command only once.
// 3. SSML is supported only for cloned voices from cosyvoice-v3-flash, cosyvoice-v3-plus, and cosyvoice-v2 models,
// and system voices marked as SSML-supported in the voice list (for example, the longanyang voice for cosyvoice-v3-flash)
package main
import (
"encoding/json"
"fmt"
"net/http"
"os"
"strings"
"time"
"github.com/google/uuid"
"github.com/gorilla/websocket"
)
const (
// If you use a model from Beijing region, replace the URL with: wss://dashscope.aliyuncs.com/api-ws/v1/inference/
wsURL = "wss://dashscope-intl.aliyuncs.com/api-ws/v1/inference/"
outputFile = "output.mp3"
)
func main() {
// The API keys for Singapore and Beijing regions are different. To obtain an API key, see https://www.alibabacloud.com/help/zh/model-studio/get-api-key
// If you have not configured an environment variable, replace the following line with: apiKey := "sk-xxx"
apiKey := os.Getenv("DASHSCOPE_API_KEY")
// Clear the output file
os.Remove(outputFile)
os.Create(outputFile)
// Connect to WebSocket
header := make(http.Header)
header.Add("X-DashScope-DataInspection", "enable")
header.Add("Authorization", fmt.Sprintf("bearer %s", apiKey))
conn, resp, err := websocket.DefaultDialer.Dial(wsURL, header)
if err != nil {
if resp != nil {
fmt.Printf("Connection failed. HTTP status code: %d\n", resp.StatusCode)
}
fmt.Println("Connection failed:", err)
return
}
defer conn.Close()
// Generate task ID
taskID := uuid.New().String()
fmt.Printf("Generated task ID: %s\n", taskID)
// Send run-task command
runTaskCmd := map[string]interface{}{
"header": map[string]interface{}{
"action": "run-task",
"task_id": taskID,
"streaming": "duplex",
},
"payload": map[string]interface{}{
"task_group": "audio",
"task": "tts",
"function": "SpeechSynthesizer",
"model": "cosyvoice-v3-flash",
"parameters": map[string]interface{}{
"text_type": "PlainText",
"voice": "longanyang",
"format": "mp3",
"sample_rate": 22050,
"volume": 50,
"rate": 1,
"pitch": 1,
// If enable_ssml is set to true, you can send the continue-task command only once.
// Otherwise, you will get the error "Text request limit violated, expected 1."
"enable_ssml": true,
},
"input": map[string]interface{}{},
},
}
runTaskJSON, _ := json.Marshal(runTaskCmd)
fmt.Printf("Sending run-task command: %s\n", string(runTaskJSON))
err = conn.WriteMessage(websocket.TextMessage, runTaskJSON)
if err != nil {
fmt.Println("Failed to send run-task:", err)
return
}
textSent := false
// Process messages
for {
messageType, message, err := conn.ReadMessage()
if err != nil {
fmt.Println("Failed to read message:", err)
break
}
// Handle binary messages
if messageType == websocket.BinaryMessage {
fmt.Printf("Received binary message, length: %d\n", len(message))
file, _ := os.OpenFile(outputFile, os.O_APPEND|os.O_WRONLY|os.O_CREATE, 0644)
file.Write(message)
file.Close()
continue
}
// Handle text messages
messageStr := string(message)
fmt.Printf("Received text message: %s\n", strings.ReplaceAll(messageStr, "\n", ""))
// Parse JSON to get event type
var msgMap map[string]interface{}
if json.Unmarshal(message, &msgMap) == nil {
if header, ok := msgMap["header"].(map[string]interface{}); ok {
if event, ok := header["event"].(string); ok {
fmt.Printf("Event type: %s\n", event)
switch event {
case "task-started":
fmt.Println("=== Received task-started event ===")
if !textSent {
// Send continue-task command; when using SSML, you can send this command only once
continueTaskCmd := map[string]interface{}{
"header": map[string]interface{}{
"action": "continue-task",
"task_id": taskID,
"streaming": "duplex",
},
"payload": map[string]interface{}{
"input": map[string]interface{}{
// Escape special characters
"text": "<speak rate=\"2\">My speaking rate is faster than a normal person's.</speak>",
},
},
}
continueTaskJSON, _ := json.Marshal(continueTaskCmd)
fmt.Printf("Sending continue-task command: %s\n", string(continueTaskJSON))
err = conn.WriteMessage(websocket.TextMessage, continueTaskJSON)
if err != nil {
fmt.Println("Failed to send continue-task:", err)
return
}
textSent = true
// Delay sending finish-task
time.Sleep(500 * time.Millisecond)
// Send finish-task command
finishTaskCmd := map[string]interface{}{
"header": map[string]interface{}{
"action": "finish-task",
"task_id": taskID,
"streaming": "duplex",
},
"payload": map[string]interface{}{
"input": map[string]interface{}{},
},
}
finishTaskJSON, _ := json.Marshal(finishTaskCmd)
fmt.Printf("Sending finish-task command: %s\n", string(finishTaskJSON))
err = conn.WriteMessage(websocket.TextMessage, finishTaskJSON)
if err != nil {
fmt.Println("Failed to send finish-task:", err)
return
}
}
case "task-finished":
fmt.Println("=== Task finished ===")
return
case "task-failed":
fmt.Println("=== Task failed ===")
if header["error_message"] != nil {
fmt.Printf("Error message: %s\n", header["error_message"])
}
return
case "result-generated":
fmt.Println("Received result-generated event")
}
}
}
}
}
}
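For reference, the text frames that the event loop above parses have roughly the following shape (inferred from the fields the handlers read; the error_code value here is only illustrative, while the error_message shown is the documented error for sending continue-task more than once with enable_ssml enabled):

```json
{
  "header": {
    "task_id": "your-task-id",
    "event": "task-failed",
    "error_code": "InvalidParameter",
    "error_message": "Text request limit violated, expected 1."
  },
  "payload": {}
}
```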
C#
using System.Net.WebSockets;
using System.Text;
using System.Text.Json;
// SSML feature notes:
// 1. Set the enable_ssml parameter to true in the run-task command to enable SSML.
// 2. Send SSML text using the continue-task command; you can send this command only once.
// 3. SSML is supported only for cloned voices from cosyvoice-v3-flash, cosyvoice-v3-plus, and cosyvoice-v2 models,
// and system voices marked as SSML-supported in the voice list (for example, the longanyang voice for cosyvoice-v3-flash)
class Program {
// The API keys for Singapore and Beijing regions are different. To obtain an API key, see https://www.alibabacloud.com/help/zh/model-studio/get-api-key
// If you have not configured an environment variable, replace the following line with: private static readonly string ApiKey = "sk-xxx"
private static readonly string ApiKey = Environment.GetEnvironmentVariable("DASHSCOPE_API_KEY") ?? throw new InvalidOperationException("DASHSCOPE_API_KEY environment variable is not set.");
// If you use a model from Beijing region, replace the URL with: wss://dashscope.aliyuncs.com/api-ws/v1/inference/
private const string WebSocketUrl = "wss://dashscope-intl.aliyuncs.com/api-ws/v1/inference/";
// Output file path
private const string OutputFilePath = "output.mp3";
// WebSocket client
private static ClientWebSocket _webSocket = new ClientWebSocket();
// Cancellation token source
private static CancellationTokenSource _cancellationTokenSource = new CancellationTokenSource();
// Task ID
private static string? _taskId;
// Whether the task has started
private static TaskCompletionSource<bool> _taskStartedTcs = new TaskCompletionSource<bool>();
static async Task Main(string[] args) {
try {
// Clear the output file
ClearOutputFile(OutputFilePath);
// Connect to WebSocket service
await ConnectToWebSocketAsync(WebSocketUrl);
// Start the task to receive messages
Task receiveTask = ReceiveMessagesAsync();
// Send run-task command
_taskId = GenerateTaskId();
await SendRunTaskCommandAsync(_taskId);
// Wait for the task-started event
await _taskStartedTcs.Task;
// Send the continue-task command. When using the SSML feature, this command can be sent only once.
// Special characters need to be escaped.
await SendContinueTaskCommandAsync("<speak rate=\"2\">My speaking rate is faster than a normal person's.</speak>");
// Send the finish-task command
await SendFinishTaskCommandAsync(_taskId);
// Wait for the receive task to complete
await receiveTask;
Console.WriteLine("Task completed, connection closed.");
} catch (OperationCanceledException) {
Console.WriteLine("The task was canceled.");
} catch (Exception ex) {
Console.WriteLine($"An error occurred: {ex.Message}");
} finally {
_cancellationTokenSource.Cancel();
_webSocket.Dispose();
}
}
private static void ClearOutputFile(string filePath) {
if (File.Exists(filePath)) {
File.WriteAllText(filePath, string.Empty);
Console.WriteLine("The output file has been cleared.");
} else {
Console.WriteLine("The output file does not exist and does not need to be cleared.");
}
}
private static async Task ConnectToWebSocketAsync(string url) {
var uri = new Uri(url);
if (_webSocket.State == WebSocketState.Connecting || _webSocket.State == WebSocketState.Open) {
return;
}
// Set headers for the WebSocket connection
_webSocket.Options.SetRequestHeader("Authorization", $"bearer {ApiKey}");
_webSocket.Options.SetRequestHeader("X-DashScope-DataInspection", "enable");
try {
await _webSocket.ConnectAsync(uri, _cancellationTokenSource.Token);
Console.WriteLine("Successfully connected to the WebSocket service.");
} catch (OperationCanceledException) {
Console.WriteLine("WebSocket connection was canceled.");
} catch (Exception ex) {
Console.WriteLine($"WebSocket connection failed: {ex.Message}");
throw;
}
}
private static async Task SendRunTaskCommandAsync(string taskId) {
var command = CreateCommand("run-task", taskId, "duplex", new {
task_group = "audio",
task = "tts",
function = "SpeechSynthesizer",
model = "cosyvoice-v3-flash",
parameters = new
{
text_type = "PlainText",
voice = "longanyang",
format = "mp3",
sample_rate = 22050,
volume = 50,
rate = 1,
pitch = 1,
// If enable_ssml is set to true, you can send the continue-task command only once.
// Otherwise, you will get the error "Text request limit violated, expected 1."
enable_ssml = true
},
input = new { }
});
await SendJsonMessageAsync(command);
Console.WriteLine("Sent run-task command.");
}
private static async Task SendContinueTaskCommandAsync(string text) {
if (_taskId == null) {
throw new InvalidOperationException("Task ID is not initialized.");
}
var command = CreateCommand("continue-task", _taskId, "duplex", new {
input = new {
text
}
});
await SendJsonMessageAsync(command);
Console.WriteLine("Sent continue-task command.");
}
private static async Task SendFinishTaskCommandAsync(string taskId) {
var command = CreateCommand("finish-task", taskId, "duplex", new {
input = new { }
});
await SendJsonMessageAsync(command);
Console.WriteLine("Sent finish-task command.");
}
private static async Task SendJsonMessageAsync(string message) {
var buffer = Encoding.UTF8.GetBytes(message);
try {
await _webSocket.SendAsync(new ArraySegment<byte>(buffer), WebSocketMessageType.Text, true, _cancellationTokenSource.Token);
} catch (OperationCanceledException) {
Console.WriteLine("Message sending was canceled.");
}
}
private static async Task ReceiveMessagesAsync() {
while (_webSocket.State == WebSocketState.Open) {
var response = await ReceiveMessageAsync();
if (response != null) {
var eventStr = response.RootElement.GetProperty("header").GetProperty("event").GetString();
switch (eventStr) {
case "task-started":
Console.WriteLine("Task started.");
_taskStartedTcs.TrySetResult(true);
break;
case "task-finished":
Console.WriteLine("Task finished.");
_cancellationTokenSource.Cancel();
break;
case "task-failed":
Console.WriteLine("Task failed: " + response.RootElement.GetProperty("header").GetProperty("error_message").GetString());
_cancellationTokenSource.Cancel();
break;
default:
// result-generated can be handled here
break;
}
}
}
}
private static async Task<JsonDocument?> ReceiveMessageAsync() {
var buffer = new byte[1024 * 4];
var segment = new ArraySegment<byte>(buffer);
var textBuilder = new StringBuilder();
try {
WebSocketReceiveResult result;
// A single WebSocket message can arrive in multiple fragments;
// read until EndOfMessage so large JSON frames are not truncated.
do {
result = await _webSocket.ReceiveAsync(segment, _cancellationTokenSource.Token);
if (result.MessageType == WebSocketMessageType.Close) {
await _webSocket.CloseAsync(WebSocketCloseStatus.NormalClosure, "Closing", _cancellationTokenSource.Token);
return null;
}
if (result.MessageType == WebSocketMessageType.Binary) {
// Append each binary audio fragment to the output file
using (var fileStream = new FileStream(OutputFilePath, FileMode.Append)) {
fileStream.Write(buffer, 0, result.Count);
}
} else {
textBuilder.Append(Encoding.UTF8.GetString(buffer, 0, result.Count));
}
} while (!result.EndOfMessage);
if (result.MessageType == WebSocketMessageType.Binary) {
Console.WriteLine("Received binary data.");
return null;
}
return JsonDocument.Parse(textBuilder.ToString());
} catch (OperationCanceledException) {
Console.WriteLine("Message reception was canceled.");
return null;
}
}
private static string GenerateTaskId() {
return Guid.NewGuid().ToString("N").Substring(0, 32);
}
private static string CreateCommand(string action, string taskId, string streaming, object payload) {
var command = new {
header = new {
action,
task_id = taskId,
streaming
},
payload
};
return JsonSerializer.Serialize(command);
}
}
PHP
The example code has the following directory structure:
my-php-project/
├── composer.json
├── vendor/
└── index.php
The following is the content of composer.json. Adjust the dependency versions as needed:
{
"require": {
"react/event-loop": "^1.3",
"react/socket": "^1.11",
"react/stream": "^1.2",
"react/http": "^1.1",
"ratchet/pawl": "^0.4"
},
"autoload": {
"psr-4": {
"App\\": "src/"
}
}
}
The following is the content of index.php:
<?php
// SSML feature notes:
// 1. When sending the run-task command, set the enable_ssml parameter to true to enable SSML support.
// 2. Send the text that contains SSML by using the continue-task command. You can send this command only once.
// 3. SSML is supported only for cloned voices from cosyvoice-v3-flash, cosyvoice-v3-plus, and cosyvoice-v2 models, and system voices marked as SSML-supported in the voice list (for example, the longanyang voice for cosyvoice-v3-flash).
require __DIR__ . '/vendor/autoload.php';
use Ratchet\Client\Connector;
use React\EventLoop\Loop;
use React\Socket\Connector as SocketConnector;
// The API keys for the Singapore and Beijing regions are different. To obtain an API key, see https://www.alibabacloud.com/help/zh/model-studio/get-api-key
// If you have not configured an environment variable, replace the following line with: $api_key = "sk-xxx"
$api_key = getenv("DASHSCOPE_API_KEY");
// The following URL is for the Singapore region. If you use a model from the Beijing region, replace the URL with: wss://dashscope.aliyuncs.com/api-ws/v1/inference/
$websocket_url = 'wss://dashscope-intl.aliyuncs.com/api-ws/v1/inference/'; // WebSocket server address
$output_file = 'output.mp3'; // Output file path
$loop = Loop::get();
if (file_exists($output_file)) {
// Clear the file content
file_put_contents($output_file, '');
}
// Create a custom connector
$socketConnector = new SocketConnector($loop, [
'tcp' => [
'bindto' => '0.0.0.0:0',
],
'tls' => [
'verify_peer' => false,
'verify_peer_name' => false,
],
]);
$connector = new Connector($loop, $socketConnector);
$headers = [
'Authorization' => 'bearer ' . $api_key,
'X-DashScope-DataInspection' => 'enable'
];
$connector($websocket_url, [], $headers)->then(function ($conn) use ($loop, $output_file) {
echo "Connected to WebSocket server\n";
// Generate task ID
$taskId = generateTaskId();
// Send run-task command
sendRunTaskMessage($conn, $taskId);
// Define the function to send the continue-task command
$sendContinueTask = function() use ($conn, $loop, $taskId) {
// Send the continue-task command. When using the SSML feature, this command can be sent only once.
$continueTaskMessage = json_encode([
"header" => [
"action" => "continue-task",
"task_id" => $taskId,
"streaming" => "duplex"
],
"payload" => [
"input" => [
// Special characters need to be escaped
"text" => "<speak rate=\"2\">My speaking rate is faster than a normal person's.</speak>"
]
]
]);
$conn->send($continueTaskMessage);
// Send the finish-task command
sendFinishTaskMessage($conn, $taskId);
};
// Flag to check if the task-started event is received
$taskStarted = false;
// Listen for messages
$conn->on('message', function($msg) use ($conn, $sendContinueTask, $loop, &$taskStarted, $taskId, $output_file) {
if ($msg->isBinary()) {
// Write binary data to the local file
file_put_contents($output_file, $msg->getPayload(), FILE_APPEND);
} else {
// Handle non-binary messages
$response = json_decode($msg, true);
if (isset($response['header']['event'])) {
handleEvent($conn, $response, $sendContinueTask, $loop, $taskId, $taskStarted);
} else {
echo "Unknown message format\n";
}
}
});
// Listen for connection close
$conn->on('close', function($code = null, $reason = null) {
echo "Connection closed\n";
if ($code !== null) {
echo "Close code: " . $code . "\n";
}
if ($reason !== null) {
echo "Close reason: " . $reason . "\n";
}
});
}, function ($e) {
echo "Could not connect: {$e->getMessage()}\n";
});
$loop->run();
/**
* Generate task ID
* @return string
*/
function generateTaskId(): string {
return bin2hex(random_bytes(16));
}
/**
* Send run-task command
* @param $conn
* @param $taskId
*/
function sendRunTaskMessage($conn, $taskId) {
$runTaskMessage = json_encode([
"header" => [
"action" => "run-task",
"task_id" => $taskId,
"streaming" => "duplex"
],
"payload" => [
"task_group" => "audio",
"task" => "tts",
"function" => "SpeechSynthesizer",
"model" => "cosyvoice-v3-flash",
"parameters" => [
"text_type" => "PlainText",
"voice" => "longanyang",
"format" => "mp3",
"sample_rate" => 22050,
"volume" => 50,
"rate" => 1,
"pitch" => 1,
// If enable_ssml is set to true, you can send the continue-task command only once.
// Otherwise, you will get the error "Text request limit violated, expected 1."
"enable_ssml" => true
],
"input" => (object) []
]
]);
echo "Preparing to send run-task command: " . $runTaskMessage . "\n";
$conn->send($runTaskMessage);
echo "run-task command sent\n";
}
/**
* Read audio file
* @param string $filePath
* @return bool|string
*/
function readAudioFile(string $filePath) {
$voiceData = file_get_contents($filePath);
if ($voiceData === false) {
echo "Failed to read audio file\n";
}
return $voiceData;
}
/**
* Split audio data
* @param string $data
* @param int $chunkSize
* @return array
*/
function splitAudioData(string $data, int $chunkSize): array {
return str_split($data, $chunkSize);
}
/**
* Send finish-task command
* @param $conn
* @param $taskId
*/
function sendFinishTaskMessage($conn, $taskId) {
$finishTaskMessage = json_encode([
"header" => [
"action" => "finish-task",
"task_id" => $taskId,
"streaming" => "duplex"
],
"payload" => [
"input" => (object) []
]
]);
echo "Preparing to send finish-task command: " . $finishTaskMessage . "\n";
$conn->send($finishTaskMessage);
echo "finish-task command sent\n";
}
/**
* Handle events
* @param $conn
* @param $response
* @param $sendContinueTask
* @param $loop
* @param $taskId
* @param $taskStarted
*/
function handleEvent($conn, $response, $sendContinueTask, $loop, $taskId, &$taskStarted) {
switch ($response['header']['event']) {
case 'task-started':
echo "Task started, sending continue-task command...\n";
$taskStarted = true;
// Send continue-task command
$sendContinueTask();
break;
case 'result-generated':
// Ignore result-generated event
break;
case 'task-finished':
echo "Task finished\n";
$conn->close();
break;
case 'task-failed':
echo "Task failed\n";
echo "Error code: " . $response['header']['error_code'] . "\n";
echo "Error message: " . $response['header']['error_message'] . "\n";
$conn->close();
break;
case 'error':
echo "Error: " . $response['payload']['message'] . "\n";
break;
default:
echo "Unknown event: " . $response['header']['event'] . "\n";
break;
}
// If the task is finished, close the connection
if ($response['header']['event'] == 'task-finished') {
// Wait for 1 second to ensure all data is transferred
$loop->addTimer(1, function() use ($conn) {
$conn->close();
echo "Client closes connection\n";
});
}
// If task-started event is not received, close the connection
if (!$taskStarted && in_array($response['header']['event'], ['task-failed', 'error'])) {
$conn->close();
}
}
Node.js
You need to install the required dependencies:
npm install ws
npm install uuid
The example code is as follows:
// SSML feature notes:
// 1. Set the enable_ssml parameter to true in the run-task command to enable SSML.
// 2. Send SSML text using the continue-task command; you can send this command only once.
// 3. SSML is supported only for cloned voices from cosyvoice-v3-flash, cosyvoice-v3-plus, and cosyvoice-v2 models,
// and system voices marked as SSML-supported in the voice list (for example, the longanyang voice for cosyvoice-v3-flash)
import fs from 'fs';
import WebSocket from 'ws';
import { v4 as uuid } from 'uuid'; // Used to generate UUIDs
// The API keys for the Singapore and Beijing regions are different. To obtain an API key, see https://www.alibabacloud.com/help/zh/model-studio/get-api-key
// If you have not configured an environment variable, replace the following line with: const apiKey = "sk-xxx"
const apiKey = process.env.DASHSCOPE_API_KEY;
// The following URL is for the Singapore region. If you use a model from the Beijing region, replace the URL with: wss://dashscope.aliyuncs.com/api-ws/v1/inference/
const url = 'wss://dashscope-intl.aliyuncs.com/api-ws/v1/inference/';
// Output file path
const outputFilePath = 'output.mp3';
// Clear the output file
fs.writeFileSync(outputFilePath, '');
// Create a WebSocket client
const ws = new WebSocket(url, {
headers: {
Authorization: `bearer ${apiKey}`,
'X-DashScope-DataInspection': 'enable'
}
});
let taskStarted = false;
let taskId = uuid();
ws.on('open', () => {
console.log('Connected to WebSocket server');
// Send the run-task command
const runTaskMessage = JSON.stringify({
header: {
action: 'run-task',
task_id: taskId,
streaming: 'duplex'
},
payload: {
task_group: 'audio',
task: 'tts',
function: 'SpeechSynthesizer',
model: 'cosyvoice-v3-flash',
parameters: {
text_type: 'PlainText',
voice: 'longanyang', // Voice
format: 'mp3', // Audio format
sample_rate: 22050, // Sample rate
volume: 50, // Volume
rate: 1, // Speech rate
pitch: 1, // Pitch
enable_ssml: true // Whether to enable the SSML feature. If enable_ssml is set to true, you can send the continue-task command only once. Otherwise, the error "Text request limit violated, expected 1." is reported.
},
input: {}
}
});
ws.send(runTaskMessage);
console.log('Sent run-task message');
});
const fileStream = fs.createWriteStream(outputFilePath, { flags: 'a' });
ws.on('message', (data, isBinary) => {
if (isBinary) {
// Write binary data to the file
fileStream.write(data);
} else {
const message = JSON.parse(data);
switch (message.header.event) {
case 'task-started':
taskStarted = true;
console.log('Task has started');
// Send continue-task command
sendContinueTasks(ws);
break;
case 'task-finished':
console.log('Task has finished');
ws.close();
fileStream.end(() => {
console.log('File stream has been closed');
});
break;
case 'task-failed':
console.error('Task failed: ', message.header.error_message);
ws.close();
fileStream.end(() => {
console.log('File stream has been closed');
});
break;
default:
// You can handle result-generated here
break;
}
}
});
function sendContinueTasks(ws) {
if (taskStarted) {
// Send the continue-task command. When using the SSML feature, this command can be sent only once.
const continueTaskMessage = JSON.stringify({
header: {
action: 'continue-task',
task_id: taskId,
streaming: 'duplex'
},
payload: {
input: {
// Special characters need to be escaped
text: '<speak rate="2">My speaking rate is faster than a normal person&apos;s.</speak>'
}
}
});
ws.send(continueTaskMessage);
// Send the finish-task command
const finishTaskMessage = JSON.stringify({
header: {
action: 'finish-task',
task_id: taskId,
streaming: 'duplex'
},
payload: {
input: {}
}
});
ws.send(finishTaskMessage);
}
}
ws.on('close', () => {
console.log('Disconnected from the WebSocket server');
});
Java
If you use the Java programming language, we recommend that you use the Java DashScope SDK for development. For more information, see Java SDK.
The following is a Java WebSocket example. Before you run the example, make sure that you have imported the following dependencies:
Java-WebSocketjackson-databind
We recommend that you use Maven or Gradle to manage dependency packages. The configurations are as follows:
pom.xml
<dependencies>
<!-- WebSocket Client -->
<dependency>
<groupId>org.java-websocket</groupId>
<artifactId>Java-WebSocket</artifactId>
<version>1.5.3</version>
</dependency>
<!-- JSON Processing -->
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-databind</artifactId>
<version>2.13.0</version>
</dependency>
</dependencies>
build.gradle
// Omit other code
dependencies {
// WebSocket Client
implementation 'org.java-websocket:Java-WebSocket:1.5.3'
// JSON Processing
implementation 'com.fasterxml.jackson.core:jackson-databind:2.13.0'
}
// Omit other code
The Java code is as follows:
import com.fasterxml.jackson.databind.ObjectMapper;
import org.java_websocket.client.WebSocketClient;
import org.java_websocket.handshake.ServerHandshake;
import java.io.FileOutputStream;
import java.io.IOException;
import java.net.URI;
import java.nio.ByteBuffer;
import java.util.*;
/**
* SSML feature notes:
* 1. When sending the run-task command, set the enable_ssml parameter to true to enable SSML support.
* 2. Send the text that contains SSML by using the continue-task command. You can send this command only once.
* 3. SSML is supported only for cloned voices from cosyvoice-v3-flash, cosyvoice-v3-plus, and cosyvoice-v2 models, and system voices marked as SSML-supported in the voice list (for example, the longanyang voice for cosyvoice-v3-flash).
*/
public class TTSWebSocketClient extends WebSocketClient {
private final String taskId = UUID.randomUUID().toString();
private final String outputFile = "output_" + System.currentTimeMillis() + ".mp3";
private boolean taskFinished = false;
public TTSWebSocketClient(URI serverUri, Map<String, String> headers) {
super(serverUri, headers);
}
@Override
public void onOpen(ServerHandshake serverHandshake) {
System.out.println("Connection successful");
// Send run-task command
// If enable_ssml is set to true, you can send the continue-task command only once.
// Otherwise, you will get the error "Text request limit violated, expected 1."
String runTaskCommand = "{ \"header\": { \"action\": \"run-task\", \"task_id\": \"" + taskId + "\", \"streaming\": \"duplex\" }, \"payload\": { \"task_group\": \"audio\", \"task\": \"tts\", \"function\": \"SpeechSynthesizer\", \"model\": \"cosyvoice-v3-flash\", \"parameters\": { \"text_type\": \"PlainText\", \"voice\": \"longanyang\", \"format\": \"mp3\", \"sample_rate\": 22050, \"volume\": 50, \"rate\": 1, \"pitch\": 1, \"enable_ssml\": true }, \"input\": {} }}";
send(runTaskCommand);
}
@Override
public void onMessage(String message) {
System.out.println("Received message from server: " + message);
try {
// Parse JSON message
Map<String, Object> messageMap = new ObjectMapper().readValue(message, Map.class);
if (messageMap.containsKey("header")) {
Map<String, Object> header = (Map<String, Object>) messageMap.get("header");
if (header.containsKey("event")) {
String event = (String) header.get("event");
if ("task-started".equals(event)) {
System.out.println("Received task-started event from server");
// Send the continue-task command. When using the SSML feature, this command can be sent only once.
// Special characters need to be escaped.
sendContinueTask("<speak rate=\\\"2\\\">My speaking rate is faster than a normal person's.</speak>");
// Send the finish-task command
sendFinishTask();
} else if ("task-finished".equals(event)) {
System.out.println("Received task-finished event from server");
taskFinished = true;
closeConnection();
} else if ("task-failed".equals(event)) {
System.out.println("Task failed: " + message);
closeConnection();
}
}
}
} catch (Exception e) {
System.err.println("An exception occurred: " + e.getMessage());
}
}
@Override
public void onMessage(ByteBuffer message) {
System.out.println("Size of received binary audio data: " + message.remaining());
try (FileOutputStream fos = new FileOutputStream(outputFile, true)) {
byte[] buffer = new byte[message.remaining()];
message.get(buffer);
fos.write(buffer);
System.out.println("Audio data has been written to the local file " + outputFile);
} catch (IOException e) {
System.err.println("Failed to write audio data to local file: " + e.getMessage());
}
}
@Override
public void onClose(int code, String reason, boolean remote) {
System.out.println("Connection closed: " + reason + " (" + code + ")");
}
@Override
public void onError(Exception ex) {
System.err.println("Error: " + ex.getMessage());
ex.printStackTrace();
}
private void sendContinueTask(String text) {
String command = "{ \"header\": { \"action\": \"continue-task\", \"task_id\": \"" + taskId + "\", \"streaming\": \"duplex\" }, \"payload\": { \"input\": { \"text\": \"" + text + "\" } }}";
send(command);
}
private void sendFinishTask() {
String command = "{ \"header\": { \"action\": \"finish-task\", \"task_id\": \"" + taskId + "\", \"streaming\": \"duplex\" }, \"payload\": { \"input\": {} }}";
send(command);
}
private void closeConnection() {
if (!isClosed()) {
close();
}
}
public static void main(String[] args) {
try {
// The API keys for Singapore and Beijing regions are different. To obtain an API key, see https://www.alibabacloud.com/help/zh/model-studio/get-api-key
// If you have not configured an environment variable, replace the following line with: String apiKey = "sk-xxx"
String apiKey = System.getenv("DASHSCOPE_API_KEY");
if (apiKey == null || apiKey.isEmpty()) {
System.err.println("Please set the DASHSCOPE_API_KEY environment variable");
return;
}
Map<String, String> headers = new HashMap<>();
headers.put("Authorization", "bearer " + apiKey);
// If you use a model from Beijing region, replace the URL with: wss://dashscope.aliyuncs.com/api-ws/v1/inference/
TTSWebSocketClient client = new TTSWebSocketClient(new URI("wss://dashscope-intl.aliyuncs.com/api-ws/v1/inference/"), headers);
client.connect();
while (!client.isClosed() && !client.taskFinished) {
Thread.sleep(1000);
}
} catch (Exception e) {
System.err.println("Failed to connect to WebSocket service: " + e.getMessage());
e.printStackTrace();
}
}
}
Python
If you use Python, we recommend that you use the Python DashScope SDK for development. For more information, see Python SDK.
The following is a Python WebSocket example. Before you run the example, import the dependencies in the following way:
pip uninstall websocket-client
pip uninstall websocket
pip install websocket-client
Do not name the Python file that runs the example code "websocket.py". Otherwise, an error is reported: AttributeError: module 'websocket' has no attribute 'WebSocketApp'. Did you mean: 'WebSocket'?
# SSML feature notes:
# 1. When sending the run-task command, set the enable_ssml parameter to true to enable SSML support.
# 2. Send the text that contains SSML by using the continue-task command. You can send this command only once.
# 3. SSML is supported only for cloned voices from cosyvoice-v3-flash, cosyvoice-v3-plus, and cosyvoice-v2 models, and system voices marked as SSML-supported in the voice list (for example, the longanyang voice for cosyvoice-v3-flash).
import websocket
import json
import uuid
import os
import time
class TTSClient:
def __init__(self, api_key, uri):
"""
Initializes the TTSClient instance.
Parameters:
api_key (str): The API key for authentication.
uri (str): The WebSocket service address.
"""
self.api_key = api_key # Replace with your API key.
self.uri = uri # Replace with your WebSocket address.
self.task_id = str(uuid.uuid4()) # Generate a unique task ID.
self.output_file = f"output_{int(time.time())}.mp3" # Output audio file path.
self.ws = None # WebSocketApp instance.
self.task_started = False # Whether task-started is received.
self.task_finished = False # Whether task-finished/task-failed is received.
def on_open(self, ws):
"""
Callback function when the WebSocket connection is established.
Sends the run-task command to start the speech synthesis task.
"""
print("WebSocket connection established")
# Construct the run-task command.
run_task_cmd = {
"header": {
"action": "run-task",
"task_id": self.task_id,
"streaming": "duplex"
},
"payload": {
"task_group": "audio",
"task": "tts",
"function": "SpeechSynthesizer",
"model": "cosyvoice-v3-flash",
"parameters": {
"text_type": "PlainText",
"voice": "longanyang",
"format": "mp3",
"sample_rate": 22050,
"volume": 50,
"rate": 1,
"pitch": 1,
# If enable_ssml is set to true, you can send the continue-task command only once.
# Otherwise, you will get the error "Text request limit violated, expected 1."
"enable_ssml": True
},
"input": {}
}
}
# Send the run-task command.
ws.send(json.dumps(run_task_cmd))
print("run-task command sent")
def on_message(self, ws, message):
"""
Callback function when a message is received.
Handles text and binary messages separately.
"""
if isinstance(message, str):
# Process JSON text messages.
try:
msg_json = json.loads(message)
print(f"Received JSON message: {msg_json}")
if "header" in msg_json:
header = msg_json["header"]
if "event" in header:
event = header["event"]
if event == "task-started":
print("Task started")
self.task_started = True
# Send the continue-task command. When using the SSML feature, this command can be sent only once.
# Special characters need to be escaped.
self.send_continue_task("<speak rate=\"2\">My speaking rate is faster than a normal person's.</speak>")
# Send finish-task after continue-task is sent.
self.send_finish_task()
elif event == "task-finished":
print("Task finished")
self.task_finished = True
self.close(ws)
elif event == "task-failed":
error_msg = header.get("error_message", "Unknown error")
print(f"Task failed: {error_msg}")
self.task_finished = True
self.close(ws)
except json.JSONDecodeError as e:
print(f"JSON parsing failed: {e}")
else:
# Process binary messages (audio data).
print(f"Received binary message, size: {len(message)} bytes")
with open(self.output_file, "ab") as f:
f.write(message)
print(f"Audio data has been written to the local file {self.output_file}")
def on_error(self, ws, error):
"""Callback on error."""
print(f"WebSocket error: {error}")
def on_close(self, ws, close_status_code, close_msg):
"""Callback on close."""
print(f"WebSocket closed: {close_msg} ({close_status_code})")
def send_continue_task(self, text):
"""Sends the continue-task command with the text to be synthesized."""
cmd = {
"header": {
"action": "continue-task",
"task_id": self.task_id,
"streaming": "duplex"
},
"payload": {
"input": {
"text": text
}
}
}
self.ws.send(json.dumps(cmd))
print(f"Sent continue-task command, text content: {text}")
def send_finish_task(self):
"""Sends the finish-task command to end the speech synthesis task."""
cmd = {
"header": {
"action": "finish-task",
"task_id": self.task_id,
"streaming": "duplex"
},
"payload": {
"input": {}
}
}
self.ws.send(json.dumps(cmd))
print("Sent finish-task command")
def close(self, ws):
"""Actively closes the connection."""
if ws and ws.sock and ws.sock.connected:
ws.close()
print("Connection actively closed")
def run(self):
"""Starts the WebSocket client."""
# Set request headers (authentication).
header = {
"Authorization": f"bearer {self.api_key}",
"X-DashScope-DataInspection": "enable"
}
# Create a WebSocketApp instance.
self.ws = websocket.WebSocketApp(
self.uri,
header=header,
on_open=self.on_open,
on_message=self.on_message,
on_error=self.on_error,
on_close=self.on_close
)
print("Listening for WebSocket messages...")
self.ws.run_forever() # Start the persistent connection listener.
# Example usage
if __name__ == "__main__":
# The API keys for Singapore and Beijing regions are different. To obtain an API key, see https://www.alibabacloud.com/help/zh/model-studio/get-api-key
# If you have not configured an environment variable, replace the following line with: API_KEY = "sk-xxx"
API_KEY = os.environ.get("DASHSCOPE_API_KEY")
# The following URL is for the Singapore region. If you use a model from the Beijing region, replace the URL with: wss://dashscope.aliyuncs.com/api-ws/v1/inference/
SERVER_URI = "wss://dashscope-intl.aliyuncs.com/api-ws/v1/inference/"
client = TTSClient(API_KEY, SERVER_URI)
client.run()
Tags
The SSML implementation for the speech synthesis service is based on the W3C SSML 1.0 specification. However, to accommodate various business scenarios, not all standard tags are supported. Instead, we support a collection of the most practical tags.
All text content that uses SSML features must be enclosed within <speak></speak> tags.
You can use multiple <speak> tags consecutively, such as <speak></speak><speak></speak>. You cannot nest them, such as <speak><speak></speak></speak>.
You must escape XML special characters in the tag text content. The common special characters and their escaped forms are as follows:
" (double quotation mark) → &quot;
' (single quotation mark/apostrophe) → &apos;
& (ampersand) → &amp;
< (less than sign) → &lt;
> (greater than sign) → &gt;
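The escaping rules above can be applied automatically. The following sketch uses Python's standard `xml.sax.saxutils.escape` to escape all five special characters before wrapping the text in a `<speak>` tag; the helper name `to_ssml` is illustrative, not part of the service API.

```python
from xml.sax.saxutils import escape

def to_ssml(text: str, rate: float = 1.0) -> str:
    """Wrap plain text in a <speak> tag, escaping XML special characters."""
    # escape() handles & < > by default; the entities dict adds the two quote forms.
    safe = escape(text, entities={'"': "&quot;", "'": "&apos;"})
    return f'<speak rate="{rate}">{safe}</speak>'

print(to_ssml("Tom & Jerry's <best> episode", rate=2))
# → <speak rate="2">Tom &amp; Jerry&apos;s &lt;best&gt; episode</speak>
```

Escaping user-supplied text this way avoids the "invalid SSML" failures that raw `&` or `<` characters would otherwise cause.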
<speak>: Root node
Description
The <speak> tag is the root node for all SSML documents. All text that uses SSML features must be enclosed within <speak></speak> tags.
Syntax
<speak>Text that requires SSML features</speak>
Properties
Property
Type
Required
Description
voice
String
No
Specifies the voice.
This property has a higher priority than the voice parameter in the API request.
Valid values: For more information about specific voices, see cosyvoice-v2 voices.
Example:
<speak voice="longcheng_v2"> I am a male voice. </speak>
rate
String
No
Specifies the speech rate. This property has a higher priority than the speech_rate parameter in the API request.
Valid values: a decimal number from 0.5 to 2.
Default value: 1
A value greater than 1 indicates a faster speech rate.
A value less than 1 indicates a slower speech rate.
Example:
<speak rate="2"> My speech rate is faster than normal. </speak>
pitch
String
No
Specifies the pitch. This property has a higher priority than the pitch_rate parameter in the API request.
Valid values: a decimal number from 0.5 to 2.
Default value: 1
A value greater than 1 indicates a higher pitch.
A value less than 1 indicates a lower pitch.
Example:
<speak pitch="0.5"> However, my pitch is lower than others. </speak>
volume
String
No
Specifies the volume. This property has a higher priority than the volume parameter in the API request.
Valid values: an integer from 0 to 100.
Default value: 50
A value greater than 50 indicates a higher volume.
A value less than 50 indicates a lower volume.
Example:
<speak volume="80"> My volume is also very high. </speak>
effect
String
No
Specifies the sound effect.
Valid values:
robot: robot sound effect
lolita: lively female voice effect
lowpass: low-pass sound effect
echo: echo sound effect
eq: equalizer (advanced)
lpfilter: low-pass filter (advanced)
hpfilter: high-pass filter (advanced)
Note
The eq, lpfilter, and hpfilter values are advanced sound effect types. You can use the effectValue parameter to customize their specific effects.
Each SSML tag supports only one sound effect. Multiple effect attributes cannot coexist.
Using sound effects increases system latency.
Example:
<speak effect="robot"> Do you like the robot WALL-E? </speak>
effectValue
String
No
Specifies the specific effect of the sound effect (the effect parameter).
Valid values:
eq (equalizer): The system supports eight frequency bands by default: ["40 Hz", "100 Hz", "200 Hz", "400 Hz", "800 Hz", "1600 Hz", "4000 Hz", "12000 Hz"].
The bandwidth of each frequency band is 1.0 q.
When you use this effect, you must use the effectValue parameter to specify the gain value for each frequency band. This parameter is a string of eight integers separated by spaces. Each integer ranges from -20 to 20. A value of 0 indicates that the gain of the corresponding frequency band is not adjusted.
For example: effectValue="1 1 1 1 1 1 1 1"
lpfilter (low-pass filter): Enter the cutoff frequency of the low-pass filter. The value is an integer in the range (0, target sample rate/2]. For example, effectValue="800".
hpfilter (high-pass filter): Enter the cutoff frequency of the high-pass filter. The value is an integer in the range (0, target sample rate/2]. For example, effectValue="1200".
Example:
<speak effect="eq" effectValue="1 -20 1 1 1 1 20 1"> Do you like the robot WALL-E? </speak> <speak effect="lpfilter" effectValue="1200"> Do you like the robot WALL-E? </speak> <speak effect="hpfilter" effectValue="1200"> Do you like the robot WALL-E? </speak>
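Because an invalid effectValue fails only at synthesis time, it can help to check the string locally first. The following sketch validates an effectValue against the constraints documented above (eight integers in [-20, 20] for eq; a cutoff in (0, sample rate/2] for the filters); the function name and the default sample rate of 22050 Hz are assumptions for illustration.

```python
def validate_effect_value(effect: str, value: str, sample_rate: int = 22050) -> bool:
    """Check an effectValue string against the documented constraints."""
    if effect == "eq":
        gains = value.split()
        # Eight space-separated integers, each in [-20, 20].
        return len(gains) == 8 and all(
            g.lstrip("-").isdigit() and -20 <= int(g) <= 20 for g in gains
        )
    if effect in ("lpfilter", "hpfilter"):
        # Cutoff frequency: a positive integer no greater than sample_rate / 2.
        return value.isdigit() and 0 < int(value) <= sample_rate // 2
    return False

assert validate_effect_value("eq", "1 -20 1 1 1 1 20 1")
assert not validate_effect_value("hpfilter", "99999")  # above 22050 / 2
```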
bgm
String
No
Adds the specified background music to the synthesized speech. The background music file must be stored in Alibaba Cloud OSS (see Upload files), and its bucket must have at least public-read permissions.
If the background music URL contains XML special characters, such as &, <, and >, you must escape them.
Audio requirements:
There is no upper limit on the audio file size, but larger files may increase download time. If the duration of the synthesized content exceeds the duration of the background music, the background music is automatically looped to match the length of the synthesized audio.
Sample rate: 16 kHz
Number of sound channels: mono
File format: WAV
If the original audio is not in WAV format, use the ffmpeg tool to convert it: ffmpeg -i input_audio -acodec pcm_s16le -ac 1 -ar 16000 output.wav
Bit depth: 16-bit
Example:
<speak bgm="http://nls.alicdn.com/bgm/2.wav" backgroundMusicVolume="30" rate="0.8" volume="40"> <break time="2s"/> The old trees on the shady cliff are shrouded in mist <break time="700ms"/> The sound of rain is still in the bamboo forest <break time="700ms"/> I know that cotton contributes to the country's plan <break time="700ms"/> The scenery of Mianzhou is always pitiable <break time="2s"/> </speak>
Important
You are legally responsible for the copyright of the uploaded audio.
backgroundMusicVolume
String
No
Controls the volume of the background music.
Tag relationships
The <speak> tag can contain text and the following tags:
More examples
Empty attribute
<speak> Text that requires SSML tags </speak>
Attribute combination (separated by spaces)
<speak rate="1.5" pitch="0.8" volume="80"> So when put together, my voice sounds like this. </speak>
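When building <speak> tags programmatically, enforcing the documented ranges up front avoids request-time errors. This sketch is illustrative (the helper name is not part of any SDK) and assumes the ranges stated above: rate and pitch in [0.5, 2], volume in [0, 100].

```python
def speak_tag(text: str, rate: float = 1.0, pitch: float = 1.0, volume: int = 50) -> str:
    """Build a <speak> tag, enforcing the documented attribute ranges."""
    if not (0.5 <= rate <= 2) or not (0.5 <= pitch <= 2):
        raise ValueError("rate and pitch must be between 0.5 and 2")
    if not (0 <= volume <= 100):
        raise ValueError("volume must be an integer from 0 to 100")
    return f'<speak rate="{rate}" pitch="{pitch}" volume="{volume}">{text}</speak>'

print(speak_tag("So when put together, my voice sounds like this.",
                rate=1.5, pitch=0.8, volume=80))
```

Note that the text passed in must already be XML-escaped if it contains special characters.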
<break>: Controls pause duration
Description
Adds a period of silence during speech synthesis to simulate a natural pause. You can set the duration in seconds (s) or milliseconds (ms). This tag is optional.
Syntax
# Empty attribute
<break/>
# With the time attribute
<break time="string"/>
Properties
Note
If you use the <break> tag without attributes, the default pause duration is 1 s.
Property
Type
Required
Description
time
String
No
Sets the pause duration in seconds or milliseconds, such as "2s" or "50ms".
Valid values:
In seconds (s): an integer from 1 to 10.
In milliseconds (ms): an integer from 50 to 10000.
Example:
<speak> Please close your eyes and take a rest.<break time="500ms"/>Okay, please open your eyes. </speak>
Important
If you use multiple <break> tags consecutively, the total pause duration is the sum of the time specified in each tag. If the total duration exceeds 10 seconds, only the first 10 seconds take effect.
For example, in the following SSML segment, the cumulative duration of the <break> tags is 15 seconds, which exceeds the 10-second limit. The final pause duration will be truncated to 10 seconds:
<speak> Please close your eyes and take a rest.<break time="5s"/><break time="5s"/><break time="5s"/>Okay, please open your eyes. </speak>
Tag relationships
<break> is an empty tag and cannot contain any other tags.
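The 10-second truncation rule above can be checked locally. This rough sketch (a hypothetical helper, not a service API) sums every <break time="..."/> in a snippet of consecutive breaks and applies the cap; it assumes the time values follow the documented "Ns" / "Nms" formats.

```python
import re

def effective_pause_ms(ssml: str) -> int:
    """Sum <break> durations in a run of consecutive breaks, capped at 10 s."""
    total = 0
    for value, unit in re.findall(r'<break time="(\d+)(m?s)"/>', ssml):
        total += int(value) * (1 if unit == "ms" else 1000)
    return min(total, 10_000)  # pauses beyond 10 seconds are truncated

print(effective_pause_ms('<break time="5s"/><break time="5s"/><break time="5s"/>'))
# → 10000
```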
<sub>: Replaces text
Description
Replaces a string of text with a specified alternative that is read aloud instead. For example, the text "W3C" can be read as "network protocol". This tag is optional.
Syntax
<sub alias="string"></sub>
Properties
Property
Type
Required
Description
alias
String
Yes
Replaces a piece of text with text that is more suitable for reading.
Example:
<speak> <sub alias="network protocol">W3C</sub> </speak>
Tag relationships
The <sub> tag can only contain text.
<phoneme>: Specifies pronunciation (Pinyin/phonetic alphabet)
Description
Controls the pronunciation of a specific string of text. You can use Pinyin for Chinese and phonetic alphabet, such as CMU, for English. This tag is suitable for scenarios that require precise pronunciation and is optional.
Syntax
<phoneme alphabet="string" ph="string">text</phoneme>
Properties
Property
Type
Required
Description
alphabet
String
Yes
Specifies the pronunciation type: Pinyin (for Chinese) or phonetic alphabet (for English).
Valid values:
"py": Pinyin
"cmu": phonetic alphabet. For more information, see The CMU Pronouncing Dictionary.
ph
String
Yes
Specifies the specific Pinyin or phonetic alphabet:
The Pinyin for each character is separated by a space, and the number of Pinyin syllables must match the number of characters.
Each Pinyin syllable consists of a pronunciation part and a tone. The tone is an integer from 1 to 5, where 5 indicates a neutral tone.
Example:
<speak> 去<phoneme alphabet="py" ph="dian3 dang4 hang2">典当行</phoneme>把这个玩意<phoneme alphabet="py" ph="dang4 diao4">当掉</phoneme> </speak> <speak> How to spell <phoneme alphabet="cmu" ph="S AY N">sin</phoneme>? </speak>
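A mismatch between the number of Pinyin syllables and the number of characters is an easy mistake to make in the ph attribute. This illustrative check (not part of any SDK) verifies one syllable per character and that each syllable ends in a tone digit 1-5, per the rules above.

```python
def check_pinyin(text: str, ph: str) -> bool:
    """Verify one pinyin syllable (letters + tone 1-5) per character in text."""
    syllables = ph.split()
    if len(syllables) != len(text):
        return False  # syllable count must match character count
    return all(s[:-1].isalpha() and s[-1] in "12345" for s in syllables)

print(check_pinyin("典当行", "dian3 dang4 hang2"))  # → True
print(check_pinyin("当掉", "dang4"))                 # → False (count mismatch)
```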
Tag relationships
The <phoneme> tag can only contain text.
<soundEvent>: Inserts an external sound (such as a ringtone or a cat's meow)
Description
Allows you to insert sound effect files, such as prompt tones or ambient sounds, into the synthesized speech to enrich the audio output. This tag is optional.
Syntax
<soundEvent src="URL"/>
Properties
Property
Type
Required
Description
src
String
Yes
Sets the external audio URL.
The audio file must be stored in OSS (see Upload files), and its bucket must have at least public-read permissions. If the URL contains XML special characters, such as &, <, and >, you must escape them.
Audio requirements:
Sample rate: 16 kHz
Number of sound channels: mono
File format: WAV
If the original audio is not in WAV format, use the ffmpeg tool to convert it: ffmpeg -i input_audio -acodec pcm_s16le -ac 1 -ar 16000 output.wav
File size: no more than 2 MB
Bit depth: 16-bit
Example:
<speak> A horse was frightened<soundEvent src="http://nls.alicdn.com/sound-event/horse-neigh.wav"/>and people scattered to avoid it. </speak>
Important
You are legally responsible for the copyright of the uploaded audio.
Tag relationships
<soundEvent> is an empty tag and cannot contain any other tags.
<say-as>: Sets how text is read (such as numbers, dates, and phone numbers)
Description
Indicates the content type of a text string, which allows the model to read the text in the appropriate format. This tag is optional.
Syntax
<say-as interpret-as="string">text</say-as>
Properties
Property
Type
Required
Description
interpret-as
String
Yes
Indicates the information type of the text within the tag.
Valid values:
cardinal: Read as a cardinal number (integer or decimal).
digits: Read as individual digits. For example, 123 is read as one two three.
telephone: Read as a telephone number.
name: Read as a name.
address: Read as an address.
id: Suitable for account names and nicknames. Read in the conventional way.
characters: Read the text within the tag character by character.
punctuation: Read the text within the tag as punctuation marks.
date: Read as a date.
time: Read as a time.
currency: Read as a currency amount.
measure: Read as a unit of measure.
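The interpret-as values above form a fixed set, so it is worth rejecting unknown ones before sending a request. This illustrative builder (the function name is an assumption, not a service API) wraps text in a <say-as> tag only when the type is one of the documented values.

```python
def say_as(text: str, interpret_as: str) -> str:
    """Wrap text in a <say-as> tag using one of the documented types."""
    allowed = {"cardinal", "digits", "telephone", "name", "address", "id",
               "characters", "punctuation", "date", "time", "currency", "measure"}
    if interpret_as not in allowed:
        raise ValueError(f"unsupported interpret-as value: {interpret_as}")
    return f'<say-as interpret-as="{interpret_as}">{text}</say-as>'

print(say_as("123", "digits"))
# → <say-as interpret-as="digits">123</say-as>
```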
Supported formats for each <say-as> type
cardinal
Format
Example
English output
Description
Number string
145
one hundred forty five
Integer input range: positive or negative integers within 13 digits, [-999999999999, 999999999999].
Decimal input range: There is no special limit on the number of decimal places, but it is recommended not to exceed 10.
Number string starting with zero
0145
one hundred forty five
Negative sign + number string
-145
minus one hundred forty five
Three-digit number string separated by commas
60,000
sixty thousand
Negative sign + three-digit number string separated by commas
-208,000
minus two hundred eight thousand
Number string + decimal point + zero
12.00
twelve
Number string + decimal point + number string
12.34
twelve point three four
Three-digit number string separated by commas + decimal point + number string
1,000.1
one thousand point one
Negative sign + number string + decimal point + number string
-12.34
minus twelve point three four
Negative sign + three-digit number string separated by commas + decimal point + number string
-1,000.1
minus one thousand point one
(Three-digit comma-separated) number string + hyphen + (three-digit comma-separated) number
1-1,000
one to one thousand
Other default readings
012.34
twelve point three four
None
1/2
one half
-3/4
minus three quarters
5.1/6
five point one over six
-3 1/2
minus three and a half
1,000.3^3
one thousand point three to the power of three
3e9.1
three times ten to the power of nine point one
23.10%
twenty three point one percent
digits
Format
Example
English output
Description
Number string
12034
one two zero three four
There is no special limit on the length of the number string, but it is recommended not to exceed 20 digits.
When the number string is grouped by spaces or hyphens, a comma is inserted between the groups to create an appropriate pause. Up to 5 groups are supported.
Number string + space or hyphen + number string + space or hyphen + number string + space or hyphen + number string
1-23-456 7890
one, two three, four five six, seven eight nine zero
telephone
Format
Example
English output
Description
Number string
12034
one two oh three four
There is no special limit on the length of the number string, but it is recommended not to exceed 20 digits. When the number string is grouped by spaces or hyphens, a comma is inserted between the groups to create an appropriate pause. Up to 5 groups are supported.
Number string + space or hyphen + number string + space or hyphen + number string
1-23-456 7890
one, two three, four five six, seven eight nine oh
Plus sign + number string + space or hyphen + number string
+43-211-0567
plus four three, two one one, oh five six seven
Left parenthesis + number string + right parenthesis + space + number string + space or hyphen + number string
(21) 654-3210
(two one) six five four, three two one oh
address
This tag is not supported for English text.
id
For English text, this type functions the same as the characters type.
characters
Format
Example
English output
Description
string
*b+3$.c-0'=α
asterisk B plus three dollar dot C dash zero apostrophe equals alpha
Supports Chinese characters, uppercase and lowercase English characters, Arabic numerals 0-9, and some full-width and half-width characters.
The spaces in the output indicate that a pause is inserted between each character, meaning the characters are read one by one.
If the text within the tag contains XML special characters, you must escape them.
punctuation
For English text, this type functions the same as the characters type.
date
Format
Example
English output
Description
Four digits/two digits or four digits-two digits
2000/01
two thousand, oh one
Spans across years.
1900-01
nineteen hundred, oh one
2001-02
twenty oh one, oh two
2019-20
twenty nineteen, twenty
1998-99
nineteen ninety eight, ninety nine
1999-00
nineteen ninety nine, oh oh
Four-digit number starting with 1 or 2
2000
two thousand
Four-digit year.
1900
nineteen hundred
1905
nineteen oh five
2021
twenty twenty one
Day of the week-Day of the week
or
Day of the week~Day of the week
or
Day of the week&Day of the week
mon-wed
monday to wednesday
If the text in the day-of-the-week range tag contains special XML characters, escape the characters.
tue~fri
tuesday to friday
sat&sun
saturday and sunday
DD-DD MMM, YYYY
or
DD~DD MMM, YYYY
or
DD&DD MMM, YYYY
19-20 Jan, 2000
the nineteenth to the twentieth of january two thousand
DD indicates a two-digit day. MMM indicates the three-letter abbreviation or full name of a month. YYYY indicates a four-digit year starting with 1 or 2.
01 ~ 10 Jul, 2020
the first to the tenth of july twenty twenty
05&06 Apr, 2009
the fifth and the sixth of april two thousand nine
MMM DD-DD
or
MMM DD~DD
or
MMM DD&DD
Feb 01 - 03
february the first to the third
MMM indicates the three-letter abbreviation or full name of a month. DD indicates a two-digit day.
Aug 10-20
august the tenth to the twentieth
Dec 11&12
december the eleventh and the twelfth
MMM-MMM
or
MMM~MMM
or
MMM&MMM
Jan-Jun
january to june
MMM indicates the three-letter abbreviation or full name of a month.
Jul - Dec
july to december
sep&oct
september and october
YYYY-YYYY
or
YYYY~YYYY
1990 - 2000
nineteen ninety to two thousand
YYYY indicates a four-digit year that starts with 1 or 2.
2001-2021
two thousand one to twenty twenty one
WWW DD MMM YYYY
Sun 20 Nov 2011
sunday the twentieth of november twenty eleven
WWW is the three-letter abbreviation or full name for a day of the week. DD is a two-digit day. MMM is the three-letter abbreviation or full name for a month. MM is a two-digit month (or the three-letter abbreviation or full name for a month). YYYY is a four-digit year starting with 1 or 2.
WWW DD MMM
Sun 20 Nov
sunday the twentieth of november
WWW MMM DD YYYY
Sun Nov 20 2011
sunday november the twentieth twenty eleven
WWW MMM DD
Sun Nov 20
sunday november the twentieth
WWW YYYY-MM-DD
Sat 2010-10-01
saturday october the first twenty ten
WWW YYYY/MM/DD
Sat 2010/10/01
saturday october the first twenty ten
WWW MM/DD/YYYY
Sun 11/20/2011
sunday november the twentieth twenty eleven
MM/DD/YYYY
11/20/2011
november the twentieth twenty eleven
YYYY
1998
nineteen ninety eight
Other default readings
10 Mar, 2001
the tenth of march two thousand one
None
10 Mar
the tenth of march
Mar 2001
march two thousand one
Fri. 10/Mar/2001
friday the tenth of march two thousand one
Mar 10th, 2001
march the tenth two thousand one
Mar 10
march the tenth
2001/03/10
march the tenth two thousand one
2001-03-10
march the tenth two thousand one
2000s
two thousands
2010's
twenty tens
1900's
nineteen hundreds
1990s
nineteen nineties
time
Format
Example
English output
Description
HH:MM AM or PM
09:00 AM
nine A M
HH represents a one- or two-digit hour. MM represents a two-digit minute. AM/PM represents morning or afternoon.
09:03 PM
nine oh three P M
09:13 p.m.
nine thirteen p m
HH:MM
21:00
twenty one hundred
HHMM
100
one oclock
Time point-Time point
8:00 am - 05:30 pm
eight a m to five p m
Supports common time and time range formats.
7:05~10:15 AM
seven oh five to ten fifteen A M
09:00-13:00
nine oclock to thirteen hundred
currency
Format
Example
English output
Description
Number + Currency identifier
1.00 RMB
one yuan
Supported number formats: integers, decimals, and the international format that uses commas as thousands separators.
Supported currency identifiers:
CN¥ (yuan)
CNY (yuan)
RMB (yuan)
AUD (australian dollar)
CAD (canadian dollar)
CHF (swiss franc)
DKK (danish krone)
EUR (euro)
GBP (british pound)
HKD (Hong Kong (China) dollar)
JPY (japanese yen)
NOK (norwegian krone)
SEK (swedish krona)
SGD (singapore dollar)
USD (united states dollar)
2.02 CNY
two point zero two yuan
1,000.23 CN¥
one thousand point two three yuan
1.01 SGD
one singapore dollar and one cent
2.01 CAD
two canadian dollars and one cent
3.1 HKD
three hong kong dollars and ten cents
1,000.00 EUR
one thousand euros
Currency identifier + Number
US$ 1.00
one US dollar
Supported number formats: integers, decimals, and the international format that uses commas as thousands separators.
Supported currency identifiers:
US$ (US dollar)
CA$ (Canadian dollar)
AU$ (Australian dollar)
SG$ (Singapore dollar)
HK$ (Hong Kong (China) dollar)
C$ (Canadian dollar)
A$ (Australian dollar)
$ (dollar)
£ (pound)
€ (euro)
CN¥ (yuan)
CNY (yuan)
RMB (yuan)
AUD (australian dollar)
CAD (canadian dollar)
CHF (swiss franc)
DKK (danish krone)
EUR (euro)
GBP (british pound)
HKD (Hong Kong (China) dollar)
JPY (japanese yen)
NOK (norwegian krone)
SEK (swedish krona)
SGD (singapore dollar)
USD (united states dollar)
$0.01
one cent
JPY 1.01
one japanese yen and one sen
£1.1
one pound and ten pence
€2.01
two euros and one cent
USD 1,000
one thousand united states dollars
Number + Quantifier + Currency identifier
or
Currency identifier + Number + Quantifier
1.23 Tn RMB
one point two three trillion yuan
Supported quantifier formats include the following:
thousand
million
billion
trillion
Mil (million)
mil (million)
Bil (billion)
bil (billion)
MM (million)
Bn (billion)
bn (billion)
Tn (trillion)
tn (trillion)
K (thousand)
k (thousand)
M (million)
m (million)
$1.2 K
one point two thousand dollars
measure
Format
Example
English output
Description
Number + Unit of measurement
1.0 kg
one kilogram
Supports integers, decimals, and international notation with comma separators.
Supports common unit abbreviations.
1,234.01 km
one thousand two hundred thirty-four point zero one kilometers
Unit of measurement
mm²
square millimeter
The following table lists the pronunciations of common symbols for <say-as>.
Symbol
English pronunciation
!
exclamation mark
“
double quote
#
pound
$
dollar
%
percent
&
and
‘
left quote
(
left parenthesis
)
right parenthesis
*
asterisk
+
plus
,
comma
-
dash
.
dot
/
slash
:
colon
;
semicolon
<
less than
=
equals
>
greater than
?
question mark
@
at
[
left bracket
\
backslash
]
right bracket
^
caret
_
underscore
`
backtick
{
left brace
|
vertical bar
}
right brace
~
tilde
!
exclamation mark
“
left double quote
”
right double quote
‘
left quote
’
right quote
(
left parenthesis
)
right parenthesis
,
comma
。
full stop
—
em dash
:
colon
;
semicolon
?
question mark
、
enumeration comma
…
ellipsis
……
ellipsis
《
left guillemet
》
right guillemet
¥
yuan
≥
greater than or equal to
≤
less than or equal to
≠
not equal
≈
approximately equal
±
plus or minus
×
times
π
pi
Α
alpha
Β
beta
Γ
gamma
Δ
delta
Ε
epsilon
Ζ
zeta
Θ
theta
Ι
iota
Κ
kappa
∧
lambda
Μ
mu
Ν
nu
Ξ
ksi
Ο
omicron
Π
pi
Ρ
rho
Σ
sigma
Τ
tau
Υ
upsilon
Φ
phi
Χ
chi
Ψ
psi
Ω
omega
α
alpha
β
beta
γ
gamma
δ
delta
ε
epsilon
ζ
zeta
η
eta
θ
theta
ι
iota
κ
kappa
λ
lambda
μ
mu
ν
nu
ξ
ksi
ο
omicron
π
pi
ρ
rho
σ
sigma
τ
tau
υ
upsilon
φ
phi
χ
chi
ψ
psi
ω
omega
The following table lists common units of measurement for <say-as>.
Format
Category
English example
Abbreviation
Length
nm (nanometer), μm (micrometer), mm (millimeter), cm (centimeter), m (meter), km (kilometer), ft (foot), in (inch)
Area
cm² (square centimeter), m² (square meter), km² (square kilometer), SqFt (square foot)
Volume
cm³ (cubic centimeter), m³ (cubic meter), km³ (cubic kilometer), mL (milliliter), L (liter), gal (gallon)
Weight
μg (microgram), mg (milligram), g (gram), kg (kilogram)
Time
min (minute), sec (second), ms (millisecond)
Electromagnetism
μA (microamp), mA (milliamp), Hz (hertz), kHz (kilohertz), MHz (megahertz), GHz (gigahertz), V (volt), kV (kilovolt), kWh (kilowatt hour)
Sound
dB (decibel)
Atmospheric pressure
Pa (pascal), kPa (kilopascal), MPa (megapascal)
Other common units
Supports units of measurement that are not limited to the preceding categories, such as tsp (teaspoon), rpm (revolutions per minute), KB (kilobyte), and mmHg (millimeter of mercury).
Relationship
The <say-as> tag can contain text and the <vhml/> tag.
Examples
cardinal
<speak> <say-as interpret-as="cardinal">12345</say-as> </speak>
<speak> <say-as interpret-as="cardinal">10234</say-as> </speak>
digits
<speak> <say-as interpret-as="digits">12345</say-as> </speak>
<speak> <say-as interpret-as="digits">10234</say-as> </speak>
telephone
<speak> <say-as interpret-as="telephone">12345</say-as> </speak>
<speak> <say-as interpret-as="telephone">10234</say-as> </speak>
name
<speak> Her former name is <say-as interpret-as="name">Zeng Xiaofan</say-as> </speak>
address
<speak> <say-as interpret-as="address">Fulu International, Building 1, Unit 3, Room 304</say-as> </speak>
id
<speak> <say-as interpret-as="id">myid_1998</say-as> </speak>
characters
<speak> <say-as interpret-as="characters">Greek letters αβ</say-as> </speak>
<speak> <say-as interpret-as="characters">*b+3.c$=α</say-as> </speak>
punctuation
<speak> <say-as interpret-as="punctuation"> -./:;</say-as> </speak>
date
<speak> <say-as interpret-as="date">1000-10-10</say-as> </speak>
<speak> <say-as interpret-as="date">10-01-2020</say-as> </speak>
time
<speak> <say-as interpret-as="time">5:00am</say-as> </speak>
<speak> <say-as interpret-as="time">0500</say-as> </speak>
currency
<speak> <say-as interpret-as="currency">13,000,000.00RMB</say-as> </speak>
<speak> <say-as interpret-as="currency">$1,000.01</say-as> </speak>
measure
<speak> <say-as interpret-as="measure">100m12cm6mm</say-as> </speak>
<speak> <say-as interpret-as="measure">1,000.01kg</say-as> </speak>
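In practice, an SSML document often combines several `<say-as>` fragments into a single `<speak>` element. A minimal Python sketch for assembling such a document (the helper names `say_as` and `speak` are illustrative, not part of the DashScope SDK; the resulting string is what you would pass as the SSML input text):

```python
from xml.sax.saxutils import escape

def say_as(interpret_as: str, text: str) -> str:
    # Build one <say-as> element; text is XML-escaped so
    # characters such as "&" do not break the document.
    return f'<say-as interpret-as="{interpret_as}">{escape(text)}</say-as>'

def speak(*fragments: str) -> str:
    # Join the fragments into a single <speak> document.
    return "<speak> " + " ".join(fragments) + " </speak>"

ssml = speak(
    say_as("date", "10-01-2020"),
    say_as("currency", "$1,000.01"),
    say_as("measure", "1,000.01kg"),
)
```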