Overview

Last Updated: Sep 22, 2020

Intelligent Speech Interaction can synthesize speech from long text that contains thousands or tens of thousands of characters. The long-text-to-speech synthesis service converts text data to binary audio data.

Features

  • Supports the PCM, WAV, and MP3 audio encoding formats.

  • Allows you to set the speed, intonation, and volume of the speaker.

  • Allows you to set the speaker type of the generated speech, including male and female voices of different languages or dialects.

  • Supports both real-time and offline speech synthesis.

  • Compared with the speech synthesis service, the long-text-to-speech synthesis service has the following benefits:

    • Supports synthesizing speech from longer text that contains up to 100,000 characters at a time.

    • Synthesizes speech from long text quickly. The service can process 50,000 characters in as little as 10 minutes.

    • Supports caching synthesized audio data on the client for repeated use.

    • Allows you to set the speaker type of the generated speech based on the scenarios, such as novel and article reading.

Note

If you need to use the long-text-to-speech synthesis service, you must update the SDK to the latest version.

Supported speaker types

The following table describes the speaker types supported by long-text-to-speech synthesis.

Name | Value of the voice parameter | Category | Scenario | Supported language | Supported sampling rate (Hz) | Remarks
Xiaoyun | Xiaoyun | Standard female voice | Common scenario | Chinese and mixed Chinese and English | 8,000 and 16,000 | N/A
Xiaogang | Xiaogang | Standard male voice | Common scenario | Chinese and mixed Chinese and English | 8,000 and 16,000 | N/A
Ruoxi | Ruoxi | Gentle female voice | Common scenario | Chinese and mixed Chinese and English | 8,000, 16,000, and 24,000 | N/A
Siqi | Siqi | Gentle female voice | Common scenario | Chinese and mixed Chinese and English | 8,000, 16,000, and 24,000 | N/A
Sijia | Sijia | Standard female voice | Common scenario | Chinese and mixed Chinese and English | 8,000, 16,000, and 24,000 | N/A
Sicheng | Sicheng | Standard male voice | Common scenario | Chinese and mixed Chinese and English | 8,000, 16,000, and 24,000 | N/A
Aiqi | Aiqi | Gentle female voice | Common scenario | Chinese and mixed Chinese and English | 8,000 and 16,000 | N/A
Aijia | Aijia | Standard female voice | Common scenario | Chinese and mixed Chinese and English | 8,000 and 16,000 | N/A
Aicheng | Aicheng | Standard male voice | Common scenario | Chinese and mixed Chinese and English | 8,000 and 16,000 | N/A
Aida | Aida | Standard male voice | Common scenario | Chinese and mixed Chinese and English | 8,000 and 16,000 | N/A
Ninger | Ninger | Standard female voice | Common scenario | Chinese | 8,000, 16,000, and 24,000 | N/A
Ruilin | Ruilin | Standard female voice | Common scenario | Chinese | 8,000, 16,000, and 24,000 | N/A
Siyue | Siyue | Gentle female voice | Customer service scenario | Chinese and mixed Chinese and English | 8,000, 16,000, and 24,000 | N/A
Aiya | Aiya | Strict female voice | Customer service scenario | Chinese and mixed Chinese and English | 8,000 and 16,000 | N/A
Aixia | Aixia | Amiable female voice | Customer service scenario | Chinese and mixed Chinese and English | 8,000 and 16,000 | N/A
Aimei | Aimei | Sweet female voice | Customer service scenario | Chinese and mixed Chinese and English | 8,000 and 16,000 | N/A
Aiyu | Aiyu | Natural female voice | Customer service scenario | Chinese and mixed Chinese and English | 8,000 and 16,000 | N/A
Aiyue | Aiyue | Gentle female voice | Customer service scenario | Chinese and mixed Chinese and English | 8,000 and 16,000 | N/A
Aijing | Aijing | Strict female voice | Customer service scenario | Chinese and mixed Chinese and English | 8,000 and 16,000 | N/A
Xiaomei | Xiaomei | Sweet female voice | Customer service scenario | Chinese and mixed Chinese and English | 8,000, 16,000, and 24,000 | N/A
Aina | Aina | Female voice with Zhejiang accent | Customer service scenario | Chinese | 8,000 and 16,000 | N/A
Yina | Yina | Female voice with Zhejiang accent | Customer service scenario | Chinese | 8,000, 16,000, and 24,000 | N/A
Sijing | Sijing | Strict female voice | Customer service scenario | Chinese | 8,000, 16,000, and 24,000 | N/A
Sitong | Sitong | Child voice | Child voice scenario | Chinese | 8,000, 16,000, and 24,000 | N/A
Xiaobei | Xiaobei | Lolita female voice | Child voice scenario | Chinese | 8,000, 16,000, and 24,000 | N/A
Aitong | Aitong | Child voice | Child voice scenario | Chinese | 8,000 and 16,000 | N/A
Aiwei | Aiwei | Lolita female voice | Child voice scenario | Chinese | 8,000 and 16,000 | N/A
Aibao | Aibao | Lolita female voice | Child voice scenario | Chinese | 8,000 and 16,000 | N/A
Harry | Harry | Male voice with British accent | English scenario | English | 8,000 and 16,000 | N/A
Abby | Abby | Female voice with American accent | English scenario | English | 8,000 and 16,000 | N/A
Andy | Andy | Male voice with American accent | English scenario | English | 8,000 and 16,000 | N/A
Eric | Eric | Male voice with British accent | English scenario | English | 8,000 and 16,000 | N/A
Emily | Emily | Female voice with British accent | English scenario | English | 8,000 and 16,000 | N/A
Luna | Luna | Female voice with British accent | English scenario | English | 8,000 and 16,000 | N/A
Luca | Luca | Male voice with British accent | English scenario | English | 8,000 and 16,000 | N/A
Wendy | Wendy | Female voice with British accent | English scenario | English | 8,000, 16,000, and 24,000 | N/A
William | William | Male voice with British accent | English scenario | English | 8,000, 16,000, and 24,000 | N/A
Olivia | Olivia | Female voice with British accent | English scenario | English | 8,000, 16,000, and 24,000 | N/A
Shanshan | Shanshan | Cantonese female voice | Dialect scenario | Cantonese (simplified) and mixed Cantonese and English | 8,000, 16,000, and 24,000 | N/A
Aiyuan | Aiyuan | Amiable female voice | Literature scenario | Chinese and mixed Chinese and English | 8,000 and 16,000 | N/A
Aiying | Aiying | Lolita female voice | Literature scenario | Chinese and mixed Chinese and English | 8,000 and 16,000 | N/A
Aixiang | Aixiang | Charming male voice | Literature scenario | Chinese and mixed Chinese and English | 8,000 and 16,000 | N/A
Aimo | Aimo | Emotional male voice | Literature scenario | Chinese and mixed Chinese and English | 8,000 and 16,000 | N/A
Aiye | Aiye | Young male voice | Literature scenario | Chinese and mixed Chinese and English | 8,000 and 16,000 | N/A
Aiting | Aiting | Charming female voice | Literature scenario | Chinese and mixed Chinese and English | 8,000 and 16,000 | N/A
Aifan | Aifan | Emotional female voice | Literature scenario | Chinese and mixed Chinese and English | 8,000 and 16,000 | N/A
Lydia | Lydia | Female voice of mixed Chinese and English | English scenario | Chinese and mixed Chinese and English | 8,000 and 16,000 | In public preview
Xiaoyue | Xiaoyue | Female voice with Sichuan accent | Dialect scenario | Chinese and mixed Chinese and English | 8,000 and 16,000 | In public preview
Aishuo | Aishuo | Natural male voice | Customer service scenario | Chinese and mixed Chinese and English | 8,000 and 16,000 | In public preview
Aide | Aide | Strict male voice | Literature scenario | Chinese and mixed Chinese and English | 8,000 and 16,000 | In public preview
Qingqing | Qingqing | Female voice with Formosan accent | Dialect scenario | Chinese | 8,000 and 16,000 | In public preview
Cuijie | Cuijie | Female voice of Northeastern Mandarin | Dialect scenario | Chinese | 8,000 and 16,000 | In public preview
Xiaoze | Xiaoze | Male voice with strong Hunan accent | Dialect scenario | Chinese | 8,000 and 16,000 | In public preview
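
Because the supported sampling rates differ from speaker to speaker, it can help to check a voice and sampling rate pair on the client before sending a request. The following Python sketch does this for a small subset of the speaker table. The dictionary, the function name, and the case-insensitive lookup are assumptions made for illustration and are not part of the SDK.

```python
# Illustrative subset of the speaker table: voice value -> supported sampling rates (Hz).
# Keys are lowercased; whether the service itself is case-sensitive is not stated here.
SUPPORTED_SAMPLE_RATES = {
    "xiaoyun": (8000, 16000),
    "ruoxi": (8000, 16000, 24000),
    "siyue": (8000, 16000, 24000),
    "harry": (8000, 16000),
    "shanshan": (8000, 16000, 24000),
}


def check_voice(voice: str, sample_rate: int) -> None:
    """Raise ValueError if the pair is not listed in the local table."""
    rates = SUPPORTED_SAMPLE_RATES.get(voice.lower())
    if rates is None:
        raise ValueError(f"voice {voice!r} is not in the local table")
    if sample_rate not in rates:
        raise ValueError(f"{voice} lists {rates} Hz, not {sample_rate} Hz")


check_voice("Ruoxi", 24000)  # passes silently: Ruoxi lists 24,000 Hz
```

A real client would fill the dictionary from the full table rather than from this five-entry subset.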

Usage notes

  • The entered text must be UTF-8 encoded.

  • The features of the long-text-to-speech synthesis service are similar to those of speech synthesis. We recommend that you compare these two services for a deeper understanding.
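
The UTF-8 requirement and the 100,000-character limit can both be checked before a request is sent. The following sketch is one way to do so in Python. It assumes that "characters" means Unicode code points, which this document does not state, and the function name is hypothetical.

```python
MAX_CHARS = 100_000  # per-request limit quoted in the feature list


def prepare_text(raw: bytes) -> str:
    """Decode raw bytes as UTF-8 and enforce the length limit.

    Assumption: the limit is counted in Unicode code points.
    """
    text = raw.decode("utf-8")  # raises UnicodeDecodeError for non-UTF-8 input
    if not text.strip():
        raise ValueError("text is empty")
    if len(text) > MAX_CHARS:
        raise ValueError(f"text has {len(text)} characters, limit is {MAX_CHARS}")
    return text


text = prepare_text("Hello, long-text synthesis.".encode("utf-8"))
```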

Endpoints

Access type | Description | URL
External access from the Internet | This endpoint allows you to access the long-text-to-speech synthesis service from any host over the Internet. By default, the Internet access URL is built into the SDK. | wss://nls-gateway.cn-shanghai.aliyuncs.com/ws/v1
Internal access from an Elastic Compute Service (ECS) instance located in the China (Shanghai) region | This endpoint allows you to access the long-text-to-speech synthesis service over an internal network from an ECS instance located in the China (Shanghai) region. | ws://nls-gateway.cn-shanghai-internal.aliyuncs.com:80/ws/v1

Note

  • Access from an ECS instance over the internal network does not consume Internet traffic.

  • You cannot access an AnyTunnel virtual IP address (VIP) from an ECS instance in the classic network. This means that you cannot use such an ECS instance to access the long-text-to-speech synthesis service over an internal network. To access this service by using an AnyTunnel VIP, you must create a virtual private cloud (VPC) and access the service from the VPC. For more information about the network types of ECS instances, see Network types.
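
A client that runs both inside and outside Alibaba Cloud may want to choose between the two URLs at startup. The following sketch shows one simple way to do this. The function name and the boolean flag are hypothetical, and how you detect that the code runs on an ECS instance in a VPC in China (Shanghai) is left to your deployment.

```python
from urllib.parse import urlparse

PUBLIC_URL = "wss://nls-gateway.cn-shanghai.aliyuncs.com/ws/v1"
INTERNAL_URL = "ws://nls-gateway.cn-shanghai-internal.aliyuncs.com:80/ws/v1"


def pick_endpoint(on_shanghai_vpc_ecs: bool) -> str:
    """Return the internal URL only for ECS instances in a VPC in China (Shanghai)."""
    return INTERNAL_URL if on_shanghai_vpc_ecs else PUBLIC_URL


url = pick_endpoint(on_shanghai_vpc_ecs=False)
print(urlparse(url).scheme, urlparse(url).hostname)
```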

Interaction process

Figure: Long-text-to-speech synthesis interaction flowchart

Note

  • The interaction flowchart applies to the SDK for Java and SDK for C++, but does not apply to the RESTful API. For more information about the flowchart for the RESTful API, see RESTful API.

  • In addition to audio streams returned in the response, the server adds the task_id field to the response header for all responses to indicate the ID of the synthesis task. We recommend that you record the value of this field. If an error occurs, you can submit a ticket and provide the task ID and error message for troubleshooting.

1. Authenticate the client

To establish a WebSocket connection with the server, the client must use a token for authentication. For more information about how to obtain the token, see Obtain a Token.

2. Start the synthesis task

Use the corresponding set methods of the SpeechSynthesizer object to set the request parameters. The client then sends the speech synthesis request. The following table describes the request parameters.

Parameter | Type | Required | Description
appkey | String | Yes | The appkey of your project created in the Intelligent Speech Interaction console.
text | String | Yes | The text to be processed, which must be UTF-8 encoded. Spaces are required between words.
voice | String | No | The speaker type. Default value: xiaoyun.
format | String | No | The audio encoding format. Default value: pcm. Valid values: pcm, wav, and mp3.
sample_rate | Integer | No | The audio sampling rate, in Hz. Default value: 16000.
volume | Integer | No | The volume of the speaker. Value range: 0 to 100. Default value: 50.
speech_rate | Integer | No | The speed of the speaker. Value range: -500 to 500. Default value: 0.
pitch_rate | Integer | No | The intonation of the speaker. Value range: -500 to 500. Default value: 0.
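
The defaults and value ranges in the table can be captured in a small container that rejects out-of-range values before a request is built. The following Python sketch is an illustration only. The SynthesisParams class is hypothetical and is not an SDK type, and the SDK's own set methods remain the way to pass these values to the service.

```python
from dataclasses import asdict, dataclass

VALID_FORMATS = ("pcm", "wav", "mp3")


@dataclass(frozen=True)
class SynthesisParams:
    """Hypothetical holder for the request parameters, with the documented defaults."""
    appkey: str
    text: str
    voice: str = "xiaoyun"
    format: str = "pcm"
    sample_rate: int = 16000
    volume: int = 50
    speech_rate: int = 0
    pitch_rate: int = 0

    def __post_init__(self) -> None:
        if not self.appkey:
            raise ValueError("appkey is required")
        if not self.text:
            raise ValueError("text is required")
        if self.format not in VALID_FORMATS:
            raise ValueError(f"format must be one of {VALID_FORMATS}")
        if not 0 <= self.volume <= 100:
            raise ValueError("volume must be in the range 0 to 100")
        for name in ("speech_rate", "pitch_rate"):
            if not -500 <= getattr(self, name) <= 500:
                raise ValueError(f"{name} must be in the range -500 to 500")


params = SynthesisParams(appkey="your-appkey", text="Hello world", format="wav")
print(asdict(params))
```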

3. Receive the synthesized audio data

The server returns the synthesized speech binary data, and the SDK receives and processes the binary data.
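
When the requested format is pcm, the received chunks carry no header, so a common approach is to wrap them in a WAV container as they arrive. The following sketch uses Python's standard wave module. It assumes mono, 16-bit samples, which this document does not state, and it takes any iterable of byte chunks in place of the SDK's receive callback.

```python
import wave
from typing import Iterable


def save_pcm_chunks(chunks: Iterable[bytes], path: str, sample_rate: int = 16000) -> int:
    """Append raw PCM chunks to a WAV file and return the number of bytes written."""
    total = 0
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)           # assumption: mono audio
        wav.setsampwidth(2)           # assumption: 16-bit samples
        wav.setframerate(sample_rate)
        for chunk in chunks:          # each chunk is written as soon as it arrives
            wav.writeframes(chunk)
            total += len(chunk)
    return total
```

The sample_rate argument should match the sample_rate value sent in the request.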

4. Complete the synthesis task

After the synthesis task is completed, the server sends a notification message. Example:

{
    "header":{
        "namespace":"SpeechLongSynthesizer",
        "name":"SynthesisCompleted",
        "status":20000000,
        "message_id":"396c80b3abf84082a48cb9e5c424****",
        "task_id":"f5805be640364cdcafc8da63e512****",
        "status_text":"Gateway:SUCCESS:Success."
    }
}

Note

In the demo, the synthesized audio is stored in a file. If you need to play the synthesized audio in real time, we recommend that you use stream playback. The stream playback mode allows you to play the synthesized audio while audio data is being received. You do not need to wait until the synthesis task is completed. This reduces the latency.
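
The completion message shown in step 4 can be checked with a few lines of Python. The following sketch reads the header, tests for the SynthesisCompleted name and the status value shown in that example, and returns the task_id so that it can be logged for troubleshooting. The function name is hypothetical.

```python
import json
from typing import Optional, Tuple

SUCCESS_STATUS = 20000000  # status value shown in the example completion message


def parse_completion(message: str) -> Tuple[bool, Optional[str]]:
    """Return (completed_ok, task_id) for a JSON notification message."""
    header = json.loads(message)["header"]
    completed_ok = (
        header.get("name") == "SynthesisCompleted"
        and header.get("status") == SUCCESS_STATUS
    )
    return completed_ok, header.get("task_id")


example = """{"header": {"namespace": "SpeechLongSynthesizer",
  "name": "SynthesisCompleted", "status": 20000000,
  "message_id": "396c80b3abf84082a48cb9e5c424****",
  "task_id": "f5805be640364cdcafc8da63e512****",
  "status_text": "Gateway:SUCCESS:Success."}}"""
ok, task_id = parse_completion(example)
print(ok, task_id)
```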

Service status codes

Each response contains a status field, which indicates the service status code. The following tables describe service status codes.

  • Common errors

    Error code | Cause | Solution
    40000001 | The authentication fails. | Check whether the token that you use is correct and whether it has expired.
    40000002 | The text is invalid. | Check whether the uploaded text meets the requirements.
    403 | The token expires or the request contains invalid parameters. | Check whether the token used by the client is valid. Then, check the settings of the request parameters.
    40000004 | The idle connection times out. | Check whether the client has sent no data to the server for 10 consecutive seconds.
    40000005 | The number of requests exceeds the upper limit. | Check whether the number of concurrent connections or the queries per second (QPS) exceeds the upper limit.
    40000000 | A client error occurred. This is the default client error code. | Resolve the error based on the error message or submit a ticket.
    50000000 | A server error occurred. This is the default server error code. | If the error code is occasionally returned, ignore it. If the error code is returned multiple times, submit a ticket.
    50000001 | An internal call error occurred. | If the error code is occasionally returned, ignore it. If the error code is returned multiple times, submit a ticket.

  • Gateway errors

    Error code | Cause | Solution
    40010001 | The method is not supported. | If you use the SDK, submit a ticket.
    40010002 | The instruction is not supported. | If you use the SDK, submit a ticket.
    40010003 | The instruction is invalid. | If you use the SDK, submit a ticket.
    40010004 | The client is disconnected. | Check whether the client is disconnected before the server completes the requested task.
    40010005 | The task is in an abnormal state. | Check whether the instruction is supported in the current task status.

  • Configuration errors

    Error code | Cause | Solution
    40020105 | The specified appkey is invalid. | Check whether the appkey is correct and belongs to the same Alibaba Cloud account as the token.

  • Service errors

    Error code | Cause | Solution
    41020001 | A specified parameter is invalid. | Check whether the specified parameters are correct.
    51020001 | An error occurred on the server. | If the error code is occasionally returned, ignore it. If the error code is returned multiple times, submit a ticket.
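
A client can turn the status field into a readable message with a simple lookup. The following Python sketch covers the codes listed in the tables above. The function names are hypothetical. Treating the codes whose solution says "if occasionally returned, ignore it" as transient is an interpretation made for this sketch, not a documented retry policy.

```python
SUCCESS_STATUS = 20000000  # value shown in the example completion message

# status code -> (category, cause), copied from the status code tables
STATUS_CODES = {
    40000001: ("common", "The authentication fails."),
    40000002: ("common", "The text is invalid."),
    403: ("common", "The token expires or the request contains invalid parameters."),
    40000004: ("common", "The idle connection times out."),
    40000005: ("common", "The number of requests exceeds the upper limit."),
    40000000: ("common", "A client error occurred."),
    50000000: ("common", "A server error occurred."),
    50000001: ("common", "An internal call error occurred."),
    40010001: ("gateway", "The method is not supported."),
    40010002: ("gateway", "The instruction is not supported."),
    40010003: ("gateway", "The instruction is invalid."),
    40010004: ("gateway", "The client is disconnected."),
    40010005: ("gateway", "The task is in an abnormal state."),
    40020105: ("configuration", "The specified appkey is invalid."),
    41020001: ("service", "A specified parameter is invalid."),
    51020001: ("service", "An error occurred on the server."),
}

# Assumption: codes that may be ignored when they occur only occasionally.
TRANSIENT = frozenset({50000000, 50000001, 51020001})


def describe_status(status: int) -> str:
    """Return a short 'category: cause' string for a status code."""
    if status == SUCCESS_STATUS:
        return "success"
    category, cause = STATUS_CODES.get(status, ("unknown", "Unlisted status code."))
    return f"{category}: {cause}"


def is_transient(status: int) -> bool:
    """True for server-side codes that the tables say can be ignored if occasional."""
    return status in TRANSIENT


print(describe_status(40020105))
```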