Intelligent Speech Interaction: Overview of speech synthesis

Last Updated: Sep 18, 2023

The speech synthesis service converts input text to binary audio data.

Features

  • Supports the following audio coding formats: pulse-code modulation (PCM), WAV, and MP3.

  • Allows you to configure the speed, intonation, and volume of the speaker.

  • Allows you to set the speaker of the generated speech, including male voices and female voices for different languages or dialects.

    Important

    Supports phoneme boundary detection for each Chinese character or English word. The speech synthesis service generates a timestamp for each Chinese character or English word in the synthesized speech. The timestamp indicates the point in time at which that character or word occurs in the speech, and can be used for lip synchronization or dubbing. For more information, see Timestamp feature.

    | Name | Value of the voice parameter | Type | Scenario | Supported language | Supported sampling rate (Hz) | Phoneme boundary detection for each character or word | Remarks |
    | --- | --- | --- | --- | --- | --- | --- | --- |
    | Xiaoyun | Xiaoyun | Standard female voice | All scenarios | Chinese or bilingual (Chinese and English) | 8K/16K | No | None |
    | Xiaogang | Xiaogang | Standard male voice | All scenarios | Chinese or bilingual (Chinese and English) | 8K/16K | No | None |
    | Ruoxi | Ruoxi | Gentle female voice | All scenarios | Chinese or bilingual (Chinese and English) | 8K/16K/24K | No | None |
    | Siqi | Siqi | Gentle female voice | All scenarios | Chinese or bilingual (Chinese and English) | 8K/16K/24K | Yes | None |
    | Sijia | Sijia | Standard female voice | All scenarios | Chinese or bilingual (Chinese and English) | 8K/16K/24K | No | None |
    | Sicheng | Sicheng | Standard male voice | All scenarios | Chinese or bilingual (Chinese and English) | 8K/16K/24K | Yes | None |
    | Aiqi | Aiqi | Gentle female voice | All scenarios | Chinese or bilingual (Chinese and English) | 8K/16K | Yes | None |
    | Aijia | Aijia | Standard female voice | All scenarios | Chinese or bilingual (Chinese and English) | 8K/16K | Yes | None |
    | Aicheng | Aicheng | Standard male voice | All scenarios | Chinese or bilingual (Chinese and English) | 8K/16K | Yes | None |
    | Aida | Aida | Standard male voice | All scenarios | Chinese or bilingual (Chinese and English) | 8K/16K | Yes | None |
    | Ning'er | Ninger | Standard female voice | All scenarios | Chinese only | 8K/16K/24K | No | None |
    | Ruilin | Ruilin | Standard female voice | All scenarios | Chinese only | 8K/16K/24K | No | None |
    | Siyue | Siyue | Gentle female voice | Customer service | Chinese or bilingual (Chinese and English) | 8K/16K/24K | No | None |
    | Aiya | Aiya | Harsh female voice | Customer service | Chinese or bilingual (Chinese and English) | 8K/16K | Yes | None |
    | Aixia | Aixia | Amiable female voice | Customer service | Chinese or bilingual (Chinese and English) | 8K/16K | Yes | None |
    | Aimei | Aimei | Sweet female voice | Customer service | Chinese or bilingual (Chinese and English) | 8K/16K | Yes | None |
    | Aiyu | Aiyu | Natural female voice | Customer service | Chinese or bilingual (Chinese and English) | 8K/16K | Yes | None |
    | Aiyue | Aiyue | Gentle female voice | Customer service | Chinese or bilingual (Chinese and English) | 8K/16K | Yes | None |
    | Aijing | Aijing | Harsh female voice | Customer service | Chinese or bilingual (Chinese and English) | 8K/16K | Yes | None |
    | Xiaomei | Xiaomei | Sweet female voice | Customer service | Chinese or bilingual (Chinese and English) | 8K/16K/24K | No | None |
    | Aina | Aina | Female voice with Zhejiang accent | Customer service | Chinese only | 8K/16K | Yes | None |
    | Yina | Yina | Female voice with Zhejiang accent | Customer service | Chinese only | 8K/16K/24K | No | None |
    | Sijing | Sijing | Harsh female voice | Customer service | Chinese only | 8K/16K/24K | Yes | None |
    | Sitong | Sitong | Child voice | Scenarios in which child voices are required | Chinese only | 8K/16K/24K | No | None |
    | Xiaobei | Xiaobei | Little girl voice | Scenarios in which child voices are required | Chinese only | 8K/16K/24K | Yes | None |
    | Aitong | Aitong | Child voice | Scenarios in which child voices are required | Chinese only | 8K/16K | Yes | None |
    | Aiwei | Aiwei | Little girl voice | Scenarios in which child voices are required | Chinese only | 8K/16K | Yes | None |
    | Aibao | Aibao | Little girl voice | Scenarios in which child voices are required | Chinese only | 8K/16K | Yes | None |
    | Harry | Harry | Male voice with British accent | English only | English only | 8K/16K | No | None |
    | Abby | Abby | Female voice with American accent | English only | English only | 8K/16K | No | None |
    | Andy | Andy | Male voice with American accent | English only | English only | 8K/16K | No | None |
    | Eric | Eric | Male voice with British accent | English only | English only | 8K/16K | No | None |
    | Emily | Emily | Female voice with British accent | English only | English only | 8K/16K | No | None |
    | Luna | Luna | Female voice with British accent | English only | English only | 8K/16K | No | None |
    | Luca | Luca | Male voice with British accent | English only | English only | 8K/16K | No | None |
    | Wendy | Wendy | Female voice with British accent | English only | English only | 8K/16K/24K | No | None |
    | William | William | Male voice with British accent | English only | English only | 8K/16K/24K | No | None |
    | Olivia | Olivia | Female voice with British accent | English only | English only | 8K/16K/24K | No | None |
    | Shanshan | Shanshan | Voice of a female that speaks Cantonese | Scenarios in which dialects are used | Cantonese (simplified) and bilingual (Cantonese and English) | 8K/16K/24K | No | None |
    | Xiaoyue | Xiaoyue | Female voice with Sichuan accent | Scenarios in which dialects are used | Chinese or bilingual (Chinese and English) | 8K/16K | No | In public preview |
    | Lydia | Lydia | Female voice with bilingual (Chinese and English) | English only | English only | 8K/16K | No | In public preview |
    | Aishuo | Aishuo | Natural male voice | Customer service | Chinese or bilingual (Chinese and English) | 8K/16K | Yes | In public preview |
    | Qingqing | Qingqing | Voice of a female that speaks Taiwanese | Scenarios in which dialects are used | Chinese only | 8K/16K | No | In public preview |
    | Cuijie | Cuijie | Voice of a female that speaks Northeastern Mandarin | Scenarios in which dialects are used | Chinese only | 8K/16K | No | In public preview |
    | Xiaoze | Xiaoze | Male voice with strong Hunan accent | Scenarios in which dialects are used | Chinese only | 8K/16K | Yes | In public preview |

Limits

  • The input text must be UTF-8 encoded.

  • The input text can be up to 300 characters in length. If the text contains more than 300 characters, the characters beyond the limit are discarded and are not synthesized.
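Because characters beyond the limit are dropped, longer text needs to be split on the client before it is sent. The following sample is a minimal sketch of one way to do this, not part of the SDK. The function names `split_text` and `to_utf8` are hypothetical, and the sketch assumes that the service counts characters the same way Python counts the length of a string.

```python
MAX_CHARS = 300  # per-request limit stated in the Limits section


def split_text(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into pieces of at most `limit` characters.

    Prefers to cut after punctuation or whitespace so that a piece does
    not end in the middle of a word when a break point is available.
    """
    breakers = "。！？.!?；;，, \n"
    pieces = []
    rest = text.strip()
    while len(rest) > limit:
        window = rest[:limit]
        # Position of the last natural break point inside the window.
        cut = max(window.rfind(ch) for ch in breakers)
        # Fall back to a hard cut when the window has no usable break point.
        end = cut + 1 if cut > 0 else limit
        pieces.append(rest[:end].strip())
        rest = rest[end:].strip()
    if rest:
        pieces.append(rest)
    return pieces


def to_utf8(text: str) -> bytes:
    """Encode text as UTF-8, as the service requires."""
    return text.encode("utf-8")
```

Each returned piece can then be sent as the text of a separate synthesis request, and the resulting audio segments can be played or concatenated in order.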

Service addresses

| Type | Description | URL |
| --- | --- | --- |
| Access from external networks | You can use the URL to access the speech synthesis service from all clients over the Internet. The URL for external access is specified as the default URL in the SDK. | wss://nls-gateway-ap-southeast-1.aliyuncs.com/ws/v1 |

1. Provide a token to pass the authentication

Establish a WebSocket connection from your client to the server and provide a token to pass the authentication. For more information about how to obtain a token, see Obtain a token.

2. Start the synthesis task

The client sends a request to start speech synthesis. You can use the setter methods of the SpeechSynthesizer object in the SDK to configure request parameters. The following table describes the request parameters.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| appkey | String | Yes | The appkey of your project that is created in the Intelligent Speech Interaction console. |
| text | String | Yes | The text that you want to synthesize. The text must be UTF-8 encoded. The text can be up to 300 characters in length. Use space characters to separate the words in the text. |
| voice | String | No | The speaker that you want to use. Default value: xiaoyun. |
| format | String | No | The audio coding format. Default value: pcm. Valid values: pcm, wav, and mp3. |
| sample_rate | Integer | No | The audio sampling rate. Unit: Hz. Default value: 16000. |
| volume | Integer | No | The voice volume of the speaker. Value range: 0 to 100. Default value: 50. |
| speech_rate | Integer | No | The speed at which the speaker speaks. Value range: -500 to 500. Default value: 0. |
| pitch_rate | Integer | No | The intonation of the speaker. Value range: -500 to 500. Default value: 0. |
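The defaults and value ranges in the preceding table can be checked on the client before a request is sent. The following sample is a minimal sketch under stated assumptions: `SynthesisParams` is a hypothetical helper class, not the SDK's SpeechSynthesizer object, and the dictionary that it produces is not necessarily the format that the SDK sends over the connection.

```python
from dataclasses import asdict, dataclass

VALID_FORMATS = {"pcm", "wav", "mp3"}


@dataclass
class SynthesisParams:
    """Hypothetical container for the documented request parameters."""
    appkey: str
    text: str
    voice: str = "xiaoyun"      # documented default
    format: str = "pcm"         # documented default
    sample_rate: int = 16000    # Hz, documented default
    volume: int = 50            # 0 to 100
    speech_rate: int = 0        # -500 to 500
    pitch_rate: int = 0         # -500 to 500

    def validate(self) -> None:
        """Raise ValueError if a value violates the documented constraints."""
        if not self.appkey:
            raise ValueError("appkey is required")
        if not self.text:
            raise ValueError("text is required")
        if len(self.text) > 300:
            raise ValueError("text exceeds 300 characters")
        if self.format not in VALID_FORMATS:
            raise ValueError(f"unsupported format: {self.format}")
        if not 0 <= self.volume <= 100:
            raise ValueError("volume must be in the range 0 to 100")
        if not -500 <= self.speech_rate <= 500:
            raise ValueError("speech_rate must be in the range -500 to 500")
        if not -500 <= self.pitch_rate <= 500:
            raise ValueError("pitch_rate must be in the range -500 to 500")

    def to_dict(self) -> dict:
        """Validate, then return the parameters as a plain dictionary."""
        self.validate()
        return asdict(self)
```

For example, `SynthesisParams(appkey="your-appkey", text="Hello").to_dict()` returns all eight parameters with the optional ones set to their documented defaults.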

3. Receive the synthesized audio data

The server returns the synthesized audio data in binary format. The client receives and processes the audio data by using the SDK.
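If the format parameter is set to pcm, the received bytes carry no header, so most players cannot open them directly. The following sample is a minimal sketch that wraps received PCM data in a WAV container by using the wave module of the Python standard library. The function name `pcm_to_wav` is hypothetical, and the sketch assumes 16-bit mono samples, which this topic does not state.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int = 16000,
               channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw PCM bytes in a WAV container and return the WAV bytes.

    Assumption: samples are 16-bit (2 bytes) mono. Adjust `channels` and
    `sample_width` if the audio that you receive is different.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)  # must match the requested sample_rate
        wav.writeframes(pcm)
    return buf.getvalue()
```

The sample_rate argument must match the sample_rate value of the synthesis request. Otherwise, the audio plays too fast or too slow.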

4. Complete the synthesis task

After the synthesis task is completed, the server sends a notification message. The following example shows a sample notification message:

{
    "header": {
        "message_id": "05450bf69c53413f8d88aed1ee60****",
        "task_id": "640bc797bb684bd6960185651307****",
        "namespace": "SpeechSynthesizer",
        "name": "SynthesisCompleted",
        "status": 20000000,
        "status_message": "GATEWAY|SUCCESS|Success."
    }
}
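A client can inspect the header of this message to decide whether the task finished successfully. The following sample is a minimal sketch that uses only the fields shown in the preceding message. The function name `is_synthesis_completed` is hypothetical, and the sketch assumes that the status value 20000000 indicates success, as the sample message suggests.

```python
import json

SUCCESS_STATUS = 20000000  # assumption: the value in the sample means success


def is_synthesis_completed(raw: str) -> bool:
    """Return True if `raw` is a successful SynthesisCompleted notification.

    Raises RuntimeError if the header carries any other status code.
    """
    header = json.loads(raw).get("header", {})
    if header.get("status") != SUCCESS_STATUS:
        raise RuntimeError(
            f"task {header.get('task_id')} failed with status "
            f"{header.get('status')}: {header.get('status_message')}"
        )
    return (header.get("namespace") == "SpeechSynthesizer"
            and header.get("name") == "SynthesisCompleted")
```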
Note

In the demo, the synthesized audio is stored in a file. If you want to play the synthesized audio during the reception process, we recommend that you use stream playback. The stream playback mode allows you to play the synthesized audio while audio data is being received. This reduces the amount of time that you need to wait before you can play the audio.
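The following sample is a minimal sketch of the stream playback idea: one thread receives audio chunks while the main thread hands each chunk to the player as soon as it arrives, instead of waiting for the whole file. The names `stream_play` and `write` are hypothetical. In a real client, the chunks come from the SDK and `write` is the write function of your audio output device.

```python
import queue
import threading

_END = None  # sentinel that marks the end of the audio stream


def _receive(chunks, q: queue.Queue) -> None:
    """Stand-in for the receiving side: queue each chunk as it arrives."""
    for chunk in chunks:
        q.put(chunk)
    q.put(_END)


def stream_play(chunks, write) -> int:
    """Pass each chunk to `write` as soon as it is received.

    Returns the total number of bytes that were played.
    """
    q = queue.Queue(maxsize=8)  # small buffer between receiver and player
    receiver = threading.Thread(target=_receive, args=(chunks, q), daemon=True)
    receiver.start()
    total = 0
    while True:
        chunk = q.get()
        if chunk is _END:
            break
        write(chunk)  # playback starts with the first chunk
        total += len(chunk)
    receiver.join()
    return total
```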

Status codes

Each response contains a status code. The following tables describe the status codes.

Common errors

| Status code | Cause | Solution |
| --- | --- | --- |
| 40000001 | The client failed to pass the authentication. | Check whether the token that is used by the client is valid or expired. |
| 40000002 | The request is invalid. | Check whether the request that is sent by the client meets the requirements. |
| 403 | The token is expired or the request contains invalid parameters. | Check whether the token that is used by the client is expired. Then, check whether the parameter values are valid. |
| 40000004 | The client timed out. | Check whether the client did not send data to the server for a long period of time, such as 10 seconds. |
| 40000005 | The number of requests exceeds the upper limit. | Check whether the number of concurrent connections or queries per second (QPS) value exceeds the upper limit. If the number of concurrent connections exceeds the upper limit, we recommend that you upgrade Intelligent Speech Interaction from Trial Edition to Commercial Edition. If you use Commercial Edition, we recommend that you purchase more resources to increase the concurrency. |
| 40000000 | A client error occurred. This is the default status code for client errors. | Resolve the error based on the error message or submit a ticket. |
| 50000000 | A server error occurred. This is the default status code for server errors. | If the status code is occasionally returned, ignore it. If the status code is returned multiple times, submit a ticket. |
| 50000001 | An internal call error occurred. | If the status code is occasionally returned, ignore it. If the status code is returned multiple times, submit a ticket. |

Gateway errors

| Status code | Cause | Solution |
| --- | --- | --- |
| 40010001 | The method is not supported. | If you use the SDK, submit a ticket. |
| 40010002 | The instruction is not supported. | If you use the SDK, submit a ticket. |
| 40010003 | The instruction format is invalid. | If you use the SDK, submit a ticket. |
| 40010004 | The client unexpectedly disconnected. | Check whether the client disconnected before the server completed the requested task. |
| 40010005 | The task status is invalid. | Check whether the instruction is supported when the task is in the current state. |

Configuration errors

| Status code | Cause | Solution |
| --- | --- | --- |
| 40020105 | The application does not exist. | Check whether the appkey is correct and belongs to the same Alibaba Cloud account as the token. |

Text-to-speech (TTS) service errors

| Status code | Cause | Solution |
| --- | --- | --- |
| 41020001 | One or more parameters are invalid. | Check whether the specified parameter values are valid. |
| 51020001 | A TTS server error occurred. | If the status code is occasionally returned, ignore it. If the status code is returned multiple times, submit a ticket. |
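The status codes in the preceding tables can be collected in a lookup table so that a client can log a readable cause and decide how to react. The following sample is a minimal sketch. The names `classify` and `explain` are hypothetical. The sketch assumes that 20000000 means success and that the codes whose solution is to ignore occasional occurrences are safe to retry. This topic does not state either point explicitly.

```python
# Status codes and short causes taken from the tables in this topic.
STATUS_CODES = {
    40000000: "default client error",
    40000001: "authentication failed",
    40000002: "invalid request",
    403: "token expired or invalid parameters",
    40000004: "client timed out",
    40000005: "request limit exceeded",
    50000000: "default server error",
    50000001: "internal call error",
    40010001: "method not supported",
    40010002: "instruction not supported",
    40010003: "invalid instruction format",
    40010004: "client disconnected unexpectedly",
    40010005: "invalid task status",
    40020105: "application does not exist",
    41020001: "invalid TTS parameters",
    51020001: "TTS server error",
}

SUCCESS = 20000000  # assumption: value from the sample completion message

# Assumption: codes documented as "ignore if occasional" are treated as
# transient errors that are worth a retry.
TRANSIENT = {50000000, 50000001, 51020001}


def classify(status: int) -> str:
    """Return 'ok', 'retry', or 'fail' for a response status code."""
    if status == SUCCESS:
        return "ok"
    if status in TRANSIENT:
        return "retry"
    return "fail"


def explain(status: int) -> str:
    """Return the short cause for a status code, if it is documented."""
    return STATUS_CODES.get(status, "unknown status code")
```

If a transient code keeps being returned after a few retries, follow the documented solution and submit a ticket instead of retrying indefinitely.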