Overview of speech synthesis - Intelligent Speech Interaction

The speech synthesis service converts input text into binary audio data.

Features

Supports the following audio coding formats: pulse-code modulation (PCM), WAV, and MP3.
Allows you to configure the speed, intonation, and volume of the speaker.

Allows you to set the speaker of the generated speech, including male voices and female voices for different languages or dialects.

Important

Supports phoneme boundary detection for each Chinese character or English word. The speech synthesis service generates a timestamp for each Chinese character or English word in the synthesized speech. The timestamp indicates when that character or word occurs in the audio, and can be used for lip synchronization or dubbing. For more information, see Timestamp feature.

Name	Value of the voice parameter	Type	Scenario	Supported language	Supported sampling rate (Hz)	Phoneme boundary detection for each character or word	Remarks
Xiaoyun	Xiaoyun	Standard female voice	All scenarios	Chinese or bilingual (Chinese and English)	8K/16K	No	None
Xiaogang	Xiaogang	Standard male voice	All scenarios	Chinese or bilingual (Chinese and English)	8K/16K	No	None
Ruoxi	Ruoxi	Gentle female voice	All scenarios	Chinese or bilingual (Chinese and English)	8K/16K/24K	No	None
Siqi	Siqi	Gentle female voice	All scenarios	Chinese or bilingual (Chinese and English)	8K/16K/24K	Yes	None
Sijia	Sijia	Standard female voice	All scenarios	Chinese or bilingual (Chinese and English)	8K/16K/24K	No	None
Sicheng	Sicheng	Standard male voice	All scenarios	Chinese or bilingual (Chinese and English)	8K/16K/24K	Yes	None
Aiqi	Aiqi	Gentle female voice	All scenarios	Chinese or bilingual (Chinese and English)	8K/16K	Yes	None
Aijia	Aijia	Standard female voice	All scenarios	Chinese or bilingual (Chinese and English)	8K/16K	Yes	None
Aicheng	Aicheng	Standard male voice	All scenarios	Chinese or bilingual (Chinese and English)	8K/16K	Yes	None
Aida	Aida	Standard male voice	All scenarios	Chinese or bilingual (Chinese and English)	8K/16K	Yes	None
Ning'er	Ninger	Standard female voice	All scenarios	Chinese only	8K/16K/24K	No	None
Ruilin	Ruilin	Standard female voice	All scenarios	Chinese only	8K/16K/24K	No	None
Siyue	Siyue	Gentle female voice	Customer service	Chinese or bilingual (Chinese and English)	8K/16K/24K	No	None
Aiya	Aiya	Harsh female voice	Customer service	Chinese or bilingual (Chinese and English)	8K/16K	Yes	None
Aixia	Aixia	Amiable female voice	Customer service	Chinese or bilingual (Chinese and English)	8K/16K	Yes	None
Aimei	Aimei	Sweet female voice	Customer service	Chinese or bilingual (Chinese and English)	8K/16K	Yes	None
Aiyu	Aiyu	Natural female voice	Customer service	Chinese or bilingual (Chinese and English)	8K/16K	Yes	None
Aiyue	Aiyue	Gentle female voice	Customer service	Chinese or bilingual (Chinese and English)	8K/16K	Yes	None
Aijing	Aijing	Harsh female voice	Customer service	Chinese or bilingual (Chinese and English)	8K/16K	Yes	None
Xiaomei	Xiaomei	Sweet female voice	Customer service	Chinese or bilingual (Chinese and English)	8K/16K/24K	No	None
Aina	Aina	Female voice with Zhejiang accent	Customer service	Chinese only	8K/16K	Yes	None
Yina	Yina	Female voice with Zhejiang accent	Customer service	Chinese only	8K/16K/24K	No	None
Sijing	Sijing	Harsh female voice	Customer service	Chinese only	8K/16K/24K	Yes	None
Sitong	Sitong	Child voice	Scenarios in which child voices are required	Chinese only	8K/16K/24K	No	None
Xiaobei	Xiaobei	Little girl voice	Scenarios in which child voices are required	Chinese only	8K/16K/24K	Yes	None
Aitong	Aitong	Child voice	Scenarios in which child voices are required	Chinese only	8K/16K	Yes	None
Aiwei	Aiwei	Little girl voice	Scenarios in which child voices are required	Chinese only	8K/16K	Yes	None
Aibao	Aibao	Little girl voice	Scenarios in which child voices are required	Chinese only	8K/16K	Yes	None
Harry	Harry	Male voice with British accent	English only	English only	8K/16K	No	None
Abby	Abby	Female voice with American accent	English only	English only	8K/16K	No	None
Andy	Andy	Male voice with American accent	English only	English only	8K/16K	No	None
Eric	Eric	Male voice with British accent	English only	English only	8K/16K	No	None
Emily	Emily	Female voice with British accent	English only	English only	8K/16K	No	None
Luna	Luna	Female voice with British accent	English only	English only	8K/16K	No	None
Luca	Luca	Male voice with British accent	English only	English only	8K/16K	No	None
Wendy	Wendy	Female voice with British accent	English only	English only	8K/16K/24K	No	None
William	William	Male voice with British accent	English only	English only	8K/16K/24K	No	None
Olivia	Olivia	Female voice with British accent	English only	English only	8K/16K/24K	No	None
Shanshan	Shanshan	Voice of a female that speaks Cantonese	Scenarios in which dialects are used	Cantonese (simplified) and bilingual (Cantonese and English)	8K/16K/24K	No	None
Xiaoyue	Xiaoyue	Female voice with Sichuan accent	Scenarios in which dialects are used	Chinese or bilingual (Chinese and English)	8K/16K	No	In public preview
Lydia	Lydia	Female voice with bilingual (Chinese and English)	English only	English only	8K/16K	No	In public preview
Aishuo	Aishuo	Natural male voice	Customer service	Chinese or bilingual (Chinese and English)	8K/16K	Yes	In public preview
Qingqing	Qingqing	Voice of a female that speaks Taiwanese	Scenarios in which dialects are used	Chinese only	8K/16K	No	In public preview
Cuijie	Cuijie	Voice of a female that speaks Northeastern Mandarin	Scenarios in which dialects are used	Chinese only	8K/16K	No	In public preview
Xiaoze	Xiaoze	Male voice with strong Hunan accent	Scenarios in which dialects are used	Chinese only	8K/16K	Yes	In public preview
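The speaker table can be turned into a small lookup to catch unsupported combinations before a request is sent. The following Python sketch copies only a few rows from the table above. The names VOICES and check_voice are hypothetical helpers and not part of the SDK, and the exact-match lookup on the voice value is an assumption.

```python
# Subset of the speaker table: voice parameter value ->
# (supported sampling rates in Hz, phoneme boundary detection supported).
# Only a few rows are copied here; extend the dict from the table as needed.
VOICES = {
    "Xiaoyun": ((8000, 16000), False),
    "Siqi": ((8000, 16000, 24000), True),
    "Aiqi": ((8000, 16000), True),
    "Harry": ((8000, 16000), False),
    "Wendy": ((8000, 16000, 24000), False),
}


def check_voice(voice: str, sample_rate: int, need_timestamps: bool = False) -> bool:
    """Return True if the voice is listed and supports the requested options."""
    entry = VOICES.get(voice)
    if entry is None:
        return False  # voice not in this (partial) lookup
    rates, has_timestamps = entry
    if sample_rate not in rates:
        return False
    if need_timestamps and not has_timestamps:
        return False
    return True
```

For example, check_voice("Siqi", 24000, need_timestamps=True) passes, while check_voice("Xiaoyun", 24000) does not, because Xiaoyun is listed with 8K and 16K only.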

Limits

The input text must be UTF-8 encoded.
The input text can be up to 300 characters in length. If the text contains more than 300 characters, the excess characters are discarded.
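Because of the 300-character limit, longer text has to be split on the client side and sent as several requests. The following Python sketch shows one way to do this. The function name split_text is hypothetical, and the sketch assumes that the service counts characters the same way Python's len() does, which the source does not confirm.

```python
import re

MAX_CHARS = 300  # per-request limit stated in the Limits section


def split_text(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most max_chars characters.

    Prefers to break after sentence-ending punctuation and falls back to
    a hard cut when a single sentence is longer than max_chars.
    """
    # Split after Chinese or Western sentence-ending punctuation,
    # keeping the punctuation attached to the preceding sentence.
    sentences = [s for s in re.split(r"(?<=[。！？.!?])", text) if s]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Hard-cut any sentence that exceeds the limit on its own.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if len(current) + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current += sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be sent as the text of a separate synthesis request, and the resulting audio segments played or concatenated in order.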

Service addresses

Type	Description	URL
Access from external networks	You can use the URL to access the speech synthesis service from all clients over the Internet. The URL for external access is specified as the default URL in the SDK.	wss://nls-gateway-ap-southeast-1.aliyuncs.com/ws/v1

1. Provide a token to pass the authentication

The client establishes a WebSocket connection to the server and provides a token to pass the authentication. For more information about how to obtain a token, see Obtain a token.

2. Start the synthesis task

The client sends a request to start speech synthesis. You can use the set methods of the SpeechSynthesizer object in the SDK to configure the request parameters. The following table describes the request parameters.

Parameter	Type	Required	Description
appkey	String	Yes	The appkey of your project that is created in the Intelligent Speech Interaction console.
text	String	Yes	The text that you want to synthesize. The text must be UTF-8 encoded. The text can be up to 300 characters in length. Use space characters to separate the words in the text.
voice	String	No	The speaker that you want to use. Default value: xiaoyun.
format	String	No	The audio coding format. Default value: pcm. Valid values: pcm, wav, and mp3.
sample_rate	Integer	No	The audio sampling rate. Unit: Hz. Default value: 16000.
volume	Integer	No	The voice volume of the speaker. Value range: 0 to 100. Default value: 50.
speech_rate	Integer	No	The speed at which the speaker speaks. Value range: -500 to 500. Default value: 0.
pitch_rate	Integer	No	The intonation of the speaker. Value range: -500 to 500. Default value: 0.
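The defaults and value ranges in the table can be checked on the client before a request is sent. The following Python sketch is one way to do that. The function build_synthesis_params is a hypothetical helper, not an SDK call, and the set of accepted sampling rates is an assumption taken from the speaker table rather than from the parameter description.

```python
from typing import Any

VALID_FORMATS = {"pcm", "wav", "mp3"}
VALID_SAMPLE_RATES = {8000, 16000, 24000}  # assumption: rates listed in the speaker table


def build_synthesis_params(
    appkey: str,
    text: str,
    voice: str = "xiaoyun",
    fmt: str = "pcm",
    sample_rate: int = 16000,
    volume: int = 50,
    speech_rate: int = 0,
    pitch_rate: int = 0,
) -> dict[str, Any]:
    """Validate values against the documented ranges and return a parameter dict."""
    if not appkey:
        raise ValueError("appkey is required")
    if not text:
        raise ValueError("text is required")
    if len(text) > 300:
        raise ValueError("text exceeds 300 characters")
    if fmt not in VALID_FORMATS:
        raise ValueError(f"format must be one of {sorted(VALID_FORMATS)}")
    if sample_rate not in VALID_SAMPLE_RATES:
        raise ValueError(f"sample_rate must be one of {sorted(VALID_SAMPLE_RATES)}")
    if not 0 <= volume <= 100:
        raise ValueError("volume must be in the range 0 to 100")
    if not -500 <= speech_rate <= 500:
        raise ValueError("speech_rate must be in the range -500 to 500")
    if not -500 <= pitch_rate <= 500:
        raise ValueError("pitch_rate must be in the range -500 to 500")
    return {
        "appkey": appkey,
        "text": text,
        "voice": voice,
        "format": fmt,
        "sample_rate": sample_rate,
        "volume": volume,
        "speech_rate": speech_rate,
        "pitch_rate": pitch_rate,
    }
```

Calling build_synthesis_params with only an appkey and text returns the documented defaults. An out-of-range value such as volume=101 raises ValueError locally instead of producing an invalid-parameter error from the server.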

3. Receive the synthesized audio data

The server returns the synthesized audio data in binary format. The client receives and processes the audio data by using the SDK.

4. Complete the synthesis task

After the synthesis task is completed, the server sends a notification message. The following example shows a sample notification message:

{
    "header": {
        "message_id": "05450bf69c53413f8d88aed1ee60****",
        "task_id": "640bc797bb684bd6960185651307****",
        "namespace": "SpeechSynthesizer",
        "name": "SynthesisCompleted",
        "status": 20000000,
        "status_message": "GATEWAY|SUCCESS|Success."
    }
}
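A client that handles raw messages itself can detect completion by inspecting the header of this notification. The following Python sketch assumes that the status value 20000000 shown in the sample indicates success. The function name is_synthesis_completed is hypothetical.

```python
import json

SUCCESS_STATUS = 20000000  # status value shown in the sample notification


def is_synthesis_completed(raw_message: str) -> bool:
    """Return True if the message is a successful SynthesisCompleted notification."""
    header = json.loads(raw_message).get("header", {})
    return (
        header.get("namespace") == "SpeechSynthesizer"
        and header.get("name") == "SynthesisCompleted"
        and header.get("status") == SUCCESS_STATUS
    )
```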

Note

In the demo, the synthesized audio is stored in a file. If you want to play the synthesized audio during the reception process, we recommend that you use stream playback. The stream playback mode allows you to play the synthesized audio while audio data is being received. This reduces the amount of time that you need to wait before you can play the audio.
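The difference between file storage and stream playback can be sketched as follows. The Python function below processes audio chunks one at a time as they arrive: it writes each chunk to a WAV file and, if a callback is given, hands the same chunk to a player immediately. The function consume_pcm_chunks and the on_chunk callback are hypothetical, and the sketch assumes that the PCM data is 16-bit mono, which the source does not state.

```python
import wave
from typing import Callable, Iterable, Optional


def consume_pcm_chunks(
    chunks: Iterable[bytes],
    wav_path: str,
    sample_rate: int = 16000,
    on_chunk: Optional[Callable[[bytes], None]] = None,
) -> int:
    """Write PCM chunks to a WAV file as they arrive and return the byte count.

    If on_chunk is given, each chunk is passed to it before being written,
    so playback can start without waiting for the complete audio.
    """
    total = 0
    with wave.open(wav_path, "wb") as wav:
        wav.setnchannels(1)  # assumption: mono audio
        wav.setsampwidth(2)  # assumption: 16-bit samples
        wav.setframerate(sample_rate)
        for chunk in chunks:
            if on_chunk is not None:
                on_chunk(chunk)  # for example, feed an audio output buffer
            wav.writeframes(chunk)
            total += len(chunk)
    return total
```

In a real client, chunks would be the binary messages delivered by the SDK or the WebSocket connection, and on_chunk would write to an audio device.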

Status codes

Each response contains a status code. The following tables describe the status codes.

Common errors

Status code	Cause	Solution
40000001	The client failed to pass the authentication.	Check whether the token that is used by the client is valid or expired.
40000002	The request is invalid.	Check whether the request that is sent by the client meets the requirements.
403	The token is expired or the request contains invalid parameters.	Check whether the token that is used by the client is expired. Then, check whether the parameter values are valid.
40000004	The client timed out.	Check whether the client has gone a long period of time, such as 10 seconds, without sending data to the server.
40000005	The number of requests exceeds the upper limit.	Check whether the number of concurrent connections or queries per second (QPS) value exceeds the upper limit. If the number of concurrent connections exceeds the upper limit, we recommend that you upgrade Intelligent Speech Interaction from Trial Edition to Commercial Edition. If you use Commercial Edition, we recommend that you purchase more resources to increase the concurrency.
40000000	A client error occurred. This is the default status code for client errors.	Resolve the error based on the error message or submit a ticket.
50000000	A server error occurred. This is the default status code for server errors.	If the status code is occasionally returned, ignore it. If the status code is returned multiple times, submit a ticket.
50000001	An internal call error occurred.	If the status code is occasionally returned, ignore it. If the status code is returned multiple times, submit a ticket.

Gateway errors

Status code	Cause	Solution
40010001	The method is not supported.	If you use the SDK, submit a ticket.
40010002	The instruction is not supported.	If you use the SDK, submit a ticket.
40010003	The instruction format is invalid.	If you use the SDK, submit a ticket.
40010004	The client unexpectedly disconnected.	Check whether the client disconnected before the server completed the requested task.
40010005	The task status is invalid.	Check whether the instruction is supported when the task is in the current state.

Configuration errors

Status code	Cause	Solution
40020105	The application does not exist.	Check whether the appkey is correct and belongs to the same Alibaba Cloud account as the token.

Text-to-speech (TTS) service errors

Status code	Cause	Solution
41020001	One or more parameters are invalid.	Check whether the specified parameter values are valid.
51020001	A TTS server error occurred.	If the status code is occasionally returned, ignore it. If the status code is returned multiple times, submit a ticket.
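The status codes in the tables above follow a visible pattern: client-side codes start with 4 and server-side codes start with 5. The following Python sketch uses that pattern to decide how to react to a response. The pattern is inferred from the listed codes only, the treatment of 20000000 as success comes from the sample notification, and the helper names are hypothetical.

```python
SUCCESS_STATUS = 20000000  # value shown in the sample completion message


def classify_status(code: int) -> str:
    """Classify a status code by its leading digit (pattern inferred from the tables)."""
    if code == SUCCESS_STATUS:
        return "success"
    leading = str(code)[0]
    if leading == "4":
        return "client_error"  # fix the token, parameters, or request rate
    if leading == "5":
        return "server_error"  # occasional occurrences can be ignored
    return "unknown"


def should_retry(code: int) -> bool:
    """Retry only server errors; client errors need a corrected request first."""
    return classify_status(code) == "server_error"
```

For example, classify_status(41020001) returns "client_error", so the request parameters should be corrected rather than resent, whereas should_retry(51020001) returns True.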