API reference

Last Updated: Nov 29, 2019

Features

Speech synthesis converts input text into binary speech data.

  • Supports the PCM, WAV, and MP3 audio coding formats.
  • Supports adjustable speed, intonation, and volume.
  • Supports male and female voices.

Speaker description

Name | Value of the voice parameter | Type | Scenario | Supported language | Supported sampling rate (Hz)
Xiaoyun | Xiaoyun | Standard female voice | Common scenario | Chinese and mixed Chinese and English | 8,000 and 16,000
Xiaogang | Xiaogang | Standard male voice | Common scenario | Chinese and mixed Chinese and English | 8,000 and 16,000
Xiaomeng | Xiaomeng | Standard female voice | Common scenario | Chinese and mixed Chinese and English | 8,000 and 16,000
Xiaowei | Xiaowei | Standard male voice | Common scenario | Chinese and mixed Chinese and English | 8,000 and 16,000
Ruoxi | Ruoxi | Gentle female voice | Common scenario | Chinese and mixed Chinese and English | 8,000, 16,000, and 24,000
Siqi | Siqi | Gentle female voice | Common scenario | Chinese and mixed Chinese and English | 8,000, 16,000, and 24,000
Sijia | Sijia | Standard female voice | Common scenario | Chinese and mixed Chinese and English | 8,000, 16,000, and 24,000
Sicheng | Sicheng | Standard male voice | Common scenario | Chinese and mixed Chinese and English | 8,000, 16,000, and 24,000
Aiqi | Aiqi | Gentle female voice | Common scenario | Chinese and mixed Chinese and English | 8,000, 16,000, and 24,000
Aijia | Aijia | Standard female voice | Common scenario | Chinese and mixed Chinese and English | 8,000, 16,000, and 24,000
Aicheng | Aicheng | Standard male voice | Common scenario | Chinese and mixed Chinese and English | 8,000, 16,000, and 24,000
Aida | Aida | Standard male voice | Common scenario | Chinese and mixed Chinese and English | 8,000, 16,000, and 24,000
Ninger | Ninger | Standard female voice | Common scenario | Chinese | 8,000, 16,000, and 24,000
Ruilin | Ruilin | Standard female voice | Common scenario | Chinese | 8,000, 16,000, and 24,000
Amei | Amei | Sweet female voice | Customer service scenario | Chinese and mixed Chinese and English | 8,000 and 16,000
Xiaoxue | Xiaoxue | Gentle female voice | Customer service scenario | Chinese and mixed Chinese and English | 8,000 and 16,000
Siyue | Siyue | Gentle female voice | Customer service scenario | Chinese and mixed Chinese and English | 8,000, 16,000, and 24,000
Aixia | Aixia | Amiable female voice | Customer service scenario | Chinese and mixed Chinese and English | 8,000, 16,000, and 24,000
Aimei | Aimei | Sweet female voice | Customer service scenario | Chinese and mixed Chinese and English | 8,000, 16,000, and 24,000
Aiyu | Aiyu | Natural female voice | Customer service scenario | Chinese and mixed Chinese and English | 8,000, 16,000, and 24,000
Aiyue | Aiyue | Gentle female voice | Customer service scenario | Chinese and mixed Chinese and English | 8,000, 16,000, and 24,000
Aijing | Aijing | Strict female voice | Customer service scenario | Chinese and mixed Chinese and English | 8,000, 16,000, and 24,000
Xiaomei | Xiaomei | Sweet female voice | Customer service scenario | Chinese and mixed Chinese and English | 8,000, 16,000, and 24,000
Yina | Yina | Female voice with Zhejiang accent | Customer service scenario | Chinese | 8,000, 16,000, and 24,000
Sijing | Sijing | Strict female voice | Customer service scenario | Chinese | 8,000, 16,000, and 24,000
Sitong | Sitong | Child voice | Child voice scenario | Chinese | 8,000, 16,000, and 24,000
Xiaobei | Xiaobei | Lolita female voice | Child voice scenario | Chinese | 8,000, 16,000, and 24,000
Aibao | Aibao | Lolita female voice | Child voice scenario | Chinese | 8,000, 16,000, and 24,000
Halen | Halen | Female voice | English scenario | English | 8,000 and 16,000
Harry | Harry | Male voice | English scenario | English | 8,000 and 16,000
Wendy | Wendy | Female voice | English scenario | English | 8,000, 16,000, and 24,000
William | William | Male voice | English scenario | English | 8,000, 16,000, and 24,000
Olivia | Olivia | Female voice | English scenario | English | 8,000, 16,000, and 24,000
Shanshan | Shanshan | Cantonese female voice | Dialect scenario | Cantonese (simplified) and mixed Cantonese and English | 8,000, 16,000, and 24,000

Limits

  • The entered text must be UTF-8 encoded.
  • The entered text can contain up to 300 characters. If the text is longer, everything after the first 300 characters is discarded and only the first 300 characters are synthesized.
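
Because text beyond the limit is dropped rather than rejected, a client may want to split long input into several requests. The following is a minimal sketch, not part of any SDK: the helper name `split_text` is hypothetical, and it assumes the 300-character limit counts characters rather than bytes, which this document does not specify.

```python
MAX_CHARS = 300  # per-request limit stated in this document


def split_text(text: str, max_chars: int = MAX_CHARS) -> list[bytes]:
    """Split text into UTF-8 encoded chunks of at most max_chars characters.

    Hypothetical helper. Assumes the limit counts characters, not bytes.
    The split is purely positional, so a chunk boundary may fall mid-sentence.
    """
    chunks = [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
    return [chunk.encode("utf-8") for chunk in chunks]


if __name__ == "__main__":
    for index, chunk in enumerate(split_text("a" * 650)):
        print(index, len(chunk.decode("utf-8")))
```

Each chunk would then be sent as a separate synthesis request. Splitting at sentence boundaries instead of fixed positions would likely give more natural audio.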

Endpoints

Access type | Description | URL
External access from the Internet | This endpoint allows you to access the speech synthesis service from any host over the Internet. By default, the Internet access URL is built in the SDK. You do not need to set the URL manually. | wss://nls-gateway-ap-southeast-1.aliyuncs.com/ws/v1

Interaction process

Note: The interaction flowchart applies to the Java, C++, iOS, and Android SDKs, but not to the RESTful API. For the RESTful API flowchart, see RESTful API 2.0.

[Figure: TTS interaction flowchart]

Note: In addition to the audio stream, every response from the server carries a task_id field in its header that identifies the synthesis task. Record the value of this field. If an error occurs, you can open a ticket and submit the task ID together with the error message.

1. Authentication

To establish a WebSocket connection with the server, the client must use a token for authentication. For more information about how to obtain the token, see Obtain a token.

2. Synthesis startup

The client sends a speech synthesis request. Use the corresponding set methods of the SpeechSynthesizer object to set the request parameters. The following table describes the request parameters.

Parameter | Type | Required | Description
appkey | String | Yes | The appkey of a project created in the Intelligent Speech Interaction console.
text | String | Yes | The text to be synthesized. The text must be UTF-8 encoded and cannot exceed 300 characters in length. Separate words with spaces.
voice | String | No | The speaker. Default value: xiaoyun.
format | String | No | The audio coding format. Default value: pcm. Valid values: pcm, wav, and mp3.
sample_rate | Integer | No | The audio sampling rate, in Hz. Default value: 16000.
volume | Integer | No | The volume. Value range: 0 to 100. Default value: 50.
speech_rate | Integer | No | The speed. Value range: -500 to 500. Default value: 0.
pitch_rate | Integer | No | The intonation. Value range: -500 to 500. Default value: 0.
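
The defaults and value ranges in the table above can be checked on the client before a request is sent. The sketch below is illustrative only: `SynthesisParams` and `build_payload` are hypothetical names, not SDK classes, and the resulting dictionary is not claimed to match the message format the SDK sends. It does not check whether the chosen voice supports the chosen sampling rate.

```python
from dataclasses import asdict, dataclass

VALID_FORMATS = ("pcm", "wav", "mp3")


@dataclass
class SynthesisParams:
    """Hypothetical container mirroring the documented request parameters."""
    appkey: str
    text: str
    voice: str = "xiaoyun"
    format: str = "pcm"
    sample_rate: int = 16000
    volume: int = 50
    speech_rate: int = 0
    pitch_rate: int = 0

    def validate(self) -> None:
        """Raise ValueError if a value falls outside the documented limits."""
        if not self.appkey:
            raise ValueError("appkey is required")
        if not self.text:
            raise ValueError("text is required")
        if len(self.text) > 300:
            raise ValueError("text exceeds 300 characters")
        if self.format not in VALID_FORMATS:
            raise ValueError(f"format must be one of {VALID_FORMATS}")
        if not 0 <= self.volume <= 100:
            raise ValueError("volume must be between 0 and 100")
        if not -500 <= self.speech_rate <= 500:
            raise ValueError("speech_rate must be between -500 and 500")
        if not -500 <= self.pitch_rate <= 500:
            raise ValueError("pitch_rate must be between -500 and 500")


def build_payload(params: SynthesisParams) -> dict:
    """Validate the parameters and return them as a plain dictionary."""
    params.validate()
    return asdict(params)


if __name__ == "__main__":
    # "demo-appkey" is a placeholder, not a real appkey.
    print(build_payload(SynthesisParams(appkey="demo-appkey", text="Hello world")))
```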

3. Audio data synthesis

The server returns the synthesized speech as binary data, which the SDK receives and processes.
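
When the format is pcm, the received data is raw audio without a container, so most players cannot open it directly. The following sketch wraps received chunks in a WAV container using Python's standard `wave` module. The function name `pcm_chunks_to_wav` is hypothetical, and the sample layout (16-bit, mono) is an assumption that this document does not confirm, so verify it against the audio you actually receive.

```python
import io
import wave
from typing import Iterable


def pcm_chunks_to_wav(chunks: Iterable[bytes], sample_rate: int = 16000) -> bytes:
    """Concatenate raw PCM chunks and return them as WAV file bytes.

    Assumption (not stated in the source): samples are 16-bit mono.
    """
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)   # assumed mono
        wav_file.setsampwidth(2)   # assumed 16-bit samples (2 bytes)
        wav_file.setframerate(sample_rate)
        for chunk in chunks:
            wav_file.writeframes(chunk)
    return buffer.getvalue()


if __name__ == "__main__":
    # Two chunks of silence stand in for data received from the server.
    fake_chunks = [b"\x00\x00" * 160, b"\x00\x00" * 160]
    with open("output.wav", "wb") as out:
        out.write(pcm_chunks_to_wav(fake_chunks))
```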

4. Synthesis completion

After the speech synthesis is completed, the server sends an event notification. Example:

  {
    "header": {
      "message_id": "05450bf69c53413f8d88aed1ee600e93",
      "task_id": "640bc797bb684bd69601856513079df5",
      "namespace": "SpeechSynthesizer",
      "name": "SynthesisCompleted",
      "status": 20000000,
      "status_message": "GATEWAY|SUCCESS|Success."
    }
  }
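
A client that handles this event itself might parse it as follows. This is a minimal sketch: `parse_completion` is a hypothetical helper, and treating 20000000 as the success status is inferred from the example event above rather than from a documented list of success codes.

```python
import json
from typing import Optional, Tuple

SUCCESS_STATUS = 20000000  # taken from the example event in this document


def parse_completion(message: str) -> Tuple[Optional[str], bool]:
    """Return (task_id, completed_ok) for a JSON event message.

    completed_ok is True only for a SynthesisCompleted event whose status
    equals SUCCESS_STATUS.
    """
    header = json.loads(message).get("header", {})
    completed_ok = (
        header.get("name") == "SynthesisCompleted"
        and header.get("status") == SUCCESS_STATUS
    )
    return header.get("task_id"), completed_ok


if __name__ == "__main__":
    event = """{
      "header": {
        "message_id": "05450bf69c53413f8d88aed1ee600e93",
        "task_id": "640bc797bb684bd69601856513079df5",
        "namespace": "SpeechSynthesizer",
        "name": "SynthesisCompleted",
        "status": 20000000,
        "status_message": "GATEWAY|SUCCESS|Success."
      }
    }"""
    task_id, ok = parse_completion(event)
    print(task_id, ok)
```

Returning the task_id alongside the result makes it easy to log the ID for every task, which is the value to include in a ticket if something goes wrong.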

Note: In the sample demo, the synthesized audio is stored in a file. To play the audio in real time, we recommend stream playback, which plays audio data as it is received and so reduces latency.

Service status codes

Each response message contains a status field, which indicates the service status code. The following tables describe the service status codes.

Common errors

Error code | Cause | Solution
40000001 | The client failed authentication. | Check whether the token used by the client is correct and valid.
40000002 | The message is invalid. | Check whether the message sent by the client meets relevant requirements.
403 | The token has expired or the request contains incorrect parameters. | Check whether the token used by the client is valid. Then, check request parameter settings.
40000004 | The client idle timeout was exceeded. | Check whether the client has gone a long time without sending data to the server.
40000005 | The number of requests exceeds the upper limit. | Check whether the number of concurrent connections or the queries per second (QPS) exceeds the upper limit.
40000000 | A client error has occurred. This is the default client error code. | Resolve the error according to the error message or open a ticket.
50000000 | A server error has occurred. This is the default server error code. | If the error code is occasionally returned, ignore it. If the error code is returned multiple times, open a ticket.
50000001 | An internal call error has occurred. | If the error code is occasionally returned, ignore it. If the error code is returned multiple times, open a ticket.

Gateway errors

Error code | Cause | Solution
40010001 | The method is not supported. | If you use the SDK, open a ticket.
40010002 | The instruction is not supported. | If you use the SDK, open a ticket.
40010003 | The instruction is invalid. | If you use the SDK, open a ticket.
40010004 | The client is disconnected. | Check whether the client disconnected before the server completed the requested task.
40010005 | The task status is incorrect. | Check whether the instruction is supported in the current task status.

Metadata errors

Error code | Cause | Solution
40020105 | The application does not exist. | Check whether the appkey is correct and belongs to the same account as the token.

TTS errors

Error code | Cause | Solution
41020001 | A parameter error has occurred. | Check whether the specified parameter is correct.
51020001 | An error has occurred on the TTS server. | If the error code is occasionally returned, ignore it. If the error code is returned multiple times, open a ticket.
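
The status codes in this section can be collected into a lookup table so that a client logs a readable cause for each failure. The sketch below is one possible arrangement, not SDK behavior: `STATUS_CODES`, `OCCASIONAL_OK`, and `describe_status` are hypothetical names, and the group labels simply follow the table headings in this section.

```python
from typing import NamedTuple


class StatusInfo(NamedTuple):
    group: str  # which table in this section lists the code
    cause: str


STATUS_CODES = {
    40000001: StatusInfo("common", "client failed authentication"),
    40000002: StatusInfo("common", "invalid message"),
    403: StatusInfo("common", "token expired or incorrect request parameters"),
    40000004: StatusInfo("common", "client idle timeout"),
    40000005: StatusInfo("common", "too many requests"),
    40000000: StatusInfo("common", "default client error"),
    50000000: StatusInfo("common", "default server error"),
    50000001: StatusInfo("common", "internal call error"),
    40010001: StatusInfo("gateway", "method not supported"),
    40010002: StatusInfo("gateway", "instruction not supported"),
    40010003: StatusInfo("gateway", "invalid instruction"),
    40010004: StatusInfo("gateway", "client disconnected"),
    40010005: StatusInfo("gateway", "incorrect task status"),
    40020105: StatusInfo("metadata", "application does not exist"),
    41020001: StatusInfo("TTS", "parameter error"),
    51020001: StatusInfo("TTS", "TTS server error"),
}

# Codes whose documented advice is to ignore occasional occurrences
# and open a ticket only if they are returned repeatedly.
OCCASIONAL_OK = {50000000, 50000001, 51020001}


def describe_status(status: int) -> str:
    """Return a one-line description of a status code for logging."""
    info = STATUS_CODES.get(status)
    if info is None:
        return f"{status}: unknown status code"
    hint = ""
    if status in OCCASIONAL_OK:
        hint = " (ignore if occasional; open a ticket if repeated)"
    return f"{status}: {info.group} error, {info.cause}{hint}"


if __name__ == "__main__":
    for code in (40020105, 51020001, 12345):
        print(describe_status(code))
```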