The speech synthesis service provides the Natural User Interaction (NUI) SDK for mobile clients to convert text to binary speech data.
Description
Compared with common SDKs, the NUI SDK is smaller in size and supports more comprehensive status management. The NUI SDK provides comprehensive speech processing capabilities and can also serve as an atomic SDK, meeting diverse user requirements. In addition, the NUI SDK uses a unified API.
The NUI SDK has the following features:
Supports generating audio files in the pulse-code modulation (PCM) and MP3 formats.
Allows you to set the speed, intonation, and volume of the generated speech.
Allows you to set the speaker type of the generated speech. The following table describes the supported speaker types.
Name | Value of the voice parameter | Category | Scenario | Supported language | Supported sampling rate (Hz) | Supports phoneme boundary detection for each word | Remarks |
Xiaoyun | Xiaoyun | Standard female voice | Common scenario | Chinese and mixed Chinese and English | 8,000 and 16,000 | No | N/A |
Xiaogang | Xiaogang | Standard male voice | Common scenario | Chinese and mixed Chinese and English | 8,000 and 16,000 | No | N/A |
Ruoxi | Ruoxi | Gentle female voice | Common scenario | Chinese and mixed Chinese and English | 8,000, 16,000, and 24,000 | No | N/A |
Siqi | Siqi | Gentle female voice | Common scenario | Chinese and mixed Chinese and English | 8,000, 16,000, and 24,000 | Yes | N/A |
Sijia | Sijia | Standard female voice | Common scenario | Chinese and mixed Chinese and English | 8,000, 16,000, and 24,000 | No | N/A |
Sicheng | Sicheng | Standard male voice | Common scenario | Chinese and mixed Chinese and English | 8,000, 16,000, and 24,000 | Yes | N/A |
Aiqi | Aiqi | Gentle female voice | Common scenario | Chinese and mixed Chinese and English | 8,000 and 16,000 | Yes | N/A |
Aijia | Aijia | Standard female voice | Common scenario | Chinese and mixed Chinese and English | 8,000 and 16,000 | Yes | N/A |
Aicheng | Aicheng | Standard male voice | Common scenario | Chinese and mixed Chinese and English | 8,000 and 16,000 | Yes | N/A |
Aida | Aida | Standard male voice | Common scenario | Chinese and mixed Chinese and English | 8,000 and 16,000 | Yes | N/A |
Ninger | Ninger | Standard female voice | Common scenario | Simplified Chinese | 8,000, 16,000, and 24,000 | No | N/A |
Ruilin | Ruilin | Standard female voice | Common scenario | Simplified Chinese | 8,000, 16,000, and 24,000 | No | N/A |
Siyue | Siyue | Gentle female voice | Customer service scenario | Chinese and mixed Chinese and English | 8,000, 16,000, and 24,000 | No | N/A |
Aiya | Aiya | Strict female voice | Customer service scenario | Chinese and mixed Chinese and English | 8,000 and 16,000 | Yes | N/A |
Aixia | Aixia | Amiable female voice | Customer service scenario | Chinese and mixed Chinese and English | 8,000 and 16,000 | Yes | N/A |
Aimei | Aimei | Sweet female voice | Customer service scenario | Chinese and mixed Chinese and English | 8,000 and 16,000 | Yes | N/A |
Aiyu | Aiyu | Natural female voice | Customer service scenario | Chinese and mixed Chinese and English | 8,000 and 16,000 | Yes | N/A |
Aiyue | Aiyue | Gentle female voice | Customer service scenario | Chinese and mixed Chinese and English | 8,000 and 16,000 | Yes | N/A |
Aijing | Aijing | Strict female voice | Customer service scenario | Chinese and mixed Chinese and English | 8,000 and 16,000 | Yes | N/A |
Xiaomei | Xiaomei | Sweet female voice | Customer service scenario | Chinese and mixed Chinese and English | 8,000, 16,000, and 24,000 | No | N/A |
Aina | Aina | Female voice with Zhejiang accent | Customer service scenario | Simplified Chinese | 8,000 and 16,000 | Yes | N/A |
Yina | Yina | Female voice with Zhejiang accent | Customer service scenario | Simplified Chinese | 8,000, 16,000, and 24,000 | No | N/A |
Sijing | Sijing | Strict female voice | Customer service scenario | Simplified Chinese | 8,000, 16,000, and 24,000 | Yes | N/A |
Sitong | Sitong | Child voice | Child voice scenario | Simplified Chinese | 8,000, 16,000, and 24,000 | No | N/A |
Xiaobei | Xiaobei | Lolita female voice | Child voice scenario | Simplified Chinese | 8,000, 16,000, and 24,000 | Yes | N/A |
Aitong | Aitong | Child voice | Child voice scenario | Simplified Chinese | 8,000 and 16,000 | Yes | N/A |
Aiwei | Aiwei | Lolita female voice | Child voice scenario | Simplified Chinese | 8,000 and 16,000 | Yes | N/A |
Aibao | Aibao | Lolita female voice | Child voice scenario | Simplified Chinese | 8,000 and 16,000 | Yes | N/A |
Harry | Harry | Male voice with British accent | English scenario | English | 8,000 and 16,000 | No | N/A |
Abby | Abby | Female voice with American accent | English scenario | English | 8,000 and 16,000 | No | N/A |
Andy | Andy | Male voice with American accent | English scenario | English | 8,000 and 16,000 | No | N/A |
Eric | Eric | Male voice with British accent | English scenario | English | 8,000 and 16,000 | No | N/A |
Emily | Emily | Female voice with British accent | English scenario | English | 8,000 and 16,000 | No | N/A |
Luna | Luna | Female voice with British accent | English scenario | English | 8,000 and 16,000 | No | N/A |
Luca | Luca | Male voice with British accent | English scenario | English | 8,000 and 16,000 | No | N/A |
Wendy | Wendy | Female voice with British accent | English scenario | English | 8,000, 16,000, and 24,000 | No | N/A |
William | William | Male voice with British accent | English scenario | English | 8,000, 16,000, and 24,000 | No | N/A |
Olivia | Olivia | Female voice with British accent | English scenario | English | 8,000, 16,000, and 24,000 | No | N/A |
Shanshan | Shanshan | Cantonese female voice | Dialect scenario | Cantonese (simplified) and mixed Cantonese and English | 8,000, 16,000, and 24,000 | No | N/A |
Xiaoyue | Xiaoyue | Female voice with Sichuan accent | Dialect scenario | Chinese and mixed Chinese and English | 8,000 and 16,000 | No | Available in public preview of Intelligent Speech Interaction |
Lydia | Lydia | Female voice of mixed Chinese and English | English scenario | English | 8,000 and 16,000 | No | Available in public preview of Intelligent Speech Interaction |
Aishuo | Aishuo | Natural male voice | Customer service scenario | English | 8,000 and 16,000 | Yes | Available in public preview of Intelligent Speech Interaction |
Qingqing | Qingqing | Female voice with Formosan accent | Dialect scenario | Simplified Chinese | 8,000 and 16,000 | No | Available in public preview of Intelligent Speech Interaction |
Cuijie | Cuijie | Female voice of Northeastern Mandarin | Dialect scenario | Simplified Chinese | 8,000 and 16,000 | No | Available in public preview of Intelligent Speech Interaction |
Xiaoze | Xiaoze | Male voice with strong Hunan accent | Dialect scenario | Simplified Chinese | 8,000 and 16,000 | Yes | Available in public preview of Intelligent Speech Interaction |
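Because the supported sampling rates differ by speaker, a client may want to check the combination before it sends a request. The following is a minimal sketch of such a check, not part of the NUI SDK. The `check_voice` helper and the `SUPPORTED_SAMPLE_RATES` dictionary are hypothetical names, and the dictionary holds only four rows copied from the table above. The sketch compares names exactly as they appear in the table; whether the service treats speaker names as case-sensitive is not stated here.

```python
# Subset of the speaker table above: speaker name -> supported sampling rates (Hz).
# Hypothetical helper data; extend it with the rows you actually use.
SUPPORTED_SAMPLE_RATES = {
    "Xiaoyun": {8000, 16000},
    "Siqi": {8000, 16000, 24000},
    "Aixia": {8000, 16000},
    "Wendy": {8000, 16000, 24000},
}


def check_voice(voice: str, sample_rate: int) -> None:
    """Raise ValueError if the speaker/sampling-rate pair is not in the table."""
    rates = SUPPORTED_SAMPLE_RATES.get(voice)
    if rates is None:
        raise ValueError(f"unknown speaker: {voice!r}")
    if sample_rate not in rates:
        raise ValueError(
            f"{voice} does not support {sample_rate} Hz; supported: {sorted(rates)}"
        )


check_voice("Siqi", 24000)  # passes silently: Siqi supports 24,000 Hz
```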
Limits
The entered text must be UTF-8 encoded.
The entered text can contain a maximum of 300 characters. If the text contains more than 300 characters, the excess characters are discarded, and only the first 300 characters are synthesized.
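Because text beyond the 300-character limit is silently dropped, a client that needs to synthesize longer text can split it before sending. The following is a minimal sketch of one way to do that, not an SDK feature. The `split_text` function is a hypothetical helper, and it counts Python string characters (Unicode code points), which is an assumption about how the service counts characters.

```python
MAX_CHARS = 300  # per-request limit stated in the Limits section


def split_text(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters.

    Breaks after the last sentence-ending mark inside each window when one
    exists; otherwise falls back to a hard cut at the limit.
    """
    breaks = "。！？.!?\n"
    chunks = []
    rest = text
    while len(rest) > limit:
        window = rest[:limit]
        cut = max(window.rfind(ch) for ch in breaks)  # -1 if no break mark found
        end = cut + 1 if cut >= 0 else limit
        chunks.append(rest[:end])
        rest = rest[end:]
    if rest:
        chunks.append(rest)
    return chunks


# Each chunk can then be sent as its own request, UTF-8 encoded if bytes are needed.
payloads = [chunk.encode("utf-8") for chunk in split_text("Hello. " * 100)]
```

Each chunk becomes a separate synthesis task, so the resulting audio segments have to be concatenated or played back in order by the client.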
Endpoints
Access type | Description | URL |
External access from the Internet | This endpoint allows you to access the speech synthesis service from any host over the Internet. By default, the Internet access URL is built in the SDK. | wss://nls-gateway-ap-southeast-1.aliyuncs.com/ws/v1 |
Interaction process
In addition to the audio streams returned in the response, the server adds the task_id parameter to the header of every response to indicate the ID of the synthesis task. Record the value of this parameter. If an error occurs, you can submit a ticket that includes the task ID and the error message.
1. Authenticate the client
To establish a WebSocket connection with the server, the client must use a token for authentication. For more information about how to obtain the token, see Obtain a Token.
The following table describes the parameters used for authentication and initialization.
Parameter | Type | Required | Description |
workspace | String | Yes | The working directory from which the SDK reads the configuration file. |
app_key | String | Yes | The appkey of your project created in the Intelligent Speech Interaction console. |
token | String | Yes | The token provided as the credential for you to use Intelligent Speech Interaction. Make sure that the token is valid. You can set the token when you initialize the SDK and update the token when you set the request parameters. |
device_id | String | Yes | The unique identifier of the device, for example, the media access control (MAC) address, serial number, or pseudo unique ID of the device. |
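The four parameters above are all required, so it can help to verify them in one place before initializing the SDK. The following is a minimal sketch under stated assumptions: `build_init_params` is a hypothetical helper, and serializing the parameters to a JSON string is an assumption about how they are handed to the SDK, not something this page specifies. The values in the usage line are placeholders.

```python
import json

# Required authentication and initialization parameters from the table above.
REQUIRED_INIT_FIELDS = ("workspace", "app_key", "token", "device_id")


def build_init_params(**fields: str) -> str:
    """Return the required parameters as a JSON string, or raise if any is missing or empty."""
    missing = [name for name in REQUIRED_INIT_FIELDS if not fields.get(name)]
    if missing:
        raise ValueError(f"missing required parameters: {', '.join(missing)}")
    return json.dumps({name: fields[name] for name in REQUIRED_INIT_FIELDS})


params = build_init_params(
    workspace="/path/to/workspace",   # placeholder
    app_key="your-appkey",            # placeholder
    token="your-token",               # placeholder
    device_id="your-device-id",       # placeholder
)
```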
2. Send a request to use the speech synthesis service
You must set the request parameters for the client to send a service request. You can set the request parameters by calling the setparamTts method in the SDK. The following table describes the request parameters.
Parameter | Type | Required | Description |
appkey | String | Yes | The appkey of your project created in the Intelligent Speech Interaction console. |
token | String | No | The token provided as the credential for you to use Intelligent Speech Interaction. You can update the token as required by setting this parameter. |
direct_host | String | No | The IP address that is resolved from the Domain Name System (DNS) domain name. The client completes the resolution and uses the obtained IP address to access the service. |
font_name | String | No | The speaker type. Default value: xiaoyun. |
encode_type | String | No | The audio encoding format. Default value: PCM. Valid values: PCM, WAV, and MP3. |
sample_rate | String | No | The audio sampling rate. Unit: Hz. Default value: 16000. |
volume | String | No | The volume of the speaker. Valid values: 0 to 2. Default value: 1.0. |
speed_level | String | No | The speed of the speaker. Valid values: 0.5 to 2. Default value: 1.0. A greater value indicates a higher speed. |
pitch_level | String | No | The intonation of the speaker. Valid values: -500 to 500. Default value: 0. A greater value indicates a sharper voice. |
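The numeric parameters in the table are passed as strings, so an out-of-range value is easy to miss until the request fails. The following is a minimal sketch of a client-side check, not part of the SDK. The `validate_request_params` helper is hypothetical, and treating the documented ranges as inclusive at both ends is an assumption.

```python
# Ranges copied from the request parameter table; treated as inclusive (assumption).
NUMERIC_RANGES = {
    "volume": (0.0, 2.0),
    "speed_level": (0.5, 2.0),
    "pitch_level": (-500.0, 500.0),
}


def validate_request_params(params: dict[str, str]) -> dict[str, str]:
    """Check string-typed numeric parameters against their documented ranges.

    Returns the parameters unchanged if every supplied value is in range.
    """
    for name, (low, high) in NUMERIC_RANGES.items():
        if name not in params:
            continue  # optional parameter; the service default applies
        value = float(params[name])  # raises ValueError for non-numeric strings
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside [{low}, {high}]")
    return params


request = validate_request_params(
    {"font_name": "xiaoyun", "volume": "1.5", "speed_level": "1.0", "pitch_level": "0"}
)
```

After validation, each key-value pair would be passed to the SDK one at a time through the setparamTts method mentioned above.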
3. Receive the synthesized speech data
The server returns the synthesized speech data in the binary format, and the SDK receives and processes the binary data.
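When encode_type is PCM, the received bytes carry no header, so most players cannot open them directly. The following is a minimal sketch of wrapping accumulated PCM data in a WAV container with the Python standard library. The `pcm_to_wav` helper is hypothetical, and the 16-bit mono sample layout is an assumption that this page does not confirm; only the sampling rate comes from the request parameters.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int = 16000) -> bytes:
    """Wrap raw PCM bytes in a WAV container.

    Assumes 16-bit mono samples (not confirmed by this page).
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)          # mono (assumption)
        wav.setsampwidth(2)          # 2 bytes per sample = 16-bit (assumption)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()


# Binary chunks received during synthesis are concatenated before wrapping.
chunks = [b"\x00\x00" * 80, b"\x00\x00" * 80]
wav_bytes = pcm_to_wav(b"".join(chunks), sample_rate=16000)
```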
4. Complete the synthesis task
After the synthesis task is completed, the server sends a notification message.
Error codes
If an error occurs during speech synthesis, the SDK reports a TTS_EVENT_ERROR event to the server and returns an error message to the client. The following table describes the error messages that may be returned.
Error code | Error message | Description |
0 | TTS_SUCCESS | The task is successful. |
140000 | TTS_CREATE_FAILED | The error message returned because the engine failed to be initialized. |
140001 | TTS_ENGINE_INVALID | The error message returned because the engine is not initialized. |
140002 | TTS_TEXT_ERROR | The error message returned because the entered text is invalid. For example, it is left empty. |
140003 | TTS_MALLOC_FAILED | The error message returned because memory allocation failed. |
140004 | TTS_TEXT_QUEUE_FULL | The error message returned because the task queue is full. |
140005 | TTS_ASSETPATH_INVALID | The error message returned because the specified resource path is invalid. |
140006 | TTS_HANLDE_INVALID | The error message returned because the processing thread does not exist. |
140007 | TTS_CREATE_HANLDE_FAILED | The error message returned because the processing thread failed to be created. |
140008 | TTS_AUTH_FAILED | The error message returned because the authentication failed. You cannot use the SDK before you complete the authentication. |
140009 | TTS_TEXT_QUEUE_EMPTY | The error message returned because the queue of synthesis tasks is empty. |
140010 | TTS_MODE_INVALID | The error message returned because the synthesis mode is invalid. |
140012 | TTS_OPEN_FILE_FAILED | The error message returned because the file failed to be opened. |
140013 | TTS_STATE_INVALID | The error message returned because the state of the state machine is invalid. |
140014 | TTS_SYNTHESIZER_INIT_ERROR | The error message returned because the synthesizer failed to be initialized. |
140015 | TTS_SYNTHESIZER_RELEASE_ERROR | The error message returned because the synthesizer failed to be released. |
140016 | TTS_SYNTHESIZER_FAILED | The error message returned because the speech synthesis failed. |
140017 | TTS_WAIT_TIMEOUT | The error message returned because the request timed out. |
140018 | TTS_CLOSED | The error message returned because the code used for speech synthesis is not provided. |
140100 | TTS_PARAM_INVALID | The error message returned because a specified parameter is invalid. |
140101 | TTS_PARAM_VALUE_INVALID | The error message returned because a specified parameter value is invalid. |
140102 | TTS_CFG_OPEN_FAILED | The error message returned because the configuration file failed to be opened. |
140103 | TTS_CFG_WRONG_FORMAT | The error message returned because the configuration file is in an invalid format. |
140150 | TTS_LOG_OPEN_FAILED | The error message returned because the log file failed to be created. |
140200 | TTS_AM_CREATE_FAILED | The error message returned because the player failed to be created. |
140201 | TTS_AM_OPEN_FAILED | The error message returned because the player failed to be opened. |
140210 | TTS_DECODER_INIT_FAILED | The error message returned because the audio decoder failed to be initialized. |
140211 | TTS_DECODER_MALLOC_FAILED | The error message returned because memory allocation for the audio decoder failed. |
140212 | TTS_DECODER_INPUT_TOO_MANY | The error message returned because the size of the text entered at a time exceeds the upper limit. The excess data is discarded. |
140213 | TTS_DECODER_OUTPUT_TOO_MANY | The error message returned because the size of the generated speech data exceeds the cache size. The excess data is discarded. |
140220 | TTS_AP_INIT_FAILED | The error message returned because the audio processing unit failed to be opened. |
140221 | TTS_AP_START_FAILED | The error message returned because the audio processing unit failed to be started. |
140222 | TTS_AP_MALLOC_FAILED | The error message returned because memory allocation for the audio processing unit failed. |
140230 | TTS_BGM_START_FAILED | The error message returned because the background music (BGM) failed to be played. |
140231 | TTS_BGM_DECODE_INVALID | The error message returned because the BGM decoder failed to be initialized. |
140232 | TTS_BGM_ADD_FAILED | The error message returned because the BGM failed to be added to the sentence. |
140233 | TTS_BGM_MALLOC_FAILED | The error message returned because memory allocation for the BGM failed. |
140234 | TTS_BGM_OPEN_FILE_FAILED | The error message returned because the BGM file failed to be opened. |
140235 | TTS_BGM_FILE_FORMAT_ERROR | The error message returned because the BGM file is in an invalid format. |
140300 | TTS_CACHE_INIT_FAILED | The error message returned because the cache failed to be initialized. |
140301 | TTS_CACHE_MGR_INVALID | The error message returned because the cache manager is not initialized. |
140302 | TTS_CACHE_CMD_ERROR | The error message returned because the issued cache instruction is invalid. |
140303 | TTS_CACHE_CALLBACK_INVALID | The error message returned because the callback method is not initialized. |
140304 | TTS_CACHE_START_READ_FAILED | The error message returned because the cache file failed to be opened. |
140305 | TTS_CACHE_READ_FAILED | The error message returned because the cached data failed to be read. |
140306 | TTS_CACHE_MALLOC_FAILED | The error message returned because memory allocation for the cache failed. |
140307 | TTS_CACHE_DELETE_FAILED | The error message returned because the cache file failed to be deleted. |
140308 | TTS_CACHE_PATH_INVALID | The error message returned because the directory for storing cache files failed to be created. |
140309 | TTS_CACHE_LIST_CREATE_FAILED | The error message returned because the cache file list failed to be created. |
140310 | TTS_CACHE_FAILED | The error message returned because the caching failed. |
140311 | TTS_CACHE_TOO_MANY | The error message returned because the size of cached data exceeds the upper limit. |
140312 | TTS_CACHE_PARAM_INVALID | The error message returned because a specified cache parameter is invalid. |
140313 | TTS_CACHE_RECORDING_OPEN_FAILED | The error message returned because the local file failed to be opened. |
140350 | TTS_FONT_INIT_FAILED | The error message returned because the font manager failed to be initialized. |
140351 | TTS_FONT_INITLIST_FAILED | The error message returned because the fontlist manager failed to be initialized. |
140352 | TTS_FONT_INITLIST_INVALID | The error message returned because the fontlist manager is not initialized. |
140353 | TTS_FONT_CMD_INVALID | The error message returned because the instruction is in an invalid format. |
140354 | TTS_FONT_RESPONSE_ERROR | The error message returned because the response from the server is in an invalid format. |
140355 | TTS_FONT_RESPONSELIST_ERROR | The error message returned because the response to a fontlist request is in an invalid format. |
140356 | TTS_FONT_GET_FONTLIST_FAILED | The error message returned because the fontlist failed to be queried. |
140357 | TTS_FONT_REQUEST_CMD_ERROR | The error message returned because the instruction used to create a request is invalid. |
140358 | TTS_FONT_LOCALMSG_ERROR | The error message returned because the local list file failed to be parsed. |
140359 | TTS_FONT_LOCALFILE_ERROR | The error message returned because the local list file failed to be saved. |
140360 | TTS_FONT_CLOUDMSG_ERROR | The error message returned because the list on the server failed to be parsed. |
140900 | TTS_LOCAL_CRE_ENGINE_ERROR | The error message returned because the client engine failed to be initialized. |
140901 | TTS_LOCAL_ENGINE_INVALID | The error message returned because the client engine is not initialized. |
140902 | TTS_LOCAL_ASSET_ERROR | The error message returned because the local resource verification failed. |
140903 | TTS_LOCAL_CRE_TASK_ERROR | The error message returned because the synthesis task failed to be created on the client. |
140904 | TTS_LOCAL_TASK_INVALID | The error message returned because the synthesis task created on the client is invalid. |
140905 | TTS_LOCAL_START_FAILED | The error message returned because the synthesis task created on the client failed to be started. |
141000 | TTS_CLOUD_CREATE_FAILED | The error message returned because the server engine failed to be initialized. |
141001 | TTS_CLOUD_ENGINE_INVALID | The error message returned because the server engine is not initialized. |
141002 | TTS_CLOUD_TASK_FAILED | The error message returned because the synthesis task failed to be created on the server. |
141003 | TTS_CLOUD_TASK_INVALID | The error message returned because the synthesis task created on the server is invalid. |
141004 | TTS_CLOUD_START_FAILED | The error message returned because the synthesis task created on the server failed to be started. |
141005 | TTS_CLOUD_CANCEL_FAILED | The error message returned because the synthesis task created on the server failed to be canceled. |
141006 | TTS_CLOUD_NETWORK_BROKEN | The error message returned because the network connection is unstable. |
144001 | TTS_CLOUD_AUTH_FAILED | The error message returned because the authentication failed. |
144002 | TTS_CLOUD_INVALID_MESSAGE | The error message returned because the returned message is invalid. |
144003 | TTS_CLOUD_INVALID_TOKEN | The error message returned because the token has expired or is invalid. |
144004 | TTS_CLOUD_WAIT_TIMEOUT | The error message returned because the idle connection timed out. |
144005 | TTS_CLOUD_EXCEED_CONCURRENCY | The error message returned because the number of requests exceeds the upper limit. |
144100 | TTS_CLOUD_INVALID_INTERFACE | The error message returned because the method is not supported. |
144101 | TTS_CLOUD_UNSUPPORTED_ORDER | The error message returned because the instruction is not supported. |
144102 | TTS_CLOUD_INVALID_ORDER | The error message returned because the instruction is invalid. |
144103 | TTS_CLOUD_CLIENT_DISCONNECT | The error message returned because the client is disconnected. |
144200 | TTS_CLOUD_INVALID_APPKEY | The error message returned because the specified appkey is invalid. |
144300 | TTS_CLOUD_INVALID_PARAM | The error message returned because a specified cloud service parameter is invalid. |
144400 | TTS_CLOUD_SERVER_ERROR | The error message returned because a server error has occurred. |
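A client usually needs to decide what to do when one of these codes arrives. The following is a minimal sketch of a lookup that maps a few codes from the table above to a suggested action. The `suggest_action` helper and the grouping into "refresh_token", "retry", and "report" are assumptions made for illustration; this page does not prescribe a recovery policy for any code.

```python
# A few codes copied from the error table above.
ERROR_NAMES = {
    0: "TTS_SUCCESS",
    140008: "TTS_AUTH_FAILED",
    140017: "TTS_WAIT_TIMEOUT",
    141006: "TTS_CLOUD_NETWORK_BROKEN",
    144001: "TTS_CLOUD_AUTH_FAILED",
    144003: "TTS_CLOUD_INVALID_TOKEN",
    144004: "TTS_CLOUD_WAIT_TIMEOUT",
    144005: "TTS_CLOUD_EXCEED_CONCURRENCY",
    144400: "TTS_CLOUD_SERVER_ERROR",
}

# Illustrative policy only (assumption): which codes suggest a new token or a retry.
REFRESH_TOKEN_CODES = {140008, 144001, 144003}
RETRY_CODES = {140017, 141006, 144004, 144005, 144400}


def suggest_action(code: int) -> str:
    """Return "ok", "refresh_token", "retry", or "report" for an error code."""
    if code == 0:
        return "ok"
    if code in REFRESH_TOKEN_CODES:
        return "refresh_token"
    if code in RETRY_CODES:
        return "retry"
    # Anything else: log the code with the task_id and report it.
    return "report"


action = suggest_action(144003)  # token expired or invalid -> "refresh_token"
```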