Intelligent Speech Interaction: Overview

Last updated: Nov 09, 2023

The speech synthesis service provides the Natural User Interaction (NUI) SDK for mobile clients to convert text to binary speech data.

Description

Compared with common SDKs, the NUI SDK is smaller in size and supports more comprehensive status management. The NUI SDK provides comprehensive speech processing capabilities and can also serve as an atomic SDK, meeting diverse user requirements. In addition, the NUI SDK uses a unified API.

The NUI SDK has the following features:

  • Supports generating audio files in the pulse-code modulation (PCM) and MP3 formats.

  • Allows you to set the speed, intonation, and volume of the generated speech.

  • Allows you to set the speaker type of the generated speech. The following table describes the supported speaker types.

| Name | Value of the voice parameter | Category | Scenario | Supported language | Supported sampling rate (Hz) | Phoneme boundary detection for each word | Remarks |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Xiaoyun | Xiaoyun | Standard female voice | Common scenario | Chinese and mixed Chinese and English | 8,000 and 16,000 | No | N/A |
| Xiaogang | Xiaogang | Standard male voice | Common scenario | Chinese and mixed Chinese and English | 8,000 and 16,000 | No | N/A |
| Ruoxi | Ruoxi | Gentle female voice | Common scenario | Chinese and mixed Chinese and English | 8,000, 16,000, and 24,000 | No | N/A |
| Siqi | Siqi | Gentle female voice | Common scenario | Chinese and mixed Chinese and English | 8,000, 16,000, and 24,000 | Yes | N/A |
| Sijia | Sijia | Standard female voice | Common scenario | Chinese and mixed Chinese and English | 8,000, 16,000, and 24,000 | No | N/A |
| Sicheng | Sicheng | Standard male voice | Common scenario | Chinese and mixed Chinese and English | 8,000, 16,000, and 24,000 | Yes | N/A |
| Aiqi | Aiqi | Gentle female voice | Common scenario | Chinese and mixed Chinese and English | 8,000 and 16,000 | Yes | N/A |
| Aijia | Aijia | Standard female voice | Common scenario | Chinese and mixed Chinese and English | 8,000 and 16,000 | Yes | N/A |
| Aicheng | Aicheng | Standard male voice | Common scenario | Chinese and mixed Chinese and English | 8,000 and 16,000 | Yes | N/A |
| Aida | Aida | Standard male voice | Common scenario | Chinese and mixed Chinese and English | 8,000 and 16,000 | Yes | N/A |
| Ninger | Ninger | Standard female voice | Common scenario | Simplified Chinese | 8,000, 16,000, and 24,000 | No | N/A |
| Ruilin | Ruilin | Standard female voice | Common scenario | Simplified Chinese | 8,000, 16,000, and 24,000 | No | N/A |
| Siyue | Siyue | Gentle female voice | Customer service scenario | Chinese and mixed Chinese and English | 8,000, 16,000, and 24,000 | No | N/A |
| Aiya | Aiya | Strict female voice | Customer service scenario | Chinese and mixed Chinese and English | 8,000 and 16,000 | Yes | N/A |
| Aixia | Aixia | Amiable female voice | Customer service scenario | Chinese and mixed Chinese and English | 8,000 and 16,000 | Yes | N/A |
| Aimei | Aimei | Sweet female voice | Customer service scenario | Chinese and mixed Chinese and English | 8,000 and 16,000 | Yes | N/A |
| Aiyu | Aiyu | Natural female voice | Customer service scenario | Chinese and mixed Chinese and English | 8,000 and 16,000 | Yes | N/A |
| Aiyue | Aiyue | Gentle female voice | Customer service scenario | Chinese and mixed Chinese and English | 8,000 and 16,000 | Yes | N/A |
| Aijing | Aijing | Strict female voice | Customer service scenario | Chinese and mixed Chinese and English | 8,000 and 16,000 | Yes | N/A |
| Xiaomei | Xiaomei | Sweet female voice | Customer service scenario | Chinese and mixed Chinese and English | 8,000, 16,000, and 24,000 | No | N/A |
| Aina | Aina | Female voice with Zhejiang accent | Customer service scenario | Simplified Chinese | 8,000 and 16,000 | Yes | N/A |
| Yina | Yina | Female voice with Zhejiang accent | Customer service scenario | Simplified Chinese | 8,000, 16,000, and 24,000 | No | N/A |
| Sijing | Sijing | Strict female voice | Customer service scenario | Simplified Chinese | 8,000, 16,000, and 24,000 | Yes | N/A |
| Sitong | Sitong | Child voice | Child voice scenario | Simplified Chinese | 8,000, 16,000, and 24,000 | No | N/A |
| Xiaobei | Xiaobei | Lolita female voice | Child voice scenario | Simplified Chinese | 8,000, 16,000, and 24,000 | Yes | N/A |
| Aitong | Aitong | Child voice | Child voice scenario | Simplified Chinese | 8,000 and 16,000 | Yes | N/A |
| Aiwei | Aiwei | Lolita female voice | Child voice scenario | Simplified Chinese | 8,000 and 16,000 | Yes | N/A |
| Aibao | Aibao | Lolita female voice | Child voice scenario | Simplified Chinese | 8,000 and 16,000 | Yes | N/A |
| Harry | Harry | Male voice with British accent | English scenario | English | 8,000 and 16,000 | No | N/A |
| Abby | Abby | Female voice with American accent | English scenario | English | 8,000 and 16,000 | No | N/A |
| Andy | Andy | Male voice with American accent | English scenario | English | 8,000 and 16,000 | No | N/A |
| Eric | Eric | Male voice with British accent | English scenario | English | 8,000 and 16,000 | No | N/A |
| Emily | Emily | Female voice with British accent | English scenario | English | 8,000 and 16,000 | No | N/A |
| Luna | Luna | Female voice with British accent | English scenario | English | 8,000 and 16,000 | No | N/A |
| Luca | Luca | Male voice with British accent | English scenario | English | 8,000 and 16,000 | No | N/A |
| Wendy | Wendy | Female voice with British accent | English scenario | English | 8,000, 16,000, and 24,000 | No | N/A |
| William | William | Male voice with British accent | English scenario | English | 8,000, 16,000, and 24,000 | No | N/A |
| Olivia | Olivia | Female voice with British accent | English scenario | English | 8,000, 16,000, and 24,000 | No | N/A |
| Shanshan | Shanshan | Cantonese female voice | Dialect scenario | Cantonese (simplified) and mixed Cantonese and English | 8,000, 16,000, and 24,000 | No | N/A |
| Xiaoyue | Xiaoyue | Female voice with Sichuan accent | Dialect scenario | Chinese and mixed Chinese and English | 8,000 and 16,000 | No | Available in public preview of Intelligent Speech Interaction |
| Lydia | Lydia | Female voice of mixed Chinese and English | English scenario | English | 8,000 and 16,000 | No | Available in public preview of Intelligent Speech Interaction |
| Aishuo | Aishuo | Natural male voice | Customer service scenario | English | 8,000 and 16,000 | Yes | Available in public preview of Intelligent Speech Interaction |
| Qingqing | Qingqing | Female voice with Formosan accent | Dialect scenario | Simplified Chinese | 8,000 and 16,000 | No | Available in public preview of Intelligent Speech Interaction |
| Cuijie | Cuijie | Female voice of Northeastern Mandarin | Dialect scenario | Simplified Chinese | 8,000 and 16,000 | No | Available in public preview of Intelligent Speech Interaction |
| Xiaoze | Xiaoze | Male voice with strong Hunan accent | Dialect scenario | Simplified Chinese | 8,000 and 16,000 | Yes | Available in public preview of Intelligent Speech Interaction |
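
Because not every speaker supports 24,000 Hz, it can help to check the sampling rate on the client before sending a request. The following Python sketch is only an illustration: the `SUPPORTED_RATES` dictionary copies a few rows from the speaker table, and the `pick_sample_rate` helper and its fallback rule are hypothetical, not part of the NUI SDK.

```python
from typing import Dict, Tuple

# A few rows copied from the speaker table: voice -> supported sampling rates (Hz).
# This is a partial, hand-maintained subset, not an SDK-provided list.
SUPPORTED_RATES: Dict[str, Tuple[int, ...]] = {
    "Xiaoyun": (8000, 16000),
    "Siqi": (8000, 16000, 24000),
    "Aiqi": (8000, 16000),
    "Harry": (8000, 16000),
    "Wendy": (8000, 16000, 24000),
}


def pick_sample_rate(voice: str, preferred: int) -> int:
    """Return `preferred` if the voice supports it, otherwise a fallback rate.

    Fallback rule (an assumption of this sketch): use the highest supported
    rate below `preferred`, or the lowest supported rate if none is below it.
    """
    rates = SUPPORTED_RATES.get(voice)
    if rates is None:
        raise ValueError(f"unknown voice: {voice!r}")
    if preferred in rates:
        return preferred
    lower = [r for r in rates if r < preferred]
    return max(lower) if lower else min(rates)


if __name__ == "__main__":
    print(pick_sample_rate("Siqi", 24000))     # supported as requested
    print(pick_sample_rate("Xiaoyun", 24000))  # falls back to a lower rate
```

Rejecting an unsupported rate instead of silently falling back is an equally reasonable design; the fallback is used here only to keep the example short.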

Limits

  • The entered text must be UTF-8 encoded.

  • The entered text can contain a maximum of 300 characters. If the text is longer, only the first 300 characters are synthesized and the remaining characters are discarded.
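
To avoid losing text to the 300-character limit, a client can split long input into several requests. The sketch below is one possible approach, not an SDK feature: `split_text` is a hypothetical helper, and it assumes that "characters" means Unicode code points, which the limits above do not state explicitly.

```python
from typing import List

MAX_CHARS = 300  # per-request limit stated in the Limits section


def split_text(text: str, limit: int = MAX_CHARS) -> List[str]:
    """Split `text` into consecutive chunks of at most `limit` characters.

    Characters are counted as Python str elements (Unicode code points);
    this counting rule is an assumption of the sketch.
    """
    if not text:
        raise ValueError("text must not be empty")
    if limit <= 0:
        raise ValueError("limit must be positive")
    return [text[i:i + limit] for i in range(0, len(text), limit)]


if __name__ == "__main__":
    chunks = split_text("你好" * 200)  # 400 characters in total
    for chunk in chunks:
        payload = chunk.encode("utf-8")  # the service expects UTF-8 text
        print(len(chunk), "characters,", len(payload), "bytes")
```

This simple slicing can cut a sentence in the middle; splitting at punctuation would sound more natural but is left out to keep the example short.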

Endpoints

| Access type | Description | URL |
| --- | --- | --- |
| External access from the Internet | This endpoint allows you to access the speech synthesis service from any host over the Internet. By default, the Internet access URL is built in the SDK. | wss://nls-gateway-ap-southeast-1.aliyuncs.com/ws/v1 |

Interaction process

Note

In addition to the audio stream, every server response carries the task_id parameter in its response header, which identifies the synthesis task. Record the value of this parameter. If an error occurs, include the task ID and the error message when you submit a ticket.

1. Authenticate the client

To establish a WebSocket connection with the server, the client must use a token for authentication. For more information about how to obtain the token, see Obtain a Token.

The following table describes the parameters used for authentication and initialization.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| workspace | String | Yes | The working directory from which the SDK reads the configuration file. |
| app_key | String | Yes | The appkey of your project created in the Intelligent Speech Interaction console. |
| token | String | Yes | The token provided as the credential for you to use Intelligent Speech Interaction. Make sure that the token is valid. You can set the token when you initialize the SDK and update the token when you set the request parameters. |
| device_id | String | Yes | The unique identifier of the device, for example, the media access control (MAC) address, serial number, or pseudo unique ID of the device. |
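
The four initialization parameters are all required strings, so they can be checked before the SDK is initialized. The following Python sketch assembles them into a JSON object. Note the assumptions: the JSON serialization and the `build_init_params` helper are illustrative only, and the exact way the parameters are handed to the SDK depends on the platform SDK you use.

```python
import json


def build_init_params(workspace: str, app_key: str, token: str, device_id: str) -> str:
    """Return the initialization parameters as a JSON string.

    The parameter names come from the table above. Serializing them as
    JSON is an assumption made for this sketch.
    """
    params = {
        "workspace": workspace,
        "app_key": app_key,
        "token": token,
        "device_id": device_id,
    }
    # All four parameters are marked as required, so reject empty values early.
    missing = [name for name, value in params.items() if not value]
    if missing:
        raise ValueError("missing required parameters: " + ", ".join(missing))
    return json.dumps(params)


if __name__ == "__main__":
    # Placeholder values; replace them with your own appkey, token, and device ID.
    print(build_init_params("/path/to/workspace", "your-appkey", "your-token", "device-001"))
```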

2. Send a request to use the speech synthesis service

You must set the request parameters for the client to send a service request. You can set the request parameters by calling the setparamTts method in the SDK. The following table describes the request parameters.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| appkey | String | Yes | The appkey of your project created in the Intelligent Speech Interaction console. |
| token | String | No | The token provided as the credential for you to use Intelligent Speech Interaction. You can update the token as required by setting this parameter. |
| direct_host | String | No | The IP address that is resolved from the Domain Name System (DNS) domain name. The client completes the resolution and uses the obtained IP address to access the service. |
| font_name | String | No | The speaker type. Default value: xiaoyun. |
| encode_type | String | No | The audio encoding format. Default value: PCM. Valid values: PCM, WAV, and MP3. |
| sample_rate | String | No | The audio sampling rate. Unit: Hz. Default value: 16000. |
| volume | String | No | The volume of the speaker. Valid values: 0 to 2. Default value: 1.0. |
| speed_level | String | No | The speed of the speaker. Valid values: 0.5 to 2. Default value: 1.0. A greater value indicates a higher speed. |
| pitch_level | String | No | The intonation of the speaker. Valid values: -500 to 500. Default value: 0. A greater value indicates a sharper voice. |
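
Because all request parameters are passed as strings, range errors are easy to miss until the service rejects the request. The sketch below validates the optional synthesis parameters against the ranges and defaults listed above and converts them to strings. The `build_request_params` helper is hypothetical; how each name/value pair is then passed to setparamTts depends on the platform SDK, so check the SDK reference for the exact signature.

```python
from typing import Dict

VALID_ENCODINGS = ("PCM", "WAV", "MP3")  # valid values listed for encode_type


def build_request_params(
    font_name: str = "xiaoyun",
    encode_type: str = "PCM",
    sample_rate: int = 16000,
    volume: float = 1.0,
    speed_level: float = 1.0,
    pitch_level: int = 0,
) -> Dict[str, str]:
    """Validate the optional synthesis parameters and return them as strings."""
    if encode_type not in VALID_ENCODINGS:
        raise ValueError(f"encode_type must be one of {VALID_ENCODINGS}")
    if not 0 <= volume <= 2:
        raise ValueError("volume must be between 0 and 2")
    if not 0.5 <= speed_level <= 2:
        raise ValueError("speed_level must be between 0.5 and 2")
    if not -500 <= pitch_level <= 500:
        raise ValueError("pitch_level must be between -500 and 500")
    # The table declares every parameter as String, so convert before sending.
    return {
        "font_name": font_name,
        "encode_type": encode_type,
        "sample_rate": str(sample_rate),
        "volume": str(volume),
        "speed_level": str(speed_level),
        "pitch_level": str(pitch_level),
    }


if __name__ == "__main__":
    for name, value in build_request_params(speed_level=1.5).items():
        print(name, "=", value)
```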

3. Receive the synthesized speech data

The server returns the synthesized speech data in the binary format, and the SDK receives and processes the binary data.

4. Complete the synthesis task

After the synthesis task is completed, the server sends a notification message.

Error codes

If an error occurs during speech synthesis, the SDK reports a TTS_EVENT_ERROR event to the server and returns an error message to the client. The following table describes the error messages that may be returned.

| Error code | Error message | Description |
| --- | --- | --- |
| 0 | TTS_SUCCESS | The task is successful. |
| 140000 | TTS_CREATE_FAILED | The engine failed to be initialized. |
| 140001 | TTS_ENGINE_INVALID | The engine is not initialized. |
| 140002 | TTS_TEXT_ERROR | The entered text is invalid. For example, it is empty. |
| 140003 | TTS_MALLOC_FAILED | The requested memory failed to be allocated. |
| 140004 | TTS_TEXT_QUEUE_FULL | The task queue is full. |
| 140005 | TTS_ASSETPATH_INVALID | The specified resource path is invalid. |
| 140006 | TTS_HANLDE_INVALID | The processing thread does not exist. |
| 140007 | TTS_CREATE_HANLDE_FAILED | The processing thread failed to be created. |
| 140008 | TTS_AUTH_FAILED | The authentication failed. You cannot use the SDK before you complete the authentication. |
| 140009 | TTS_TEXT_QUEUE_EMPTY | The queue of synthesis tasks is empty. |
| 140010 | TTS_MODE_INVALID | The synthesis mode is invalid. |
| 140012 | TTS_OPEN_FILE_FAILED | The file failed to be opened. |
| 140013 | TTS_STATE_INVALID | The state of the state machine is invalid. |
| 140014 | TTS_SYNTHESIZER_INIT_ERROR | The synthesizer failed to be initialized. |
| 140015 | TTS_SYNTHESIZER_RELEASE_ERROR | The synthesizer failed to be released. |
| 140016 | TTS_SYNTHESIZER_FAILED | The speech synthesis failed. |
| 140017 | TTS_WAIT_TIMEOUT | The request timed out. |
| 140018 | TTS_CLOSED | The code used for speech synthesis is not provided. |
| 140100 | TTS_PARAM_INVALID | A specified parameter is invalid. |
| 140101 | TTS_PARAM_VALUE_INVALID | A specified parameter value is invalid. |
| 140102 | TTS_CFG_OPEN_FAILED | The configuration file failed to be opened. |
| 140103 | TTS_CFG_WRONG_FORMAT | The configuration file is in an invalid format. |
| 140150 | TTS_LOG_OPEN_FAILED | The log file failed to be created. |
| 140200 | TTS_AM_CREATE_FAILED | The player failed to be created. |
| 140201 | TTS_AM_OPEN_FAILED | The player failed to be opened. |
| 140210 | TTS_DECODER_INIT_FAILED | The audio decoder failed to be initialized. |
| 140211 | TTS_DECODER_MALLOC_FAILED | The requested memory failed to be allocated to the audio decoder. |
| 140212 | TTS_DECODER_INPUT_TOO_MANY | The size of the text entered at a time exceeds the upper limit. The excess data is discarded. |
| 140213 | TTS_DECODER_OUTPUT_TOO_MANY | The size of the generated speech data exceeds the cache size. The excess data is discarded. |
| 140220 | TTS_AP_INIT_FAILED | The audio processing unit failed to be opened. |
| 140221 | TTS_AP_START_FAILED | The audio processing unit failed to be started. |
| 140222 | TTS_AP_MALLOC_FAILED | The requested memory failed to be allocated to the audio processing unit. |
| 140230 | TTS_BGM_START_FAILED | The background music (BGM) failed to be played. |
| 140231 | TTS_BGM_DECODE_INVALID | The BGM decoder failed to be initialized. |
| 140232 | TTS_BGM_ADD_FAILED | The BGM failed to be added to the sentence. |
| 140233 | TTS_BGM_MALLOC_FAILED | The requested memory failed to be allocated to the BGM. |
| 140234 | TTS_BGM_OPEN_FILE_FAILED | The BGM file failed to be opened. |
| 140235 | TTS_BGM_FILE_FORMAT_ERROR | The BGM file is in an invalid format. |
| 140300 | TTS_CACHE_INIT_FAILED | The cache failed to be initialized. |
| 140301 | TTS_CACHE_MGR_INVALID | The cache manager is not initialized. |
| 140302 | TTS_CACHE_CMD_ERROR | The issued cache instruction is invalid. |
| 140303 | TTS_CACHE_CALLBACK_INVALID | The callback method is not initialized. |
| 140304 | TTS_CACHE_START_READ_FAILED | The cache file failed to be opened. |
| 140305 | TTS_CACHE_READ_FAILED | The cached data failed to be read. |
| 140306 | TTS_CACHE_MALLOC_FAILED | The requested memory failed to be allocated to the cache. |
| 140307 | TTS_CACHE_DELETE_FAILED | The cache file failed to be deleted. |
| 140308 | TTS_CACHE_PATH_INVALID | The directory for storing cache files failed to be created. |
| 140309 | TTS_CACHE_LIST_CREATE_FAILED | The cache file list failed to be created. |
| 140310 | TTS_CACHE_FAILED | The caching failed. |
| 140311 | TTS_CACHE_TOO_MANY | The size of cached data exceeds the upper limit. |
| 140312 | TTS_CACHE_PARAM_INVALID | A specified cache parameter is invalid. |
| 140313 | TTS_CACHE_RECORDING_OPEN_FAILED | The local file failed to be opened. |
| 140350 | TTS_FONT_INIT_FAILED | The font manager failed to be initialized. |
| 140351 | TTS_FONT_INITLIST_FAILED | The fontlist manager failed to be initialized. |
| 140352 | TTS_FONT_INITLIST_INVALID | The fontlist manager is not initialized. |
| 140353 | TTS_FONT_CMD_INVALID | The instruction is in an invalid format. |
| 140354 | TTS_FONT_RESPONSE_ERROR | The response from the server is in an invalid format. |
| 140355 | TTS_FONT_RESPONSELIST_ERROR | The response to a fontlist request is in an invalid format. |
| 140356 | TTS_FONT_GET_FONTLIST_FAILED | The fontlist failed to be queried. |
| 140357 | TTS_FONT_REQUEST_CMD_ERROR | The instruction used to create a request is invalid. |
| 140358 | TTS_FONT_LOCALMSG_ERROR | The local list file failed to be parsed. |
| 140359 | TTS_FONT_LOCALFILE_ERROR | The local list file failed to be saved. |
| 140360 | TTS_FONT_CLOUDMSG_ERROR | The list on the server failed to be parsed. |
| 140900 | TTS_LOCAL_CRE_ENGINE_ERROR | The client engine failed to be initialized. |
| 140901 | TTS_LOCAL_ENGINE_INVALID | The client engine is not initialized. |
| 140902 | TTS_LOCAL_ASSET_ERROR | The local resource verification failed. |
| 140903 | TTS_LOCAL_CRE_TASK_ERROR | The synthesis task failed to be created on the client. |
| 140904 | TTS_LOCAL_TASK_INVALID | The synthesis task created on the client is invalid. |
| 140905 | TTS_LOCAL_START_FAILED | The synthesis task created on the client failed to be started. |
| 141000 | TTS_CLOUD_CREATE_FAILED | The server engine failed to be initialized. |
| 141001 | TTS_CLOUD_ENGINE_INVALID | The server engine is not initialized. |
| 141002 | TTS_CLOUD_TASK_FAILED | The synthesis task failed to be created on the server. |
| 141003 | TTS_CLOUD_TASK_INVALID | The synthesis task created on the server is invalid. |
| 141004 | TTS_CLOUD_START_FAILED | The synthesis task created on the server failed to be started. |
| 141005 | TTS_CLOUD_CANCEL_FAILED | The synthesis task created on the server failed to be canceled. |
| 141006 | TTS_CLOUD_NETWORK_BROKEN | The network connection is unstable. |
| 144001 | TTS_CLOUD_AUTH_FAILED | The authentication failed. |
| 144002 | TTS_CLOUD_INVALID_MESSAGE | The returned message is invalid. |
| 144003 | TTS_CLOUD_INVALID_TOKEN | The token has expired or is invalid. |
| 144004 | TTS_CLOUD_WAIT_TIMEOUT | The idle connection timed out. |
| 144005 | TTS_CLOUD_EXCEED_CONCURRENCY | The number of requests exceeds the upper limit. |
| 144100 | TTS_CLOUD_INVALID_INTERFACE | The method is not supported. |
| 144101 | TTS_CLOUD_UNSUPPORTED_ORDER | The instruction is not supported. |
| 144102 | TTS_CLOUD_INVALID_ORDER | The instruction is invalid. |
| 144103 | TTS_CLOUD_CLIENT_DISCONNECT | The client is disconnected. |
| 144200 | TTS_CLOUD_INVALID_APPKEY | The specified appkey is invalid. |
| 144300 | TTS_CLOUD_INVALID_PARAM | A specified cloud service parameter is invalid. |
| 144400 | TTS_CLOUD_SERVER_ERROR | A server error has occurred. |
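
When you handle an error event, it is convenient to turn the numeric code into a readable log line that also carries the task ID recommended for tickets. The Python sketch below shows one way to do this. The `ERROR_NAMES` dictionary copies only a few rows from the table above, and the `describe_error` helper and its output format are hypothetical, not part of the SDK.

```python
from typing import Dict, Optional

# A few rows copied from the error code table; extend the mapping as needed.
ERROR_NAMES: Dict[int, str] = {
    0: "TTS_SUCCESS",
    140002: "TTS_TEXT_ERROR",
    140008: "TTS_AUTH_FAILED",
    140017: "TTS_WAIT_TIMEOUT",
    141006: "TTS_CLOUD_NETWORK_BROKEN",
    144003: "TTS_CLOUD_INVALID_TOKEN",
    144005: "TTS_CLOUD_EXCEED_CONCURRENCY",
}


def describe_error(code: int, task_id: Optional[str] = None) -> str:
    """Format an error code, its name, and the optional task ID for logging."""
    name = ERROR_NAMES.get(code, "UNKNOWN")
    line = f"{code} {name}"
    if task_id is not None:
        # Keep the task ID so that it can be quoted when a ticket is submitted.
        line += f" (task_id={task_id})"
    return line


if __name__ == "__main__":
    print(describe_error(144003, task_id="example-task-id"))
    print(describe_error(999999))
```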