
Before you begin

Last Updated: Nov 26, 2020

To get started with Intelligent Speech Interaction, read the Quick Start documentation first. Then read the following topics in sequence for up-to-date information about the service.




Introduces the terms and concepts related to Intelligent Speech Interaction.

Manage projects

Describes how to create projects and set project parameters in the Intelligent Speech Interaction console.

Obtain an access token

Describes how to obtain an access token. You must obtain an access token before you call Intelligent Speech Interaction services.

Call Intelligent Speech Interaction services

Use customization tools for speech recognition

Describes how to use customization tools to improve the effectiveness of speech recognition.

Differences among various Intelligent Speech Interaction services


Short sentence recognition

  • Real-time performance: Real-time recognition.
  • Description: Recognizes short speech that lasts for 1 minute or less.
  • Scenarios: Voice search in apps, customer service hotlines, chat conversations, and voice command control.
  • Audio coding format: Pulse-code modulation (PCM), as uncompressed PCM or WAV files, and Opus.
  • Free quota: A maximum of two concurrent call requests.
  • Billing: Separate resource package.

Real-time speech recognition

  • Real-time performance: Real-time recognition.
  • Description: Recognizes continuous speech data streams that last for a long period of time.
  • Scenarios: Uninterrupted speech recognition, such as conference speeches and live streaming.
  • Audio coding format: PCM, as uncompressed PCM or WAV files.
  • Free quota: A maximum of two concurrent call requests.
  • Billing: Separate resource package.

Speech synthesis

  • Real-time performance: Real-time synthesis.
  • Description: Converts text of up to 300 UTF-8 encoded characters to speech.
  • Scenarios: Any scenario that requires text-to-speech synthesis.
  • Audio coding format: PCM, WAV, and MP3.
  • Free quota: A maximum of two concurrent call requests.
  • Billing: Separate resource package.

Recording file recognition

  • Real-time performance: Non-real-time recognition. After a free trial user submits a recording file for recognition, the result is returned within 24 hours. For a paying user, the result is returned within 6 hours. These turnaround times do not apply if the recording files uploaded within a 30-minute period total more than 500 hours of audio. To convert that much data, contact the pre-sales service.
  • Description: Recognizes a recording file of up to 512 MB.
  • Scenarios: Any scenario that does not require real-time recognition.
  • Audio coding format: Single-track and dual-track WAV and MP3.
  • Free quota: Recognition of up to 2 hours of recording files per calendar day.
  • Billing: Separate resource package.

Long text speech synthesis

  • Real-time performance: Non-real-time synthesis.
  • Description: Converts text of thousands or tens of thousands of characters to binary audio data.
  • Scenarios: Reading novels and articles.
  • Audio coding format: PCM, WAV, and MP3.
  • Free quota: No trial edition available.
  • Billing: Separate resource package.
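The two synthesis services differ mainly in how much text they accept: speech synthesis takes at most 300 characters per request, while long text speech synthesis is meant for much longer input. The following is a minimal sketch of routing a request on that limit. The function name and the returned labels are hypothetical and are not part of any SDK, and the sketch assumes the limit counts characters rather than UTF-8 bytes.

```python
# Illustrative only: these names are hypothetical, not SDK identifiers.
MAX_REALTIME_SYNTHESIS_CHARS = 300  # limit stated for speech synthesis


def pick_synthesis_service(text: str) -> str:
    """Return the name of the synthesis service whose input limit fits the text."""
    # Assumption: the 300-character limit counts characters, not encoded bytes.
    if len(text) <= MAX_REALTIME_SYNTHESIS_CHARS:
        return "Speech synthesis"
    return "Long text speech synthesis"
```

Note that long text speech synthesis has no trial edition, so a caller on the free quota may prefer to split long text into pieces of 300 characters or fewer instead.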


  • All services except recording file recognition support only mono (single-channel) speech data.

  • Intelligent Speech Interaction supports only 16-bit audio files that are sampled at 8 kHz or 16 kHz.
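
The audio constraints above (16-bit samples, an 8 kHz or 16 kHz sampling rate, and mono input for every service except recording file recognition) can be checked locally before a file is sent. The following is a minimal sketch that uses Python's standard `wave` module. The function name `check_wav` and its `allow_stereo` parameter are hypothetical helpers for illustration and are not part of any Intelligent Speech Interaction SDK.

```python
import wave

SUPPORTED_SAMPLE_RATES = (8000, 16000)  # Hz, from the notes above
SUPPORTED_SAMPLE_WIDTH = 2              # bytes per sample, i.e. 16-bit audio


def check_wav(source, allow_stereo=False):
    """Return a list of problems found in a WAV file; empty means it passes.

    `source` is a file path or a binary file-like object. Set `allow_stereo`
    to True for recording file recognition, which also accepts dual-track audio.
    """
    problems = []
    with wave.open(source, "rb") as wav:
        width = wav.getsampwidth()
        rate = wav.getframerate()
        channels = wav.getnchannels()

    if width != SUPPORTED_SAMPLE_WIDTH:
        problems.append(f"sample width is {width * 8}-bit, expected 16-bit")
    if rate not in SUPPORTED_SAMPLE_RATES:
        problems.append(f"sampling rate is {rate} Hz, expected 8000 or 16000 Hz")
    max_channels = 2 if allow_stereo else 1
    if channels > max_channels:
        problems.append(f"file has {channels} channels, expected at most {max_channels}")
    return problems
```

A caller might run `check_wav("sample.wav")` before short sentence recognition and `check_wav("sample.wav", allow_stereo=True)` before recording file recognition. The sketch covers WAV files only; raw PCM, Opus, and MP3 input would need separate checks.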