
Before you begin

Last Updated: Oct 10, 2019

If you are using Intelligent Speech Interaction for the first time, read Quick Start first to complete the setup and try out the services.

If you have completed the Quick Start procedures, we recommend that you read the following topics in sequence to learn more about Intelligent Speech Interaction.

| Topic | Description |
| --- | --- |
| Concepts | Introduces the terms and concepts related to Intelligent Speech Interaction. |
| Console User Guide: Manage projects | Demonstrates how to create your own speech recognition project and set project parameters in the console. |
| Obtain a token | Describes how to obtain an access token. You must obtain an access token before calling Intelligent Speech Interaction services. Pay attention to the validity period of the access token when using it. |
| Call Intelligent Speech Interaction services | Short sentence recognition, real-time speech recognition, speech synthesis, and recording file recognition |
| Use the speech recognition customization platform | Introduces the customization platform that you can use to optimize speech recognition effects. |
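
Because the access token has a validity period, a client typically caches the token and requests a new one shortly before it expires. The following is a minimal sketch of that pattern, not the SDK's API: `fetch_token` is a hypothetical callable that stands in for whatever call you use to obtain a token, and it is assumed to return the token string together with its expiry time as a Unix timestamp in seconds.

```python
import time
from typing import Callable, Optional, Tuple


class TokenCache:
    """Caches an access token and refreshes it shortly before expiry."""

    def __init__(
        self,
        fetch_token: Callable[[], Tuple[str, float]],
        refresh_margin: float = 60.0,
        clock: Callable[[], float] = time.time,
    ) -> None:
        # fetch_token is hypothetical: it must return (token, expire_time),
        # where expire_time is a Unix timestamp in seconds.
        self._fetch_token = fetch_token
        self._refresh_margin = refresh_margin
        self._clock = clock
        self._token: Optional[str] = None
        self._expire_time = 0.0

    def get(self) -> str:
        """Return a valid token, fetching a new one only when needed."""
        about_to_expire = self._clock() >= self._expire_time - self._refresh_margin
        if self._token is None or about_to_expire:
            self._token, self._expire_time = self._fetch_token()
        return self._token
```

Calling `cache.get()` before each service request keeps the token handling in one place. The `refresh_margin` of 60 seconds is an arbitrary illustrative value.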

Differences among various Intelligent Speech Interaction services

| Service | Timeliness | Feature | Scenario | Audio coding format | Call method | Free quota | Purchase |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Short sentence recognition | Real-time recognition | Recognizes short speech of up to 1 minute. | Scenarios such as voice search in apps, customer service hotlines, chat conversations, and voice command control | Pulse-code modulation (PCM) (uncompressed PCM or WAV files) and Opus | Java SDK, C++ SDK, Android SDK, and iOS SDK | A maximum of two concurrent call requests | Separate resource package |
| Real-time speech recognition | Real-time recognition | Recognizes long-running speech data streams. | Uninterrupted speech recognition scenarios such as conference speeches and live streaming | PCM (uncompressed PCM or WAV files) | Java SDK, C++ SDK, Android SDK, and iOS SDK | A maximum of two concurrent call requests | Separate resource package |
| Speech synthesis | Real-time synthesis | Converts text that contains a maximum of 300 UTF-8 encoded characters to speech. | Scenarios that require text-to-speech | PCM, WAV, and MP3 | Java SDK, C++ SDK, Android SDK, and iOS SDK | A maximum of two concurrent call requests | Separate resource package |
| Recording file recognition | Recognition within 24 hours | Recognizes a recording file that has a maximum size of 512 MB. | Scenarios that do not require real-time recognition | Single-track and dual-track WAV and MP3 | Java SDK, C++ SDK, Go SDK, .NET SDK, Node.js SDK, PHP SDK, and Python SDK | Call requests for recognizing recording files with a maximum duration of 2 hours per calendar day | Separate resource package |
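
Because speech synthesis accepts at most 300 characters per request, longer text has to be split before it is submitted. The following is a minimal sketch of one way to do that. It assumes the limit counts characters rather than bytes, and the function name `split_for_synthesis` is illustrative, not part of any SDK.

```python
from typing import List

MAX_CHARS = 300  # per-request limit stated in the table above


def split_for_synthesis(text: str, max_chars: int = MAX_CHARS) -> List[str]:
    """Split text into chunks of at most max_chars characters.

    Prefers to cut just after sentence-ending punctuation and falls back
    to a hard cut when the window contains no such boundary.
    """
    if max_chars < 1:
        raise ValueError("max_chars must be positive")
    chunks: List[str] = []
    rest = text.strip()
    while len(rest) > max_chars:
        window = rest[:max_chars]
        # Index of the last sentence boundary in the window, or -1 if none.
        cut = max(window.rfind(mark) for mark in ".!?。！？")
        end = cut + 1 if cut >= 0 else max_chars
        chunks.append(rest[:end].strip())
        rest = rest[end:].strip()
    if rest:
        chunks.append(rest)
    return chunks
```

Each returned chunk can then be sent as a separate synthesis request and the resulting audio concatenated in order.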

Note:

  • All recognition services of Intelligent Speech Interaction except recording file recognition support only mono speech data.
  • Intelligent Speech Interaction supports only 16-bit audio files sampled at 8 kHz or 16 kHz.
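
The constraints in the notes above can be checked locally before audio is submitted. The following is a minimal sketch that uses Python's standard `wave` module to inspect a WAV file. The function name `check_wav` and the `allow_stereo` flag (intended for recording file recognition, which accepts dual-track audio) are illustrative assumptions, and the `wave` module reads only uncompressed PCM WAV files.

```python
import wave
from typing import List

SUPPORTED_SAMPLE_RATES = (8000, 16000)  # Hz
SUPPORTED_SAMPLE_WIDTH = 2              # bytes per sample, that is, 16-bit


def check_wav(path: str, allow_stereo: bool = False) -> List[str]:
    """Return a list of problems found in a WAV file; empty means OK."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        sample_width = wav.getsampwidth()
        sample_rate = wav.getframerate()

    problems: List[str] = []
    max_channels = 2 if allow_stereo else 1
    if channels > max_channels:
        problems.append(f"{channels} channels (at most {max_channels} supported)")
    if sample_width != SUPPORTED_SAMPLE_WIDTH:
        problems.append(f"{sample_width * 8}-bit samples (16-bit required)")
    if sample_rate not in SUPPORTED_SAMPLE_RATES:
        problems.append(f"{sample_rate} Hz sample rate (8000 or 16000 required)")
    return problems
```

An empty result means the file meets the channel, bit depth, and sample rate constraints; it does not check the duration or file size limits.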