
Intelligent Speech Interaction: Overview

Last Updated: Feb 28, 2023

Alibaba Cloud has trained speech recognition models on large volumes of corpus data for scenarios such as general use, education, justice, and healthcare, and provides them as high-accuracy standard models. If none of the standard models suits your speech recognition scenario, or if you need to further optimize a standard model, you can use the self-learning platform.

On the self-learning platform, you can upload a .txt training corpus file in the console to train the basic linguistic model that you select for your scenario. This can effectively improve the recognition accuracy of words in your scenario, especially proper nouns and high-frequency words.

Comparison between two methods used to create a custom linguistic model

If you use the Intelligent Speech Interaction console, you can click Switch Scene under your project, select a scenario, and then add a custom linguistic model. After the custom linguistic model is published, it is automatically associated with the appkey of your project. You do not need to specify a custom linguistic model in your code.

If you create a custom linguistic model by calling the Alibaba Cloud open platform (POP) API, the model takes effect only after you specify its ID in your code by calling the corresponding SDK method.

Notes on training corpora

Requirements

  • The training corpus must be relevant to your business field. The more closely the pronunciation of the corpus text matches that of the speech to be recognized, the higher the recognition accuracy.

  • The training corpus file that you upload must be a .txt file encoded in UTF-8 without the byte order mark (BOM). The maximum size of each training corpus file is 10 MB.

  • Place each sentence or keyword to be tuned on its own line. Each line can contain up to 500 characters.

  • You must spell out the numerals in the .txt training corpus file. For example, write 58.9 dollars as fifty-eight point nine dollars.

  • Each .txt training corpus file must contain at least one sentence that contains more than 4 words.

  • Special characters are not allowed, except commas (,), periods (.), question marks (?), and exclamation points (!). Each sentence must end with a punctuation mark.
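The requirements above lend themselves to an automated check before you upload a file. The following is a minimal Python sketch, not a tool provided by Alibaba Cloud; `check_corpus_bytes` and `check_corpus` are hypothetical names. It assumes that 10 MB means 10 × 1024 × 1024 bytes, that words are separated by whitespace (as in English text), that hyphens are acceptable because the spelled-out example fifty-eight uses one, and that every non-empty line is a sentence, so a bare keyword line without end punctuation is also reported.

```python
import re
from pathlib import Path

# Limits from the requirements above.
# Assumption: 10 MB is interpreted as 10 * 1024 * 1024 bytes.
MAX_FILE_BYTES = 10 * 1024 * 1024
MAX_LINE_CHARS = 500
UTF8_BOM = b"\xef\xbb\xbf"
END_PUNCTUATION = ",.?!"
DIGITS = re.compile(r"\d")
# Anything other than word characters, whitespace, the four allowed marks, and
# hyphens (assumed OK because "fifty-eight" uses one) counts as special; so does "_".
SPECIAL = re.compile(r"[^\w\s,.?!-]|_")


def check_corpus_bytes(raw):
    """Return a list of problems found in the raw bytes of a training corpus file."""
    problems = []
    if len(raw) > MAX_FILE_BYTES:
        problems.append(f"file is {len(raw)} bytes; the limit is {MAX_FILE_BYTES}")
    if raw.startswith(UTF8_BOM):
        problems.append("file starts with a UTF-8 byte order mark (BOM)")
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError as err:
        problems.append(f"file is not valid UTF-8: {err}")
        return problems

    has_long_sentence = False
    for number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        if len(line) > MAX_LINE_CHARS:
            problems.append(f"line {number}: {len(line)} characters; the limit is {MAX_LINE_CHARS}")
        if DIGITS.search(line):
            problems.append(f"line {number}: numerals must be spelled out")
        special = sorted(set(SPECIAL.findall(line)))
        if special:
            problems.append(f"line {number}: special characters {special}")
        if line[-1] not in END_PUNCTUATION:
            problems.append(f"line {number}: does not end with , . ? or !")
        # Assumption: words are whitespace-separated, as in English text.
        if len(line.split()) > 4:
            has_long_sentence = True

    if not has_long_sentence:
        problems.append("no sentence contains more than 4 words")
    return problems


def check_corpus(path):
    """Read a .txt corpus file from disk and check it."""
    return check_corpus_bytes(Path(path).read_bytes())
```

Call `check_corpus("corpus.txt")` before uploading. An empty list only means that none of these local checks found a problem; the platform may apply further validation of its own.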

Optimization suggestions

You can repeat keywords that are difficult to recognize, or sentences that contain such keywords, on several lines, for example, 10 lines. Make sure that each keyword occupies its own line in the training corpus. If the recognition results are still unsatisfactory, repeat the keywords or sentences more times as needed.
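The repetition suggested above can be automated when you assemble the corpus. The following minimal Python sketch uses a hypothetical helper, `build_corpus_lines`: sentences that mention a hard-to-recognize keyword (case-insensitive substring match) are written `repeat` times, other sentences once, and a keyword that no sentence mentions gets its own line, repeated the same number of times. The default of 10 comes from the example above; appending a period to bare keyword lines is an assumption made so that every line ends with allowed punctuation.

```python
def build_corpus_lines(sentences, hard_keywords, repeat=10):
    """Return corpus lines in which sentences and keywords that need tuning are repeated."""
    lines = []
    covered = set()
    for sentence in sentences:
        # Keywords that this sentence mentions (case-insensitive substring match).
        matched = {k for k in hard_keywords if k.lower() in sentence.lower()}
        covered |= matched
        # Write sentences with hard keywords `repeat` times and all other sentences once.
        lines.extend([sentence] * (repeat if matched else 1))
    for keyword in hard_keywords:
        if keyword not in covered:
            # Assumption: append a period so the keyword line ends with allowed punctuation.
            lines.extend([keyword + "."] * repeat)
    return lines
```

Join the returned lines with newline characters to produce the corpus text. Because matching is by substring, a keyword such as "funds" also matches "refunds"; switch to whole-word matching if that matters for your vocabulary.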

Important
  • If the recognition results are not as expected, first check whether the issue is caused by unclear pronunciation or poor audio quality. If the issue persists after you improve the pronunciation or audio quality, we recommend that you modify the training corpus.

  • We recommend that you finalize the training corpus only after you test the performance of your custom linguistic model. This helps you avoid recognition errors in speech that contains homophones.
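To compare a custom model against the basic model before you finalize the corpus, you need a measure of recognition quality. The sketch below uses word error rate (WER), a common general-purpose metric swapped in here for illustration; this document does not say which metric, if any, the self-learning platform reports. `word_error_rate` is a hypothetical helper that compares a human-verified reference transcript with a recognition result after lowercasing and dropping punctuation.

```python
import re


def _words(text):
    # Lowercase and keep word tokens; hyphenated words such as "fifty-eight" stay whole.
    return re.findall(r"\w+(?:[-']\w+)*", text.lower())


def word_error_rate(reference, hypothesis):
    """(substitutions + deletions + insertions) / number of words in the reference."""
    ref, hyp = _words(reference), _words(hypothesis)
    if not ref:
        raise ValueError("the reference transcript must contain at least one word")
    # prev[j] holds the word-level edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from the hypothesis
                curr[j - 1] + 1,     # insertion: extra word in the hypothesis
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            )
        prev = curr
    return prev[-1] / len(ref)
```

Run the same test recordings through the basic model and the custom model and compare the average WER. Include recordings that contain homophones of your keywords, because a corpus that over-weights a keyword can raise errors on those utterances even while it lowers them elsewhere.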

Example

Download the sample training corpus. The following training corpus is an introduction to Alibaba Group:

In September 1999, eighteen founders with Jack Ma as the leader founded Alibaba Group in an apartment in Hangzhou. The first website of Alibaba Group was Alibaba.com, an English website that focused on the global wholesale trade market.
In the same year, Alibaba Group launched a Chinese website that focused on the wholesale trade market in China.
In October 1999, Alibaba Group raised the funds of USD 5 million from multiple investment agencies.
In October 1999, Alibaba Group raised the funds of USD 5 million from multiple investment agencies.
In January 2000, Alibaba Group raised the funds of USD 20 million from multiple investment agencies including SoftBank.
In January 2000, Alibaba Group raised the funds of USD 20 million from multiple investment agencies including SoftBank.
In September 2000, Alibaba Group held the first West Lake Cybersecurity Conference. Commercial and opinion leaders of the Internet industry came together and discussed major issues of the industry.

In the training corpus, you can repeat sentences that contain business keywords, such as "funds" and "Internet", a few times.

To use the training corpus, perform the following steps:

  1. Select a basic model: In this example, select the general model. You can select a model based on your business scenario.

  2. Collect the training corpus: Save the downloaded training corpus as a .txt file. If you prepare your own training corpus, split it into sentences at punctuation marks and write each sentence on a separate line.

  3. Train and apply the selected model: Upload the training corpus and use the self-learning platform to train the selected model. The trained model recognizes the vocabulary in the training corpus more accurately, which helps produce the expected recognition results.
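Step 2 can be scripted when you prepare your own corpus. This is a minimal Python sketch with hypothetical helper names. `split_sentences` assumes that only a period, question mark, or exclamation point followed by whitespace ends a sentence, so commas stay inside a sentence and a name such as Alibaba.com is not split; abbreviations such as "U.S." would still be split and need manual review. `write_corpus` writes one sentence per line in UTF-8 without a BOM.

```python
import re
from pathlib import Path


def split_sentences(text):
    """Split text after ., ?, or ! when followed by whitespace; drop empty pieces."""
    parts = re.split(r"(?<=[.?!])\s+", text.strip())
    return [part.strip() for part in parts if part.strip()]


def write_corpus(sentences, path):
    """Write one sentence per line, encoded as UTF-8 without a byte order mark."""
    # The plain "utf-8" codec never writes a BOM; "utf-8-sig" would add one.
    Path(path).write_text("\n".join(sentences) + "\n", encoding="utf-8")
```

For example, `write_corpus(split_sentences(raw_text), "corpus.txt")` produces a file that you can then upload on the self-learning platform.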