Introduction

Last Updated: Sep 29, 2019

Custom models

Alibaba Cloud Intelligent Speech Interaction provides basic speech recognition models, including universal models and models specific to certain fields, for example, the 8 kHz telephone customer service and quality inspection model and the e-commerce model. If you have accumulated a large amount of historical data in your field, you can use the data to optimize your models.

On the speech recognition customization platform, you can create a model based on a basic model and upload a training corpus. Training the model on this corpus can effectively improve the speech recognition rate in specific scenarios. This method is especially suitable for improving the recognition of proper nouns and high-frequency words.

Differences between the console and POP API operations for configuring custom models

You can train and manage custom models in the console. If you want to use a custom model for a project in the console, go to the Project Settings page of the project, click the Speech Recognition tab, click Switch Model in the Models section, select the specified custom model, and then click Publish. You do not need to specify the custom model in the code because the model is bound to the appkey of the project.

If you create a custom model by using the corresponding pctowap open platform (POP) API operation, you must call the corresponding SDK method in the client code to specify the ID of the model. Otherwise, the model does not take effect.

Training corpus

Restrictions

  1. The training corpus must be relevant to the specific business field. The closer the training corpus is to the content to be recognized, the higher the speech recognition rate.
  2. The training corpus file must be in the TXT format. It must be encoded in UTF-8 without the byte order mark (BOM). The size of the file cannot exceed 10 MB.
  3. Each sentence or keyword to be tuned must be on its own line. Each line can be up to 500 characters in length.
  4. Numbers in the training corpus must be spelled out. For example, write 58.9 as fifty-eight point nine.
  5. The training corpus must contain at least one sentence containing more than 4 words.
  6. Special characters are not allowed, except commas (,), periods (.), question marks (?), and exclamation points (!). Punctuation marks must be added at the end of a sentence.
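
The restrictions above can be checked locally before a corpus file is uploaded. The following Python sketch is illustrative only and is not part of the product: the function name `check_corpus` is hypothetical, words are counted by splitting on whitespace, and hyphens and apostrophes inside words are assumed to be tolerated (the documented example "fifty-eight point nine" itself contains a hyphen). The platform's actual validation may differ.

```python
import re

MAX_FILE_BYTES = 10 * 1024 * 1024  # restriction 2: file size limit of 10 MB
MAX_LINE_CHARS = 500               # restriction 3: line length limit
UTF8_BOM = b"\xef\xbb\xbf"

# Assumption: anything other than letters, whitespace, the four allowed
# punctuation marks, hyphens, and apostrophes counts as a special character.
DISALLOWED = re.compile(r"[^\w\s,.?!'\-]|_")


def check_corpus(data: bytes) -> list[str]:
    """Return a list of problems found in the raw bytes of a corpus file."""
    problems = []
    if len(data) > MAX_FILE_BYTES:
        problems.append("file exceeds 10 MB")
    if data.startswith(UTF8_BOM):
        problems.append("file starts with a UTF-8 byte order mark")
        data = data[len(UTF8_BOM):]  # keep checking the remaining content
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return problems + ["file is not valid UTF-8"]

    has_long_sentence = False
    for number, line in enumerate(text.splitlines(), start=1):
        if len(line) > MAX_LINE_CHARS:
            problems.append(f"line {number}: longer than 500 characters")
        if re.search(r"\d", line):
            problems.append(f"line {number}: contains digits; spell numbers out")
        if DISALLOWED.search(line):
            problems.append(f"line {number}: contains a disallowed special character")
        if len(line.split()) > 4:  # restriction 5, counted by whitespace
            has_long_sentence = True
    if not has_long_sentence:
        problems.append("no sentence with more than 4 words")
    return problems
```

To check a file, pass its raw bytes, for example `check_corpus(open("corpus.txt", "rb").read())`, where `corpus.txt` is a placeholder file name. An empty list means that none of the checks in this sketch failed.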

Optimization suggestions

You can repeat the keywords that are hard to recognize, or sentences that contain these keywords, on several lines of the training corpus, for example, 10 lines. Each keyword occupies its own line. If the recognition effect does not improve, repeat the keywords or sentences on more lines.

Note:

  • First check whether the poor recognition effect is caused by poor speech quality.
  • Repeating the same keyword or sentence on too many lines may lower the recognition rate of other words or of the speech as a whole. Finding the right balance requires experimentation in real business operations.
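
The repetition described above can be sketched in Python as follows. The helper name `boost_keywords` is hypothetical, and the default of 10 repeats is taken from the example in the text rather than from any recommended value.

```python
def boost_keywords(corpus_lines, keyword_lines, repeats=10):
    """Return the corpus lines followed by each keyword line repeated `repeats` times.

    `keyword_lines` holds the hard-to-recognize keywords, or sentences that
    contain them, one entry per corpus line.
    """
    boosted = list(corpus_lines)
    for keyword_line in keyword_lines:
        # Each copy becomes its own line in the training corpus.
        boosted.extend([keyword_line] * repeats)
    return boosted
```

Because too many copies may lower the recognition rate of other words, it is reasonable to start with a small `repeats` value and increase it only if the keyword is still misrecognized after retraining.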

Application example

The following training corpus is an introduction to Alibaba Group:

  1. In September 1999, eighteen founders with Jack Ma as the leader founded Alibaba Group in an apartment in Hangzhou. The first website of Alibaba Group was Alibaba.com, an English website that focused on the global wholesale trade market.
  2. In the same year, Alibaba Group launched a Chinese website that focused on the wholesale trade market in China.
  3. In October 1999, Alibaba Group raised the funds of USD 5 million from multiple investment agencies.
  4. In October 1999, Alibaba Group raised the funds of USD 5 million from multiple investment agencies.
  5. In January 2000, Alibaba Group raised the funds of USD 20 million from multiple investment agencies including SoftBank.
  6. In January 2000, Alibaba Group raised the funds of USD 20 million from multiple investment agencies including SoftBank.
  7. In September 2000, Alibaba Group held the first West Lake Cybersecurity Conference. Commercial and opinion leaders of the Internet industry came together and discussed major issues of the industry.

In the training corpus, sentences that contain business keywords such as “funds” and “Internet” can be repeated a few times. Click here to download the training corpus file.

Basic training process:

  1. Select a basic model: Use a universal model based on the specific scenario.
  2. Collect the training corpus: Save the preceding training corpus file. If you build your own training corpus, you can split the text at punctuation marks and save each sentence as a separate line in the training corpus.
  3. Train the model: Upload the training corpus and train the model with the training corpus on the customization platform. Then, you can use the model to recognize words in the corpus with a high recognition rate.
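
For step 2, splitting a custom corpus into one sentence per line can be sketched in Python as follows. The function name `split_into_lines` is hypothetical. The sketch assumes that a sentence ends with a period, question mark, or exclamation point followed by whitespace, so a name such as Alibaba.com stays intact, but abbreviations that end with a period are split incorrectly and need manual review.

```python
import re


def split_into_lines(text: str) -> list[str]:
    """Split running text into sentences, keeping the closing punctuation mark."""
    # Split at whitespace that directly follows ., ?, or !
    parts = re.split(r"(?<=[.?!])\s+", text.strip())
    return [part for part in parts if part]
```

The resulting list can be joined with newline characters and saved as a UTF-8 TXT file without a BOM before it is uploaded.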