The Create Dataset for EasyASR Models component converts audio data in the WAV format and text data into TFRecord files. You can then use the TFRecord files as pre-processed data to train or evaluate Automatic Speech Recognition (ASR) and speech classification models. This topic describes how to configure the component and provides a usage example.

Prerequisites

OSS is activated, and Machine Learning Studio is authorized to access OSS. For more information, see Activate OSS and Grant PAI the permissions to access OSS.

Limits

This algorithm component is available only in Machine Learning Studio 2.0.

Background information

The Create Dataset for EasyASR Models component converts audio data in the WAV format, together with text data that contains the labeling results, into TFRecord files and stores the TFRecord files in an Object Storage Service (OSS) bucket. You can use this component to prepare data for training or evaluating ASR and speech classification models.

You can find the Create Dataset for EasyASR Models component in the Data Preprocessing subfolder in the Audio Algorithm folder of the component library.

Configure the component in the PAI console

  • Input port

    The input port of a Create Dataset for EasyASR Models component must be connected to a Read File Data component. You must set the OSS Data Path parameter of the Read File Data component to the OSS path of the source CSV file.

  • Component parameters
    Tab | Parameter | Required | Description | Default
    Parameter Settings | Output Path | Yes | The OSS path of the output TFRecord files. Example: oss://my_bucket/output/. | N/A
    Model Tuning | Running Mode | No | The computing engine that is used to run the algorithm component. You can select a computing engine based on your business requirements. The following computing engines are supported: MaxCompute |
    Model Tuning | Number of Workers | No | The number of workers that are used for data conversion. | 1
    Model Tuning | CPU Machine Type | No | The type of the computing instance. This parameter is required only if you set the Running Mode parameter to DLC. | N/A
  • Output port

    You can connect the output port of a Create Dataset for EasyASR Models component to an ASR Model Training or EasyASR Speech Classification Training component.

Example

  1. Prepare a CSV file that contains audio data and text data.

    The audio file used to train an ASR or speech classification model must be split in advance. We recommend that you split the audio file into segments of about 10 to 12 seconds in length and store the processed audio data in an OSS bucket. Each audio segment must contain mono audio with a sampling rate of 16,000 Hz.

    In this example, a speech recognition model is trained. The paths of the audio segments and the labeling results are stored in a CSV file. In each row, the path and the text are separated with a comma (,). The header line of the CSV file specifies the column names. You can enter wav_filename,transcript as the header line. In this case, the first column stores the OSS paths of the WAV files, and the second column stores the labeling results.

    The words of the text must be separated with spaces, and all punctuation marks must be replaced with semicolons (;). If a word does not exist in the vocabulary, replace the word with an asterisk (*).

    You can download the vocabulary based on the model that you use. For more information, see Use EasyASR for speech recognition.

  2. Build the experiment shown in the following figure. [Figure: Example] You must set the OSS Data Path parameter of the Read File Data component to the OSS path of the CSV file. For more information about how to set other parameters, see Component parameters in this topic.
  3. View the output TFRecord files in the specified OSS path.
    After the experiment is run, you can view the output TFRecord files in the OSS path that you specify for the Output Path parameter of the Create Dataset for EasyASR Models component. The following figure shows sample output results. [Figure: Output TFRecord files]
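
The CSV preparation described in step 1 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not part of the component: the function names (check_wav, normalize_transcript, write_manifest), the sample vocabulary, and the oss://my_bucket/audio/ paths are hypothetical. The sketch also assumes that each punctuation mark becomes a standalone semicolon token and that only ASCII punctuation needs to be handled. Adjust both assumptions to the vocabulary that you download for your model.

```python
import csv
import io
import string
import wave

EXPECTED_RATE = 16000   # Hz, the sampling rate required in step 1
EXPECTED_CHANNELS = 1   # mono audio


def check_wav(fileobj):
    """Return (is_valid, duration_seconds) for a WAV file or file-like object.

    is_valid is True only for mono audio sampled at 16,000 Hz.
    """
    with wave.open(fileobj, "rb") as wav:
        rate = wav.getframerate()
        is_valid = wav.getnchannels() == EXPECTED_CHANNELS and rate == EXPECTED_RATE
        duration = wav.getnframes() / rate
    return is_valid, duration


def normalize_transcript(text, vocabulary):
    """Apply the labeling rules from step 1 to one transcript.

    Assumption: every ASCII punctuation mark becomes its own ';' token.
    Words that are not in the vocabulary are replaced with '*'.
    """
    cleaned = "".join(" ; " if ch in string.punctuation else ch for ch in text)
    tokens = []
    for token in cleaned.split():
        if token == ";" or token in vocabulary:
            tokens.append(token)
        else:
            tokens.append("*")
    return " ".join(tokens)


def write_manifest(rows, out):
    """Write (wav_path, transcript) pairs as CSV with the expected header line."""
    writer = csv.writer(out)
    writer.writerow(["wav_filename", "transcript"])
    for wav_path, transcript in rows:
        writer.writerow([wav_path, transcript])


if __name__ == "__main__":
    vocabulary = {"hello", "world"}  # hypothetical; use the model's vocabulary
    rows = [
        # Hypothetical OSS path of an audio segment that was uploaded earlier.
        ("oss://my_bucket/audio/0001.wav",
         normalize_transcript("hello, big world", vocabulary)),
    ]
    buffer = io.StringIO()
    write_manifest(rows, buffer)
    print(buffer.getvalue())
```

Run check_wav on each local segment before you upload it to OSS, and use the returned duration to confirm that the segments are close to the recommended length of 10 to 12 seconds. Upload the resulting CSV file to OSS and reference it in the OSS Data Path parameter of the Read File Data component.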