All Products
Search
Document Center

Platform For AI:Machine reading comprehension predict

Last Updated:Feb 22, 2024

The machine reading comprehension predict component of Platform for AI (PAI) allows you to make batch predictions by using the models trained by the machine reading comprehension training component. This topic describes how to configure the component and provides an example on how to use the component.

Limits

You can use the machine reading comprehension predict component based on Deep Learning Containers (DLC) resources.

Configure the component in Machine Learning Designer

  • Input ports

    Input port

    Data type

    Recommended upstream component

    Required

    input saved model

    OSS

    machine reading comprehension training

    Yes

    Data for Prediction

    OSS

    Read File Data

    Yes

  • Component parameters

    Configure the component on the pipeline page of Machine Learning Designer. The following table describes the parameters.

    Tab

    Parameter

    Description

    Fields Setting

    Language

    The language of the input file. Default value: zh. Valid values:

    • zh

    • en

    Input Schema

    The data schema of each column in the input file. Separate multiple columns with commas (,). Default value: qas_id:str:1,context_text:str:1,question_text:str:1,answer_text:str:1,start_position_character:str:1,title:str:1.

    Question Column

    The name of the column that contains questions in the input file. Default value: question_text.

    Context Column

    The name of the column that contains text passages in the input file. Default value: context_text.

    Answer Column

    The name of the column that contains answers in the input file. Default value: answer_text.

    Id Column

    The name of the ID column in the input file. Default value: qas_id.

    Start Position Column

    The name of the column that contains the starting positions of answer spans in the input file. If the answer to a question can be found in the text passage, the starting position of the answer span is recorded in this column. Default value: start_position_character.

    Output data file

    The path of the Object Storage Service (OSS) bucket that stores the answer file used by this component.

    Use User-defined Model

    Specifies whether to use a custom model. Default value: no. Valid values:

    • no

    • yes

    OSS Directory for Alink Model

    This parameter is required only if you set the Use User-defined Model parameter to yes.

    The path of the OSS bucket that stores the custom model.

    Parameters Setting

    batchSize

    The number of samples that you want to tprocess at the same time. If the model is trained on multiple servers that use multiple GPUs, this parameter specifies the number of samples that are processed by each GPU at the same time. The value must be of the INT type. Default value: 256.

    Sequence Length

    The maximum length of a text passage that can be handled. The value must be of the INT type. Default value: 384.

    Max Query Length

    The maximum length of a question that can be handled. The value must be of the INT type. Default value: 64.

    Max Answer Length

    The maximum length of an answer that can be handled. The value must be of the INT type. Default value: 30.

    Doc Stride

    The length of a sliding window for each sliced text passage. The value must be of the INT type. Default value: 128.

    pretrainModelNameOrPath

    The name or path of the pre-trained model provided by the system. Default value: hfl/macbert-base-zh. Valid values:

    • User Defined

    • hfl/macbert-base-zh

    • hfl/macbert-large-zh

    • bert-base-uncased

    • bert-large-uncased

    Additional Parameters

    The custom parameters. You can fine-tune the model parameters based on your data.

    Format: {A: xxx, B: xxx}. Separate keys and values with colons (:). Separate multiple parameters with commas (,).

    Tuning

    GPU Machine type

    The instance type of the GPU-accelerated node that you want to use. The default value is gn5-c8g1.2xlarge, which indicates that the node uses 8 vCPUs, 80 GB memory, and a single P100 GPU.

    num_GPU_worker

    The number of GPUs for each worker. Default value: 1.

Example

The following figure shows a sample pipeline in which the machine reading comprehension predict component is used.image

Perform the following steps to configure the component:

  1. Prepare a prediction dataset and upload the dataset to an OSS bucket. For more information, see the "Upload an object" section in the Get started by using the OSS console topic.

    A dataset can be in the TSV or TEXT format and contains the following columns: ID column, text column, question column, answer column (optional), start position column (optional), and title column (optional).

    In this example, a TSV file is used to show how to train a model.

  2. Use the Read File Data -3 component to read the prediction dataset. Set the OSS Data Path parameter of the Read File Data component to the OSS path in which the prediction dataset is stored.

  3. Connect the Read File Data-3 component to the machine reading comprehension predict component as an upstream node and configure the machine reading comprehension predict component. For more information, see the "Component parameters" section of this topic.

References