Platform For AI: Conditional Random Field

Last Updated: Oct 26, 2023

This topic describes the Conditional Random Field component provided by Machine Learning Designer (formerly known as Machine Learning Studio).

A conditional random field (CRF) is a conditional probability distribution model of a group of output random variables given a group of input random variables. The model assumes that the output random variables constitute a Markov random field (MRF). CRFs can be used in different prediction scenarios. The linear-chain CRF is the most commonly used variant, especially in annotation (sequence labeling) scenarios. For more information, see Wikipedia.
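For reference, a linear-chain CRF is commonly written in the following textbook form. The symbols (input sequence x, label sequence y of length T, feature functions f_k, and weights \lambda_k) are standard notation and are not taken from this topic. This topic does not describe how the component parameterizes the model internally.

```latex
P(y \mid x) = \frac{1}{Z(x)} \exp\left( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y_{t-1}, y_t, x, t) \right),
\qquad
Z(x) = \sum_{y'} \exp\left( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y'_{t-1}, y'_t, x, t) \right)
```

Training estimates the weights \lambda_k from annotated sequences. Prediction returns the label sequence y that maximizes P(y | x).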

Configure the component

You can use one of the following methods to configure the Conditional Random Field component.

Method 1: Configure the component on the pipeline page

You can configure the parameters of the Conditional Random Field component on the pipeline page of Machine Learning Designer of Machine Learning Platform for AI (PAI). The following table describes the parameters.

| Tab | Parameter | Description |
|---|---|---|
| Fields Setting | ID Columns | The column that contains the ID of each sample. Samples are stored in n-tuples. |
| Fields Setting | Feature Columns | The columns that contain the word to be annotated and, if available, its features. |
| Fields Setting | Target Columns | The column that contains the label of each word. |
| Parameters Setting | Feature Generation Template | Default value: [-2:0],[-1:0],[0:0],[1:0],[2:0],[-1:0]/[0:0],[0:0]/[1:0],[-2:1],[-1:1],[0:1],[1:1],[2:1],[-2:1]/[-1:1],[-1:1]/[0:1],[0:1]/[1:1],[1:1]/[2:1],[-2:1]/[-1:1]/[0:1],[-1:1]/[0:1]/[1:1],[0:1]/[1:1]/[2:1] |
| Parameters Setting | Infrequently Used Word Filtering Threshold | Default value: 1. |
| Parameters Setting | L1 Regularization Coefficient | Default value: 1. |
| Parameters Setting | L2 Regularization Coefficient | Default value: 0. |
| Parameters Setting | Maximum Iterations | Default value: 100. |
| Parameters Setting | Convergence Threshold | Default value: 0.00001. |
| Tuning | Cores | The number of cores. By default, the system determines the value. |
| Tuning | Memory Size per Core | The memory size of each core. By default, the system determines the value. |

Method 2: Use PAI commands

Configure the component parameters by using PAI commands. You can use the SQL Script component to call PAI commands. For more information, see SQL Script.

PAI -name=linearcrf    
    -project=algo_public    
    -DinputTableName=crf_input_table    
    -DidColName=sentence_id    
    -DfeatureColNames=word,f1    
    -DlabelColName=label    
    -DoutputTableName=crf_model    
    -Dlifecycle=28    
    -DcoreNum=10
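
When the command is generated from a script rather than typed by hand, a small helper can assemble it from a dictionary of parameters. The following is a minimal sketch: the function name build_pai_command is hypothetical and is not part of PAI. It only formats a string and does not submit anything.

```python
def build_pai_command(name, params, project="algo_public"):
    """Assemble a PAI command string from a dict of parameters.

    Hypothetical helper for illustration only. Parameters whose value is
    None are skipped so that the component default applies.
    """
    parts = [f"PAI -name={name}", f"-project={project}"]
    for key, value in params.items():
        if value is None:
            continue
        # Lists such as feature column names are joined with commas.
        if isinstance(value, (list, tuple)):
            value = ",".join(str(v) for v in value)
        parts.append(f"-D{key}={value}")
    return "\n    ".join(parts)


command = build_pai_command(
    "linearcrf",
    {
        "inputTableName": "crf_input_table",
        "idColName": "sentence_id",
        "featureColNames": ["word", "f1"],
        "labelColName": "label",
        "outputTableName": "crf_model",
        "l2Weight": None,  # omitted from the command, so the default is used
        "lifecycle": 28,
        "coreNum": 10,
    },
)
print(command)
```

The resulting string can be pasted into an SQL Script component.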

| Parameter | Required | Description | Default value |
|---|---|---|---|
| inputTableName | Yes | The table that contains the input features. | No default value |
| inputTablePartitions | No | The partitions selected from the table that contains the input features. | All partitions |
| featureColNames | No | The feature columns selected from the input table. | All columns, excluding the label column |
| labelColName | Yes | The column that contains the labels. | No default value |
| idColName | Yes | The column that contains the sample IDs. | No default value |
| outputTableName | Yes | The table that contains output models. | No default value |
| outputTablePartitions | No | The partitions selected from the output model table. | All partitions |
| template | No | The template that is used to generate features. For the syntax, see the template definition that follows this table. | [-2:0],[-1:0],[0:0],[1:0],[2:0],[-1:0]/[0:0],[0:0]/[1:0],[-2:1],[-1:1],[0:1],[1:1],[2:1],[-2:1]/[-1:1],[-1:1]/[0:1],[0:1]/[1:1],[1:1]/[2:1],[-2:1]/[-1:1]/[0:1],[-1:1]/[0:1]/[1:1],[0:1]/[1:1]/[2:1] |
| freq | No | The threshold for filtering features. Only features that occur at least freq times are retained. | 1 |
| iterations | No | The maximum number of optimization iterations. | 100 |
| l1Weight | No | The weight of L1 regularization. | 1.0 |
| l2Weight | No | The weight of L2 regularization. | 1.0 |
| epsilon | No | The convergence threshold. The Limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) optimization stops when the difference between the log-likelihood values of two consecutive iterations falls below this value. | 0.0001 |
| lbfgsStep | No | The number of historical steps that the L-BFGS algorithm keeps for optimization. Only the L-BFGS algorithm supports this parameter. | 10 |
| threadNum | No | The number of parallel threads used for model training. | 3 |
| lifecycle | No | The lifecycle of the output table. | No default value |
| coreNum | No | The number of cores. | Determined by the system |
| memSizePerCore | No | The memory size of each core. | Determined by the system |

Template definition:

    <template> .=. <template_item>,<template_item>,...,<template_item>
    <template_item> .=. [row_offset:col_index]/[row_offset:col_index]/.../[row_offset:col_index]
    row_offset .=. integer
    col_index .=. integer
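
To make the template syntax and the freq parameter more concrete, the following minimal sketch parses a template string and expands it for one token. It assumes semantics similar to CRF++-style templates: row_offset is relative to the current token and col_index selects one of the feature columns. This topic does not confirm those semantics. The function names, the T<n>: feature prefix, and the _PAD_ marker are hypothetical.

```python
from collections import Counter

DEFAULT_TEMPLATE = (
    "[-2:0],[-1:0],[0:0],[1:0],[2:0],[-1:0]/[0:0],[0:0]/[1:0],"
    "[-2:1],[-1:1],[0:1],[1:1],[2:1],"
    "[-2:1]/[-1:1],[-1:1]/[0:1],[0:1]/[1:1],[1:1]/[2:1],"
    "[-2:1]/[-1:1]/[0:1],[-1:1]/[0:1]/[1:1],[0:1]/[1:1]/[2:1]"
)


def parse_template(template):
    """Split a template into items; each item is a list of (row_offset, col_index)."""
    items = []
    for item in template.split(","):
        item = item.strip()
        if not item:
            continue
        pairs = []
        for part in item.split("/"):
            row, col = part.strip().strip("[]").split(":")
            pairs.append((int(row), int(col)))
        items.append(pairs)
    return items


def expand(sentence, pos, items):
    """Generate feature strings for the token at index pos.

    sentence is a list of tuples of feature column values. Offsets outside
    the sentence yield the hypothetical marker "_PAD_" (an assumption).
    """
    feats = []
    for idx, pairs in enumerate(items):
        values = []
        for row, col in pairs:
            i = pos + row
            values.append(sentence[i][col] if 0 <= i < len(sentence) else "_PAD_")
        feats.append(f"T{idx}:" + "/".join(values))
    return feats


def filter_by_freq(all_feats, freq=1):
    """Keep only the features that occur at least freq times."""
    counts = Counter(all_feats)
    return {feat for feat, n in counts.items() if n >= freq}


sentence = [("Rockwell", "NNP"), ("International", "NNP"), ("Corp", "NNP"), ("'s", "POS")]
items = parse_template("[-1:0],[0:0],[0:1]/[1:1]")
print(expand(sentence, 1, items))
# ['T0:Rockwell', 'T1:International', 'T2:NNP/NNP']
```

Under these assumptions, an item such as [0:1]/[1:1] combines the f1 value of the current token with the f1 value of the next token into a single feature.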

Example

  • Input data

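    | sentence_id | word | f1 | label |
    |---|---|---|---|
    | 1 | Rockwell | NNP | B-NP |
    | 1 | International | NNP | I-NP |
    | 1 | Corp | NNP | I-NP |
    | 1 | 's | POS | B-NP |
    | ... | ... | ... | ... |
    | 823 | Ohio | NNP | B-NP |
    | 823 | grew | VBD | B-VP |
    | 823 | 3.8 | CD | B-NP |
    | 823 | % | NN | I-NP |
    | 823 | . | . | O |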

  • Prediction algorithm PAI command

    PAI -name=crf_predict    
        -project=algo_public    
        -DinputTableName=crf_test_input_table    
        -DmodelTableName=crf_model    
        -DidColName=sentence_id    
        -DfeatureColNames=word,f1    
        -DlabelColName=label    
        -DoutputTableName=crf_predict_result    
        -DdetailColName=prediction_detail   
        -Dlifecycle=28    
        -DcoreNum=10

    | Parameter | Required | Description | Default value |
    |---|---|---|---|
    | inputTableName | Yes | The table that contains the input features. | No default value |
    | inputTablePartitions | No | The partitions selected from the table that contains the input features. | All partitions |
    | featureColNames | No | The feature columns selected from the input table. | All columns, excluding the label column |
    | labelColName | No | The column that contains the labels. | No default value |
    | idColName | Yes | The column that contains the sample IDs. | No default value |
    | resultColName | No | The result column in the output table. | prediction_result |
    | scoreColName | No | The score column in the output table. | prediction_score |
    | detailColName | No | The detail column in the output table. | No default value |
    | outputTableName | Yes | The output prediction result table. | No default value |
    | outputTablePartitions | No | The partitions selected from the output prediction result table. | All partitions |
    | modelTableName | Yes | The algorithm model table. | No default value |
    | modelTablePartitions | No | The partitions selected from the algorithm model table. | All partitions |
    | lifecycle | No | The lifecycle of the output table. | No default value |
    | coreNum | No | The number of cores. | Determined by the system |
    | memSizePerCore | No | The memory size of each core. | Determined by the system |

  • Output data

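    | sentence_id | word | f1 | label |
    |---|---|---|---|
    | 1 | Confidence | NN | B-NP |
    | 1 | in | IN | B-PP |
    | 1 | the | DT | B-NP |
    | 1 | pound | NN | I-NP |
    | ... | ... | ... | ... |
    | 77 | have | VBP | B-VP |
    | 77 | announced | VBN | I-VP |
    | 77 | similar | JJ | B-NP |
    | 77 | increases | NNS | I-NP |
    | 77 | . | . | O |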

    Note

    The label column is optional.
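
If the label column is present, the predictions can be compared against it after the result table is exported. The following minimal sketch assumes that each row is available as a dictionary and that the predicted tag is stored in a column named prediction_result, which is the default value of resultColName. The function names are hypothetical, and the predicted tags in the sample rows are made up for illustration. They are not output of the component.

```python
def group_sentences(rows, id_col="sentence_id"):
    """Group rows by sentence ID, preserving the token order within each sentence."""
    sentences = {}
    for row in rows:
        sentences.setdefault(row[id_col], []).append(row)
    return sentences


def token_accuracy(rows, label_col="label", result_col="prediction_result"):
    """Return the share of labeled tokens whose predicted tag equals the label.

    Returns None when no row carries a label, because the label column is optional.
    """
    labeled = [row for row in rows if row.get(label_col) is not None]
    if not labeled:
        return None
    correct = sum(1 for row in labeled if row[label_col] == row[result_col])
    return correct / len(labeled)


# Illustrative rows only: the prediction_result values are invented.
rows = [
    {"sentence_id": 1, "word": "Confidence", "label": "B-NP", "prediction_result": "B-NP"},
    {"sentence_id": 1, "word": "in", "label": "B-PP", "prediction_result": "B-PP"},
    {"sentence_id": 1, "word": "the", "label": "B-NP", "prediction_result": "B-NP"},
    {"sentence_id": 1, "word": "pound", "label": "I-NP", "prediction_result": "B-NP"},
]
print(len(group_sentences(rows)[1]))  # 4
print(token_accuracy(rows))           # 0.75
```

Token-level accuracy is only one possible measure. For chunking tasks such as the one in this example, chunk-level precision and recall are often more informative.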