This topic describes the Conditional Random Field component provided by Machine Learning Studio.

A conditional random field (CRF) models the conditional probability distribution of a set of output random variables given a set of input random variables. The model assumes that the output random variables form a Markov random field (MRF). CRFs can be used in a variety of prediction scenarios. The linear-chain CRF is the most commonly used form, especially in sequence annotation scenarios. For more information, see Wikipedia.
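
For reference, the linear-chain form is commonly written as shown below. This is the standard textbook formulation, not a description of how this component parameterizes its model. The symbols (feature functions $f_k$, weights $\lambda_k$, normalizer $Z(x)$, sequence length $T$) are introduced here for illustration only.

```latex
p(y \mid x) = \frac{1}{Z(x)} \exp\left( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y_{t-1}, y_t, x, t) \right),
\qquad
Z(x) = \sum_{y'} \exp\left( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k(y'_{t-1}, y'_t, x, t) \right)
```

Here $x$ is the input sequence (for example, the words of a sentence and their features) and $y$ is the label sequence. Training estimates the weights $\lambda_k$.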

Configure the component

You can configure the component by using one of the following methods:
  • Machine Learning Platform for AI console
    Tab Parameter Description
    Fields Setting ID Columns The column that contains the ID of each sample. Samples are stored in n-tuples.
    Feature Columns The words to be annotated and their features (if any).
    Target Columns The label column.
    Parameters Setting Feature Generation Template Default value:
    [-2:0],[-1:0],[0:0],[1:0],[2:0],[-1:0]/[0:0],
    [0:0]/[1:0],[-2:1],[-1:1],[0:1],[1:1],[2:1],
    [-2:1]/[-1:1],[-1:1]/[0:1],[0:1]/[1:1],[1:1]/[2:1],
    [-2:1]/[-1:1]/[0:1],[-1:1]/[0:1]/[1:1],
    [0:1]/[1:1]/[2:1]
    Infrequently Used Word Filtering Threshold Default value: 1.
    L1 Regularization Coefficient Default value: 1.
    L2 Regularization Coefficient Default value: 0.
    Maximum Iterations Default value: 100.
    Convergence Threshold Default value: 0.00001.
    Tuning Cores The number of cores. The value is automatically allocated.
    Memory Size per Core The size of the memory required by each core. The value is automatically allocated.
  • PAI command
    PAI -name=linearcrf    
        -project=algo_public    
        -DinputTableName=crf_input_table    
        -DidColName=sentence_id    
        -DfeatureColNames=word,f1    
        -DlabelColName=label    
        -DoutputTableName=crf_model    
        -Dlifecycle=28    
        -DcoreNum=10
    Parameter Required Description Default value
    inputTableName Yes The table that contains the input features. No default value
    inputTablePartitions No The partitions selected from the table that contains the input features. Full table
    featureColNames No The feature columns that are selected from the input table. All columns, excluding the label column
    labelColName Yes The label column in the input table. No default value
    idColName Yes The column that contains the ID of each sample. No default value
    outputTableName Yes The table that contains output models. No default value
    outputTablePartitions No The partitions that are selected from the output model table. Full table
    template No The template that is used to generate features.
    • Definition
      <template> ::= <template_item>,<template_item>,...,<template_item>
      <template_item> ::= [row_offset:col_index]/[row_offset:col_index]/.../[row_offset:col_index]
      row_offset ::= integer
      col_index ::= integer
    • Default value
      [-2:0],[-1:0],[0:0],[1:0],[2:0],[-1:0]/[0:0],[0:0]/[1:0],[-2:1],[-1:1],[0:1],[1:1],[2:1],[-2:1]/[-1:1],[-1:1]/[0:1],[0:1]/[1:1],[1:1]/[2:1],[-2:1]/[-1:1]/[0:1],[-1:1]/[0:1]/[1:1],[0:1]/[1:1]/[2:1]
    freq No The threshold for filtering features. Only features whose frequency is greater than or equal to the freq value are retained. 1
    iterations No The maximum number of iterations of optimizations. 100
    l1Weight No The parameter weight of L1 regularization. 1.0
    l2Weight No The parameter weight of L2 regularization. 1.0
    epsilon No The convergence threshold. The L-BFGS process finishes when the deviation between the log-likelihood values of two iterations falls below this value. 0.0001
    lbfgsStep No The number of historical steps used for optimization by the L-BFGS algorithm. Only the L-BFGS algorithm supports this parameter. 10
    threadNum No The number of parallel threads launched for model training. 3
    lifecycle No The lifecycle of the output table. No default value
    coreNum No The number of cores. Automatically allocated
    memSizePerCore No The memory size of each core. Automatically allocated
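
The template grammar in the Definition above can be read mechanically. The following Python sketch parses a template string into items, each a list of (row_offset, col_index) pairs. It is an illustration only: `parse_template` is a hypothetical helper and not part of PAI, and the component's own parser may accept or reject input differently.

```python
import re

# Default template string, copied from the parameter table.
DEFAULT_TEMPLATE = (
    "[-2:0],[-1:0],[0:0],[1:0],[2:0],[-1:0]/[0:0],[0:0]/[1:0],"
    "[-2:1],[-1:1],[0:1],[1:1],[2:1],"
    "[-2:1]/[-1:1],[-1:1]/[0:1],[0:1]/[1:1],[1:1]/[2:1],"
    "[-2:1]/[-1:1]/[0:1],[-1:1]/[0:1]/[1:1],[0:1]/[1:1]/[2:1]"
)

# One [row_offset:col_index] unit; both parts are integers per the grammar.
_UNIT = re.compile(r"\[(-?\d+):(-?\d+)\]")


def parse_template(template):
    """Parse a template into a list of items.

    Each item is a list of (row_offset, col_index) tuples. Items are
    separated by ',' and the units inside one item by '/'.
    """
    items = []
    for raw_item in template.split(","):
        raw_item = raw_item.strip()
        if not raw_item:
            continue  # tolerate a trailing comma
        units = []
        for raw_unit in raw_item.split("/"):
            match = _UNIT.fullmatch(raw_unit.strip())
            if match is None:
                raise ValueError(f"malformed template unit: {raw_unit!r}")
            units.append((int(match.group(1)), int(match.group(2))))
        items.append(units)
    return items


if __name__ == "__main__":
    for item in parse_template(DEFAULT_TEMPLATE):
        print(item)
```

Parsed this way, the default template contains 19 items: single-column items over a five-row window for columns 0 and 1, plus combined items that join two or three adjacent rows.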

Examples

  • Input data
    sentence_id word f1 label
    1 Rockwell NNP B-NP
    1 International NNP I-NP
    1 Corp NNP I-NP
    1 's POS B-NP
    823 Ohio NNP B-NP
    823 grew VBD B-VP
    823 3.8 CD B-NP
    823 % NN I-NP
    823 . . O
  • Prediction algorithm PAI command
    PAI -name=crf_predict    
        -project=algo_public    
        -DinputTableName=crf_test_input_table    
        -DmodelTableName=crf_model    
        -DidColName=sentence_id    
        -DfeatureColNames=word,f1    
        -DlabelColName=label    
        -DoutputTableName=crf_predict_result    
        -DdetailColName=prediction_detail   
        -Dlifecycle=28    
        -DcoreNum=10
    Parameter Required Description Default value
    inputTableName Yes The table that contains the input features. No default value
    inputTablePartitions No The partitions selected from the input table. Full table
    featureColNames No The feature columns selected from the input table. All columns, excluding the label column
    labelColName No The label column in the input table. No default value
    idColName Yes The column that contains the ID of each sample. No default value
    resultColName No The result column in the output table. prediction_result
    scoreColName No The score column in the output table. prediction_score
    detailColName No The detail column in the output table. No default value
    outputTableName Yes The output prediction result table. No default value
    outputTablePartitions No The partitions selected from the output prediction result table. Full table
    modelTableName Yes The algorithm model table. No default value
    modelTablePartitions No The partitions selected from the algorithm model table. Full table
    lifecycle No The lifecycle of the output table. No default value
    coreNum No The number of cores. Automatically allocated
    memSizePerCore No The memory size of each core. Automatically allocated
  • Output data
    sentence_id word f1 label
    1 Confidence NN B-NP
    1 in IN B-PP
    1 the DT B-NP
    1 pound NN I-NP
    77 have VBP B-VP
    77 announced VBN I-VP
    77 similar JJ B-NP
    77 increases NNS I-NP
    77 . . O
    Note: The label column is optional.
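
To show how a template item such as [-1:0]/[0:0] might relate to the sample input data, the Python sketch below groups rows by sentence_id and expands an item at a given token position. It rests on assumptions the documentation does not confirm: that row_offset is relative to the current token within the same sentence, that col_index indexes into the feature columns in the order given by featureColNames (word = 0, f1 = 1), and that positions outside the sentence map to a placeholder. `PAD`, `group_by_sentence`, and `expand_item` are hypothetical names, not part of PAI.

```python
# Sample rows copied from the input data: (sentence_id, word, f1, label).
ROWS = [
    (1, "Rockwell", "NNP", "B-NP"),
    (1, "International", "NNP", "I-NP"),
    (1, "Corp", "NNP", "I-NP"),
    (1, "'s", "POS", "B-NP"),
    (823, "Ohio", "NNP", "B-NP"),
    (823, "grew", "VBD", "B-VP"),
    (823, "3.8", "CD", "B-NP"),
    (823, "%", "NN", "I-NP"),
    (823, ".", ".", "O"),
]

# Assumption: placeholder for rows that fall outside the sentence.
PAD = "<PAD>"


def group_by_sentence(rows):
    """Group (word, f1) feature tuples by sentence_id, keeping row order."""
    sentences = {}
    for sentence_id, word, f1, _label in rows:
        sentences.setdefault(sentence_id, []).append((word, f1))
    return sentences


def expand_item(sentence, position, item):
    """Expand one template item at one token position.

    `item` is a list of (row_offset, col_index) pairs. Each pair selects
    the feature in column `col_index` of the token `row_offset` rows away
    from `position`. The selected values are joined with '/'.
    """
    values = []
    for row_offset, col_index in item:
        row = position + row_offset
        if 0 <= row < len(sentence):
            values.append(sentence[row][col_index])
        else:
            values.append(PAD)
    return "/".join(values)


if __name__ == "__main__":
    sentences = group_by_sentence(ROWS)
    first = sentences[1]
    # Current word at token 1 of sentence 1.
    print(expand_item(first, 1, [(0, 0)]))           # International
    # Previous word combined with the current word.
    print(expand_item(first, 1, [(-1, 0), (0, 0)]))  # Rockwell/International
    # Two rows before the first token is outside the sentence.
    print(expand_item(first, 0, [(-2, 1)]))          # <PAD>
```

Because features are expanded per sentence in this sketch, tokens from sentence 1 never contribute to features for sentence 823, which is the role the ID column plays in the sample data.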