Configure CRF Prediction for NLP Sequence Labeling - Platform For AI

The Conditional Random Field Prediction component in Machine Learning Designer (formerly known as Machine Learning Studio) is based on the Linear Conditional Random Field (LinearCRF) online prediction model and is used for sequence labeling tasks. This topic describes how to configure the component's parameters and provides an example of how to use the component.

Configure parameters

You can visually configure component parameters in Designer.

Parameter	Description
Select the ID column	Samples are stored as N-tuples. The ID column contains the unique ID for each sample.
Select a Feature Column	The word to annotate and its corresponding features.
Select the Target Column	Select the target column.
Prediction Result Column	The name of the prediction result column. The default value is prediction_result.
Prediction Score Column	The name of the prediction score column. The default value is prediction_score.
Prediction Detail Column	The name of the prediction detail column. Leave this parameter empty if you do not need the prediction detail column.

Example

The online prediction phase of LinearCRF requires a trained model in the Model I/O format. The training data table uses the following format.

sentence_id	word	f1	f2	label
1	Rockwell	NNP	POS	B-NP
1	International	NNP	NP	I-NP
1	Corp	NNP	PO	I-NP
1	's	POS	NN	B-NP
...	...	...	...	...

The feature names word, f1, and f2 in the input format must match the feature column names in the training data table. In an online prediction input request, the features of different words are separated by spaces. The input format for the LinearCRF online prediction model is as follows.

{
    "inputs": [
        {
            "word": {
                "dataType": 50,
                "dataValue": "Rockwell International Corp 's ..."
            },
            "f1": {
                "dataType": 50,
                "dataValue": "NNP NNP NNP POS ..."
            },
            "f2": {
                "dataType": 50,
                "dataValue": "POS NP PO NN ..."
            }
        }
    ]
}
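
The request body above can be assembled from per-word rows by joining each feature column with spaces. The following Python sketch illustrates this under stated assumptions: the helper name build_crf_request and the constant STRING_DATA_TYPE are hypothetical, the dataType value 50 is copied from the sample request without confirming its meaning, and the check that rejects whitespace inside a value is an added precaution, not documented behavior.

```python
import json
from typing import Dict, List, Sequence

# Value copied from the sample request; its exact meaning is assumed, not documented here.
STRING_DATA_TYPE = 50


def build_crf_request(rows: Sequence[Dict[str, str]], feature_columns: List[str]) -> dict:
    """Build a request for one sentence by joining each feature column with spaces."""
    for row in rows:
        for column in feature_columns:
            value = row[column]
            # A space inside a value would shift the word/feature alignment.
            if not value or any(ch.isspace() for ch in value):
                raise ValueError(f"{column!r} value {value!r} must be non-empty without whitespace")
    sample = {
        column: {
            "dataType": STRING_DATA_TYPE,
            "dataValue": " ".join(row[column] for row in rows),
        }
        for column in feature_columns
    }
    return {"inputs": [sample]}


# Rows taken from the training data table in this topic (label column omitted).
rows = [
    {"word": "Rockwell", "f1": "NNP", "f2": "POS"},
    {"word": "International", "f1": "NNP", "f2": "NP"},
    {"word": "Corp", "f1": "NNP", "f2": "PO"},
    {"word": "'s", "f1": "POS", "f2": "NN"},
]
request_body = build_crf_request(rows, ["word", "f1", "f2"])
print(json.dumps(request_body, indent=4))
```

The column names passed to the helper must be the same feature column names that were used in the training data table.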

The output returns prediction_result, prediction_score, and prediction_detail for all words in the input request. The results are in JSON format within outputValue. The output format for the LinearCRF online prediction model is as follows.

{
    "outputs": [
        {
            "outputLabel": "CRFProcessor_Result",
            "outputValue": {
                "dataType": 50,
                "dataValue": {
                    "Rockwell NNP POS": {
                        "prediction_result": "B-NP",
                        "prediction_score": 0.99,
                        "prediction_detail": {"B-ADJP": 0.000145, "B-NP": 0.99, ...}
                    },
                    "International NNP NP": ...
                }
            }
        }
    ]
}
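
A client typically needs only the predicted label and score for each word. The following Python sketch shows one way to read them from a response shaped like the example above. The helper name extract_predictions is hypothetical, the sample response is trimmed to the first word, and the handling of a string-encoded dataValue is an assumption about how the service might serialize the result.

```python
import json
from typing import Any, Dict, List, Tuple


def extract_predictions(response: Dict[str, Any]) -> List[Tuple[str, str, float]]:
    """Return (input key, predicted label, score) for each word in the response."""
    data_value = response["outputs"][0]["outputValue"]["dataValue"]
    if isinstance(data_value, str):
        # Assumption: the result object may arrive serialized as a JSON string.
        data_value = json.loads(data_value)
    return [
        (key, item["prediction_result"], item["prediction_score"])
        for key, item in data_value.items()
    ]


# Trimmed sample: only the first word, with the two detail entries shown in this topic.
sample_response = {
    "outputs": [
        {
            "outputLabel": "CRFProcessor_Result",
            "outputValue": {
                "dataType": 50,
                "dataValue": {
                    "Rockwell NNP POS": {
                        "prediction_result": "B-NP",
                        "prediction_score": 0.99,
                        "prediction_detail": {"B-ADJP": 0.000145, "B-NP": 0.99},
                    }
                },
            },
        }
    ]
}

for key, label, score in extract_predictions(sample_response):
    print(key, label, score)
```

Each key in dataValue is the word followed by its features, separated by spaces, so the key can be split on spaces to recover the original word if needed.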

If the input format is incorrect, the program returns an error message, as shown below.

{
    "outputs": [
        {
            "outputLabel": "CRFProcessor_Result",
            "outputValue": {
                "dataType": 50,
                "dataValue": "Failed: The input format is incorrect"
            }
        }
    ]
}
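
Because the error is returned in the same dataValue field as a normal result, a client may want to check for it before reading predictions. The following Python sketch shows one possible check. The names CRFInputError and raise_if_failed are hypothetical, and detecting errors by the "Failed" prefix is an assumption based solely on the single message shown above.

```python
from typing import Any, Dict


class CRFInputError(ValueError):
    """Raised when the response carries an error message instead of predictions."""


def raise_if_failed(response: Dict[str, Any]) -> None:
    """Raise CRFInputError if any output's dataValue is a 'Failed...' string."""
    for output in response.get("outputs", []):
        value = output.get("outputValue", {}).get("dataValue")
        # Assumption: error messages are plain strings that start with "Failed".
        if isinstance(value, str) and value.startswith("Failed"):
            raise CRFInputError(value)


error_response = {
    "outputs": [
        {
            "outputLabel": "CRFProcessor_Result",
            "outputValue": {
                "dataType": 50,
                "dataValue": "Failed: The input format is incorrect",
            },
        }
    ]
}

try:
    raise_if_failed(error_response)
except CRFInputError as exc:
    print("Prediction request rejected:", exc)
```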