This topic describes the Prediction component provided by Machine Learning Studio.

Configure the Prediction component

The Prediction component performs model prediction for traditional data mining algorithms. It takes a trained model and the prediction data as input and generates the prediction result.

You can configure the component by using one of the following methods:
  • Machine Learning Platform for AI (PAI) console
    | Tab | Parameter | Description |
    | --- | --- | --- |
    | Fields Setting | Feature Columns | The feature columns that are selected from the input table for prediction. By default, all columns in the input table are selected. |
    | Fields Setting | Reserved Columns | The columns that you want to reserve in the output table. We recommend that you add a label column to facilitate evaluation. |
    | Fields Setting | Output Result Column | The result column in the output table. |
    | Fields Setting | Output Score Column | The score column in the output table. |
    | Fields Setting | Output Detail Column | The details column in the output table. |
    | Fields Setting | Sparse Matrix | Specifies whether the input data is in the sparse format. Data in the sparse format is presented by using key-value pairs. |
    | Fields Setting | KV Delimiter | The delimiter that is used to separate keys and values. By default, colons (:) are used. |
    | Fields Setting | KV Pair Delimiter | The delimiter that is used to separate key-value pairs. By default, commas (,) are used. |
    | Tuning | Cores | The number of cores. This parameter must be used with the Memory Size per Core parameter. The value of this parameter must be a positive integer. |
    | Tuning | Memory Size per Core | The memory size of each core. This parameter must be used with the Cores parameter. Unit: MB. |
  • PAI command
    pai -name prediction
        -DmodelName=nb_model
        -DinputTableName=wpbc
        -DoutputTableName=wpbc_pred
        -DappendColNames=label;
    | Parameter | Required | Description | Default value |
    | --- | --- | --- | --- |
    | inputTableName | Yes | The name of the input table. | N/A |
    | featureColNames | No | The feature columns that are selected from the input table for prediction. Separate multiple columns with commas (,). | All columns |
    | appendColNames | No | The columns that are selected from the input table and appended to the output table. | N/A |
    | inputTablePartitions | No | The partitions that are selected from the input table for prediction. Specify this parameter in one of the following formats: Partition_name=value, or name1=value1/name2=value2 for multi-level partitions. Note: If you specify multiple partitions, separate them with commas (,). | All partitions |
    | outputTablePartition | No | The partitions whose results are contained in the output table. | N/A |
    | resultColName | No | The column in the output table that contains the prediction result with the highest probability among all possible results. | prediction_result |
    | scoreColName | No | The column in the output table that contains the probability of the prediction result. | prediction_score |
    | detailColName | No | The details column in the output table that contains all possible results and their probabilities. | prediction_detail |
    | enableSparse | No | Specifies whether the input data is in the sparse format. Valid values: true and false. | false |
    | itemDelimiter | No | The delimiter that is used to separate key-value pairs in the sparse format. | , (comma) |
    | kvDelimiter | No | The delimiter that is used to separate keys and values in the sparse format. | : (colon) |
    | modelName | Yes | The name of the input model. | N/A |
    | outputTableName | Yes | The name of the output table. | N/A |
    | lifecycle | No | The lifecycle of the output table. | N/A |
    | coreNum | No | The number of cores. | Automatically allocated |
    | memSizePerCore | No | The memory size of each core. Unit: MB. | Automatically allocated |
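
To make the sparse-format parameters more concrete, the following Python sketch splits one sparse feature value into key-value pairs. It uses the default delimiters described above: a comma (,) between pairs and a colon (:) between a key and its value. The function name parse_sparse_row and the sample string are hypothetical and serve only as an illustration. The component parses sparse data internally, and this sketch does not claim to reproduce its implementation or its handling of malformed input.

```python
def parse_sparse_row(text, item_delimiter=",", kv_delimiter=":"):
    """Parse a sparse feature string such as "1:0.5,3:1.2" into a dict.

    Hypothetical helper: the arguments mirror the itemDelimiter and
    kvDelimiter parameters, not the component's actual implementation.
    """
    features = {}
    for pair in text.split(item_delimiter):
        pair = pair.strip()
        if not pair:
            continue  # skip empty fragments such as a trailing delimiter
        key, _, value = pair.partition(kv_delimiter)
        features[key.strip()] = float(value)
    return features


row = parse_sparse_row("1:0.5,3:1.2,7:2")
print(row)  # {'1': 0.5, '3': 1.2, '7': 2.0}
```

If the input table uses other delimiters, pass them explicitly, for example parse_sparse_row("a=1;b=2", ";", "="), in the same way that you would set itemDelimiter and kvDelimiter in the PAI command.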

Example

  1. Execute the following SQL statements to generate test data:
    create table pai_rf_test_input as
    select * from
    (
    select 1 as f0,2 as f1, "good" as class from dual
    union all
    select 1 as f0,3 as f1, "good" as class from dual
    union all
    select 1 as f0,4 as f1, "bad" as class from dual
    union all
    select 0 as f0,3 as f1, "good" as class from dual
    union all
    select 0 as f0,4 as f1, "bad" as class from dual
    )tmp;
  2. Run the following PAI command to build a model. The random forest algorithm is used in this example.
    PAI -name randomforests
       -project algo_public
       -DinputTableName="pai_rf_test_input"
       -DmodelName="pai_rf_test_model"
       -DforceCategorical="f1"
       -DlabelColName="class"
       -DfeatureColNames="f0,f1"
       -DmaxRecordSize="100000"
       -DminNumPer="0"
       -DminNumObj="2"
       -DtreeNum="3";
  3. Run the following PAI command to run the Prediction component on the test data by using the model that you built:
    PAI -name prediction
        -project algo_public
        -DinputTableName=pai_rf_test_input
        -DmodelName=pai_rf_test_model
        -DresultColName=prediction_result
        -DscoreColName=prediction_score
        -DdetailColName=prediction_detail
        -DoutputTableName=pai_temp_2283_76333_1;
  4. View the output table pai_temp_2283_76333_1, as shown in the following figure.
    Figure: Prediction result
    In the output table:
    • prediction_result: the column that contains the prediction result with the highest probability among all possible results.
    • prediction_score: the column that contains the probability of the prediction result.

      In this example, the prediction result is either good or bad, whichever has the higher probability. The prediction_score column contains the probability of that predicted result.

    • prediction_detail: the column that contains all possible results and their probabilities.
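
The relationship among the three output columns can be sketched in Python. The sketch assumes that a prediction_detail value is a JSON object that maps each possible result to its probability. That format, the sample probabilities, and the function name summarize_detail are assumptions made for illustration and are not taken from the output of this example.

```python
import json


def summarize_detail(detail_json):
    """Derive (prediction_result, prediction_score) from a detail string.

    Assumes detail_json is a JSON object of {result: probability}.
    """
    probabilities = json.loads(detail_json)
    # prediction_result is the result with the highest probability;
    # prediction_score is the probability of that result.
    result = max(probabilities, key=probabilities.get)
    return result, probabilities[result]


# Made-up probabilities for the two classes used in this example.
detail = '{"bad": 0.25, "good": 0.75}'
print(summarize_detail(detail))  # ('good', 0.75)
```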