Platform For AI: Random Forest Feature Importance Evaluation

Last Updated: Oct 26, 2023

The Random Forest Feature Importance Evaluation component allows you to use raw data and a random forest model to calculate feature importance.
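
This page does not state which importance measure the component reports. As a rough illustration of how feature importance can be derived from raw data plus a trained model, the following minimal sketch uses permutation importance, a common model-agnostic technique that is not necessarily what PAI implements. The names `toy_model`, `accuracy`, `permutation_importance`, and the sample rows are hypothetical stand-ins, not part of PAI.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows whose predicted label matches the true label."""
    correct = sum(1 for row, y in zip(rows, labels) if predict(row) == y)
    return correct / len(rows)


def permutation_importance(predict, rows, labels, n_features, seed=0):
    """Per feature, the accuracy drop after shuffling that feature's column."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = []
    for j in range(n_features):
        # Shuffle column j only, leaving every other column untouched.
        column = [row[j] for row in rows]
        rng.shuffle(column)
        shuffled_rows = [row[:j] + [v] + row[j + 1:] for row, v in zip(rows, column)]
        importances.append(baseline - accuracy(predict, shuffled_rows, labels))
    return importances


# Hypothetical stand-in for a trained model: it only looks at feature 0.
def toy_model(row):
    return 1 if row[0] > 0.5 else 0


rows = [[0.1, 0.9], [0.9, 0.2], [0.3, 0.8], [0.8, 0.1], [0.2, 0.7], [0.7, 0.3]]
labels = [toy_model(r) for r in rows]
scores = permutation_importance(toy_model, rows, labels, n_features=2)
print(scores)
```

Because the toy model ignores feature 1, shuffling that column cannot change any prediction, so its score is zero. A feature the model relies on can only lose accuracy when shuffled, so its score is zero or positive.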

Configure the component

You can use one of the following methods to configure the Random Forest Feature Importance Evaluation component.

Method 1: Configure the component on the pipeline page

You can configure the parameters of the Random Forest Feature Importance Evaluation component on the pipeline page of Machine Learning Designer in Machine Learning Platform for AI (PAI). Machine Learning Designer was formerly known as Machine Learning Studio. The following table describes the parameters.

| Tab | Parameter | Description |
| --- | --- | --- |
| Fields Setting | Feature Columns | Optional. The feature columns that are selected from the input table for training. By default, all columns other than the label column are selected. |
| Fields Setting | Target Column | Required. The label column. Click the Directory icon. In the Select Column dialog box, enter the keywords of the column that you want to search for. Select the column and click OK. |
| Parameters Setting | Parallel Computing Cores | Optional. The number of cores used in parallel computing. |
| Parameters Setting | Memory Size per Core | Optional. The memory size of each core. Unit: MB. |

Method 2: Use PAI commands

Configure the component parameters by using PAI commands. You can use the SQL Script component to call PAI commands. For more information, see SQL Script.

pai -name feature_importance -project algo_public
    -DinputTableName=pai_dense_10_10
    -DmodelName=xlab_m_random_forests_1_20318_v0
    -DoutputTableName=erkang_test_dev.pai_temp_2252_20319_1
    -DlabelColName=y
    -DfeatureColNames="pdays,previous,emp_var_rate,cons_price_idx,cons_conf_idx,euribor3m,nr_employed,age,campaign,poutcome"
    -Dlifecycle=28 ;

| Parameter | Required | Description | Default value |
| --- | --- | --- | --- |
| inputTableName | Yes | The name of the input table. | No default value |
| outputTableName | Yes | The name of the output table. | No default value |
| labelColName | Yes | The name of the label column in the input table. | No default value |
| modelName | Yes | The name of the input model. | No default value |
| featureColNames | No | The feature columns that are selected from the input table for training. | All columns other than the label column |
| inputTablePartitions | No | The partitions that are selected from the input table for training. | All partitions |
| lifecycle | No | The lifecycle of the output table. | Not specified |
| coreNum | No | The number of cores. | Determined by the system |
| memSizePerCore | No | The memory size of each core. Unit: MB. | Determined by the system |
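
When the command is generated by a script rather than typed by hand, checking the required parameters up front can catch mistakes before the job is submitted. The following is a minimal sketch under that assumption. The helper `build_feature_importance_command` is hypothetical and not part of PAI; it only assembles the command text from the parameters listed above and does not submit anything.

```python
# Parameter names taken from the table above.
REQUIRED = ("inputTableName", "outputTableName", "labelColName", "modelName")
OPTIONAL = ("featureColNames", "inputTablePartitions", "lifecycle",
            "coreNum", "memSizePerCore")


def build_feature_importance_command(params, project="algo_public"):
    """Return the PAI command text for the feature_importance component.

    Hypothetical helper: it validates parameter names and formats the
    command, nothing more.
    """
    missing = [name for name in REQUIRED if name not in params]
    if missing:
        raise ValueError("missing required parameters: " + ", ".join(missing))
    unknown = [name for name in params if name not in REQUIRED + OPTIONAL]
    if unknown:
        raise ValueError("unknown parameters: " + ", ".join(unknown))

    lines = ["pai -name feature_importance -project " + project]
    for name in REQUIRED + OPTIONAL:
        if name not in params:
            continue  # omitted optional parameters fall back to their defaults
        value = params[name]
        if isinstance(value, (list, tuple)):
            # Column lists are passed as one quoted, comma-separated string.
            value = '"' + ",".join(value) + '"'
        lines.append("    -D{}={}".format(name, value))
    return "\n".join(lines) + ";"


command = build_feature_importance_command({
    "inputTableName": "pai_dense_10_10",
    "modelName": "xlab_m_random_forests_1_20318_v0",
    "outputTableName": "erkang_test_dev.pai_temp_2252_20319_1",
    "labelColName": "y",
    "featureColNames": ["pdays", "previous", "age"],
    "lifecycle": 28,
})
print(command)
```

The resulting text can be pasted into the SQL Script component. Optional parameters that are left out are simply not emitted, so the defaults in the table apply.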

Example

  1. Execute the following SQL statements to generate training data:

    drop table if exists pai_dense_10_10;
    create table if not exists pai_dense_10_10 as
    select
        age, campaign, pdays, previous, poutcome, emp_var_rate,
        cons_price_idx, cons_conf_idx, euribor3m, nr_employed, y
    from bank_data
    limit 10;

  2. Create the experiment shown in the following figure. For more information, see Algorithm modeling.

    The data source is pai_dense_10_10. The y column is the label column of the random forest model, and the other columns are feature columns. For the Columns Forced to Convert parameter, select age and campaign so that these two columns are processed as enumerated features. The other columns keep their default settings.

    [Figure: Generate a model]

  3. Run the experiment and view the prediction results.

    [Figure: Result]

  4. After the experiment is run, right-click the Random Forest Feature Importance Evaluation component and select View Analytics Report to view the result.

    [Figure: Analysis report]