The Random Forest Feature Importance Evaluation component uses raw data and a trained random forest model to calculate the importance of each feature.
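Random forests commonly score a feature by its mean decrease in impurity (MDI). Each split on a feature reduces node impurity (for example, Gini impurity). That reduction is weighted by the fraction of training rows that reach the node, summed per feature within each tree, normalized per tree, and then averaged over the trees. The following minimal sketch computes MDI for two small hand-built trees. The `Node` structure and all numbers are hypothetical, and this document does not state whether the component uses exactly this formula.

```python
"""Mean decrease in impurity (MDI) feature importance for a tiny hand-built forest.

Assumption: textbook MDI (Gini importance), shown for illustration only;
it is not taken from the PAI implementation.
"""
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    n_samples: int                  # training rows that reached this node
    impurity: float                 # e.g. Gini impurity of those rows
    feature: Optional[str] = None   # split feature; None means leaf
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def tree_importance(root: Node) -> dict:
    """Sum weighted impurity decreases per feature, normalized to 1 within the tree."""
    scores = defaultdict(float)
    stack = [root]
    while stack:
        node = stack.pop()
        if node.feature is None:
            continue  # leaves do not split, so they contribute nothing
        decrease = (node.n_samples * node.impurity
                    - node.left.n_samples * node.left.impurity
                    - node.right.n_samples * node.right.impurity)
        scores[node.feature] += decrease / root.n_samples
        stack.extend([node.left, node.right])
    total = sum(scores.values())
    return {f: s / total for f, s in scores.items()} if total > 0 else {}


def forest_importance(trees) -> dict:
    """Average the per-tree normalized importances over the forest."""
    combined = defaultdict(float)
    for tree in trees:
        for feature, score in tree_importance(tree).items():
            combined[feature] += score / len(trees)
    return dict(combined)


# Tree 1: split on "age" (4 vs 4 rows), then the left child splits on "campaign".
tree1 = Node(8, 0.5, "age",
             left=Node(4, 0.375, "campaign", left=Node(3, 0.0), right=Node(1, 0.0)),
             right=Node(4, 0.375))
# Tree 2: a single split on "pdays" that separates the two classes perfectly.
tree2 = Node(8, 0.5, "pdays", left=Node(4, 0.0), right=Node(4, 0.0))

importances = forest_importance([tree1, tree2])
print({f: round(s, 3) for f, s in sorted(importances.items())})
# → {'age': 0.2, 'campaign': 0.3, 'pdays': 0.5}
```

Normalizing each tree before averaging keeps every tree's vote equal, so the final scores sum to 1 across features.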
Configure the component
- Use the Machine Learning Platform for AI console
Tab | Parameter | Description
--- | --- | ---
Fields Setting | Feature Columns | Optional. The feature columns that are selected from the input table for training. By default, all columns other than the label column are selected.
Fields Setting | Target Column | Required. The label column. To select a column, click the icon. In the Select Column dialog box, enter keywords of the column that you want to search for, select the column, and click OK.
Parameters Setting | Concurrent Computing Core Number | Optional. The number of cores used for parallel computing.
Parameters Setting | Memory Size per Core | Optional. The memory size of each core. Unit: MB.
- Use commands
pai -name feature_importance -project algo_public -DinputTableName=pai_dense_10_10 -DmodelName=xlab_m_random_forests_1_20318_v0 -DoutputTableName=erkang_test_dev.pai_temp_2252_20319_1 -DlabelColName=y -DfeatureColNames="pdays,previous,emp_var_rate,cons_price_idx,cons_conf_idx,euribor3m,nr_employed,age,campaign,poutcome" -Dlifecycle=28;
Parameter | Required | Description | Default value
--- | --- | --- | ---
inputTableName | Yes | The name of the input table. | N/A
outputTableName | Yes | The name of the output table. | N/A
labelColName | Yes | The name of the label column in the input table. | N/A
modelName | Yes | The name of the input random forest model. | N/A
featureColNames | No | The feature columns that are selected from the input table for training. | All columns other than the label column
inputTablePartitions | No | The partitions that are selected from the input table for training. | All partitions
lifecycle | No | The lifecycle of the output table. Unit: days. | Empty string
coreNum | No | The number of cores. | Determined by the system
memSizePerCore | No | The memory size of each core. Unit: MB. | Determined by the system
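Each -D flag in the command corresponds to one parameter in the table. The following minimal Python sketch checks that the required parameters are present and assembles the command string. `build_feature_importance_cmd` and the output table name `pai_feature_importance_out` are hypothetical. The helper only formats text: it does not submit the job, and it does not check whether the tables or the model actually exist.

```python
"""Assemble a PAI feature_importance command line from a parameter dict.

build_feature_importance_cmd is a hypothetical helper for illustration; it only
formats text and does not call any PAI or MaxCompute API.
"""

REQUIRED = ("inputTableName", "outputTableName", "labelColName", "modelName")
OPTIONAL = ("featureColNames", "inputTablePartitions", "lifecycle",
            "coreNum", "memSizePerCore")


def build_feature_importance_cmd(params: dict, project: str = "algo_public") -> str:
    missing = [name for name in REQUIRED if not params.get(name)]
    if missing:
        raise ValueError(f"missing required parameters: {', '.join(missing)}")
    unknown = set(params) - set(REQUIRED) - set(OPTIONAL)
    if unknown:
        raise ValueError(f"unknown parameters: {', '.join(sorted(unknown))}")

    parts = ["pai", "-name", "feature_importance", "-project", project]
    for name in REQUIRED + OPTIONAL:
        if name not in params:
            continue  # omitted optional parameters keep their default values
        value = params[name]
        if isinstance(value, (list, tuple)):
            value = ",".join(value)  # column lists are comma-separated
        text = str(value)
        if "," in text:
            text = f'"{text}"'  # quote comma-separated lists, as the sample command does
        parts.append(f"-D{name}={text}")
    return " ".join(parts) + ";"


cmd = build_feature_importance_cmd({
    "inputTableName": "pai_dense_10_10",
    "modelName": "xlab_m_random_forests_1_20318_v0",
    "outputTableName": "pai_feature_importance_out",  # hypothetical output table
    "labelColName": "y",
    "featureColNames": ["age", "campaign", "pdays"],
    "lifecycle": 28,
})
print(cmd)
```

Rejecting unknown keys catches typos such as `labelColname` before the job is submitted, instead of letting the flag be silently ignored.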
Example
- Execute the following SQL statements to generate the training data:
drop table if exists pai_dense_10_10;
create table if not exists pai_dense_10_10 as
select age, campaign, pdays, previous, poutcome, emp_var_rate, cons_price_idx, cons_conf_idx, euribor3m, nr_employed, y
from bank_data limit 10;
- Create the experiment shown in the following figure. For more information, see Generate a model by using an algorithm.
The data source is pai_dense_10_10. The label column of the random forest model is y, and the other columns are feature columns. For the random forest model, select age and campaign for the Converted Column parameter so that these two columns are processed as enumerated features. Retain the default settings for the other parameters.
- Run the experiment and view the prediction results.
- After the experiment is run, right-click the Random Forest Feature Importance Evaluation component and select View Analytics Report to view the result.
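In the second step, selecting age and campaign for Converted Column tells the random forest to treat these numeric columns as enumerated features. In a generic decision tree, the two treatments produce different candidate splits. A continuous feature is split by an ordered threshold, so values 1 and 2 always end up on the same side of a split that excludes 3. An enumerated feature is split by matching individual values, so the ordering carries no meaning. The following sketch assumes this generic scheme, with one-value-versus-rest splits for enumerated features. This document does not describe PAI's exact split search, and the sample values are hypothetical.

```python
"""Candidate splits for a feature treated as continuous vs. enumerated.

Assumption: a generic decision-tree scheme (threshold splits for continuous
features, one-value-versus-rest splits for enumerated ones), not PAI's exact
split search.
"""


def continuous_splits(values):
    """Thresholds halfway between consecutive distinct values: x <= t."""
    distinct = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]


def enumerated_splits(values):
    """One candidate per distinct value: x == v versus x != v."""
    return sorted(set(values))


campaign = [1, 3, 2, 1, 5, 3]  # hypothetical sample values
print(continuous_splits(campaign))  # → [1.5, 2.5, 4.0]
print(enumerated_splits(campaign))  # → [1, 2, 3, 5]
```

With the enumerated treatment, a split such as campaign == 3 can isolate a single value. A threshold split can only separate a prefix of the sorted values from the rest.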