Platform For AI: Random forest feature importance

Last Updated: Apr 02, 2026

When building or refining a classification or regression model with Random Forest, you may want to know which input features drive predictions — and which add little value. The Random Forest Feature Importance Evaluation component ranks each feature by its contribution to the model, so you can prune uninformative columns, prioritize data collection, or explain model behavior to stakeholders.

The component supports two scoring methods:

  • Mean decrease in impurity (MDI): measures how much, on average, splits on each feature reduce impurity across all decision trees during training.

  • Permutation importance: measures how much model accuracy drops when a feature's values are randomly shuffled.
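
The two scoring ideas can be illustrated with a small, self-contained Python sketch. It is not the component's implementation: the toy data, the stand-in `predict` function, and every helper name below are assumptions made for illustration only. The first helper computes the impurity reduction of a single split, which is the per-split quantity that MDI aggregates over a forest; the second estimates permutation importance as the average accuracy drop after shuffling one column.

```python
import random


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def impurity_decrease(parent, left, right):
    """Impurity of the parent minus the size-weighted impurity of its children."""
    n = len(parent)
    return gini(parent) - len(left) / n * gini(left) - len(right) / n * gini(right)


def accuracy(predict, rows, labels):
    """Fraction of rows whose prediction matches the label."""
    hits = sum(1 for row, y in zip(rows, labels) if predict(row) == y)
    return hits / len(rows)


def permutation_importance(predict, rows, labels, col, n_repeats=20, seed=0):
    """Average accuracy drop when the values in column `col` are shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    drops = []
    for _ in range(n_repeats):
        shuffled = [row[col] for row in rows]
        rng.shuffle(shuffled)
        permuted = [row[:col] + [v] + row[col + 1:] for row, v in zip(rows, shuffled)]
        drops.append(baseline - accuracy(predict, permuted, labels))
    return sum(drops) / n_repeats


# Hypothetical toy data: column 0 determines the label, column 1 is noise.
rows = [[0, 5], [0, 3], [1, 5], [1, 3], [0, 1], [1, 1], [0, 7], [1, 7]]
labels = [row[0] for row in rows]


def predict(row):
    """Stand-in for a trained model: it only looks at column 0."""
    return row[0]


# A split on column 0 separates the classes perfectly.
split_gain = impurity_decrease(labels, [0, 0, 0, 0], [1, 1, 1, 1])
importance_signal = permutation_importance(predict, rows, labels, col=0)
importance_noise = permutation_importance(predict, rows, labels, col=1)
```

In this toy setup, shuffling the noise column leaves accuracy unchanged, while shuffling the informative column lowers it, so the informative column receives the higher score.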

Prerequisites

Before you begin, ensure that you have:

  • A trained Random Forest model available in PAI

  • Access to Machine Learning Designer

  • An input table containing the feature columns and label column

Configure the component

Method 1: Configure the component on the pipeline page

On the pipeline details page in Machine Learning Designer, add the Random Forest Feature Importance Evaluation component to the pipeline and configure the following parameters.

| Tab | Parameter | Description |
| --- | --- | --- |
| Fields Setting | Feature Columns | Optional. The feature columns selected from the input table for training. Defaults to all columns other than the label column. |
| Fields Setting | Target Column | Required. The label column. Click the Directory icon, enter a keyword in the Select Column dialog box, select the column, and click OK. |
| Parameters Setting | Parallel Computing Cores | Optional. The number of cores used in parallel computing. |
| Parameters Setting | Memory Size per Core | Optional. The memory size per core, in MB. |

Method 2: Use PAI commands

Use the SQL Script component to run PAI commands. For details, see Scenario 4: Execute PAI commands within the SQL script component.

pai -name feature_importance -project algo_public
    -DinputTableName=pai_dense_10_10
    -DmodelName=xlab_m_random_forests_1_20318_v0
    -DoutputTableName=erkang_test_dev.pai_temp_2252_20319_1
    -DlabelColName=y
    -DfeatureColNames="pdays,previous,emp_var_rate,cons_price_idx,cons_conf_idx,euribor3m,nr_employed,age,campaign,poutcome"
    -Dlifecycle=28 ;

| Parameter | Required | Default | Description |
| --- | --- | --- | --- |
| inputTableName | Yes | - | Name of the input table. |
| outputTableName | Yes | - | Name of the output table. |
| labelColName | Yes | - | Name of the label column in the input table. |
| modelName | Yes | - | Name of the input model. |
| featureColNames | No | All columns other than the label column | Feature columns selected from the input table for training. |
| inputTablePartitions | No | All partitions | Partitions selected from the input table for training. |
| lifecycle | No | Not specified | Lifecycle of the output table. |
| coreNum | No | Determined by the system | Number of cores. |
| memSizePerCore | No | Determined by the system | Memory size per core, in MB. |
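
If you generate the command from a script, a small helper can check that the four required parameters are present before the command is submitted. The following Python sketch is an assumption, not part of PAI: the function name `build_feature_importance_command` is hypothetical, and it only assembles the command text from the parameter names listed above. It does not run anything.

```python
# Parameter names taken from the table above.
REQUIRED = ("inputTableName", "outputTableName", "labelColName", "modelName")
OPTIONAL = ("featureColNames", "inputTablePartitions", "lifecycle",
            "coreNum", "memSizePerCore")


def build_feature_importance_command(**params):
    """Return the text of a `pai -name feature_importance` command.

    Hypothetical helper: it validates parameter names and formats the
    command, but does not submit it.
    """
    missing = [name for name in REQUIRED if not params.get(name)]
    if missing:
        raise ValueError("missing required parameters: " + ", ".join(missing))
    unknown = [name for name in params if name not in REQUIRED + OPTIONAL]
    if unknown:
        raise ValueError("unknown parameters: " + ", ".join(unknown))

    lines = ["pai -name feature_importance -project algo_public"]
    for name in REQUIRED + OPTIONAL:
        value = params.get(name)
        if value is None:
            continue  # optional parameter not supplied; the default applies
        if isinstance(value, (list, tuple)):
            # Column lists are passed as one quoted, comma-separated string.
            value = '"' + ",".join(value) + '"'
        lines.append(f"    -D{name}={value}")
    return "\n".join(lines) + ";"


command = build_feature_importance_command(
    inputTableName="pai_dense_10_10",
    modelName="xlab_m_random_forests_1_20318_v0",
    outputTableName="erkang_test_dev.pai_temp_2252_20319_1",
    labelColName="y",
    featureColNames=["age", "campaign"],
    lifecycle=28,
)
print(command)
```

Optional parameters that are not passed are left out of the generated text, so the defaults in the table apply.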

Example

This example uses the bank_data table to evaluate which features most influence a classification model trained with Random Forest.

  1. Create the training table by running the following SQL statements. The query copies 10 records from bank_data into pai_dense_10_10. Because it has no ORDER BY clause, which 10 records are returned is not guaranteed.

    drop table if exists pai_dense_10_10;
    create table pai_dense_10_10 as
    select
        age, campaign, pdays, previous, poutcome,
        emp_var_rate, cons_price_idx, cons_conf_idx,
        euribor3m, nr_employed, y
    from bank_data limit 10;
  2. Create the experiment shown below. For details, see Custom pipelines. Use pai_dense_10_10 as the data source. Set y as the label column; all other columns are feature columns. For the Columns Forced to Convert parameter, select age and campaign — these columns are processed as enumerated (categorical) features. Keep the default settings for all other parameters.

    [Figure: Generate a model]

  3. Run the experiment and view the prediction results.

    [Figure: Result]

  4. Right-click the Random Forest Feature Importance Evaluation component and select View Analytics Report to view the result.

    [Figure: Analysis report]