Platform for AI: Linear Model Feature Importance

Last Updated: Mar 06, 2024

The Linear Model Feature Importance component is used to calculate the feature importance for a linear model, such as linear regression and logistic regression for binary classification. Both the sparse and dense data formats are supported. This topic describes how to configure the Linear Model Feature Importance component.

Limits

The Linear Model Feature Importance component runs only on MaxCompute computing resources.

Configure the component

You can configure the component by using one of the following methods:

Method 1: Configure the component in the Platform for AI (PAI) console

Configure the component parameters in Machine Learning Designer. The following table describes the parameters.

| Tab | Parameter | Description |
| --- | --- | --- |
| Fields Setting | Feature Columns | Optional. Select the feature columns for training from the input table. By default, all columns except the label column are selected. |
| Fields Setting | Target Column | Required. The label column. Click Select Fields. In the Select Fields dialog box, enter the keyword of the column that you want to search for. Select the column and click OK. |
| Fields Setting | Input Sparse Format Data | Optional. Specifies whether data in the input table is sparse. |
| Tuning | Cores | Optional. The number of cores used in computing. |
| Tuning | Memory Size per Core | Optional. The memory size of each core. Unit: MB. |

Method 2: Use PAI commands

Configure the component parameters by using PAI commands. The following section describes the parameters. You can use the SQL Script component to call PAI commands. For more information, see SQL Script.

PAI -name regression_feature_importance -project algo_public
    -DmodelName=xlab_m_logisticregressi_20317_v0
    -DoutputTableName=pai_temp_2252_20321_1
    -DlabelColName=y
    -DfeatureColNames=pdays,previous,emp_var_rate,cons_price_idx,cons_conf_idx,euribor3m,nr_employed,age,campaign
    -DenableSparse=false -DinputTableName=pai_dense_10_9;

| Parameter | Required | Description | Default value |
| --- | --- | --- | --- |
| inputTableName | Yes | The name of the input table. | None |
| outputTableName | Yes | The name of the output table. | None |
| labelColName | Yes | The label column that is selected from the input table. | None |
| modelName | Yes | The name of the input model. | None |
| featureColNames | No | The feature columns that are selected from the input table. | All columns other than the label column |
| inputTablePartitions | No | The partitions that are selected from the input table. | Full table |
| enableSparse | No | Specifies whether data in the input table is sparse. | false |
| itemDelimiter | No | The delimiter that is used to separate key-value pairs when data in the input table is sparse. | Backspace |
| kvDelimiter | No | The delimiter that is used to separate keys and values when data in the input table is sparse. | Colon (:) |
| lifecycle | No | The lifecycle of the output table. | Not specified |
| coreNum | No | The number of cores. | Determined by the system |
| memSizePerCore | No | The memory size of each core. | Determined by the system |
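
To make the roles of itemDelimiter and kvDelimiter concrete, the following Python sketch splits one sparse row into key-value pairs. It is an illustration only and not the component's own parser: the function name parse_sparse_row, the sample row, and the comma used as the item delimiter are all hypothetical choices made for readability.

```python
def parse_sparse_row(row, item_delimiter=",", kv_delimiter=":"):
    """Split a sparse row into a {key: value} mapping.

    item_delimiter separates key-value pairs (the role of itemDelimiter).
    kv_delimiter separates a key from its value (the role of kvDelimiter).
    Both defaults here are illustrative assumptions.
    """
    features = {}
    for item in row.split(item_delimiter):
        if not item:
            continue  # skip empty fragments, for example from a trailing delimiter
        key, value = item.split(kv_delimiter, 1)
        features[key] = float(value)
    return features


# Hypothetical sparse row: features 1, 3, and 7 are present, all others are absent.
row = "1:0.5,3:2.0,7:1.25"
parsed = parse_sparse_row(row)
```

Features that do not appear in a row are simply missing from the mapping, which is what distinguishes the sparse layout from a dense table with one column per feature.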

Example

  1. Create a table named bank_data and import data to the table. For more information, see Create tables and Import data to tables.

  2. Execute the following SQL statements to generate training data:

    create table if not exists pai_dense_10_9 as
    select
        age, campaign, pdays, previous, emp_var_rate, cons_price_idx, cons_conf_idx, euribor3m, nr_employed, fixed_deposit
    from bank_data limit 10;

  3. Create the pipeline shown in the following figure and run the component. For more information, see Algorithm modeling.

    1. In the left-side component list of Machine Learning Designer, separately search for the Read Table, Logistic Regression for Multiclass Classification, and Linear Model Feature Importance components, and drag the components to the canvas on the right.

    2. Connect nodes by drawing lines to organize the nodes into a pipeline that includes upstream and downstream relationships based on the preceding figure.

    3. Configure the component parameters.

      • On the canvas, click the Read Table-1 component. On the Select Table tab in the right pane, set Table Name to bank_data.

      • On the canvas, click the Logistic Regression for Multiclass Classification-1 component. On the Fields Setting tab, select age, campaign, pdays, previous, emp_var_rate, cons_price_idx, cons_conf_idx, euribor3m, and nr_employed for the Training Feature Columns parameter. Set the Target Columns parameter to fixed_deposit. Retain the default values for the remaining parameters.

      • On the canvas, click the Linear Model Feature Importance-1 component. On the Fields Setting tab, set the Target Column parameter to fixed_deposit. Retain the default values for the remaining parameters.

    4. After the parameter configuration is complete, click the run button to start the pipeline.

  4. After the pipeline finishes running, right-click the Linear Model Feature Importance-1 component and choose View Data > Model Importance Table.

    The following table describes the formulas used to calculate the metrics.

    | Column name | Formula |
    | --- | --- |
    | weight | abs(w_j) |
    | importance | abs(w_j) * STD(f_j) |

    Note

    abs(w_j) indicates the absolute value of the coefficient of feature j. STD(f_j) indicates the standard deviation of feature j in the training data.

  5. Right-click the Linear Model Feature Importance-1 component and select View Analytics Report to view the reports for visualized data analysis.
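
The metric formulas in step 4 can be reproduced outside the component as a quick sanity check. The following Python sketch computes weight as the absolute value of each coefficient and importance as that value multiplied by the standard deviation of the matching feature. The function name, the coefficients, and the data are hypothetical. The sketch also assumes the population standard deviation, because this topic does not state whether the component uses the population or the sample form.

```python
from statistics import pstdev


def linear_feature_importance(weights, columns):
    """Return {feature: (weight, importance)} for a linear model.

    weights: feature name -> model coefficient
    columns: feature name -> list of training values for that feature
    """
    result = {}
    for name, coefficient in weights.items():
        weight = abs(coefficient)
        # Assumption: population standard deviation (pstdev).
        # The component may use the sample form instead.
        importance = weight * pstdev(columns[name])
        result[name] = (weight, importance)
    return result


# Hypothetical coefficients and training values, for illustration only.
weights = {"age": -0.5, "campaign": 2.0}
columns = {
    "age": [30.0, 40.0, 50.0, 60.0],
    "campaign": [1.0, 1.0, 3.0, 3.0],
}

scores = linear_feature_importance(weights, columns)

# Rank features from most to least important.
ranked = sorted(scores, key=lambda name: scores[name][1], reverse=True)
```

With these sample values, age ranks above campaign even though its coefficient is smaller in absolute value, because its values vary more. This is the reason the importance column scales the coefficient by the standard deviation instead of reporting the coefficient alone.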
