
Platform For AI: Regression Model Evaluation

Last Updated: Mar 05, 2025

The Regression Model Evaluation component quantifies the performance of a regression model by comparing its predicted values with the actual observed values. It reports regression metrics such as mean squared error (MSE), mean absolute error (MAE), and R-squared. It also generates a histogram of residuals, which shows how the prediction errors are distributed and helps identify where the model can be improved. Together, these outputs help you judge whether the model's predictive accuracy and stability are adequate.
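To make the idea of a residual histogram concrete, the following is a minimal Python sketch that bins residuals into equal-width intervals. It is an illustration only, not the component's implementation: the function name `residual_histogram`, the residual sign convention (observed minus predicted), and the equal-width binning between the smallest and largest residual are all assumptions.

```python
from typing import List, Sequence, Tuple


def residual_histogram(
    actual: Sequence[float],
    predicted: Sequence[float],
    interval_num: int = 100,
) -> List[Tuple[float, float, int]]:
    """Return (left_edge, right_edge, count) for equal-width residual bins."""
    if len(actual) == 0 or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    if interval_num < 1:
        raise ValueError("interval_num must be a positive integer")

    # Assumed convention: residual = observed value minus predicted value.
    residuals = [y - p for y, p in zip(actual, predicted)]
    low, high = min(residuals), max(residuals)
    # Fall back to a width of 1.0 when all residuals are identical.
    width = (high - low) / interval_num or 1.0

    counts = [0] * interval_num
    for r in residuals:
        # The largest residual is placed in the last bin instead of overflowing.
        index = min(int((r - low) / width), interval_num - 1)
        counts[index] += 1

    return [
        (low + i * width, low + (i + 1) * width, count)
        for i, count in enumerate(counts)
    ]


if __name__ == "__main__":
    for left, right, count in residual_histogram(
        [1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.0, 4.5], interval_num=3
    ):
        print(f"[{left:.2f}, {right:.2f}): {count}")
```

A histogram that is roughly symmetric around zero suggests unbiased predictions, while a shifted or skewed one points to systematic over- or under-prediction.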

Configure the component

Method 1: Configure the component on the pipeline page

Add a Regression Model Evaluation component on the pipeline page and configure the following parameters:

| Category | Parameter | Description |
| --- | --- | --- |
| Fields Setting | Original Regression Value | The actual observed values of the target variable in the dataset, used to evaluate the predictive performance of the regression model and serve as a basis for comparison. |
| Fields Setting | Predicted Regression Value | The estimated values of the target variable obtained through the regression model. The predicted values are generated by the model based on the input features. |
| Tuning | Worker number | For information about how to configure the number of workers and their memory, see Appendix: How to estimate resource usage. |
| Tuning | Memory Size per Node | For information about how to configure the number of workers and their memory, see Appendix: How to estimate resource usage. |

Method 2: Use PAI commands

Configure the component parameters by using PAI commands. You can use the SQL Script component to call PAI commands. For more information, see SQL Script.

PAI -name regression_evaluation -project algo_public
    -DinputTableName=input_table
    -DyColName=y_col
    -DpredictionColName=prediction_col
    -DindexOutputTableName=index_output_table
    -DresidualOutputTableName=residual_output_table;

| Parameter | Required | Default value | Description |
| --- | --- | --- | --- |
| inputTableName | Yes | None | The name of the input table. |
| inputTablePartitions | No | Full table | The partitions that are selected from the input table for computing. |
| yColName | Yes | None | The name of the input table column that contains the original (observed) values of the dependent variable. The column must be of a numeric data type. |
| predictionColName | Yes | None | The name of the input table column that contains the predicted values of the dependent variable. The column must be of a numeric data type. |
| indexOutputTableName | Yes | None | The name of the output table of regression metrics. |
| residualOutputTableName | Yes | None | The name of the output table of the histogram of residuals. |
| intervalNum | No | 100 | The number of intervals of the histogram. |
| lifecycle | No | None | The lifecycle of the output table. The value of this parameter must be a positive integer. |
| coreNum | No | Determined by the system | The number of cores. Valid values: 1 to 9999. |
| memSizePerCore | No | Determined by the system | The memory size of each core. Valid values: 1024 to 64 × 1024 (65536). Unit: MB. |
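If you generate the command text from a script, a small helper can check the documented value ranges before the command is submitted. The following Python sketch only assembles the command string. The helper name `build_regression_evaluation_command` and its argument names are hypothetical, and submitting the command (for example through the SQL Script component) is left out.

```python
from typing import Optional


def build_regression_evaluation_command(
    input_table: str,
    y_col: str,
    prediction_col: str,
    index_output_table: str,
    residual_output_table: str,
    interval_num: Optional[int] = None,
    lifecycle: Optional[int] = None,
    core_num: Optional[int] = None,
    mem_size_per_core: Optional[int] = None,
) -> str:
    """Assemble the PAI command text, checking the documented value ranges."""
    if lifecycle is not None and lifecycle < 1:
        raise ValueError("lifecycle must be a positive integer")
    if core_num is not None and not 1 <= core_num <= 9999:
        raise ValueError("coreNum must be between 1 and 9999")
    if mem_size_per_core is not None and not 1024 <= mem_size_per_core <= 64 * 1024:
        raise ValueError("memSizePerCore must be between 1024 and 65536 (MB)")

    # Required parameters, in the order used by the documented command.
    params = {
        "inputTableName": input_table,
        "yColName": y_col,
        "predictionColName": prediction_col,
        "indexOutputTableName": index_output_table,
        "residualOutputTableName": residual_output_table,
    }
    # Optional parameters are emitted only when a value is supplied,
    # so the system defaults apply otherwise.
    optional = {
        "intervalNum": interval_num,
        "lifecycle": lifecycle,
        "coreNum": core_num,
        "memSizePerCore": mem_size_per_core,
    }
    params.update({name: value for name, value in optional.items() if value is not None})

    lines = ["PAI -name regression_evaluation -project algo_public"]
    lines += [f"    -D{name}={value}" for name, value in params.items()]
    return "\n".join(lines) + ";"


if __name__ == "__main__":
    print(
        build_regression_evaluation_command(
            "input_table",
            "y_col",
            "prediction_col",
            "index_output_table",
            "residual_output_table",
            interval_num=50,
        )
    )
```

Omitting an optional argument leaves the corresponding `-D` flag out of the command, so the default listed in the table applies.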

Output

The output table of regression metrics stores its results in JSON format and contains the following parameters.

| Parameter | Description |
| --- | --- |
| SST | The total sum of squares. |
| SSE | The sum of squared errors. |
| SSR | The sum of squares due to regression. |
| R2 | The coefficient of determination. |
| R | The multiple correlation coefficient. |
| MSE | The mean squared error. |
| RMSE | The root mean squared error. |
| MAE | The mean absolute error. |
| MAD | The mean absolute deviation. |
| MAPE | The mean absolute percentage error. |
| count | The number of rows. |
| yMean | The mean of the original dependent variable. |
| predictionMean | The mean of the predicted values. |
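As a rough guide to how these values relate to each other, the following Python sketch computes most of them from a list of observed values and a list of predictions using common textbook definitions. This is an assumption-laden illustration: the component's exact formulas (for example, the divisor used for MSE or whether MAPE is scaled to a percentage) are not stated here and may differ. R and MAD are left out because their definitions vary between sources. The function name `regression_metrics` is hypothetical.

```python
import math
from typing import Dict, Sequence


def regression_metrics(
    actual: Sequence[float], predicted: Sequence[float]
) -> Dict[str, float]:
    """Compute common regression metrics using textbook definitions."""
    n = len(actual)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")

    y_mean = sum(actual) / n
    prediction_mean = sum(predicted) / n

    # Sums of squares: total, error (residual), and regression (explained).
    sst = sum((y - y_mean) ** 2 for y in actual)
    sse = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    ssr = sum((p - y_mean) ** 2 for p in predicted)

    # Assumption: MSE divides by the row count, not by degrees of freedom.
    mse = sse / n
    metrics = {
        "count": n,
        "yMean": y_mean,
        "predictionMean": prediction_mean,
        "SST": sst,
        "SSE": sse,
        "SSR": ssr,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs(y - p) for y, p in zip(actual, predicted)) / n,
    }
    # R2 is undefined when the observed values have no variance.
    if sst > 0:
        metrics["R2"] = 1 - sse / sst
    # MAPE is undefined when any observed value is zero.
    if all(y != 0 for y in actual):
        metrics["MAPE"] = 100 * sum(
            abs((y - p) / y) for y, p in zip(actual, predicted)
        ) / n
    return metrics


if __name__ == "__main__":
    for name, value in regression_metrics(
        [1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.0, 4.5]
    ).items():
        print(f"{name}: {value}")
```

Comparing values computed this way against the component's output on a small sample is a quick way to confirm which conventions the component follows.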