The Regression Model Evaluation component quantifies the performance of a regression model by comparing its predicted values with the actual outcomes, using metrics such as mean squared error (MSE), mean absolute error (MAE), and R-squared. It also generates a histogram of residuals, which shows how the prediction errors are distributed and helps identify areas for model improvement. Together, these results help you judge the predictive ability and stability of the model.
Configure the component
Method 1: Configure the component on the pipeline page
Add a Regression Model Evaluation component on the pipeline page and configure the following parameters:
Category | Parameter | Description |
Fields Setting | Original Regression Value | The actual observed values of the target variable in the dataset, used to evaluate the predictive performance of the regression model and serve as a basis for comparison. |
Fields Setting | Predicted Regression Value | The estimated values of the target variable obtained through the regression model. The predicted values are generated by the model based on the input features. |
Tuning | Worker number | For information about how to configure the number of workers and their memory, see Appendix: How to estimate resource usage. |
Tuning | Memory Size per Node | For information about how to configure the number of workers and their memory, see Appendix: How to estimate resource usage. |
Method 2: Use PAI commands
Configure the component parameters by using PAI commands. You can use the SQL Script component to call PAI commands. For more information, see SQL Script.
PAI -name regression_evaluation -project algo_public
-DinputTableName=input_table
-DyColName=y_col
-DpredictionColName=prediction_col
-DindexOutputTableName=index_output_table
-DresidualOutputTableName=residual_output_table;

Parameter | Required | Default value | Description |
inputTableName | Yes | None | The name of the input table. |
inputTablePartitions | No | Full table | The partitions that are selected from the input table for computing. |
yColName | Yes | None | The name of the column in the input table that contains the original (actual) values of the dependent variable. Columns of numeric data types are supported. |
predictionColName | Yes | None | The name of the column that contains the predicted values of the dependent variable. Columns of numeric data types are supported. |
indexOutputTableName | Yes | None | The name of the output table of regression metrics. |
residualOutputTableName | Yes | None | The name of the output table of the histogram of residuals. |
intervalNum | No | 100 | The number of intervals of the histogram. |
lifecycle | No | None | The lifecycle of the output table. The value of this parameter must be a positive integer. |
coreNum | No | Determined by the system | The number of cores. Valid values: 1 to 9999. |
memSizePerCore | No | Determined by the system | The memory size of each core. Valid values: 1024 to 64 × 1024. Unit: MB. |
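
To make the role of intervalNum concrete, the following Python sketch bins residuals into a fixed number of intervals. It is an illustration only, not the component's implementation: the function name residual_histogram is hypothetical, the residual is assumed to be the original value minus the predicted value, and the intervals are assumed to be of equal width between the smallest and largest residual. The source does not specify any of these details.

```python
def residual_histogram(y_true, y_pred, interval_num=100):
    """Count residuals in `interval_num` equal-width intervals.

    Assumptions (not confirmed by the component documentation):
    residual = original value - predicted value, and the intervals
    evenly span [min residual, max residual].
    Returns (lower_bound, interval_width, counts).
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    if interval_num < 1:
        raise ValueError("interval_num must be a positive integer")

    residuals = [y - p for y, p in zip(y_true, y_pred)]
    lo, hi = min(residuals), max(residuals)
    width = (hi - lo) / interval_num
    counts = [0] * interval_num

    for r in residuals:
        if width == 0:
            # All residuals are identical: put them in the first interval.
            idx = 0
        else:
            # The largest residual falls into the last interval.
            idx = min(int((r - lo) / width), interval_num - 1)
        counts[idx] += 1
    return lo, width, counts


lo, width, counts = residual_histogram([1, 2, 3, 4], [1.5, 2, 2.5, 4], interval_num=2)
```

A larger interval_num gives a finer view of the error distribution, at the cost of fewer rows per interval.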
Output
The output table of regression metrics stores the results in JSON format and contains the following fields.
Parameter | Description |
SST | The total sum of squares. |
SSE | The sum of squared errors. |
SSR | The sum of squares due to regression. |
R2 | The coefficient of determination. |
R | The multiple correlation coefficient. |
MSE | The mean squared error. |
RMSE | The root-mean-square error. |
MAE | The mean absolute error. |
MAD | The mean absolute deviation. |
MAPE | The mean absolute percentage error. |
count | The number of rows. |
yMean | The mean of the original values of the dependent variable. |
predictionMean | The mean of prediction results. |
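
The metrics above can be sketched in plain Python using their common textbook definitions. This is a minimal sketch for cross-checking results on a small sample, not the component's implementation: the function name regression_metrics is hypothetical, R is assumed to be the square root of R2, and MAPE is assumed to be expressed as a fraction rather than a percentage. MAD is omitted because the source does not state which deviation it measures.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute common regression metrics from actual and predicted values.

    Formulas are textbook definitions and are assumptions with respect
    to the component, whose exact formulas are not given in the source.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(y_true)
    y_mean = sum(y_true) / n
    pred_mean = sum(y_pred) / n

    sst = sum((y - y_mean) ** 2 for y in y_true)             # total sum of squares
    sse = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # sum of squared errors
    ssr = sum((p - y_mean) ** 2 for p in y_pred)             # sum of squares due to regression

    # R2 is undefined when the actual values have no variance.
    r2 = 1 - sse / sst if sst != 0 else float("nan")
    # Assumption: R is the square root of R2 (undefined if R2 is negative).
    r = math.sqrt(r2) if r2 >= 0 else float("nan")

    mse = sse / n
    mae = sum(abs(y - p) for y, p in zip(y_true, y_pred)) / n

    # MAPE is undefined if any actual value is zero.
    if any(y == 0 for y in y_true):
        mape = float("nan")
    else:
        mape = sum(abs((y - p) / y) for y, p in zip(y_true, y_pred)) / n

    return {
        "SST": sst, "SSE": sse, "SSR": ssr,
        "R2": r2, "R": r,
        "MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "MAPE": mape,
        "count": n, "yMean": y_mean, "predictionMean": pred_mean,
    }


metrics = regression_metrics([1, 2, 3, 4], [1.5, 2, 2.5, 4])
```

Comparing such hand-computed values with the component's JSON output on a few rows is a quick way to confirm which definitions the component uses.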