The Linear Regression component is used to analyze the linear relationship between a dependent variable and multiple independent variables.
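In standard notation, the component fits a model of the form below. Here y is the label column, x_1 through x_p are the feature columns, β_0 is the intercept, and ε is the error term. The second line plugs in the coefficients from the model evaluation table in the example at the end of this topic, rounded to four decimal places.

```latex
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \varepsilon

\hat{y} = -6.4238 + 10.2601\,x_1 + 0.3537\,x_2
```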

Configure the component

You can use one of the following methods to configure the Linear Regression component.

Method 1: Configure the component on the pipeline page

You can configure the parameters of the Linear Regression component on the pipeline page of Machine Learning Designer of Machine Learning Platform for AI (PAI). Machine Learning Designer was formerly known as Machine Learning Studio. The following table describes the parameters.
Tab | Parameter | Description
Fields Setting | Feature Columns | The feature columns that are selected from the input table for training.
Fields Setting | Label Column | The label column. Columns of the DOUBLE and BIGINT types are supported.
Fields Setting | Use Sparse Format | Specifies whether the input data is in the sparse format. Data in the sparse format is represented as key-value pairs.
Fields Setting | KV Pair Delimiter | The delimiter that is used to separate key-value pairs. Default value: comma (,).
Fields Setting | KV Delimiter | The delimiter that is used to separate keys and values. Default value: colon (:).
Parameters Setting | Maximum Iterations | The maximum number of iterations performed by the algorithm.
Parameters Setting | Minimum Likelihood Deviance | The convergence threshold. The algorithm stops if the difference in log-likelihood between two consecutive iterations is less than this value.
Parameters Setting | Specifies the regularization type | The regularization type. Valid values: L1, L2, and None.
Parameters Setting | Regularization Coefficient | The regularization coefficient. This parameter does not take effect if Specifies the regularization type is set to None.
Parameters Setting | Generate Model Evaluation Table | Specifies whether to generate the model evaluation table. The metrics include R-squared, adjusted R-squared, AIC, degree of freedom, residual standard error, and residual deviance.
Parameters Setting | Regression Coefficient Evaluation | Specifies whether to evaluate the regression coefficients. The metrics include the t value, the p value, and the confidence interval between the 2.5% and 97.5% bounds. This parameter takes effect only if Generate Model Evaluation Table is selected.
Tuning | Number of Computing Cores | The number of cores. By default, the system determines the value.
Tuning | Memory Size per Core | The memory size of each core. By default, the system determines the value.
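To make the sparse format concrete, the following Python sketch parses one sparse cell into a dense feature vector. It assumes that keys are zero-based integer feature indices and that an absent key means a zero value, which is how the sparsecol1 column in the example at the end of this topic encodes x1 and x2. The helpers parse_sparse and to_dense are hypothetical and are not part of PAI; their default delimiters mirror the documented KV Pair Delimiter (,) and KV Delimiter (:).

```python
def parse_sparse(text, item_delimiter=",", kv_delimiter=":"):
    """Parse a sparse cell such as "0:1.84,1:1" into a {feature_index: value} dict.

    Hypothetical helper. Keys are assumed to be integer feature indices.
    """
    features = {}
    for item in text.split(item_delimiter):
        item = item.strip()
        if not item:
            continue  # skip empty items, for example from a trailing delimiter
        key, value = item.split(kv_delimiter, 1)
        features[int(key)] = float(value)
    return features


def to_dense(features, num_features):
    """Expand {index: value} into a dense list; absent indices are assumed to be 0.0."""
    dense = [0.0] * num_features
    for index, value in features.items():
        dense[index] = value
    return dense


if __name__ == "__main__":
    # sparsecol1 in the example separates pairs with spaces, not the default comma.
    print(to_dense(parse_sparse("0:1.84 1:1", item_delimiter=" "), 2))  # [1.84, 1.0]
    print(to_dense(parse_sparse("0:2.13", item_delimiter=" "), 2))      # [2.13, 0.0]
```

Because the example's sparse strings use spaces between pairs, training on sparsecol1 would require changing the pair delimiter from its comma default.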

Method 2: Use PAI commands

Configure the component parameters by using PAI commands. You can use the SQL Script component to call PAI commands. For more information, see SQL Script.
PAI -name linearregression
    -project algo_public
    -DinputTableName=lm_test_input
    -DfeatureColNames=x1
    -DlabelColName=y
    -DmodelName=lm_test_input_model_out;
Parameter | Required | Description | Default value
inputTableName | Yes | The name of the input table. | N/A
modelName | Yes | The name of the output model. | N/A
outputTableName | No | The name of the output model evaluation table. This parameter is required if enableFitGoodness is set to true. | N/A
labelColName | Yes | The label column, which holds the dependent variable. Columns of the DOUBLE and BIGINT types are supported. You can select only one column. | N/A
featureColNames | Yes | The feature columns, which hold the independent variables. If the input data is in the dense format, columns of the DOUBLE and BIGINT types are supported. If the input data is in the sparse format, only columns of the STRING type are supported. | N/A
inputTablePartitions | No | The partitions that are selected from the input table for training. | N/A
enableSparse | No | Specifies whether the input data is in the sparse format. Valid values: true and false. | false
itemDelimiter | No | The delimiter that is used to separate key-value pairs. This parameter takes effect only if enableSparse is set to true. | , (comma)
kvDelimiter | No | The delimiter that is used to separate keys and values. This parameter takes effect only if enableSparse is set to true. | : (colon)
maxIter | No | The maximum number of iterations performed by the algorithm. | 100
epsilon | No | The minimum likelihood deviance (convergence threshold). The algorithm stops if the difference in log-likelihood between two consecutive iterations is less than this value. | 0.000001
regularizedType | No | The regularization type. Valid values: l1, l2, and None. | None
regularizedLevel | No | The regularization coefficient. This parameter does not take effect if regularizedType is set to None. | 1
enableFitGoodness | No | Specifies whether to generate the model evaluation table. The metrics include R-squared, adjusted R-squared, AIC, degree of freedom, residual standard error, and residual deviance. Valid values: true and false. | false
enableCoefficientEstimate | No | Specifies whether to evaluate the regression coefficients. The metrics include the t value, the p value, and the confidence interval between the 2.5% and 97.5% bounds. This parameter takes effect only if enableFitGoodness is set to true. Valid values: true and false. | false
lifecycle | No | The lifecycle of the output model evaluation table. Unit: days. | -1
coreNum | No | The number of cores used in computing. | Determined by the system
memSizePerCore | No | The memory size of each core. Valid values: 1024 to 20480 (20 × 1024). Unit: MB. | Determined by the system
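The regularization and stopping parameters can be read against a textbook formulation of penalized least squares, shown below as a sketch rather than PAI's exact objective. It assumes that λ corresponds to regularizedLevel, that the intercept is not penalized, and that ℓ^(t) denotes the log-likelihood at iteration t; the exact scaling PAI applies to the penalty is not stated in this topic. In this formulation, L1 regularization can shrink some coefficients to exactly zero, whereas L2 regularization shrinks all coefficients toward zero without eliminating them.

```latex
\min_{\beta_0,\,\boldsymbol{\beta}}\;
  \sum_{i=1}^{n} \bigl(y_i - \beta_0 - \mathbf{x}_i^{\top}\boldsymbol{\beta}\bigr)^2
  + \lambda\, R(\boldsymbol{\beta}),
\qquad
R(\boldsymbol{\beta}) =
\begin{cases}
  \lVert \boldsymbol{\beta} \rVert_1   & \text{regularizedType} = \text{l1} \\
  \lVert \boldsymbol{\beta} \rVert_2^2 & \text{regularizedType} = \text{l2} \\
  0                                    & \text{regularizedType} = \text{None}
\end{cases}

\text{stop at iteration } t \text{ if }
  \bigl|\ell^{(t)} - \ell^{(t-1)}\bigr| < \text{epsilon}
  \text{ or } t = \text{maxIter}
```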

Example

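  1. Execute the following SQL statements to generate test data. The sparsecol1 column holds the same two features in the sparse key-value format (key 0 is x1, key 1 is x2, and zero values are omitted), with pairs separated by spaces; the commands in this example train on the dense columns x1 and x2 instead: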
    drop table if exists lm_test_input;
    create table lm_test_input as
    select
      *
    from
    (
      select 10 as y, 1.84 as x1, 1 as x2, '0:1.84 1:1' as sparsecol1 from dual
        union all
      select 20 as y, 2.13 as x1, 0 as x2, '0:2.13' as sparsecol1 from dual
        union all
      select 30 as y, 3.89 as x1, 0 as x2, '0:3.89' as sparsecol1 from dual
        union all
      select 40 as y, 4.19 as x1, 0 as x2, '0:4.19' as sparsecol1 from dual
        union all
      select 50 as y, 5.76 as x1, 0 as x2, '0:5.76' as sparsecol1 from dual
        union all
      select 60 as y, 6.68 as x1, 2 as x2, '0:6.68 1:2' as sparsecol1 from dual
        union all
      select 70 as y, 7.58 as x1, 0 as x2, '0:7.58' as sparsecol1 from dual
        union all
      select 80 as y, 8.01 as x1, 0 as x2, '0:8.01' as sparsecol1 from dual
        union all
      select 90 as y, 9.02 as x1, 3 as x2, '0:9.02 1:3' as sparsecol1 from dual
        union all
      select 100 as y, 10.56 as x1, 0 as x2, '0:10.56' as sparsecol1 from dual
    ) tmp;
  2. Run the following PAI command to train the Linear Regression model on the test data and generate the model evaluation table:
    PAI -name linearregression
        -project algo_public
        -DinputTableName=lm_test_input
        -DlabelColName=y
        -DfeatureColNames=x1,x2
        -DmodelName=lm_test_input_model_out
        -DoutputTableName=lm_test_input_conf_out
        -DenableCoefficientEstimate=true
        -DenableFitGoodness=true
        -Dlifecycle=1;
  3. Run the following PAI command to use the Prediction component to score the test data with the trained model:
    pai -name prediction
        -project algo_public
        -DmodelName=lm_test_input_model_out
        -DinputTableName=lm_test_input
        -DoutputTableName=lm_test_input_predict_out
        -DappendColNames=y;
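  4. View the generated model evaluation table lm_test_input_conf_out. The p column indicates whether a row holds a coefficient estimate (coefficient) or a goodness-of-fit metric (goodness).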
    +----------------------+---------------------+---------------------+--------+------------------------------------------+-------------+
    | colname              | value               | tscore              | pvalue | confidenceinterval                       | p           |
    +----------------------+---------------------+---------------------+--------+------------------------------------------+-------------+
    | Intercept            | -6.42378496687763   | -2.2725755951390028 | 0.06   | {"2.5%": -11.964027, "97.5%": -0.883543} | coefficient |
    | x1                   | 10.260063429838898  | 23.270944360826963  | 0.0    | {"2.5%": 9.395908, "97.5%": 11.124219}   | coefficient |
    | x2                   | 0.35374498323846265 | 0.2949247320997519  | 0.81   | {"2.5%": -1.997160, "97.5%": 2.704650}   | coefficient |
    | rsquared             | 0.9879675667384592  | NULL                | NULL   | NULL                                     | goodness    |
    | adjusted_rsquared    | 0.9845297286637332  | NULL                | NULL   | NULL                                     | goodness    |
    | aic                  | 59.331109494251805  | NULL                | NULL   | NULL                                     | goodness    |
    | degree_of_freedom    | 7.0                 | NULL                | NULL   | NULL                                     | goodness    |
    | standardErr_residual | 3.765777749448906   | NULL                | NULL   | NULL                                     | goodness    |
    | deviance             | 99.26757440771128   | NULL                | NULL   | NULL                                     | goodness    |
    +----------------------+---------------------+---------------------+--------+------------------------------------------+-------------+
  5. View the prediction result table lm_test_input_predict_out. For this regression model, prediction_result is NULL and prediction_score holds the predicted value of y.
    +-----+-------------------+--------------------+--------------------------+
    | y   | prediction_result | prediction_score   | prediction_detail        |
    +-----+-------------------+--------------------+--------------------------+
    | 10  | NULL              | 12.808476727264404 | {"y": 12.8084767272644}  |
    | 20  | NULL              | 15.43015013867922  | {"y": 15.43015013867922} |
    | 30  | NULL              | 33.48786177519568  | {"y": 33.48786177519568} |
    | 40  | NULL              | 36.565880804147355 | {"y": 36.56588080414735} |
    | 50  | NULL              | 52.674180388994415 | {"y": 52.67418038899442} |
    | 60  | NULL              | 62.82092871092313  | {"y": 62.82092871092313} |
    | 70  | NULL              | 71.34749583130122  | {"y": 71.34749583130122} |
    | 80  | NULL              | 75.75932310613193  | {"y": 75.75932310613193} |
    | 90  | NULL              | 87.1832221199846   | {"y": 87.18322211998461} |
    | 100 | NULL              | 101.92248485222113 | {"y": 101.9224848522211} |
    +-----+-------------------+--------------------+--------------------------+
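To show where the numbers in the evaluation and prediction tables come from, the following pure-Python sketch refits the example data with ordinary least squares (no regularization) and recomputes the goodness-of-fit metrics. It is an illustration, not PAI's implementation. Two choices are assumptions made because they reproduce the sample output: the AIC uses the Gaussian log-likelihood and counts the three coefficients plus the error variance, and the confidence bounds use coefficient ± 1.96 × standard error. P-values are omitted because they require the Student's t distribution. The names fit_ols, invert, and DATA are hypothetical.

```python
import math

# Test data from step 1, as (y, x1, x2) rows.
DATA = [
    (10, 1.84, 1), (20, 2.13, 0), (30, 3.89, 0), (40, 4.19, 0), (50, 5.76, 0),
    (60, 6.68, 2), (70, 7.58, 0), (80, 8.01, 0), (90, 9.02, 3), (100, 10.56, 0),
]

Z_975 = 1.96  # assumption: the sample bounds match coefficient +/- 1.96 * SE


def invert(matrix):
    """Invert a small square matrix with Gauss-Jordan elimination (partial pivoting)."""
    n = len(matrix)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_ols(rows):
    """Ordinary least squares with an intercept; returns coefficients and metrics."""
    y = [float(r[0]) for r in rows]
    X = [[1.0] + [float(v) for v in r[1:]] for r in rows]  # leading 1.0 = intercept column
    n, k = len(X), len(X[0])
    # Solve the normal equations (X^T X) coef = X^T y.
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(k)] for a in range(k)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(k)]
    xtx_inv = invert(xtx)
    coef = [sum(xtx_inv[a][b] * xty[b] for b in range(k)) for a in range(k)]

    pred = [sum(c * x for c, x in zip(coef, row)) for row in X]
    rss = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))  # reported as "deviance"
    y_mean = sum(y) / n
    tss = sum((yi - y_mean) ** 2 for yi in y)
    df = n - k  # residual degrees of freedom: n - p - 1
    sigma2 = rss / df
    se = [math.sqrt(sigma2 * xtx_inv[a][a]) for a in range(k)]
    r2 = 1.0 - rss / tss
    return {
        "coef": coef,  # [intercept, x1, x2]
        "tscore": [c / s for c, s in zip(coef, se)],
        "ci": [(c - Z_975 * s, c + Z_975 * s) for c, s in zip(coef, se)],
        "rsquared": r2,
        "adjusted_rsquared": 1.0 - (1.0 - r2) * (n - 1) / df,
        # Gaussian -2 log-likelihood plus 2 * (k coefficients + error variance)
        "aic": n * (math.log(2 * math.pi) + math.log(rss / n) + 1) + 2 * (k + 1),
        "degree_of_freedom": df,
        "standardErr_residual": math.sqrt(sigma2),
        "deviance": rss,
        "prediction": pred,
    }


if __name__ == "__main__":
    for name, value in fit_ols(DATA).items():
        print(name, value)
```

The recomputed values agree with lm_test_input_conf_out and lm_test_input_predict_out: for example, rsquared is about 0.98797, deviance is about 99.2676, degree_of_freedom is 7 (10 rows minus 2 features minus the intercept), and the first prediction is about 12.8085.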