The Linear Regression component is used to analyze the linear relationship between a dependent variable and multiple independent variables.

Configure the component

You can configure the component by using one of the following methods:
  • Use the Machine Learning Platform for AI console
    | Tab | Parameter | Description |
    | Fields Setting | Feature Columns | The feature columns that are selected from the input table for training. |
    | Fields Setting | Label Column | The label column. The columns of the DOUBLE and BIGINT types are supported. |
    | Fields Setting | Use Sparse Format | Specifies whether the input data is in the sparse format. Data in the sparse format is represented as key-value pairs. |
    | Fields Setting | KV Pair Delimiter | The delimiter that is used to separate key-value pairs. Commas (,) are used by default. |
    | Fields Setting | KV Delimiter | The delimiter that is used to separate keys and values. Colons (:) are used by default. |
    | Parameters Setting | Maximum Iterations | The maximum number of iterations performed by the algorithm. |
    | Parameters Setting | Minimum Likelihood Deviance | The algorithm stops if the difference in log-likelihood between two consecutive iterations is less than the value specified by this parameter. |
    | Parameters Setting | Regularization Type | The regularization type. Valid values: L1, L2, and None. |
    | Parameters Setting | Regularization Coefficient | The regularization coefficient. This parameter does not take effect if Regularization Type is set to None. |
    | Parameters Setting | Generate Model Evaluation Table | Specifies whether to generate the model evaluation table. The metrics include R-squared, adjusted R-squared, AIC, degrees of freedom, residual standard deviation, and residual deviance. |
    | Parameters Setting | Regression Coefficient Evaluation | Specifies whether to evaluate the regression coefficients. The metrics include the t value and p value, and the confidence interval is [2.5%,97.5%]. This parameter is valid only if Generate Model Evaluation Table is selected. |
    | Tuning | Number of Computing Cores | The number of cores. By default, the system determines the value. |
    | Tuning | Memory Size per Core | The memory size of each core. By default, the system determines the value. |
  • Use commands
    PAI -name linearregression
        -project algo_public
        -DinputTableName=lm_test_input
        -DfeatureColNames=x1,x2
        -DlabelColName=y
        -DmodelName=lm_test_input_model_out;
    | Parameter | Required | Description | Default value |
    | inputTableName | Yes | The name of the input table. | N/A |
    | modelName | Yes | The name of the output model. | N/A |
    | outputTableName | No | The name of the output model evaluation table. This parameter is required if the enableFitGoodness parameter is set to true. | N/A |
    | labelColName | Yes | The label column. This parameter specifies the dependent variable. The columns of the DOUBLE and BIGINT types are supported. You can select only one column. | N/A |
    | featureColNames | Yes | The feature columns. This parameter specifies the independent variables. If data in the input table is in the dense format, the columns of the DOUBLE and BIGINT types are supported. If the input data is in the sparse format, only the columns of the STRING type are supported. | N/A |
    | inputTablePartitions | No | The partitions that are selected from the input table for training. | N/A |
    | enableSparse | No | Specifies whether data in the input table is in the sparse format. Valid values: true and false. | false |
    | itemDelimiter | No | The delimiter that is used to separate key-value pairs. This parameter is valid if the enableSparse parameter is set to true. | , |
    | kvDelimiter | No | The delimiter that is used to separate keys and values. This parameter is valid if the enableSparse parameter is set to true. | : |
    | maxIter | No | The maximum number of iterations performed by the algorithm. | 100 |
    | epsilon | No | The minimum likelihood error. The algorithm stops if the difference in log-likelihood between two consecutive iterations is less than the value specified by this parameter. | 0.000001 |
    | regularizedType | No | The regularization type. Valid values: l1, l2, and None. | None |
    | regularizedLevel | No | The regularization coefficient. This parameter does not take effect if the regularizedType parameter is set to None. | 1 |
    | enableFitGoodness | No | Specifies whether to generate the model evaluation table. The metrics include R-squared, adjusted R-squared, AIC, degrees of freedom, residual standard deviation, and residual deviance. Valid values: true and false. | false |
    | enableCoefficientEstimate | No | Specifies whether to evaluate the regression coefficients. The metrics include the t value and p value, and the confidence interval is [2.5%,97.5%]. This parameter is valid if the enableFitGoodness parameter is set to true. Valid values: true and false. | false |
    | lifecycle | No | The lifecycle of the output model evaluation table. | -1 |
    | coreNum | No | The number of cores used in computing. | Determined by the system |
    | memSizePerCore | No | The memory size of each core. Valid values: 1024 to 20 × 1024. Unit: MB. | Determined by the system |
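
To make the sparse format concrete, the following Python sketch parses one sparse cell into index-value pairs and expands it into a dense row. It is an illustration only, not the component's implementation. The function names parse_sparse_row and to_dense are hypothetical, and two points are assumptions rather than documented behavior: keys are integer feature indexes, and a key that is absent from a row stands for 0.

```python
def parse_sparse_row(row, item_delimiter=",", kv_delimiter=":"):
    """Parse one sparse-format cell such as '0:1.84,1:1' into {index: value}.

    The default delimiters mirror the documented defaults of itemDelimiter
    and kvDelimiter. Integer keys are an assumption made for this sketch.
    """
    features = {}
    for item in row.split(item_delimiter):
        item = item.strip()
        if not item:
            continue  # tolerate empty segments such as a trailing delimiter
        key, value = item.split(kv_delimiter, 1)
        features[int(key)] = float(value)
    return features


def to_dense(features, width):
    """Expand a parsed row to a dense list of length `width`.

    Assumption: a missing key means the feature value is 0.
    """
    return [features.get(index, 0.0) for index in range(width)]


# A comma-delimited cell, matching the default itemDelimiter.
print(parse_sparse_row("0:1.84,1:1"))
# A space-delimited cell needs the delimiter to be passed explicitly.
print(to_dense(parse_sparse_row("0:2.13", item_delimiter=" "), width=2))
```

Passing the delimiters as arguments mirrors how itemDelimiter and kvDelimiter are set on the command: the delimiters in the data must match the values that the component is given.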

Example

  1. Execute the following SQL statements to generate test data:
      drop table if exists lm_test_input;
      create table lm_test_input as
      select
        *
      from
      (
        select 10 as y, 1.84 as x1, 1 as x2, '0:1.84 1:1' as sparsecol1 from dual
          union all
        select 20 as y, 2.13 as x1, 0 as x2, '0:2.13' as sparsecol1 from dual
          union all
        select 30 as y, 3.89 as x1, 0 as x2, '0:3.89' as sparsecol1 from dual
          union all
        select 40 as y, 4.19 as x1, 0 as x2, '0:4.19' as sparsecol1 from dual
          union all
        select 50 as y, 5.76 as x1, 0 as x2, '0:5.76' as sparsecol1 from dual
          union all
        select 60 as y, 6.68 as x1, 2 as x2, '0:6.68 1:2' as sparsecol1 from dual
          union all
        select 70 as y, 7.58 as x1, 0 as x2, '0:7.58' as sparsecol1 from dual
          union all
        select 80 as y, 8.01 as x1, 0 as x2, '0:8.01' as sparsecol1 from dual
          union all
        select 90 as y, 9.02 as x1, 3 as x2, '0:9.02 1:3' as sparsecol1 from dual
          union all
        select 100 as y, 10.56 as x1, 0 as x2, '0:10.56' as sparsecol1 from dual
      ) tmp;
  2. Run the following PAI command to train a linear regression model and generate the model evaluation table lm_test_input_conf_out:
    PAI -name linearregression
        -project algo_public
        -DinputTableName=lm_test_input
        -DlabelColName=y
        -DfeatureColNames=x1,x2
        -DmodelName=lm_test_input_model_out
        -DoutputTableName=lm_test_input_conf_out
        -DenableCoefficientEstimate=true
        -DenableFitGoodness=true
        -Dlifecycle=1;
  3. Run the following PAI command to apply the trained model to the input table by using the Prediction component:
    PAI -name prediction
        -project algo_public
        -DmodelName=lm_test_input_model_out
        -DinputTableName=lm_test_input
        -DoutputTableName=lm_test_input_predict_out
        -DappendColNames=y;
  4. View the generated model evaluation table lm_test_input_conf_out.
    +----------------------+---------------------+---------------------+--------+------------------------------------------+-------------+
    | colname              | value               | tscore              | pvalue | confidenceinterval                       | p           |
    +----------------------+---------------------+---------------------+--------+------------------------------------------+-------------+
    | Intercept            | -6.42378496687763   | -2.2725755951390028 | 0.06   | {"2.5%": -11.964027, "97.5%": -0.883543} | coefficient |
    | x1                   | 10.260063429838898  | 23.270944360826963  | 0.0    | {"2.5%": 9.395908, "97.5%": 11.124219}   | coefficient |
    | x2                   | 0.35374498323846265 | 0.2949247320997519  | 0.81   | {"2.5%": -1.997160, "97.5%": 2.704650}   | coefficient |
    | rsquared             | 0.9879675667384592  | NULL                | NULL   | NULL                                     | goodness    |
    | adjusted_rsquared    | 0.9845297286637332  | NULL                | NULL   | NULL                                     | goodness    |
    | aic                  | 59.331109494251805  | NULL                | NULL   | NULL                                     | goodness    |
    | degree_of_freedom    | 7.0                 | NULL                | NULL   | NULL                                     | goodness    |
    | standardErr_residual | 3.765777749448906   | NULL                | NULL   | NULL                                     | goodness    |
    | deviance             | 99.26757440771128   | NULL                | NULL   | NULL                                     | goodness    |
    +----------------------+---------------------+---------------------+--------+------------------------------------------+-------------+
  5. View the prediction result table lm_test_input_predict_out:
    +-----+-------------------+--------------------+--------------------------+
    | y   | prediction_result | prediction_score   | prediction_detail        |
    +-----+-------------------+--------------------+--------------------------+
    | 10  | NULL              | 12.808476727264404 | {"y": 12.8084767272644}  |
    | 20  | NULL              | 15.43015013867922  | {"y": 15.43015013867922} |
    | 30  | NULL              | 33.48786177519568  | {"y": 33.48786177519568} |
    | 40  | NULL              | 36.565880804147355 | {"y": 36.56588080414735} |
    | 50  | NULL              | 52.674180388994415 | {"y": 52.67418038899442} |
    | 60  | NULL              | 62.82092871092313  | {"y": 62.82092871092313} |
    | 70  | NULL              | 71.34749583130122  | {"y": 71.34749583130122} |
    | 80  | NULL              | 75.75932310613193  | {"y": 75.75932310613193} |
    | 90  | NULL              | 87.1832221199846   | {"y": 87.18322211998461} |
    | 100 | NULL              | 101.92248485222113 | {"y": 101.9224848522211} |
    +-----+-------------------+--------------------+--------------------------+
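
As a cross-check on how the two output tables relate, the following Python sketch recomputes the prediction scores and the goodness-of-fit rows from the coefficients in lm_test_input_conf_out. It is an illustration under stated assumptions, not the component's implementation. The formulas are the textbook ones for ordinary least squares. The AIC convention (Gaussian log-likelihood with the error variance counted as an estimated parameter) is inferred from the example numbers, not taken from the component's documentation. All variable names are chosen for this sketch.

```python
import math

# Label and feature values copied from the lm_test_input test data.
y = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
x1 = [1.84, 2.13, 3.89, 4.19, 5.76, 6.68, 7.58, 8.01, 9.02, 10.56]
x2 = [1, 0, 0, 0, 0, 2, 0, 0, 3, 0]

# Coefficients copied from the model evaluation table.
intercept = -6.42378496687763
coef_x1 = 10.260063429838898
coef_x2 = 0.35374498323846265

# A linear model predicts intercept + coef_x1 * x1 + coef_x2 * x2.
predictions = [intercept + coef_x1 * a + coef_x2 * b for a, b in zip(x1, x2)]

n = len(y)             # number of rows
num_features = 2       # x1 and x2
dof = n - num_features - 1   # residual degrees of freedom

# Residual sum of squares (the deviance row) and total sum of squares.
rss = sum((actual - pred) ** 2 for actual, pred in zip(y, predictions))
mean_y = sum(y) / n
tss = sum((actual - mean_y) ** 2 for actual in y)

r_squared = 1 - rss / tss
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / dof
residual_std = math.sqrt(rss / dof)

# Assumed AIC convention: 2k - 2 * logL, where k counts the intercept,
# both slopes, and the error variance.
log_likelihood = -n / 2 * (math.log(2 * math.pi) + math.log(rss / n) + 1)
aic = 2 * (num_features + 2) - 2 * log_likelihood

print("first prediction:", predictions[0])
print("rsquared:", r_squared)
print("adjusted_rsquared:", adj_r_squared)
print("aic:", aic)
print("degree_of_freedom:", dof)
print("standardErr_residual:", residual_std)
print("deviance:", rss)
```

Under these assumptions, the printed values should agree with the prediction_score column and the goodness rows of the example tables to within rounding. This also suggests that the deviance row is the residual sum of squares.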