Platform For AI: Lasso Regression Training

Last Updated: May 31, 2023

The Least Absolute Shrinkage and Selection Operator (LASSO) regression algorithm performs shrinkage estimation: it fits a linear model while penalizing the absolute size of the regression coefficients. The Lasso Regression Training component is based on the LASSO algorithm. The component supports sparse and dense data and accepts weighted training samples. This topic describes how to configure the Lasso Regression Training component.

Limits

The supported computing engines are MaxCompute and Apache Flink.

How LASSO works

LASSO adds a penalty function to the least-squares objective to obtain a more compact model. The penalty shrinks the regression coefficients and sets some of them exactly to zero. Equivalently, LASSO minimizes the residual sum of squares subject to the constraint that the sum of the absolute values of the coefficients is less than a fixed value. This way, LASSO retains the main benefit of subset selection, a sparse model, and provides a biased estimator that remains usable on data with multicollinearity.
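The shrink-to-zero behavior described above can be illustrated with a minimal pure-Python sketch. It is not the implementation used by the component. It assumes the common objective (1/(2n)) * sum((y - Xw)^2) + lambda * sum(|w|) and solves it by cyclic coordinate descent with soft-thresholding. The function names, the toy data, and the scaling of the objective are assumptions made for illustration.

```python
def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; values in [-gamma, gamma] become exactly 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, lam, max_iter=100, epsilon=1e-6):
    """Minimize (1/(2n)) * sum((y - Xw)^2) + lam * sum(|w|), one coordinate at a time."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(d):
            rho = 0.0   # correlation of feature j with the residual that ignores feature j
            norm = 0.0  # squared norm of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(d) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                norm += X[i][j] ** 2
            new_wj = soft_threshold(rho / n, lam) / (norm / n) if norm > 0 else 0.0
            max_change = max(max_change, abs(new_wj - w[j]))
            w[j] = new_wj
        if max_change < epsilon:  # stop once no coefficient moved noticeably
            break
    return w


# Toy data (illustrative): y = 2 * f0 + 0.05 * f1, with f0 and f1 uncorrelated.
X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y = [2.05, 1.95, -1.95, -2.05]

w_plain = lasso_coordinate_descent(X, y, lam=0.0)  # no penalty: about [2.0, 0.05]
w_lasso = lasso_coordinate_descent(X, y, lam=0.1)  # penalty: about [1.9, 0.0]
print(w_plain, w_lasso)
```

With the penalty enabled, the strong coefficient is shrunk by roughly lambda, and the weak coefficient is set exactly to zero, which is the feature-selection effect of LASSO.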

Configure the component in the PAI console

  • Input ports

    Input port (left-to-right) | Data type | Recommended upstream component | Required
    data | None | | Yes
    model | LASSO model (for incremental training) | Read Table (for reading model data); Lasso Regression Training | No

  • Component parameters

    Tab | Parameter | Description
    Field Setting | labelCol | The name of the label column in the input table.
    Field Setting | featureCols | The feature columns that are used for training. Note: The featureCols and vectorCol parameters are mutually exclusive. If vectorCol is set, do not set featureCols.
    Field Setting | vectorCol | The name of the vector column. Note: The featureCols and vectorCol parameters are mutually exclusive. If featureCols is set, do not set vectorCol.
    Field Setting | weightCol | The name of the weight column.
    Parameters Setting | lambda | The regularization coefficient. The value is of the DOUBLE type.
    Parameters Setting | epsilon | The convergence threshold. Iteration stops when the training result converges to within this value. Default value: 1.0E-6.
    Parameters Setting | maxIter | The maximum number of iterations. Default value: 100.
    Parameters Setting | optimMethod | The optimization method used to solve the problem. Valid values: LBFGS, GD, Newton, SGD, and OWLQN.
    Execution Tuning | Number of Workers | The number of workers. This parameter must be used together with the Memory per worker, unit MB parameter. The value must be a positive integer. Valid values: [1,9999].
    Execution Tuning | Memory per worker, unit MB | The memory size of each worker. Valid values: 1024 to 64 × 1024. Unit: MB.
  • Output ports

    Output port | Data type | Downstream component
    model | Regression model | Lasso Regression Prediction
    model information | None | None
    feature importance | None | None
    linear model weight | None | None
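The constraints listed under Component parameters can be checked before a job is submitted. The following sketch is a hypothetical helper, not part of PAI or PyAlink. The function name and the dictionary keys numWorkers and memPerWorker are assumptions chosen for illustration. The rules it checks are the ones stated in the parameter descriptions.

```python
# Hypothetical helper: names and dictionary keys are illustrative, not a PAI or PyAlink API.
VALID_OPTIM_METHODS = {"LBFGS", "GD", "Newton", "SGD", "OWLQN"}


def check_lasso_params(params):
    """Return a list of problems found in a dict of component parameters."""
    problems = []

    # featureCols and vectorCol are mutually exclusive.
    if params.get("featureCols") and params.get("vectorCol"):
        problems.append("featureCols and vectorCol are mutually exclusive")

    # optimMethod, if given, must be one of the documented values.
    method = params.get("optimMethod")
    if method is not None and method not in VALID_OPTIM_METHODS:
        problems.append("unsupported optimMethod: %s" % method)

    workers = params.get("numWorkers")
    memory = params.get("memPerWorker")

    # Number of Workers must be used together with Memory per worker.
    if workers is not None and memory is None:
        problems.append("numWorkers requires memPerWorker")

    # Number of Workers: positive integer in [1, 9999].
    if workers is not None and not (isinstance(workers, int) and 1 <= workers <= 9999):
        problems.append("numWorkers must be an integer in [1, 9999]")

    # Memory per worker: 1024 to 64 * 1024 MB.
    if memory is not None and not (1024 <= memory <= 64 * 1024):
        problems.append("memPerWorker must be between 1024 and 65536 MB")

    return problems


print(check_lasso_params({"featureCols": ["f0", "f1"], "vectorCol": "vec"}))
```

An empty list means that none of the documented constraints is violated. The helper does not check anything that the parameter descriptions do not state.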

Configure the component by coding

You can copy the following code to the code editor of the PyAlink Script component. This allows the PyAlink Script component to function like the Lasso Regression Training component.

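from pyalink.alink import *

def main(sources, sinks, parameter):
    # The first input port supplies the training data.
    batchData = sources[0]
    # Configure LASSO training: regularization coefficient, feature columns, and label column.
    lasso = LassoRegTrainBatchOp()\
        .setLambda(0.1)\
        .setFeatureCols(["f0","f1"])\
        .setLabelCol("label")
    # Train the model and write it to the first output port.
    model = batchData.link(lasso)
    model.link(sinks[0])
    BatchOperator.execute()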