Lasso Regression Training

The Least Absolute Shrinkage and Selection Operator (LASSO) regression algorithm performs shrinkage estimation. The Lasso Regression Training component is built on the LASSO algorithm. The component supports both sparse and dense data and accepts weighted data samples for training. This topic describes how to configure the Lasso Regression Training component.

Limits

The supported computing engines are MaxCompute and Apache Flink.

How LASSO works

LASSO adds a penalty term to the regression objective to obtain a more parsimonious model. The penalty shrinks some regression coefficients and sets others exactly to zero. This is equivalent to requiring that the sum of the absolute values of the coefficients not exceed a fixed value. In this way, LASSO combines the benefits of subset selection and shrinkage, and provides biased estimation for data with multicollinearity.
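The shrink-to-zero behavior described above can be illustrated with a minimal sketch. It assumes the common formulation (1/(2n)) * sum((y - x·w)^2) + lambda * sum(|w_j|) with no intercept, and it uses plain coordinate descent with soft-thresholding. This is not the component's implementation: the component offers the optimizers listed under optimMethod, and the function names and toy data here are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero; return exactly 0.0 inside [-threshold, threshold]."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(rows, labels, lam, max_iter=100, epsilon=1e-6):
    """Minimize (1/(2n)) * sum((y - x.w)^2) + lam * sum(|w_j|), without an intercept."""
    n = len(rows)
    d = len(rows[0])
    weights = [0.0] * d
    for _ in range(max_iter):
        largest_change = 0.0
        for j in range(d):
            rho = 0.0   # correlation of feature j with the residual that ignores feature j
            norm = 0.0  # sum of squares of feature j
            for x, y in zip(rows, labels):
                partial = sum(x[k] * weights[k] for k in range(d) if k != j)
                rho += x[j] * (y - partial)
                norm += x[j] * x[j]
            new_weight = soft_threshold(rho / n, lam) / (norm / n) if norm > 0 else 0.0
            largest_change = max(largest_change, abs(new_weight - weights[j]))
            weights[j] = new_weight
        # Stop early once no coefficient moved by more than epsilon in a full pass.
        if largest_change < epsilon:
            break
    return weights


# Toy data (hypothetical): the label depends only on f0 (label = 2 * f0); f1 is noise.
rows = [[1.0, 0.5], [2.0, -0.5], [3.0, 0.5], [4.0, -0.5]]
labels = [2.0, 4.0, 6.0, 8.0]
weights = lasso_coordinate_descent(rows, labels, lam=0.1)
print(weights)
```

On this toy data, the weight of f0 is shrunk slightly below 2 and the weight of f1 is set exactly to zero, which is the feature-selection effect of the penalty. The `lam`, `max_iter`, and `epsilon` arguments play roles analogous to the lambda, maxIter, and epsilon parameters of the component, although the component's exact stopping rule may differ.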

Configure the component in the PAI console

Input ports

Input port (left-to-right)	Data type	Recommended upstream component	Required
data	None	Read Table, Feature Engineering, Data preprocessing	Yes
model	LASSO model (for incremental training)	Read Table (for reading model data), Lasso Regression Training	No

Component parameters

Tab	Parameter	Description
Field Setting	labelCol	The name of the label column in the input table.
	featureCols	The feature columns that are used for training. Note The featureCols and vectorCol parameters are mutually exclusive. You can use only one of them to describe the input features of the algorithm.
	vectorCol	The name of the vector column. Note The featureCols and vectorCol parameters are mutually exclusive. You can use only one of them to describe the input features of the algorithm.
	weightCol	The name of the weight column.
Parameters Setting	lambda	The regularization coefficient. The value is of the DOUBLE type.
	epsilon	The convergence threshold. The iteration stops when the training result converges to within this value. Default value: 1.0E-6.
	maxIter	The maximum number of iterations. Default value: 100.
	optimMethod	The optimization method used to solve the problem. Valid values: LBFGS, GD, Newton, SGD, and OWLQN.
Execution Tuning	Number of Workers	The number of workers. This parameter must be used together with the Memory per worker, unit MB parameter. The value must be a positive integer in the range [1, 9999].
	Memory per worker, unit MB	The memory size of each worker. Unit: MB. Valid values: 1024 to 65536 (64 × 1024).
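The constraints in the preceding table can be checked before a job is submitted. The following is a minimal sketch of such a check, not part of the PAI console or of PyAlink. The function name and the dictionary keys `numWorkers` and `memoryPerWorkerMB` are hypothetical names chosen for illustration. The other keys reuse the parameter names from the table.

```python
# Valid values for optimMethod, taken from the parameter table.
VALID_OPTIM_METHODS = {"LBFGS", "GD", "Newton", "SGD", "OWLQN"}


def check_training_config(config):
    """Return a list of problems found in a hypothetical parameter dictionary."""
    problems = []

    # featureCols and vectorCol are mutually exclusive.
    if config.get("featureCols") and config.get("vectorCol"):
        problems.append("featureCols and vectorCol are mutually exclusive")

    method = config.get("optimMethod")
    if method is not None and method not in VALID_OPTIM_METHODS:
        problems.append(f"unsupported optimMethod: {method}")

    # numWorkers and memoryPerWorkerMB are hypothetical keys for the two
    # Execution Tuning parameters, which must be used together.
    workers = config.get("numWorkers")
    memory = config.get("memoryPerWorkerMB")
    if (workers is None) != (memory is None):
        problems.append("numWorkers and memoryPerWorkerMB must be set together")
    if workers is not None and not (isinstance(workers, int) and 1 <= workers <= 9999):
        problems.append("numWorkers must be an integer in [1, 9999]")
    if memory is not None and not (1024 <= memory <= 64 * 1024):
        problems.append("memoryPerWorkerMB must be between 1024 and 65536")

    return problems


config = {
    "labelCol": "label",
    "featureCols": ["f0", "f1"],
    "vectorCol": "vec",  # conflicts with featureCols
    "optimMethod": "OWLQN",
    "numWorkers": 4,
    "memoryPerWorkerMB": 8192,
}
for problem in check_training_config(config):
    print(problem)
```

Returning a list of messages instead of raising on the first problem lets a caller report every misconfiguration in one pass.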

Output ports

Output port	Data type	Downstream component
model	Regression model	Lasso Regression Prediction
model information	None	None
feature importance	None	None
linear model weight	None	None

Configure the component by coding

You can copy the following code to the code editor of the PyAlink Script component. This allows the PyAlink Script component to function like the Lasso Regression Training component.

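from pyalink.alink import *

def main(sources, sinks, parameter):
    # The first input port carries the training data.
    batchData = sources[0]
    # Configure the LASSO trainer: regularization coefficient, feature columns, and label column.
    lasso = LassoRegTrainBatchOp()\
        .setLambda(0.1)\
        .setFeatureCols(["f0","f1"])\
        .setLabelCol("label")
    # Train the model and write it to the first output port.
    model = batchData.link(lasso)
    model.link(sinks[0])
    BatchOperator.execute()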
