The Least Absolute Shrinkage and Selection Operator (Lasso) regression algorithm is a shrinkage estimation method. Use this component to train Lasso regression models on sparse or dense data, with optional sample weights.
Limits
Supported computing engines: MaxCompute, Flink, or DLC.
Algorithm principles
Lasso regression builds a more parsimonious model by adding a penalty function. The penalty constrains the sum of the absolute values of the regression coefficients to be less than a fixed value, which shrinks some coefficients and sets others exactly to zero. The method therefore retains the benefits of subset selection while shrinking the remaining coefficients, and it is a biased estimator suited to data with multicollinearity.
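The constraint described above can be written out in standard notation. The symbols are not taken from this page and are assumptions for illustration: n samples with feature vectors x_i and targets y_i, p coefficients β_j, a bound t, and a penalty weight λ that presumably corresponds to the Penalty factor: lambda parameter. The first line is the constrained form, the second the equivalent penalized form. The 1/(2n) scaling is one common convention and may differ from what the component uses internally, and the intercept is omitted for brevity.

```latex
\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{n} \left( y_i - x_i^{\top}\beta \right)^2
\quad \text{subject to} \quad \sum_{j=1}^{p} |\beta_j| \le t

\hat{\beta} = \arg\min_{\beta} \; \frac{1}{2n} \sum_{i=1}^{n} \left( y_i - x_i^{\top}\beta \right)^2
+ \lambda \sum_{j=1}^{p} |\beta_j|
```

A larger λ (equivalently, a smaller t) forces more coefficients to exactly zero.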
Configure component parameters
Input ports
| Input port | Data type | Recommended upstream component | Required |
| --- | --- | --- | --- |
| Training data | None | | Yes |
| Base model | Lasso model | Lasso Regression Training | No (for incremental training) |
Component parameters
| Tab | Parameter | Description |
| --- | --- | --- |
| Field Settings | Target column name | Name of the target column in the input table. |
| Field Settings | Feature column array | Names of the feature columns used for training. Cannot be configured if Vector column name is specified. Note: Feature column array and Vector column name are mutually exclusive. Use only one of them to specify the input features for the algorithm. |
| Field Settings | Vector column name | Name of the vector column. Cannot be configured if Feature column array is specified. |
| Field Settings | Weight column name | Name of the weight column. |
| Parameter Settings | Penalty factor: lambda | Coefficient of the regularization term. Data type: DOUBLE. |
| Parameter Settings | Convergence threshold | Threshold used to determine whether the iterative method has converged. Default value: 1.0E-6. |
| Parameter Settings | Learning rate | Controls how quickly the parameters are updated during model training. Default value: 0.1. |
| Parameter Settings | Maximum number of iterations | Maximum number of iterations. Default value: 100. |
| Parameter Settings | Optimization method | Optimization method used to solve the problem. Valid values: LBFGS, GD, Newton, SGD, and OWLQN. |
| Execution Tuning | Number of workers | Used together with Memory per worker. Must be a positive integer from 1 to 9999. |
| Execution Tuning | Memory per worker (MB) | Valid values: 1024 MB to 64 × 1024 MB (65536 MB). |
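To make the roles of Penalty factor: lambda, Learning rate, Convergence threshold, and Maximum number of iterations more concrete, the following is a minimal pure-Python sketch of Lasso solved by proximal gradient descent. It is not the component's implementation, and it uses none of the optimization methods listed above. The function names `soft_threshold` and `lasso_proximal_gd`, the 1/(2n) loss scaling, the stopping rule based on the largest coefficient change, and the toy data are all assumptions made for illustration.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 inside [-threshold, threshold]."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_proximal_gd(rows, targets, lam, learning_rate=0.1,
                      max_iter=100, epsilon=1.0e-6):
    """Minimize (1/(2n)) * sum((x.w - y)^2) + lam * sum(|w|).

    Hypothetical helper for illustration: no intercept, no standardization.
    Returns the weights and the number of iterations actually run.
    """
    n = len(rows)
    dim = len(rows[0])
    weights = [0.0] * dim
    iteration = 0
    for iteration in range(1, max_iter + 1):
        # Gradient of the smooth squared-error part of the objective.
        grad = [0.0] * dim
        for x, y in zip(rows, targets):
            residual = sum(w * xi for w, xi in zip(weights, x)) - y
            for j in range(dim):
                grad[j] += residual * x[j] / n
        # Gradient step, then soft-thresholding to apply the L1 penalty.
        updated = [
            soft_threshold(w - learning_rate * g, learning_rate * lam)
            for w, g in zip(weights, grad)
        ]
        # Assumed stopping rule: largest coefficient change below epsilon.
        change = max(abs(u - w) for u, w in zip(updated, weights))
        weights = updated
        if change < epsilon:
            break
    return weights, iteration


# Toy data (made up): the target depends only on the first feature, y = 2 * f0.
rows = [[1.0, 0.5], [2.0, -0.3], [3.0, 0.8], [4.0, -0.6]]
targets = [2.0, 4.0, 6.0, 8.0]
weights, iterations = lasso_proximal_gd(rows, targets, lam=0.1)
print(weights, iterations)
```

On this toy data the coefficient of the irrelevant second feature ends at exactly zero, while the first coefficient lands slightly below 2 because the penalty shrinks it. Raising `lam` increases that shrinkage. A learning rate that is too large for the data can prevent the loop from converging within `max_iter` iterations.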
Output ports
| Output port | Data type | Downstream component |
| --- | --- | --- |
| Trained model | Regression model | |
| Model information | None | None |
| Feature importance | None | None |
| Linear model weight coefficients | None | None |
None
Configure the component by using code
Copy the following code to a PyAlink Script component to perform the same function.
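from pyalink.alink import *

def main(sources, sinks, parameter):
    # sources[0] is the training data passed in from the upstream component.
    batchData = sources[0]
    # Configure Lasso training: penalty factor, feature columns, and target column.
    lasso = LassoRegTrainBatchOp()\
        .setLambda(0.1)\
        .setFeatureCols(["f0","f1"])\
        .setLabelCol("label")
    # Train the model and write it to the first output port.
    model = batchData.link(lasso)
    model.link(sinks[0])
    BatchOperator.execute()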