The Least Absolute Shrinkage and Selection Operator (LASSO) regression algorithm is a shrinkage estimation method. The Lasso Regression Training component is built on the LASSO algorithm. The component supports both sparse and dense data and allows weighted samples to be used for training. This topic describes how to configure the Lasso Regression Training component.
Limits
The supported computing engines are MaxCompute and Apache Flink.
How LASSO works
LASSO adds a penalty function to the regression objective to obtain a more parsimonious model. The penalty shrinks some regression coefficients and sets others exactly to zero. It does so by constraining the sum of the absolute values of the coefficients to be less than a fixed value. This way, LASSO retains the benefits of subset selection and provides a biased estimator that is suited to data with multicollinearity.
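The shrinkage behavior described above can be illustrated with a small pure-Python sketch. It minimizes (1/(2n)) * sum((y - x·w)^2) + lam * sum(|w_j|) by coordinate descent with soft-thresholding. This is an illustration under stated assumptions, not the solver that the component runs: the function names, the toy data, the objective scaling, and the omission of an intercept are all choices made for this example.

```python
def soft_threshold(value, threshold):
    """Move value toward zero by threshold; values within the band become 0.0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(rows, labels, lam, max_iter=100, epsilon=1e-6):
    """Illustrative LASSO fit (no intercept) by cyclic coordinate descent."""
    n = len(rows)
    d = len(rows[0])
    weights = [0.0] * d
    for _ in range(max_iter):
        largest_change = 0.0
        for j in range(d):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # mean squared value of feature j
            for x, y in zip(rows, labels):
                pred_without_j = sum(
                    weights[k] * x[k] for k in range(d) if k != j
                )
                rho += x[j] * (y - pred_without_j)
                z += x[j] * x[j]
            rho /= n
            z /= n
            new_w = soft_threshold(rho, lam) / z if z > 0 else 0.0
            largest_change = max(largest_change, abs(new_w - weights[j]))
            weights[j] = new_w
        # Stop early once no coefficient moved by more than epsilon.
        if largest_change < epsilon:
            break
    return weights


# Toy data (made up for this sketch): label = 2 * f0 + 0.1 * f1.
rows = [(1.0, 1.0), (2.0, -1.0), (3.0, 1.0), (4.0, -1.0)]
labels = [2.1, 3.9, 6.1, 7.9]

unpenalized = lasso_coordinate_descent(rows, labels, lam=0.0)
penalized = lasso_coordinate_descent(rows, labels, lam=0.5)
print("lam=0.0:", unpenalized)
print("lam=0.5:", penalized)
```

With lam=0.0 the fit recovers both coefficients. With lam=0.5 the weak f1 coefficient is set exactly to zero and the f0 coefficient is shrunk below 2, which is the selection and shrinkage effect described above.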
Configure the component in the PAI console
Input ports
| Input port (left-to-right) | Data type | Recommended upstream component | Required |
| --- | --- | --- | --- |
| data | None | | Yes |
| model | LASSO model (for incremental training) | Read Table (for reading model data), Lasso Regression Training | No |
Component parameters
| Tab | Parameter | Description |
| --- | --- | --- |
| Field Setting | labelCol | The name of the label column in the input table. |
| Field Setting | featureCols | The feature columns that are used for training. This parameter cannot be set if vectorCol is set. Note: The featureCols and vectorCol parameters are mutually exclusive. You can use only one of them to describe the input features of the algorithm. |
| Field Setting | vectorCol | The name of the vector column. This parameter cannot be set if featureCols is set. Note: The featureCols and vectorCol parameters are mutually exclusive. You can use only one of them to describe the input features of the algorithm. |
| Field Setting | weightCol | The name of the weight column. |
| Parameters Setting | lambda | The regularization coefficient. The value is of the DOUBLE type. |
| Parameters Setting | epsilon | The convergence threshold. Iteration stops when the training result converges to within this value. Default value: 1.0E-6. |
| Parameters Setting | maxIter | The maximum number of iterations. Default value: 100. |
| Parameters Setting | optimMethod | The optimization method used to solve the problem. Valid values: LBFGS, GD, Newton, SGD, and OWLQN. |
| Execution Tuning | Number of Workers | The number of workers. This parameter must be used together with the Memory per worker, unit MB parameter. The value must be a positive integer. Valid values: [1,9999]. |
| Execution Tuning | Memory per worker, unit MB | The memory size of each worker. Valid values: 1024 to 64 × 1024. Unit: MB. |

Output ports
| Output port | Data type | Downstream component |
| --- | --- | --- |
| model | Regression model | |
| model information | None | None |
| feature importance | None | None |
| linear model weight | None | None |
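The constraints listed under Component parameters can be checked before a job is submitted. The following is a minimal sketch of such a check. The helper validate_lasso_params and the dictionary keys numWorkers and memoryPerWorkerMB are hypothetical names introduced for this example and are not part of PAI or Alink. The rules themselves are taken from the parameter descriptions.

```python
# Valid optimMethod values as listed in the parameter descriptions.
VALID_OPTIM_METHODS = {"LBFGS", "GD", "Newton", "SGD", "OWLQN"}


def validate_lasso_params(params):
    """Return a list of problems found in params; an empty list means none.

    Hypothetical helper: the key names mirror the documented parameters,
    except numWorkers and memoryPerWorkerMB, which are assumed names.
    """
    problems = []

    # featureCols and vectorCol are mutually exclusive.
    if params.get("featureCols") and params.get("vectorCol"):
        problems.append("featureCols and vectorCol are mutually exclusive")

    method = params.get("optimMethod")
    if method is not None and method not in VALID_OPTIM_METHODS:
        problems.append("unsupported optimMethod: %s" % method)

    workers = params.get("numWorkers")
    memory = params.get("memoryPerWorkerMB")

    # Number of Workers must be used together with Memory per worker.
    if workers is not None and memory is None:
        problems.append("numWorkers requires memoryPerWorkerMB")
    if workers is not None and not (
        isinstance(workers, int) and 1 <= workers <= 9999
    ):
        problems.append("numWorkers must be an integer in [1, 9999]")
    if memory is not None and not (1024 <= memory <= 64 * 1024):
        problems.append("memoryPerWorkerMB must be in [1024, 65536]")

    return problems


# Example: both feature parameters set, which the component does not allow.
print(validate_lasso_params({"featureCols": ["f0", "f1"], "vectorCol": "vec"}))
```

Returning a list of problems instead of raising on the first one lets a caller report every misconfiguration at once.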
Configure the component by coding
You can copy the following code to the code editor of the PyAlink Script component. This allows the PyAlink Script component to function like the Lasso Regression Training component.
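from pyalink.alink import *

def main(sources, sinks, parameter):
    # The first input port supplies the training data.
    batchData = sources[0]
    # Configure LASSO regression training.
    lasso = LassoRegTrainBatchOp()\
        .setLambda(0.1)\
        .setFeatureCols(["f0","f1"])\
        .setLabelCol("label")
    # Train the model and write it to the first output port.
    model = batchData.link(lasso)
    model.link(sinks[0])
    BatchOperator.execute()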