Ridge regression (Tikhonov regularization) is a regularization method for regression analysis of ill-posed problems. This component supports both sparse and dense data formats, as well as weighted samples.
Limits
Supported computing engines: MaxCompute, Flink, or DLC.
Algorithm principle
Ridge regression is a biased estimation method for analyzing collinear data. It modifies least-squares estimation by adding an L2 penalty on the coefficients, giving up unbiasedness in exchange for more stable and reliable regression coefficients. The trade-off costs some information and precision, but the resulting fit on ill-conditioned data is better than that of ordinary least squares.
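To make the trade-off concrete, the following minimal sketch fits two nearly collinear features with and without an L2 penalty. It is an illustration in plain Python, not the component's implementation: the helper names `solve_2x2` and `ridge_fit_2d`, the toy data, and the no-intercept setup are all assumptions made for this example.

```python
def solve_2x2(a, b, c, d, e, f):
    """Solve [[a, b], [c, d]] @ [w0, w1] = [e, f] with Cramer's rule."""
    det = a * d - b * c
    return ((e * d - b * f) / det, (a * f - e * c) / det)


def ridge_fit_2d(rows, targets, lam):
    """Closed-form ridge fit for two features and no intercept.

    Minimizes sum((y - w0*x0 - w1*x1)^2) + lam * (w0^2 + w1^2),
    that is, it solves (X^T X + lam * I) w = X^T y.
    lam = 0 reduces to ordinary least squares.
    """
    s00 = sum(x0 * x0 for x0, _ in rows)
    s01 = sum(x0 * x1 for x0, x1 in rows)
    s11 = sum(x1 * x1 for _, x1 in rows)
    t0 = sum(x0 * y for (x0, _), y in zip(rows, targets))
    t1 = sum(x1 * y for (_, x1), y in zip(rows, targets))
    return solve_2x2(s00 + lam, s01, s01, s11 + lam, t0, t1)


# Hypothetical data: x1 is x0 plus a tiny perturbation, so the
# two feature columns are nearly collinear.
rows = [(1.0, 1.001), (2.0, 1.999), (3.0, 3.001), (4.0, 3.999)]
targets = [2.1, 3.9, 6.2, 7.8]

ols_w = ridge_fit_2d(rows, targets, lam=0.0)    # ordinary least squares
ridge_w = ridge_fit_2d(rows, targets, lam=0.1)  # ridge with a small penalty

print("OLS coefficients:  ", ols_w)
print("Ridge coefficients:", ridge_w)
```

With `lam=0.0` the solver returns large coefficients of opposite sign, which is the typical symptom of collinearity. With `lam=0.1` both coefficients stay close to 1, at the cost of a small bias.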
Configure parameters visually
- Input port
Port (from left to right)
Data type
Recommended upstream components
Required
Data
None
Yes
Model
None
No
- Parameters
Tab
Parameter
Description
Field Settings
Target column name
Name of the target column in the input table.
Feature column array
Cannot be configured if Vector column name is specified.
Names of feature columns used for training.
Note: Feature column array and Vector column name are mutually exclusive. Use only one of them to specify the input features for the algorithm.
Vector column name
Cannot be configured if Feature column array is specified.
Name of the vector column.
Note: Feature column array and Vector column name are mutually exclusive. Use only one of them to specify the input features for the algorithm.
Weight column name
Name of the weight column.
Parameter Settings
Penalty factor: lambda
Coefficient of the regularization term. Data type: DOUBLE.
Convergence threshold
Threshold to determine whether the iterative method has converged. Default value: 1.0E-6.
Learning rate
Controls the speed at which parameters are updated during model training. Default value: 0.1.
Maximum number of iterations
Maximum number of iterations. Default value: 100.
Optimization method
Optimization method used to solve the problem. Valid values:
- LBFGS
- GD
- Newton
- SGD
- OWLQN
Execution Tuning
Number of workers
Number of workers. Used together with Memory per worker (MB). Must be an integer from 1 to 9999.
Memory per worker (MB)
Memory allocated to each worker, in MB. Valid values: 1024 to 65536 (64 × 1024).
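The Parameter Settings above interact as they do in any iterative solver. The sketch below shows one plausible way they fit together, using plain gradient descent on the ridge objective. It is a hypothetical illustration, not the component's actual optimizer: the function `ridge_gd`, its stopping rule (change in loss below the threshold), the loss scaling, and the toy data are assumptions. Only the default values 1.0E-6, 0.1, and 100 come from the table.

```python
import math


def ridge_gd(rows, targets, lam, learning_rate=0.1, epsilon=1.0e-6, max_iter=100):
    """Gradient descent on (mean squared error) / 2 + (lam / 2) * ||w||^2.

    Returns (weights, iterations_used, converged).
    """
    n = len(rows)
    dim = len(rows[0])
    w = [0.0] * dim
    prev_loss = None
    for it in range(1, max_iter + 1):
        residuals = [sum(wj * xj for wj, xj in zip(w, x)) - y
                     for x, y in zip(rows, targets)]
        loss = (sum(r * r for r in residuals) / (2 * n)
                + 0.5 * lam * sum(wj * wj for wj in w))
        # Convergence threshold: stop once the loss barely changes.
        if prev_loss is not None and abs(prev_loss - loss) < epsilon:
            return w, it, True
        grad = [sum(r * x[j] for r, x in zip(residuals, rows)) / n + lam * w[j]
                for j in range(dim)]
        # Learning rate: scales the size of each update.
        w = [wj - learning_rate * gj for wj, gj in zip(w, grad)]
        prev_loss = loss
    # Maximum number of iterations reached without meeting the threshold.
    return w, max_iter, False


def norm(w):
    """Euclidean length of a weight vector."""
    return math.sqrt(sum(wj * wj for wj in w))


# Hypothetical data generated from y = 2 * x0 + 1 * x1 (no noise, no intercept).
rows = [(0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5)]
targets = [1.0, 2.0, 3.0, 1.5]

w_weak, iters_weak, ok_weak = ridge_gd(rows, targets, lam=0.01, max_iter=1000)
w_strong, iters_strong, ok_strong = ridge_gd(rows, targets, lam=1.0, max_iter=1000)
_, iters_capped, ok_capped = ridge_gd(rows, targets, lam=0.1, max_iter=5)

print("lam=0.01:", w_weak, "norm", norm(w_weak))
print("lam=1.0: ", w_strong, "norm", norm(w_strong))
print("capped at 5 iterations, converged:", ok_capped)
```

A larger `lam` shrinks the fitted weights toward zero, and the run capped at 5 iterations stops before the threshold is met, so `ok_capped` is `False`.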
- Output ports
Port (from left to right)
Data type
Downstream components
Model
Regression model
Model information
None
None
Feature importance
None
None
Linear model weight coefficients
None
None
Configure using code
Copy the following code into a PyAlink Script component to achieve the same functionality.
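from pyalink.alink import *

def main(sources, sinks, parameter):
    # The first input port supplies the training data.
    batchData = sources[0]
    # Configure the ridge regression trainer.
    ridge = RidgeRegTrainBatchOp()\
        .setLambda(0.1)\
        .setFeatureCols(["f0", "f1"])\
        .setLabelCol("label")
    # Train the model and write it to the first output port.
    model = batchData.link(ridge)
    model.link(sinks[0])
    BatchOperator.execute()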