The Ridge Regression Prediction component supports sparse and dense data formats. It is used to predict numeric variables, such as housing prices, sales volumes, and humidity. This topic describes how to configure the Ridge Regression Prediction component.
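At prediction time, a ridge regression model behaves like any linear model. The predicted value is the intercept plus the dot product of the coefficients and the feature vector. The plain-Python sketch below shows why dense and sparse inputs give the same prediction. It assumes the model includes an intercept. The names `predict`, `coefficients`, and `intercept`, and the list/dict encodings of dense and sparse vectors, are illustrative assumptions and not the component's internal format.

```python
from typing import Dict, List, Union

# Illustrative encodings (assumption): a dense vector is a list of floats,
# a sparse vector is a dict mapping feature index -> non-zero value.
Vector = Union[List[float], Dict[int, float]]


def predict(coefficients: List[float], intercept: float, x: Vector) -> float:
    """Return intercept + dot(coefficients, x) for a dense or sparse x."""
    if isinstance(x, dict):
        # Sparse: only the stored (non-zero) entries contribute.
        return intercept + sum(coefficients[i] * v for i, v in x.items())
    # Dense: every position contributes, including explicit zeros.
    return intercept + sum(w * v for w, v in zip(coefficients, x))


coef = [0.5, -1.0, 2.0, 0.0]
dense = [1.0, 0.0, 3.0, 0.0]
sparse = {0: 1.0, 2: 3.0}  # the same vector with its zeros omitted
print(predict(coef, 10.0, dense))   # 16.5
print(predict(coef, 10.0, sparse))  # 16.5
```

The sparse branch skips zero entries. This is why sparse input is efficient when most features are zero, and the result is unchanged.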
Limits
The component can run on the MaxCompute, Flink, or DLC computing engine.
Algorithm principle
Ridge regression is a biased estimation method for analyzing collinear data and an improved form of least-squares estimation. It gives up the unbiasedness of ordinary least squares in exchange for regression coefficients that are more stable and reliable in practice. The cost is some loss of information and precision. On ill-conditioned data, however, it produces more robust results than ordinary least squares.
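Ridge regression minimizes the squared error plus an L2 penalty on the coefficients, ||y - Xβ||² + λ||β||², which has the closed-form solution β = (XᵀX + λI)⁻¹Xᵀy. Adding λI keeps the matrix invertible even when features are collinear. The NumPy sketch below illustrates this on two perfectly collinear features. It omits the intercept and feature scaling. The `ridge_fit` function and `lam` parameter are illustrative and do not represent this component's implementation or parameter names.

```python
import numpy as np


def ridge_fit(X: np.ndarray, y: np.ndarray, lam: float) -> np.ndarray:
    """Closed-form ridge estimate: solve (X^T X + lam * I) beta = X^T y.

    Sketch only: no intercept term and no feature scaling.
    """
    n_features = X.shape[1]
    A = X.T @ X + lam * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)


# Two perfectly collinear features: the second column duplicates the first.
x = np.array([1.0, 2.0, 3.0])
X = np.column_stack([x, x])
y = 2.0 * x

# X^T X has rank 1, so least squares has no unique solution:
# any beta with beta[0] + beta[1] == 2 fits y exactly.
print(np.linalg.matrix_rank(X.T @ X))  # 1

beta = ridge_fit(X, y, lam=1.0)
print(beta)  # both coefficients equal 28/29, about 0.9655
```

With λ = 1, both coefficients equal 28/29, so their sum is about 1.93 rather than the true total of 2. The estimate is biased toward zero, but it is unique and stable. This is the trade-off described above.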
Visual configuration of component parameters
- Input ports
Input port (from left to right)
Data type
Recommended upstream component
Required
Input model for prediction
None
Yes
Input data
None
Yes
- Component parameters
Tab | Parameter | Description
Field Settings | Reserved Algorithm Column Names | The input columns to keep in the output alongside the prediction result.
Field Settings | Vector column | The name of the column that contains the feature vectors.
Parameter Settings | Prediction result column | The name of the output column that stores the predicted values.
Parameter Settings | Number of threads | The number of threads used by the component. Default value: 1.
Execution Tuning | Number of workers | The number of workers. Used together with Memory per worker (MB). The value must be an integer from 1 to 9999.
Execution Tuning | Memory per worker (MB) | The memory allocated to each worker. The value must be from 1024 to 65536 (64 × 1024).
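The ranges on the Execution Tuning tab can be checked before a job is submitted. The following sketch is a hypothetical helper, `check_execution_tuning`, and is not part of PAI or PyAlink. It encodes the documented limits (1 to 9999 workers, 1024 to 65536 MB per worker). Assuming each worker receives the full per-worker amount, it also reports the total memory that the two settings request.

```python
# Documented ranges for the Execution Tuning tab.
MIN_WORKERS, MAX_WORKERS = 1, 9999
MIN_MEMORY_MB, MAX_MEMORY_MB = 1024, 64 * 1024  # 64 x 1024 MB = 65536 MB


def check_execution_tuning(num_workers: int, memory_per_worker_mb: int) -> int:
    """Validate the two settings and return the total memory they request, in MB.

    Hypothetical helper: the total assumes every worker receives the full
    per-worker amount.
    """
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(num_workers, bool) or not isinstance(num_workers, int):
        raise ValueError("Number of workers must be an integer")
    if not MIN_WORKERS <= num_workers <= MAX_WORKERS:
        raise ValueError(f"Number of workers must be from {MIN_WORKERS} to {MAX_WORKERS}")
    if not MIN_MEMORY_MB <= memory_per_worker_mb <= MAX_MEMORY_MB:
        raise ValueError(
            f"Memory per worker must be from {MIN_MEMORY_MB} to {MAX_MEMORY_MB} MB"
        )
    return num_workers * memory_per_worker_mb


print(check_execution_tuning(4, 8192))  # 32768
```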
Configure the component using code
Copy the following code to a PyAlink Script component to perform the same function as this component.
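from pyalink.alink import *


def main(sources, sinks, parameter):
    # Input ports, left to right: sources[0] is the trained model,
    # sources[1] is the data to predict on.
    model = sources[0]
    batchData = sources[1]
    # Write the predicted values to a new column named "pred".
    predictor = RidgeRegPredictBatchOp()\
        .setPredictionCol("pred")
    result = predictor.linkFrom(model, batchData)
    # Send the prediction results to the first output port.
    result.link(sinks[0])
    BatchOperator.execute()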