Standard Scaler Batch Predict is a data preprocessing component that standardizes batch data to reduce the impact of differing scales and ranges across columns. The algorithm assumes that the data approximately follows a normal distribution and standardizes each column by using its mean and variance, which puts the columns on a comparable scale. This improves the stability and accuracy of model training and prediction. The component is well suited to large-scale datasets, where it helps keep the data distribution consistent.
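The standardization described above can be illustrated with a minimal, standard-library sketch. It is not the component's implementation: the helper names `fit_standard_scaler` and `standardize` are hypothetical, the use of the population standard deviation is an assumption, and mapping constant columns to 0.0 is a choice made only for this illustration.

```python
from statistics import fmean, pstdev

def fit_standard_scaler(rows):
    """Compute the per-column mean and standard deviation of the training rows."""
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    # Assumption: population standard deviation; a real scaler may use the sample form.
    stds = [pstdev(col) for col in columns]
    return means, stds

def standardize(rows, means, stds):
    """Apply z = (x - mean) / std column by column."""
    return [
        # Assumption: a constant column (std == 0) is mapped to 0.0.
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, stds)]
        for row in rows
    ]

# Two columns on very different scales.
train = [[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]]
means, stds = fit_standard_scaler(train)
scaled = standardize(train, means, stds)
```

After scaling, both columns have a mean of 0 and a standard deviation of 1, so neither dominates because of its original units.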
Limits
The supported compute engines are MaxCompute and Realtime Compute for Apache Flink.
Configure the component in Machine Learning Designer
Input ports
Input port (from left to right) | Data type | Recommended upstream component | Required |
Input model of the prediction | None | | Yes |
Input data of the prediction | Numeric type | | Yes |
Component parameters
Tab | Parameter | Description |
Parameter Setting | outputCols | Optional. The names of the output columns. If you do not specify this parameter, the generated prediction result columns replace the original input columns. If you specify it, the number of output columns must be the same as the number of columns selected for training. Separate multiple column names with commas (,). |
Parameter Setting | numThreads | The number of threads used by this component. Default value: 1. |
Execution Tuning | Number of Workers | The number of workers. This parameter must be used together with the Memory per worker, unit MB parameter. The value of this parameter must be a positive integer. Valid values: [1,9999]. |
Execution Tuning | Memory per worker, unit MB | The memory size of each worker. Valid values: 1024 to 65536. Unit: MB. |
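The outputCols rule in the table can be sketched as a small check. This is only an illustration of the documented behavior: `resolve_output_cols` is a hypothetical helper, not part of Machine Learning Designer or Alink, and the column names are made up.

```python
def resolve_output_cols(selected_cols, output_cols=None):
    """Return the names of the columns that receive the standardized values.

    selected_cols: the columns selected for training.
    output_cols:   an optional comma-separated string, as entered for outputCols.
    """
    if output_cols is None:
        # Default behavior: the results replace the original input columns.
        return list(selected_cols)
    names = [name.strip() for name in output_cols.split(",")]
    if len(names) != len(selected_cols):
        raise ValueError(
            f"expected {len(selected_cols)} output columns, got {len(names)}"
        )
    return names

in_place = resolve_output_cols(["f0", "f1"])
renamed = resolve_output_cols(["f0", "f1"], "f0_std, f1_std")
```

Here `in_place` keeps the names `f0` and `f1`, while `renamed` writes the results to two new columns. Passing a single name for two trained columns raises an error.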
Output ports
Output port (from left to right) | Storage location | Recommended downstream component | Model type |
Output result | N/A | None | None |
Example
You can copy the following code to the code editor of the PyAlink Script component. This allows the PyAlink Script component to function like the Standard Scaler Batch Predict component.
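from pyalink.alink import *

def main(sources, sinks, parameter):
    # sources[0]: the input model of the prediction; sources[1]: the input data.
    model = sources[0]
    batchData = sources[1]

    # Apply the model to the batch data and write the result to the first output port.
    predictor = StandardScalerPredictBatchOp()
    result = predictor.linkFrom(model, batchData)
    result.link(sinks[0])
    BatchOperator.execute()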