When you use the Min Max Scaler Batch Predict component to normalize data in batches, you must specify a model that was trained by the Min Max Scaler Train component.
Limits
The supported compute engines are MaxCompute and Realtime Compute for Apache Flink.
Introduction
This component maps each value into the [minValue, maxValue] range by using the following formula: (value - min)/(max - min) × (maxValue - minValue) + minValue. In the formula, max is the maximum value in the column data, and min is the minimum value in the column data.
The minValue and maxValue parameters are configurable. By default, minValue is 0 and maxValue is 1.
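The formula above can be illustrated with a minimal plain-Python sketch. It is not the component's implementation: the function names `fit_min_max` and `scale` are hypothetical, and raising an error for a constant column (max equal to min) is an assumption made here, because this page does not say how the component handles that case.

```python
def fit_min_max(column):
    """Collect the column statistics that the formula needs (min and max)."""
    return min(column), max(column)


def scale(value, col_min, col_max, min_value=0.0, max_value=1.0):
    """Apply (value - min)/(max - min) * (maxValue - minValue) + minValue."""
    if col_max == col_min:
        # Assumption for this sketch only: reject constant columns
        # instead of guessing what the real component does.
        raise ValueError("max equals min; the formula is undefined")
    ratio = (value - col_min) / (col_max - col_min)
    return ratio * (max_value - min_value) + min_value


train_column = [2.0, 4.0, 6.0]
col_min, col_max = fit_min_max(train_column)

# Default target range [0, 1]
scaled_default = [scale(v, col_min, col_max) for v in train_column]
print(scaled_default)  # [0.0, 0.5, 1.0]

# Custom target range [-1, 1]
scaled_custom = [scale(v, col_min, col_max, -1.0, 1.0) for v in train_column]
print(scaled_custom)  # [-1.0, 0.0, 1.0]
```

Splitting the sketch into a statistics step and a scaling step mirrors the separation between a training component and a prediction component. By the formula alone, a value outside the [min, max] range of the column maps outside [minValue, maxValue]; whether the component clips such values is not stated on this page.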
When you use the Min Max Scaler Batch Predict component, you must specify a model generated by the Min Max Scaler Train component.
Configure the component in Machine Learning Designer
Input ports
Input port (from left to right) | Data type | Recommended upstream component | Required |
Input model of the prediction | None | Yes | |
Input data of the prediction | None | Yes |
Component parameters
Tab | Parameter | Description |
Parameter Setting | outputCols | Optional. The new column names after normalization. The number of new columns must be the same as that of old columns used in training. Separate multiple values with commas (,). |
Parameter Setting | numThreads | The number of threads used by the component. Default value: 1. |
Execution Tuning | Number of Workers | The number of workers. This parameter must be used together with the Memory per worker, unit MB parameter. The value of this parameter must be a positive integer. Valid values: [1,9999]. |
Execution Tuning | Memory per worker, unit MB | The memory size of each worker. Valid values: 1024 to 65536. Unit: MB. |
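The constraints in the preceding table can be checked before a run with a small helper such as the following sketch. The function `check_settings` and its argument names are hypothetical and are not part of Machine Learning Designer or PyAlink; only the limits themselves come from the table.

```python
def check_settings(output_cols, trained_col_count, workers, memory_mb):
    """Return a list of problems; an empty list means the settings look valid."""
    problems = []

    # outputCols is optional; if set, it is a comma-separated list whose
    # length must match the number of columns used in training.
    if output_cols:
        names = [name.strip() for name in output_cols.split(",")]
        if len(names) != trained_col_count:
            problems.append(
                f"outputCols has {len(names)} names, expected {trained_col_count}"
            )

    # Number of Workers: positive integer in [1, 9999].
    if not (isinstance(workers, int) and 1 <= workers <= 9999):
        problems.append("Number of Workers must be an integer from 1 to 9999")

    # Memory per worker: 1024 to 65536 MB.
    if not 1024 <= memory_mb <= 65536:
        problems.append("Memory per worker must be from 1024 to 65536 MB")

    return problems


ok = check_settings("a_scaled,b_scaled", 2, 4, 8192)
bad = check_settings("a_scaled", 2, 0, 512)
print(ok)        # []
print(len(bad))  # 3
```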
Output ports
Output port (from left to right) | Storage location | Recommended downstream component | Model type |
Output result | N/A | None | None |
Example
You can copy the following code to the code editor of the PyAlink Script component. This allows the PyAlink Script component to function like the Min Max Scaler Batch Predict component.
from pyalink.alink import *

def main(sources, sinks, parameter):
    # sources[0]: the model generated by the Min Max Scaler Train component
    model = sources[0]
    # sources[1]: the data to normalize
    batchData = sources[1]
    predictor = MinMaxScalerPredictBatchOp()
    result = predictor.linkFrom(model, batchData)
    result.link(sinks[0])
    BatchOperator.execute()