Platform for AI: Min Max Scaler Train

Last Updated: Mar 08, 2024

Columns with different magnitudes and value ranges can distort downstream processing. To reduce this effect during data preprocessing, you can normalize the column data. After normalization, data in different columns falls within the same value range.

Limits

The supported computing engines are MaxCompute and Apache Flink.

Introduction

This component maps each value in a selected column into the closed interval [minValue, maxValue] by using the following formula: (value - min) / (max - min) × (maxValue - minValue) + minValue. In the formula, max and min are the maximum and minimum values of the column data.

minValue and maxValue can be customized by using the min and max parameters on the Parameter Setting tab. By default, minValue is set to 0 and maxValue to 1.
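The formula can be checked by hand with a few lines of plain Python. This is a minimal sketch and not the component's implementation: the function name `min_max_scale` and its argument names are hypothetical, and raising an error for a constant column is an assumption, because this page does not describe how the component handles that case.

```python
def min_max_scale(values, min_value=0.0, max_value=1.0):
    """Apply (value - min) / (max - min) * (maxValue - minValue) + minValue."""
    col_min = min(values)  # "min" in the formula: smallest value in the column
    col_max = max(values)  # "max" in the formula: largest value in the column
    span = col_max - col_min
    if span == 0:
        # Assumption: the formula is undefined when all values are equal,
        # so this sketch refuses to guess a result.
        raise ValueError("column is constant; max - min is zero")
    return [(v - col_min) / span * (max_value - min_value) + min_value
            for v in values]


default_range = min_max_scale([1.0, 2.0, 3.0])            # target range [0, 1]
custom_range = min_max_scale([10, 20, 30], -1.0, 1.0)     # target range [-1, 1]
print(default_range)
print(custom_range)
```

With the default range, the smallest column value maps to minValue and the largest to maxValue. Every other value lands proportionally in between.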

Running this component generates a min-max normalization model. You can then specify this model as the input of the Normalization component.
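The split between training a model and applying it later can be pictured as follows. This is an illustrative sketch only: the functions `train_min_max` and `apply_min_max` and the dictionary layout of the model are hypothetical and do not reflect the model format that the component actually writes. The sketch assumes that the model stores the per-column minimum and maximum together with the target range, and it does not handle constant columns.

```python
def train_min_max(rows, selected_cols, min_value=0.0, max_value=1.0):
    """Training step: record each selected column's min and max."""
    stats = {}
    for col in selected_cols:
        column = [row[col] for row in rows]
        stats[col] = (min(column), max(column))
    # Hypothetical model layout: column statistics plus the target range.
    return {"stats": stats, "min_value": min_value, "max_value": max_value}


def apply_min_max(model, rows):
    """Prediction step: rescale new rows with the stored statistics."""
    lo_out, hi_out = model["min_value"], model["max_value"]
    scaled = []
    for row in rows:
        out = dict(row)  # columns that were not selected pass through unchanged
        for col, (col_min, col_max) in model["stats"].items():
            ratio = (row[col] - col_min) / (col_max - col_min)
            out[col] = ratio * (hi_out - lo_out) + lo_out
        scaled.append(out)
    return scaled


train_rows = [
    {"col1": "a", "col2": 2.0, "col3": 10.0},
    {"col1": "b", "col2": 4.0, "col3": 30.0},
]
model = train_min_max(train_rows, ["col2", "col3"])
new_rows = [{"col1": "c", "col2": 3.0, "col3": 20.0}]
scaled_rows = apply_min_max(model, new_rows)
print(scaled_rows)
```

Because the statistics come from the training data, a value in new data that lies outside the training range maps outside [minValue, maxValue] in this sketch.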

Configure the component in Machine Learning Designer

Input ports

| Input port (from left to right) | Data type | Recommended upstream component | Required |
| --- | --- | --- | --- |
| data | Integer | None | Yes |

Component parameters

| Tab | Parameter | Description |
| --- | --- | --- |
| Field Setting | selectedCols | The columns that you want to process. Only columns of the numeric type can be selected. |
| Parameter Setting | max | The upper limit of the value range. The value must be of the DOUBLE type. Default value: 1.0. |
| Parameter Setting | min | The lower limit of the value range. The value must be of the DOUBLE type. Default value: 0.0. |
| Execution Tuning | Number of Workers | The number of workers. This parameter must be used together with the Memory per worker, unit MB parameter. The value must be a positive integer. Valid values: [1,9999]. |
| Execution Tuning | Memory per worker, unit MB | The memory size of each worker. Valid values: 1024 to 65536 (64 × 1024). Unit: MB. |

Output ports

| Output port (from left to right) | Storage | Recommended downstream component | Model type |
| --- | --- | --- | --- |
| model | N/A | Min Max Scaler Batch Predict | None |

Example

To make the PyAlink Script component function like this component, copy the following code to the code editor of the PyAlink Script component.

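from pyalink.alink import *

def main(sources, sinks, parameter):
    # Training data arrives on the first input port.
    data = sources[0]
    # Numeric columns to normalize.
    selectedColNames = ["col2", "col3"]
    # Train a min-max normalization model on the selected columns.
    trainOp = MinMaxScalerTrainBatchOp()\
               .setSelectedCols(selectedColNames)
    result = trainOp.linkFrom(data)
    # Write the model to the first output port, then run the job.
    result.link(sinks[0])
    BatchOperator.execute()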