Platform for AI: Imputer Train

Last Updated: Mar 08, 2024

The Imputer Train component trains models to auto-populate missing values. The following imputation policies are supported: MEAN, MIN, MAX, and VALUE.

Limits

The supported compute engines are MaxCompute and Flink.

Introduction

The Imputer Train component trains a model that fills in missing values. It supports four imputation policies: MEAN, MIN, MAX, and VALUE. If you specify the VALUE policy, you must also set the fillValue parameter, which is available only for that policy.
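The four policies can be illustrated with a minimal plain-Python sketch. It is not the component's implementation: the function names `fill_value_for` and `impute` are hypothetical, missing values are represented as `None`, and the sketch assumes the fill value is computed per column from the non-missing entries.

```python
from statistics import mean

def fill_value_for(values, strategy="MEAN", fill_value=None):
    """Return the replacement for missing entries (None) in one numeric column.

    Hypothetical helper for illustration only, not part of PAI or Alink.
    """
    if strategy == "VALUE":
        # VALUE uses the caller-supplied constant, so fill_value is required.
        if fill_value is None:
            raise ValueError("fillValue is required when strategy is VALUE")
        return fill_value
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("column has no non-missing values")
    if strategy == "MEAN":
        return mean(present)
    if strategy == "MIN":
        return min(present)
    if strategy == "MAX":
        return max(present)
    raise ValueError(f"unsupported strategy: {strategy}")

def impute(values, strategy="MEAN", fill_value=None):
    """Replace every None in the column with the value chosen by the strategy."""
    fill = fill_value_for(values, strategy, fill_value)
    return [fill if v is None else v for v in values]

column = [1.0, None, 3.0, None, 8.0]
print(impute(column, "MEAN"))        # [1.0, 4.0, 3.0, 4.0, 8.0]
print(impute(column, "MIN"))         # [1.0, 1.0, 3.0, 1.0, 8.0]
print(impute(column, "MAX"))         # [1.0, 8.0, 3.0, 8.0, 8.0]
print(impute(column, "VALUE", 0.0))  # [1.0, 0.0, 3.0, 0.0, 8.0]
```

In this sketch, `fill_value_for` plays the role of training (deciding the fill value) and `impute` plays the role of applying it, which loosely mirrors the split between Imputer Train and the downstream Imputer Predict component.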

Configure the component in Machine Learning Designer

Input ports

| Input port (from left to right) | Data type | Recommended upstream component | Required |
| --- | --- | --- | --- |
| data | Structured data stored in MaxCompute or Object Storage Service (OSS) | Read Table, Read File Data | Yes |

Component parameters

| Tab | Parameter | Description |
| --- | --- | --- |
| Field Setting | selectedCols | The names of the numeric columns in which you want to fill missing values. |
| Parameter Setting | fillValue | The custom value used to fill missing values. This parameter is available and required only if you set the strategy parameter to VALUE. |
| Parameter Setting | strategy | The policy used to fill missing values. Default value: MEAN. Valid values: MEAN (fill with the mean value), MIN (fill with the minimum value), MAX (fill with the maximum value), VALUE (fill with the custom value specified by fillValue). |
| Execution Tuning | Number of Workers | The number of workers. This parameter must be used together with the Memory per worker, unit MB parameter. The value must be a positive integer in the range [1, 9999]. |
| Execution Tuning | Memory per worker, unit MB | The memory size of each worker. Valid values: 1024 to 65536. Unit: MB. |
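The constraints on these parameters can be checked before a job is submitted. The following is a minimal sketch, assuming the settings are collected as plain Python values; the function `validate_imputer_config` and its argument names are hypothetical and are not part of any PAI or Alink API.

```python
VALID_STRATEGIES = {"MEAN", "MIN", "MAX", "VALUE"}

def validate_imputer_config(strategy="MEAN", fill_value=None,
                            num_workers=None, memory_per_worker_mb=None):
    """Return a list of problems found in the settings (empty if none).

    Hypothetical helper that encodes only the rules stated in the
    parameter descriptions above.
    """
    errors = []
    if strategy not in VALID_STRATEGIES:
        errors.append(f"strategy must be one of {sorted(VALID_STRATEGIES)}")
    # fillValue is required only for the VALUE strategy.
    if strategy == "VALUE" and fill_value is None:
        errors.append("fillValue is required when strategy is VALUE")
    # The two execution tuning parameters must be set together.
    if (num_workers is None) != (memory_per_worker_mb is None):
        errors.append("Number of Workers and Memory per worker must be set together")
    if num_workers is not None:
        if not isinstance(num_workers, int) or not 1 <= num_workers <= 9999:
            errors.append("Number of Workers must be an integer in [1, 9999]")
    if memory_per_worker_mb is not None:
        if not 1024 <= memory_per_worker_mb <= 65536:
            errors.append("Memory per worker must be between 1024 and 65536 MB")
    return errors

print(validate_imputer_config())                  # []
print(validate_imputer_config(strategy="VALUE"))  # one error: fillValue missing
print(validate_imputer_config(num_workers=4))     # one error: memory not set
```

Returning a list of messages instead of raising on the first problem lets a caller report every misconfigured setting at once.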

Output ports

| Output port (from left to right) | Recommended downstream component | Model type |
| --- | --- | --- |
| model | Imputer Predict | Imputer Model |

Example

You can copy the following code to the code editor of the PyAlink Script component. This allows the PyAlink Script component to function like the Imputer Train component.

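from pyalink.alink import *

def main(sources, sinks, parameter):
    # The first input port supplies the training data.
    data = sources[0]
    # Numeric columns whose missing values should be filled.
    selectedColNames = ["col2", "col3"]
    # No strategy is set here, so the default policy (MEAN) applies.
    trainOp = ImputerTrainBatchOp()\
               .setSelectedCols(selectedColNames)
    result = trainOp.linkFrom(data)
    # Send the trained imputer model to the first output port.
    result.link(sinks[0])
    BatchOperator.execute()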