The Imputer Train component trains models to auto-populate missing values. The following imputation policies are supported: MEAN, MIN, MAX, and VALUE.
Limits
The supported compute engines are MaxCompute and Flink.
Introduction
The Imputer Train component trains a model that fills in missing values in numeric columns. It supports the following imputation policies: MEAN, MIN, MAX, and VALUE. The fillValue parameter is available only when the VALUE policy is specified, and in that case it is required.
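The four policies can be illustrated with a minimal plain-Python sketch. This is not the component's implementation. It assumes that missing values are represented as None and that MEAN, MIN, and MAX are computed from the non-missing values of the training column. The function names train_imputer and apply_imputer are hypothetical and exist only for this illustration.

```python
from statistics import mean

# Map each statistic-based policy name to the function that computes it.
STATISTICS = {"MEAN": mean, "MIN": min, "MAX": max}


def train_imputer(values, strategy="MEAN", fill_value=None):
    """Return the replacement value (the "model") for one numeric column.

    Assumption: None marks a missing value, and statistics are computed
    from the non-missing values only.
    """
    if strategy == "VALUE":
        # fill_value plays the role of the fillValue parameter.
        if fill_value is None:
            raise ValueError("fill_value is required when strategy is VALUE")
        return fill_value
    if strategy not in STATISTICS:
        raise ValueError(f"unsupported strategy: {strategy}")
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("column has no observed values to compute a statistic from")
    return STATISTICS[strategy](observed)


def apply_imputer(values, replacement):
    """Return a copy of the column with every None replaced."""
    return [replacement if v is None else v for v in values]


column = [1.0, None, 3.0, None, 8.0]
for strategy in ("MEAN", "MIN", "MAX"):
    replacement = train_imputer(column, strategy)
    print(strategy, apply_imputer(column, replacement))
print("VALUE", apply_imputer(column, train_imputer(column, "VALUE", fill_value=0.0)))
```

Splitting the logic into a training step and an application step mirrors the way the component produces a model that a downstream component can consume.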
Configure the component in Machine Learning Designer
Input ports
Input port (from left to right) | Data type | Recommended upstream component | Required |
data | Structured data stored in MaxCompute or Object Storage Service (OSS) | Yes |
Component parameters
Tab | Parameter | Description |
Field Setting | selectedCols | The names of the numeric columns for which you want to populate missing values. |
Parameter Setting | fillValue | The custom value that replaces missing values. This parameter is available and required only if you set the strategy parameter to VALUE. |
Parameter Setting | strategy | The policy that is used to populate missing values. Default value: MEAN. Valid values: MEAN, MIN, MAX, and VALUE. |
Execution Tuning | Number of Workers | The number of workers. This parameter must be used together with the Memory per worker, unit MB parameter. The value of this parameter must be a positive integer. Valid values: [1,9999]. |
Execution Tuning | Memory per worker, unit MB | The memory size of each worker. Valid values: 1024 to 65536. Unit: MB. |
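The constraints in the table above can be summarized as a small validation sketch. It is an illustration only and is not part of Machine Learning Designer. The dictionary keys numWorkers and memPerWorkerMB are hypothetical names for the Number of Workers and Memory per worker, unit MB parameters, and the memory range is assumed to be inclusive.

```python
VALID_STRATEGIES = ("MEAN", "MIN", "MAX", "VALUE")


def _is_int(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def validate_imputer_config(config):
    """Return a list of problems; an empty list means the settings are consistent."""
    problems = []

    strategy = config.get("strategy", "MEAN")  # MEAN is the documented default
    if strategy not in VALID_STRATEGIES:
        problems.append(f"strategy must be one of {VALID_STRATEGIES}")

    # fillValue is available and required only for the VALUE strategy.
    has_fill_value = config.get("fillValue") is not None
    if strategy == "VALUE" and not has_fill_value:
        problems.append("fillValue is required when strategy is VALUE")
    if strategy != "VALUE" and has_fill_value:
        problems.append("fillValue is available only when strategy is VALUE")

    # The two tuning parameters must be used together.
    workers = config.get("numWorkers")      # hypothetical key name
    memory = config.get("memPerWorkerMB")   # hypothetical key name
    if (workers is None) != (memory is None):
        problems.append("numWorkers and memPerWorkerMB must be set together")
    if workers is not None and not (_is_int(workers) and 1 <= workers <= 9999):
        problems.append("numWorkers must be an integer in [1, 9999]")
    if memory is not None and not (_is_int(memory) and 1024 <= memory <= 65536):
        problems.append("memPerWorkerMB must be an integer from 1024 to 65536")

    return problems


print(validate_imputer_config({"strategy": "VALUE"}))
print(validate_imputer_config({"strategy": "MEAN", "numWorkers": 4, "memPerWorkerMB": 8192}))
```

Returning a list of problems instead of raising on the first one makes it possible to report every inconsistent setting in a single pass.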
Output ports
Output port (from left to right) | Recommended downstream component | Model type |
model | Imputer Model |
Example
You can copy the following code to the code editor of the PyAlink Script component. This allows the PyAlink Script component to function like the Imputer Train component.
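from pyalink.alink import *

def main(sources, sinks, parameter):
    # The first input port supplies the training data.
    data = sources[0]
    # Numeric columns whose missing values are to be populated.
    selectedColNames = ["col2", "col3"]
    trainOp = ImputerTrainBatchOp()\
        .setSelectedCols(selectedColNames)
    # Train the imputer model and write it to the first output port.
    result = trainOp.linkFrom(data)
    result.link(sinks[0])
    BatchOperator.execute()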