Configure data conversion using the Data Conversion Module - Platform For AI

Use the Data Conversion Module to normalize, discretize, index, or perform Weight of Evidence (WOE) conversion on data.

Configure the component

You can configure the parameters for the Data Conversion Module component in one of the following ways.

Method 1: Use the GUI

You can configure the component parameters on the workflow page in Designer.

Tab	Parameter	Description
Fields Setting	Feature columns in input table	The feature columns from the input table. By default, all columns are selected.
Fields Setting	Columns to exclude from conversion	The selected columns are passed through to the output without changes. You can specify a label column here.
Fields Setting	Data conversion type	Supported conversion types include Normalization, Discretization, WOE conversion, and Index.
Fields Setting	Default WOE value	This parameter takes effect only when Data conversion type is set to WOE conversion. If you specify this parameter, this value is used to replace any sample value that falls into a bin without a WOE value. If you do not specify this parameter, the algorithm reports an error when a sample value falls into a bin without a WOE value.
Execution Tuning	Number of cores	The number of CPU cores to use. By default, the system automatically allocates the cores.
Execution Tuning	Memory per core	The amount of memory for each CPU core. By default, the system automatically allocates the memory.

Method 2: Use PAI commands

You can configure the component parameters using PAI commands in the SQL Script component. For more information, see SQL Script.

PAI -name data_transform
-project algo_public
-DinputFeatureTableName=feature_table
-DinputBinTableName=bin_table
-DoutputTableName=output_table
-DmetaColNames=label
-DfeatureColNames=feaname1,feaname2

Parameter	Description	Required	Default value
inputFeatureTableName	The input feature table.	Yes	None
inputBinTableName	The input binning result table.	Yes	None
inputFeatureTablePartitions	The partitions to use from the input feature table.	No	Complete table
outputTableName	The output table.	Yes	None
featureColNames	The feature columns to select from the input table.	No	All columns
metaColNames	The columns that are not converted. The selected columns are passed through to the output without changes. You can specify columns such as the label and sample_id.	No	None
transformType	The type of data conversion. Valid values: normalize (normalization), dummy (discretization), and woe (WOE conversion).	No	dummy
itemDelimiter	The feature separator. This parameter is valid only for discretization.	No	Comma (,)
kvDelimiter	The key-value separator. This parameter is valid only for discretization.	No	Colon (:)
lifecycle	The lifecycle of the output table.	No	None
coreNum	The number of CPU cores to use.	No	System-calculated
memSizePerCore	The amount of memory for each CPU core, in MB.	No	System-calculated
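
The parameter table can also be used to assemble the command text programmatically. The following is a minimal sketch, not part of the product: build_data_transform_command is a hypothetical helper that only builds a string from the parameter names listed above, checks the three required parameters, and checks transformType against the documented values. It does not submit anything to PAI.

```python
# Hypothetical helper (not a PAI API): builds the text of a data_transform
# command from the parameter names documented in the table above.
REQUIRED = ("inputFeatureTableName", "inputBinTableName", "outputTableName")
OPTIONAL = (
    "inputFeatureTablePartitions", "featureColNames", "metaColNames",
    "transformType", "itemDelimiter", "kvDelimiter",
    "lifecycle", "coreNum", "memSizePerCore",
)
TRANSFORM_TYPES = ("normalize", "dummy", "woe")


def build_data_transform_command(**params):
    """Return the PAI command text for the given parameters."""
    missing = [name for name in REQUIRED if name not in params]
    if missing:
        raise ValueError(f"missing required parameters: {missing}")
    unknown = [name for name in params if name not in REQUIRED + OPTIONAL]
    if unknown:
        raise ValueError(f"unknown parameters: {unknown}")
    # transformType defaults to dummy when it is omitted.
    if params.get("transformType", "dummy") not in TRANSFORM_TYPES:
        raise ValueError(f"transformType must be one of {TRANSFORM_TYPES}")

    lines = ["PAI -name data_transform", "-project algo_public"]
    for name in REQUIRED + OPTIONAL:
        if name in params:
            value = params[name]
            # Column lists are written as comma-separated names.
            if isinstance(value, (list, tuple)):
                value = ",".join(value)
            lines.append(f"-D{name}={value}")
    return "\n".join(lines)


print(build_data_transform_command(
    inputFeatureTableName="feature_table",
    inputBinTableName="bin_table",
    outputTableName="output_table",
    metaColNames="label",
    featureColNames=["feaname1", "feaname2"],
    transformType="woe",
))
```

The printed text can be pasted into the SQL Script component. The table and column names are the placeholder values from the example command.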

Normalization converts variable values to a range between 0 and 1 based on the input binning information. Missing values are filled with 0. The algorithm is as follows.

if feature_raw_value == null or feature_raw_value == 0 then
    feature_norm_value = 0.0
else
    bin_index = FindBin(bin_table, feature_raw_value)
    bin_width = round(1.0 / bin_count * 1000) / 1000.0
    feature_norm_value = 1.0 - (bin_count - bin_index - 1) * bin_width
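
The pseudocode above can be sketched in Python as follows. This is an illustration under stated assumptions, not the component's implementation. The layout of the binning table and the behavior of FindBin are not described here, so the sketch assumes that bins are given as a sorted list of upper edges, that bin_index is 0-based, that a value equal to an edge belongs to the lower bin, and that round rounds halves up.

```python
import bisect
import math


def find_bin(upper_edges, value):
    """Return the 0-based bin index for value.

    Assumption: upper_edges is a sorted list of bin upper bounds, and the
    last bin is open-ended, so there are len(upper_edges) + 1 bins.
    """
    return bisect.bisect_left(upper_edges, value)


def normalize(value, upper_edges):
    """Map a raw feature value to (0, 1] by bin position; null and 0 map to 0.0."""
    if value is None or value == 0:
        return 0.0
    bin_count = len(upper_edges) + 1
    bin_index = find_bin(upper_edges, value)
    # Bin width rounded to three decimals. Half-up rounding is assumed here;
    # Python's built-in round() rounds halves to even instead.
    bin_width = math.floor(1.0 / bin_count * 1000 + 0.5) / 1000.0
    return 1.0 - (bin_count - bin_index - 1) * bin_width


edges = [10, 20, 30]  # hypothetical edges that define four bins
for raw in (None, 0, 5, 15, 25, 35):
    print(raw, normalize(raw, edges))
```

With these four bins, values in the first through fourth bins map to 0.25, 0.5, 0.75, and 1.0, and both null and 0 map to 0.0.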

The output format varies depending on the type of data conversion performed by the Data Conversion Module:

Normalization and WOE conversion output a standard table.
Discretization into dummy variables outputs a table in key-value (KV) format. The generated variables use the format [${feaname}]_bin_${bin_id}. For example, for a variable named sns, the generated variables are as follows:
- If sns falls into the second bin, the generated variable is [sns]_bin_2.
- If sns is empty, it falls into the null bin, and the generated variable is [sns]_bin_null.
- If sns is not empty and does not fall into any defined bin, it falls into the else bin, and the generated variable is [sns]_bin_else.
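
The naming rules above can be sketched in Python as follows. This is a hedged illustration, not the component's code. The bin definitions, the bin IDs, the interval convention (lower bound exclusive, upper bound inclusive), and the value 1 written for each generated variable are all assumptions made for the example. The delimiters default to the documented itemDelimiter and kvDelimiter values.

```python
def dummy_variable_name(feaname, bin_id):
    """Build a name in the [feaname]_bin_binid format."""
    return f"[{feaname}]_bin_{bin_id}"


def assign_bin_id(value, bins):
    """Return the bin ID for value, or "null" / "else" for the special bins.

    Assumption: bins maps a bin ID to a (low, high] interval.
    """
    if value is None:
        return "null"
    for bin_id, (low, high) in bins.items():
        if low < value <= high:
            return bin_id
    return "else"


def to_kv(row, bins_by_feature, item_delimiter=",", kv_delimiter=":"):
    """Encode one row as a KV string, one generated variable per feature."""
    pairs = []
    for feaname, value in row.items():
        bin_id = assign_bin_id(value, bins_by_feature[feaname])
        # The value 1 is an assumption; the format of the value is not
        # specified in this document.
        pairs.append(f"{dummy_variable_name(feaname, bin_id)}{kv_delimiter}1")
    return item_delimiter.join(pairs)


# Hypothetical bins for the sns variable.
bins_by_feature = {"sns": {1: (0, 10), 2: (10, 20)}}
for raw in (15, None, 99):
    print(to_kv({"sns": raw}, bins_by_feature))
```

For the three sample values, the generated variable names are [sns]_bin_2, [sns]_bin_null, and [sns]_bin_else, matching the cases listed above.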