Platform For AI: Data conversion

Last Updated: Mar 09, 2026

Use the Data Conversion Module to normalize, discretize, index, or perform Weight of Evidence (WOE) conversion on data.

Configure the component

You can configure the parameters for the Data Conversion Module component in one of the following ways.

Method 1: Use the GUI

You can configure the component parameters on the workflow page in Designer.

| Tab | Parameter | Description |
| --- | --- | --- |
| Fields Setting | Feature columns in input table | The feature columns from the input table. By default, all columns are selected. |
| Fields Setting | Columns to exclude from conversion | The selected columns are passed through to the output without changes. You can specify a label column here. |
| Fields Setting | Data conversion type | Supported conversion types include Normalization, Discretization, WOE conversion, and Index. |
| Fields Setting | Default WOE value | This parameter takes effect only when Data conversion type is set to WOE conversion. If you specify this parameter, its value replaces any sample value that falls into a bin without a WOE value. If you do not specify this parameter, the algorithm reports an error when a sample value falls into a bin without a WOE value. |
| Execution Tuning | Number of cores | The number of CPU cores to use. By default, the system automatically allocates the cores. |
| Execution Tuning | Memory per core | The amount of memory for each CPU core. By default, the system automatically allocates the memory. |
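
The fallback behavior of Default WOE value can be pictured with a short Python sketch. This is only an illustration of the rule as described, not the component's implementation; the function name woe_for_bin and the dictionary-based bin table are hypothetical.

```python
from typing import Mapping, Optional


def woe_for_bin(
    bin_woe: Mapping[str, Optional[float]],
    bin_id: str,
    default_woe: Optional[float] = None,
) -> float:
    """Return the WOE value for a bin, falling back to default_woe.

    bin_woe is a hypothetical mapping from bin ID to WOE value; a missing
    key or a None value stands for "this bin has no WOE value".
    """
    woe = bin_woe.get(bin_id)
    if woe is not None:
        return woe
    if default_woe is None:
        # No default configured: mirror the documented behavior and fail.
        raise ValueError(f"bin {bin_id!r} has no WOE value and no default is set")
    return default_woe
```

With a default of 0.0, a sample that lands in a bin without a WOE value is converted to 0.0; without a default, the same sample raises an error.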

Method 2: Use PAI commands

You can configure the component parameters using PAI commands in the SQL Script component. For more information, see SQL Script.

PAI -name data_transform
-project algo_public
-DinputFeatureTableName=feature_table
-DinputBinTableName=bin_table
-DoutputTableName=output_table
-DmetaColNames=label
-DfeatureColNames=feaname1,feaname2

| Parameter | Description | Required | Default value |
| --- | --- | --- | --- |
| inputFeatureTableName | The input feature table. | Yes | None |
| inputBinTableName | The input binning result table. | Yes | None |
| inputFeatureTablePartitions | The partitions to use from the input feature table. | No | Complete table |
| outputTableName | The output table. | Yes | None |
| featureColNames | The feature columns to select from the input table. | No | All columns |
| metaColNames | The columns that are not converted. The selected columns are passed through to the output without changes. You can specify columns such as the label and sample_id. | No | None |
| transformType | The type of data conversion. Valid values: normalize (normalization), dummy (discretization), woe (WOE conversion). | No | dummy |
| itemDelimiter | The feature separator. This parameter is valid only for discretization. | No | Comma (,) |
| kvDelimiter | The key-value separator. This parameter is valid only for discretization. | No | Colon (:) |
| lifecycle | The lifecycle of the output table. | No | None |
| coreNum | The number of CPU cores to use. | No | System-calculated |
| memSizePerCore | The amount of memory for each CPU core, in MB. | No | System-calculated |
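
If you generate PAI commands from a script, the required parameters and the valid transformType values listed above can be checked before the command is submitted. The following Python sketch assembles the command text in the same shape as the example above. The helper name build_pai_command is hypothetical and is not part of any PAI SDK; the sketch only builds a string and does not run anything.

```python
from typing import Mapping

# Parameters marked "Yes" in the Required column.
REQUIRED_PARAMS = ("inputFeatureTableName", "inputBinTableName", "outputTableName")
# Valid values documented for transformType.
TRANSFORM_TYPES = ("normalize", "dummy", "woe")


def build_pai_command(params: Mapping[str, str]) -> str:
    """Assemble a data_transform PAI command from a parameter mapping."""
    missing = [name for name in REQUIRED_PARAMS if not params.get(name)]
    if missing:
        raise ValueError("missing required parameters: " + ", ".join(missing))
    # transformType is optional; the documented default is "dummy".
    transform_type = params.get("transformType", "dummy")
    if transform_type not in TRANSFORM_TYPES:
        raise ValueError(f"unsupported transformType: {transform_type!r}")
    lines = ["PAI -name data_transform", "-project algo_public"]
    lines += [f"-D{name}={value}" for name, value in params.items()]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_pai_command({
        "inputFeatureTableName": "feature_table",
        "inputBinTableName": "bin_table",
        "outputTableName": "output_table",
        "metaColNames": "label",
        "transformType": "woe",
    }))
```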

Normalization maps each variable value to a number between 0 and 1 based on the input binning information. Missing values and values equal to 0 are mapped to 0. The algorithm is as follows.

if feature_raw_value == null or feature_raw_value == 0 then
    feature_norm_value = 0.0
else
    bin_index = FindBin(bin_table, feature_raw_value)
    bin_width = round(1.0 / bin_count * 1000) / 1000.0
    feature_norm_value = 1.0 - (bin_count - bin_index - 1) * bin_width
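
The pseudocode above can be tried out with a small Python sketch. Several details are assumptions, because the source does not define them: the bin table is represented as a sorted list of upper bin edges, bin_index is 0-based, values above the last edge are clamped into the last bin, and find_bin is a hypothetical stand-in for FindBin. Python's round rounds halves to the nearest even number, so the computed bin width may differ from the component's for bin counts such as 16, where 1000 / bin_count ends in .5.

```python
import bisect
from typing import Optional, Sequence


def find_bin(upper_edges: Sequence[float], value: float) -> int:
    """Hypothetical stand-in for FindBin: 0-based index of the bin for value.

    Assumes bin i covers (upper_edges[i - 1], upper_edges[i]] and that values
    above the last edge are clamped into the last bin.
    """
    index = bisect.bisect_left(upper_edges, value)
    return min(index, len(upper_edges) - 1)


def normalize(value: Optional[float], upper_edges: Sequence[float]) -> float:
    """Normalize one raw feature value following the pseudocode above."""
    # Missing values and zeros map to 0.0.
    if value is None or value == 0:
        return 0.0
    bin_count = len(upper_edges)
    bin_index = find_bin(upper_edges, value)
    # Bin width rounded to three decimal places.
    bin_width = round(1.0 / bin_count * 1000) / 1000.0
    return 1.0 - (bin_count - bin_index - 1) * bin_width


if __name__ == "__main__":
    edges = [10, 20, 30, 40]  # four bins, so bin_width is 0.25
    for raw in (None, 0, 5, 15, 40):
        print(raw, normalize(raw, edges))
```

Under these assumptions, a value in the first of four bins is normalized to 0.25 and a value in the last bin to 1.0.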

The output format varies depending on the type of data conversion performed by the Data Conversion Module:

  • Normalization and WOE conversion output a standard table.

  • Discretization into dummy variables outputs a table in key-value (KV) format. The generated variables use the format [${feaname}]_bin_${bin_id}. For example, for a variable named sns, the generated variables are as follows:

    • If sns falls into the second bin, the generated variable is [sns]_bin_2.

    • If sns is empty, it falls into the null bin, and the generated variable is [sns]_bin_null.

    • If sns is not empty and does not fall into any defined bin, it falls into the else bin, and the generated variable is [sns]_bin_else.
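
The naming rules above can be sketched in Python as follows. The function names are hypothetical. The source does not state the value paired with each generated variable in the KV output, so the 1 used in to_kv is an assumption for illustration only; the delimiters default to the documented itemDelimiter (comma) and kvDelimiter (colon).

```python
from typing import Iterable, Optional


def dummy_variable_name(
    feature: str, bin_id: Optional[int], is_null: bool = False
) -> str:
    """Build the generated variable name for one feature value.

    is_null marks an empty value (null bin); bin_id=None on a non-empty
    value means it matched no defined bin (else bin).
    """
    if is_null:
        return f"[{feature}]_bin_null"
    if bin_id is None:
        return f"[{feature}]_bin_else"
    return f"[{feature}]_bin_{bin_id}"


def to_kv(
    names: Iterable[str], item_delimiter: str = ",", kv_delimiter: str = ":"
) -> str:
    """Join generated variables into one KV string.

    The value 1 for each key is an assumption, not taken from the source.
    """
    return item_delimiter.join(f"{name}{kv_delimiter}1" for name in names)


if __name__ == "__main__":
    print(dummy_variable_name("sns", 2))
    print(dummy_variable_name("sns", None, is_null=True))
    print(dummy_variable_name("sns", None))
```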