
Platform For AI: Normalization

Last Updated: Mar 09, 2026

Normalize feature columns to improve model training efficiency and accuracy.

Component configuration

Configure the Normalization component parameters using either of the following methods:

Method 1: Configure the component in the GUI

Configure component parameters on the Designer workflow page.

| Tab | Parameter | Description |
|---|---|---|
| Fields setting | Select all by default | All columns are selected by default. Extra columns do not affect the prediction result. |
| Fields setting | Keep original columns | If selected, the original columns are kept and the processed columns are prefixed with "normalized_". Supports columns of DOUBLE and BIGINT types. |
| Execution tuning | Number of computing cores | The system automatically allocates the number of instances for training based on the input data volume. |
| Execution tuning | Memory size per core | The system automatically allocates memory based on the input data volume. Unit: MB. |

Method 2: Use PAI commands

Use PAI commands to configure component parameters. Use the SQL Script component to call PAI commands. For more information, see SQL Script.

  • Command for dense data

    PAI -name Normalize
        -project algo_public
        -DkeepOriginal="true"
        -DoutputTableName="test_4"
        -DinputTablePartitions="pt=20150501"
        -DinputTableName="bank_data_partition"
        -DselectedColNames="emp_var_rate,euribor3m";
  • Command for sparse data

    PAI -name Normalize
        -project projectxlib4
        -DkeepOriginal="true"
        -DoutputTableName="kv_norm_output"
        -DinputTableName=kv_norm_test
        -DselectedColNames="f0,f1,f2"
        -DenableSparse=true
        -DoutputParaTableName=kv_norm_model
        -DkvIndices=1,2,8,6
        -DitemDelimiter=",";
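The commands above share one shape: a component name, a project, and a list of `-D` parameters. The following Python sketch assembles such a command string from a dictionary. The helper name `build_pai_command` is hypothetical and is not part of PAI; it only formats text and does not submit anything.

```python
def build_pai_command(name, project, params):
    """Render a PAI command string from a dict of -D parameters.

    Hypothetical helper for illustration only; it does not call PAI.
    """
    lines = [f"PAI -name {name}", f"    -project {project}"]
    for key, value in params.items():
        # Render booleans in lowercase to match the commands in this document.
        if isinstance(value, bool):
            value = "true" if value else "false"
        lines.append(f'    -D{key}="{value}"')
    # Terminate the statement with a semicolon, as in the SQL Script examples.
    return "\n".join(lines) + ";"


dense_cmd = build_pai_command(
    "Normalize",
    "algo_public",
    {
        "inputTableName": "bank_data_partition",
        "inputTablePartitions": "pt=20150501",
        "selectedColNames": "emp_var_rate,euribor3m",
        "outputTableName": "test_4",
        "keepOriginal": True,
    },
)
print(dense_cmd)
```

The resulting string can be pasted into a SQL Script component. Quoting every value is a choice made for this sketch; the sparse command above shows that unquoted values are also used.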

| Parameter Name | Required | Description | Default value |
|---|---|---|---|
| inputTableName | Yes | Input table name. | None |
| selectedColNames | No | Columns in the input table used for training. Separate column names with commas (,). Supports INT and DOUBLE types. If the input data is in sparse format, STRING columns are also supported. | All columns |
| inputTablePartitions | No | Partitions in the input table used for training. Supported formats: Partition_name=value, or name1=value1/name2=value2 for multi-level partitions. Note: Separate multiple partitions with commas (,). | All partitions |
| outputTableName | Yes | Output table name. | None |
| outputParaTableName | No | Name of the output parameter table, which stores the normalization parameters of each processed column. | Defaults to non-partitioned table. |
| inputParaTableName | No | Name of the input parameter table, for example a table written through outputParaTableName by an earlier run. | None |
| keepOriginal | No | Specifies whether to retain the original columns. true: the original columns are kept and the processed columns are added with the prefix "normalized_". false: the processed columns replace the original columns and keep their names. | false |
| lifecycle | No | Output table lifecycle, in days. Valid range: 1 to 3650. | None |
| coreNum | No | Number of cores for computing. Valid values: positive integers. | System auto-allocated. |
| memSizePerCore | No | Memory size per core in MB. Valid range: 1 to 65536. | System auto-allocated. |
| enableSparse | No | Specifies whether to enable sparse support. Valid values: true, false. | false |
| itemDelimiter | No | Separator between key-value pairs. | default |
| kvDelimiter | No | Separator between a key and its value. | default |
| kvIndices | No | Indexes of features that require normalization in the key-value table. | None |
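To make the roles of itemDelimiter, kvDelimiter, and kvIndices concrete, the following Python sketch normalizes key-value rows locally. It rests on several assumptions that this document does not confirm: rows look like `1:10,2:5`, the kvDelimiter is a colon, values are scaled with min-max scaling over the values that are present, and a constant feature maps to 0.0. The function names are hypothetical, and the component's actual handling of missing keys may differ.

```python
def parse_kv(row, item_delimiter=",", kv_delimiter=":"):
    """Split a key-value string such as "1:10,2:5" into {1: 10.0, 2: 5.0}."""
    pairs = {}
    for item in row.split(item_delimiter):
        if not item:
            continue
        key, value = item.split(kv_delimiter)
        pairs[int(key)] = float(value)
    return pairs


def normalize_kv_rows(rows, kv_indices, item_delimiter=",", kv_delimiter=":"):
    """Min-max scale only the features listed in kv_indices."""
    parsed = [parse_kv(r, item_delimiter, kv_delimiter) for r in rows]
    # Collect min and max per selected index from the values that are present.
    stats = {}
    for idx in kv_indices:
        values = [p[idx] for p in parsed if idx in p]
        if values:
            stats[idx] = (min(values), max(values))
    result = []
    for p in parsed:
        scaled = {}
        for key, value in p.items():
            if key in stats:
                lo, hi = stats[key]
                # Assumption: a constant feature maps to 0.0 to avoid dividing by zero.
                scaled[key] = (value - lo) / (hi - lo) if hi > lo else 0.0
            else:
                # Features outside kv_indices pass through unchanged.
                scaled[key] = value
        result.append(scaled)
    return result


rows = ["1:10,2:5", "1:20,2:7,3:1", "1:30,3:4"]
result = normalize_kv_rows(rows, kv_indices=[1, 2])
print(result)
```

In this sketch feature 3 is left untouched because it is not listed in `kv_indices`, which mirrors how kvIndices restricts normalization to selected features.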

Example

  • Generate data

    drop table if exists normalize_test_input;
    create table normalize_test_input(
        col_string string,
        col_bigint bigint,
        col_double double,
        col_boolean boolean,
        col_datetime datetime);
    insert overwrite table normalize_test_input
    select
        *
    from
    (
        select
            '01' as col_string,
            10 as col_bigint,
            10.1 as col_double,
            True as col_boolean,
            cast('2016-07-01 10:00:00' as datetime) as col_datetime
        union all
            select
                cast(null as string) as col_string,
                11 as col_bigint,
                10.2 as col_double,
                False as col_boolean,
                cast('2016-07-02 10:00:00' as datetime) as col_datetime
        union all
            select
                '02' as col_string,
                cast(null as bigint) as col_bigint,
                10.3 as col_double,
                True as col_boolean,
                cast('2016-07-03 10:00:00' as datetime) as col_datetime
        union all
            select
                '03' as col_string,
                12 as col_bigint,
                cast(null as double) as col_double,
                False as col_boolean,
                cast('2016-07-04 10:00:00' as datetime) as col_datetime
        union all
            select
                '04' as col_string,
                13 as col_bigint,
                10.4 as col_double,
                cast(null as boolean) as col_boolean,
                cast('2016-07-05 10:00:00' as datetime) as col_datetime
        union all
            select
                '05' as col_string,
                14 as col_bigint,
                10.5 as col_double,
                True as col_boolean,
                cast(null as datetime) as col_datetime
    ) tmp;
  • PAI command

    drop table if exists normalize_test_input_output;
    drop table if exists normalize_test_input_model_output;
    PAI -name Normalize
        -project algo_public
        -DoutputParaTableName="normalize_test_input_model_output"
        -Dlifecycle="28"
        -DoutputTableName="normalize_test_input_output"
        -DinputTableName="normalize_test_input"
        -DselectedColNames="col_double,col_bigint"
        -DkeepOriginal="true";
    drop table if exists normalize_test_input_output_using_model;
    drop table if exists normalize_test_input_output_using_model_model_output;
    PAI -name Normalize
        -project algo_public
        -DoutputParaTableName="normalize_test_input_output_using_model_model_output"
        -DinputParaTableName="normalize_test_input_model_output"
        -Dlifecycle="28"
        -DoutputTableName="normalize_test_input_output_using_model"
        -DinputTableName="normalize_test_input";
  • Input

    normalize_test_input

    | col_string | col_bigint | col_double | col_boolean | col_datetime |
    |---|---|---|---|---|
    | 01 | 10 | 10.1 | true | 2016-07-01 10:00:00 |
    | NULL | 11 | 10.2 | false | 2016-07-02 10:00:00 |
    | 02 | NULL | 10.3 | true | 2016-07-03 10:00:00 |
    | 03 | 12 | NULL | false | 2016-07-04 10:00:00 |
    | 04 | 13 | 10.4 | NULL | 2016-07-05 10:00:00 |
    | 05 | 14 | 10.5 | true | NULL |

  • Outputs

    • normalize_test_input_output

      | col_string | col_bigint | col_double | col_boolean | col_datetime | normalized_col_bigint | normalized_col_double |
      |---|---|---|---|---|---|---|
      | 01 | 10 | 10.1 | true | 2016-07-01 10:00:00 | 0.0 | 0.0 |
      | NULL | 11 | 10.2 | false | 2016-07-02 10:00:00 | 0.25 | 0.2499999999999989 |
      | 02 | NULL | 10.3 | true | 2016-07-03 10:00:00 | NULL | 0.5000000000000022 |
      | 03 | 12 | NULL | false | 2016-07-04 10:00:00 | 0.5 | NULL |
      | 04 | 13 | 10.4 | NULL | 2016-07-05 10:00:00 | 0.75 | 0.7500000000000011 |
      | 05 | 14 | 10.5 | true | NULL | 1.0 | 1.0 |

    • normalize_test_input_model_output

      | feature | json |
      |---|---|
      | col_bigint | {"name": "normalize", "type":"bigint", "paras":{"min":10, "max": 14}} |
      | col_double | {"name": "normalize", "type":"double", "paras":{"min":10.1, "max": 10.5}} |

    • normalize_test_input_output_using_model

      | col_string | col_bigint | col_double | col_boolean | col_datetime |
      |---|---|---|---|---|
      | 01 | 0.0 | 0.0 | true | 2016-07-01 10:00:00 |
      | NULL | 0.25 | 0.2499999999999989 | false | 2016-07-02 10:00:00 |
      | 02 | NULL | 0.5000000000000022 | true | 2016-07-03 10:00:00 |
      | 03 | 0.5 | NULL | false | 2016-07-04 10:00:00 |
      | 04 | 0.75 | 0.7500000000000011 | NULL | 2016-07-05 10:00:00 |
      | 05 | 1.0 | 1.0 | true | NULL |

    • normalize_test_input_output_using_model_model_output

      | feature | json |
      |---|---|
      | col_bigint | {"name": "normalize", "type":"bigint", "paras":{"min":10, "max": 14}} |
      | col_double | {"name": "normalize", "type":"double", "paras":{"min":10.1, "max": 10.5}} |
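The example outputs are consistent with min-max scaling, (x - min) / (max - min), where min and max are taken from the non-NULL values of each column and NULL inputs stay NULL. The following Python sketch reproduces that behavior locally and shows how the JSON rows of the parameter table could be reused on new data. This is an inference from the tables above, not a statement of the component's internals; the function names are hypothetical, and mapping a constant column to 0.0 is an assumption.

```python
import json


def fit_min_max(values):
    """Return {"min": ..., "max": ...} computed from the non-NULL values."""
    present = [v for v in values if v is not None]
    return {"min": min(present), "max": max(present)}


def apply_min_max(values, paras):
    """Scale values with (x - min) / (max - min); None (NULL) stays None."""
    lo, hi = paras["min"], paras["max"]
    span = hi - lo
    # Assumption: a constant column maps to 0.0 to avoid dividing by zero.
    return [
        None if v is None else ((v - lo) / span if span else 0.0)
        for v in values
    ]


# Column values from normalize_test_input, with None standing in for NULL.
col_bigint = [10, 11, None, 12, 13, 14]
col_double = [10.1, 10.2, 10.3, None, 10.4, 10.5]

# Build rows shaped like the parameter table (feature, json).
model = {
    "col_bigint": json.dumps(
        {"name": "normalize", "type": "bigint", "paras": fit_min_max(col_bigint)}
    ),
    "col_double": json.dumps(
        {"name": "normalize", "type": "double", "paras": fit_min_max(col_double)}
    ),
}

# Reuse the stored parameters, as a run with inputParaTableName would.
normalized_bigint = apply_min_max(
    col_bigint, json.loads(model["col_bigint"])["paras"]
)
normalized_double = apply_min_max(
    col_double, json.loads(model["col_double"])["paras"]
)
print(normalized_bigint)
print(normalized_double)
```

Values such as 0.2499999999999989 in the output tables are ordinary floating-point rounding of 0.25, so comparisons against expected results should use a small tolerance rather than exact equality.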