The Feature Scaling component scales numeric data in the dense or sparse format by using common scaling functions.

Overview

The Feature Scaling component has the following characteristics:
  • Supports common scaling functions: log2, log10, ln, abs, and sqrt.
  • Supports data in the dense and sparse formats.
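
To make the five function names concrete, the following minimal Python sketch maps each name to its usual mathematical definition. The `SCALERS` table and the `scale_value` and `scale_column` helpers are hypothetical names used only for illustration; they are not part of the component. How the component treats non-positive inputs to the logarithm and square-root functions is not stated here; in this sketch, Python's `math` module raises a `ValueError` for such inputs.

```python
import math

# Hypothetical mapping from the documented function names to Python's math module.
SCALERS = {
    "log2": math.log2,
    "log10": math.log10,
    "ln": math.log,    # natural logarithm
    "abs": abs,
    "sqrt": math.sqrt,
}


def scale_value(value, method="log2"):
    """Apply one of the five named scaling functions to a single number."""
    if method not in SCALERS:
        raise ValueError(f"unsupported scaling function: {method}")
    return SCALERS[method](value)


def scale_column(values, method="log2"):
    """Scale every value of a dense numeric column."""
    return [scale_value(v, method) for v in values]
```

For example, `scale_column([1, 2, 4], "log2")` applies log2 to each element of the list.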

Configure the component

You can configure the component by using one of the following methods:
  • Use the Machine Learning Platform for AI console
    | Tab | Parameter | Description |
    | --- | --- | --- |
    | Fields Setting | Scaled Features | The features that you want to scale. |
    | Fields Setting | Label Column | If this parameter is specified, the x-y histogram that displays the relationship between the features and objective variables can be visualized. |
    | Fields Setting | Use Sparse Features (K:V,K:V) | Specifies whether the training data is in the sparse format. |
    | Fields Setting | Reserve Original Transform Features | Specifies whether to prefix new features with scale_. |
    | Parameters Setting | Scaling Function | The scaling function to apply. Valid values: log2, log10, ln, abs, and sqrt. |
  • Use commands
    PAI -name fe_scale_runner -project algo_public
        -Dlifecycle=28
        -DscaleMethod=log2
        -DscaleCols=nr_employed
        -DinputTable=pai_dense_10_1
        -DoutputTable=pai_temp_2262_20380_1;
    | Parameter | Required | Description | Default value |
    | --- | --- | --- | --- |
    | inputTable | Yes | The name of the input table. | N/A |
    | inputTablePartitions | No | The partitions that are selected from the input table for training. Set this parameter in the Partition_name=value format. To specify multi-level partitions, use the name1=value1/name2=value2 format. If you specify multiple partitions, separate them with commas (,). | All partitions in the input table |
    | outputTable | Yes | The output table after scaling. | N/A |
    | scaleCols | Yes | The features that you want to scale. Sparse features are automatically filtered. You can select only the features of numeric data types. | N/A |
    | labelCol | No | The label column. If this parameter is specified, the x-y histogram that displays the relationship between the features and objective variables can be visualized. | N/A |
    | categoryCols | No | The selected fields that are processed as enumerated features. These fields do not support scaling. | "" |
    | scaleMethod | No | The method that is used for scaling. Valid values: log2, log10, ln, abs, and sqrt. | log2 |
    | scaleTopN | No | If you do not set the scaleCols parameter, the system automatically selects the top N features that require scaling. | 10 |
    | isSparse | No | Specifies whether features are sparse features in the key-value format. | Dense data |
    | itemSpliter | No | The delimiter that is used to separate key-value pairs in the sparse format. | , |
    | kvSpliter | No | The delimiter that is used to separate keys and values in the sparse format. | : |
    | lifecycle | No | The lifecycle of the output table. | 7 |
    | coreNum | No | The number of cores. The value of this parameter must be a positive integer. Valid values: [1,9999]. This parameter must be used with the memSizePerCore parameter. | Determined by the system |
    | memSizePerCore | No | The memory size of each core. Unit: MB. The value of this parameter must be a positive integer. Valid values: [2048,64 × 1024]. | Determined by the system |
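
The roles of the itemSpliter and kvSpliter delimiters can be illustrated with a small Python sketch. It assumes a sparse row is a string such as `1:8,2:0.5`, splits it into key-value pairs, scales only the selected keys, and writes the row back in the same format. The `parse_sparse` and `scale_sparse` helpers are hypothetical and show only the data format; they do not describe how the component itself processes sparse data.

```python
import math


def parse_sparse(row, item_spliter=",", kv_spliter=":"):
    """Parse a sparse row such as '1:8,2:0.5' into a dict of key -> float."""
    features = {}
    for item in row.split(item_spliter):
        if not item:
            continue  # tolerate empty segments, for example a trailing delimiter
        key, value = item.split(kv_spliter, 1)
        features[key.strip()] = float(value)
    return features


def scale_sparse(row, scale_cols, func=math.log2,
                 item_spliter=",", kv_spliter=":"):
    """Scale only the keys listed in scale_cols and re-serialize the row."""
    features = parse_sparse(row, item_spliter, kv_spliter)
    for key in scale_cols:
        if key in features:
            features[key] = func(features[key])
    return item_spliter.join(
        f"{key}{kv_spliter}{value}" for key, value in features.items()
    )
```

Keys that are not listed in `scale_cols` pass through unchanged, and keys that are absent from a row are skipped, which is the usual convention for sparse data.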

Example

  • Input data

    Execute the following SQL statement to generate input data:

    create table if not exists pai_dense_10_1 as
    select
        nr_employed
    from bank_data limit 10;
  • Parameter settings

    On the Fields Setting tab, set the Scaled Features parameter to nr_employed. Only features of numeric data types are supported. On the Parameters Setting tab, set the Scaling Function parameter to log2.

  • Result
    nr_employed
    12.352071021075528
    12.34313018339218
    12.285286613666395
    12.316026916036957
    12.309533196497519
    12.352071021075528
    12.316026916036957
    12.316026916036957
    12.309533196497519
    12.316026916036957
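
As a sanity check on the result, the short Python sketch below inverts the transformation. It assumes that each value in the result is log2 of the original nr_employed value, so raising 2 to that power should recover the input. The original input values are not listed in this example, so the sketch only prints the recovered values rather than comparing them against known data.

```python
import math

# The ten scaled values from the Result table above.
scaled = [
    12.352071021075528,
    12.34313018339218,
    12.285286613666395,
    12.316026916036957,
    12.309533196497519,
    12.352071021075528,
    12.316026916036957,
    12.316026916036957,
    12.309533196497519,
    12.316026916036957,
]

# Assumption: scaled = log2(original), so 2 ** scaled recovers the original.
recovered = [2 ** y for y in scaled]

for y, x in zip(scaled, recovered):
    # Re-applying log2 should reproduce the scaled value up to rounding error.
    assert math.isclose(math.log2(x), y, rel_tol=1e-12)
    print(f"{y} -> {x:.1f}")
```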