The Feature Scaling component can scale dense or sparse numeric data by using common scaling functions.

Overview

The Feature Scaling component has the following characteristics:
  • Supports common scaling functions such as log2, log10, ln, abs, and sqrt.
  • Supports dense and sparse data.
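The following Python sketch illustrates what each scaling function computes for a single numeric value. It uses the standard math module and is not the PAI implementation. This section does not describe how the component handles zero or negative inputs for log2, log10, ln, and sqrt, so the sketch simply lets math raise ValueError for them.

```python
import math
from typing import Callable, Dict

# Assumed mapping from each scaling-function name to the math operation
# it applies (illustrative only, not taken from the PAI source code).
SCALE_FUNCTIONS: Dict[str, Callable[[float], float]] = {
    "log2": math.log2,
    "log10": math.log10,
    "ln": math.log,  # natural logarithm
    "abs": abs,
    "sqrt": math.sqrt,
}


def scale_value(value: float, method: str = "log2") -> float:
    """Apply the named scaling function to one value.

    Raises KeyError for an unknown method name and ValueError (from math)
    for inputs outside the function's domain, for example log2(0).
    """
    return SCALE_FUNCTIONS[method](value)


if __name__ == "__main__":
    for name in SCALE_FUNCTIONS:
        print(name, scale_value(16.0, name))
```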

Configure the component

You can use one of the following methods to configure the Feature Scaling component.

Method 1: Configure the component on the pipeline page

You can configure the parameters of the Feature Scaling component on the pipeline page of Machine Learning Designer in Machine Learning Platform for AI (PAI). Machine Learning Designer was formerly known as Machine Learning Studio. The following parameters are available.

Fields Setting tab:
  • Scaled Features: The features that you want to scale.
  • Label Column: The label column. If you specify this parameter, you can view the x-y histogram that shows the relationship between the features and the target variable.
  • Sparse Features (K:V,K:V): Specifies whether the input data is sparse. In sparse data, a single field stores multiple features as key-value pairs instead of one feature per field.
  • Reserve Converted Features: Specifies whether to prefix the new features with scale_.

Parameters Setting tab:
  • Scaling Function: The scaling function to apply. Valid values: log2, log10, ln, abs, and sqrt.
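When Sparse Features (K:V,K:V) is selected, a single field holds several features as key-value pairs. The sketch below assumes the default delimiters listed for the itemSpliter (,) and kvSpliter (:) parameters. It parses one such field, applies a scaling function to every value, and serializes the result back to the same format. The helper name scale_sparse_field and the choice to leave keys unchanged are illustrative assumptions, not documented component behavior.

```python
import math
from typing import Callable


def scale_sparse_field(
    field: str,
    func: Callable[[float], float] = math.log2,
    item_spliter: str = ",",
    kv_spliter: str = ":",
) -> str:
    """Scale every value in a sparse 'k1:v1,k2:v2' string (hypothetical helper)."""
    if not field:
        return field
    scaled_items = []
    for item in field.split(item_spliter):
        # Split each pair once on the key-value delimiter; keys are kept as-is.
        key, value = item.split(kv_spliter, 1)
        scaled_items.append(f"{key}{kv_spliter}{func(float(value))}")
    return item_spliter.join(scaled_items)


print(scale_sparse_field("1:8,5:1024,7:0.5"))
```

Passing other delimiters, for example item_spliter=";" and kv_spliter="=", mirrors setting itemSpliter and kvSpliter to non-default values.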

Method 2: Use PAI commands

Configure the component parameters by using PAI commands. You can use the SQL Script component to call PAI commands. For more information, see SQL Script.
PAI -name fe_scale_runner -project algo_public
    -Dlifecycle=28
    -DscaleMethod=log2
    -DscaleCols=nr_employed
    -DinputTable=pai_dense_10_1
    -DoutputTable=pai_temp_2262_20380_1;
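If you generate PAI commands from code, for example to rerun the same scaling with different columns or methods, the -D&lt;parameter&gt;=&lt;value&gt; pattern above can be assembled from a dictionary. This is a sketch only: build_pai_command is a hypothetical helper that formats the text you would paste into the SQL Script component, and it does not submit anything to PAI.

```python
from typing import Mapping


def build_pai_command(name: str, project: str, params: Mapping[str, object]) -> str:
    """Format a command such as 'PAI -name ... -project ... -Dkey=value;' (hypothetical)."""
    parts = [f"PAI -name {name} -project {project}"]
    for key, value in params.items():
        # Each component parameter is passed as -D<parameter>=<value>.
        parts.append(f"    -D{key}={value}")
    return "\n".join(parts) + ";"


command = build_pai_command(
    "fe_scale_runner",
    "algo_public",
    {
        "lifecycle": 28,
        "scaleMethod": "log2",
        "scaleCols": "nr_employed",
        "inputTable": "pai_dense_10_1",
        "outputTable": "pai_temp_2262_20380_1",
    },
)
print(command)
```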
The following parameters are supported. Each entry lists whether the parameter is required and its default value.
  • inputTable (required; default: none): The name of the input table.
  • inputTablePartitions (optional; default: all partitions in the input table): The partitions of the input table that are used as input. Specify a partition in the partition_name=value format. For multi-level partitions, use the name1=value1/name2=value2 format. Separate multiple partitions with commas (,).
  • outputTable (required; default: none): The name of the output table that stores the scaled results.
  • scaleCols (required; default: none): The features that you want to scale. Sparse features are automatically displayed. Only features of numeric data types can be selected.
  • labelCol (optional; default: none): The label column. If you specify this parameter, you can view the x-y histogram that shows the relationship between the features and the target variable.
  • categoryCols (optional; default: empty string ""): The fields that are processed as enumerated features. These fields are not scaled.
  • scaleMethod (optional; default: log2): The scaling method. Valid values: log2, log10, ln, abs, and sqrt.
  • scaleTopN (optional; default: 10): If you do not set the scaleCols parameter, the system automatically selects the top N features to scale.
  • isSparse (optional; default: dense data): Specifies whether the features are sparse features in the key-value format.
  • itemSpliter (optional; default: ,): The delimiter that separates sparse key-value pairs.
  • kvSpliter (optional; default: :): The delimiter that separates the key and the value within each sparse pair.
  • lifecycle (optional; default: 7): The lifecycle of the output table.
  • coreNum (optional; default: determined by the system): The number of cores. The value must be a positive integer. Valid values: [1, 9999]. This parameter must be used together with the memSizePerCore parameter.
  • memSizePerCore (optional; default: determined by the system): The memory size of each core, in MB. The value must be a positive integer. Valid values: [2048, 65536] (that is, up to 64 × 1024).
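The format rules for inputTablePartitions and the constraints on coreNum and memSizePerCore can be checked before a job is submitted. The following sketch encodes only the rules stated above: partitions are separated by commas, levels within a partition are separated by /, coreNum is in [1, 9999], and memSizePerCore is in [2048, 65536] MB. The requirement that coreNum be used together with memSizePerCore is interpreted here, as an assumption, as "set both or neither". The function names and the example partition names (ds, region) are hypothetical, and PAI performs its own validation.

```python
from typing import Dict, List, Optional


def parse_partitions(spec: str) -> List[Dict[str, str]]:
    """Parse 'name1=value1/name2=value2,name1=value3/name2=value4'.

    Returns one dict per partition, mapping each partition column to its value.
    """
    partitions = []
    for partition in spec.split(","):
        levels = {}
        for level in partition.strip().split("/"):
            name, value = level.split("=", 1)
            levels[name.strip()] = value.strip()
        partitions.append(levels)
    return partitions


def check_resources(core_num: Optional[int], mem_size_per_core: Optional[int]) -> None:
    """Raise ValueError if coreNum/memSizePerCore break the rules listed above."""
    if (core_num is None) != (mem_size_per_core is None):
        raise ValueError("coreNum and memSizePerCore must be set together")
    if core_num is None:
        return  # Both omitted: the system determines the resources.
    if not 1 <= core_num <= 9999:
        raise ValueError("coreNum must be in [1, 9999]")
    if not 2048 <= mem_size_per_core <= 64 * 1024:
        raise ValueError("memSizePerCore must be in [2048, 65536] MB")


print(parse_partitions("ds=20240101/region=hz,ds=20240102/region=hz"))
check_resources(10, 4096)
```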

Examples

  • Input data

    Execute the following SQL statement to generate input data:

    create table if not exists pai_dense_10_1 as
    select
        nr_employed
    from bank_data limit 10;
  • Parameter settings
    On the Fields Setting tab, set the Scaled Features parameter to nr_employed. Only features of numeric data types are supported. On the Parameters Setting tab, set the Scaling Function parameter to log2, as shown in the following figure.
    [Figure: Parameter settings]
  • Results
    nr_employed
    12.352071021075528
    12.34313018339218
    12.285286613666395
    12.316026916036957
    12.309533196497519
    12.352071021075528
    12.316026916036957
    12.316026916036957
    12.309533196497519
    12.316026916036957
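Because log2 was selected, each output value is the base-2 logarithm of the original nr_employed value, so raising 2 to an output value recovers the input. The sketch below applies that inverse to the results listed above as a sanity check on the output table. It is not part of the component, and rounding to one decimal place is an assumption about the precision of the source column.

```python
# Output values copied from the result table above (scaleMethod=log2).
scaled = [
    12.352071021075528,
    12.34313018339218,
    12.285286613666395,
    12.316026916036957,
    12.309533196497519,
    12.352071021075528,
    12.316026916036957,
    12.316026916036957,
    12.309533196497519,
    12.316026916036957,
]

# Undo log2 (x -> 2 ** x) to recover the original nr_employed values.
restored = [round(2 ** v, 1) for v in scaled]
print(restored)
```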