The Feature Scaling component scales numeric data in the dense or sparse format by using common scaling functions.
Overview
- Supports common scaling functions such as log2, log10, ln, abs, and sqrt.
- Supports data in the dense and sparse formats.
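The five functions listed above can be sketched in plain Python to show what each one computes on a dense numeric column. This is an illustration only, not the component's implementation: the `SCALERS` table and `scale` helper are hypothetical names, and returning `None` for values outside a function's domain (for example, the log of a non-positive number) is an assumption, because the source does not say how the component treats such values.

```python
import math

# Map each scaling function name from the list above to a Python callable.
SCALERS = {
    "log2": math.log2,
    "log10": math.log10,
    "ln": math.log,
    "abs": abs,
    "sqrt": math.sqrt,
}

def scale(values, method="log2"):
    """Apply one scaling function to every value of a dense numeric column."""
    fn = SCALERS[method]
    out = []
    for v in values:
        try:
            out.append(fn(v))
        except ValueError:
            # Assumption: values outside the function's domain become None.
            out.append(None)
    return out

print(scale([8, 1024], "log2"))  # [3.0, 10.0]
print(scale([-2.5, 4], "abs"))   # [2.5, 4]
```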
Configure the component
- Use the Machine Learning Platform for AI console
| Tab | Parameter | Description |
| --- | --- | --- |
| Fields Setting | Scaled Features | The features that you want to scale. |
| Fields Setting | Label Column | If this parameter is specified, the x-y histogram that displays the relationship between the features and objective variables can be visualized. |
| Fields Setting | Use Sparse Features (K:V,K:V) | Specifies whether the training data is in the sparse format. |
| Fields Setting | Reserve Original Transform Features | Specifies whether to prefix new features with scale_. |
| Parameters Setting | Scaling Function | The Feature Scaling component supports the following scaling functions: log2, log10, ln, abs, and sqrt. |
- Use commands
PAI -name fe_scale_runner -project algo_public -Dlifecycle=28 -DscaleMethod=log2 -DscaleCols=nr_employed -DinputTable=pai_dense_10_1 -DoutputTable=pai_temp_2262_20380_1;
| Parameter | Required | Description | Default value |
| --- | --- | --- | --- |
| inputTable | Yes | The name of the input table. | N/A |
| inputTablePartitions | No | The partitions that are selected from the input table for training. Set this parameter in the Partition_name=value format. To specify multi-level partitions, set this parameter in the name1=value1/name2=value2; format. If you specify multiple partitions, separate them with commas (,). | All partitions in the input table |
| outputTable | Yes | The output table after scaling. | N/A |
| scaleCols | Yes | The features that you want to scale. Sparse features are automatically filtered. You can select only the features of numeric data types. | N/A |
| labelCol | No | The label column. If this field is specified, the x-y histogram that displays the relationship between the features and objective variables can be visualized. | N/A |
| categoryCols | No | The selected fields that are processed as enumerated features. These fields do not support scaling. | "" |
| scaleMethod | No | The method that is used for scaling. Valid values: log2, log10, ln, abs, and sqrt. | log2 |
| scaleTopN | No | If you do not set the scaleCols parameter, the system automatically selects the top N features that require scaling. | 10 |
| isSparse | No | Specifies whether features are sparse features in the key-value format. | Dense data |
| itemSpliter | No | The delimiter that is used to separate key-value pairs in the sparse format. | , |
| kvSpliter | No | The delimiter that is used to separate keys and values in the sparse format. | : |
| lifecycle | No | The lifecycle of the output table. | 7 |
| coreNum | No | The number of cores. The value of this parameter must be a positive integer. Valid values: [1,9999]. This parameter must be used with the memSizePerCore parameter. | Determined by the system |
| memSizePerCore | No | The memory size of each core. Unit: MB. The value of this parameter must be a positive integer. Valid values: [2048,64 × 1024]. | Determined by the system |
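To make the sparse key-value format concrete, the following Python sketch splits one sparse row with the default itemSpliter (,) and kvSpliter (:) delimiters and scales only the values. It is a rough illustration under assumptions: `scale_sparse` is a hypothetical helper, the sample row is made up, and the source does not describe how the component itself parses or writes sparse rows.

```python
import math

def scale_sparse(row, fn=math.log2, item_spliter=",", kv_spliter=":"):
    """Scale the values of one sparse row such as "1:8,3:1024"."""
    scaled = []
    for item in row.split(item_spliter):
        if not item:
            continue  # skip empty fragments, for example from an empty row
        # Split each pair into its key and its numeric value.
        key, value = item.split(kv_spliter, 1)
        scaled.append(f"{key}{kv_spliter}{fn(float(value))}")
    return item_spliter.join(scaled)

print(scale_sparse("1:8,3:1024"))  # 1:3.0,3:10.0
```

Keys pass through unchanged; only the value after each kvSpliter is transformed.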
Example
- Input data
Execute the following SQL statement to generate input data:
create table if not exists pai_dense_10_1 as select nr_employed from bank_data limit 10;
- Parameter settings
On the Fields Setting tab, set the Scaled Features parameter to nr_employed. Only the features of the numeric data types are supported. On the Parameters Setting tab, set the Scaling Function parameter to log2.
- Result
| nr_employed |
| --- |
| 12.352071021075528 |
| 12.34313018339218 |
| 12.285286613666395 |
| 12.316026916036957 |
| 12.309533196497519 |
| 12.352071021075528 |
| 12.316026916036957 |
| 12.316026916036957 |
| 12.309533196497519 |
| 12.316026916036957 |
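One way to sanity-check this output is to invert log2 on the scaled values. The Python sketch below does that for the ten values in the result. The original nr_employed values are not shown in this example, so the recovered numbers are inferred from the output rather than taken from the input table; `recover_inputs` is a hypothetical helper.

```python
import math

# Scaled values copied from the result above.
scaled = [
    12.352071021075528, 12.34313018339218, 12.285286613666395,
    12.316026916036957, 12.309533196497519, 12.352071021075528,
    12.316026916036957, 12.316026916036957, 12.309533196497519,
    12.316026916036957,
]

def recover_inputs(values):
    """Invert log2 scaling: x = 2 ** log2(x)."""
    return [2 ** v for v in values]

recovered = recover_inputs(scaled)
# Applying log2 again should reproduce the scaled values up to
# floating-point error.
roundtrip = [math.log2(x) for x in recovered]
print(all(abs(a - b) < 1e-9 for a, b in zip(scaled, roundtrip)))  # True
print(round(recovered[0], 1))  # 5228.1
```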