The Data Pivoting component of Machine Learning Studio lets you view the distributions of feature values, feature columns, and label columns to support subsequent data analysis. The component supports both sparse and dense data formats. This topic describes how to configure the component and provides an example of how to use it.

Configure the component

You can configure the component by using one of the following methods:

  • Machine Learning Platform for AI (PAI) console
    Tab | Parameter | Description
    Fields Setting | Feature Columns | The columns that indicate sample data features.
    Fields Setting | Target Column | The column that you want to use for training.
    Fields Setting | Enumeration Features | The features that you want to use as enumerated features.
    Fields Setting | Sparse Format (K:V,K:V) | Specifies whether data in the sparse format is used.
    Parameters Setting | Continuous Feature Discretization Intervals | The maximum number of intervals for the equal-distance division of continuous features.
    Tuning | Cores | The number of cores that you want to use for computing. The value of this parameter must be a positive integer.
    Tuning | Memory Size per Core | The memory size of each core. Valid values: 1 to 65536. Unit: MB.
  • PAI command
    PAI
    -name fe_meta_runner
    -project algo_public
    -DinputTable="pai_dense_10_10"
    -DoutputTable="pai_temp_2263_20384_1"
    -DmapTable="pai_temp_2263_20384_2"
    -DselectedCols="pdays,previous,emp_var_rate,cons_price_idx,cons_conf_idx,euribor3m,nr_employed,age,campaign,poutcome"
    -DlabelCol="y"
    -DcategoryCols="previous"
    -Dlifecycle="28"
    -DmaxBins="5";
    Parameter | Required | Description | Default value
    inputTable | Yes | The name of the input table. | No default value
    inputTablePartitions | Yes | The partitions that you want to select from the input table for training. Specify this parameter in one of the following formats: Partition_name=value, or, for multi-level partitions, name1=value1/name2=value2. Note: If you specify multiple partitions, separate them with commas (,). | No default value
    outputTable | Yes | The name of the output table. | No default value
    mapTable | Yes | The output mapping table. The Data Pivoting component maps STRING-type data to INT-type data for PAI to use for training. | No default value
    selectedCols | Yes | The columns that you want to select from the input table. | No default value
    categoryCols | No | The INT- or DOUBLE-type columns that you want to use as enumerated features. | No default value
    maxBins | No | The maximum number of intervals for the equal-distance division of continuous features. | 100
    isSparse | No | Specifies whether the input data is in the sparse format. Valid values: true and false. | false
    itemSpliter | No | The delimiter that you want to use to separate key-value pairs when data in the input table is in the sparse format. | ,
    kvSpliter | No | The delimiter that you want to use to separate keys and values when data in the input table is in the sparse format. | :
    lifecycle | No | The lifecycle of the output table. | 28
    coreNum | No | The number of cores that you want to use for computing. The value of this parameter must be a positive integer. Valid values: 1 to 9999. | Automatically allocated
    memSizePerCore | No | The memory size of each core. Valid values: 1 to 65536. Unit: MB. | Automatically allocated
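
To make the sparse-format and discretization parameters more concrete, the following Python sketch illustrates how a K:V,K:V cell could be split by using itemSpliter and kvSpliter, and how equal-distance division into at most maxBins intervals could work. This is an illustration only, not the component's implementation: the function names parse_sparse and equal_width_bins are hypothetical, and the sketch assumes that intervals are computed over the observed minimum and maximum of a column, which this topic does not confirm.

```python
from typing import Dict, List


def parse_sparse(cell: str, item_spliter: str = ",", kv_spliter: str = ":") -> Dict[str, float]:
    """Parse one sparse-format cell such as '1:0.5,3:2' into {key: value}.

    Hypothetical helper: the defaults mirror the itemSpliter and kvSpliter
    defaults listed in the parameter table.
    """
    pairs: Dict[str, float] = {}
    for item in cell.split(item_spliter):
        item = item.strip()
        if not item:
            continue  # skip empty fragments, for example after a trailing delimiter
        key, value = item.split(kv_spliter, 1)
        pairs[key.strip()] = float(value)
    return pairs


def equal_width_bins(values: List[float], max_bins: int = 100) -> List[int]:
    """Assign each value to one of at most max_bins equal-width intervals.

    Assumption: the range [min, max] of the observed values is divided into
    max_bins intervals of equal width. The maximum value is placed in the
    last interval.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # a constant column falls into a single interval
    width = (hi - lo) / max_bins
    return [min(int((v - lo) / width), max_bins - 1) for v in values]


if __name__ == "__main__":
    print(parse_sparse("1:0.5,3:2"))
    print(equal_width_bins([0, 1, 2, 3, 4, 5], max_bins=5))
```

In this sketch, a smaller maxBins value, such as the 5 used in the sample command, produces a coarser view of a continuous feature, while the default of 100 keeps more detail.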