The Data Pivoting component provided by Machine Learning Designer allows you to view the distributions of feature values, feature columns, and label columns, which facilitates follow-up data analysis. The component supports both sparse and dense data formats. This topic describes how to configure the component and provides an example of how to use it.

Configure the component

You can use one of the following methods to configure the Data Pivoting component.

Method 1: Configure the component on the pipeline page

You can configure the parameters of the Data Pivoting component on the pipeline page of Machine Learning Designer of Machine Learning Platform for AI (PAI). Machine Learning Designer was formerly known as Machine Learning Studio. The following table describes the parameters.
Tab | Parameter | Description
Fields Setting | Feature Columns | The columns that represent the features of the training samples.
Fields Setting | Target Column | The label column of the training samples.
Fields Setting | Enumeration Features | The features that you want to treat as enumeration features.
Fields Setting | Sparse Format (K:V,K:V) | Specifies whether the input data is in the sparse key-value format.
Parameters Setting | Continuous Feature Discretization Intervals | The maximum number of intervals for the equal-width division of continuous features.
Tuning | Cores | The number of cores used in computing. The value must be a positive integer.
Tuning | Memory Size per Core | The memory size of each core. Valid values: 1 to 65536. Unit: MB.
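To make the equal-width division concrete, the following Python sketch splits the range between the minimum and maximum values of a continuous column into at most max_bins intervals of equal width and counts the records in each interval. The function name equal_width_counts and the rule that the maximum value is placed in the last interval are assumptions for illustration; the component's exact boundary handling may differ.

```python
from typing import List, Sequence


def equal_width_counts(values: Sequence[float], max_bins: int) -> List[int]:
    """Count records per interval after splitting [min, max] into equal-width bins.

    Illustrative sketch only: boundary handling is an assumption, not the
    documented behavior of the Data Pivoting component.
    """
    if max_bins < 1:
        raise ValueError("max_bins must be a positive integer")
    if not values:
        return []
    lo, hi = min(values), max(values)
    if lo == hi:
        # All values are identical, so a single interval holds every record.
        return [len(values)]
    width = (hi - lo) / max_bins
    counts = [0] * max_bins
    for v in values:
        # The maximum value lies on the right edge; clamp it into the last interval.
        idx = min(int((v - lo) / width), max_bins - 1)
        counts[idx] += 1
    return counts


if __name__ == "__main__":
    # age values from the sample data in the Examples section
    ages = [39, 50, 38, 53, 28, 37, 49, 52, 31, 42]
    print(equal_width_counts(ages, 5))
```

With the age values from the example input data and max_bins set to 5, the range 28 to 53 is split into five intervals of width 5, and the sketch reports the counts [2, 1, 3, 0, 4].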

Method 2: Use PAI commands

Configure the component parameters by using PAI commands. You can use the SQL Script component to call PAI commands. For more information, see SQL Script.
PAI
-name fe_meta_runner
-project algo_public
-DinputTable="pai_dense_10_10"
-DoutputTable="pai_temp_2263_20384_1"
-DmapTable="pai_temp_2263_20384_2"
-DselectedCols="pdays,previous,emp_var_rate,cons_price_idx,cons_conf_idx,euribor3m,nr_employed,age,campaign,poutcome"
-DlabelCol="y"
-DcategoryCols="previous"
-Dlifecycle="28"
-DmaxBins="5";
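If you generate the command from a script, assembling it programmatically helps avoid formatting slips such as a missing space between two -D options. The following Python sketch is a hypothetical helper (build_pai_command is not part of PAI) that checks for the parameters marked as required in the table below and emits one -D<name>="<value>" option per line.

```python
from typing import Dict

# Parameters that the parameter table marks as required.
REQUIRED_PARAMS = ("inputTable", "outputTable", "mapTable", "selectedCols")


def build_pai_command(params: Dict[str, str]) -> str:
    """Hypothetical helper: assemble a fe_meta_runner command from -D parameters."""
    missing = [name for name in REQUIRED_PARAMS if name not in params]
    if missing:
        raise ValueError("missing required parameters: " + ", ".join(missing))
    lines = ["PAI", "-name fe_meta_runner", "-project algo_public"]
    # One -D<name>="<value>" option per line, so every option is whitespace-separated.
    lines += [f'-D{name}="{value}"' for name, value in params.items()]
    return "\n".join(lines) + ";"


if __name__ == "__main__":
    print(build_pai_command({
        "inputTable": "pai_dense_10_10",
        "outputTable": "pai_temp_2263_20384_1",
        "mapTable": "pai_temp_2263_20384_2",
        "selectedCols": "pdays,previous,emp_var_rate",
        "labelCol": "y",
        "categoryCols": "previous",
        "lifecycle": "28",
        "maxBins": "5",
    }))
```

The printed text can then be pasted into the SQL Script component. The parameters that the command accepts are described in the following table.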
Parameter | Required | Description | Default value
inputTable | Yes | The name of the input table. | None
inputTablePartitions | No | The partitions that are selected from the input table for training. Valid formats: Partition_name=value for a single-level partition, or name1=value1/name2=value2 for multi-level partitions. If you specify multiple partitions, separate them with commas (,). | None
outputTable | Yes | The name of the output table. | None
mapTable | Yes | The name of the output mapping table. The Data Pivoting component maps STRING-type data to INT-type data that PAI can use for training. | None
selectedCols | Yes | The columns that are selected from the input table. | None
labelCol | No | The label column. | None
categoryCols | No | The INT- or DOUBLE-type columns that you want to use as enumeration features. | None
maxBins | No | The maximum number of intervals for the equal-width division of continuous features. | 100
isSparse | No | Specifies whether the input data is sparse. Valid values: true and false. | false
itemSpliter | No | The delimiter that separates key-value pairs when the input data is in the sparse format. | ,
kvSpliter | No | The delimiter that separates keys from values when the input data is in the sparse format. | :
lifecycle | No | The lifecycle of the output table. | 28
coreNum | No | The number of cores used in computing. The value must be a positive integer. Valid values: 1 to 9999. | Determined by the system
memSizePerCore | No | The memory size of each core. Valid values: 1 to 65536. Unit: MB. | Determined by the system
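The itemSpliter and kvSpliter parameters describe how a sparse cell is split: first into key-value pairs, then each pair into a key and a value. The following Python sketch parses one such cell under those rules, using the documented defaults (, and :). The function name parse_sparse_row, keeping keys as strings, and converting values to float are assumptions for illustration.

```python
from typing import Dict


def parse_sparse_row(text: str, item_spliter: str = ",", kv_spliter: str = ":") -> Dict[str, float]:
    """Parse one sparse-format cell such as "1:0.5,3:2" into {key: value}."""
    features: Dict[str, float] = {}
    for item in text.split(item_spliter):
        item = item.strip()
        if not item:
            continue  # tolerate empty items, for example a trailing delimiter
        # Split each key-value pair at the first kv_spliter.
        key, sep, value = item.partition(kv_spliter)
        if not sep:
            raise ValueError(f"missing {kv_spliter!r} in item {item!r}")
        features[key] = float(value)
    return features


if __name__ == "__main__":
    print(parse_sparse_row("1:0.5,3:2"))
    # Custom delimiters, as if itemSpliter=";" and kvSpliter="=" were set.
    print(parse_sparse_row("a=1;b=2", item_spliter=";", kv_spliter="="))
```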

Examples

  • Input data
    age | workclass | fwlght | edu | edu_num | married | c | family | race | sex | gail | loss | work_year | country | income
    39 | State-gov | 77516 | Bachelors | 13 | Never-married | Adm-clerical | Not-in-family | White | Male | 2174.0 | 0.0 | 40.0 | United-States | <=50K
    50 | Self-emp-not-inc | 83311 | Bachelors | 13 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0.0 | 0.0 | 13.0 | United-States | <=50K
    38 | Private | 215646 | HS-grad | 9 | Divorced | Handlers-cleaners | Not-in-family | White | Male | 0.0 | 0.0 | 40.0 | United-States | <=50K
    53 | Private | 234721 | 11th | 7 | Married-civ-spouse | Handlers-cleaners | Husband | Black | Male | 0.0 | 0.0 | 40.0 | United-States | <=50K
    28 | Private | 338409 | Bachelors | 13 | Married-civ-spouse | Prof-specialty | Wife | Black | Female | 0.0 | 0.0 | 40.0 | Other | <=50K
    37 | Private | 284582 | Masters | 14 | Married-civ-spouse | Exec-managerial | Wife | White | Female | 0.0 | 0.0 | 40.0 | United-States | <=50K
    49 | Private | 160187 | 9th | 5 | Married-spouse-absent | Other-service | Not-in-family | Black | Female | 0.0 | 0.0 | 16.0 | Jamaica | <=50K
    52 | Self-emp-not-inc | 209642 | HS-grad | 9 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 0.0 | 0.0 | 45.0 | United-States | >50K
    31 | Private | 45781 | Masters | 14 | Never-married | Prof-specialty | Not-in-family | White | Female | 14084.0 | 0.0 | 50.0 | United-States | >50K
    42 | Private | 159449 | Bachelors | 13 | Married-civ-spouse | Exec-managerial | Husband | White | Male | 5178.0 | 0.0 | 40.0 | United-States | >50K
  • Modeling
    Click the Data Pivoting component and then click the Fields Setting tab. Set the Target Column parameter to income and select the other 14 columns for the Feature Columns parameter. The BIGINT-type values in the edu_num column are used as enumeration values, so edu_num is specified for the Enumeration Features parameter.
  • Result
    • Right-click Data Pivoting and choose View Data > Output Port. The STRING-type values in the family, race, sex, and income columns are converted into numeric values that PAI can use for training, similar to a data format conversion.
    • Right-click Data Pivoting and choose View Data > String Column Feature Mapping Table.
      Note If you do not specify STRING-type columns for the Feature Columns parameter, the String Column Feature Mapping Table output is empty.
    • Right-click Data Pivoting and choose View Data > Output Meta Table. In the output meta table, distribute_info indicates the number of records in each interval, where the range between the minimum value and the maximum value is divided into intervals of equal width.
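The mapping table pairs each distinct STRING value with an INT code. The following Python sketch shows one way such a mapping could be built, assigning codes in first-seen order starting from 0. The function name map_string_column, the ordering, and the starting value are assumptions for illustration; the codes that the component actually writes to the mapping table may differ.

```python
from typing import Dict, List, Sequence, Tuple


def map_string_column(values: Sequence[str]) -> Tuple[List[int], Dict[str, int]]:
    """Encode STRING values as INT codes; return the codes and the value-to-code mapping."""
    mapping: Dict[str, int] = {}
    codes: List[int] = []
    for v in values:
        if v not in mapping:
            # Assumed scheme: the next unused code, in first-seen order.
            mapping[v] = len(mapping)
        codes.append(mapping[v])
    return codes, mapping


if __name__ == "__main__":
    # income column from the sample data
    income = ["<=50K"] * 7 + [">50K"] * 3
    codes, mapping = map_string_column(income)
    print(codes)
    print(mapping)
```

For the income column of the sample data, this sketch maps <=50K to 0 and >50K to 1, so the seven lower-income rows become 0 and the last three rows become 1.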