Principal component analysis (PCA) is a multivariate statistical method that describes the internal structure of a set of variables, and the correlations among them, in terms of a few principal components. PCA derives from the original variables a small number of principal components that are uncorrelated with each other. These principal components retain as much information about the original variables as possible and serve as new composite metrics.
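
The idea described above can be sketched in a few lines of NumPy. This is a minimal illustration of textbook PCA on the correlation matrix, not the implementation of the PAI component. The function name pca_by_contribution and its contri_rate argument are hypothetical names chosen for this example.

```python
import numpy as np


def pca_by_contribution(X, contri_rate=0.9):
    """Textbook PCA on the correlation matrix (illustrative sketch only).

    Keeps the fewest components whose cumulative share of the total
    variance reaches contri_rate, and returns
    (scores, eigenvalues, eigenvectors) for the kept components.
    """
    X = np.asarray(X, dtype=float)
    # Standardize each column so that the covariance of Z equals
    # the correlation matrix of X.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    corr = np.corrcoef(X, rowvar=False)
    # eigh is intended for symmetric matrices and returns the
    # eigenvalues in ascending order, so reorder them to descending.
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Cumulative share of variance explained by the leading components.
    share = np.cumsum(eigvals) / eigvals.sum()
    k = min(int(np.searchsorted(share, contri_rate)) + 1, len(eigvals))
    # Project the standardized data onto the k leading eigenvectors.
    return Z @ eigvecs[:, :k], eigvals[:k], eigvecs[:, :k]
```

Calling pca_by_contribution(X, 0.9) on a numeric matrix X with one row per observation returns score columns that are mutually uncorrelated, with variances equal to the returned eigenvalues.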

Limits

The Principal Component Analysis (PCA) component supports only data in the dense format. You can use the component for dimensionality reduction and noise reduction.

Configure the component

You can use one of the following methods to configure the Principal Component Analysis (PCA) component.

Method 1: Configure the component on the pipeline page

You can configure the parameters of the Principal Component Analysis (PCA) component on the pipeline page of Machine Learning Designer of Machine Learning Platform for AI (PAI). Machine Learning Designer was formerly known as Machine Learning Studio. The following table describes the parameters.

| Tab | Parameter | Description |
| --- | --- | --- |
| Fields Setting | Feature Columns | The columns that are selected from the input table for analysis. |
| Fields Setting | Appended Columns | The columns that are appended to the table after dimensionality reduction. |
| Parameters Setting | Data Size Ratio | The proportion of information that is retained after dimensionality reduction. |
| Parameters Setting | Feature Decomposition Mode | The method that is used to decompose features. Valid values: CORR, COVAR_SAMP, and COVAR_POP. |
| Parameters Setting | Data Conversion Method | The method that is used to transform the original table to a PCA table. Valid values: Simple, Sub-Mean, and Normalization. |
| Tuning | Lifecycle | The lifecycle of the output table. The value must be a positive integer. |
| Tuning | Cores | The number of cores. This parameter is used with the Memory Size per Node (Unit: MB) parameter. The value must be a positive integer. Valid values: [1,9999]. |
| Tuning | Memory Size per Node (Unit: MB) | The memory size of each core. Unit: MB. The value must be a positive integer. Valid values: [1024,64 × 1024]. |
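
The valid values of Feature Decomposition Mode and Data Conversion Method are listed above without definitions. The sketch below shows one plausible reading and is an assumption, not documented behavior of the component: CORR is taken to mean the correlation matrix, COVAR_SAMP and COVAR_POP the sample and population covariance matrices (these are the usual SQL aggregate names), Simple no transformation, Sub-Mean mean centering, and Normalization centering plus scaling to unit sample standard deviation. The function names are hypothetical.

```python
import numpy as np


def decomposition_matrix(X, calcu_type="CORR"):
    """Matrix to eigendecompose under the ASSUMED meaning of each mode."""
    X = np.asarray(X, dtype=float)
    if calcu_type == "CORR":
        return np.corrcoef(X, rowvar=False)      # correlation matrix
    if calcu_type == "COVAR_SAMP":
        return np.cov(X, rowvar=False, ddof=1)   # divides by n - 1
    if calcu_type == "COVAR_POP":
        return np.cov(X, rowvar=False, ddof=0)   # divides by n
    raise ValueError(f"unknown calcu_type: {calcu_type!r}")


def transform(X, trans_type="Simple"):
    """Column-wise transformation under the ASSUMED meaning of each method."""
    X = np.asarray(X, dtype=float)
    if trans_type == "Simple":
        return X                                 # data left unchanged
    if trans_type == "Sub-Mean":
        return X - X.mean(axis=0)                # subtract column means
    if trans_type == "Normalization":
        # Center, then scale by the sample standard deviation (assumed ddof=1).
        return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    raise ValueError(f"unknown trans_type: {trans_type!r}")
```

Under this reading, the two covariance modes differ only by the factor (n - 1)/n, so they yield the same eigenvectors and differ only in the scale of the eigenvalues.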

Method 2: Use PAI commands

Configure the component parameters by using PAI commands. You can use the SQL Script component to call PAI commands. For more information, see SQL Script.
PAI -name PrinCompAnalysis
    -project algo_public
    -DinputTableName=bank_data
    -DeigOutputTableName=pai_temp_2032_17900_2
    -DprincompOutputTableName=pai_temp_2032_17900_1
    -DselectedColNames=pdays,previous,emp_var_rate,cons_price_idx,cons_conf_idx,euribor3m,nr_employed
    -DtransType=Simple
    -DcalcuType=CORR
    -DcontriRate=0.9;

| Parameter | Required | Description | Default value |
| --- | --- | --- | --- |
| inputTableName | Yes | The input table that is used for training. | No default value |
| selectedColNames | Yes | The columns that are selected from the input table for analysis. Separate multiple columns with commas (,). Columns of the INT or DOUBLE data type are supported. | No default value |
| eigOutputTableName | Yes | The output table that contains the eigenvectors and eigenvalues. | No default value |
| princompOutputTableName | Yes | The output table that contains the data after dimensionality reduction and noise reduction by principal components. | No default value |
| transType | No | The method that is used to transform the original table to a PCA table. Valid values: Simple, Sub-Mean, and Normalization. | Simple |
| calcuType | No | The method that is used to decompose features of the original table. Valid values: CORR, COVAR_SAMP, and COVAR_POP. | CORR |
| contriRate | No | The proportion of information that is retained after dimensionality reduction. Valid values: (0,1). | 0.9 |
| remainColumns | No | The fields that are retained from the original table after dimensionality reduction. | No default value |
| coreNum | No | The number of cores. This parameter is used with the memSizePerCore parameter. The value must be a positive integer. Valid values: [1,9999]. | Determined by the system |
| memSizePerCore | No | The memory size of each core. Unit: MB. The value must be a positive integer. Valid values: [1024,64 × 1024]. | Determined by the system |
| lifecycle | No | The lifecycle of the output table. The value must be a positive integer. | No default value |
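
When the command is generated from a script, the value ranges listed above can be checked before the command is submitted. The following is a minimal sketch of such a check, not part of PAI: the helper build_pca_command and its argument names are hypothetical, and joining remainColumns with commas is an assumption modeled on selectedColNames. The sketch only builds the command text and does not run it.

```python
VALID_TRANS_TYPES = ("Simple", "Sub-Mean", "Normalization")
VALID_CALCU_TYPES = ("CORR", "COVAR_SAMP", "COVAR_POP")


def _check_int(name, value, low, high=None):
    """Raise ValueError unless value is an int within [low, high]."""
    in_range = isinstance(value, int) and not isinstance(value, bool)
    in_range = in_range and value >= low and (high is None or value <= high)
    if not in_range:
        raise ValueError(f"{name} is out of range: {value!r}")


def build_pca_command(input_table, selected_cols, eig_output, princomp_output,
                      trans_type="Simple", calcu_type="CORR", contri_rate=0.9,
                      remain_columns=None, core_num=None,
                      mem_size_per_core=None, lifecycle=None):
    """Return the PAI command text after validating the documented ranges."""
    if not selected_cols:
        raise ValueError("selectedColNames requires at least one column")
    if trans_type not in VALID_TRANS_TYPES:
        raise ValueError(f"invalid transType: {trans_type!r}")
    if calcu_type not in VALID_CALCU_TYPES:
        raise ValueError(f"invalid calcuType: {calcu_type!r}")
    if not 0 < contri_rate < 1:
        raise ValueError("contriRate must be in the open interval (0,1)")

    args = [
        ("inputTableName", input_table),
        ("eigOutputTableName", eig_output),
        ("princompOutputTableName", princomp_output),
        ("selectedColNames", ",".join(selected_cols)),
        ("transType", trans_type),
        ("calcuType", calcu_type),
        ("contriRate", contri_rate),
    ]
    # Optional parameters are emitted only when the caller sets them.
    if remain_columns:
        # ASSUMPTION: comma-separated, like selectedColNames.
        args.append(("remainColumns", ",".join(remain_columns)))
    if core_num is not None:
        _check_int("coreNum", core_num, 1, 9999)
        args.append(("coreNum", core_num))
    if mem_size_per_core is not None:
        _check_int("memSizePerCore", mem_size_per_core, 1024, 64 * 1024)
        args.append(("memSizePerCore", mem_size_per_core))
    if lifecycle is not None:
        _check_int("lifecycle", lifecycle, 1)
        args.append(("lifecycle", lifecycle))

    lines = ["PAI -name PrinCompAnalysis", "    -project algo_public"]
    lines += [f"    -D{key}={value}" for key, value in args]
    return "\n".join(lines) + ";"
```

For example, build_pca_command("bank_data", ["pdays", "previous"], "eig_out", "pc_out") returns a command in the same layout as the sample above, where eig_out and pc_out are placeholder table names. A contri_rate of 1.0 or a core_num of 0 raises ValueError.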

Example

Sample output tables
  • Data table after dimensionality reduction
  • Table that contains eigenvalues and eigenvectors