This topic describes the Covariance component provided by Machine Learning Designer (formerly known as Machine Learning Studio).

In probability theory and statistics, covariance measures the joint variability of two random variables. Variance is the special case of covariance in which the two variables are the same. For real-valued random variables X and Y with expected values E(X) = μ and E(Y) = ν, the covariance is defined as: cov(X, Y) = E[(X - μ)(Y - ν)].
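
The definition above can be illustrated with a minimal Python sketch. It is not the component's implementation: the function name `covariance` and the sample data are hypothetical, and the sketch uses the population form (dividing by n). This topic does not state whether the component divides by n or by n - 1.

```python
from typing import Sequence


def covariance(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Population covariance: the mean of (x - mu) * (y - nu)."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(xs)
    mu = sum(xs) / n  # E(X)
    nu = sum(ys) / n  # E(Y)
    return sum((x - mu) * (y - nu) for x, y in zip(xs, ys)) / n


# Hypothetical sample data: y is exactly twice x.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]

cov_xy = covariance(x, y)  # covariance of two different variables
var_x = covariance(x, x)   # variance is the covariance of x with itself
print(cov_xy, var_x)
```

Calling `covariance(x, x)` shows the special case mentioned above: the covariance of a variable with itself is its variance.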

Configure the component

You can use one of the following methods to configure the Covariance component.

Method 1: Configure the component on the pipeline page

You can configure the parameters of the Covariance component on the pipeline page of Machine Learning Designer in Machine Learning Platform for AI (PAI). The following table describes the parameters.
| Tab | Parameter | Description |
| --- | --- | --- |
| Fields Setting | Input Columns | The input columns. You can select only columns of the BIGINT or DOUBLE type. |
| Tuning | Cores | The number of cores used in computing. If you do not specify this parameter, the system automatically allocates the number of cores. |
| Tuning | Memory Size | The memory size of each core. Unit: MB. If you do not specify this parameter, the system automatically allocates the memory size. |

Method 2: Use PAI commands

Configure the component parameters by using PAI commands. You can use the SQL Script component to call PAI commands. For more information, see SQL Script.
PAI -name cov
    -project algo_public
    -DinputTableName=maple_test_cov_basic12x10_input
    -DoutputTableName=maple_test_cov_basic12x10_output
    -DcoreNum=6
    -DmemSizePerCore=110;
| Parameter | Required | Description | Default value |
| --- | --- | --- | --- |
| inputTableName | Yes | The name of the input table. | No default value |
| inputTablePartitions | No | The partitions that are selected from the input table for training. The following formats are supported: Partition_name=value; name1=value1/name2=value2 (multi-level partitions). Note: If you specify multiple partitions, separate them with commas (,). | All partitions of the input table |
| outputTableName | Yes | The name of the output table. | No default value |
| selectedColNames | No | The columns selected from the input table. | All columns |
| lifecycle | No | The lifecycle of the output table. | No default value |
| coreNum | No | The number of cores used in computing. The value must be a positive integer. Valid values: 1 to 9999. | Determined by the system |
| memSizePerCore | No | The memory size of each core. Valid values: 1 to 65536. Unit: MB. | Determined by the system |
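
If you generate the PAI command from a script, the documented value ranges can be checked before the command is submitted. The following Python sketch is one possible way to do that. The helper `build_cov_command` is hypothetical and not part of PAI. It only assembles the command text shown earlier in this topic and does not submit it.

```python
from typing import Optional


def build_cov_command(
    input_table: str,
    output_table: str,
    core_num: Optional[int] = None,
    mem_size_per_core: Optional[int] = None,
) -> str:
    """Assemble a `PAI -name cov` command string (hypothetical helper)."""
    # Enforce the valid ranges listed in the parameter table.
    if core_num is not None and not 1 <= core_num <= 9999:
        raise ValueError("coreNum must be an integer from 1 to 9999")
    if mem_size_per_core is not None and not 1 <= mem_size_per_core <= 65536:
        raise ValueError("memSizePerCore must be from 1 to 65536 (MB)")

    parts = [
        "PAI -name cov",
        "-project algo_public",
        f"-DinputTableName={input_table}",
        f"-DoutputTableName={output_table}",
    ]
    # Optional parameters are omitted so that the system decides the values.
    if core_num is not None:
        parts.append(f"-DcoreNum={core_num}")
    if mem_size_per_core is not None:
        parts.append(f"-DmemSizePerCore={mem_size_per_core}")
    return "\n    ".join(parts) + ";"


# Rebuilds the sample command from this topic.
command = build_cov_command(
    "maple_test_cov_basic12x10_input",
    "maple_test_cov_basic12x10_output",
    core_num=6,
    mem_size_per_core=110,
)
print(command)
```

Leaving `core_num` and `mem_size_per_core` unset drops both options from the command, which matches the "Determined by the system" defaults in the table.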