Platform For AI: Covariance

Last Updated: Nov 28, 2024

Covariance is a statistic that measures the linear relationship between two random variables. It describes how the two variables vary jointly and is calculated as the expected value of the product of their deviations from their respective expected values. Covariance is a basic concept in probability theory and statistics, and is widely used in machine learning for tasks such as feature selection and data preprocessing.

Algorithm description

Definition

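Covariance is defined as the expected value of the product of the deviations of two random variables from their expected values. Formula:

$$\operatorname{cov}(X, Y) = E\big[(X - \mu)(Y - \nu)\big]$$

In the formula: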

  • X and Y are two random variables.

  • μ and ν are the expected values of X and Y, respectively.

  • E is the expectation operation.
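The definition can be checked numerically. The following is a minimal sketch in Python with NumPy, not the component's implementation. It assumes the population form of the estimate (dividing by n); this document does not state whether the component divides by n or by n - 1. The function name `covariance` and the sample data are illustrative only.

```python
import numpy as np

def covariance(x, y):
    """Estimate E[(X - mu)(Y - nu)] from paired samples (divides by n)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mu, nu = x.mean(), y.mean()          # estimates of the expected values
    return float(np.mean((x - mu) * (y - nu)))

# Illustrative data: y is exactly 2 * x
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
cov_xy = covariance(x, y)
print(cov_xy)
```

With the same data, `np.cov(x, y, bias=True)[0, 1]` returns the same value, whereas the default `np.cov(x, y)` divides by n - 1 and returns a slightly larger one.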

Properties

  • Positive covariance: Indicates that the two variables have a positive correlation, meaning that when one variable increases, the other variable also tends to increase.

  • Negative covariance: Indicates that the two variables have a negative correlation, meaning that when one variable increases, the other variable tends to decrease.

  • Zero covariance: Indicates that the two variables have no linear relationship.
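The three cases above can be illustrated with a small sketch, assuming NumPy and made-up data. The last case shows that zero covariance rules out only a linear relationship: y is fully determined by x, yet the covariance is zero.

```python
import numpy as np

def cov(a, b):
    """Population covariance of two equal-length arrays."""
    return float(np.mean((a - a.mean()) * (b - b.mean())))

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])

cov_pos = cov(x, 3 * x + 1)    # y increases as x increases
cov_neg = cov(x, -3 * x + 1)   # y decreases as x increases
cov_zero = cov(x, x ** 2)      # y depends on x, but not linearly

print(cov_pos, cov_neg, cov_zero)
```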

Configure the component

Method 1: Configure the component on the pipeline page

Add a Covariance component on the pipeline page and configure the following parameters:

| Category | Parameter | Description |
| --- | --- | --- |
| Fields Setting | Input Columns | The input columns. You can select only BIGINT- or DOUBLE-type columns. |
| Tuning | Cores | The number of cores used in computing. If you do not specify this parameter, the system automatically allocates the number of cores. |
| Tuning | Memory Size | The memory size of each core. If you do not specify this parameter, the system automatically allocates the memory size. Unit: MB. |

Method 2: Use PAI commands

Configure the component parameters by using PAI commands. You can use the SQL Script component to call PAI commands. For more information, see SQL Script.

PAI -name cov
    -project algo_public
    -DinputTableName=maple_test_cov_basic12x10_input
    -DoutputTableName=maple_test_cov_basic12x10_output
    -DcoreNum=6
    -DmemSizePerCore=110;

| Parameter | Required | Default value | Description |
| --- | --- | --- | --- |
| inputTableName | Yes | None | The name of the input table. |
| inputTablePartitions | No | All partitions of the input table | The partitions selected from the input table for the computation. The following formats are supported: partition_name=value, and name1=value1/name2=value2 for multi-level partitions. Note: If you specify multiple partitions, separate them with commas (,). For example, name1=value1,value2. |
| outputTableName | Yes | None | The name of the output table. |
| selectedColNames | No | All columns | The columns selected from the input table. |
| lifecycle | No | None | The lifecycle of the output table. |
| coreNum | No | Determined by the system | The number of cores used in computing. The value must be a positive integer. Valid values: 1 to 9999. |
| memSizePerCore | No | Determined by the system | The memory size of each core. Valid values: 1 to 65536. Unit: MB. |
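If the command is generated from a script, the required parameters and valid ranges listed above can be checked before submission. The following is a minimal sketch only: `build_cov_command` is a hypothetical helper, not part of any PAI SDK, and it only assembles the command text. Optional values such as `selected_col_names` and `lifecycle` are passed through unchanged, because their exact formats are not specified here.

```python
def build_cov_command(input_table, output_table, selected_col_names=None,
                      lifecycle=None, core_num=None, mem_size_per_core=None):
    """Assemble a PAI cov command string (hypothetical helper)."""
    if not input_table or not output_table:
        raise ValueError("inputTableName and outputTableName are required")
    # Ranges taken from the parameter descriptions above
    if core_num is not None and not (isinstance(core_num, int) and 1 <= core_num <= 9999):
        raise ValueError("coreNum must be an integer from 1 to 9999")
    if mem_size_per_core is not None and not (1 <= mem_size_per_core <= 65536):
        raise ValueError("memSizePerCore must be from 1 to 65536 (MB)")

    parts = [
        "PAI -name cov",
        "-project algo_public",
        f"-DinputTableName={input_table}",
        f"-DoutputTableName={output_table}",
    ]
    # Optional parameters are added only when explicitly set
    optional = {
        "selectedColNames": selected_col_names,
        "lifecycle": lifecycle,
        "coreNum": core_num,
        "memSizePerCore": mem_size_per_core,
    }
    for name, value in optional.items():
        if value is not None:
            parts.append(f"-D{name}={value}")
    return "\n    ".join(parts) + ";"

cmd = build_cov_command(
    "maple_test_cov_basic12x10_input",
    "maple_test_cov_basic12x10_output",
    core_num=6,
    mem_size_per_core=110,
)
print(cmd)
```

The resulting string matches the sample command shown earlier and can be pasted into an SQL Script component.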