A correlation coefficient matrix quantifies and displays the pairwise correlations among multiple variables. Each element of the matrix is the correlation coefficient between the variable of its row and the variable of its column. In most cases, the Pearson correlation coefficient is used, which measures the strength of the linear relationship between two variables. The matrix is widely used in feature selection, data analysis, and model building because it helps identify linear dependencies and multicollinearity among variables.
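For reference, the standard textbook definition of the Pearson correlation coefficient is sketched below. It assumes that the component follows this definition. The symbols (x_i and y_i for the n observations of two columns, R for the matrix, X_j for the j-th column) are introduced here for illustration only.

```latex
r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
              {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad
R_{jk} = r_{X_j X_k}, \quad R_{jj} = 1, \quad R_{jk} = R_{kj}, \quad -1 \le R_{jk} \le 1
```

Under this definition, the matrix is symmetric and has ones on its diagonal. A value close to 1 or -1 indicates a strong linear relationship, and a value close to 0 indicates a weak one.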
Configure the component
Method 1: Configure the component on the pipeline page
On the pipeline details page in Machine Learning Designer, add the Correlation Coefficient Matrix component to the pipeline and configure the parameters described in the following table.
Tab | Parameter | Description |
Fields Setting | All Selected by Default | The feature columns that are used in matrix calculation. By default, all feature columns are selected for correlation analysis. |
Tuning | Cores | This parameter must be used with the Memory Size parameter. |
Tuning | Memory Size | This parameter must be used with the Cores parameter. |
Method 2: Use PAI commands
Configure the component parameters by using PAI commands. You can use the SQL Script component to call PAI commands. For more information, see Scenario 4: Execute PAI commands within the SQL script component.
PAI -name corrcoef
-project algo_public
-DinputTableName=maple_test_corrcoef_basic12x10_input
-DoutputTableName=maple_test_corrcoef_basic12x10_output
-DcoreNum=1
-DmemSizePerCore=110;
Parameter | Required | Default value | Description |
inputTableName | Yes | No default value | The name of the input table. |
inputTablePartitions | No | No default value | The partitions that are selected from the input table for the calculation. Note: If you specify multiple partitions, separate them with commas (,). Example: name1=value1,value2. |
outputTableName | Yes | No default value | The name of the output table. |
selectedColNames | No | All columns | The columns selected from the input table. |
lifecycle | No | No default value | The lifecycle of the output table. |
coreNum | No | Determined by the system | This parameter must be used with the memSizePerCore parameter. The value must be a positive integer. Valid values: 1 to 9999. |
memSizePerCore | No | Determined by the system | The memory size of each core. Unit: MB. The value must be a positive integer in the range of [1024, 64 × 1024]. |
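The parameters in the preceding table can be assembled into a command string programmatically. The following Python sketch is illustrative only: `build_corrcoef_command` is a hypothetical helper, not part of PAI. It passes optional values through verbatim and enforces only the rule that coreNum and memSizePerCore are specified together as positive integers. The valid ranges are listed in the table and are not re-checked here.

```python
def build_corrcoef_command(input_table, output_table, *,
                           partitions=None, selected_col_names=None,
                           lifecycle=None, core_num=None,
                           mem_size_per_core=None):
    """Assemble a PAI corrcoef command string (hypothetical helper)."""
    # coreNum and memSizePerCore must be used together, per the table.
    if (core_num is None) != (mem_size_per_core is None):
        raise ValueError("coreNum and memSizePerCore must be set together")
    for name, value in (("coreNum", core_num),
                        ("memSizePerCore", mem_size_per_core)):
        if value is not None and (not isinstance(value, int) or value < 1):
            raise ValueError(f"{name} must be a positive integer")

    # Required parameters come first.
    parts = [
        "PAI -name corrcoef",
        "-project algo_public",
        f"-DinputTableName={input_table}",
        f"-DoutputTableName={output_table}",
    ]
    # Optional parameters are appended only when a value is provided.
    optional = {
        "inputTablePartitions": partitions,
        "selectedColNames": selected_col_names,
        "lifecycle": lifecycle,
        "coreNum": core_num,
        "memSizePerCore": mem_size_per_core,
    }
    for name, value in optional.items():
        if value is not None:
            parts.append(f"-D{name}={value}")
    return " ".join(parts) + ";"


if __name__ == "__main__":
    print(build_corrcoef_command("my_input_table", "my_output_table",
                                 core_num=1, mem_size_per_core=1024))
```

The table names in the usage line are placeholders. The resulting string can be pasted into the SQL Script component.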
Example
Generate the following test data.
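col0:double | col1:bigint | col2:double | col3:bigint | col4:double | col5:bigint | col6:double | col7:bigint | col8:double | col9:double |
19 | 95 | 33 | 52 | 115 | 43 | 32 | 98 | 76 | 40 |
114 | 26 | 101 | 69 | 56 | 59 | 116 | 23 | 109 | 105 |
103 | 89 | 7 | 9 | 65 | 118 | 73 | 50 | 55 | 81 |
79 | 20 | 63 | 71 | 5 | 24 | 77 | 31 | 21 | 75 |
87 | 16 | 66 | 47 | 25 | 14 | 42 | 99 | 108 | 57 |
11 | 104 | 38 | 37 | 106 | 51 | 3 | 91 | 80 | 97 |
84 | 30 | 70 | 46 | 8 | 6 | 94 | 22 | 45 | 48 |
35 | 17 | 107 | 64 | 10 | 112 | 53 | 34 | 90 | 96 |
13 | 61 | 39 | 1 | 29 | 117 | 112 | 2 | 82 | 28 |
62 | 4 | 102 | 88 | 100 | 36 | 67 | 54 | 12 | 85 |
49 | 27 | 44 | 93 | 68 | 110 | 60 | 72 | 86 | 58 |
92 | 119 | 0 | 113 | 41 | 15 | 74 | 83 | 18 | 111 |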
Run the following PAI commands:
PAI -name corrcoef -project algo_public -DinputTableName=maple_test_corrcoef_basic12x10_input -DoutputTableName=maple_test_corrcoef_basic12x10_output -DcoreNum=1 -DmemSizePerCore=110;
View the returned results.
columnsnames | col0 | col1 | col2 | col3 | col4 | col5 | col6 | col7 | col8 | col9 |
col0 | 1 | -0.2115657251820724 | 0.0598306259706561 | 0.2599903570684693 | -0.3483249188225586 | -0.28716254396809926 | 0.47880162127435116 | -0.13646519484213326 | -0.19500158764680092 | 0.3897390240949085 |
col1 | -0.2115657251820724 | 1 | -0.8444477377898585 | -0.17507636221594533 | 0.40943384150571377 | 0.09135976026101403 | -0.3018506374626574 | 0.40733726912808044 | -0.11827739124590071 | 0.12433851389455183 |
col2 | 0.0598306259706561 | -0.8444477377898585 | 1 | 0.18518346647293102 | -0.20934839228057014 | -0.1896417512389659 | 0.1799377498863213 | -0.3858885676469948 | 0.20254569203773892 | 0.13476160753756655 |
col3 | 0.2599903570684693 | -0.17507636221594533 | 0.18518346647293102 | 1 | 0.03988018649854009 | -0.43737887418329147 | -0.053818296425267184 | 0.2900856441586986 | -0.3607547910075688 | 0.4912019074930449 |
col4 | -0.3483249188225586 | 0.40943384150571377 | -0.20934839228057014 | 0.03988018649854009 | 1 | 0.1465605209246875 | -0.5016030364347955 | 0.5496024325711117 | 0.013743256115394122 | 0.07497231559184887 |
col5 | -0.28716254396809926 | 0.09135976026101403 | -0.1896417512389659 | -0.43737887418329147 | 0.1465605209246875 | 1 | 0.16729809310873522 | -0.29890655828796964 | 0.3618518101014617 | -0.1713960957286885 |
col6 | 0.47880162127435116 | -0.3018506374626574 | 0.1799377498863213 | -0.053818296425267184 | -0.5016030364347955 | 0.16729809310873522 | 1 | -0.8165019880156462 | -0.11173420918721436 | -0.10363860378347944 |
col7 | -0.13646519484213326 | 0.40733726912808044 | -0.3858885676469948 | 0.2900856441586986 | 0.5496024325711117 | -0.29890655828796964 | -0.8165019880156462 | 1 | 0.07435907471544469 | 0.11711976051999162 |
col8 | -0.19500158764680092 | -0.11827739124590071 | 0.20254569203773892 | -0.3607547910075688 | 0.013743256115394122 | 0.3618518101014617 | -0.11173420918721436 | 0.07435907471544469 | 1 | -0.18463012549540175 |
col9 | 0.3897390240949085 | 0.12433851389455183 | 0.13476160753756655 | 0.4912019074930449 | 0.07497231559184887 | -0.1713960957286885 | -0.10363860378347944 | 0.11711976051999162 | -0.18463012549540175 | 1 |
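To sanity-check the returned results outside of PAI, the matrix can be recomputed locally from the test data. The following Python sketch uses only the standard library and assumes that the component computes the standard Pearson coefficient for each pair of columns. The names `ROWS`, `pearson`, and `correlation_matrix` are illustrative and are not part of PAI.

```python
import math

# Test data from the example: 12 rows, columns col0 to col9.
ROWS = [
    [19, 95, 33, 52, 115, 43, 32, 98, 76, 40],
    [114, 26, 101, 69, 56, 59, 116, 23, 109, 105],
    [103, 89, 7, 9, 65, 118, 73, 50, 55, 81],
    [79, 20, 63, 71, 5, 24, 77, 31, 21, 75],
    [87, 16, 66, 47, 25, 14, 42, 99, 108, 57],
    [11, 104, 38, 37, 106, 51, 3, 91, 80, 97],
    [84, 30, 70, 46, 8, 6, 94, 22, 45, 48],
    [35, 17, 107, 64, 10, 112, 53, 34, 90, 96],
    [13, 61, 39, 1, 29, 117, 112, 2, 82, 28],
    [62, 4, 102, 88, 100, 36, 67, 54, 12, 85],
    [49, 27, 44, 93, 68, 110, 60, 72, 86, 58],
    [92, 119, 0, 113, 41, 15, 74, 83, 18, 111],
]


def pearson(x, y):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(rows):
    """Compute the pairwise Pearson correlation matrix of the columns of rows."""
    columns = list(zip(*rows))  # transpose rows into columns
    return [[pearson(a, b) for b in columns] for a in columns]


matrix = correlation_matrix(ROWS)

if __name__ == "__main__":
    # col1 versus col2: about -0.8444, in line with the returned results.
    print(round(matrix[1][2], 4))
```

The col1/col2 entry computed this way agrees with the value in the returned results. The diagonal entries equal 1 up to floating-point rounding, and the matrix is symmetric.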