This document describes the chi-square goodness-of-fit test component in Designer. The component tests whether the observed counts of each class in a categorical column are consistent with the expected counts. The null hypothesis is that there is no significant difference between the observed and expected counts.
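For reference, the standard Pearson chi-square statistic is sketched below. The example output on this page labels the result "Pearson's chi-square test", but the symbols here (O_i, E_i, n, p_i, k) are notation introduced for illustration and are not taken from the component itself.

```latex
% O_i: observed count of class i;  p_i: configured probability of class i
% n: total number of rows;  k: number of classes
E_i = n \, p_i, \qquad
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}, \qquad
\mathrm{df} = k - 1
```

With two classes this gives df = 1, which agrees with the example output later on this page.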
Configure the component
You can configure the component using one of the following methods.
Method 1: Use the UI
Configure the component parameters on the Designer pipeline page.
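| Parameter | Description |
| --- | --- |
| Input column | The column to test. |
| Class probability | Specifies the class probabilities. The PAI command on this page passes them as 0:0.3,1:0.7 (class 0 with probability 0.3, class 1 with probability 0.7). |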
Method 2: Use a PAI command
Configure the component parameters by using a PAI command. You can run PAI commands using the SQL script component. For more information, see SQL script.
PAI -name chisq_test
-project algo_public
-DinputTableName=pai_chisq_test_input
-DcolName=f0
-DprobConfig=0:0.3,1:0.7
-DoutputTableName=pai_chisq_test_output0
-DoutputDetailTableName=pai_chisq_test_output0_detail
| Parameter | Required | Description | Default |
| --- | --- | --- | --- |
| inputTableName | Yes | The name of the input table. | None |
| colName | Yes | The column to analyze. | None |
| outputTableName | Yes | The name of the output table. | None |
| outputDetailTableName | Yes | The name of the output details table. | None |
| inputTablePartitions | No | The input table partitions to test. Note: Separate multiple partitions with a comma (,). | Empty |
| probConfig | No | Specifies the class probabilities, for example 0:0.3,1:0.7 (class 0 with probability 0.3, class 1 with probability 0.7). | If omitted, all classes are assumed to have equal probability. |
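The following is a minimal Python sketch of how a probConfig value could be interpreted. It assumes the comma-separated class:probability layout seen in the example value 0:0.3,1:0.7. The function name parse_prob_config is hypothetical and is not part of PAI. The equal-probability fallback mirrors the documented default.

```python
def parse_prob_config(prob_config, classes):
    """Map each class label to its probability.

    prob_config: string such as "0:0.3,1:0.7" (layout assumed from the
                 example on this page), or None/"" when the parameter is omitted.
    classes:     the distinct class labels found in the tested column.
    """
    if not prob_config:
        # Documented default: all classes are assumed equally probable.
        return {label: 1.0 / len(classes) for label in classes}
    probs = {}
    for pair in prob_config.split(","):
        label, prob = pair.split(":")
        probs[label.strip()] = float(prob)
    return probs


print(parse_prob_config("0:0.3,1:0.7", ["0", "1"]))  # {'0': 0.3, '1': 0.7}
print(parse_prob_config(None, ["0", "1"]))           # {'0': 0.5, '1': 0.5}
```

The sketch does not validate that the probabilities sum to 1. Whether the component enforces this is not stated here.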
Example
- Test data
  create table pai_chisq_test_input as
  select * from (
    select '1' as f0, '2' as f1
    union all
    select '1' as f0, '3' as f1
    union all
    select '1' as f0, '4' as f1
    union all
    select '0' as f0, '3' as f1
    union all
    select '0' as f0, '4' as f1
  ) tmp;
- PAI command
  PAI -name chisq_test
  -project algo_public
  -DinputTableName=pai_chisq_test_input
  -DcolName=f0
  -DprobConfig=0:0.3,1:0.7
  -DoutputTableName=pai_chisq_test_output0
  -DoutputDetailTableName=pai_chisq_test_output0_detail
- Output
  - The outputTableName table is a single-row, single-column table in JSON format.
    {
      "Chi-Square": {
        "comment": "Pearson's chi-square test",
        "df": 1,
        "p-value": 0.75,
        "value": 0.2380952380952381
      }
    }
  - The outputDetailTableName table contains the following fields.
    | Column name | Description |
    | --- | --- |
    | The column specified by colName. | Class |
    | observed | Observed frequency |
    | expected | Expected frequency |
    | residuals | Standard residuals (residuals = (observed - expected) / sqrt(expected)) |
  - Example output
    | f0 | observed | expected | residuals |
    | --- | --- | --- | --- |
    | 0 | 2.0 | 1.5 | 0.408248290463863 |
    | 1 | 3.0 | 3.5 | -0.267261241912424 |
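The example numbers can be checked by hand. The Python sketch below recomputes the observed, expected, and residuals columns and the chi-square value from the f0 column of the test data. It assumes that expected = (row count) x (class probability) and uses the residual formula given above. The function name chisq_goodness_of_fit is hypothetical and is not the component's implementation. The p-value is not computed here.

```python
from collections import Counter
from math import sqrt


def chisq_goodness_of_fit(values, probs):
    """Return (statistic, df, detail_rows) for a goodness-of-fit check.

    values: list of class labels from the tested column.
    probs:  dict mapping each class label to its expected probability.
    """
    counts = Counter(values)
    n = len(values)
    detail = []
    statistic = 0.0
    for label in sorted(probs):
        observed = float(counts.get(label, 0))
        expected = n * probs[label]
        # residuals = (observed - expected) / sqrt(expected)
        residual = (observed - expected) / sqrt(expected)
        # The statistic is the sum of squared residuals.
        statistic += residual ** 2
        detail.append((label, observed, expected, residual))
    # Degrees of freedom: number of classes minus one.
    return statistic, len(probs) - 1, detail


# f0 column of pai_chisq_test_input: three rows of '1', two rows of '0'.
f0 = ["1", "1", "1", "0", "0"]
statistic, df, detail = chisq_goodness_of_fit(f0, {"0": 0.3, "1": 0.7})

print(round(statistic, 6), df)
for label, observed, expected, residual in detail:
    print(label, observed, round(expected, 6), round(residual, 6))
```

Rounded to six decimals, the results match the value, df, and residuals fields shown in the example output.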