This topic describes the Chi-square Goodness of Fit Test component provided by Machine Learning Studio.
Configure the component
The Chi-square Goodness of Fit Test component applies to categorical variables. For each class of a single multiclass
categorical variable, it compares the observed frequency with the expected frequency to determine whether the two
differ significantly. The null hypothesis is that the observed frequency and the expected frequency of each class
are the same. You can configure the component by using one of the
following methods:
- Machine Learning Platform for AI (PAI) console
| Parameter | Description |
| --- | --- |
| Input Column | The column on which you want to perform a chi-square test. |
| Class Probability | The class probability configuration. Specify this parameter in the Class:Probability format. The sum of all probabilities must be 1. |
- PAI command
PAI -name chisq_test -project algo_public -DinputTableName=pai_chisq_test_input -DcolName=f0 -DprobConfig=0:0.3,1:0.7 -DoutputTableName=pai_chisq_test_output0 -DoutputDetailTableName=pai_chisq_test_output0_detail
| Parameter | Required | Description | Default value |
| --- | --- | --- | --- |
| inputTableName | Yes | The name of the input table. | No default value |
| colName | Yes | The name of the column on which you want to perform the chi-square test. | No default value |
| outputTableName | Yes | The name of the output table. | No default value |
| outputDetailTableName | Yes | The name of the output detail table. | No default value |
| inputTablePartitions | No | The partitions of the input table that you want to use. Specify each partition in one of the following formats: Partition_name=value for a single-level partition, or name1=value1/name2=value2 for a multi-level partition. If you specify multiple partitions, separate them with commas (,). | No default value |
| probConfig | No | The class probability configuration, in the Class:Probability format. Separate multiple classes with commas, for example 0:0.3,1:0.7. The sum of all probabilities must be 1. | No default value. If this parameter is not specified, all classes have the same probability. |
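A probConfig value can be checked locally before you submit a command. The following minimal Python sketch parses a Class:Probability list such as the 0:0.3,1:0.7 value used in the PAI command and checks that the probabilities sum to 1. The functions parse_prob_config and uniform_probs are hypothetical helpers, not part of PAI. uniform_probs also rests on an assumption: that "all the probability values are the same" means each distinct value in the input column receives probability 1/k, where k is the number of distinct values.

```python
import math


def parse_prob_config(prob_config: str) -> dict:
    """Parse a Class:Probability list such as "0:0.3,1:0.7" into {class: probability}."""
    probs = {}
    for item in prob_config.split(","):
        label, sep, prob = item.strip().partition(":")
        if not sep or not label or not prob:
            raise ValueError(f"expected Class:Probability, got {item!r}")
        probs[label] = float(prob)
    total = sum(probs.values())
    # The documented constraint: all probabilities must sum to 1.
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        raise ValueError(f"probabilities sum to {total}, not 1")
    return probs


def uniform_probs(values) -> dict:
    """Assumed default when probConfig is omitted: every distinct class is equally likely."""
    classes = sorted(set(values))
    return {c: 1.0 / len(classes) for c in classes}


print(parse_prob_config("0:0.3,1:0.7"))  # {'0': 0.3, '1': 0.7}
print(uniform_probs(["1", "1", "1", "0", "0"]))  # {'0': 0.5, '1': 0.5}
```

A tolerance is used for the sum check because decimal probabilities such as 0.1 are not exact in binary floating point.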
Example
- Test data
create table pai_chisq_test_input as select * from ( select '1' as f0,'2' as f1 from dual union all select '1' as f0,'3' as f1 from dual union all select '1' as f0,'4' as f1 from dual union all select '0' as f0,'3' as f1 from dual union all select '0' as f0,'4' as f1 from dual )tmp;
- PAI command
PAI -name chisq_test -project algo_public -DinputTableName=pai_chisq_test_input -DcolName=f0 -DprobConfig=0:0.3,1:0.7 -DoutputTableName=pai_chisq_test_output0 -DoutputDetailTableName=pai_chisq_test_output0_detail
- Output
- The output table that is specified by outputTableName is in the JSON format. It contains
only one row and one column.
{ "Chi-Square": { "comment": "Pearson's chi-square test", "df": 1, "p-value": 0.75, "value": 0.2380952380952381 } }
- The following table lists the columns in the output detail table that is specified
by outputDetailTableName.
| Column name | Comment |
| --- | --- |
| colName | The class in the input column. |
| observed | The observed frequency. |
| expected | The expected frequency. |
| residuals | The standard residuals, calculated as: Standard residual = (Observed frequency - Expected frequency)/sqrt(Expected frequency). |
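To show how the values in both output tables follow from the test data, the following minimal Python sketch recomputes them by hand. It assumes Pearson's chi-square statistic (named in the comment field of the JSON output), that is, the sum over classes of (observed - expected)^2 / expected, where expected = total number of rows x class probability and df = number of classes - 1. The function chisq_goodness_of_fit is a hypothetical helper, not the component's implementation, and the sketch does not compute the p-value. For the test data it yields 5/21 (about 0.238095) with df = 1, which matches the value and df fields in the sample output.

```python
import math
from collections import Counter


def chisq_goodness_of_fit(values, probs):
    """Hypothetical helper: Pearson chi-square goodness-of-fit statistic.

    values: the categorical column, for example column f0 of the test data.
    probs:  {class: expected probability}, in the same form as probConfig.
    Returns (statistic, df, rows); rows mirror the detail table columns.
    Classes that appear in values but not in probs are ignored in this sketch.
    """
    observed = Counter(values)
    n = len(values)
    statistic = 0.0
    rows = []
    for cls, p in probs.items():
        obs = observed.get(cls, 0)
        exp = n * p  # expected frequency = total rows x class probability
        statistic += (obs - exp) ** 2 / exp
        rows.append({
            "colName": cls,
            "observed": obs,
            "expected": exp,
            # standard residual = (observed - expected) / sqrt(expected)
            "residuals": (obs - exp) / math.sqrt(exp),
        })
    df = len(probs) - 1  # degrees of freedom: number of classes minus 1
    return statistic, df, rows


if __name__ == "__main__":
    f0 = ["1", "1", "1", "0", "0"]  # column f0 of pai_chisq_test_input
    stat, df, rows = chisq_goodness_of_fit(f0, {"0": 0.3, "1": 0.7})
    print(round(stat, 6), df)  # 0.238095 1
    for r in rows:
        print(r["colName"], r["observed"], round(r["expected"], 4), round(r["residuals"], 4))
    # 0 2 1.5 0.4082
    # 1 3 3.5 -0.2673
```

Class 0 is observed twice but expected 1.5 times, and class 1 is observed three times but expected 3.5 times, so the residuals have opposite signs and small magnitudes, consistent with a statistic well below the usual critical values for one degree of freedom.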