This topic describes the Chi-square Goodness of Fit Test component provided by Machine Learning Designer. The component applies to categorical variables. It tests whether the observed frequency of each class of a single multiclass categorical variable differs from the expected frequency of that class. The null hypothesis is that the observed frequencies and the expected frequencies are the same.
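For reference, the standard Pearson goodness-of-fit statistic can be written as follows. The symbols are notation introduced here for illustration, not names used by the component: $O_i$ is the observed frequency of class $i$, $E_i$ is its expected frequency, $p_i$ is the configured class probability, $n$ is the number of rows, and $k$ is the number of classes.

```latex
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i},
\qquad E_i = n\,p_i,
\qquad \mathrm{df} = k - 1
```

A large value of $\chi^2$ relative to its degrees of freedom is evidence against the null hypothesis.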
Configure the component
You can configure the Chi-square Goodness of Fit Test component by using one of the following methods:
Method 1: Configure the component in Machine Learning Designer
Configure the component on the pipeline configuration tab of Machine Learning Designer in the Machine Learning Platform for AI console.
Parameter | Description |
---|---|
Input Column | The column on which you want to perform a chi-square test. |
Class Probability | The class probability configuration. Specify this parameter as comma-separated Class:Probability pairs, for example 0:0.3,1:0.7. The probabilities must sum to 1. |
Method 2: Run Machine Learning Platform for AI commands
Configure the component parameters by using a Machine Learning Platform for AI command. You can use the SQL Script component to run Machine Learning Platform for AI commands. For more information, see SQL Script. The following sample command configures this component, and the table after it describes the command parameters.
PAI -name chisq_test
-project algo_public
-DinputTableName=pai_chisq_test_input
-DcolName=f0
-DprobConfig=0:0.3,1:0.7
-DoutputTableName=pai_chisq_test_output0
-DoutputDetailTableName=pai_chisq_test_output0_detail
Parameter | Required | Description | Default value |
---|---|---|---|
inputTableName | Yes | The name of the input table. | None. |
colName | Yes | The name of the column on which you want to perform the chi-square test. | None. |
outputTableName | Yes | The name of the output table. | None. |
outputDetailTableName | Yes | The name of the output details table. | None. |
inputTablePartitions | No | The partitions that are selected from the input table. If you specify multiple partitions, separate them with commas (,). | By default, this parameter is left empty. |
probConfig | No | The class probability configuration. Specify this parameter as comma-separated Class:Probability pairs, for example 0:0.3,1:0.7. The probabilities must sum to 1. | By default, this parameter is not specified, and all classes are assigned the same probability. |
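The Class:Probability format of probConfig can be checked before a command is submitted. The following Python sketch is illustrative only: `parse_prob_config` is a hypothetical helper, not part of Machine Learning Platform for AI, and the tolerance used for the sum check is an assumption.

```python
import math


def parse_prob_config(config, classes):
    """Hypothetical helper: turn a probConfig string into {class: probability}."""
    if not config:
        # Documented default: every class gets the same probability.
        return {c: 1.0 / len(classes) for c in classes}
    probs = {}
    for pair in config.split(","):
        # Split on the last colon so the text before it is the class label.
        label, _, value = pair.strip().rpartition(":")
        if not label:
            raise ValueError(f"expected Class:Probability, got {pair!r}")
        probs[label] = float(value)
    # The probabilities must sum to 1 (tolerance chosen for illustration).
    if not math.isclose(sum(probs.values()), 1.0, abs_tol=1e-9):
        raise ValueError("class probabilities must sum to 1")
    return probs


print(parse_prob_config("0:0.3,1:0.7", ["0", "1"]))
print(parse_prob_config(None, ["0", "1"]))
```

The first call mirrors the sample value -DprobConfig=0:0.3,1:0.7. The second call shows the default case, in which the two classes share the probability equally.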
Example
- Test data
create table pai_chisq_test_input as
select * from (
  select '1' as f0, '2' as f1 from dual union all
  select '1' as f0, '3' as f1 from dual union all
  select '1' as f0, '4' as f1 from dual union all
  select '0' as f0, '3' as f1 from dual union all
  select '0' as f0, '4' as f1 from dual
) tmp;
- PAI command
PAI -name chisq_test -project algo_public -DinputTableName=pai_chisq_test_input -DcolName=f0 -DprobConfig=0:0.3,1:0.7 -DoutputTableName=pai_chisq_test_output0 -DoutputDetailTableName=pai_chisq_test_output0_detail
- Output description
- The output table that is specified by the outputTableName parameter is in the JSON format. The table contains only one row and one column.
{ "Chi-Square": { "comment": "Pearson's chi-square test", "df": 1, "p-value": 0.75, "value": 0.2380952380952381 } }
- The following table describes the columns in the output detail table that is specified by the outputDetailTableName parameter.
Column name | Comment |
---|---|
colName | The data source class. |
observed | The observed frequency. |
expected | The expected frequency. |
residuals | The standard residuals, calculated as (observed frequency - expected frequency) / sqrt(expected frequency). |
- Generated data
- The output table that is specified by the outputTableName parameter is in the JSON format. The table contains only one row and one column.
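The statistic and the detail columns for the test data can be reproduced by hand. The following Python sketch assumes the standard Pearson formula and uses the f0 values and the probConfig setting from the example. The function name `goodness_of_fit` is hypothetical. The sketch does not compute the p-value, because this topic does not state how the component derives it.

```python
import math
from collections import Counter

# Column f0 of the test table pai_chisq_test_input.
f0 = ["1", "1", "1", "0", "0"]
# Expected class probabilities from -DprobConfig=0:0.3,1:0.7.
probs = {"0": 0.3, "1": 0.7}


def goodness_of_fit(values, probs):
    """Return (statistic, df, detail rows) for a Pearson goodness-of-fit test."""
    n = len(values)
    counts = Counter(values)
    rows = []
    statistic = 0.0
    for label, p in sorted(probs.items()):
        observed = counts.get(label, 0)
        expected = n * p
        # Standard residual, as defined for the output detail table.
        residual = (observed - expected) / math.sqrt(expected)
        # The statistic is the sum of squared standard residuals.
        statistic += residual ** 2
        rows.append({
            "colName": label,
            "observed": observed,
            "expected": expected,
            "residuals": residual,
        })
    # Degrees of freedom: number of classes minus one.
    return statistic, len(probs) - 1, rows


statistic, df, rows = goodness_of_fit(f0, probs)
print(round(statistic, 6), df)
for row in rows:
    print(row)
```

With two rows of class 0 and three rows of class 1, the expected frequencies are 1.5 and 3.5, and the statistic is approximately 0.2381 with 1 degree of freedom. This agrees with the value and df fields in the sample JSON output.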