This topic describes the Chi-square Goodness of Fit Test component provided by Machine Learning Designer. The Chi-square Goodness of Fit Test component is used in scenarios in which categorical variables are used. This component is used to determine the difference between the observed frequency and expected frequency for each classification of a single multiclass categorical variable. The null hypothesis assumes that the observed frequency and expected frequency are the same.

Configure the component

You can configure the Chi-square Goodness of Fit Test component by using one of the following methods:

Method 1: Configure the component in Machine Learning Designer

Configure the component on the pipeline configuration tab of Machine Learning Designer in the Machine Learning Platform for AI console.
ParameterDescription
Input ColumnThe column on which you want to perform a chi-square test.
Class ProbabilityThe class probability configuration. Specify this parameter in the Class:Probability format. The sum of all probabilities is 1.

Method 2: Run Machine Learning Platform for AI commands

Configure the component parameters by using a Machine Learning Platform for AI command. You can use the SQL Script component to run Machine Learning Platform for AI commands. For more information, see SQL Script. The following table describes the parameters of the command that is used to configure this component.
PAI -name chisq_test
    -project algo_public
    -DinputTableName=pai_chisq_test_input
    -DcolName=f0
    -DprobConfig=0:0.3,1:0.7
    -DoutputTableName=pai_chisq_test_output0
    -DoutputDetailTableName=pai_chisq_test_output0_detail
ParameterRequiredDescriptionDefault value
inputTableNameYesThe name of the input table. None.
colNameYesThe name of the column.None.
outputTableNameYesThe name of the output table.None.
outputDetailTableNameYesThe name of the output details table. None.
inputTablePartitionsNoThe partition that is selected from the input table for training. The following formats are supported:
  • Partition_name=value
  • name1=value1/name2=value2: multi-level partitions
Note If you specify multiple partitions, separate them with commas (,).
By default, this parameter is left empty.
probConfigNoThe class probability configuration. Specify this parameter in the Class:Probability format. The sum of all probabilities is 1. By default, this parameter is not specified, and all the probability values are the same.

Example

  • Test data
    create table pai_chisq_test_input as
    select * from
    (
      select '1' as f0,'2' as f1 from dual
      union all
      select '1' as f0,'3' as f1 from dual
      union all
      select '1' as f0,'4' as f1 from dual
      union all
      select '0' as f0,'3' as f1 from dual
      union all
      select '0' as f0,'4' as f1 from dual
    )tmp;
  • PAI command
    PAI -name chisq_test
        -project algo_public
        -DinputTableName=pai_chisq_test_input
        -DcolName=f0
        -DprobConfig=0:0.3,1:0.7
        -DoutputTableName=pai_chisq_test_output0
        -DoutputDetailTableName=pai_chisq_test_output0_detail
  • Output description
    • The output table that is specified by the outputTableName parameter is in the JSON format. The table contains only one row and one column.
      {
          "Chi-Square": {
              "comment": "Pearson's chi-square test",
              "df": 1,
              "p-value": 0.75,
              "value": 0.2380952380952381
          }
      }
    • The following table describes the columns in the output detail table that is specified by the outputDetailTableName parameter.
      column namecomment
      colNameThe data source class.
      observedThe observed frequency.
      expectedThe expected frequency.
      residualsThe standard residuals, which are calculated by using the following expression: (Standard residuals = (Observed frequency - Expected frequency)/sqrt(Expected frequency).
    • Generated dataChi-square test