Configure the Chi-square Goodness of Fit Test component

This document describes the chi-square goodness of fit test component in Designer. This component tests whether the observed counts of categorical variables are consistent with their expected counts. The null hypothesis is that there is no significant difference between the observed and expected counts.

Configure the component

You can configure the component using one of the following methods.

Method 1: Use the UI

Configure the component parameters on the Designer pipeline page.

Parameter	Description
Input column	The column to test.
Class probability	The expected probability of each class, in the format `class:probability`. Separate multiple entries with commas, for example `0:0.3,1:0.7`. The probabilities must sum to 1.

Method 2: Use a PAI command

Configure the component parameters by using a PAI command. You can run PAI commands using the SQL script component. For more information, see SQL script.

PAI -name chisq_test
    -project algo_public
    -DinputTableName=pai_chisq_test_input
    -DcolName=f0
    -DprobConfig=0:0.3,1:0.7
    -DoutputTableName=pai_chisq_test_output0
    -DoutputDetailTableName=pai_chisq_test_output0_detail

Parameter	Required	Description	Default
inputTableName	Yes	The name of the input table.	None
colName	Yes	The column to analyze.	None
outputTableName	Yes	The name of the output table.	None
outputDetailTableName	Yes	The name of the output details table.	None
inputTablePartitions	No	The input table partitions to test. Supported formats: `partition_name=value` for a single-level partition, and `name1=value1/name2=value2` for multi-level partitions. Separate multiple partitions with commas (,).	Empty
probConfig	No	The expected probability of each class, in the format `class:probability`. Separate multiple entries with commas, for example `0:0.3,1:0.7`. The probabilities must sum to 1.	If omitted, all classes are assumed to have equal probability.
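The probConfig rules in the table can be illustrated with a minimal Python sketch. The helper name `parse_prob_config` and its `classes` argument are hypothetical and not part of PAI. The sketch only mirrors the documented format, the sum-to-1 rule, and the equal-probability default; it does not reproduce how the component itself validates input.

```python
import math


def parse_prob_config(prob_config, classes):
    """Parse a 'class:probability' list such as '0:0.3,1:0.7'.

    Hypothetical helper for illustration only; not a PAI API.
    """
    if not prob_config:
        # Documented default: every class gets the same probability.
        return {c: 1.0 / len(classes) for c in classes}

    probs = {}
    for item in prob_config.split(","):
        # Split on the last colon so the text before it is the class label.
        label, value = item.rsplit(":", 1)
        probs[label.strip()] = float(value)

    # Compare with a tolerance because 0.3 + 0.7 is not exactly 1.0 in floats.
    if not math.isclose(sum(probs.values()), 1.0, abs_tol=1e-9):
        raise ValueError("class probabilities must sum to 1")
    return probs


print(parse_prob_config("0:0.3,1:0.7", ["0", "1"]))
print(parse_prob_config("", ["0", "1"]))
```

The first call returns the probabilities as written in the command, and the second falls back to an equal split across the two classes.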

Example

Test data

create table pai_chisq_test_input as
select * from
(
  select '1' as f0,'2' as f1
  union all
  select '1' as f0,'3' as f1
  union all
  select '1' as f0,'4' as f1
  union all
  select '0' as f0,'3' as f1
  union all
  select '0' as f0,'4' as f1
)tmp;

PAI command

PAI -name chisq_test
    -project algo_public
    -DinputTableName=pai_chisq_test_input
    -DcolName=f0
    -DprobConfig=0:0.3,1:0.7
    -DoutputTableName=pai_chisq_test_output0
    -DoutputDetailTableName=pai_chisq_test_output0_detail

Output

The table specified by outputTableName has a single row and a single column, which holds the test result in JSON format.

{
    "Chi-Square": {
        "comment": "Pearson's chi-square test",
        "df": 1,
        "p-value": 0.75,
        "value": 0.2380952380952381
    }
}
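As a cross-check, the `value` and `df` fields can be recomputed by hand from the test data. The following Python sketch assumes the textbook Pearson statistic (the sum of `(observed - expected)^2 / expected` over classes, with degrees of freedom equal to the number of classes minus 1). The variable names are illustrative, and the sketch is not the component's implementation.

```python
from collections import Counter

# Values of column f0 in pai_chisq_test_input.
f0 = ["1", "1", "1", "0", "0"]
# Class probabilities from -DprobConfig=0:0.3,1:0.7.
probs = {"0": 0.3, "1": 0.7}

observed = Counter(f0)
n = len(f0)
# Expected count per class is the sample size times the class probability.
expected = {c: n * p for c, p in probs.items()}

# Pearson chi-square statistic.
chi_square = sum(
    (observed[c] - expected[c]) ** 2 / expected[c] for c in probs
)
# Degrees of freedom: number of classes minus 1.
df = len(probs) - 1

print(chi_square, df)
```

Under these assumptions the statistic should agree with the `value` field above up to floating-point rounding, and `df` should be 1.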

The outputDetailTableName table contains the following fields.

Column name	Description
The column specified by `colName`.	Class
observed	Observed frequency
expected	Expected frequency
residuals	Standard residuals, computed as `(observed - expected) / sqrt(expected)`

Example output

f0    observed    expected    residuals
0        2.0         1.5         0.408248290463863
1        3.0         3.5        -0.267261241912424
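The residuals column can be reproduced from the observed and expected values with the formula given in the field description. This is a minimal Python sketch with illustrative variable names, using the numbers from the example output.

```python
import math

# Observed and expected frequencies from the example output.
observed = {"0": 2.0, "1": 3.0}
expected = {"0": 1.5, "1": 3.5}

# residuals = (observed - expected) / sqrt(expected)
residuals = {
    c: (observed[c] - expected[c]) / math.sqrt(expected[c]) for c in observed
}

for c in sorted(residuals):
    print(c, observed[c], expected[c], residuals[c])

# The squared residuals add up to the chi-square statistic.
print(sum(r ** 2 for r in residuals.values()))
```

The squared residuals sum to the chi-square statistic, so the detail table shows how much each class contributes to the overall result.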