Platform For AI: Normality Test

Last Updated: Nov 28, 2024

A normality test is a statistical method for determining whether a dataset is drawn from a normally distributed population. The Normality Test component supports the Anderson-Darling test, the Kolmogorov-Smirnov test, and the QQ plot test, which assess the distribution characteristics of a dataset to support further statistical analysis and modeling.

Algorithm description

The Normality Test component offers the Anderson-Darling test, Kolmogorov-Smirnov test, and QQ plot test methods. You can select one or more methods for testing.

  • Anderson-Darling test: A goodness-of-fit test that gives extra weight to differences in the tails of a distribution. It measures how well sample data fits a particular theoretical distribution by evaluating the weighted squared difference between the empirical and theoretical cumulative distribution functions.

  • Kolmogorov-Smirnov test: A non-parametric test that compares a sample distribution with a reference distribution, or compares two sample distributions with each other. It uses the maximum absolute difference between the cumulative distribution functions to assess the goodness of fit.

  • QQ plot test: A graphical tool for visually comparing a sample distribution with a theoretical distribution, or two sample distributions with each other. It reveals distribution discrepancies by comparing their quantiles.
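
As a rough illustration of the two test statistics described above, the following Python sketch computes the Anderson-Darling statistic A² and the Kolmogorov-Smirnov statistic D for a sample against a normal distribution. It assumes that the reference distribution is fitted with the sample mean and the sample standard deviation (n - 1 denominator). The function names are hypothetical, and the component's actual implementation, including how it derives p-values, is not described in this topic.

```python
import math
from statistics import NormalDist, mean, stdev


def fitted_normal_cdf(sample):
    """Return the sorted sample and the fitted normal CDF value at each point.

    Assumption: the reference normal distribution uses the sample mean and
    the sample standard deviation (n - 1 denominator).
    """
    xs = sorted(sample)
    dist = NormalDist(mean(xs), stdev(xs))
    return xs, [dist.cdf(x) for x in xs]


def anderson_darling_statistic(sample):
    """A^2 = -n - (1/n) * sum((2i - 1) * (ln F(x_i) + ln(1 - F(x_(n+1-i)))))."""
    xs, cdf = fitted_normal_cdf(sample)
    n = len(xs)
    total = 0.0
    for i in range(1, n + 1):
        # The logarithms grow large near 0 and 1, which is what gives the
        # tails of the distribution extra weight.
        total += (2 * i - 1) * (math.log(cdf[i - 1]) + math.log(1.0 - cdf[n - i]))
    return -n - total / n


def kolmogorov_smirnov_statistic(sample):
    """D = largest gap between the empirical CDF and the fitted normal CDF."""
    xs, cdf = fitted_normal_cdf(sample)
    n = len(xs)
    # The empirical CDF jumps at each data point, so check both sides of the jump.
    d_plus = max((i + 1) / n - f for i, f in enumerate(cdf))
    d_minus = max(f - i / n for i, f in enumerate(cdf))
    return max(d_plus, d_minus)


sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ad_stat = anderson_darling_statistic(sample)
ks_stat = kolmogorov_smirnov_statistic(sample)
print(ad_stat, ks_stat)
```

For the sample 1 to 10 used in the Examples section, a hand check under these assumptions gives statistics close to the testvalue entries reported there (about 0.141 for Anderson-Darling and about 0.0955 for Kolmogorov-Smirnov).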

Configure the component

Method 1: Configure the component on the pipeline page

Add a Normality Test component on the pipeline page and configure the following parameters:

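| Category           | Parameter                       | Description                                                                 |
| ------------------ | ------------------------------- | --------------------------------------------------------------------------- |
| Fields Setting     | Columns                         | The column to perform the normality test on.                                |
| Parameters Setting | Anderson-Darling Test           | Whether to perform the Anderson-Darling test.                               |
| Parameters Setting | Kolmogorov-Smirnov Test         | Whether to perform the Kolmogorov-Smirnov test.                             |
| Parameters Setting | Use QQ Plot                     | Whether to perform the QQ plot test.                                        |
| Tuning             | Computing Cores                 | The number of cores used in computing. The value must be a positive integer. |
| Tuning             | Memory Size per Core (Unit: MB) | The memory size of each core.                                               |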

Method 2: Use PAI commands

Configure the component parameters by using PAI commands. You can use the SQL Script component to call PAI commands. For more information, see SQL Script.

PAI -name normality_test
    -project algo_public
    -DinputTableName=test
    -DoutputTableName=test_out
    -DselectedColNames=col1,col2
    -Dlifecycle=1;

| Parameter            | Required | Default Value | Description |
| -------------------- | -------- | ------------- | ----------- |
| inputTableName       | Yes      | None          | The name of the input table to be tested. |
| outputTableName      | Yes      | None          | The name of the output table. |
| selectedColNames     | No       | None          | The columns selected from the input table. You can select multiple columns of the DOUBLE or BIGINT type. |
| inputTablePartitions | No       | ""            | The name of the partition of the input table. |
| enableQQplot         | No       | true          | Whether to perform the QQ plot test. |
| enableADtest         | No       | true          | Whether to perform the Anderson-Darling test. |
| enableKStest         | No       | true          | Whether to perform the Kolmogorov-Smirnov test. |
| lifecycle            | No       | -1            | The lifecycle of the output table. The value must be an integer greater than or equal to -1. The default value -1 indicates that the lifecycle of the output table is not set. |
| coreNum              | No       | -1            | The number of cores used in computing. This parameter is used together with memSizePerCore. The value must be a positive integer. The default value -1 indicates that the number of instances is determined by the amount of input data. |
| memSizePerCore       | No       | -1            | The memory size of each core. Unit: MB. The value must be a positive integer. Valid values: (100, 64 × 1024). The default value -1 indicates that the memory size of each core is determined by the amount of input data. |
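
To show how these parameters fit together, the following Python sketch assembles a PAI command string and checks the documented value ranges before emitting it. The helper name and its keyword arguments are hypothetical. The sketch also assumes that the literal false disables a test and that the bounds of the memSizePerCore range are exclusive; neither is confirmed by this topic.

```python
def build_normality_test_command(input_table, output_table, columns=None,
                                 enable_qq=True, enable_ad=True, enable_ks=True,
                                 lifecycle=-1, core_num=-1, mem_size_per_core=-1):
    """Assemble a PAI normality_test command string (hypothetical helper)."""
    if lifecycle < -1:
        raise ValueError("lifecycle must be an integer >= -1")
    if core_num != -1 and core_num <= 0:
        raise ValueError("coreNum must be a positive integer")
    # Assumption: (100, 64 x 1024) is an open interval, in MB.
    if mem_size_per_core != -1 and not 100 < mem_size_per_core < 64 * 1024:
        raise ValueError("memSizePerCore must be in (100, 64 * 1024)")

    params = [("inputTableName", input_table), ("outputTableName", output_table)]
    if columns:
        params.append(("selectedColNames", ",".join(columns)))

    # Only emit parameters that differ from their documented defaults.
    switches = (("enableQQplot", enable_qq),
                ("enableADtest", enable_ad),
                ("enableKStest", enable_ks))
    for name, enabled in switches:
        if not enabled:
            params.append((name, "false"))  # assumed boolean literal
    tuning = (("coreNum", core_num),
              ("memSizePerCore", mem_size_per_core),
              ("lifecycle", lifecycle))
    for name, value in tuning:
        if value != -1:
            params.append((name, value))

    lines = ["PAI -name normality_test", "    -project algo_public"]
    lines += [f"    -D{name}={value}" for name, value in params]
    return "\n".join(lines) + ";"


command = build_normality_test_command(
    "normality_test_input", "test_out", columns=["x"],
    enable_qq=False, lifecycle=1)
print(command)
```

Omitting parameters that keep their default values keeps the generated command close to the short form shown in the example command above.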

Examples

  1. Add a SQL Script component, and deselect Use Script Mode and Whether the system adds a create table statement. Then enter the following SQL statement.

        drop table if exists normality_test_input;
        create table normality_test_input as
        select
          *
        from
        (
          select 1 as x
            union all
          select 2 as x
            union all
          select 3 as x
            union all
          select 4 as x
            union all
          select 5 as x
            union all
          select 6 as x
            union all
          select 7 as x
            union all
          select 8 as x
            union all
          select 9 as x
            union all
          select 10 as x
        ) tmp;
  2. Add another SQL Script component, and deselect Use Script Mode and Whether the system adds a create table statement. Then enter the following PAI command, and connect the components from Step 1 and Step 2.

    drop table if exists ${o1};
    PAI -name normality_test
        -project algo_public
        -DinputTableName=normality_test_input
        -DoutputTableName=${o1}
        -DselectedColNames=x
        -Dlifecycle=1;
  3. Click the run icon in the upper-left corner to run the pipeline.

  4. Right-click the SQL Script component created in Step 2 and choose View Data > SQL Script Output to view the test results.

    | colname | testname                | testvalue           | pvalue             |
    | ------- | ----------------------- | ------------------- | ------------------ |
    | x       |                         | 1.0                 | 0.8173291742279805 |
    | x       |                         | 2.0                 | 2.470864450785345  |
    | x       |                         | 3.0                 | 3.5156067948020056 |
    | x       |                         | 4.0                 | 4.3632330349313095 |
    | x       |                         | 5.0                 | 5.128868067945126  |
    | x       |                         | 6.0                 | 5.871131932054874  |
    | x       |                         | 7.0                 | 6.6367669650686905 |
    | x       |                         | 8.0                 | 7.4843932051979944 |
    | x       |                         | 9.0                 | 8.529135549214654  |
    | x       |                         | 10.0                | 10.182670825772018 |
    | x       | Anderson_Darling_Test   | 0.1411092332197832  | 0.9566579606430077 |
    | x       | Kolmogorov_Smirnov_Test | 0.09551932503797644 | 0.9999888659426232 |
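
The rows with an empty testname appear to be the QQ plot points: testvalue holds the sorted sample value, and the pvalue column holds the matching quantile of a fitted normal distribution. This reading is inferred from the numbers and is not stated in this topic. The Python sketch below, with a hypothetical function name, assumes Blom plotting positions, (i - 0.375) / (n + 0.25), and a normal distribution fitted with the sample mean and the sample standard deviation. Under those assumptions, it produces pairs close to the rows above.

```python
from statistics import NormalDist, mean, stdev


def qq_plot_points(sample):
    """Pair each sorted sample value with a fitted normal quantile."""
    xs = sorted(sample)
    n = len(xs)
    # Assumption: normal distribution fitted with the sample mean and the
    # sample standard deviation (n - 1 denominator).
    fitted = NormalDist(mean(xs), stdev(xs))
    points = []
    for i, x in enumerate(xs, start=1):
        p = (i - 0.375) / (n + 0.25)  # Blom plotting position (assumed)
        points.append((float(x), fitted.inv_cdf(p)))
    return points


points = qq_plot_points(range(1, 11))
for observed, theoretical in points:
    print(observed, theoretical)
```

If the sample is close to normal, the pairs lie near the line where the observed value equals the theoretical quantile, which is the pattern the QQ plot is meant to reveal.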