The Whole Table Statistics component is used to collect statistics of data in a table or only selected columns.

Configure the component

  • Machine Learning Platform for AI console
    Tab Parameter Description
    Fields Setting Input Columns Click Select Column to select input columns. By default, all columns are selected.
    Tuning Cores The number of cores.
    Memory Size The memory size of each core.
  • PAI command
     PAI -name stat_summary
    -project algo_public
    -DinputTableName=test_data
    -DoutputTableName=test_summary_out
    -DinputTablePartitions="ds='20160101'"
    -DselectColNames=col0,col1,col2
    -Dlifecycle=1
    Parameter Description Required
    inputTableName The name of the input table. Yes
    outputTableName The name of the output table. Yes
    inputTablePartitions The partitions in the input table. By default, all partitions are selected. No
    selectColNames The names of the columns that are selected for statistics. No
    lifecycle The lifecycle of the output table. By default, the output table has no lifecycle.
    Note The parameter value must be a positive integer.
    No
    coreNum The number of cores.
    Note The parameter value must be a positive integer.
    No
    memSizePerCore The memory size of each core, in MB. Valid values: [1024,64 x 1024].
    Note The parameter value must be a positive integer.
    No

Output format

The following table describes all the columns in the output table.
Column Description
colname The name of the column.
datatype The data type.
totalcount The total number.
count The number of non-NULL values.
missingcount The number of NULL values.
nancount The number of NaN values.
positiveinfinitycount The number of positive infinity values.
negativeinfinitycount The number of negative infinity values.
min The minimum value.
max The maximum value.
mean The average value.
variance The variance.
standarddeviation The standard deviation.
standarderror The standard error.
skewness The skewness.
kurtosis The kurtosis.
moment2 The second moment.
moment3 The third moment.
moment4 The fourth moment.
centralmoment2 The second central moment.
centralmoment3 The third central moment.
centralmoment4 The fourth central moment.
sum The sum.
sum2 The sum of squares.
sum3 The sum of cubes.
sum4 The sum of the fourth powers.

Example

  • Input data
    drop table if exists summary_test_input;
    create table summary_test_input as
    select
      *
    from
    (
      select 'a' as col1, 1 as col2, 0.001 as col3 from dual
        union all
      select 'b' as col1, 2 as col2, 100.01 as col3 from dual
    ) tmp;
  • PAI command
    PAI -name stat_summary
    -project algo_public
    -DinputTableName=summary_test_input
    -DoutputTableName=summary_test_input_out
    -DselectColNames=col1,col2,col3
    -Dlifecycle=1;
  • Output
        | colname    | datatype   | totalcount | count      | missingcount | nancount   | positiveinfinitycount | negativeinfinitycount | min        | max        | mean       | variance   | standarddeviation | standarderror | skewness   | kurtosis   | moment2    | moment3    | moment4    | centralmoment2 | centralmoment3 | centralmoment4 | sum        | sum2       | sum3       | sum4       |
        | col1       | string     | 2          | 2          | 0            | 0          | 0                     | 0                     | NULL       | NULL       | NULL       | NULL       | NULL              | NULL          | NULL       | NULL       | NULL       | NULL       | NULL       | NULL           | NULL           | NULL           | NULL       | NULL       | NULL       | NULL       |
        | col2       | bigint     | 2          | 2          | 0            | 0          | 0                     | 0                     | 1          | 2          | 1.5        | 0.5        | 0.7071067811865476 | 0.5           | 0          | -2         | 2.5        | 4.5        | 8.5        | 0.25           | 0              | 0.0625         | 3          | 5          | 9          | 17         |
        | col3       | double     | 2          | 2          | 0            | 0          | 0                     | 0                     | 0.001      | 100.01     | 50.0055    | 5000.900040500001 | 70.71704207968544 | 50.00450000000001 | 2.327677906939552e-16 | -1.999999999999999 | 5001.000050500001 | 500150.0150005006 | 50020003.00020002 | 2500.45002025  | 2.91038304567337e-11 | 6252250.303768232 | 100.011    | 10002.000101 | 1000300.030001001 | 100040006.0004 |