The Whole Table Statistics component is used to collect statistics of data in a table or only selected columns.

Configure the component

You can use one of the following methods to configure the Whole Table Statistics component.

Method 1: Configure the component on the pipeline page

You can configure the parameters of the Whole Table Statistics component on the pipeline page of Machine Learning Designer of Machine Learning Platform for AI (PAI). Machine Learning Designer is formerly known as Machine Learning Studio. The following table describes the parameters.
TabParameterDescription
Fields SettingInput Columns (All Columns by Default)Click Select Column to select input columns. By default, all columns are selected.
TuningCoresThe number of cores.
Memory SizeThe memory size of each core.

Method 2: Use PAI commands

Configure the component parameters by using PAI commands. You can use the SQL Script component to call PAI commands. For more information, see SQL Script.
 PAI -name stat_summary
-project algo_public
-DinputTableName=test_data
-DoutputTableName=test_summary_out
-DinputTablePartitions="ds='20160101'"
-DselectColNames=col0,col1,col2
-Dlifecycle=1
ParameterDescriptionRequired
inputTableNameThe name of the input table. Yes
outputTableNameThe name of the output table. Yes
inputTablePartitionsThe partitions in the input table. By default, all partitions are selected. No
selectColNamesThe names of the columns that are selected for statistics. No
lifecycleThe lifecycle of the output table. By default, the output table has no lifecycle.
Note The value must be a positive integer.
No
coreNumThe number of cores.
Note The value must be a positive integer.
No
memSizePerCoreThe memory size of each core. Unit: MB. Valid values: [1024,64 x 1024].
Note The value must be a positive integer.
No

Output format

The following table describes all the columns in the output table.
ColumnDescription
colnameThe name of the column.
datatypeThe data type.
totalcountThe total number.
countThe number of non-NULL values.
missingcountThe number of NULL values.
nancountThe number of NaN values.
positiveinfinitycountThe number of positive infinity values.
negativeinfinitycountThe number of negative infinity values.
minThe minimum value.
maxThe maximum value.
meanThe average value.
varianceThe variance.
standarddeviationThe standard deviation.
standarderrorThe standard error.
skewnessThe skewness.
kurtosisThe kurtosis.
moment2The second moment.
moment3The third moment.
moment4The fourth moment.
centralmoment2The second central moment.
centralmoment3The third central moment.
centralmoment4The fourth central moment.
sumThe sum.
sum2The sum of squares.
sum3The sum of cubes.
sum4The sum of the fourth powers.

Example

  • Input data
    drop table if exists summary_test_input;
    create table summary_test_input as
    select
      *
    from
    (
      select 'a' as col1, 1 as col2, 0.001 as col3 from dual
        union all
      select 'b' as col1, 2 as col2, 100.01 as col3 from dual
    ) tmp;
  • PAI command
    PAI -name stat_summary
    -project algo_public
    -DinputTableName=summary_test_input
    -DoutputTableName=summary_test_input_out
    -DselectColNames=col1,col2,col3
    -Dlifecycle=1;
  • Output result
        | colname    | datatype   | totalcount | count      | missingcount | nancount   | positiveinfinitycount | negativeinfinitycount | min        | max        | mean       | variance   | standarddeviation | standarderror | skewness   | kurtosis   | moment2    | moment3    | moment4    | centralmoment2 | centralmoment3 | centralmoment4 | sum        | sum2       | sum3       | sum4       |
        | col1       | string     | 2          | 2          | 0            | 0          | 0                     | 0                     | NULL       | NULL       | NULL       | NULL       | NULL              | NULL          | NULL       | NULL       | NULL       | NULL       | NULL       | NULL           | NULL           | NULL           | NULL       | NULL       | NULL       | NULL       |
        | col2       | bigint     | 2          | 2          | 0            | 0          | 0                     | 0                     | 1          | 2          | 1.5        | 0.5        | 0.7071067811865476 | 0.5           | 0          | -2         | 2.5        | 4.5        | 8.5        | 0.25           | 0              | 0.0625         | 3          | 5          | 9          | 17         |
        | col3       | double     | 2          | 2          | 0            | 0          | 0                     | 0                     | 0.001      | 100.01     | 50.0055    | 5000.900040500001 | 70.71704207968544 | 50.00450000000001 | 2.327677906939552e-16 | -1.999999999999999 | 5001.000050500001 | 500150.0150005006 | 50020003.00020002 | 2500.45002025  | 2.91038304567337e-11 | 6252250.303768232 | 100.011    | 10002.000101 | 1000300.030001001 | 100040006.0004 |