Platform For AI: Percentile

Last Updated: Apr 30, 2025

The Percentile component calculates the percentiles of the data in one or more columns. When a set of data is sorted in ascending order, the p-th percentile is the value below which p percent of the data falls. For example, the 90th percentile is the value below which 90% of the data falls.

Background information

  • Percentiles can be calculated only for columns of the BIGINT, DOUBLE, or DATETIME type.

  • Empty columns are skipped when the percentile is calculated. If all of the columns are empty, an error is returned.

  • You can specify multiple columns of data in the colName parameter.
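The constraints above can be sketched as a small pre-check. This is only an illustration, not the component's implementation: the function name `select_usable_columns` is hypothetical, "empty" is assumed to mean that every value in the column is NULL (`None`), and raising an error for an unsupported type is an assumption, because the source only states which types are supported.

```python
from typing import Dict, List, Optional, Sequence

# Column types that the component can process, per the list above.
SUPPORTED_TYPES = {"bigint", "double", "datetime"}


def select_usable_columns(
    schema: Dict[str, str],
    data: Dict[str, Sequence[Optional[object]]],
) -> List[str]:
    """Return the columns for which percentiles can be calculated.

    schema maps column name -> type name; data maps column name -> values,
    with None standing in for NULL. Hypothetical helper for illustration.
    """
    usable = []
    for name, col_type in schema.items():
        if col_type.lower() not in SUPPORTED_TYPES:
            # Assumption: an unsupported type is treated as an error.
            raise TypeError(f"column {name!r} has unsupported type {col_type!r}")
        if all(value is None for value in data[name]):
            continue  # an empty column is skipped
        usable.append(name)
    if not usable:
        # All selected columns are empty, so there is nothing to calculate.
        raise ValueError("all selected columns are empty")
    return usable
```

For example, a schema with one populated BIGINT column and one all-NULL DOUBLE column yields only the BIGINT column, and a schema in which every column is all-NULL raises an error.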

Configure the component

You can use one of the following methods to configure the Percentile component.

Method 1: Configure the component on the pipeline page

You can configure the parameters of the Percentile component on the pipeline page of Machine Learning Designer in Machine Learning Platform for AI (PAI). Machine Learning Designer was formerly known as Machine Learning Studio. The following table describes the parameters.

| Tab | Parameter | Description |
| --- | --- | --- |
| Parameters Setting | Input Columns | Click Select Column to select input columns. |
| Tuning | Number of Cores | The number of cores. |
| Tuning | Memory Size per Core | The memory size of each core. |

Method 2: Use PAI commands

Configure the component parameters by using PAI commands. You can use the SQL Script component to call PAI commands. For more information, see SQL Script.

PAI -name Percentile
     -project algo_public
     -DinputTableName=maple_test_percentile_3col_input
     -DcolName=col0,col1,col2
     -DoutputTableName=maple_test_percentile_3col_output;

| Parameter | Description | Required |
| --- | --- | --- |
| inputTableName | The name of the input table. | Yes |
| outputTableName | The name of the output table. | Yes |
| colName | The names of the columns to calculate. By default, all columns are selected. Separate multiple column names with commas (,). | No |
| inputPartitions | The partitions in the input table. By default, all partitions are selected. Single partition: partition_name=value. Multiple partitions: name1=value1,name2=value2 (separate partitions with commas). Multi-level partitions: name1=value1/name2=value2. | No |
| predictInputTableName | The name of the prediction table. After you set this parameter, the prediction result can be generated. | No |
| predictInputTablePartitions | The partitions in the input prediction table. | No |
| predictSelectedColNames | The names of the columns selected from the prediction table. By default, all columns in the prediction table are selected. The column names must be the same as the column names in the training table. | No |
| predictSelectedOriginalColNames | The names of the columns whose data you want to retain. By default, all columns are selected. Separate multiple column names with commas (,). | No |
| predictOutputTableName | The name of the output prediction table. This parameter is used with the predictInputTableName parameter. | No |
| lifecycle | The lifecycle of the output table. The value must be a positive integer. By default, the output table has no lifecycle. | No |
| coreNum | The number of cores. The value must be a positive integer. Valid values: [1,9999]. This parameter is used with the memSizePerCore parameter. | No |
| memSizePerCore | The memory size of each core. Unit: MB. The value must be a positive integer. Valid values: [1024,64 × 1024]. | No |
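The parameter rules above can be sketched as a small helper that assembles a PAI command string and checks the documented value ranges before the command is submitted. This is a minimal sketch, not part of PAI: the function names `format_partitions` and `build_percentile_command` are hypothetical, and the helper only builds text in the `-Dname=value` form shown in the command above. It does not run anything.

```python
from typing import Optional, Sequence, Tuple

# One partition is a sequence of (name, value) pairs, one pair per level.
Partition = Sequence[Tuple[str, str]]


def format_partitions(partitions: Sequence[Partition]) -> str:
    """Join partition levels with '/' and separate partitions with ','."""
    return ",".join(
        "/".join(f"{name}={value}" for name, value in levels)
        for levels in partitions
    )


def _check_int(name: str, value: int, low: int, high: Optional[int] = None) -> None:
    """Raise ValueError unless value is an integer within [low, high]."""
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError(f"{name} must be an integer")
    if value < low or (high is not None and value > high):
        raise ValueError(f"{name}={value} is outside the valid range")


def build_percentile_command(
    input_table: str,
    output_table: str,
    col_names: Optional[Sequence[str]] = None,
    input_partitions: Optional[Sequence[Partition]] = None,
    lifecycle: Optional[int] = None,
    core_num: Optional[int] = None,
    mem_size_per_core: Optional[int] = None,
) -> str:
    """Build the text of a PAI Percentile command (hypothetical helper)."""
    args = [
        f"-DinputTableName={input_table}",
        f"-DoutputTableName={output_table}",
    ]
    if col_names:
        args.append("-DcolName=" + ",".join(col_names))
    if input_partitions:
        args.append("-DinputPartitions=" + format_partitions(input_partitions))
    if lifecycle is not None:
        _check_int("lifecycle", lifecycle, 1)  # positive integer
        args.append(f"-Dlifecycle={lifecycle}")
    if core_num is not None:
        _check_int("coreNum", core_num, 1, 9999)
        args.append(f"-DcoreNum={core_num}")
    if mem_size_per_core is not None:
        _check_int("memSizePerCore", mem_size_per_core, 1024, 64 * 1024)  # MB
        args.append(f"-DmemSizePerCore={mem_size_per_core}")
    lines = ["PAI -name Percentile", "-project algo_public"] + args
    return "\n     ".join(lines) + ";"
```

For example, `format_partitions([[("ds", "20250101"), ("hh", "00")]])` produces `ds=20250101/hh=00`, where `ds` and `hh` are made-up partition names. Passing `core_num=10000` raises an error because the value exceeds the documented upper bound of 9999.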

Example

  • Input table

    | col0:double (1000 rows) | col1:bigint (100 rows) | col2:datetime (300 rows) |
    | --- | --- | --- |
    | 962 | 88 | Tue Oct 15 00:26:40 CST 1974 |
    | 218 | 99 | Thu Jan 04 20:53:20 CST 1973 |
    | 565 | 44 | Sat Mar 09 02:40:00 CST 1974 |
    | 314 | 68 | Mon Aug 11 22:40:00 CST 1975 |
    | 583 | 13 | Sat Aug 23 12:26:40 CST 1975 |
    | 615 | 87 | Tue May 25 14:13:20 CST 1971 |
    | 70 | 53 | Fri Mar 23 09:20:00 CST 1979 |
    | 929 | 63 | Mon Jul 03 16:26:40 CST 1972 |
    | 249 | 48 | Thu Mar 15 07:33:20 CST 1973 |
    | 428 | 62 | Wed Mar 17 03:33:20 CST 1971 |
    | 119 | 1 | Thu Jun 26 15:33:20 CST 1975 |
    | 756 | 27 | Mon Jan 30 17:20:00 CST 1978 |
    | 490 | 75 | Wed Dec 11 21:20:00 CST 1974 |
    | 957 | 12 | Sun Jul 05 12:26:40 CST 1970 |
    | 80 | 22 | Wed Oct 04 06:40:00 CST 1972 |
    | 681 | 57 | Wed Nov 03 15:06:40 CST 1971 |
    | 13 | 95 | Sat Sep 12 23:06:40 CST 1970 |

  • PAI command

     PAI -name Percentile
         -project algo_public
         -DinputTableName=maple_test_percentile_3col_input
         -DcolName=col0,col1,col2
         -DoutputTableName=maple_test_percentile_3col_output;

  • Output table

    | quantile:bigint | col0:double | col1:bigint | col2:datetime |
    | --- | --- | --- | --- |
    | 0 | 0.0 | 0 | Thu Jan 01 08:00:00 CST 1970 |
    | 1 | 9.0 | 0 | Sat Jan 24 11:33:20 CST 1970 |
    | 2 | 19.0 | 1 | Sat Feb 28 04:53:20 CST 1970 |
    | 3 | 29.0 | 2 | Fri Apr 03 22:13:20 CST 1970 |
    | 4 | 39.0 | 3 | Fri May 08 15:33:20 CST 1970 |
    | 5 | 49.0 | 4 | Fri Jun 12 08:53:20 CST 1970 |
    | 6 | 59.0 | 5 | Fri Jul 17 02:13:20 CST 1970 |
    | 7 | 69.0 | 6 | Thu Aug 20 19:33:20 CST 1970 |
    | 8 | 79.0 | 7 | Thu Sep 24 12:53:20 CST 1970 |
    | 9 | 89.0 | 8 | Thu Oct 29 06:13:20 CST 1970 |
    | 10 | 99.0 | 9 | Wed Dec 02 23:33:20 CST 1970 |
    | 11 | 109.0 | 10 | Wed Jan 06 16:53:20 CST 1971 |
    | 12 | 119.0 | 11 | Wed Feb 10 10:13:20 CST 1971 |
    | 13 | 129.0 | 12 | Wed Mar 17 03:33:20 CST 1971 |
    | 14 | 139.0 | 13 | Tue Apr 20 20:53:20 CST 1971 |
    | 15 | 149.0 | 14 | Tue May 25 14:13:20 CST 1971 |
    | 16 | 159.0 | 15 | Tue Jun 29 07:33:20 CST 1971 |
    | ... | ... | ... | ... |
    | 84 | 839.0 | 83 | Thu Dec 15 10:13:20 CST 1977 |
    | 85 | 849.0 | 84 | Thu Jan 19 03:33:20 CST 1978 |
    | 86 | 859.0 | 85 | Wed Feb 22 20:53:20 CST 1978 |
    | 87 | 869.0 | 86 | Wed Mar 29 14:13:20 CST 1978 |
    | 88 | 879.0 | 87 | Wed May 03 07:33:20 CST 1978 |
    | 89 | 889.0 | 88 | Wed Jun 07 00:53:20 CST 1978 |
    | 90 | 899.0 | 89 | Tue Jul 11 18:13:20 CST 1978 |
    | 91 | 909.0 | 90 | Tue Aug 15 11:33:20 CST 1978 |
    | 92 | 919.0 | 91 | Tue Sep 19 04:53:20 CST 1978 |
    | 93 | 929.0 | 92 | Mon Oct 23 22:13:20 CST 1978 |
    | 94 | 939.0 | 93 | Mon Nov 27 15:33:20 CST 1978 |
    | 95 | 949.0 | 94 | Mon Jan 01 08:53:20 CST 1979 |
    | 96 | 959.0 | 95 | Mon Feb 05 02:13:20 CST 1979 |
    | 97 | 969.0 | 96 | Sun Mar 11 19:33:20 CST 1979 |
    | 98 | 979.0 | 97 | Sun Apr 15 12:53:20 CST 1979 |
    | 99 | 989.0 | 98 | Sun May 20 06:13:20 CST 1979 |
    | 100 | 999.0 | 99 | Sat Jun 23 23:33:20 CST 1979 |
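The rows shown in the output table are consistent with a nearest-rank rule: for percentile p over n sorted values, take the value at 1-based rank ceil(p × n / 100), with rank 0 mapped to the smallest value. The following sketch illustrates that rule. It is an inference from the example output, not a statement of how the component is implemented. The test data is also an assumption: col0 is taken to hold 0 to 999, col1 to hold 0 to 99, and col2 to hold 300 timestamps spaced 1,000,000 seconds apart, starting at the first value in the output table. The function name `nearest_rank_percentiles` is hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Sequence


def nearest_rank_percentiles(values: Sequence) -> List:
    """Return the 0th through 100th percentiles using the nearest-rank rule."""
    ordered = sorted(v for v in values if v is not None)  # ignore NULLs
    if not ordered:
        raise ValueError("no data to calculate percentiles from")
    n = len(ordered)
    result = []
    for p in range(101):
        rank = (p * n + 99) // 100  # integer form of ceil(p * n / 100)
        result.append(ordered[max(rank - 1, 0)])  # rank 0 maps to the minimum
    return result


# Assumed data, chosen so that the results can be compared with the output table.
col0 = [float(i) for i in range(1000)]
col1 = list(range(100))
start = datetime(1970, 1, 1, 8, 0, 0)  # first col2 value in the output table
col2 = [start + timedelta(seconds=1_000_000 * i) for i in range(300)]

p0 = nearest_rank_percentiles(col0)
p1 = nearest_rank_percentiles(col1)
p2 = nearest_rank_percentiles(col2)
```

Under these assumptions, `p0[1]`, `p1[2]`, and `p2[1]` correspond to the col0 value in the quantile 1 row, the col1 value in the quantile 2 row, and the col2 value in the quantile 1 row of the output table.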