This topic describes the Box Plot component provided by Machine Learning Studio.

A box plot summarizes the distribution of a set of data. It shows the distribution features of the raw data and can also be used to compare the distribution features of multiple sets of data.
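As a rough illustration of what a box plot summarizes, the following Python sketch computes the quartiles, whiskers, and outliers for one set of values. The 1.5 × IQR whisker rule (the Tukey convention) and the "inclusive" quartile method are assumptions made for this example; this topic does not state which rules the Box Plot component applies. The function name box_stats and the input values are hypothetical.

```python
from statistics import quantiles

def box_stats(values):
    """Return the values a box plot typically draws for one data set."""
    data = sorted(values)
    # Quartiles by linear interpolation over the data ("inclusive" method).
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumption: Tukey convention. Whiskers reach the most extreme points
    # that lie within 1.5 * IQR of the box; anything beyond is an outlier.
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "lower_whisker": inside[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "upper_whisker": inside[-1],
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }

# Illustrative values only, not taken from the component's output.
stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 40])
print(stats)
```

Running the same function once per data set gives the side-by-side summaries that a multi-box plot compares.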

Configure the component

You can configure the component by using one of the following methods:
  • Machine Learning Platform for AI console
    Parameter            Description
    Continuous Features  Continuous features
    Enumeration Feature  The enumeration feature
    Stratified Samples   The number of adopted stratified samples
  • PAI command
    PAI -name box_plot -project algo_public
        -DinputTable="boxplot"
        -DcontinueCols="age"
        -DcategoryCol="y"
        -DoutputTable="pai_temp_6075_97181_1"
        -DsampleSize="1000"
        -Dlifecycle="7";
    Parameter             Required  Default value            Description
    inputTable            Yes       No default value         The name of the input table.
    inputTablePartitions  Yes       No default value         The partitions selected from the input table for training. The system supports the following formats:
                                                             • Partition_name=value
                                                             • name1=value1/name2=value2: multi-level partitions
                                                             Note: If you specify multiple partitions, separate them with commas (,).
    outputTable           Yes       No default value         The output table that stores the box plot and samples.
    continueCols          Yes       No default value         The continuous feature columns.
    categoryCol           Yes       No default value         The enumeration feature column.
    sampleSize            No        1000                     The number of samples used to draw the disturbance points of each feature.
    lifecycle             No        28                       The lifecycle of the output table. Unit: days.
    coreNum               No        Automatically allocated  The number of cores. The value must be a positive integer.
    memSizePerCore        No        Automatically allocated  The memory size of each core. Unit: MB. The value must range from 1 to 65536.
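
If you generate the command from a script, the parameters in the preceding table map directly to -D flags. The following Python sketch shows one way to assemble the command text and to check the two documented value ranges before submission. The helper name build_box_plot_command is hypothetical and not part of any PAI SDK. The sketch only builds a string and does not submit a job.

```python
def build_box_plot_command(params, project="algo_public"):
    """Render a dict of box_plot parameters as PAI command text."""
    # Range checks taken from the parameter table.
    core_num = params.get("coreNum")
    if core_num is not None and (not isinstance(core_num, int) or core_num <= 0):
        raise ValueError("coreNum must be a positive integer")
    mem = params.get("memSizePerCore")
    if mem is not None and not 1 <= mem <= 65536:
        raise ValueError("memSizePerCore must range from 1 MB to 65536 MB")
    # One -D flag per parameter, in the order the caller supplied them.
    lines = [f"PAI -name box_plot -project {project}"]
    lines += [f'    -D{name}="{value}"' for name, value in params.items()]
    return "\n".join(lines) + ";"

# Same values as the sample command in this topic.
command = build_box_plot_command({
    "inputTable": "boxplot",
    "continueCols": "age",
    "categoryCol": "y",
    "outputTable": "pai_temp_6075_97181_1",
    "sampleSize": 1000,
    "lifecycle": 7,
})
print(command)
```

Optional parameters such as coreNum and memSizePerCore can be left out of the dict so that the system allocates them automatically.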

Example

  • Input data
    create table boxplot as select age, y from bank_data limit 100;
    age y
    50 0
    53 0
    28 1
    39 0
    55 1
    30 0
    37 0
    39 0
    36 1
    27 0
    34 0
    41 0
    55 1
    33 0
    26 0
    52 0
    35 1
    27 1
    28 0
    26 0
    41 0
    35 0
    40 0
    32 0
    41 0
    34 0
    49 0
    37 0
    35 0
    38 0
    47 0
    46 0
    27 0
    29 1
    32 0
    36 0
    29 0
    47 0
    44 0
    54 0
    36 0
    42 0
    44 0
    72 1
    48 0
    36 0
    35 0
    43 0
    56 0
    42 0
    31 0
    32 0
    33 0
    31 0
    39 0
    30 1
    24 0
    24 0
    38 0
    26 0
    41 0
    34 0
    30 0
    37 0
    68 0
    31 0
    48 0
    33 0
    59 0
    44 0
    28 0
    50 0
    33 0
    45 0
    40 0
    45 0
    43 0
    54 0
    53 0
    35 0
    30 0
    25 0
    35 0
    54 1
    30 0
    38 0
    35 0
    47 0
    32 0
    27 0
    40 1
    31 0
    42 0
    40 0
    31 0
    57 0
    38 1
    39 0
    37 0
    44 0
  • Parameter configuration

    Specify the age column as the continuous feature column, and the y column as the enumeration feature column. Retain the default values of other parameters.

  • Output
    • The following figure shows the box plot. [Figure: Box plot]
    • The following figure shows the distribution of disturbance points. [Figure: Distribution of disturbance points]
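
To cross-check the plot by hand, you can group the continuous feature by the enumeration feature and summarize each group. The following Python sketch does this for the first ten rows of the input data shown in this example. It is only an approximation of what each box represents, not the component's implementation, and the variable names are chosen for illustration.

```python
from collections import defaultdict
from statistics import median

# First ten (age, y) rows of the boxplot input table in this example.
rows = [(50, 0), (53, 0), (28, 1), (39, 0), (55, 1),
        (30, 0), (37, 0), (39, 0), (36, 1), (27, 0)]

# Group the continuous feature (age) by the enumeration feature (y).
ages_by_class = defaultdict(list)
for age, y in rows:
    ages_by_class[y].append(age)

# One summary per class corresponds to one box in the plot.
summary = {
    y: {"count": len(ages), "min": min(ages),
        "median": median(ages), "max": max(ages)}
    for y, ages in sorted(ages_by_class.items())
}
for y, s in summary.items():
    print(y, s)
```

Each value of y yields its own summary, which is why the enumeration feature determines how many boxes the plot contains.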