A histogram, also called a quality distribution chart, is a statistical chart that uses a series of vertical bars or line segments of different heights to show how data is distributed. The horizontal axis represents the value intervals, and the vertical axis represents the number of data points that fall into each interval.

Configure the component

You can use one of the following methods to configure the Histogram (Multiple Columns) component.

Method 1: Configure the component on the pipeline page

You can configure the parameters of the Histogram (Multiple Columns) component on the pipeline page of Machine Learning Designer of Machine Learning Platform for AI (PAI). Machine Learning Designer was formerly known as Machine Learning Studio. The following table describes the parameters.
Tab | Parameter | Description
Fields Setting | Select Column | Select the columns to be analyzed. Only the DOUBLE and BIGINT types are supported. Note: A maximum of 1,024 columns are supported.
Parameters Setting | Intervals | The number of intervals into which the histogram is divided.
Tuning | Cores | The number of cores that are used in computing. The value must be a positive integer.
Tuning | Memory Size per Core | The memory size of each core. Valid values: 1 to 65536. Unit: MB.

Method 2: Use PAI commands

Configure the component parameters by using PAI commands. You can use the SQL Script component to call PAI commands. For more information, see SQL Script.
PAI -name histogram
      -project algo_public
      -DinputTableName=maple_histogram_1to20_input
      -DoutputTableName=maple_histogram_1to20_output
      -DselectedColNames=col0,col1 -DintervalNum=20;
Parameter | Required | Description | Default value
inputTableName | Yes | The name of the input table. | No default value
inputTablePartitions | No | The partitions that are selected from the input table for analysis. Supported formats: Partition_name=value, or name1=value1/name2=value2 for multi-level partitions. Note: If you specify multiple partitions, separate them with commas (,). | No default value
outputTableName | Yes | The name of the output table. | No default value
selectedColNames | Yes | The names of the columns selected from the input table for analysis. Separate the names of multiple columns with commas (,). The INT and DOUBLE types are supported. Note: A maximum of 1,024 columns are supported. | No default value
intervalNum | No | The number of intervals into which the histogram is divided. | 100
lifecycle | No | The lifecycle of the output table. | No default value
coreNum | No | The number of cores that are used in computing. The value must be a positive integer. Valid values: [1,9999]. | Determined by the system
memSizePerCore | No | The memory size of each core. Valid values: 1 to 65536. Unit: MB. | Determined by the system
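The constraints in the parameter table can be checked before a command is submitted. The following Python sketch assembles the command text and rejects values outside the documented ranges. The helper name build_histogram_command and its argument names are hypothetical and not part of PAI, and the lower bound on intervalNum is an assumption because the table does not state one. The sketch only builds a string; it does not submit anything.

```python
from typing import List, Optional


def build_histogram_command(
    input_table: str,
    output_table: str,
    selected_cols: List[str],
    interval_num: Optional[int] = None,
    core_num: Optional[int] = None,
    mem_size_per_core: Optional[int] = None,
) -> str:
    """Assemble a PAI histogram command string (hypothetical helper)."""
    if not 1 <= len(selected_cols) <= 1024:
        raise ValueError("selectedColNames accepts 1 to 1,024 columns")
    if interval_num is not None and interval_num < 1:
        # Assumption: the parameter table gives no lower bound for intervalNum.
        raise ValueError("intervalNum must be a positive integer")
    if core_num is not None and not 1 <= core_num <= 9999:
        raise ValueError("coreNum must be in [1, 9999]")
    if mem_size_per_core is not None and not 1 <= mem_size_per_core <= 65536:
        raise ValueError("memSizePerCore must be in [1, 65536] (MB)")

    args = [
        "-project algo_public",
        f"-DinputTableName={input_table}",
        f"-DoutputTableName={output_table}",
        f"-DselectedColNames={','.join(selected_cols)}",
    ]
    # Optional parameters are omitted when unset so the documented defaults apply.
    optional = {
        "intervalNum": interval_num,
        "coreNum": core_num,
        "memSizePerCore": mem_size_per_core,
    }
    args += [f"-D{name}={value}" for name, value in optional.items() if value is not None]
    return "PAI -name histogram\n      " + "\n      ".join(args) + ";"


command = build_histogram_command(
    "maple_histogram_1to20_input",
    "maple_histogram_1to20_output",
    ["col0", "col1"],
    interval_num=20,
)
print(command)
```

The printed text can be pasted into the SQL Script component. Leaving coreNum and memSizePerCore unset keeps the system-determined defaults.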

Example

  • Input description
    col0 | col1
    1 | 1.0
    2 | 2.0
    3 | 3.0
    4 | 4.0
    5 | 5.0
    6 | 6.0
    7 | 7.0
    8 | 8.0
    9 | 9.0
    10 | 10.0
    11 | 11.0
    12 | 12.0
    13 | 13.0
    14 | 14.0
    15 | 15.0
    16 | 16.0
    17 | 17.0
    18 | 18.0
    19 | 19.0
    20 | 20.0
  • PAI command
    PAI -name histogram
        -project algo_public
        -DinputTableName=maple_histogram_1to20_input
        -DoutputTableName=maple_histogram_1to20_output
        -DselectedColNames=col0,col1 -DintervalNum=20;
  • Output description
    colname | histogram
    col0 | [1, 1.95):1;[1.95, 2.9):1;[2.9, 3.85):1;[3.85, 4.8):1;[4.8, 5.75):1;[5.75, 6.7):1;[6.7, 7.65):1;[7.65, 8.6):1;[8.6, 9.55):1;[9.55, 10.5):1;[10.5, 11.45):1;[11.45, 12.4):1;[12.4, 13.35):1;[13.35, 14.3):1;[14.3, 15.25):1;[15.25, 16.2):1;[16.2, 17.15):1;[17.15, 18.1):1;[18.1, 19.05):1;[19.05, 20]:1
    col1 | [1, 1.95):1;[1.95, 2.9):1;[2.9, 3.85):1;[3.85, 4.8):1;[4.8, 5.75):1;[5.75, 6.7):1;[6.7, 7.65):1;[7.65, 8.6):1;[8.6, 9.55):1;[9.55, 10.5):1;[10.5, 11.45):1;[11.45, 12.4):1;[12.4, 13.35):1;[13.35, 14.3):1;[14.3, 15.25):1;[15.25, 16.2):1;[16.2, 17.15):1;[17.15, 18.1):1;[18.1, 19.05):1;[19.05, 20]:1
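The output above is consistent with equal-width intervals that span the column minimum to the column maximum, where every interval is half-open except the last, which also includes the maximum. The following Python sketch reproduces the col0 row under that assumption. The binning rule is inferred from the sample output and is not confirmed as the component's implementation, and the function names are hypothetical.

```python
from typing import List, Tuple


def equal_width_histogram(
    values: List[float], interval_num: int
) -> List[Tuple[float, float, int]]:
    """Return (left, right, count) for each interval.

    Assumption: intervals are [left, right) except the last, which is closed.
    """
    if interval_num < 1:
        raise ValueError("interval_num must be a positive integer")
    if not values:
        raise ValueError("values must not be empty")
    lo, hi = min(values), max(values)
    width = (hi - lo) / interval_num
    counts = [0] * interval_num
    for v in values:
        if width == 0:
            idx = 0  # all values are identical
        else:
            # Clamp so that the maximum lands in the last (closed) interval.
            idx = min(int((v - lo) / width), interval_num - 1)
        counts[idx] += 1
    return [
        (lo + i * width, lo + (i + 1) * width, counts[i])
        for i in range(interval_num)
    ]


def format_histogram(bins: List[Tuple[float, float, int]]) -> str:
    """Render intervals in the same style as the output table."""
    parts = []
    for i, (left, right, count) in enumerate(bins):
        closer = "]" if i == len(bins) - 1 else ")"
        parts.append(f"[{left:g}, {right:g}{closer}:{count}")
    return ";".join(parts)


col0 = list(range(1, 21))  # the 20 input values 1..20
bins = equal_width_histogram(col0, 20)
line = format_histogram(bins)
print(line)
```

With 20 values from 1 to 20 and 20 intervals, each interval is 0.95 wide and holds exactly one value, which matches the counts in the output table.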