The Binning component is used for feature discretization, which is the process of dividing continuous data into multiple discrete intervals. The Binning component supports equal frequency binning, equal width binning, and automatic binning.
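To make the difference between equal width binning and equal frequency binning concrete, the following Python sketch computes cut points for both modes on a small sample. It is an illustration only, not the component's implementation: the function names (equal_width_cuts, equal_frequency_cuts, bin_counts) are hypothetical, intervals are assumed to be left-open and right-closed, and the exact quantile rule that the component uses may differ.

```python
from bisect import bisect_left


def equal_width_cuts(values, n_bins):
    """Return the n_bins - 1 inner cut points that split [min, max] evenly."""
    if n_bins < 1:
        raise ValueError("n_bins must be a positive integer")
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + width * i for i in range(1, n_bins)]


def equal_frequency_cuts(values, n_bins):
    """Return inner cut points so that each bin holds about the same number of samples.

    Assumption: each cut point is the last sample of its bin, which fits
    left-open, right-closed intervals.
    """
    if n_bins < 1 or len(values) < n_bins:
        raise ValueError("need at least one sample per bin")
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(n * i) // n_bins - 1] for i in range(1, n_bins)]


def bin_counts(values, cuts):
    """Count the samples per bin, using left-open, right-closed intervals."""
    counts = [0] * (len(cuts) + 1)
    for v in values:
        counts[bisect_left(cuts, v)] += 1
    return counts


# One outlier (100) is enough to show how the two modes differ.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 100]
print(bin_counts(sample, equal_width_cuts(sample, 3)))      # [8, 0, 1]
print(bin_counts(sample, equal_frequency_cuts(sample, 3)))  # [3, 3, 3]
```

In this sample, equal width binning leaves one bin empty because of the outlier, whereas equal frequency binning places three samples in each bin.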

Configure the component

You can configure the component by using one of the following methods:
  • Use the Machine Learning Platform for AI console
    Tab | Parameter | Description
    Fields Setting | Feature Columns | The columns of the STRING, BIGINT, and DOUBLE types are supported.
    Fields Setting | Label Column | This parameter is required only for binary classification.
    Fields Setting | Positive Value | This parameter is valid only if the Label Column parameter is specified.
    Fields Setting | Binning Parameter Source | Valid values: "Parameters in Parameter Settings" and "Manual Binning or Custom JSON".
    Fields Setting | Reserve Unselected Feature Columns | This parameter is valid only if you set the Binning Parameter Source parameter to Manual Binning or Custom JSON. If you set this parameter to Yes, the columns that are not specified for the Feature Columns parameter remain unchanged in the output. If you set this parameter to No, those columns are not contained in the output.
    Fields Setting | Upload Binning and Constraint JSON Code | This parameter is valid only if you set the Binning Parameter Source parameter to Manual Binning or Custom JSON.
    Fields Setting | Append Binning Files | If the files that you specify contain new features, the new features are appended to the binning results. If the files do not contain new features, the features in the files take effect.
    Parameters Setting | Bins | The number of bins. For example, if you set this parameter to 10, continuous features are converted into 10 discrete intervals.
    Parameters Setting | Custom Bins | The numbers of bins for specific columns. This parameter takes precedence over the Bins parameter. A column that is specified here is binned even if it is not among the selected feature columns. For example, columns col0 and col1 are selected for binning, the custom number of bins is 3 for col0 and 5 for col2, and the Bins parameter is set to 10. In this case, binning is performed based on col0:3,col1:10,col2:5. Specify this parameter in the format of Column name 1:Number of bins,Column name 2:Number of bins.
    Parameters Setting | Custom Discrete Value Count Threshold | The discrete value count thresholds for specific columns. Specify this parameter in the col0:3 format.
    Parameters Setting | Interval Type | Valid values: "Left-open, Right-closed" and "Left-closed, Right-open".
    Parameters Setting | Binning Mode | Valid values: Equal Frequency, Equal Width, and Automatic Binning.
    Parameters Setting | Discrete Value Count Threshold | If the number of occurrences of a discrete value is less than this threshold, the value is placed in the else bin.
    Tuning | Cores | The number of cores. By default, the system determines the value.
    Tuning | Memory Size per Core | The memory size of each core. By default, the system determines the value.
  • Use commands
    PAI -name binning
        -project algo_public
        -DinputTableName=input
        -DoutputTableName=output
    Parameter | Description | Required | Default value
    inputTableName | The name of the input table. | Yes | N/A
    outputTableName | The name of the output table. | Yes | N/A
    selectedColNames | The columns that are selected from the input table for binning. | No | All columns except the label column. If no label column exists, all columns are selected.
    labelColumn | The label column. | No | N/A
    validTableName | The name of the validation table. This parameter is required if the binningMethod parameter is set to auto. | No | Empty string
    validTablePartitions | The partitions that are selected from the validation table. | No | Full table
    inputTablePartitions | The partitions that are selected from the input table. | No | Full table
    inputBinTableName | The input binning table. | No | N/A
    selectedBinColNames | The columns that are selected from the input binning table. | No | Empty string
    positiveLabel | The value of the label column that identifies positive samples. | No | 1
    nDivide | The number of bins. The value of this parameter must be a positive integer. | No | 10
    colsNDivide | The numbers of bins for specific columns. Specify this parameter in the format of Column name 1:Number of bins,Column name 2:Number of bins. Example: col0:3,col2:5. A column that is specified for the colsNDivide parameter is binned even if it is not specified for the selectedColNames parameter. For example, if the selectedColNames parameter is set to col0,col1, the colsNDivide parameter is set to col0:3,col2:5, and the nDivide parameter is set to 10, binning is performed based on col0:3,col1:10,col2:5. | No | Empty string
    isLeftOpen | The interval type. Valid values: true (left-open, right-closed intervals) and false (left-closed, right-open intervals). | No | true
    stringThreshold | The count threshold for discrete values. A discrete value that occurs fewer times than this threshold is placed in the else bin. | No | N/A
    colsStringThreshold | The discrete value count thresholds for specific columns. Specify this parameter in the same format as the colsNDivide parameter. | No | Empty string
    binningMethod | The binning mode. Valid values: quantile (equal frequency binning), bucket (equal width binning), and auto (the system automatically selects a binning mode). | No | quantile
    lifecycle | The lifecycle of the output table. The value of this parameter must be a positive integer. | No | N/A
    coreNum | The number of cores. The value of this parameter must be a positive integer. | No | Determined by the system
    memSizePerCore | The memory size of each core. The value of this parameter must be a positive integer. | No | Determined by the system
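The interaction between selectedColNames, nDivide, and colsNDivide, and the effect of isLeftOpen on boundary values, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the component's code: the helper names (parse_cols_n_divide, resolve_bin_counts, assign_bin) are hypothetical, and the treatment of values outside the outermost cut points is an assumption.

```python
from bisect import bisect_left, bisect_right


def parse_cols_n_divide(spec):
    """Parse a string such as 'col0:3,col2:5' into {'col0': 3, 'col2': 5}."""
    result = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue  # an empty string means no per-column overrides
        name, _, count = item.partition(":")
        result[name.strip()] = int(count)
    return result


def resolve_bin_counts(selected_col_names, n_divide=10, cols_n_divide=""):
    """Return the number of bins that applies to each column.

    Selected columns start with n_divide. Entries in cols_n_divide override
    that value and also add columns that were not selected.
    """
    counts = {col: n_divide for col in selected_col_names}
    counts.update(parse_cols_n_divide(cols_n_divide))
    return counts


def assign_bin(value, cuts, is_left_open=True):
    """Return the index of the bin that contains value.

    cuts holds the sorted inner cut points. With is_left_open=True the bins
    are (a, b], so a value equal to a cut point falls into the lower bin.
    With is_left_open=False the bins are [a, b), so it falls into the upper bin.
    """
    if is_left_open:
        return bisect_left(cuts, value)
    return bisect_right(cuts, value)


# The example from the colsNDivide description.
print(resolve_bin_counts(["col0", "col1"], 10, "col0:3,col2:5"))
# {'col0': 3, 'col1': 10, 'col2': 5}

# A value that sits exactly on a cut point changes bins with the interval type.
print(assign_bin(10, [10, 20], is_left_open=True))   # 0
print(assign_bin(10, [10, 20], is_left_open=False))  # 1
```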
The Binning component must be used with the Scorecard Training component. During scorecard training, the Binning component performs feature engineering by converting each continuous feature into multiple discrete dummy variables. You can specify constraints on the weights of these dummy variables. The following constraints are supported:
  • Ascending order: The weights of the dummy variables of a feature must increase with the index values. A dummy variable with a larger index value has a higher weight.
  • Descending order: The weights of the dummy variables of a feature must decrease as the index values increase. A dummy variable with a larger index value has a lower weight.
  • Same weight: The weights of two dummy variables of a feature must be the same.
  • Zero weight: The weight of a dummy variable must be 0.
  • Specific weight: The weight of a dummy variable must be a specific floating-point value.
  • WOE order: The weights of the dummy variables of a feature must increase with the weight of evidence (WOE) values. A dummy variable with a larger WOE value has a higher weight.
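As a rough illustration of what these constraints require, the following Python sketch checks whether a list of already fitted weights satisfies each of them. It does not show how the Scorecard Training component enforces the constraints or the JSON format used to declare them. All function names are hypothetical, list positions stand in for the index values of the dummy variables, and ties are assumed to be allowed in the ordering constraints.

```python
import math


def is_ascending(weights):
    """Ascending order: a larger index never has a lower weight (ties allowed)."""
    return all(a <= b for a, b in zip(weights, weights[1:]))


def is_descending(weights):
    """Descending order: a larger index never has a higher weight (ties allowed)."""
    return all(a >= b for a, b in zip(weights, weights[1:]))


def has_same_weight(weights, i, j):
    """Same weight: dummy variables i and j share one weight."""
    return math.isclose(weights[i], weights[j], abs_tol=1e-9)


def has_specific_weight(weights, i, target):
    """Specific weight: dummy variable i has the weight target (0.0 for zero weight)."""
    return math.isclose(weights[i], target, abs_tol=1e-9)


def follows_woe_order(weights, woe):
    """WOE order: after sorting the bins by WOE, the weights are ascending."""
    order = sorted(range(len(weights)), key=lambda k: woe[k])
    return is_ascending([weights[k] for k in order])


weights = [0.2, -0.5, 0.8]
woe = [0.3, -1.2, 0.9]
print(is_ascending(weights))            # False: index 1 has a lower weight than index 0
print(follows_woe_order(weights, woe))  # True: the weights rise with the WOE values
```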