The Binning component is used for feature discretization. Feature discretization is a process of converting continuous data into multiple discrete intervals. The Binning component supports equal frequency binning, equal width binning, and automated binning.
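The difference between equal width and equal frequency binning can be illustrated with a minimal Python sketch. The function names below are hypothetical and the cut-point rules are common textbook choices, not necessarily the exact rules that the Binning component applies.

```python
def equal_width_edges(values, n_bins):
    """Interior cut points that split [min, max] into n_bins intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + width * i for i in range(1, n_bins)]


def equal_frequency_edges(values, n_bins):
    """Interior cut points chosen so that each bin holds roughly the same number of values."""
    ordered = sorted(values)
    n = len(ordered)
    # Take the value found at each i/n_bins position of the sorted data.
    return [ordered[(n * i) // n_bins] for i in range(1, n_bins)]


# Illustrative data with one outlier.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
width_edges = equal_width_edges(data, 3)
freq_edges = equal_frequency_edges(data, 3)
print(width_edges)
print(freq_edges)
```

With an outlier such as 100 in the data, equal width cut points are stretched toward the outlier, whereas equal frequency cut points follow the bulk of the values.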

Component configurations

You can use one of the following methods to configure the Binning component.

Method 1: Configure the component on the pipeline page

You can configure the parameters of the Binning component on the pipeline page of Machine Learning Designer of Machine Learning Platform for AI (PAI). Machine Learning Designer was formerly known as Machine Learning Studio. The following table describes the parameters.
| Tab | Parameter | Description |
| --- | --- | --- |
| Fields Setting | Feature Columns | Columns of the STRING, BIGINT, and DOUBLE types are supported. |
| Fields Setting | Label Column | This parameter is required only for binary classification. |
| Fields Setting | Positive Value | This parameter is valid only if the Label Column parameter is specified. |
| Fields Setting | Binning Parameter Source | Valid values: "Parameters in Parameter Settings" and "Manual Binning or Custom JSON". |
| Fields Setting | Reserve Unselected Feature Columns | This parameter is valid only if you set the Binning Parameter Source parameter to Manual Binning or Custom JSON. If you set this parameter to Yes, the columns that are not specified for the Feature Columns parameter remain unchanged in the output. Otherwise, these columns are removed from the output. |
| Fields Setting | Upload Binning and Constraint JSON Code | This parameter is valid only if you set the Binning Parameter Source parameter to Manual Binning or Custom JSON. |
| Parameters Setting | Bins | The number of bins. For example, if you set this parameter to 10, continuous features are converted into 10 discrete intervals. |
| Parameters Setting | Custom Bins | The numbers of bins for specific columns. Specify this parameter in the format of Column name 1:Number of bins,Column name 2:Number of bins. The setting of this parameter takes precedence over the setting of the Bins parameter. A column that is specified for this parameter but is not included in the selected columns is also binned. For example, columns col0 and col1 are selected for binning, the number of bins customized for col0 is 3, and the number customized for col2 is 5. If the Bins parameter is set to 10, binning is performed based on col0:3,col1:10,col2:5. |
| Parameters Setting | Custom Discrete Value Count Threshold | The discrete value count threshold for specific columns. Specify this parameter in the col0:3 format. |
| Parameters Setting | Interval Type | Valid values: "Left-open, Right-closed" and "Left-closed, Right-open". |
| Parameters Setting | Binning Mode | Valid values: Equal Frequency, Equal Width, and Automatic Binning. |
| Parameters Setting | Discrete Value Count Threshold | If the number of occurrences of a discrete value is less than this threshold, the value is assigned to the else bin. |
| Tuning | Cores | The number of cores. By default, the system determines the value. |
| Tuning | Memory Size per Core | The memory size of each core. By default, the system determines the value. |
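The precedence between the Bins and Custom Bins parameters can be sketched in a few lines of Python. The `resolve_bins` helper is hypothetical and only mirrors the rule described in the table. It is not part of PAI.

```python
def resolve_bins(selected_cols, default_bins, custom_bins_spec=""):
    """Map each column to its bin count; custom entries override the default."""
    # Every selected column starts with the default number of bins.
    bins = {col: default_bins for col in selected_cols}
    # Parse "col0:3,col2:5" and apply each entry on top of the defaults.
    for item in filter(None, (part.strip() for part in custom_bins_spec.split(","))):
        name, count = item.rsplit(":", 1)
        bins[name.strip()] = int(count)  # also adds columns that were not selected
    return bins


resolved = resolve_bins(["col0", "col1"], 10, "col0:3,col2:5")
print(resolved)  # {'col0': 3, 'col1': 10, 'col2': 5}
```

The result matches the col0:3,col1:10,col2:5 example: col0 takes its custom value, col1 falls back to the default, and col2 is binned although it was not selected.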

Method 2: Use PAI commands

Configure the component parameters by using PAI commands. You can use the SQL Script component to call PAI commands. For more information, see SQL Script.
PAI -name binning
    -project algo_public
    -DinputTableName=input
    -DoutputTableName=output
| Parameter | Description | Required | Default value |
| --- | --- | --- | --- |
| inputTableName | The name of the input table. | Yes | None |
| outputTableName | The name of the output table. | Yes | None |
| selectedColNames | The columns that are selected from the input table for binning. | No | All columns except the label column. If no label column exists, all columns are selected. |
| labelColName | The label column. | No | None |
| validTableName | The name of the validation table. This parameter is required if the binningMethod parameter is set to auto. | No | Null |
| validTablePartitions | The partitions that are selected from the validation table. | No | Full table |
| inputTablePartitions | The partitions that are selected from the input table. | No | Full table |
| inputBinTableName | The input binning table. | No | None |
| selectedBinColNames | The columns that are selected from the input binning table. | No | Null |
| positiveLabel | The value of the label column that identifies positive samples. | No | 1 |
| nDivide | The number of bins. The value of this parameter must be a positive integer. | No | 10 |
| colsNDivide | The numbers of bins for specific columns. Specify this parameter in the format of Column name 1:Number of bins,Column name 2:Number of bins. Example: col0:3,col2:5. A column that is specified for the colsNDivide parameter but is not included in the selectedColNames parameter is also binned. For example, if the selectedColNames parameter is set to col0,col1, the colsNDivide parameter is set to col0:3,col2:5, and the nDivide parameter is set to 10, binning is performed based on col0:3,col1:10,col2:5. | No | Null |
| isLeftOpen | The interval type. Valid values: true (left-open, right-closed intervals) and false (left-closed, right-open intervals). | No | true |
| stringThreshold | The count threshold below which discrete values are assigned to the else bin. | No | None |
| colsStringThreshold | The discrete value count thresholds for specific columns. Specify this parameter in the same format as the colsNDivide parameter. | No | Null |
| binningMethod | The binning mode. Valid values: quantile (equal frequency binning), bucket (equal width binning), and auto (the system automatically selects a binning mode). | No | quantile |
| lifecycle | The lifecycle of the output table. The value of this parameter must be a positive integer. | No | None |
| coreNum | The number of cores. The value of this parameter must be a positive integer. | No | Determined by the system |
| memSizePerCore | The memory size of each core. The value of this parameter must be a positive integer. | No | Determined by the system |
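The effect of the isLeftOpen and stringThreshold parameters can be illustrated with the following Python sketch. Both helper functions are hypothetical. The sketch assumes that cut points are sorted in ascending order, that the threshold applies to the number of occurrences of a discrete value, and that rare values are relabeled with the placeholder string "else".

```python
from bisect import bisect_left, bisect_right
from collections import Counter


def bin_index(value, cut_points, is_left_open=True):
    """Index of the interval that holds value, given sorted interior cut points."""
    # Left-open, right-closed: a value equal to a cut point stays in the lower bin.
    # Left-closed, right-open: a value equal to a cut point moves to the upper bin.
    if is_left_open:
        return bisect_left(cut_points, value)
    return bisect_right(cut_points, value)


def group_rare_values(values, threshold):
    """Replace discrete values that occur fewer than threshold times with 'else'."""
    counts = Counter(values)
    return [v if counts[v] >= threshold else "else" for v in values]


cuts = [10, 20]
print(bin_index(10, cuts, is_left_open=True))   # 10 falls in (-inf, 10]
print(bin_index(10, cuts, is_left_open=False))  # 10 falls in [10, 20)
print(group_rare_values(["a", "a", "b", "a", "c"], threshold=2))
```

Only values that coincide with a cut point are affected by the interval type. All other values land in the same bin under both settings.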
The Binning component is typically used together with the Scorecard Training component. For scorecard training, the Binning component converts continuous features into multiple discrete dummy variables as a feature engineering step. You can specify constraints on the weights of the dummy variables. The following constraints are supported:
  • Ascending order: The weights of the dummy variables of a feature must increase with the index value. A dummy variable with a greater index value has a higher weight.
  • Descending order: The weights of the dummy variables of a feature must decrease as the index value increases. A dummy variable with a greater index value has a lower weight.
  • Same weight: The weights of two dummy variables of a feature must be the same.
  • Zero weight: The weight of a dummy variable must be 0.
  • Specific weight: The weight of a dummy variable must be a specific floating-point value.
  • WOE order: The weights of the dummy variables of a feature must increase with the weight of evidence (WOE) value. A dummy variable with a greater WOE value has a higher weight.
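The constraints above can be expressed as simple checks on a list of dummy-variable weights. The following Python sketch uses hypothetical helper names and assumes that ties are allowed in the ordering constraints. It only checks a given set of weights and does not show how the Scorecard Training component enforces the constraints during training.

```python
import math


def is_ascending(weights):
    """Ascending order: weights do not decrease as the bin index grows."""
    return all(a <= b for a, b in zip(weights, weights[1:]))


def is_descending(weights):
    """Descending order: weights do not increase as the bin index grows."""
    return all(a >= b for a, b in zip(weights, weights[1:]))


def follows_woe_order(weights, woe):
    """WOE order: weights do not decrease when bins are sorted by ascending WOE."""
    order = sorted(range(len(woe)), key=lambda i: woe[i])
    return is_ascending([weights[i] for i in order])


def has_weight(weights, index, target, tol=1e-9):
    """Specific weight (use target=0.0 for the zero weight constraint)."""
    return math.isclose(weights[index], target, abs_tol=tol)


def same_weight(weights, i, j, tol=1e-9):
    """Same weight: two dummy variables share one weight."""
    return math.isclose(weights[i], weights[j], abs_tol=tol)


# Illustrative values only.
weights = [-0.5, 0.0, 0.3, 0.3]
woe = [-1.2, -0.1, 0.4, 0.9]
print(is_ascending(weights), is_descending(weights))
print(follows_woe_order(weights, woe))
print(has_weight(weights, 1, 0.0), same_weight(weights, 2, 3))
```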

Result presentation

  1. After the pipeline that contains the Binning component finishes running, right-click the Binning component on the canvas and select Binning.
  2. On the variable list page, you can check the Bins, Type, and IV information for each variable. The following figure shows an example of variable information.
    Figure: Binning variables
  3. Click the name of a variable such as f1 to go to the binning details page of the variable. The following figure shows the binning details page of f1.
    You can click Merge or Split to merge or split binning data. You can also specify constraints for bins.
    Note: The specified constraints take effect only on the subsequent Scorecard Training component. If you use the Binning component without the Scorecard Training component, the constraints are ignored.
    Figure: Binning details
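The IV column and the effect of merging bins can be illustrated with the commonly used definitions of WOE and information value (IV). The following Python sketch is an assumption. This topic does not state the exact formula or sign convention that PAI uses, and the `woe_iv` helper is hypothetical. The sketch also assumes that every bin contains at least one positive sample and one negative sample.

```python
import math


def woe_iv(pos_counts, neg_counts):
    """Per-bin WOE and total IV from positive and negative sample counts."""
    total_pos, total_neg = sum(pos_counts), sum(neg_counts)
    woe, iv = [], 0.0
    for pos, neg in zip(pos_counts, neg_counts):
        p = pos / total_pos  # share of all positive samples in this bin
        q = neg / total_neg  # share of all negative samples in this bin
        w = math.log(p / q)  # assumed convention: ln(positive share / negative share)
        woe.append(w)
        iv += (p - q) * w
    return woe, iv


# Illustrative counts for three bins of one variable.
pos = [10, 20, 30]
neg = [30, 20, 10]
woe_values, iv_before = woe_iv(pos, neg)

# Merging the first two bins adds up their counts.
_, iv_after = woe_iv([pos[0] + pos[1], pos[2]], [neg[0] + neg[1], neg[2]])
print(woe_values)
print(iv_before, iv_after)
```

Under these definitions, merging adjacent bins never increases the IV, which is one way to judge whether a merge discards useful information.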