Platform for AI: Use the Binning component to implement the discretization of continuous features

Last Updated: Feb 01, 2024

Feature discretization converts continuous data into a set of discrete intervals. Platform for AI (PAI) provides two components for this purpose: Binning and Data Conversion Module. The Binning component divides the value range of each continuous feature into bins, and the Data Conversion Module component then converts the original continuous values into discrete values based on those bins. This topic describes how to discretize continuous features by using these components in Machine Learning Designer.
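The following Python sketch illustrates the idea behind the two binning modes used in this topic, Equal Width and Equal Frequency, and the follow-up step of mapping each continuous value to a bin index. It is a minimal local illustration, not the PAI implementation: the function names are hypothetical, and PAI's handling of bin boundaries, ties, and index numbering may differ.

```python
from bisect import bisect_right


def equal_width_edges(values, bins):
    """Interior cut points that split [min, max] into `bins` intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    return [lo + width * i for i in range(1, bins)]


def equal_frequency_edges(values, bins):
    """Interior cut points taken from the sorted data so that each bin
    holds roughly len(values) / bins rows (ties can make bins uneven)."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(n * i) // bins] for i in range(1, bins)]


def to_index(value, edges):
    """Map a continuous value to a 0-based bin index (hypothetical helper).

    Values below edges[0] land in bin 0, bin k (k >= 1) covers
    [edges[k-1], edges[k]), and values at or above the last edge land in
    bin len(edges).
    """
    return bisect_right(edges, value)


if __name__ == "__main__":
    # Made-up, iris-like values used only for illustration.
    sample = [4.3, 4.9, 5.1, 5.8, 6.0, 6.4, 7.0, 7.9]
    edges = equal_frequency_edges(sample, 4)
    print(edges)                                  # [5.1, 6.0, 7.0]
    print([to_index(v, edges) for v in sample])   # [0, 0, 1, 1, 2, 2, 3, 3]
```

Equal Width is sensitive to outliers because a single extreme value stretches every interval, whereas Equal Frequency places its cut points according to the distribution of the data.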

Prerequisites

Procedure

  1. Go to the Machine Learning Designer page.

    1. Log on to the PAI console.

    2. In the left-side navigation pane, click Workspaces. On the Workspaces page, click the name of the workspace that you want to manage.

    3. In the left-side navigation pane, choose Model Training > Visualized Modeling (Designer) to go to the Machine Learning Designer page.

  2. Create and open an empty pipeline. For more information, see Prepare data.

    Configure the following parameters:

    • Pipeline Name: Set the value to Use the Binning component to implement the discretization of continuous features.

    • Description: Enter Use the Binning component provided by PAI for the discretization of continuous features.

    • Visibility: Set the value to Visible to Me.

  3. Configure the pipeline.

    1. In the component list on the left side, find and drag the Read Table component in the Data Source/Target folder to the canvas.

    2. In the component list on the left side, find and drag the Binning and Data Conversion Module components in the Financials folder to the canvas.

    3. Connect the preceding components as shown in the following figure. [Figure: Feature discretization experiment]

  4. Configure the component parameters.

    1. Click the Read Table component on the canvas. In the right-side panel, configure the following parameters.

      Select Table tab:

      • Table Name: Enter pai_online_project.iris_data.

      • Partition: The pai_online_project.iris_data table is not partitioned, so the Partition check box is dimmed.

      Fields Information tab:

      • Source Table Columns: You do not need to configure this parameter. After you specify Table Name, the system automatically loads the columns of that table into this field.

    2. Click the Binning component on the canvas. In the right-side panel, configure the following parameters and retain the default values of the other parameters.

      Fields Setting tab:

      • Feature Columns: Select the f1, f2, f3, and f4 columns.

      Parameters Setting tab:

      • Bins: Set this parameter to 10. Each selected feature is divided into 10 discrete intervals.

      • Binning Mode: Valid values: Equal Frequency, Equal Width, and Automatic Binning. Equal Width divides the value range of a feature into intervals of the same width, whereas Equal Frequency places approximately the same number of rows in each interval. If you select Automatic Binning, you must specify the label column in binary classification scenarios. In this example, select Equal Frequency.

    3. Click the Data Conversion Module component on the canvas. In the right-side panel, configure the following parameters and retain the default values of the other parameters.

      Fields Setting tab:

      • Columns without Data Conversion: Select the type column. The values of this column are passed to the output unchanged.

      • Data Conversion Mode: Select Index.

  5. Click the Run icon in the upper part of the canvas to run the pipeline.

  6. View the pipeline results.

    1. After the pipeline finishes running, right-click the Data Conversion Module component on the canvas and choose View Data > Output Port to view the discretization results.

    2. Right-click the Binning component on the canvas and select Binning.

    3. Click the name of the feature that you want to view to display its binning details. The following figure shows the details of the f1 feature. [Figure: Binning details]

    4. Click the Charts tab to view the binning results as charts. [Figure: Chart view of the binning results]
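To sanity-check the results on a local copy of the data, the following Python sketch mimics the configuration in this topic: it replaces f1 to f4 with equal-frequency bin indexes, copies the type column unchanged, and counts rows per bin. It is an assumption-based approximation, not PAI's algorithm: cut points come from the standard library's statistics.quantiles, each row is assumed to be a dict keyed by column name, and the helper names discretize and bin_counts are hypothetical. Bin boundaries and index values may therefore differ from the output of the Binning and Data Conversion Module components.

```python
from bisect import bisect_right
from collections import Counter
from statistics import quantiles

FEATURES = ["f1", "f2", "f3", "f4"]  # feature columns selected in the Binning component
PASS_THROUGH = ["type"]              # column excluded from data conversion


def discretize(rows, bins=10):
    """Return new rows where each feature holds a 0-based equal-frequency bin
    index and pass-through columns are copied unchanged (hypothetical helper)."""
    # bins - 1 cut points per feature, interpolated from the sorted values.
    cuts = {
        f: quantiles([r[f] for r in rows], n=bins, method="inclusive")
        for f in FEATURES
    }
    out = []
    for r in rows:
        new_row = {c: r[c] for c in PASS_THROUGH}
        for f in FEATURES:
            # Number of cut points <= value, so indexes range from 0 to bins - 1.
            new_row[f] = bisect_right(cuts[f], r[f])
        out.append(new_row)
    return out


def bin_counts(rows, feature):
    """Count rows per bin index for one discretized feature."""
    return dict(sorted(Counter(r[feature] for r in rows).items()))
```

With Equal Frequency and 10 bins, the counts returned by bin_counts should be roughly equal for each feature unless many rows share the same value.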

References