This topic describes how to preprocess raw data to obtain model training sets and model prediction sets.

Prerequisites

Data preparation is complete. For more information, see Data preparation.

Procedure

  1. Log on to the Machine Learning Platform for AI console.
  2. In the left-side navigation pane, choose Model Training > Studio-Modeling Visualization, and navigate to the PAI Visualization Modeling page.
  3. Click Machine Learning.
  4. Drag and drop components onto the canvas to create an experiment.
    1. In the left-side navigation pane, click Components.
    2. In the Components list, choose Data Preprocessing > Data Merge. Then, drag and drop the Data Type Conversion and Normalization components onto the canvas.
    3. In the Components list, click Tools. Then, drag and drop the SQL Script component onto the canvas, and connect it to the Read MaxCompute Table component that you created in Data preparation. The following figure shows how to connect the components.
      Figure: Create an experiment
  5. Configure component parameters.
    1. Click the SQL Script component on the canvas. In the right-side pane, enter the following script in the SQL Script section to convert string fields to numeric fields.
      select age,
      (case sex when 'male' then 1 else 0 end) as sex,
      (case cp when 'angina' then 0 when 'notang' then 1 else 2 end) as cp,
      trestbps,
      chol,
      (case fbs when 'true' then 1 else 0 end) as fbs,
      (case restecg when 'norm' then 0 when 'abn' then 1 else 2 end) as restecg,
      thalach,
      (case exang when 'true' then 1 else 0 end) as exang,
      oldpeak,
      (case slop when 'up' then 0 when 'flat' then 1 else 2 end) as slop,
      ca,
      (case thal when 'norm' then 0 when 'fix' then 1 else 2 end) as thal,
      (case status when 'sick' then 1 else 0 end) as ifHealth
      from ${t1};
    2. Click the Data Type Conversion component on the canvas. In the right-side pane, click the Fields Setting tab. In the Convert to Double Type Columns section, click Select Column and select all fields to convert them to the double type.
    3. Click the Normalization component on the canvas. In the right-side pane, click the Fields Setting tab, and select all columns.
  6. At the top of the canvas, click Run. While the experiment is running, you can right-click a component to view its output.
  7. In the Components list, choose Data Preprocessing > Data Merge. Drag and drop the Split component onto the canvas and connect it to the other components, as shown in the following figure. Then, click Run.
    Figure: Run the split component

    By default, the Split component splits its input data into a model training set and a model prediction set at a ratio of 4:1. To change the ratio, click the Split component and set Splitting Fraction on the Parameters Setting tab.
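
To make the transformations in step 5 concrete, the following Python sketch reproduces them on in-memory rows. It applies the same category codes as the CASE expressions in the SQL script, casts every field to a float (the double type), and then scales each column to [0, 1]. It assumes that the Normalization component performs min-max scaling, (x - min) / (max - min), with its default settings, and it maps a constant column to 0.0; both are assumptions for illustration, not documented PAI behavior. The names CASE_CODES, encode_row, and min_max_normalize are hypothetical and are not part of any PAI API.

```python
from typing import Dict, List, Tuple

# Hypothetical lookup tables that mirror the CASE expressions in the SQL script:
# column -> (value-to-code mapping, code used by the ELSE branch).
CASE_CODES: Dict[str, Tuple[Dict[str, int], int]] = {
    "sex": ({"male": 1}, 0),
    "cp": ({"angina": 0, "notang": 1}, 2),
    "fbs": ({"true": 1}, 0),
    "restecg": ({"norm": 0, "abn": 1}, 2),
    "exang": ({"true": 1}, 0),
    "slop": ({"up": 0, "flat": 1}, 2),
    "thal": ({"norm": 0, "fix": 1}, 2),
    "status": ({"sick": 1}, 0),
}

# The SQL script outputs the encoded status column under the alias ifHealth.
RENAMED = {"status": "ifHealth"}


def encode_row(row: Dict[str, str]) -> Dict[str, float]:
    """Encode categorical strings as codes, then cast every field to float (double)."""
    out: Dict[str, float] = {}
    for name, value in row.items():
        if name in CASE_CODES:
            mapping, default = CASE_CODES[name]
            # Values not listed in the mapping fall through to the ELSE code.
            number = float(mapping.get(value, default))
        else:
            # Numeric columns such as age or chol are assumed to arrive as strings.
            number = float(value)
        out[RENAMED.get(name, name)] = number
    return out


def min_max_normalize(rows: List[Dict[str, float]]) -> List[Dict[str, float]]:
    """Scale every column to [0, 1]; assumes all rows share the same columns."""
    if not rows:
        return []
    columns = list(rows[0])
    low = {c: min(r[c] for r in rows) for c in columns}
    high = {c: max(r[c] for r in rows) for c in columns}
    scaled_rows = []
    for r in rows:
        scaled = {}
        for c in columns:
            span = high[c] - low[c]
            # A constant column has no spread; map it to 0.0 (an arbitrary choice).
            scaled[c] = (r[c] - low[c]) / span if span else 0.0
        scaled_rows.append(scaled)
    return scaled_rows


# Made-up sample rows with a subset of the columns, for illustration only.
raw = [
    {"age": "40", "sex": "male", "cp": "angina", "status": "sick"},
    {"age": "60", "sex": "female", "cp": "abnang", "status": "buff"},
]
encoded = [encode_row(r) for r in raw]
print(encoded)
print(min_max_normalize(encoded))
```

Because every column, including the 0/1 label ifHealth, is normalized in this experiment, a label that takes both values keeps the values 0.0 and 1.0 after min-max scaling.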
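
The following Python sketch illustrates a 4:1 split like the one the Split component performs by default. It assumes that Splitting Fraction is the share of rows sent to the training set (0.8 for a 4:1 ratio) and that rows are assigned by a seeded random shuffle followed by a cut at round(n × fraction). PAI may randomize differently, so the exact rows and counts in each set can differ. The function split_rows and its seed parameter are hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_rows(
    rows: Sequence[T], fraction: float = 0.8, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Randomly split rows into (training set, prediction set).

    fraction is the share of rows that goes to the training set; 0.8 gives 4:1.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    shuffled = list(rows)
    # A fixed seed makes the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]


train, predict = split_rows(list(range(100)))
print(len(train), len(predict))
```

Because the shuffle is seeded, rerunning the sketch yields the same training and prediction sets, which keeps comparisons between models trained on that training set consistent.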

What to do next

After data preprocessing is complete, you need to visualize the data. For more information, see Data visualization.