This topic describes how to preprocess raw data to obtain model training sets and model prediction sets.

Prerequisites

Data is prepared. For more information, see Prepare data.

Procedure

  1. Go to a Machine Learning Studio project.
    1. Log on to the PAI console.
    2. In the left-side navigation pane, choose Model Training > Studio-Modeling Visualization.
    3. In the upper-left corner of the page, select the region that you want.
    4. Optional: In the search box on the PAI Visualization Modeling page, enter the name of a project to search for the project.
    5. Find the project that you want and click Machine Learning in the Operation column.
  2. Drag components to the canvas to create an experiment.
    1. On the left-side navigation submenu, click Components.
    2. In the Components pane, click Data Preprocessing. Then, drag the Data Type Conversion and Normalization components to the canvas.
    3. In the Components pane, click Tools. Then, drag the SQL Script component to the canvas and connect it to the Read MaxCompute Table component added during data preparation. For more information about data preparation, see Prepare data. The following figure shows how to connect components. (Figure: Create an experiment)
  3. Set the component parameters.
    1. Click the SQL Script component on the canvas. In the right-side pane, enter the following SQL statement in the SQL Script section to convert STRING fields to NUMERIC fields:
      select age,
        (case sex when 'male' then 1 else 0 end) as sex,
        (case cp when 'angina' then 0 when 'notang' then 1 else 2 end) as cp,
        trestbps,
        chol,
        (case fbs when 'true' then 1 else 0 end) as fbs,
        (case restecg when 'norm' then 0 when 'abn' then 1 else 2 end) as restecg,
        thalach,
        (case exang when 'true' then 1 else 0 end) as exang,
        oldpeak,
        (case slop when 'up' then 0 when 'flat' then 1 else 2 end) as slop,
        ca,
        (case thal when 'norm' then 0 when 'fix' then 1 else 2 end) as thal,
        (case status when 'sick' then 1 else 0 end) as ifHealth
      from ${t1};
    2. Click the Data Type Conversion component on the canvas. In the Fields Setting pane on the right side, click Select Column in the Convert to Double Type Columns section and convert all fields to the DOUBLE type. (Figure: Type conversion)
    3. Click the Normalization component on the canvas. On the Fields Setting tab of the right-side pane, select all columns.
  4. In the upper part of the canvas, click Run. After a component finishes running, you can right-click it to view its output.
  5. In the Components pane, click Data Preprocessing. Then, drag the Split component to the canvas and connect it to other components, as shown in the following figure. Then, click Run. (Figure: Run the Split component)

    By default, the Split component splits the raw data into a model training set and a model prediction set at a ratio of 4:1. To change the ratio, click the Split component and set the Splitting Fraction parameter on the Parameters Setting tab.
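
The transformations configured in step 3 can be approximated outside the console to check what each component is expected to produce. The following Python sketch runs a shortened version of the CASE mapping through the standard-library sqlite3 module, casts every value to a floating-point number, and rescales each column. It rests on several assumptions: the sample rows are hypothetical, only four of the columns are included, SQLite stands in for MaxCompute SQL, and the Normalization component is assumed to apply min-max scaling to the range 0 to 1. The function names encode, to_double, and min_max_normalize are illustrative and are not part of PAI.

```python
import sqlite3

# Hypothetical sample rows; in the experiment, the data comes from the
# Read MaxCompute Table component.
ROWS = [
    (63, "male", "angina", "sick"),
    (41, "female", "notang", "buff"),
    (57, "male", "asympt", "sick"),
]


def encode(rows):
    """Map string categories to numbers with the same CASE logic as the SQL Script step."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t1 (age integer, sex text, cp text, status text)")
    conn.executemany("insert into t1 values (?, ?, ?, ?)", rows)
    cursor = conn.execute(
        """
        select age,
               (case sex when 'male' then 1 else 0 end) as sex,
               (case cp when 'angina' then 0 when 'notang' then 1 else 2 end) as cp,
               (case status when 'sick' then 1 else 0 end) as ifHealth
        from t1
        order by rowid
        """
    )
    encoded = cursor.fetchall()
    conn.close()
    return encoded


def to_double(rows):
    """Cast every value to float, mirroring the Data Type Conversion step."""
    return [tuple(float(value) for value in row) for row in rows]


def min_max_normalize(rows):
    """Rescale each column to [0, 1]; assumes min-max scaling is what Normalization does."""
    scaled_columns = []
    for column in zip(*rows):
        low, high = min(column), max(column)
        span = high - low
        # A constant column has no spread, so it is mapped to 0.0.
        scaled_columns.append([(v - low) / span if span else 0.0 for v in column])
    return [tuple(row) for row in zip(*scaled_columns)]


encoded = encode(ROWS)
normalized = min_max_normalize(to_double(encoded))
```

Comparing encoded with the output of the SQL Script component, and normalized with the output of the Normalization component, on a few known rows is a quick way to confirm that the mappings behave as intended.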

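The 4:1 split in step 5 corresponds to a fraction of 0.8 for the training set. As a rough illustration, the sketch below shuffles the rows and cuts the list at that fraction. The function split_rows and its fraction and seed parameters are hypothetical, and the way the Split component samples rows internally is not described here, so treat this only as a model of the resulting set sizes.

```python
import random


def split_rows(rows, fraction=0.8, seed=0):
    """Randomly split rows; `fraction` of them go to the first (training) set."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be between 0 and 1")
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]


rows = list(range(100))
train, predict = split_rows(rows)
```

With 100 input rows and the default fraction, train holds 80 rows and predict holds the remaining 20, which matches the 4:1 ratio.
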
What to do next

After data is preprocessed, you need to visualize the data. For more information, see Visualize data.