Machine Learning Designer uses pipelines to build models: you create a pipeline, arrange components on a canvas, and define the processing logic. This tutorial walks through building a heart disease prediction model visually in Designer, from data preparation through training, evaluation, and deployment.
Prerequisites
- PAI is activated and a workspace is created. For more information, see Activate and create a default workspace.
- The workspace is associated with a MaxCompute resource. For more information, see Quick Start - Prerequisites.
Step 1: Create a pipeline
Open Machine Learning Designer, select a workspace, and in the upper-right corner, click New Pipeline. Enter a pipeline name and click OK.
| Parameter | Description |
| --- | --- |
| Workflow name | A custom name for the workflow. |
| Workflow data storage | The OSS bucket path used to store temporary data and models generated at runtime. If this parameter is not configured, the default storage of the workspace is used. For each run, Designer automatically creates a temporary folder at |
| Visibility | |
Step 2: Prepare and preprocess data
Prepare and preprocess training data before building the model.
Prepare data
Use Source/Target components to read data from MaxCompute, OSS, or other sources. For more information, see Component Reference: Source/Target. This example uses the Read Table component to read a public heart disease dataset provided by PAI. For more information about the dataset, see Heart Disease Data Set.
- Select a Source/Target component to read data.
  In the left component list, click Source/Target and drag Read Table to the canvas. A Read Table-1 node is created.
- Configure the source table.
  Select Read Table-1 on the canvas. In the configuration pane on the right, enter a MaxCompute table name in Table Name. For this example, enter pai_online_project.heart_disease_prediction.
- Switch to the Fields tab to view field details.
Preprocess data
The binary logistic regression component requires input fields of the DOUBLE or BIGINT type, but several fields in this dataset contain strings. Convert the data types before training the model.
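The SQL script used in the following steps is a set of per-field lookups: each CASE expression maps known string values to integers and sends every other value to its ELSE branch. A minimal Python sketch of the same mapping is shown here for illustration only; ENCODINGS and encode_row are hypothetical names, not part of Designer, and the rules are copied from that SQL statement.

```python
# Minimal sketch mirroring the CASE expressions of the SQL script step.
# ENCODINGS and encode_row are hypothetical names used only for illustration.

# field -> (known string values mapped to integers, value used by the ELSE branch)
ENCODINGS = {
    "sex": ({"male": 1}, 0),
    "cp": ({"angina": 0, "notang": 1}, 2),
    "fbs": ({"true": 1}, 0),
    "restecg": ({"norm": 0, "abn": 1}, 2),
    "exang": ({"true": 1}, 0),
    "slop": ({"up": 0, "flat": 1}, 2),
    "thal": ({"norm": 0, "fix": 1}, 2),
}


def encode_row(row: dict) -> dict:
    """Return a new row with categorical fields replaced by integer codes."""
    out = {}
    for field, value in row.items():
        if field == "status":
            # Label: 1 = sick, 0 = anything else. The SQL aliases this column as
            # ifHealth; later steps refer to it as ifhealth.
            out["ifhealth"] = 1 if value == "sick" else 0
        elif field in ENCODINGS:
            mapping, fallback = ENCODINGS[field]
            # Unknown or missing values take the ELSE value, like CASE ... ELSE.
            out[field] = mapping.get(value, fallback)
        else:
            # Numeric fields such as age, chol, and oldpeak pass through unchanged.
            out[field] = value
    return out
```

Because of the ELSE fallback, cp, restecg, slop, and thal become 2 for any value other than the two strings listed for each field.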
- Convert non-numeric fields to numeric types.
  - Search for SQL script and drag it to the canvas. An SQL script-1 node is created.
  - Connect Read Table-1 to the t1 input port of SQL script-1.
  - Configure the node.
    Click SQL script-1. In the configuration pane, enter the following SQL statement. On the Parameters tab, Input source is t1.
    select age,
           (case sex when 'male' then 1 else 0 end) as sex,
           (case cp when 'angina' then 0 when 'notang' then 1 else 2 end) as cp,
           trestbps,
           chol,
           (case fbs when 'true' then 1 else 0 end) as fbs,
           (case restecg when 'norm' then 0 when 'abn' then 1 else 2 end) as restecg,
           thalach,
           (case exang when 'true' then 1 else 0 end) as exang,
           oldpeak,
           (case slop when 'up' then 0 when 'flat' then 1 else 2 end) as slop,
           ca,
           (case thal when 'norm' then 0 when 'fix' then 1 else 2 end) as thal,
           (case status when 'sick' then 1 else 0 end) as ifHealth
    from ${t1};
  - Click Save in the upper-left corner of the canvas.
  - Right-click SQL script-1 and select Run from Root Node To Here to debug and run the pipeline.
    The nodes run in sequence, and a success icon appears on each node after it runs successfully.
    Note: You can also click the Run icon in the upper-left corner of the canvas to run the entire pipeline. For complex pipelines, run individual nodes to simplify debugging. If a run fails, right-click the failed node and select View Log to troubleshoot.
  - After the run, right-click the destination node, such as SQL script-1, and view its output data to verify the result.
- Convert all fields to the DOUBLE data type.
  Drag data type conversion to the canvas and connect it after SQL script-1. On the Set fields tab, click Select fields under Columns to convert to double and select all fields.
- Normalize the features to the [0, 1] range.
  Drag normalization to the canvas and connect it after data type conversion-1. On the Set fields tab, select all fields.
- Split the data into training and test sets.
  Drag split to the canvas and connect it after normalization-1. This component outputs two tables. By default, the data is split at a 4:1 ratio; to change it, adjust Splitting ratio on the Parameters tab. For more information, see Split.
- Right-click data type conversion-1 and select Run from Here to run the remaining nodes.
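The normalization and split steps come down to simple arithmetic. The following minimal Python sketch assumes min-max scaling, x' = (x - min) / (max - min) per field, and a seeded random shuffle for the split. Both are assumptions about how the components behave internally, the handling of constant fields is a choice made here, and min_max_normalize and split_rows are hypothetical names.

```python
import random


def min_max_normalize(rows, fields):
    """Scale each field in `fields` to [0, 1] using (x - min) / (max - min).

    Values are cast to float first, similar to the DOUBLE conversion step.
    A constant field is mapped to 0.0 to avoid dividing by zero (an assumption).
    """
    lo = {f: min(float(r[f]) for r in rows) for f in fields}
    hi = {f: max(float(r[f]) for r in rows) for f in fields}
    scaled = []
    for r in rows:
        new_row = dict(r)  # leave the input rows unchanged
        for f in fields:
            span = hi[f] - lo[f]
            new_row[f] = (float(r[f]) - lo[f]) / span if span else 0.0
        scaled.append(new_row)
    return scaled


def split_rows(rows, ratio=0.8, seed=42):
    """Shuffle rows with a fixed seed and cut at `ratio` (0.8 matches 4:1)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * ratio)
    return shuffled[:cut], shuffled[cut:]
```

In this tutorial's pipeline, normalization runs before the split, so the minimum and maximum are computed over the whole table; the sketch follows the same order.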
Step 3: Train a model
Heart disease prediction is a binary classification problem: each sample is labeled as either sick or healthy. This example uses the binary logistic regression component to build the model.
- Drag the Binary Logistic Regression component to the canvas and connect it to Output Table 1 of split-1.
- Configure the node.
  Click binary logistic regression-1. On the Set fields tab, set Label column to ifhealth and Feature columns to all remaining columns. For more information, see Binary Logistic Regression.
  Note: To deploy the model in Step 6: Deploy the model (optional), you must select binary logistic regression-1 and select Generate PMML file on the Set fields tab.
- Run the node.
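For intuition about what the node fits, here is a minimal pure-Python sketch of binary logistic regression trained with batch gradient descent on the mean log loss, where p = sigmoid(w·x + b). This is a textbook formulation chosen for illustration, not the solver the component uses; the function names and the lr and epochs defaults are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function 1 / (1 + e^-z)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    """Probability that x belongs to class 1 (sick)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic_regression(X, y, lr=0.5, epochs=2000):
    """Fit weights w and bias b by batch gradient descent on the mean log loss.

    lr and epochs are illustrative defaults, not the component's parameters.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # d(log loss)/dz for one sample
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b
```

Because the features were already scaled to [0, 1], a single fixed learning rate behaves reasonably across all features in this sketch.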
Step 4: Make predictions
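- Drag prediction to the canvas and connect it after binary logistic regression-1 (the trained model) and Output Table 2 of split-1 (the test set).
- Run the prediction node and view the results.
  Right-click prediction-1 and choose View Data > Output of Prediction Result.
  The output contains the following columns: ifhealth (the actual label, 1.0 or 0.0), prediction_result (the predicted label), prediction_score (the confidence score), and prediction_detail (the probability of each class, in JSON format).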
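If prediction_detail holds a JSON object that maps each class label to its probability, the predicted label and its probability can be recovered as shown below. The key format (for example {"0": 0.2, "1": 0.8}) and the reading of prediction_score as the probability of the predicted class are assumptions for this sketch; compare them with your actual output table. summarize_detail is a hypothetical helper.

```python
import json


def summarize_detail(detail_json: str):
    """Return (predicted_label, probability) from a prediction_detail string.

    Assumes prediction_detail is a JSON object mapping each class label to its
    probability, e.g. '{"0": 0.2, "1": 0.8}'. The label with the highest
    probability is taken as the prediction.
    """
    probs = json.loads(detail_json)
    label = max(probs, key=probs.get)
    return label, probs[label]
```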
Step 5: Evaluate the model
- Drag binary classification evaluation to the canvas and connect it after prediction-1.
- Click binary classification evaluation-1. On the Set fields tab, set Original label column to ifhealth.
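Binary classification evaluation summarizes how well the predicted scores separate sick from healthy samples. As an illustrative sketch, not the component's implementation, ROC AUC can be computed as the share of (positive, negative) pairs that are ranked correctly, and accuracy, precision, recall, and F1 follow from a confusion matrix at a fixed threshold. The sketch assumes each score is the predicted probability of ifhealth = 1 and uses 0.5 as a hypothetical threshold; roc_auc and threshold_metrics are hypothetical names.

```python
def roc_auc(labels, scores):
    """ROC AUC as the share of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. O(n_pos * n_neg), fine for small test sets.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    correct = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                correct += 1.0
            elif p == q:
                correct += 0.5
    return correct / (len(pos) * len(neg))


def threshold_metrics(labels, scores, threshold=0.5):
    """Accuracy, precision, recall, and F1 when score >= threshold means class 1."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(labels), "precision": precision,
            "recall": recall, "f1": f1}
```

Labels stored as 1.0 and 0.0, as in the ifhealth column, compare equal to 1 and 0, so they can be passed in directly.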
Step 6: Deploy the model (optional)
Machine Learning Designer integrates with Elastic Algorithm Service (EAS). After training, prediction, and evaluation, deploy the model to EAS as an online service.
- After the pipeline runs, click Model List, select a model, and click Deploy to EAS.
- Confirm the configuration. Model file and Processor type are preconfigured; adjust other parameters as needed. For more information, see Deploy a model as an online service.
- Click Deploy.
  The service is deployed when Service status changes from Creating to Running.
Important: To avoid unnecessary charges, click Stop in the Actions column when the service is no longer needed.
Related documents
- Designer provides pipeline templates that you can use to build models. For more information, see Template pipelines.
- Use DataWorks to schedule offline pipelines and update models periodically. For more information, see Use DataWorks to schedule Designer offline pipelines.
- Configure global variables to make pipelines more flexible for online services and DataWorks-scheduled jobs. For more information, see Global variables.
