This topic describes how to load processed data from DataWorks into Machine Learning Platform for AI (PAI) and build a model that identifies users who steal electricity or are involved in electricity leakage.
Prerequisites
Create a PAI experiment
- Log on to the PAI console. In the left-side navigation pane, choose Model Training > Studio-Modeling Visualization.
- On the page that appears, find the target workspace and click Machine Learning in the Operation column.
- On the left-side navigation submenu, click Experiments. In the left-side navigation pane, right-click My Experiments and select New Experiment.
- In the New Experiment dialog box that appears, set Name and Description.
- Click OK.
Load datasets
Explore the data
- Analyze the correlation between data fields.
- Analyze features.
- On the left-side navigation submenu, click Components. In the left-side navigation pane, drag the Data View component under Statistical Analysis to the canvas on the right.
- On the canvas, draw a line from the source MaxCompute table output port of the data4ml node to the input port of the Data View node.
- Double-click the Data View node. In the right-side pane, on the tab that contains the column settings, click Select Column for Feature Columns and select the flag field for Target Column.
- In the Select Column dialog box that appears, select the trend, xiansun, and warnindicator fields as the feature columns and click OK.
- Right-click the Data View node and select Run from Here.
- After the Data View node is run, right-click the node and select View Analytics Report to view the relationship between each feature column and the flag column in terms of data distribution.
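The feature exploration above can also be approximated outside the console. The following is a minimal sketch, not the Data View component itself: it groups rows by the flag column and averages each feature per group, which gives a rough view of how trend, xiansun, and warnindicator differ between flagged and unflagged users. The sample rows and the summarize_by_flag helper are hypothetical and stand in for the real MaxCompute table.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample rows; the real values live in the MaxCompute table.
rows = [
    {"trend": 8, "xiansun": 0.5, "warnindicator": 4, "flag": 1},
    {"trend": 10, "xiansun": 0.7, "warnindicator": 6, "flag": 1},
    {"trend": 2, "xiansun": 0.1, "warnindicator": 0, "flag": 0},
    {"trend": 4, "xiansun": 0.3, "warnindicator": 2, "flag": 0},
]
FEATURES = ["trend", "xiansun", "warnindicator"]


def summarize_by_flag(rows, features, target="flag"):
    """Return {flag value: {feature: mean of that feature within the group}}."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[target]].append(row)
    return {
        label: {name: mean(r[name] for r in members) for name in features}
        for label, members in sorted(groups.items())
    }


summary = summarize_by_flag(rows, FEATURES)
for label, stats in summary.items():
    print(f"flag={label}: {stats}")
```

A feature whose group means differ clearly between flag=0 and flag=1 is a reasonable candidate for the model. The analytics report in the console shows the full distributions rather than only the means.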
Perform data modeling
After you explore and analyze the data, you can select appropriate algorithm models for data modeling.
- Use the Split component to divide the data into a training dataset and a test dataset.
- On the left-side navigation submenu, click Components. In the left-side navigation pane, drag the Split component under Data Preprocessing to the canvas on the right.
- On the canvas, draw a line from the source MaxCompute table output port of the data4ml node to the input port of the Split node.
- Right-click the Split node and select Run from Here.
- After the Split node is run, right-click the node and use the shortcut menu to view its output data.
- Use the Logistic Regression for Binary Classification component to perform regression modeling on data.
- On the left-side navigation submenu, click Components. In the left-side navigation pane, find the Logistic Regression for Binary Classification component and drag it to the canvas on the right.
- On the canvas, draw a line from the output table 1 port of the Split node to the training table input port of the Logistic Regression for Binary Classification node.
- Double-click the Logistic Regression for Binary Classification node. In the right-side pane, on the tab that contains the column settings, click Select Column for Training Feature Columns and select the flag field for Target Columns.
- In the Select Column dialog box that appears, select the trend, xiansun, and warnindicator fields as the training feature columns and click OK.
- Right-click the Logistic Regression for Binary Classification node and select Run from Here.
- After the Logistic Regression for Binary Classification node is run, right-click the node and use the shortcut menu to view the data model.
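To make the two modeling steps concrete, the following is a minimal sketch of a train/test split followed by logistic regression trained with batch gradient descent. It illustrates the general technique only and is not the implementation used by the Split or Logistic Regression for Binary Classification components. The synthetic rows, the 80/20 split ratio, the learning rate, and all function names are assumptions made for illustration.

```python
import math
import random

FEATURES = ["trend", "xiansun", "warnindicator"]


def make_synthetic_rows(n=200, seed=42):
    """Generate labeled rows. NOT real data; purely illustrative."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        flag = rng.randint(0, 1)
        center = 1.5 if flag else -1.5
        row = {name: rng.gauss(center, 1.0) for name in FEATURES}
        row["flag"] = flag
        rows.append(row)
    return rows


def split_rows(rows, train_fraction=0.8, seed=0):
    """Shuffle a copy of the rows and cut it into (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, x):
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)


def train_logistic_regression(X, y, learning_rate=0.1, epochs=300):
    """Minimize the mean log loss with plain batch gradient descent."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            error = predict_proba(weights, bias, xi) - yi
            for j, value in enumerate(xi):
                grad_w[j] += error * value
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


def accuracy(weights, bias, X, y):
    hits = sum((predict_proba(weights, bias, xi) >= 0.5) == bool(yi)
               for xi, yi in zip(X, y))
    return hits / len(y)


def to_xy(rows):
    return [[r[name] for name in FEATURES] for r in rows], [r["flag"] for r in rows]


train_rows, test_rows = split_rows(make_synthetic_rows())
X_train, y_train = to_xy(train_rows)
X_test, y_test = to_xy(test_rows)
weights, bias = train_logistic_regression(X_train, y_train)
print("weights:", weights, "bias:", bias)
print("train accuracy:", accuracy(weights, bias, X_train, y_train))
print("test accuracy:", accuracy(weights, bias, X_test, y_test))
```

The learned weights play the same role as the coefficients shown when you view the data model in the console: a larger positive weight means the feature pushes the prediction toward flag = 1.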
Predict and evaluate the regression model
- Use the Prediction component to apply the trained model to the test dataset and generate predictions.
- Use the Binary Classification Evaluation component to evaluate the modeling result.
- On the left-side navigation submenu, click Components. In the left-side navigation pane, find the Binary Classification Evaluation component and drag it to the canvas on the right.
- On the canvas, draw a line from the prediction result output port of the Prediction node to the input port of the Binary Classification Evaluation node.
- Double-click the Binary Classification Evaluation node. In the right-side pane, select the flag field for Original Label Column.
- Right-click the Binary Classification Evaluation node and select Run from Here.
- After the Binary Classification Evaluation node is run, right-click the node and select View Evaluation Report to view how well the model performs.
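As a rough guide to reading an evaluation of this kind, the following sketch computes metrics that binary classification evaluations commonly include: the confusion matrix counts, precision, recall, F1 score, and AUC. The labels and scores are hypothetical, the 0.5 threshold is an assumption, and the functions are illustrative helpers rather than the Binary Classification Evaluation component's own code.

```python
# Hypothetical true labels (the flag column) and predicted probabilities.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]


def confusion_counts(labels, scores, threshold=0.5):
    """Return (tp, fp, tn, fn) for the given decision threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def auc(labels, scores):
    """Probability that a random positive is scored above a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


tp, fp, tn, fn = confusion_counts(labels, scores)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
auc_value = auc(labels, scores)

print("tp, fp, tn, fn:", tp, fp, tn, fn)  # 3 1 3 1
print("precision:", precision)            # 0.75
print("recall:", recall)                  # 0.75
print("f1:", f1)                          # 0.75
print("auc:", auc_value)                  # 0.9375
```

For detecting electricity theft, recall shows how many of the actual offenders the model catches and precision shows how many flagged users are real offenders. AUC summarizes ranking quality independently of the threshold.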