This topic describes how to load processed data from DataWorks to Machine Learning Platform for AI (PAI) and build a model for identifying users who steal electricity or are involved in electricity leakage.

Prerequisites

Data is processed. For more information, see Process data.

Create a PAI experiment

  1. Log on to the PAI console. In the left-side navigation pane, choose Model Training > Studio-Modeling Visualization.
  2. On the page that appears, find the target workspace and click Machine Learning in the Operation column.
  3. On the left-side navigation submenu, click Experiments. In the left-side navigation pane, right-click My Experiments and select New Experiment.
  4. In the New Experiment dialog box that appears, set Name and Description.
  5. Click OK.

Load datasets

  1. On the left-side navigation submenu, click Data Source.
  2. Enter data4ml in the search box and click the search icon to search for the final output table of the target workflow. For more information, see Process data.
  3. Drag the data4ml table in the Table Search Result section to the canvas on the right.
    On the canvas, right-click the data4ml node and select View Data. In the dialog box that appears, view the loaded data. The data includes electricity theft and leakage metrics, such as the power consumption trend, the line loss, and the number of alerts. It also includes a flag that indicates whether a user steals electricity or is involved in electricity leakage.

Explore the data

  1. Analyze the correlation between data.
    1. On the left-side navigation submenu, click Components. In the left-side navigation pane, drag the Correlation Coefficient Matrix component under Statistical Analysis to the canvas on the right.
    2. On the canvas, draw a line from the source MaxCompute table output port of the data4ml node to the input port of the Correlation Coefficient Matrix node.
    3. Right-click the Correlation Coefficient Matrix node and select Run from Here.
    4. After the Correlation Coefficient Matrix node is run, right-click the node and select View Analytics Report.
      The correlation coefficient matrix shows that the three electricity theft and leakage metrics alone are not enough to identify users who steal electricity or are involved in electricity leakage. To identify such users, you must analyze the features in more detail.
  2. Analyze features.
    1. On the left-side navigation submenu, click Components. In the left-side navigation pane, drag the Data View component under Statistical Analysis to the canvas on the right.
    2. On the canvas, draw a line from the source MaxCompute table output port of the data4ml node to the input port of the Data View node.
    3. Double-click the Data View node. In the right-side pane, click the Fields Setting tab, and then click Select Column for Feature Columns.
    4. In the Select Column dialog box that appears, select the trend, xiansun, and warnindicator fields as the feature columns and click OK. Then, select the flag field for Target Column.
    5. Right-click the Data View node and select Run from Here.
    6. After the Data View node is run, right-click the node and select View Analytics Report to view the relationship between each feature column and the flag column in terms of data distribution.
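
The correlation analysis above can also be pictured outside the console with a short Python sketch. It assumes that the matrix contains Pearson correlation coefficients, which this topic does not state, and it uses made-up rows in place of the data4ml table. Only the column names trend, xiansun, warnindicator, and flag come from this topic. The helper name pearson is hypothetical.

```python
import math

# Made-up sample rows, for illustration only. The column names follow the
# data4ml table; the values are NOT taken from the tutorial's dataset.
rows = [
    {"trend": 4, "xiansun": 1, "warnindicator": 1, "flag": 1},
    {"trend": 4, "xiansun": 0, "warnindicator": 4, "flag": 1},
    {"trend": 2, "xiansun": 1, "warnindicator": 1, "flag": 1},
    {"trend": 9, "xiansun": 0, "warnindicator": 0, "flag": 0},
    {"trend": 3, "xiansun": 1, "warnindicator": 0, "flag": 0},
    {"trend": 2, "xiansun": 0, "warnindicator": 0, "flag": 0},
    {"trend": 5, "xiansun": 0, "warnindicator": 2, "flag": 1},
    {"trend": 3, "xiansun": 1, "warnindicator": 1, "flag": 0},
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


columns = ["trend", "xiansun", "warnindicator", "flag"]

# Pairwise correlation matrix stored as a nested dict: matrix[a][b].
matrix = {
    a: {b: pearson([r[a] for r in rows], [r[b] for r in rows]) for b in columns}
    for a in columns
}

# Print one row per column, similar in spirit to a correlation heat map.
for a in columns:
    print(a.ljust(14), " ".join(f"{matrix[a][b]:6.2f}" for b in columns))
```

The row for flag is the one to read first: values close to 0 suggest that a metric on its own says little about the flag, which is why a closer look at the feature distributions is useful.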

Perform data modeling

After you explore and analyze the data, you can select appropriate algorithm models for data modeling.

  1. Use the Split component to divide the data into a training dataset and a test dataset.
    1. On the left-side navigation submenu, click Components. In the left-side navigation pane, drag the Split component under Data Preprocessing to the canvas on the right.
    2. On the canvas, draw a line from the source MaxCompute table output port of the data4ml node to the input port of the Split node.
    3. Right-click the Split node and select Run from Here.
    4. After the Split node is run, right-click the node and choose View Data > View Output Port.
  2. Use the Logistic Regression for Binary Classification component to train a binary classification model on the training dataset.
    1. On the left-side navigation submenu, click Components. In the left-side navigation pane, choose Machine Learning > Binary Classification and drag the Logistic Regression for Binary Classification component to the canvas on the right.
    2. On the canvas, draw a line from the output table 1 of the Split node to the training table of the Logistic Regression for Binary Classification node.
    3. Double-click the Logistic Regression for Binary Classification node. In the right-side pane, click the Fields Setting tab, and then click Select Column for Training Feature Columns.
    4. In the Select Column dialog box that appears, select the trend, xiansun, and warnindicator fields as the training feature columns and click OK. Then, select the flag field for Target Columns.
    5. Right-click the Logistic Regression for Binary Classification node and select Run from Here.
    6. After the Logistic Regression for Binary Classification node is run, right-click the node and choose Model Option > Show Model to view the data model.
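
The split and training steps above can be sketched in plain Python to show what they do conceptually. This is not how the PAI components are implemented. The synthetic data, the rule that generates flag, the 0.8 split ratio, the feature scaling, and the learning rate are all assumptions made for illustration, and the function names are hypothetical.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is reproducible


def make_row():
    """Build one synthetic (features, flag) pair. The rule is invented."""
    trend = random.randint(0, 10)
    xiansun = random.randint(0, 1)
    warnindicator = random.randint(0, 4)
    flag = 1 if trend + 3 * xiansun + 2 * warnindicator > 9 else 0
    # Scale features to [0, 1] so plain gradient descent stays stable.
    # This scaling is an assumption of the sketch, not a step in PAI.
    return [trend / 10, xiansun, warnindicator / 4], flag


data = [make_row() for _ in range(200)]

# Split: shuffle, then use the first 80% for training and the rest for testing.
random.shuffle(data)
cut = int(len(data) * 0.8)
train, test = data[:cut], data[cut:]


def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(samples, lr=0.1, epochs=300):
    """Fit logistic regression with stochastic gradient descent."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in samples:
            p = sigmoid(bias + sum(w * x for w, x in zip(weights, features)))
            error = p - label  # gradient of the log loss w.r.t. the logit
            weights = [w - lr * error * x for w, x in zip(weights, features)]
            bias -= lr * error
    return weights, bias


weights, bias = train_logistic(train)


def predict(features):
    """Predicted probability that flag is 1."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


accuracy = sum((predict(f) >= 0.5) == bool(y) for f, y in test) / len(test)
print("weights:", [round(w, 2) for w in weights], "bias:", round(bias, 2))
print("test accuracy:", round(accuracy, 2))
```

Holding out part of the data, as the Split node does, means the accuracy is measured on rows that the model did not see during training.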

Predict and evaluate the regression model

  1. Use the Prediction component to apply the model to the test dataset and generate predictions.
    1. On the left-side navigation submenu, click Components. In the left-side navigation pane, drag the Prediction component under Machine Learning to the canvas on the right.
    2. On the canvas, draw a line from the logistic regression model of the Logistic Regression for Binary Classification node to the model result input port of the Prediction node. Draw a line from the output table 2 of the Split node to the prediction data input port of the Prediction node.
    3. Double-click the Prediction node. In the right-side pane, set fields on the Fields Setting tab.
      Click Select Column separately for Feature Columns and Reserved Output Column.
    4. In the Select Column dialog box that appears, select all five fields and click OK.
    5. Right-click the Prediction node and select Run from Here.
    6. After the Prediction node is run, right-click the node and select View Data.
  2. Use the Binary Classification Evaluation component to evaluate the model.
    1. On the left-side navigation submenu, click Components. In the left-side navigation pane, choose Machine Learning > Evaluation and drag the Binary Classification Evaluation component to the canvas on the right.
    2. On the canvas, draw a line from the prediction result output port of the Prediction node to the input port of the Binary Classification Evaluation node.
    3. Double-click the Binary Classification Evaluation node. In the right-side Fields Setting pane, select the flag field for Original Label Column.
    4. Right-click the Binary Classification Evaluation node and select Run from Here.
    5. After the Binary Classification Evaluation node is run, right-click the node and select View Evaluation Report to view how well the model performs.
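
To make the evaluation step more concrete, the following Python sketch computes metrics that are commonly used for binary classifiers: a confusion matrix, precision, recall, F1 score, and AUC. The exact contents of the PAI evaluation report are not described in this topic, so treat the metric selection as an assumption. The (label, score) pairs are hypothetical stand-ins for the output of the Prediction node, and the 0.5 threshold is also an assumption.

```python
# Hypothetical (label, score) pairs standing in for the Prediction output:
# label is the flag column, score is the predicted probability of flag = 1.
results = [
    (1, 0.92), (1, 0.81), (0, 0.67), (1, 0.58),
    (0, 0.45), (1, 0.39), (0, 0.22), (0, 0.10),
]


def confusion(pairs, threshold=0.5):
    """Return (tp, fp, fn, tn) for the given decision threshold."""
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    return tp, fp, fn, tn


def auc(pairs):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in pairs if y == 1]
    negatives = [s for y, s in pairs if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


tp, fp, fn, tn = confusion(results)
precision = tp / (tp + fp)  # share of flagged users who are true positives
recall = tp / (tp + fn)     # share of true positives that were flagged
f1 = 2 * precision * recall / (precision + recall)

print("confusion (tp, fp, fn, tn):", (tp, fp, fn, tn))
print("precision:", precision, "recall:", recall, "f1:", f1)
print("auc:", auc(results))
```

Precision and recall depend on the chosen threshold, whereas AUC summarizes how well the scores rank positive users above negative users across all thresholds.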

What to do next

You have now learned how to use PAI to identify users who steal electricity or are involved in electricity leakage. You can also use Elastic Algorithm Service to deploy an online service for identifying electricity theft and leakage.