This topic describes how to load processed data from DataWorks to Machine Learning Platform for AI (PAI) and build a model for identifying users who steal electricity or are involved in electricity leakage.

Prerequisites

Data is processed. For more information, see Process data.

Create a PAI experiment

Create and enter a pipeline. For more information, see Prepare data.

Load datasets

  1. In the component list on the left side, find and drag the Read Table component under the Data Source/Target folder to the canvas and rename the component to data4ml.
  2. Click the data4ml node on the canvas. On the Select Table tab in the right-side pane, enter data4ml in the Table Name field to read data from the table.
  3. Right-click the data4ml node on the canvas and select Run Current Node from the shortcut menu.
  4. After the node is run, right-click the data4ml node and choose View Data > Source MaxCompute Table Output Port to view the loaded data. The data includes electricity theft and leakage metrics, such as the power consumption trend (trend), the line loss (xiansun), and the number of alerts (warnindicator). It also includes the flag field, which indicates whether a user steals electricity or is involved in electricity leakage.

Explore the data

  1. Analyze the correlation between data.
    1. In the component list on the left side, find and drag the Correlation Coefficient Matrix component under the Statistical Analysis folder to the canvas.
    2. On the canvas, draw a line from the source MaxCompute table output port of the data4ml node to the input port of the Correlation Coefficient Matrix node.
    3. Right-click the Correlation Coefficient Matrix node on the canvas and select Run Current Node from the shortcut menu.
    4. After the node is run, right-click the Correlation Coefficient Matrix node and select Visual Analysis from the shortcut menu to view the analysis report.
  2. Analyze features.
    1. In the component list on the left side, find and drag the Data Pivoting component under the Statistical Analysis folder to the canvas.
    2. On the canvas, draw a line from the source MaxCompute table output port of the data4ml node to the input port of the Data Pivoting node.
    3. Click the Data Pivoting node on the canvas. On the Fields Setting tab in the right-side pane, select the trend, xiansun, and warnindicator fields for Feature Columns and select the flag field for Target Column.
    4. Right-click the Data Pivoting node on the canvas and select Run Current Node from the shortcut menu.
    5. After the node is run, right-click the Data Pivoting node and select Visual Analysis from the shortcut menu to view the relationship between each feature column and the flag column in terms of data distribution.
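
The correlation analysis above can also be reproduced outside the console to sanity-check the report. The following is a minimal sketch in plain Python that computes pairwise Pearson coefficients. It assumes Pearson correlation, because this topic does not state which coefficient the component uses. The sample rows are invented for illustration. Only the column names trend, xiansun, warnindicator, and flag come from the steps above.

```python
import math

# Column names come from the data4ml table used in this topic.
COLUMNS = ["trend", "xiansun", "warnindicator", "flag"]

# Hypothetical sample rows for illustration only; real values live in data4ml.
ROWS = [
    (4, 1, 1, 1),
    (4, 0, 4, 1),
    (2, 1, 1, 1),
    (9, 0, 0, 0),
    (3, 1, 0, 0),
    (2, 0, 0, 0),
    (5, 1, 2, 1),
    (3, 1, 3, 1),
]


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant column has no defined correlation.
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(rows, columns):
    """Build a nested dict: matrix[a][b] is the correlation of columns a and b."""
    series = {name: [row[i] for row in rows] for i, name in enumerate(columns)}
    return {a: {b: pearson(series[a], series[b]) for b in columns} for a in columns}


matrix = correlation_matrix(ROWS, COLUMNS)
for name in COLUMNS:
    print(name, [round(matrix[name][other], 2) for other in COLUMNS])
```

Each printed row corresponds to one row of the matrix. Values close to 1 or -1 in the flag row point to features that move with the label, which is the kind of signal the analysis report is meant to surface.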

Perform data modeling

After you explore and analyze the data, you can select appropriate algorithm models for data modeling.

  1. Use the Split component to divide the data into a training dataset and a test dataset.
    1. In the component list on the left side, find and drag the Split component under the Data Preprocessing folder to the canvas.
    2. On the canvas, draw a line from the source MaxCompute table output port of the data4ml node to the input port of the Split node.
    3. Right-click the Split node on the canvas and select Run Current Node from the shortcut menu.
    4. After the node is run, right-click the Split node and choose View Data > Output Table to view the split table.
  2. Use the Logistic Regression for Binary Classification component to perform regression modeling on data.
    1. In the component list on the left side, choose Machine Learning > Binary Classification and find and drag the Logistic Regression for Binary Classification component to the canvas.
    2. On the canvas, draw a line from the output table 1 of the Split node to the input port of the Logistic Regression for Binary Classification node.
    3. Click the Logistic Regression for Binary Classification node on the canvas. On the Fields Setting tab in the right-side pane, select the trend, xiansun, and warnindicator fields for Feature Columns and select the flag field for Target Column.
    4. Right-click the Logistic Regression for Binary Classification node on the canvas and select Run Current Node from the shortcut menu.
    5. After the node is run, right-click the Logistic Regression for Binary Classification node and choose Model Options > Model Description to view the data model.
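
For readers who want to see what the split and training steps do conceptually, the following sketch splits a small table and fits a binary logistic regression with batch gradient descent in plain Python. It is not the PAI implementation: the split fraction of 0.8, the random seed, the learning rate, and the epoch count are all assumptions chosen for illustration, and the sample rows are invented. The function names (split_rows, train_logistic_regression, and so on) are hypothetical helpers, not PAI APIs.

```python
import math
import random

# Hypothetical rows in the order (trend, xiansun, warnindicator, flag).
ROWS = [
    (4, 1, 1, 1), (4, 0, 4, 1), (2, 1, 1, 1), (9, 0, 0, 0), (3, 1, 0, 0),
    (2, 0, 0, 0), (5, 1, 2, 1), (3, 1, 3, 1), (6, 0, 0, 0), (1, 1, 0, 0),
]


def split_rows(rows, fraction=0.8, seed=0):
    """Shuffle a copy of the rows and cut it into (training, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_probability(features, weights, bias):
    """Probability that flag is 1 for one row of feature values."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


def log_loss(rows, weights, bias):
    """Mean negative log-likelihood of the labels under the model."""
    eps = 1e-12
    total = 0.0
    for *features, label in rows:
        p = min(max(predict_probability(features, weights, bias), eps), 1 - eps)
        total -= label * math.log(p) + (1 - label) * math.log(1 - p)
    return total / len(rows)


def train_logistic_regression(rows, learning_rate=0.05, epochs=2000):
    """Fit weights and bias with batch gradient descent on the log loss."""
    n_features = len(rows[0]) - 1
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for *features, label in rows:
            error = predict_probability(features, weights, bias) - label
            for j, x in enumerate(features):
                grad_w[j] += error * x
            grad_b += error
        weights = [w - learning_rate * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(rows)
    return weights, bias


train, test = split_rows(ROWS)
weights, bias = train_logistic_regression(train)
print("weights:", [round(w, 3) for w in weights], "bias:", round(bias, 3))
print("training log loss:", round(log_loss(train, weights, bias), 4))
```

The first list returned by split_rows plays the role of output table 1 of the Split node, and the second list plays the role of output table 2, which is held back for prediction.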

Predict and evaluate the regression model

  1. Use the Prediction component to apply the trained model to the test dataset.
    1. In the component list on the left side, find and drag the Prediction component under the Machine Learning folder to the canvas.
    2. On the canvas, draw a line from the logistic regression model of the Logistic Regression for Binary Classification node to the model result input port of the Prediction node. Draw a line from the output table 2 of the Split node to the prediction data input port of the Prediction node.
    3. Click the Prediction node on the canvas. On the Fields Setting tab in the right-side pane, keep the default setting for Feature Columns, in which all fields are automatically selected, and select the uid, trend, xiansun, warnindicator, and flag fields for Reserved Columns.
    4. Right-click the Prediction node on the canvas and select Run Current Node from the shortcut menu.
    5. After the node is run, right-click the Prediction node and choose View Data > Prediction Result Output Port to view the prediction result.
  2. Use the Binary Classification Evaluation component to evaluate the model.
    1. In the component list on the left side, choose Machine Learning > Evaluation and drag the Binary Classification Evaluation component to the canvas.
    2. On the canvas, draw a line from the prediction result output port of the Prediction node to the input port of the Binary Classification Evaluation node.
    3. Click the Binary Classification Evaluation node on the canvas. On the Fields Setting tab in the right-side pane, select flag for Original Label Column.
    4. Right-click the Binary Classification Evaluation node on the canvas and select Run Current Node from the shortcut menu.
    5. After the node is run, right-click the Binary Classification Evaluation node and select Visual Analysis from the shortcut menu to view the evaluation results of the model.
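
To make the evaluation step more concrete, the following sketch computes several metrics that are commonly reported for binary classifiers: confusion counts, precision, recall, F1 score, and AUC. This topic does not list which metrics the Binary Classification Evaluation component displays, so treat the selection as an assumption. The labels and scores are invented, the 0.5 threshold is an assumption, and the function names are hypothetical helpers.

```python
# Hypothetical original labels (the flag column) and predicted probabilities.
LABELS = [1, 1, 1, 0, 0, 0, 1, 0]
SCORES = [0.92, 0.81, 0.40, 0.10, 0.35, 0.62, 0.77, 0.05]


def confusion_counts(labels, scores, threshold=0.5):
    """Return (tp, fp, tn, fn) when scores at or above the threshold count as 1."""
    tp = fp = tn = fn = 0
    for label, score in zip(labels, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def precision_recall_f1(tp, fp, fn):
    """Return precision, recall, and F1, using 0.0 when a ratio is undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        return float("nan")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


tp, fp, tn, fn = confusion_counts(LABELS, SCORES)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
print("tp, fp, tn, fn:", tp, fp, tn, fn)
print("precision:", precision, "recall:", recall, "f1:", round(f1, 4))
print("auc:", auc(LABELS, SCORES))
```

For electricity theft detection, recall indicates how many of the flagged users the model finds, and precision indicates how many of the users it reports are actually flagged. Which of the two matters more depends on the cost of a missed case versus an unnecessary inspection.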

What to do next

You have learned how to use PAI to identify users who steal electricity or are involved in electricity leakage. You can also use Elastic Algorithm Service to deploy the model as an online service for identifying electricity theft and leakage.